The DPM Detector
P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan. Object Detection with Discriminatively Trained Part Based Models. T-PAMI, 2010.
Paper: http://cs.brown.edu/~pff/papers/lsvm-pami.pdf
Code: http://www.cs.berkeley.edu/~rbg/latent/
Sanja Fidler, CSC420: Intro to Image Understanding

The HOG Detector
The HOG detector models an object class as a single rigid template.
Figure: A single HOG template models people in upright pose.

But Objects Are Composed of Parts

Even Rigid Objects Are Composed of Parts

Objects Are Composed of Deformable Parts
Revisit the old idea by Fischler & Elschlager (1973):
Objects are composed of parts at specific relative locations. Our model should probably also model object parts.
Different instances of the same object class have parts in slightly different locations. Our object model should thus allow slight slack in part position.
Figure: Objects are a collection of deformable parts [Pic from: R. Girshick]

The DPM Model
The DPM model starts by borrowing the idea of the HOG detector. It takes a HOG template for the full object, the root filter. (If you take something that works, things can only get better, right?)

DPM now wants to add parts. It adds them at locations relative to the location of the root filter. Relative makes sense: if we move, we take our parts with us.

Add a part at a relative location and scale.

Each part has an appearance, which is modeled with a HOG template. Each part's template is at twice the resolution of the root filter.

Give some slack to the location of the part. Why is this a good idea?

The DPM Model
People are of different heights, and thus have feet at different locations relative to the head. And we want to detect all people, not just the average ones.

The DPM Model
We will, however, place less trust in detections whose parts are not exactly in their expected location. DPM penalizes part shifts with a quadratic function:
$$a(x - v_x)^2 + b(x - v_x) + c(y - v_y)^2 + d(y - v_y)$$
Here $(x, y)$ is the part's location, $(v_x, v_y)$ is its expected location, and $a, b, c, d$ are weights that penalize the different terms.
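
As a minimal sketch, the quadratic penalty can be written as a small Python function. The function name and argument order are illustrative only; it assumes (x, y) is where the part was placed and (vx, vy) is where the model expects it.

```python
def deformation_penalty(x, y, vx, vy, a, b, c, d):
    """Quadratic cost of placing a part at (x, y) instead of its expected spot (vx, vy).

    a, b, c, d are the learned penalty weights (hypothetical values in the example).
    """
    dx = x - vx  # horizontal shift from the expected location
    dy = y - vy  # vertical shift from the expected location
    return a * dx**2 + b * dx + c * dy**2 + d * dy


# A part sitting exactly where it is expected costs nothing;
# the cost grows quadratically as it drifts away.
print(deformation_penalty(5, 3, 5, 3, a=1.0, b=0.0, c=1.0, d=0.0))
print(deformation_penalty(7, 3, 5, 3, a=1.0, b=0.0, c=1.0, d=0.0))
```

With b = d = 0 the penalty is symmetric around the expected location; nonzero linear terms shift the cheapest position away from it.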

The DPM Model
And finally, DPM has a few parts. Typically 6 (but it's a parameter you can play with).
How many weights does a 6-part DPM model have?
How shall we score this part-model guy in an image (how to do detection)?
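
One way to approach the weight-counting question is to add up the template weights and the deformation weights. The sketch below assumes one weight per feature dimension per HOG cell in every template, plus four deformation weights (a, b, c, d) per part. The filter sizes and the feature dimension in the example are hypothetical, not values from the slides.

```python
def count_dpm_weights(root_shape, part_shape, n_parts, feat_dim, n_def=4):
    """Count the learned weights of a single-component DPM (bias terms not included).

    root_shape, part_shape: (rows, cols) of the templates in HOG cells (assumed sizes).
    feat_dim: number of HOG features per cell (assumed).
    n_def: deformation weights per part, i.e. a, b, c, d.
    """
    root = root_shape[0] * root_shape[1] * feat_dim
    per_part = part_shape[0] * part_shape[1] * feat_dim + n_def
    return root + n_parts * per_part


# Hypothetical sizes: a 15x5 root template, 6x6 part templates, 31 features per cell.
print(count_dpm_weights((15, 5), (6, 6), n_parts=6, feat_dim=31))
```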

Remember the HOG Detector
The HOG detector computes an image pyramid, extracts HOG features, and scores each window with a learned linear classifier. [Pic from: R. Girshick]

DPM Detector
For DPM the story is quite similar (pyramid, HOG, score window with a learned linear classifier), but now we also need to score the parts. [Pic from: R. Girshick]
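
To make the pyramid step concrete, here is a small sketch that lists the scale factors of an image pyramid. The function name, the stopping rule (stop once the scaled image is smaller than the detection window), and the number of levels per octave are assumptions for illustration, not details taken from the slides or the released code.

```python
def pyramid_scales(image_size, min_size, levels_per_octave=5):
    """Scale factors 2**(-k / levels_per_octave), k = 0, 1, 2, ...

    Stops as soon as the scaled image would be smaller than min_size
    (the detection window), since no window fits beyond that point.
    """
    h, w = image_size
    scales = []
    k = 0
    while True:
        s = 2.0 ** (-k / levels_per_octave)
        if h * s < min_size[0] or w * s < min_size[1]:
            break
        scales.append(s)
        k += 1
    return scales


# A 256x128 image with a 128x64 window and 2 levels per octave.
print(pyramid_scales((256, 128), (128, 64), levels_per_octave=2))
```

Under this scheme, features at twice the resolution of a given level sit exactly `levels_per_octave` levels earlier in the list, which is one natural place to evaluate the part templates.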

Scoring
More specifically, we will score a location (window) in the image as follows:
$$\mathrm{score}(l, p_0) = \max_{p_1, \dots, p_n} \left[ \sum_{i=0}^{n} F_i \cdot \mathrm{HOG}(l, p_i) - \sum_{i=1}^{n} w^i_{\mathrm{def}} \cdot (dx, dy, dx^2, dy^2) \right]$$
where
$F_0$ is the (learned) HOG template for the root filter,
$F_i$ is the (learned) HOG template for part $i$,
$\mathrm{HOG}(l, p_i)$ means a HOG feature cropped in the window defined by part location $p_i$ at level $l$ of the HOG pyramid,
$w^i_{\mathrm{def}}$ are (learned) weights for the deformation penalty,
$(dx, dy, dx^2, dy^2)$ with $(dx, dy) = (x_i, y_i) - ((x_0, y_0) + v_i)$ tells us how far part $i$ is from its expected position $(x_0, y_0) + v_i$.
Main question: How shall we compute that nasty $\max_{p_1, \dots, p_n}$?

Scoring
Push the max inside (why can we do that?):
$$\mathrm{score}(l, p_0) = F_0 \cdot \mathrm{HOG}(l, p_0) + \sum_{i=1}^{n} \max_{p_i} \left[ F_i \cdot \mathrm{HOG}(l, p_i) - w^i_{\mathrm{def}} \cdot \phi_{\mathrm{def}}(x_i, y_i) \right]$$
where $\phi_{\mathrm{def}}(x_i, y_i) = (dx, dy, dx^2, dy^2)$ are the deformation features of part $i$.
We can compute this with dynamic programming. Any idea how?
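
The max can be pushed inside because, once the root location is fixed, each part's term depends only on that part's own location, so a sum of independent terms is maximized by maximizing each term separately. The sketch below checks this numerically on random numbers. It assumes `part_scores[i][p]` already holds the appearance score minus the deformation penalty of part i at candidate location p; all names are illustrative.

```python
import itertools

import numpy as np


def joint_max(root_score, part_scores):
    """Brute force: try every joint placement of all parts (exponential in the number of parts)."""
    best = -np.inf
    for placement in itertools.product(*[range(len(s)) for s in part_scores]):
        total = root_score + sum(s[p] for s, p in zip(part_scores, placement))
        best = max(best, total)
    return best


def decomposed_max(root_score, part_scores):
    """Max pushed inside the sum: each part is maximized on its own (linear in the number of parts)."""
    return root_score + sum(s.max() for s in part_scores)


rng = np.random.default_rng(0)
part_scores = [rng.normal(size=5) for _ in range(3)]  # 3 parts, 5 candidate locations each
root_score = 1.5

print(np.isclose(joint_max(root_score, part_scores),
                 decomposed_max(root_score, part_scores)))
```

The brute-force version evaluates 5^3 placements here, while the decomposed version evaluates only 3 x 5 part scores.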

Computing the Score with Dynamic Programming
Figure: We can compute $F_i \cdot \mathrm{HOG}(l, p_i)$ for the full level $l$ via cross-correlation of the HOG feature matrix at level $l$ with the template (filter) $F_i$.
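
A minimal sketch of that cross-correlation, written with explicit loops for clarity rather than speed. It assumes the feature map is an array of shape (rows, cols, feature dimension) and only evaluates positions where the template fits entirely inside the map; the function name is illustrative.

```python
import numpy as np


def filter_response(features, template):
    """Cross-correlate a feature map (H, W, D) with a template (h, w, D).

    Returns an (H - h + 1, W - w + 1) array whose entry [y, x] is the dot
    product of the template with the feature window whose top-left cell is (y, x).
    """
    H, W, D = features.shape
    h, w, d = template.shape
    assert d == D, "template and features must share the feature dimension"
    out = np.empty((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(features[y:y + h, x:x + w] * template)
    return out


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 6, 4))   # stand-in for one level of a HOG pyramid
template = features[2:5, 1:3].copy()    # a template cut out of the map itself
response = filter_response(features, template)
print(response.shape)
```

Because the example template is copied from the map, the response at (2, 1) equals the template's squared norm, which makes the function easy to sanity-check.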

Computing the Score with Dynamic Programming
Figure: We can compute these scores efficiently with something called distance transforms (this is exact). But this works equally well: simply limit the scope of where each part could be to a small area, e.g., a few HOG cells up, down, left, and right relative to the yellow spot (this is approximate).
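
The approximate variant can be sketched as a search over a small window around the part's expected position. This assumes `response` is the part's filter response map, `anchor` is the expected position (row, col), and `w_def` holds the four deformation weights in the order matching (dx, dy, dx^2, dy^2). The names and the default radius are illustrative.

```python
import numpy as np


def best_part_score(response, anchor, w_def, radius=2):
    """Best (response - deformation penalty) within `radius` cells of the anchor.

    Returns the best score and the (row, col) where it is attained.
    """
    ay, ax = anchor
    H, W = response.shape
    best, best_pos = -np.inf, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = ay + dy, ax + dx
            if not (0 <= y < H and 0 <= x < W):
                continue  # skip placements that fall outside the map
            penalty = np.dot(w_def, [dx, dy, dx * dx, dy * dy])
            score = response[y, x] - penalty
            if score > best:
                best, best_pos = score, (y, x)
    return best, best_pos


response = np.zeros((7, 7))
response[3, 4] = 5.0  # strong part evidence one cell to the right of the anchor
print(best_part_score(response, (3, 3), w_def=[0, 0, 1, 1]))
print(best_part_score(response, (3, 3), w_def=[0, 0, 10, 10]))
```

With a mild penalty the part moves to the strong response; with a stiff penalty it stays at the anchor. The cost is (2 * radius + 1)^2 evaluations per root location, which is why a small radius keeps this cheap.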

Detection [Pic from: Felzenszwalb et al., 2010]

Training
You can't train this model as simply as the HOG detector, via SVM. For those taking CSC411: Why not?
Because the part positions are not annotated (we don't have ground truth, and SVM needs ground truth). We say that the parts are latent.
You can train the model with something called latent SVM.
For ML buffs: Check the Felzenszwalb paper.
For those with an even stronger ML stomach: Yu, Joachims, Learning Structural SVMs with Latent Variables, ICML'09.
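
For reference, the latent SVM objective in the Felzenszwalb et al. paper has roughly the following form. Here $\beta$ stacks all filter and deformation weights, $z$ ranges over the latent part placements $Z(x)$ for example $x$, and $\Phi(x, z)$ is the matching feature vector. The notation is reproduced from memory of that paper rather than from the slides, so treat it as a sketch.

```latex
% Score of an example: best score over all latent part placements z
\[
  f_\beta(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z)
\]

% Training objective: standard hinge loss on top of the latent score
\[
  L_D(\beta) = \frac{1}{2} \lVert \beta \rVert^2
             + C \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i f_\beta(x_i)\bigr)
\]
```

Because of the max over $z$, the objective is not convex in general. It becomes convex once the latent values of the positive examples are fixed, which suggests alternating between choosing the best part placements for the positives and re-training $\beta$.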

Results
Figure: Performance of the HOG detector on the person class on PASCAL VOC [Pic from: R. Girshick]

Figure: DPM version 1: adds the parts [Pic from: R. Girshick]

Figure: DPM version 2: adds another template (called a mixture or component). It is also supposed to detect people sitting down (e.g., occluded by a desk). [Pic from: R. Girshick]

Figure: DPM version 3: adds multiple mixtures (components) [Pic from: R. Girshick]

Learned Models [Pic from: Felzenszwalb et al., 2010]

Learned Models (Takes some imagination to see a cat...) [Pic from: Felzenszwalb et al., 2010]

Results [Pic from: Felzenszwalb et al., 2010]

DPM
As you already know, the code is available: http://www.cs.berkeley.edu/~rbg/latent/
Trivia:
Takes about 20-30 seconds per image per class. Speed-ups exist.
Depending on the size of the dataset, training takes around 12 hours (for most PASCAL classes).
Has some cool post-processing tricks: bounding box prediction and context re-scoring. Each typically results in around 2% improvement in AP.
In the code, if you switch off the parts, you get the Dalal & Triggs HOG detector.

Object Class Detection
Pre 2014: HOG detector, Deformable Part-based Model
Post 2014 (neural networks): R-CNN, Fast(er) R-CNN, YOLO, SSD
[Credit for the slides to follow: Bin Yang]

The CNN Era [Slide credit: Renjie Liao]

RCNN: Regions with CNN Features [Slide credit: Ross Girshick]

Training

RCNN: Performance

Faster R-CNN

Region Proposal Network (RPN)

Faster R-CNN: Performance

Car Example [Slide credit: Joseph Chet Redmon]

Real Time Object Detection?

YOLO: You Only Look Once [Slide credit: Redmon J et al. You only look once: Unified, real-time object detection. CVPR'16]

YOLO: Output Parametrization [Slide credit: Redmon J et al. You only look once: Unified, real-time object detection. CVPR'16]

SSD: Single Shot MultiBox Detector [Slide credit: Liu W, et al. SSD: Single Shot MultiBox Detector. ECCV'16]

That's It For CSC420... But There Is Much More of Computer Vision For Those Interested!