Intro to ggplot2
Hadley Wickham
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
November 2010

HELLO my name is Hadley

had.co.nz/courses/10-tokyo

Outline

Data analysis is the process by which data becomes understanding, knowledge and insight

Understand Visualise Access Transform Model Communicate

[Figure: scatterplot of hwy against displ, coloured by class (2seater, compact, midsize, minivan, pickup, subcompact, suv)]

[Figure: histogram of depth; y-axis: count]

[Figure: price against carat, with a count legend]

[Figure: prop against year, by sex (boy, girl); panels: George, Georgia, Georgie]

[Figure: prop against year, by sex (boy, girl)]

[Figure: diff against abs(mean), points labelled with names (Angel to Willie)]

Plotting basics

Learning a new language is hard!

Scatterplot basics

install.packages("ggplot2")
library(ggplot2)
?mpg
head(mpg)
str(mpg)
summary(mpg)

Always explicitly specify the data:
qplot(displ, hwy, data = mpg)
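In recent ggplot2 releases, qplot() is marked as deprecated in favour of the full ggplot() syntax. As a minimal sketch, assuming a current ggplot2 installation, the same scatterplot can be written as follows. The object name `p` is only illustrative and does not come from the slides.

```r
library(ggplot2)

# Same plot as qplot(displ, hwy, data = mpg), written with ggplot().
# The data frame is still passed explicitly; aes() maps columns to x and y.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

print(p)
```

The qplot() calls in the rest of these slides follow the same pattern: positional arguments become aes() mappings, and the geom argument becomes a geom_*() layer.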

[Figure: scatterplot of hwy against displ]
qplot(displ, hwy, data = mpg)

Additional variables
Additional variables can be displayed with aesthetics (like shape, colour, size) or faceting (small multiples displaying different subsets).

[Figure: scatterplot of hwy against displ, coloured by class]
qplot(displ, hwy, colour = class, data = mpg)
Legend chosen and displayed automatically.

Your turn
Try mapping different variables to the colour, size, and shape aesthetics. Is there a difference between discrete and continuous variables? What happens when you use multiple aesthetics?
http://had.co.nz/courses/10-tokyo

Aside: workflow
Keep a copy of the slides open so that you can copy and paste the code.
For complicated commands, write them in the script editor and then copy and paste.

         Discrete                    Continuous
Colour   Rainbow of colours          Gradient from red to blue
Size     Discrete size steps         Linear mapping between radius and value
Shape    Different shape for each    Doesn't work
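The contrast in the table can be tried directly. The sketch below uses ggplot() syntax and assumes the mpg columns class and drv (character, so discrete) and cty (integer, so continuous). The object names are illustrative, and the exact default colours depend on the installed ggplot2 version.

```r
library(ggplot2)

# Check which of the two columns is discrete (character or factor).
is_discrete <- vapply(
  mpg[c("class", "cty")],
  function(x) is.character(x) || is.factor(x),
  logical(1)
)
print(is_discrete)

# Discrete variable mapped to colour: one distinct hue per class.
p_discrete <- ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point()

# Continuous variable mapped to colour: a colour gradient.
p_continuous <- ggplot(mpg, aes(displ, hwy, colour = cty)) + geom_point()

# Multiple aesthetics at once: colour (continuous) and shape (discrete).
p_multi <- ggplot(mpg, aes(displ, hwy, colour = cty, shape = drv)) + geom_point()
```

Printing each plot object draws it, with the legends generated automatically.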

Faceting
Small multiples displaying different subsets of the data.
Useful for exploring conditional relationships.
Useful for large data.

Your turn
qplot(displ, hwy, data = mpg) + facet_grid(. ~ cyl)
qplot(displ, hwy, data = mpg) + facet_grid(drv ~ .)
qplot(displ, hwy, data = mpg) + facet_grid(drv ~ cyl)
qplot(displ, hwy, data = mpg) + facet_wrap(~ class)

Summary
facet_grid(): 2d grid, rows ~ cols, . for no split
facet_wrap(): 1d ribbon wrapped into 2d
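To make the difference between the two functions concrete, the following sketch builds both layouts with ggplot() syntax and cross-tabulates drv against cyl to show how many observations fall into each panel of the grid. The names `base`, `p_grid`, `p_wrap` and `panel_counts` are illustrative.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# 2d grid: rows ~ cols. Every drv x cyl combination gets a panel,
# including combinations that contain no cars.
p_grid <- base + facet_grid(drv ~ cyl)

# 1d ribbon: one panel per class, wrapped into rows and columns.
p_wrap <- base + facet_wrap(~ class)

# Number of cars in each panel of the grid; zeros are empty panels.
panel_counts <- table(drv = mpg$drv, cyl = mpg$cyl)
print(panel_counts)
```

A grid suits two crossed variables, while the wrapped ribbon suits a single variable with many levels.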

[Figure: scatterplot of hwy against cty]
qplot(cty, hwy, data = mpg)
What's the problem with this plot?

[Figure: jittered scatterplot of hwy against cty]
qplot(cty, hwy, data = mpg, geom = "jitter")
geom controls type of plot
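One way to see why jittering helps here: cty and hwy are stored as whole numbers, so many cars share exactly the same coordinates and are drawn on top of each other. The sketch below compares the number of rows with the number of distinct (cty, hwy) positions, then draws the jittered version with ggplot() syntax. The variable names are illustrative.

```r
library(ggplot2)

cars <- as.data.frame(mpg)

# Total number of cars versus the number of distinct plotting positions.
n_points <- nrow(cars)
n_positions <- nrow(unique(cars[, c("cty", "hwy")]))
cat("points:", n_points, " distinct positions:", n_positions, "\n")

# geom_jitter() adds a small random offset to each point so that
# overlapping observations become visible.
p_jitter <- ggplot(cars, aes(cty, hwy)) + geom_jitter()
```

Because the offsets are random, the jittered plot looks slightly different each time it is drawn.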

[Figure: hwy against class]
qplot(class, hwy, data = mpg)
How could we improve this plot? Brainstorm for 1 minute.

[Figure: hwy against reorder(class, hwy); class order: pickup, suv, minivan, 2seater, midsize, subcompact, compact]
qplot(reorder(class, hwy), hwy, data = mpg)
Incredibly useful technique!
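As a rough illustration of what reorder() does here: by default it computes the mean of the second argument within each level of the first, and sorts the levels by that value. The sketch below reproduces the ordering by hand with tapply(). The variable names are illustrative.

```r
library(ggplot2)

# Mean highway mileage per class, sorted from lowest to highest.
class_means <- sort(tapply(mpg$hwy, mpg$class, mean))
print(round(class_means, 1))

# reorder() returns a factor whose levels follow the same order.
class_by_hwy <- reorder(mpg$class, mpg$hwy)
print(levels(class_by_hwy))
```

The level order matches the x-axis order in the plot: pickup, suv, minivan, 2seater, midsize, subcompact, compact.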

[Figure: jittered hwy against reorder(class, hwy)]
qplot(reorder(class, hwy), hwy, data = mpg, geom = "jitter")

[Figure: boxplots of hwy against reorder(class, hwy)]
qplot(reorder(class, hwy), hwy, data = mpg, geom = "boxplot")

[Figure: jittered points and boxplots of hwy against reorder(class, hwy)]
qplot(reorder(class, hwy), hwy, data = mpg, geom = c("jitter", "boxplot"))

Your turn
Read the help for reorder. Redraw the previous plots with class ordered by median hwy.
How would you put the jittered points on top of the boxplots?
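One possible approach to this exercise, sketched with ggplot() syntax rather than qplot(): reorder() accepts a FUN argument, so the median can replace the default mean, and layers are drawn in the order they are added, so adding the jitter layer after the boxplot layer puts the points on top. The column name `class_med` and the jitter width are assumptions made for this example.

```r
library(ggplot2)

mpg_ordered <- as.data.frame(mpg)

# Order the class levels by median hwy instead of the default mean.
mpg_ordered$class_med <- reorder(mpg_ordered$class, mpg_ordered$hwy, FUN = median)

# Boxplots first, jittered points second: later layers are drawn on top.
p <- ggplot(mpg_ordered, aes(class_med, hwy)) +
  geom_boxplot() +
  geom_jitter(width = 0.2)

print(p)
```

With qplot(), the corresponding change would presumably be to list the geoms in the order c("boxplot", "jitter").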

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.