ANALYSIS OF TRAFFIC SPEEDS IN NEW YORK CITY. Austin Krauza BDA 761 Fall 2015

Problem Statement
How can Amazon Web Services (AWS) be used to analyze large-scale data sets? The data set contains roughly 80 million records in CSV format.
How does the average speed on the Verrazano-Narrows Bridge and through the Holland Tunnel fluctuate over a 168-hour period (one week) and over 11 months (September 2014 to July 2015)?
12/10/2015 Austin Krauza 2

Software Packages Used
Microsoft Excel
SAS (Statistical Analysis System)
Amazon Web Services: Amazon Elastic MapReduce (EMR), Hive, Hadoop, Hue, and Amazon S3 web storage

What is Amazon Web Services?
A cloud computing platform that offers a range of services hosted off-site.
Low usage cost for users.
Provides a variety of platforms, such as Hadoop, Amazon S3, and MapReduce.

Advantages of Using AWS
Low cost to the user.
Easily scalable.
Simple interfaces for novice users.
Full customization for advanced users.

Information Sources
Data was collected from TRANSCOM using a PHP scraping script.

Sample Data
id  date              time      stationid  type      speed  traveltime  traveltimefloat
1   11/14/2014 23:50  23:50:00  4616439    Averaged  90     94          94
2   11/14/2014 23:50  23:50:00  4575368    Averaged  106    208         208
3   11/14/2014 23:50  23:50:00  4616246    Averaged  92     76          76
4   11/14/2014 23:50  23:50:00  4616223    Averaged  76     86          86
5   11/14/2014 23:50  23:50:00  4575379    Averaged  92     558         558
6   11/14/2014 23:50  23:50:00  4616352    Averaged  90     135         135
7   11/14/2014 23:50  23:50:00  20484203   Averaged  97     54          54
8   11/14/2014 23:50  23:50:00  4575426    Averaged  114    190         190
9   11/14/2014 23:50  23:50:00  5419028    Averaged  111    12          12
10  11/14/2014 23:50  23:50:00  5361701    Averaged  69     107         107
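A minimal Python sketch of loading rows shaped like the sample above. It assumes the raw export is comma-separated with a header row using exactly these column names, and that the date column follows the sample's "MM/DD/YYYY HH:MM" layout; `parse_records` is a hypothetical helper, not part of the original workflow (which used SAS for clean-up).

```python
import csv
import io
from datetime import datetime


def parse_records(text):
    """Parse TRANSCOM-style rows (assumed comma-separated, with a header) into dicts."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "id": int(row["id"]),
            # Assumed date layout, taken from the sample: "11/14/2014 23:50".
            "date": datetime.strptime(row["date"], "%m/%d/%Y %H:%M"),
            "stationid": int(row["stationid"]),
            "type": row["type"],
            "speed": int(row["speed"]),
            "traveltime": int(row["traveltime"]),
        })
    return records


sample = """id,date,time,stationid,type,speed,traveltime,traveltimefloat
1,11/14/2014 23:50,23:50:00,4616439,Averaged,90,94,94
2,11/14/2014 23:50,23:50:00,4575368,Averaged,106,208,208
"""

rows = parse_records(sample)
print(rows[0]["stationid"], rows[0]["speed"], rows[0]["date"].hour)  # 4616439 90 23
```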

Figure: Sensors on the Staten Island Expressway

Figure: Location of Sensors in New York City

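Clean-up Using SAS
  /* Split the date (var2) and time (var3) strings into numeric parts */
  data dec2;
    set dec2;
    year = input(substr(var2,1,4), 4.);
    month = input(substr(var2,6,2), 2.);
    day = input(substr(var2,9,2), 2.);
    newdate = mdy(month,day,year);
    dow = weekday(newdate);     /* 1 = Sunday ... 7 = Saturday */
    hour = input(substr(var3,1,2), 2.);
    minute = input(substr(var3,4,2), 2.);
    how = (dow-1)*24 + hour;    /* hour of week, 0-167 */
    format newdate date9.;
  run;
  data dec1; set dec1; format newdate date9.; run;
  /* Count records per date */
  proc summary data=dec2 noprint; class newdate; output out=o1; run;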
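The hour-of-week ("how") formula in the SAS step can be mirrored in Python as a sanity check. This is a sketch that reproduces SAS's `weekday()` numbering (1 = Sunday) from Python's `isoweekday()` (1 = Monday, 7 = Sunday); the function names are hypothetical.

```python
from datetime import datetime


def sas_weekday(d):
    """SAS WEEKDAY() convention: 1 = Sunday, 2 = Monday, ..., 7 = Saturday."""
    # isoweekday() gives 1 = Monday ... 7 = Sunday; the modulo moves Sunday to 1.
    return d.isoweekday() % 7 + 1


def hour_of_week(d):
    """Hour of week as in the SAS step: (weekday - 1) * 24 + hour, range 0-167."""
    return (sas_weekday(d) - 1) * 24 + d.hour


print(hour_of_week(datetime(2014, 11, 11, 8, 0)))  # 56 (a Tuesday, 8am)
```

With this convention, hour 0 is Sunday midnight and hour 167 is Saturday 11pm.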

Hive Script: External Table
  -- Dropping an EXTERNAL table removes only the metadata; the files in S3 stay in place.
  DROP TABLE IF EXISTS transcomext;
  CREATE EXTERNAL TABLE `transcomext` (
    `id` int,
    `datetime` string,
    `time` string,
    `stationid` int,
    `type` string,
    `speed` int,
    `traveltime` int,
    `traveltimefloat` int,
    `year` smallint,
    `month` int,
    `day` bigint,
    `date` string,
    `dow` int,
    `hour` bigint,
    `minute` bigint,
    `how` int)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
  LOCATION 's3://traffic-111715/data/';
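A delimited Hive table maps fields to columns by position, not by name, so each line of the files under the S3 location must list its values in exactly the DDL's column order. The sketch below shows that mapping; `to_tsv_line` is a hypothetical helper, and the example record reuses sample row 1 with derived fields filled in for illustration (the `date` string format is an assumption).

```python
# Column order of the transcomext external table, copied from the DDL above.
TRANSCOMEXT_COLUMNS = [
    "id", "datetime", "time", "stationid", "type", "speed", "traveltime",
    "traveltimefloat", "year", "month", "day", "date", "dow", "hour",
    "minute", "how",
]


def to_tsv_line(record):
    """Serialize one record as a tab-delimited line in table column order."""
    return "\t".join(str(record[col]) for col in TRANSCOMEXT_COLUMNS)


# Illustrative record based on sample row 1 (derived fields computed by hand).
example = {
    "id": 1, "datetime": "11/14/2014 23:50", "time": "23:50:00",
    "stationid": 4616439, "type": "Averaged", "speed": 90, "traveltime": 94,
    "traveltimefloat": 94, "year": 2014, "month": 11, "day": 14,
    "date": "2014-11-14", "dow": 6, "hour": 23, "minute": 50, "how": 143,
}

print(len(to_tsv_line(example).split("\t")))  # 16
```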

Hive Query: Analysis
  -- Average speed per station, hour of week, and month for the selected stations
  SELECT avg(speed) AS avgspeed,
         CONCAT(year, '-', month, '-', '1') AS month1,
         how AS HourWeek,
         stationid AS station
  FROM transcomext
  WHERE stationid IN (4763652, 4763649, 4616219, 4763655, 4763648,
                      4616204, 4751366, 4751367, 4456501, 4456502)
  GROUP BY stationid, how, CONCAT(year, '-', month, '-', '1');
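For checking a small extract locally, the same grouping can be sketched in plain Python. This assumes records are dicts with the table's `stationid`, `year`, `month`, `how`, and `speed` fields and that no speed is missing (Hive's `avg()` skips NULLs; this sketch does not handle them). `average_speeds` is a hypothetical name.

```python
from collections import defaultdict

# Station IDs from the WHERE clause of the Hive query above.
SELECTED_STATIONS = {4763652, 4763649, 4616219, 4763655, 4763648,
                     4616204, 4751366, 4751367, 4456501, 4456502}


def average_speeds(records):
    """Mean speed per (stationid, hour of week, month key), like the Hive GROUP BY."""
    sums = defaultdict(lambda: [0, 0])  # key -> [speed total, row count]
    for r in records:
        if r["stationid"] not in SELECTED_STATIONS:
            continue
        # Same key as CONCAT(year,'-',month,'-','1'): first day of the record's month.
        month1 = f"{r['year']}-{r['month']}-1"
        key = (r["stationid"], r["how"], month1)
        sums[key][0] += r["speed"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


recs = [
    {"stationid": 4763652, "year": 2014, "month": 11, "how": 56, "speed": 30},
    {"stationid": 4763652, "year": 2014, "month": 11, "how": 56, "speed": 40},
    {"stationid": 1, "year": 2014, "month": 11, "how": 56, "speed": 99},  # filtered out
]
print(average_speeds(recs))  # {(4763652, 56, '2014-11-1'): 35.0}
```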

Figure: Results of MapReduce Job

Results of MapReduce Job
Statistic                  Value
Duration                   3 minutes 6 seconds
File bytes written         14.21765 MB
HDFS bytes written         0.672917 MB
S3 bytes read              7910.784328 MB (7.9 GB)
Map input records          79,904,047
Map tasks completed        29
Reduce tasks completed     31
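The job statistics imply an average throughput, computed below over the whole reported duration (so startup and shutdown time are included, and the figure is only a rough one):

```python
# Figures taken from the job statistics above.
duration_s = 3 * 60 + 6          # 3 minutes 6 seconds
s3_mb_read = 7910.784328
map_input_records = 79_904_047

mb_per_s = s3_mb_read / duration_s
records_per_s = map_input_records / duration_s
print(f"{mb_per_s:.1f} MB/s, {records_per_s:,.0f} records/s")  # 42.5 MB/s, 429,592 records/s
```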

Analysis
Figure: Average Speeds over 168-Hour Week, Holland Tunnel (NY to NJ), average of selected stations; average speed (mph) vs. hour of week.

Analysis
Figure: Average Speeds over 168-Hour Week, Verrazano-Narrows Bridge (SI to BK), average of selected stations; average speed (mph) vs. hour of week.

Analysis
Figure: Average Speeds over 168-Hour Week, Holland Tunnel (NY to NJ) and Verrazano-Narrows Bridge (SI to BK), average of selected stations; average speed (mph) vs. hour of week.

Analysis
Figure: 30-Day Moving Averages by date, Verrazano-Narrows Bridge speed (mph) and Holland Tunnel speed (mph) on separate axes, each with a linear trendline.
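A 30-day moving average like the ones plotted can be sketched as follows, assuming one average speed per day and a trailing window (each point averages that day and the 29 days before it). The function name is hypothetical.

```python
def trailing_moving_average(values, window=30):
    """Mean of each value and the (window - 1) values before it.

    Only positions with a full window produce an output, so the result has
    len(values) - window + 1 entries (or none if the series is too short).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    running = sum(values[:window])                  # sum of the first full window
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]   # slide the window by one day
        averages.append(running / window)
    return averages


print(trailing_moving_average([1, 2, 3, 4, 5], window=3))  # [2.0, 3.0, 4.0]
```

Keeping a running sum makes each step constant-time instead of re-summing the whole window.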

Analysis
Figure: Average Speed on the Verrazano-Narrows Bridge (Brooklyn Bound): average speed, 30-day moving average, 60-day moving average, and a linear trendline on the 30-day moving average (y = -0.0335x + 1452.7, R² = 0.789); speed (mph) vs. date.
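The linear trendline and its R² correspond to an ordinary least-squares fit. A minimal sketch, assuming the x values are dates expressed as day numbers and the y values are the 30-day moving averages (`linear_fit` is a hypothetical name):

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot


print(linear_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0, 1.0)
```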

Analysis
Figure: Average Speed on the Holland Tunnel (New York Bound): average speed, 30-day moving average, 60-day moving average, and a linear trendline on the 30-day moving average (y = -0.0073x + 337.23, R² = 0.2081); speed (mph) vs. date.
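A quick reading of the two trendline slopes, assuming the x axis is the date stored as a serial day number (as Excel stores dates), so each slope is in mph per day. Scaled to a 30-day period:

```latex
\begin{aligned}
\text{Verrazano-Narrows Bridge: } & -0.0335\ \tfrac{\text{mph}}{\text{day}} \times 30\ \text{days} \approx -1.0\ \text{mph per 30 days} \\
\text{Holland Tunnel: } & -0.0073\ \tfrac{\text{mph}}{\text{day}} \times 30\ \text{days} \approx -0.22\ \text{mph per 30 days}
\end{aligned}
```

The R² values (0.789 versus 0.2081) indicate that a straight line accounts for much more of the variation in the Verrazano-Narrows 30-day moving average than in the Holland Tunnel one.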

Regression Analysis
SUMMARY OUTPUT

Regression Statistics
Multiple R            0.532820115
R Square              0.283897275
Adjusted R Square     0.281436441
Standard Error        2.852563774
Observations          293

ANOVA
              df     SS         MS         F
Regression    1      9.39E+02   9.39E+02   1.15E+02
Residual      291    2.37E+03   8.14E+00
Total         292    3.31E+03

              Coefficients   Standard Error   t Stat     P-value
Intercept     5.85E+00       3.60E+00         1.62E+00   1.06E-01
HOT30Day      1.27E+00       1.18E-01         1.07E+01   6.89E-23
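With one predictor (k = 1) and n = 293 observations, the standard least-squares identities reproduce the summary statistics from the ANOVA table, which is a useful consistency check on the output:

```latex
\begin{aligned}
R^2 &= \frac{SS_{\text{reg}}}{SS_{\text{tot}}} = \frac{939}{3310} \approx 0.284 \\
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} = 1 - (1 - 0.283897)\,\frac{292}{291} \approx 0.2814 \\
F &= \frac{MS_{\text{reg}}}{MS_{\text{res}}} = \frac{939}{8.14} \approx 115 \\
t_{\text{HOT30Day}} &= \sqrt{F} \approx \sqrt{115} \approx 10.7 \\
\text{Standard Error} &= \sqrt{MS_{\text{res}}} = \sqrt{8.14} \approx 2.853
\end{aligned}
```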

Low Periods: Verrazano-Narrows Bridge to Brooklyn
Rank  Speed (mph)   HOW (hour of week)  Time (EST)
168   33.78938594   56                  Tuesday 8am
167   34.12049655   32                  Monday 8am
166   35.14218241   55                  Tuesday 7am
165   35.27610664   31                  Monday 7am
164   35.28588222   58                  Tuesday 10am

Low Periods: Holland Tunnel to NY
Rank  Speed (mph)    HOW (hour of week)  Time (EST)
168   13.75552926    138                 Friday 7pm
167   12.171702450   137                 Friday 6pm
166   13.52144944    114                 Thursday 7pm
165   15.08261256    17                  Thursday 6pm
164   15.49752670    18                  Thursday 5pm
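Tables like these can be produced by sorting the 168 hourly averages and converting each hour of week to a day and time. This sketch assumes hour 0 is Sunday midnight (the SAS `weekday()` convention from the clean-up step) and that rank 168 is the slowest hour; under those assumptions it reproduces the Verrazano-Narrows labels (56 is Tuesday 8am, 32 is Monday 8am). Both function names are hypothetical.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def how_label(how):
    """Convert an hour of week (0 = Sunday 00:00) to a label like 'Tuesday 8am'."""
    day, hour = divmod(how, 24)
    suffix = "am" if hour < 12 else "pm"
    hour12 = hour % 12 or 12          # 0 -> 12am, 13 -> 1pm
    return f"{DAYS[day]} {hour12}{suffix}"


def lowest_periods(avg_by_how, count=5):
    """Return (rank, speed, how, label) for the slowest hours; rank 168 is the slowest."""
    ordered = sorted(avg_by_how.items(), key=lambda item: item[1])
    total = len(avg_by_how)
    return [
        (total - i, speed, how, how_label(how))
        for i, (how, speed) in enumerate(ordered[:count])
    ]


print(how_label(56))  # Tuesday 8am
```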

Conclusions
How can Amazon Web Services be used to analyze large-scale data sets?
Amazon Web Services is an effective resource for analyzing large-scale data sets.
The data is stored in Amazon S3 and read by Hadoop on Amazon EMR through a Hive external table.
After pre-processing in SAS, the data is processed with MapReduce jobs generated from Hive queries.
How does the average speed on the Verrazano-Narrows Bridge and through the Holland Tunnel fluctuate?
Highs: Verrazano-Narrows Bridge to Brooklyn at 2 am; Holland Tunnel to NY at 4 am.
Lows: Verrazano-Narrows Bridge to Brooklyn at 7 am; Holland Tunnel to NY at 5 pm.

Further Research
Predictive analysis to determine the speed at a given time and to find the best route using real-time traffic conditions.