Agent-Based Modeling, Simulation, and Control: Some Applications in Transportation
Montasir Abbas, Virginia Tech
(with contributions from past and present VT-SCORES students, including Zain Adam, Sahar Ghanipoor-Machiani, Linsen Chong, and Milos Mladenovic)
Workshop III: Traffic Control
New Directions in Mathematical Approaches for Traffic Flow Management
IPAM, October 27, 2015
Presentation Outline
- Agent-based modeling: what, why, and how?
- What is the learning framework?
- What are the techniques?
- Examples of learning: controller agents, driver behavior agents, vehicle agents
- What if we don't incorporate learning?
- Conclusions
Background
Learning
- Can we predict a condition or a behavior/response from a wealth of data?
- Can we model and interpret a phenomenon in a state-action framework?
- The same input data can lead to different performance measures, and we are the reason!
Motivation
- Varying traffic behavior and maneuvers
- Naturalistic data yields detailed behavioral data and trajectories
- VISSIM simulation with an advanced VISSIM API agent interface produces trained agents
[Figure: vehicle trajectories (labeled A-I), X (m) vs. Y (m), at t = 0 s (reference time), t = 2 s, and t = 4 s]
A Learning Framework
[Figure: state diagram S and policy P; states 1-6 plus other states, with the policy mapping each state to an action]
Learning Techniques
- Machine learning
- Q-learning
- Reinforcement learning
- Etc.
[Figure: the state diagram S and policy P from the previous slide, annotated with the techniques above]
Q-Learning
Acting on the environment, receiving rewards, and selecting actions to reach a goal.
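A minimal sketch of tabular Q-learning with ε-greedy action selection; the class name and default parameter values are illustrative, not from the talk:

```python
import random
from collections import defaultdict

class QLearner:
    """Tabular Q-learning:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)   # Q-table keyed by (state, action)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # epsilon-greedy: explore with probability epsilon, else act greedily
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
```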
Application: The Dilemma Zone Problem
- Application of learning to controllers and to humans
- Controllers making decisions
- Humans learning from mistakes
To stop or not to stop? That is the question!
To stop
To go
Controller Agent: Learning the Policy
- Environment's state variable: total number of vehicles in the dilemma zone (DZ)
- Agent's actions: end the green, or extend the green
- Reward: based on the number of vehicles caught in the DZ
- Q-learning algorithm parameters: learning rate 0.01, discount rate 0.5
A sketch of this setup follows below.
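A sketch of how this controller agent could be assembled from the QLearner above, with the slide's learning rate (0.01) and discount rate (0.5). The simulator hooks are hypothetical stand-ins for, e.g., a VISSIM API binding, and the reward sign convention (penalizing vehicles caught in the DZ) is an assumption:

```python
import random

def count_vehicles_in_dz():
    """Hypothetical simulator query: vehicles currently in the dilemma zone."""
    return random.randint(0, 8)

def step_signal(action):
    """Hypothetical simulator step: apply the action at the decision point
    and return the number of vehicles caught in the DZ as a result."""
    return random.randint(0, 2) if action == "END_GREEN" else 0

agent = QLearner(actions=["END_GREEN", "EXTEND_GREEN"],
                 alpha=0.01, gamma=0.5)   # parameters quoted on the slide

state = count_vehicles_in_dz()
for _ in range(10000):                    # one iteration per green-extension decision
    action = agent.choose(state)
    reward = -step_signal(action)         # fewer vehicles caught -> higher reward
    next_state = count_vehicles_in_dz()
    agent.update(state, action, reward, next_state)
    state = next_state
```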
Off-line and Online Learning
- Find P* with simulation
- Update the Q-table with real data
- Markovian traffic state estimation (a sketch of this split follows below)
[Figure: estimated traffic state vs. time to max-out (sec)]
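A minimal sketch of the off-line/online split, assuming the off-line policy P* is simply the greedy policy of a Q-table trained against a simulator and then refined in the field; the simulator interface (reset/step) is an assumption:

```python
def train_offline(agent, simulator, episodes=1000):
    """Off-line phase: learn P* in simulation (e.g., against a VISSIM model)."""
    for _ in range(episodes):
        state, done = simulator.reset(), False
        while not done:
            action = agent.choose(state)
            next_state, reward, done = simulator.step(action)
            agent.update(state, action, reward, next_state)
            state = next_state

def refine_online(agent, field_transitions):
    """Online phase: keep the same Q-table, updating it from observed
    (state, action, reward, next_state) tuples collected in the field."""
    for state, action, reward, next_state in field_transitions:
        agent.update(state, action, reward, next_state)
```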
Human Learning Model: Brain Analogy
[Diagram: semantic memory holds the trained Q-table; procedural memory maps state -> Q-table -> ε-greedy action (stop/go) -> updated Q-table; episodic memory covers the dataset, memory decay, and propensity; working memory covers distractions and emotions]
A sketch of the stop/go decision follows below.
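A sketch of the ε-greedy stop/go decision with a simple memory-decay term; the exponential decay form and the half-life parameter are assumptions, not the model presented in the talk:

```python
import random

def stop_go_decision(q_table, state, epsilon=0.1):
    """epsilon-greedy choice between stopping and going at the onset of yellow."""
    if random.random() < epsilon:
        return random.choice(["stop", "go"])      # exploration / lapse
    return max(["stop", "go"], key=lambda a: q_table.get((state, a), 0.0))

def decay_memory(q_table, elapsed, half_life=600.0):
    """Fade learned values toward zero over time (assumed exponential form)."""
    factor = 0.5 ** (elapsed / half_life)
    for key in q_table:
        q_table[key] *= factor
```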
Dealing with High State Dimensionality (naturalistic driving behavior study)*
- Training input: traffic states and actions
- Training output: acceleration and steering
- Input variables discretized using fuzzy sets
- Continuous actions are generated from discrete actions
- Uses all the safety-critical events available in training
*Safety and Mobility Agent-based Reinforcement-learning Traffic Simulation Add-on Module (SMART SAM)
NFACRL Framework
- $S_i$: the $i$-th input variable (state variable)
- $K$: number of input variables
- $NM_i$: number of fuzzy sets (membership functions) for $S_i$
- $M_i^{a(i)}$: the $a(i)$-th fuzzy set (membership function) for the $i$-th input variable
- $R_j$: the $j$-th fuzzy rule
- $N$: number of fuzzy rules
- $\lambda_j$: weight between the $j$-th fuzzy rule and the critic
- $w_j^q$: weight between the $j$-th fuzzy rule and action $q$
- $V$: critic value
- $A_q$: output of the $q$-th action
where $i = 1, \dots, K$, $a(i) = 1, \dots, NM_i$, $j = 1, \dots, N$, and $q = 1, \dots, P$.
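A sketch of the forward pass these definitions imply: each rule's firing strength $\phi_j$ is the product of its input memberships, the critic value is $V = \sum_j \lambda_j \phi_j$, and each action output is $A_q = \sum_j w_j^q \phi_j$. Triangular membership functions and product firing are assumed here (common NFACRL choices); the slide does not specify the membership shapes or the weight-update rules:

```python
import numpy as np

def triangular(x, left, center, right):
    """Triangular membership function (an assumed shape)."""
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (center - left) if x <= center else (right - x) / (right - center)

def nfacrl_forward(state, rules, lam, w):
    """state: length-K vector of inputs S_i
    rules: N rules; rule j maps input i to a (left, center, right) triple
    lam:   length-N critic weights lambda_j
    w:     P-by-N actor weights w_j^q
    Returns the critic value V and the P action outputs A_q."""
    # Firing strength phi_j: product of memberships over all K inputs
    phi = np.array([np.prod([triangular(state[i], *rule[i])
                             for i in range(len(state))])
                    for rule in rules])
    return lam @ phi, w @ phi
```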
Applications and Cross-Validation
- Goal: test the heterogeneity of the drivers
- Training: the data from driver A is used for training, with A's behavioral rules as output
- Validation: the learned rules of agent A are applied to driver B
- The heterogeneity of drivers A and B is represented by the degree of accuracy in validation (a sketch of the accuracy measure follows below)
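A minimal sketch of the validation accuracy measure, assuming the "degree of accuracy" reported later is the coefficient of determination $R^2$ between the naturalistic trace and the agent's reproduced trace:

```python
import numpy as np

def r_squared(observed, predicted):
    """R^2 between a naturalistic trace and an agent's reproduced trace
    (e.g., longitudinal acceleration or yaw angle over an event)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Cross-validation: score agent A's rules on driver B's events, and vice versa;
# high R^2 on a driver's own events but low R^2 across drivers indicates heterogeneity.
```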
Agent A: Event 1
[Figure: longitudinal action estimation (acceleration, g) and lateral action estimation (yaw angle, rad) vs. time (0.1 s); naturalistic vs. agent traces]
Agent A: Event 2
[Figure: longitudinal and lateral action estimation vs. time (0.1 s); naturalistic vs. agent traces]
Driver Agent B
[Figure: longitudinal and lateral action estimation vs. time (0.1 s); naturalistic vs. agent traces]
Driver Agent A: Own Behavior
[Figure: longitudinal and lateral action estimation vs. time (0.1 s); naturalistic vs. agent traces]
Driver B: Own Behavior
[Figure: longitudinal and lateral action estimation vs. time (0.1 s); naturalistic vs. agent traces]
Driver A: Using Behavior from B
[Figure: longitudinal and lateral action estimation vs. time (0.1 s); naturalistic vs. agent traces]
Driver B: Using Behavior from A
The heterogeneity is clear.
[Figure: longitudinal and lateral action estimation vs. time (0.1 s); naturalistic vs. agent traces]
Mega-Agent Behavior
The Mega-Agent behaves like Driver B.
[Figure: longitudinal and lateral action estimation vs. time (0.1 s); naturalistic vs. agent traces]
Comparison of Mega-Agent to Cross-Validation Results (degree of accuracy: R²)

Event    | Agent A      | Agent B      | Mega-Agent
         | long.  lat.  | long.  lat.  | long.  lat.
Event A  | 0.98   0.967 | 0.81   0.83  | 0.98   0.95
Event B  | 0.82   0.6   | 0.97   0.92  | 0.97   0.9
But Why NOT Statistical Modeling?
It would lead to the wrong conclusions!
Future CV/AV Applications
- Multi-modal applications: modeling, simulation, and optimization
- Accounting for different priorities, including emergency vehicles
- Utilization of the computing capabilities of CV/AV
- Linking arterial control to freeway management scenarios
- Characterizing and changing network performance
Multi-agent System Framework
[Diagram: vehicle agents (user-controlled, AI-controlled, high-priority) choose a priority level (PL) via a token-based PL selection system, an AI PL selection system based on performance, or a pre-set PL based on vehicle type; the ABM system combines system configuration and ABMS rules, a reservation matrix (revocation-enabled FIFO), trajectory adjustment, and fuel and emission optimization; inputs include performance measures, user and system requirements, and road and vehicle characteristics]
A microscopic simulation framework is used for system evaluation.
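A sketch of the revocation-enabled FIFO reservation idea, assuming the intersection is discretized into conflict tiles reserved per time step; the class, the priority convention (larger number wins), and the revocation rule are illustrative, not the implemented system:

```python
class ReservationMatrix:
    """Conflict-tile-by-time reservation grid for one intersection (illustrative)."""

    def __init__(self):
        self.grid = {}   # (tile, time_step) -> (vehicle_id, priority)

    def request(self, vehicle_id, priority, tiles_by_time):
        """FIFO with revocation: requests are handled in arrival order, but a
        higher-priority vehicle (e.g., an EV) may revoke lower-priority holds."""
        for tile_t in tiles_by_time:
            holder = self.grid.get(tile_t)
            if holder is not None and holder[1] >= priority:
                return False    # blocked by an equal- or higher-priority vehicle
        for tile_t in tiles_by_time:
            self.grid[tile_t] = (vehicle_id, priority)   # revokes lower-priority holds
        return True             # revoked vehicles must re-request, absorbing RD
```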
Multi-agent System Framework: Trajectory Adjustment
- RD (required delay): the delay a vehicle must absorb after arriving at the intersection until all higher-priority vehicles clear the conflict tiles
- Rather than driving at constant speed, coming to a complete stop for a duration of RD, and then resuming speed, a vehicle follows a modified trajectory (decelerate at a1 for t1, then accelerate at a2 for t2) that delays its arrival by RD
[Figure: distance-time and speed-time diagrams of the original and modified trajectories, annotated with t1, a1, t2, a2, RD, and the "Here I Am" state]
A kinematic sketch of this profile follows below.
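A kinematic sketch of the modified trajectory, assuming constant deceleration $a_1 < 0$ for $t_1$ followed by constant acceleration $a_2 > 0$ for $t_2$, with the vehicle returning to its cruise speed $v$. The closed form follows from requiring the speed to return to $v$ and the distance shortfall to equal $v \cdot RD$; the system's actual profile also optimizes fuel and emissions:

```python
import math

def modified_trajectory(v, rd, a1=-2.0, a2=1.5):
    """Durations t1 (decelerating at a1 < 0) and t2 (accelerating at a2 > 0)
    that return the vehicle to cruise speed v while delaying arrival by rd.
    Constraints:  a1*t1 + a2*t2 = 0                     (speed returns to v)
                  -0.5*a1*t1**2 * (1 - a1/a2) = v*rd    (distance shortfall)
    The default rates are illustrative, not from the talk."""
    t1 = math.sqrt(2.0 * v * rd / (-a1 * (1.0 - a1 / a2)))
    t2 = -a1 * t1 / a2
    if v + a1 * t1 < 0.0:
        raise ValueError("RD too large for a smooth, non-stopping profile")
    return t1, t2

# Example: absorb a 5 s required delay from a 15 m/s cruise
t1, t2 = modified_trajectory(v=15.0, rd=5.0)   # ~5.7 s braking, ~7.6 s accelerating
```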
Negotiating an Intersection
Experiment Setup
- Simulating high and low priority levels on some approaches
- Tabulated delay values and vehicle trajectories for different approaches
[Figures: time-space diagrams (distance vs. time) for Phase 2 and Phase 4, annotated with per-vehicle delay values]
Experiment Results
- Agents adapt by forming dense platoons to pass through large gaps more efficiently
- Interesting emergent behavior can be observed from simple interaction rules
- Low-priority agents are sensitive to the traffic demand level
- Frequent EV calls re-synchronize the EV approach

[Chart: delay (0-250) by demand case for scenarios 1-8]

Phase | Demand per scenario (scenario PLs: 1, 2, 1, 3, 1, 2, 1, 3) | % EV, Ph2
 10   | 200 200 200 200 200 200 200 200                            | 0
 11   | 400 400 400 400 400 400 400 400                            | 0
 12   | 400 600 400 600 400 600 400 600                            | 0
 13   | 400 800 400 800 400 800 400 800                            | 0
 14   | 200 200 200 200 200 200 200 200                            | 10
 15   | 400 400 400 400 400 400 400 400                            | 20
 16   | 400 600 400 600 400 600 400 600                            | 30
Concluding Remarks
- Intelligent agents can capture individual learning, and agent-based modeling can capture the emergent system behavior
- Think in a state-action framework: it can explain a lot of things
- Win the chess game, not just the next move