Location

Toomey Hall, Room 140

Presentation Date

April 22, 2023, 12:00pm - 1:10pm

Session

Session 6e

Description

There are several methods for an autonomous vehicle (AV) to determine its behavior. One area of interest is in using game-theoretical models to determine an AV’s behavior. If AV designers know the relationship between the action taken by players and their corresponding reward functions, then the behaviors of all players can be predicted. This allows for the simulation of player vehicles at an unsupervised intersection. An unsupervised intersection is one where there are no stop signs or traffic lights to help the players make decisions. The motivation of this project is to study the safety and efficiency of three different game-theoretical solutions.

Behaviors for the players are calculated using Nash equilibrium best-response dynamics, Nash equilibrium potential function optimization, and Stackelberg equilibrium. In Nash equilibrium, both players are equal, and converge to a shared solution. In Stackelberg equilibrium, one player is the leader, and the other is a follower. The leader commits to a move, which influences the behavior of the follower. In this work, the leader is simply the player who is closer to the intersection at time zero. Algorithms have been designed to determine a player’s optimal behavior according to these three options. For this study, these were implemented in MATLAB. These behaviors were combined with each other to create games where player one plays an optimal move from one algorithm, while player 2 plays an optimal move from another algorithm. For example, player 1 plays the Nash-optimal move while player 2 plays the Stackelberg-optimal move. Also considered in this research is the cases where players disagree on their role for purposes of finding the Stackelberg equilibrium. A game can be played where both players consider themselves as the leader. Lastly, worst-case scenarios where one player maintains a constant speed with no regard for safety are also considered. In total, there are twelve different behavior combinations that are tested. For each behavior combination, one thousand games are played. The number of unsafe scenarios that occurred, the runtime of the simulations, and the average velocities of the players are recorded. By measuring these, this study can draw conclusions about which method among Nash best response, Nash potential function optimization, and Stackelberg equilibrium is the most promising.

The different behaviors are chosen to solve a certain challenge. In a simulation, players can easily be given the utility function of the other player. In this study, players have perfect information about the other player to correctly determine the optimal move. In the real world, this level of knowledge is not feasible. Players often do not know other players utility functions. By studying these behavior combinations, this study lets the players keep their perfect information about every utility function while also allowing the players to be wrong about the other players decisions. Importantly, these times when players are wrong leads to potentially dangerous situations. This is necessary to observe if the various game solutions are safer than the others.

A given game is considered to include an unsafe scenario if the smallest distance between the two cars during the game is less than five meters. This is not necessarily a collision between the two players, but it is dangerously close. Runtime is measured across the one thousand games of each behavior using built-in MATLAB functions. This is done to assess the real-use viability of a given method. Quick calculations are desirable to keep up with real-time scenarios. Recording velocities indicates if a solution method is more aggressive or conservative. A larger average velocity indicates that a solution method is more aggressive than others. Since our utility function includes a selfish component, this may be a positive attribute, if the method is safe. In this way, average velocity can be a tiebreaker of sorts.

The average runtime of a Stackelberg game is 0.025s. The average runtime of a Nash game with best-response dynamics is 0.01s. The average runtime of a Nash game with potential function optimization is 0.02s. Note that this is for one game, which is on a time scale of 100 seconds. All of these games can be fast enough for real-time use, though that may not hold with more complicated scenarios with more players.

Of the 4000 total games where player 1 played the lead Stackelberg-optimal move, players were at an unsafe distance in 322 games. Of the 4000 total games where player 1 played the Nash-optimal move from best-response dynamics, players were at an unsafe distance in 107 games. Of the 4000 total games where player 1 played the Nash-optimal move from potential function optimization, players were at an unsafe distance in 281 games.

The results of the simulations are in the process of being verified. The code is being tested to ensure that these results are not the fault of some incorrect syntax.

Meeting Name

32nd Annual Spring Meeting of the NASA-Mo Space Grant Consortium

Document Type

Presentation

Document Version

Final Version

File Type

text

Language(s)

English

Rights

© 2023 The Authors, all rights reserved.

Share

COinS
 
Apr 22nd, 12:00 PM Apr 22nd, 1:10 PM

Safety Performance of Different Game-Theoretical Models in an Unsupervised Intersection Crossing Scenario

Toomey Hall, Room 140

There are several methods for an autonomous vehicle (AV) to determine its behavior. One area of interest is in using game-theoretical models to determine an AV’s behavior. If AV designers know the relationship between the action taken by players and their corresponding reward functions, then the behaviors of all players can be predicted. This allows for the simulation of player vehicles at an unsupervised intersection. An unsupervised intersection is one where there are no stop signs or traffic lights to help the players make decisions. The motivation of this project is to study the safety and efficiency of three different game-theoretical solutions.

Behaviors for the players are calculated using Nash equilibrium best-response dynamics, Nash equilibrium potential function optimization, and Stackelberg equilibrium. In Nash equilibrium, both players are equal, and converge to a shared solution. In Stackelberg equilibrium, one player is the leader, and the other is a follower. The leader commits to a move, which influences the behavior of the follower. In this work, the leader is simply the player who is closer to the intersection at time zero. Algorithms have been designed to determine a player’s optimal behavior according to these three options. For this study, these were implemented in MATLAB. These behaviors were combined with each other to create games where player one plays an optimal move from one algorithm, while player 2 plays an optimal move from another algorithm. For example, player 1 plays the Nash-optimal move while player 2 plays the Stackelberg-optimal move. Also considered in this research is the cases where players disagree on their role for purposes of finding the Stackelberg equilibrium. A game can be played where both players consider themselves as the leader. Lastly, worst-case scenarios where one player maintains a constant speed with no regard for safety are also considered. In total, there are twelve different behavior combinations that are tested. For each behavior combination, one thousand games are played. The number of unsafe scenarios that occurred, the runtime of the simulations, and the average velocities of the players are recorded. By measuring these, this study can draw conclusions about which method among Nash best response, Nash potential function optimization, and Stackelberg equilibrium is the most promising.

The different behaviors are chosen to solve a certain challenge. In a simulation, players can easily be given the utility function of the other player. In this study, players have perfect information about the other player to correctly determine the optimal move. In the real world, this level of knowledge is not feasible. Players often do not know other players utility functions. By studying these behavior combinations, this study lets the players keep their perfect information about every utility function while also allowing the players to be wrong about the other players decisions. Importantly, these times when players are wrong leads to potentially dangerous situations. This is necessary to observe if the various game solutions are safer than the others.

A given game is considered to include an unsafe scenario if the smallest distance between the two cars during the game is less than five meters. This is not necessarily a collision between the two players, but it is dangerously close. Runtime is measured across the one thousand games of each behavior using built-in MATLAB functions. This is done to assess the real-use viability of a given method. Quick calculations are desirable to keep up with real-time scenarios. Recording velocities indicates if a solution method is more aggressive or conservative. A larger average velocity indicates that a solution method is more aggressive than others. Since our utility function includes a selfish component, this may be a positive attribute, if the method is safe. In this way, average velocity can be a tiebreaker of sorts.

The average runtime of a Stackelberg game is 0.025s. The average runtime of a Nash game with best-response dynamics is 0.01s. The average runtime of a Nash game with potential function optimization is 0.02s. Note that this is for one game, which is on a time scale of 100 seconds. All of these games can be fast enough for real-time use, though that may not hold with more complicated scenarios with more players.

Of the 4000 total games where player 1 played the lead Stackelberg-optimal move, players were at an unsafe distance in 322 games. Of the 4000 total games where player 1 played the Nash-optimal move from best-response dynamics, players were at an unsafe distance in 107 games. Of the 4000 total games where player 1 played the Nash-optimal move from potential function optimization, players were at an unsafe distance in 281 games.

The results of the simulations are in the process of being verified. The code is being tested to ensure that these results are not the fault of some incorrect syntax.