
<aside> 🐍
Check out Colab Notebook
</aside>
Our modern lives are enabled by algorithms. Think of the last time you ordered a T-shirt online. Behind the scenes, a complex network of servers handled your order, credit card information, and delivery. Each of these steps involves decisions, often made under constraints. Finding ways to improve these algorithms, even by a fraction of a percent, can improve user experience and save money for businesses.
This research article explores ride-sharing as a case study. The challenge is to match drivers and riders (passengers) in the most profitable way. I will explore different solutions, comparing traditional optimization with deep reinforcement learning, ultimately combining the two to outperform either approach individually.
Many of the concepts, techniques, and takeaways can be applied to other resource allocation-style problems.
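To make the matching problem concrete, here is a toy sketch (with made-up profit numbers and a hypothetical 3×3 setup, not the article's actual model): each driver–rider pairing has some profit, and we search all assignments by brute force for the most profitable one.

```python
# Toy illustration: match 3 drivers to 3 riders to maximize total profit.
# The profit values below are assumed for demonstration only.
from itertools import permutations

# profit[d][r] = profit of pairing driver d with rider r
profit = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]

# Try every possible assignment of riders to drivers and keep the best.
best_total, best_assignment = max(
    (sum(profit[d][r] for d, r in enumerate(perm)), perm)
    for perm in permutations(range(3))
)
print(best_total, best_assignment)  # -> 11 (0, 2, 1)
```

Brute force works here because 3 drivers give only 3! = 6 assignments, but the count grows factorially, which is exactly why smarter optimization and learning-based methods become necessary at scale.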
<aside> 💡
If you want to create your own custom environment take a look at the official Gymnasium guide on creating a custom environment.
</aside>
Unlike supervised or unsupervised learning, reinforcement learning does not start with a dataset to train on. Instead, we gather experience by interacting with the world, progressively building our dataset. Ideally, the agent would interact with the same world where it will be deployed. However, that is often not possible. Consider self-driving vehicles, where incorrect decisions can lead to serious accidents. Consequently, we often build a virtual world that acts as a safe playground for the agent to explore and learn. Instead of creating a new virtual world each time, one usually creates a gym environment. A gym environment follows a standardized format defined by the Gym API, originally developed by OpenAI and now maintained by the Farama Foundation. Every gym environment is a class exposing a few core methods — most importantly reset and step, plus render for visualization — and is typically instantiated via the make function. By following this standard, we can easily experiment with and swap different algorithms.
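To show the shape of this interface, here is a minimal sketch of an environment following the Gym API conventions. It is a hypothetical toy world (an agent moving along a line toward a target), written as a plain Python class rather than subclassing the actual gymnasium library, purely to illustrate the reset/step contract:

```python
# Minimal sketch of the Gym API shape (hypothetical toy environment,
# not the article's ride-sharing world).
import random

class LineWorld:
    """An agent on a line of `size` cells must reach a target cell.

    Follows the Gym API conventions:
      reset() -> (observation, info)
      step(action) -> (observation, reward, terminated, truncated, info)
    """

    def __init__(self, size=10, max_steps=50):
        self.size = size
        self.max_steps = max_steps

    def reset(self, seed=None):
        if seed is not None:
            random.seed(seed)
        self.pos = random.randrange(self.size)
        self.target = random.randrange(self.size)
        self.steps = 0
        return self.pos, {}  # observation, info dict

    def step(self, action):
        # action: 0 = move left, 1 = move right
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.size - 1, self.pos + move))
        self.steps += 1
        terminated = self.pos == self.target      # reached the goal
        truncated = self.steps >= self.max_steps  # ran out of time
        reward = 1.0 if terminated else -0.01     # small step penalty
        return self.pos, reward, terminated, truncated, {}

env = LineWorld()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(1)
```

Any algorithm written against this interface — whether a hand-coded heuristic or a deep RL agent — can be pointed at a different environment without changing its training loop, which is the main benefit of the standard.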

Cart Pole Gym environment: the agent must learn to keep the pole from falling by moving the cart left or right.
The reset and step functions define the logic of the virtual world and are the subject of this article. The full class definition (and all related code) can be found in the Colab Notebook.
To model a ride-share service, at the very least, we need drivers and riders. The assumptions in this article are,
Putting this into code gives us the following class definitions,