
Our modern lives are enabled by algorithms. Think of the last time you ordered a T-shirt online. Behind the scenes, a complex network of servers handled your order, your credit card information, and the delivery. Each of these steps involves decisions, often made under constraints. Improving these algorithms, even by a fraction of a percent, can improve user experiences and cut costs for businesses.
Examples include,
This article explores ride-sharing as a case study because of its visual nature. The challenge is to match drivers and riders (passengers) in the most profitable way. I will explore different solutions, comparing traditional optimization with deep reinforcement learning techniques, and ultimately combine the two to outperform either approach on its own.
The concepts, techniques, and takeaways from this project can be applied to other resource-allocation problems like the ones described above.
<aside> 💡
If you want to create your own custom environment, take a look at the official Gymnasium guide on creating a custom environment.
</aside>
Unlike supervised or unsupervised learning, reinforcement learning does not start with a dataset to train on. Instead, the agent gathers experience by interacting with the world, progressively building its own dataset. Ideally, the agent would interact with the same world where it is deployed. However, that is often not possible; with self-driving vehicles, for example, incorrect decisions can lead to serious accidents. Consequently, we often build a virtual world that acts as a safe playground for the agent to explore and learn in. Instead of creating a new virtual world from scratch each time, one usually creates a gym environment. A gym environment follows the standardized format defined by the Gym API, originally developed by OpenAI and now maintained by the Farama Foundation. Every gym environment is a class with four core methods: reset, step, render, and close. By following this standard, we can easily experiment and swap in different algorithms.
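To make the API concrete, here is a minimal sketch of a class that follows the Gym interface shape: reset returns (observation, info) and step returns (observation, reward, terminated, truncated, info). The toy walk-to-a-goal task and all names below are my own illustration, not part of the ride-sharing environment; a real environment would subclass gymnasium.Env.

```python
class GoalWalkEnv:
    """Toy 1-D world: start at 0 and reach `goal` by moving left or right."""

    def __init__(self, goal: int = 5, max_steps: int = 20):
        self.goal = goal
        self.max_steps = max_steps

    def reset(self, seed=None):
        """Start a new episode; returns (observation, info)."""
        self.position = 0
        self.steps = 0
        return self.position, {}

    def step(self, action: int):
        """Apply one action (0 = left, 1 = right).

        Returns (observation, reward, terminated, truncated, info),
        matching the shape of the Gym API's step method.
        """
        self.position += 1 if action == 1 else -1
        self.steps += 1
        terminated = self.position == self.goal   # reached the goal
        truncated = self.steps >= self.max_steps  # ran out of time
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}
```

Because every environment exposes the same reset/step contract, the interaction loop that trains an agent looks identical no matter which world sits behind it.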

Figure: the CartPole gym environment. The agent must learn to keep the pole from falling by moving the cart left or right.
The reset and step functions define the logic of the virtual world and are the subject of this article. The full class definition (and all related code) can be found in my Colab Notebook.
To model a ride-share service, we need drivers and riders. The assumptions in this article are,
Putting this into code gives us the following class definitions,
import numpy as np

class Driver:
    def __init__(self, start_position: list):
        # Store the position as a float array so it can be updated in place.
        self.position = np.array(start_position, dtype=float)
        self.speed = 1 / 3  # fraction of a grid cell traveled per step
        self.has_passenger = False
        self.pickup, self.dropoff = None, None

    @property
    def isavailable(self):
        """A driver is available when it has no assigned trip."""
        return self.pickup is None or self.dropoff is None

    def assign(self, pickup: list, dropoff: list):
        self.pickup, self.dropoff = pickup, dropoff

    def step(self):
        """Move toward the current target; pick up and drop off the passenger."""
        if self.isavailable:
            return
        # Successful drop-off: clear the assignment.
        if self.has_passenger and np.allclose(self.position, self.dropoff):
            self.has_passenger = False
            self.pickup, self.dropoff = None, None
            return
        # Pick up the passenger once we reach the pickup point.
        if np.allclose(self.position, self.pickup):
            self.has_passenger = True
        # Head toward the drop-off if carrying a passenger, else toward the pickup.
        destination = self.dropoff if self.has_passenger else self.pickup
        angle = np.arctan2(destination[1] - self.position[1],
                           destination[0] - self.position[0])
        # Round the direction vector so movement snaps to the 8 grid directions.
        self.position += self.speed * np.round(np.array([np.cos(angle), np.sin(angle)]))
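With Driver objects in hand, the environment's step function also needs to decide which driver serves which waiting rider. As a simple baseline for illustration (the greedy_match helper below is my own sketch, not code from the notebook), each pickup can be matched to the nearest still-free driver:

```python
import numpy as np

def greedy_match(driver_positions, pickup_positions):
    """Assign each pickup to the nearest still-unassigned driver.

    Returns a list of (driver_index, pickup_index) pairs. Greedy matching
    is a baseline for comparison; it is not the optimal assignment in general.
    """
    drivers = np.asarray(driver_positions, dtype=float)
    pickups = np.asarray(pickup_positions, dtype=float)
    free = set(range(len(drivers)))
    matches = []
    for j, pickup in enumerate(pickups):
        if not free:
            break  # more riders than available drivers
        # Nearest free driver to this pickup by Euclidean distance.
        i = min(free, key=lambda i: np.linalg.norm(drivers[i] - pickup))
        free.remove(i)
        matches.append((i, j))
    return matches
```

Because greedy matching commits to each rider one at a time, it can miss globally better pairings; that gap is exactly what the optimization and learning approaches later in the article try to close.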