
<aside> 🐍
Check out Colab Notebook
</aside>
Our modern lives are enabled by algorithms. Think of the last time you ordered a T-shirt online. Behind the scenes, a complex network of servers handled your order, credit card information, and delivery. Each of these steps involves decisions, often made under constraints. Finding ways to improve these algorithms, even by a fraction of a percent, can improve user experience and save money for businesses.
This research article explores ride-sharing as a case study. The challenge is to match drivers and riders (passengers) in the most profitable way. I will explore different solutions, comparing traditional optimization with deep reinforcement learning, ultimately combining the two to outperform either approach individually.
Many of the concepts, techniques, and takeaways can be applied to other resource allocation-style problems.
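To make the matching problem concrete, here is a toy sketch (with made-up profit numbers and a hypothetical 3×3 setup, not the article's actual model): each driver–rider pairing has some profit, and we search all assignments by brute force for the most profitable one.

```python
# Toy illustration: match 3 drivers to 3 riders to maximize total profit.
# The profit values below are assumed for demonstration only.
from itertools import permutations

# profit[d][r] = profit of pairing driver d with rider r
profit = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]

# Try every possible assignment of riders to drivers and keep the best.
best_total, best_assignment = max(
    (sum(profit[d][r] for d, r in enumerate(perm)), perm)
    for perm in permutations(range(3))
)
print(best_total, best_assignment)  # -> 11 (0, 2, 1)
```

Brute force works here because 3 drivers give only 3! = 6 assignments, but the count grows factorially, which is exactly why smarter optimization and learning-based methods become necessary at scale.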
<aside> 💡
If you want to create your own custom environment take a look at the official Gymnasium guide on creating a custom environment.
</aside>
Unlike supervised or unsupervised learning, reinforcement learning does not start with a dataset to train on. Instead, we gather experience by interacting with the world, progressively building our dataset. Ideally, the agent would interact with the same world where it will be deployed. However, that is often not possible. Consider self-driving vehicles, where incorrect decisions can lead to serious accidents. Consequently, we often build a virtual world that acts as a safe playground for the agent to explore and learn. Instead of creating a new virtual world each time, one usually creates a gym environment. A gym environment follows a standardized format defined by the Gym API, originally developed by OpenAI and now maintained by the Farama Foundation. Every gym environment is a class exposing a few core methods — most importantly reset and step, plus render for visualization — and is typically instantiated via the make function. By following this standard, we can easily experiment with and swap different algorithms.
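To show the shape of this interface, here is a minimal sketch of an environment following the Gym API conventions. It is a hypothetical toy world (an agent moving along a line toward a target), written as a plain Python class rather than subclassing the actual gymnasium library, purely to illustrate the reset/step contract:

```python
# Minimal sketch of the Gym API shape (hypothetical toy environment,
# not the article's ride-sharing world).
import random

class LineWorld:
    """An agent on a line of `size` cells must reach a target cell.

    Follows the Gym API conventions:
      reset() -> (observation, info)
      step(action) -> (observation, reward, terminated, truncated, info)
    """

    def __init__(self, size=10, max_steps=50):
        self.size = size
        self.max_steps = max_steps

    def reset(self, seed=None):
        if seed is not None:
            random.seed(seed)
        self.pos = random.randrange(self.size)
        self.target = random.randrange(self.size)
        self.steps = 0
        return self.pos, {}  # observation, info dict

    def step(self, action):
        # action: 0 = move left, 1 = move right
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.size - 1, self.pos + move))
        self.steps += 1
        terminated = self.pos == self.target      # reached the goal
        truncated = self.steps >= self.max_steps  # ran out of time
        reward = 1.0 if terminated else -0.01     # small step penalty
        return self.pos, reward, terminated, truncated, {}

env = LineWorld()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(1)
```

Any algorithm written against this interface — whether a hand-coded heuristic or a deep RL agent — can be pointed at a different environment without changing its training loop, which is the main benefit of the standard.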

Cart Pole Gym environment: the agent must learn to keep the pole from falling by moving the cart left or right.
The reset and step functions define the logic of the virtual world and are the subject of this article. The full class definition (and all related code) can be found in the Colab Notebook.
To model a ride-share service, at the very least, we need drivers and riders. The assumptions in this article are,
Putting this into code gives us the following class definitions,