It’s a pleasant evening in Bengaluru. From your window you can see the clouds in a drizzle: enough to cool down the summer heat, but not enough to stress test the city’s infamous roads. It’s Friday evening, it has been a perfect week, and you hear the cabin crew announce that the flight has landed.

The ninety-minute flight covered a thousand of the one thousand and forty-eight kilometers home from Mumbai; the remaining forty-eight…

You pull out your phone to book an Uber:
“No Cars Available”
“No Cars Available”
“No Cars Available”
“No Cars Available”
“No Cars Available”
“No Cars Available”
“Ride booked”

And so the nightmare begins.
Does this sound familiar, across destinations, across cities, across countries? You’ve done the bulk of the travel, but the last leg of the journey, from the airport to your destination, is always the longest?

Far too often I have found myself in this situation, and it never ceases to amaze me. Recently I found an ancient dataset on Kaggle and picked it up for kicks, to see whether I’d arrive at the same conclusions that Uber might since have acted on, or very different ones.

The Data

It’s a log of trips booked to and from the airport over one week in July 2016. It contains information about trips requested, pickup location, driver id, pickup time and drop time.
This is a lot of usable information for an exploratory analysis.

It’s a fairly clean dataset, requiring only one edit to make the formatting consistent across the two datetime columns, “Request Timestamp” and “Drop Timestamp”.
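As a rough illustration of that clean-up step, here is a minimal sketch that normalises timestamps arriving in more than one text format. The two formats in `ASSUMED_FORMATS` are assumptions made for this example (the article does not say what the inconsistency was), and `parse_timestamp` is a hypothetical helper name.

```python
from datetime import datetime

# Assumed formats for illustration only; the real columns may differ.
ASSUMED_FORMATS = ("%d/%m/%Y %H:%M", "%d-%m-%Y %H:%M:%S")


def parse_timestamp(raw):
    """Parse a timestamp string using the first assumed format that matches.

    Returns None for empty values (for example, a trip with no drop time)
    and raises ValueError if no assumed format matches.
    """
    if raw is None or not raw.strip():
        return None
    for fmt in ASSUMED_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised timestamp format: {raw!r}")


# Made-up example values, one per assumed format.
print(parse_timestamp("11/7/2016 11:51"))
print(parse_timestamp("13-07-2016 08:33:16"))
```

Raising on an unknown format, rather than silently returning None, makes any third format in the data show up immediately instead of quietly becoming missing values.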

Of the 6745 trips requested, only 2831 were completed. Of the ones not completed, 1264 were cancelled and 2650 simply could not have a car allocated to the request. Given that this is a week’s worth of data, the first thing to check is whether the data is skewed by one day or is largely consistent across the week.
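The totals above, and the per-day check, can be sketched roughly as follows. The status labels and the `(request time, status)` row shape are assumptions for illustration, `status_by_day` is a hypothetical helper, and the sample rows are made up rather than taken from the dataset.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Totals as reported in the text above.
reported = {"Trip Completed": 2831, "Cancelled": 1264, "No Cars Available": 2650}
total_requests = sum(reported.values())
completed_share = reported["Trip Completed"] / total_requests
print(total_requests, f"{completed_share:.1%}")


def status_by_day(trips):
    """Count statuses per calendar day; trips is an iterable of (requested_at, status)."""
    per_day = defaultdict(Counter)
    for requested_at, status in trips:
        per_day[requested_at.date()][status] += 1
    return per_day


# Made-up sample rows, only to show the shape of the check.
sample = [
    (datetime(2016, 7, 11, 6, 15), "Trip Completed"),
    (datetime(2016, 7, 11, 19, 40), "No Cars Available"),
    (datetime(2016, 7, 12, 5, 5), "Cancelled"),
    (datetime(2016, 7, 12, 18, 30), "Trip Completed"),
]
for day, counts in sorted(status_by_day(sample).items()):
    day_total = sum(counts.values())
    print(day, day_total, f"{counts['Trip Completed'] / day_total:.0%} completed")
```

If one day dominated the week, its total or its completed share would stand out in this per-day listing.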

From the above figure, we can see that the stats for each day are fairly consistent, with ~1350 requests per day, close to half of which get serviced.

To take a closer look, let’s divide the day into time slots and see if we observe the same consistency.

While there is slight variance across time slots, we can still see that for trips starting at the airport, the busiest period begins in the evening and peaks in the 6:00pm to 9:00pm slot.
For trips starting in the city, the busy period is much earlier in the day, peaking in the 6:00am to 9:00am slot.
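One way to build such slots is sketched below. The three-hour width, starting at midnight, is an assumption inferred from the 6:00 to 9:00 slots mentioned above; `slot_label` and `busiest_slot` are hypothetical helper names, and the sample rows are made up.

```python
from collections import Counter


def slot_label(hour, width=3):
    """Map an hour of the day (0-23) to a fixed-width slot label such as '06:00-09:00'."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    start = (hour // width) * width
    return f"{start:02d}:00-{start + width:02d}:00"


def busiest_slot(requests, pickup_point):
    """Return the slot with the most requests for one pickup point.

    requests is an iterable of (pickup_point, request_hour) pairs.
    """
    counts = Counter(slot_label(hour) for point, hour in requests if point == pickup_point)
    return counts.most_common(1)[0][0]


# Made-up sample rows: (pickup point, hour of request).
sample = [
    ("Airport", 18), ("Airport", 19), ("Airport", 20), ("Airport", 9),
    ("City", 6), ("City", 7), ("City", 8), ("City", 21),
]
print(busiest_slot(sample, "Airport"))
print(busiest_slot(sample, "City"))
```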

Figure 3 is one of two large graphs; feel free to skip over the commentary if it already makes sense.

If you are waiting for a cab anyway, read along.

Slicing up the data by day and looking for patterns in trips completed, cancelled, or simply no cars available, we observe:

1. The largest number of completed trips fall in the 6:00am to 9:00am and 6:00pm to 9:00pm slots.

2. The early hours of the day see a lot of cancellations, whereas the evenings are plagued by a supply-demand mismatch.

Zooming in further on a single day (we have already established that each day is representative of the week), we see that requests initiated in the city are the ones cancelled most often, and the supply crunch is only seen at the airport.

The supply crunch is further demonstrated in figure 5, where the very tall orange skyscraper dwarfs the blue one in the evening rush from the airport and, similarly, in the morning rush from the city.

This supply-demand crunch is definitely one of the problems that need to be solved.
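To put a number on the crunch, one possible definition (an assumption for this sketch, not necessarily the one behind the figures) treats every request as demand and every completed trip as supply, so the gap is the difference between the two per pickup point and time slot. `supply_demand_gap` is a hypothetical helper and the sample rows are made up.

```python
from collections import Counter


def supply_demand_gap(trips):
    """Return {(pickup_point, slot): requests - completed trips}.

    trips is an iterable of (pickup_point, slot, status) tuples.
    Assumption: demand = all requests, supply = completed trips.
    """
    demand = Counter()
    supply = Counter()
    for pickup_point, slot, status in trips:
        demand[(pickup_point, slot)] += 1
        if status == "Trip Completed":
            supply[(pickup_point, slot)] += 1
    return {key: demand[key] - supply[key] for key in demand}


# Made-up sample rows, only to show the calculation.
sample = [
    ("Airport", "18:00-21:00", "No Cars Available"),
    ("Airport", "18:00-21:00", "No Cars Available"),
    ("Airport", "18:00-21:00", "Trip Completed"),
    ("City", "06:00-09:00", "Cancelled"),
    ("City", "06:00-09:00", "Trip Completed"),
]
for key, gap in sorted(supply_demand_gap(sample).items()):
    print(key, gap)
```

Under this definition cancellations also count towards the gap, so it may be worth splitting the gap by status to separate "no cars" from "driver cancelled".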

The other problem that needs looking into is the number of trip cancellations.
From fig 6, we see that about two hundred drivers cause nearly half of all cancellations, whereas the other hundred are responsible for the remainder. Furthermore, the top 12% of offenders cause more than 25% of all cancellations.
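A concentration figure like "the top 12% cause more than 25%" can be computed along these lines. This is a minimal sketch with made-up driver ids; `top_share` is a hypothetical helper, and it only considers drivers with at least one cancellation, which is an assumption about how the percentages were derived.

```python
import math
from collections import Counter


def top_share(cancelled_driver_ids, top_fraction):
    """Share of all cancellations caused by the top `top_fraction` of cancelling drivers.

    cancelled_driver_ids holds one driver id per cancelled trip.
    """
    counts = sorted(Counter(cancelled_driver_ids).values(), reverse=True)
    n_top = max(1, math.ceil(len(counts) * top_fraction))
    return sum(counts[:n_top]) / sum(counts)


# Made-up ids: driver 1 cancelled five times, driver 2 three times, drivers 3 and 4 once each.
sample_ids = [1] * 5 + [2] * 3 + [3] + [4]
print(top_share(sample_ids, 0.25))  # top 1 of 4 drivers
print(top_share(sample_ids, 0.50))  # top 2 of 4 drivers
```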

Takeaways from the Data Exploration

  • There is a real supply-demand gap in the evening rush hours from the airport, which should be a business opportunity.
  • There is a problem of trips to the airport getting cancelled in the morning, often multiple times by the same drivers. This behavior requires investigation and remediation.

Caveats and further data required

  • The dataset does not identify the user who initiates a request. It is possible that a user raised multiple requests and got serviced on the fourth one, i.e. there is a timing gap rather than a supply gap.
  • The dataset doesn't identify the zones in the city where rides are getting cancelled. Are they remote?
  • Given the large number of requests in the evening rush, it would be worth investigating the network coverage around the airport to ensure that network or app behavior is not amplifying the real problem.

Proposed solutions

  • For the supply-demand gap in the evening, we can bring in the Pool feature, softly pushing people to pool by telling them how much their wait time could reduce: “Next cab available in 10 minutes, next pool available in 3 minutes”. (Yes, the Uberpool feature has existed since, but I'm not so sure about airports.)
  • Consider a shuttle service to points in the city where a larger number of cabs are available.
  • Rides getting cancelled is a problem on trips from the city to the airport, so it would be helpful to know whether distance to the airport plays a role in this.
  1. Is the ride long enough that the driver may not meet his daily quota of rides?
  2. Is the driver worried about driving back without a passenger? (The data shows that the number of requests from the airport in the afternoon is very low.)
  3. Are the drivers close to finishing for the day and hence cancelling the trip?

These questions need further data, and the underlying problems could be addressed with the right incentive structure, such as zoning the city to make it more attractive for the furthest drivers to come in.

  • Subsidizing the return ride if one isn’t found within a set amount of time.
  • Given that drivers don't want to be at the airport at lunch hour, perhaps a drivers’ café or lounge area might make more drivers willing to make these trips.

Financial Engineer, Cyclist, Data curious
