What can you learn in a data-for-good coding marathon?


It promised to be a beautiful, warm weekend in the city, and I had signed up for the DataKind data dive. That meant spending the entire weekend nerding out on data to help non-profit organizations. Friends had invited me to go on hikes, to the beach, to drink beer… you know, fun stuff. Yet I chose to sit in a downtown office full of people who were mostly strangers to me. Was it worth it? Read on to find out.



A Comprehensive Analysis of a Very Large Uber Dataset.


Taxis Versus Uber: The NYC Armageddon.

Part 1: Insights from Data Exploration and Visualization

Early in 2017, the NYC Taxi and Limousine Commission (TLC) released a dataset on Uber’s ridership between September 2014 and August 2015. It includes features such as destination, trip distance, and duration, which were missing from the earlier releases that others have already analyzed thoroughly.

The combination of trip distance and duration makes it possible to estimate Uber’s revenue for each trip in NYC. On the other hand, the pickup and drop-off locations were anonymized and reported as taxi zones instead of geographic coordinates. This does more to preserve rider privacy, but it rules out plotting the exact locations on a map.
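To make the revenue idea concrete, here is a minimal sketch of how a per-trip fare could be estimated from distance and duration. It assumes a simple fare model (base fare plus per-mile and per-minute charges, with a minimum fare). The `FareRates` class, the `estimate_fare` function, and every rate value are hypothetical placeholders for illustration, not Uber's actual pricing and not values taken from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FareRates:
    """Hypothetical rate card; all values are placeholders, not real Uber rates."""
    base_fare: float = 2.50     # flat charge per trip, in dollars
    per_mile: float = 1.75      # charge per mile travelled
    per_minute: float = 0.35    # charge per minute of trip time
    minimum_fare: float = 8.00  # floor applied to very short trips


def estimate_fare(distance_miles: float, duration_minutes: float,
                  rates: FareRates = FareRates()) -> float:
    """Estimate the fare of one trip from its distance and duration."""
    raw = (rates.base_fare
           + rates.per_mile * distance_miles
           + rates.per_minute * duration_minutes)
    # Short trips are billed at the minimum fare.
    return round(max(raw, rates.minimum_fare), 2)


if __name__ == "__main__":
    # A 5-mile, 20-minute trip under the placeholder rates.
    print(estimate_fare(5.0, 20.0))  # 18.25
```

Summing such estimates over all trips would give a rough revenue figure, which is only as good as the assumed rates and ignores surge pricing, tolls, and fees.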
