Time series analysis: A simple example with KNIME and Spark


Demand prediction

I think we all agree that knowing what lies ahead makes life much easier. This is true for life events as well as for the prices of washing machines and refrigerators or the demand for electrical energy across a whole city. Knowing how many bottles of olive oil customers will want tomorrow or next week allows for better restocking plans in a retail store. Knowing the likely increase in the price of gas or diesel allows a trucking company to better plan its finances. The examples where such knowledge helps are countless.

Demand prediction is a big branch of data science. Its goal is to estimate future demand using historical data and possibly other external information. Demand prediction can refer to any kind of quantity: visitors to a restaurant, kWh of generated energy, new school registrations, beer bottles required on the store shelves, appliance prices, and so on.

Predicting taxi demand in NYC

As an example of demand prediction, we will tackle the problem of predicting taxi demand in New York City. In New York City, more than 13,500 Yellow taxis roam the streets every day. This makes understanding and anticipating taxi demand a crucial task for taxi companies and even city planners, who want to increase the efficiency of the taxi fleet and minimize waiting times between trips.

For this case study, we used the NYC taxi data set, which can be downloaded from the NYC Taxi and Limousine Commission (TLC) website. This data set spans 10 years of taxi trips in New York City and carries a wide range of information about each trip, such as pickup and drop-off dates and times, locations, fares, tips, distances, and passenger counts. Since we are using this case study just for demonstration, we used only the Yellow taxi subset for the year 2017. For a more general application, it would be useful to include data from a few additional years, at least to be able to estimate the yearly seasonality.
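
To make "taxi demand" concrete, here is a minimal sketch in plain Python of how individual trip records could be turned into a demand time series by counting pickups per hour. It is only an illustration, not the KNIME and Spark workflow used in this case study. The column name `tpep_pickup_datetime` and its timestamp format are assumptions about the layout of the Yellow taxi files, the helper `hourly_pickups` is hypothetical, and the sample rows are made up.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Tiny made-up sample standing in for a real trip file (illustrative values only).
# The column name and timestamp format are assumptions about the file layout.
SAMPLE_CSV = """tpep_pickup_datetime,passenger_count
2017-01-01 00:05:12,1
2017-01-01 00:47:30,2
2017-01-01 01:15:00,1
2017-01-01 01:20:45,3
2017-01-01 01:59:59,1
"""


def hourly_pickups(rows, column="tpep_pickup_datetime"):
    """Count trips per pickup hour; each count is one point of the demand series."""
    counts = Counter()
    for row in rows:
        ts = datetime.strptime(row[column], "%Y-%m-%d %H:%M:%S")
        # Truncate the timestamp to the start of its hour.
        counts[ts.replace(minute=0, second=0)] += 1
    # Return the hours in chronological order.
    return dict(sorted(counts.items()))


demand = hourly_pickups(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for hour, trips in demand.items():
    print(hour, trips)
```

Each hourly count is one observation in the series. Hours without any trips do not appear in the result, so a real pipeline would likely need to fill them in with zeros before modeling.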