Restaurants and food stores share a common dilemma: on the one hand, they want to sell as many products as possible and avoid running out of stock. On the other hand, they do not want to waste large quantities of unsold products. Indeed, the shelf life of fruits, vegetables, and freshly prepared products is much shorter than that of most other goods, which makes anticipating demand a major issue. At Deepsight, we aim to apply recent progress in time series forecasting to accumulated sales data, to help our clients predict their future sales more accurately. In this article, we will look at the data to find out which characteristics are useful for forecasting.
The typical data we get from our clients is daily sales broken down by product. We usually have between 2 and 5 years of data. Our objective is to forecast the next 2 weeks: first the total turnover, then the turnover by product. Because it is simpler and more regular, the total turnover can often be predicted with higher accuracy. The turnover can be handled as a time series: Figure 1 gives an idea of the usual shape the sales can take (note that, to keep the data anonymous, we have scaled it to an average of 1000€/day). At first sight it can look completely random, but we will see that there is structure beneath it. Let’s dive in…
An important feature of a time series is whether or not it is periodic. Periodicity can be observed by plotting the autocorrelation of the series for different lags: in Figure 2 we can spot spikes of higher correlation at lags of 7 days, 14 days and so on, which shows that our sales have a weekly periodicity. This is actually the case for all the sales data we have worked with, so the value observed 7 days earlier can be used as a basic estimator.
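The autocorrelation check and the lag-7 estimator described above can be sketched in a few lines. This is only an illustration on made-up data, not our production code: the synthetic `sales` series, the `weekly_pattern` values and the function names are all assumptions introduced for the example.

```python
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


def seasonal_naive_forecast(series, horizon, period=7):
    """Forecast each future day as the value seen `period` days earlier."""
    history = list(series)
    for _ in range(horizon):
        history.append(history[-period])
    return history[len(series):]


# Made-up daily sales: a weekly pattern around 1000 per day plus noise.
random.seed(0)
weekly_pattern = [0.8, 0.9, 0.95, 1.0, 1.2, 1.4, 0.75]
sales = [1000 * weekly_pattern[d % 7] + random.gauss(0, 50) for d in range(364)]

# Autocorrelation for lags of 1 to 28 days: the peaks sit at multiples of 7.
acf = {lag: autocorrelation(sales, lag) for lag in range(1, 29)}

# Two-week forecast that simply repeats the last observed week.
forecast = seasonal_naive_forecast(sales, horizon=14)
```

On such a series, `acf[7]` and `acf[14]` stand out clearly from the other lags, and the seasonal naive forecast gives a baseline that any more elaborate model should beat.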
It is harder to check whether there is a periodicity of lower frequency, since any other correlation is overshadowed by the 7-day periodicity. We can remove the weekly pattern by applying a 7-day rolling average: each window contains every day of the week exactly once, so the weekly effect averages out. This smooths the series and transforms it as shown in Figure 3. We can then compute the autocorrelation of this new curve with larger lags. The results in Figure 4 show a small spike at 365 days, which indicates a slight yearly periodicity. This is another very useful characteristic for refining predictions.
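As a rough sketch of this two-step procedure, the snippet below smooths a synthetic series with a trailing 7-day average and then compares the autocorrelation at lags of half a year and a full year. The data is invented (a weekly pattern modulated by a yearly cosine, an assumption made purely for the demonstration), and the helper names are ours.

```python
import math
import random


def rolling_mean(series, window=7):
    """Trailing moving average; returns len(series) - window + 1 points."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    total = sum(series[:window])
    means = [total / window]
    for t in range(window, len(series)):
        total += series[t] - series[t - window]  # slide the window by one day
        means.append(total / window)
    return means


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at a positive integer `lag`."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


# Made-up data: three years of sales with weekly and yearly components.
random.seed(1)
weekly_pattern = [0.8, 0.9, 0.95, 1.0, 1.2, 1.4, 0.75]
sales = [
    1000 * weekly_pattern[d % 7] * (1 + 0.2 * math.cos(2 * math.pi * d / 365))
    + random.gauss(0, 50)
    for d in range(3 * 365)
]

smoothed = rolling_mean(sales, window=7)        # weekly pattern averaged out
acf_half_year = autocorrelation(smoothed, 182)  # opposite season
acf_year = autocorrelation(smoothed, 365)       # same season, one year later
```

The yearly component here is deliberately strong, so the spike at 365 days is much more pronounced than the small one we see on real sales. Note also that at least two years of history are needed for a lag of 365 days to be meaningful.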
Several external events have an impact on sales. For restaurants, a rainy day usually means fewer customers, while football games may bring crowds: weather and sports events are two features of interest. Holidays are another kind of event. For a store located in Paris, there may be fewer customers around during a break, and thus a lower turnover at that time. In Figure 5, the turnover is plotted against the French school holidays. Not all of them are relevant but, as we expected, there is a huge drop around Christmas, and sales are also lower during the summer holidays. For this store, sales are 15% lower during holidays and, more generally, we observed a significant impact of holidays on the sales of our clients.
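One simple way to quantify such a holiday effect is to compare the average daily turnover on holiday days with the average on all other days. The sketch below does this on invented numbers: the dates, the holiday period and the sales values are made up and do not correspond to any real calendar or client.

```python
from datetime import date, timedelta
from statistics import mean


def holiday_effect(daily_sales, holiday_periods):
    """Relative change of mean daily sales on holiday days vs. other days.

    `daily_sales` maps a date to a turnover; `holiday_periods` is a list of
    (start, end) dates, both inclusive.
    """
    def is_holiday(day):
        return any(start <= day <= end for start, end in holiday_periods)

    holiday = [value for day, value in daily_sales.items() if is_holiday(day)]
    regular = [value for day, value in daily_sales.items() if not is_holiday(day)]
    if not holiday or not regular:
        raise ValueError("need at least one holiday day and one regular day")
    return mean(holiday) / mean(regular) - 1.0


# Made-up example: four weeks of sales with one hypothetical holiday week.
holiday_periods = [(date(2023, 1, 15), date(2023, 1, 21))]
first_day = date(2023, 1, 1)
daily_sales = {}
for offset in range(28):
    day = first_day + timedelta(days=offset)
    in_holiday = holiday_periods[0][0] <= day <= holiday_periods[0][1]
    daily_sales[day] = 800.0 if in_holiday else 1000.0

effect = holiday_effect(daily_sales, holiday_periods)  # negative means a drop
```

This comparison is only fair when holiday periods cover whole weeks, as in the example. Otherwise the weekly pattern can bias the result, and it is safer to compare holiday days with regular days of the same weekday.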
A core particularity of food sales is the breakdown by product. Stores can modify their product range over time, which makes it harder to predict the share of each product. Moreover, a store may want to have some products available whatever the prediction says: restaurants like to renew their menu to keep their clients’ interest, and food stores adapt their shelves to the season. A simple machine learning method to predict the breakdown by product would therefore not meet the needs of our clients, so we chose to apply a system of constraints that our clients can set up themselves. Another distinctive feature of this field is that stores can set discounts on specific products, which often increases sales: we must take this into account and correct the sales figures for these products. Figure 6 shows the evolution of the quantities sold for a product that went through price changes and promotions: on average, this product sold 38% more units when it was discounted.
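To illustrate the discount correction, here is a minimal sketch that estimates the average uplift on discounted days and scales those days back to a no-discount baseline. It is one possible approach among others, not a description of our actual model; the quantities and discount flags are invented, and both function names are hypothetical.

```python
from statistics import mean


def discount_uplift(quantities, discounted):
    """Relative increase of the mean quantity sold on discounted days."""
    with_discount = [q for q, d in zip(quantities, discounted) if d]
    without_discount = [q for q, d in zip(quantities, discounted) if not d]
    if not with_discount or not without_discount:
        raise ValueError("need both discounted and regular days")
    return mean(with_discount) / mean(without_discount) - 1.0


def remove_discount_effect(quantities, discounted, uplift):
    """Scale quantities sold on discounted days back to a regular-price level."""
    return [
        q / (1.0 + uplift) if d else q
        for q, d in zip(quantities, discounted)
    ]


# Made-up daily quantities for one product; True marks a discounted day.
quantities = [20, 22, 18, 20, 30, 30, 19, 21]
discounted = [False, False, False, False, True, True, False, False]

uplift = discount_uplift(quantities, discounted)
corrected = remove_discount_effect(quantities, discounted, uplift)
```

The corrected series can then be used to learn the regular demand for the product, and the uplift can be applied back to the forecast on days where a discount is planned. A single average uplift is a strong simplification: in practice the effect may depend on the size of the discount and on the product.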
At Deepsight, we have built a forecasting model that takes all these particular aspects of food sales into account, and we are continuously working to improve it. This study allowed us to identify the main characteristics that drive the turnover. The choice of the time series model itself remains open, and it will be the focus of another article.