Machine learning models make predictions based on past data, but there is no recent past like today’s present!
A US energy company was using analytic models that didn't account for crude-oil prices reaching $30 per barrel. Many banks' models didn't anticipate or factor in negative interest rates. Retailers' logistics and demand-forecasting models didn't consider having to move their entire operations to digital. Logistics companies' forecasting applications didn't anticipate the collapse of passenger air travel, which was closely intertwined with air-freight availability and costs.
When the past is not a guide to the future
The economic impact of COVID-19 has been turbulent. It has upended virtually every industry vertical: supply chains, transportation, food supplies, retailing, manufacturing, media & entertainment, travel & tourism, even healthcare as patients postponed their non-COVID treatments. GDPs went tumbling in every country, and some economies even went into recession. Everything was happening for the first time and at double-quick speed. The past was no longer a guide to the future, which created a huge challenge for doing any sort of predictive analytics.
The data used to make good managerial decisions was turned upside down in this unpredictable, uncertain, volatile environment. Over the last decade, business had been moving towards data-driven decision-making as an explosion of mobile devices, social media networking, connectivity, cloud storage and exponential growth in computing power made Big Data analytics possible. Point-of-sale data, the internet of things, text data from social networks, voice, and video were all automatically collected and reported. Coupled with advances in machine learning and artificial intelligence, these resources allowed leaders and organizations to use analytics and data science for better-informed decisions.
Yet what we are witnessing now is a large-scale disruption to this data-driven approach, as the global pandemic has resulted in a seismic shift in data. Machine learning models make predictions based on past data, but there is no recent past like today's present.
The big shift away from predictive to descriptive analytics
The pandemic forced companies into an almost instantaneous shift away from more advanced analytics focused on prediction and optimization to descriptive analytics such as reports and data visualization. Executives have side-lined predictive analytics programs and pivoted back toward simple descriptive analytics — good data about the present and recent past that was quickly available.
With outside factors causing significant disruption and internal data about past activities no longer a good predictor of the future, companies are turning outside to figure out what's going on, particularly about consumer behaviour and demand. A survey by Cognizant found that companies have turned to developing new analytics/models, evaluating and refreshing their existing analytics/models, refreshing databases, and integrating new data streams such as geo-location, social media and cell-phone data.
Looking more at external indicators
Companies started looking for other economic indicators, like movement through ports and consumer confidence levels. Car companies have been looking at smog levels in certain cities as an indicator of how much driving is taking place, with more smog meaning more driving, and a hint that activity is returning to normal and people might be buying cars again.
Because of the volatility of the situation, all cycle times for reporting were dramatically compressed. The demand for real-time dashboards increased. Organizations weren’t worried about detailed forecasting, but were just trying to get the shapes of the distributions right. A monthly or quarterly report might now be requested weekly or even daily.
At some companies, data teams were asked to focus on specific pain points. At automaker Ford, executives have been less interested in routine reports and dashboards during the pandemic. Instead, they are more likely to ask for custom analyses involving particular situations (for example, the extent of rail delays in the Mexican port of Veracruz) and new data sources.
Running agile analytics
Even in normal times, demand forecasting is one of the most difficult challenges for data scientists. Changing consumer demand, volatile market conditions, and competitive moves all make predicting demand a trial. As the pandemic hit, structural shifts in demand wreaked havoc on machine learning models that were slow to adapt to the unusual data.
As companies shifted focus to descriptive analytics to understand changes in trends, they put their machine learning models for forecasting demand on hold. They started relying on simple forecasting approaches such as asking, “What did we ship yesterday?” or using time-series smoothing models such as computing moving averages, while closely monitoring the demand data to see if new patterns were emerging.
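The simple forecasting approaches described above can be sketched in a few lines. This is a minimal illustration, with made-up shipment figures; the window size of 3 and the function name are assumptions for the example, not anything from the article.

```python
# Naive demand forecasting via a trailing moving average: the forecast
# for tomorrow is simply the mean of the last `window` observations,
# which adapts to a demand shock far faster than a model trained on
# years of pre-pandemic history.

def moving_average_forecast(demand, window=7):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(demand) < window:
        window = len(demand)  # fall back to all available history
    return sum(demand[-window:]) / window

# Hypothetical daily shipment counts before and during a demand shock.
daily_shipments = [120, 118, 125, 130, 90, 60, 45, 40, 38, 42]

print(moving_average_forecast(daily_shipments, window=3))  # prints 40.0
```

A short window tracks the new post-shock level quickly; a long window still averages in pre-shock data, illustrating why cycle times for these simple models were compressed.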
In the case of automated machine learning, many companies let their models continue to run, using the pandemic as a unique learning opportunity. By closely monitoring how the models were adapting to the unusual data, data scientists could better understand the robustness (or lack thereof) of the models.
Some companies attempted to use new external data sources to try to predict demand. In order to understand and predict consumer demand, analysts at Ford began employing aggregate connected-vehicle trip data that indicates either increases or decreases in driving activity nationally, as well as air pollution levels and car-related internet searches. Some of this data may not be leading indicators of car sales, but they seem to at least move in parallel with them, and they suggest that the marketplace is opening up.
Other companies, lacking valid data for their models, simply made policies more conservative. This has been particularly true in credit risk models. Several banks, for example, raised the credit score requirements for home mortgages by substantial amounts: JPMorgan Chase, for example, raised the required credit score for new and refinanced mortgages to 700, and the minimum down payment to 20%.
Models with shorter time-cycles & learning from the future
Going forward, analysts face questions such as: Should unusual data from the pandemic be deleted? Should it be replaced with imputed values based on data prior to COVID-19? Is pre-COVID-19 data even relevant? The answers will surely differ by sector. Companies are turning to moving averages (where you compute the average of a subset of data to balance out random fluctuations) and other smoothing forecasting techniques as a way to navigate how much to rely on pre- and post-pandemic data.
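One way to tune "how much to rely on pre- versus post-pandemic data" is simple exponential smoothing, where a single parameter alpha sets how quickly older history is discounted. The demand series below is hypothetical, purely to illustrate the trade-off.

```python
# Simple exponential smoothing: level = alpha * new + (1 - alpha) * old.
# Alpha near 1 forgets pre-shock history quickly; alpha near 0 keeps
# trusting it, so the choice of alpha encodes how much weight the
# pre-pandemic past should still carry.

def exponential_smoothing(series, alpha):
    """Return the smoothed level after processing the whole series."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Hypothetical weekly demand: stable history, then a sharp drop.
demand = [100, 102, 99, 101, 100, 55, 50, 52]

print(round(exponential_smoothing(demand, alpha=0.2), 1))  # slow to adapt
print(round(exponential_smoothing(demand, alpha=0.8), 1))  # tracks the shock
```

With the low alpha the smoothed level stays far above the new demand regime; with the high alpha it lands close to the recent observations, which is the behaviour analysts wanted during the structural break.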
Trying to model low-probability, highly disruptive events will require an increase in the amount of external data used to better account for how the world is changing. The right external data could provide an earlier warning signal than what can be provided by internal data. A new metric of effectiveness could be to consider how fast external data can be integrated into existing systems for use in analytical models.
There is a need to keep close tabs on machine learning and prescriptive models. Companies are planning to audit data input, model assumptions, and model output more frequently. How will models respond to zero demand, tenfold demand, or anomalies like the negative price of oil? Techniques developed for quality control in industrial engineering, like control limits and acceptance sampling, need to be applied to machine learning to make sure the models are “in control.”
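The control-limit idea borrowed from industrial quality control can be sketched as follows. All the error values here are hypothetical, and the three-sigma threshold is the conventional Shewhart choice, assumed for illustration rather than taken from the article.

```python
# Control-chart monitoring for a forecasting model: derive limits from
# a baseline of stable prediction errors, then flag new errors that
# breach them, signalling the model may be "out of control".
import statistics

def control_limits(baseline_errors, k=3.0):
    """Return (lower, upper) limits: mean +/- k standard deviations."""
    mean = statistics.mean(baseline_errors)
    sd = statistics.stdev(baseline_errors)
    return mean - k * sd, mean + k * sd

def out_of_control(new_errors, limits):
    """Indices of new errors that fall outside the control limits."""
    lo, hi = limits
    return [i for i, e in enumerate(new_errors) if e < lo or e > hi]

# Hypothetical pre-pandemic forecast errors, then a regime shift.
baseline = [1.2, -0.8, 0.5, -1.1, 0.9, -0.4, 0.2, -0.6]
recent = [0.7, -0.9, 6.5, 8.1]  # large errors suggest a broken model

limits = control_limits(baseline)
print(out_of_control(recent, limits))  # prints [2, 3]
```

An audit process could run such a check on every scoring batch, pulling a model from production (or triggering retraining) as soon as its errors drift outside the limits.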
Nearly every organization is in favour of shortened cycle times for model development and deployment. The new normal for data science will be "all about agility and speed." The ability to generate customized and adaptive models quickly will be a key determinant of success: it's a different world from the relatively stable data and analytics world of the past.
Rather than focusing on the past for insights, organizations are increasingly looking forward. Rapid shifts in operating environments and human behaviour mean that the historical correlations some analytical models rely on have been challenged. To find new patterns in data and better anticipate future decisions, new data sets — including real-time data from across the value chain — are being processed by new analytic approaches based on artificial intelligence (AI). This enhanced approach to decision-making is called "learning from the future."