Predictions are only as good as the data used to calculate them. But when it comes to quality, not all data is good data. As they say, “garbage in, garbage out.” Of course, the best models should digest and analyze a wide variety of data sets. But even though quantity matters, it’s the “cleanliness” of the data that’s ultimately paramount.
Some data sources are of such low quality that they should be excluded from forecasting models altogether. Four trillion incomplete and mismatched data points are a case in point: sheer volume does not make up for poor quality.
Just as high-quality ingredients are essential to a world-class meal, clean data sets are vital to the predictive models that produce insightful, actionable demand forecasts. One spoiled ingredient can ruin an entire dish. Similarly, if the data used to predict product demand is “dirty” (incomplete, inconsistent, or incorrect), the quality and accuracy of the forecast will suffer.
Far too often, CPG companies employ less sophisticated forecasting techniques, such as manually crunching aggregated monthly data. It is therefore essential to have access to technology that incorporates what can be thought of as an inspection and cleaning process. Before the data is pushed to predictive models, cleaning algorithms are applied to a number of data sources, detailed below, to close any gaps that may exist. Only then can the data be properly utilized.
Remember, when supply chain forecasts rely on ample data that’s clean—well organized, timely, precise, consistent, and complete—then accuracy is more likely to follow. The sheer amount of data that’s necessary to feed the best machine-learning models is enormous, requiring enterprise-grade servers and the work of a team of data engineers.
Traditionally, demand forecasts have primarily been calculated using historical shipment data; in other words, using the past to identify trends and predict the future. But the most capable predictive models take into account historical sales plus a variety of data sets, including:
The more data there is, the higher the likelihood of error existing within the data pool. Of course, data cleanliness levels vary from company to company. Some brands have executed a plan to maintain consistent, complete data throughout the supply chain, from their financial statements and products to their inventory at warehouses and fulfillment vendors. But most CPG brands have significant errors within their business data, often because they do not have the proper technological infrastructure in place.
Here are some examples of where gaps in data sets commonly occur, which can ultimately harm demand forecast accuracy.
Historical sales order data is among the most important data sources for a supply chain forecast. The more granular and accurate the data, the more accurate forecasts tend to be. The best sales order data contains details about every order going back many years, which makes it possible to ascertain the dynamic relationships between seasonal trends and long-term growth trends. Because there is an immense wealth of information contained in day-to-day sales patterns alone, it's vital for brands to possess substantial data science computing capabilities designed to process raw daily sales data.
It's critical to gain a deep understanding of the relationship between orders and inventory over time. This helps identify inventory problems relative to sales, particularly ones that persist or recur. For instance, comparing quantities ordered with quantities shipped reveals the sales that could have occurred but did not because of insufficient inventory. This data is vital to producing accurate inventory predictions, which help brands plan optimum inventory levels to match (seasonal) sales demand.
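To make the ordered-versus-shipped comparison concrete, here is a minimal Python sketch. The order lines, the field layout, and the `unmet_demand` helper are all hypothetical, invented for illustration. It assumes only that each order line records both the units ordered and the units shipped.

```python
from collections import defaultdict

# Hypothetical order lines: (sku, month, units_ordered, units_shipped)
order_lines = [
    ("SKU-A", "2023-11", 120, 120),
    ("SKU-A", "2023-12", 200, 150),
    ("SKU-B", "2023-12", 80, 80),
    ("SKU-A", "2023-12", 60, 40),
]

def unmet_demand(lines):
    """Sum ordered, shipped, and shortfall units per (sku, month)."""
    totals = defaultdict(lambda: {"ordered": 0, "shipped": 0, "short": 0})
    for sku, month, ordered, shipped in lines:
        bucket = totals[(sku, month)]
        bucket["ordered"] += ordered
        bucket["shipped"] += shipped
        # Count only shortfalls; over-shipments do not offset them.
        bucket["short"] += max(ordered - shipped, 0)
    return dict(totals)

summary = unmet_demand(order_lines)
for (sku, month), t in sorted(summary.items()):
    fill_rate = t["shipped"] / t["ordered"] if t["ordered"] else 1.0
    print(sku, month, t, f"fill rate {fill_rate:.0%}")
```

In this made-up data, SKU-A in December shows 70 units of demand that inventory could not cover, which is exactly the kind of signal that shipment history alone would understate.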
SKU codes and/or UPC numbers often change during a product’s lifetime. This informational trail provides vital data about a product’s history, especially during the process of establishing and analyzing time relationships between SKUs and UPCs.
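One way to picture this trail is as a succession table that links each retired code to its replacement, so that a product's sales history can be stitched into a single series. The sketch below is illustrative only: the table, the sample history, and the `current_sku` and `stitch_history` helpers are assumptions, not a description of any particular platform.

```python
from collections import defaultdict

# Hypothetical succession table: retired code -> code that replaced it.
sku_successors = {"SKU-100": "SKU-100B", "SKU-100B": "SKU-100C"}

# Hypothetical sales history: (sku, month, units)
sales_history = [
    ("SKU-100", "2021-06", 50),
    ("SKU-100B", "2022-06", 70),
    ("SKU-100C", "2023-06", 90),
    ("SKU-200", "2023-06", 10),
]

def current_sku(code, successors):
    """Follow the chain of code changes until reaching the latest code."""
    seen = set()
    while code in successors and code not in seen:
        seen.add(code)  # guard against accidental cycles in the table
        code = successors[code]
    return code

def stitch_history(history, successors):
    """Re-key every sales record to the product's current SKU."""
    stitched = defaultdict(dict)
    for sku, month, units in history:
        series = stitched[current_sku(sku, successors)]
        series[month] = series.get(month, 0) + units
    return dict(stitched)

stitched = stitch_history(sales_history, sku_successors)
print(stitched)
```

Without the succession table, the three codes would look like three short-lived products rather than one product with a multi-year history.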
Products often become discontinued or obsolete. Data detailing these product sunsets is vital to understanding the demand of comparable products.
It’s essential for brands to have (and utilize) a list of future product launches, as well as to establish a clear vision of expected increases in pipeline shipments and distribution gains. Machine-learning models can’t be applied to a potential new product unless they’re programmed to do so. Put simply, it's quite difficult to produce accurate predictions for new products unless it is known what there is to predict.
For some products, sales are heavily driven by price discounts. It's vital to establish the dynamic numerical relationship between promos and changes (lift/drop) in sales.
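As a rough illustration of that numerical relationship, the sketch below compares average units sold in promo weeks with average units sold in non-promo weeks. The weekly figures and the `promo_lift` helper are hypothetical, and this simple ratio is a stand-in: a production model would also need to account for seasonality, trend, and overlapping promotions.

```python
from statistics import mean

# Hypothetical weekly sales for one product.
weekly_sales = [
    {"week": 1, "units": 100, "promo": False},
    {"week": 2, "units": 110, "promo": False},
    {"week": 3, "units": 180, "promo": True},
    {"week": 4, "units": 90, "promo": False},
    {"week": 5, "units": 170, "promo": True},
]

def promo_lift(rows):
    """Average promo-week units relative to average non-promo units, minus 1."""
    baseline = [r["units"] for r in rows if not r["promo"]]
    promoted = [r["units"] for r in rows if r["promo"]]
    if not baseline or not promoted or mean(baseline) == 0:
        return None  # not enough data to compare
    return mean(promoted) / mean(baseline) - 1.0

lift = promo_lift(weekly_sales)
print(f"Estimated promo lift: {lift:.0%}")
```

With these sample numbers, promo weeks average 175 units against a baseline of 100, a lift of 75 percent. A negative result would indicate a drop instead.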
Misspellings, human entry mistakes, and other errors can slow down the pace and quality of data analysis. Cleaning algorithms enable data mapping, which helps to identify various common trends in human error.
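A small example of this kind of mapping is fuzzy matching of hand-typed customer names against a canonical list. The sketch below uses `difflib.get_close_matches` from the Python standard library. The customer names, the 0.8 similarity cutoff, and the `map_to_canonical` helper are assumptions chosen for illustration, and real cleaning pipelines are typically more elaborate.

```python
import difflib

# Hypothetical canonical customer list.
canonical_customers = ["Whole Foods Market", "Trader Joe's", "Kroger"]

def normalize(name):
    """Lowercase and collapse whitespace so trivial differences vanish."""
    return " ".join(name.lower().split())

def map_to_canonical(raw_name, canonical, cutoff=0.8):
    """Return the closest canonical name, or None if nothing is similar enough."""
    lookup = {normalize(c): c for c in canonical}
    matches = difflib.get_close_matches(
        normalize(raw_name), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None

for raw in ["Whole Fods Market", "  KROGER ", "Acme Corp"]:
    print(repr(raw), "->", map_to_canonical(raw, canonical_customers))
```

Names that fail to match are returned as `None` rather than forced onto the nearest candidate, so they can be routed to a person for review.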
Top-performing machine learning models work to clean and associate, or map, countless invaluable data points. Some data sources should undergo what can loosely be described as an associative data mapping process in order to become more useful to predictive models. Here are some examples:
In order to fully incorporate store-level consumption data, predictive models should utilize a mapping key to match the case SKU with an individual unit UPC. From a machine learning perspective, this allows us to understand how retail customer demand drives wholesale case orders.
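A mapping key of this kind can be sketched as a lookup from unit UPC to case SKU and pack size, which lets store-level unit sales be expressed in case equivalents. Everything in the example below (the UPCs, SKU names, pack sizes, and the `case_equivalents` helper) is hypothetical.

```python
# Hypothetical mapping key: unit UPC -> (case SKU, units per case)
upc_to_case = {
    "012345678905": ("CASE-COLA-12", 12),
    "012345678912": ("CASE-CHIPS-24", 24),
}

# Hypothetical store-level consumption: (unit UPC, units sold)
store_unit_sales = [
    ("012345678905", 36),
    ("012345678905", 18),
    ("012345678912", 48),
    ("099999999999", 5),  # UPC missing from the mapping key
]

def case_equivalents(unit_sales, mapping):
    """Convert unit sales to case equivalents; collect UPCs with no mapping."""
    cases = {}
    unmapped = []
    for upc, units in unit_sales:
        if upc not in mapping:
            unmapped.append(upc)
            continue
        case_sku, units_per_case = mapping[upc]
        cases[case_sku] = cases.get(case_sku, 0.0) + units / units_per_case
    return cases, unmapped

cases, unmapped = case_equivalents(store_unit_sales, upc_to_case)
print(cases)
print("Unmapped UPCs:", unmapped)
```

Reporting the unmapped UPCs separately, instead of silently dropping them, surfaces gaps in the mapping key before they distort the forecast.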
Many brands understandably want to optimize their forecasts by fulfillment warehouse. To do this, it’s essential to map key data between wholesale customers and the specific warehouses that fulfill those orders. Doing so enables a high level of accuracy for these types of forecasts.
In short, brands that possess clean internal and external data across the board, plus the necessary computing power, will reap the benefits of forecast accuracy. CPG brands' reliance on clean, ample data, along with the continuous development of machine learning models that analyze that data to produce accurate forecasts across the supply chain, is a trend that will continue to enjoy plenty of wind at its back.
Unioncrate is an AI-powered Supply Chain Planning Platform that gives CPG brands the technology they need to compete and win in a rapidly changing consumer landscape. Our automated demand and supply forecasts deliver unmatched accuracy, collaborative visibility, and actionable intelligence, simplifying a manual-heavy process and slashing hours from your week.