A Good Day for Good Data – Part I


Operational efficiency gains come from ensuring data quality. This first part discusses the key data quality metrics and how to operationalize reliable data.

The first priority for any new data team should be to gather only trusted, usable and clean data, i.e. ensuring data quality.

“You might also believe that improving data quality is a Sisyphean task of pushing a boulder up a hill just to have it roll back down,” writes Kevin Hu, CEO at data observability platform Metaplane. “The bad news is that you’ll always be pushing that boulder. Sorry. The good news is that, with the right metrics, strategies, and processes, that boulder can get smaller, you can get stronger, and the hill can get shallower.”

Key data quality metrics

The first stage, then, is simply to ensure that the data is ‘good’ – polished until it is in the best possible shape. Every further processing step, and every KPI or metric to be measured, is therefore tied to data quality: the umbrella term for all the factors that determine whether data can be relied upon for its intended use.

Image: Key metrics to measure good data; Source: Louise de Leyritz, Medium

Two of the key measures of data quality are accuracy and completeness. For example, you face an accuracy problem if your sales data records the sale of 1,000 items while your warehouse records 1,200. Accuracy issues often overlap with completeness issues, which are measured by the degree of validation against a mapping, the number of null values, the number of constraints satisfied, and the amount of missing data. The objective is simple: complete data that accurately depicts reality.
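The checks above are straightforward to automate. A minimal sketch in pandas, using made-up sales and warehouse extracts (the column names and figures are illustrative, echoing the 1,000 vs 1,200 example):

```python
import pandas as pd

# Hypothetical extracts from two systems of record.
sales = pd.DataFrame({"sku": ["A", "B", "C"], "units_sold": [400, 350, 250]})
warehouse = pd.DataFrame({"sku": ["A", "B", "C"], "units_shipped": [500, 400, 300]})

# Accuracy: do the two systems agree on the total?
sold = sales["units_sold"].sum()          # 1000
shipped = warehouse["units_shipped"].sum()  # 1200
print(f"accuracy gap: {shipped - sold} units")

# Completeness: null counts and simple constraint violations per column.
null_counts = sales.isna().sum()
negative_rows = int((sales["units_sold"] < 0).sum())
print(null_counts.to_dict(), "negative rows:", negative_rows)
```

In practice such checks would run on every load, with the gap and null counts written to a monitoring table rather than printed.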

That data should be consistent is a no-brainer, really. Louise de Leyritz from ESSEC Business School discusses this: “Data is inconsistent when aggregations of different values don’t correspond to the supposedly aggregated numbers. For example, you have a consistency issue if the monthly profit is not consistent with the monthly revenue and cost numbers.”
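De Leyritz’s profit example translates directly into a consistency assertion. A sketch with invented monthly figures (one month is deliberately wrong to show the check firing):

```python
import pandas as pd

# Illustrative monthly figures; profit should equal revenue minus cost.
monthly = pd.DataFrame({
    "month":   ["2023-01", "2023-02"],
    "revenue": [120_000, 140_000],
    "cost":    [90_000, 100_000],
    "profit":  [30_000, 45_000],  # 2023-02 is inconsistent on purpose
})

# Flag rows where the derived metric disagrees with its components.
inconsistent = monthly[monthly["profit"] != monthly["revenue"] - monthly["cost"]]
print(inconsistent["month"].tolist())
```

The same pattern generalises: any metric defined as an aggregation of others can be recomputed from its components and compared.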

Reliability is one of the most crucial aspects of determining the quality of data. Data can be regarded as reliable only once there is sufficient lineage and adequate assurance about its sourcing. Kevin Hu regards the number of requests to verify data in end-user systems, the number of certified data products, the comprehensiveness of lineage available to end-users, and the number of users of those systems as among the primary metrics for data reliability.

Data usability refers to whether data can be accessed, understood, and interpreted easily. “You have a usability problem,” says de Leyritz, “when a Looker dashboard is hard to interpret. In general, enriching your data with metadata (i.e., documenting your data) makes it usable and easy to interpret on the go.”

Operationalising reliable data

Once a clean and reliable data set has been prepared, the next step is to engage in operational analytics, i.e. making it accessible to ‘operational’ teams such as sales, marketing, and so on. This is different from the classical approach of using warehouse data for reporting and business intelligence. As de Leyritz adds: “Instead of using data to influence long-term strategy, operational analytics informs strategy for the day-to-day operations of the business. To put it simply, it’s putting the company’s data to work so everyone in your organization can make smarter, faster decisions.”

This means pushing data into operational tools so that sales or marketing teams can use it efficiently in their campaigns. Reverse ETL tools, for example, let users automatically push data from the warehouse into operational tools.
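The reverse ETL pattern itself is simple: query the warehouse, filter, push to the operational tool. A minimal sketch, where `query_warehouse` and `crm_upsert` are hypothetical stand-ins for a real warehouse driver and a real CRM client (no specific vendor API is assumed):

```python
def query_warehouse(sql):
    # Stand-in: a real implementation would run `sql` against the warehouse.
    return [{"email": "a@example.com", "lifetime_value": 1200},
            {"email": "b@example.com", "lifetime_value": 300}]

def crm_upsert(record):
    # Stand-in: a real client would call the CRM's upsert endpoint.
    print("synced", record["email"])

def sync_high_value_customers(threshold=500):
    """Push customers above a lifetime-value threshold to the CRM."""
    rows = query_warehouse("SELECT email, lifetime_value FROM customers")
    pushed = [r for r in rows if r["lifetime_value"] >= threshold]
    for r in pushed:
        crm_upsert(r)
    return len(pushed)
```

A production reverse ETL tool adds scheduling, incremental syncs, and retries around exactly this loop.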

“There are two benefits of data operationalization that should allow you to boost your ROI. First, it allows other teams to make more efficient, data-driven decisions. Second, this frees up the analytics team for deeper, more meaningful data analysis. When analytics can move away from basic reporting, providing data facts to other teams, and answering tickets, they can focus on what we really need analysts’ skillset for.”

