Can ‘Data Gravity’ pose a hindrance to Cloud Computing?
According to the International Data Corporation (IDC), worldwide data creation is set to reach an enormous 163 zettabytes annually by 2025, almost ten times the amount produced in 2017. As AI and Machine Learning applications grow ever more dependent on these massive datasets, the challenges posed by ‘Data Gravity’ could become a major roadblock to Cloud Computing in the future.
Where there is Data, there is Gravity
Of the many changes brought about by the COVID-19 pandemic, the widespread adoption of cloud computing is among the most significant. Formally, cloud computing refers to the on-demand availability of computing resources, especially processing power and data storage, without direct active management by the user. Its easy accessibility, reliability and cost-effectiveness make it a sustainable option for the future, provided challenges such as security and inconsistent connectivity are met head-on.
In fact, technology publication VentureBeat writes: “In (an) oft-used analogy, if comput(ing) infrastructure is the machinery of today’s world, then data is the oil — meaning infrastructure is not productive without it. It does not make sense for applications and services to run where they do not have quick and easy access to data, which is why data exerts such a gravitational pull on them (i.e. Data Gravity). When services and applications are closer to the data they need, users experience lower latency and applications experience higher throughput, leading to more useful and reliable applications.”
Beating Gravity
‘Data Gravity’, a term first coined by Dave McCrory, then Vice President at software company GE Digital, in a blog post dating back to 2010, poses its biggest challenge in the form of vendor lock-in. As data amasses at a specific location, a growing number of apps and services come to rely on the data held there, making it increasingly difficult (and expensive) to move that data to another vendor or location when needed. This inhibits multi-cloud and hybrid-cloud strategies for organisations, i.e. strategies that give businesses greater flexibility by shifting workloads between various cloud solutions (both public and private) as costs and needs fluctuate.
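To get a feel for why that lock-in bites, here is a minimal back-of-the-envelope sketch in Python. The egress price and link speed are illustrative assumptions only, not any provider’s actual rates:

```python
def migration_estimate(dataset_tb: float,
                       egress_usd_per_gb: float = 0.09,   # assumed egress price, not a real price list
                       link_gbps: float = 10.0) -> dict:   # assumed sustained link speed
    """Roughly estimate the cost and time of moving a dataset out of one cloud."""
    dataset_gb = dataset_tb * 1024
    cost_usd = dataset_gb * egress_usd_per_gb
    # 10 Gbit/s is roughly 1.25 GB/s of sustained throughput (ignoring protocol overhead).
    transfer_seconds = dataset_gb / (link_gbps / 8)
    return {
        "egress_cost_usd": round(cost_usd, 2),
        "transfer_days": round(transfer_seconds / 86_400, 1),
    }

if __name__ == "__main__":
    # Even a modest 500 TB analytics store costs tens of thousands of dollars
    # and days of sustained transfer to relocate.
    print(migration_estimate(500))
```

The exact figures matter less than the trend: the larger the dataset grows in place, the steeper the bill and the longer the outage window for moving it, which is precisely the gravitational pull described above.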
Consider a smart-city application such as license plate recognition for border control. Once a license plate has been scanned, the app must respond in real time, i.e. within its latency budget. If the latency of the analysis far exceeds that budget, the data becomes meaningless (and someone driving a stolen vehicle can simply disappear into another country, for example). Hence, the locations where the data is produced and where it is analysed must remain in close proximity to keep latency low. A further challenge in this regard lies in managing data sovereignty laws, which regulate where and how data may be transferred across jurisdictional boundaries.
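As a rough illustration of that latency-budget reasoning, the short sketch below compares two hypothetical places the plate image could be analysed. The budget, round-trip times and inference time are invented numbers, not measurements:

```python
# Entirely hypothetical numbers, chosen only to illustrate the latency-budget idea.
LATENCY_BUDGET_MS = 200  # assumed end-to-end budget for a border-control response

def end_to_end_latency_ms(network_rtt_ms: float, inference_ms: float) -> float:
    """Total latency = network round trip to the analysis site + model inference time."""
    return network_rtt_ms + inference_ms

# Two hypothetical deployment options for the same recognition model.
deployments = {
    "edge site at the border crossing": end_to_end_latency_ms(network_rtt_ms=5, inference_ms=60),
    "cloud region on another continent": end_to_end_latency_ms(network_rtt_ms=180, inference_ms=60),
}

for site, latency in deployments.items():
    verdict = "within budget" if latency <= LATENCY_BUDGET_MS else "budget exceeded"
    print(f"{site}: {latency:.0f} ms ({verdict})")
```

With identical inference times, only the deployment that sits close to where the data is produced stays inside the budget, which is why the data’s location ends up dictating the application’s location.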
Combating this will be a major challenge for the data architects and data engineers of the future. According to VentureBeat, “with a hybrid cloud infrastructure, organizations can spread out apps and services to where their data is, to be closer to where they need it, addressing any latency problems and data sovereignty requirements.” The key to dealing with these challenges will be a common operating platform that runs consistently across providers such as Azure, Google Cloud and AWS, ensuring portability of apps and services. With such a platform, architects can write applications once and deploy them wherever location and need dictate, reducing latency and easing data sovereignty concerns.
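To make the “run the app where its data lives” idea a little more concrete, here is a hedged sketch of such a placement decision. The region names, latencies and residency rules are invented for illustration; a real hybrid-cloud platform would pull these from its own inventory and policy tooling:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Region:
    name: str
    provider: str
    jurisdiction: str
    rtt_to_data_ms: float  # round-trip time from this region to where the dataset currently sits


def place_workload(regions: list[Region],
                   required_jurisdiction: str,
                   latency_budget_ms: float) -> Region | None:
    """Pick the lowest-latency region that also satisfies the data-sovereignty rule."""
    eligible = [r for r in regions
                if r.jurisdiction == required_jurisdiction
                and r.rtt_to_data_ms <= latency_budget_ms]
    return min(eligible, key=lambda r: r.rtt_to_data_ms, default=None)


# Hypothetical inventory of candidate regions across three providers.
candidates = [
    Region("eu-west", "AWS", "EU", rtt_to_data_ms=12),
    Region("europe-west1", "Google Cloud", "EU", rtt_to_data_ms=18),
    Region("eastus", "Azure", "US", rtt_to_data_ms=95),
]

choice = place_workload(candidates, required_jurisdiction="EU", latency_budget_ms=50)
print(choice.name if choice else "no compliant region within budget")
```

The point is not the code itself but the shape of the decision: placement becomes a function of where the data already sits and which jurisdiction it must stay in, which is exactly the pull that ‘Data Gravity’ describes.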