Dark Data: The Hidden Cost of Digitization
We live in an era of data explosion: every day, individuals and organizations generate, collect, and store vast quantities of data. Data is often considered a valuable asset because it can provide insights, improve decision-making, and create new opportunities. However, not all data is useful or relevant. In fact, a large portion of the data that organizations store is never used again and serves no purpose. This is known as dark data.
Dark data is digital data that organizations collect, process, and store but never use for any other purpose. An estimated 52% of the world’s stored data is dark. It can include outdated spreadsheets, duplicate images, unused backups, obsolete customer records, sensor data, email attachments, and more. Dark data is often generated as a by-product of regular business activities such as transactions, communications, or compliance, but it is never analyzed, shared, or monetized, and so it sits forgotten in the cloud or on on-premises servers.
Dark data poses several challenges for organizations, including security risks, storage costs, and missed opportunities. It also has a significant environmental impact: storing it consumes energy and resources and adds to the carbon footprint of digitization. This article explores the concept of dark data, its implications for organizations and the environment, and how it can be managed effectively.
The Problem of Dark Data
Dark data is a growing problem that affects not only the performance and security of organizations, but also the environment and society. According to analysts, the volume of dark data stored worldwide will more than quadruple by 2025, to 91 zettabytes (91 trillion gigabytes), with a corresponding impact on the environment. This rapid growth raises significant questions about the efficiency and sustainability of current digital practices. The main challenges of dark data include:
- Security risks: Dark data can contain sensitive or confidential information, such as personal data, financial data, or intellectual property. If it is not properly protected, it is vulnerable to cyberattacks, data breaches, and unauthorized access, which can cause legal, reputational, or financial damage to the organization and its stakeholders. For example, in 2019 a data breach exposed the personal information of 106 million customers and applicants of Capital One, a US bank. The breach was caused by a misconfigured web application firewall that allowed unauthorized access to a cloud server that stored dark data.
- Storage costs: Dark data consumes valuable storage space and resources, which increases the organization's operational and maintenance costs. The cost is environmental as well as financial: Veritas Technologies estimated that storing dark data would emit 6.4 million tons of carbon dioxide in 2020, more than 80 individual countries each emit in a year. This environmental impact is discussed in the next section.
- Missed opportunities: Dark data can contain valuable insights that would benefit the organization or its customers. It may reveal customer preferences, market trends, product performance, or business opportunities. By leaving it unused, the organization may miss out on competitive advantages, revenue streams, or improvements in customer satisfaction. For example, a McKinsey Global Institute study estimated that a retailer using big data to the full could increase its operating margin by more than 60%.
The Environmental Impact of Dark Data
Many people assume that digital data is carbon neutral, but this is not the case. Digitization requires a significant amount of energy, for both producing and consuming data. In 2020, digitization was reported to generate 4% of global greenhouse gas emissions, a share that has been compared to those of the automotive, aviation, and energy sectors.
Dark data is a major contributor to the environmental impact of digitization, because it requires power to store and maintain even if it is never used again. As its volume grows toward the projected 91 zettabytes by 2025, that footprint grows with it.
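For a sense of scale, the 91-zettabyte projection can be checked with a quick unit conversion. The snippet below is only an illustrative sketch: it assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes, and the function name is made up for this example.

```python
# Decimal (SI) storage units, expressed in bytes. Binary units
# (gibibytes, zebibytes) would give slightly different figures.
GIGABYTE = 10**9
ZETTABYTE = 10**21


def zettabytes_to_gigabytes(zettabytes):
    """Convert a quantity in zettabytes to gigabytes (SI definitions)."""
    return zettabytes * (ZETTABYTE // GIGABYTE)


projected_gb = zettabytes_to_gigabytes(91)
print(f"{projected_gb:,} GB")  # 91,000,000,000,000 GB, i.e. 91 trillion
```

One zettabyte is a trillion gigabytes, so the "91 trillion gigabytes" figure follows directly from the projection.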
The environmental impact of dark data is not limited to direct emissions from data centers. It also includes indirect emissions from the supply chain, the infrastructure, and end users. For example, the manufacturing of devices, the transportation of materials, the cooling of servers, and the disposal of e-waste all contribute to the carbon footprint of digitization. According to a report by the Shift Project, the global digital system accounts for 3.7% of the world’s greenhouse gas emissions and 4.2% of the world’s electricity consumption.
The environmental impact of dark data is a concern not only for the planet, but also for people. According to the World Health Organization, air pollution is one of the major causes of death and disease worldwide, affecting more than 90% of the world’s population. It can cause respiratory infections, cardiovascular diseases, lung cancer, and stroke. Air pollution also affects the climate, ecosystems, and food security: for example, it can reduce crop yields, increase the risk of wildfires, and accelerate the melting of glaciers.
The Management of Dark Data
To reduce the negative effects of dark data, organizations need to adopt a more responsible and sustainable approach to data management. This can include the following steps:
- Identify and classify dark data: Organizations need to have a clear understanding of what data they have, where it is stored, and how it is used. This can be achieved by using software tools that can scan, index, and categorize data, and identify the dark data that is redundant, obsolete, or trivial. For example, Veritas Data Insight is a tool that can help organizations discover, analyze, and optimize their unstructured data.
- Delete or archive dark data: Organizations need to eliminate the dark data that is no longer needed and archive the dark data that may have future value. This can free up storage space, reduce costs, and lower emissions. When deleting or archiving data, however, organizations must comply with the relevant data privacy and retention regulations, such as GDPR. For example, Veritas Enterprise Vault is a tool that can help organizations archive, manage, and delete their data in compliance with legal and regulatory requirements.
- Utilize or monetize dark data: Organizations need to explore the potential value of dark data and put it to use, for example in analytics, in business relationships, or through direct monetization. This can help the organization gain insights, improve decision-making, enhance customer experience, or generate new revenue streams. For example, Veritas Information Studio is a tool that can help organizations extract value from their dark data by providing data intelligence and governance capabilities.
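The first step, identifying candidate dark data, can be approximated on a small scale with a short script. The sketch below is not how the tools named above work; it is a minimal illustration that assumes two simple signals: a file not modified for more than a year (an arbitrary threshold chosen here) is treated as possibly obsolete, and files with identical contents are treated as redundant. The function names and the shape of the report are hypothetical.

```python
import hashlib
import os
import time
from collections import defaultdict
from pathlib import Path

STALE_AFTER_DAYS = 365  # assumed threshold; tune to your retention policy


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_dark_data_candidates(root, stale_after_days=STALE_AFTER_DAYS, now=None):
    """Walk `root` and report stale files and groups of identical files."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400  # seconds per day
    stale = []
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        # Modification time is used because access times are often not
        # updated reliably (for example on volumes mounted with noatime).
        if path.stat().st_mtime < cutoff:
            stale.append(path)
        by_digest[file_digest(path)].append(path)
    duplicates = [group for group in by_digest.values() if len(group) > 1]
    stale_bytes = sum(path.stat().st_size for path in stale)
    return {"stale": stale, "duplicates": duplicates, "stale_bytes": stale_bytes}
```

Calling `find_dark_data_candidates("/path/to/share")` returns the stale files, the duplicate groups, and the number of bytes the stale files occupy. The script only reports candidates and deletes nothing, since the decision to delete or archive depends on the retention rules discussed above.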
Dark data is a liability for organizations, for the environment, and for society, and it is growing. Organizations need to take action to manage their dark data effectively and turn it from a liability into an asset. By doing so, they can both reduce their digital carbon footprint and unlock the hidden value of their data.
