What makes up more than half of companies’ data repositories, yet goes unnoticed by most of them? It’s dark data: information that companies gather, often unknowingly, that is not integral to day-to-day business interactions and therefore sits in the background. While that data seems unnecessary to most companies, it’s invaluable to cybercriminals.
What Is Dark Data?
At a time when many companies are focused on collecting, analyzing, and acting on data they receive from customers, it’s not surprising that latent (or dark) data is accumulating far beyond what they planned to store, protect, and potentially purge. For example, when you consider that Netflix spent nearly $10 million a month in 2019 to store its data in the cloud, you can see how much dark data storage might be costing a company.
Gartner equates dark data to dark matter in physics. Dark data extends beyond any published sensitive data elements. It can include personal information from customers or past employees, but might also include nontraditional data such as systems backups, log files, configuration files, sensitive internal procedures, email backups or “spools,” scanned document repositories, and human resources information. These are all dark data sources that attackers may want to sell or use.
And while some regulations aim to protect information that might be considered dark data, such as the US Health Insurance Portability and Accountability Act (HIPAA) and the General Data Protection Regulation (GDPR) in Europe, many companies continue to store this data long after they are required to keep it.
How to Protect Dark Data
Every company needs to prioritize protecting its customer and employee data, but how can you protect something you don’t even know you have? Further, how do you prioritize it among the other cyber vulnerabilities in your organization? Here are five steps you can take:
1. Increase visibility of data: Start by building a data inventory to map the information you know about. Next, perform threat modeling to identify security needs, locate threats and vulnerabilities, assess severity, and prioritize solutions. Together, these show what data you have and how it may be at risk, so that you can quantify threats and better prioritize remediation of identified security risks.
2. Think like the adversary: Use offensive testing (for example, by engaging ethical hackers and professional security testers) to try to breach your defenses as an attacker would. This will help you find and address vulnerabilities.
3. Counter the adversary: Once you have a complete view of your data footprint and threat model, apply or reinforce security controls in target areas (for example, endpoint detection and response, logging and monitoring, content interception and inspection for Web traffic, and patching). Consider steps 1–3 as a continuous improvement cycle of data discovery.
4. Shrink the battlespace: Delete sensitive personal data that is no longer necessary. Minimize the data you collect and design code-level controls to support data retention periods. This limits the proliferation of sensitive data throughout your environment.
5. Avoid technology infatuation: Data loss prevention (DLP) tools help avoid accidents, but they should not be considered a catch-all for data security. Most DLP technologies are weak and can lull organizations into a false sense of security. Like all things cyber and privacy, data protection is about getting people, process, and technology working in balance and harmony. Reinforce carefully chosen tools with crisp processes (detailed and well-documented playbooks and blueprints), workflows, and runbooks, and make sure they’re managed and led by thoughtful people with real expertise.
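To make the first and fourth steps above a little more concrete (building a data inventory and supporting retention periods in code), here is a minimal Python sketch. It is only an illustration under stated assumptions: the suffix-to-category mapping, the names `build_inventory` and `past_retention`, and the use of a file's last-modified time as a stand-in for "no longer necessary" are all hypothetical choices, not a method described by any vendor or regulation. The sketch only reports candidates; it never deletes anything.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from pathlib import Path

# Hypothetical mapping from file suffix to a dark-data category.
# A real inventory would need a far richer classification.
CATEGORY_BY_SUFFIX = {
    ".log": "log file",
    ".bak": "backup",
    ".conf": "configuration",
    ".mbox": "email spool",
}


@dataclass
class InventoryEntry:
    path: Path
    category: str
    size_bytes: int
    modified: datetime


def build_inventory(root: Path) -> list[InventoryEntry]:
    """Walk `root` and record every file whose suffix maps to a category."""
    entries = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        category = CATEGORY_BY_SUFFIX.get(path.suffix.lower())
        if category is None:
            continue  # not a category this sketch knows about
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        entries.append(InventoryEntry(path, category, stat.st_size, modified))
    return entries


def past_retention(
    entries: list[InventoryEntry],
    retention_days: int,
    now: datetime | None = None,
) -> list[InventoryEntry]:
    """Return entries last modified before the retention cutoff.

    Assumption: last-modified time approximates when the data stopped
    being needed. Review the result before purging anything.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [entry for entry in entries if entry.modified < cutoff]
```

Calling `past_retention(build_inventory(Path("/some/share")), retention_days=365)` (the path here is a placeholder) would return a review list of files untouched for more than a year. Keeping the purge decision separate from discovery is deliberate, since retention obligations differ by data type and jurisdiction.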
Latent Data May Be Risky Data
There are well-known cases where large organizations found themselves the victims of dark data breaches. Data that was latent and secondary to their business models was suddenly extremely costly in terms of brand trust and legal fees.
Just because you don’t see it or use it doesn’t mean your data is not dangerous. Dark data should be a consideration for every organization. It should be accounted for, protected, and regularly purged (as applicable) to keep cybercriminals at bay. Dark data may be your most elusive asset, but it can also be your most costly if you don’t protect it.