More than 1 billion records for CVS Health customers were left in the database of a third-party, unnamed vendor – exposed, unprotected, online. Researchers said the data points revealed could be strung together to create an extremely personal snapshot of someones’s medical situation.
The glitch is likely due to human error, security researcher Jeremiah Fowler said in a post on WebsitePlanet on Thursday: In other words, it’s probably yet another incidence of rampant misconfiguration that’s plaguing cloud-based storage, leading to exposure of sensitive data on an internal network.
According to Fowler’s post, researchers at WebsitePlanet – a portal for web developers and internet marketers – found the non-password-protected database, which had no form of authentication in place to prevent unauthorized entry, on March 21. They coordinated with Fowler in documenting their discovery and on that same day, after they contacted CVS Health, the naked database was closed off from public view.
CVS Health is the parent company behind multiple household brands, including the CVS Pharmacy retail pharmacy chain; CVS Caremark, a pharmacy benefits manager; and Aetna, a health insurance provider.
A CVS spokesperson confirmed the researchers’ findings, saying that CVS Health had been notified of the exposure of a publicly accessible database that contained non-identifiable CVS Health metadata. Upon investigation, they determined that the database was hosted by a third-party vendor, whose name the company didn’t disclose. The database didn’t contain any personally identifiable information (PII) of customers, members or patients, the company said in a statement, and the database was quickly taken down.
As the researcher’s report indicates, there was no risk to customers, members or patients, and we worked with the vendor to quickly take the database down. We’ve addressed the issue with the vendor to prevent a recurrence and we thank the researcher who notified us about this matter. —CVS Health statement.
What Was in That CVS Cache of Data?
Fowler said in his post that there was in fact enough information to derive customers’ PII, including their email addresses. The total size of the database was 204 GB, according to the researchers. It held 1.1 billion records, or, to be precise, 1,148,327,940 files. They were labeled “production” and included information typed into search bars, such as the data types add to cart, configuration, dashboard, index-pattern, more refinements, order, remove from cart, search, server.
The records also exposed fields called Visitor ID, Session ID and device information, such as whether customers were using an iPhone, an Android, an iPad or a desktop PC. The team noted that by stringing together the data, they could reveal emails that could be targeted in a phishing attack, in social engineering, or “potentially used to cross-reference other actions.”
As well, the files gave a “clear understanding of configuration settings, where data is stored and a blueprint of how the logging service operates from the backend,” according to the advisory.
In looking for PII, the researchers performed several search queries for common email extensions, such as Gmail, Hotmail and Yahoo, they said. They were rewarded with results for each query within the dataset, indicating that the records did in fact contain email addresses. Fowler said that, given how many personal email addresses are formatted using portions or all of the user’s name, he was able to identify “a small sampling of individuals by simply searching Google for the publicly exposed email address.”
The records also contained the data types Visitor ID
and Session ID,
indicating the items that visitors searched for, including medications, COVID-19 vaccines and other CVS products. All of this data, strung together, could have created a snapshot of private details about individuals’ health, Fowler said.
“Hypothetically, it could have been possible to match the Session ID with what they searched for or added to the shopping cart during that session and then try to identify the customer using the exposed emails,” he said in the advisory.
The CVS representative who dealt with the researchers said that whatever emails were found in the database didn’t come from CVS customer accounts. Rather, they were entered into the search bar by visitors themselves – presumably, by mistake.
“The search bar captures and logs everything that is entered into the website’s search function and these records were stored as log files,” the advisory explained. “When reviewing the mobile version of the CVS site it is a possible theory that visitors may have believed they were logging into their account, but were really entering their email address into the search bar.”
The searches were formatted as event type
parameters and were set to search.
The email addresses are values for a parameter named query, Fowler continued, which “could explain how so many email addresses ended up in a database of product searches that was not intended to identify the visitor.”
In addition, the records show what device customers used, with a majority of the searches coming from phones and mobile devices such as iPhones or Androids, as well as some searches coming from desktop computers.
Database Exposure: A Bonanza for Cyberattackers
Unfortunately, the activity logging that exposed all this information is a “necessary evil,” the team observed – one that can lead to the exposure of sensitive records.
“Tracking all activity from a website or ecommerce platform helps build valuable insights about visitors and customers,” the advisory explained. “This logging and tracking can often contain metadata or error logs that inadvertently expose more sensitive records.”
The exposed search logs were from searches done on both CVS Health and CVS.com, and provided “valuable analytical data to see what customers are looking for and if they are finding the products they want,” the team said. That includes the possibility that data about configuration, applications, software, operating systems and build information were exposed: data that could identify potential vulnerabilities if they were unpatched or outdated.
That’s information that cybercriminals or adversarial nation-states can exploit, the team said. “Often they use the same methods as legitimate security researchers to identify publicly exposed data,” according to the advisory. “Each record of information serves as a puzzle piece to provide a larger picture of an organization’s network or data-storage methods.”
Cloud Misconfigurations Galore
Thanks to the explosive growth of cloud-based data storage, misconfigurations like these are becoming all too common, said PJ Norris, senior systems engineer at cybersecurity company Tripwire. “Exposing sensitive data doesn’t require a sophisticated vulnerability, and the rapid growth of cloud-based data storage has exposed weaknesses in processes that leave data available to anyone,” he said via email to Threatpost on Thursday.
In September, a survey of 2,064 Google Cloud buckets from Comparitech found that 6 percent of all Google Cloud buckets are misconfigured, left open to the public internet for anybody and everybody to see – including passports, birth certificates and personal profiles from children in India, and a Russian web developer’s email server credentials and chat logs.
In March, arts-and-crafts retailer Hobby Lobby left 138GB of sensitive information open to the public internet – including a slew of customer information – due to a cloud-bucket misconfiguration.
“A misconfigured database on an internal network might not be noticed, and if noticed, might not go public, but the stakes are higher when your data storage is directly connected to the Internet,” Norris noted. “Organizations should identify processes for securely configuring all systems, including cloud-based storage, like Elasticsearch and Amazon S3. Once a process is in place, the systems must be monitored for changes to their configurations. These are solvable problems, and tools exist today to help.”
Common Misconfigurations
Ray Canzanese, threat research director at Netskope, said that the CVS Health exposure is common in infrastructure-as-a-service (IaaS) providers such as Amazon Web Services (AWS), Azure and Google Cloud. Common misconfigurations crop up in security groups, network ACLs (NACLs) and firewall rules, he said via email to Threatpost on Thursday. In fact, Netskope recently analyzed public exposure of compute infrastructure in IaaS environments across the three major IaaS providers, and found that over 35 percent of compute instances expose at least one service to the Internet.
Netskope recommended that, to avoid such exposures, organizations should scan their own cloud environments automatically to discover and lock down exposed resources. Canzanese also recommended zero-trust network architecture as a means to give employees secure access to cloud resources, whether they are hosted on-prem or in the cloud, without exposing them to the internet.
Join Threatpost for “Tips and Tactics for Better Threat Hunting” — a LIVE event on Wed., June 30 at 2:00 PM ET in partnership with Palo Alto Networks. Learn from Palo Alto’s Unit 42 experts the best way to hunt down threats and how to use automation to help. Register HERE for free.