A database of breached emails totaling 773 million unique addresses has turned up on a popular underground hacking forum, giving cybercriminals one of the largest jackpots ever seen when it comes to account-compromise efforts.
Security researcher Troy Hunt was first alerted to the cache, which totals 87GB of data, after it was seen being hosted on the MEGA cloud service (it has since been removed). The data was organized into 12,000 separate files under a root folder called “Collection #1,” which gives the trove its name. Soon after that, the whole shebang turned up on the cyber-underground.
In examining the data, Hunt found that there are 1.16 billion unique combinations of email addresses and passwords listed. And after deduping and cleaning up the database, Hunt was left with about 773 million unique email addresses – the single largest collection of breached emails ever to be loaded into his compromised-credentials look-up service, Have I Been Pwned (HIBP).
There are also 21 million unique passwords in the database: “As with the email addresses, this was after implementing a bunch of rules to do as much clean-up as I could including stripping out passwords that were still in hashed form, ignoring strings that contained control characters and those that were obviously fragments of SQL statements,” Hunt said in a posting on Thursday.
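The kind of clean-up Hunt describes can be sketched in a few lines of Python. This is a rough illustration only, not Hunt's actual tooling: the `email:password` line format, the regular expressions for spotting hashes and SQL fragments, and the function names are all assumptions made for the example.

```python
import re

# Assumed heuristics (not Hunt's real rules): treat bare hex strings of
# MD5/SHA-1/SHA-256 length as still-hashed passwords.
HASH_RE = re.compile(r"^(?:[0-9a-fA-F]{32}|[0-9a-fA-F]{40}|[0-9a-fA-F]{64})$")
# Assumed heuristic: a few common SQL statement openers.
SQL_RE = re.compile(
    r"\b(?:insert\s+into|select\s+.+\s+from|drop\s+table|create\s+table)\b",
    re.IGNORECASE,
)


def has_control_chars(text):
    """True if the string contains ASCII control characters."""
    return any(ord(ch) < 32 or ord(ch) == 127 for ch in text)


def is_clean_password(password):
    """Reject empty strings, control characters, hashes and SQL fragments."""
    if not password or has_control_chars(password):
        return False
    if HASH_RE.match(password):
        return False
    if SQL_RE.search(password):
        return False
    return True


def clean_dump(lines):
    """Parse hypothetical 'email:password' lines.

    Returns three sets: unique emails, unique clean passwords, and unique
    (email, password) pairs. Emails are lower-cased before deduping, and an
    email is still counted when its password is rejected.
    """
    emails, passwords, pairs = set(), set(), set()
    for line in lines:
        email, sep, password = line.rstrip("\r\n").partition(":")
        if not sep:
            continue  # not an email:password record
        email = email.strip().lower()
        if "@" not in email:
            continue
        emails.add(email)
        if is_clean_password(password):
            passwords.add(password)
            pairs.add((email, password))
    return emails, passwords, pairs


if __name__ == "__main__":
    sample = [
        "Alice@Example.com:hunter2",
        "alice@example.com:hunter2",
        "bob@example.com:5f4dcc3b5aa765d61d8327deb882cf99",
    ]
    emails, passwords, pairs = clean_dump(sample)
    print(len(emails), len(passwords), len(pairs))
```

Using sets keeps the dedupe step trivial, which is why the counts of unique pairs, unique emails and unique passwords can differ so widely for the same dump.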
Thousands of Breaches
As for the sources of the data, there are literally thousands of compromises at the root of the database.
“The post on the forum referenced ‘a collection of 2000+ dehashed databases and combos stored by topic’ and provided a directory listing of 2,890 of the files,” Hunt said.
This list includes a few previous breaches that Hunt recognized, but also others that are new. In all, out of about 2.2 million people who use HIBP’s notification service, 768,000 are implicated in this data dump. And there are also around 140 million email addresses that HIBP has never seen before.
“This gives you a sense of the origins of the data but again, I need to stress ‘allegedly,’” Hunt said. “I’ve written before about what’s involved in verifying data breaches and it’s often a non-trivial exercise…it’s entirely possible that some of them refer to services that haven’t actually been involved in a data breach at all.”
In a comment to Threatpost, he elaborated: “The data contains breaches such as 000webhost and the Plex forum, or at least those names are on the list of alleged sources.”
With so many different sources feeding into it, Collection #1 did not just appear one day, fully formed.
“Some of the breaches indicated go back many years (pre 2010), whilst I’ve had other people claim it had data from November last year,” Hunt told Threatpost. “I’ve not verified that last claim, but certainly it’s incidents spanning many years.”
This could be very good news if those impacted regularly change their passwords.
“This massive collection of data harvested through data breaches has been built up over a long period of time, so some of the account details are likely to be outdated now,” Sergey Lozhkin, security expert at Kaspersky Lab, told Threatpost. “However, it is no secret that despite growing awareness of the danger, people stick to the same passwords and even re-use them on multiple websites.”
What’s clear is that the data was in broad circulation for some time before Hunt was aware of it, based on the number of people that contacted him privately and the fact that it was published to a well-known public forum.
“In terms of the risk this presents, more people with the data obviously increases the likelihood that it’ll be used for malicious purposes,” he said. After all, the longer criminals are able to use the data without consumers knowing about it, the more likely they are to succeed.
Those malicious purposes are likely to consist of credential-stuffing, a technique hackers use to gain fraudulent access to accounts. It uses automated scripts to try multiple username/password combos against a targeted website. Successful account compromises from credential-stuffing are typically tied to the fact that many users reuse the same credentials on multiple accounts.
“This collection can easily be turned into a single list of emails and passwords: and then all that attackers need to do is to write a relatively simple software program to check if the passwords are working,” said Lozhkin.
From there, attackers can wreak all kinds of havoc. “The consequences of account access can range from very productive phishing, as criminals can automatically send malicious e-mails to a victim’s list of contacts, to targeted attacks designed to steal victims’ entire digital identity or money or to compromise their social media network data,” said Lozhkin.
It’s not unusual for these kinds of data dumps to go unnoticed for a length of time; overall, it takes an average of 15 months for a credential breach to be reported, according to Shape Security.
“Half of all credential spills were discovered and reported within the first four months of the compromise. However, because some spills take years to discover, it took an average of 15 months between the day that an attacker accessed the credentials to the day the spill was reported in 2017,” according to a report from Shape Security on credential breaches.
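The gap between "half within four months" and "an average of 15 months" is what a skewed distribution looks like: a few very late discoveries drag the mean far above the median. A small worked example makes the point. The nine delay values below are invented for illustration and are not Shape Security's data.

```python
from statistics import mean, median

# Hypothetical months-to-report for nine spills (illustrative only).
# Most are caught quickly; three take years to surface.
DELAYS_MONTHS = [1, 2, 3, 4, 4, 6, 20, 35, 60]


def summarize(delays):
    """Return (median, mean) of the reporting delays, in months."""
    return median(delays), mean(delays)


if __name__ == "__main__":
    typical, average = summarize(DELAYS_MONTHS)
    print(f"median: {typical} months, mean: {average} months")
```

In this made-up sample the median is four months while the mean is 15, which mirrors the shape of the figures quoted in the report.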
The Threat to Businesses
Credential-stuffing attacks are often aimed at one service or company, as was the case with Dunkin’ Donuts back in November.
“Massive data breaches like Collection #1 create huge spikes in bot traffic on the login screens of websites, as hackers cycle through enormous lists of stolen passwords,” said Distil co-founder Rami Essaid, via email. “While this is often framed as a problem for the individuals who own the passwords, any online business that has a user login web page is at risk of becoming the next breach headline.”
Aside from the consumer impact, these kinds of attacks also affect businesses, he added. In May 2018, the Distil Research Lab conducted a study of 600 website domains that include login pages, which found that after the credentials from a data breach have been made publicly available, websites experience a 300 percent increase in volumetric attacks. In the days following a public breach, websites experience three times more credential stuffing attacks than the average of two to three attacks per month.
“Password dumps create a ripple effect of organizations spending precious time and resources on damage control,” Essaid explained. “The massive spike in failed logins, then the access into someone else’s account before the hacker changes the password, then the account lock-out for the real user, then the customer service calls to regain access to their account. All because a username and password was stolen from a different website.”
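On the defensive side, the spike in failed logins that Essaid describes is something a site can watch for. The following is a minimal sketch, not a description of Distil's product or method: the hourly bucketing, the `flag_spikes` name, the baseline figure and the three-times multiplier are all assumptions chosen for the example.

```python
from collections import Counter


def flag_spikes(failed_login_hours, baseline_per_hour, multiplier=3.0):
    """Return the hour buckets whose failed-login count exceeds the threshold.

    failed_login_hours: one integer hour bucket per failed login attempt
                        (hypothetical input format).
    baseline_per_hour:  the site's normal failed-login volume per hour.
    multiplier:         assumed alerting factor over the baseline.
    """
    counts = Counter(failed_login_hours)
    threshold = baseline_per_hour * multiplier
    return sorted(hour for hour, n in counts.items() if n > threshold)


if __name__ == "__main__":
    # Hypothetical traffic: hours 0 and 1 look normal, hour 2 is a surge.
    events = [0] * 40 + [1] * 50 + [2] * 400
    print(flag_spikes(events, baseline_per_hour=50))
```

A real deployment would more likely look at signals such as distinct usernames per source address, but even a crude volume check like this one can surface the surge before the customer-service calls start.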
The potential losses tied to credential spills total $50 million a day globally, Shape Security said.
This post was updated on Jan. 18 at 11:15 a.m. EST to include additional comments from Troy Hunt.
Interested in learning more about data breach/exposure trends? Join the free Threatpost webinar on Wednesday, Jan. 23 at 2 p.m. ET, as editor Tom Spring examines the data breach epidemic with the help of noted breach hunter and cybersecurity expert Chris Vickery. Vickery shares how companies can identify their own insecure data, remediate against a data breach and offers tips on protecting data against future attacks.