What are "pastes" and what do they have to do with data breaches?
Often when online services are compromised, the first signs of it appear on "paste" sites like Pastebin. Attackers frequently publish either samples or complete dumps of compromised data on these services. Monitoring and reporting on the presence of email addresses on the likes of Pastebin can give impacted users a head start on mitigating the potential fallout from a breach.
When you search for an email address on this site, both known data breaches and pastes are searched simultaneously. The results appear side by side, with an indication of whether the address was found in a breach or in a paste.
Identifying pastes and the role of paste sources
Pastebin (among other paste services) stores tens of millions of pastes and adds thousands more every day. Rather than attempt to analyse every paste in the system, Have I Been Pwned monitors the appearance of new pastes as announced by the Twitter accounts in the Paste Sources list.
Paste formats
One of the attractions of paste services is that there are no constraints on the structure of the content that can be published there. Consequently, pastes containing email addresses may be entirely self-explanatory or completely obscure. However, some common patterns do appear.
Database dumps: These will often take the form of scripts that can be run to recreate the database structure. They typically contain comma-delimited fields representing different columns in the database, often with passwords which may be secured with a cryptographic hash. Example:
(`id`, `team_id`, `email`, `name`, `password`, `league`, `active`, `regdate`, `lan`, `lastlogin`, `birthdate`, `favclub`, `favmanager`, `description`, `pers_email`, `mess_id`, `iso`)
(14, 568, 'vcpd_@hotmail.com', 'Flavio00', '059b4db7cdb1cbddc3f0e5d95c881597', 1, 1, 1224313200, 0, 0, 0, '', '', '', '', '', ''),
(4, 1, 'levi@medeeaweb.com', 'Slash', 'c57aeddaffce62fead6be61022eb1340', 1, 1, 1224313200, 0, 1235380637, 483260400, 'FC Juventus Torino', 'Carlo Ancelotti', 'I''m the admin of this site :D', 'slash@manager-arena.com', 'slashwebdesign', ''),
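To make that structure concrete, here is a minimal Python sketch that splits one value tuple from the example above into named columns. It assumes the dump uses SQL-style single-quoted strings where a literal quote is escaped by doubling it (as in 'I''m the admin of this site :D'), which happens to match the quoting rules of Python's csv module; real dumps vary and may need a proper SQL parser. The looks_like_md5 helper is a hypothetical heuristic: 32 hex characters is the length of an MD5 digest, which is consistent with, but not proof of, a hashed password.

```python
import csv
import re

# Column names copied from the header of the example dump.
COLUMNS = [
    "id", "team_id", "email", "name", "password", "league", "active",
    "regdate", "lan", "lastlogin", "birthdate", "favclub", "favmanager",
    "description", "pers_email", "mess_id", "iso",
]

# An MD5 digest is 128 bits, i.e. 32 hexadecimal characters.
HEX32 = re.compile(r"[0-9a-fA-F]{32}")


def parse_row(row_text: str) -> dict[str, str]:
    """Split one SQL value tuple such as "(14, 568, 'a@b.com', ...)," into named fields."""
    inner = row_text.strip().rstrip(",").strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    # Single-quoted strings with '' as an escaped quote match csv's doublequote rule;
    # skipinitialspace lets a quoted value follow ", " correctly.
    reader = csv.reader([inner], quotechar="'", skipinitialspace=True)
    values = next(reader)
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    return dict(zip(COLUMNS, values))


def looks_like_md5(value: str) -> bool:
    """Heuristic only: 32 hex characters is consistent with (not proof of) an MD5 hash."""
    return HEX32.fullmatch(value) is not None


row = parse_row(
    "(4, 1, 'levi@medeeaweb.com', 'Slash', 'c57aeddaffce62fead6be61022eb1340', "
    "1, 1, 1224313200, 0, 1235380637, 483260400, 'FC Juventus Torino', "
    "'Carlo Ancelotti', 'I''m the admin of this site :D', "
    "'slash@manager-arena.com', 'slashwebdesign', ''),"
)
print(row["email"], row["description"], looks_like_md5(row["password"]))
```

Every value comes back as a string; converting columns such as regdate to numbers or dates would be a separate step.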
Email and password pairs: Compromised systems are often dumped into lists of credentials consisting of a username (often the email address) and a password, occasionally accompanied by other data. Example:
majikcityqban82@gmail.com:tinpe***
pekanays@yahoo.com:warri***
g_vanmeter@yahoo.com:torb***
rrothn@yahoo.com:rebsopj***
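Lists like this usually hold one username:password entry per line. A minimal Python sketch of splitting them, assuming the username never contains a colon (true of ordinary email addresses) while the password might, so only the first colon is treated as the separator:

```python
def parse_credential_lines(text: str) -> list[tuple[str, str]]:
    """Split 'username:password' lines into pairs, skipping blank or malformed lines."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if ":" not in line:
            continue  # blank lines and lines without a separator are ignored
        # partition() splits on the first colon only, so colons in passwords survive.
        username, _, password = line.partition(":")
        pairs.append((username, password))
    return pairs


sample = """majikcityqban82@gmail.com:tinpe***
pekanays@yahoo.com:warri***
g_vanmeter@yahoo.com:torb***
rrothn@yahoo.com:rebsopj***"""

for username, password in parse_credential_lines(sample):
    print(username, password)
```

Real lists are messier (other separators such as ";" or tabs, extra trailing fields), so a fixed colon separator is an assumption that suits this example rather than every dump.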
Logs and code blocks: These can take a range of forms and may be anything from compromised system logs to internal system code. Example:
array("/upload/iblock/ed0/--.jpg","vitaly.cherkasov@autohansa.ru"),
array("/upload/iblock/562/--.jpg","andrey.mastakov@autohansa.ru"),
array("/upload/iblock/ed2/---.jpg","sergey.smirnov@autohansa.ru"),
Random collections of email addresses: Often no context is given as to where an email address was sourced from; it simply appears along with others. Example:
dilipsinghrana4@gmail.com,mansinghrana22@gmail.com,dilipsinghrana2@gmail.com,khalisinghrana3@gmail.com,
Each of the above examples is representative of the sort of data structures often seen in pastes. The appearance of an email address may be completely innocuous, but it also often indicates a serious breach. Only human review and assessment can determine whether the paste represents a risk that requires a response such as changing passwords.
The reliability of pastes
The presence of an email address on a paste site doesn't always mean it's been compromised in a breach, and the process that scans for addresses is entirely autonomous: there's no human review. If your address appears in a paste, do take a look at it and assess the impact for yourself.
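Have I Been Pwned's exact matching rules aren't described here, so the following is only a minimal Python sketch of an automated scan, assuming a simple regular expression. It illustrates why automation alone can't judge context: the same pattern picks an address out of a credential dump, a code fragment or a harmless mailing list with no idea which is which.

```python
import re

# A deliberately simple pattern. It is an assumption, not HIBP's actual rule,
# and it neither accepts every valid address nor rejects every invalid one.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(paste_text: str) -> list[str]:
    """Return unique addresses found in the text, lowercased, in first-seen order."""
    seen: dict[str, None] = {}
    for match in EMAIL_RE.findall(paste_text):
        # Lowercasing assumes case is insignificant, as most mail providers treat it.
        seen.setdefault(match.lower(), None)
    return list(seen)


# The same pattern pulls addresses out of code, logs or plain lists alike.
snippet = 'array("/upload/iblock/ed0/--.jpg","vitaly.cherkasov@autohansa.ru"),'
print(extract_emails(snippet))
```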
Paste duplication
Often a paste will appear on a service such as Pastebin multiple times. It may be identical or contain slight variations, but for all intents and purposes it's the same content. This may be because the same individual has published it multiple times or because a breach has been socialised and then re-published by multiple people.
Have I Been Pwned does not store the original paste, only metadata such as the title and author if they exist. As such, there is no facility to identify duplicate pastes, and human discretion should instead be exercised if multiple pastes are found that appear to be the same.
Acceptable use, transient pastes and the role of Have I Been Pwned
Services like Pastebin are pretty explicit about what is deemed to be "acceptable use" of the service: no email lists, no login details, no password lists and no personal information. Despite this, all these data classes frequently appear on Pastebin many, many times per day. However, they're often transient, appearing briefly before being removed.
Have I Been Pwned usually consumes the paste data within 40 seconds of it being published. However, only metadata about the paste (title, author, date) and the email addresses appearing in the paste are stored. No further data such as credentials or personal information is stored. The entire premise of the service rests on it being searchable via email address, so additional data (such as the original paste in its entirety) is not required.
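Because only metadata and addresses are kept, the stored data can be pictured as a mapping from each email address to the pastes it appeared in. The Python sketch below illustrates that idea under stated assumptions: PasteMetadata, PasteIndex and the source_id field are hypothetical names, not Have I Been Pwned's actual schema, and the caller is assumed to have already extracted the addresses and discarded the paste body.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PasteMetadata:
    """Hypothetical record holding only metadata: an identifier plus title, author and date."""
    source_id: str  # hypothetical, e.g. the paste's ID on the hosting service
    title: Optional[str]
    author: Optional[str]
    date: datetime


class PasteIndex:
    """Hypothetical email -> pastes index; no paste body or credentials are stored."""

    def __init__(self) -> None:
        self._by_email: dict[str, list[PasteMetadata]] = {}

    def add(self, meta: PasteMetadata, emails: list[str]) -> None:
        # De-duplicate within one paste so each address links to it once.
        for email in {e.lower() for e in emails}:
            self._by_email.setdefault(email, []).append(meta)

    def search(self, email: str) -> list[PasteMetadata]:
        """Look up every paste an address appeared in (case-insensitive)."""
        return list(self._by_email.get(email.lower(), []))


index = PasteIndex()
meta = PasteMetadata("abc123", "Example dump", None, datetime(2014, 1, 1))
index.add(meta, ["Levi@medeeaweb.com", "slash@manager-arena.com"])
print([m.title for m in index.search("levi@medeeaweb.com")])
```

Keying on the address is what makes a lookup cheap, and it also shows why the paste content itself isn't needed to answer "has this address appeared in a paste?".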