Ranking ‘Darkweb Intelligence’ companies: a data science approach, Part 1
I have spent many years as a researcher in the threat intelligence field, and in that time I have come across multiple vendors and even worked for some. This series is all about what you do need, what you don’t need, and how to stay clear of the snake oil.
This series takes an in-depth look at Threat Intelligence companies and their platforms. They are ranked on their access to collections, both open and closed, and on how the data is categorized. By taking certain dumps, access to certain areas, and certain talking points, and comparing what each vendor holds against the others, we can put a number on how accurately each one collects and on its ability to collect data at scale that can be used for actionable intelligence.
“Having access to an underground forum that allows anyone to view/read the material isn’t threat intelligence. Threat intelligence is how you make it actionable to a certain company, person, or industry.”
Companies that we looked at:
Digital Shadows Searchlight
SpyCloud ATO Prevention
Alert Logic Dark Web Scanner
DarkOwl Vision
ACID Cyber Intelligence
Dashlane Business
HaveIBeenPwned?
Flashpoint
SiloBreaker
Recorded Future
Dehashed
PIIGuard360
Looking Glass Cyber Solutions
Authentic8
Webhose
Sixgill
Surface Watch Labs
VigilantIQ
MarkMonitor
Intel471
Using a data science approach to compare these companies takes time. What we did was take different sources and rank them from 1 to 6 based on how difficult it is to gain and maintain access to the dark web groups, websites, etc.
The ranking of open/closed sources:
1- No logon required, no maintenance required, ability to scrape pages daily. Most phpBB forum sites.
2- Logon required, no maintenance required, ability to scrape pages at random.
3- Logon required, must be maintained, ability to scrape pages at random. Newsgroups and some websites that require Tor.
4- Logon either purchased or gained through a trusted resource. Maintained regularly with both posts and comments to show behavior. No ability to scrape. VPN’ed site, or maybe Tor.
5- Logon either purchased or gained through a trusted resource. Maintained socially, with the ability to comment/answer questions regularly. No scraping, data must be gathered manually. VPN’ed sites that require a certificate and login credentials.
6- Logon gained through the use of a legend. A small, intimate group of people who have known each other for a long period of time. Trust is earned through illegal activities and/or false flag operations with certain companies, to show the ability to gain relevant data for said group. These groups mostly work in chat rooms such as IRC/Discord.
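To make the 1 to 6 ranking concrete, here is a minimal sketch of how it could feed a single score per vendor. The weighting is an assumption for illustration only (each source is weighted by its tier number), and the names `SourceResult` and `coverage_score` are hypothetical, not part of any vendor's tooling.

```python
from dataclasses import dataclass

# Access tiers from the ranking above (1 = open forum, 6 = closed trust group).
TIERS = range(1, 7)


@dataclass
class SourceResult:
    """One test source and whether a vendor's platform had data from it."""
    name: str      # hypothetical label for the source
    tier: int      # 1-6, from the ranking above
    covered: bool  # did the vendor have content from this source?


def coverage_score(results):
    """Tier-weighted coverage in the range 0.0 to 1.0.

    Assumption: a source's weight equals its tier number, so a tier 6
    source counts six times as much as a tier 1 source.
    """
    for r in results:
        if r.tier not in TIERS:
            raise ValueError(f"tier must be 1-6, got {r.tier}")
    total = sum(r.tier for r in results)
    if total == 0:
        return 0.0  # no sources tested, nothing to score
    hit = sum(r.tier for r in results if r.covered)
    return hit / total
```

With this weighting, a vendor that only covers open phpBB forums scores low even if it covers a lot of them, which matches the idea that harder access should count for more.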
“By taking randomly generated posts from each group, then placing them in an alert pool through certain scripting engines, we are able to keep track of the type of data certain companies have access to and are able to collect for intelligence purposes.”
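The alert pool idea can be sketched roughly as follows. This assumes the posts are drawn as a random sample from each access tier, and that you already have, by whatever means, the set of post IDs a vendor surfaced. The function names and data shapes are hypothetical.

```python
import random
from collections import defaultdict


def build_alert_pool(posts_by_tier, per_tier, seed=None):
    """Draw up to `per_tier` random post IDs from each access tier.

    posts_by_tier: dict mapping tier number -> list of post IDs.
    Returns a list of (tier, post_id) pairs.
    """
    rng = random.Random(seed)
    pool = []
    for tier, posts in sorted(posts_by_tier.items()):
        k = min(per_tier, len(posts))
        for post_id in rng.sample(posts, k):
            pool.append((tier, post_id))
    return pool


def hit_rate_by_tier(pool, vendor_hits):
    """Fraction of pooled posts per tier that the vendor surfaced.

    vendor_hits: set of post IDs the vendor alerted on. How that set is
    collected (API, export, manual check) is outside this sketch.
    """
    seen = defaultdict(int)
    hit = defaultdict(int)
    for tier, post_id in pool:
        seen[tier] += 1
        if post_id in vendor_hits:
            hit[tier] += 1
    return {tier: hit[tier] / seen[tier] for tier in seen}
```

Reporting the rate per tier, rather than one overall number, shows where a vendor's access stops.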
“Then by looking at edits to certain dumps of personal information for sale, and looking at the changes over a timeline, we’re able to find the details that were removed from them and compare them to known aliases of the sellers of the illegal information for investigation purposes. By using this method, we can see which companies have the original sourced dump, and which ones have the edited versions.”
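A simple way to picture the dump comparison, assuming each dump is a list of text records and that an edit means records were dropped between versions. This is an illustrative sketch with hypothetical function names, not the exact process used. Note that if nothing was removed between the two versions, this approach cannot tell the copies apart.

```python
def removed_records(original, edited):
    """Records present in the earlier dump but missing from the later one."""
    edited_set = {line.strip() for line in edited}
    return [line.strip() for line in original if line.strip() not in edited_set]


def classify_copy(vendor_copy, original, edited):
    """Label a vendor's copy as 'original', 'edited', or 'unknown'.

    A copy that still contains any record removed in the edit is treated
    as coming from the original dump.
    """
    vendor_set = {line.strip() for line in vendor_copy}
    removed = removed_records(original, edited)
    if any(record in vendor_set for record in removed):
        return "original"
    if vendor_set & {line.strip() for line in edited}:
        return "edited"
    return "unknown"
```

The list returned by `removed_records` is also the set of details you would compare against known seller aliases.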
Second to last, we create a post that is unique to us so we can see whether it has been gathered and stored, and in what structured/unstructured format. We look to see 1. if it has been gathered, 2. if it has the unique data, 3. if it’s searchable.
The last criterion we look at is ease of use: whether the platform has an API you can use with Synapse (The Vertex Project), Maltego, i2 Analyst’s Notebook, etc. We also look at whether your searches are private to you, and whether the company records your searches for metric purposes or uses them to develop its own research.
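The three checks on the unique post could be sketched like this. The token format, the `stored_docs` mapping, and the `search` callable are all assumptions standing in for whatever export or search a given platform actually exposes.

```python
import uuid
from dataclasses import dataclass


def make_canary(prefix="canary"):
    """Return a token that is very unlikely to appear anywhere else by chance."""
    return f"{prefix}-{uuid.uuid4().hex}"


@dataclass
class CanaryResult:
    gathered: bool    # 1. the post was collected at all
    has_unique: bool  # 2. the stored copy still contains the token
    searchable: bool  # 3. searching for the token returns the post


def check_canary(token, post_url, stored_docs, search):
    """Run the three checks against one vendor.

    stored_docs: dict mapping source URL -> stored text (hypothetical export).
    search: callable taking a query string and returning matching URLs
            (a stand-in for the platform's own search).
    """
    text = stored_docs.get(post_url)
    gathered = text is not None
    has_unique = gathered and token in text
    searchable = post_url in search(token)
    return CanaryResult(gathered, has_unique, searchable)
```

Keeping the three results separate matters: a platform can collect the post but strip the unique data, or store it without making it searchable.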
Stay tuned for more.