Impetus
Last Friday, one of the top articles on Hacker News was called Breaking the Silk Road’s Captcha.
This sounded pretty cool to me, though not necessarily applicable because the current Silk Road 2.0 (I’ll just be calling it SR from now on) isn’t using anything nearly as sophisticated.
I thought it would be really interesting to scrape SR for, let’s say, a month or two. I could do cool stuff like make a stock ticker and display the values like COK, XTC, LSD, etc.
Disclaimer
The following information is for educational purposes only. I have no affiliation with the Silk Road 2.0, nor have I ever purchased anything off the site. As far as I know, visiting the site and writing about it with no intention to buy (commit a crime) is perfectly legal.
Some implementation quirks
Before we begin: I only wanted to spend an hour or two doing this. I was late for a dinner and wanted it to run overnight while I was sleeping. If you are looking to build a robust system, you should consider a different solution.
Captcha
Simply download the captcha, run it through some OpenCV transforms, then feed it to Tesseract. If it doesn’t work, just keep trying until you get a relatively easy one. I think my success rate was >90% with a few transforms using OpenCV.
Connecting through Tor
The SR site is an anonymous hidden service reachable only through the Tor network. You run the Tor client daemon on your machine, then use it as a SOCKS5 proxy.
This has some complications, because DNS requests also have to go through Tor.
The quick and dirty solution is to just spawn the scraper through torsocks, which wraps all the network requests from my scraper.
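For anyone who would rather configure the proxy in code than wrap the process with torsocks, here is a minimal sketch. It assumes the Tor daemon listens on its default SOCKS port (9050) and that the third-party requests library is installed with SOCKS support. The helper name tor_proxies is made up for illustration. The socks5h scheme (note the trailing h) asks the proxy to resolve hostnames, so DNS lookups go through Tor as well.

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Build a proxies mapping that sends traffic and DNS lookups through Tor."""
    # socks5h (rather than socks5) makes the proxy resolve hostnames,
    # so no DNS request leaves the machine outside of Tor.
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}

# Usage (assumes `pip install requests[socks]`; the URL is a placeholder):
#   import requests
#   requests.get("http://example.onion/", proxies=tor_proxies(), timeout=60)
```

This is not what I ran overnight; torsocks needed no code changes at all, which is why I went with it.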
Automatic logouts/timeouts
The SR site seems to be very eager to automatically log out users. When logged out, I simply create a new user. Once I am back on the site, I make sure to traverse to the last known point from the root node of my crawl tree, rather than jumping straight to a deep page. This is to avoid detection.
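The re-traversal step can be sketched roughly as follows. This is an illustration rather than the crawler's actual code: replay_path, fake_fetch and the example paths are all hypothetical, and fetch stands in for whatever function performs a request.

```python
def replay_path(fetch, path):
    """Fetch every URL on the path from the crawl root to the last known page.

    Returns the response for the final URL, or None for an empty path.
    """
    last = None
    for url in path:
        last = fetch(url)
    return last

# Stand-in fetch function that only records what was requested.
visited = []

def fake_fetch(url):
    visited.append(url)
    return f"<page {url}>"

# Placeholder paths: root, then a category, then the page we had reached.
resumed = replay_path(fake_fetch, ["/", "/category/1", "/category/1/page/3"])
```

The crawler only needs to remember the list of URLs leading to its current position, so that it can hand the list back to replay_path after a fresh login.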
The nature of web crawling through Tor:
Crawling through Tor already obfuscates your identity to a certain degree, so we don’t really have to do anything other than cycling User-Agent strings to look different from any other client.
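Cycling the User-Agent header can be as simple as the sketch below. The strings in the list are only examples of the format, and next_headers is a hypothetical helper name. The returned dict is meant to be passed as the headers of whatever HTTP client is in use.

```python
import itertools

# Illustrative User-Agent strings; any list of realistic values would do.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; rv:24.0) Gecko/20100101 Firefox/24.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Firefox/24.0",
]

_ua_cycle = itertools.cycle(USER_AGENTS)

def next_headers():
    """Return request headers carrying the next User-Agent in the rotation."""
    return {"User-Agent": next(_ua_cycle)}
```

Picking with random.choice instead of a fixed rotation would work just as well; the cycle just guarantees each string gets used evenly.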
Data Extract
I’ve made a one day snapshot available at github.com/dlau/sr-data
I will release the source code for the crawler when I am done, with the SR specific portions removed if anyone is interested. This will all go to the same repo.
Findings
Alright, enough technical details. Let’s see what useful information we can get out of this.
Knowing very little about recreational drug use, I visited the National Institute on Drug Abuse’s website, which conveniently lists what the US considers to be the most widely used drugs.
I thought, if I know them, they must be a big deal right?! I guess so. Here are the drugs I picked out:
Total number of listings
Sorted by number of listings
----------------------------
MDMA 1321
Weed 761
LSD 523
Cocaine 475
Amphetamine 215
Heroin 150
Ketamine 67
Opium 53
Mescaline 20
Total 3585
Here, weed means only marijuana that is smoked, not any other derivative such as hash.
To put things in perspective, at the time of writing SR has approximately 13,000 listings for drugs. Just a guess, but it looks like prescription drugs account for a large portion of SR drug listings.
Nothing much to say here, other than the fact that MDMA seems to have the most listings.
Highest number of ratings
Just like buying off Amazon, users can review specific products. SR shows a rating from 1-5 stars and the total number of reviews for each product listing.
The average number of ratings per product, as shown here, seems to be rather uniform: there are on average 29 reviews per product.
Drug, total ratings, average ratings per listing
------------------------------------------------
MDMA 33822 25
Weed 28213 37
LSD 12122 23
Cocaine 16591 34
Amphetamine 6251 29
Heroin 3132 20
Ketamine 1504 22
Opium 1256 23
Mescaline 62 3
Total 102953
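The last column appears to be the total ratings divided by the listing count from the earlier table, rounded down. A quick check under that assumption, using only the numbers from the two tables:

```python
# Listing counts and total ratings, copied from the two tables.
listings = {
    "MDMA": 1321, "Weed": 761, "LSD": 523, "Cocaine": 475, "Amphetamine": 215,
    "Heroin": 150, "Ketamine": 67, "Opium": 53, "Mescaline": 20,
}
ratings = {
    "MDMA": 33822, "Weed": 28213, "LSD": 12122, "Cocaine": 16591, "Amphetamine": 6251,
    "Heroin": 3132, "Ketamine": 1504, "Opium": 1256, "Mescaline": 62,
}

# Integer division reproduces the per-drug averages in the table.
averages = {drug: ratings[drug] // listings[drug] for drug in listings}

# Overall average across all nine drugs.
overall = sum(ratings.values()) / sum(listings.values())

for drug, avg in averages.items():
    print(f"{drug:<12}{ratings[drug]:>7}{avg:>4}")
print(f"overall average: {overall:.1f}")
```

The overall figure rounds to the 29 reviews per product quoted above.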
Top 100 Most Reviewed Items
MDMA 48
Weed 22
LSD 10
Cocaine 9
Amphetamine 7
Ketamine 1
Opium 1
Mescaline 1
Heroin 1
In case you are wondering, there were some outliers:
- One had 100g of MDMA for $1510.77. It had 392 ratings.
- Another was selling 100g of MDMA for $1186 and 50g for $659. They had 293 ratings and 279 ratings respectively.
- A third was for 1/4lb of bulk medical marijuana for $619.10. It had 378 ratings.
I somehow doubt this guy has sold half a million dollars worth of MDMA at $1.5k a pop in such a huge quantity, but the price seems to be in line with other sellers for an equivalent amount. I’m not entirely sure what the rules are regarding who can give feedback, but if a user must buy a product to be able to review it, there seem to be people buying huge quantities. I have never purchased anything from the site, and I wasn’t presented with any option to review an item.
If only people who purchase the item can review it, then I am a bit less skeptical. I saw one Canadian seller listing 1 kilo of MDMA for USD $8k with 1 review!
========================================================
The average price of the top 100 items is $129
The average price of the top 500 items is $188
The average price of the top 1000 items is $236
Prices are converted to USD at the time of the crawl using exchange rates from the Coinbase API.
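The conversion itself is a single multiplication. A minimal sketch, assuming listing prices are quoted in bitcoin and that the exchange rate has already been fetched; the function name and the rate used in the example are made up, and no real API call is shown:

```python
from decimal import Decimal, ROUND_HALF_UP

def btc_to_usd(price_btc, usd_per_btc):
    """Convert a BTC price to USD, rounded to whole cents."""
    usd = Decimal(str(price_btc)) * Decimal(str(usd_per_btc))
    return usd.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Placeholder rate for illustration only, not a real quote.
example = btc_to_usd("0.5", "600")
```

Decimal avoids the small binary floating-point errors that would otherwise creep into summed prices.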
Countries
Sellers on SR can specify where they ship from and where they ship to.
isocode number of listings
--------------------------------
us 93
au 45
gb 40
de 39
nl 35
ca 32
se 10
cn 6
za 4
be 4
it 2
es 2
nz 2
no 2
ie 2
pl 2
dk 2
sk 1
cz 1
fi 1
fr 1
ch 1
at 1
in 1
Observations
Total Sales Volume
If every review does indeed map to a transaction, some vendors are doing huge amounts of business through mail order drugs. While the number of listings here is small, if we sum up review count x price over all products, we get a huge number: USD $20,668,330.05. Remember! This is on Silk Road 2.0 with a very small subset of their entire inventory.
sqlite> SELECT SUM(review_count * price_usd) FROM silkroad_data WHERE review_count > 0;
20668330.0569627
sqlite> SELECT COUNT(*) FROM silkroad_data;
3579
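For anyone who wants to play with the same query without the full dataset, here is a small sketch using Python's built-in sqlite3 module. The table and the two numeric column names come from the query above; the title column is an assumption, the first four rows are the outlier listings mentioned earlier, and the last row is made up to show the effect of the WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE silkroad_data (title TEXT, price_usd REAL, review_count INTEGER)"
)

rows = [
    ("100g MDMA", 1510.77, 392),
    ("100g MDMA, second seller", 1186.00, 293),
    ("50g MDMA, second seller", 659.00, 279),
    ("1/4lb bulk medical marijuana", 619.10, 378),
    ("made-up listing with no reviews", 100.00, 0),
]
conn.executemany("INSERT INTO silkroad_data VALUES (?, ?, ?)", rows)

# Same estimate as above: one review is counted as one sale at the listed price.
(estimate,) = conn.execute(
    "SELECT SUM(review_count * price_usd) FROM silkroad_data WHERE review_count > 0"
).fetchone()
print(f"USD ${estimate:,.2f}")
```

Treating each review as exactly one sale is the big assumption here. Buyers who never leave a review would push the real number up, and fake reviews would push it down.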
Comparing to Agora
The Agora marketplace seems to have more or less the same number of listings. It would be interesting to see whether or not the sellers are the same or different between the sites.
Junk listings
SR has quite a lot of junk listings: there are all sorts of listings unrelated to their product category. I had to filter out quite a lot of listings which deviated too far from the mean price per unit volume. The Agora Marketplace seems to be a bit better curated and moderated. I suspect that it has more real inventory than SR.
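The filtering idea can be sketched like this. It is a rough illustration and not the exact filter I used: the function name, the dictionary keys, the use of grams as the unit and the cutoff of two standard deviations are all assumptions, and the sample data is made up.

```python
import statistics

def drop_price_outliers(listings, k=2.0):
    """Keep listings whose price per gram is within k standard deviations of the mean."""
    per_gram = [item["price_usd"] / item["grams"] for item in listings]
    mean = statistics.mean(per_gram)
    spread = statistics.stdev(per_gram)  # needs at least two listings
    return [
        item
        for item, ppg in zip(listings, per_gram)
        if abs(ppg - mean) <= k * spread
    ]

# Made-up sample: eight plausible one-gram listings and one junk entry.
sample = [
    {"title": f"listing {i}", "price_usd": price, "grams": 1.0}
    for i, price in enumerate([10, 11, 9, 10, 12, 10, 11, 9, 500])
]
kept = drop_price_outliers(sample)
print(len(sample) - len(kept), "listing(s) dropped")
```

A single extreme listing drags the mean and the standard deviation towards itself, so a median-based cutoff would likely be more robust on data as messy as this.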
Closing
I won’t tell you that I know what I’m doing
This is simply a collection of observations from someone who knows pretty much nothing about the drug world. It is probably among the longest articles I have ever written, so any suggestions with regard to writing would be greatly appreciated. It must have taken me at least 3 times as long to write this article as it did to get all the data!
Need more data
I’ve set up a cron-type job to crawl SR daily and crunch some numbers. It will be interesting to see how things change over time, though a month may not be enough time to see any significant shifts.
This was a bit much for me
It was really creepy looking through all those drug listings, with rocks of all sorts of shapes and colors. I spent way too much time writing this article; I hope someone finds it educational.
To be continued …
Part 1 gave us an overall, albeit superficial, view of the numbers behind SR 2.0.
Part 2 will focus on pricing, trends and predictions.
In the last part of the series, I will report on changes over time.