Sign in to follow this  
macmee

Trying To Understand How Mainline Dht Works

Recommended Posts

I'm just curious in general about DHTs and found this great article on kademlia:


http://gleamly.com/article/introduction-kademlia-dht-how-it-works


From what I understand and based on these resources on how the bittorrent DHT works:


https://en.wikipedia.org/wiki/Mainline_DHT


It seems like, when I begin seeding a torrent, my client does the following:


  1. determines the key within the DHT corresponding to my torrent
  2. the value for a key is a list of seeders, and my client inserts its own IP into that list

but, consider an attacker with 100 clients, what's to stop that attacker from inserting his own, or a bogus IP for the key for this torrent, even though the attacker is not seeding the torrent at all.


And then, wouldn't it be the case that when some person X comes along to start downloading the torrent, when they look up the seeders in the DHT (find the value in the DHT for the key corresponding to the torrent), then wouldn't X get back a list of 100 bogus IPs and perhaps some small amount of legitimate seeders?


Share this post


Link to post
Share on other sites

Even though this post a year old, I've just discovered this forum, I'll try to give an answer to whoever still reads this.

I just wrote a small paper about BitTorrent's DHT implementation (or Mainline DHT/MLDHT). One of the biggest shortcoming of it, as you just pointed out, is the fact that a node can freely choose it's own ID. The biggest reason behind this is the fact that for a DHT to work efficiently, the node's ID-s have to be as evenly spread out as it possibly can. Now how would you bind an ID to a node without any kind of central server? If you think about it, the only way would be to use the node's public IP address.

Of course, this isn't that simple either. For starters, a node behind a NAT can't figure out it's own public address without the help of another node. This means that a node has to reply with the IP address he sees the request coming, so that the node behind the NAT can calculate an ID based on it. This opens up another attack: a bogus node could reply with a wrong IP address, which could also cause other (altough not as significant) problems.

Another thing is actually generating the node ID. An IPv4 address is 32 bit long, while a node ID is 180 bit, so only using the IP address wouldn't be enough (can't generate every possible 2^180 node address from 2^32 IP address). Long story short: this is a proposed change : http://www.bittorrent.org/beps/bep_0042.html

Share this post


Link to post
Share on other sites

Even though my post is a year old, the other day I did an experiment which is relevant exactly to the concerns I asked about:

Any node can announce to the network that they're seeding a torrent. I generated a random SHA1 and announced I was seeding it.
24 hours goes by and I ask the DHT to do a lookup for the SHA1. I'm not expecting to get anything back because I assume the announce interval is like 15 minutes or something. But to my surprise, I get back, from like 20 peers, two IPs from China and Saudi Arabia.
So for some reason, two nodes in China and Saudi Arabia thought it would be a great idea and lie about seeding a torrent which doesn't even exist.
Why would they do this? What's the advantage in doing it?
 

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this