Trying To Understand How Mainline DHT Works

I'm curious about DHTs in general and found this great article on Kademlia:

From what I understand, based on these resources on how the BitTorrent DHT works:

It seems like, when I begin seeding a torrent, my client does the following:

  1. determines the key within the DHT corresponding to my torrent
  2. the value for a key is a list of seeders, and my client inserts its own IP into that list
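To make the two steps above concrete, here is a minimal sketch in Python. It assumes the key is the torrent's infohash (the SHA-1 of the bencoded `info` dictionary) and that "closeness" to the key is the Kademlia XOR metric. The helper names (`bencode`, `info_hash`, `closest_nodes`) are made up for illustration and are not taken from any particular client.

```python
import hashlib

def bencode(value):
    """Minimal bencoder for ints, strings/bytes, lists and dicts."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))

def info_hash(info):
    """Step 1: the 20-byte DHT key for a torrent's info dictionary."""
    return hashlib.sha1(bencode(info)).digest()

def xor_distance(a, b):
    """Kademlia distance between two 20-byte IDs, as an integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def closest_nodes(key, node_ids, k=8):
    """Step 2 targets the k known nodes whose IDs are closest to the key."""
    return sorted(node_ids, key=lambda n: xor_distance(key, n))[:k]
```

The client would then announce itself to the nodes returned by `closest_nodes`, and those nodes keep the peer list for that key.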

But consider an attacker with 100 clients. What's to stop that attacker from inserting his own IPs, or bogus ones, under the key for this torrent, even though he is not seeding the torrent at all?

Then, when some person X comes along to download the torrent and looks up the seeders in the DHT (that is, finds the value stored under the torrent's key), wouldn't X get back a list of 100 bogus IPs and perhaps a small number of legitimate seeders?
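The concern can be modelled with a toy in-memory store. This is only a sketch of the scenario described above: `ToyPeerStore` and `simulate` are hypothetical names, the store accepts every announce without checking anything, and it ignores routing and whatever safeguards a real client may have. The addresses come from the documentation-only IP ranges.

```python
from collections import defaultdict

class ToyPeerStore:
    """Maps an infohash to the set of (ip, port) pairs announced for it."""

    def __init__(self):
        self.peers = defaultdict(set)

    def announce(self, info_hash, ip, port):
        # Nothing here verifies that the announcer holds any torrent data.
        self.peers[info_hash].add((ip, port))

    def get_peers(self, info_hash):
        return sorted(self.peers[info_hash])

def simulate(bogus_count, legit_count):
    """Return the peer list a downloader would see for one torrent."""
    store = ToyPeerStore()
    key = bytes(20)  # placeholder infohash
    for i in range(legit_count):
        store.announce(key, "192.0.2.%d" % (i + 1), 6881)
    for i in range(bogus_count):
        store.announce(key, "198.51.100.%d" % (i + 1), 6881)
    return store.get_peers(key)

peers = simulate(bogus_count=100, legit_count=2)
bogus = [p for p in peers if p[0].startswith("198.51.100.")]
print(len(bogus), "of", len(peers), "returned peers are bogus")
```

In this model the downloader has no way to tell the two kinds of entries apart without trying to connect to them.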

Even though this post is a year old and I've only just discovered this forum, I'll try to give an answer for whoever still reads this.

I just wrote a small paper about BitTorrent's DHT implementation (Mainline DHT, or MLDHT). One of its biggest shortcomings, as you just pointed out, is that a node can freely choose its own ID. The main reason for this is that, for a DHT to work efficiently, node IDs have to be spread out as evenly as possible. Now, how would you bind an ID to a node without any kind of central server? If you think about it, the only way would be to use the node's public IP address.

Of course, this isn't that simple either. For starters, a node behind a NAT can't figure out its own public address without the help of another node. This means the other node has to include in its reply the IP address it saw the request coming from, so that the node behind the NAT can calculate an ID based on it. This opens up another attack: a bogus node could reply with a wrong IP address, which could cause other (although not as significant) problems.

Another issue is actually generating the node ID. An IPv4 address is 32 bits long, while a node ID is 160 bits, so the IP address alone isn't enough (you can't generate all 2^160 possible node IDs from only 2^32 IP addresses). Long story short, this is a proposed change:
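To illustrate the general idea of tying an ID to an IP address, here is a simplified sketch in Python. It is not the proposal referred to above, just a hypothetical scheme made up for this example: the first few bytes of the ID are derived from a hash of the IP plus a one-byte salt, the salt is stored in the last byte so others can recheck it, and the remaining bytes are random. `PREFIX_BYTES`, `node_id_from_ip` and `id_matches_ip` are all assumed names.

```python
import hashlib
import ipaddress
import os

ID_BYTES = 20      # 160-bit node ID
PREFIX_BYTES = 3   # hypothetical: how many leading bytes the IP determines

def _prefix(ip, salt):
    # Hash the packed IP address together with the salt byte.
    packed = ipaddress.ip_address(ip).packed
    return hashlib.sha1(packed + bytes([salt])).digest()[:PREFIX_BYTES]

def node_id_from_ip(ip, salt, tail=None):
    """Build an ID whose prefix depends on the IP and whose last byte is the salt."""
    if not 0 <= salt < 256:
        raise ValueError("salt must fit in one byte")
    if tail is None:
        tail = os.urandom(ID_BYTES - PREFIX_BYTES - 1)
    return _prefix(ip, salt) + tail + bytes([salt])

def id_matches_ip(node_id, ip):
    """Check that an ID could have been generated for the given IP."""
    if len(node_id) != ID_BYTES:
        return False
    salt = node_id[-1]
    return node_id[:PREFIX_BYTES] == _prefix(ip, salt)
```

A node receiving a message from `ip` could call `id_matches_ip(sender_id, ip)` and distrust senders that fail the check. The random tail is what lets many distinct IDs exist per address, while the IP-derived prefix limits which regions of the ID space a single address can claim.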

Even though my post is a year old, the other day I ran an experiment that is directly relevant to the concerns I raised:

Any node can announce to the network that it is seeding a torrent. I generated a random SHA1 hash and announced that I was seeding it.
24 hours went by and I asked the DHT to do a lookup for that SHA1. I wasn't expecting to get anything back, because I assumed the announce interval was something like 15 minutes. But to my surprise, I got back, from about 20 peers, two IPs from China and Saudi Arabia.
So for some reason, two nodes in China and Saudi Arabia thought it would be a great idea to lie about seeding a torrent that doesn't even exist.
Why would they do this? What's the advantage in doing it?
