Trying To Understand How Mainline DHT Works

macmee · August 10, 2015

I'm curious about DHTs in general and found this great article on Kademlia:

http://gleamly.com/article/introduction-kademlia-dht-how-it-works

From what I understand, based on these resources on how the BitTorrent DHT works:

https://en.wikipedia.org/wiki/Mainline_DHT

It seems like, when I begin seeding a torrent, my client does the following:

- It determines the key within the DHT that corresponds to my torrent.
- The value for that key is a list of seeders, and my client inserts its own IP into that list.
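To make that model concrete, here is a toy Python sketch. It assumes the key is the torrent's 20-byte SHA-1 infohash and that, as in Kademlia, the peer list is kept by the nodes whose IDs are closest to that key by XOR distance. The value `K = 8`, the function names, the random node IDs, and the sample info dictionary are all illustrative assumptions; no network messages are modeled.

```python
import hashlib
import random

K = 8  # assumed number of closest nodes that keep the peer list


def infohash(bencoded_info: bytes) -> bytes:
    """The DHT key: SHA-1 of the torrent's bencoded info dictionary."""
    return hashlib.sha1(bencoded_info).digest()


def xor_distance(a: bytes, b: bytes) -> int:
    """Kademlia distance between two 160-bit IDs."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest_nodes(key: bytes, node_ids: list, k: int = K) -> list:
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]


def announce(store: dict, key: bytes, node_ids: list, peer: tuple) -> None:
    """Add `peer` to the list kept by each of the k closest nodes."""
    for node in closest_nodes(key, node_ids):
        store.setdefault(node, {}).setdefault(key, set()).add(peer)


def get_peers(store: dict, key: bytes, node_ids: list) -> set:
    """Merge the peer lists held by the k closest nodes."""
    peers = set()
    for node in closest_nodes(key, node_ids):
        peers |= store.get(node, {}).get(key, set())
    return peers


rng = random.Random(1)
nodes = [rng.randbytes(20) for _ in range(1000)]  # hypothetical node IDs
key = infohash(b"d6:lengthi42e4:name8:demo.txte")  # hypothetical info dict
store: dict = {}
announce(store, key, nodes, ("203.0.113.5", 6881))
print(get_peers(store, key, nodes))  # {('203.0.113.5', 6881)}
```

Nothing in this model checks that the announcing peer actually has the data, which is exactly the gap the question is about.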

But consider an attacker with 100 clients: what's to stop that attacker from inserting his own IP, or a bogus IP, under the key for this torrent, even though he is not seeding the torrent at all?

Then, when some person X comes along to download the torrent and looks up the seeders in the DHT (i.e., finds the value stored under the torrent's key), wouldn't X get back a list of 100 bogus IPs and perhaps a small number of legitimate seeders?
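Under the question's premise that storage nodes accept every announce at face value, the arithmetic can be sketched in a few lines of Python. The bogus addresses come from the 198.51.100.0/24 documentation range, and the three "real" seeders are likewise hypothetical; this models the scenario as posed, not the behavior of any particular client.

```python
import random


def poisoned_peer_list(n_bogus: int, seeders: list, seed: int = 0) -> list:
    """Peer list under one infohash if every announce is accepted unchecked
    (the question's premise, not a description of real node behavior)."""
    if not 0 <= n_bogus <= 254:
        raise ValueError("this toy only has 254 bogus addresses to hand out")
    # Hypothetical bogus entries from 198.51.100.0/24, a documentation-only range.
    bogus = [f"198.51.100.{i + 1}" for i in range(n_bogus)]
    peers = bogus + seeders
    random.Random(seed).shuffle(peers)  # storage order says nothing about legitimacy
    return peers


def legit_fraction(peers: list, seeders: list) -> float:
    return sum(p in seeders for p in peers) / len(peers)


seeders = ["203.0.113.7", "203.0.113.8", "203.0.113.9"]
peers = poisoned_peer_list(100, seeders)
print(f"{legit_fraction(peers, seeders):.1%} of the returned peers are real seeders")
# 2.9% of the returned peers are real seeders
```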

Lasent · November 4, 2016

Even though this post is a year old, I've just discovered this forum, so I'll try to give an answer for whoever still reads it.

I just wrote a small paper about BitTorrent's DHT implementation (Mainline DHT, or MLDHT). One of its biggest shortcomings, as you just pointed out, is that a node can freely choose its own ID. The main reason for this is that, for a DHT to work efficiently, node IDs have to be spread out as evenly as possible. Now, how would you bind an ID to a node without any kind of central server? If you think about it, the only way would be to use the node's public IP address.
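To see why free choice of ID matters, here is a small Python sketch; all names and numbers are illustrative assumptions. Honest nodes pick uniformly random 160-bit IDs, while an attacker who knows the target infohash simply flips its lowest bits to mint IDs that are nearly identical to it. Under XOR distance those forged IDs beat every random ID, so the attacker ends up holding all K closest slots for that key.

```python
import random

K = 8  # assumed number of closest nodes a lookup relies on


def xor_distance(a: int, b: int) -> int:
    return a ^ b


def sybil_ids(target: int, count: int) -> list:
    """IDs that differ from the target only in their lowest bits:
    target ^ i sits at XOR distance exactly i from the target."""
    return [target ^ i for i in range(1, count + 1)]


rng = random.Random(42)
target = rng.getrandbits(160)  # the infohash under attack
honest = [rng.getrandbits(160) for _ in range(10_000)]
attacker = sybil_ids(target, K)

closest = sorted(honest + attacker, key=lambda n: xor_distance(n, target))[:K]
print(all(n in attacker for n in closest))  # True
```

A random honest ID lands within distance 8 of the target with probability around 2^-157, so in practice the forged IDs always win.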

Of course, this isn't that simple either. For starters, a node behind a NAT can't figure out its own public address without the help of another node. This means that other nodes have to reply with the IP address they see the request coming from, so that the node behind the NAT can calculate an ID based on it. This opens up another attack: a bogus node could reply with a wrong IP address, which could also cause other (although less significant) problems.

Another issue is actually generating the node ID. An IPv4 address is 32 bits long, while a node ID is 160 bits, so the IP address alone isn't enough (you can't produce all 2^160 possible node IDs from 2^32 IP addresses). Long story short, this is a proposed change: http://www.bittorrent.org/beps/bep_0042.html
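The core of that proposal (BEP 42) is that the top 21 bits of a node ID are derived from a CRC32-C checksum of the node's masked IP address combined with a small random value r, while the remaining bits stay random, so any node can check whether a peer's ID fits its IP. Below is a Python sketch for IPv4 written from the BEP's description. The CRC32-C routine is a plain bitwise implementation because Python's standard library only ships the different CRC-32 used by zlib. Function names are my own, and the BEP's special cases (IPv6, local addresses) are left out, so treat this as a sketch rather than a reference implementation.

```python
import ipaddress
import os
from typing import Optional


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


V4_MASK = bytes([0x03, 0x0F, 0x3F, 0xFF])


def id_prefix(ip: str, r: int) -> int:
    """CRC32-C of the masked IPv4 address with r (0-7) in its top 3 bits."""
    packed = ipaddress.IPv4Address(ip).packed
    octets = bytearray(a & m for a, m in zip(packed, V4_MASK))
    octets[0] |= (r & 0x7) << 5
    return crc32c(bytes(octets))


def generate_node_id(ip: str, rand: Optional[int] = None) -> bytes:
    rand = os.urandom(1)[0] if rand is None else rand & 0xFF
    crc = id_prefix(ip, rand & 0x7)
    node_id = bytearray(os.urandom(20))  # bits outside the prefix stay random
    node_id[0] = (crc >> 24) & 0xFF
    node_id[1] = (crc >> 16) & 0xFF
    node_id[2] = ((crc >> 8) & 0xF8) | (node_id[2] & 0x07)  # 21 CRC bits total
    node_id[19] = rand  # last byte carries r so other nodes can verify
    return bytes(node_id)


def verify_node_id(ip: str, node_id: bytes) -> bool:
    """Check that the top 21 bits of the ID match the IP-derived prefix."""
    crc = id_prefix(ip, node_id[19] & 0x7)
    return int.from_bytes(node_id[:3], "big") >> 3 == crc >> 11


nid = generate_node_id("124.31.75.21")
print(verify_node_id("124.31.75.21", nid))  # True
```

The design keeps the ID space evenly covered (the CRC output and the random tail are both well spread) while tying most of the prefix to the IP, which is what limits how freely a node can place itself.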

macmee · February 22, 2017

Even though my post is a year old, the other day I did an experiment that is directly relevant to the concerns I asked about:

Any node can announce to the network that it is seeding a torrent. I generated a random SHA-1 hash and announced that I was seeding it.
24 hours go by and I ask the DHT to do a lookup for that SHA-1. I'm not expecting to get anything back, because I assume the announce interval is something like 15 minutes. But to my surprise, from about 20 peers I get back two IPs, from China and Saudi Arabia.
So for some reason, two nodes in China and Saudi Arabia thought it would be a great idea to lie about seeding a torrent that doesn't even exist.
Why would they do this? What's the advantage in doing it?
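For anyone who wants to repeat this experiment: in Mainline DHT the lookup and the announce are KRPC queries sent over UDP as bencoded dictionaries (BEP 5). The Python sketch below only builds the messages and sends nothing. In real use the token must come from an earlier get_peers response from the same node; the transaction ID, node ID, token, and port values here are placeholders.

```python
import os


def bencode(value) -> bytes:
    """Minimal bencoding for the types KRPC uses: int, str/bytes, list, dict."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode()
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, emitted in sorted raw-byte order.
        items = sorted(((k.encode() if isinstance(k, str) else k, v)
                        for k, v in value.items()), key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def get_peers_query(tid: bytes, node_id: bytes, info_hash: bytes) -> bytes:
    return bencode({"t": tid, "y": "q", "q": "get_peers",
                    "a": {"id": node_id, "info_hash": info_hash}})


def announce_peer_query(tid: bytes, node_id: bytes, info_hash: bytes,
                        port: int, token: bytes) -> bytes:
    return bencode({"t": tid, "y": "q", "q": "announce_peer",
                    "a": {"id": node_id, "implied_port": 0,
                          "info_hash": info_hash, "port": port, "token": token}})


my_id = os.urandom(20)
random_infohash = os.urandom(20)  # a random SHA-1-sized value, as in the experiment
lookup = get_peers_query(b"aa", my_id, random_infohash)
announce = announce_peer_query(b"ab", my_id, random_infohash, 6881, b"placeholder-token")
```

As a sanity check, `bencode({"t": "aa", "y": "q", "q": "ping", "a": {"id": "abcdefghij0123456789"}})` reproduces the ping example from BEP 5.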

crawldht · December 19, 2017

The situation you are describing is a well-known attack called a Sybil attack, which is a problem not only for BitTorrent but also for Tor and Bitcoin.

A malicious party can easily set up thousands of nodes of its own. Whenever a client asks one of them for peers, the attacker gives back false IP addresses, which are actually addresses controlled by the attacker, effectively putting that client into a denial of service.

The attacker returns false IP addresses to every client that comes to fetch a list of peers from him.

This technique is actually being used by copyright holders who want to slow down the spread of their copyrighted data.

The effectiveness of the attack depends on the number of malicious nodes the attacker runs. The attacker can also run peers in order to distribute incorrect data: if he is both a DHT node and a peer, he can hand you torrent metadata for his own malicious data and start distributing it. In the end, though, you will notice that the hash of the torrent doesn't match the infohash.
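That final check is cheap: the infohash is the SHA-1 of the bencoded info dictionary, so a client that receives metadata from a peer can hash it and compare before trusting it. A minimal Python sketch, with hypothetical byte strings standing in for real metadata:

```python
import hashlib


def metadata_matches(info_bytes: bytes, expected_infohash: bytes) -> bool:
    """True only if the bencoded info dictionary hashes to the infohash."""
    return hashlib.sha1(info_bytes).digest() == expected_infohash


genuine = b"d6:lengthi42e4:name8:demo.txte"   # hypothetical info dict
tampered = b"d6:lengthi42e4:name8:evil.exee"  # attacker's substitute
infohash = hashlib.sha1(genuine).digest()
print(metadata_matches(genuine, infohash))   # True
print(metadata_matches(tampered, infohash))  # False
```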

microft · April 7, 2020

I would guess that there are entities that want a copy of every new torrent that comes out. The NSA comes to mind. Merely the fact that some human bothered to share some collection of files makes that collection valuable to entities that can afford to collect everything.

OK, so I assume you just announced the hash but didn't stick around to manage peers. So these other entities came in and filled that role. They're not seeding; they're trying to download it.

I recently started experimenting with DHTs, so I'm (somewhat) familiar with them. I haven't dug into the BitTorrent side of things much.
