Jump to content

indexing proposal


Recommended Posts

The idea is to remove the dependence on torrent indexing sites such as TPB through the use of the DHT.  Specifically, the DHT would be used to store signed topic->torrent mappings and trust relationships between signatories.  One would use imdb or other sites for keyword searches to find exact topic values and then search the DHT to identify btih hashes.
The idea builds on BEP5 and BEP44, plus new DHT functions to store the data.  Clients would use this data along with trust value calculations taken from these two papers:

[1] https://domino.fov.uni-mb.si/proceedings.nsf/proceedings/d9e48b66f32a7dffc1256e9f00355b37/$file/josang.pdf

[2] http://www.johndcook.com/UTMDABTR-005-05.pdf

First: add DHT support for indexing.  Two new DHT functions are required: get_reviewers and announce_review.  They mirror the behavior of get_peers and announce_peer but record the key public key of the client instead of his IP address; the target location is the "topic hash" instead of the "btih hash".  The annouce_review request contains a signature of the token to prove control of the key to be registered.  This allows a reviewer (identified by his public key) to announce the availability of a review.  The reviewer must also BEP44-store a signed review, using the topic hash as salt.


Second: store the review itself using BEP44 with the topic hash as salt.  There are three cases.  In the first case, the reviewer provides a direct topic->btih mapping, plus an optional comment string, signed with his public key; this is a positive "first hand" review.  The second case is similar to the first, except it is a negative review; an extra flag indicates that according to the reviewer, this mapping is invalid.  In the third, the reviewer provides a chain of public keys, starting with his own, and ending with the public key that authored a first-hand review.  This indicates a chain of trust that, from the point of view of the 2nd-hand reviewer, provides the best evidence for the trustworthiness of the 1st-hand review.  No trust value is included in the 2nd-hand review, just the list of public keys.
The 2nd-hand form is used by peers that have not yet evaluated whether the topic->btih mapping is correct or not.  Its goal is to help prospective peers establish trust in a mapping by publishing a high-trust path leading to a 1st-hand review.  The prospective peer might establish trust in this mapping if it trusts any of the keys in the path.
Third: store the trust relationships.Trust relationships form a directed graph, with public keys as nodes and trust values as edges.  Each edge is BEP44-stored with: k = the source key; salt = the target key; and v = the trust value.  In practice, you don't need to store the whole graph, only the edges mentioned by a 2nd-hand review.  The 2nd-hand reviewer needs to sign and store the trust value associated with the first link in the chain of public keys, and persist the remaining links (which are signed by their respective keys) as well as the associated 1st-hand review (the final link in the chain).
Fourth: encode the trust values.  A trust value consists of 4 integers, each counting the number of reviews by the target key that were evaluated by the source key.  The 4 integers are: the number of 1st-hand reviews that are valid, the number of 1st-hand reviews that are invalid, the number of 2nd-hand reviews that are valid, and the number of 2nd-hand reviews that are invalid.  Each reviewer maintains these counts independently in a local DB and publishes them as required.
Finally, we describe the behavior of a node researching a given topic.
The client starts by computing the topic hash and issuing a get_reviewers command to the DHT.  The client then queries the DHT to retrieve the reviews, and finally to retrieve the trust graph edges mentioned in 2nd-hand reviews.  The client now needs to establish trustworthiness values for each topic->btih mapping discovered in order to present them as search results to the user.
To calculate trust values, we shall use a modified version of the "beta trust model" described in [1].  The two modifications allows the trust model to be resistant to Cybil attacks.  The resulting protocol emphasizes the amount of positive experience with a target key to establish trust.
The first modification entails replacing equation (7) in [1] with the max of the two beta distributions.  In order to compare two distributions, you need either the techniques in [2], or a montecarlo simulation.
The second modification consists simply of dropping the "forgetting" feature of section 2.6.
By applying the trust graph to this modified beta trust model, one can calculate a beta distribution that represents the trustworthiness of each topic->btih mapping and rank them for presentation to the user.
If the user selects a btih for downloading, the client should publish a 2nd-hand review indicating the best path, starting from its own public key, and ending with a 1st-hand review.
The user should then be encouraged by the client to validate the topic->btih mapping.  The client would then add 1 to the appropriate counter for each public key that provided a review containing that mapping.  Remember that the client maintains a DB containing all of its own outbound trust edges, where each edge contains four counters.  One of these four counters is incremented depending on whether the review is valid or nor, and on whether the review is 1st-hand or not.  The client should also publish a new 1st-hand review to replace its previous 2nd-hand review.
Hopefully, this describes the general idea.  I'm available for questions.
Link to comment
Share on other sites


This topic is now archived and is closed to further replies.

  • Create New...