Ask HN: I've Built a DHT Torrent Sniffer and Search Engine. Should I Release?

 1 year ago
source link: https://news.ycombinator.com/item?id=33305671

37 points by sylwester 4 hours ago | hide | past | favorite | 39 comments
Recently I was researching DHTs and developed a DHT sniffer in Go which connects to some known DHT routers and sniffs all the announcements. I quickly added ZincSearch, and it is now basically a search engine which can search for hashes, names, or files contained in the torrents. It is able to index around 5-10k announcements per second, so the index grows quite fast.

Now, I am thinking about releasing it as open-source for others to study, but not sure if I should, because it might be used for "evil".
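
For readers unfamiliar with what "sniffing announcements" involves: the OP's code is not public, so the following is only a minimal sketch of the general idea, assuming the standard mainline DHT message format (bencoded KRPC `announce_peer` queries arriving over UDP). The names `decode` and `announcedInfoHash` are hypothetical, and the sample packet in `main` uses made-up IDs and a made-up token.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"errors"
	"fmt"
	"strconv"
)

// decode parses one bencoded value at b[i]; it returns the value and the next index.
func decode(b []byte, i int) (interface{}, int, error) {
	if i >= len(b) {
		return nil, i, errors.New("unexpected end of input")
	}
	switch c := b[i]; {
	case c == 'i': // integer: i<digits>e
		j := bytes.IndexByte(b[i:], 'e')
		if j < 0 {
			return nil, i, errors.New("unterminated integer")
		}
		n, err := strconv.ParseInt(string(b[i+1:i+j]), 10, 64)
		return n, i + j + 1, err
	case c == 'l': // list: l<values>e
		list := []interface{}{}
		for i++; i < len(b) && b[i] != 'e'; {
			v, next, err := decode(b, i)
			if err != nil {
				return nil, i, err
			}
			list, i = append(list, v), next
		}
		if i >= len(b) {
			return nil, i, errors.New("unterminated list")
		}
		return list, i + 1, nil
	case c == 'd': // dictionary: d<key><value>...e
		dict := map[string]interface{}{}
		for i++; i < len(b) && b[i] != 'e'; {
			k, next, err := decode(b, i)
			if err != nil {
				return nil, i, err
			}
			key, ok := k.(string)
			if !ok {
				return nil, i, errors.New("dictionary key is not a string")
			}
			v, after, err := decode(b, next)
			if err != nil {
				return nil, i, err
			}
			dict[key], i = v, after
		}
		if i >= len(b) {
			return nil, i, errors.New("unterminated dictionary")
		}
		return dict, i + 1, nil
	case c >= '0' && c <= '9': // byte string: <length>:<bytes>
		j := bytes.IndexByte(b[i:], ':')
		if j < 0 {
			return nil, i, errors.New("unterminated string length")
		}
		start := i + j + 1
		n, err := strconv.Atoi(string(b[i : i+j]))
		if err != nil || n > len(b)-start {
			return nil, i, errors.New("invalid string length")
		}
		return string(b[start : start+n]), start + n, nil
	}
	return nil, i, errors.New("invalid bencode")
}

// announcedInfoHash returns the hex info hash carried by an announce_peer query.
func announcedInfoHash(packet []byte) (string, error) {
	v, _, err := decode(packet, 0)
	if err != nil {
		return "", err
	}
	msg, _ := v.(map[string]interface{})
	if q, _ := msg["q"].(string); q != "announce_peer" {
		return "", errors.New("not an announce_peer query")
	}
	args, _ := msg["a"].(map[string]interface{})
	h, _ := args["info_hash"].(string)
	if len(h) != 20 {
		return "", errors.New("missing or malformed info_hash")
	}
	return hex.EncodeToString([]byte(h)), nil
}

func main() {
	// Made-up sample packet; a real sniffer would read these from a UDP socket.
	pkt := []byte("d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz1234564:porti6881e5:token8:aoeusnthe1:q13:announce_peer1:t2:aa1:y1:qe")
	fmt.Println(announcedInfoHash(pkt))
}
```

An info hash alone is not searchable by name or file list; an indexer would still have to fetch the torrent's metadata from a peer before handing anything to a search backend, and that step is not shown here.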

I wrote a similar solution 8 years ago. I repurposed the system to identify IPs owned by the government, and notified them if a malicious copy of Windows (but not limited to) was seeded by them. Meaning some unknown actor had theoretically a backdoor in my government. If you wanna discuss, I'm happy to talk [email protected]
Let it rip. DHT has been around for so long now that whatever bad actors/evil use cases you're imagining have already happened. It sounds like a cool project, and I'd be interested to see it.
I accidentally read this as "Let it R.I.P."

I totally agree, unroll it!

While the file sharing is distributed, the centralised web-based indexing is still a game of whack-a-mole.

Sorry for the bad news: it's not going to work.

Can you filter the 95% spam and dead torrents? Learn-to-rank is nice to order search results.

It's very very hard what you are trying to do. Takes lots of effort to be somewhat usable. It's duplicating Google, without their monopoly. A few have tried. My university lab has been building a torrent search engine for 17 years and 3 months. Fully decentralised, no servers, no browser, now called Web3. Burning 7 million Euro in tax payer money, building a distributed ledger in Sep 2007. So expect a long road ahead. (our engine: https://github.com/Tribler/tribler/wiki)

OP said they already have a working system. How well it works remains to be seen, until it's publicly available.

On what basis do you assume he/she has not succeeded without even having seen their work?

We tried too. Several people have made DHT sniffers, including our lab. You see lots of spam and get stuck. Filtering out the noise is weirdly hard.
> We are building a micro-economy without banks, without advertisers, and without any government

But with government funding?

Tribler is an interesting project, but not directly comparable to what the OP is trying to achieve.
indeed, bit different. lots of people worked in this field and made these tools. Around 2014 the state of the art was 20k DHT responses per second, see '100 million DHT replies'. https://doi.org/10.1109/P2P.2014.6934318
Thanks for the pointer!

I just read some interesting proposals in the GH issues and recognized your username. This is some nice piece of work - will definitely return to read more about the underlying concepts.

We are working in a similar area with similar problems: Building a new Tor/VPN-like privacy network. See https://safing.io/spn/

If you follow through to the site linked in the Github:

> Tribler is a research project of Delft University of Technology

> Work on Tribler has been supported by multiple Internet research European grants. In total we received 3,538,609 Euro in funding for our open source self-organising systems research. Roughly 10 to 15 scientists and engineers work on it full-time. Our ambition is to make darknet technology, security and privacy the default for all Internet users.

> Vision & Mission ... "Push the boundaries of self-organising systems, robust reputation systems and craft collaborative systems with millions of active participants under continuous attack from spammers and other adversarial entities."

Having seen other indexers before I suspect your implementation isn't spec-compliant or well-behaved (perhaps spoofing node-IDs? causing more traffic than necessary?)

If you want to build an indexer you should write a normal implementation and then use http://bittorrent.org/beps/bep_0051.html
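
As a rough illustration of the BEP 51 approach suggested above, here is a minimal sketch, not a full DHT node: it builds the bencoded `sample_infohashes` query and splits the `samples` field of a reply into individual hashes. The function names `buildSampleQuery` and `splitSamples` are hypothetical, the all-zero IDs in `main` are placeholders, and the networking, routing table, and rate limiting that a well-behaved indexer needs are left out.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// buildSampleQuery returns a bencoded KRPC "sample_infohashes" query (BEP 51).
// Dictionary keys are written in sorted order, as bencode requires.
func buildSampleQuery(txID string, nodeID, target [20]byte) []byte {
	return []byte(fmt.Sprintf(
		"d1:ad2:id20:%s6:target20:%se1:q17:sample_infohashes1:t%d:%s1:y1:qe",
		nodeID[:], target[:], len(txID), txID))
}

// splitSamples splits the "samples" byte string from a reply into hex-encoded
// 20-byte info hashes.
func splitSamples(samples []byte) ([]string, error) {
	if len(samples)%20 != 0 {
		return nil, errors.New("samples length is not a multiple of 20")
	}
	hashes := make([]string, 0, len(samples)/20)
	for i := 0; i < len(samples); i += 20 {
		hashes = append(hashes, hex.EncodeToString(samples[i:i+20]))
	}
	return hashes, nil
}

func main() {
	// Placeholder IDs; a real node would use its own ID and a chosen target.
	var nodeID, target [20]byte
	query := buildSampleQuery("aa", nodeID, target)
	fmt.Printf("query is %d bytes\n", len(query))

	// Pretend a reply carried two concatenated 20-byte hashes.
	hashes, err := splitSamples(make([]byte, 40))
	fmt.Println(len(hashes), err)
}
```

A BEP 51 reply also carries an `interval` field telling the requester how long to wait before asking the same node again; honouring it is part of being well-behaved.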

Just AGPL it, I hear it's an effective ward against Alphabet.
Very good suggestion, considering the sniffer could be used by copyright claims lawyers.
Can't see how AGPL would stop anyone using it. Doesn't it just prevent them from modifying the source code without sharing their changes?
AGPL fixes a "bug" in GPL that many tech companies have exploited to not release modified source code as the GPL requires. Simply put, GPL says that you have to release the complete source code (including any changes you have made) of a GPL licensed code only if you distribute it to other users. Many tech companies thus avoided GPL code. But with the growth of Software-as-a-Service, where an application only runs on the server and is accessed through a browser or an app, many of these same companies created web applications with GPL code that they customised. However, if a user demanded the source code of the GPL code, along with the modifications they made, the tech companies refused to provide it claiming they weren't "distributing" the application (as in giving you the whole application to run on your computer). And since they weren't doing that they claimed they had no legal obligation to release the complete source code.

AGPL fixes this - it recognizes SaaS web applications too as "distribution". So if source code is licensed under AGPL, anyone who uses it to create web applications and makes them available to the public is now legally obliged to provide the complete source code if any user requests it. (And of course, as with GPL, the user is free to use the source code as they want.)

That is why the AGPL is currently the best GNU license to ensure that your open source code always remains open source.

I had been working on this successfully for a couple years in the past before I got tired of it and moved on. I still think it's a magnificent idea, to be able to host your own torrent site and to decentralise the last centralised bit of BitTorrent.

https://github.com/boramalper/magnetico

I'll add to the chorus of people saying "yes, release it".

If you're worried about blowback as a result of "evil" uses / users, is there a way to release it (somewhat) anonymously, so it's difficult to be traced back to you?

I would recommend writing down the worst and best case scenarios that could happen with your software. If the negatives outweigh the positives, whether through severity or quantity, don't release it.
I was planning to start learning Go, so I'd definitely be interested to learn from your project :)
There is a custom in science that the good people of IT would be wise to adopt: always define an acronym upon first use in any publication.

This is the customary format: "I've built a distributed hash table (DHT) torrent sniffer..."

This way, everyone understands that we are talking about the same thing, rather than, say, dihydrotestosterone or "Don't Hate Taylor".

Assuming that everyone knows to what you are referring with your cryptic acronyms is rude, ignorant, and careless.

If you want to be regarded as professionals, you have to start acting like professionals, rather than some undisciplined children trading stories at recess.

Which evil usages are you concerned about? I think it would be very useful for the public.
DHT has existed for 17 years - the cat's out of the bag. The anti-piracy companies have built their own crawlers.
They match swarm IPs and then notify ISPs, who voluntarily hassle you on their behalf. An IP doesn't equal a person in the USA.
I mean, it already exists. [1] Always fun to see what my neighbors behind the same NAT download.

[1]: https://iknowwhatyoudownload.com/

Ah, haven’t checked that one for a while. Surprised to find a tech-minded neighbor who has downloaded kali-linux-2022.3-live-everything-amd64.iso. And no pr0n this time round.
Consider that something like this is already available as open source, even in Go.
An advantage here would be people self hosting their own based on this project.
Btdig is awesome. If this is like an open source version in Go then that is a huge contribution.
That would be very useful... Just release the code and building instructions.
I think you should. From my understanding, use of DHT is already dead in the eyes of most torrenters.
I imagine any bad actors who store IPs of torrent seeders have done so a long time ago already so your software will not do any harm that hasn't been done already.

Go for it and open-source it.

Yes please. You are not responsible for anything "evil" that users of the software might do. This is not even about enabling bad stuff; this is just the natural progression of technology.
Nuclear and biological weapons are also part of the "natural progression of technology" but it's widely agreed that they shouldn't be released to anybody and everybody, and the people distributing those technologies have a moral responsibility.

If you think your software would be of more use to "evil" than not, then don't release it widely.

Yeah, it is everyone's moral responsibility to consider effects of our actions as far as we're able to.