Cryptography nerd

  • 0 Posts
  • 27 Comments
Joined 1 year ago
Cake day: August 16th, 2023

  • Humans learn a lot through repetition, so there’s no reason to believe that LLMs wouldn’t benefit from reinforcement of higher quality information, especially because seeing the same information in different contexts helps map the links between those contexts and dispel incorrect assumptions. But like I said, the only viable method they have for this kind of emphasis at scale is incidental replication of more popular works in the training samples, and when something is duplicated too much the model overfits on it instead.
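
    A toy sketch of that tension (hypothetical preprocessing, not anything a real pipeline uses as-is): fingerprint exact copies and cap how often a document may repeat. Real corpora are dominated by near-duplicates that exact hashing like this can’t catch, which is exactly the problem.

    ```python
    # Toy illustration: count exact duplicates in a corpus and cap repetitions,
    # so popular works still get some emphasis without dominating training.
    from collections import Counter
    import hashlib

    def fingerprint(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def cap_repetitions(corpus, max_copies=4):
        seen = Counter()
        kept = []
        for doc in corpus:
            h = fingerprint(doc)
            seen[h] += 1
            if seen[h] <= max_copies:  # some repetition helps, too much overfits
                kept.append(doc)
        return kept
    ```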

    To fix this conflict they need to fundamentally change big parts of how training happens and how the algorithm learns. In particular it will need a lot more “introspective” training stages to refine what the model has already learned, and pretty much nobody does anything even slightly similar on large models, because they don’t know how and it would be insanely expensive anyway.


  • Yes, but should big companies with business models designed to be exploitative be allowed to act hypocritically?

    My problem isn’t with ML as such, or with learning over such large sets of works, etc, but with these companies designing their services specifically to push the people whose works they rely on out of work.

    The irony of overfitting is that having numerous copies of common works is a problem AND removing the duplicates would be a problem. The models need an understanding of what’s representative of language, but the training algorithms can’t learn that on their own, it’s not feasible to have humans teach it, and the training algorithm also can’t effectively detect duplicates and “tune down” their influence to stop replicating them exactly. Trying to do that last part algorithmically would ALSO break things, because it would break the model’s understanding of stuff like standard legalese and boilerplate language.
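
    A very rough sketch of why automatic down-weighting cuts both ways (hypothetical code, not anything a vendor actually runs): a naive shingle-based similarity check can’t tell a widely mirrored copyrighted work from license text or contract boilerplate that legitimately appears millions of times.

    ```python
    # Naive near-duplicate scoring via word 5-gram Jaccard similarity.
    # A filter built on this would down-weight popular copyrighted works
    # and standard boilerplate (licenses, legalese) alike.
    def shingles(text, n=5):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def jaccard(a, b):
        sa, sb = shingles(a), shingles(b)
        if not sa or not sb:
            return 0.0
        return len(sa & sb) / len(sa | sb)

    # Both of these would score as "duplicates" against a reference corpus:
    # - the opening chapter of a bestselling novel scraped from many mirrors
    # - the MIT license header attached to millions of source files
    ```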

    The current generation of generative ML doesn’t do what it says on the box, AND the companies running it deserve to get screwed over.

    And yes, I understand the risk of screwing up fair use, which is why my suggestion is not to hinder learning but to require the companies to track the copyright status of samples and inform end users of the licensing status whenever the system detects that a sample is substantially replicated in the output. This will not hurt anybody training on public domain or fairly licensed works, nor anybody who tracks authorship when crawling for samples, and it will also not hurt anybody who has designed their ML system to be sufficiently transformative that it never replicates copyrighted samples. It only hurts exploitative companies.
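
    A minimal sketch of what that detection could look like, assuming the operator kept attribution and license metadata at crawl time (all names here are hypothetical; the point is the obligation, not this exact mechanism):

    ```python
    # Hypothetical post-generation check: flag output that substantially
    # replicates an indexed sample and surface its licensing status to the user.
    def ngrams(text, n=8):
        words = text.split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def licensing_notices(output, indexed_samples, threshold=0.3):
        # indexed_samples: iterable of dicts with "text", "source" and "license",
        # i.e. the attribution metadata the crawler would have to record.
        out = ngrams(output)
        notices = []
        for sample in indexed_samples:
            if out and len(out & ngrams(sample["text"])) / len(out) >= threshold:
                notices.append(f"substantial overlap with {sample['source']} "
                               f"(license: {sample['license']})")
        return notices
    ```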

  • Your scenario would specifically require the cops to ask their techs for a detailed report and then deliberately lie about its conclusions to attack completely random people. And just FYI, the last few rounds of this happened when public WiFi was new, and the cops kept losing so badly in court that it doesn’t really happen much anymore. You don’t even need a great lawyer, just an average one who can find the precedents.

    There are no “additional fingerprints” of relevance binding any node in a tunnel to the communications in the tunnel. It uses PFS and multiple layers of encryption (tunnels within tunnels). They would need to run a debugger against their own node to have any chance of arguing that a specific packet came from a specific node, which would ironically also prove that the node didn’t actually know what it was carrying and was just a blind relay (just like how mailmen aren’t liable for the contents of the packages they deliver).
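
    A toy sketch of the layering idea, just to show why a relay only ever learns the next hop. This is NOT I2P’s actual wire format, key exchange, or tunnel build protocol; it’s plain layered AES-GCM using the Python “cryptography” package:

    ```python
    # Each relay can strip exactly one layer with its own key: it learns
    # where to forward next, while the rest stays an opaque blob.
    import json
    import os

    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def wrap(payload, hops):
        """hops: one (next_hop, hop_key) pair per relay, last relay on the path first."""
        blob = payload
        for next_hop, key in hops:
            nonce = os.urandom(12)
            inner = json.dumps({"next": next_hop}).encode() + b"\x00" + blob
            blob = nonce + AESGCM(key).encrypt(nonce, inner, None)
        return blob

    def relay_unwrap(blob, key):
        """What a single relay can do with only its own key."""
        nonce, ciphertext = blob[:12], blob[12:]
        inner = AESGCM(key).decrypt(nonce, ciphertext, None)
        header, _, rest = inner.partition(b"\x00")
        return json.loads(header)["next"], rest  # next hop + still-encrypted blob

    # Path: sender -> relay1 -> relay2 -> exit, holding keys[0], keys[1], keys[2].
    # keys = [AESGCM.generate_key(bit_length=128) for _ in range(3)]
    # blob = wrap(b"hello", [("destination", keys[2]), ("exit", keys[1]), ("relay2", keys[0])])
    # relay_unwrap(blob, keys[0]) -> ("relay2", <blob that only relay2 can open>)
    ```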

    Your argument is literally the same one used to claim that nobody should have privacy because those who don’t break laws don’t need it, yet you yourself are arguing for why we still need privacy even when we haven’t broken any laws. The collateral damage when such tools aren’t available is so much greater than when privacy tools are available. One of the greatest successes of Signal is how its popularity makes each of its users part of a “haystack” (a large anonymity set), so targeting individual users just for using it is infeasible, which protects endless numbers of minorities and other at-risk individuals.

    In addition, it’s extremely rare that mass surveillance like spying on network traffic leads to prosecutions. It’s usually infiltration that works, so you running an I2P node will make zero difference.


  • 1: then they would go after literally anybody running a node

    2: their client will not see peers on another IP address, it will just see their own I2P node. Any I2P-aware software will also not have any IP addresses as peers, only I2P-specific internal addresses (see the sketch after this list). They will not even be able to associate an incoming connection with any one node without understanding the I2P network statistics console.

    3: by this argument all anonymization tools should be illegal, Signal too, etc, and nobody should help anybody maintain privacy. In the real world there are plenty of reasons why anonymization tools are necessary. And there will be literally zero evidence tying you to a crime; preexisting legal precedent says an IP address alone is not enough.
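
    To illustrate point 2, a minimal sketch assuming a stock I2P router with its HTTP proxy on the default 127.0.0.1:4444 (adjust to your config; “example.i2p” is a made-up eepsite name). The application’s only TCP peer is the local router, so there simply is no remote IP address for it to see or log:

    ```python
    # All traffic goes to the local I2P router; the remote site's real IP
    # never appears anywhere on this machine.
    import requests  # third-party: pip install requests

    proxies = {"http": "http://127.0.0.1:4444"}  # default I2P HTTP proxy (assumed stock config)
    resp = requests.get("http://example.i2p/", proxies=proxies, timeout=120)
    print(resp.status_code)
    ```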



  • This is not how the law is applied to packet switching.

    If it were store-and-forward then maybe, just maybe, law enforcement would care. But anybody smart enough to set up an I2P node to research it, and who tried to track where packets come from, would first see the packets originate from their own local node at 127.0.0.1. Then, in the I2P console, they could see that the packet came in via an active half-tunnel from their own end, interfacing with the endpoint node of the other side’s half-tunnel, and they would know that node has no idea what it’s relaying (just like their ISP doesn’t).