• FaceDeer@fedia.io · 1 day ago

    That article is over a year old. The NYT case against OpenAI turned out to be quite flimsy; their evidence was heavily massaged. What they did was pick an article of theirs that had been widely copied across the Internet (and was thus likely to be “overfit”, a training flaw that AI trainers now actively avoid), give ChatGPT the first 90% of the article, and tell it to complete the rest. They tried over and over until something closely resembling the remaining 10% eventually came out, at which point they took a snapshot and went “aha, copyright violated!”

    They had to spend a lot of effort to get that flimsy case. It likely wouldn’t work on a modern AI; training techniques are much better now, overfitting is more carefully avoided, and synthetic data is widely used.

    Why do you think that of all the observable patterns, the AI will specifically copy “ideas” and “styles” but never copyrighted works of art?

    Because it’s literally physically impossible. The classic example is Stable Diffusion 1.5, which had a model size of around 4 GB and was trained on over 5 billion images (the LAION-5B dataset). If it were actually storing the images it was trained on, it would be compressing each one down to less than a single byte.
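
    A rough back-of-the-envelope check, using those same approximate figures: 4 GB is about 4 × 10⁹ bytes, and 4 × 10⁹ bytes ÷ 5 × 10⁹ images ≈ 0.8 bytes per image, which isn’t even enough to store a single pixel, let alone a copy of the picture.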

    AIs don’t seem to be able to distinguish between abstract ideas like “plumbers fix pipes” and specific copyright-protected works of art.

    This is simply incorrect.

    • patatahooligan@lemmy.world · 1 day ago

      The NYT was just one example. The Mario examples didn’t require any such techniques. Not that it matters. Whether it’s easy or hard to produce such an example, it is definitive proof that the information can in fact be encoded in some way inside the model, contradicting your claim that it cannot be.

      If it were actually storing the images it was trained on, it would be compressing each one down to less than a single byte.

      Storing a copy of the entire dataset is not a prerequisite for reproducing copyright-protected elements of someone’s work. Mario’s likeness is itself a protected work of art even if you don’t exactly reproduce any (let alone every) image that contained him in the training data. Whether the entirety of the dataset can fit inside a model is completely irrelevant to the discussion.

      This is simply incorrect.

      Yet evidence supports it, while you have presented none to support your claims.

      • FaceDeer@fedia.io · 1 day ago

        Learning what a character looks like is not a copyright violation. I’m not a great artist, but I could probably draw a picture that’s recognizably Mario. Does that mean my brain is somehow a violation of copyright?

        Yet evidence supports it, while you have presented none to support your claims.

        I presented some; you actually referenced what I presented in the very comment where you say I presented none.

        You can actually support your case very simply. Just find the case law where AI training has been ruled a copyright violation. It’s been a couple of years now (as evidenced by the age of that news article you dug up), yet all the lawsuits are languishing or defunct.

        • patatahooligan@lemmy.world · 24 hours ago

          Learning what a character looks like is not a copyright violation

          And nobody claimed it was. But you’re claiming that this knowledge cannot possibly be used to make a work that infringes on the original. This analogy about whether brains are copyright violations makes no sense and is not equivalent to your initial claim.

          Just find the case law where AI training has been ruled a copyright violation.

          But that’s not what I claimed is happening. It’s also not the opposite of what you claimed. You claimed that AI training is not even in the domain of copyright, which is different from something that is possibly in that domain, but is ruled to not be infringing. Also, this all started with you responding to another user saying the copyright situation “should be fixed”. As in, they (and I) don’t agree that the current situation is fair. A current court ruling cannot prove anything about whether things should change; citing one as if it could makes no sense.

          Honestly, none of your responses have actually supported your initial position. You’re constantly moving to something else that sounds vaguely similar but is neither equivalent to what you said nor a direct response to my objections.

          • FaceDeer@fedia.io · 23 hours ago

            But you’re claiming that this knowledge cannot possibly be used to make a work that infringes on the original.

            I am not. The only thing I’ve been claiming is that AI training is not a copyright violation, and that the AI model itself is not a copyright violation.

            As an analogy, you can use Photoshop to draw a picture of Mario. That does not mean Photoshop is violating copyright by existing, nor does it mean Adobe violated copyright by creating Photoshop.

            You claimed that AI training is not even in the domain of copyright, which is different from something that is possibly in that domain, but is ruled to not be infringing.

            I have no idea what this means.

            I’m saying that training an AI does not involve any action that falls within the realm of things copyright could actually say anything about. It’s like if there’s a law against walking your dog without a leash, and someone asks “but does it cover aircraft pilots’ licenses?” No, it doesn’t, because there’s absolutely no commonality between the two subjects. It’s nonsensical.

            Honestly, none of your responses have actually supported your initial position.

            I’m pretty sure you’re misinterpreting my position.

            The “copyright situation” regarding an actual, literal picture of Mario doesn’t need to be fixed because it’s already quite clear. There’s nothing that needs to change to make an AI-generated image of Mario count as a copyright violation; that’s what the law already says, and AI’s involvement is irrelevant.

            When people talk about needing to “change copyright” they’re talking about making something that wasn’t illegal previously into something that is illegal after the change. That’s presumably the act of training or running an AI model. What else could they be talking about?