• wonderingwanderer@sopuli.xyz · 4 days ago

    How would they launder it? Just declare it their own property because a few lines of code look similar? When there’s no established connection between the developers and anyone who has access to the closed-source code?

    That makes no sense. Please tell me that wouldn’t hold up in court.

    • lagoon8622@sh.itjust.works · 3 days ago

      Please tell me that wouldn’t hold up in court.

      First tell us how much money you have. Then we’ll be able to predict whether the courts will find in your favor or not.

    • sem@piefed.blahaj.zone · 3 days ago

      First of all, who is going to discover the closed-source use of GPL code and file a lawsuit anyway?

      Second, the LLM ingests the code and then spits it back out, with maybe a few changes. That is how it benefits from copyleft code while stripping the license.

      Maybe a human could do the same thing, but it would take much longer.
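
      A purely hypothetical sketch of what that regurgitation can look like (the function, names, and license header below are all invented for illustration):

        # Hypothetical GPL-licensed original somewhere in the training data:
        #
        #   # SPDX-License-Identifier: GPL-3.0-or-later
        #   def rolling_checksum(data: bytes, window: int) -> list[int]:
        #       return [sum(data[i:i + window]) % 65521
        #               for i in range(len(data) - window + 1)]

        # What the model might emit when asked for "a rolling checksum in
        # Python": the same logic with cosmetic renames, and the license
        # header is gone.
        def sliding_sum(buf: bytes, size: int) -> list[int]:
            """Sliding additive checksum over a byte buffer."""
            return [sum(buf[i:i + size]) % 65521
                    for i in range(len(buf) - size + 1)]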

      • wonderingwanderer@sopuli.xyz · 3 days ago

        Wait, did you just move the goalposts? I thought the issue we were talking about was open-source developers who use LLM-generated code and unwittingly commit changes that contain allegedly closed-source snippets from the LLM’s training data.

        Now you want to talk about LLMs trained on open-source code, with closed-source developers committing changes that contain snippets of GPL code? That’s fine. It’s a change of topic, but we can talk about that too.

        Just don’t expect what I said before about the previous topic of discussion to apply to the new topic. If we’re talking about something different now, I get to say different things. That’s how it works.

        • sem@piefed.blahaj.zone · 3 days ago

          I was responding specifically to this part

          But if an LLM regurgitates closed-source code from its training data, I just can’t see any way how that would be the developer’s fault…

          showing what happens when the LLM regurgitates open-source code into closed-source projects.

          Sorry if you didn’t like that.

          • wonderingwanderer@sopuli.xyz · 2 days ago

            But you flipped the situation, making it an entirely different discussion, and then you went on as if you thought my previous point was still supposed to apply to the new topic that you introduced.

            It’s not that I don’t like it; we can talk about the issues with training commercial LLMs on GPL code. It was just an unannounced change of topic, as if you were trying to score points by bringing up something irrelevant and pretending I was arguing against it, which I wasn’t.

            Corporations have been able to steal open-source code without the help of AI, and the same issues arise due to a lack of transparency. It’s a problem, sure, but it wasn’t the problem we were discussing. Acting like I’m arguing against it being a problem is a strawman, because that’s not what my original point was about.

    • ricecake@sh.itjust.works · 3 days ago

      I believe what they’re referring to is the training of models on open-source code, which is then used to generate closed-source code.
      The break in connection you mention means it isn’t legally infringement, but now code derived from open source is closed source.

      Because of the untested nature of the situation, it’s unclear how it would unfold, likely hinging on how the request was formed.

      We have similar precedent with clean-room reverse engineering, but a non-sentient tool doing the work makes it complicated.

      • wonderingwanderer@sopuli.xyz · 2 days ago

        That makes sense. I see the problem with that, and I don’t have a good solution for it. It is a divergence of topic though, as we were discussing open-source programmers using LLMs which are potentially trained on closed-source code.

        Training LLMs on open-source code is worth its own discussion, but I don’t see how it fits in this thread. The post isn’t about closed-source programmers using LLMs.

        Besides, closed-source code developers could’ve been stealing open-source code all along. They don’t really need AI to do that.

        Still, training LLMs on open-source code is a questionable practice for that reason, particularly when it comes to training commercial models on GPL code. But it’s probably hard to prove what code was used, since the training datasets themselves aren’t public.
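
        About the best an outsider could do is compare a model’s output against known GPL sources after the fact. A toy sketch (everything here is hypothetical, and real infringement analysis involves far more than a string diff):

          import difflib

          # Hypothetical snippet from a known GPL project.
          gpl_snippet = """
          def sliding_sum(buf, size):
              return [sum(buf[i:i + size]) % 65521
                      for i in range(len(buf) - size + 1)]
          """

          # Hypothetical code an LLM emitted into a closed-source project.
          generated = """
          def window_sum(data, n):
              return [sum(data[i:i + n]) % 65521
                      for i in range(len(data) - n + 1)]
          """

          # Crude similarity score: near 1.0 for close copies, lower for
          # genuine rewrites. A high score on a non-trivial span is a red
          # flag, but it proves nothing about what was in the training set.
          ratio = difflib.SequenceMatcher(None, gpl_snippet, generated).ratio()
          print(f"similarity: {ratio:.2f}")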

        • ricecake@sh.itjust.works · 2 days ago

          I don’t really see it as a divergence from the topic, since it’s the other side of a developer not being responsible for the code the LLM produces, like you were saying.
          In any case, it’s not like conversations can’t drift to adjacent topics.

          Besides, closed-source code developers could’ve been stealing open-source code all along. They don’t really need AI to do that.

          Yes, but that’s the point of laundering something. Before, if you put FOSS code in your commercial product, a human could be deposed in the lawsuit and make it public, and then there were consequences. Now you can do it openly and point at the LLM.

          People don’t launder money so they can spend it, they launder money so they can spend it openly.

          Regardless, it wasn’t even my comment, I just understood what they were saying and I’ve already replied way out of proportion to how invested I am in the topic.

          • wonderingwanderer@sopuli.xyz · 1 day ago

            Conversations can drift to adjacent topics, yeah, but it’s not a “gotcha” when someone suddenly flips the topic to the inverse of what was being said and then acts like they’re arguing against you because what you said about the original topic doesn’t hold for the new one.

            If you change the topic, you need to at least give the other person an opportunity to respond to your new topic, not just assume that their same argument applies.

            • ricecake@sh.itjust.works · 1 day ago

              Alright. I didn’t see any gotchas or arguments, and I didn’t make the comment.

              That being said, reading the context I assume you’re referring to, it hardly reads as anything more than a discussion of the implications of the idea you shared.
              Disagreeing because applying the argument consistently results in an undesirable outcome isn’t objectionable.

              • wonderingwanderer@sopuli.xyz · 12 hours ago

                Disagreeing because applying the argument consistently results in an undesirable outcome isn’t objectionable.

                I’m not objecting to disagreement; I’m objecting to the attempt to apply my argument to a different situation it wasn’t meant for, and then carrying on as if that’s remotely what I was saying.

                That’s not “applying the argument consistently”, it’s removing context, overgeneralizing the argument, and applying a strawman based on a twisted version of it.

                Open-source developers using AI trained on closed-source code and closed-source developers using AI trained on open-source code are two different issues. My point was only intended to apply to the former, because that’s what we were talking about. Trying to apply what I said to the latter is a distortion of my argument, not the argument I was making.

                And to try to conflate the two is to be allergic to nuance, which is honestly just typical and unsurprising, but if that’s the case then I’m done wasting my time on this conversation.

                • ricecake@sh.itjust.works · 6 hours ago

                  I’m really not interested in the topic. I’m talking because I explained what someone else meant and you started responding as though that was an opinion or argument I was making.

                  That’s not “applying the argument consistently”, it’s removing context, overgeneralizing the argument, and applying a strawman based on a twisted version of it.

                  It’s really not.
                  It’s not unreasonable for someone to think “developers who use copyrighted code from AI aren’t liable for infringement” applies to closed-source devs as well as open-source ones, and to disagree because they don’t like one of those outcomes.
                  It’s perfectly valid for you to also disagree and say the statement shouldn’t apply both ways, but that doesn’t make the other statement a non sequitur.

                  • wonderingwanderer@sopuli.xyz · 4 hours ago

                    If you’re not interested, then why are you still here saying the same thing over and over again?

                    It’s perfectly fine if someone wants to make a claim that “we should apply the same argument across both situations,” and then I would give my reasoning as to why different arguments apply. But that’s not what happened.

                    What happened was, I gave an argument applied to the situation being discussed. Someone else then applied my argument to a different situation in order to argue against a point I didn’t make. Ever since, this conversation has gone in circles: you and that other commenter keep arguing as if I said something I never said, and I keep repeating that it’s not what I said.

                    And if you read back through this chain, I never said it. I even said I can understand the other point of view, and would probably agree with it if that were the conversation we were having. We could even have that conversation; but a sudden change of topic as an attempt to “score points” against me is not a good-faith style of argumentation.

                    Is it a problem if commercial LLMs are trained on GPL code, and then used by closed-source developers to generate proprietary code which potentially contains open-source snippets? Yes, I’ve never denied that. But that’s not what this conversation has been about.

                    From the start, it’s been about open-source developers using LLMs to write open-source code, when those LLMs are potentially trained on closed-source code and may generate snippets closely resembling closed-source code.

                    Those are fundamentally different situations, and if you can’t see that, I can break it down for you in minute detail. But the point I made about the one was never meant to apply to the other, and arguing against it as if it were is a bad-faith argument.