• Fmstrat@lemmy.world
    English
    arrow-up
    1
    ·
    16 minutes ago

    This seems like an invalid test.

    One of them collected posts from Hacker News and LinkedIn profiles and then linked them by using cross-platform references that appeared in user profiles. They then stripped all identifying references from the posts and ran a large language model on them.

    If I post something on LinkedIn, and then post the same thing on Hacker News, of course an LLM could match my accounts up.

    Am I missing something?

  • FlashMobOfOne@lemmy.world
    English
    arrow-up
    33
    arrow-down
    1
    ·
    1 day ago

    And it will falsely identify people at even greater scale, because it is an imprecise and buggy tool.

  • ShotDonkey@lemmy.world
    English
    arrow-up
    7
    ·
    23 hours ago

    The results, especially the high numbers stated in the news article (68% recall, 90% accuracy), are overestimated, because their verification method (i.e., checking whether the LLM really detected the right account) comes from matching verified accounts with a test set of anonymous accounts for which they knew the real name. They knew the real name because those people had a public link to their LinkedIn in their “anonymous” profile (which was removed for the sake of testing whether the LLM can match the two accounts). That being said: a user who uses a pseudonym but publicly links his/her account to, say, a LinkedIn account doesn’t really care about anonymity and might hand out many more ‘breadcrumbs’ to follow than a truly anonymous account.

    But I still think that even in the case of a fully anonymous account, people can be fingerprinted and matched with non-anonymous identities by an LLM, based on language, style, etc.
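
    For anyone unsure what those two numbers measure, here is a minimal sketch. It assumes the article's "90% accuracy" means precision (the share of proposed matches that are correct), and the counts are made up purely so the result lands near the quoted figures; they are not from the paper.

```python
def precision_recall(true_positives: int, false_positives: int,
                     false_negatives: int) -> tuple[float, float]:
    """Return (precision, recall) for a set of proposed account matches."""
    proposed = true_positives + false_positives   # links the model claimed
    actual = true_positives + false_negatives     # links that really exist
    precision = true_positives / proposed if proposed else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 100 accounts really have a known counterpart,
# the model proposes 75 links, and 68 of those proposals are correct.
precision, recall = precision_recall(
    true_positives=68, false_positives=7, false_negatives=32
)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.91 recall=0.68"
```

    Both numbers only mean something relative to how the test set was built, which is the point above: if the test accounts are unusually easy to link, both figures flatter the method.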

    • GamingChairModel@lemmy.world
      English
      arrow-up
      1
      ·
      4 hours ago

      Reminds me of an AI tool that could identify authorship of articles with surprisingly high accuracy, and then they peeked under the hood and realized it was just looking for the author byline at the top of the article that says “By John Doe,” where it completely failed if the article didn’t explicitly say who the author was.

  • ComradePenguin@lemmy.ml
    English
    arrow-up
    8
    ·
    1 day ago

    Is this the first step towards using local LLMs for anonymity? 🫠 Always rephrasing each sentence somewhat. Truly dystopian stuff

  • ne0phyte@feddit.org
    English
    arrow-up
    28
    ·
    1 day ago

    I am so grateful for already having been paranoid about sharing anything identifying about me starting 15+ years ago.

    I never uploaded a picture of myself. Never used my real name anywhere. I used different nicks for different branches of the Internet. A plethora of different email addresses etc.

    People thought I was being overly careful, and I probably missed a lot of things by not using WhatsApp, Facebook, Instagram, Twitter, or Snapchat, but I can’t say I regretted it at any point.

    • TankovayaDiviziya@lemmy.world
      English
      arrow-up
      4
      ·
      24 hours ago

      Doing those things is not unreasonable, but not even having a bank account is going way too far. I know of someone, later diagnosed with autism and without a job due to the condition, who initially didn’t want a bank account for fear of online snooping.

      Minimising your digital footprint is perfectly fine, but trying to be off the grid while still wanting to participate in society and engage in consumption is unreasonable. And this thinking isn’t limited to one person: I saw many users in Reddit privacy communities stressing themselves out trying to completely wipe their digital footprints. Unless you participate in political activities, or really just want to live completely isolated in a forest, being off the grid is totally unreasonable.

    • Scrollone@feddit.it
      English
      arrow-up
      17
      ·
      1 day ago

      It’s not enough. You should use a different writing style for each website you write on.

  • jballs@sh.itjust.works
    English
    arrow-up
    135
    ·
    2 days ago

    As a registered Republican woman from Texas with five children and two dogs, let me just say that I am astonished!

    • pivot_root@lemmy.world
      English
      arrow-up
      49
      ·
      2 days ago

      Me too. I thought I was safe as an Ottoman Empire expatriate living in Arrakis! I don’t want LLMs to connect this account to my pseudonymous mommy blog where I write about my three children who might exist but could be delusions of my untreated schizophrenia.

      • CheesyFingers@piefed.social
        English
        arrow-up
        18
        ·
        2 days ago

        It seems that I, the original Unidan, will unfortunately need to create even more alts to escape being found out. Blast!

      • potoooooooo ✅️@lemmy.world
        English
        arrow-up
        9
        ·
        2 days ago

        Oh, WE EXIST, mommy! Let me assure you, as one of said imaginary schizophrenia babies. Currently shacking up in Miami with my new wife I just met cranking my hog at Sturgis.

      • Bigfishbest@lemmy.world
        English
        arrow-up
        8
        ·
        2 days ago

        I don’t believe this! As a fumgrian living as a would be dead camoose off Mt. Kabul, I am overjizzed that AI is reading all my pornhub comments.

    • whaleross@lemmy.world
      English
      arrow-up
      9
      ·
      edit-2
      1 day ago

      As true as my name is Brenda and my last name is also Brenda. And so is my husband, Brenda. It is a hot day in Texas America today, I’m going to grill one of our dogs for dinner. It is a hot day republican tradition to grill a dog. Hence the name Hot Dogs and the playful name Wieners, named after wiener dogs. Oh lordy bless you heart yeehaa.

  • FauxPseudo @lemmy.world
    English
    arrow-up
    89
    arrow-down
    1
    ·
    2 days ago

    From a Facebook post I made on February 17th:

    There are giant AI data firms that promise they can go through massive troves of data and pull out general and specific information from them. Information that is actionable and accurate. Give it 6 million data points and it’ll find all the links and organize them for you and unmask hidden details that aren’t visible to the naked eye.

    Not one of those companies is stepping up to go through the publicly released Epstein files.

    • Spaniard@lemmy.world
      English
      arrow-up
      4
      ·
      edit-2
      1 day ago

      Today I asked AI to tell me which phone providers were available, sorted by price and offers, and it lied the whole time. When I pointed it out, the AI corrected most of it but also removed some entries that were accurate, for some reason.

      It would have been quicker to do it myself instead of asking AI. Oh, and it also didn’t list all the companies.

      Maybe those companies have better AI that makes no mistakes, but I doubt it. I think the LLMs will lie and no one has time to check whether they are correct.

        • Spaniard@lemmy.world
          English
          arrow-up
          2
          ·
          edit-2
          1 day ago

          How come it ended up giving me the right answer, then, albeit while removing some previously right answers? (It removed a few companies for some reason.)

          Anyway, that was a small and easy-to-check piece of misinformation, but if they have over three decades of online information about me, no way is a person going to confirm the LLM didn’t bullshit its way to an answer to satisfy the human.

          • madmantis24@lemmy.wtf
            English
            arrow-up
            3
            ·
            1 day ago

            These models aren’t going to produce accurate information about the people they investigate, and it won’t even matter if it’s accurate. What “matters” is that their reports will add new layers of the facade of legitimacy to whatever story the authorities using them want to construct

    • Randomgal@lemmy.ca
      English
      arrow-up
      30
      ·
      2 days ago

      This is what I find crazy. Where are the AI bros chewing through the Epstein files?

      • osaerisxero@kbin.melroy.org
        arrow-up
        21
        ·
        2 days ago

        I would be shocked if someone hasn’t shoved them into a local model somewhere, but all the big ones would filter them to death with content restrictions

        • General_Effort@lemmy.world
          English
          arrow-up
          2
          ·
          1 day ago

          I don’t think you can do literally the same thing on the Epstein files. Maybe I’m misunderstanding what you have in mind.

          • FauxPseudo @lemmy.world
            English
            arrow-up
            1
            ·
            1 day ago

            In theory, using the information in the released files and the information in public sources, it should be possible to figure out who those redacted names are based on writing style and other factors. We should be able to deanonymize.

            • General_Effort@lemmy.world
              English
              arrow-up
              1
              ·
              22 hours ago

              Hmm. Maybe but it is not the same problem as those discussed in OP. I also have some doubts about the paper, but that’s another story. You could try it out?

              • FauxPseudo @lemmy.world
                English
                arrow-up
                1
                ·
                19 hours ago

                I’m not qualified to design the prompts and home users can’t really pile in 3 million+ documents.

                • General_Effort@lemmy.world
                  English
                  arrow-up
                  1
                  arrow-down
                  1
                  ·
                  12 hours ago

                  Prompts are in the appendix: https://arxiv.org/abs/2602.16800

                  I don’t know how far you get on the free tier, but it should be at least enough for a proof of principle that gets other people to chip in. You didn’t have qualms about demanding that other people do this for free.

                  Mind that this is a serious GDPR violation in Europe. So there will be serious pressure on AI companies to prevent this kind of use.

    • Mubelotix@jlai.lu
      English
      arrow-up
      2
      arrow-down
      1
      ·
      2 days ago

      We wouldn’t want that tbh. Justice needs to be precise and backed up by tangible facts

      • KeenFlame@feddit.nu
        English
        arrow-up
        4
        ·
        2 days ago

        Also don’t use DNA tests or chemical analysis. It’s invisible hocus pocus and can be wrong! And woe if someone that fucks and tortures kids regularly is wrongly accused of raping kids and ruining their child minds, no, that would be awful.

      • FauxPseudo @lemmy.world
        English
        arrow-up
        3
        ·
        2 days ago

        You can use the results of the AI analysis to identify people and then use that to do a proper investigation. Right now none of that is happening. No speculation. No tangibles. No investigation. No indictment.

        Trying to unmask people is a step in the right direction.

  • tal@lemmy.today
    English
    arrow-up
    39
    ·
    2 days ago

    Of course, another option is for people to dramatically curb their use of social media, or at a minimum, regularly delete posts after a set time threshold.

    Deletion won’t deal with someone seriously-interested in harvesting stuff, because they can log it as it becomes available. And curbing use isn’t ideal.

    I mentioned before the possibility of poisoning data, like, sporadically adding some incorrect information about oneself into one’s comments. Ideally something that doesn’t impact the meaning of the comments, but would cause a computer to associate one with someone else.

    There are some other issues. My guess is that it’s probably possible to fingerprint someone to a substantial degree by the phrasing that they use. One mole in the counterintelligence portion of the FBI, Robert Hanssen, was found because on two occasions he used the unusual phrase “the purple-pissing Japanese”.

    FBI investigators later made progress during an operation where they paid disaffected Russian intelligence officers to deliver information on moles. They paid $7 million to KGB agent Aleksander Shcherbakov[48] who had access to a file on “B”. While it did not contain Hanssen’s name, among the information was an audiotape of a July 21, 1986, conversation between “B” and KGB agent Aleksander Fefelov.[49] FBI agent Michael Waguespack recognized the voice in the tape, but could not remember who it was from. Rifling through the rest of the files, they found notes of the mole using a quote from George S. Patton’s speech to the Third Army about “the purple-pissing Japanese”.[50] FBI analyst Bob King remembered Hanssen using that same quote. Waguespack listened to the tape again and recognized the voice as Hanssen’s. With the mole finally identified, locations, dates, and cases were matched with Hanssen’s activities during the period. Two fingerprints collected from a trash bag in the file were analyzed and proved to be Hanssen’s.[51][52][53]
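
    To make the phrasing idea concrete, here is a rough sketch of that kind of match: look for word sequences that two texts share but that don't show up in ordinary reference text. The function names, the three-word window, and the sample texts are all my own assumptions for illustration; real stylometry uses far more signals than this.

```python
import re


def ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase the text and return its set of n-word sequences."""
    words = re.findall(r"[a-z'-]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_rare_phrases(text_a: str, text_b: str, background: str,
                        n: int = 3) -> set[tuple[str, ...]]:
    """n-grams found in both texts but absent from a background sample."""
    return (ngrams(text_a, n) & ngrams(text_b, n)) - ngrams(background, n)


# Made-up texts: the same odd simile appears in a work memo and a forum post.
memo = "The schedule is as wobbly as a three-legged heron, so plan for delays."
post = "Honestly my router is as wobbly as a three-legged heron these days."
background = ("The schedule is tight so plan for delays. "
              "My router is fine these days. It is as good as a new one.")

hits = shared_rare_phrases(memo, post, background)
print(sorted(" ".join(h) for h in hits))
```

    A single unusual phrase turning up on both sides is weak evidence on its own, but it narrows the field a lot, which is roughly what happened in the Hanssen case.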

    That might be defeated by passing text through something like an LLM to rewrite it. So, for example, to take a snippet of my above comment:

    Respond with the following text rephrased sentence by sentence, concisely written as a British computer scientist might write it:

    Deletion won’t deal with someone seriously-interested in harvesting stuff, because they can log it as it becomes available. And curbing use isn’t ideal.

    I mentioned before the possibility of poisoning data, like, sporadically adding some incorrect information about oneself into one’s comments. Ideally something that doesn’t impact the meaning of the comments, but would cause a computer to associate one with someone else.

    I get:

    The deletion of data alone will not prevent a determined party from gathering information, as they may simply record the information as it becomes available prior to its deletion. Moreover, restricting usage is not an ideal solution to the problem at hand.

    I previously mentioned the possibility of introducing deliberate errors or misinformation into one’s own data, such as periodically inserting inaccurate details about oneself within comments. The goal would be to include information that does not significantly alter the meaning of the comment, but which would cause automated systems to incorrectly associate that individual with another person.

    That might work. One would have to check the comment to make sure that it doesn’t mangle the thing to the point that it is incorrect, but it might defeat profiling based on phrasing peculiarities of a given person, especially if many users used a similar “profile” for comment re-writing.
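
    If one wanted to script this, a minimal sketch might look like the following. The prompt wording is the one I used above; `generate` is a placeholder for whatever function sends a prompt to your local model and returns its reply (I'm not assuming any particular library), and `echo_model` is only a stand-in so the example runs without a model.

```python
from typing import Callable

# Same instruction as in the comment above, with the persona made swappable.
PROMPT_TEMPLATE = (
    "Respond with the following text rephrased sentence by sentence, "
    "concisely written as {persona} might write it:\n\n{text}"
)


def build_rewrite_prompt(text: str,
                         persona: str = "a British computer scientist") -> str:
    """Wrap a draft comment in the rephrasing instruction."""
    return PROMPT_TEMPLATE.format(persona=persona, text=text.strip())


def rewrite_comment(text: str, generate: Callable[[str], str],
                    persona: str = "a British computer scientist") -> str:
    """Send the wrapped draft to a user-supplied model function."""
    return generate(build_rewrite_prompt(text, persona)).strip()


def echo_model(prompt: str) -> str:
    """Stand-in for a real model: returns the draft unchanged."""
    return prompt.split("\n\n", 1)[1]


draft = "Deletion won't deal with someone seriously interested in harvesting stuff."
print(build_rewrite_prompt(draft))
print(rewrite_comment(draft, echo_model))
```

    The output still needs a human read before posting, for the reason given above: the rewrite can change the meaning.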

    A second problem is that one’s interests are probably something of a fingerprint. It might be possible to use separate accounts related to separate interests — for example, instead of having one account, having an account per community or similar. That does undermine the ability to use reputation generated elsewhere (“Oh, user X has been providing helpful information for five years over in community X, so they’re likely to also be doing so in community Y”), which kind of degrades online communities, but it’s better than just dropping pseudonymity and going 4chan-style fully anonymous and completely losing reputation.

    • GamingChairModel@lemmy.world
      English
      arrow-up
      2
      ·
      2 hours ago

      It might be possible to use separate accounts related to separate interests

      That’s what people should do. And the natural consequence is that there is code switching, where people subtly use different jargon and references and writing style when talking to different audiences.

      Nobody is gonna correlate my shitposts or joke comments to my work email, because the way I write in a professional environment is totally different from the way I write with my friends and family, or in casual contexts organized around different interests. Even between different friends, family, or colleagues, I have a sense of my audience, and my tone/style differs significantly for different people.

      So at that point, if I have a Linux/technology account and a separate account for the sports I like and a separate account for the local things happening in my city, who’s going to be able to link them by their very different textual styles?

    • HyperfocusSurfer@lemmy.dbzer0.com
      English
      arrow-up
      4
      ·
      edit-2
      2 days ago

      Regarding the last point: it’s more of a bias, tho, so reducing it may even be a good thing. E.g. asking Kent Overstreet’s opinion on your bcachefs setup is probably useful, while getting relationship advice from him is ill-advised.

      • regenwetter@piefed.social
        English
        arrow-up
        3
        ·
        2 days ago

        Advice being right or wrong isn’t necessarily the big issue for online communities (unless most other users are also wrong). What really degrades them is users acting like assholes, and someone who acts like that in a tech community is fairly likely to also do that in a political or relationship community.

    • zerofk@lemmy.zip
      English
      arrow-up
      1
      ·
      2 days ago

      Your above average use of the word “one” and variations like “one’s” could be quite telling.

      As could my correction of “it’s” in the above sentence.
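
      For the curious, measuring that sort of tell is not hard. A rough sketch, where the function name and the sample sentence are made up for illustration, and where you would still need a reference corpus (not included here) to know what "above average" actually is:

```python
import re
from collections import Counter


def rate_per_thousand(text: str, targets: set[str]) -> dict[str, float]:
    """How often each target word occurs per 1,000 words of text."""
    # Treat typographic apostrophes like plain ones so "one’s" == "one's".
    normalised = text.lower().replace("’", "'")
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", normalised)
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {word: 1000 * counts[word] / total for word in sorted(targets)}


sample = "One would have to check one’s comment. One might add errors about oneself."
rates = rate_per_thousand(sample, {"one", "one's", "oneself"})
for word, rate in rates.items():
    print(f"{word}: {rate:.1f} per 1,000 words")
```

      Compare those rates across two accounts, or against typical usage, and a habit like this starts to stand out.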