We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to prior deanonymization work (e.g., on the Netflix prize) that required structured data or manual feature engineering, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities, and the third splits a single user’s Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
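The candidate-search stage and the "recall at 90% precision" metric from the abstract can be illustrated with a small sketch. It is not the paper's pipeline. The embedding vectors are hand-written toy values standing in for the output of a real embedding model, and the profile ids (`hn_alice`, `li_dave`, ...) are hypothetical. Stage (3)'s LLM verification is replaced here by simply trusting the top cosine-similarity candidate and thresholding on its score. Recall is taken as correct matches over all queries, with the attacker abstaining below the threshold; this is one common convention and not necessarily the paper's exact definition.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_candidates(query_vec, database, k):
    """Stand-in for stage (2): rank database profiles by cosine similarity to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), pid) for pid, vec in database.items()),
        reverse=True,
    )
    return [(pid, score) for score, pid in scored[:k]]

def recall_at_precision(predictions, truth, target_precision):
    """Best recall achievable while keeping precision >= target_precision.

    predictions: list of (query_id, predicted_id, score), one guess per query.
    truth: dict mapping query_id -> true matching id.
    Guesses are accepted from the highest score down; anything below the
    cut-off is an abstention (it lowers recall but not precision).
    Assumed convention, not necessarily the paper's definition.
    """
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)
    correct = 0
    best = 0.0
    for accepted, (qid, pred, _score) in enumerate(ranked, start=1):
        if truth.get(qid) == pred:
            correct += 1
        precision = correct / accepted
        recall = correct / len(truth)
        if precision >= target_precision:
            best = max(best, recall)
    return best

# Hypothetical toy embeddings for two pseudonymous "databases".
left = {
    "hn_alice": [0.9, 0.1, 0.0],
    "hn_bob": [0.1, 0.8, 0.2],
    "hn_carol": [0.0, 0.2, 0.9],
}
right = {
    "li_alice": [0.85, 0.15, 0.05],
    "li_bob": [0.2, 0.7, 0.3],
    "li_carol": [0.3, 0.3, 0.3],
    "li_dave": [0.0, 0.1, 1.0],  # distractor with no counterpart in `left`
}
truth = {"hn_alice": "li_alice", "hn_bob": "li_bob", "hn_carol": "li_carol"}

# Trust the single best candidate per query (no LLM verification step here).
predictions = []
for qid, vec in left.items():
    best_id, best_score = top_k_candidates(vec, right, k=1)[0]
    predictions.append((qid, best_id, best_score))

for target in (0.9, 0.6):
    print(f"recall at {target:.0%} precision: "
          f"{recall_at_precision(predictions, truth, target):.2f}")
```

On this toy data the wrong guess for `hn_carol` (the distractor `li_dave`) scores higher than the correct guess for `hn_bob`, so at a 90% precision target only the single most confident match survives, giving a recall of one in three; relaxing the target to 60% lets the sketch commit to two correct matches. Filtering out confident false positives like this one is the job the abstract assigns to the verification step.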

  • Silver Needle@lemmy.ca

    I am saying that the internet, as an international object, is antithetical to nations: its control panel sits not in one nation but in all of them. Nations therefore seek to nerf it, only for it to return stronger and even more difficult to regulate as more and more people adapt to internationalized organizational patterns. As a secondary effect, a real cultural unification is happening across borders. I’ve seen people call it “discordization”, because people are starting to talk the way people talk in Discord chatrooms.

    Yes, so from a state’s point of view you do have to restrict access and, notably, deanonymize users. California is trying to force OSes to implement age checking, which is of course a way to unmask people online. Protectionism cannot be understood merely as a set of possible tax policies; it is precisely the regaining of nation-centralized control in any sphere of life. States do not want people to be able to choose who to hang out with when the pool is the entire world, and they have no interest in letting subjects learn about reality beyond the threshold where the scope of a person’s understanding exceeds national borders.

    What I am getting at, exactly, is the social structure that humans find themselves in. When relations and hierarchies are on the brink of flattening, that is, when everyone is linked to the next person symmetrically, as in a family or in small communities 5000 years ago, states, companies and even small businesses feel compelled to operate in ways that preserve their asymmetrical position in society. As it happens, the internet is extremely good at producing flat social structures; anonymity, reach, openness and near-infinite scalability make that possible. You may be able to neutralize one netizen or manipulate one online community, but by the time you have, five hundred heads of the hydra have regrown. The costs don’t work out.