Disaster LLMs can unmask pseudonymous users at scale with surprising accuracy - Pseudonymity has never been perfect for preserving privacy. Soon it may be pointless. -- FULLY DOXXED


Burner accounts on social media sites can increasingly be analyzed with AI to identify the pseudonymous users who post to them, a finding with far-reaching consequences for privacy on the Internet, researchers said.

The finding, from a recently published research paper, is based on results of experiments correlating specific individuals with accounts or posts across more than one social media platform. The success rate was far greater than that of existing classical deanonymization work, which relied on humans assembling structured data sets suitable for algorithmic matching or on manual work by skilled investigators. Recall (that is, how many users were successfully deanonymized) was as high as 68 percent. Precision (meaning the rate of guesses that correctly identify the user) was up to 90 percent.
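
To make the two metrics concrete, here is a minimal Python sketch of how recall and precision could be computed for an attacker who is allowed to skip accounts it is unsure about. The function name, the dictionary layout, and the toy numbers are assumptions made for illustration; they are not taken from the paper.

```python
def precision_recall(guesses, truth):
    """Compute (precision, recall) for a deanonymization attempt.

    guesses: dict mapping account -> guessed identity, containing only the
             accounts the attacker chose to guess on (hypothetical layout).
    truth:   dict mapping every account in the test set -> true identity.
    """
    correct = sum(1 for acct, ident in guesses.items() if truth.get(acct) == ident)
    # Precision: share of the guesses made that are right.
    precision = correct / len(guesses) if guesses else 0.0
    # Recall: share of all users in the test set that were identified.
    recall = correct / len(truth) if truth else 0.0
    return precision, recall


# Toy data: 10 accounts, the attacker guesses on 5 and gets 4 right.
truth = {f"acct{i}": f"person{i}" for i in range(10)}
guesses = {"acct0": "person0", "acct1": "person1", "acct2": "person2",
           "acct3": "person3", "acct4": "someone_else"}
precision, recall = precision_recall(guesses, truth)
```

In this toy case precision is 4 out of 5 guesses and recall is 4 out of 10 users, which shows how an attacker can trade one for the other by guessing only when confident.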

I know what you posted last year

The findings have the potential to upend pseudonymity, an imperfect but often sufficient privacy measure used by many people to post queries and participate in sometimes sensitive public discussions while making it hard for others to positively identify the speakers. The ability to cheaply and quickly identify the people behind such obscured accounts opens them up to doxxing, stalking, and the assembly of detailed marketing profiles that track where speakers live, what they do for a living, and other personal information. If the findings hold up, that privacy measure may no longer be sufficient.

“Our findings have significant implications for online privacy,” the researchers wrote. “The average online user has long operated under an implicit threat model where they have assumed pseudonymity provides adequate protection because targeted deanonymization would require extensive effort. LLMs invalidate this assumption.”

An overview of the pseudonymity-stripping framework.

The researchers collected several datasets from public social media sites to test the techniques while preserving the privacy of the speakers. One dataset collected posts from Hacker News and LinkedIn profiles and linked them using cross-platform references that appeared in user profiles. The researchers then stripped all identifying references from the posts and ran a large language model on them. A second dataset was obtained from a Netflix release of micro-identities, such as individual preferences, recommendations, and transaction records. A 2008 research paper showed that, using what has come to be known as the Netflix prize attack, the list could be used to identify users and infer their political affiliations and other personal information. The last technique split a single user’s Reddit history.

“What we found is that these AI agents can do something that was previously very difficult: starting from free text (like an anonymized interview transcript) they can work their way to the full identity of a person,” Simon Lermen, a co-author of the paper, told Ars. “This is a pretty new capability; previous approaches on re-identification generally required structured data, and two datasets with a similar schema that could be linked together.”

Unlike those older pseudonymity-stripping methods, Lermen said, AI agents can browse the web and interact with it in many of the same ways humans do. They can use simulated reasoning to match potential individuals. In one experiment, the researchers looked at responses given in a questionnaire Anthropic conducted about how various people use AI in their daily lives. Using the information taken from the answers, the researchers were able to positively identify 7 percent of 125 participants.

End-to-end deanonymization from a single interview transcript (with details altered to protect the subject’s identity). An LLM agent extracted structured identity signals from a conversation, autonomously searched the web to identify a candidate individual, and verified the candidate matched all extracted claims.

While a 7 percent recall is relatively low, it demonstrates the growing capability of AI to identify people based on very general information they gave. “The fact that AI can do this at all is a noteworthy result,” Lermen said. “And as AI systems get better, they will likely get better at finding more and more identities.”

In a second experiment, the researchers gathered comments made in 2024 from the r/movies subreddit and at least one of five smaller communities: r/horror, r/MovieSuggestions, r/Letterboxd, r/TrueFilm, and r/MovieDetails. The results showed that the more movies a candidate discussed, the easier it was to identify them. Among users sharing one movie, an average of 3.1 percent could be identified at 90 percent precision and 1.2 percent at 99 percent precision. With five to nine shared movies, those figures rose to 8.4 percent and 2.5 percent, respectively. With more than 10 shared movies, they climbed to 48.1 percent and 17 percent.

Recall at various precision thresholds.
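
A figure such as "3.1 percent of users at 90 percent precision" is a recall-at-precision number. The sketch below shows one common way to compute it, assuming each guess comes with a confidence score: rank guesses from most to least confident, then find the longest prefix of that ranking whose precision still meets the target. The function name, the tuple layout, and the sample scores are hypothetical, and the paper's exact procedure may differ.

```python
def recall_at_precision(scored_guesses, total_users, target_precision):
    """Best recall achievable while keeping precision >= target_precision.

    scored_guesses: list of (confidence, is_correct) pairs, one per guess.
    total_users:    number of users in the test set who could be identified.
    """
    # The attacker commits to its most confident guesses first.
    ranked = sorted(scored_guesses, key=lambda g: g[0], reverse=True)
    best_recall = 0.0
    correct = 0
    for made, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        # Keep this cutoff only if the guesses made so far are precise enough.
        if correct / made >= target_precision:
            best_recall = max(best_recall, correct / total_users)
    return best_recall


# Made-up scores: five guesses over a test set of 10 users.
sample = [(0.99, True), (0.95, True), (0.90, False), (0.80, True), (0.60, False)]
strict = recall_at_precision(sample, total_users=10, target_precision=0.99)
loose = recall_at_precision(sample, total_users=10, target_precision=0.75)
```

With these made-up scores, the strict 99 percent target admits only the top two guesses, while the looser 75 percent target admits the top four, so recall rises as the precision requirement falls.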

In a third experiment, the researchers took a set of 5,000 Reddit users and added 5,000 “distraction” identities of Reddit users to the candidate pool. They then added to the list of 10,000 candidate profiles 5,000 query distractors, users who appear only in the query set and have no true match in the candidate pool.

The researchers compared their method against a classical baseline that mimics the older Netflix prize attack. LLM deanonymization far outperformed the baseline.


The researchers wrote:

“(a) The precision of classical attacks drops very fast, explaining its low recall. In contrast, the precision of LLM-based attacks decays more gracefully as the attacker makes more guesses. (b) The classical attack almost fails completely even at moderately low precision. In contrast, even the simplest LLM attack (Search) achieves non-trivial recall at low precision, and extending it with Reason and Calibrate steps doubles Recall @99% Precision.”

The results show that LLMs, while still prone to false positives and other weaknesses, are quickly outstripping more traditional, resource-intensive methods for identifying users online.

The researchers went on to propose mitigations, including for platforms to enforce rate limits on API access to user data, detect automated scraping, and restrict bulk data exports. LLM providers could also monitor for the misuse of their models in deanonymization attacks and build guardrails that make models refuse deanonymization requests.
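
As one illustration of the rate-limit mitigation, here is a minimal token-bucket sketch in Python. It is a generic technique, not something the paper or any particular platform is known to use; the class name, the parameters, and the injectable clock are assumptions made for this example.

```python
import time


class TokenBucket:
    """Allow short bursts of requests but cap the sustained request rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity                    # maximum burst size
        self.refill_per_second = refill_per_second  # sustained requests per second
        self.clock = clock                          # injectable so it can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Return True if a request may proceed, consuming one token."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Usage with a fake clock: a burst of 2 is allowed, then 1 request per second.
fake_time = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1.0, clock=lambda: fake_time[0])
burst = [bucket.allow(), bucket.allow(), bucket.allow()]
fake_time[0] = 1.0
after_wait = bucket.allow()
```

A platform would typically keep one bucket per API key or account, so a scraper pulling thousands of profiles hits the cap quickly while ordinary use is unaffected.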

Of course, another option is for people to dramatically curb their use of social media, or at a minimum, regularly delete posts after a set time threshold.

If LLMs’ success in deanonymizing people improves, the researchers warn, governments could use the techniques to unmask online critics, corporations could assemble customer profiles for “hyper-targeted advertising,” and attackers could build profiles of targets at scale to launch highly personalized social engineering scams.

“Recent advances in LLM capabilities have made it clear that there is an urgent need to rethink various aspects of computer security in the wake of LLM-driven offensive cyber capabilities,” the researchers warned. “Our work shows that the same is likely true for privacy as well.”

About the Author...

Dan Goodin

Senior Security Editor
dan.goodin@arstechnica.com

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. A journalist with more than 25 years of experience, he has been chronicling the exploits of white-hat, grey-hat and black-hat hackers since 2005 as a reporter for the Associated Press and later, The Register. He has a Bachelor's Degree in English from the University of Massachusetts and a Master's of Journalism from UC Berkeley. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Based in San Francisco, Dan can receive encrypted messages over Signal at DanArs.82. You can also find him on Mastodon and on Bluesky.
 
So all I have to do is have a dog named "Spot" and suddenly I'm another person?

I feel sorry for all those poor saps with dogs named Blondi now 😨
Felt cute, might get a cat later, name it...something Man. Still hammering out the details.
 
LLMs’ success in deanonymizing people improves, the researchers warn, governments could use the techniques to unmask online critics,

They didn't need LLMs to do it and you're several years late on that one. Anyone who believes that they can write anonymously on the internet anymore is a fool.
 
They didn't need LLMs to do it and you're several years late on that one. Anyone who believes that they can write anonymously on the internet anymore is a fool.
Anonymous writing isn't dead — it's just harder. The real shift with LLMs isn't capability, it's accessibility; what once took forensic specialists is now widely available, which changes the threat landscape without eliminating anonymity entirely. Countermeasures like AI-assisted style obfuscation have evolved alongside the attacks. Calling anyone who tries a "fool" just discourages whistleblowers and dissidents who still have legitimate reasons to try.

Prompt: Respond to this post with a counter-claim: They didn't need LLMs to do it and your several years late on that one. Anyone who believes that they can write anonymously on the internet anymore is a fool.
Feeding what you want to say through an LLM is an easy counter.
 
Anonymous writing isn't dead — it's just harder. The real shift with LLMs isn't capability, it's accessibility; what once took forensic specialists is now widely available, which changes the threat landscape without eliminating anonymity entirely. Countermeasures like AI-assisted style obfuscation have evolved alongside the attacks. Calling anyone who tries a "fool" just discourages whistleblowers and dissidents who still have legitimate reasons to try.

Prompt: Respond to this post with a counter-claim: They didn't need LLMs to do it and your several years late on that one. Anyone who believes that they can write anonymously on the internet anymore is a fool.
Feeding what you want to say through an LLM is an easy counter.
Even just deniability. "I've been accused of being a half dozen random dudes in Catonsville, MD"
 
Stop oversharing, faggots. Nobody fucking cares how you feel about the rain in Portland and if you think we you, you fucking deserve it.
 
It's pretty easy to compartmentalize your personality and writing style to remain anonymous. I use the n-word(nigger, for those who don't know) about 50 times a day here, but will AI be able to find that same word or writing style in my work emails? No. Liz-Fong Jones work ruination team status: flummoxed.
 
Stop oversharing, faggots. Nobody fucking cares how you feel about the rain in Portland and if you think we you, you fucking deserve it.
Yeah, if you don’t want to get found out don’t reveal shit. I am mostly indifferent. I won’t dox myself but I won’t bother hiding general information that could probably point to who I am if people or an LLM took the time to search.

Because as it turns out a few people have cared actually in my time on the interwebs.
 
There is finally nowhere to run I guess. I will out myself.
I am actually a bargain brand non-flavored toasted bread product.
 
The only logical way to counteract this is to ensure all your Internet postings are as anonymous as can be and also no longer post anything under your real name, be it social media or otherwise. Only then will there be no real identity for the LLM to find and tie your posts to. My big hope right now is that the constant push for further draconian shit in regard to cataloguing your identity online eventually pushes the majority of people to forsake social media and using their real identity online. I'm already hearing sentiments from others IRL about how little they trust social media now; hopefully it keeps building.
 
The only logical way to counteract this is to ensure all your Internet postings are as anonymous as can be and also no longer post anything under your real name, be it social media or otherwise. Only then will there be no real identity for the LLM to find and tie your posts to. My big hope right now is that the constant push for further draconian shit in regard to cataloguing your identity online eventually pushes the majority of people to forsake social media and using their real identity online.
Increasingly hard as websites require you to have a verified email account from Google with your real name, address, phone number and ID.
 
It's pretty easy to compartmentalize your personality and writing style to remain anonymous. I use the n-word(nigger, for those who don't know) about 50 times a day here, but will AI be able to find that same word or writing style in my work emails? No. Liz-Fong Jones work ruination team status: flummoxed.
Simply don't have a Facebook/Twitter/MySpace like myself. It's just that easy. What's the scary LLM gonna do, scrape my paper mailings?
 
Increasingly hard as websites require you to have a verified email account from Google with your real name, address, phone number and ID.
What sites are doing this now? A new Proton mail for every site no longer suffices? It must be Google?
 
The nice thing about LLMs is that they are predictive pattern finders, meaning they are never 100% accurate, and the people who are looking to deanonymize are never looking for actual truth, but rather whatever fits their headcanon.

If someone actually wants to deanonymize you, they will, simply by using standard investigation tools and some nonstandard ones.
I think the author of the article needs to step back for a second and breathe. They are not crazy, just vigilant. No one is actually using LLMs to "get you," and while I feel what they are feeling, that's not really grounded. Perhaps the author needs to tell me about their dog, the style of their car, what the weather is, what political party they are registered with, and what local sports team they support.
 
It's pretty easy to compartmentalize your personality and writing style to remain anonymous. I use the n-word(nigger, for those who don't know) about 50 times a day here, but will AI be able to find that same word or writing style in my work emails? No. Liz-Fong Jones work ruination team status: flummoxed.
All you have to do is lie consistently. Not even about big stuff. If I spend two years repeatedly bringing up my Border Collie named Buster, an LLM is going to assume that's very important to me and start looking for other accounts with a Border Collie named Buster. However, Buster does not exist. Maybe I have a dog. Maybe I even have a Border Collie. But poisoning the data just that tiny bit has made me exponentially more difficult to track.
 