Science "Trust the Science" - A Growing Problem - Usage of certain words in scientific papers has skyrocketed since ChatGPT

Word of the day: apophenia. Why? Because apophenia means a tendency to perceive connections or meaningful patterns between unrelated, random things. While I do not personally believe that this video is an example of apophenia, I think every reader should be aware of what it means before reading this, and should carefully evaluate their own opinion on the subject.

A while ago I talked about the degradation of Internet quality as a direct result of generative AI, but today I have a different approach, one that more adequately showcases what's happening right now in academia. I'm referring to medical science, journal publications, and the world of scholarly literature in general. It matters because we as human beings are constantly seeking new knowledge. Sure, modern generations seem to be looking for that knowledge on platforms like TikTok, which is brain dead, but even those of us who consider ourselves relatively well informed - and there is certainly a degree of hubris in that statement alone - are constantly using appeals to authority.

For anyone who doesn't recognize that term, appeal to authority describes a very common logical fallacy in argumentation where, in order to support a given statement, someone will connect that statement to some sort of authoritative figure. The idea is that if an authority figure has made some sort of claim, the claim itself is more likely to be true because of who made it. While it's actually a bit more complicated than that, generally speaking, the truth here is that we often cite authoritative sources in an effort to understand the world or to prove ourselves correct.
Those authoritative sources come from a very wide background. Some consider journalists to be authoritative, some consider talk show hosts to be authoritative - scientists, politicians, law enforcement, etc. You get the point. However, the consistently cited highest authority is the world of science. That world is obviously broken down further into categories such as biological, physical, chemical, environmental, social, etc. But the world of scientific research and subsequent publication is very often considered the highest authority we have.

So where's the problem? Well, to properly explain that, I need to talk a little bit about generative AI and showcase a few patterns, because this new online world, so to speak, facilitated by ChatGPT and similar programs, is rapidly infecting every corner of our digital experience, even so far as to now be extensively involved in the world of academic research and publication behind the scenes. Let me explain.

First, ChatGPT became publicly available on November 30th, 2022. This makes 2023 the first complete calendar year that ChatGPT was widely available, which is very important to remember. Second, over the course of its lifespan so far, many people have begun noticing certain patterns. ChatGPT may produce lifelike imitations of human writing, but no matter what you ask it to do, it consistently favors certain words and phrases. Explaining exactly how and why the program selects those words and phrases is extremely complicated, but a variety of sources, ranging from research teams to individuals who generate tons of outputs, have broken down lists of its most common choices, which is where I decided to start looking.

According to Twixify, here are 124 of the most common words and phrases that ChatGPT chooses, starting with "meticulous" and "meticulously". Here is a list of 10 high-frequency outputs from an individual researcher, starting with "delve" and "dive". And here is an article from Ideapod showcasing top phrases to avoid if you don't want to sound like an AI, with entries like "let's delve in" or "let's uncover".

There's absolutely no shortage of people noticing patterns, from publications to Reddit threads to full-blown peer-reviewed research. What's more, these common words tend to be similar from list to list. This is the part where I urge everyone to do their own testing. After browsing these sources for an extensive period and noting which words or phrases appear most often across them, I put together a truncated list of words that rank highly on most of the different lists. Words highlighted in green are among the most common outputs by consensus, and words in yellow are also extremely common, appearing on multiple lists but not necessarily as the number one or number two result.

Obviously I could increase the size of this list pretty easily and add a lot more words and phrases to analyze, but for the sake of demonstration, this is probably more than sufficient for today. Now, having this list composed, we need to actually look at what's been happening. And that's where things get really strange.

I'll start with PubMed before moving over to a platform called OpenAlex, which makes it a lot easier to showcase what I'm talking about. But here's the gist. Let's start with the very first word on my list: meticulous. Searching PubMed, which draws on the National Library of Medicine's collection of 36 million citations for biomedical literature, we see that the word "meticulous" appears with slowly increasing frequency until 2023, when it suddenly doubles. Coincidentally, that is the first full year that ChatGPT was commercially available. And in 2024, it's on track to more than quadruple.
In 2021, there were roughly 1,200 papers using the term. In 2022, there were 1,100. But in 2023, that number suddenly jumps to over 2,000, on track now in the first quarter of 2024 to see nearly 5,000 instances by the end of the year. Keep in mind, "meticulous" is one of the most common outputs from ChatGPT, expressed by multiple studies, lists, and publications analyzing the subject, which means that the frequency of a ChatGPT favorite word is doubling, then quadrupling inside medical research literature during the two years directly after the program comes out.
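These per-year counts are easy to reproduce yourself. Here is a minimal Python sketch of the kind of query involved, using NCBI's public E-utilities search endpoint for PubMed; the word and year range are just examples, and you would fetch each URL with any HTTP client to get the count back as JSON:

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count_url(word: str, year: int) -> str:
    """Build an E-utilities query that counts PubMed records using
    `word` in the title/abstract during `year`."""
    params = {
        "db": "pubmed",
        "term": f"{word}[tiab] AND {year}[pdat]",
        "rettype": "count",
        "retmode": "json",
    }
    return f"{EUTILS}?{urlencode(params)}"

# Fetching each URL returns a JSON body whose esearchresult.count
# field is the number of hits for that word/year pair.
for year in range(2019, 2025):
    print(pubmed_count_url("meticulous", year))
```

Plotting those yearly counts is essentially what the PubMed search interface shows in its results histogram.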

Hopefully everyone can see what's happening here, because it demands the question: Is ChatGPT writing these papers? Is it translating them? Reformatting them? Is ChatGPT being used at all here, and if so, in what capacity? Obviously, that's just one example, so let's continue. "Delve". This is another commonly cited "spam word" from ChatGPT, and one that other users have searched for on PubMed before, which is what sent me down this road in the first place, because the results are staggering. So let's demonstrate.

"Delve", from 2020 up until 2022, was increasingly used, seemingly on pace with the general rise in publication frequency overall. But in 2023, the number of uses for this word just completely explodes, more than quadrupling in that one year from 627 to almost 3,000, and is currently on track to be used in more than 10,000 different published works during 2024. That rise is unbelievable, and once again requires the question: Is ChatGPT being used to write scientific research?

Let's keep going. "Seamlessly". Not nearly as shocking as the previous two, but a dramatic rise in usage during 2023, beyond any single-year jump in its history. Next up, "realm". Once again, an enormous spike in usage during 2023 alone, far surpassing any previous year-over-year increase by incredible margins. "Unwavering". A much less commonly used word in general, according to this search, but again, a dramatic spike in 2023, larger than any previous single-year increase on record.

Next up, "additionally". This one is a bit more interesting, because it was already growing roughly exponentially up until 2021. But 2023 still shows a larger-than-expected increase based on historical patterns. I could keep going, which I will in the background, but this pattern occurs frequently when you analyze words and phrases commonly associated with ChatGPT.
It's not universally true, mind you. There are plenty of words that don't showcase a surprising or unexpected increase in usage like this. But the number of times where it is true, and the scale to which we see it demonstrated, well, let's just say it makes me absolutely certain that ChatGPT is being heavily used in medical research papers.


Now, here's the thing. That's one way to showcase what I'm talking about, and it works, it definitely does. But there's another way. This is OpenAlex. It's named after the ancient Library of Alexandria, funnily enough, and contains over 250 million articles, dissertations, books, data sets, you name it. The interesting thing that OpenAlex lets us do is combine keyword searches one after the other, which paints an entirely new picture.

Let's begin by going one word at a time. "Crucial", "harness", "unlock" - three of the words from our list of most common ChatGPT outputs. We start with "crucial", and there's no real spike in usage during 2023. This is a much wider set of academic literature than PubMed alone, so maybe that's to be expected. I don't know. But let's keep going. If we add "harness", we can see - well, wait a minute, there actually is an increase, a pretty dramatic spike if we're being honest, in the usage of these two words together during 2023. And if we then add "unlock", another hyper-common ChatGPT output, the results become very obvious. The usage of these three words together in a single paper or article more than doubles from 2022 onward, far outside normal historical patterns.
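A minimal sketch of how such a combined query can be built against the public OpenAlex API: the `search` parameter matches the given terms across titles, abstracts, and full text, and `group_by=publication_year` returns one count per year. The exact matching semantics (how strictly all terms must co-occur) are OpenAlex's, so treat this as an assumption to verify against their documentation:

```python
from urllib.parse import urlencode

OPENALEX = "https://api.openalex.org/works"

def openalex_year_counts_url(words: list[str]) -> str:
    """Build an OpenAlex query whose JSON response groups matching
    works by publication year, giving a per-year count series."""
    params = {
        "search": " ".join(words),
        "group_by": "publication_year",
    }
    return f"{OPENALEX}?{urlencode(params)}"

# The response's group_by list pairs each publication year with a
# count - the per-year series described in the text above.
print(openalex_year_counts_url(["crucial", "harness", "unlock"]))
```

Adding or removing a word from the list and re-fetching is all it takes to reproduce the one-word, two-word, three-word comparison.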

Let's try it again. "Seamlessly", "meticulous", "realm". Starting with "seamlessly" by itself, we have a pretty normal-looking distribution compared to other keywords that are not commonly associated with ChatGPT. Good sign, I guess. Add on "meticulous", and - wait a second - suddenly there's a huge spike. Now, I should be clear here, "meticulous" by itself does not have the same enormous discrepancy. So what we can see now, very clearly demonstrated, is that from 2022 to 2023, the usage of these two ChatGPT outputs in tandem has tripled. Add on one more, "realm" in this case, and what do we find? Yep, as expected, once again, the usage of these words has spiked by a completely disproportionate amount during 2023 alone, far beyond any normal pattern for each of them separately, and far beyond any historical increase before that.

I could keep going for quite a long time, but hopefully, it's becoming clear now: ChatGPT is being used in some capacity, widely, across academic research. It's being used in a capacity where, at the very least, it is writing or rewriting this academic research. And while some of that usage may be completely innocuous, the scale at which we see it demonstrated here, in my opinion, is extremely alarming.
Keep in mind, alongside all this, scientific papers have already been experiencing record growth in the number of retractions required during 2023, before we've even seen the effects that ChatGPT will have on the entire field. People are faking research, faking citations, and getting caught more than ever before, disproportionately higher than the natural growth rate of research itself.
Add in a phenomenon where anyone with a computer can falsify these things and create fake articles, and, well, the AI-ification of the internet is happening faster and more invasively than almost anyone could have predicted.

Bottom line, simple keyword research indicates to me that science has a really, really serious problem growing inside it right now. Maybe a lot of this is just AI translation, or assisted research with an AI writer to try and eliminate mundane or boring tasks. But that needs to be disclosed. There needs to be very, extremely, unbelievably strict regulation on these things, at the very least mandating a detailed disclosure of what you used AI for, when you used it, and what precisely it did to that research. Or the credibility of all of your work goes down to absolute zero, and you cannot be taken seriously, and no one should listen to you.

The problem I see is that scientific research and academic publications are very often subjected to the same sort of corrupt profit motive as any other industry. The same way journalists will use AI, and are using AI now more and more, for cheap, easy, instant articles, scientists will use it for easy, hassle-free write-ups and citations. Hell, even lawyers are now using this to just fake citations in court cases.
If you can't see how badly this is all going downhill right now, I don't know what to tell you. In the end, this is just a demonstration of how far, and how fast I might add, the problems are spreading. But it remains to be seen what anyone will actually do to stop, regulate, clarify, or control it.

Originally from a YouTube video by UpperEchelon, reformatted into an article by the Claude LLM.
 
Don't expect Peer Review to save us.
 


Didn't expect to see Upper Echelon (formerly Gaming) on the Farms
He's one of the youtubers who actually became more self-aware and professional as he gained subs. Pretty much a reverse of the usual trend.

Very low clickbait to info ratio too.
 
Daily reminder that the whole industry of scientific journal publishing and modern peer review was started in 1959 by Pergamon Press, owned and run by Robert Maxwell, notorious Mossad, KGB, and MI6 asset, embezzler, and father of Ghislaine Maxwell.

The whole industry is a grift, and has been from the very start. Maxwell started some 700 journals, all cross promoting each other to make each look more legit to trick lazy academics into thinking they were prestigious and worth publishing their papers.
 
Daily reminder that the whole industry of scientific journal publishing and modern peer review was started in 1959 by Pergamon Press, owned and run by Robert Maxwell, notorious Mossad, KGB, and MI6 asset, embezzler, and father of Ghislaine Maxwell.

The whole industry is a grift, and has been from the very start. Maxwell started some 700 journals, all cross promoting each other to make each look more legit to trick lazy academics into thinking they were prestigious and worth publishing their papers.
This is further proliferated by grants. Researchers will shoehorn unrelated bullshit into their studies just to get some extra grant money. Some places will churn out as many studies as possible to get as much money as they can, even if the studies are dubious.
 
And it's thanks to what they did during the Wu-Flu years that trust in 'scientists' has dwindled a shitton. Especially when they are doing the exact same thing as ads in the 1900s.

"Smoke cigarettes! It's good for your health." (Lung Cancer)
"Want perfect teeth? This radioactive material will do it for you!" (Fucked up teeth and cancer)
"Thalidomide will make your pregnancy safe, and comfortable!" (Babies with missing limbs)

And next on the docket will be pro-WEF propaganda as you are seeing already.

"15 minute cities are great for human health!"
"Eating bugs is way better than eating meat!"
"Why own anything when you can own nothing?"

Scientists are people too. They can be paid off and even if that isn't the case, it doesn't cost much money to buy a labcoat and a public speaker to push propaganda. Even more effective when you happen to own the media. I welcome the age of the new skeptic because our shitty elites don't want us to observe reality.

HECU did nothing wrong.
 
every time i hear of someone using generative AI for anything serious, my faith in humanity dies a little bit

this technology is really great - for making things like "dagoth ur talking about redguard crime stats to joe rogan" or "jordan peterson giving a lecture on sneed's feed and seed"

but using this shit for serious business is just fucking retarded
 
Daily reminder that the whole industry of scientific journal publishing and modern peer review was started in 1959 by Pergamon Press, owned and run by Robert Maxwell, notorious Mossad, KGB, and MI6 asset, embezzler, and father of Ghislaine Maxwell.

The whole industry is a grift, and has been from the very start. Maxwell started some 700 journals, all cross promoting each other to make each look more legit to trick lazy academics into thinking they were prestigious and worth publishing their papers.
Do you have any further material or sources I can refer to about this? I'm very interested in the scam that is modern academia, and all of the ways they trick people into thinking they have more legitimacy than they really do.
 
Didn't expect to see Upper Echelon (formerly Gaming) on the Farms
It's a shame he bent the knee to Mario Nawfal's lawsuit and redacted his whole series on him. I wish someone bigger like Coffeezilla would take him on.

Not that I'm calling UE a cuck. He was just overwhelmed in a SLAPP suit by a rich asswipe.

For the uninitiated: Mario Nawfal is some cryptobro cunt from Dubai who makes money off rugpulls and being one of MuskTwitter's favorite pool boys. He's hailed as some genius because he has a British accent and commented on FTX's downfall one time.
 
Part of the problem is that, as any of the people involved in publishing will gladly tell you, these scientists are "encouraged" to constantly publish papers by their bosses and institutions. And by "encouraged" I mean they literally have quotas of X papers per semester, or worse. This means they have to keep conjuring new papers out of thin air so that the people in charge of their budgets - who often don't even share their field - keep funding them, because those mid-wits think more papers = more science.
 
every time i hear of someone using generative AI for anything serious, my faith in humanity dies a little bit

this technology is really great - for making things like "dagoth ur talking about redguard crime stats to joe rogan" or "jordan peterson giving a lecture on sneed's feed and seed"

but using this shit for serious business is just fucking retarded
Fuck you man, my AI girlfriend loves me! She said so herself! It's totally not an algorithm designed to cater to my sexual and emotional needs!!!!
 
every time i hear of someone using generative AI for anything serious, my faith in humanity dies a little bit

this technology is really great - for making things like "dagoth ur talking about redguard crime stats to joe rogan" or "jordan peterson giving a lecture on sneed's feed and seed"

but using this shit for serious business is just fucking retarded
Having to explain to (otherwise quite intelligent) people I know that, no, ChatGPT isn't a magic genie who can give you perfect answers to every question has really made me realize how susceptible everyone is to the Gell-Mann amnesia effect. The amount of hype and overuse of the term AI has left a swathe of the public totally off the mark when it comes to how these things work and what their limits are. Of course, it's pretty much all by design — people wanting money and influence making grandiose boasts, quoted by journalists with no more familiarity than Joe Schmoe, printing articles promising the future on a silver platter/the destruction of all we hold dear.

I think there can and will be uses in more academic/serious contexts, but it's not going to be done well by throwing current LLMs at a problem with no further training or modification and just hoping for the best. Seeing that article the other day about NYC using a chatbot instead of just having a single FAQ page that they update with received questions had me wondering who in charge of that department was skimming off the top in grift.
 
Do you have any further material or sources I can refer to about this? I'm very interested in the scam that is modern academia, and all of the ways they trick people into thinking they have more legitimacy than they really do.

The Guardian has published several opinion pieces on scientific journals being a racket.

Scientific publishing is a rip-off:
The model was pioneered by the notorious conman Robert Maxwell. He realised that, because scientists need to be informed about all significant developments in their field, every journal that publishes academic papers can establish a monopoly and charge outrageous fees for the transmission of knowledge. He called his discovery "a perpetual financing machine". He also realised that he could capture other people's labour and resources for nothing. Governments funded the research published by his company, Pergamon, while scientists wrote the articles, reviewed them and edited the journals for free. His business model relied on the enclosure of common and public resources. Or, to use the technical term, daylight robbery.

Is the staggeringly profitable business of scientific publishing bad for science?:
Maxwell insisted on grand titles – "International Journal of" was a favourite prefix. Peter Ashby, a former vice president at Pergamon, described this to me as a "PR trick", but it also reflected a deep understanding of how science, and society's attitude to science, had changed. Collaborating and getting your work seen on the international stage was becoming a new form of prestige for researchers, and in many cases Maxwell had the market cornered before anyone else realised it existed.
Despite the narrow audience, scientific publishing is a remarkably big business. With total global revenues of more than £19bn, it weighs in somewhere between the recording and the film industries in size, but it is far more profitable. In 2010, Elsevier’s scientific publishing arm reported profits of £724m on just over £2bn in revenue. It was a 36% margin – higher than Apple, Google, or Amazon posted that year.
A 2005 Deutsche Bank report referred to it as a “bizarre” “triple-pay” system, in which “the state funds most research, pays the salaries of most of those checking the quality of research, and then buys most of the published product”.

A paper about how the articles published in the highest ranked, most cited journals aren't actually any more reliable than the lower ranking journals and may actually be less reliable:
"Now it has become their task to find the ground-breaking among the too-good-to-be-true data, submitted by desperate scientists, who face unemployment and/or laboratory closure without the next high-profile publication."
"retractions cover only about 0.05% of the literature"

This guy back in 2012 goes over how the medical journals are financially incentivized to publish basically any study from a pharmaceutical company due to the huge profit (80%) they make on reprints:
The conflict of interest is clearly huge. If Elsevier had to maintain its profit margin by cutting costs rather than publishing that one article then it might mean firing 25 editors. I've written about this before in my article arguing rather painfully that "Medical journals are an extension of the marketing arm of pharmaceutical companies."

There's a ton of other stuff out there if you go looking, but this should be a good area to start with. There's tons of places that have gone over various terrible articles that get published, peer reviewed (which is a whole other scam) and are worthless.
 