The telltale words that could identify generative AI text

If your right hand starts typing “delve,” you might be an LLM.

Getty Images

Until now, even AI companies have struggled to come up with tools that can reliably detect when a piece of text was generated using a large language model. Now, a group of researchers has developed a novel method for estimating LLM use across a large set of scientific texts by measuring which “excess words” started appearing much more frequently during the LLM era (i.e., 2023 and 2024). The analysis “suggest[s] that at least 10% of 2024 abstracts were processed with LLMs,” the researchers say.

In a pre-print paper posted earlier this month, four researchers from Germany’s University of Tübingen and Northwestern University said they were inspired by studies that measured the impact of the COVID-19 pandemic by looking at excess deaths compared to the recent past. By taking a similar look at “excess word usage” after LLM writing tools became widely available in late 2022, the researchers found that “the appearance of LLMs led to an abrupt increase in the frequency of certain style words” that was “unprecedented in both quality and quantity.”

Delving in

To measure these vocabulary changes, the researchers analyzed 14 million paper abstracts published on PubMed between 2010 and 2024, tracking the relative frequency of each word as it appeared across each year. They then compared the expected frequency of those words (based on the pre-2023 trendline) to the actual frequency of those words in abstracts from 2023 and 2024, when LLMs were in widespread use.
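The core of that comparison can be sketched in a few lines. The snippet below is a minimal illustration of the idea (not the authors’ code), using made-up frequencies rather than real PubMed data: fit a linear trend to a word’s pre-2023 per-abstract frequency, extrapolate it to 2024, and compare the extrapolation against the observed post-LLM frequency.

```python
# Minimal sketch of the "excess word usage" idea (illustrative data only):
# fit a linear trend to a word's pre-2023 frequency, extrapolate to 2024,
# and compare against the observed post-LLM frequency.
import numpy as np

years = np.arange(2010, 2023)             # pre-LLM era
freq = 0.001 + 0.00002 * (years - 2010)   # toy per-abstract frequency trend

# Least-squares linear fit to the pre-LLM years
slope, intercept = np.polyfit(years, freq, 1)
expected_2024 = slope * 2024 + intercept  # trendline extrapolation

observed_2024 = 0.030                     # toy post-LLM spike ("delves"-like)
excess_ratio = observed_2024 / expected_2024
print(f"expected: {expected_2024:.5f}, observed: {observed_2024:.3f}, "
      f"ratio: {excess_ratio:.1f}x")
```

A word whose observed 2024 frequency far outruns its extrapolated trendline, as in this toy case, is a candidate “excess word.”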

The analysis turned up a number of words that were extremely uncommon in these scientific abstracts before 2023 but that suddenly surged in popularity after LLMs were introduced. The word “delves,” for instance, showed up in 25 times as many 2024 papers as the pre-LLM trend would predict; words like “showcasing” and “underscores” were also used about nine times more often. Other words that were already common became notably more so in post-LLM abstracts: the frequency of “potential” increased by 4.1 percentage points, “findings” by 2.7 percentage points, and “crucial” by 2.6 percentage points, for example.
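Note that the gap is reported two different ways: for rare words, as a fold increase over the expected share of papers; for already-common words, as a percentage-point increase. The toy arithmetic below (entirely made-up shares, not the paper’s data) illustrates the distinction:

```python
# Toy numbers only: two ways of reporting an excess-usage gap.
# Rare words: fold increase over the expected share of abstracts.
expected_delves_share = 0.0004   # hypothetical expected fraction of 2024 abstracts
observed_delves_share = 0.0100   # hypothetical observed fraction
fold = observed_delves_share / expected_delves_share
print(f"'delves': {fold:.0f}x the expected share")

# Common words: percentage-point gap between observed and expected shares.
expected_potential_share = 0.040  # hypothetical 4.0% of abstracts
observed_potential_share = 0.081  # hypothetical 8.1% of abstracts
gap_pp = (observed_potential_share - expected_potential_share) * 100
print(f"'potential': +{gap_pp:.1f} percentage points")
```

A 25x jump in a rare word and a 4-point jump in a common word can involve similar absolute numbers of abstracts, which is why both framings appear in the results.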

Some examples of words whose usage increased (or decreased) significantly after LLMs were introduced (bottom three words shown for comparison).

These kinds of changes in word usage can of course happen independently of LLM usage; the natural evolution of language means words sometimes go in and out of style. But in the pre-LLM era, the researchers found, such massive and sudden increases were seen only for words associated with major world health events: “ebola” in 2015, “zika” in 2017, and words like “coronavirus,” “lockdown,” and “pandemic” from 2020 to 2022.

In the post-LLM period, though, the researchers found hundreds of words with sudden, pronounced increases in scientific usage that had no common link to world events. And while the excess words during the COVID pandemic were overwhelmingly nouns, the words with a post-LLM frequency boost were overwhelmingly “style words” such as verbs, adjectives, and adverbs (a small sample: “about, in addition, extensively, crucial, improving, exhibited, insights, especially, particularly, within”).

This isn’t an entirely new finding; the increased prevalence of “delve” in scientific papers, for example, has been widely noted in the recent past. But previous studies generally relied on comparisons with “ground truth” human writing samples or lists of pre-defined LLM markers obtained from outside the study. Here, the pre-2023 set of abstracts effectively serves as its own control group, showing how vocabulary choice has changed overall in the post-LLM era.

An intricate interplay

By highlighting hundreds of so-called “marker words” that became significantly more common in the post-LLM era, the telltale signs of LLM use can sometimes be easy to pick out. Take this example abstract sentence cited by the researchers, with the marker words highlighted: “A comprehensive grasp of the intricate interplay between […] and […] is pivotal for effective therapeutic strategies.”

After doing some statistical measurements of marker word appearances in individual papers, the researchers estimate that at least 10 percent of the post-2022 papers in the PubMed corpus were written with at least some LLM assistance. The number could be even higher, they say, because their count could be missing LLM-assisted abstracts that don’t include any of the marker words they identified.
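The lower-bound logic can be made concrete with a small sketch. This is an assumed simplification, not the authors’ actual statistic: flag any abstract containing at least one identified marker word, which undercounts by construction, since LLM-assisted abstracts that avoid every marker word go unflagged. The marker list and abstracts below are illustrative.

```python
# Sketch (assumed logic, not the paper's exact method): flagging abstracts
# that contain any marker word yields a lower bound on LLM assistance,
# since marker-free LLM text is never counted.
marker_words = {"delves", "showcasing", "underscores", "pivotal"}  # tiny sample

def flags_marker(abstract: str) -> bool:
    """Return True if the abstract contains at least one marker word."""
    tokens = {w.strip(".,;:()").lower() for w in abstract.split()}
    return not marker_words.isdisjoint(tokens)

abstracts = [
    "This study delves into the intricate interplay of factors.",
    "We measured mortality rates in a retrospective cohort.",
    "Results are showcasing a pivotal mechanism.",
]
flagged = sum(flags_marker(a) for a in abstracts)
print(f"{flagged}/{len(abstracts)} abstracts contain marker words")
```

Here two of the three toy abstracts are flagged; a real LLM-edited abstract that happened to use none of the listed words would slip through, which is why the 10 percent figure is a floor rather than an estimate of total use.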

Before 2023, it would take a major global event like the coronavirus pandemic to trigger these kinds of leaps in word usage.

Those measured percentages can also vary widely across different subsets of papers. The researchers found that papers authored in countries like China, South Korea, and Taiwan showed LLM marker words 15 percent of the time, suggesting that “LLMs might… help non-natives with editing English texts, which could justify their extensive use.” On the other hand, the researchers offer that native English speakers “may [just] be better at noticing and actively removing unnatural style words from LLM outputs,” thus hiding their LLM usage from this kind of analysis.

Detecting LLM usage is important, the researchers note, because “LLMs are notorious for fabricating references, providing inaccurate summaries, and making false claims that sound authoritative and persuasive.” But as knowledge about LLMs’ signature marker words begins to spread, human editors may become better at removing those words from generated text before it’s shared with the world.

Who knows, perhaps future large language models will perform this kind of frequency analysis themselves, reducing the weight of marker words to better mask their output as human-like. It won’t be long before we need to call in a few Blade Runners to track down the generative AI text hiding in our midst.
