Is AI Changing How We Write? A Reddit Analysis Tutorial
If you've ever read something online and thought, "Was that written by an AI?" you're not alone. Since ChatGPT went mainstream, there has been an explosion of AI-enhanced content: people use it to generate posts and proofread drafts, and some even begin to sound like AI themselves after talking to it too much.
Recent research has attempted to gauge the prevalence of AI-assisted writing in media such as biomedical publications [1] and YouTube videos [2]. In this tutorial, we'll use our new Reddit analysis platform to do the same for Reddit.
We'll walk through a real investigation step by step, showing how you can go from a simple question to a nuanced answer in minutes. Our goal isn't to find bots, but to track LLM-influenced language: the characteristic writing style of AI, whether it's used by bots or by humans who've started to adopt it.
Let's get started.
Finding the AI Fingerprint in Reddit's Data
Following recent research [2], we begin by selecting five words known to be far more common in AI-generated text: "delve", "underscore", "comprehend", "meticulous", and "intricate". Since ChatGPT was released on November 30, 2022, we'll use 2021 as our baseline. Content from 2022 is probably mostly unaffected too, because the release came so late in the year and adoption in December was negligible, but 2021 is the safer choice.
Our first step is to see how often these words appeared on Reddit over the years. Using the Time Series Studio, we can quickly create a chart of their raw annual counts.
Total occurrences of GPT words on Reddit
How to create this: In the Time Series Studio, click 'add analysis', select 'Specific Words in Reddit' with the 'Yearly' configuration, enter your target words, and set the time range (2015-2024). When the analysis is done, click the 'Results' button and attach the total_occurrences time series for each word. You may want to rename the series for clarity by clicking their names on the 'Series Shelf'.
The chart shows interesting trends. The only word that hasn't grown since 2021 is "underscore", while "delve" and "intricate" show the expected jump from 2022 to 2023. But there's a catch: as the Platform Overview shows, Reddit itself has grown significantly since 2021. More users and more comments mean the raw count of many words will go up even if they haven't become more popular. To account for this, we have to normalize our data.
A better metric is a word's relative frequency, or share: its proportion of all words used on the platform in a given year. This tells us whether a word has actually become more popular relative to others.
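If you want to reproduce this kind of count on your own data outside the platform, here is a minimal Python sketch of counting target words per year. It assumes your data is a list of hypothetical `(year, text)` pairs and uses a simple lowercase regex tokenizer with exact-match counting, so "delves" is not counted as "delve". The platform's own tokenization and matching rules are not described in this tutorial and may differ.

```python
import re
from collections import Counter, defaultdict

# The five target words from this tutorial.
TARGET_WORDS = {"delve", "underscore", "comprehend", "meticulous", "intricate"}

# Assumed tokenizer: lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return TOKEN_RE.findall(text.lower())


def yearly_counts(docs, targets=TARGET_WORDS):
    """Count exact-match target words and total tokens per year.

    docs: iterable of hypothetical (year, text) pairs.
    Returns (target_counts, total_words): target_counts maps
    year -> Counter of target words (years with no hits are omitted),
    total_words maps year -> number of tokens.
    """
    target_counts = defaultdict(Counter)
    total_words = Counter()
    for year, text in docs:
        tokens = tokenize(text)
        total_words[year] += len(tokens)
        for token in tokens:
            if token in targets:
                target_counts[year][token] += 1
    return dict(target_counts), dict(total_words)


docs = [
    (2021, "Let's delve in."),
    (2024, "We delve into an intricate, meticulous plan. Delve!"),
]
counts, totals = yearly_counts(docs)
print(counts[2024]["delve"], totals[2024])  # prints: 2 8
```

The total token count is returned alongside the target counts because it is exactly the denominator needed for the normalization step that follows.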
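In formula terms, the share of word w in year y is count_w(y) / total_words(y), typically scaled to a rate per million words (the unit the subreddit table later in this tutorial uses). A minimal sketch, with a hypothetical function name and made-up numbers for illustration:

```python
def share_per_million(word_counts: dict[int, int],
                      total_words: dict[int, int]) -> dict[int, float]:
    """Share of a word per 1 million words, for each year with a non-zero total."""
    return {
        year: word_counts.get(year, 0) * 1_000_000 / total
        for year, total in total_words.items()
        if total > 0
    }


# Illustrative numbers only: the raw count quadruples, but so does the total.
print(share_per_million({2021: 50, 2024: 200}, {2021: 1_000_000, 2024: 4_000_000}))
# prints: {2021: 50.0, 2024: 50.0}
```

In the example the raw count rises fourfold, yet the share is flat: this is precisely the kind of growth that normalization removes.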
Share of GPT words on Reddit
How to create this: In the same workspace, add the 'All Words in Reddit' analysis and attach its total_occurrences series. Then, on the 'Series Shelf' below, click '+Operations', choose Ratio, and create a new series from a word's occurrences series and the all-Reddit-words series. Turn off visibility of redundant series (eye icon on the left). Repeat for all words.
This normalized view is much clearer. The growth of "meticulous" and "comprehend" vanishes: their rise was an illusion created by platform growth. But the spike for "delve" and "intricate" remains, confirming that their growth is real and disproportionate.
To make the magnitude of growth more apparent, we can index the data. By setting 2021 as a baseline (value = 100), we can see how much each word grew relative to its own starting point.
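Indexing divides each year's value by the base year's value and multiplies by 100, so the index minus 100 is the percentage change since the base year. A minimal sketch with a hypothetical function name and illustrative shares (not real data):

```python
def index_to_base(series: dict[int, float], base_year: int) -> dict[int, float]:
    """Rescale a yearly series so that the base year equals 100."""
    base = series[base_year]
    if base == 0:
        raise ValueError("base-year value must be non-zero")
    return {year: value * 100 / base for year, value in series.items()}


# Illustrative shares (per 1M words), not real data.
indexed = index_to_base({2021: 4.0, 2024: 6.0}, base_year=2021)
print(indexed)                              # prints: {2021: 100.0, 2024: 150.0}
print(f"{indexed[2024] - 100:.0f}% growth")  # prints: 50% growth
```

Because each word is divided by its own baseline, a rare word and a common word become directly comparable on one chart.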
Growth in share of GPT words relative to 2021
How to create this: In the same workspace, create an indexed series for each word-share series using the 'Index to 100' operation, picking the series and the base year.
The chart makes the platform-wide trend clear: the shares of "delve" and "intricate" have grown by more than 35% and 50%, respectively, relative to their 2021 baseline.
Drilling Down
We've seen a platform-wide trend, but Reddit isn't a monolith; it consists of thousands of subreddits. Where exactly did this growth happen?
To find out, we can use our Term Share Analysis tool to rank subreddits by the growth in the share of our five target words between 2021 and 2024.
Top Subreddits by Increase in Target Words' Share, 2021 vs. 2024
| Subreddit (total words) | Change (per 1M words) | 2024 share (per 1M words) |
|---|---|---|
| r/wow (95,376,953 words) | +161.8 | 166.7 |
| 1,824,689 words | +138.8 | 188.5 |
| 1,122,615 words | +131.9 | 162.1 |
| 658,151 words | +104.0 | 120.0 |
| 1,762,911 words | +89.0 | 120.3 |
| 589,730 words | +70.4 | 117.0 |
| 3,687,188 words | +63.1 | 70.0 |
| 1,275,276 words | +54.1 | 75.3 |
| 3,362,885 words | +52.4 | 95.2 |
| 12,453,230 words | +47.4 | 73.0 |
| 3,249,969 words | +46.7 | 153.2 |
| 10,792,556 words | +41.9 | 69.2 |
| 1,398,140 words | +37.0 | 54.4 |
| 9,809,607 words | +36.4 | 76.8 |
| 2,150,928 words | +30.5 | 80.0 |
| 9,260,855 words | +29.7 | 68.4 |
| 520,319 words | +27.7 | 75.0 |
| 3,650,473 words | +26.6 | 72.3 |
| 6,968,605 words | +23.2 | 24.8 |
| 7,008,167 words | +21.7 | 27.3 |
How to create this: In the Term Share Analysis tool, enter your target words, select 2024 as the target year and 2021 as the base year, and set 'Min Total Words' to 500000 in Advanced Options for both years.
Clicking on a few of the subreddits to read their public-facing descriptions makes it clear that the top of the list is dominated by subreddits dedicated to tabletop games and roleplaying, such as r/lfgpremium, r/roll20LFG, and r/pbp. This makes a lot of sense, as LLMs are a natural fit for roleplaying and for generating worldbuilding text.
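This tutorial doesn't spell out exactly how the Term Share Analysis tool computes its ranking, but the table's columns are consistent with the following reading: compute each subreddit's combined target-word share per million words in the base and target years, report the difference as "Change", and keep only subreddits with at least 500,000 words in both years. A sketch under that assumption, with hypothetical names (`SubredditYear`, `rank_by_share_change`):

```python
from dataclasses import dataclass


@dataclass
class SubredditYear:
    target_count: int  # combined occurrences of all target words in one year
    total_words: int   # all words in the subreddit that year


def rank_by_share_change(base: dict[str, SubredditYear],
                         target: dict[str, SubredditYear],
                         min_total_words: int = 500_000,
                         top_n: int = 20) -> list[tuple[str, float, float]]:
    """Rank subreddits by change in target-word share per 1M words.

    Returns (subreddit, change, target_share) tuples, largest change first.
    Subreddits below min_total_words in either year are skipped.
    """
    rows = []
    for sub in base.keys() & target.keys():
        b, t = base[sub], target[sub]
        if b.total_words < min_total_words or t.total_words < min_total_words:
            continue
        base_share = b.target_count * 1_000_000 / b.total_words
        target_share = t.target_count * 1_000_000 / t.total_words
        rows.append((sub, target_share - base_share, target_share))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top_n]


# Hypothetical subreddits "a", "b", "c"; "c" is too small in the base year.
base = {"a": SubredditYear(10, 1_000_000), "b": SubredditYear(5, 1_000_000),
        "c": SubredditYear(1, 100_000)}
target = {"a": SubredditYear(60, 1_000_000), "b": SubredditYear(40, 2_000_000),
          "c": SubredditYear(100, 1_000_000)}
print(rank_by_share_change(base, target))
# prints: [('a', 50.0, 60.0), ('b', 15.0, 20.0)]
```

The minimum-size filter matters: without it, a tiny subreddit with a handful of posts can post an enormous per-million change from just a few occurrences.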
Investigating an Outlier: The Curious Case of r/wow
The top result, however, is a surprise: r/wow, the main community for the game World of Warcraft. Why would this specific gaming sub be the epicenter?
To investigate, we isolated r/wow in our Time Series Studio and plotted the occurrences of our five words just within it.
GPT word occurrences in r/wow
How to create this: In a new workspace, create a yearly 'Specific Words in Subreddit' analysis and set the dates, the subreddit (wow), and the target words. After completion, add the total_occurrences series for all words.
While four of the five words are flat, "delve" rose dramatically, from a few hundred mentions in 2023 to over 15,000 in 2024. A sudden change like this often points to a one-off external event. Let's plot a monthly chart to confirm.
Monthly occurrences of "delve" in r/wow
How to create this: Same as the chart above, but pick the 'Monthly' version of the analysis, adjust the dates accordingly, and set only one word, 'delve'.
We see that the big spike actually happened in September 2024, with other months having far fewer occurrences. To find out what was really going on, we can use the Post Finder tool to look at the raw posts, such as this one:
Do I really need to fly to delves?
I mean I get that the first time I should have to find the location of a delve, but if I'm just bored and I wanna knock out a delve, it'd be awfully nice if I could just queue into one like a follower dungeon…
How to create this: Set the target year, month, and subreddit, then configure a filter. Here we want example posts that contain 'delve' in their main text, so we add the rule 'text content' 'with substring' = 'delve'.
As posts like this show, users weren't "delving into topics"; they were talking about a dungeon-like activity called delves. Googling "wow delves 2024 september" explains the sudden flood of such discussions: in September 2024, a major WoW expansion, "The War Within", introduced "delves", short-form instanced scenarios that you can run solo or in a group of 2-5 players.
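A substring rule is broader than a whole-word search, which is part of why it surfaces posts about the WoW noun. The sketch below contrasts an assumed case-insensitive substring match with a whole-word regex match; whether the Post Finder's 'with substring' rule is case-sensitive is not stated here, so treat that as an assumption. Note that "delve" as a substring matches "delves" but not "delving", since "delving" drops the final "e".

```python
import re


def contains_substring(text: str, needle: str) -> bool:
    """Case-insensitive substring test (assumed behavior of a 'with substring' rule)."""
    return needle.lower() in text.lower()


def contains_word(text: str, word: str) -> bool:
    """True only if `word` appears as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(word)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


posts = [
    "Do I really need to fly to delves?",
    "Let's delve into the lore.",
    "Delving solo is fun.",
]
for post in posts:
    print(contains_substring(post, "delve"), contains_word(post, "delve"), post)
# prints:
# True False Do I really need to fly to delves?
# True True Let's delve into the lore.
# False False Delving solo is fun.
```

Neither rule distinguishes the verb from the game noun on its own, which is why reading the matched posts is the step that actually resolves the question.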
This is an important lesson in data analysis: context is everything. Without drilling down into the source data and cross-checking our assumptions, we can easily misinterpret the results.
A Different Clue
The r/wow case shows that single words can sometimes be misleading. As an independent check on our findings, we can track the use of the em-dash (—), a well-known tic of many LLMs.
Let's examine a couple of larger roleplaying subreddits from our top results (r/DiscordRP, r/roleplaying), where we suspect genuine LLM influence, and compare them against our new control, r/wow.
Share of posts containing em-dashes, by subreddit
How to create this: In a new workspace, create a 'Submission Metrics' analysis for each subreddit: set the subreddit and the start and end dates, and add a filter rule 'text content' 'with substring' set to the em-dash character (—). After completion, attach submissions_count. Then use the 'Subreddit Metrics' analysis, selecting all three subreddits and the 'Total Submissions' metric, to get total-submission time series for all three. Finally, apply the Ratio operation for each subreddit.
The results support our hypothesis of AI-assisted writing in roleplaying subreddits. They start the year with around 5% of posts containing em-dashes and end it with a stunning 10%! That's more than 10x our control, r/wow, which has less than 1% of such posts at year's end.
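The ratio itself is simple: for each subreddit and month, divide the number of posts containing the em-dash character (U+2014) by the total number of posts. A minimal sketch over hypothetical `(subreddit, month, text)` tuples; it checks only for U+2014, so ordinary hyphens and en-dashes are not counted:

```python
from collections import defaultdict

EM_DASH = "\u2014"  # U+2014 EM DASH; hyphens and en-dashes are not counted


def em_dash_ratio(posts):
    """Fraction of posts containing an em-dash, per (subreddit, month).

    posts: iterable of hypothetical (subreddit, month, text) tuples.
    """
    with_dash = defaultdict(int)
    total = defaultdict(int)
    for subreddit, month, text in posts:
        key = (subreddit, month)
        total[key] += 1
        if EM_DASH in text:
            with_dash[key] += 1
    return {key: with_dash[key] / total[key] for key in total}


posts = [
    ("roleplaying", "2024-01", "She paused\u2014then drew her sword."),
    ("roleplaying", "2024-01", "Looking for a group."),
    ("wow", "2024-01", "Patch notes - quick summary"),
]
print(em_dash_ratio(posts))
# prints: {('roleplaying', '2024-01'): 0.5, ('wow', '2024-01'): 0.0}
```

Counting posts rather than em-dash occurrences keeps one dash-heavy post from dominating a month, which matches the per-submission ratio built above.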
Your Turn to Explore
In this tutorial, we went from a broad question about online language to a nuanced, community-specific answer. We found clear evidence that LLM-assisted writing is on the rise on Reddit, saw where it's most prevalent, and learned how to spot and correct for false positives.
More importantly, we've demonstrated a repeatable research workflow. By combining the Time Series Studio for trends, the Term Share Analysis for hotspots, and the Post Finder for context, you can answer your own research questions with confidence.
We've shown you the method; now it's your turn. Go forth and explore!
