AI Detection: Yes, a Computer Can Tell if AI Wrote This -- Pure AI

There's an entire cottage industry built around AI tools that claim to be able to detect AI-generated text -- but some AI detectors are easier to fool than others.

By Ginger Grant
04/30/2024

With all of the AI tools out there, educators are very concerned. How do they know if their students' writing assignments were generated by AI or not?

Well, for all of the students out there who thought they could stop writing and just have ChatGPT do it for them, sorry, but there's a whole cottage industry of AI tools that are designed to catch you. There are a couple of known hallmarks of AI-generated text. One thing that AI models tend to do is repeatedly use certain words that may not be in common use. Generated text also has identifiable patterns that other algorithms can distinguish.
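The first hallmark above, repeated use of uncommon words, can be illustrated with a rough sketch. This is not how any of the tools in this article work internally (none of them publish that here); the watch list and the function name watch_word_rate are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical watch list for illustration only; not taken from any real detector.
WATCH_WORDS = {"encompasses", "utilizing", "enhances", "delve", "leveraging"}


def watch_word_rate(text, watch_words=WATCH_WORDS):
    """Return (rate, hits): the share of words on the watch list and per-word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = Counter(w for w in words if w in watch_words)
    rate = sum(hits.values()) / len(words) if words else 0.0
    return rate, hits


sample = (
    "Generative AI encompasses technologies that create new content, "
    "utilizing machine learning models."
)
rate, hits = watch_word_rate(sample)
print(f"{rate:.1%} of words are on the watch list: {dict(hits)}")
```

A high rate on its own proves nothing, since people use these words too. Real detectors presumably combine many signals of this kind with statistical models rather than relying on a fixed list.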

While OpenAI inexplicably turned off its ChatGPT detector on July 20, 2023, there are many other companies still willing to look for and find AI-generated text. I evaluated the following eight AI tools:

Scribbr	Sapling
GPTZero	ZeroGPT
Winston AI	Content Detector AI
Copyleaks	Writer

All of them were free to test, and only one, Winston AI, required me to create an account to use it.

To evaluate the above tools, I first prompted OpenAI's ChatGPT to "Describe Generative AI in 100 words." It then generated the text below (read it for yourself and see if it looks AI-generated to you):

Generative AI encompasses technologies that create new content, such as text, images, and audio, by learning from extensive datasets. Utilizing machine learning models like GANs (Generative Adversarial Networks) and transformers, these systems produce human-like outputs from given inputs. Widely applied in creative fields, business, and research, generative AI enhances creativity, automates content generation, and aids in decision-making. However, it raises ethical issues, including copyright concerns, authenticity, and potential biases in generated content. As generative AI evolves, its integration into society demands careful consideration to balance innovation with ethical responsibility.

I will use the above ChatGPT-generated text to evaluate each of the eight tools.

Incidentally, after speaking with educators, I discovered that students may try to obfuscate the fact that AI generated their assignments by translating the text into another language, then translating it back into English. I thought I would try that, as well. I used DeepL Translate to translate that original ChatGPT text into Polish and GroupDocs to translate it back to English. This is the result:

General artificial intelligence includes technologies that create new content, such as text, images and sound, learning on the basis of extensive data sets. Using machine learning models such as GAN (Generative Adversarial Networks) and transformers, these systems produce results similar to human input data. Commonly used in creative, business and research, generative artificial intelligence increases creativity, automates content generation and helps to make decisions. However, this involves ethical issues, including copyright, authenticity and potential prejudices in the content generated. As artificial generative intelligence develops, its integration into society requires careful consideration to balance innovation with ethical responsibility.

The most interesting thing to me is the word "Generative" was translated to "General," which, in the context of AI, is not the same thing. To me, it looks like a partial rewrite of the AI-generated text, but since the sentence structure is similar and so is the content, I would think the tools I test here would be able to figure out they're the same. Let's see.
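To put a rough number on how similar the two versions are, here is a minimal sketch using only Python's standard library. It compares the first sentence of each version with two simple measures: word-set overlap (Jaccard) and an in-order word match ratio from difflib. The helper names are my own, and these measures are an assumption for illustration, not what the detectors below actually compute.

```python
import re
from difflib import SequenceMatcher

# First sentence of the original ChatGPT text and of the round-trip translation.
original = (
    "Generative AI encompasses technologies that create new content, such as "
    "text, images, and audio, by learning from extensive datasets."
)
round_trip = (
    "General artificial intelligence includes technologies that create new "
    "content, such as text, images and sound, learning on the basis of "
    "extensive data sets."
)


def tokens(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard(a, b):
    """Share of distinct words that the two texts have in common (0.0 to 1.0)."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    return len(set_a & set_b) / len(set_a | set_b)


def sequence_ratio(a, b):
    """Similarity of the two word sequences, taking word order into account."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


print(f"Word overlap (Jaccard): {jaccard(original, round_trip):.2f}")
print(f"In-order word match:    {sequence_ratio(original, round_trip):.2f}")
```

Both scores run from 0 (nothing shared) to 1 (identical). Surface measures like these say nothing about whether a detector will treat the two texts the same way, which is what the tests below explore.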

Scribbr
Scribbr is free and, like most of the tools I test here, easy to use; all you need to do is paste the text into the field. It evaluates a maximum of 500 words at a time.

While Scribbr warns that no model is completely accurate, it checks for text generated by the most popular AI tools, including ChatGPT, Google Gemini and Microsoft Copilot. It was 100 percent sure that my ChatGPT-generated text was from AI.

When I tested the text that was translated from English to Polish to English, the results were quite different. Once I did that, the detector changed from being 100 percent sure it was AI-generated to 67 percent sure. This difference surprised me; I didn't think the translation was that different, but for Scribbr (and, as you'll see, most of the tools I tested), there was a significant difference.

GPTZero
Released on Jan. 3, 2023, GPTZero was created by a Princeton student, Edward Tian, on his winter break. GPTZero looks for text generated by ChatGPT 3 and 4 and Meta Llama 2, and it classifies text as AI, a mix of human and AI, or human. The free version allows you to paste in 5,000 characters for analysis.

Like Scribbr, GPTZero found my original ChatGPT-generated text to be 100 percent AI-generated. Let's look at the retranslated text.

That trick did not fool GPTZero at all; the needle barely moved. It went from 100 percent sure to 99 percent sure that the text was AI-generated. This was very impressive.

Winston AI
Winston AI was created specifically for educators and editors to catch AI. In addition to looking for text generated by ChatGPT 3 and 4 and Gemini, it also looks at Anthropic's Claude, which is not as commonly used for text generation. It does not specify whether it evaluates Llama.

You do have to create a free account in order to be able to use Winston AI for seven days at no cost. If you upgrade your account, you can check not only for AI-generated writing but for plagiarism, as well.

There was no difference in the results if I used either the original ChatGPT version or the English-to-Polish-to-English version; both were identified as AI. It also found that the text was extremely difficult to read by anyone but a university graduate, making it even less likely that a student wrote it.

Copyleaks
One of the few detectors that work for languages other than English, Copyleaks looks for text generated in ChatGPT or Gemini. I used the free version, but if you do a free sign-up, you can check for plagiarism and look for similarities between pieces of text.

Copyleaks is not very sophisticated; it will just tell you if text is or is not AI-generated. While it found that the ChatGPT text was AI, the English-to-Polish-to-English translation fooled it into thinking that it was written by a human.

Sapling
Sapling was created by former researchers from U.C. Berkeley, Stanford University, Meta and Google. Sapling warns that no tool alone can definitively say whether a text is or is not AI-generated.

That was indeed true in my case, as Sapling found the English-to-Polish-to-English translation was only 1 percent fake while the ChatGPT text was 100 percent fake. (In Sapling's parlance, "fake" means generated by AI.)

ZeroGPT
In its tagline, ZeroGPT touts itself as "the most Advanced and Reliable Chat GPT, GPT 4 & AI Content Detector." Based on my testing, I disagree. ZeroGPT thought the text written by ChatGPT was 80.9 percent AI, but that the English-to-Polish-to-English translation was human.

On the plus side, it does allow you to analyze more text than most of the other tools do.

Content Detector AI
One interesting feature that Content Detector AI promotes on its page is a service that will use AI to write things for you in an undetectable AI manner. Based on its results for detecting AI, which were poor, I am not inclined to try the undetectable AI authoring.

Content Detector AI thought the text written by ChatGPT had a 50 percent probability of being AI, and the English-to-Polish-to-English translation was only 40 percent likely to be AI-generated.

Writer
Writer allows a test of 5,000 words in its free detector. I was not overly impressed with the results.

Writer found the ChatGPT version to be 76 percent human-generated content and the English-to-Polish-to-English translation to be 100 percent human-generated.

Conclusion
After evaluating all of the different tools, the one that performed the best was GPTZero. It not only detected actual AI text as 100 percent AI, but the English-to-Polish-to-English translation also did not fool it, as it was 99 percent sure that was fake, too.

Winston AI was a close second, but as it will only work for seven days before requiring you to pay for it, you may not find the tool as helpful. I did like the way Winston AI assigned a grade level to the text, which might be helpful if you are trying to write in simpler language or find out if a high school student was cheating.

There was a wide disparity in the ability of tools to detect AI, so I would definitely test each before picking one to use long-term. I also noticed that very few of them included detection of Meta's Llama AI model, which leaves an exploitable loophole that I am sure students will use.
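To see that disparity at a glance, the numeric scores reported in this article can be tabulated with a short sketch. Only tools that gave a percentage for both versions are included; Winston AI, Copyleaks and ZeroGPT are left out because at least one of their results was a verdict rather than a number. Writer reports "percent human," so its values are converted as 100 minus that figure, which is my assumption about how to compare it with the others.

```python
# AI-likelihood scores (percent) reported in this article:
# (original ChatGPT text, English-to-Polish-to-English translation).
# Writer's "percent human" results are converted as 100 - human (an assumption).
scores = {
    "Scribbr": (100.0, 67.0),
    "GPTZero": (100.0, 99.0),
    "Sapling": (100.0, 1.0),
    "Content Detector AI": (50.0, 40.0),
    "Writer": (100.0 - 76.0, 100.0 - 100.0),
}


def drops(table):
    """Return (tool, drop in percentage points), smallest drop first."""
    pairs = ((tool, before - after) for tool, (before, after) in table.items())
    return sorted(pairs, key=lambda item: item[1])


for tool, drop in drops(scores):
    before, after = scores[tool]
    print(f"{tool:<20} {before:5.1f} -> {after:5.1f}  (drop of {drop:.1f} points)")
```

A small drop only matters if the starting score was high, so the two columns should be read together: a tool that starts at 50 percent and falls to 40 is less convincing than one that starts at 100 and falls to 99.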

This study did illustrate for me the ease of detection. For people out there who think they can get away with not writing their own text, well, chances are that you will be caught, especially if you generated the text in ChatGPT and it is evaluated in GPTZero or Winston AI.

About the Author

Ginger Grant is a Data Platform MVP who provides consulting services in advanced analytic solutions, including machine learning, data warehousing, and Power BI. She is an author of articles and books, writes at DesertIsleSQL.com, and uses her MCT to provide data platform training in topics such as Azure Synapse Analytics, Python and Azure Machine Learning. You can find her on X/Twitter at @desertislesql.