Source: The Guardian
As AI cheating booms, so does the industry detecting it: 'We couldn't keep up with demand'
ChatGPT is creating headaches for schools while giving rise to a growing cohort of companies that say they can "tell" human from machine.
Since its release last November, ChatGPT has shaken the education world. The chatbot and other sophisticated AI tools are reportedly being used on everything from college essays to high school art projects. A recent Intelligent.com survey of 1,000 students at four-year universities found that 30% had used ChatGPT on written assignments.
This is a problem for schools, educators and students, but a boon for a small but growing cohort of companies in the AI-detection business. Players like Winston AI, Content at Scale and Turnitin tout their ability to detect AI involvement in student work, offering subscription services that let teachers run their students' work through a web dashboard and receive a probability score grading how "human" or "AI" the text is.
At this stage, most clients are teachers acting on their own initiative, although Winston AI says it is beginning talks with school administrators at the district level as the problem grows. And with only one full academic semester completed since ChatGPT's release, the disruption and headaches are only beginning.
Methods for detecting AI-generated content typically involve the search for a "tell": a feature that distinguishes an AI author from a human one. According to MIT Technology Review's guide, in AI content "the word 'the' can occur too many times". The text can also have a sort of anti-style indicating a lack of human flair. The presence of typos is often a dead giveaway for a human mind; LLMs (large language models like ChatGPT) have Scripps Spelling Bee-winning skills. Visual generative AI has its own teething issues; mistakes like a hand with too many fingers are common.
AI relies on patterns and phrases in its training data; just as it can overuse the word "the", it can sometimes lean on those patterns too heavily.
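To make that kind of frequency tell concrete, here is a toy sketch of the idea. It is purely illustrative, not any vendor's method, and the baseline rate is an assumption rather than an empirical constant:

```python
# Toy illustration of a frequency "tell": flag text where a common word
# appears far more often than a rough human baseline.
def word_rate(text: str, word: str = "the") -> float:
    """Fraction of tokens in the sample equal to `word`."""
    words = text.lower().split()
    return words.count(word) / max(1, len(words))

HUMAN_BASELINE = 0.05  # assumed typical rate for "the"; purely illustrative

def looks_machine_heavy(text: str, tolerance: float = 1.5) -> bool:
    """True if "the" appears well above the assumed human rate."""
    return word_rate(text) > HUMAN_BASELINE * tolerance
```

Real detectors combine many such statistics; a single word count on its own would misfire constantly, which is why commercial tools report a probability rather than a verdict.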
John Renaud, the co-founder of Winston AI, says two of the most notable tells they're looking for are "perplexity" and "burstiness". "Perplexity" refers to the sophistication of language patterns that appear within a text sample (is this a pattern that exists in the training data, or is it intricate enough to seem novel?), while "burstiness" refers to "when a text features a cluster of words and phrases that are repeated within a short span of time".
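As a hedged sketch of what those two measurements could look like in practice (Winston AI has not published its implementation), perplexity can be computed as a language model's exponentiated average loss on the text, using the open GPT-2 model as a stand-in scorer, and the article's definition of burstiness can be approximated by counting words that repeat within a short window:

```python
# Minimal sketch of the two signals Renaud describes. Illustrative only;
# requires the `transformers` and `torch` packages.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Exponentiated average token loss: low values mean the text is highly
    predictable to the model, a pattern associated with machine writing."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(enc.input_ids, labels=enc.input_ids)
    return math.exp(out.loss.item())

def burstiness(text: str, window: int = 50) -> float:
    """Share of words that repeat within a short span, following the
    article's definition of burstiness as clustered repetition."""
    words = text.lower().split()
    repeats = sum(1 for i, w in enumerate(words) if w in words[max(0, i - window):i])
    return repeats / max(1, len(words))
```

A detector built this way would score flat, predictable text (low perplexity) and heavy short-range repetition (high burstiness) as more likely machine-written.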
Renaud says the company saw a surge of interest in the wake of ChatGPT: "It all happened within a week or two; suddenly we couldn't keep up with demand." And it's not just academia: school essays are the most commonly scanned content, but the second "would be publishers scanning their journalists'/copywriters' work before publishing".
The company claims to be one of the more accurate detectors around, boasting a 99.6% accuracy rate. Though he was "very worried" by ChatGPT's initial breakout, Renaud has since become more sanguine.
"With predictive AI, we'll always be able to build a model to predict it," he says. In other words, the current generation of autocomplete-on-steroids algorithms will always be deterministic enough to have tells.
Annie Chechitelli, Turnitin's chief product officer, also thinks AI fears are overblown; she recently published a letter in the Chronicle of Higher Education titled "Not True That ChatGPT Can't Be Accurately Detected", pushing back on claims that we've gone through the generated-content looking-glass.
"We think there will always be a tell," she says over Zoom. "And we're seeing other methods to unmask it. We have cases now where teachers want students to do something in person to establish a baseline. And keep in mind that we have 25 years of student data to train our model on."
And like Renaud at Winston AI, Chechitelli is seeing an explosion of interest in her services and in AI detection in general. "A survey is conducted every year of teachers' top instructional challenges. In 2022 'preventing student cheating' was 10th," she says. "Now it's number one."
Altogether, the state of the industry gives the impression of a years-long arms race between AI generators and AI detectors, each trading supremacy as the technological tit-for-tat plays out. While some believe humans will remain one step ahead, others expect these tools eventually to slip past our detection. Irene Solaiman, policy director at AI startup Hugging Face, recently wrote in the MIT Technology Review: "The bigger and more powerful the model, the harder it is to build AI models to detect what text is written by a human and what isn't."
One broader proposed solution is "watermarks". The idea is that models such as ChatGPT could be made to structure sentences in ways that identify the content as AI-generated, deliberately inserting the "tells" that detection software is already looking for.
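As a loose sketch of how such a scheme could work (modelled on published "green list" watermarking proposals, not on anything OpenAI has shipped), a generator can deterministically favour a pseudo-random subset of the vocabulary at each step, and a detector that knows the rule can test whether that subset appears improbably often:

```python
# Toy watermark: the generator prefers a "green" half of the vocabulary,
# derived deterministically from the previous token; the detector counts
# how often that happened. Illustrative vocabulary and scheme only.
import hashlib
import random

VOCAB = ["the", "a", "cat", "dog", "sat", "ran", "on", "under", "mat", "rug"]

def green_list(prev_token: str, fraction: float = 0.5) -> set:
    """Derive a deterministic 'green' subset of the vocabulary from the previous token."""
    seed = int(hashlib.sha256(prev_token.encode()).hexdigest(), 16)
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(shuffled) * fraction)])

def green_fraction(tokens: list) -> float:
    """Detector side: what share of tokens fell in the green list for their context?"""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev))
    return hits / max(1, len(tokens) - 1)
```

Unwatermarked text should score near 0.5 under this toy rule, while a watermarked generator that always picks green tokens scores near 1.0, a gap large enough to flag statistically.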
But both Chechitelli and Renaud agree that the idea has flaws, especially if it is not universally adopted. If there were an alternative, "everyone is just gonna flock to the one without the watermark," Renaud says. Why would someone use an algorithm that tattled on them, versus one that just quietly produced convincing content?
The era of the human-authored web is ending, and no one is entirely sure what comes next. Whether AI content becomes indistinguishable or the human touch proves impossible to replicate, one thing is certain: there will be power for those who can tell the difference.