August 14, 2024

Statistical analysis can detect when ChatGPT is used to cheat on multiple-choice chemistry exams

by McKenzie Harris, Florida State University

As the use of generative artificial intelligence continues to extend into all reaches of education, much of the concern related to its impact on cheating has focused on essays, essay exam questions and other narrative assignments. The use of AI tools such as ChatGPT to cheat on multiple-choice exams has largely been ignored.

A Florida State University chemist is half of a research partnership whose latest work is changing what we know about this type of cheating. Their findings show that the use of ChatGPT to cheat on general chemistry multiple-choice exams can be detected with specific statistical methods. The work was published in the Journal of Chemical Education.

"While many educators and researchers try to detect AI-assisted cheating in essays and open-ended responses, such as Turnitin AI detection, as far as we know, this is the first time anyone has proposed detecting its use on multiple-choice exams," said Ken Hanson, an associate professor in the FSU Department of Chemistry and Biochemistry. "By evaluating differences in performances between student- and ChatGPT-based multiple-choice chemistry exams, we were able to identify ChatGPT instances across all exams with a false positive rate of almost zero."

Researchers collected previous FSU student responses from five semesters' worth of exams, entered nearly 1,000 questions into ChatGPT and compared the outcomes. Average scores and raw statistics were not enough to identify ChatGPT-like behavior: ChatGPT always answered certain questions correctly and always answered others incorrectly, resulting in an overall score that was indistinguishable from those of students.

"That's the thing about ChatGPT—it can generate content, but it doesn't necessarily generate correct content," Hanson said. "It's simply an answer generator. It's trying to look like it knows the answer, and to someone who doesn't understand the material, it probably does look like a correct answer."

By using fit statistics, researchers fixed the ability parameters and refit the outcomes, finding ChatGPT's response pattern was clearly different from that of the students.

On exams, high-performing students frequently answer difficult and easy questions correctly, while average students tend to answer some difficult questions and most easy questions correctly. Low-performing students typically only answer easy questions correctly. But on repeated attempts by ChatGPT to complete an exam, the AI tool sometimes answered every easy question incorrectly and every hard question correctly. Hanson and his research partner, Ben Sorenson, used these behavior differences to detect the use of ChatGPT with almost 100-percent accuracy.

The duo's strategy, which combines a technique known as Rasch modeling with fit statistics, can be readily applied to any generative AI chatbot. Each chatbot will exhibit its own unique patterns, which can help educators identify its use in completing multiple-choice exams.
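As a rough illustration of how a fit statistic can flag such a pattern, the sketch below computes an outfit mean-square, a standard Rasch person-fit statistic, for two hypothetical response patterns that share the same raw score. The item difficulties, the ability value and the function names are illustrative assumptions, not data or code from the study, and the published analysis may rely on different statistics.

```python
import math

def rasch_probability(ability, difficulty):
    """Rasch model: probability of a correct answer, both arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def outfit_mean_square(responses, ability, difficulties):
    """Mean of squared standardized residuals; values well above 1 signal misfit."""
    total = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_probability(ability, b)
        total += (x - p) ** 2 / (p * (1.0 - p))
    return total / len(responses)

# Hypothetical item difficulties, ordered easy -> hard (illustrative only).
difficulties = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]

# Two patterns with the same raw score of 5 out of 10.
typical = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # easy right, hard wrong
aberrant = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # easy wrong, hard right

# Assumed ability, held fixed for both patterns. It sits at the midpoint of
# the difficulties, so the model's expected score is 5 out of 10.
ability = 0.25

typical_fit = outfit_mean_square(typical, ability, difficulties)
aberrant_fit = outfit_mean_square(aberrant, ability, difficulties)
print(f"typical outfit:  {typical_fit:.2f}")
print(f"aberrant outfit: {aberrant_fit:.2f}")
```

In this toy case the raw score cannot separate the two patterns, while the outfit value for the reversed pattern lands far above 1, the value expected when responses follow the model.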

The research is the latest publication in a seven-year collaboration between Hanson and machine learning engineer Ben Sorenson.

Hanson and Sorenson, who first met in third grade, both attended St. Cloud State University in Minnesota for their undergraduate degrees and stayed in touch after moving into their careers. As a faculty member at FSU, Hanson became curious about measuring how much knowledge his students retained from lectures, courses and lab work.

"This was a conversation that I brought to Ben, who's great with statistics, computer science and data processing," said Hanson, who is part of a group of FSU faculty working to improve student success in gateway STEM courses such as general chemistry and college algebra. "He said we could use statistical tools to understand if my exams are good, and in 2017, we started analyzing exams."

The core of the Rasch model is that a student's probability of getting any test question correct is a function of two things: how difficult the question is and the student's ability to answer the question. In this case, a student's ability refers to how much knowledge they have and how many of the components needed to answer the question they possess. Viewing the outcomes of an exam in this way provides powerful insights, researchers said.
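In its standard form, the Rasch model expresses this probability as a logistic function of the difference between ability and difficulty, both measured on the same logit scale. The short sketch below shows that relationship; the function name and the example values are illustrative assumptions rather than anything taken from the study.

```python
import math

def rasch_probability(ability, difficulty):
    """P(correct) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical abilities and difficulties in logits (illustrative only).
students = {"low": -1.0, "average": 0.0, "high": 1.5}
questions = {"easy": -1.5, "medium": 0.0, "hard": 1.5}

# Print each student's chance of answering each kind of question correctly.
for name, ability in students.items():
    row = ", ".join(
        f"{q}: {rasch_probability(ability, d):.2f}" for q, d in questions.items()
    )
    print(f"{name:>7} -> {row}")
```

When ability equals difficulty the probability is 0.5. It rises toward 1 as ability exceeds difficulty and falls toward 0 in the opposite case, which matches the pattern described above for high-, average- and low-performing students.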

"The collaboration between Ken and I, though remote, has been a really seamless, smooth process," Sorenson said. "Our work is a great way to provide supporting evidence when educators might already suspect that cheating may be happening. What we didn't expect was that the patterns of artificial intelligence would be so easy to identify."

More information: Benjamin Sorenson et al., Identifying Generative Artificial Intelligence Chatbot Use on Multiple-Choice, General Chemistry Exams Using Rasch Analysis, Journal of Chemical Education (2024). DOI: 10.1021/acs.jchemed.4c00165

Journal information: Journal of Chemical Education

Provided by Florida State University

Citation: Statistical analysis can detect when ChatGPT is used to cheat on multiple-choice chemistry exams (2024, August 14) retrieved 14 August 2024 from https://phys.org/news/2024-08-statistical-analysis-chatgpt-multiple-choice.html

