April 20, 2023

ChatGPT is still no match for humans when it comes to accounting

by Todd Hollingshead, Brigham Young University

Last month, OpenAI launched its newest AI chatbot product, GPT-4. According to the folks at OpenAI, the bot, which uses machine learning to generate natural language text, passed the bar exam with a score in the 90th percentile, passed 13 of 15 AP exams, and got a nearly perfect score on the GRE Verbal test.

Inquiring minds at Brigham Young University (BYU) and 186 other universities wanted to know how OpenAI's tech would fare on accounting exams. So, they put the original version, ChatGPT, to the test. Their research is described in Issues in Accounting Education.

The researchers say that while ChatGPT still has work to do in the realm of accounting, it's a game changer that will transform the way everyone teaches and learns, for the better.

"When this technology first came out, everyone was worried that students could now use it to cheat," said lead study author David Wood, a BYU professor of accounting. "But opportunities to cheat have always existed. So for us, we're trying to focus on what we can do with this technology now that we couldn't do before to improve the teaching process for faculty and the learning process for students. Testing it out was eye-opening."

Since its debut in November 2022, ChatGPT has become the fastest-growing technology platform ever, reaching 100 million users in under two months. In response to intense debate about how models like ChatGPT should factor into education, Wood decided to recruit as many professors as possible to see how the AI fared against actual university accounting students.

His co-author recruiting pitch on social media exploded: 327 co-authors from 186 educational institutions in 14 countries participated in the research, contributing 25,181 classroom accounting exam questions. They also recruited undergrad BYU students (including Wood's daughter, Jessica) to feed another 2,268 textbook test bank questions to ChatGPT. The questions covered accounting information systems (AIS), auditing, financial accounting, managerial accounting and tax, and varied in difficulty and type (true/false, multiple choice, short answer, etc.).

Although ChatGPT's performance was impressive, the students performed better. Students scored an overall average of 76.7%, compared to ChatGPT's score of 47.4%. On 11.3% of questions, ChatGPT scored higher than the student average, doing particularly well on AIS and auditing. But the AI bot did worse on tax, financial, and managerial assessments, possibly because ChatGPT struggled with the mathematical processes required for those subjects.

When it came to question type, ChatGPT did better on true/false questions (68.7% correct) and multiple-choice questions (59.5%), but struggled with short-answer questions (between 28.7% and 39.1%). In general, higher-order questions were harder for ChatGPT to answer. In fact, ChatGPT would sometimes provide authoritative-sounding written descriptions for incorrect answers, or answer the same question in different ways.

"It's not perfect; you're not going to be using it for everything," said Jessica Wood, currently a freshman at BYU. "Trying to learn solely by using ChatGPT is a fool's errand."

The researchers also uncovered some other fascinating trends through the study, including:

ChatGPT doesn't always recognize when it is doing math and makes nonsensical errors such as adding two numbers in a subtraction problem, or dividing numbers incorrectly.
ChatGPT often provides explanations for its answers, even if they are incorrect. Other times, ChatGPT's descriptions are accurate, but it will then proceed to select the wrong multiple-choice answer.
ChatGPT sometimes makes up facts. For example, when providing a reference, it generates a real-looking reference that is completely fabricated. The work and sometimes the authors do not even exist.

That said, the authors fully expect GPT-4 to improve exponentially on the accounting questions posed in their study, and on the issues mentioned above. What they find most promising is how the chatbot can help improve teaching and learning, including the ability to design and test assignments, or perhaps be used for drafting portions of a project.

"It's an opportunity to reflect on whether we are teaching value-added information or not," said study co-author and fellow BYU accounting professor Melissa Larson. "This is a disruption, and we need to assess where we go from here. Of course, I'm still going to have TAs, but this is going to force us to use them in different ways."

More information: The ChatGPT Artificial Intelligence Chatbot: How Well Does It Answer Accounting Assessment Questions?, Issues in Accounting Education (2023). DOI: 10.2308/ISSUES-2023-013

Provided by Brigham Young University

Citation: ChatGPT is still no match for humans when it comes to accounting (2023, April 20) retrieved 10 July 2024 from https://phys.org/news/2023-04-chatgpt-humans-accounting.html

