New techniques could help identify students at risk for dropping out of online courses

July 1, 2015 by Larry Hardesty

MOOCs—massive open online courses—grant huge numbers of people access to world-class educational resources, but they also suffer high rates of attrition.

To some degree, that's inevitable: Many people who enroll in MOOCs may have no interest in doing homework, but simply plan to listen to video lectures in their spare time.

Others, however, may begin courses with the firm intention of completing them but get derailed by life's other demands. Identifying those people before they drop out and providing them with extra help could make their MOOC participation much more productive.

The problem is that you don't know who's actually dropped out—or, in MOOC parlance, "stopped out"—until the MOOC has been completed. One missed deadline does not a stopout make; but after the second or third missed deadline, it may be too late for an intervention to do any good.

Last week, at the International Conference on Artificial Intelligence in Education, MIT researchers showed that a dropout-prediction model trained on data from one offering of a course can help predict which students will stop out of the next offering. The prediction remains fairly accurate even if the organization of the course changes, so that the data collected during one offering doesn't exactly match the data collected during the next.

"There's a known area in machine learning called transfer learning, where you train a machine-learning model in one environment and see what you have to do to adapt it to a new environment," says Kalyan Veeramachaneni, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory who conducted the study together with Sebastien Boyer, a graduate student in MIT's Technology and Policy Program. "Because if you're not able to do that, then the model isn't worth anything, other than the insight it may give you. It cannot be used for real-time prediction."

Generic descriptors

Veeramachaneni and Boyer's first step was to develop a set of variables that would allow them to compare data collected during different offerings of the same course—or, indeed, offerings of different courses. These include things such as average time spent per correct homework problem and amount of time spent with video lectures or other resources.

Next, for each of three different offerings of the same course, they normalized the raw values of those variables against the class averages. So, for instance, a student who spent two hours a week watching videos where the class average was three would have a video-watching score of 0.67, while a student who spent four hours a week watching videos would have a score of 1.33.
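
The normalization described above amounts to dividing each student's raw value by the class average. Here is a minimal Python sketch; the function name and the three-student class are illustrative assumptions, not taken from the study.

```python
from statistics import mean


def normalize_by_class_average(values):
    """Divide each student's raw value by the class average."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("class average is zero; cannot normalize")
    return [v / avg for v in values]


# Hypothetical weekly video-watching hours for three students.
# The class average is 3 hours, as in the article's example.
hours = [2.0, 3.0, 4.0]
scores = normalize_by_class_average(hours)
print([round(s, 2) for s in scores])  # [0.67, 1.0, 1.33]
```

Because every score is relative to the class average, scores from different offerings can be compared directly even when the raw averages differ.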

They ran the normalized data for the first course offering through a machine-learning algorithm that tried to find correlations between particular values of the variables and stopout. Then they used those correlations to try to predict stopout in the next two offerings of the course. They repeated the process with the second course offering, using the resulting model to predict stopout in the third.
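
The train-on-one-offering, predict-on-the-next procedure might look roughly like the sketch below. The article does not say which learning algorithm was used, so the logistic regression here is an assumption, and the feature values, labels, and function names are invented purely for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit a logistic regression by batch gradient descent.

    X: rows of normalized variables; y: 1 for stopout, 0 otherwise.
    """
    n, d = len(X), len(X[0])
    coef, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_c, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(c * v for c, v in zip(coef, xi))) - yi
            for j in range(d):
                grad_c[j] += err * xi[j]
            grad_b += err
        coef = [c - lr * g / n for c, g in zip(coef, grad_c)]
        bias -= lr * grad_b / n
    return coef, bias


def predict_stopout_probability(model, x):
    coef, bias = model
    return sigmoid(bias + sum(c * v for c, v in zip(coef, x)))


# Hypothetical normalized data from the first offering:
# [video-watching score, homework-time score], label 1 = stopped out.
offering_1_X = [[0.3, 0.4], [0.5, 0.2], [0.4, 0.6],
                [1.2, 1.1], [1.5, 1.3], [1.4, 0.9]]
offering_1_y = [1, 1, 1, 0, 0, 0]

model = train_logistic(offering_1_X, offering_1_y)

# Hypothetical students from the second offering, normalized against
# that offering's own class averages.
p_low = predict_stopout_probability(model, [0.35, 0.5])
p_high = predict_stopout_probability(model, [1.35, 1.2])
print(p_low > 0.5, p_high < 0.5)
```

The second offering is never used in fitting: the model sees only data from the earlier offering, which is what would make real-time prediction possible.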

Tipping the balance

Already, the model's predictions were fairly accurate. But Veeramachaneni and Boyer hoped to do better. They tried several different techniques to improve the model's accuracy, but the one that fared best is called importance sampling. For each student enrolled in, say, the second offering of the course, they found the student in the first offering who provided the closest match, as determined by a "distance function" that factored in all the variables. Then, according to the closeness of the match, they gave the statistics on the student from the first offering a greater weight during the machine-learning process.
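
The matching-and-weighting step can be sketched as follows. The article does not give the distance function or the weight formula, so the Euclidean distance and the exp(-distance) bonus below are assumptions chosen for illustration, as are the function name and the sample data.

```python
import math


def importance_weights(source, target, base=1.0):
    """Weight source-offering students by how well they match target students.

    For each target student, find the nearest source student (Euclidean
    distance over all variables, an assumed choice) and add a closeness
    bonus of exp(-distance) to that source student's weight.
    """
    weights = [base] * len(source)
    for t in target:
        dists = [math.dist(s, t) for s in source]
        nearest = min(range(len(source)), key=dists.__getitem__)
        weights[nearest] += math.exp(-dists[nearest])
    return weights


# Hypothetical normalized variables for students in two offerings.
first_offering = [[0.3, 0.4], [1.2, 1.1], [2.5, 2.0]]
second_offering = [[0.3, 0.4], [1.0, 1.0], [1.3, 1.1]]

weights = importance_weights(first_offering, second_offering)
print([round(w, 3) for w in weights])
```

The resulting weights would then be passed to the learning algorithm as per-student weights. In this example, the first-offering student who resembles nobody in the second offering keeps only the base weight.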

In general, the version of the model that used importance sampling was more accurate than the unmodified version. But the difference was not overwhelming. In ongoing work, Veeramachaneni and Boyer are tinkering with both the distance function and the calculation of the corresponding weights, in the hope of improving the accuracy of the model.

They also continue to expand the set of variables that the model can consider. "One of the variables that I think is very important is the proportion of time that students spend on the course that falls on the weekend," Veeramachaneni says. "That variable has to be a proxy for how busy they are. And that put together with the other variables should tell you that the student has a strong motivation to do the work but is getting busy. That's the one that I would prioritize next."
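
A weekend-proportion variable of the kind Veeramachaneni describes could be computed along these lines. This is a rough sketch: the session format, the function name, and the sample log are assumptions, and sessions that run past midnight are attributed entirely to their start day.

```python
from datetime import datetime


def weekend_fraction(sessions):
    """Fraction of total course time that falls on Saturday or Sunday.

    sessions: list of (start datetime, minutes spent) pairs.
    """
    total = sum(minutes for _, minutes in sessions)
    if total == 0:
        return 0.0
    # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
    weekend = sum(minutes for start, minutes in sessions
                  if start.weekday() >= 5)
    return weekend / total


# Hypothetical activity log for one student.
sessions = [
    (datetime(2015, 6, 24, 20, 0), 30),  # Wednesday
    (datetime(2015, 6, 27, 10, 0), 60),  # Saturday
    (datetime(2015, 6, 28, 15, 0), 30),  # Sunday
]
print(weekend_fraction(sessions))  # 0.75
```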

More information: "Transfer Learning for Predictive Models in Massive Open Online Courses." groups.csail.mit.edu/EVO-Desig … eeramachaneni228.pdf
