New techniques could help identify students at risk for dropping out of online courses

July 1, 2015 by Larry Hardesty, Massachusetts Institute of Technology

MOOCs—massive open online courses—grant huge numbers of people access to world-class educational resources, but they also suffer high rates of attrition.

To some degree, that's inevitable: Many people who enroll in MOOCs may have no interest in doing the homework and simply plan to listen to the lectures in their spare time.

Others, however, may begin courses with the firm intention of completing them but get derailed by life's other demands. Identifying those people before they drop out and providing them with extra help could make their MOOC participation much more productive.

The problem is that you don't know who's actually dropped out—or, in MOOC parlance, "stopped out"—until the MOOC has been completed. One missed deadline does not a stopout make; but after the second or third missed deadline, it may be too late for an intervention to do any good.

Last week, at the International Conference on Artificial Intelligence in Education, MIT researchers showed that a dropout-prediction model trained on data from one offering of a course can help predict which students will stop out of the next offering. The prediction remains fairly accurate even if the organization of the course changes, so that the data collected during one offering doesn't exactly match the data collected during the next.

"There's a known area in machine learning called transfer learning, where you train a machine-learning model in one environment and see what you have to do to adapt it to a new environment," says Kalyan Veeramachaneni, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory who conducted the study together with Sebastien Boyer, a graduate in MIT's Technology and Policy Program. "Because if you're not able to do that, then the model isn't worth anything, other than the insight it may give you. It cannot be used for real-time prediction."

Generic descriptors

Veeramachaneni and Boyer's first step was to develop a set of variables that would allow them to compare data collected during different offerings of the same course—or, indeed, offerings of different courses. These include things such as average time spent per correct homework problem and amount of time spent with video lectures or other resources.
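To make the two example variables concrete, here is a minimal Python sketch of how they might be computed from per-student activity logs. The WeeklyActivity record, its field names, and the choice to return None when a student has no correct answers are hypothetical assumptions for illustration. The article does not describe the researchers' actual log format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeeklyActivity:
    """Hypothetical summary of one student's logged activity for one week."""
    homework_seconds: float  # total time spent on homework problems
    correct_problems: int    # number of homework problems answered correctly
    video_seconds: float     # total time spent watching lecture videos


def time_per_correct_problem(week: WeeklyActivity) -> Optional[float]:
    """Average seconds spent per correct homework problem, or None if none were correct."""
    if week.correct_problems == 0:
        return None  # assumed convention: the variable is undefined without a correct answer
    return week.homework_seconds / week.correct_problems


def video_hours(week: WeeklyActivity) -> float:
    """Time spent with video lectures, converted to hours."""
    return week.video_seconds / 3600.0


week = WeeklyActivity(homework_seconds=1800, correct_problems=6, video_seconds=7200)
print(time_per_correct_problem(week), video_hours(week))  # 300.0 2.0
```

Because variables like these are defined in terms of generic activities (homework, videos) rather than specific assignments, they can be computed the same way for any offering of any course.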

Next, for each of three different offerings of the same course, they normalized the raw values of those variables against the class averages. So, for instance, a student who spent two hours a week watching videos where the class average was three would have a video-watching score of 0.67, while a student who spent four hours a week watching videos would have a score of 1.33.
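The normalization step amounts to dividing each student's raw value by the class average. The following minimal Python sketch reproduces the article's arithmetic. The function name and the handling of an all-zero class are assumptions made for illustration.

```python
from __future__ import annotations


def normalize_against_class_average(raw: dict[str, float]) -> dict[str, float]:
    """Divide each student's raw value by the class average of that variable."""
    if not raw:
        return {}
    average = sum(raw.values()) / len(raw)
    if average == 0:
        # Assumed convention: if no one in the class has any activity, everyone scores 0.
        return {student: 0.0 for student in raw}
    return {student: value / average for student, value in raw.items()}


# Weekly video-watching hours for a hypothetical class whose average is 3 hours.
video_hours = {"student_a": 2.0, "student_b": 4.0, "student_c": 3.0}
scores = normalize_against_class_average(video_hours)
print({name: round(score, 2) for name, score in scores.items()})
# {'student_a': 0.67, 'student_b': 1.33, 'student_c': 1.0}
```

A score of 1.0 means "exactly average for this offering," which is what lets a model trained on one offering be applied to another whose raw numbers run higher or lower.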

They ran the normalized data for the first course offering through a machine-learning algorithm that tried to find correlations between particular values of the variables and stopout. Then they used those correlations to try to predict stopout in the next two offerings of the course. They repeated the process with the second course offering, using the resulting model to predict stopout in the third.
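The article does not name the machine-learning algorithm the researchers used. The sketch below substitutes a plain logistic-regression classifier, written from scratch in Python, purely to illustrate the train-on-one-offering, predict-on-the-next workflow. The feature values and labels are made-up toy data, not results from the study.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def stopout_probability(model, features):
    """Logistic-regression probability that a student stops out."""
    weights, bias = model
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic(X, y, epochs=2000, lr=0.1):
    """Fit a logistic-regression stopout model by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for features, label in zip(X, y):
            error = stopout_probability((weights, bias), features) - label
            for j in range(d):
                grad_w[j] += error * features[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_stopout(model, X, threshold=0.5):
    """Return 1 (predicted stopout) or 0 for each student."""
    return [int(stopout_probability(model, features) >= threshold) for features in X]


# Toy, made-up normalized features: [video-watching score, homework score].
# Label 1 means the student stopped out.
offering_1_X = [[0.2, 0.3], [0.4, 0.2], [0.3, 0.5], [1.4, 1.2], [1.6, 1.5], [1.1, 1.3]]
offering_1_y = [1, 1, 1, 0, 0, 0]
offering_2_X = [[0.25, 0.4], [1.5, 1.4]]
offering_3_X = [[0.35, 0.3], [1.3, 1.2]]

# Train only on the first offering, then apply the same model to later offerings.
model = train_logistic(offering_1_X, offering_1_y)
print(predict_stopout(model, offering_2_X), predict_stopout(model, offering_3_X))
```

The key point is that the later offerings contribute no labels to training. The model sees them only at prediction time, which is what makes real-time use during a live course possible.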

Tipping the balance

Already, the model's predictions were fairly accurate. But Veeramachaneni and Boyer hoped to do better. They tried several techniques to improve the model's accuracy, and the one that fared best is called importance sampling. For each student enrolled in, say, the second offering of the course, they found the student in the first offering who provided the closest match, as determined by a "distance function" that factored in all the variables. They then gave that first-offering student's statistics greater weight during the machine-learning process, with the size of the increase depending on how close the match was.
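The matching-and-weighting step can be sketched in a few lines of Python. This is an illustrative reading of the description above, not the researchers' code. The Euclidean distance (via math.dist), the starting weight of 1.0, and the 1 / (1 + distance) increment are all assumptions, since the article does not specify the distance function or the weight formula.

```python
import math


def closest_match_weights(source, target, distance=math.dist, base_weight=1.0):
    """Weight source-offering students by how closely they match target-offering students.

    For every target student, find the nearest source student under `distance`
    and raise that source student's weight by 1 / (1 + distance). Both the
    Euclidean default and the 1 / (1 + d) increment are illustrative assumptions.
    """
    weights = [base_weight] * len(source)
    for t in target:
        distances = [distance(s, t) for s in source]
        best = min(range(len(source)), key=distances.__getitem__)
        weights[best] += 1.0 / (1.0 + distances[best])
    return weights


# Normalized features for a hypothetical first offering (source) and second offering (target).
source = [[0.2, 0.3], [1.4, 1.2], [1.0, 1.0]]
target = [[0.25, 0.35], [1.0, 1.0], [1.05, 0.95]]
weights = closest_match_weights(source, target)
print([round(w, 3) for w in weights])  # [1.934, 1.0, 2.934]
```

These weights would then scale each source student's contribution to the training loss, so first-offering students who resemble the new cohort influence the model more. Textbook importance sampling instead estimates the ratio of target to source data densities; counting nearest matches is a crude, hypothetical stand-in for that ratio.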

In general, the version of the model that used importance sampling was more accurate than the unmodified version. But the difference was not overwhelming. In ongoing work, Veeramachaneni and Boyer are tinkering with both the distance function and the calculation of the corresponding weights, in the hope of improving the accuracy of the model.

They also continue to expand the set of variables that they can consider. "One of the variables that I think is very important is the proportion of time that students spend on the course that falls on the weekend," Veeramachaneni says. "That variable has to be a proxy for how busy they are. And that put together with the other variables should tell you that the student has a strong motivation to do the work but is getting busy. That's the one that I would prioritize next."
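The weekend-time variable Veeramachaneni describes is a simple ratio: weekend time divided by total course time. The Python sketch below computes it from a list of study sessions. The (start, duration) session format, the rule of crediting each session entirely to the day it starts, and returning 0.0 for a student with no logged time are all simplifying assumptions.

```python
from datetime import datetime, timedelta


def weekend_fraction(sessions):
    """Share of a student's course time that falls on Saturday or Sunday.

    `sessions` is a list of (start, duration) pairs. As a simplification, each
    session's full duration is credited to the day it starts on.
    """
    total = 0.0
    weekend = 0.0
    for start, duration in sessions:
        seconds = duration.total_seconds()
        total += seconds
        if start.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
            weekend += seconds
    # Assumed convention: a student with no logged time gets 0.0.
    return weekend / total if total else 0.0


sessions = [
    (datetime(2015, 6, 27, 10, 0), timedelta(hours=1)),  # Saturday
    (datetime(2015, 6, 28, 14, 0), timedelta(hours=1)),  # Sunday
    (datetime(2015, 6, 29, 20, 0), timedelta(hours=2)),  # Monday
]
print(weekend_fraction(sessions))  # 0.5
```

Like the other variables, this raw fraction could then be normalized against the class average before being fed to the model.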


More information: "Transfer Learning for Predictive Models in Massive Open Online Courses." … eeramachaneni228.pdf
