There's a transaction that happens every time you load a website, send an email, or click "like" on a friend's post: You get something you want in exchange for some data about your actions and interests. Entire business models depend on the premise that the data we generate in this way have value, and massive databases have been assembled with this in mind.
Can we harness data collection of this kind for research? So far, companies have been in the vanguard of this type of work, with academics lagging behind. We know analysis of large datasets will transform the social sciences, and wearables and other sensors could expand our understanding of biology too. Some firms, such as Twitter, have released data to academics, and many cool projects have emerged as a result, from predicting flu outbreaks to training computer models of language. But so far, researchers haven't had much control over what data are available for analysis.
Even when the collected data do align well with a researcher's interests, most companies aren't open enough about their methods for the data to be truly useful. Jawbone, for instance, recently published on its blog a survey of sleep habits among college students around the United States, but didn't disclose the algorithm it used to measure sleep. It's understandable: There's not much business upside to revealing their methods to potential competitors. But it does mean that the data don't join the scholarly sleep literature and don't help direct future research in the field.
What if researchers got directly involved, providing users with something they want and getting data targeted for their exact research questions in return? In 2014, my Ph.D. advisor Daniel Forger and I tried exactly this. What we learned could be used by researchers in many areas, benefiting the public and scholarly study alike.
I wrote an app that provides travelers with schedules of when to seek and avoid light to help them get over jet lag as quickly as possible. The schedules were computed using a mathematical model of the circadian clock and a kind of mathematics called "optimal control theory." To return the favor for the free app, users could opt in to anonymously send us their sleep history and light exposure from their trip, giving us data we could analyze.
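Entrain's actual schedules come from a detailed circadian model and continuous optimal control, neither of which is reproduced here. As a rough, hypothetical illustration of the underlying "fastest possible adjustment" idea only, the sketch below treats the body clock's misalignment as a whole number of hours (mod 24) and assumes light can advance the clock by at most 1 hour per day or delay it by at most 2 hours per day. Both limits, and the function name `fewest_days_to_realign`, are made up for illustration. Breadth-first search then finds the fewest days to realign, a discrete stand-in for a minimum-time control problem.

```python
from collections import deque


def fewest_days_to_realign(offset_hours, max_advance=1, max_delay=2):
    """Toy minimum-time plan for shifting a body clock (hypothetical model).

    offset_hours: hours the body clock must advance to match local time
    (e.g. 9 after flying nine time zones east). Assumption: each day the
    clock shifts by a whole number of hours, at most max_advance forward
    or max_delay backward. Returns (days, daily_shifts); positive shifts
    are advances, negative shifts are delays.
    """
    start = offset_hours % 24
    # Allowed daily shifts under the assumed limits.
    shifts = list(range(1, max_advance + 1)) + [-d for d in range(1, max_delay + 1)]
    previous = {start: None}  # state -> (prior state, shift taken to get here)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == 0:  # body clock aligned with local time
            break
        for shift in shifts:
            # Advancing shrinks the remaining misalignment; delaying grows it (mod 24).
            nxt = (state - shift) % 24
            if nxt not in previous:
                previous[nxt] = (state, shift)
                queue.append(nxt)
    # Walk back from the aligned state to recover the day-by-day plan.
    plan = []
    state = 0
    while previous[state] is not None:
        state, shift = previous[state]
        plan.append(shift)
    plan.reverse()
    return len(plan), plan
```

For example, `fewest_days_to_realign(9)` returns an 8-day plan made entirely of delays, shorter than the 9 days that pure advancing would take under these assumed limits. Because every day costs the same, the first time breadth-first search reaches the aligned state is guaranteed to be a fewest-days plan.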
About 155,000 users have downloaded our app Entrain to date. Of those, more than 11,000 – seven percent – have sent us data in return. That level of return is a testament to the appeal of what we offered, especially since we had almost no budget.
Walking around campus here at the University of Michigan, I often see flyers offering to pay me US$10 for taking a survey. Our app is a high-tech version of the same idea, turbo-charged for efficiency: We get lots of data for free, the app itself advertises the schedules our paper describes and we survey a broader audience than just college undergrads. Our potential research pool was limited to smartphone users, but smartphones penetrate into more income brackets and demographics than you might initially expect.
How can other researchers get at mobile data like this? Finding something to exchange for the data is a great first step. This can include educational materials, or information about how a user compares to other survey respondents (for example, the pioneering Munich Chronotype Questionnaire), or individualized theoretical predictions built from mathematical models, like what we did with Entrain.
As a mathematician, I'm particularly partial to the last one: The optimal schedules for reducing jet lag are a neat result, but the techniques used in computing them aren't specific to any one application. There's a whole corpus of mathematical models of biology that could be translated to mobile forms to provide compelling reasons for people to give up their data, like modeling how sleep debt builds up over weeks or how your metabolism adjusts to diet.
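Published models of sleep pressure are far more involved than anything shown here. Purely as a hypothetical illustration of the kind of individualized number an app could compute from a user's own sleep log, the sketch below keeps a running tally of sleep debt under a simple assumed rule: each night adds the shortfall between an assumed 8-hour need and the hours actually slept, and the debt never drops below zero (extra sleep is not banked). The function name and the 8-hour default are illustrative assumptions.

```python
def running_sleep_debt(hours_slept, nightly_need=8.0):
    """Toy night-by-night sleep-debt tally, in hours (hypothetical model).

    Each night adds (nightly_need - slept) to the debt; the debt is
    floored at zero, so surplus sleep does not carry forward.
    """
    debt = 0.0
    history = []
    for slept in hours_slept:
        debt = max(0.0, debt + nightly_need - slept)
        history.append(debt)
    return history
```

A week of five 6-hour nights followed by two 10-hour nights gives `[2.0, 4.0, 6.0, 8.0, 10.0, 8.0, 6.0]`: under this toy rule, the weekend recovers only part of the deficit.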
The future of data collection
Building the app to collect the data is a major hurdle. Making the app myself was a fun exercise, but a graduate student's home brew can't keep up with professional app designers. With funding, researchers can hire companies to develop an app for them.
That said, it was incredibly freeing to be able to release the app without needing a grant to back it, and new tools are making it ever easier to build an app on your own. Since our app came out, for instance, Apple has released ResearchKit, which makes it easier for researchers to obtain signed consent from app users and to collect data from participants.
Having help with informed consent solves a problem researchers have that for-profit companies don't: ensuring the people who are the data sources know what information we're using and for what purposes. We solved that problem in Entrain by requiring people to opt in to sending us their information, and anonymizing the information the app sent. As tools like ResearchKit continue to develop, it will get easier and easier for researchers to steer their own data collection.
Mobile is the future of this kind of data collection. Apps are personal in ways websites aren't: They're more closely tied to our identities and can access more private data. With wearables and other new forms of technology connecting to them, our phones are becoming increasingly accurate proxies for ourselves. If researchers can find the right ways to tap into this information and encourage users to share data, they can collect exactly the data their research requires – and lots of it, to boot.