Seasonal changes in Google search queries for mental illness, U.S. (blue) and Australia (red). Searches were wavelet transformed to isolate the seasonal component. Searches for mental illnesses peak in the winter. (Image: Ben Althouse)

Data streams initiated by patients through their activities on the internet and social networks are now ubiquitous. These so-called Novel Data Streams (NDS) appeal to public health surveillance officials because they are inexpensive and easy to collect.

A new paper published in EPJ Data Science reports the findings of a meeting of an international team of scientists and public health officials last year at SFI, in which they evaluated the state of NDS surveillance and outlined a conceptual framework for integrating such data into current public health surveillance systems. 

“Novel data streams encompass a broad set of sources from internet search data to social media posts to Wikipedia access logs, even restaurant reservations and reviews,” says the study's co-lead author, SFI Omidyar Fellow alum Ben Althouse. “Creativity is key.”

A well-known example of an NDS surveillance system is Google Flu Trends, developed in 2008, which translates Google search queries into an estimate of the number of individuals with influenza-like illness (ILI) visiting primary healthcare providers. While the system initially performed well, it drew criticism for prediction failures across different influenza seasons, and for uncertainty about whether predicting ILI rates two weeks ahead of predictions from the U.S. Centers for Disease Control adds value to the systems already available to public health authorities.

Nonetheless, “novel data streams have a bright future,” says SFI Omidyar Fellow Sam Scarpino, the other lead author. “Soon, surveillance systems could be nearly instantaneous and deliver on very fine geographic scales.”

NDS might also extend surveillance to places with no existing systems, improve the dissemination of data, and potentially measure unanticipated events, such as syndromes associated with a new pathogen not currently under surveillance.

These systems will require collaboration among academic researchers, private industry, and public health officials to make sure they are properly vetted. “We have to be rigorous in our evaluation and validation of these systems before they’re implemented,” Althouse cautions. “These systems show tremendous potential; we want to make sure we get them right.”

Read the paper in EPJ Data Science (October 16, 2015)
