Gabriel Garcia for the Santa Fe Institute

As our abilities to acquire, store, and analyze mountains of data have grown in recent years, so too have questions about Big Data’s true abilities and limits. Proponents tout Big Data as a means to improve the quality of almost any process we can measure. But data alone, without accompanying theory, may not lead to the best questions and hence may not generate the best answers.

SFI’s annual Business Network and Board of Trustees Symposium this weekend will explore both the promise and the limits of Big Data, as well as the value of theory in the Big Data context.

“The fundamental hope of Big Data proponents is that it can provide a tool to help them substantially answer their questions,” says Symposium organizer, neuroscientist, and SFI Business Network Director Chris Wood. “But perhaps the biggest misconception is that we can answer all our scientific, business, governmental, or political questions if we only have the right data.”

Often the true usefulness of Big Data is uncertain, Wood points out. Social scientists are now using data from social networks such as Twitter and Facebook to make inferences about social interactions in general. Whether or not the conclusions from online network data generalize to other forms of interaction is an important empirical question, he says, but that question cannot be answered from the online data alone.

Sometimes Big Data can be the right resource. Symposium speaker Dan Wagner’s analytics during the 2012 Obama campaign used a variety of commercial and political data to identify those voters likely to favor Obama’s messages, then helped find ways to reach them electronically or face-to-face. The strategy contrasts markedly with more traditional “voting bloc” techniques.

Other Symposium participants include Alexander Szalay, a cosmologist at Johns Hopkins who was among the first to build a very large-scale scientific database; SFI External Professor Cosma Shalizi, a professor of statistics at Carnegie Mellon who offers cautionary tales about incorrect inferences and bulky models; and author James Bamford, a leading scholar on the U.S. National Security Agency, which is perhaps the biggest “Big Data” organization of them all.

In any case, it is essential to question and test assumptions about Big Data and its applications, Wood says. A familiar example is the use of search histories to target online ads. Are all search terms equally valuable to advertisers? Such questions are part of the “arms race” among companies that provide online services and seek to deliver the best “eyes” to their advertisers.

The invitation-only Symposium runs October 31-November 2 in Santa Fe.