Crystal Valentine (Amherst College)
Abstract. From the time when we first experience intuitive thought, we ask the “big questions.” Who are we? Where did we come from? What else is there? What does it all mean? Galileo, a true Renaissance man, used the most sophisticated tool of his time—the telescope—to gaze into the heavens in order to gain insight into those big questions. Now our most sophisticated tool is computation; we leverage massive data sets to gain insight into those same questions.
Today we can confidently say that the big data revolution is here. We have passed an inflection point on the cost/value curve of collecting and analyzing data, motivating researchers and CEOs to put data, instead of intuition, at the center of decision-making. Computation has become a favorite mode of executing the three steps of the general intellectual inquiry process common to all academic disciplines: observation, inference, and synthesis. Initially used primarily as a simple tool for collecting and curating raw experimental data to enhance human observational capabilities, computation is now a central and trusted method of carrying out the inference and synthesis steps as well, through the use of advanced algorithms, including statistical methods, machine learning, and artificial intelligence. Today, computation and data analysis are part and parcel of a remarkable number of processes in both academic and commercial pursuits.
The growing emphasis on placing data central to the inquiry process coupled with an unprecedented growth in processing power and storage capacity has motivated an explosion in the volume of data being collected and analyzed. The abundance of data and the increasingly data-centric focus of research methods represent a shift within the computer science community, resulting in the emergence of big data as a new sub-field. Moreover, the dramatic growth in data volumes is challenging traditional data storage and processing techniques, precipitating an equally dramatic paradigm shift in the architecture of computational platforms: from single-processor systems to shared-memory multiple-processor machines, and today to shared-nothing distributed systems. These shifts have yielded tremendous advances in science and business technology, but also present some significant technical challenges, requiring the development of robust, distributed platforms and new parallel algorithms.
Just as Galileo’s telescope was the lens of his era, big data is the modern lens through which we examine life’s big questions. I shall describe several compelling examples of the use of big data, drawn from both academia and industry, tracing increasingly sophisticated applications against an arc of technological advancement, and will offer a picture of where big data will take us over the next decade as we continue to gaze into the heavens.
Bio. Crystal Valentine is an Assistant Professor in the Department of Computer Science at Amherst College, where she teaches Big Data, Principles of Database Design, and Computational Biology and consults for equity investors as a technical expert. At the conclusion of this semester, she will be taking a leave of absence from Amherst in order to join MapR Technologies as VP of Technology Strategy, driving innovation around their big data platform and working on thought leadership projects in the enterprise computing community. Prior to Amherst, she spent four years as a consultant at Ab Initio Software, working with Fortune 100 companies to design and implement high-throughput, mission-critical enterprise applications. Crystal received a patent for “Extreme Virtual Memory,” a toolkit developed at MIT Lincoln Laboratory for performing distributed computations on petabyte-scale matrices. Crystal graduated magna cum laude with Distinction and Phi Beta Kappa with a B.A. in Computer Science from Amherst College and received her doctorate in Computer Science from Brown University, where she studied algorithms for analyzing the human genome. She was a Fulbright Scholar to Italy.