Big Data

Studying Social Behavior with Big Data: An Undergraduate Toolkit

New challenges and opportunities arise from socially generated “big data” that capture human interactions recorded through the web, cell phones, and sensors. Over the past several years there has been growing recognition of the transformative potential of this deluge of information, both in business and in scientific research. Many of these data capture human interaction and can be mined to understand complex social behavior. However, the skills needed to collect and analyze these data are rarely taught in the social sciences, particularly at the undergraduate level. Our goal is to introduce undergraduates to the exciting new field of computational social science and to help them gain the skills needed to work in it. In the near term, we hope to give students, especially freshmen and sophomores, a taste of the research questions these data make it possible to ask. This project helps close the training gap by providing a set of lab tools that can be incorporated into social science curricula. The tools let educators introduce big data research without assuming that students have any background in computer science. We hope this hands-on experience will inspire students to go on to acquire the skills needed to do work in this area.

The core of this project is a set of modular computer programs for collecting, sorting, visualizing, and analyzing data retrieved from websites such as Facebook, Craigslist, and Twitter. These public forums are focal points for big data research, and while many of their data have always been available, until recently only skilled programmers could access them. Now many such websites provide an API (application programming interface) that eases data collection. More important for our pedagogical goals, there is now a single, powerful programming language able to handle every step of the scientific process. This project uses that language, Python, to handle the necessary but often highly technical processes that keep curious minds from experimenting with big data. A key design feature of these tools is that they offer varying degrees of automation, allowing educators or students to decide how much programming is part of the curriculum while still giving students the chance to learn about big data research through practical experience.
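The idea of modular stages with adjustable automation can be illustrated with a short sketch. It is not the project's actual code: the function names (`collect`, `sort_records`, `analyze`, `run_pipeline`) and the toy records are hypothetical, the API response is assumed to arrive as JSON text (no real Facebook, Craigslist, or Twitter endpoint is called), and the visualization stage is left out. A student can call each stage separately to see what it does, or call `run_pipeline` to run everything at once.

```python
import json
from collections import Counter
from typing import Iterable


def collect(raw_pages: Iterable[str]) -> list[dict]:
    """Collect stage (hypothetical): parse JSON pages, as an API might return them, into records."""
    records: list[dict] = []
    for page in raw_pages:
        records.extend(json.loads(page))  # each page is assumed to be a JSON list of objects
    return records


def sort_records(records: list[dict], key: str) -> list[dict]:
    """Sort stage (hypothetical): order records by one field."""
    return sorted(records, key=lambda r: r[key])


def analyze(records: list[dict], field: str) -> Counter:
    """Analyze stage (hypothetical): count how often each value of a field occurs."""
    return Counter(r[field] for r in records)


def run_pipeline(raw_pages: Iterable[str], sort_key: str, count_field: str) -> Counter:
    """Fully automated mode: chain every stage so no programming is required."""
    return analyze(sort_records(collect(raw_pages), sort_key), count_field)


if __name__ == "__main__":
    # Toy pages invented for illustration only; not real data from any website.
    pages = [
        '[{"user": "ann", "city": "Detroit"}, {"user": "bo", "city": "Ann Arbor"}]',
        '[{"user": "cy", "city": "Detroit"}]',
    ]
    print(run_pipeline(pages, "user", "city"))
```

Keeping each stage a separate function is what allows the degree of automation to vary: an instructor can hide everything behind `run_pipeline`, or have students write or modify individual stages.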

Project Team:
Elizabeth Bruch, Sociology and Complex Systems, College of Literature, Science and the Arts
Jonathan Atwell, Ph.D. Candidate, Sociology, College of Literature, Science and the Arts