Urban Informatics: Using Big Data to Understand and Serve Communities
December 2022
Preface
Note: This is the free online version of:
O’Brien, D.T. (2022). Urban Informatics: Using Big Data to Understand and Serve Communities. Boca Raton, FL, Chapman Hall / CRC Press.
If you would like to purchase a desk copy, please see the publisher’s website, Amazon, or other booksellers.
They say the “age of smart cities” is upon us. This is often equated with science fiction-y technologies that are under development, such as autonomous vehicles and algorithms that can predict crimes before they happen. But there are more immediate opportunities to transform cities and communities thanks to the vast proliferation of digital data. Novel data resources, including administrative records, social media platforms, and sensor technologies, offer original insights that can help us to refine, improve, and reimagine the ways communities work and the services and products that can support them.
As the name suggests, this book teaches the fundamental skills of urban informatics, or the use of data and technology to better understand and serve communities. This includes the technical skills of accessing, manipulating, analyzing, and visualizing complex, messy, and “big” data sets using R, as well as the ability to interpret and make sense of them. Come be a part of the smart cities revolution and help transform the communities of the 21st century. (Also, if you would like a physical copy, it will be available through Chapman Hall / CRC Press in 2023.)
This book is designed to support learners who would like to leverage data science for the purpose of having public impact. As with any textbook, many readers will use this as a reference, dipping in and out to learn or refresh skills as needed. Others may work through it linearly as a full curriculum, either as part of a formal course or a self-directed venture. In either case, the book takes an experiential learning approach, including the following features:
- Urban Informatics integrates technical and conceptual skills, guiding students to make informed decisions about the interpretation of data and their analysis and visualization. Within this there is an especial emphasis on unpacking questions of equity.
- More than just another statistics textbook, the technical curriculum covers both data management and analytics, since both are needed to become acquainted with and reveal the content of a new data set.
- All content is contextualized in real-world applications relevant to community concerns.
- Real-world worked examples, all drawn from greater Boston, MA, are made possible through public data sets from the Boston Area Research Initiative’s Boston Data Portal.
- Each chapter features traditional problem sets and an Exploratory Data Assignment that prompts students to practice their new skills on a data set of their choice. This alternative set of assignments guides students through the process of becoming familiar with the contents of a novel data set and communicating meaningful insights from the data to others.
- Unit-level assignments, including Community Experiences that prompt students to evaluate the assumptions they have made about their data against real-world information.
Please enjoy, and welcome to the burgeoning world of urban informatics!
Table of Contents
Chapter | Description |
---|---|
Chapter 1: Introduction | Presents the motivating themes behind urban informatics and the structure and philosophy of the book. |
Chapter 2: Welcome to R | Introduces readers to the R software package and how to navigate its interface and coding language. |
Chapter 3: Telling a Data Story | Works through the tools needed to initially observe and interpret the individual records of a data set. |
Chapter 4: The Pulse of the City | Analyzes and visualizes the overarching patterns of a data set. |
Chapter 5: Uncovering Information | Describes how to manipulate and create new variables to better expose content for analysis and visualization. |
Chapter 6: Measuring with Big Data | Presents the conceptual basis for converting records into measures that describe units of analysis (e.g., neighborhoods). |
Chapter 7: Making Measures from Records | Presents the technical tools needed to aggregate and merge measures created from records. |
Chapter 8: Mapping Communities | Introduces Geospatial Information Systems (GIS) as a tool for mapping and analyzing spatial data. |
Chapter 9: Advanced Visual Techniques | Extends visualization techniques from previous chapters with additional, often more colorful or dynamic tools. |
Chapter 10: Beyond Measurement | Introduces inferential statistics and the logic for testing hypotheses, beginning with correlation tests. |
Chapter 11: Identifying Inequities across Groups | Presents t-tests and ANOVAs for comparing variables across groups. |
Chapter 12: Unpacking Mechanisms Driving Inequities | Presents multiple regression as a tool for using multiple variables to predict outcomes. |
Chapter 13: Advanced Analytic Techniques | Provides an initial exposure to cutting-edge analytic techniques that are beyond the scope of this book. |
Chapter 14: Emergent Technologies | Presents some of the novel technologies that are anticipated to have major impacts in the coming years. |
Acknowledgments
In one sense, this book was a sprint, completed in a single semester. In another, it has been a long-term project beginning in Summer 2014 when I first prepared the curriculum for the course “Big Data for Cities,” which has evolved into this text. I see it as only appropriate to thank everyone along the way.
I was hired by Northeastern University in 2014—a decision made primarily by David Lazer, who was leading a search for computational social scientists, Joan Fitzgerald, who was then Director of the School of Public Policy and Urban Affairs, and Uta Poiger, Dean of the College of Social Sciences and Humanities—ostensibly to deepen the bench of faculty for their newly established Masters of Science in Urban Informatics. I was asked to design and teach Big Data for Cities (BD4C) in my first semester as the introductory course to the program. With advice from my new colleagues James Connolly, Dietmar Offenhuber, and Nick Beauchamp, as well as the new Director of both the School and Program Matthias Ruth, I set out to design a course that wasn’t quite a statistics course or a data management course, but something in between that gave students the true experience of “discovering” and working through a new data set. The first year went fine. The second year was better. The third year I developed it as an online course. By the fourth we were filling up sections in both Fall and Spring semesters (though I was only teaching the former).
I have to thank Saina Sheini Mehrab Zadeh, my (recently graduated) PhD student and long-time teaching assistant for the course. She took the course in 2017 and has TA-ed it every year since, also teaching it herself in three semesters. There is no one who has had more influence on the evolution of the course. I am grateful to another (recently graduated) PhD student, Talia Kaufmann, who TA-ed the course before Saina did. I also thank Geoff Boeing (now at University of Southern California), Curt Savoie, and Connor MacKay (both previously of the City of Boston and Commonwealth of Massachusetts), who have also taught the course. Last, I appreciate the influence that all of the students who have taken the course have had on the refinement of the curriculum through their experience, though I couldn’t possibly list them all by name.
A major element of this book is the use of real-world datasets curated by the Boston Area Research Initiative in our Boston Data Portal (BDP). These datasets were made possible by the hard work of many postdocs and research assistants, including: Mehrnaz Amiri, David Brade, Edgar Castro, Qiliang Chen, Alexandra Ciomek, Bidisha Das, Justin de Benedictis-Kessner, Chelsea Farrell, Sage Gibbons, Forrest Hangen, Laiyang Ke, Sam Levy, Barrett Montgomery, Petros Papadopoulos, Josiah Parry, Will Pfeffer, Nolan Phillips, Alina Ristea, Michael Shields, Xin Shu, Riley Tucker, Shunan You, and Michael Zoorob. This also was only possible thanks to a long-term partnership with the City of Boston’s Department of Innovation and Technology, driven by previous Chief Information Officers (Bill Oates, Jascha Franklin-Hodge, Alexandra Lawrence), Chief Data Officers (Andrew Therriault, Stephanie Costa Leabo), and their team members (including the aforementioned Curt Savoie and Connor MacKay), as well as the collaborative leadership of the members of the Mayor’s Office of New Urban Mechanics (Kris Carter, Nigel Jacob, Kimberly Lucas, and Chris Osgood). The BDP has been funded by the National Science Foundation’s Resource Implementations for Data Intensive Research (RIDIR) program (award #1637124 for those curious), the John D. and Catherine T. MacArthur Foundation, and the Herman and Frida L. Miller Foundation.
In June 2021 I found myself at a crossroads. About to embark on sabbatical, I wanted to convert that opportunity into something that would have impact. Over that summer I engaged in a variety of conversations with friends and mentors that led me to the decision to write this book, including with John Wihbey and Alisa Lincoln of Northeastern University, Luis Bettencourt of University of Chicago, Ken Steif (then of University of Pennsylvania), Ben Levine (then Executive Director of the MetroLab Network), and Chris Winship and Rob Sampson (co-founding directors of BARI at Harvard University). The consensus was that there was a need for an urban informatics textbook and that I had the opportunity to write it. I also received valuable advice from Brandon Welsh (Northeastern University) and Russell Schutt (UMass Boston), two other colleagues and mentors who had written textbooks and helped me to understand the industry.
The writing of the book itself owes the greatest debt to Northeastern University’s Interdisciplinary Sabbatical program. By becoming a member of the College of Engineering I was able to have the full year to work on multiple projects, enabling me to truly dedicate the time needed to the textbook. This was granted by Provost David Madigan and his team following the strong collective endorsement of Chair of Civil and Environmental Engineering Jerry Hajjar, Dean of Engineering Gregory Abowd, Director of Public Policy and Urban Affairs Jennie Stephens, and Dean of Social Sciences and Humanities Uta Poiger. During the semester, I am grateful to the members of the BARI team—many of whom are mentioned above in the paragraph about the BDP—for providing advice on structure and content and otherwise listening to me prattle on about the work. This thanks also goes to faculty affiliated with BARI, including Nigel Jacob, Kimberly Lucas, and Moira Zellner. All of them not only gave good advice but were too kind to tell me I was getting annoying, which I appreciate.
Last, thank you to my family. My parents, Bonnie and Bill, and siblings, Tania and Liam, for listening and offering input when helpful. My in-laws for the same, especially my ever-supportive mother-in-law Kathy and my sister-in-law Jiordan, who writes books of a very different sort and loves to compare notes on the process. My older son, Beckett, bounced on my knee as a newborn when I developed Big Data for Cities for the first time. My younger son, Sebastian, has also lived alongside the development of the course and this book for his whole life. They are my youngest “math students” (and I can literally hear Beckett teaching Sebastian arithmetic as I write this).
And, of course, thank you to my wife, Leslie, for listening, supporting, and always helping me believe that the final product would be exactly what it needed to be.