Information: Unit I Summary and Major Assignments

Summary and Learning Objectives

Unit I has focused on the skills necessary to represent and understand the basic information contained in data. This has included both technical and interpretive skills.

Technical Skills

  • Installing R and associated packages to enable specific functions.
  • Working with data objects using functions.
  • Isolating desired rows and columns using subsets and related tools.
  • Summarizing the content of individual variables with tables, statistics, and univariate graphics.
  • Creating new variables based on numerical calculations, string transformations, and categorical criteria.
  • Customizing graphics of various types.
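As a quick refresher, several of the skills above can be sketched in a few lines of base R. The data frame below is hypothetical: its name (requests), its columns, and all of its values are invented for illustration and do not come from any data set used in this book.

```r
# Hypothetical service-request records; all values are invented for illustration.
requests <- data.frame(
  id = 1:6,
  type = c("Pothole", "Graffiti", "Pothole", "Streetlight", "Pothole", "Graffiti"),
  neighborhood = c("North", "South", "North", "East", "South", "East"),
  days_open = c(2, 10, 4, 1, 7, 3),
  stringsAsFactors = FALSE
)

# Isolate desired rows and columns: pothole requests, keeping two columns.
potholes <- subset(requests, type == "Pothole",
                   select = c(neighborhood, days_open))

# Summarize individual variables with a table and a statistic.
type_counts <- table(requests$type)
mean_days <- mean(requests$days_open)

# Create new variables: one numerical calculation, one categorical criterion.
requests$weeks_open <- requests$days_open / 7
requests$speed <- ifelse(requests$days_open > 5, "Slow", "Fast")

print(potholes)
print(type_counts)
print(mean_days)
```

The five-day cutoff for "Slow" is arbitrary; in practice the threshold should come from what you know about the data-generation process.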

The packages we have learned include:

  • tidyverse for “tidier” coding when working with data.
  • knitr for rendering R Markdown scripts into documents that interweave code and results.
  • ggplot2 for graphics.
  • stringr for transforming, combining, and otherwise working with string data.
  • tm for extended mining of rich textual data.
  • lubridate for working with date and time variables.
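The sketch below shows how a few of these packages might work together on a single task. It assumes dplyr, stringr, lubridate, and ggplot2 are installed, and it uses a small hypothetical table (calls) whose street names and dates are invented for illustration. The knitr and tm packages are left out to keep the example short.

```r
library(dplyr)      # data manipulation verbs and the pipe
library(stringr)    # string cleaning
library(lubridate)  # date parsing
library(ggplot2)    # graphics

# Hypothetical records with messy street names and dates stored as text.
calls <- tibble(
  street = c(" main st ", "ELM AVE", "oak rd"),
  opened = c("2013-01-15", "2013-02-03", "2013-02-20")
)

calls <- calls %>%
  mutate(
    street = str_to_title(str_trim(street)),  # trim whitespace, standardize case
    opened = ymd(opened),                     # convert text to Date
    open_month = month(opened)                # extract the month as a number
  )

# Count records per month, then draw a simple bar chart of the counts.
monthly <- calls %>% count(open_month)

p <- ggplot(monthly, aes(x = factor(open_month), y = n)) +
  geom_col() +
  labs(x = "Month opened", y = "Number of records")

print(calls)
print(monthly)
```

Calling print(p) renders the chart. The same pattern (clean strings, parse dates, count, plot) applies to larger tables with more columns.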

Interpretive Skills

  • Scrutinizing individual records to better understand the broader content of the data set.
  • Inferring aspects of the data-generation process to frame interpretation of the data set.
  • Planning variable manipulations and calculations to reveal the desired information.

Unit-Level Assignments

Community Experience Assignment

The community exploration assignments in this book are designed to align the skills you have been learning with real-world contexts. They are most useful in conjunction with the Exploratory Data Assignments at the end of each chapter, especially when you have been working through them with a single data set. They provide an opportunity to “ground truth,” or test against real-world observation, the assumptions and objectives that have guided your analysis thus far. There will be one in each unit. These can also be combined with a service-learning or capstone-oriented course.

For this first community exploration assignment, please:

  1. Select a neighborhood (or more localized place) based on something notable in the data you are working with this semester, e.g., the highest or lowest value on some variable of interest, a density of cases in one region, etc.
  2. Visit and explore this neighborhood either in person or virtually, with an eye to the observations you made in the data and how these characteristics manifest themselves in the real world.
    • Though there is no strict guideline on how long you need to spend exploring a neighborhood, either in person or virtually, a visit that lasts less than a half-hour would be unlikely to generate enough observations to support a high-quality memo and presentation.
    • A virtual visit might start with Google Maps and Street View and similar searches, but might branch out anywhere the internet can take you.
  3. Write a 3-5 page memo describing the logic for why you visited this place, what you discovered, and what this tells you about the interpretation of your data. This written document should include images from your walk (or from Street View if visiting virtually) and maps with data describing the region.

Post-Unit Assignment: Read-Me Document

The first unit of this book has focused on turning data into information about the city, its people, and its neighborhoods. We have also learned how to clean and modify certain variables to make the data set easier to analyze. The skills learned here are the same ones necessary to create a high-quality, engaging Read-Me Document that acts as a first view into the contents of the data set and how it might be used.

Think about a person browsing an open data portal, looking for interesting data and what they might do with it. The Read-Me Document should serve this reader, offering a quick look into the contents of the data and the analytic possibilities they might offer. It should be both informative and enticing, and should include:

  • A brief overview of what the data set contains. This should be a short paragraph composed of short sentences describing the source of the data and the type of information it holds. This will look a lot like the opening paragraph of the Data Dictionary that you have, though possibly even shorter and a little less dry.
  • “Fun facts.” Provide about 5-10 pieces of information that are notable and illustrate the nature of the data set. These should be both engaging and potentially useful to someone looking to analyze the data. For example: “The 311 system categorizes requests for service into 218 types, the most common being General Request.” “The neighborhood with the most 911 calls in 2013 was Dorchester.”
  • Visualization. Include at least two visualizations that illustrate some aspect of the data set that you find particularly noteworthy, complementing the “fun facts.”
  • No more than two pages. Remember that this is something that a visitor will want to read quickly to decide whether it is of interest.
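“Fun facts” like the examples above usually come from simple counts. The following is a minimal sketch in base R, assuming a hypothetical data frame (requests) with invented type and neighborhood columns; substitute the variables from your own data set.

```r
# Hypothetical records; the values are invented for illustration.
requests <- data.frame(
  type = c("Pothole", "Graffiti", "Pothole", "Streetlight", "Pothole", "Graffiti"),
  neighborhood = c("North", "South", "North", "East", "North", "South"),
  stringsAsFactors = FALSE
)

# How many distinct categories are there, and which is most common?
n_types <- length(unique(requests$type))
type_counts <- sort(table(requests$type), decreasing = TRUE)
top_type <- names(type_counts)[1]

# Which neighborhood has the most records?
hood_counts <- sort(table(requests$neighborhood), decreasing = TRUE)
top_hood <- names(hood_counts)[1]

# Turn the counts into sentences that could go in a Read-Me Document.
fact_1 <- sprintf("The data contain %d request types, the most common being %s.",
                  n_types, top_type)
fact_2 <- sprintf("The neighborhood with the most requests was %s.", top_hood)

cat(fact_1, fact_2, sep = "\n")
```

If two categories tie for first place, this sketch reports only one of them, so it is worth inspecting the full sorted table before writing the fact down.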

Suggested Rubric (Total 10 pts.)

Communication of Data Content: 3 pts.

Facts: 2 pts.

Figures: 2 pts.

Structure: 2 pts.

Details (Grammar, etc.): 1 pt.