Making (More) Maps for the Yale Law Journal: Protecting Student Voting Rights in Texas

During my time as the Empirical Scholarship Editor for the Yale Law Journal (YLJ), I got to work with a number of talented authors and amazing student editors. Every once in a blue moon I was able to collaborate with an author and make maps to accompany their piece. One of those instances was for a Volume 129 Forum piece by Joaquin Gonzalez, titled Fighting Back to Protect Student Voting Rights.

Gonzalez spent a year at the Texas Civil Rights Project on a YLJ-sponsored public interest fellowship, during which he worked on election and voting-rights issues. During that time, he observed how student voters “face a wide variety of obstacles that can deter them from democratic participation.” For example, many jurisdictions do not accept student identification as ID for voting purposes. In other jurisdictions, voting locations on or near colleges and universities are sparse. Gonzalez writes that, “[t]his lack of access can be the direct result of actions by governing bodies (such as removing or failing to provide on-campus polling locations) or the indirect result of combinations of policies (such as a combination of purposeful campus gerrymandering and strict rules regulating which precincts residents must vote in).” In order to illustrate some of the consequences of “direct” and “indirect” limitations on student voting locations, Gonzalez hoped to generate a few maps of egregious examples. We worked together before publication to produce two figures to accompany his essay.

Figure 1 (below) represents the current election precinct boundaries in Hays County, Texas. Gonzalez chose to highlight precincts that contain Texas State University (TSU), a large public university in Hays County, because “TSU is home to 38,661 students, over 7,000 of whom live on campus, with thousands more living in private housing in the immediate vicinity.” Gonzalez writes:

The lines are winding, though at first glance they may not look inherently illogical. However, overlaying features of the campus reveals the distorted way in which the community is carved into different precincts. Some of what appear to be streets on the precinct map are in fact merely paper streets or walking paths. Perhaps the most absurd result is that the Student Center, which has housed the only on-campus voting location ever used, is bisected by the precinct lines.

 
[Figure 1: Hays County election precinct boundaries]
 

Figure 2 shows where many of the major on-campus residences are located—a confusing distribution between precincts by any measure, with no clear or logical dividing lines for which residential halls are assigned to which precinct. Figure 2 also shows Hays County’s proposal for the intended placement of its Election Day poll sites prior to our threatening litigation. State law permits combining precincts into one polling place under certain circumstances. The county originally intended to combine each of the two primarily on-campus precincts with two off-campus precincts, meaning that students in those precincts would have to travel off-campus (approximately 2.2 miles in one case and 1.7 miles in the other) to cast their ballot. On top of figuring out the complicated and illogical assignment of residence halls to different precincts, students (many of whom lack transportation) would have had to find a way to get to these polling locations. One of the polling locations is separated from campus by a highway. If a student showed up at the wrong location, she would have to travel 3.4 miles in the opposite direction to reach the correct location.

 
[Figure 2: On-campus residences and proposed Election Day polling locations]
 

I made these maps in QGIS with Stamen Toner basemaps. You can read Gonzalez's full piece here.

COVID-19 Hospital Capacity and Incarcerated Populations: Making Maps for ACLU Wisconsin

This April, I worked with a number of litigators and public health experts to collect, map, and analyze data about how COVID-19 poses a unique risk to incarcerated populations, in support of several lawsuits seeking prisoner release amid the pandemic. This entailed collecting data about prisons and jails, incarcerated individuals, real-time hospital capacity and ICU bed capacity, and more. We partnered with the ACLU in Wisconsin, where prisons and jails are overcrowded far beyond their design capacity, and ultimately produced a series of maps that were used in a lawsuit filed directly in the Wisconsin Supreme Court seeking the release of certain vulnerable people from state prisons. You can read the complaint and see the exhibits here.

We knew from public health research that an outbreak in a prison or a jail would have enormous consequences for incarcerated populations and staff: social distancing in those facilities is nearly impossible, and inmates regularly share space and resources, so infection spreads faster in jails and prisons than in other communities. When many individuals get sick around the same time, hospital resources are strained, especially where there are few staffed beds and ICU beds in the first place. We set about trying to identify places in the state where an outbreak in a prison would be likely to overwhelm hospital resources and result in a large number of deaths among individuals unable to get the care they need, like a ventilator or an ICU bed.

We presented some of our early research in an online townhall on COVID-19 in prisons and jails hosted by the Wisconsin ACLU, which you can watch here. Here are some notes and maps from that presentation:

We started with a goal that seemed simple: identify specific places in Wisconsin that have (1) large correctional communities, and (2) a dearth of hospital resources. First, we were able to get information on correctional populations from the Wisconsin Department of Corrections, including the number of inmates and staff per facility. That data did not include jail detainees or staff (that data is owned by each county), or family members of prison and jail staff, all of whom are part of the larger population that would be vulnerable in an outbreak at a facility. So we knew our preliminary population numbers were under-inclusive, but it was a start.

From there we went about collecting information on hospital resources. That was not an easy task because, as a general rule, hospital data can be really messy. When we started this effort in March, we had just two available sources of information about hospital capacity: the Wisconsin Hospital Association and the American Hospital Directory. There were a number of issues with the data: (1) it was from 2018, so it did not reflect real-time hospital occupancy and availability, and (2) there were discrepancies between the two sources. We struggled to identify and resolve those discrepancies, but ultimately we just did the best we could. We were able to extract indicators like total staffed beds and ICU beds per hospital, and average occupancy rates, which we then used to estimate the number of available beds at a given moment (but again, not reflecting COVID-19 reality).

With that we made this preliminary map (below), which was very useful at first because it gave us a sense of the geographic areas we should be worried about. Counties are colored according to the number of people per available hospital bed: that is, total county population divided by our estimated number of available beds. Initially this might seem like a counterintuitive visualization, because we're used to looking at maps that show a certain phenomenon per capita rather than the other way around (capita per phenomenon), but this style of map is useful in the public health context because it allows you to compare real numbers of potentially sick people to real numbers of available resources. Another way to describe this scheme is that each county's color shows how many people would be competing for a single hospital bed if everyone were sick at once. Counties shown in darker red have more people competing and, therefore, are relatively more resource-constrained. Areas on the map that have both a darker red color and large correctional communities are especially concerning. So, this preliminary map was helpful in that it drew our attention towards the central and eastern regions of the state, which have a lot of prisons and relatively few hospital resources.
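The metric behind this map is simple enough to sketch in code. This is a toy example: the county figures below are invented, but the availability estimate (staffed beds times one minus average occupancy) mirrors the approach described above.

```python
# Hypothetical county figures -- not the actual Wisconsin data.
county_population = 52_000
staffed_beds = 140
avg_occupancy = 0.65   # average share of staffed beds already in use

# Estimated beds free at a given moment (from pre-COVID occupancy rates).
available_beds = staffed_beds * (1 - avg_occupancy)

# The map's metric: capita per phenomenon, not phenomenon per capita.
people_per_available_bed = county_population / available_beds
print(round(people_per_available_bed))   # 1061
```

Darker-red counties on the map correspond to larger values of this ratio.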

 
[Map: Wisconsin counties by people per available hospital bed]
 

Fortunately, after we started our mapping efforts, the Wisconsin Hospital Association released a new tool with daily updates on bed counts and the number of COVID-19 cases, so we could pull that information daily and compare it to what we knew about correctional populations. That daily information is provided at the level of the Healthcare Emergency Readiness Coalition (HERC) region. HERCs are regions within which certain healthcare and emergency-response services are coordinated.

So with this information, we were able to look into HERCs that have large correctional populations and get a sense of where, if there was an outbreak, we would really be in trouble because of the scarcity of hospital resources (in particular, available ICU beds). So we went HERC by HERC and reported a few key indicators: the total population, the total correctional population, real hospital and ICU bed availability, and real numbers of COVID-19 cases and ICU COVID-19 cases.

 
[Table: key COVID-19 indicators by HERC region]
 

Take, for example, Fox Valley, in the central/eastern part of the state. That area has over half a million people and 4,500 people in its correctional population. As of the day we gave the ACLU Zoom presentation, there were just 34 available ICU beds in the entire HERC. So, we did the following back-of-the-envelope calculation: if even just half of the Fox Valley correctional population were infected in an outbreak, and assuming 10% of that group would need to be put on a ventilator (an extremely conservative estimate according to average hospitalization trajectories), we would need 225 beds. That's a fast-and-loose calculation because it assumes instantaneous infection, when in reality individuals would be infected over a series of weeks, and hospital beds would surely turn over as patients were released or died, but it nonetheless highlights the enormous mismatch between the vulnerable population and the number of likely available hospital resources, within the region in which individuals would be likely to receive treatment.
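That back-of-the-envelope calculation, using the Fox Valley figures above:

```python
# Fox Valley HERC figures as reported in the presentation.
correctional_population = 4_500
available_icu_beds = 34

attack_rate = 0.5   # assume half the correctional population is infected
icu_rate = 0.10     # conservative share needing a ventilator/ICU bed

icu_beds_needed = correctional_population * attack_rate * icu_rate
print(icu_beds_needed)   # 225.0, against 34 available ICU beds
```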

So the takeaways of this visual analysis were that (1) we should be really worried about the areas with many prisons in the central/eastern part of the state, and (2) areas that we otherwise might not worry about because they are less populated, like the northernmost HERCs, could actually be a problem. For example, the Northwest HERC does not have a big population relative to other parts of the state, but it does have a pretty sizable correctional population and very few ICU beds, so it could be hit especially hard if there were a prison outbreak.

Unfortunately, in the particular lawsuit that I mentioned at the beginning of this post, the Wisconsin Supreme Court declined to take the case (the petition asked the court to take original jurisdiction over the case and thereby bypass the lower courts). The court stated that it was “not persuaded that the relief requested, namely this court’s appointment of a special master to order and oversee the expedited reduction of a substantial population of Wisconsin’s correctional facilities is, in view of the myriad factual determinations this relief would entail, either within the scope of this court’s powers of mandamus or proper for an original action.” The Wisconsin ACLU has continued to advocate for the protection of incarcerated populations in a number of other ways amid the pandemic.

The team that I worked with on these Wisconsin projects has also been pursuing a similar COVID-19 mapping effort in New York. The map below (displaying the same phenomena that we initially mapped in Wisconsin) was prepared for leadership of the New York Department of Corrections and Community Supervision, as part of advocacy to the Governor to use his emergency powers to temporarily amend medical parole criteria to enable the release of certain inmates.

[Map: New York population per available hospital bed]

Making Maps for Yale Law Journal

I recently had the pleasure of making a few fun maps for Professor Maureen E. Brady’s new article in the Yale Law Journal: The Forgotten History of Metes and Bounds. You can read the full thing as featured in Volume 128, Issue 4, here. Brady describes the piece as follows:

Since long before the settling of the American colonies, property boundaries were described by the “metes and bounds” method, a system of demarcation dependent on localized knowledge of movable stones, impermanent trees, and transient neighbors. Metes and bounds systems have long been the subject of ridicule among scholars, and a recent wave of law-and-economics scholarship has argued that land boundaries must be easily standardized to facilitate market transactions and yield economic development. However, historians have not yet explored the social and legal context surrounding earlier metes and bounds systems—obscuring the important role that nonstandardized property can play in stimulating growth . . . Using new archival research from the American colonial period, this Article reconstructs the forgotten history of metes and bounds within recording practice. Importantly, the benefits of metes and bounds were greater, and the associated costs lower, than an ahistorical examination of these records would indicate. The rich descriptions of the metes and bounds of colonial properties were customized to the preferences of American settlers and could be tailored to different types of property interests, permitting simple compliance with recording laws. While standardization is critical for enabling property to be understood by a larger and more distant set of buyers and creditors, customized property practices built upon localized knowledge serve other important social functions that likewise encourage development.

Brady describes, at length, the history of metes and bounds, a parcel demarcation system that entailed using descriptions of physical markers, like rocks, streams, and other geographic features, to identify property boundaries. A particularly interesting historic detail of metes and bounds is how the ritual of perambulation — communal walks about the property borders — was essential to its longevity. Brady writes,

The ritual of perambulation could involve much more than merely walking the outskirts of property. Perambulation was also known as “beating the bounds.” Inhabitants of the community would walk around the relevant property, literally striking the boundary line—as well as any markers in it—with sticks, stones, and willow tree branches. Both adults and children went along for the affair. The express purposes of these perambulation procedures were “to make sure that the bounds and marks were not tampered with, to restore them when displaced, and also to establish them in the memory of the folk.” Indeed, the reason for involving children was so that “witnesses to the perambulation should survive as long as possible.” A child might be picked up and flipped, so that the child’s head would touch the boundary.

In addition to offering some charming insight into perambulation, Brady offers a sort of redemption story for metes and bounds, which, as she reports, “have generally been met with derision from surveyors, lawyers, and scholars.” In particular, Brady’s article responds to recent law and economics literature by Gary Libecap and Dean Lueck, which found that a standardized “rectangular system” lowered transaction costs, yielding higher property values in some western states. Brady offers a narrative of the social benefits that metes and bounds yielded that have largely been overlooked by the law and economics literature.


We made four maps for the piece, each exploring the differences between the metes and bounds parcel demarcation system, as compared with standardized property boundaries.

Figure 1

First, Brady wanted to make a map of somewhere in the states, present day, where the legacy of the metes and bounds system would be visible in the geography, adjacent to land that had been historically demarcated using standardized systems. We spent some time zooming around in Google Maps and ultimately decided to map Dudley Township in Ohio. We traced visible property demarcations from aerial imagery, namely roads. Unsurprisingly, areas in grey were demarcated using standardized systems and areas in white were historically demarcated with metes and bounds.

 
[Figure 1: Dudley Township, Ohio]
 
 
 

Figure 2

Next, Brady wanted us to give a few more depictions of how the metes and bounds system and standardized system amounted to very different spatial patterns of property demarcation. On the left is a depiction of lots in the Virginia Military Reserve, Ross County, Ohio, (from some time between 1799 and 1826). We traced those lots from some historical maps. On the right is a depiction of parcels in Carroll, Nebraska (roughly 1918), also traced from some historical maps.

 
[Figure 2: Virginia Military Reserve lots, Ross County, Ohio (left); Carroll, Nebraska parcels (right)]
 

Figure 3

Next, Brady wanted us to prepare a visualization of a simplified lot and tier system for identifying parcels. We based this visual on descriptions from the New Haven Town Records from 1649-1684 (Volume 2).

 
[Figure 3: simplified lot and tier system]

Figure 4

 

Finally, Brady wanted to trace the parcel system in the Oystershell Development in New Haven, the location of her case study. We worked from a historic map for tracing the parcels and used another historic map of New Haven to place the parcels on top of the modern grid. Brady uses the story of the Oystershell Development to explain Connecticut’s legislative response to the rising number of property disputes in the colony’s cities in the early eighteenth century and the difficulty that the colony was having gaining control over the settlement of land. As she explains, many of these property disputes were caused by metes and bounds. Some of these legislative changes included standardizing the shape and contour of new lots, such as those in Oystershell.

 
[Figure 4: Oystershell Development parcels, New Haven]
 

I am still not a zealous advocate for a return to metes and bounds, but this mappy historical diversion with Brady was a treat. She has another forthcoming article, "Property Convergence in Takings Law," to be published soon in Pepp. L. Rev., to keep an eye out for.

Mapping Race, Crime, and District Attorney Elections in NYC

I've been too swamped with law school to post anything new for a while now, but for the past few weeks I've been working on a series of maps for Professor Issa Kohler-Hausmann. Her new book, Misdemeanorland, looks at expanded policing for minor offenses like misdemeanors and violations. In order to get a better sense of how crime and people are distributed across the city, and how that relates to voting behavior in District Attorney elections, I put together some maps of race and ethnicity, misdemeanors and felonies, and voting in DA elections. You should also check out the other maps and charts that Issa had made, which track campaign financing and theft-of-services violations (like jumping a turnstile). You can see them all on her companion website for the book.

The one technical aspect of this that was a bit tricky was working with the election and population data. Election information is stored by the Board of Elections in a unit called the election district. Population estimates, on the other hand, come from the census and are stored in different aggregate units like census tracts and census block groups. I came up with a simple methodology for assigning population counts to election districts, in order to create voting maps that are normalized by population. The methodology requires the assumption that population is evenly distributed geographically within each census tract, which is obviously faulty. There are lots of ways to improve this - perhaps by adding zoning and land use layers to the maps and weighting population more heavily in more residential areas. For now, you can read about the methodology I implemented for the maps posted here.
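The actual work happened in GIS with real geometries, but the area-weighting logic is easy to illustrate with a toy example. Here axis-aligned rectangles stand in for tract and election-district polygons, and the populations and shapes are invented:

```python
# Toy sketch of the area-weighting step: each election district receives
# a share of a tract's population proportional to the overlapping area.

def area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def intersection(a, b):
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

# One census tract with 4,000 people; two election districts split it
# 25% / 75% by area.
tract = {"box": (0, 0, 4, 1), "pop": 4000}
districts = {"ED-1": (0, 0, 1, 1), "ED-2": (1, 0, 4, 1)}

ed_pop = {}
for name, box in districts.items():
    overlap = area(intersection(tract["box"], box))
    share = overlap / area(tract["box"])   # assumes uniform density
    ed_pop[name] = tract["pop"] * share

print(ed_pop)   # {'ED-1': 1000.0, 'ED-2': 3000.0}
```

The zoning and land-use refinement mentioned above would replace the uniform-density assumption in the `share` line with a weighted one.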


Race/Ethnicity and Voting in District Attorney Elections

Misdemeanors, Felonies, and Race/Ethnicity

Jersey City Zoning in 3D

Thanks to the latest by Mapbox I was able to add 3D buildings to my Jersey City Zoning Map. The buildings rendered below come from my shapefile of Jersey City buildings, which you can download from the city's open data portal, as opposed to OpenStreetMap. My data has more attributes (including zoning and assessment information) from when I merged buildings and parcel data. While the footprints are all the same as those in OSM (I first prepared that dataset specifically for OSM), you should use my dataset if you're interested in building ages, sale information, and zoning information. 

I created the building-height property from the "zoning description" field in the parcel dataset using an admittedly flawed method. The number of stories is usually listed first in a string of codes, followed by an "S". I used a regular expression to extract the number of stories and then created a "bldgHeight" property (in meters) by multiplying the number of stories per building by 3. Lots of buildings are multi-level (e.g., a building might be one story across the entire lot but two stories on a portion of the lot); I grabbed the maximum number of stories in these cases. Some buildings are missing parcel information, so I don't know the number of stories, and some buildings that do merge with parcel information are missing the "zoning description" field. Unfortunately, all I could do for buildings with missing data is pipe in a "3" for one story.
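A rough reconstruction of that extraction step. The exact format of the "zoning description" strings is my guess (codes like "2S-AL"), so treat this as a sketch rather than the code I actually ran:

```python
import re

# Story counts precede an "S" code in the description string.
STORY_RE = re.compile(r"(\d+)\s*S")

def bldg_height(zoning_desc, meters_per_story=3):
    """Max story count found in the description, times 3 meters per story.
    Buildings with no parseable description fall back to one story."""
    matches = STORY_RE.findall(zoning_desc or "")
    stories = max((int(m) for m in matches), default=1)
    return stories * meters_per_story

print(bldg_height("2S-AL"))        # 6
print(bldg_height("1S+3S MIXED"))  # 9  (multi-level: take the max)
print(bldg_height(None))           # 3  (missing data -> one story)
```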

I then did a few passes of clean-up based on personal knowledge. I flew around my map and corrected a few dozen buildings I know of that otherwise would have been a "3": for example, the buildings around Journal Square Station and Grove Station, the local high schools, and a few spots downtown. In some of these cases I was able to add additional layers to create the appearance of a 3D rendering (as opposed to just an extrusion). The Goldman Sachs Tower (the tallest building in view, below) is an example of this. All of the buildings have a "bldgHeight" of at least 3 (see my example code below for how I use this field), and all buildings have a "minHeight" of 0 (this field can be used to create the appearance of raised structures, like bridges) with one exception - the house I grew up in. Good luck trying to find it!

Right now, I've added a different layer for each zone category by repeating code blocks. A much more elegant approach would be to pass a different "filter" and "fill-color" to each addLayer call, but this works for now. I also added a link to a Google form so that anyone can submit updates to the building information. I'd like to make the map overlay collapsible at some point (it's a bit clunky right now). Finally, under advisement from Brian Platt in the Office of Innovation, I'm going to add a toggleable layer for recent development (2013+). That one might take a few more weeks to realize. Enjoy for now!

Go to the map. Go to the code.

    function loadBuildings() {
        map.addSource('Special', {
            type: 'vector',
            url: 'mapbox://sarahmlevine.sarahmlevine.1rz41on6'
        });
        map.addLayer({
            'id': 'Residential',
            // must match the id passed to addSource above
            'source': 'Special',
            'source-layer': 'buildings-1909vz',
            // 'in' matches features whose zone is any of the listed values
            // (a JavaScript || chain of strings would evaluate to just 'R-1')
            'filter': ['in', 'zone', 'R-1', 'R-1A', 'R-1F', 'R-2', 'R-3',
                       'R-4', 'OR', 'Caven Point'],
            'type': 'fill',
            'minzoom': 14,
            'paint': {
                'fill-color': '#42e5f4',
                'fill-extrude-height': {
                    'type': 'identity',
                    'property': 'bldgHeight'
                },
                'fill-extrude-base': {
                    'type': 'identity',
                    'property': 'minHeight'
                },
                'fill-opacity': 0.5
            }
        });
    }

    map.on('load', function() {
        loadBuildings();
    });

Playing with Carto: Jersey City Building Heights & Years

It's embarrassing to admit, but I finally got around to playing with Carto for the first time. I loaded in my Jersey City buildings dataset and whipped up this map with two toggle-able layers: buildings by number of stories, and buildings by year built (the default display). You can click on buildings for other parcel information. The year-built value comes straight from county assessor data, so it's a bit faulty (although I did a little manual clean-up, e.g. replacing bizarre values like "9999" with NULL). The number of stories was extracted from the "zoning description" field in the parcel data (I grabbed the numeric characters before the "S"). See my last post for information about where to download the Jersey City building footprints data, and see my last, last post for information on how I created that data.

Mapping Jersey City II: Every Building

SEE IT LIVE. SEE THE CODE. LEARN WHY.

If no map appears below (if there's a white background) it's probably because you need to enable WebGL in your browser.

It's been a long-term dream of mine to map every building in Jersey City. See my last post for more about why.  I reached out to the Office of Innovation to see how to go about doing it and they gave me the green light, so I had to pull the trigger.

Once I created the building footprints using the process documented in my last post (these polygons now live in OpenStreetMap) I used QGIS to merge on several other publicly available datasets from the city including zoning, wards, and parcel information. I did my best to make the data comply with the Project Open Data Metadata Schema v1.1 as per their request. 

The zoning and wards merges were extremely clean and easy. The parcel merge was not, to say the least. There are often many buildings inside one parcel (Mun-Bloc-Lot-QCode) or many parcels inside one building. In the first case, I allowed buildings to inherit all parcel information. In the second case, I populated the Lot and Bloc fields with "MANY" as necessary. QCodes, which identify the smallest parcel boundary, were almost never uniquely identifying, so I excluded them. Mun-Bloc-Lot is sufficient to join with county assessment and tax information.

I completed the parcel merge by calculating building centroids and spatially merging those with parcel polygons (preserving a unique building identifier). I experimented with other methods (like parcel centroids inside building polygons), but I found this method to be the cleanest and require the least manual clean-up. I found the realcentroid plugin extremely helpful, considering some geometry irregularities. I also found the QuickMultiAttributeEdit plugin to be extremely useful for updating the fields on a few objects that merged sloppily.
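The centroid-join step can be illustrated outside of QGIS with a toy version: rectangles stand in for real footprint and parcel geometries, and the IDs and attributes are invented.

```python
# Buildings keep their own ID and inherit parcel attributes from
# whichever parcel polygon contains their centroid.

def centroid(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def contains(box, pt):
    x0, y0, x1, y1 = box
    return x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1

buildings = {"bldg-17": (0, 0, 2, 2), "bldg-18": (3, 0, 5, 2)}
parcels = [{"box": (0, 0, 2.5, 2), "lot": "12", "bloc": "A"},
           {"box": (2.5, 0, 6, 2), "lot": "13", "bloc": "A"}]

joined = {}
for bid, footprint in buildings.items():
    pt = centroid(footprint)
    for parcel in parcels:
        if contains(parcel["box"], pt):
            joined[bid] = {"lot": parcel["lot"], "bloc": parcel["bloc"]}
            break

print(joined)
```

With real geometries the centroid can fall outside an irregular footprint, which is why the realcentroid plugin mattered.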

I used one of my favorite plugins, qgis2web, to produce a quick and sloppy OpenLayers 3 web map to immediately send to my favorite people. Unfortunately, I don't think the plugin is equipped for a dataset this large, so I wasn't able to use it to produce a Leaflet map (my preference). A little formatting with the Table Manager plugin and I sent the data off to the city with a data dictionary. They're in the process of putting it up on the Open Data Portal now.

Finally, I uploaded the geojson with footprints and all of the merged fields into Mapbox Studio as a tileset. I added it to two styles: one dark basemap and one satellite-imagery layer with semi-transparent road information. Then I used Mapbox GL JS to code the map shown above. I added an overlay, a legend, and functionality to click on buildings for their information and toggle between my two basemap styles. I then agonized over colors, added Google fonts, and promptly went to bed.


Things to do and problems to solve:

(1) To solve: It would be great to see the whole city at once, but the dataset is so large that Mapbox enforces that it only be viewable from zoom >=14. I don't love having to pick one part of the city to focus on (at least for the default view), especially because I'm not interested in promoting a downtown-centric image of Jersey City. For now I've settled on what I think is a readily recognizable part of the city.  I would love advice on how to manage this.

(2) To do: Add more data, starting with addresses. This shouldn't be too tricky with some geocoding. This will also make dealing with messy parcel data (and recovering QCODEs) much easier. If I can get that done,  then I can merge on parcel information from the county including owner information, building codes, year built, and building/land assessments. This is definitely feasible (and is just a matter of time). I'd also like to add links to specific, relevant sections of the zoning code for each building. That's another no-brainer.

One Size Does Not Fit All Data Science

As I mentioned a while back, Alex Albright (of The Little Dataset That Could) and I had the chance to present some of our thoughts at Bloomberg's first annual Data for Good Exchange. We decided to talk about what we view as the shortcomings in popular data science education programs and bootcamps. Specifically, we wanted to shine a light on the ways that data scientists are (and are not) adequately trained to contribute to social good projects and work with foreign data. I've included the abstract (below) and introduction (after the jump). You can also read the full text and check out the poster. Thanks to Alex for working on this while on vacation in Portland and thanks to SLS for letting us write things we believe and not firing us for it.

One Size Does Not Fit All: The Shortcomings of the Mainstream Data Scientist Working for Social Good

Data scientists are increasingly called on to contribute their analytical skills outside of the corporate sector in pursuit of meaningful insights for nonprofit organizations and social good projects. We challenge the assumption that the skills and methods necessary for successful data analysis come in a “one size fits all” package for both the nonprofit and for-profit sectors. By comparing and contrasting the key elements of data science in both domains, we identify the skills critical for the successful application of data science to social good projects. We then analyze five well-known data science programs and bootcamps in order to evaluate their success in providing training that transfers smoothly to social impact projects. After surveying these programs, we make a number of recommendations with respect to data science training curricula, non-profit hiring systems, and the data science for social good community’s practices. 

[Table 1]
[Table 2]

While the overwhelming majority of data scientists are employed in the for-profit sector, there is a growing movement taking advantage of their technological savvy and unique toolkit for the benefit of social good projects and programs. Conventionally trained data scientists are encouraged more and more to play a pivotal role in data-driven social good projects as team members, consultants, or volunteers. However, this phenomenon assumes that the data scientists’ standard toolkit in the for-profit sector translates seamlessly to the realm of social good. We challenge this assumption and argue that while the term “data scientist” has become an amorphous catch-all for programmers, statisticians, bloggers, and other empirically inclined individuals, the skills and methodological knowledge required of a data scientist can and should differ across the for-profit and non-profit sectors. We use this paper as an opportunity to highlight the shortcomings of mainstream data science education and practice when it comes to the non-profit sector and social impact endeavors.

We begin by comparing and contrasting the roles of data scientists in the for-profit and non-profit environments, and identify three key differences. First, while for-profit data scientists often work with in-house data, non-profit data science often involves working with foreign data that merits greater scrutiny and sensitivity in its treatment. Second, while the corporate environment provides control over the quality of “insights” in the form of management, the non-profit environment can lack effective checks and balances on data and analysis quality. Third, in experimental design, for-profit data scientists often have near-omniscient control over the environment containing study variables, whereas real-world data and studies are seldom so fortunate. We conclude that whereas for-profit data science can often afford to be “insights”-driven and results-oriented, non-profit data science must be less content-driven and more process-oriented to avoid results, conclusions, and even policies that are built on poor-quality data and inappropriate methods.

Next, we survey popular data science curricula across bootcamps, online courses, and master’s degree programs in order to generalize the baseline knowledge of emerging data scientists. We then compare and contrast the skills delivered by contemporary data science education with those required for meaningful contribution to social impact projects, and find that the former caters strikingly to a for-profit position. For example, we find that there is little to no focus in current data science education on investigating the quality of data or the identification and integrity of experimental variables. The curricula of these courses illustrate that data scientists are molded to be corporate workers as the default, necessitating a further mechanism to help empirical researchers transition across sectors, even if they bear the same title: “data scientist.” 

Ultimately, we make several recommendations as to (1) how data science training programs can better prepare their students for roles in organizations doing social good, (2) how non-profit organizations can and must be more targeted in their hiring practices to find data scientists who are adequately suited for their projects, and (3) how the data science for social good community can and must develop best practices and ethical codes akin to those in the academic community.