Report on Expungement Outcomes in the Virginia Court Data

Code for Charlottesville

Last updated: 10/3/2021

Table of contents

  1. Introduction
  2. Loading the data from our Postgres DB
  3. How many records qualify as automatic, automatic (pending), petition, petition (pending), or not expungable, in every by year in the data?
  4. How do these counts break down by race?
  5. What are the most frequent reasons for each expungability outcome?
  6. How do these reasons differ by race?

Introduction

Previously, we developed a program to auto-encode every record in the Virginia court data by expungability under the new law: either automatic, petition, automatic or petition pending a mandatory waiting period, or not eligible for expungement. We also automatically coded the reasons for each expungement decision, and phrased the reasons in the language of the new law. We then constructed a loop that applied these auto-encoders to each individuals' records in the Virginia court data. We saved the data in a Postgres database on our Oracle cloud virtual computers.

In this document, we load the data and summarize it in ways that provide answers to the following questions:

  1. How many records qualify as automatic, automatic (pending), petition, petition (pending), or not expungeable, in every by year in the data.

  2. How do these counts break down by race?

  3. What are the most frequent reasons for each expungability outcome?

  4. How do these reasons differ by race?

We begin by loading the following Python packages:

Loading the data from our Postgres DB

First we create a connection to the database:

Then we use this connection to query the entire dataset. This is memory intensive -- working wholly within SQL would be more efficient. But I choose to work in the Python environment to be able to use pandas and the Python plotting packages:

The resulting data contain 9,054,266 distinct records from 3,082,954 different people over the time frame from 2000 to 2020. The first few rows of the data look like this:

How many records qualify as automatic, automatic (pending), petition, petition (pending), or not expungable, in every by year in the data?

First we extract year from the HearingDate column:

There were several errors in how hearing date was entered into the data, with some nonsensical, future years:

We replace these future years with missing values for now, until we can dig into the data to see what went wrong with these codes:

The following table lists raw counts of each expungability outcome by year:

And this table lists the row percents -- out of all records in each year, what percent falls within each expungability outcome?

We can also plot these percentages over time:

How do these counts break down by race?

Race is coded in the data, but in an inconsistent way:

We have to combine and collapse these categories to get a clear picture of how expungability outcomes depend on race:

We can now generate a cross-tabulation of race and expungability outcome. The following table contains row percents: of all records from a person of the given race, what percent falls within each expungability outcome?

We can plot these percents:

What are the most frequent reasons for each expungability outcome?

We can assess the most common reasons for each expungability decision:

How do these reasons differ by race?