Final Project - PA 470

Overview

The final project from this course will be an opportunity for you to demonstrate an understanding of ethical applications of machine learning in the public sector. This understanding can either be demonstrated via building a public sector machine learning pipeline or writing a paper which analyzes the course material and constructs an application of ML in the public sector. A project proposal must be submitted by 4/4. The final submission will be due after class on 4/28 with an in-class presentation on 4/27.

Pipeline, Option 1

This option asks you to create your own machine learning pipeline on a public sector application and produce an ‘implementation report’ in the Rmarkdown format from the Detroit assignment. This report should include an introduction of the topic, simple EDA which explains the data and problem statement to the reader, build and present a model using tidymodels, and discuss data and ethical issues which implementing this modeling including against any relevant baselines. To complete this option successfully, you will need to:

Formulate a problem for which ML is applicable

Determining which problems are possible is difficult. I recommend that you think about some general areas you might be interested in, view data on the Chicago Data Portal, and some of the case studies from the class. Clearly state what you are trying to predict.

Find or simulate public sector data (e.g. open data or census data plus imputation)

What data is available is the limiting factor for your project. Most data which is available will not be on the person level.

Build and analyze a model

Construct a sequence of models and select the one most appropriate for your project. Include relevant information on evaluation metrics and model explainability.

Describe implementation issues including ethics, practicality

What type of data or ethical issues are present? In the public sector, we must do no (or less) harm. A thorough discussion of this would include remarks on similar applications, how your model would allocate government resources, who might be harmed, what benefit your application could have.

It is difficult to create your own ML model especially in public sector applications. This difficulty mainly arises from prediction scoping and data gathering. This is an ambitious option but I encourage you to explore it.

Paper

This option asks you to write an 6-8 page paper (double spaced) where you build a framework to evaluate ML/AI public sector applications. This consists of synthesizing and constructing a case study of a ML/AI application including data, modeling choices, ethical issues.

Prompt:

What makes a good ML/AI public sector application or a bad one? Are certain areas, such as criminal justice, more or less likely to produce problematic implementations? How do we account for the baseline in the public sector (recall the example of historic racial disparities in criminal justice data)? Drawing from the readings in class, construct a framework with which a ML/AI public sector application can be implemented and evaluated. Recall the various examples from class of a machine learning pipeline and guides to public sector implementation. Include and analyze at least three different public sector applications of ML/AI and at least five readings from the class. You may substitute other readings as appropriate. Please cite your sources.

Then, instead of making your own ML model, apply this framework to a constructed case study of a ML/AI application. You may choose a real example, your model from your Detroit assignment, or a hypothetical application. Consider drawing from the analysis in Fragile Algorithms and Fallible DecisionMakers: Lessons from the Justice System, in particular table 3. Your analysis should include discussion along all the steps of your pipeline. Be sure to include information from Chapter 14 of DSPP.

Proposal, Due 4/4

Option 1

An initial outline of your prediction problem (2 sentences or so) and screenshots/descriptions of relevant data. If you are unsure that your problem will work, please reach out to me before submitting your proposal.

Option 2

An initial thesis (2 sentences or so). Descriptively describe (2 sentences or so) your constructed case study.

Presentation

Everyone will give a brief presentation (~5 minutes) during class on 4/27. Present your work (either your model or your case study). Explain the problem, the ML application, and to what extent you think the application should be implemented if at all.

Rubric

Proposal, 10 points
Presentation, 20 points
Project, 70 points