Detroit Part 2

Assignment

Instead of traditional problem sets, this course has a single four-part assignment in which you build on your previous work each week using new material from the course. You will explore property assessment in Detroit, Michigan and create an assessment model. After completing the assignment, you will wrap your model into a report that analyzes its effectiveness using the ethical and other frameworks from class, and you will give a brief presentation to the class.

Submissions

Each week you will submit two files on Blackboard: your code (Rmd) file and the knitted output of your code. Blackboard will not accept HTML files, so you must zip the two files together.

Part 2

Objective: Now that you have a decent understanding of the landscape in Detroit, create a new file (part_2.Rmd) that builds on part_1 and is written in the style of an Rmarkdown HTML report.

Submission: submit to Blackboard both your code and the knitted Rmarkdown output.

Part A

Create an ‘introduction’ to your report. Generally, only include stylized output (do not use base R print). This could mean using stargazer to show regressions, DT::datatable to show data.frames, and adding titles/labels to plots. Your introduction should include:

  1. Brief background (2-3 sentences) on issues in the Detroit assessment space
  2. 3 to 4 graphs with descriptive captions that cover sale price, assessment accuracy, foreclosures, and outliers. Generally focus on single family homes and arm’s length transactions. While it is notable that so many properties are sold for small amounts, we typically only want to look at properties which are class 401, taxable (e.g. assessed over $2,000 or so), and sold for more than $4,000.
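
The filter described in item 2 can be sketched with dplyr as follows. This is only an illustration on toy data: the column names (`parcel_id`, `property_class`, `assessed_value`, `sale_price`) are hypothetical and will need to be mapped to the names in your own data. The arm's length condition is left out because the name of that flag in your sales data is not known here.

```r
library(dplyr)

# Toy stand-in for sales joined to assessments; all column names are hypothetical.
sales <- tibble(
  parcel_id      = c("A", "B", "C", "D", "E"),
  property_class = c(401, 401, 201, 401, 401),
  assessed_value = c(15000, 1500, 40000, 22000, 9000),
  sale_price     = c(32000, 5000, 90000, 1000, 20000)
)

# Keep class 401 properties that are taxable and sold above $4,000,
# using the rough thresholds given in the assignment text.
model_sales <- sales %>%
  filter(
    property_class == 401,
    assessed_value > 2000,
    sale_price > 4000
  )

model_sales
```

Applying the filter once and reusing the result for every plot keeps the graphs in your introduction comparable to one another.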

Part B

We have two separate (but closely related) problems we want to model. First, we want to identify whether a home is likely to be overassessed in a given year. We will analyze homes and assessments from 2016 and use tidymodels to create a workflow.

  1. Create your workflow
  2. Add to your workflow a classification model
  3. Add to your workflow a recipe of preprocessing steps. Use 2016 sales and assessments along with the property characteristics from the parcels data (note that we only know whether a home was overassessed if it sold). Create a classification metric of overassessment based on properties which sold and use this as your dependent variable. Explain how you decided to construct this metric and how many classes it has.
  4. Create testing/training data and evaluate your model using the classification metrics from Tables 8.3 and 8.4 of the textbook, along with ROC curves based on the predicted class probabilities.
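
One way the four steps above might fit together is sketched below on simulated data. Everything specific here is an assumption rather than the required approach: the column names are hypothetical, and the two-class label (sales ratio above 0.5, reflecting Michigan's rule that assessed value should not exceed half of market value) is only one possible construction of the overassessment metric. The logistic regression, the two predictors, and the three metrics shown are placeholders. They are not taken from the textbook tables, which are not reproduced here.

```r
library(tidymodels)

set.seed(2016)

# Simulated stand-in for 2016 sales joined to assessments and parcel traits.
# All column names are hypothetical.
n <- 400
homes <- tibble(
  total_sqft     = runif(n, 600, 3000),
  year_built     = round(runif(n, 1900, 2010)),
  sale_price     = exp(9.7 + 0.0005 * total_sqft + rnorm(n, 0, 0.5)),
  assessed_value = exp(rnorm(n, log(20000), 0.4))
) %>%
  mutate(
    sales_ratio  = assessed_value / sale_price,
    # Assumed two-class label: "yes" when assessed above half the sale price.
    overassessed = factor(if_else(sales_ratio > 0.5, "yes", "no"),
                          levels = c("yes", "no"))
  )

# Step 4 (split): hold out 20% of sales for testing.
split <- initial_split(homes, prop = 0.8, strata = overassessed)
train <- training(split)
test  <- testing(split)

# Step 3: recipe with the constructed label as the dependent variable.
class_recipe <- recipe(overassessed ~ total_sqft + year_built, data = train) %>%
  step_normalize(all_numeric_predictors())

# Step 2: a simple classification model.
class_model <- logistic_reg() %>%
  set_engine("glm")

# Step 1: the workflow that bundles model and recipe.
class_wf <- workflow() %>%
  add_model(class_model) %>%
  add_recipe(class_recipe)

class_fit <- fit(class_wf, data = train)

# Step 4 (evaluate): hard-class metrics plus ROC AUC on class probabilities.
preds <- augment(class_fit, new_data = test)

class_metrics <- metric_set(accuracy, sensitivity, specificity)
hard_metrics  <- class_metrics(preds, truth = overassessed, estimate = .pred_class)
auc           <- roc_auc(preds, truth = overassessed, .pred_yes)

hard_metrics
auc
```

To draw the curve itself, `roc_curve(preds, truth = overassessed, .pred_yes) %>% autoplot()` should give a ggplot that you can title and caption for the report.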

Part C

Second, building on the workflow from Part B, create a second model that produces your own 2019 assessments. (Note that I am choosing this year to avoid impacts from the pandemic and data quality issues. You may, if you’d like, create 2022 assessments. Limited sales data is released here.)

  1. Create your workflow
  2. Add to your workflow a model
  3. Add to your workflow a recipe of preprocessing steps. Use sales and assessments from before 2019 along with the property characteristics from the parcels data.
  4. Create testing/training data and evaluate your model using the numeric metrics RMSE and MAPE.
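
As a small illustration of the two numeric metrics in step 4, the sketch below computes RMSE and MAPE with yardstick on three made-up predictions and checks them against the formulas written out by hand. The column names `sale_price` and `.pred` are assumptions. In your own work the predictions would come from your fitted workflow on the test set.

```r
library(yardstick)
library(tibble)

# Made-up sale prices and model predictions, for illustration only.
results <- tibble(
  sale_price = c(20000, 50000, 80000),
  .pred      = c(25000, 45000, 80000)
)

# Bundle both metrics so they are computed in one call.
reg_metrics <- metric_set(rmse, mape)
metrics_tbl <- reg_metrics(results, truth = sale_price, estimate = .pred)
metrics_tbl

# The same quantities written out by hand.
err <- results$sale_price - results$.pred
by_hand <- c(
  rmse = sqrt(mean(err^2)),                          # in dollars
  mape = mean(abs(err / results$sale_price)) * 100   # in percent
)
by_hand
```

RMSE is in dollars and is dominated by large misses on expensive homes. MAPE is reported by yardstick on a percent scale and weights each home's error relative to its own price, so reporting both gives a fuller picture for low-value properties.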

Grading Overview

For each assignment, you will be graded on substantial completion of the assignment (demonstrated by an attempt of all parts). When submitting parts 2, 3, and 4, you will be additionally graded on your incorporation of feedback, new concepts from the course, or the correction of any flagged issues.

The assignment will culminate in a final submission of code/report and presentation. Code will be graded based on reproducibility, conceptual understanding, and accuracy. The report will be an Rmarkdown file which knits together graphs, tables, and ethical frameworks. It should be concise (include only relevant information from Parts 1-4). This report will be used to give a five-minute presentation to the class on your model and ethical/technical issues with Detroit property assessment.

| Asg. | Points | Category | Notes |
|------|--------|----------|-------|
| 1 | 5 | Substantial Completion | (attempted all parts) |
| 2 | 5 | Substantial Completion | (attempted all parts) |
| 2 | 5 | Incorporation of Feedback/New Concepts | From Part 1 |
| 3 | 10 | Substantial Completion | (attempted all parts) |
| 3 | 10 | Incorporation of Feedback/New Concepts | From Part 2 |
| 4 | 30 | Final Code | Reproducible (10), Concepts (10), Accurate (10) |
| 4 | 20 | Final Report | Via Rmarkdown HTML, contextualized analysis and ethics |
| 4 | 15 | Final Presentation | 3-5 minute presentation on model and insights |