
Detroit Part 3

Assignment

Instead of traditional problem sets, this course has a single four-part assignment in which you build on your previous work each week using new material from the course. You will explore property assessment in Detroit, Michigan, and create an assessment model. After completing the assignment, you will wrap your model into a report that analyzes its effectiveness using the ethical and other frameworks from class, and give a brief presentation to the class.

Submissions

Each week you will submit two files on Blackboard: your code (Rmd) file and the knitted output of your code. Blackboard will not accept HTML files, so you must zip the two files together.

Part 3 (Due 3/19, 11:59pm)

Create part_3.Rmd, copying the YAML/framework from your previous work.

Note, you may wish to use the attributes table for some property characteristics. Example codebook:

exterior: 1 = siding, 2 = brick/other, 3 = brick, 4 = other
bath: 1 = 1.0, 2 = 1.5, 3 = 2 to 3, 4 = 3+
height: 1 = 1 to 1.5, 2 = 1.5 to 2.5, 3 = 3+
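One way to apply a codebook like this is to convert the integer codes into labelled factors before modeling or plotting. The sketch below is illustrative only: the data frame `attributes_demo` and its values are made up, and the column names are assumed to match the codebook entries.

```r
# Hypothetical coded values, standing in for columns of the attributes table.
attributes_demo <- data.frame(
  exterior = c(1, 3, 2, 4),
  bath     = c(2, 1, 4, 3),
  height   = c(1, 2, 3, 1)
)

# Labels copied from the example codebook.
codebook <- list(
  exterior = c("siding", "brick/other", "brick", "other"),
  bath     = c("1.0", "1.5", "2 to 3", "3+"),
  height   = c("1 to 1.5", "1.5 to 2.5", "3+")
)

# Replace each integer code with the matching labelled factor level.
for (col in names(codebook)) {
  attributes_demo[[col]] <- factor(
    attributes_demo[[col]],
    levels = seq_along(codebook[[col]]),
    labels = codebook[[col]]
  )
}

str(attributes_demo)
```

Keeping the labels in a single list makes it easy to reuse the same mapping in tables and plot legends.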

Part A (10%)

Begin to finalize your report by including only report/website-quality output. That is, include only stylized output (do not use base R print). Include your code in the report using code_folding: hide. Avoid showing package-loading messages or any other incidental output. This could mean using stargazer to show regressions, DT::datatable to show data.frames, and adding titles/labels to plots. Give all plots and tables appropriate captions. Write two- to three-sentence introductions for the different sections of your report. Your report should introduce property assessment in Detroit, Michigan (as defined in Part 2A), present your two prediction models (from 2B and 2C), and end with a conclusion.
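As a rough sketch of how incidental output can be kept out of the knitted report, a setup chunk near the top of part_3.Rmd could set global chunk options. This assumes the chunk itself is given the option include = FALSE; the package loaded here (`stats`) is only a placeholder for whatever your report actually uses.

```r
# Intended for a setup chunk at the top of part_3.Rmd (chunk option: include = FALSE).
# Suppress messages and warnings in every later chunk of the report.
knitr::opts_chunk$set(message = FALSE, warning = FALSE)

# Load packages quietly so startup banners never reach the knitted output.
# `stats` is a placeholder; substitute the packages your analysis needs.
suppressPackageStartupMessages(library(stats))
```

Captions for individual figures can then be supplied per chunk with the `fig.cap` chunk option.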

Part B (30%)

Feature engineering. Last week you created two base models and evaluation metrics. Investigate creating at least two new predictors and analyze whether they improve your model(s). Some possibilities:

  • Neighborhood foreclosures/blight tickets
  • Previous rates of assessment (part B only)
  • Census variables (such as income, race)
  • Neighborhood demolition of homes (look at how many class 401 properties w/ nonzero assessment by year)
  • Sale price per square foot for neighborhood

You may either add one predictor to each model or two predictors to one of the two models.
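As an illustration of the last possibility in the list, the sketch below builds a neighborhood-level sale price per square foot and attaches it to each property. The data frame `sales`, its column names, and its values are all hypothetical, and the median is just one reasonable choice of summary.

```r
# Hypothetical sales data; column names and values are assumptions for illustration.
sales <- data.frame(
  neighborhood = c("A", "A", "A", "B", "B", "B"),
  sale_price   = c(50000, 60000, 70000, 120000, 150000, 90000),
  sq_ft        = c(1000, 1200, 1000, 1500, 1500, 1200)
)

# Price per square foot for each individual sale.
sales$price_per_sqft <- sales$sale_price / sales$sq_ft

# Median price per square foot within each neighborhood.
nbhd <- aggregate(price_per_sqft ~ neighborhood, data = sales, FUN = median)
names(nbhd)[2] <- "nbhd_price_per_sqft"

# Attach the neighborhood-level predictor to every property in that neighborhood.
sales <- merge(sales, nbhd, by = "neighborhood")
```

To judge whether a new predictor helps, refit the model with and without it and compare the same evaluation metrics you used last week on the same held-out data.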

Part C (30%)

Prediction. Create “out of sample” predictions for both models. That is, for each model, predict overassessment and assessment/valuation for homes that did not sell (2016 for B, 2019 for C). If you are having trouble with this step, note that you cannot use any information specific to the sale of a property for out-of-sample prediction. In other words, we use sale prices to determine the true value of homes, but we do not know this information for homes that did not sell, so sale price information cannot be used as a predictor for them.

A helpful data framework for this section is to create a dataset of all properties in 2016/2019. You would then label all the properties that sold (leaving unsold properties unlabeled) and create your testing and training data by filtering to only the labeled properties. After training, you can augment your full dataset of labeled/unlabeled data to get out-of-sample predictions.
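The framework above can be sketched in base R as follows. Everything here is simulated: the `properties` data frame, its columns, and the linear model are stand-ins for your own data and model, and `predict()` is used in place of whatever augmenting function your modeling package provides.

```r
set.seed(1)

# Hypothetical universe of all properties in one year; names are assumptions.
n <- 200
properties <- data.frame(
  parcel_id = seq_len(n),
  sq_ft     = round(runif(n, 800, 2500)),
  age       = round(runif(n, 5, 100))
)

# Only some properties sold; sale_price stays NA (unlabeled) for the rest.
sold <- sample(n, 60)
properties$sale_price <- NA_real_
properties$sale_price[sold] <- 20000 + 40 * properties$sq_ft[sold] -
  150 * properties$age[sold] + rnorm(60, sd = 5000)

# Training and testing data come only from labeled (sold) properties.
labeled   <- properties[!is.na(properties$sale_price), ]
train_idx <- sample(nrow(labeled), size = floor(0.8 * nrow(labeled)))
train     <- labeled[train_idx, ]
test      <- labeled[-train_idx, ]

# Predictors exclude anything specific to the sale itself.
fit <- lm(sale_price ~ sq_ft + age, data = train)

# Held-out error, measurable only on homes that sold.
test_rmse <- sqrt(mean((test$sale_price - predict(fit, newdata = test))^2))

# Out-of-sample predictions for every property, sold or not.
properties$predicted_value <- predict(fit, newdata = properties)
```

Because the predictors are available for every parcel, the fitted model can score unsold homes even though their sale price is missing.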

Part D (select one) (30%)

Model Explanation. Each model type has different tools for explainability and we will discuss this more in class. Undertake this initial work knowing that we will gain more techniques for this later on. Complete one of the following…

Either (contextual explanation):

For the classification overassessment model, aggregate your predictions by census tract. Join in a census variable. Create a simple correlation plot and create a representation of the geographic variance in your predictions (this could be a leaflet map by census tract for example).
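A minimal sketch of the aggregate-and-join step, using made-up values: `predictions`, `census`, and their column names are hypothetical, and median income is only one example of a census variable you might join.

```r
# Hypothetical property-level predictions (1 = predicted overassessed).
predictions <- data.frame(
  tract             = c("T1", "T1", "T2", "T2", "T3", "T3"),
  pred_overassessed = c(1, 1, 1, 0, 0, 0)
)

# Hypothetical census variable, one row per tract.
census <- data.frame(
  tract         = c("T1", "T2", "T3"),
  median_income = c(25000, 40000, 60000)
)

# Share of properties predicted overassessed in each tract.
by_tract <- aggregate(pred_overassessed ~ tract, data = predictions, FUN = mean)
names(by_tract)[2] <- "share_overassessed"

# Join the census variable and compute a simple correlation.
by_tract  <- merge(by_tract, census, by = "tract")
tract_cor <- cor(by_tract$share_overassessed, by_tract$median_income)
```

The joined `by_tract` table is also the natural input for a scatter plot of the two variables or for a tract-level map.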

Or (machine learning explanation):

For the regression assessment/valuation model, undertake an initial analysis of which factors your model identified as most important for valuation.
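For a linear regression, one simple first pass at importance is to rank predictors by the absolute value of their t-statistics. This is a sketch on simulated data with hypothetical column names, not a prescribed method; other model types have their own importance tools.

```r
set.seed(2)

# Simulated homes; `noise` is deliberately unrelated to sale price.
n <- 300
homes <- data.frame(
  sq_ft = runif(n, 800, 2500),
  age   = runif(n, 5, 100),
  noise = rnorm(n)
)
homes$sale_price <- 20000 + 40 * homes$sq_ft - 150 * homes$age +
  rnorm(n, sd = 5000)

fit <- lm(sale_price ~ sq_ft + age + noise, data = homes)

# Absolute t-statistics for each predictor, dropping the intercept.
t_stats    <- summary(fit)$coefficients[-1, "t value"]
importance <- sort(abs(t_stats), decreasing = TRUE)

importance
```

Unlike raw coefficients, t-statistics do not depend on the units of each predictor, which makes the ranking easier to compare across variables.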

Grading Overview

For each assignment, you will be graded on substantial completion of the assignment (demonstrated by an attempt of all parts). When submitting parts 2, 3, and 4, you will be additionally graded on your incorporation of feedback, new concepts from the course, or the correction of any flagged issues.

The assignment will culminate in a final submission of code/report and presentation. Code will be graded based on reproducibility, conceptual understanding, and accuracy. The report will be an Rmarkdown file which knits together graphs, tables, and ethical frameworks. It should be concise (include only relevant information from Parts 1-4). This report will be used to give a five-minute presentation to the class on your model and ethical/technical issues with Detroit property assessment.

Asg. | Points | Category | Notes
1 | 5 | Substantial Completion | (attempted all parts)
2 | 5 | Substantial Completion | (attempted all parts)
2 | 5 | Incorporation of Feedback/New Concepts | From Part 1
3 | 10 | Substantial Completion | (attempted all parts)
3 | 10 | Incorporation of Feedback/New Concepts | From Part 2
4 | 30 | Final Code | Reproducible (10), Concepts (10), Accurate (10)
4 | 20 | Final Report | Via Rmarkdown HTML, contextualized analysis and ethics
4 | 15 | Final Presentation | 3-5 minute presentation on model and insights