6:30pm Review Coding Warmup 1
6:45pm Some Background
7:15pm Spatial Data Overview
7:45pm Break + Debugging (15 minutes)
8:00pm Coding
Racism In, Racism Out
Tidymodels Ch 1/2
Figure 1.2: The data science process
Figure 1.3: A schematic for the typical modeling process
Exploratory data analysis (EDA): Initially there is a back and forth between numerical analysis and data visualization (represented in Figure 1.2) where different discoveries lead to more questions and data analysis side-quests to gain more understanding.
Feature engineering: The understanding gained from EDA results in the creation of specific model terms that make it easier to accurately model the observed data. This can include complex methodologies (e.g., PCA) or simpler features (using the ratio of two predictors). Chapter 8 focuses entirely on this important step.
Model tuning and selection (large circles with alternating segments): A variety of models are generated and their performance is compared. Some models require parameter tuning in which some structural parameters must be specified or optimized. The alternating segments within the circles signify the repeated data splitting used during resampling (see Chapter 10).
Model evaluation: During this phase of model development, we assess the model’s performance metrics, examine residual plots, and conduct other EDA-like analyses to understand how well the models work. In some cases, formal between-model comparisons (Chapter 11) help you understand whether any differences in models are within the experimental noise.
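The feature engineering, tuning/resampling, and evaluation steps above map onto a handful of tidymodels functions. Below is a minimal sketch that uses the built-in `mtcars` data, which is an assumption for illustration and not the course data. EDA is left out. The recipe adds PCA components (the "complex methodology" example). The number of components is the tuned structural parameter. 5-fold cross-validation supplies the repeated data splitting.

```r
library(tidymodels)

set.seed(123)

# Resampling: 5-fold cross-validation (the repeated splits used during tuning)
folds <- vfold_cv(mtcars, v = 5)

# Feature engineering: center/scale the predictors, then replace them with
# principal components; the number of components is left to be tuned
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = tune())

# Model: ordinary linear regression ("lm" engine) fit on the PCA scores
wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(linear_reg())

# Tuning: evaluate 1 to 4 components on every fold
res <- tune_grid(wf, resamples = folds, grid = tibble(num_comp = 1:4))

# Evaluation: resampled RMSE and R^2 for each candidate number of components
collect_metrics(res)
show_best(res, metric = "rmse")
```

Each candidate is scored on held-out folds rather than on the data it was fit to. That is what lets you judge whether differences between candidates are larger than the resampling noise.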
Ch 2. tidyverse
Real-world problems require accurate projections of geographic relationships.
Common spatial data formats: CSV (with coordinate columns), Shapefile, GeoJSON
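Moving between these formats with `sf` is short. The sketch below starts from a small CSV-style table with hypothetical points near Chicago. It turns the table into an sf object, writes it to a temporary GeoJSON file, and reads it back. Writing a `.shp` path with `st_write()` would produce a Shapefile the same way.

```r
library(sf)

# A tiny CSV-style table with coordinate columns (hypothetical example points)
pts <- data.frame(
  name = c("A", "B"),
  lon  = c(-87.63, -87.62),
  lat  = c(41.88, 41.89)
)

# CSV -> sf: coords are given as c(x, y) = c(longitude, latitude); declare WGS 84
pts_sf <- st_as_sf(pts, coords = c("lon", "lat"), crs = 4326)

# sf -> GeoJSON on disk (driver guessed from the file extension), then read it back
path <- tempfile(fileext = ".geojson")
st_write(pts_sf, path, quiet = TRUE)
back <- st_read(path, quiet = TRUE)
back
```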
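Coordinate reference system (CRS): key to projecting points onto the Earth’s surface
Very common: EPSG 4326 (WGS 84, used by GPS and GeoJSON) or EPSG 4269 (NAD83, used by the US Census Bureau)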
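Layers must share a CRS before you overlay or join them. The sketch below builds one point near downtown Chicago in EPSG 4326 and reprojects it to EPSG 4269 with `st_transform()`. The coordinates are approximate and for illustration only. Both CRSs are longitude/latitude on very similar datums, so the numbers barely move. The CRS label is what changes.

```r
library(sf)

# One point near downtown Chicago (approximate, illustrative coordinates)
# st_point() takes c(x, y) = c(longitude, latitude)
loop <- st_sfc(st_point(c(-87.63, 41.88)), crs = 4326)

# Reproject to NAD83, the CRS used by Census geography
loop_nad83 <- st_transform(loop, 4269)

st_crs(loop_nad83)$epsg       # the CRS label is now 4269
st_coordinates(loop_nad83)    # coordinates are almost unchanged
```

To match another layer `y` directly, `st_transform(x, st_crs(y))` avoids hard-coding the EPSG code.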
library(tidycensus)
census_api_key("YOUR API KEY GOES HERE", install = TRUE)  # install = TRUE saves the key to ~/.Renviron for future sessions
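Once the key is stored, tidycensus can return Census data with geometry attached as an sf object. The sketch below wraps one request in a hypothetical helper, `get_cook_income()`, for Cook County tracts. It assumes the key is available as `CENSUS_API_KEY` and that you have network access. The variable `B19013_001` is ACS median household income. The default year of 2020 is an arbitrary choice.

```r
library(tidycensus)
library(sf)

# Hypothetical helper: ACS median household income for Cook County, IL tracts.
# Assumes census_api_key(..., install = TRUE) has already stored CENSUS_API_KEY
# (restart R or run readRenviron("~/.Renviron") after installing the key).
get_cook_income <- function(year = 2020) {
  get_acs(
    geography = "tract",
    variables = c(med_income = "B19013_001"),  # median household income
    state     = "IL",
    county    = "Cook",
    year      = year,
    geometry  = TRUE   # attach tract polygons, returning an sf object
  )
}
```

Calling `tracts <- get_cook_income()` and then `st_crs(tracts)` should typically report NAD83 (EPSG 4269), which matches the Census convention noted above.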
Let’s quickly show how the Assessor data we used last week can be converted to an sf object.
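library(tidyverse)  # read_csv(), filter(), slice_sample(), %>%
library(sf)         # st_as_sf(), st_crs()
library(mapview)    # interactive maps
ccao <- read_csv('../../files/Assessor__Archived_05-11-2022__-_Residential_Modeling_Characteristics__Chicago_.zip')
ccao_xy <- ccao %>% filter(!is.na(Latitude), !is.na(Longitude))  # st_as_sf() errors on missing coordinates
# Wrong order: sf reads coords as c(x, y), i.e. c(longitude, latitude)
mini <- ccao_xy %>% slice_sample(n = 1000) %>% st_as_sf(coords = c("Latitude", "Longitude"))
mini
st_crs(mini) <- 4326
mapview(mini)  # this is backwards! latitude ends up near -87.6, close to the South Pole
# Correct order: longitude (x) first, latitude (y) second; the CRS can be set directly
mini2 <- ccao_xy %>% slice_sample(n = 1000) %>% st_as_sf(coords = c("Longitude", "Latitude"), crs = 4326)
mapview(mini2)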
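Swapping latitude and longitude is easy to do and hard to notice until the map renders. A quick check before converting can help. The sketch below uses a hypothetical helper, `looks_swapped()`, in base R. It assumes the data are in the Chicago area and uses a rough bounding box (about -88.0 to -87.5 longitude, 41.6 to 42.1 latitude) that is approximate, not official city limits.

```r
# Hypothetical helper: is the (x, y) order you plan to pass to st_as_sf()
# more consistent with Chicago when the two columns are swapped?
# The bounding box below is a rough, assumed approximation of Chicago.
looks_swapped <- function(x, y) {
  in_box <- function(lon, lat) lon > -88.0 & lon < -87.5 & lat > 41.6 & lat < 42.1
  # share of points inside the box as given vs. with x and y exchanged
  mean(in_box(y, x), na.rm = TRUE) > mean(in_box(x, y), na.rm = TRUE)
}

looks_swapped(x = c(41.88, 41.95), y = c(-87.63, -87.70))  # latitude passed as x
looks_swapped(x = c(-87.63, -87.70), y = c(41.88, 41.95))  # correct order
```

With the Assessor data, `looks_swapped(x = ccao$Latitude, y = ccao$Longitude)` should flag the first ordering above. The order used for `mini2` should not be flagged.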