Assignment
Instead of traditional problem sets, this course has a single four-part assignment in which you build on your previous work each week using new material from the course. You will explore property assessment in Cook County, Illinois, and create an assessment model. After completing the assignment, you will wrap your model into a report that analyzes its effectiveness using the ethical and other frameworks from class, and you will give a brief presentation to the class.
Submissions
Each week you will submit two files on Blackboard: your code (Rmd file) and the knitted output of your code. Blackboard will not accept HTML files, so you must zip the files together.
Part 3 (Due 3/25, 11:59pm)
Create part_3.Rmd, copying the YAML/framework from your previous work.
Part A (10%)
Begin to finalize your report by including only report/website-quality output. That is, include only stylized output (do not use base R print). Include your code in your report using code_folding: hide. Avoid showing package-loading messages or other incidental output in your report. This could mean using stargazer to show regressions, DT::datatable to present tables, and adding titles/labels to plots. Give all plots and tables appropriate captions. Write two- to three-sentence introductions for the different sections of your report. Your report should include an introduction to property assessment in Cook County, Illinois (as defined in Part 2A), your assessment prediction model (from Part 2C), and a conclusion. Note that Part 2B will not be continued into this assignment.
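As a minimal sketch of the setup described above, the YAML header below shows where the code_folding: hide option can go, assuming the report knits to html_document. The title is a placeholder, not a required value.

```yaml
---
title: "Part 3"          # placeholder title
output:
  html_document:
    code_folding: hide   # code is collapsed by default in the knitted HTML
---
```

Package-loading messages can usually be kept out of the knitted report by setting the knitr chunk options message=FALSE and warning=FALSE on the chunk that loads packages.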
Part B (40%)
Feature engineering. You have now created base models and evaluation metrics from last week. Investigate creating at least two new predictors and analyze whether they improve your model(s). Some possibilities:
- Census variables (such as income, race)
- Sale price per square foot for neighborhood
- Geographic areas (tract, township, etc.)
- Distance to geographic features (CTA, Lake Michigan) or nearby counts of schools, PINs, transit, etc.
- Time adjustments
Specifically compare performance between the base model and the model with the new predictors. If these features do not show sufficient improvement, state that and then remove them from your model.
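One way to set up the comparison is sketched below in base R. Everything in it is an assumption for illustration: the data are simulated (not Cook County records), dist_to_transit is a hypothetical engineered feature, lm stands in for whatever model you built in Part 2, and RMSE on held-out sales stands in for your own evaluation metrics.

```r
# Toy data only: simulated sales, not real Cook County records.
set.seed(1)
n <- 500
sales <- data.frame(
  sqft = runif(n, 800, 3000),
  age = runif(n, 0, 100),
  dist_to_transit = runif(n, 0.1, 5)  # hypothetical new predictor (miles)
)
sales$sale_price <- 50000 + 120 * sales$sqft - 300 * sales$age -
  15000 * sales$dist_to_transit + rnorm(n, sd = 20000)

# Hold out 20% of sales so both models are scored on data they did not see.
test_idx <- sample(n, size = 0.2 * n)
train <- sales[-test_idx, ]
test <- sales[test_idx, ]

# Base model versus base model plus the new predictor.
base_fit <- lm(sale_price ~ sqft + age, data = train)
new_fit <- lm(sale_price ~ sqft + age + dist_to_transit, data = train)

# Root mean squared error of a fitted model on a given data frame.
rmse <- function(fit, data) {
  sqrt(mean((data$sale_price - predict(fit, newdata = data))^2))
}

# One row per model, so the comparison can be shown as a styled table.
comparison <- data.frame(
  model = c("base", "base + dist_to_transit"),
  rmse = c(rmse(base_fit, test), rmse(new_fit, test))
)
comparison
```

Keeping the two models in one comparison table makes it easy to state whether the new predictor earned its place or should be removed.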
Part C (50%)
Prediction. Create “out of sample” predictions, or “assess” all properties. That is, predict the assessment/valuation for homes which did not sell (e.g. predict/assess properties for 2023 from Part 2C). If you are having trouble with this step, note that you cannot use any information specific to the sale of a property for out-of-sample prediction. In other words, we use sale prices to determine the true value of homes, but we do not have this information for homes which did not sell, so sale price information cannot be used as a predictor when predicting for these homes.
A helpful data framework for this section is to create a dataset of all properties. You would then label all the properties which sold (leaving unsold properties unlabeled) and create your training and testing data by filtering to only the labeled properties. After training, you can then augment your full dataset of labeled/unlabeled data to get out-of-sample predictions. This framework is helpful because the data used to make predictions must have exactly the same format as the data used to train the model.
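The framework above can be sketched roughly as follows. This is an illustration under stated assumptions, not the required implementation: the data are simulated, the column names (pin, sqft, age, sale_price) are hypothetical, and base R lm with predict is used here in place of the augment step so that the example runs without extra packages.

```r
# Toy data only: simulated properties, not real Cook County records.
set.seed(2)
n <- 300
all_props <- data.frame(
  pin = sprintf("PIN%04d", seq_len(n)),  # hypothetical property identifier
  sqft = runif(n, 800, 3000),
  age = runif(n, 0, 100)
)

# Label only the properties that sold; unsold properties keep sale_price = NA.
sold <- runif(n) < 0.3
all_props$sale_price <- ifelse(
  sold,
  50000 + 120 * all_props$sqft - 300 * all_props$age + rnorm(n, sd = 20000),
  NA
)

# Training and testing data come only from the labeled (sold) properties.
labeled <- all_props[!is.na(all_props$sale_price), ]
train_idx <- sample(nrow(labeled), size = floor(0.8 * nrow(labeled)))
train <- labeled[train_idx, ]
test <- labeled[-train_idx, ]

# Predictors are property characteristics only; nothing specific to a sale.
fit <- lm(sale_price ~ sqft + age, data = train)

# Add a prediction to every row, sold or not. Because all rows come from the
# same data frame, the prediction data has the same format as the training data.
all_props$predicted_value <- predict(fit, newdata = all_props)
head(all_props)
```

Because the labeled and unlabeled rows live in one data frame, any feature you engineer is computed once and is available both for training and for out-of-sample prediction.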
Using at least one table, plot, and map, present aggregated information on your predictions. For example:
- a table showing the accuracy of your assessments compared to the assessor’s assessments by region or property value
- a plot showing the distribution of predicted market values
- a map which shows summary information of market values/assessment accuracy across your township
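As an illustration of the first item, the sketch below aggregates prediction error by region in base R. The data are simulated, the region names and column names are hypothetical, and median absolute percentage error is only one possible accuracy measure; use the metrics from your own earlier work where they differ.

```r
# Toy data only: simulated values, not real assessments or sales.
set.seed(3)
n <- 200
preds <- data.frame(
  region = sample(c("North", "Central", "South"), n, replace = TRUE),
  sale_price = runif(n, 100000, 500000)
)
# Hypothetical model and assessor valuations, each with simulated error.
preds$model_value <- preds$sale_price * rnorm(n, mean = 1, sd = 0.10)
preds$assessor_value <- preds$sale_price * rnorm(n, mean = 1, sd = 0.10)

# Absolute percentage error of each valuation against the sale price.
preds$model_ape <- abs(preds$model_value - preds$sale_price) / preds$sale_price
preds$assessor_ape <- abs(preds$assessor_value - preds$sale_price) / preds$sale_price

# Median error per region: one row per region, one column per valuation source.
accuracy_by_region <- aggregate(
  cbind(model_ape, assessor_ape) ~ region,
  data = preds,
  FUN = median
)
accuracy_by_region
```

The resulting data frame can then be passed to a table function such as DT::datatable for presentation in the report.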
Grading Overview
For each assignment, you will be graded on substantial completion of the assignment (demonstrated by an attempt at all parts). When submitting Parts 2, 3, and 4, you will additionally be graded on your incorporation of feedback, new concepts from the course, or the correction of any flagged issues.
The assignment will culminate in a final submission of code/report and presentation. Code will be graded based on reproducibility, conceptual understanding, and accuracy. The report will be an R Markdown file which knits together graphs, tables, and ethical frameworks. It should be concise (include only relevant information from Parts 1-4). This report will be used to give a five-minute presentation to the class on your model and ethical/technical issues with Cook County property assessment.
Asg. | Points | Category | Notes |
---|---|---|---|
1 | 5 | Substantial Completion (attempted all parts) | |
2 | 5 | Substantial Completion (attempted all parts) | |
2 | 5 | Incorporation of Feedback/New Concepts | From Part 1 |
3 | 10 | Substantial Completion (attempted all parts) | |
3 | 10 | Incorporation of Feedback/New Concepts | From Part 2 |
4 | 30 | Final Code | Reproducible (10), Concepts (10), Accurate (10) |
4 | 20 | Final Report | Via Rmarkdown HTML, contextualized analysis and ethics |
4 | 15 | Final Presentation | 3-5 minute presentation on model and insights |