18.7. Exercises
Revisit the creation of the train-test split, and use train_test_split provided in scikit-learn to randomly divide the data into two parts. Notice that with this method you can separate the explanatory variables and the outcome variable into X and y objects.
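A minimal sketch of this split, assuming scikit-learn and pandas are available; the small DataFrame here is illustrative and does not come from the donkey dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Illustrative stand-in data (not the actual donkey measurements)
df = pd.DataFrame({
    "Length": [78, 91, 74, 87, 79, 86, 83, 77],
    "Girth":  [90, 97, 93, 109, 98, 102, 106, 95],
    "Weight": [77, 100, 74, 116, 91, 105, 108, 82],
})

# Separate the explanatory variables (X) from the outcome (y)
X = df.drop(columns="Weight")
y = df["Weight"]

# Hold out 20% of the rows; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 6 2
```

Because the rows of X and y are split together, each training row stays paired with its outcome.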
We chose a loss function that outputs a bigger loss when the model overestimates donkey weights. But sometimes we want the opposite: a bigger loss when the model underestimates the weights. For example, it's more harmful to prescribe too little of an antibiotic. If an infection isn't completely wiped out, it can acquire resistance to the antibiotic and re-emerge. Create a loss function called bio_loss that penalizes underestimates of donkey weights, and re-fit our model with it. How does this new loss function change the model's predictions and errors?
Fit a model that only uses the one-hot encoded BCS variable. Plot the model parameters on a scatter plot, then compare the scatter plot to a box plot of Weight vs. BCS. What do you notice?
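A sketch of the fitting step on synthetic data (the BCS levels and weights here are made up): with all one-hot columns kept and no intercept, each fitted parameter works out to the mean Weight within its BCS group, which is what the scatter plot reveals when set beside the box plot.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for the donkey measurements
rng = np.random.default_rng(1)
bcs = rng.choice([2.5, 3.0, 3.5], size=60)
weight = 150 + 20 * bcs + rng.normal(0, 4, 60)
df = pd.DataFrame({"BCS": bcs, "Weight": weight})

# One-hot encode BCS, keeping every category and omitting the intercept
X = pd.get_dummies(df["BCS"], prefix="BCS").to_numpy(dtype=float)
y = df["Weight"].to_numpy()

# Least-squares fit on the one-hot columns alone
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Each coefficient equals its group's mean Weight
group_means = df.groupby("BCS")["Weight"].mean().to_numpy()
print(np.allclose(theta, group_means))  # True
```

The one-hot columns are mutually orthogonal, so the least-squares solution decouples into one per-group mean for each column.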
Re-fit our final model on the training data, but instead of dropping BCS_3.0, drop BCS_4.0 instead. How does this change the model parameters? Why is this new model equivalent to the old one?