18.7. Exercises

  • Revisit the creation of the train-test split, and use train_test_split provided in scikit-learn to randomly divide the data into two parts. Notice that with this method you can separate the explanatory variables and the outcome variable into X and y objects.
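As a starting point, the split might look like the following sketch. The toy DataFrame and its column names (Length, Girth, Weight) are stand-ins for the donkey data, and the 80/20 split proportion is an assumption:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the donkey data; columns are assumptions.
donkeys = pd.DataFrame({
    "Length": [78, 82, 91, 85, 79, 88, 93, 80],
    "Girth": [110, 115, 122, 118, 112, 120, 125, 113],
    "Weight": [120, 135, 160, 145, 125, 150, 170, 130],
})

# Separate the explanatory variables (X) from the outcome (y).
X = donkeys.drop(columns="Weight")
y = donkeys["Weight"]

# Randomly split into train and test sets; random_state makes the
# split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Note that `train_test_split` shuffles the rows before splitting, so `X_train` and `y_train` stay aligned row for row.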

  • We chose a loss function that outputs a bigger loss when the model overestimates donkey weights. But sometimes we want the opposite: a bigger loss when the model underestimates the weights. For example, it’s more harmful to prescribe too little of an antibiotic. If an infection isn’t completely wiped out, it can acquire resistance to the antibiotic and re-emerge. Create a loss function called bio_loss that penalizes underestimates of donkey weights, and re-fit our model with it. How does this new loss function change the model’s predictions and errors?
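One way to sketch such a loss is an asymmetric squared error that weights underestimates more heavily. The 3x penalty factor, the toy weight values, and the constant model fitted here are all illustrative assumptions, not the book's solution:

```python
import numpy as np
from scipy.optimize import minimize

def bio_loss(y_hat, y):
    # Asymmetric squared error: residuals where the model underestimates
    # (y_hat < y) are penalized 3x more than overestimates. The factor 3
    # is an arbitrary illustrative choice.
    resid = y - y_hat
    return np.mean(np.where(resid > 0, 3 * resid**2, resid**2))

# Toy weights (kg) standing in for the donkey data.
weights = np.array([120.0, 135.0, 160.0, 145.0, 125.0])

# Fit a constant model by numerically minimizing bio_loss.
result = minimize(lambda theta: bio_loss(theta[0], weights),
                  x0=[weights.mean()])
theta_hat = result.x[0]
```

Because underestimates cost more, the minimizer is pulled above the plain mean: the fitted model deliberately errs on the high side.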

  • Fit a model that only uses the one-hot encoded BCS variable. Plot the model parameters on a scatter plot, then compare the scatter plot to a box plot of Weight vs. BCS. What do you notice?

  • Re-fit our final model on the training data, but instead of dropping BCS_3.0, drop BCS_4.0 instead. How does this change the model parameters? Why is this new model equivalent to the old one?
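The equivalence can be checked directly: dropping a different one-hot column changes the parameterization (which group serves as the reference level) but not the fitted values. The sketch below uses toy data as a stand-in for the donkey training set:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy stand-in data; values are assumptions.
donkeys = pd.DataFrame({
    "BCS": ["2.5", "3.0", "3.5", "4.0", "3.0", "3.5"],
    "Weight": [110.0, 130.0, 146.0, 160.0, 132.0, 147.0],
})
y = donkeys["Weight"]
dummies = pd.get_dummies(donkeys["BCS"], prefix="BCS")

# Model A uses BCS_3.0 as the reference level; model B uses BCS_4.0.
model_a = LinearRegression().fit(dummies.drop(columns="BCS_3.0"), y)
model_b = LinearRegression().fit(dummies.drop(columns="BCS_4.0"), y)

# The coefficients differ (they measure offsets from different
# reference groups), but the two models make identical predictions.
pred_a = model_a.predict(dummies.drop(columns="BCS_3.0"))
pred_b = model_b.predict(dummies.drop(columns="BCS_4.0"))
```

The two coefficient vectors are related by a constant shift absorbed into the intercept, which is why the models span the same set of predictions.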