Revisit the creation of the test-train split, and use scikit-learn to randomly divide the data into two parts. Notice that with this method you can separate the outcome variable and the explanatory variables into separate arrays.
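A minimal sketch of this split using scikit-learn's train_test_split; the DataFrame and column names here are hypothetical stand-ins for the donkey data:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the donkey data
donkeys = pd.DataFrame({
    "Length": [78, 91, 74, 87, 79, 86],
    "Girth":  [90, 97, 93, 109, 98, 102],
    "Weight": [77, 100, 74, 116, 91, 105],
})

X = donkeys.drop(columns="Weight")  # explanatory variables
y = donkeys["Weight"]               # outcome variable

# Randomly divide into train and test sets; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Passing X and y separately is what lets train_test_split hand back the outcome and explanatory variables as distinct, already-aligned pieces.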
We chose a loss function that outputs a bigger loss when the model overestimates donkey weights. But sometimes we want the opposite: a bigger loss when the model underestimates the weights. For example, it's more harmful to prescribe too little of an antibiotic. If an infection isn't completely wiped out, it can acquire resistance to the antibiotic and re-emerge. Create a loss function called bio_loss that penalizes underestimates of the weights, and re-fit our model using it. How does this new loss function change the model's predictions and errors?
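One way such an asymmetric loss could look; the penalty factor of 3 and the data are arbitrary choices for illustration, and the model here is the simplest possible one, a single constant:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def bio_loss(y_hat, y):
    """Squared error, tripled when the model underestimates (y_hat < y)."""
    err = y - y_hat
    return np.mean(np.where(err > 0, 3 * err**2, err**2))

# Hypothetical weights
y = np.array([77, 100, 74, 116, 91, 105])

# Fit a constant model by minimizing the asymmetric loss
best = minimize_scalar(lambda c: bio_loss(c, y))
```

Because underestimates cost three times as much, the minimizing constant sits above the plain mean of y, so the model hedges toward predicting heavier weights.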
Fit a model that only uses the one-hot encoded BCS variable. Plot the model parameters on a scatter plot, then compare the scatter plot to a box plot of BCS. What do you notice?
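A sketch of the fitting step with hypothetical BCS scores and weights, using pandas get_dummies for the one-hot encoding and a plain least-squares fit:

```python
import numpy as np
import pandas as pd

# Hypothetical BCS scores and weights
df = pd.DataFrame({
    "BCS":    [2.5, 3.0, 3.0, 3.5, 2.5, 3.5, 3.0],
    "Weight": [80, 95, 98, 110, 82, 112, 96],
})

# One-hot encode BCS: one indicator column per score
X = pd.get_dummies(df["BCS"], prefix="BCS").astype(float)
y = df["Weight"].to_numpy()

# Least-squares fit with no intercept: each parameter is the
# mean weight within its BCS group
theta, *_ = np.linalg.lstsq(X.to_numpy(), y, rcond=None)
```

This hints at what the plots should show: with only disjoint indicator columns and no intercept, each fitted parameter is exactly the mean weight of its BCS group, so the scatter plot of parameters should track the centers of the boxes in the box plot.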
Re-fit our final model on the training data, but instead of dropping the same one-hot encoded BCS column as before, drop BCS_4.0 instead. How does this change the model parameters? Why is this new model equivalent to the old one?
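A way to check the equivalence empirically, again with hypothetical data and an intercept-plus-dummies fit: dropping a different reference column changes the parameters but not the column space, so the fitted predictions are identical.

```python
import numpy as np
import pandas as pd

# Hypothetical BCS scores and weights
df = pd.DataFrame({
    "BCS":    [2.5, 3.0, 3.0, 3.5, 2.5, 3.5, 4.0, 4.0],
    "Weight": [80, 95, 98, 110, 82, 112, 120, 118],
})
dummies = pd.get_dummies(df["BCS"], prefix="BCS").astype(float)
y = df["Weight"].to_numpy()

def fit_predict(drop_col):
    """Drop one reference column, fit with an intercept, return predictions."""
    X = dummies.drop(columns=drop_col).to_numpy()
    X = np.column_stack([np.ones(len(X)), X])  # intercept column
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ theta

# Same predictions no matter which column serves as the baseline
preds_a = fit_predict("BCS_2.5")
preds_b = fit_predict("BCS_4.0")
```

The dropped column becomes the baseline absorbed into the intercept; the remaining parameters are offsets from that baseline, so the two parameterizations describe the same set of fitted values.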