
A Beginner’s Guide to Machine Learning Model Monitoring


Metrics in Model Monitoring

There are several metrics that you can use to monitor an ML model. The metric(s) you choose depends on various factors:

  • Is it a regression or classification task?
  • What is the business objective? E.g., is precision or recall more important?
  • What is the distribution of the target variable?

Below are various metrics that are commonly used in model monitoring:

Type 1 Error

Also known as a false positive, it is an outcome where the model incorrectly predicts the positive class. For example, a pregnancy test that comes back positive when you aren’t pregnant is a type 1 error.

Type 2 Error

Also known as a false negative, it is an outcome where the model incorrectly predicts the negative class. For example, a test result that says you don’t have cancer when you actually do is a type 2 error.
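Both error types can be read straight off a confusion matrix. Below is a minimal sketch using scikit-learn, with made-up labels purely for illustration:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

# For binary labels, ravel() unpacks the matrix as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"False positives (type 1 errors): {fp}")
print(f"False negatives (type 2 errors): {fn}")
```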

Accuracy

The accuracy of a model is simply equal to the fraction of predictions that a model got right and is represented by the following equation:
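Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.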

Precision

Precision attempts to answer “What proportion of positive identifications was actually correct?” and can be represented by the following equation:
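Precision = TP / (TP + FP)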

Recall

Recall attempts to answer “What proportion of actual positives was identified correctly?” and can be represented by the following equation:
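Recall = TP / (TP + FN)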

F1 score

The F1 score is a measure of a test’s accuracy: it is the harmonic mean of precision and recall. It can have a maximum score of 1 (perfect precision and recall) and a minimum of 0. Overall, it captures both how precise your model is (few false positives) and how complete it is (few false negatives), and can be represented with the following equation:
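F1 = 2 × (Precision × Recall) / (Precision + Recall)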

R-Squared

R-squared is a measurement that tells you what proportion of the variance in the dependent variable is explained by the variance in the independent variables. In simpler terms, while the coefficients estimate trends, R-squared describes how tightly the data scatter around the line of best fit.

For example, if the R² is 0.80, then 80% of the variation can be explained by the model’s inputs.

If the R² is 1.0 or 100%, that means that all movements of the dependent variable can be entirely explained by the movements of the independent variables.
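In terms of sums of squares, R² is commonly written as:

R² = 1 − (sum of squared residuals) / (total sum of squares)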

Adjusted R-Squared

Every additional independent variable added to a model always increases the R² value; therefore, a model with several independent variables may seem to be a better fit even if it isn’t. This is where adjusted R² comes in. The adjusted R² penalizes each additional independent variable and only increases when a new variable improves the model more than would be expected by chance.
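The usual formula adjusts R² for the number of predictors:

Adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1)

where n is the number of observations and p is the number of independent variables.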

Mean Absolute Error (MAE)

The absolute error is the absolute value of the difference between a predicted value and the actual value. Thus, the mean absolute error is the average of these absolute errors over all predictions.
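In equation form:

MAE = (1/n) × Σ |y_i − ŷ_i|

where y_i is an actual value, ŷ_i is the corresponding predicted value, and n is the number of observations.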

Mean Squared Error (MSE)

The mean squared error or MSE is similar to the MAE, except you take the average of the squared differences between the predicted values and the actual values.

Because the differences are squared, larger errors are penalized more heavily, so the MSE should be used over the MAE when you particularly want to avoid large errors. Below is the equation for MSE, as well as the code.
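MSE = (1/n) × Σ (y_i − ŷ_i)²

A minimal sketch of computing both MAE and MSE with scikit-learn (the actual and predicted values here are placeholders for illustration):

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Placeholder actual and predicted values, purely for illustration
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # average of |actual - predicted|
mse = mean_squared_error(y_true, y_pred)   # average of (actual - predicted)^2

print(f"MAE: {mae:.3f}, MSE: {mse:.3f}")
```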

Overall, the metric(s) that you choose to monitor ultimately depend on the task at hand and the business context that you’re working in.

For example, it’s common knowledge in the data science world that accuracy is a poor metric for fraud detection models because the percentage of fraudulent transactions is usually less than 1%. A fraud detection model can therefore achieve 99% accuracy simply by classifying every transaction as non-fraudulent, which tells us nothing about whether the model actually catches fraud.
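As a rough sketch of that failure mode, with synthetic labels at a 1% fraud rate, a “model” that predicts non-fraudulent for every transaction scores 99% accuracy while catching zero fraud:

```python
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels: 1 = fraudulent, 0 = non-fraudulent (10 frauds in 1,000 transactions)
y_true = [1] * 10 + [0] * 990
# A useless "model" that labels every transaction as non-fraudulent
y_pred = [0] * 1000

print(accuracy_score(y_true, y_pred))  # 0.99 -- looks impressive
print(recall_score(y_true, y_pred))    # 0.0  -- catches no fraud at all
```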

Another example: in cancer screening, a false negative classification is much more severe than a false positive. Telling a patient with cancer that they don’t have cancer can ultimately lead to their death. That is far worse than telling a patient they may have cancer and conducting further tests, only to find that they do not have cancer. (It’s always better to be safe than sorry!)

