# Precision vs Recall

{% hint style="info" %}
*Sources:*

* [*Precision vs Recall (Shruti Saxena)*](https://towardsdatascience.com/precision-vs-recall-386cf9f89488)
* [*Identification of Similar and Complementary Subparts in B-Rep Mechanical Models*](http://computingengineering.asmedigitalcollection.asme.org/article.aspx?articleid=2610217)
* [*Beyond Accuracy: Precision and Recall (Will Koehrsen)*](https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c)
{% endhint %}

## Overview

* [**Accuracy**](https://jgoodman8.gitbook.io/iron-data-science-notebook/ml-techniques/metrics#accuracy) expresses the percentage of results correctly classified.
* [**Precision**](https://jgoodman8.gitbook.io/iron-data-science-notebook/ml-techniques/metrics#precision) means the percentage of your results that are relevant.
* [**Recall**](https://jgoodman8.gitbook.io/iron-data-science-notebook/ml-techniques/metrics#recall) refers to the percentage of total relevant results correctly classified by your algorithm.
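The three definitions above follow directly from the confusion-matrix counts. A minimal sketch (illustrative numbers, not from the source):

```python
# Accuracy, precision, and recall from raw confusion-matrix counts:
# TP = true positives, FP = false positives, FN = false negatives, TN = true negatives.

def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives the model managed to find."""
    return tp / (tp + fn)

# Example counts: 8 TP, 2 FP, 4 FN, 86 TN (100 samples total)
print(accuracy(8, 2, 4, 86))   # 0.94
print(precision(8, 2))         # 0.8
print(recall(8, 4))            # 0.666...
```

Note how a model can look excellent on accuracy (0.94) while its recall is mediocre: with imbalanced classes, accuracy alone is misleading.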

## The trade-off

Consider a model that only predicts positive when it is completely certain: its precision is 1.0 (no *FP*), but its recall remains very low because it still produces many *FN*. If we go to the other extreme and classify every input as positive, we will have a recall of 1.0, but our precision will be very low and (in a screening scenario) we would detain many innocent individuals. In other words, **as we increase precision we decrease recall, and vice versa**.

Depending on the situation, **we may maximize either precision** (e.g. spam detection) **or recall** (e.g. disease detection).

The [**confusion matrix**](https://jgoodman8.gitbook.io/iron-data-science-notebook/ml-techniques/metrics#the-confusion-matrix) is useful for quickly calculating precision and recall given the predicted labels from a model. The other main visualization technique for showing the performance of a classification model is the [**Receiver Operating Characteristic (ROC) curve**](https://jgoodman8.gitbook.io/iron-data-science-notebook/ml-datascience/frequent-questions/how-a-roc-curve-works).

![](https://569842953-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LmjpNbCRLUyGiAxD8kn%2F-Lp00hHYsLPiLWXxI2du%2F-Lp0B8l7KS_HDEfx0TzY%2Fimage.png?alt=media\&token=3f23ac98-004f-4483-90fc-fac90170afbf)

## Combining Precision and Recall

{% hint style="success" %}
If we want to create a **balanced classification model** with the optimal balance of recall and precision, then we try to **maximize the F1 score**.
{% endhint %}

In cases where we want to find an optimal blend of precision and recall, we can combine the two metrics using what is called the [**F1 score**](https://jgoodman8.gitbook.io/iron-data-science-notebook/ml-techniques/metrics#f-score). It is the **harmonic mean of precision and recall**, summarizing both metrics in a single number.

{% content-ref url="../ml-techniques/metrics" %}
[metrics](https://jgoodman8.gitbook.io/iron-data-science-notebook/ml-datascience/ml-techniques/metrics)
{% endcontent-ref %}

We use the harmonic mean instead of a simple average because it punishes extreme values. A classifier with a precision of 1.0 and a recall of 0.0 has a simple average of 0.5 but an F1 score of 0.
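The degenerate-classifier example above can be checked in a couple of lines:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision 1.0, recall 0.0: the simple average hides the failure,
# the harmonic mean does not.
print((1.0 + 0.0) / 2)        # 0.5
print(f1_score(1.0, 0.0))     # 0.0
print(f1_score(0.8, 0.6))     # 0.685... -- pulled toward the lower metric
```

Because the harmonic mean is dominated by the smaller of the two values, a high F1 score is only achievable when precision and recall are *both* reasonably high.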

{% content-ref url="how-a-roc-curve-works" %}
[how-a-roc-curve-works](https://jgoodman8.gitbook.io/iron-data-science-notebook/ml-datascience/frequent-questions/how-a-roc-curve-works)
{% endcontent-ref %}
