Advanced Machine Learning

22: Model Comparison II

Outline for the lecture

Cross Validation
Expected Value (Profit)
Visualizing Model Performance

Cross Validation

5-fold cross validation

Cross validation results

Grid search workflow

Extra testing data

Nested Cross Validation

Hyperparameter optimization

Tools for hyperparameter optimization

Expected Value (profit)

Statistician view of the world

Which Metric is the right one?

It depends

Expected value

Let's denote an outcome $i$ as $o_i$
The probability of that outcome as $\prob{p}{o_i}$
And its value as $\prob{v}{o_i}$
The expected value over the $K$ possible outcomes is $$ EV = \sum_{i=1}^K \prob{p}{o_i}\cdot\prob{v}{o_i} $$
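As a minimal sketch of the formula above, the sum can be written directly in Python. The function name `expected_value` and the two-outcome numbers are illustrative assumptions, not part of the lecture.

```python
def expected_value(probabilities, values):
    """Return sum_i p(o_i) * v(o_i) over mutually exclusive outcomes."""
    if len(probabilities) != len(values):
        raise ValueError("need exactly one value per outcome")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * v for p, v in zip(probabilities, values))


# Hypothetical example: gain 10 with probability 0.3, lose 2 otherwise.
ev = expected_value([0.3, 0.7], [10.0, -2.0])
print(ev)
```

The check that the probabilities sum to 1 is a design choice for this sketch: it catches the common mistake of leaving out an outcome.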

Example: targeted marketing

A consumer buys the product for $\$200$ and our product-related costs are $\$100$.
To target the consumer with the offer, we also incur a cost. Let’s say that we mail some flashy marketing materials, and the overall cost including postage is $\$1$, yielding a value (profit) of $\prob{v}{o_R} = \$99$ if the consumer responds (buys the product).
Now, what about $\prob{v}{o_{NR}}$, the value to us if the consumer does not respond? We still mailed the marketing materials, incurring a cost of $\$1$, or equivalently a benefit of $-\$1$.

Example: targeted marketing

Should we target this specific consumer? Only if the expected value of targeting is positive:

$\prob{p}{o_R} \cdot \$99 - [1 - \prob{p}{o_{R}}]\cdot \$1 \gt 0$
$\prob{p}{o_R} \cdot \$99 \gt [1 - \prob{p}{o_{R}}]\cdot \$1$
$\prob{p}{o_R} \gt 0.01$
Send marketing materials if the probability of responding is $\gt 1\%$
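The break-even argument above can be checked numerically. The sketch below assumes the slide's values ($\$99$ on response, $-\$1$ otherwise). The helper names `break_even_probability` and `should_target` are hypothetical.

```python
def break_even_probability(value_response, value_no_response):
    """Solve p * v_R + (1 - p) * v_NR = 0 for p (assumes v_R > v_NR)."""
    return -value_no_response / (value_response - value_no_response)


def should_target(p_response, value_response=99.0, value_no_response=-1.0):
    """Target the consumer only if the expected profit is strictly positive."""
    expected_profit = (p_response * value_response
                       + (1.0 - p_response) * value_no_response)
    return expected_profit > 0


threshold = break_even_probability(99.0, -1.0)
print(threshold)             # break-even response probability
print(should_target(0.02))   # above the break-even point
print(should_target(0.005))  # below the break-even point
```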

Example: targeted marketing

First, convert confusion matrix to probabilities
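One way to do this conversion, as a rough sketch: divide each cell by the grand total for joint probabilities, and by its column total for conditional rates. The counts below are illustrative assumptions, chosen so that the rates round to 0.92, 0.08, 0.86 and 0.14. Keys are (prediction, actual class), with predictions Y/N and actual classes p/n.

```python
# Illustrative counts (an assumption, not taken from the slides).
counts = {("Y", "p"): 56, ("N", "p"): 5, ("Y", "n"): 7, ("N", "n"): 42}

total = sum(counts.values())

# Joint probabilities p(prediction, actual): cell / grand total.
joint = {cell: c / total for cell, c in counts.items()}

# Number of instances per actual class (column totals).
class_totals = {
    actual: sum(c for (_, a), c in counts.items() if a == actual)
    for actual in ("p", "n")
}

# Class priors p(p) and p(n).
priors = {actual: n / total for actual, n in class_totals.items()}

# Conditional rates p(prediction | actual): cell / column total.
rates = {
    (pred, actual): c / class_totals[actual]
    for (pred, actual), c in counts.items()
}

print(priors)
print(rates)
```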

Example: targeted marketing

Second, estimate the cost benefit matrix

Expected Value Calculation

Example: targeted marketing

What's the profit?

\begin{align} EV = &\, \prob{p}{Y, p} \prob{b}{Y, p} + \prob{p}{N, p} \prob{b}{N, p} + \prob{p}{N, n} \prob{b}{N, n} + \prob{p}{Y, n} \prob{b}{Y, n} \\ = & \fragment{0}{\, \prob{p}{Y| p}\prob{p}{p} \prob{b}{Y, p} + \prob{p}{N|p} \prob{p}{p} \prob{b}{N, p}} \\ & \fragment{0}{+ \prob{p}{N|n} \prob{p}{n} \prob{b}{N, n} + \prob{p}{Y|n} \prob{p}{n} \prob{b}{Y, n}}\\ = &\, \fragment{1}{\prob{p}{p} \left[ \prob{p}{Y| p} \prob{b}{Y, p} + \prob{p}{N|p} \prob{b}{N, p} \right]} \\ & \fragment{1}{+ \prob{p}{n} \left[ \prob{p}{N|n} \prob{b}{N, n} + \prob{p}{Y|n} \prob{b}{Y, n} \right]}\\ =&\, \fragment{2}{0.55 \left[ 0.92\cdot \prob{b}{Y, p} + 0.08\cdot \prob{b}{N, p} \right]} \\ & \fragment{2}{+ 0.45 \left[ 0.86\cdot \prob{b}{N, n} + 0.14\cdot \prob{b}{Y, n} \right]}\\ =&\, \fragment{3}{0.55 \left[ 0.92\cdot99 + 0.08\cdot 0 \right] + 0.45 \left[ 0.86\cdot 0 + 0.14\cdot (-1) \right]}\\ \approx&\, \fragment{4}{\$50.03} \end{align}
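The factored form of the derivation (class prior times conditional rate times benefit, summed over the four cells) can be evaluated in a few lines. This sketch uses the rounded priors, rates and benefits from the derivation. The dictionary layout and the name `expected_profit` are assumptions made for illustration.

```python
# Rounded inputs as they appear in the derivation above.
priors = {"p": 0.55, "n": 0.45}
rates = {("Y", "p"): 0.92, ("N", "p"): 0.08,
         ("N", "n"): 0.86, ("Y", "n"): 0.14}
benefits = {("Y", "p"): 99.0, ("N", "p"): 0.0,
            ("N", "n"): 0.0, ("Y", "n"): -1.0}


def expected_profit(priors, rates, benefits):
    """Sum p(actual) * p(pred | actual) * b(pred, actual) over all cells."""
    return sum(
        priors[actual] * rates[(pred, actual)] * benefits[(pred, actual)]
        for (pred, actual) in rates
    )


profit = expected_profit(priors, rates, benefits)
print(round(profit, 3))
```

Keeping the priors separate from the conditional rates makes it easy to re-evaluate the same classifier under a different class balance.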

Visualizing model performance

Remember Expected Value Calculation?

Remember that Bayesian decision boundary?

Let's move the boundary around until we are happy

ML models rarely return a score that is a true probability

For example, in any linear classifier the distance from the decision boundary can be used to rank samples, but it is not a probability. Even when we estimate the probability densities directly, we may not have a sufficiently representative training sample.

A classifier and confusion matrix

A ranking classifier plus a threshold produces a single confusion matrix.
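To illustrate, the sketch below turns a list of scores into a confusion matrix for a given threshold. The scores, labels and the function name `confusion_matrix` are hypothetical. Instances scoring at or above the threshold are predicted Y.

```python
def confusion_matrix(scores, labels, threshold):
    """Count (prediction, actual) cells, predicting Y when score >= threshold.

    labels holds the actual classes: "p" (positive) or "n" (negative).
    """
    cells = {("Y", "p"): 0, ("N", "p"): 0, ("Y", "n"): 0, ("N", "n"): 0}
    for score, actual in zip(scores, labels):
        predicted = "Y" if score >= threshold else "N"
        cells[(predicted, actual)] += 1
    return cells


# Hypothetical scores and actual classes, for illustration only.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = ["p", "p", "n", "p", "n", "n"]

strict = confusion_matrix(scores, labels, threshold=0.7)
lenient = confusion_matrix(scores, labels, threshold=0.35)
print(strict)
print(lenient)
```

The same scores give a different matrix at each threshold, which is why a single confusion matrix describes only one operating point of the ranking.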

Questions

How do we compare different rankings?
How do we choose a proper threshold?

Profit Curve

With a ranking classifier, we can produce a list of instances and their predicted scores, ranked by decreasing score, and then measure the expected profit that would result from choosing each successive cut-point in the list.
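A minimal sketch of this procedure: sort by decreasing score, then accumulate profit as the cut-point moves down the list. It assumes that only targeted instances contribute (a benefit for each true positive, a cost for each false positive) and that untargeted instances contribute nothing. The data and the name `profit_curve` are hypothetical.

```python
def profit_curve(scores, labels, benefit_tp, cost_fp):
    """Total profit from targeting the top-k instances, for k = 0..n.

    labels holds the actual classes: "p" (positive) or "n" (negative).
    Untargeted instances are assumed to contribute zero profit.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    profits = [0.0]  # k = 0: nobody is targeted
    for _, actual in ranked:
        gain = benefit_tp if actual == "p" else -cost_fp
        profits.append(profits[-1] + gain)
    return profits


# Hypothetical scores and actual classes, for illustration only.
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = ["p", "p", "n", "p", "n", "n"]

curve = profit_curve(scores, labels, benefit_tp=99.0, cost_fp=1.0)
best_k = max(range(len(curve)), key=curve.__getitem__)
print(curve)
print(best_k)
```

Dividing each entry by the number of instances gives profit per instance. A budget constraint amounts to looking only at the entries with `k` up to the number of offers the budget allows.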

Profit Curve: example

Let's assume our profit margin is small

Profit Curve: example

Profit Curve on budget: example

Problems with Profit Curves

Two critical conditions in profit calculation
- The class priors
- The costs and benefits
If both are known, profit curves may be a good choice to visualize classifier performance
In many domains these conditions are either unstable or uncertain
- Ex: the amount of fraud changes from place to place and month to month
- Ex: marketing campaigns have different budgets and offers may have different costs

A possible solution

Draw a profit curve per condition
Difficult to manage, difficult to interpret, and difficult to explain to a stakeholder