Advanced Machine Learning

22: Model Comparison II

Outline for the lecture

  • Cross Validation
  • Expected Value (Profit)
  • Visualizing Model Performance

Cross Validation

5-fold cross validation

XV 5 fold
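
As a minimal sketch of how the five folds can be formed, the hypothetical helper `k_fold_indices` below splits sample indices into `k` contiguous folds and uses each fold once as the test set. It assumes the data have already been shuffled; the function name and the 10-sample example are illustrative only.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical example: 10 samples, 5 folds.
folds = list(k_fold_indices(10, k=5))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Each sample appears in exactly one test fold, so averaging the five test scores uses every sample for evaluation once.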

Cross validation results

XV bars

Grid search workflow

cats and mice

Extra testing data

cats and mice

Nested Cross Validation

cats and mice

Nested Cross Validation

cats and mice

Hyperparameter optimization

hyperopt hyperopt

Tools for hyperparameter optimization

Expected Value (Profit)

Statistician view of the world

Pregnant

Which Metric is the right one?

Metrics

it depends

Expected value

  • Let's denote an outcome $i$ as $o_i$
  • The probability of that outcome as $\prob{p}{o_i}$
  • And its value as $\prob{v}{o_i}$
  • The expected value over the $K$ possible outcomes is nothing but $$ EV = \sum_{i=1}^{K} \prob{p}{o_i}\cdot\prob{v}{o_i} $$
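
The sum above can be sketched in a few lines of Python. The function name `expected_value` and the two-outcome numbers are hypothetical, chosen only to illustrate the formula.

```python
def expected_value(probabilities, values):
    """Return sum_i p(o_i) * v(o_i) over all outcomes."""
    if len(probabilities) != len(values):
        raise ValueError("need one value per outcome")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * v for p, v in zip(probabilities, values))


# Hypothetical outcomes: win 10 with probability 0.2, lose 1 otherwise.
ev = expected_value([0.2, 0.8], [10.0, -1.0])
print(round(ev, 2))  # 1.2
```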

Example: targeted marketing

  • A consumer buys the product for $\$200$ and our product-related costs are $\$100$.
  • Targeting the consumer with the offer also incurs a cost. Let's say that we mail some flashy marketing materials, and the overall cost including postage is $\$1$. If the consumer responds (buys the product), the value (profit) is $\prob{v}{o_R} = \$200 - \$100 - \$1 = \$99$.
  • Now, what about $\prob{v}{o_{NR}}$, the value to us if the consumer does not respond? We still mailed the marketing materials, incurring a cost of $\$1$, or equivalently a benefit of $-\$1$.

Example: targeted marketing

Should we target this specific consumer?

  • $\prob{p}{o_R} \cdot \$99 - [1 - \prob{p}{o_{R}}]\cdot \$1 \gt 0$
  • $\prob{p}{o_R} \cdot \$99 \gt [1 - \prob{p}{o_{R}}]\cdot \$1$
  • $\prob{p}{o_R} \gt 0.01$
  • Send marketing materials if probability of responding is $\gt 1\%$
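
The decision rule above generalizes to any profit and cost: solving $p \cdot \text{profit} - (1 - p) \cdot \text{cost} > 0$ for $p$ gives $p > \text{cost} / (\text{profit} + \text{cost})$. A minimal sketch follows; the function names are hypothetical, and the defaults are the $\$99$ and $\$1$ from the example.

```python
def break_even_probability(profit_if_response, cost_if_no_response):
    """Response probability at which targeting has zero expected value."""
    # p * profit - (1 - p) * cost = 0  =>  p = cost / (profit + cost)
    return cost_if_no_response / (profit_if_response + cost_if_no_response)


def should_target(p_response, profit_if_response=99.0, cost_if_no_response=1.0):
    """True if the expected value of targeting this consumer is positive."""
    expected = (p_response * profit_if_response
                - (1.0 - p_response) * cost_if_no_response)
    return expected > 0


threshold = break_even_probability(99.0, 1.0)
print(threshold)             # 0.01
print(should_target(0.02))   # True
print(should_target(0.005))  # False
```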

Example: targeted marketing

First, convert confusion matrix to probabilities

hyperopt hyperopt
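
A minimal sketch of this conversion: the class priors come from the totals of actual positives and negatives, and the conditional rates from normalizing within each actual class. The counts used here are hypothetical, picked so that the resulting rates round to the values used in the profit calculation later in this lecture; the function name is also an assumption.

```python
def confusion_to_probabilities(tp, fn, fp, tn):
    """Turn confusion-matrix counts into priors and conditional rates.

    Keys follow the slide notation: Y/N is the predicted class,
    p/n is the actual class.
    """
    positives = tp + fn
    negatives = fp + tn
    total = positives + negatives
    return {
        "p(p)": positives / total,
        "p(n)": negatives / total,
        "p(Y|p)": tp / positives,   # true positive rate
        "p(N|p)": fn / positives,   # false negative rate
        "p(Y|n)": fp / negatives,   # false positive rate
        "p(N|n)": tn / negatives,   # true negative rate
    }


# Hypothetical counts, for illustration only.
rates = confusion_to_probabilities(tp=56, fn=5, fp=7, tn=42)
for name, value in rates.items():
    print(name, round(value, 2))
```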

Example: targeted marketing

Second, estimate the cost benefit matrix

CB CB example

Expected Value Calculation

EV diagram

Example: targeted marketing

What's the profit?

\begin{align} EV = &\, \prob{p}{Y, p} \prob{b}{Y, p} + \prob{p}{N, p} \prob{b}{N, p} + \prob{p}{N, n} \prob{b}{N, n} + \prob{p}{Y, n} \prob{b}{Y, n} \\ = & \fragment{0}{\, \prob{p}{Y| p}\prob{p}{p} \prob{b}{Y, p} + \prob{p}{N|p} \prob{p}{p} \prob{b}{N, p}} \\ & \fragment{0}{+ \prob{p}{N|n} \prob{p}{n} \prob{b}{N, n} + \prob{p}{Y|n} \prob{p}{n} \prob{b}{Y, n}}\\ = &\, \fragment{1}{\prob{p}{p} \left[ \prob{p}{Y| p} \prob{b}{Y, p} + \prob{p}{N|p} \prob{b}{N, p} \right]} \\ & \fragment{1}{+ \prob{p}{n} \left[ \prob{p}{N|n} \prob{b}{N, n} + \prob{p}{Y|n} \prob{b}{Y, n} \right]}\\ =&\, \fragment{2}{0.55 \left[ 0.92\cdot \prob{b}{Y, p} + 0.08\cdot \prob{b}{N, p} \right]} \\ & \fragment{2}{+ 0.45 \left[ 0.86\cdot \prob{b}{N, n} + 0.14\cdot \prob{b}{Y, n} \right]}\\ =&\, \fragment{3}{0.55 \left[ 0.92\cdot 99 + 0.08\cdot 0 \right] + 0.45 \left[ 0.86\cdot 0 + 0.14\cdot (-1) \right]}\\ \approx&\, \fragment{4}{\$50.03} \end{align}
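
The same calculation can be checked numerically. The sketch below uses the priors, rates, and benefits from the derivation; the dictionary layout and variable names are assumptions. Evaluated at full precision, the two non-zero terms are $50.094$ and $-0.063$, giving an expected profit of about $\$50.03$ per consumer.

```python
# Keys are (predicted, actual): Y/N predicted, p/n actual.
prior = {"p": 0.55, "n": 0.45}
rate = {("Y", "p"): 0.92, ("N", "p"): 0.08,
        ("Y", "n"): 0.14, ("N", "n"): 0.86}
benefit = {("Y", "p"): 99.0, ("N", "p"): 0.0,
           ("Y", "n"): -1.0, ("N", "n"): 0.0}

# EV = sum over cells of p(actual) * p(predicted | actual) * b(predicted, actual)
expected_profit = sum(
    prior[actual] * rate[(predicted, actual)] * benefit[(predicted, actual)]
    for (predicted, actual) in rate
)
print(round(expected_profit, 3))  # 50.031
```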

Visualizing Model Performance

Remember Expected Value Calculation?

EV diagram

Remember that Bayesian decision boundary?

decision boundary

Let's move the boundary around until we are happy

Cutoffs

ML models rarely return a score that is a true probability

For example, in any linear classifier we can use the distance from the decision boundary to rank samples, but not as a probability. Even when we estimate the probability densities directly, we may not be able to get a sufficiently representative training sample.
Confusion Matrix

A classifier and confusion matrix

A ranking classifier only orders the instances by score; a ranking classifier plus a threshold produces a single confusion matrix.
sorted data
Confusion Matrix
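
A minimal sketch of how one threshold turns ranked scores into one confusion matrix. The scores, labels, threshold, and function name are hypothetical; instances with a score at or above the threshold are predicted positive.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Count TP, FP, FN, TN when predicting positive for score >= threshold."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            counts["TP"] += 1
        elif predicted_positive and label == 0:
            counts["FP"] += 1
        elif label == 1:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Hypothetical ranked data: label 1 is the positive class.
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0]
cm = confusion_at_threshold(scores, labels, threshold=0.5)
print(cm)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}
```

Moving the threshold changes the counts, so each threshold corresponds to a different confusion matrix for the same ranking.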

Questions

  • How do we compare different rankings?
  • How do we choose a proper threshold?
Confusion Matrix

Profit Curve

With a ranking classifier, we can produce a list of instances and their predicted scores, ranked by decreasing score, and then measure the expected profit that would result from choosing each successive cut-point in the list.
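
A minimal sketch of that procedure, under the assumption that instances we do not target contribute zero profit. The function name, the scores and labels, and the benefit values are hypothetical. Entry `k` of the result is the total profit from targeting the top `k` instances of the ranking.

```python
def profit_curve(scores, labels, benefit_tp, benefit_fp):
    """Cumulative profit for each cut-point in the score-ranked list.

    benefit_tp: profit when a targeted instance is a true positive.
    benefit_fp: profit (usually negative) when it is a false positive.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    profits = [0.0]  # cut-point 0: target nobody
    running = 0.0
    for _, label in ranked:
        running += benefit_tp if label == 1 else benefit_fp
        profits.append(running)
    return profits


# Hypothetical data: a small margin of 9 per response, cost of 1 per miss.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
profits = profit_curve(scores, labels, benefit_tp=9.0, benefit_fp=-1.0)
best_k = max(range(len(profits)), key=lambda k: profits[k])
print(profits)  # [0.0, 9.0, 8.0, 17.0, 16.0, 15.0]
print(best_k)   # 3
```

With a budget that allows only `b` offers, the same list can be searched over `profits[:b + 1]` instead of the whole curve.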

Profit Curve: example

Let's assume our profit margin is small

small profit margin
ranking

Profit Curve: example

small profit curve
ranking

Profit Curve on budget: example

small profit curve

Problems with Profit Curves

  • Two critical conditions in profit calculation
    • The class priors
    • The costs and benefits
  • If both are known, profit curves may be a good choice to visualize classifier performance
  • In many domains these conditions are either unstable or uncertain
    • Ex: the amount of fraud changes from place to place and month to month
    • Ex: marketing campaigns have different budgets and offers may have different costs

A possible solution

  • Draw a profit curve per condition
  • Difficult to manage, difficult to interpret, and difficult to explain to a stakeholder