$\text{ERM}_{\cal H}(S) \stackrel{\text{def}}{=} \underset{h\in{\cal H}}{\argmin}L_S(h)$
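As a sketch, ERM over a finite class can be implemented by exhaustive search over ${\cal H}$. Everything below (the 1-D threshold classifiers $h_t(x) = \mathbb{1}[x \ge t]$ and the toy training set) is a hypothetical illustration, not a prescribed setup:

```python
import numpy as np

def empirical_risk(h, X, y):
    """Empirical 0-1 risk L_S(h): fraction of training examples h mislabels."""
    return float(np.mean(h(X) != y))

def erm(H, X, y):
    """ERM rule: return a hypothesis in the finite class H minimizing L_S."""
    return min(H, key=lambda h: empirical_risk(h, X, y))

# Hypothetical finite class: threshold classifiers h_t(x) = 1[x >= t]
H = [lambda x, t=t: (x >= t).astype(int) for t in np.linspace(0.0, 1.0, 11)]

# Toy training set, labeled consistently with the threshold t = 0.5
X = np.array([0.1, 0.2, 0.4, 0.6, 0.8])
y = np.array([0, 0, 0, 1, 1])
h_S = erm(H, X, y)
print(empirical_risk(h_S, X, y))  # 0.0: some threshold fits S perfectly
```

Since the data here are separable by a threshold in the class, the ERM hypothesis attains zero empirical risk.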
bias the learner to a particular set of predictors
Other inductive biases
Maximum conditional independence: cast in a Bayesian framework, try to maximize conditional independence (Naive Bayes)
Minimum cross-validation error
Maximum margin
Minimum description length
Minimum features
Nearest neighbors
Finite Hypothesis Classes
$h_S \in \underset{h\in{\cal H}}{\argmin}L_S(h)$
Bound the probability of error
Assumptions
The Realizability Assumption: There exists $h^* \in {\cal H}$ such that $L_{({\cal D}, f)}(h^*)=0$. This implies: with probability 1 over random training sets $S\sim {\cal D}^m$ labeled by $f$, we have $L_S(h^*)=0$
The i.i.d. Assumption: Samples in the training set are independent and identically distributed. Denoted as $S\sim {\cal D}^m$
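The sampling model $S \sim {\cal D}^m$ can be simulated directly. A minimal sketch, where the uniform distribution on $[0,1]$ and the threshold labeler are assumed choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_set(m, f):
    """Draw S ~ D^m: m i.i.d. instances from D (assumed Uniform[0,1] here),
    each labeled by the true function f, per the realizability assumption."""
    X = rng.uniform(0.0, 1.0, size=m)
    return X, f(X)

f = lambda x: (x >= 0.5).astype(int)  # assumed true labeling function
X, y = sample_training_set(10, f)
print(X.shape, y.shape)  # (10,) (10,)
```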
Confidence and accuracy
The risk $L_{({\cal D},f)}(h_S)$ depends on the randomly drawn training set $S$; that is, the risk is a random variable.
Some training sets $S$ can be really bad! Denote by $\delta$ the probability of getting a nonrepresentative sample.
We call $(1-\delta)$ the confidence parameter.
We can't hope to always achieve perfect loss $L_{({\cal D}, f)}(h_S)=0$, so we introduce the accuracy parameter $\epsilon$.
Failure is when $L_{({\cal D}, f)}(h_S)>\epsilon$; success is when $L_{({\cal D}, f)}(h_S)\le \epsilon$.
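The failure probability can be estimated by simulation. A sketch under assumed choices (uniform ${\cal D}$ on $[0,1]$, a finite class of threshold classifiers containing the true threshold, so realizability holds):

```python
import numpy as np

rng = np.random.default_rng(1)
thresholds = np.linspace(0.0, 1.0, 21)  # assumed finite class of thresholds
true_t = 0.5                            # assumed true threshold; f is in H

def true_risk(t):
    # L_(D,f)(h_t) under D = Uniform[0,1]: h_t and f disagree on mass |t - true_t|
    return abs(t - true_t)

def erm_threshold(X, y):
    # ERM over the finite threshold class: minimize empirical 0-1 risk
    risks = [np.mean((X >= t).astype(int) != y) for t in thresholds]
    return thresholds[int(np.argmin(risks))]

def estimate_failure_prob(m, eps, trials=2000):
    # Monte Carlo estimate of D^m({S|_x : L_(D,f)(h_S) > eps})
    fails = 0
    for _ in range(trials):
        X = rng.uniform(size=m)
        y = (X >= true_t).astype(int)
        fails += true_risk(erm_threshold(X, y)) > eps
    return fails / trials

print(estimate_failure_prob(m=5, eps=0.1))
print(estimate_failure_prob(m=100, eps=0.1))
```

The estimate for $m=100$ comes out far smaller than for $m=5$: more data makes a misleading sample less likely. This estimated failure probability plays the role of $\delta$.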
What is the probability of a bad sample that fails the learner?
$S\mid_x = (x_1, x_2, \dots, x_m)$ - instances of the training set
We want to upperbound ${\cal D}^m(\{S\mid_x : L_{({\cal D}, f)}(h_S) > \epsilon\})$
The set of "bad" hypotheses ${\cal H}_B = \{h\in {\cal H} : L_{({\cal D}, f)}(h) > \epsilon\}$
Set of misleading samples $M = \{S\mid_x : \exists h \in {\cal H}_B, L_S(h)=0\}$
But remember our assumption?
The Realizability Assumption: There exists $h^* \in {\cal H}$ such that $L_{({\cal D}, f)}(h^*)=0$. This implies: with probability 1 over random training sets $S\sim {\cal D}^m$ labeled by $f$, we have $L_S(h^*)=0$
This means $L_S(h_S) = 0$, where $h_S \in \underset{h\in{\cal H}}{\argmin}L_S(h)$. Hence $L_{({\cal D}, f)}(h_S) > \epsilon$ can only happen if $L_S(h)=0$ for some $h\in {\cal H}_B$, i.e. only on a misleading sample $S\mid_x \in M$.
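Spelled out (a standard step in this argument; the union form of $M$ is just its definition rewritten), the conclusion is a containment, which gives an upper bound by monotonicity of the measure:

$$\{ S\mid_x : L_{({\cal D}, f)}(h_S) > \epsilon \} \subseteq M = \bigcup_{h \in {\cal H}_B} \{ S\mid_x : L_S(h) = 0 \}, \quad\text{hence}\quad {\cal D}^m\big(\{ S\mid_x : L_{({\cal D}, f)}(h_S) > \epsilon \}\big) \le {\cal D}^m(M).$$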