Understanding Machine Learning 03

understanding machine learning

note

Author

Shiguang Wu

Published

March 11, 2022

Uniform Convergence

Previously, the choice of \(m_H\) and the role of \(\delta\) depended on the learner. However, we can borrow the concept of uniform convergence from analysis to get a guarantee that is independent of which learner is used.

Here, \(L_S(h)\) plays the role of the partial sum \(\sum_{i=1}^n f_i(x)\), since both are intermediate values on the way to a limit, and \(L_D(h)\) plays the role of the limit itself.

def \(\epsilon\)-representative sample

\(S\) is an \(\epsilon\)-representative sample (w.r.t. the domain, \(H\), \(l\) and \(D\)) if

\[ \forall h\in\mathcal{H},\, |L_S(h)-L_D(h)|\le \epsilon \]

lemma

if \(S\) is \(\frac{\epsilon}{2}\)-representative, then every \(h_S\in\argmin_{h\in\mathcal{H}}L_S(h)\) (that is, every output of ERM) satisfies

\[ L_D(h_S)\le \min_{h\in\mathcal{H}}L_D(h)+\epsilon \]
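The lemma is not proved in these notes. A short sketch of the argument, for an arbitrary \(h\in\mathcal{H}\): apply \(\frac{\epsilon}{2}\)-representativeness to \(h_S\), then use that \(h_S\) minimizes \(L_S\), then apply \(\frac{\epsilon}{2}\)-representativeness to \(h\).

\[ L_D(h_S)\le L_S(h_S)+\frac{\epsilon}{2}\le L_S(h)+\frac{\epsilon}{2}\le L_D(h)+\frac{\epsilon}{2}+\frac{\epsilon}{2}=L_D(h)+\epsilon \]

Since this holds for every \(h\in\mathcal{H}\), it holds for the minimizer of \(L_D\) as well.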

from this lemma, we immediately get

\(S\) is \(\frac{\epsilon}{2}\)-representative with probability at least \(1-\delta\) \(\implies\) the ERM rule is an agnostic PAC learner for \(H\)

def uniform convergence

\(H\) has the uniform convergence property \(\coloneqq\) there exists a function \(m_H^{UC}(\epsilon, \delta)\) such that for every \(\epsilon,\delta\in(0,1)\) and every distribution \(D\), if \(S\) is an i.i.d. sample from \(D\) of size \(m\ge m_H^{UC}(\epsilon,\delta)\), then with probability at least \(1-\delta\), \(S\) is \(\epsilon\)-representative

This looks stronger than the original agnostic PAC requirement, much like the relation between uniform convergence and ordinary convergence in analysis. Ordinary convergence only cares about a single point (here, the hypothesis produced by the learner), while uniform convergence holds over the whole domain at once (all \(h\in\mathcal{H}\)).

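corollary

if \(H\) has the uniform convergence property with function \(m_H^{UC}\), then \(H\) is agnostic PAC learnable by ERM with \(m_H(\epsilon,\delta)\le m_H^{UC}(\epsilon/2,\delta)\)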

the case of a finite hypothesis class

we need to find \(m\) such that

\[ D^m(\{S:\forall h\in \mathcal{H},|L_S(h)-L_D(h)|\le\epsilon\})\ge 1-\delta \]

It suffices to bound the probability of the complementary event, which is a more familiar form (convenient for applying inequalities)

\[ D^m(\{S:\exists h\in \mathcal{H},|L_S(h)-L_D(h)|\gt\epsilon\})\lt \delta \]

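using the union bound and Hoeffding's inequality (note that \(L_D(h)=\mathbb{E}_{S\sim D^m}[L_S(h)]\), that \(L_S(h)\) is an average of \(m\) i.i.d. terms, and that we assume the loss takes values in \([0,1]\)), we have

\[ \text{LHS}\le\sum_{h\in\mathcal{H}}D^m(\{S:|L_S(h)-L_D(h)|\gt\epsilon\})\le\sum_{h\in\mathcal{H}}2\exp(-2m\epsilon^2)=2|\mathcal{H}|\exp(-2m\epsilon^2) \]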
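The bound above can be sanity-checked numerically. The following is a minimal sketch under assumptions that are not in the notes: a toy domain \(\{0,\dots,9\}\) with the uniform distribution, labels \(y=1[x\ge 5]\), the 0-1 loss, and a finite class of 11 threshold classifiers \(h_t(x)=1[x\ge t]\). All function names are hypothetical. It estimates how often a sample fails to be \(\epsilon\)-representative and compares that frequency with \(2|\mathcal{H}|\exp(-2m\epsilon^2)\).

```python
import math
import random

def true_risk(t, boundary=5, n_points=10):
    # h_t(x) = 1[x >= t] disagrees with the label 1[x >= boundary]
    # exactly on the points lying between t and boundary.
    return abs(t - boundary) / n_points

def empirical_risk(t, sample, boundary=5):
    # Fraction of sample points on which h_t makes a 0-1 error.
    errors = sum((x >= t) != (x >= boundary) for x in sample)
    return errors / len(sample)

def failure_rate(m, eps, trials=500, n_points=10, seed=0):
    # Fraction of trials in which the sample is NOT eps-representative,
    # i.e. some hypothesis has |L_S(h) - L_D(h)| > eps.
    rng = random.Random(seed)
    thresholds = range(n_points + 1)
    failures = 0
    for _ in range(trials):
        sample = [rng.randrange(n_points) for _ in range(m)]
        worst_gap = max(
            abs(empirical_risk(t, sample) - true_risk(t)) for t in thresholds
        )
        failures += worst_gap > eps
    return failures / trials

def union_bound(h_size, m, eps):
    # Right-hand side of the union bound + Hoeffding estimate, capped at 1.
    return min(1.0, 2 * h_size * math.exp(-2 * m * eps ** 2))

m, eps = 200, 0.1
rate = failure_rate(m, eps)
bound = union_bound(11, m, eps)
print(f"empirical failure rate: {rate:.3f}, union bound: {bound:.3f}")
```

The union bound is a worst-case estimate over all distributions, so on a toy problem like this one the observed failure frequency is expected to sit well below it.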

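requiring this bound to be at most \(\delta\) and solving for \(m\) gives, as a corollary, an upper bound on the sample complexity of a finite hypothesis class, which is therefore agnostic PAC learnable

\[ m_H^{UC}(\epsilon,\delta)\le\left\lceil\frac{\log(2|\mathcal{H}|/\delta)}{2\epsilon^2}\right\rceil \]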
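To get a feel for the numbers, here is a small sketch that evaluates this bound, together with the agnostic PAC bound obtained by plugging \(\epsilon/2\) into it (ERM needs an \(\frac{\epsilon}{2}\)-representative sample). The function names and the example values \(|\mathcal{H}|=1000\), \(\epsilon=0.1\), \(\delta=0.05\) are illustrative assumptions, not part of the notes.

```python
import math

def uc_sample_bound(h_size, eps, delta):
    # Sample size that suffices by the bound:
    # ceil(log(2|H| / delta) / (2 * eps^2)), natural logarithm.
    return math.ceil(math.log(2 * h_size / delta) / (2 * eps ** 2))

def agnostic_pac_sample_bound(h_size, eps, delta):
    # ERM needs an (eps/2)-representative sample, so use eps/2 in the UC bound.
    return uc_sample_bound(h_size, eps / 2, delta)

print(uc_sample_bound(1000, 0.1, 0.05))
print(agnostic_pac_sample_bound(1000, 0.1, 0.05))
```

The bound grows only logarithmically in \(|\mathcal{H}|\) and \(1/\delta\) but quadratically in \(1/\epsilon\), so halving \(\epsilon\) roughly quadruples the required sample size.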

summary

if uniform convergence holds, then with high probability over the sample, the empirical risks of all \(h\in\mathcal{H}\) faithfully represent their true risks

exercises

(1) \(\forall \epsilon,\delta\gt 0,\ \exists m(\epsilon,\delta)\) s.t. \[ \forall m\ge m(\epsilon,\delta),\,\mathbb{P}_{S\sim D^m}[L_D(A(S))\gt\epsilon]\lt\delta\]

(2) \(\lim_{m\to \infty}\mathbb{E}_{S\sim D^m}[L_D(A(S))]=0\)

show that (1) \(\iff\) (2)