VC Dimension
VC Dimension
the formal name of Maximum Non-Break Point
Def.
VC dimension of H, denoted \(d_{vc}(H)\) is largest N for which \(m_H(N) = 2^N\)
the most inputs H that can shatter
\(d_{vc}\) = ‘minimum k’ - 1
if \(N \ge 2, d_{vc} \ge 2, m_H(N) \le N^{d_{vc}}\)
\(d_{vc}\) is the maximum that \(m_H(N) = 2^N\)
\(d_{vc}(H)\) : powerfulness of H
Penalty for Model Complexity
\[\mathbb{P} [ | E_{in}(g) - E_{out}(g) | > \epsilon ] \le 4 ( 2 N )^{d_{vc}} \ exp( -\frac{1}{8} \epsilon^2 N ) = \delta \\ \begin{align} \delta & = 4 ( 2 N )^{d_{vc}} \ exp( -\frac{1}{8} \epsilon^2 N ) \\ \frac{\delta}{4 ( 2 N )^{d_{vc}}} & = exp( -\frac{1}{8} \epsilon^2 N ) \\ \ln \left( \frac{4 ( 2 N )^{d_{vc}}}{\delta} \right) & = -\frac{1}{8} \epsilon^2 N \\ \sqrt{\frac{8}{N} \ln \left( \frac{4 ( 2 N )^{d_{vc}}}{\delta} \right)} & = \epsilon \\ \end{align}\]gen. error
\[| E_{in}(g) - E_{out}(g) | \le \sqrt{\frac{8}{N} \ln \left( \frac{4 ( 2 N )^{d_{vc}}}{\delta} \right)} \\ E_{out}(g) \le E_{in}(g) + \sqrt{\frac{8}{N} \ln \left( \frac{4 ( 2 N )^{d_{vc}}}{\delta} \right)}\]penalty for Model Complexity : \(\frac{8}{N} \ln \left( \frac{4 ( 2 N )^{d_{vc}}}{\delta} \right) = \Omega(N,H,\delta)\)
Target Distribution P(y|x)
unknown target distribution P(y|x) containing f(x) + noise
Error Measure
\[\begin{Bmatrix} P(y = 1|x) & = 0.2 \\ P(y = 2|x) & = 0.7 \\ P(y = 3|x) & = 0.1 \\ \end{Bmatrix}\]0/1 error :
\[err( \hat{y}, y) = [ \hat{y} \ne y ]\]correct or incorrect ?
often for classification
\[\hat{y} = \begin{Bmatrix} 1 & avg. err = 0.8 \\ 2 & avg. err = 0.3 \\ 3 & avg. err = 0.9 \\ 1.9 & avg. err = 1.0 \\ \end{Bmatrix}\]squared error :
\[err( \hat{y}, y) = ( \hat{y} - y )^2\]how far is ^y from y ?
often for regression
\[\hat{y} = \begin{Bmatrix} 1 & avg. err = 1.1 \\ 2 & avg. err = 0.3 \\ 3 & avg. err = 1.5 \\ 1.9 & avg. err = 0.29(*) \\ \end{Bmatrix} \\ \\ f(x) = \sum_{y \in Y} y \ P(y | x)\]Algorithmic Error Measure : \(\widehat{err}\)