VC Dimension
VC Dimension
the formal name of Maximum Non-Break Point
Def.
VC dimension of H, denoted dvc(H) is largest N for which mH(N)=2N
the most inputs H that can shatter
dvc = ‘minimum k’ - 1
if N≥2,dvc≥2,mH(N)≤Ndvc
dvc is the maximum that mH(N)=2N
dvc(H) : powerfulness of H
Penalty for Model Complexity
P[|Ein(g)−Eout(g)|>ϵ]≤4(2N)dvc exp(−18ϵ2N)=δδ=4(2N)dvc exp(−18ϵ2N)δ4(2N)dvc=exp(−18ϵ2N)ln(4(2N)dvcδ)=−18ϵ2N√8Nln(4(2N)dvcδ)=ϵgen. error
|Ein(g)−Eout(g)|≤√8Nln(4(2N)dvcδ)Eout(g)≤Ein(g)+√8Nln(4(2N)dvcδ)penalty for Model Complexity : 8Nln(4(2N)dvcδ)=Ω(N,H,δ)
Target Distribution P(y|x)
unknown target distribution P(y|x) containing f(x) + noise
Error Measure
{P(y=1|x)=0.2P(y=2|x)=0.7P(y=3|x)=0.1}0/1 error :
err(ˆy,y)=[ˆy≠y]correct or incorrect ?
often for classification
ˆy={1avg.err=0.82avg.err=0.33avg.err=0.91.9avg.err=1.0}squared error :
err(ˆy,y)=(ˆy−y)2how far is ^y from y ?
often for regression
ˆy={1avg.err=1.12avg.err=0.33avg.err=1.51.9avg.err=0.29(∗)}f(x)=∑y∈Yy P(y|x)Algorithmic Error Measure : ^err