Famous Theorems in Machine Learning: The No Free Lunch Theorem Explained
source link: https://blog.csdn.net/qq_38406029/article/details/123061417
The No Free Lunch Theorem
Theorem (No Free Lunch): Let $\mathcal{A}$ be any learning algorithm for binary classification over a domain $\mathcal{X}$, with the $0\text{-}1$ loss. Let $n$ be a training-set size with $n \le |\mathcal{X}|/2$. Then there exists a distribution $\mathcal{D}$ over $\mathcal{X}\times\{0,1\}$ such that:
(1) there exists a function $f:\mathcal{X}\rightarrow\{0,1\}$ with $L_\mathcal{D}(f)=0$;
(2) for a sample $\mathcal{S}\sim\mathcal{D}^n$, the probability bound $P\big(L_\mathcal{D}(\mathcal{A}(\mathcal{S}))\ge 1/8 : \mathcal{S}\sim\mathcal{D}^n\big)\ge 1/7$ holds.
Proof:
(1) Let $\mathcal{C}$ be a subset of the domain $\mathcal{X}$ of size $2n$. The main idea of the proof is that observing the labels of half of the points in $\mathcal{C}$ gives no information about the labels of the other half. Let $\mathcal{H}$ denote the set of all possible functions from $\mathcal{C}$ to the label set $\{0,1\}$, and let $T$ be its cardinality, i.e. $\mathcal{H}=\{f_1,\cdots,f_T\}$ with $T=2^{2n}$. For each function $f_i\in\mathcal{H}$, let $\mathcal{D}_i$ be the distribution over $\mathcal{C}\times\{0,1\}$ given by
$$
\mathcal{D}_i(\{(x,y)\})=\begin{cases}1/(2n) & \text{if } y=f_i(x),\\ 0 & \text{otherwise.}\end{cases}
$$
In other words, $\mathcal{D}_i$ is uniform over $\mathcal{C}$ with labels determined by $f_i$. It follows immediately that the function $f_i$ satisfies $L_{\mathcal{D}_i}(f_i)=0$ on the distribution $\mathcal{D}_i$, which proves (1).
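The construction in part (1) is easy to verify numerically on a small domain. The sketch below is my own illustration (names such as `true_risk` and `D_i` are chosen for readability, not taken from the article); it enumerates all $2^{2n}$ labelings for $n=2$, builds each distribution $\mathcal{D}_i$, and checks that $f_i$ achieves zero risk on its own distribution:

```python
from itertools import product

n = 2
C = list(range(2 * n))  # domain subset C with |C| = 2n

# H: all T = 2^(2n) labelings f_i : C -> {0, 1}
H = [dict(zip(C, labels)) for labels in product([0, 1], repeat=2 * n)]

def D_i(f):
    """Distribution over C x {0,1}: mass 1/(2n) on (x, f(x)), 0 elsewhere."""
    return {(x, y): (1.0 / (2 * n) if y == f[x] else 0.0)
            for x in C for y in (0, 1)}

def true_risk(h, f):
    """L_{D_i}(h) = P_{(x,y) ~ D_i}[h(x) != y] for the distribution built from f."""
    return sum(p for (x, y), p in D_i(f).items() if h[x] != y)

# (1): every labeling f_i has zero risk on its own distribution D_i
assert all(true_risk(f, f) == 0.0 for f in H)
print("all", len(H), "labelings achieve zero risk on their own distribution")
```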
(2) The key step is to show that for any learning algorithm $\mathcal{A}$,
$$
\max_{i\in[T]}\mathbb{E}_{\mathcal{S}\sim\mathcal{D}_i^n}\big[L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}))\big]\ge \frac{1}{4}.
$$
Consider training sets of $n$ examples drawn from $\mathcal{C}\times\{0,1\}$, where sampled points may repeat. There are then $k=(2n)^n$ possible sample sequences of points from $\mathcal{C}$, denoted $\mathcal{S}_1,\mathcal{S}_2,\cdots,\mathcal{S}_k$. Let
$$
\mathcal{S}_j^i=\big((x_1,f_i(x_1)),\cdots,(x_n,f_i(x_n))\big)
$$
denote the sequence $\mathcal{S}_j=(x_1,\cdots,x_n)$ labeled by the function $f_i$. Then
$$
\mathbb{E}_{\mathcal{S}\sim\mathcal{D}_i^n}\big[L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}))\big]=\frac{1}{k}\sum_{j=1}^{k}L_{\mathcal{D}_i}\big(\mathcal{A}(\mathcal{S}_j^i)\big).
$$
Since $\mathrm{"maximum"}\ge\mathrm{"average"}\ge\mathrm{"minimum"}$, it follows that
$$
\max_{i\in[T]}\frac{1}{k}\sum_{j=1}^{k}L_{\mathcal{D}_i}\big(\mathcal{A}(\mathcal{S}_j^i)\big)
\ge\frac{1}{T}\sum_{i=1}^{T}\frac{1}{k}\sum_{j=1}^{k}L_{\mathcal{D}_i}\big(\mathcal{A}(\mathcal{S}_j^i)\big)
=\frac{1}{k}\sum_{j=1}^{k}\frac{1}{T}\sum_{i=1}^{T}L_{\mathcal{D}_i}\big(\mathcal{A}(\mathcal{S}_j^i)\big)
\ge\min_{j\in[k]}\frac{1}{T}\sum_{i=1}^{T}L_{\mathcal{D}_i}\big(\mathcal{A}(\mathcal{S}_j^i)\big).
$$
Now fix $j\in[k]$, and write $\mathcal{S}_j=\{x_1,\cdots,x_n\}$ and $\mathcal{C}\setminus\mathcal{S}_j=\{v_1,\cdots,v_p\}$, where $p\ge n$ is the number of points of $\mathcal{C}$ that were not sampled. For every function $h:\mathcal{C}\rightarrow\{0,1\}$ and every $i\in[T]$ we have
$$
\begin{aligned}
L_{\mathcal{D}_i}(h)&=\frac{1}{2n}\sum_{x\in\mathcal{C}}\mathbb{I}\big(h(x)\ne f_i(x)\big)\\
&=\frac{1}{2n}\left(\sum_{l=1}^{n}\mathbb{I}\big(h(x_l)\ne f_i(x_l)\big)+\sum_{r=1}^{p}\mathbb{I}\big(h(v_r)\ne f_i(v_r)\big)\right)\\
&\ge\frac{1}{2n}\sum_{r=1}^{p}\mathbb{I}\big(h(v_r)\ne f_i(v_r)\big)\\
&\ge\frac{1}{2p}\sum_{r=1}^{p}\mathbb{I}\big(h(v_r)\ne f_i(v_r)\big),
\end{aligned}
$$
where the last step uses $p\ge n$, so that $1/(2n)\ge 1/(2p)$.
Therefore
$$
\begin{aligned}
\frac{1}{T}\sum_{i=1}^{T}L_{\mathcal{D}_i}\big(\mathcal{A}(\mathcal{S}_j^i)\big)
&\ge\frac{1}{T}\sum_{i=1}^{T}\frac{1}{2p}\sum_{r=1}^{p}\mathbb{I}\big(\mathcal{A}(\mathcal{S}_j^i)(v_r)\ne f_i(v_r)\big)\\
&=\frac{1}{2p}\sum_{r=1}^{p}\frac{1}{T}\sum_{i=1}^{T}\mathbb{I}\big(\mathcal{A}(\mathcal{S}_j^i)(v_r)\ne f_i(v_r)\big)\\
&\ge\frac{1}{2}\min_{r\in[p]}\frac{1}{T}\sum_{i=1}^{T}\mathbb{I}\big(\mathcal{A}(\mathcal{S}_j^i)(v_r)\ne f_i(v_r)\big).
\end{aligned}
$$
Now fix $r\in[p]$. Since $\mathcal{H}$ contains all possible labelings of $\mathcal{C}$, its functions can be partitioned into $T/2$ pairs $(f_a,f_b)$ with $a,b\in[T]$ that agree everywhere on $\mathcal{C}$ except at $v_r$. For such a pair $\mathcal{S}_j^a=\mathcal{S}_j^b$, because $v_r$ does not appear in $\mathcal{S}_j$, and therefore
$$
\mathbb{I}\big(\mathcal{A}(\mathcal{S}^a_j)(v_r)\ne f_a(v_r)\big)+\mathbb{I}\big(\mathcal{A}(\mathcal{S}^b_j)(v_r)\ne f_b(v_r)\big)=1.
$$
Summing over all $T/2$ pairs gives $\frac{1}{T}\sum_{i=1}^{T}\mathbb{I}\big(\mathcal{A}(\mathcal{S}_j^i)(v_r)\ne f_i(v_r)\big)=\frac{1}{2}$, and combining the bounds above yields
$$
\max_{i\in[T]}\mathbb{E}_{\mathcal{S}\sim\mathcal{D}_i^n}\big[L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}))\big]
\ge\min_{j\in[k]}\frac{1}{T}\sum_{i=1}^{T}L_{\mathcal{D}_i}\big(\mathcal{A}(\mathcal{S}_j^i)\big)
\ge\frac{1}{2}\min_{r\in[p]}\frac{1}{T}\sum_{i=1}^{T}\mathbb{I}\big(\mathcal{A}(\mathcal{S}_j^i)(v_r)\ne f_i(v_r)\big)
=\frac{1}{2}\cdot\frac{1}{T}\cdot\frac{T}{2}=\frac{1}{4}.
$$
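Before turning this expectation bound into a probability bound, it can be checked empirically on a tiny domain. The sketch below is my own illustration, not part of the article; the memorizing learner `A` is an arbitrary choice (any fixed prediction rule works, which is exactly the point of the theorem). It enumerates all labelings and all sample sequences for $n=2$ and confirms that the worst-case expected risk is at least $1/4$:

```python
from itertools import product

n = 2
C = list(range(2 * n))                      # domain subset C with |C| = 2n
# H: all T = 2^(2n) labelings f_i : C -> {0, 1}
H = [dict(zip(C, labels)) for labels in product([0, 1], repeat=2 * n)]
seqs = list(product(C, repeat=n))           # all k = (2n)^n sample sequences S_j

def A(train):
    """Hypothetical learner: memorize training labels, predict 0 on unseen points."""
    seen = dict(train)
    return lambda x: seen.get(x, 0)

def risk(h, f):
    """0-1 risk under D_i: uniform over C, labels given by f."""
    return sum(h(x) != f[x] for x in C) / len(C)

def expected_risk(f):
    """E_{S ~ D_i^n}[L_{D_i}(A(S))] = (1/k) * sum_j L_{D_i}(A(S_j^i))."""
    return sum(risk(A([(x, f[x]) for x in s]), f) for s in seqs) / len(seqs)

worst = max(expected_risk(f) for f in H)
print(f"max_i E[L_Di(A(S))] = {worst}")     # the theorem guarantees this is >= 1/4
```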
This expectation bound is converted into a probability bound via a variant of Markov's inequality: for a random variable $X\in[0,1]$ and a constant $a\in[0,1]$,
$$
\mathbb{E}[X]=\int_0^1 x\,p(x)\,dx=\int_0^a x\,p(x)\,dx+\int_a^1 x\,p(x)\,dx
\le a\int_0^a p(x)\,dx+\int_a^1 p(x)\,dx
=a\big(1-P\{X\ge a\}\big)+P\{X\ge a\}
=a+(1-a)\,P\{X\ge a\},
$$
which rearranges to the reverse Markov inequality
$$
P\{X\ge a\}\ge\frac{\mathbb{E}[X]-a}{1-a}.
$$
Applying this inequality with $X=L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}))$ and $a=1/8$,
$$
P\big(L_\mathcal{D}(\mathcal{A}(\mathcal{S}))\ge 1/8:\mathcal{S}\sim\mathcal{D}^n\big)
\ge\frac{\mathbb{E}_{\mathcal{S}\sim\mathcal{D}_i^n}\big[L_{\mathcal{D}_i}(\mathcal{A}(\mathcal{S}))\big]-1/8}{1-1/8}
\ge\frac{1/4-1/8}{1-1/8}=\frac{1}{7},
$$
where $\mathcal{D}=\mathcal{D}_i$ for an index $i$ attaining the maximum above. This proves (2). $\blacksquare$
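As a final sanity check (my own addition, not from the article), the reverse Markov inequality used above holds exactly for any distribution on $[0,1]$, including empirical ones, so it can be verified numerically on arbitrary samples:

```python
import random

def reverse_markov_holds(xs, a):
    """Check P{X >= a} >= (E[X] - a)/(1 - a) on an empirical sample xs from [0, 1]."""
    mean = sum(xs) / len(xs)
    p_tail = sum(x >= a for x in xs) / len(xs)
    return p_tail >= (mean - a) / (1 - a) - 1e-12

random.seed(0)
for _ in range(100):
    sample = [random.random() for _ in range(1000)]
    assert reverse_markov_holds(sample, 1 / 8)
print("reverse Markov inequality held on all 100 empirical samples")
```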