[Submitted on 13 Nov 2022]

Near-Linear Sample Complexity for L_p Polynomial Regression

We study L_p polynomial regression. Given query access to a function f:[-1,1] \rightarrow \mathbb{R}, the goal is to find a polynomial \hat{q} of degree at most d such that, for a given parameter \varepsilon > 0, \|\hat{q}-f\|_p\le (1+\varepsilon) \cdot \min_{q:\text{deg}(q)\le d}\|q-f\|_p. Here \|\cdot\|_p is the L_p norm, \|g\|_p = (\int_{-1}^1 |g(t)|^p dt)^{1/p}. We show that querying f at points randomly drawn from the Chebyshev measure on [-1,1] is a near-optimal strategy for polynomial regression in all L_p norms. In particular, to find \hat{q}, it suffices to sample O(d\, \frac{\text{polylog}\,d}{\text{poly}\,\varepsilon}) points from [-1,1] with probabilities proportional to this measure. While the optimal sample complexity for polynomial regression is well understood for L_2 and L_\infty, our result is the first to achieve sample complexity linear in d and error (1+\varepsilon) for other values of p without any assumptions.
Our result requires two main technical contributions. The first concerns p\leq 2, for which we provide explicit bounds on the L_p Lewis weight function of the infinite linear operator underlying polynomial regression. Using tools from the orthogonal polynomial literature, we show that this function is bounded by the Chebyshev density. Our second key contribution is to take advantage of the structure of polynomials to reduce the p>2 case to the p\leq 2 case. By doing so, we obtain a better sample complexity than what is possible for general p-norm linear regression problems, for which \Omega(d^{p/2}) samples are required.
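
As a rough illustration of the sampling strategy described above, the following sketch draws query points from the Chebyshev measure, whose density on [-1,1] is 1/(\pi\sqrt{1-t^2}), and fits a polynomial by importance-weighted least squares. It covers only the familiar p = 2 case and is not the algorithm analyzed in the paper. The function names, the Chebyshev basis, and the choice of weights (the reciprocal of the sampling density) are assumptions made for this example.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def sample_chebyshev(n, rng):
    """Draw n points from the Chebyshev measure on [-1, 1].

    If theta is uniform on [0, pi], then t = cos(theta) has density
    1 / (pi * sqrt(1 - t^2)). Also return the importance weights
    1 / density = pi * sin(theta), computed from theta to avoid rounding issues.
    """
    theta = np.pi * rng.random(n)
    return np.cos(theta), np.pi * np.sin(theta)


def fit_l2_with_chebyshev_samples(f, d, n, seed=0):
    """Illustrative p = 2 fit (an assumption of this sketch, not the paper's method).

    Minimizes sum_i w_i * (q(t_i) - f(t_i))^2 over polynomials q of degree <= d,
    where the weights w_i undo the non-uniform sampling so that the objective
    estimates the squared L_2 error with respect to the uniform measure.
    Returns the coefficients of q in the Chebyshev basis.
    """
    rng = np.random.default_rng(seed)
    t, w = sample_chebyshev(n, rng)
    A = C.chebvander(t, d)            # columns T_0(t), ..., T_d(t)
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], f(t) * sw, rcond=None)
    return coef


if __name__ == "__main__":
    target = lambda t: np.abs(t)      # a non-polynomial function to approximate
    coef = fit_l2_with_chebyshev_samples(target, d=8, n=400)
    grid = np.linspace(-1.0, 1.0, 1001)
    err = np.sqrt(np.mean((C.chebval(grid, coef) - target(grid)) ** 2) * 2.0)
    print("approximate L_2 error:", err)
```

For other values of p, the least-squares step would have to be replaced by an L_p regression solver; the sampling step is the part that corresponds to the strategy in the abstract.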

Comments:	68 pages, to be presented at SODA 2023
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2211.06790 [cs.DS]
	(or arXiv:2211.06790v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.2211.06790
