Motivation for the Submanifold Estimator

We would like to estimate the values of a PDF that lives on an (unknown) $d$-dimensional Riemannian submanifold $M$ of $\mathbb{R}^{D}$, where $d<D$. Usually, $D$-dimensional KDE does not work for such a distribution. This can be understood intuitively by considering a distribution on a line in the plane: 1-dimensional KDE performed on the line (with a bandwidth $h_{m}$ satisfying the asymptotics given above) would converge to the correct density on the line, but 2-dimensional KDE diverges, because it differs from the former only by a normalization factor that blows up as the bandwidth $h_{m} \rightarrow 0$ (compare (3.1) for the cases $D=2$ and $D=1$). This behavior is due to the fact that, similar to a “delta function” distribution on $\mathbb{R}$, the $D$-dimensional density of a distribution on a $d$-dimensional submanifold of $\mathbb{R}^{D}$ is, strictly speaking, undefined: the density is zero outside the submanifold, and in order to have proper normalization, it has to be infinite on the submanifold. More formally, the $D$-dimensional probability measure for a $d$-dimensional PDF supported on $M$ is not absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^{D}$, and does not have a probability density function on $\mathbb{R}^{D}$. If one attempts to use $D$-dimensional KDE for data drawn from such a probability measure, the estimator will “attempt to converge” to a singular PDF, one that is infinite on $M$ and zero outside.
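The blow-up can be made concrete with a short worked comparison. This is a sketch under two assumptions: that (3.1), which is not reproduced in this section, has the standard KDE form, and that the 1- and 2-dimensional kernels share a common profile $k$ and differ only in their normalization constants $c_{1}$ and $c_{2}$ (these symbols are introduced here for illustration). For samples $q_{1}, \ldots, q_{m}$ on a line in the plane and a point $p$ on that line,

$$\hat{f}_{m}^{(2)}(p)=\frac{1}{m} \sum_{j=1}^{m} \frac{c_{2}}{h_{m}^{2}} k\left(\frac{\left\|p-q_{j}\right\|}{h_{m}}\right)=\frac{c_{2}}{c_{1}} \cdot \frac{1}{h_{m}} \cdot \frac{1}{m} \sum_{j=1}^{m} \frac{c_{1}}{h_{m}} k\left(\frac{\left\|p-q_{j}\right\|}{h_{m}}\right)=\frac{c_{2}}{c_{1}} \cdot \frac{\hat{f}_{m}^{(1)}(p)}{h_{m}}.$$

Since $\hat{f}_{m}^{(1)}(p)$ converges to the density $f(p)$ on the line, the 2-dimensional estimate behaves like $\left(c_{2} / c_{1}\right) f(p) / h_{m}$, which grows without bound as $h_{m} \rightarrow 0$ wherever $f(p)>0$.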

For a distribution with support on a line in the plane, we can resort to 1-dimensional KDE to get the correct density on the line, but how could one estimate the density on an unknown, possibly curved submanifold of dimension $d<D$? Essentially the same approach works: even for data that lives on an unknown, curved $d$-dimensional submanifold of $\mathbb{R}^{D}$, it suffices to use the $d$-dimensional kernel density estimator with the Euclidean distance on $\mathbb{R}^{D}$ to get a consistent estimator of the submanifold density. Furthermore, the convergence rate of this estimator can be bounded as in (3.3), with $D$ replaced by $d$, the intrinsic dimension of the submanifold [20].
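The idea can be tried numerically on a simple curved example. The following is a minimal sketch, not the authors' implementation: it assumes data drawn uniformly from the unit circle in $\mathbb{R}^{2}$ (so $d=1$, $D=2$, and the true density with respect to arc length is $1/(2\pi)$), an Epanechnikov-type kernel profile, and a bandwidth proportional to $m^{-1/5}$. The names `kernel_profile`, `submanifold_kde`, and `KERNEL_CONST` are hypothetical and chosen only for this illustration.

```python
import numpy as np

# Normalization constants for the profile (1 - t^2) on the unit ball of R^d,
# chosen so that the kernel integrates to 1 (assumed kernel, for illustration).
KERNEL_CONST = {1: 0.75, 2: 2.0 / np.pi}


def kernel_profile(t, d):
    """Epanechnikov-type profile: continuous, zero outside [0, 1)."""
    t = np.asarray(t, dtype=float)
    return np.where(t < 1.0, KERNEL_CONST[d] * (1.0 - t**2), 0.0)


def submanifold_kde(p, samples, h, d):
    """KDE normalized by h**d, using Euclidean distances in the ambient space."""
    u = np.linalg.norm(samples - p, axis=1)  # ambient distance from p to each sample
    return np.mean(kernel_profile(u / h, d)) / h**d


rng = np.random.default_rng(0)
m = 50_000
theta = rng.uniform(0.0, 2.0 * np.pi, size=m)
samples = np.column_stack([np.cos(theta), np.sin(theta)])  # uniform on the unit circle

p = np.array([1.0, 0.0])             # evaluation point on the circle
true_density = 1.0 / (2.0 * np.pi)   # uniform density with respect to arc length

h = m ** (-1.0 / 5.0)                # bandwidth proportional to m^(-1/(d+4)) with d = 1
est_d = submanifold_kde(p, samples, h, d=1)  # intrinsic (d = 1) normalization
est_D = submanifold_kde(p, samples, h, d=2)  # ambient (D = 2) normalization

print(f"true density:       {true_density:.4f}")
print(f"d = 1 estimate:     {est_d:.4f}")
print(f"D = 2 'estimate':   {est_D:.4f}")
```

With the intrinsic normalization the estimate should land close to $1/(2\pi)$, while the ambient normalization inflates the same kernel sum by a factor proportional to $1/h$, so it keeps growing as the bandwidth shrinks. Note that the code never uses the geometry of the circle beyond the sample itself: only ambient Euclidean distances and the intrinsic dimension $d$ enter.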

Statement of the Theorem

Let $(M, \mathbf{g})$ be a $d$-dimensional, embedded, complete, compact Riemannian submanifold of $\mathbb{R}^{D}$ ($d<D$) with an induced metric $\mathbf{g}$ and injectivity radius $r_{\mathrm{inj}}>0$.$^{7}$ Let $d(p, q)=d_{p}(q)$ be the length of a length-minimizing geodesic in $M$ between $p, q \in M$, and let $u(p, q)=u_{p}(q)$ be the geodesic distance between $p$ and $q$ as measured in $\mathbb{R}^{D}$ (thus, $u(p, q)$ is simply the Euclidean distance between $p$ and $q$ in $\mathbb{R}^{D}$). Note that $u(p, q) \leq d(p, q)$. We will denote the Riemannian volume measure on $M$ by $V$, and the volume form by $d V$.$^{8}$
Theorem 3.3.1 Let $f: M \rightarrow[0, \infty)$ be a probability density function defined on $M$ (so that the related probability measure is $f V$), and let $K:[0, \infty) \rightarrow[0, \infty)$ be a continuous function that vanishes outside $[0,1)$, is differentiable with a bounded derivative in $[0,1)$, and satisfies the normalization condition $\int_{|\mathbf{z}| \leq 1} K(|\mathbf{z}|) d^{d} \mathbf{z}=1$. Assume $f$ is differentiable to second order in a neighborhood of $p \in M$, and for a sample $q_{1}, \ldots, q_{m}$ of size $m$ drawn from the density $f$, define an estimator $\hat{f}_{m}(p)$ of $f(p)$ as
$$\hat{f}_{m}(p)=\frac{1}{m} \sum_{j=1}^{m} \frac{1}{h_{m}^{d}} K\left(\frac{u_{p}\left(q_{j}\right)}{h_{m}}\right), \tag{3.4}$$
where $h_{m}>0$. If $h_{m}$ satisfies $\lim_{m \rightarrow \infty} h_{m}=0$ and $\lim_{m \rightarrow \infty} m h_{m}^{d}=\infty$, then there exist non-negative numbers $m_{*}$, $C_{b}$, and $C_{V}$ such that for all $m>m_{*}$ the mean squared error of the estimator (3.4) satisfies
$$\operatorname{MSE}\left[\hat{f}_{m}(p)\right]=\mathrm{E}\left[\left(\hat{f}_{m}(p)-f(p)\right)^{2}\right]<C_{b} h_{m}^{4}+\frac{C_{V}}{m h_{m}^{d}}.$$
If $h_{m}$ is chosen to be proportional to $m^{-1 /(d+4)}$, this gives
$$\mathrm{E}\left[\left(\hat{f}_{m}(p)-f(p)\right)^{2}\right]=O\left(\frac{1}{m^{4 /(d+4)}}\right)$$
as $m \rightarrow \infty$.
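The choice of bandwidth can be motivated by balancing the two terms of the bound. As a sketch, treating $C_{b}$ and $C_{V}$ as fixed positive constants and minimizing the right-hand side of the bound over $h$ gives

$$\frac{\partial}{\partial h}\left(C_{b} h^{4}+\frac{C_{V}}{m h^{d}}\right)=4 C_{b} h^{3}-\frac{d\, C_{V}}{m h^{d+1}}=0 \quad \Longrightarrow \quad h=\left(\frac{d\, C_{V}}{4 C_{b}}\right)^{1 /(d+4)} m^{-1 /(d+4)}.$$

Substituting $h=c\, m^{-1 /(d+4)}$ for any constant $c>0$ (the symbol $c$ is introduced here for illustration) and using $-1+\frac{d}{d+4}=-\frac{4}{d+4}$, both terms scale the same way:

$$C_{b} c^{4} m^{-4 /(d+4)}+\frac{C_{V}}{c^{d}} m^{-1+d /(d+4)}=\left(C_{b} c^{4}+\frac{C_{V}}{c^{d}}\right) m^{-4 /(d+4)}.$$

For example, data on a curve ($d=1$) gives a bound of order $m^{-4 / 5}$, whatever the ambient dimension $D$.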


