## Statistics Coursework Help | Stochastic Process Coursework and Exam Help | STAT6540

December 29, 2022

• Statistical Inference
• Statistical Computing
• Advanced Probability Theory
• Advanced Mathematical Statistics
• (Generalized) Linear Models
• Statistical Machine Learning
• Longitudinal Data Analysis
• Foundations of Data Science
## Fractal Supervised Classification

The rightmost video in Figure 9 shows supervised clustering in action, from the first frame, which represents the training set with 4 groups, to the last one, which shows the cluster assignment of any future observation (an arbitrary point location in the state space). Based on image filtering techniques acting as a neural network, the video illustrates how machine learning algorithms run on a GPU (graphics processing unit). GPU-based clustering [Wiki] is very fast, not only because it uses graphics processors and memory, but also because the algorithm itself has a much lower computational complexity than traditional classifiers: it does not require the computation of nearest neighbor distances.

The video also explains how the clustering is done, better than any text description could. You can view the video (also called a data animation) on YouTube, here. The source code and instructions to help you create your own videos or replicate this one are in Section 6.7.2. See Section 3.4.3 for a description of the underlying supervised clustering methodology.
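To give a feel for classification by image filtering, here is a minimal sketch that is not the method of Section 3.4.3 (which is not reproduced here). It assumes the state space is a bitmap in which `0` means "unclassified" and `1..n_groups` are cluster labels, and it repeatedly applies a majority filter over the 8 neighboring pixels. The function names, the 8-pixel window, and the rule that classified pixels are never re-assigned are all illustrative assumptions; no nearest neighbor distance is ever computed.

```python
import numpy as np

def majority_filter_step(labels, n_groups):
    """One filtering pass: each unclassified pixel (0) takes the most common
    label among its 8 neighbors; classified pixels are left unchanged."""
    h, w = labels.shape
    padded = np.pad(labels, 1, constant_values=0)
    # votes[g - 1, i, j] = number of neighbors of pixel (i, j) with label g
    votes = np.zeros((n_groups, h, w), dtype=int)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            shifted = padded[1 + di:1 + di + h, 1 + dj:1 + dj + w]
            for g in range(1, n_groups + 1):
                votes[g - 1] += (shifted == g)
    best = votes.argmax(axis=0) + 1          # ties go to the smallest label
    has_vote = votes.max(axis=0) > 0
    out = labels.copy()
    fill = (labels == 0) & has_vote
    out[fill] = best[fill]
    return out

def classify_grid(labels, n_groups, max_iter=1000):
    """Apply the filter until no pixel changes (or max_iter is reached)."""
    for _ in range(max_iter):
        new = majority_filter_step(labels, n_groups)
        if np.array_equal(new, labels):
            break
        labels = new
    return labels

# Hypothetical training set: one seed pixel per group on a 20 x 20 bitmap.
grid = np.zeros((20, 20), dtype=int)
grid[2, 2], grid[2, 17], grid[17, 2], grid[17, 17] = 1, 2, 3, 4
classified = classify_grid(grid, n_groups=4)
```

Each pass is a fixed-size local operation per pixel, which is the kind of workload that maps well to a GPU. Because this simplified version is deterministic and freezes pixels once classified, it does not reproduce the boundary re-assignments visible in the video.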

I use the word “fractal” because the shape of the clusters, and of their boundaries in particular, is arbitrary. A boundary may be as fractal-like as a shoreline. The video also illustrates the concept of fuzzy clustering [Wiki]: towards the middle of the video, once the entire state space has been classified, cluster re-assignments keep taking place along the cluster boundaries. A point close to the fuzzy border between clusters A and B is sometimes assigned to A in a given video frame, and may be assigned to B in the next one. By averaging cluster assignments over many frames, it is possible to compute the probability that the point belongs to A or B. Another question is whether the algorithm (the sequence of frames) converges or not. It depends on the parameters, and in this case, stochastic convergence is observed. In other words, although the boundaries change all the time, their average location is almost constant, and the changes are small. Small portions of a cluster embedded in another cluster don’t disappear over time.
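The averaging step can be sketched as follows, assuming the frames are available as an integer array of shape `(n_frames, height, width)` holding cluster labels `0..n_clusters-1`. The array layout and the function name are assumptions made for illustration.

```python
import numpy as np

def membership_probabilities(frames, n_clusters):
    """Return an array of shape (n_clusters, height, width) whose entry
    [c, i, j] is the fraction of frames in which pixel (i, j) had label c."""
    frames = np.asarray(frames)
    return np.stack([(frames == c).mean(axis=0) for c in range(n_clusters)])

# Toy example: 4 frames of a 1 x 2 image, clusters A = 0 and B = 1.
# The left pixel is always in A; the right pixel is in A once, in B three times.
frames = np.array([[[0, 0]], [[0, 1]], [[0, 1]], [[0, 1]]])
probs = membership_probabilities(frames, n_clusters=2)
```

Here `probs[:, 0, 1]` holds the estimated probabilities that the right pixel belongs to A and to B. In practice one would probably average only over frames taken after the whole state space has been classified.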

## Statistical Inference, Machine Learning

This section covers a lot of material, extending far beyond Poisson-binomial processes. The main type of process investigated here is the $m$-interlacing defined in Section 1.5.3, as opposed to the radial cluster processes studied in Section 2.1. An $m$-process is a superimposition of $m$ shifted Poisson-binomial processes, and is well suited to modeling cluster structures. In Section 3.4.3, I discuss supervised and unsupervised clustering algorithms applied to simulated data generated by $m$-processes. The technique, similar to neural networks, relies on image filtering performed in the GPU (graphics processing unit). It leads to fractal supervised clustering, illustrated with data animations. I discuss how to automatically detect the number of clusters in Section 3.4.4.
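As a rough illustration of an $m$-interlacing, the sketch below works in one dimension and assumes that a Poisson-binomial process places point $k$ at $k/\lambda$ plus independent logistic noise with scale $s$. The definition in Section 1.5.3 is not reproduced here, so the noise distribution, the finite index range, and all function names are assumptions.

```python
import numpy as np

def poisson_binomial_points(lam, s, k_min, k_max, shift=0.0, rng=None):
    """Assumed 1-D model: X_k = shift + k / lam + s * (standard logistic noise)."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(k_min, k_max + 1)
    noise = rng.logistic(loc=0.0, scale=s, size=k.size)
    return shift + k / lam + noise

def m_interlacing(lam, s, shifts, k_min, k_max, rng=None):
    """Superimpose one shifted process per entry of `shifts`.
    Returns the points and, for each point, the index of its source process."""
    rng = np.random.default_rng() if rng is None else rng
    points, labels = [], []
    for i, shift in enumerate(shifts):
        x = poisson_binomial_points(lam, s, k_min, k_max, shift, rng)
        points.append(x)
        labels.append(np.full(x.size, i))
    return np.concatenate(points), np.concatenate(labels)

# Three processes (m = 3) with hypothetical shifts 0, 0.25 and 0.5.
rng = np.random.default_rng(0)
pts, lab = m_interlacing(lam=1.0, s=0.1, shifts=[0.0, 0.25, 0.5],
                         k_min=-50, k_max=50, rng=rng)
```

The returned labels play the role of the known group membership in a training set, so the same simulated data can be used to try out both supervised and unsupervised clustering.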

Before getting there, I describe different methods to estimate the core parameters of these processes, first in one dimension in Section 3.2, then in two dimensions in Section 3.4.2. The methodology features a new test of independence (Section 3.1.3), model fitting via the empirical distribution, and dual confidence regions in the context of minimum contrast estimation (Section 3.1.1). I show that the point count expectations are almost stationary but exhibit small periodic oscillations (Section 3.1.2), and that the increments (point counts across non-overlapping, adjacent intervals) are almost independent.
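A quick way to look at the increments is to count points in adjacent intervals and compute the correlation between consecutive counts. The sketch below is only a crude diagnostic, not the test of independence of Section 3.1.3; the function names and the choice of a lag-1 correlation are assumptions.

```python
import numpy as np

def interval_counts(points, start, stop, width):
    """Point counts in adjacent intervals of the given width covering [start, stop]."""
    n_bins = int(round((stop - start) / width))
    edges = np.linspace(start, stop, n_bins + 1)
    counts, _ = np.histogram(points, bins=edges)
    return counts

def lag1_correlation(counts):
    """Pearson correlation between each count and the next one."""
    c = np.asarray(counts, dtype=float)
    return np.corrcoef(c[:-1], c[1:])[0, 1]

# Toy data: six points on [0, 3], counted in unit intervals.
points = np.array([0.1, 0.2, 1.5, 2.2, 2.3, 2.9])
counts = interval_counts(points, start=0.0, stop=3.0, width=1.0)
```

On simulated data with many intervals, a value of `lag1_correlation(counts)` close to zero is consistent with nearly independent increments, although zero correlation alone does not establish independence.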

In many instances, Poisson-binomial processes exhibit patterns that are invisible to the naked eye. In Section 3.3, I show examples of such patterns. Then, I discuss model identifiability, and the need for statistical or machine learning techniques to unearth the invisible patterns. Boundary effects, their impact, and how to fix them are discussed mainly in Section 3.5.

In 1979, Bradley Efron published his seminal article “Bootstrap Methods: Another Look at the Jackknife” [24], available online here. It marked the beginning of a new era in statistical science: the development of model-free, data-driven techniques. Several chapters in my book “Statistics: New Foundations, Toolbox, and Machine Learning Recipes” [37], published in 2019 (available online here), deal with extensions and modern versions of this methodology. I follow in the same footsteps here, first discussing the general principles, and then showing how they apply to estimating the intensity $\lambda$ and scaling factor $s$ of a Poisson-binomial process. As in Jesper Møller [58], my methodology is based on minimum contrast estimation: see slides 114-116 here or here. See also [18] for other examples of this method in the context of point process inference.
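The general idea of minimum contrast estimation is to pick the parameter values for which a summary statistic of the model is closest to the same statistic computed on the data. The sketch below is a simplified, simulation-based stand-in and not the contrast used in Section 3.1.1. It assumes a one-dimensional process with point $k$ at $k/\lambda$ plus logistic noise of scale $s$, estimates $\lambda$ by the point density, and picks $s$ from a grid by matching the variance of unit-interval point counts. The summary statistic, the grid, and all names are assumptions.

```python
import numpy as np

def simulate(lam, s, window, rng, margin=20.0):
    """Assumed model: points k / lam + logistic noise (scale s), kept if they
    fall in [0, window]. The margin limits boundary effects."""
    k_lo = int(np.floor(-margin * lam))
    k_hi = int(np.ceil((window + margin) * lam))
    k = np.arange(k_lo, k_hi + 1)
    x = k / lam + rng.logistic(scale=s, size=k.size)
    return x[(x >= 0) & (x <= window)]

def count_variance(points, window):
    """Summary statistic: variance of point counts in unit intervals.
    `window` is assumed to be a positive integer."""
    edges = np.linspace(0.0, window, int(window) + 1)
    counts, _ = np.histogram(points, bins=edges)
    return counts.var()

def min_contrast_fit(points, window, s_grid, n_sim=20, seed=0):
    """Return (lam_hat, s_hat, contrasts), with s_hat the grid value that
    minimizes the squared gap between simulated and observed statistics."""
    rng = np.random.default_rng(seed)
    lam_hat = points.size / window
    target = count_variance(points, window)
    contrasts = []
    for s in s_grid:
        sims = [count_variance(simulate(lam_hat, s, window, rng), window)
                for _ in range(n_sim)]
        contrasts.append((np.mean(sims) - target) ** 2)
    contrasts = np.array(contrasts)
    return lam_hat, s_grid[int(np.argmin(contrasts))], contrasts

# Synthetic "observed" data with known parameters lam = 1 and s = 0.5.
observed = simulate(1.0, 0.5, 200, np.random.default_rng(1))
s_grid = [0.1, 0.25, 0.5, 1.0, 2.0]
lam_hat, s_hat, contrasts = min_contrast_fit(observed, 200, s_grid)
```

A single scalar statistic and a coarse grid keep the example short. A more careful version would use a richer summary, such as a function evaluated at several scales, and would resample to attach a confidence region to the estimates.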

