Statistics Assignment Help | Bayesian Network Assignment and Exam Help | TAMS22

Doug I. Jones

Statistical Inference
Statistical Computing
(Generalized) Linear Models
Statistical Machine Learning
Longitudinal Data Analysis
Foundations of Data Science
Bayesian Network Learning

In the context of BNs, model selection and estimation are collectively known as learning, a name borrowed from artificial intelligence and machine learning. BN learning is usually performed as a two-step process:

1. structure learning, learning the structure of the DAG;
2. parameter learning, learning the local distributions implied by the structure of the DAG learned in the previous step.
Both steps can be performed either using the information provided by a data set or by interviewing experts in the fields relevant for the phenomenon being modelled. Combining both approaches is common. Often the prior information available on the phenomenon is not enough for an expert to completely specify a BN. Even specifying the DAG structure is often impossible, especially when a large number of variables are involved. This is the case, for example, for most applications in genetics and systems biology (because of how many components are involved in biological processes) and in the social sciences (because of lack of agreement between experts and of solid experimental evidence).
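
As a rough illustration of the second step, the sketch below estimates the local distributions of a discrete BN by maximum likelihood, that is, by counting, once the DAG is taken as given. The function name `fit_cpts`, the dictionary encoding of the DAG and the toy data set are assumptions made for this example; they do not come from the text or from any particular library.

```python
from collections import Counter, defaultdict


def fit_cpts(data, parents):
    """Maximum-likelihood estimate of each node's local distribution.

    data    : list of dicts mapping variable name -> observed value
    parents : dict mapping node -> tuple of parent names (the fixed DAG)
    Returns {node: {parent_configuration: {value: probability}}}.
    """
    cpts = {}
    for node, pa in parents.items():
        # Count the values of `node` separately for each parent configuration.
        counts = defaultdict(Counter)
        for row in data:
            config = tuple(row[p] for p in pa)
            counts[config][row[node]] += 1
        # Normalise the counts within each parent configuration.
        cpts[node] = {
            config: {v: n / sum(c.values()) for v, n in c.items()}
            for config, c in counts.items()
        }
    return cpts


# Hypothetical toy data for a two-node DAG A -> B.
data = [
    {"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 0, "B": 1}, {"A": 0, "B": 0},
    {"A": 1, "B": 1}, {"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 1, "B": 1},
]
dag = {"A": (), "B": ("A",)}
cpts = fit_cpts(data, dag)
print(cpts["B"])
```

Parent configurations that never occur in the data get no estimate here; a Bayesian estimator with a prior over the parameters would be one way to handle them.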

This workflow is inherently Bayesian. Consider a data set $D$ and a BN $B=(G, \mathbf{X})$. If we denote the parameters of the global distribution of $\mathbf{X}$ with $\Theta$, we can assume without loss of generality that $\Theta$ uniquely identifies $\mathbf{X}$ in the parametric family of distributions chosen for modelling $D$ and write $B=(G, \Theta)$. BN learning can then be formalised as
$$\underbrace{\operatorname{Pr}(B \mid D)=\operatorname{Pr}(G, \Theta \mid D)}_{\text{learning}}=\underbrace{\operatorname{Pr}(G \mid D)}_{\text{structure learning}} \cdot \underbrace{\operatorname{Pr}(\Theta \mid G, D)}_{\text{parameter learning}}. \tag{6.12}$$
The decomposition of $\operatorname{Pr}(G, \Theta \mid D)$ reflects the two steps described above, and underlies the logic of the learning process.
Structure learning can be done in practice by finding the DAG $G$ that maximises
$$\operatorname{Pr}(G \mid D) \propto \operatorname{Pr}(G) \operatorname{Pr}(D \mid G)=\operatorname{Pr}(G) \int \operatorname{Pr}(D \mid G, \Theta) \operatorname{Pr}(\Theta \mid G) \, d\Theta, \tag{6.13}$$
using Bayes’ theorem to decompose the posterior probability of the DAG (i.e., $\operatorname{Pr}(G \mid D)$) into the product of the prior distribution over the possible DAGs (i.e., $\operatorname{Pr}(G)$) and the probability of the data (i.e., $\operatorname{Pr}(D \mid G)$). Clearly, it is not possible to compute the latter without estimating the parameters $\Theta$ in the process; therefore, $\Theta$ has to be integrated out of Equation (6.13) to make $\operatorname{Pr}(G \mid D)$ independent of any specific choice of $\Theta$.
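
For discrete data the integral over $\Theta$ has a closed form under suitable priors, which makes the comparison of candidate DAGs concrete. The sketch below assumes an independent Dirichlet$(\alpha, \ldots, \alpha)$ prior on every conditional distribution (with $\alpha = 1$ by default) and a uniform prior over DAGs, so that $\operatorname{Pr}(G)$ cancels when two structures are compared. These prior choices, the function name `log_marginal_likelihood` and the toy data are assumptions of this example, not something prescribed by the text.

```python
from collections import Counter, defaultdict
from math import exp, lgamma


def log_marginal_likelihood(data, parents, levels, alpha=1.0):
    """log Pr(D | G) for a discrete BN, with Theta integrated out analytically.

    Assumes an independent Dirichlet(alpha, ..., alpha) prior for each node
    and each configuration of its parents (an assumption of this sketch).
    """
    total = 0.0
    for node, pa in parents.items():
        counts = defaultdict(Counter)
        for row in data:
            counts[tuple(row[p] for p in pa)][row[node]] += 1
        k = len(levels[node])
        # Parent configurations that never occur contribute log(1) = 0.
        for c in counts.values():
            n = sum(c.values())
            total += lgamma(k * alpha) - lgamma(k * alpha + n)
            for value in levels[node]:
                total += lgamma(alpha + c[value]) - lgamma(alpha)
    return total


# Hypothetical toy data on two binary variables.
data = [
    {"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 0, "B": 1}, {"A": 0, "B": 0},
    {"A": 1, "B": 1}, {"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 1, "B": 1},
]
levels = {"A": (0, 1), "B": (0, 1)}

log_g0 = log_marginal_likelihood(data, {"A": (), "B": ()}, levels)      # no arc
log_g1 = log_marginal_likelihood(data, {"A": (), "B": ("A",)}, levels)  # A -> B

# With a uniform prior over DAGs, the posterior odds equal this Bayes factor.
bayes_factor = exp(log_g1 - log_g0)
print(log_g0, log_g1, bayes_factor)
```

On this toy data the DAG with the arc has the larger marginal likelihood. Working with log-gamma functions rather than raw factorials keeps the computation numerically stable for realistic sample sizes.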

Structure Learning

Several algorithms have been presented in the literature for this problem, thanks to the application of results arising from probability, information and optimisation theory. Despite the (sometimes confusing) variety of theoretical backgrounds and terminology, they can all be traced to three approaches: constraint-based, score-based and hybrid.
All these algorithms operate under a common set of assumptions:

• There must be a one-to-one correspondence between the nodes in the DAG and the random variables in $\mathbf{X}$; this means in particular that there must not be multiple nodes which are deterministic functions of a single variable.
• All the relationships between the variables in $\mathbf{X}$ must be conditional independencies, because they are by definition the only kind of relationships that can be expressed by a BN.
• Every combination of the possible values of the variables in $\mathbf{X}$ must represent a valid, observable (even if really unlikely) event. This assumption implies a strictly positive global distribution, which is needed to have a uniquely identifiable model. Constraint-based algorithms can work even when this is not true, because the existence of a perfect map is also a sufficient condition for the uniqueness of the Markov blankets (Pearl, 1988).
• Observations are treated as independent realisations of the set of nodes. If the data present some form of temporal or spatial dependence, it must be specifically accounted for in the definition of the network, as in the dynamic BNs in Chapter 4.
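
Because conditional independencies are the only relationships a BN can express, constraint-based algorithms are built around conditional independence tests. As a minimal sketch of one such building block, the code below computes the log-likelihood ratio ($G^2$) statistic for discrete data. The function name `g2_statistic` and the toy counts are assumptions of this example. Turning the statistic into a p-value would additionally require a chi-square reference distribution, which is omitted here.

```python
from collections import Counter
from math import log


def g2_statistic(data, x, y, given=()):
    """G^2 statistic for testing whether x is independent of y given `given`.

    data  : list of dicts mapping variable name -> observed value
    given : tuple of conditioning variable names (empty for a marginal test)
    Values near zero are compatible with (conditional) independence.
    """
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for row in data:
        z = tuple(row[v] for v in given)
        n_xyz[(row[x], row[y], z)] += 1
        n_xz[(row[x], z)] += 1
        n_yz[(row[y], z)] += 1
        n_z[z] += 1
    # Only observed cells appear in n_xyz, so every logarithm is well defined.
    return 2.0 * sum(
        n * log(n * n_z[z] / (n_xz[(xv, z)] * n_yz[(yv, z)]))
        for (xv, yv, z), n in n_xyz.items()
    )


# Hypothetical counts for (A, B, C), built so that A and C are independent
# within each level of B but dependent once B is ignored.
cells = {
    (0, 0, 0): 9, (0, 0, 1): 3, (1, 0, 0): 3, (1, 0, 1): 1,
    (0, 1, 0): 1, (0, 1, 1): 3, (1, 1, 0): 3, (1, 1, 1): 9,
}
data = [{"A": a, "B": b, "C": c}
        for (a, b, c), n in cells.items() for _ in range(n)]

g2_marginal = g2_statistic(data, "A", "C")
g2_conditional = g2_statistic(data, "A", "C", given=("B",))
print(g2_marginal, g2_conditional)
```

The statistic is positive when B is ignored and vanishes once B is conditioned on. This is the pattern a constraint-based algorithm would read as the absence of an arc between A and C.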


