Optimal detector design and hypothesis testing

Suppose $X$ is a random variable with values in $\{1, \ldots, n\}$, with a distribution that depends on a parameter $\theta \in \{1, \ldots, m\}$. The distributions of $X$, for the $m$ possible values of $\theta$, can be represented by a matrix $P \in \mathbf{R}^{n \times m}$, with elements
$$p_{k j}=\operatorname{prob}(X=k \mid \theta=j).$$
The $j$th column of $P$ gives the probability distribution associated with the parameter value $\theta=j$.
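As a concrete illustration of this representation, the sketch below builds a small matrix $P$ and checks that each column is a probability distribution. The numbers ($n=3$, $m=2$) and the helper name `is_valid_distribution_matrix` are assumptions made for this example only; note also that NumPy indices start at 0, whereas the text labels outcomes and hypotheses from 1.

```python
import numpy as np

# Hypothetical example: n = 3 outcomes, m = 2 hypotheses.
# Entry P[k, j] plays the role of prob(X = k | theta = j) (0-based indices).
# The values are made up for illustration.
P = np.array([
    [0.70, 0.10],
    [0.20, 0.10],
    [0.10, 0.80],
])


def is_valid_distribution_matrix(P, tol=1e-9):
    """Return True if every column of P is a probability distribution."""
    P = np.asarray(P, dtype=float)
    nonnegative = np.all(P >= -tol)
    columns_sum_to_one = np.allclose(P.sum(axis=0), 1.0, atol=tol)
    return bool(nonnegative and columns_sum_to_one)


print(is_valid_distribution_matrix(P))
```

The check runs over columns (`axis=0`) because it is the columns of $P$, not its rows, that are distributions.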

We consider the problem of estimating $\theta$, based on an observed sample of $X$. In other words, the sample $X$ is generated from one of the $m$ possible distributions, and we are to guess which one. The $m$ values of $\theta$ are called hypotheses, and guessing which hypothesis is correct (i.e., which distribution generated the observed sample $X$) is called hypothesis testing. In many cases one of the hypotheses corresponds to some normal situation, and each of the other hypotheses corresponds to some abnormal event. In this case hypothesis testing can be interpreted as observing a value of $X$, and then guessing whether or not an abnormal event has occurred, and if so, which one. For this reason hypothesis testing is also called detection.

In most cases there is no significance to the ordering of the hypotheses; they are simply $m$ different hypotheses, arbitrarily labeled $\theta=1, \ldots, m$. If $\hat{\theta}=\theta$, where $\hat{\theta}$ denotes the estimate of $\theta$, then we have correctly guessed the parameter value $\theta$. If $\hat{\theta} \neq \theta$, then we have (incorrectly) guessed the parameter value $\theta$; we have mistaken $\hat{\theta}$ for $\theta$. In other cases, there is significance in the ordering of the hypotheses. In this case, an event such as $\hat{\theta}>\theta$, i.e., the event that we overestimate $\theta$, is meaningful.
It is also possible to parametrize $\theta$ by values other than $\{1, \ldots, m\}$, say as $\theta \in \{\theta_1, \ldots, \theta_m\}$, where $\theta_i$ are (distinct) values. These values could be real numbers, or vectors, for example, specifying the mean and variance of the $i$th distribution. In this case, a quantity such as $\|\hat{\theta}-\theta\|$, which is the norm of the parameter estimation error, is meaningful.

Deterministic and randomized detectors

A (deterministic) estimator or detector is a function $\psi$ from $\{1, \ldots, n\}$ (the set of possible observed values) into $\{1, \ldots, m\}$ (the set of hypotheses). If $X$ is observed to have value $k$, then our guess for the value of $\theta$ is $\hat{\theta}=\psi(k)$. One obvious deterministic detector is the maximum likelihood detector, given by
$$\hat{\theta}=\psi_{\mathrm{ml}}(k)=\underset{j}{\operatorname{argmax}} p_{k j} .$$
When we observe the value $X=k$, the maximum likelihood estimate of $\theta$ is a value that maximizes the probability of observing $X=k$, over the set of possible distributions.
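The maximum likelihood rule amounts to taking, for each row $k$ of $P$, the column index with the largest entry. A minimal sketch follows, reusing made-up values for $P$; the function name `ml_detector` is an assumption for this example, indices are 0-based, and ties are broken toward the smallest index (the behavior of `np.argmax`), which is one arbitrary choice among the maximizers.

```python
import numpy as np


def ml_detector(P):
    """Return an array psi with psi[k] = argmax_j P[k, j] (0-based indices).

    Ties are resolved toward the smallest j, following np.argmax.
    """
    P = np.asarray(P, dtype=float)
    return np.argmax(P, axis=1)


# Hypothetical P with n = 3 outcomes (rows) and m = 2 hypotheses (columns).
P = np.array([
    [0.70, 0.10],
    [0.20, 0.10],
    [0.10, 0.80],
])

psi_ml = ml_detector(P)
print(psi_ml)  # one estimate per possible observed value k
```

Here the observed values $k=0$ and $k=1$ are more likely under the first hypothesis, and $k=2$ is more likely under the second, so the detector maps them accordingly.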

We will consider a generalization of the deterministic detector, in which the estimate of $\theta$, given an observed value of $X$, is random. A randomized detector of $\theta$ is a random variable $\hat{\theta} \in \{1, \ldots, m\}$, with a distribution that depends on the observed value of $X$. A randomized detector can be defined in terms of a matrix $T \in \mathbf{R}^{m \times n}$ with elements
$$t_{i k}=\operatorname{prob}(\hat{\theta}=i \mid X=k).$$
The interpretation is as follows: if we observe $X=k$, then the detector gives $\hat{\theta}=i$ with probability $t_{i k}$. The $k$th column of $T$, which we will denote $t_k$, gives the probability distribution of $\hat{\theta}$, when we observe $X=k$. If each column of $T$ is a unit vector, then the randomized detector is a deterministic detector, i.e., $\hat{\theta}$ is a (deterministic) function of the observed value of $X$.
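To make the matrix description concrete, the sketch below encodes a deterministic map $\psi$ as a matrix $T$ with unit-vector columns, tests whether a given $T$ is deterministic, and draws a random estimate from column $t_k$. All function names and numerical values are assumptions chosen for illustration, and indices are 0-based.

```python
import numpy as np


def detector_matrix_from_map(psi, m):
    """Build the m x n matrix T whose k-th column is the unit vector e_{psi[k]}."""
    psi = np.asarray(psi)
    n = psi.shape[0]
    T = np.zeros((m, n))
    T[psi, np.arange(n)] = 1.0
    return T


def is_deterministic(T, tol=1e-12):
    """Return True if every column of T is a unit vector."""
    T = np.asarray(T, dtype=float)
    # A column is a unit vector iff its entries are all 0 or 1 and sum to 1.
    zero_or_one = np.all((np.abs(T) <= tol) | (np.abs(T - 1.0) <= tol))
    return bool(zero_or_one and np.allclose(T.sum(axis=0), 1.0))


def sample_estimate(T, k, rng):
    """Draw theta_hat from the distribution in column k of T."""
    T = np.asarray(T, dtype=float)
    return int(rng.choice(T.shape[0], p=T[:, k]))


# Deterministic detector: outcomes 0 and 1 map to hypothesis 0, outcome 2 to 1.
T_det = detector_matrix_from_map([0, 0, 1], m=2)

# Randomized detector: on observing k = 1, guess either hypothesis with prob. 1/2.
T_rand = np.array([
    [1.0, 0.5, 0.0],
    [0.0, 0.5, 1.0],
])

rng = np.random.default_rng(0)
print(is_deterministic(T_det), is_deterministic(T_rand))
print(sample_estimate(T_rand, 1, rng))  # random: 0 or 1
```

With `T_rand`, only the middle column is genuinely random; the first and last columns are unit vectors, so the estimate for those observations is fixed.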

At first glance, it seems that intentionally introducing additional randomization into the estimation or detection process can only make the estimator worse. But we will see below examples in which a randomized detector outperforms all deterministic estimators.

We are interested in designing the matrix $T$ that defines the randomized detector. Obviously the columns $t_k$ of $T$ must satisfy the (linear equality and inequality) constraints
$$t_k \succeq 0, \quad \mathbf{1}^T t_k=1, \quad k=1, \ldots, n.$$
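These constraints say that each column of $T$ is componentwise nonnegative and sums to one. The sketch below checks them numerically and illustrates one consequence of their being linear: a convex combination of two valid detector matrices is again valid. The helper name `satisfies_detector_constraints` and the example matrices are assumptions for illustration only.

```python
import numpy as np


def satisfies_detector_constraints(T, tol=1e-9):
    """Check t_k >= 0 (componentwise) and 1^T t_k = 1 for every column t_k."""
    T = np.asarray(T, dtype=float)
    nonnegative = np.all(T >= -tol)
    columns_sum_to_one = np.allclose(T.sum(axis=0), 1.0, atol=tol)
    return bool(nonnegative and columns_sum_to_one)


# Two hypothetical deterministic detectors (m = 2 hypotheses, n = 3 outcomes).
T_a = np.array([
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
T_b = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
])

# Mixing them with equal weights gives a randomized detector.
T_mix = 0.5 * T_a + 0.5 * T_b

# Columns sum to one but an entry is negative, so this is not a valid detector.
T_bad = np.array([
    [1.2, 0.5],
    [-0.2, 0.5],
])

print(satisfies_detector_constraints(T_mix))
print(satisfies_detector_constraints(T_bad))
```

The mixed matrix differs from `T_a` and `T_b` only in the middle column, where the two deterministic detectors disagree.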

