## Game Theory | ECON90022

March 28, 2023


## TD Learning with Manipulated Cost Signals

If the RL agent updates the estimates of the cost-to-go function of a given policy $\mu$ according to (19.3) and (19.4), then under stealthy attacks the algorithm can be written as:
$$\begin{aligned} \tilde{r}_{t+1} &= \tilde{r}_t+\gamma_t \tilde{d}_t \eta_t, \\ \eta_{t+1} &= \alpha \lambda \eta_t+\phi\left(i_{t+1}\right), \end{aligned} \tag{19.8}$$
where $\tilde{d}_t=\tilde{g}\left(i_t, u_t\right)+\alpha \tilde{r}_t^{\prime} \phi\left(i_{t+1}\right)-\tilde{r}_t^{\prime} \phi\left(i_t\right)$.
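As a concrete illustration, here is a minimal Python sketch of the recursion (19.8). The environment interface (`env.reset`, `env.step`), the attack hook `attack`, and the stepsize choice are hypothetical names introduced for illustration; `phi` is the feature map and `mu` the fixed policy under evaluation:

```python
import numpy as np

def td_lambda_falsified(env, phi, mu, alpha, lam, num_steps, attack):
    """TD(lambda) with linear approximation, eq. (19.8), where the cost
    signal seen by the learner passes through a falsification hook.

    phi(i)          -> feature vector of state i, shape (K,)
    mu(i)           -> action taken by the evaluated policy in state i
    attack(i, u, g) -> falsified cost g~(i, u); a stealthy attack depends
                       only on (i, u), not on the time step
    """
    i = env.reset()
    r = np.zeros_like(phi(i))            # parameter vector r~_t
    eta = phi(i)                         # eligibility trace eta_t
    for t in range(num_steps):
        u = mu(i)
        i_next, g = env.step(u)          # true stage cost g(i_t, u_t)
        g_tilde = attack(i, u, g)        # manipulated cost signal
        # temporal difference d~_t = g~ + alpha r' phi(i_{t+1}) - r' phi(i_t)
        d = g_tilde + alpha * r @ phi(i_next) - r @ phi(i)
        gamma_t = 1.0 / (t + 1)          # one admissible tapering stepsize
        r = r + gamma_t * d * eta        # r~_{t+1} = r~_t + gamma_t d~_t eta_t
        eta = alpha * lam * eta + phi(i_next)  # eta_{t+1}
        i = i_next
    return r
```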
Suppose the sequence of parameters $\{\tilde{r}_t\}$ generated by (19.8) under the falsified cost signals is convergent and converges to $\tilde{r}^*$. (We will show the conditions under which the convergence of $\{\tilde{r}_t\}$ is guaranteed.) Let $r^*$ be the solution of $A r+b=0$. In $\operatorname{TD}(\lambda)$, the agent aims to estimate the cost-to-go function $J^\mu$ of a given policy $\mu$. In approximate $\operatorname{TD}(\lambda)$ with a linear approximation architecture, $\tilde{J}(i, r)=\phi^{\prime}(i) r$ serves as an approximation of $J^\mu(i)$ for $i \in S$. One objective of the adversary is to degrade the approximation and estimation of $J^\mu$ by manipulating the costs. One way to achieve this objective is to make the iterates $\tilde{r}_t$ generated by (19.8) converge to a $\tilde{r}^*$ such that $\Phi^{\prime} \tilde{r}^*$ is as poor an approximation of $J^\mu$ as possible.

Lemma 19.1 If the sequence of parameters $\{\tilde{r}_t\}$ is generated by the $\operatorname{TD}(\lambda)$ learning algorithm (19.8) with stealthy and bounded attacks on the cost signals, then the sequence $\{\tilde{r}_t\}$ converges to $\tilde{r}^*$, and $\tilde{r}^*$ is the unique solution of $A r+\tilde{b}=0$, where $\tilde{b}=\Phi^{\prime} D \sum_{m=0}^{\infty}\left(\alpha \lambda P_\mu\right)^m \tilde{g}$ and $\tilde{g}$ is the vector whose $i$th component is $\tilde{g}(i, \mu(i))$.
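Since $\alpha\lambda<1$ and $P_\mu$ is a stochastic matrix, the series in Lemma 19.1 sums to $(I-\alpha\lambda P_\mu)^{-1}$, so $\tilde{r}^*$ can be computed in closed form for a small finite-state example. The numpy sketch below assumes $A$ takes the form from Proposition 6.4 of Bertsekas and Tsitsiklis (1996); that form is an assumption here, since the section does not restate it:

```python
import numpy as np

def td_limit_falsified(Phi, D, P_mu, g_tilde, alpha, lam):
    """Solve A r + b~ = 0 for the TD(lambda) limit r~* under falsified costs.

    Phi:     (n, K) feature matrix with rows phi'(i)
    D:       (n, n) diagonal matrix of the stationary distribution of P_mu
    P_mu:    (n, n) transition matrix under the evaluated policy mu
    g_tilde: (n,)   falsified expected costs g~(i, mu(i))
    """
    n = P_mu.shape[0]
    # sum_{m=0}^inf (alpha*lam*P_mu)^m = (I - alpha*lam*P_mu)^{-1}
    M = np.linalg.inv(np.eye(n) - alpha * lam * P_mu)
    A = Phi.T @ D @ M @ (alpha * P_mu - np.eye(n)) @ Phi  # assumed B&T form
    b_tilde = Phi.T @ D @ M @ g_tilde                     # b~ from Lemma 19.1
    return np.linalg.solve(A, -b_tilde)                   # r~* with A r + b~ = 0
```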

The proof of Lemma 19.1 follows the proof of Proposition 6.4 in Bertsekas and Tsitsiklis (1996) with $g(i, \mu(i), j)$ replaced by $\tilde{g}(i, \mu(i))$. If the adversary performs stealthy and bounded attacks, he can mislead the agent into learning an approximation $\Phi^{\prime} \tilde{r}^*$ of $J^\mu$. The distance between $\Phi^{\prime} \tilde{r}^*$ and $J^\mu$ with respect to a norm $\|\cdot\|$ is what the adversary aims to maximize. The following lemma provides an upper bound on the distance between $\Phi^{\prime} \tilde{r}^*$ and $J^\mu$.

## Q-Learning with Manipulated Cost Signals

If the RL agent learns an optimal policy by the $Q$-learning algorithm given in (19.7), then under stealthy attacks on the cost signals, the algorithm can be written as:
$$\tilde{Q}_{t+1}(i, u)=\left(1-\gamma_t\right) \tilde{Q}_t(i, u)+\gamma_t\left(\tilde{g}(i, u)+\alpha \min_{v \in U(\bar{\zeta})} \tilde{Q}_t(\bar{\zeta}, v)\right). \tag{19.12}$$
Note that if the attacks are not stealthy, we need to write $\tilde{g}_t$ in lieu of $\tilde{g}\left(i_t, u_t\right)$. There are two important questions regarding the $Q$-learning algorithm with falsified costs (19.12): (1) Will the sequence of $\tilde{Q}_t$-factors converge? (2) If it converges, where will the sequence $\tilde{Q}_t$ converge to?
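To make the update (19.12) concrete, here is a minimal sketch of one asynchronous step, assuming a tabular $Q$ stored as a 2-D numpy array and, for simplicity, that every action is admissible in every state (both are assumptions made for illustration, not part of the original formulation):

```python
import numpy as np

def q_update_falsified(Q, i, u, j, g_tilde, gamma_t, alpha):
    """One step of (19.12); only the visited pair (i, u) is updated.

    Q:       (n, m) array of Q-factors Q[i, u]
    j:       successor state sampled from p_ij(u)
    g_tilde: falsified cost signal g~(i, u) seen by the learner
    """
    target = g_tilde + alpha * Q[j].min()      # g~(i,u) + alpha min_v Q(j, v)
    Q[i, u] = (1.0 - gamma_t) * Q[i, u] + gamma_t * target
    return Q
```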

Suppose that the sequence $\tilde{Q}_t$ generated by the $Q$-learning algorithm (19.12) converges. Let $\tilde{Q}^*$ be the limit, i.e. $\tilde{Q}^*=\lim_{t \rightarrow \infty} \tilde{Q}_t$. Suppose the objective of the adversary is to induce the RL agent to learn a particular policy $\mu^{\dagger}$. The adversary's problem then is to design $\tilde{g}$, by applying the actions available to him/her based on the information he/she has, so that the limiting $Q$-factors learned from the $Q$-learning algorithm produce the policy favored by the adversary, $\mu^{\dagger}$; i.e. $\tilde{Q}^* \in \mathcal{V}_{\mu^{\dagger}}$, where
$$\mathcal{V}_\mu:=\left\{Q \in \mathbb{R}^{n \times|A|}: \mu(i)=\arg \min_u Q(i, u),\ \forall i \in S\right\}.$$

In the $Q$-learning algorithm (19.12), to guarantee almost sure convergence, the agent usually takes a tapering stepsize (Borkar, 2009), i.e. a sequence $\{\gamma_t\}$ satisfying $0<\gamma_t \leq 1$ for $t \geq 0$, $\sum_t \gamma_t=\infty$, and $\sum_t \gamma_t^2<\infty$. Suppose that in our problem the agent takes a tapering stepsize. To address the convergence question, we have the following result: under stealthy and bounded attacks on the cost signals, the $Q$-learning algorithm with falsified costs converges almost surely to the fixed point of $\tilde{F}(Q)$, where the mapping $\tilde{F}: \mathbb{R}^{n \times|A|} \rightarrow \mathbb{R}^{n \times|A|}$ is defined as $\tilde{F}(Q)=\left[\tilde{F}_{iu}(Q)\right]_{i, u}$ with
$$\tilde{F}_{iu}(Q)=\sum_j p_{ij}(u)\left(\tilde{g}(i, u, j)+\alpha \min_v Q(j, v)\right),$$
and the fixed point is unique and denoted by $\tilde{Q}^*$.
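Because $\tilde{F}$ inherits the $\alpha$-contraction property of the ordinary dynamic-programming operator (the falsification only alters the cost term), $\tilde{Q}^*$ can be computed by successive approximation. The sketch below is a minimal instance for a finite MDP and also checks the adversary's success criterion $\tilde{Q}^* \in \mathcal{V}_{\mu^{\dagger}}$; the tensor shapes are assumptions made for illustration:

```python
import numpy as np

def fixed_point_F_tilde(P, g_tilde, alpha, tol=1e-10):
    """Iterate Q <- F~(Q) to the unique fixed point Q~*.

    P:       (n, m, n) transitions, P[i, u, j] = p_ij(u)
    g_tilde: (n, m, n) falsified costs g~(i, u, j)
    """
    n, m, _ = P.shape
    Q = np.zeros((n, m))
    while True:
        # F~_iu(Q) = sum_j p_ij(u) ( g~(i,u,j) + alpha * min_v Q(j, v) )
        Q_new = np.einsum('iuj,iuj->iu', P, g_tilde + alpha * Q.min(axis=1))
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new

def attack_succeeds(P, g_tilde, alpha, mu_dagger):
    """Check Q~* in V_{mu_dagger}: greedy policy equals the target policy."""
    Q_star = fixed_point_F_tilde(P, g_tilde, alpha)
    return np.array_equal(Q_star.argmin(axis=1), np.asarray(mu_dagger))
```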

