Modern Robotics

Chapter 1: Preliminaries

A robot is, in essence, a system of rigid bodies.

  • Links: the rigid bodies that make up the robot.
  • Joints: the components that connect adjacent links and allow relative motion between them.

Chapter 2: Configuration Space

2.1 Basic Concepts

  • Configuration: a specification of the position of every point of the robot.
  • Configuration space (C-space): the set of all possible configurations.
  • Degrees of freedom (dof): the dimension of the C-space, i.e., the minimum number of real-valued parameters needed to represent the robot's configuration (see the short sketch below).
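
To make these definitions concrete, here is a minimal sketch (my own illustration, not from the original notes) of a hypothetical 2R planar arm: its configuration is fully described by two joint angles, so its C-space is 2-dimensional, and those two numbers determine the position of every point on the robot.

```python
import numpy as np

def planar_2r_points(theta1, theta2, L1=1.0, L2=1.0):
    """Forward kinematics of a hypothetical 2R planar arm.
    The configuration is the pair (theta1, theta2), so dof = 2:
    these two numbers fix the position of every point on the arm."""
    elbow = np.array([L1 * np.cos(theta1), L1 * np.sin(theta1)])
    tip = elbow + np.array([L2 * np.cos(theta1 + theta2),
                            L2 * np.sin(theta1 + theta2)])
    return elbow, tip

# Picking one point of the C-space determines the whole robot's pose.
print(planar_2r_points(np.pi / 4, np.pi / 6))
```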

RyanLee_ljx... · about 29 minutes · robot
Preliminaries

Understanding Data

Machine learning is the process of modeling data that comes from an unknown distribution. Whatever school of machine learning one follows, the observed data are not assumed to appear out of nowhere; they are produced by an underlying, objectively existing data-generating process. This data-generating process can be described by a probability distribution.

For example, when flipping a coin, each flip comes up heads or tails. If we flip it $k$ times, we obtain $k$ data points, and this result can be viewed as generated (sampled) from a Bernoulli distribution.
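
As a quick illustration (a minimal sketch, not from the post), we can simulate this data-generating process: draw $k$ coin flips from a Bernoulli distribution with an assumed parameter $p$, then estimate $p$ back from the observed samples.

```python
import numpy as np

rng = np.random.default_rng(0)

p_true = 0.7     # assumed "unknown" parameter of the coin (for the simulation only)
k = 1000         # number of flips
flips = rng.binomial(n=1, p=p_true, size=k)   # 1 = heads, 0 = tails

# Modelling the observed data: for a Bernoulli distribution the
# maximum-likelihood estimate of p is simply the sample mean.
p_hat = flips.mean()
print(f"true p = {p_true}, estimated p = {p_hat:.3f}")
```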


RyanLee_ljx... · about 11 minutes · ML
Variational Inference and VAE

Latent Variables

Here is an example (adapted from the article 【隐变量(潜在变量)模型】硬核介绍):

Look at the figure below. On the surface, the observed data are just a cloud of points $x = \{x_1, x_2, \dots, x_n\}$, but intuitively we can see that these points are sampled, with some probability, from four different distributions (assume all are Gaussian). A latent variable $z_i$ controls which distribution $x_i$ is drawn from: $x_i \mid z_i = k \sim N(\mu_k, \sigma_k^2)$, where $k = 1, 2, 3, 4$. Assume the $\sigma_k$ are known. The latent variable $z_i$ is thus the index of the class that the observation $x_i$ belongs to.
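
A minimal sketch of this generative story (my own illustration, with made-up component means): first sample the latent class $z_i$, then sample $x_i$ from the corresponding Gaussian; only $x_i$ is observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up parameters of the four Gaussian components (sigma assumed known).
mus = np.array([[-5.0, -5.0], [-5.0, 5.0], [5.0, -5.0], [5.0, 5.0]])
sigma = 1.0
weights = np.array([0.25, 0.25, 0.25, 0.25])

n = 200
z = rng.choice(4, size=n, p=weights)                 # latent class index z_i
x = mus[z] + sigma * rng.standard_normal((n, 2))     # x_i | z_i = k ~ N(mu_k, sigma^2 I)

# In practice we only observe x; z stays hidden, which is what makes it latent.
print(x[:3], z[:3])
```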


RyanLee_ljx... · about 6 minutes · ML
Before reading

This post and the ones that follow will focus on diffusion models: we start with some fundamentals, then introduce diffusion models themselves, and finally aim to cover the application of Diffusion Policy to robotic-arm motion planning.


RyanLee_ljx... · less than 1 minute · RL
Chapter 7 Temporal-Difference Learning

TD learning refers to a wide range of algorithms.

TD algorithms can solve the Bellman equation of a given policy $\pi$ without a model.
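
As a rough illustration (my own sketch, not code from the post, with a hypothetical `env_step`/`policy` interface), the tabular TD(0) update estimates $v_\pi$ from sampled transitions alone:

```python
import numpy as np

def td0(env_step, policy, n_states, gamma=0.9, alpha=0.1, episodes=500):
    """Tabular TD(0) policy evaluation.
    env_step(s, a) -> (reward, next_state, done) is only sampled,
    never inspected, so no model of the MDP is needed."""
    v = np.zeros(n_states)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = policy(s)
            r, s_next, done = env_step(s, a)
            # TD(0) update: move v(s) toward the TD target r + gamma * v(s').
            v[s] += alpha * (r + gamma * v[s_next] - v[s])
            s = s_next
    return v
```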


RyanLee_ljx... · less than 1 minute · RL
Chapter 6 Stochastic Approximation

Stochastic Approximation (SA) refers to a broad class of stochastic iterative algorithms for solving root-finding or optimization problems. Compared with many other root-finding algorithms, such as gradient-based methods, SA is powerful in the sense that it does not require knowing the expression of the objective function or its derivative.
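
One classic SA scheme is the Robbins–Monro algorithm, which finds a root of $g(w) = 0$ using only noisy measurements of $g$. Below is a minimal sketch with a made-up $g$ and noise; the iteration itself never uses the expression of $g$ or its derivative.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(w):
    # Stands in for the unknown function whose root we want; the algorithm
    # only ever receives noisy samples of g(w), never this expression.
    return np.tanh(w - 2.0)

w = 0.0
for k in range(1, 5001):
    a_k = 1.0 / k                                   # step sizes satisfying the RM conditions
    g_noisy = g(w) + 0.1 * rng.standard_normal()    # noisy observation of g(w)
    w = w - a_k * g_noisy                           # Robbins-Monro update
print(w)  # approaches the root w* = 2
```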


RyanLee_ljx... · about 3 minutes · RL
Chapter 5 Monte Carlo Learning

In this chapter we will introduce a model-free approach for deriving the optimal policy.

Here, model-free means that we do not rely on an explicit mathematical model to obtain state values or action values. For example, in policy evaluation we solved the Bellman equation to obtain state values, which is model-based. In the model-free setting we no longer use that equation; instead, we leverage mean-estimation methods.
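
A rough sketch of the mean-estimation idea (my own illustration, with a hypothetical `sample_episode` interface): the action value $q_\pi(s, a)$ is an expected return, so it can be estimated by averaging sampled returns instead of solving any equation.

```python
import numpy as np

def mc_action_value(sample_episode, s, a, gamma=0.9, num_episodes=1000):
    """Monte Carlo estimate of q_pi(s, a).
    sample_episode(s, a) -> list of rewards obtained by starting from s,
    taking a, then following pi; it is treated as a black box (model-free)."""
    returns = []
    for _ in range(num_episodes):
        rewards = sample_episode(s, a)
        g = sum((gamma ** t) * r for t, r in enumerate(rewards))  # discounted return
        returns.append(g)
    return np.mean(returns)   # mean estimation replaces solving the Bellman equation
```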


RyanLee_ljx... · about 4 minutes · RL
Chapter 4 Value Iteration and Policy Iteration

In the last chapter, we studied the Bellman Optimality Equation. In this chapter we will introduce three model-based, iterative algorithms for solving the BOE and deriving the optimal policy: value iteration, policy iteration, and truncated policy iteration.
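
Since the excerpt cuts off here, a minimal sketch of the first of these algorithms (my own code, assuming a tabular MDP given as arrays `P` and `r`) shows the basic value-iteration loop:

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, tol=1e-6):
    """Value iteration for a tabular MDP.
    P[a, s, s2] : probability of moving from s to s2 under action a.
    r[s, a]     : immediate reward.
    Model-based: the full model (P, r) must be known."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    while True:
        ev = np.einsum("asx,x->as", P, v)   # expected next value E[v(s') | s, a]
        q = r + gamma * ev.T                # q(s, a), shape (n_states, n_actions)
        v_new = q.max(axis=1)               # value update (greedy over actions)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new, q.argmax(axis=1)          # optimal values and greedy policy
```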

Value Iteration


RyanLee_ljx... · about 3 minutes · RL
Chapter 3 Optimal Policy and Bellman Optimality Equation

We know that RL's ultimate goal is to find the optimal policy. In this chapter we will show how to obtain the optimal policy through the Bellman Optimality Equation.

Optimal Policy

State values can be used to evaluate whether a policy is good or not: if

$$
v_{\pi_1}(s) \ge v_{\pi_2}(s), \quad \forall s \in \mathcal{S}
$$
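
If this inequality holds for every state, $\pi_1$ is at least as good as $\pi_2$. As a minimal sketch (my own illustration, with a made-up two-state MDP), we can evaluate two deterministic policies in closed form via $v_\pi = (I - \gamma P_\pi)^{-1} r_\pi$ and check the state-wise comparison:

```python
import numpy as np

gamma = 0.9

# Made-up 2-state MDP: each deterministic policy induces a transition
# matrix P_pi (rows = states) and a reward vector r_pi.
P_pi1 = np.array([[0.0, 1.0],
                  [0.0, 1.0]])      # pi_1 always moves to state 1
r_pi1 = np.array([0.0, 1.0])

P_pi2 = np.array([[1.0, 0.0],
                  [0.0, 1.0]])      # pi_2 stays put in state 0
r_pi2 = np.array([0.0, 1.0])

def state_values(P_pi, r_pi):
    # Closed-form solution of the Bellman equation: v = (I - gamma * P_pi)^(-1) r_pi
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)

v1, v2 = state_values(P_pi1, r_pi1), state_values(P_pi2, r_pi2)
print(v1, v2, bool(np.all(v1 >= v2)))   # v1 >= v2 at every state: pi_1 is no worse
```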


RyanLee_ljx... · about 3 minutes · RL