
Recent Advances in Imitation Learning from Observation

자월현 2020. 11. 8.

Papers on imitation learning using state-only demonstrations from before the rise of deep learning:
1. Movement imitation with nonlinear dynamical systems in humanoid robots.

2. Humanoid robot learning and game playing using pc-based vision.


**visual observations can only provide partial state information**
**In imitation learning, agents do not receive task reward feedback r.**

Behavior cloning does not require any further interaction between the agent and the environment, but it suffers from the covariate shift problem.
IRL-based techniques iteratively alternate between using the demonstrations to infer a hidden reward function and using RL to learn a policy under that reward.
- object manipulation: Guided cost learning: Deep inverse optimal control via policy optimization.

GAIL: induces an imitator state-action occupancy measure that is similar to that of the demonstrator.
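
For reference, the occupancy-matching idea is usually written as the saddle-point objective from the GAIL paper. The symbols here are that paper's notation, not something defined in these notes: $\pi$ is the imitator policy, $\pi_E$ the demonstrator, $D$ a discriminator over state-action pairs, $H$ the policy entropy, and $\lambda$ its weight.

```latex
\min_{\pi} \max_{D} \;
\mathbb{E}_{\pi}\left[\log D(s, a)\right]
+ \mathbb{E}_{\pi_E}\left[\log\left(1 - D(s, a)\right)\right]
- \lambda H(\pi)
```

The discriminator tries to tell imitator pairs from demonstrator pairs, and the policy is updated so that the two become hard to distinguish.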

Imitation learning from observation

## Perception
1. Record the expert's movements using sensors placed directly on the expert agent
- Trajectory formation for imitation with nonlinear dynamical systems.

arm-reaching movements, biped locomotion, and human gestures
- Incremental learning of gestures by imitation in a humanoid robot.


2. Motion capture: use visual markers on the demonstrator to infer movement.
- Motion capture in robotics review.

locomotion, acrobatics, martial arts
- require costly instrumentation and pre-processing:

A deep learning framework for character motion synthesis and editing. ★


### Embodiment Mismatch

1. learns correspondence between the embodiments using autoencoders in a supervised fashion

(the encoded representation is invariant to embodiment features)

- Learning invariant features spaces to transfer skills with reinforcement learning ★


2. learns the correspondence in an unsupervised fashion, with a small amount of human supervision

- Time-contrastive networks: self-supervised learning from video.



### Viewpoint difference

1. context translation model to translate an observation by predicting it in the target context.

- Imitation from observation: Learning to imitate behaviors from raw video via context translation.


2. train a classifier to distinguish viewpoints and maximize domain confusion in an adversarial setting during training

- Third-person imitation learning.



## Control

### Model-based algorithms

#### Inverse dynamics models

- Grounded action transformation for robot learning in simulation.


1. Explore and collect data (s, a, s') and learn the pixel-level inverse dynamics model (o, o') -> a

- Combining self-supervised learning and imitation for vision-based rope manipulation.


2. Reinforced inverse dynamics modeling (uses a sparse reward function to optimize the model)

- RIDM: Reinforced inverse dynamics modeling for learning from a single observed demonstration.
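
Step 1 above (explore, collect (s, a, s'), then learn a model that maps a transition back to an action) can be sketched on a toy problem. Everything below is an assumption for illustration: the 1-D `step` dynamics, the uniform random exploration, and the one-parameter linear model stand in for the pixel-level observations and neural networks used in the papers.

```python
import random


def step(state, action):
    # Hypothetical 1-D toy dynamics: the action shifts the state by half its value.
    return state + 0.5 * action


def collect_transitions(n, seed=0):
    # Random exploration: sample a state and an action, record (s, a, s').
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.uniform(-1.0, 1.0)
        a = rng.uniform(-1.0, 1.0)
        data.append((s, a, step(s, a)))
    return data


def fit_inverse_dynamics(transitions):
    # Closed-form least-squares fit of a ~ w * (s' - s),
    # i.e. a one-parameter inverse dynamics model.
    num = sum((s_next - s) * a for s, a, s_next in transitions)
    den = sum((s_next - s) ** 2 for s, a, s_next in transitions)
    return num / den


def infer_action(w, s, s_next):
    # Predict the action that explains the observed transition s -> s'.
    return w * (s_next - s)


w = fit_inverse_dynamics(collect_transitions(200))
print(w, infer_action(w, 0.0, 0.25))
```

Once fitted, `infer_action` can label a state-only demonstration with the actions that would reproduce each transition under the toy dynamics.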


**The methods above assume that each observation transition is reachable through the application of a single action.**

- Zero-shot visual imitation: relaxes this assumption by executing multiple actions until the agent gets close enough to the next demonstrated frame.
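
The "execute multiple actions until close enough to the next demonstrated frame" loop can be sketched as follows. This is only an illustration under stated assumptions: scalar states replace image frames, and `goal_policy`, `step`, `tol`, and `max_actions` are hypothetical names, not taken from the paper.

```python
def step(state, action):
    # Hypothetical 1-D dynamics: the action is added to the state.
    return state + action


def goal_policy(state, goal, max_step=0.3):
    # Hypothetical goal-conditioned policy: move toward the goal,
    # but never by more than max_step in a single action.
    delta = goal - state
    return max(-max_step, min(max_step, delta))


def follow_demo(start, demo_states, tol=0.05, max_actions=20):
    # Treat each demonstrated state as a sub-goal and keep acting
    # until the agent is within tol of it (or the action budget runs out).
    state = start
    actions_per_frame = []
    for goal in demo_states:
        n = 0
        while abs(goal - state) > tol and n < max_actions:
            state = step(state, goal_policy(state, goal))
            n += 1
        actions_per_frame.append(n)
    return state, actions_per_frame


final_state, counts = follow_demo(0.0, [1.0, 1.2, 0.2])
print(final_state, counts)
```

Here some demonstrated transitions need several actions and others only one, which is exactly the case a single-action inverse dynamics model cannot handle.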


- Behavior cloning from observation: learn generalized imitation policies using multiple demonstrations.
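
A rough sketch of the two-stage idea behind behavior cloning from observation: learn an inverse dynamics model from the agent's own exploration, use it to label a state-only demonstration with inferred actions, then run ordinary behavior cloning on the labeled data. The 10-state chain environment, the tabular models, and all function names are assumptions made for illustration, and the sketch leaves out any further refinement of the inverse model after the first policy is learned.

```python
import random
from collections import Counter, defaultdict

ACTIONS = (-1, +1)


def step(state, action):
    # Hypothetical chain environment with states 0..9; moves are clipped at the ends.
    return min(9, max(0, state + action))


def learn_inverse_model(n_samples, seed=0):
    # Count which action produced each observed (s, s') pair under random
    # exploration, and keep the most frequent one as the inverse model.
    rng = random.Random(seed)
    counts = defaultdict(Counter)
    for _ in range(n_samples):
        s = rng.randrange(10)
        a = rng.choice(ACTIONS)
        counts[(s, step(s, a))][a] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


def behavior_clone(demo_states, inverse_model):
    # Label each demonstrated transition with an inferred action, then
    # fit a tabular policy by majority vote per state.
    votes = defaultdict(Counter)
    for s, s_next in zip(demo_states, demo_states[1:]):
        a = inverse_model.get((s, s_next))
        if a is not None:  # skip transitions never seen during exploration
            votes[s][a] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}


demo = [2, 3, 4, 5, 6, 7]  # state-only demonstration: the expert walks right
inverse_model = learn_inverse_model(500)
policy = behavior_clone(demo, inverse_model)
print(policy)
```

The demonstration contains no actions at all; every action label the policy is trained on comes from the inverse model.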


- Hybrid reinforcement learning with expert state sequences. 


(Assumes that both the visual demonstrations and the reward information are accessible. Minimizes a linear combination of the behavior cloning loss and the RL loss.)
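
One way to write such a combined objective is shown here. The single mixing weight $\lambda \in [0, 1]$ is an assumption for illustration; the paper may weight the two terms differently.

```latex
\mathcal{L}(\theta)
= \lambda \, \mathcal{L}_{\mathrm{BC}}(\theta)
+ (1 - \lambda) \, \mathcal{L}_{\mathrm{RL}}(\theta)
```

With $\lambda = 1$ this reduces to pure behavior cloning on the inferred actions, and with $\lambda = 0$ to pure RL on the task reward.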


#### Forward dynamics model

- Imitating latent policies from observation.


--> First, learn a latent policy that estimates the probability of a latent (not real) action z given the current state. Because no real actions are executed, this can be learned offline. Learning the latent policy relies on a latent forward dynamics model, which predicts the next state and the prior over z given s. Then, while interacting with the environment, learn an action-remapping network.


### Model-free algorithms

#### Adversarial methods

- Learning human behaviors from motion capture by adversarial imitation. ★




#### Reward-engineering methods

- Internal model from observations for reward shaping.



