The Power of the Senses:
Generalizable Manipulation from Vision and Touch
through Masked Multimodal Learning

The visualizations below show the policies trained in the paper with Masked Multimodal Learning (M3L).

Tactile Insertion Environment

Door Opening Environment

In-hand Cube Rotation Environment