Embodied cognition holds that an agent’s body, and not only its brain, is central to the cognitive process. To achieve embodied cognitive processes, artificial agents should be endowed with the capability to learn from their own sensory-motor experiences, so that they can “understand” themselves and their environment and construct “knowledge”. Self-supervised and unsupervised machine learning methods are among the major means of achieving this independence, since they allow agents to learn without extrinsic human supervision. Learning compact, generative, semantically “meaningful”, disentangled, usable and relevant latent representations from raw sensorimotor perceptions is an important step in grounding an agent’s self-learning in its sensorimotor domain. Within representation learning, we have selected two problems: unsupervised state representation learning and self-supervised spatial representation learning. For state representation learning, we introduce a new unsupervised contrastive learning method, which we call Balanced View Spatial-Deep InfoMax (BVS-DIM). BVS-DIM learns by contrasting an anchor sample with constructed balanced views. We evaluate it in two ways: against the state variables of the agent-environment, and on a reinforcement learning task. In both evaluations, BVS-DIM is superior to the state-of-the-art methods. It can also capture the spatio-temporally evolving latent factors of the agent-environment, as the balanced views contain spatio-temporal mixtures in addition to balanced similarity-contrast.
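To make the contrastive objective concrete, the following is a minimal sketch of a generic InfoNCE-style loss between anchor embeddings and one positive view per anchor. It is not BVS-DIM itself: how the balanced views are constructed is not specified here, so the positives are simply assumed to be given, and the function name `info_nce_loss` and the `temperature` parameter are illustrative choices.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (illustrative, not the BVS-DIM objective).

    Row i of `positives` is assumed to be the positive view of row i of
    `anchors`; every other row in the batch serves as a negative.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Similarity of every anchor to every candidate view, shape (N, N).
    logits = a @ p.T / temperature

    # Subtract the row maximum for numerical stability of the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The matching pairs lie on the diagonal.
    return -np.mean(np.diag(log_probs))
```

Under these assumptions the loss is small when each anchor is closer to its own view than to the other views in the batch, and grows as that separation is lost.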
For spatial representation learning, we combine self-supervised learning with online, interactive learning of the agent from its sensory-motor perceptions. The motor system’s kinematics models are trained on the vision system’s estimated outputs over short episodes of consecutive, small motor transitions. Once the motor system is trained, the agent uses its inverse kinematics together with the vision system to learn interactively, making iterative adjustments to the estimated positions. The final objective is for the agent to be able to map its motor state to the corresponding position of its end-effector.
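The two stages described above can be illustrated with a toy sketch. Everything specific in it is an assumption made for illustration: a planar two-joint arm with hypothetical link lengths, a function `observe_position` standing in for the vision system, hand-picked trigonometric features, and a least-squares forward model in place of whatever learned model the agent actually uses.

```python
import numpy as np

LINKS = (1.0, 0.8)  # hypothetical link lengths of a planar two-joint arm


def observe_position(q):
    """Stand-in for the vision system: end-effector (x, y) at joint angles q."""
    l1, l2 = LINKS
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])


def features(q):
    """Hand-picked non-linear features of the motor state (an assumption)."""
    q = np.atleast_2d(q)
    s = q[:, 0] + q[:, 1]
    return np.stack([np.cos(q[:, 0]), np.sin(q[:, 0]), np.cos(s), np.sin(s)],
                    axis=1)


def collect_episodes(rng, n_episodes=50, steps=10, step_size=0.05):
    """Short episodes of consecutive, small motor transitions."""
    states, positions = [], []
    for _ in range(n_episodes):
        q = rng.uniform(-np.pi, np.pi, size=2)
        for _ in range(steps):
            states.append(q.copy())
            positions.append(observe_position(q))
            q = q + rng.uniform(-step_size, step_size, size=2)
    return np.array(states), np.array(positions)


def fit_forward_model(states, positions):
    """Least-squares forward model: motor state -> observed position."""
    weights, *_ = np.linalg.lstsq(features(states), positions, rcond=None)
    return weights  # shape (4, 2)


def predict(weights, q):
    """Predicted end-effector position for a single motor state q."""
    return (features(q) @ weights)[0]


def jacobian(weights, q, eps=1e-5):
    """Central-difference Jacobian of the learned forward model."""
    jac = np.zeros((2, 2))
    for j in range(2):
        dq = np.zeros(2)
        dq[j] = eps
        jac[:, j] = (predict(weights, q + dq) - predict(weights, q - dq)) / (2 * eps)
    return jac


def reach(weights, q, target, iters=50, gain=0.5):
    """Iteratively adjust q using the learned model and visual feedback."""
    q = np.asarray(q, dtype=float)
    for _ in range(iters):
        error = target - observe_position(q)  # feedback from "vision"
        q = q + gain * np.linalg.pinv(jacobian(weights, q)) @ error
    return q
```

In this sketch the forward model gives the mapping from motor state to end-effector position, while `reach` plays the role of the interactive stage: the inverse step comes from the learned model, and the correction at each iteration comes from what the stand-in vision system observes.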
From these two domains of experiments, we have learned that representation learning gives agents learning autonomy: they can use the learned representations to compose their future learning and to explore their environment. Computing non-linear, compact representations through self-learning enables artificial agents to better perceive themselves and their environments.