Artificial Intelligence has achieved impressive results in many different fields, be it object recognition, NLP, or playing complex games like Go. However, it is still a challenge for artificial agents equipped with deep neural networks to successfully navigate an environment, a task at which mammals, with their behavior based on spatial representations, shine.

The brain's ability to maintain a representation for self-localization in an environment and to update one's position on the basis of self-motion or environmental landmarks is a core component of spatial navigation, and it is needed for agents that move and interact in an…


In recent years, huge progress has been made in the field of machine learning, achieving better and better performance with more and more data. One main driver of this progress has been Deep Learning and the extensive use of ever-deeper neural networks. Especially in Computer Vision and Natural Language Processing, Deep Learning has been a driving force: continuously building and training bigger neural network architectures with ever more trainable parameters has resulted in better performance across different research communities [1, 2].

However, this trend of utilizing bigger architectures could not be transferred to Deep Reinforcement Learning.

Can Increasing Input Dimensionality Improve Deep Reinforcement Learning?


One major problem of current state-of-the-art Reinforcement Learning (RL) algorithms is still the need for millions of training examples to learn a good or near-optimal policy for the given task. This plays an especially critical role for real-world applications in industry, be it robotics or other complex optimization problems in decision making and optimal control.

Due to these problems, engineers and researchers are looking for ways to improve this sample inefficiency, increasing the speed of learning and reducing the need to gather millions of expensive training examples.

One idea researchers came up with is decoupling representation learning


Model-Free Reinforcement Learning algorithms have achieved impressive results, and researchers keep coming up with new and better ideas to further improve their performance. But despite all their benefits and the improvements in recent papers, it is a common consensus that Model-Free algorithms are extremely data inefficient, requiring millions of frames or examples to learn optimal policies and exact value functions. This makes them unsuitable for real-world industrial applications such as robotics.

In contrast, Model-Based approaches were introduced that often claim to be much more efficient than their Model-Free counterparts, due to the possibility of planning, look-ahead search, or…

The image of the world around us, which we carry in our head, is just a model. Nobody in his head imagines all the world, government or country. He has only selected concepts, and relationships between them, and uses those to represent the real system. (Forrester 1971)

In this article, I want to introduce the paper World Models by David Ha and Jürgen Schmidhuber.


In our daily life, we are confronted with tons of information from the world around us, streaming in through our different senses. Since we are not able to process everything in detail that…

In this article, I want to give a quick presentation of the D2RL paper, which applies deep dense neural network architectures to Deep Reinforcement Learning.

The effect of large and dense network architectures has long been explored in Computer Vision and Deep Learning, and the improved performance and other benefits of such dense models compared to shallow ones are widely established. In Deep Reinforcement Learning, however, neural network architectures have not received that much attention yet: commonly used networks, such as the policy or Q-function, are usually only two layers deep.
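The core D2RL idea, concatenating the raw input to the output of every hidden layer, can be sketched in a few lines of NumPy. This is a minimal illustration of the concept only; the layer sizes and random weights below are hypothetical, not the paper's actual architecture or implementation.

```python
import numpy as np

# Sketch of a D2RL-style dense forward pass: every hidden layer
# receives the previous activation concatenated with the raw state,
# so each layer has direct access to the original observation.
def d2rl_forward(state, weights):
    h = state
    for i, W in enumerate(weights):
        h = np.maximum(0.0, W @ h)          # ReLU hidden layer
        if i < len(weights) - 1:            # no concat after the last layer
            h = np.concatenate([h, state])  # dense skip connection
    return h

state_dim, hidden = 4, 8                    # hypothetical sizes
rng = np.random.default_rng(0)
# First layer maps state -> hidden; later layers map (hidden + state) -> hidden
weights = [rng.normal(size=(hidden, state_dim))] + \
          [rng.normal(size=(hidden, hidden + state_dim)) for _ in range(2)]

out = d2rl_forward(rng.normal(size=state_dim), weights)
print(out.shape)  # (8,)
```

In a plain MLP the later layers only see a processed representation; the concatenation keeps the state available at every depth, which is what is claimed to make deeper policy and Q-networks trainable.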

However, the disproportionate advantage of the size and depth of the neural networks…

In this article, I want to give an introduction to Model-Based Reinforcement Learning: the fundamental concept behind MB-RL, the benefits of these methods, their applications, and also the challenges and difficulties that come with applying MB-RL to your problem.


In artificial intelligence (AI), sequential decision-making, commonly formalized as a Markov Decision Process (MDP), is one of the key challenges. Reinforcement Learning and Planning are two successful approaches to solving these problems, each with its own advantages and disadvantages.
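The MDP formalism (S, A, P, R, γ) can be made concrete with a tiny toy example. All states, actions, rewards, and probabilities below are hypothetical, chosen only to show the structure:

```python
# A tiny, hypothetical two-state MDP written out explicitly.
states = ["s0", "s1"]
actions = ["left", "right"]
gamma = 0.95  # discount factor

# P[(s, a)] -> list of (next_state, probability) pairs
P = {
    ("s0", "left"):  [("s0", 1.0)],
    ("s0", "right"): [("s1", 1.0)],
    ("s1", "left"):  [("s0", 0.5), ("s1", 0.5)],  # stochastic transition
    ("s1", "right"): [("s1", 1.0)],
}

# R[(s, a)] -> immediate reward
R = {("s0", "left"): 0.0, ("s0", "right"): 1.0,
     ("s1", "left"): 0.0, ("s1", "right"): 2.0}

# Sanity check: transition probabilities sum to 1 for every (s, a)
assert all(abs(sum(p for _, p in outs) - 1.0) < 1e-9 for outs in P.values())
```

Planning methods assume P and R are known and search through them; Reinforcement Learning instead learns from sampled transitions without direct access to these tables.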

A logical step would be to combine both methods in order to obtain the advantages of both and, hopefully, eliminate their disadvantages.

The key difference…

This article is part of a series about advanced exploration strategies in RL:


For a Reinforcement Learning agent to be successful, it must frequently encounter a reward signal. Based on this feedback, the agent adapts its behavior, trying to maximize the overall return received in an episode. However, in some environments, this reward feedback can be extremely sparse or absent altogether, which makes it very difficult for the agent to learn and to reach a goal state that solves the proposed task.

Most, if not all regular RL approaches…

This article is the second part of the series on distributional reinforcement learning. If you haven't read the first part, I encourage you to do so, since the algorithms and the problems they try to solve build upon each other.


As we heard in the first article, distributional RL tries to learn the value distribution, that is, the distribution of the random returns the agent receives from the environment.

Usually, you would expect a reward to be returned in a distinct and clear way through the reward function of the environment. However, elements like state transitions…


Value-based reinforcement learning methods like DQN try to model the expectation of total returns, or value.

That is, the value of an action a in state s describes the expected return, or discounted sum of rewards, obtained from beginning in that state, choosing action a, and subsequently following a prescribed policy.

All the state transitions, actions, and rewards that are used to calculate the value, or long-term return, can induce randomness if sampled probabilistically. This makes it useful to represent the returns as a distribution: the value distribution, the distribution of the random return received by a reinforcement learning agent.
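A minimal sketch of where this randomness comes from: even from a fixed state-action pair, a stochastic transition makes the return a random variable. The environment and reward values below are hypothetical:

```python
import random

random.seed(0)  # fixed seed for reproducibility

def sample_return():
    # Hypothetical stochastic transition: 50/50 chance of landing in a
    # good state (return 10) or a bad one (return 0).
    return 10.0 if random.random() < 0.5 else 0.0

returns = [sample_return() for _ in range(10_000)]
value = sum(returns) / len(returns)  # the expectation, roughly 5.0
print(round(value, 1))
```

The scalar value collapses the two outcomes into a single number near 5.0, even though a return of 5.0 never actually occurs; the value distribution keeps both outcomes, which is exactly what distributional RL models explicitly.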


Sebastian Dittert

Ph.D. student at UPF Barcelona for Deep Reinforcement Learning
