A Brief Introduction to Markov Chains

A general guide on what makes the Markov Decision Process tick

tldr; This is part of a series of reference articles teaching the concepts that drive Reinforcement Learning. We’ll be starting off simple, incrementally moving on to more complex concepts further down the road.

A Markov Chain is a stochastic (probabilistic) model used to predict the outcome of an event given only the state that came before it. The state refers to the current ‘slice’ of the environment: the particular condition an agent is in at a specific time step.

For example, in Sonic the Hedgehog, a state refers to the current pixel intensity values that represent the game screen.

This state can be represented as a matrix of pixel values. When used in a Reinforcement Learning algorithm, it might look something like this:

The different color RGB channels that form the current state
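To make that concrete, here is a minimal sketch in Python (assuming NumPy is available; the screen dimensions below are illustrative, not taken from any particular Sonic emulator) of a game-screen state stored as an array of RGB pixel intensities:

```python
import numpy as np

# Illustrative screen dimensions -- not from any particular emulator.
SCREEN_HEIGHT, SCREEN_WIDTH = 224, 320

# One state: an (H, W, 3) array holding the red, green, and blue
# intensity of every pixel on the screen at this time step.
state = np.random.randint(
    0, 256, size=(SCREEN_HEIGHT, SCREEN_WIDTH, 3), dtype=np.uint8
)

# RL pipelines typically normalize intensities to [0, 1] before
# feeding the state to a model.
normalized_state = state.astype(np.float32) / 255.0

print(state.shape)  # (224, 320, 3): height x width x RGB channels
```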

The Markov Chain is a model that describes a sequence of consecutive events in which the probability of each event depends only on the event before it.

If a sequence of events exhibits this Markov property, relying only on the previous state, the sequence is said to be ‘Markovian’ in nature.

An example of a Markov Chain with numbers for visualization purposes

The figure above shows two states, A and E. The probability of an event occurring depends on the current state. A movement from A to E or vice versa depends on whether the ‘character’ was previously on A or E.

If the previous state or location was A, then the probability of going to E is 0.4, while the probability of staying on A is 0.6 (that is, A to A).

The probability of going from E to A is 0.7, while the probability of staying on E is 0.3 (E to E).
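As a quick sanity check, here is a small Python sketch (assuming NumPy) that encodes the figure’s two states as a transition matrix and walks the chain for a few steps; note that each step consults only the current state:

```python
import numpy as np

# Transition matrix for the chain in the figure.
# Rows = current state, columns = next state, in the order [A, E].
P = np.array([
    [0.6, 0.4],  # from A: stay on A with 0.6, move to E with 0.4
    [0.7, 0.3],  # from E: move to A with 0.7, stay on E with 0.3
])

states = ["A", "E"]
rng = np.random.default_rng(seed=0)

def simulate(start, steps):
    """Walk the chain; the next state depends only on the current one."""
    current = start
    path = [states[current]]
    for _ in range(steps):
        current = rng.choice(2, p=P[current])
        path.append(states[current])
    return path

print(simulate(start=0, steps=10))  # e.g. ['A', 'E', 'A', 'A', ...]
```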

Hopefully, this helps you visualize the process of moving from state to state depending only on the current state. If we were to write this mathematically, it is simply the probability of the next state S_{t+1} taking the value s', given the current state S_t = s:

P[S_{t+1} = s' | S_t = s]
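Spelled out in full, in standard notation (the original figure with the formula is not reproduced here), the Markov property says that conditioning on the entire history adds nothing once the current state is known:

```latex
P[S_{t+1} = s' \mid S_t = s, S_{t-1} = s_{t-1}, \ldots, S_0 = s_0]
  = P[S_{t+1} = s' \mid S_t = s]
```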

For some problems in Reinforcement Learning, the outcome of acting in a particular state, that is, the next state reached and the reward the agent receives, depends directly on the current state and the action performed in it.

This Markovian property of a problem greatly benefits us: since only the current state matters, the agent stays versatile while remaining lightweight compared to deep neural networks with heavy parameter counts.

Furthermore, Markov Chains form the basis of the Markov Reward Process, in which the agent (whose objective is to maximize the rewards earned in the task) receives a reward based on its performance at each time step.
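As a hypothetical sketch (the reward values and discount factor below are my own assumptions, layered on the two-state chain from earlier), a Markov Reward Process attaches a reward to each state, and the long-run value of every state follows from the Bellman equation v = R + γPv:

```python
import numpy as np

# The two-state chain from earlier.
P = np.array([
    [0.6, 0.4],  # transitions from A
    [0.7, 0.3],  # transitions from E
])

# Assumed rewards for being in A and in E, and an assumed discount
# factor gamma -- illustrative values only.
R = np.array([1.0, -0.5])
gamma = 0.9

# The MRP state values satisfy v = R + gamma * P @ v, so we can
# solve the linear system (I - gamma * P) v = R directly.
v = np.linalg.solve(np.eye(2) - gamma * P, R)
print(dict(zip(["A", "E"], v.round(3))))  # long-run value of each state
```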

Markov Chains are really useful in Reinforcement Learning, as they have enabled us to achieve and even exceed human performance in many problems, from teaching robots how to handle objects with care to beating difficult Atari games that were previously out of reach for statistical Machine Learning.

Over the next few articles, we’ll go through the various forms and applications of the Markov Property and how useful it is when dealing with certain problems.

For the next article, I am planning to write a small dictionary of all the terminology used in the field of Reinforcement Learning as a reference piece. If you have any suggestions for RL articles or questions in general, feel free to drop in a comment below.

With that, I’ll see you in the next one.

Check out my other articles on technology and Machine Learning, and reviews of up-and-coming software innovations.
