# The EM algorithm and Gaussian mixture models – part I In the last few posts on machine learning, we have looked in detail at restricted Boltzmann machines. RBMs are a prime example for unsupervised learning - they learn a given distribution and are able to extract features from a data set, without the need to label the data upfront. However, there are of course many … Continue reading The EM algorithm and Gaussian mixture models – part I

# Why you need statistics to understand neuronal networks When I tried to learn about neuronal networks first, I did what probably most of us would do - I started to look for tutorials, blogs etc. on the web and was surprised by the vast amount of resources that I found. Almost every blog or webpage about neuronal networks has a section on training … Continue reading Why you need statistics to understand neuronal networks

# The Metropolis-Hastings algorithm In this post, we will investigate the Metropolis-Hastings algorithm, which is still one of the most popular algorithms in the field of Markov chain Monte Carlo methods, even though its first appearence (see ) happened in 1953, more than 60 years in the past. It does for instance appear on the CiSe top ten list … Continue reading The Metropolis-Hastings algorithm

# Recurrent and ergodic Markov chains Today, we will look in more detail into convergence of Markov chains - what does it actually mean and how can we tell, given the transition matrix of a Markov chain on a finite state space, whether it actually converges. So suppose that we are given a Markov chain on a finite state space, with … Continue reading Recurrent and ergodic Markov chains

# Finite Markov chains In this post, we will look in more detail into an important class of Markov chains - Markov chains on finite state spaces. Many of the subtleties that are present when studying Markov chains in general state spaces do not appear in the finite case, while most of the key ideas and features of Markov … Continue reading Finite Markov chains

# Monte Carlo methods and Markov chains – an introduction In our short series on machine learning, we have already applied sampling methods several times. We have used and implemented Gibbs sampling, and so far we have simply accepted that the approach works. Time to look at this in a bit more detail in order to understand why it works and what the limitations of … Continue reading Monte Carlo methods and Markov chains – an introduction

# Training a restricted Boltzmann machine on a GPU with TensorFlow During the second half of the last decade, researchers have started to exploit the impressive capabilities of graphical processing units (GPUs) to speed up the execution of various machine learning algorithms (see for instance  and  and the references therein). Compared to a standard CPU, modern GPUs offer a breathtaking degree of parallelization - … Continue reading Training a restricted Boltzmann machine on a GPU with TensorFlow

# Training restricted Boltzmann machines with persistent contrastive divergence In the last post, we have looked at the contrastive divergence algorithm to train a restricted Boltzmann machine. Even though this algorithm continues to be very popular, it is by far not the only available algorithm. In this post, we will look at a different algorithm known as persistent contrastive divergence and apply it to … Continue reading Training restricted Boltzmann machines with persistent contrastive divergence

# Learning algorithms for restricted Boltzmann machines – contrastive divergence In the previous post on RBMs, we have derived the following gradient descent update rule for the weights. $latex \Delta W_{ij} = \beta \left[ \langle v_i \sigma(\beta a_j) \rangle_{\mathcal D} - \langle v_i \sigma(\beta a_j) \rangle_{P(v)} \right] &s=1$ In this post, we will see how this update rule can be efficiently implemented. The first thing … Continue reading Learning algorithms for restricted Boltzmann machines – contrastive divergence

# Restricted Boltzmann machines In the previous post, we have seen that a Boltzmann machine as studied so far suffers from two deficiencies. First, training is very slow as we have to run a Gibbs sampler until convergence for every iteration of the gradient descent algorithm. Second, we can only see the second moments of the data distribution and … Continue reading Restricted Boltzmann machines