Don’t Get Lost in the Forest I – The Algorithm

With this blog post we start a series on the machine learning algorithm random forest and how to accelerate the training process with GPUs. For simplicity we focus on continuous features. If you are interested in categorical features you might want read Mathias’ blog.

In this first post we explain what a decision tree is, how to go from a decision tree to a random forest and discuss the advantages of random forests over decision trees. We finish by describing a random forest training algorithm. In a second post we go through the important aspects of the algorithm’s CPU implementation and will do the same for the GPU implementation in a third post. The series will end with a post on the performance of the different implementations.

Random forests were developed by Leo Breiman and Adele Cutler, their dominance in the field of machine learning has been illustrated nicely in a blog post at Strata 2012: “Ensembles of decision trees (often known as random forests) have been the most successful general-purpose algorithm in modern times” when it comes to “automatically identify the structure, interactions, and relationships in the data”. Furthermore it is noted that “most Kaggle competitions have at least one top entry that heavily uses this approach”. Random forests also have been the algorithm of choice for the body part recognition of Microsoft’s Kinect, a motion sensing input devices for Xbox consoles and Windows PCs. More details can be found in the paper Why did Microsoft decide to use Random Forests in the Kinect.

Random forests consist of an ensemble of decision trees. We therefore start to look at decision trees. A simple decision tree could be used to decide what you need to take with you when leaving the house:


This tree has been constructed by reason only, not involving any data. In machine learning however the trees are created such that they optimally predict a given sample of data (training data).

Petal and Sepal
Example: Petal vs. Sepal

For this series we will use the well-known Iris data set as training data. It contains four features (the length and the width of sepals and petals) of three Iris species:


Image of Iris Setosa
Iris Setosa
Iris versicolor
Iris versicolor
Iris virginica
Iris virginica

Random forests can play their strength when having hundreds of different features. But few features make a data set much better accessible as an example. Checking the scatter plot of all features, you can see the clustering of the different types.

Scatterplot Iris data
Scatterplot Iris data

The scatter plot illustrates that Setosa is nicely distinguishable from the other two types. Keeping Virginica apart from Versicolor is much more difficult. A decision tree predicting the Iris specie from these features might look like:


This tree does not classify all the examples correctly. We could change this by increasing the maximum tree depth. By doing so, the tree can predict the sample data 100% correctly, but only by learning the noise in the samples. In the most extreme case the algorithm is equivalent to a dictionary containing every sample. This is knows as over-fitting and leads to bad results when using it for out of sample predictions. In order to overcome over-fitting, we can train multiple decision trees, by introducing weights for the samples and only considering a random subset of the features for each split. The final decision of the random forest will be determined by a majority vote on the trees’ predictions. This technique is also known as bagging. It helps to decrease the variance (error from sensitivity to noise in the training set) without increasing the bias (error introduced from not enough flexibility of the model).


We first explain, how to build decision trees. In a second part we look at how to create a random forest.

Building Decision Trees

To build good decision trees a discrimination measure is required. A common choice is to use the split information or entropy as described in Efficient C4.5 as measure. It is definded as

$$\rm{Split}(T) = \sum_{i,b} \frac{T_{i,b}}{T} \log_2 \left( \frac{T_{i,b}}{T} \right),$$

where $T_{i,b}$ is the number of samples with label $i$ in branch $b$ after the split and $T$ is the total number of samples. Other possible metrics are the information gain, the Gini impurity and the variance reduction.

The search for a decision tree which is a global optimum for the discrimination measure is known as a NP-complete problem. We therefore focus on the a local optimization where for each split the local optimum is chosen, i.e. for every node we check which feature to use for splitting and where to split to get the lowest split information.

The steps to build a decision tree are:

  1. For all features (e.g. for Iris data set the sepal-length, sepal-width, petal-length, petal-width), take all samples and order them by value.
  2. For every split possibility, i.e. two successive samples with different values, calculate the split entropy:
    – Make a histogram based on labels for both branches after the split.
    – Calculate the split entropy $\rm{Split}(T)$.
  3. Choose the split for the feature and split threshold with the smallest split entropy.
  4. Apply the algorithm from step 1 on all resulting sub-branches, until either all features have the same label, or the predefined maximal three depth has been reached.

We illustrate the calculation of the split entropy with a simple example. Assume we have ten samples for a single continuous feature, with three different labels all of them with weight one and the following value-label-map:

feature value (sorted) label
1 1
2 2
3 1
4 1
5 1
6 1
7 3
8 3
9 2
10 3

We have nine split possibilities 1.5, 2.5, …, 9.5. The split between 1 and 2 at 1.5 would lead to the following two histograms. The lower branch would be

label counts
1 1
2 0
3 0

whereas the upper branch would be

label counts
1 4
2 2
3 3

The split at 1.5 leads to a split entropy of

$$\frac{1}{1} \log_2 \left( \frac{1}{1} \right) + \frac{0}{1} \log_2 \left( \frac{0}{1} \right) + \frac{0}{1} \log_2 \left( \frac{0}{1} \right) + \frac{4}{9} \log_2 \left( \frac{4}{9} \right) + \frac{2}{9} \log_2 \left( \frac{2}{9} \right) + \frac{3}{9} \log_2 \left( \frac{3}{9} \right)= 13.77.$$

We calculate the split entropies for all the other splits possibilities and plot them as a function of the split threshold:


We see from the plot that a split between 6 and 7 is ideal. The same procedure is now applied to the lower (samples 1-6) and the upper (samples 7-10) branch and the sub-branches thereof until either the maximum depth has been reached or the sub-branch only contains a single sample.

Bagging Decision Trees to Random Forests

Decision trees, in particular those with many levels, suffer from over-fitting. To reduce the risk of over-fitting many different decision trees are trained and combined to an ensemble with majority vote. This decreases the variance of the decision trees without increasing the bias. In order to get different trees, randomness is introduced at two points:

  1. Every sample gets a weight which is randomly chosen for every tree. The weights are chosen by randomly selecting n samples out of the existing n samples with replacement. The histograms are then created not by counts, but by sums on these weights.
  2. For every split only some randomly chosen features are considered.

To gain a deeper understanding the following video of Jeremy Howard is helpful:


Webinar on how to Accelerate .NET Applications with GPUs

Software companies use frameworks such as .NET to target multiple platforms from desktops to mobile phones with a single code base in order to reduce costs by leveraging existing libraries and to cope with changing trends. While developers can easily write scalable parallel code for multi-core CPUs on .NET, they face a bigger challenge using GPUs to tackle compute intensive tasks.

Alea GPU closes this gap by bringing GPU computing directly into the .NET ecosystem.

Register here for a free webinar where you can learn how to write great cross platform GPU accelerated .NET applications in any .NET language much easier than ever before.


Alea GPU 2.1 Released

Not too long ago we released the new Alea GPU 2.0 release, which was a major step forward for GPU computing on .NET. Today we can announce Alea GPU 2.1. It is a maintenance release but also brings some new interesting features.

First of all Alea GPU 2.1 has integrated cuDNN, a GPU-accelerated library of primitives for deep neural networks, which are very much in vogue these days.

The new version also supports printing from GPU kernels, either with printf/printfn in F# or Console.Write/Console.WriteLine in C# based GPU kernels. This is a very handy tool for quickly debugging GPU kernels or understand them more thoroughly.

Also important is supporting IntPtr in malloc and int64 indexing, which allows to address device memory beyond the 4GB boundary.

Finally, some experts requested support for atomicCAS and __shfl_xor. Unfortunately atomicCAS has an issue on Linux which could not be resolved in time. We hope it will be fixed with the upgrade to CUDA 7.5, which will be released this summer.

The Alea Tutorial will be updated soon as well with some examples how to use cuDNN directly with Alea GPU.


Alea GPU on Amazon EC2 Instances

The Amazon Elastic Compute Cloud (Amazon EC2) provides resizable compute capacity in the cloud. EC2 offers a wide selection of instance types optimized to fit different use cases. For GPU applications, the g2.2xlarge instance has 8 CPU core and 1 GRID K520 GPU, the larger g2.8xlarge instance has 32 CPU cores and 4 GRID K520 GPUs.

To simplify the utilization of Alea GPU on EC2 we created two community AMIs (Amazon Machine Images) with up to date CUDA drivers and pre-installed tools so that you can start using Alea GPU right away:

  • win2012r2-cuda70-vsc2013 with AMI ID ami-ab1ef4c0 is Windows Server 2012 R2 image with Visual Studio 2013 Community Edition and TightVNC
  • ubuntu14.04-cuda70-mono with AMI ID ami-670ee40c is an Ubuntu Trusty Tahr 14.04 image and comes with Mono.ubuntu-ami

On both AMIs an up to date GRID GPU driver and the CUDA toolkit 7.0 is installed. The AMIs can be used out of the box to experiment with Alea GPU on either Windows or Linux.

The AMIs are available in the geographic location US East (N. Virginia).

Using GRID K520 GPUs on Windows

On Windows the GRID K520 GPUs do not support the TCC mode, which means that they cannot be used with a Remote Desktop connection. The simple way out is to use a VNC server or TeamViewer. The AMI win2012r2-cuda70-vsc2013 has therefore TightVNC server pre-installed. It is launched at system boot time as a Windows service. The VNC server listens on port 5900. The password for connection and configuration is vncxyz. These settings can be changed easily after booting.

Booting an Instance of the Windows AMI win2012r2-cuda70-vsc2013

We assume that you have an AWS account and that you created a key pair.
Here are the steps to launch an instance of the AMI win2012r2-cuda70-vsc2013:

    1. Log on to the AWS console and select the EC2 Dashboard. If you do not yet have a key pair select Key Pairs under Network & Security and follow the instructions. Also make sure that you selected the geographic location US East (N. Virginia) (or any other location on which the AMIs are available) in the top right corner.ec2-dashboard
    2. Press the Launch Instance button to get to the Quick Start wizard.
    3. Select the Community AMIs and enter win2012r2-cuda70-vsc2013 in the search field. Press Select.


    1. Scroll down to find the G2 instances. Select either g2.2xlarge or g2.8xlarge. Note that both instances are paid. Press Next: Configure Instance Details.


  1. Choose the Purchasing options. Spot instances are usually less expensive but may not be available immediately or may be interrupted if the spot price rises above the user supplied spot price limit. Also note that instances purchased at spot price cannot be stopped. Press Next: Add Storage.ec2-launch-3
  2. Increase the partition size or add additional storage as required. Skip Next: Tag Instance and proceed to Configure Security Groupec2-launch-4
  3. Create a security group with custom TCP rule for port 5900 or port range 5900-5910. This is required to connect via VNC to the instance. The RDP port rule for port 3389 is added by default. Press Review and Launch.ec2-launch-6
  4. Review your settings and press Launch.ec2-launch-7
  5. Choose your key pair for this instance. Press Launch Instances.ec2-launch-8
  6. The Launch Status window appears. Press View Instances.ec2-launch-9
  7. The instance boot process takes some time. After successful booting the instance is visible under Instances. Select the desired instance and press Connect.ec2-launch-10
  8. A popup window appears where you can download the Remote Desktop File and get the password to connect to the instance.ec2-launch-11
  9. To decrypt the password you have to supply the private key file that you generated at the beginning.ec2-launch-12
  10. To decrypt the password you have to supply the private key file that you generated at the beginning. Press Decrypt Password to get the Administrator password.ec2-launch-13
  11. Start a VNC client, such as TightVNC and connect to the instance. Enter the IP address followed by the port number 5900 and press Connect.ec2-launch-14
  12. Supply the password vncxyz.ec2-launch-15
  13. Send Ctrl-Alt-Del.ec2-launch-16
  14. Provide the password that you decrypted before. You may start the on screen keyboard if you have difficulties to enter the special characters in the password.ec2-launch-17
  15. Once connected, start a command line console and execute nvidia-smi to verify if the GRID K520 GPUs are

Booting an Instance of the Ubuntu AMI ubuntu14.04-cuda70-mono

Linux does not have the issue with the TCC mode.

Starting a instance of the Ubuntu AMI is very similar. The main difference is that the security group has to have a port rule for SSH on port 22. Then you connect directly with ssh to the instance.

If you want to experiment with the CUDA debugger use the Windows AMI win2012r2-cuda70-vsc2013 with pre-installed Visual Studio 2013 Community Edition.

Exploring Alea GPU

Now you can start exploring Alea GPU.

Important: the GRID K520 GPU is an enterprise GPU. Send us a mail to support(at) or sales(at) to request a temporary demo license.