

Don’t Get Lost in the Forest IV – Performance Test

Having discussed and implemented a random forest algorithm in the last three posts, we conclude this series by comparing the performance of our two implementations, on the CPU and on the GPU, with the random forest implementation of the Python package scikit-learn (sklearn). For the test we use a randomly generated data set of 20000 samples with 20 features and two classes. We train 1000 trees in two experiments with different maximum tree depths:

  1. depth = 1 (stumps)
  2. depth = 5

In addition to the GPU implementation discussed in the previous posts, we have a version using CUDA streams, which allow multiple CUDA operations, such as memory copies and kernel executions, to run concurrently (on top of the thread-level parallelism within a kernel). This helps to hide the time lost copying memory. We leave this topic for a possible future post, but include this implementation in the performance tests.
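The effect of streams can be illustrated with a simple timing model. This is a toy sketch under stated assumptions, not the streams implementation used in the tests: the input is split into n_chunks equal chunks, copying a chunk to the device takes copy_t, processing it takes compute_t, copying results back is ignored, and the copy of one chunk can run concurrently with the kernel working on the previous one.

```python
def serial_time(n_chunks, copy_t, compute_t):
    """Without streams: each chunk is copied, then processed, strictly one after another."""
    return n_chunks * (copy_t + compute_t)


def pipelined_time(n_chunks, copy_t, compute_t):
    """With streams (toy model): copying chunk i+1 overlaps the kernel on chunk i.

    Only the first copy and the last kernel cannot be overlapped; every step in
    between costs whichever of the two operations is slower.
    """
    if n_chunks == 0:
        return 0.0
    return copy_t + (n_chunks - 1) * max(copy_t, compute_t) + compute_t


if __name__ == "__main__":
    for n in (1, 4, 16):
        print("%2d chunks: serial %4.1f, pipelined %4.1f"
              % (n, serial_time(n, 1.0, 1.0), pipelined_time(n, 1.0, 1.0)))
```

With equal copy and compute times, 16 chunks take 17 time units instead of 32, so nearly all of the copy time is hidden behind computation; with a single chunk nothing can overlap.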

The tests have been performed on a machine with an Intel Core i7-4771 CPU @ 3.50GHz, 16 GB RAM and an NVIDIA Titan Black GPU.

getting the following result:

For Python we use the following script, reading in the same test data as in .NET.

The Python script builds the 1000 trees in between 43 s (single CPU core) and 9.6 s (all CPU cores). This is much faster than our CPU implementation, which we expected: our code, though reasonably well optimized, cannot be expected to catch up with the highly optimized sklearn library. Our GPU implementation, however, still achieves a speedup of more than 4 over the fastest (all-core) Python run.
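The figures quoted above can be checked with a few lines of arithmetic. The function names are ours and purely illustrative; the only inputs are the sklearn timings for depth 1 (43 s on one core, 9.6 s on all cores) and the stated GPU speedup of more than 4.

```python
def speedup(t_reference, t_new):
    """How many times faster a run taking t_new is than one taking t_reference."""
    return float(t_reference) / t_new


def max_time_for_speedup(t_reference, min_speedup):
    """Upper bound on the run time of anything at least min_speedup times faster."""
    return float(t_reference) / min_speedup


# sklearn, depth 1: scaling from one core to all cores
core_scaling = speedup(43.0, 9.6)
# "a speedup of more than 4" against the 9.6 s all-core run bounds the GPU time
gpu_time_bound = max_time_for_speedup(9.6, 4.0)

print("core scaling: %.2fx" % core_scaling)       # core scaling: 4.48x
print("GPU time below %.1f s" % gpu_time_bound)   # GPU time below 2.4 s
```

So sklearn scales by roughly 4.5x from one core to all cores, and a speedup of more than 4 against 9.6 s means the GPU implementation needs less than 2.4 s for the 1000 stumps. The same calculation for depth 5 (220 s and 44 s) gives a core scaling of 5.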

For deeper trees with depth 5 we get the following results:

For these parameters the Python code needs between 3 min 40 s (single CPU core) and 44 s (all CPU cores). Here the speedup of the GPU implementation is smaller. This is to be expected, because the GPU implementation parallelizes over the samples, and the number of samples per node shrinks with every split.
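A back-of-the-envelope calculation shows how quickly the per-node work shrinks. It is a sketch under a strong simplifying assumption, perfectly balanced splits, applied to the 20000 samples of our test set; the helper names are ours. At split level d a tree has up to 2**d nodes, each holding about 20000 / 2**d samples, so if the work for one node is parallelized over that node's samples, there is much less of it to spread across the GPU at the deeper levels.

```python
def samples_per_node(n_samples, level):
    """Samples in each node at the given level, assuming perfectly balanced splits.

    Level 0 is the root; a tree of maximum depth D splits on levels 0 .. D-1.
    """
    return float(n_samples) / 2 ** level


def split_levels(n_samples, depth):
    """List of (level, nodes on that level, samples per node) for every split level."""
    return [(level, 2 ** level, samples_per_node(n_samples, level))
            for level in range(depth)]


for level, nodes, per_node in split_levels(20000, 5):
    print("level %d: %2d nodes x %7.1f samples" % (level, nodes, per_node))
```

For stumps only the root split is computed, over all 20000 samples, whereas a depth-5 tree also splits on levels 1 to 4, down to roughly 1250 samples per node on the last level.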


How to use Alea.cuBase in Python


Python is often used for scripting and rapid prototyping. In this post we illustrate how we can integrate Alea.cuBase and Python so that we can call GPU algorithms coded with Alea.cuBase conveniently in Python.

In this post we rely on Python for .NET, which provides a nearly seamless integration of Python with the .NET Common Language Runtime (CLR). Note that it does not implement Python as a first-class CLR language, nor does it translate Python code to managed IL code. Rather, it integrates the CPython engine with the .NET runtime.

An alternative approach would be to use IronPython, an implementation of the Python programming language targeting the .NET Framework, written entirely in C#. However, because IronPython has limitations when it comes to using some very useful Python libraries such as matplotlib, we prefer to work with CPython and Python for .NET.

Setting up the Environment

We suggest that you install the Python Tools for Visual Studio from http://pytools.codeplex.com, which turn Visual Studio into a nice Python IDE supporting both CPython and IronPython.


If you are going to use IronPython, all that is needed is to install IronPython from http://www.ironpython.net.

Python for .NET

Python for .NET consists of two components:

  1. clr.pyd, a Python module interfacing with the .NET world
  2. Python.Runtime.dll, an assembly used by clr.pyd

We need to compile Python for .NET against the .NET 4.0 framework and the proper Python version. Python for .NET currently supports Python versions 2.3 to 2.7. Check out the source of Python for .NET from https://pythonnet.svn.sourceforge.net/svnroot/pythonnet/trunk.

It contains a solution file for VS 2008. Open it with VS 2010; the conversion succeeds without errors. To compile Python for .NET for Python 2.7 and .NET 4.0, the following steps are required:

Right-click on project “Python.Runtime” and select “Properties”, select “Application” tab and change the “Target framework” to “.NET Framework 4”. Then open the file pythonnet\pythonnet\src\runtime\buildclrmodule.bat and change the following command:


Note that this command appears twice. Next, open the file clrmodule.il and change the lines with the version number in the following piece of code:



To change the Python interpreter version, right-click on project “Python.Runtime” and select “Properties”. In the “Build” tab, under “Conditional compilation symbols”, change “PYTHON26” to “PYTHON27” to select the Python 2.7 interpreter.
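Since the symbol selects the interpreter at compile time, it has to match the Python version that will later import clr.pyd. The helper below is hypothetical and not part of Python for .NET: it derives the symbol from a version tuple, assuming that the PYTHON26/PYTHON27 naming pattern also holds for the other supported versions 2.3 to 2.7, and rejects any other version.

```python
import sys

# Versions supported by Python for .NET at the time of writing: 2.3 to 2.7.
SUPPORTED = [(2, minor) for minor in range(3, 8)]


def compilation_symbol(version_info=None):
    """Return e.g. 'PYTHON27' for Python 2.7; raise ValueError if unsupported.

    Assumption: every supported version follows the PYTHON<major><minor>
    pattern of PYTHON26 and PYTHON27.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if (major, minor) not in SUPPORTED:
        raise ValueError("Python %d.%d is not supported by this Python for .NET build"
                         % (major, minor))
    return "PYTHON%d%d" % (major, minor)
```

For example, compilation_symbol((2, 7)) returns 'PYTHON27', and calling it without arguments checks the interpreter it runs in; any Python 3 version raises ValueError.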

The last step is to patch methodbinder.cs. Replace the method MatchParameters with the following code, which also adds the helper method _RetrieveGenericArguments:

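[sourcecode language="csharp"]
// Requires "using System.Collections.Generic;" and "using System.Linq;" in methodbinder.cs.

// Recursively matches a (possibly generic) parameter type pt against the actual
// argument type it and collects the types bound to generic parameters in gts.
private static bool _RetrieveGenericArguments(List<Type> gts, Type pt, Type it)
{
    bool ok = true;
    if (pt.GUID == new Guid())
    {
        // an empty GUID marks an open generic parameter such as T: bind it to the argument type
        gts.Add(it);
    }
    else if (pt.IsGenericType && it.IsGenericType && it.GetGenericTypeDefinition().GUID == pt.GUID)
    {
        // same generic type definition: match the type arguments pairwise
        var pts = pt.GetGenericArguments();
        var its = it.GetGenericArguments();
        for (int i = 0; i < pts.Length; ++i)
        {
            ok &= _RetrieveGenericArguments(gts, pts[i], its[i]);
        }
    }
    else if (!pt.IsGenericType && !it.IsGenericType && pt.GUID == it.GUID)
    {
        // identical non-generic types: nothing to bind
    }
    else
    {
        ok = false;
    }
    return ok;
}

// Finds a generic method definition whose parameters match the argument types its
// and returns it closed over the inferred generic arguments, or null if none fits.
internal static MethodInfo MatchParameters(MethodInfo[] mis, Type[] its)
{
    foreach (var mi in mis)
    {
        if (!mi.IsGenericMethodDefinition) continue;

        var pts = (from p in mi.GetParameters() select p.ParameterType).ToArray();
        if (pts.Length != its.Length) continue;

        var n = pts.Length;
        var gts = new List<Type>();
        bool ok = true;
        for (int i = 0; i < n; ++i)
        {
            ok &= _RetrieveGenericArguments(gts, pts[i], its[i]);
        }
        if (!ok) continue;
        if (gts.Count != mi.GetGenericArguments().Length) continue;
        return mi.MakeGenericMethod(gts.ToArray());
    }
    return null;
}
[/sourcecode]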

Now recompile the project “Python.Runtime”.
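To make the patched matching logic easier to follow, here is a pure-Python model of it. It is a sketch only: the classes Ty and Method are hypothetical stand-ins for System.Type and MethodInfo, with type identity reduced to names. Like the C# code, it walks each parameter type alongside the actual argument type, binds every open generic parameter to the argument type found at the same position, and accepts a generic method only if all parameters match and the number of bound types equals the method's number of generic arguments.

```python
class Ty(object):
    """Stand-in for System.Type: a name, generic arguments, and a flag for
    open generic parameters such as the T in PFunc<T>."""

    def __init__(self, name, args=(), is_param=False):
        self.name = name
        self.args = tuple(args)
        self.is_param = is_param


class Method(object):
    """Stand-in for MethodInfo: number of generic arguments and parameter types."""

    def __init__(self, name, n_generic, param_types):
        self.name = name
        self.n_generic = n_generic
        self.param_types = list(param_types)


def retrieve_generic_arguments(gts, pt, it):
    """Match parameter type pt against argument type it, appending bound types to gts."""
    if pt.is_param:
        # an open generic parameter binds to whatever type is passed
        gts.append(it)
        return True
    if pt.args and it.args and pt.name == it.name:
        # same generic type definition: match the type arguments pairwise
        ok = True
        for p, i in zip(pt.args, it.args):
            ok &= retrieve_generic_arguments(gts, p, i)
        return ok
    # otherwise both must be the same non-generic type
    return not pt.args and not it.args and pt.name == it.name


def match_parameters(methods, arg_types):
    """Return (method, bound types) for the first generic method that fits, else None."""
    for m in methods:
        if m.n_generic == 0:
            continue  # not a generic method definition
        if len(m.param_types) != len(arg_types):
            continue
        gts = []
        ok = True
        for pt, it in zip(m.param_types, arg_types):
            ok &= retrieve_generic_arguments(gts, pt, it)
        if not ok or len(gts) != m.n_generic:
            continue
        return m, gts
    return None
```

For example, matching a generic method LoadPModule<T> with a single parameter of type PFunc<T> against an argument of type PFunc<Double> binds T to Double, while an argument of type PTemplate<Double> is rejected. The model only compares names, whereas the C# code compares type GUIDs.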

After a successful build you can test it with the following simple Python script:

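[sourcecode language="python"]
import sys
# folder containing clr.pyd and Python.Runtime.dll
sys.path.append(r"C:\dev\pythonnet\pythonnet\src\runtime\bin\Release")
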
import clr, System

print System.Environment.

# you can also print out the sys.path
print '-----'
for p in sys.path:
    print p
print '-----'
[/sourcecode]

Note that the path C:\dev\pythonnet\pythonnet\src\runtime\bin\Release has to point to the location of the module clr.pyd and the assembly Python.Runtime.dll.
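If clr.pyd cannot be found, import clr fails with an ImportError, so it can help to check the folder first. The helper below is a hypothetical convenience, not part of Python for .NET: it returns the first candidate folder containing both clr.pyd and Python.Runtime.dll and appends it to sys.path.

```python
import os
import sys

# The two files that make up Python for .NET.
REQUIRED_FILES = ("clr.pyd", "Python.Runtime.dll")


def find_pythonnet_dir(candidates):
    """Return the first folder containing both clr.pyd and Python.Runtime.dll, or None."""
    for folder in candidates:
        if all(os.path.isfile(os.path.join(folder, f)) for f in REQUIRED_FILES):
            return folder
    return None


def add_pythonnet_to_path(candidates):
    """Append the Python for .NET build folder to sys.path; raise IOError if none fits."""
    folder = find_pythonnet_dir(candidates)
    if folder is None:
        raise IOError("clr.pyd and Python.Runtime.dll not found in: %s"
                      % ", ".join(candidates))
    if folder not in sys.path:
        sys.path.append(folder)
    return folder
```

Calling add_pythonnet_to_path([r"C:\dev\pythonnet\pythonnet\src\runtime\bin\Release"]) before import clr then fails early with a readable message if the build output lives somewhere else.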

Interfacing Python and .NET

To use a private assembly, call the clr.AddReference() function. For example, to load the assembly “Test.dll”, call clr.AddReference("Test").

We refer to http://ironpython.net/documentation/dotnet/dotnet.html for how to interoperate with .NET from Python.
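When several assemblies are involved, such as Alea.CUDA and our own library in the next section, the references can be set up in one place. This is a minimal sketch with a hypothetical helper name: it adds the folder holding the assemblies to sys.path and then calls clr.AddReference for each name. That Python for .NET searches the sys.path folders when resolving AddReference is an assumption worth checking for your build. The clr module is passed in as an argument, so the helper can also be exercised without .NET.

```python
import sys


def add_references(clr_module, assembly_dir, names):
    """Make assembly_dir searchable and load each named assembly via clr.AddReference.

    clr_module is the imported clr module (or any object with an AddReference
    method). Assumption: AddReference looks for assemblies in the sys.path folders.
    Returns whatever AddReference returned for each name, in order.
    """
    if assembly_dir not in sys.path:
        sys.path.append(assembly_dir)
    return [clr_module.AddReference(name) for name in names]


# Typical use inside Python for .NET (folder and assembly names are only examples):
#   import clr
#   add_references(clr, r"C:\path\to\bin\Release", ["Alea.CUDA", "Test"])
```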

Preparing a .NET Assembly with GPU Code

We create an F# library project referencing Alea.CUDA. Make sure that you set the “Copy Local” property of the Alea.CUDA assembly reference to true. The example below provides a simple kernel adding two arrays and a helper class DeviceWorkerHelper, which exposes the module load functions as ordinary methods to get around the limitations of Python for .NET with extension methods.

[sourcecode language="fsharp"]
module Lib.Test

open Alea.CUDA

let a = [| 1.0; 2.0 |]

let pfunct = cuda {
    // kernel: thread tid adds element tid of A and B
    let! kernel =
        <@ fun (C:DevicePtr<float>) (A:DevicePtr<float>) (B:DevicePtr<float>) ->
            let tid = threadIdx.x
            C.[tid] <- A.[tid] + B.[tid] @>
        |> defineKernelFunc

    return PFunc(fun (m:Module) (A:float[]) (B:float[]) ->
        let n = A.Length
        use A = m.Worker.Malloc(A)
        use B = m.Worker.Malloc(B)
        use C = m.Worker.Malloc(n)
        // one block of n threads
        let lp = LaunchParam(1, n)
        kernel.Launch m lp C.Ptr A.Ptr B.Ptr
        C.ToHost()) }

// exposes the LoadPModule overloads as ordinary members callable from Python for .NET
type DeviceWorkerHelper(worker:DeviceWorker) =
    member this.LoadPModule(f:PFunc<'T>, m:Builder.PTXModule) = worker.LoadPModule(f, m)
    member this.LoadPModule(fm:PFunc<'T> * Builder.PTXModule) = worker.LoadPModule(fm)
    member this.LoadPModule(f:PFunc<'T>, m:Builder.IRModule) = worker.LoadPModule(f, m)
    member this.LoadPModule(fm:PFunc<'T> * Builder.IRModule) = worker.LoadPModule(fm)
    member this.LoadPModule(t:PTemplate<PFunc<'T>>) = worker.LoadPModule(t)
[/sourcecode]
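To see what the kernel computes, the sketch below emulates its launch sequentially on the CPU in plain Python. It is only a model, and the function names are ours: we assume that LaunchParam(1, n) launches a grid of one block with n threads, so each thread handles the element at index threadIdx.x. A consequence is that this kernel only works for arrays no longer than the maximum block size, 1024 threads on GPUs of compute capability 2.0 and later such as the Titan Black; larger arrays would need several blocks and a global index.

```python
MAX_THREADS_PER_BLOCK = 1024  # limit for compute capability 2.0 and later


def add_kernel(thread_idx, C, A, B):
    """Body of the F# kernel for one thread: C[tid] = A[tid] + B[tid]."""
    C[thread_idx] = A[thread_idx] + B[thread_idx]


def launch_one_block(kernel, block_dim, *args):
    """Run kernel once per thread of a single block, one thread after another."""
    if block_dim > MAX_THREADS_PER_BLOCK:
        raise ValueError("a single block cannot have %d threads" % block_dim)
    for thread_idx in range(block_dim):
        kernel(thread_idx, *args)


def add_arrays(A, B):
    """Emulates pfunct: allocate C, launch one block of len(A) threads, return C."""
    n = len(A)
    C = [0.0] * n
    launch_one_block(add_kernel, n, C, A, B)
    return C


print(add_arrays([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5]))  # [2.5, 4.5, 6.5, 8.5]
```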

Calling a GPU Kernel from Python

The following Python script shows how to call the kernel from the Test assembly:

[sourcecode language="python"]
import sys
import clr, System



from Alea.CUDA import Engine, Framework
from Lib import Test

worker = Engine.workers.DefaultWorker
print worker.Name
worker = Test.DeviceWorkerHelper(worker)

A = System.Array[System.Double]([1.0, 2.0, 3.0, 4.0])
B = System.Array[System.Double]([1.5, 2.5, 3.5, 4.5])

def test(pfuncm):
    # apply the curried F# function to A, then to B
    C = pfuncm.Invoke.Invoke(A).Invoke(B)
    for x in C: print x,
    print ""

print "Loading into worker"
pfuncm = worker.LoadPModule(Test.pfunct)

print "Invoking GPU kernel"
test(pfuncm)
[/sourcecode]

Executing the script produces the following output:

Unfortunately this script cannot be executed in the Python Interactive inside Visual Studio, because the Python REPL process exits with a StackOverflowException at the import of Alea.CUDA.
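The chain of Invoke calls in test() comes from F# currying. An F# function value such as float[] -> float[] -> float[] is a .NET object of type FSharpFunc whose Invoke method takes one argument and returns the result, here another function, so from Python the arguments are supplied one Invoke at a time. We assume that the first .Invoke property of the loaded module returns this function. The plain-Python mock below (hypothetical class and function names) mimics only the calling convention.

```python
class CurriedFunc(object):
    """Mimics FSharpFunc<T, TResult>: Invoke takes one argument and returns the
    result, which for a curried function is again a CurriedFunc."""

    def __init__(self, fn):
        self._fn = fn

    def Invoke(self, arg):
        return self._fn(arg)


def curry2(fn):
    """Give a two-argument Python function the shape of an F# 'a -> 'b -> 'c value."""
    return CurriedFunc(lambda a: CurriedFunc(lambda b: fn(a, b)))


def add(A, B):
    return [x + y for x, y in zip(A, B)]


f = curry2(add)
C = f.Invoke([1.0, 2.0]).Invoke([1.5, 2.5])
print(C)  # [2.5, 4.5]
```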


We have shown how to use Alea.cuBase in Python with a suitable modification of Python for .NET. If you just want to do rapid prototyping together with some simple plotting and visualization, we suggest that you also take a look at F# Interactive and the FSharpChart library.