# Writing Monte Carlo Simulation for Derivative Pricing with Alea GPU

This post shows how to write a GPU accelerated Monte Carlo simulation for financial derivative pricing from scratch with F# and Alea GPU. We first look at the generation of random numbers with the Alea.CUDA.Unbound library and then write a Monte Carlo simulation running on the GPU. A brief sample of writing simple functions can be seen in the previous post

The post assume that you have some F# programming experience. Knowledge of the CUDA programming model, the geometric Brownian motion, and the Monte Carlo method are helpful but not necessary to understand most of the code.

To run the code you need an F# development environment such as Visual Studio or Xamarin and a CUDA capable GPU. The code was written and tested with Alea GPU 2.0, Visual Studio 2013 and CUDA 6.5 on Windows.

## Generating Random Numbers

Generating random numbers using Alea.CUDA actually requires no understanding of the CUDA programming model at all. Alea.CUDA comes with the Alea.CUDA.Unbound library that contains two random number generators (XorShift7, Mrg32k3a) and one quasi random number generator (Sobol) out of the box for your CUDA device and the host.

To generate nmc random numbers on your CUDA device you first have to set up a random number generator with

In the code above we have created a XorShift7 random number generator that returns normally distributed single-precision random numbers. numDimensions can be understood as the number of dimensions of the underlying problem. For example if you are randomly drawing points in a plane, you need 2 random numbers to determine a sample and numDimensions = 2. We have to allocate memory on the device to which the random numbers can be written.

It’s instructive to look at the type of prngBuffer. It is DeviceMemory<float32>. Variables of type DeviceMemory<'T> represent memory allocated on the device with the ability to transfer from and to
the device. They can be roughly though of as device versions of 'T[]. A close “relative” is deviceptr<'T> which can be obtained by calling .Ptr on a DeviceMemory<'T> and which we can think of as the pointer to the data on the device that have have to pass to a kernel (more later). Having allocated the memory on the device, we have to instruct the generator to fill the memory with nmc samples and then return the result from the device to the host memory.

Again the types are instructive. DeviceMemory<'T> has a member function Gather : unit -> 'T [] and we can infer from the type that Gather transfers the content from the device to the host.

Having generated the random numbers we can use them on the CPU. Alea.CUDA.Unbound also contains other algorithms that can be used out of the box without a detailed knowledge of the GPU.

The full example is

The plot of the distribution of random numbers generated by the code should look like

## A small Monte Carlo Simulation

The problem we will solve is a simulation of a geometric Brownian motion of the form

$$\frac{dS_t}{S_t} = \mu(t) \, dt + \sigma(t,S_t) \, dW_t$$

also known as a local volatility model in finance due to the dependence of the volatility on the coordinate $S_t$. A discussion of the model is far beyond the scope of this post. We discretize the time axis of the problem into $n$ time steps $t_i$ from $t_0 = 0$ to $t_{n} = T$ and get

$$S_{t_{i+1}} = S_{t_i} \cdot \exp \left[ dt_i \left( \mu(t_i) – \frac{\sigma(t_i,S_{t_i})^2}{2} \right) + \sqrt{dt_i} \, \sigma(t_i,S_{t_i}) \, dW_i \right],$$

where $dt_i = t_{i+1} – t_i$ and the $dW_i$ are drawn independently from a standard normal distribution. This is not accidentally the distribution we have used in the example of the random numbers. We will now implement a Monte Carlo simulation for the GPU bottom up. We start by the implementation of the time step.

This is the formula of the discretization above and not much more can be said. Next we implement the time stepping for a single sample.

We are using the fact that we can pass functions to a kernel. dt, mu, and dW are functions from Step to value and sigma is – as in the formulas above a function from time and value to value and represented as such. This loop is for a single sample only and indifferent to the functional form we use for example for sigma. We could implement and pass a linear interpolation of data. We could also and will do so later pass a constant. Before we can create the final kernel to run we also have to provide an implementation of the sample loop.

Compare the sample loop given by forAll to apply. The are nearly identical. Only the input and output parameters are absent and the function f has the sample id as an additional input.

For the final kernel we decide to have mu and sigma as constants over time and pass the random numbers as a deviceptr<_>.

We provide all required functions and as the resulting function is a function of basic types and deviceptr<_> only, it is a kernel we can compile ahead of time and also do so. We finally combine the kernel with parameters and the random number generation and obtain

The full example is

and the plot of the distribution of $S_T$ should look like

The code is relatively simple and straightforward. By the way, only minor changes are necessary to run the code on the CPU and functions such as eulerLogStep and mcLoop can be reused.

Enjoy playing with Alea GPU …

﻿