Tag Archives: GPU


Play with Particles I – The Physics and CPU Implementation

This blog starts a sequence of posts introducing an implementation of the n-body problem, i.e. the simulation of Newton’s equations for many bodies or particles. The goal of these posts is to illustrate the cross platform capabilities of Alea GPU and its interoperability with OpenGL.

This post explains the physics and a CPU implementation. In the second post we provide two GPU implementations using Alea GPU and benchmark them on Mono running on a Ubuntu Linux machine. The last post considers how to visualize the particle dynamics with OpenGL. All code is fully cross platform and runs on Windows, Linux or Mac OS X. We even tested it on the Jestson TK1 embedded development kit.

N-body simulation can be solved analytically for two particles, but in general has no analytic solution for $n>2$. It is often used in astronomy, where particles mostly represent stars (but might also represent parts of a planet as in impact studies resulting in moon formation) and the interacting forces are the classical Newtonian gravitational forces. Hence, n-body simulations can be used to study the formation of galaxies.

In physics, chemistry and material sciences, where particles represent atoms or molecules, the method is called molecular dynamics. Often periodic boundary conditions are used to avoiding strong boundary effects, and thermo- and bariostates are attached to simulate constant temperature and/or constant pressure ensemble, known as the Nosé-Hoover thermostat algorithm. Some interesting simulations can also be found on YouTube, for example a simulation of water vaporizing (annoying sound) and melting ice.

Finally, the method is applied in computer graphics to simulate explosions, the flow of water, smoke and many more.

In this article we focus on an example form astronomy. It is similar to NVIDIA’s example code (see also NVIDIA’s documentation).

The Algorithm and the Physics Behind It

The n-body problem is all about solving Newton’s equations

$$\frac{d^2}{dt_i^2} x = \frac{F_i}{m_i},$$

where we use the gravitational force

$$F_i = m_i \sum_{i \neq j} \frac{m_j}{\left| x_i – x_j \right|}$$

induced by all other particles. In fact we will slightly change this potential by adding a softening factor so that the gravitational force becomes

$$F_i = m_i \sum_{i \neq j} \frac{m_j}{\sqrt{\left| x_i – x_j \right|^2+ \rm{softening}^2}}.$$

With this modified gravitational force, although physically an approximation, we have removed the singularity for close particles. The acceleration on particle $i$ is then

$$a_i = \sum_{i \neq j} \frac{m_j}{\sqrt{\left| x_i – x_j \right|^2+ \rm{softening}^2}}.$$

Newton’s equations imply conservation of energy, momentum and angular momentum. Using the wrong scheme for numerical integration or too big time steps destroys these conservation laws. Integration schemes that preserve the conservation laws are the simplectic integrators such as verlet and the velocity verlet algorithm. In this post we follow NVIDIA’s example and use an integration scheme including a damping term $<0$, which slightly reduces the energy in every integration step

$$v_{t+dt} = \rm{damping} \left(v_t + a_t dt\right),$$

$$x_{t+dt} = x_{t} + v_{t+dt}  dt.$$

Note that there is a factor of $\frac{1}{2}$ missing for the position integration. This is adding slightly more energy to the system.

CPU Implementation

We use an abstract classes so that we can implement different versions for CPUs and GPUs.

Furthermore, we need some methods to initialize the particles’ position and velocity:

The next function adjusts the momentum of every particle such that the total momentum is 0:

The function bodyBodyInteraction will used on the CPU as well as on the GPU and therefore has the attribute [<ReflectedDefinition>]. It calculates the acceleration between two particles at positions bi and bj. The other arguments are the softeningSquared factor and the acceleration ai on particle $i$ from particle interactions calculated so far. It returns ai plus the acceleration of the particle at position bj exerting on the particle at position bi. The positions are of type float4, where the w-coordinate is the particle’s mass. We use the intrinsic GPU function __nv_rsqrtf which calculates $f(x) = \frac{1}{sqrt(x)}$ and has an optimized GPU implementation.

We now need a method which calls the bodyBodyInteraction function for all pairs of particles. Because of our modified gravitational force with a softening factor we can simplify the implementation and also add the acceleration of each particle with itself, which is just zero. We can safe an if condition and obtain branch free code, which is important on the GPU. The time integration using a damping term can now be implemented as follows:

We have everything ready to implement the abstract interfaces ISimulator and ISimulatorTester.

This completes the CPU implementation of the n-body simulation. We can now create an ISimulator and steps forward in time using the Integrate method.

In the next blog post we will focus on the GPU implementation. In a third post we will show how to use OpenGL through the Open Toolkit library to visualize the results directly on the GPU.


Radically Simplified GPU Parallelization: The Alea Dataflow Programming Model

Many programmers still leave the massive GPU parallel power unused – be it because of lacking experience in CUDA or because of limited time and budget. We aim to drastically simplify GPU parallelization by introducing our Alea dataflow programming model based on .NET. Complex computations can be easily and rapidly composed of a set of prefabricated and customizable operations that underlie asynchronous execution. The run-time system automatically translates this abstract model to efficient GPU code and schedules the operations with minimum memory transfers. By way of illustrative application cases of finance and statistics, we explain the model, take a look at the run-time system, and demonstrate its performance that proves to be as good as in manually optimized CUDA implementations.

Presentation slides


GPU Accelerated Backtesting and ML for Quant Trading Strategies

In algorithmic trading large amounts of time series data are analyzed to derive buy and sell orders so that the strategy is profitable but also risk measures are at an acceptable level.
Bootstrapping walk forward optimization is becoming increasingly popular to avoid curve fitting and data snooping. It is computationally extremely expensive but can be very well distributed to a GPU cluster.

We present a framework for bootstrapping walk forward optimization of trading strategies on GPU clusters, which allows us to analyze strategies in minutes instead of days. Moreover, we show how signal generation can be combined with Machine Learning to make the strategies more adaptive to further improve the robustness and profitability.

Presentation slides