This blog starts a sequence of posts introducing an implementation of the n-body problem, i.e. the simulation of Newton’s equations for many bodies or particles. The goal of these posts is to illustrate the cross platform capabilities of Alea GPU and its interoperability with OpenGL.

This post explains the physics and a CPU implementation. In the second post we provide two GPU implementations using Alea GPU and benchmark them on Mono running on a Ubuntu Linux machine. The last post considers how to visualize the particle dynamics with OpenGL. All code is fully cross platform and runs on Windows, Linux or Mac OS X. We even tested it on the Jestson TK1 embedded development kit.

N-body simulation can be solved analytically for two particles, but in general has no analytic solution for $n>2$. It is often used in astronomy, where particles mostly represent stars (but might also represent parts of a planet as in impact studies resulting in moon formation) and the interacting forces are the classical Newtonian gravitational forces. Hence, n-body simulations can be used to study the formation of galaxies.

In physics, chemistry and material sciences, where particles represent atoms or molecules, the method is called molecular dynamics. Often periodic boundary conditions are used to avoiding strong boundary effects, and thermo- and bariostates are attached to simulate constant temperature and/or constant pressure ensemble, known as the Nosé-Hoover thermostat algorithm. Some interesting simulations can also be found on YouTube, for example a simulation of water vaporizing (annoying sound) and melting ice.

Finally, the method is applied in computer graphics to simulate explosions, the flow of water, smoke and many more.

In this article we focus on an example form astronomy. It is similar to NVIDIA’s example code (see also NVIDIA’s documentation).

# The Algorithm and the Physics Behind It

The n-body problem is all about solving Newton’s equations

$$\frac{d^2}{dt_i^2} x = \frac{F_i}{m_i},$$

where we use the gravitational force

$$F_i = m_i \sum_{i \neq j} \frac{m_j}{\left| x_i – x_j \right|}$$

induced by all other particles. In fact we will slightly change this potential by adding a softening factor so that the gravitational force becomes

$$F_i = m_i \sum_{i \neq j} \frac{m_j}{\sqrt{\left| x_i – x_j \right|^2+ \rm{softening}^2}}.$$

With this modified gravitational force, although physically an approximation, we have removed the singularity for close particles. The acceleration on particle $i$ is then

$$a_i = \sum_{i \neq j} \frac{m_j}{\sqrt{\left| x_i – x_j \right|^2+ \rm{softening}^2}}.$$

Newton’s equations imply conservation of energy, momentum and angular momentum. Using the wrong scheme for numerical integration or too big time steps destroys these conservation laws. Integration schemes that preserve the conservation laws are the simplectic integrators such as verlet and the velocity verlet algorithm. In this post we follow NVIDIA’s example and use an integration scheme including a damping term $<0$, which slightly reduces the energy in every integration step

$$v_{t+dt} = \rm{damping} \left(v_t + a_t dt\right),$$

$$x_{t+dt} = x_{t} + v_{t+dt} dt.$$

Note that there is a factor of $\frac{1}{2}$ missing for the position integration. This is adding slightly more energy to the system.

# CPU Implementation

We use an abstract classes so that we can implement different versions for CPUs and GPUs.

1 2 3 4 5 6 7 8 9 10 11 |
type ISimulator = abstract Description : string abstract Integrate : newPos:deviceptr -> oldPos:deviceptr -> vel:deviceptr -> numBodies:int -> deltaTime:float32 -> softeningSquared:float32 -> damping:float32 -> unit type ISimulatorTester = abstract Description : string abstract Integrate : pos:float4[] -> vel:float4[] -> numBodies:int -> deltaTime:float32 -> softeningSquared:float32 -> damping:float32 -> steps:int -> unit |

Furthermore, we need some methods to initialize the particles’ position and velocity:

1 2 3 4 5 6 7 8 9 10 11 12 13 |
let initializeBodies1 clusterScale velocityScale numBodies = let rng = System.Random(42) let random a b = rng.NextDouble()*a - b |> float32 let scale = clusterScale * max 1.0f (float32 numBodies/1024.0f) let vscale = velocityScale*scale let randomPos _ = scale * random 1.0 0.5 let randomVel _ = vscale * random 1.0 0.5 let pos = Array.init numBodies (fun _ -> float4(randomPos(), randomPos(), randomPos(), 1.0f)) let vel = Array.init numBodies (fun _ -> float4(randomVel(), randomVel(), randomVel(), 1.0f)) pos, vel |> adjustMomentum |

The next function adjusts the momentum of every particle such that the total momentum is 0:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
let adjustMomentum (vel : float4[]) = // we assume the mass is stored in vel.w let totalMomentum = vel |> Array.fold (fun (acc:float4) e -> float4(acc.x + e.x*e.w, acc.y + e.y*e.w, acc.z + e.z*e.w, acc.w + e.w)) float4Zero printfn "total momentum and mass 0 = %A" totalMomentum let len = Array.length vel |> float32 let vel = vel |> Array.map (fun e -> float4(e.x - totalMomentum.x / len / e.w, e.y - totalMomentum.y / len / e.w, e.z - totalMomentum.z / len / e.w, e.w)) // initialize with total momentum = 0 let totalMomentum = vel |> Array.fold (fun (acc:float4) e -> float4(acc.x + e.x*e.w, acc.y + e.y*e.w, acc.z + e.z*e.w, acc.w + e.w)) float4Zero printfn "total momentum and mass 1 = %A" totalMomentum vel |

The function bodyBodyInteraction will used on the CPU as well as on the GPU and therefore has the attribute [<ReflectedDefinition>]. It calculates the acceleration between two particles at positions bi and bj. The other arguments are the softeningSquared factor and the acceleration ai on particle $i$ from particle interactions calculated so far. It returns ai plus the acceleration of the particle at position bj exerting on the particle at position bi. The positions are of type float4, where the w-coordinate is the particle’s mass. We use the intrinsic GPU function __nv_rsqrtf which calculates $f(x) = \frac{1}{sqrt(x)}$ and has an optimized GPU implementation.

1 2 3 4 5 6 7 8 9 10 11 |
[<ReflectedDefinition>] let bodyBodyInteraction softeningSquared (ai : float3) (bi : float4) (bj : float4) = let r = float3(bj.x - bi.x, bj.y - bi.y, bj.z - bi.z) let distSqr = r.x*r.x + r.y*r.y + r.z*r.z + softeningSquared let invDist = __nv_rsqrtf distSqr let invDistCube = invDist * invDist * invDist let s = bj.w * invDistCube float3(ai.x + r.x*s, ai.y + r.y*s, ai.z + r.z*s) |

We now need a method which calls the bodyBodyInteraction function for all pairs of particles. Because of our modified gravitational force with a softening factor we can simplify the implementation and also add the acceleration of each particle with itself, which is just zero. We can safe an if condition and obtain branch free code, which is important on the GPU. The time integration using a damping term can now be implemented as follows:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
let integrateNbodySystem (accel : float3[]) (pos : float4[]) (vel : float4[]) (numBodies : int) (deltaTime : float32) (softeningSquared : float32) (damping : float32) = // Force of i particle on itselfe is 0 because of the regularisatino of the force. // As fij = -fji we could save half of time, but implement it here as on GPU. for i = 0 to numBodies - 1 do let mutable acc = float3(0.0f, 0.0f, 0.0f) for j = 0 to numBodies - 1 do acc <- bodyBodyInteraction softeningSquared acc pos.[i] pos.[j] accel.[i] <- acc for i = 0 to numBodies - 1 do let mutable position = pos.[i] let accel = accel.[i] let mutable velocity = vel.[i] velocity.x <- velocity.x + accel.x * deltaTime velocity.y <- velocity.y + accel.y * deltaTime velocity.z <- velocity.z + accel.z * deltaTime velocity.x <- velocity.x * damping velocity.y <- velocity.y * damping velocity.z <- velocity.z * damping position.x <- position.x + velocity.x * deltaTime position.y <- position.y + velocity.y * deltaTime position.z <- position.z + velocity.z * deltaTime // store new position and velocity pos.[i] <- position vel.[i] <- velocity |

We have everything ready to implement the abstract interfaces ISimulator and ISimulatorTester.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
let createSimulator(worker : Worker, numBodies : int) = // create arrays for work, store in closure, to save time for allocation. let haccel = Array.zeroCreate numBodies let hpos = Array.zeroCreate numBodies let hvel = Array.zeroCreate numBodies let description = "CPU.Simple" { new ISimulator with member x.Description = description member x.Integrate newPos oldPos vel numBodies deltaTime softeningSquared damping = worker.Gather(oldPos, hpos) worker.Gather(vel, hvel) integrateNbodySystem haccel hpos hvel numBodies deltaTime softeningSquared damping } let createSimulatorTester(numBodies : int) = let description = "CPU.Simple" let accel = Array.zeroCreate numBodies { new ISimulatorTester with member x.Description = description member x.Integrate pos vel numBodies deltaTime softeningSquared damping steps = for i = 1 to steps do integrateNbodySystem accel pos vel numBodies deltaTime softeningSquared damping } |

This completes the CPU implementation of the n-body simulation. We can now create an ISimulator and steps forward in time using the Integrate method.

In the next blog post we will focus on the GPU implementation. In a third post we will show how to use OpenGL through the Open Toolkit library to visualize the results directly on the GPU.