In this project we built a 3D position based fluid simulator based on Position Based Fluids [Macklin, Müller]. The end product is a program capable of rendering particles with fluid properties such as incompressibility, tensile instability correction, vorticity confinement, and viscosity. In addition, particle-plane collisions were implemented to allow for confinement of particles. The next major step was upgrading from the deprecated GLUT library to OpenGL 3.3, which required a large architecture overhaul. Thanks to this upgrade, however, we gained access to a variety of features that parallelize particle initialization, rendering, and shading, making the graphics end much more efficient. Afterwards, many changes were made to the UI/UX, such as adding a camera, mapping a cubemap as a background view, adding a skybox light, and adding a collection of shaders (Blinn-Phong, refractive, and velocity) to the particles. To scale the program even further, we upgraded to CUDA so that the work of the algorithm's main update step could be parallelized and distributed amongst many GPU cores. This enabled us to render simulations with hundreds of thousands of particles. With the extra time left over, we also implemented spherical collisions and rendered a few scenes featuring them.
First came the particle: by the first milestone we had developed the following struct for local computation and rendering:
struct Particle {
    vec3 position;
    vec3 predicted_position;
    vec3 delta_position;
    vec3 velocity;
    vec3 force;
    vector<Particle *> neighborhood;
};
The struct keeps track of the particle's position, its predicted position for the next time step, an incompressibility pressure correction term delta_position, and the current velocity and force acting on the particle. Additionally, the algorithms presented to enforce incompressibility require knowledge of a particle's neighbors, for which we built a spatial map similar to that in the project 4 cloth simulator to quickly search for the particles in range of the particle of interest.
With our particle representation in place, we implemented the following algorithm:
for Particle i in particles:
    accumulate forces from external accelerations
    update the particle's velocity based on accumulated forces
    calculate the particle's predicted position in the next time step
for Particle i in particles:
    find all neighboring particles (within a constant radius KERNEL_RADIUS)
for i in range(SOLVER_ITERS):
    for Particle i in particles:
        calculate a correction vector delta_p to maintain incompressibility
        adjust the particle's position by adding delta_p
        detect collisions and respond
for Particle i in particles:
    update velocity to be the change in particle position over the timestep
    apply vorticity confinement and adjust for viscosity
    update the position to be the predicted position
First, forces are accumulated on each particle and its predicted position calculated using explicit Euler. Afterwards, using our spatial map, we find all particles in the surrounding neighborhood of radius \(h\). At this point, we run multiple iterations of our solver to satisfy the incompressibility constraints by calculating the correction vector delta_position.
This requires the use of two smoothing kernels. We use the poly6 kernel to perform density operations and the spiky kernel for gradient calculations. These kernels distribute the pressure within a neighborhood of particles to promote incompressibility and prevent particles from clumping. The solver runs for multiple iterations to improve the particle spacing. After running the solver, we adjust the particle's velocity to be the change in its position over the timestep, applying additional corrections to conserve vorticity and give viscosity to the liquid. Finally, we move the particle to its predicted position.
For this step we use explicit Euler: we accumulate external forces on each particle and then update its predicted position based on its current position and velocity.
\(\textbf{v}_{i} \Leftarrow \textbf{v}_{i} + \Delta t\, \textbf{f}_{ext}(\textbf{x}_{i})\)
\(\textbf{x}_{i}^* \Leftarrow \textbf{x}_{i} + \Delta t \textbf{v}_{i}\)
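As a concrete illustration, here is a minimal sketch of this step under a few assumptions: glm vectors, gravity as the only external force, and the Particle struct shown earlier (predict_positions is a hypothetical name).

#include <vector>
#include <glm/glm.hpp>

// Sketch only: apply external forces with explicit Euler and predict the
// position each particle would reach by the end of the timestep.
void predict_positions(std::vector<Particle> &particles, float dt,
                       const glm::vec3 &gravity) {
    for (Particle &p : particles) {
        p.velocity += dt * gravity;                           // v_i <= v_i + dt * f_ext
        p.predicted_position = p.position + dt * p.velocity;  // x_i* <= x_i + dt * v_i
    }
}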
The next step is to find each particle's neighbors efficiently for the following calculations in the update step. For this we followed the procedure outlined in CUDA Particles [Green]. Our interpretation of this approach divides the 3D space of the simulation into bins, or voxels, sized relative to the kernel radius \(h\), and particles are hashed into a voxel based on their positions. With this hash function we can build a spatial map between a hash key and its corresponding bin, where particles that lie in the same bin hash to the same key. We note that for any given particle, all candidate neighbors lie within the space immediately surrounding the bin corresponding to that particle's hash key. This means we need to search only 27 bins and the particles they contain to find all neighbors of a given particle. The cutoff distance to be considered a neighbor is chosen empirically. We came up with the following hash function:
int ParticleManager::hash_bin(glm::ivec3 pos) {
    return (pos.x * 0x9e3779b9 + pos.y) * 1610612741 + pos.z;
}
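Built on this hash, the 27-bin query might look like the following sketch; spatial_map (a hash map from key to a bin of particle pointers), bin_size, and KERNEL_RADIUS are hypothetical names standing in for our actual members.

// Sketch only: gather every particle within KERNEL_RADIUS by scanning the
// 3x3x3 block of bins around the particle's own bin.
void ParticleManager::find_neighbors(Particle &p) {
    p.neighborhood.clear();
    glm::ivec3 bin = glm::ivec3(glm::floor(p.predicted_position / bin_size));
    for (int dx = -1; dx <= 1; dx++) {
        for (int dy = -1; dy <= 1; dy++) {
            for (int dz = -1; dz <= 1; dz++) {
                auto it = spatial_map.find(hash_bin(bin + glm::ivec3(dx, dy, dz)));
                if (it == spatial_map.end()) continue;
                for (Particle *q : it->second) {
                    if (q != &p && glm::distance(p.predicted_position,
                                                 q->predicted_position) < KERNEL_RADIUS) {
                        p.neighborhood.push_back(q);
                    }
                }
            }
        }
    }
}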
To enforce a constant density in the particles to emulate fluid incompressibility, we evaluate a series of equations based on a particle and its neighbors from the last step. The result is a correctional vector, \(\Delta \textbf{p}\), to be applied to each particle. First, we express the density constraint on the \(i\)th particle as a function of the position of itself and its neighbors.
\(C_{i}(\textbf{p}_{1}, ...,\textbf{p}_{n}) = \frac{\rho_{i}}{\rho_{0}} - 1\)
\(\rho_0\) is the rest density of the fluid, which is found empirically. The density \(\rho_i\) of a particle is found using the standard SPH estimator:
\(\rho_{i} = \sum_j m_j W(\textbf{p}_i - \textbf{p}_j, h)\)
where \(W\) is a kernel function and the summation is over all neighbors \(j\); see the appendix for details. From this density constraint we evaluate the following scaling factor for the positional correction. Note that \(\epsilon\) is an empirically chosen relaxation parameter.
\(\lambda_{i} = -\frac{C_{i}(\textbf{p}_{1}, ...,\textbf{p}_{n})}{\Sigma_{k}\lvert\nabla_{\textbf{p}_{k}}C_{i}\rvert^2 + \epsilon}\)
\(\nabla_{\textbf{p}_k}C_i = \frac{1}{\rho_0}\begin{cases}\sum_j\nabla_{\textbf{p}_k}W(\textbf{p}_i-\textbf{p}_j, h) & k = i\\-\nabla_{\textbf{p}_k}W(\textbf{p}_i-\textbf{p}_j, h) & k = j\end{cases}\)
Now with the scaling factor we can write the total position update, \(\Delta \textbf{p}_{i}\), for a given particle \(\textbf{p}_{i}\) as:
\(\Delta \textbf{p}_{i} = \frac{1}{\rho_{0}} \sum_j(\lambda_{i} + \lambda_{j})\nabla W(\textbf{p}_{i} - \textbf{p}_{j}, h)\)
We also note that, thanks to this term, we do not need separate particle-particle collision handling: enforcing incompressibility already keeps particles apart.
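Putting the last few equations together, one solver pass per particle might look like the sketch below. The names are hypothetical: poly6 and spiky_grad implement the kernels from the appendix, REST_DENSITY is \(\rho_0\), EPSILON is the relaxation parameter, and we assume unit particle masses and per-particle lambda storage for clarity.

// Sketch only: lambda_i from the density constraint and its gradients.
float compute_lambda(const Particle &p, float h) {
    float rho = 0.0f;        // SPH density estimate
    float grad_sum = 0.0f;   // accumulated |grad C_i|^2 for the k = j cases
    glm::vec3 grad_i(0.0f);  // gradient for the k = i case
    for (const Particle *q : p.neighborhood) {
        glm::vec3 d = p.predicted_position - q->predicted_position;
        rho += poly6(d, h);  // unit mass assumed
        glm::vec3 g = spiky_grad(d, h) / REST_DENSITY;
        grad_sum += glm::dot(g, g);
        grad_i += g;
    }
    float C = rho / REST_DENSITY - 1.0f;  // density constraint C_i
    return -C / (grad_sum + glm::dot(grad_i, grad_i) + EPSILON);
}

// Sketch only: the position correction delta_p_i from the lambdas.
glm::vec3 compute_delta_p(const Particle &p, float h) {
    glm::vec3 dp(0.0f);
    for (const Particle *q : p.neighborhood) {
        dp += (p.lambda + q->lambda) *
              spiky_grad(p.predicted_position - q->predicted_position, h);
    }
    return dp / REST_DENSITY;
}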
At this point, we were able to generate basic renders where particles attempt to distribute density throughout the fluid body.
Since many of the calculations in the simulation depend on a given particle's neighbors, a deficiency of neighbors leads to noticeable particle clumping. To compensate, we add an artificial pressure term, \(s_{corr}\), specified in terms of the smoothing kernel \(W\), to the \((\lambda_i + \lambda_j)\) coefficient in the last equation.
\(s_{corr} = -k (\frac{W(\textbf{p}_{i} - \textbf{p}_{j}, h)}{W(\Delta\textbf{q}, h)})^2\)
\(\Delta\textbf{q}\) is a point located a fixed distance inside the smoothing kernel radius and \(k\) is a small positive constant, both of which are empirically determined. This correctional term keeps particle density slightly lower than rest density. Furthermore, it causes particles to attract their neighbors, producing the surface tension effects seen in fluids.
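For illustration, a small sketch of this term and where it lands; k and the magnitude of \(\Delta\textbf{q}\) (delta_q below) stand in for our empirically tuned constants, and poly6 is the same kernel helper assumed above.

// Sketch only: artificial pressure relative to a fixed point delta_q inside
// the kernel radius. Added to the coefficient in compute_delta_p:
//   dp += (p.lambda + q->lambda + s_corr(...)) * spiky_grad(...);
float s_corr(const glm::vec3 &pi, const glm::vec3 &pj, float h,
             float k, float delta_q) {
    float ratio = poly6(pi - pj, h) / poly6(glm::vec3(delta_q, 0.0f, 0.0f), h);
    return -k * ratio * ratio;  // squared, matching the exponent we use
}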
As noted by Müller et al., position based methods such as this one often introduce additional damping, which produces undesirable results in the simulation. This numerical dissipation is countered by vorticity confinement, which replaces energy lost from the system. We first implemented the base procedure outlined by Müller et al.:
\(\omega_{i} = \nabla \times \textbf{v} = \sum_{j}\textbf{v}_{ij} \times \nabla_{\textbf{p}_{j}}W(\textbf{p}_{i} - \textbf{p}_{j}, h)\)
\(\textbf{f}_{i}^{vorticity} = \epsilon (\textbf{N} \times \omega_{i})\)
Then we adapted our vorticity model as documented in Bubbles Alive [Hong et al.]:
\(\textbf{p}_{\oplus} = \frac{m_{i}\textbf{p}_{i} + m_{j}\textbf{p}_{j}}{m_{i} + m_{j}}\)
\(\eta = \textbf{p}_{\oplus} - \textbf{p}_{i}\)
\(\textbf{N} = \frac{\eta}{|\eta|}\)
Finally, to emphasize coherent motion in the fluid we add XSPH viscosity with an empirically chosen parameter \(c\):
\(\textbf{v}_{i}^{new} = \textbf{v}_{i} + c\sum_{j}\textbf{v}_{ij}\, W(\textbf{p}_{i} - \textbf{p}_{j}, h)\)
The vorticity model gave us trouble for a long time: whenever the additional velocity was added back into the system, the simulation tended to blow up, and particles would gain extremely large velocities seemingly out of nowhere. By clamping the maximum amount of force the vorticity adjustment could add and lowering the coefficient with which this force is weighted, we were able to limit this explosion while still seeing the effects of vorticity confinement. This increases the realism of the render and helps reduce the stacking effect from before.
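The fix amounts to a clamp applied just before the confinement force is added back in; a minimal sketch, where MAX_VORTICITY_FORCE and VORTICITY_EPS are hypothetical names for our tuned constants:

// Sketch only: cap the vorticity confinement force so the re-injected
// energy cannot blow the simulation up.
glm::vec3 f_vort = VORTICITY_EPS * glm::cross(N, p.omega);
float len = glm::length(f_vort);
if (len > MAX_VORTICITY_FORCE) {
    f_vort *= MAX_VORTICITY_FORCE / len;  // rescale down to the cap
}
p.velocity += dt * f_vort;  // unit mass assumed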
To create a basic scene, we also implemented basic particle-plane collisions. If we find a particle is predicted to pass through a plane (its predicted position and current position are on opposite sides of the plane), we simply move the particle along the plane's normal until it is back on the correct side of the plane, adding a small offset to prevent it from immediately colliding again.
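A minimal sketch of this response under our assumptions (a plane given by a point and unit normal; SURFACE_OFFSET is a hypothetical name for the small offset):

// Sketch only: if the current and predicted positions straddle the plane,
// move the predicted position back along the normal to the original side,
// plus a small offset so it doesn't immediately collide again.
void collide_plane(Particle &p, const glm::vec3 &point, const glm::vec3 &normal) {
    float side_now  = glm::dot(p.position - point, normal);
    float side_next = glm::dot(p.predicted_position - point, normal);
    if (side_now * side_next < 0.0f) {
        float target = glm::sign(side_now) * SURFACE_OFFSET;  // stay on the old side
        p.predicted_position -= (side_next - target) * normal;
    }
}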
We extended this to triangle collisions as well. We use the Möller-Trumbore algorithm to identify the time of collision and the barycentric coordinates of the point of intersection, treating the particle as a ray originating at its current position and moving toward its predicted position. If the particle passes through the triangle's plane this timestep, we check whether the barycentric coordinates lie within the triangle body; if so, we use the same collision procedure as for planes, except that we push the particle out from the collision point rather than from the projection of the predicted position onto the triangle surface, mostly for ease of implementation. This unfortunately leads particles to stick to triangle faces, but it is relatively simple to swap this collision code out for the plane version.
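For reference, the standard Möller-Trumbore test as we would apply it per triangle, treating the particle's motion over the step as the ray (a self-contained sketch with hypothetical names):

#include <cmath>
#include <glm/glm.hpp>

// Sketch only: returns true if the segment orig -> orig + dir crosses the
// triangle (v0, v1, v2) within this timestep; writes the hit time t in [0, 1]
// and the barycentric coordinates (u, v) of the intersection point.
bool moller_trumbore(const glm::vec3 &orig, const glm::vec3 &dir,
                     const glm::vec3 &v0, const glm::vec3 &v1,
                     const glm::vec3 &v2, float &t, float &u, float &v) {
    const float EPS = 1e-7f;
    glm::vec3 e1 = v1 - v0, e2 = v2 - v0;
    glm::vec3 pvec = glm::cross(dir, e2);
    float det = glm::dot(e1, pvec);
    if (std::fabs(det) < EPS) return false;  // motion parallel to the plane
    float inv_det = 1.0f / det;
    glm::vec3 tvec = orig - v0;
    u = glm::dot(tvec, pvec) * inv_det;
    if (u < 0.0f || u > 1.0f) return false;  // outside the triangle
    glm::vec3 qvec = glm::cross(tvec, e1);
    v = glm::dot(dir, qvec) * inv_det;
    if (v < 0.0f || u + v > 1.0f) return false;
    t = glm::dot(e2, qvec) * inv_det;
    return t >= 0.0f && t <= 1.0f;  // crossing happens within this step
}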
As an extra, we added sphere collisions as well. Again, we check whether a particle ends up on the opposite side of the sphere's surface during a timestep. If so, we simply project the particle's predicted position back onto the sphere along the surface normal. Depending on the direction of travel, we give the predicted position a small offset in the appropriate direction.
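A hedged sketch of the sphere response, with SURFACE_OFFSET again standing in for the small offset:

// Sketch only: if the particle crossed the sphere's surface this step,
// project the predicted position back onto the surface along the radial
// normal, nudged slightly toward the side it came from.
void collide_sphere(Particle &p, const glm::vec3 &center, float radius) {
    float d_now  = glm::distance(p.position, center);
    float d_next = glm::distance(p.predicted_position, center);
    if ((d_now - radius) * (d_next - radius) >= 0.0f) return;  // no crossing
    glm::vec3 n = glm::normalize(p.predicted_position - center);
    float target = (d_now > radius) ? radius + SURFACE_OFFSET   // came from outside
                                    : radius - SURFACE_OFFSET;  // came from inside
    p.predicted_position = center + target * n;
}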
Because all particles look exactly the same, and copying vertices to the GPU is an expensive operation, we can use instancing to copy all particle information once and issue a single draw per render operation. This required us to create our own particle VAO containing the necessary information from which triangles could be constructed and drawn; the mesh was built by sampling a sphere along longitude and latitude lines. All particles are drawn using this VAO, with an additional offset parameter passed into the shader to differentiate them. The following shader was then used for particles:
#version 330 core
layout (location = 0) in vec3 pos;
layout (location = 1) in vec3 n;
layout (location = 2) in vec3 offset;

out vec3 normal;
out vec3 frag_pos;

uniform mat4 view;
uniform mat4 projection;

void main() {
    gl_Position = projection * view * vec4(pos + offset, 1.0f);
    normal = n;
    frag_pos = pos + offset;
}
Because the entire array of particle attributes is passed in, the fragment shader can also read additional inputs such as the particle's velocity, curl, neighbors, etc. for additional fun shaders. Because these shaders run in parallel on the GPU, lighting effects such as Blinn-Phong shading and drawing can be done in parallel, increasing both the number of particles we could handle and the aesthetic features we could add to the model.
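On the C++ side, the per-frame draw then reduces to one buffer update and a single instanced call; roughly, under the assumption that sphere_vao, offset_vbo, num_indices, and offsets stand in for the sphere-mesh setup described above:

// Sketch only: stream this frame's per-particle offsets and draw every
// particle as an instance of the sphere mesh with one call.
glBindBuffer(GL_ARRAY_BUFFER, offset_vbo);
glBufferData(GL_ARRAY_BUFFER, offsets.size() * sizeof(glm::vec3),
             offsets.data(), GL_DYNAMIC_DRAW);

glBindVertexArray(sphere_vao);
glEnableVertexAttribArray(2);  // location 2 = `offset` in the shader above
glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, sizeof(glm::vec3), (void *)0);
glVertexAttribDivisor(2, 1);   // advance the offset once per instance
glDrawElementsInstanced(GL_TRIANGLES, num_indices, GL_UNSIGNED_INT, nullptr,
                        (GLsizei)offsets.size());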
While rendering could now be done in parallel, it's hard to ignore that the update algorithm itself is embarrassingly parallel. To increase our particle counts, we decided to use CUDA to move the calculation of the algorithm itself onto the GPU as well. Integrating CUDA with our project's CMake files was a bit of a struggle due to our project structure, but thankfully the Nvidia forums are quite active. Upon compilation we discovered that we could not use the standard library on the GPU, meaning we had to rethink our particle model, which at that point relied on vectors of particle pointers to track neighbors. We decided to cap the number of neighbors a particle could have at 50 and store neighbors in an array of ints instead, holding each neighbor's index into a shared array of all particles. An additional variable tracking the number of neighbors a particle actually has was added to allow safe iteration over the neighbor list. The new particle struct was then created:
struct Particle {
    vec3 position;
    vec3 predicted_position;
    vec3 delta_position;
    vec3 velocity;
    vec3 omega;
    float mass;
    float lambda;
    float rho;
    int neighborhood[MAX_NEIGHBORS];
    int num_neighbors;
};
The other major change to note is the removal of the force member, since we decided to modify velocity on demand rather than storing forces in the particle. The lambda, rho, and omega terms were added to avoid recalculating them every time they were needed. We included these terms in the particles, which are all passed to the GPU during the parallelized update along with the neighborhood and num_neighbors members. Collectively, these allow us to limit the number of neighbors a particle can have and to access them and all their relevant attributes using only array indices.
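With this layout, a device kernel can walk a particle's neighbors with plain index arithmetic. As one hedged example, the density estimate might look like the following CUDA sketch, where poly6 is assumed to be a __device__ helper and vec3 a device-compatible vector type:

// Sketch only: one thread per particle; accumulate the SPH density over the
// fixed-size neighbor index array.
__global__ void compute_rho(Particle *particles, int n, float h) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Particle &p = particles[i];
    float rho = 0.0f;
    for (int k = 0; k < p.num_neighbors; k++) {
        const Particle &q = particles[p.neighborhood[k]];
        rho += q.mass * poly6(p.predicted_position - q.predicted_position, h);
    }
    p.rho = rho;
}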
Because our spatial map relies on the standard library's hash map, and we neither had the time to build a GPU compatible spatial map nor could find one already built, we unfortunately had to keep the spatial map creation on the host, despite its potential for parallelization. Particles are copied back to the host to rebuild the spatial map between the force accumulation step and the incompressibility solve. This parallelization still allowed us to increase the particle counts in our scenes from only around 10k on the CPU to 200k on the GPU. We find that at 150k particles we still get approximately 1-2 frames per second, depending on the chosen kernel size and number of solver iterations.
Porting to CUDA also led to an interesting bug that was difficult to diagnose. We had originally implemented planes as a class, but porting that to CUDA would have meant modifying our files and functions and compiling them as .cu files, which would have been annoying given the way we had structured the program. Instead we replaced the implementation with structs and added a collision function alongside the rest of our CUDA kernels. This caused the simulations to blow up, with particles again gaining large velocities. Thankfully, after watching the scenes for a while, we noticed that particles hitting the forward wall were the first to become overexcited, and found that that plane's normal vector was unnormalized; the normalization had originally happened in the plane's constructor.
In writing the new rendering engine after moving away from the starter code, we learned about many cool features that we felt would make our renders less boring. Most of the following come from tutorials on the amazing learnopengl.com.
Originally, we had attempted to mimic the camera from projects 2 and 3; however, the dimensions of the camera were always off, and we didn't quite understand the camera code and openGL matrix binding. Switching to the shader-dominant rendering we have now made the pipeline simpler, as we only had to worry about binding the view matrices at a single point in the pipeline. This allowed us to implement a camera with more freedom of movement and to interact with its produced matrix directly. This camera was useful for debugging and for setting up scenes with a moving camera.
The single color clear got a bit boring after a while, so we went ahead and added a skybox. This made the scenes feel much larger than they were and went well with our fragment shader colors. Paired with camera movement, the skybox gives the illusion of complexity to our renders. This was done by creating a simple VAO for a cube and rendering it at maximum depth, effectively at infinity, so it always sits behind the scene.
We wanted to replicate the image from the Macklin and Müller paper of the bunny taking a shower. We found a low polygon count bunny (as we didn't have an acceleration structure in place for triangle collisions) and wrote some mesh loading code with the help of the obj parser from Assimp. Upon receiving the list of vertices and indices, we parsed these ourselves to add a new vector of triangles to the scene, which the particles also collide with when checking for collisions. Because this wasn't a main priority, we stuck with using the barycentric coordinates of the point of intersection as the point from which we place the particle after collisions, instead of the algorithm used for planes, causing particles to stick to the triangles after colliding.
Originally, we had planned to implement the ellipsoid splatting technique used in the Macklin paper, but we looked at the math and didn't really understand it, so we decided our time would be better spent porting to CUDA. While going through the openGL tutorials, the skybox tutorial also had a refractive shader tutorial, so we went ahead and added it as a poor man's substitute. The results were surprisingly satisfactory at large particle counts.
We had to write our own lighting code to work with our shaders without the use of GLUT. Doing so really helped reinforce and clear up some of the confusion from the shaders in project 2. This gave us direct control over lighting position, feel, material, and strength. We wrote another shader implementing the Blinn-Phong model and added specular highlights to our water shader. Lights are rendered as small cubes. Unfortunately, we couldn't find resources on implementing bloom before the final presentation deadline.
In rewriting the renderer code from scratch and learning how the renderers from prior projects worked, we learned a great deal about writing visual applications. At the highest level, we learned how the openGL pipeline works: from initialization, to binding the projection, model, and view matrices, to creating VAOs, EBOs, and VBOs, to writing GLSL shaders. The learnopengl tutorials also gave many small tips that were fun and interesting to think about when working out how to use openGL to achieve specific effects. Finally, getting used to openGL's state machine style API was crucial to speeding up future development.
Using CUDA to parallelize, we learned the useful skills of writing programs for the GPU, building with CMake, and integrating with existing projects. This made us more conscious of the memory we were using, and we looked for ways to balance repeated function calls against storing too much data to copy to and from the GPU.
On the side, this also taught us about building and maintaining large projects. CMake made working both at home and in the lab relatively painless, as it handled OS specific rules. CMake also made adding dependencies and linking the many libraries we used relatively easy, letting us split the project into subdirectories following the example of the CMake files from our previous projects.
For future endeavors, we also learned to build in command line argument support for loading project parameters. We neglected that, as neither of us has much experience with I/O in C++, but it meant that every time we wished to tweak the scene for a new render we needed to recompile the project; though with CMake, the wait time was not too bad.
This project also gave a different look at graphics programs than the other projects from class, which largely focused on the math. Here we were encouraged to experiment and just play around, adding meaningless but fun things like skyboxes or colorful shaders. It ended up producing many ideas for new projects that would be exciting to try out!
The poly6 kernel was used for density estimation, as in Müller et al.:
\(W_{poly6}(\textbf{r}, h) = \frac{315}{64\pi h^9}(h^2 - |\textbf{r}|^2)^3\)
The spiky kernel was used for all gradient calculations:
\(W_{spiky}(\textbf{r}, h) = \frac{15}{\pi h^6}(h - |\textbf{r}|)^3\)
\(\nabla W_{spiky}(\textbf{r}, h) = -\frac{45}{\pi h^6}(h-|\textbf{r}|)^2\hat{\textbf{r}}\)