## Current Projects:

### SPIKE GPU - An Implementation of a Recursive Divide-and-Conquer Parallel Strategy for Solving Large Systems of Linear Equations

This project proposes to investigate, produce, and maintain a methodology and its software implementation that leverage emerging heterogeneous hardware architectures to solve linear systems with billions of unknowns in a robust, scalable, and efficient fashion. The two classes of problems targeted under this project are banded dense and sparse general linear systems. Preliminary results suggest that the adopted methodology has good strong-scaling behavior and that its early implementation, called SPIKE, is one order of magnitude faster than competing software solutions.

Ang Li, Radu Serban, Dan Negrut
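To illustrate the divide-and-conquer idea behind SPIKE-type solvers, the following is a minimal sketch in plain Python for the simplest possible case: a tridiagonal system split into two partitions. It is not the project's GPU implementation. The function names `thomas` and `spike_two_partitions` and the storage layout (three lists of equal length) are assumptions made for this example only. Each partition is solved independently, the coupling between partitions is captured by two "spike" vectors, and a small 2x2 reduced system recovers the interface unknowns.

```python
def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[0] and upper[-1] are ignored. No pivoting is performed, so the
    matrix is assumed to be diagonally dominant (assumption for this sketch).
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def spike_two_partitions(lower, diag, upper, rhs, split):
    """Two-partition SPIKE-style solve of a tridiagonal system.

    Rows [0, split) form partition 1 and rows [split, n) form partition 2.
    """
    m = split
    l1, d1, u1, f1 = lower[:m], diag[:m], upper[:m], rhs[:m]
    l2, d2, u2, f2 = lower[m:], diag[m:], upper[m:], rhs[m:]
    n2 = len(d2)

    # Independent solves; these are the steps that can run in parallel.
    g1 = thomas(l1, d1, u1, f1)
    g2 = thomas(l2, d2, u2, f2)

    # Spikes: response of each block to its single coupling entry.
    v = thomas(l1, d1, u1, [0.0] * (m - 1) + [upper[m - 1]])
    w = thomas(l2, d2, u2, [lower[m]] + [0.0] * (n2 - 1))

    # Reduced 2x2 system for the two interface unknowns:
    #   x1_last + v[-1] * x2_first = g1[-1]
    #   w[0] * x1_last + x2_first  = g2[0]
    det = 1.0 - v[-1] * w[0]
    x1_last = (g1[-1] - v[-1] * g2[0]) / det
    x2_first = (g2[0] - w[0] * g1[-1]) / det

    # Retrieve the full solution from the interface values.
    x1 = [g - vi * x2_first for g, vi in zip(g1, v)]
    x2 = [g - wi * x1_last for g, wi in zip(g2, w)]
    return x1 + x2
```

With more partitions the reduced system grows, and applying the same splitting to it recursively gives the divide-and-conquer structure named in the project title.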

### Coupled Fluid-Flexible Body Investigation Using Chrono::Fluid

Fluid-flexible body interaction was studied in a Lagrangian-Lagrangian framework that relies on Smoothed Particle Hydrodynamics (SPH), general 3D rigid body dynamics, and the Absolute Nodal Coordinate Formulation (ANCF). The dynamics of the two phases, fluid and solid, are coupled with the help of Lagrangian markers, referred to as Boundary Condition Enforcing (BCE) markers, which are used to impose no-slip and impenetrability conditions. Such BCE markers are associated both with the solid suspended particles and with any confining boundary walls, and are distributed in a narrow layer on and below the surface of solid objects. The ensuing fluid-solid interaction forces are mapped into generalized forces on the rigid and flexible bodies and subsequently used to update the dynamics of the solid objects according to rigid body motion or the ANCF method. The robustness and performance of the simulation algorithm are demonstrated through several numerical simulation studies.

##### Videos:

Immersed Flexible Beams in Impulsively Started Channel Flow

SPH-ANCF Model of Polymer Particles in Channel Flow

Arman Pazouki, Radu Serban, Dan Negrut

### Characterization of Xeon Phi with Linear Algebra Workloads

This independent study analyzes how well suited the Xeon Phi is for frequently used linear algebra routines such as factorizations and solvers. We are working with Intel MKL 11.1 on a Xeon Phi based on the KNC (MIC) architecture. The workloads under study include the factorization and solution of dense and banded systems. Specifically, we are investigating the vectorization opportunities in such routines. The goal is to exploit every opportunity that can help in designing a hybrid banded SPIKE-based solver.

Omkar Deshmukh, Dan Negrut

### Performance Analysis of CULA on Different NVIDIA GPU Architectures

CULA is a next-generation linear algebra package that uses the GPU as a co-processor to achieve speedups over existing linear algebra packages. CULA supports a matrix inversion operation, which helps in solving and factorizing matrices. The performance and actual speedups of CULA depend heavily on the algorithm and the size of the data set. The performance also varies with the GPU memory available for the computation, which differs across NVIDIA GPU cards. This dependence can be explored using the device interface model of CULA, so that performance can be analyzed in terms of GFLOPS on the Fermi, Tesla, and Kepler architectures. The performance analysis will involve running different applications on CULA Dense R17. This study is important because it will show which architecture offers the best performance for the SPIKE GPU solver.

Prateek Gupta, Dan Negrut
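As a small worked example of how a GFLOPS figure is typically derived from a timing, the sketch below divides an operation count by the measured wall-clock time. The helper names are hypothetical and are not part of CULA. The operation counts are the usual leading-order estimates for dense routines (2mnk for a matrix-matrix product, 2n^3/3 for an LU factorization), so the result is a nominal rate rather than an exact hardware measurement.

```python
def gemm_flops(m, n, k):
    """Leading-order flop count for C = A * B with A (m x k) and B (k x n)."""
    return 2.0 * m * n * k


def lu_flops(n):
    """Leading-order flop count for LU factorization of an n x n matrix."""
    return 2.0 * n ** 3 / 3.0


def gflops(flop_count, seconds):
    """Convert an operation count and a wall-clock time into GFLOPS."""
    if seconds <= 0.0:
        raise ValueError("elapsed time must be positive")
    return flop_count / seconds / 1e9


if __name__ == "__main__":
    # Illustrative timing only; not a measured result.
    print(gflops(gemm_flops(1000, 1000, 1000), 2.0))
```

Reporting the rate this way makes runs on different cards comparable, provided the same operation count is used for every architecture.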

### Performance Comparison Study between NVIDIA Fermi and Kepler Architectures

A comparative study of the NVIDIA Fermi and Kepler architectures is being undertaken. The key aspects targeted are performance scaling for computational kernels such as tiled matrix multiplication, memory transfer behavior, gains from streaming, and the performance difference observed when using the Thrust library. The study will vary the execution configuration and the working data set, and will explore the effect of higher occupancy on each architecture.

Contributors: Arindam Sinha, Dan Negrut
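The loop structure of a tiled matrix multiplication can be sketched in plain Python as follows. This is only an illustration of the blocking pattern, under the assumption of row-major lists of lists; it is not the CUDA kernel used in the study. On a GPU, each (i0, j0) tile would typically map to a thread block, and the tiles of A and B would be staged in shared memory. The function name `tiled_matmul` and the `tile` parameter are hypothetical.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m) one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):            # tile row of C
        for j0 in range(0, m, tile):        # tile column of C
            for k0 in range(0, k, tile):    # march along the shared dimension
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c
```

The `min(...)` bounds handle matrices whose dimensions are not multiples of the tile size, which is the same edge case a CUDA kernel has to guard against.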

### Selective Laser Sintering Simulation Using Chrono::Engine

This project presents an effort to use physics-based simulation techniques to model the Selective Laser Sintering (SLS) layering process. SLS is an additive manufacturing process that melts thin layers of extremely fine powder; we use powder with an average diameter of 58 microns. In the numerical model, each powder particle is a discrete object, and 632,000 objects are used for the SLS layering simulation. We first performed an experiment to measure the angle of repose for the polyamide 12 (PA 650) powder used in the SLS process. This measurement was used to determine the correct friction parameters and calibrate the numerical model. Once the model was calibrated, initial simulations of the SLS layering process were performed to measure the changes in the surface profile of the powder. Future work will study the effect that different powders and roller speeds have on the surface roughness of a newly deposited powder layer, and will determine the changes in density and porosity in the final part.

##### Videos:

SLS Layering Video

SLS Angle Of Repose Video

Contributors: Hammad Mazhar, Endrina Forti, Jonas Bollmann

Prof. Tim Osswald, Prof. Dan Negrut

### Robot Walking on Granular Terrain

The simulation consists of a six-legged robot that walks over granular terrain. This project begins to model the experiment presented in "A Terradynamics of Legged Locomotion on Granular Media" by Li, Zhang, and Goldman. Currently, the terrain is composed of 10,000 spheres.

Francisco Mercado

### Cross-sectional Pattern in Mixing of Granular Material

This simulation is modeled after "Spontaneous chaotic granular mixing" by Shinbrot, Alexander, and Muzzio. A mixing barrel is filled with sand-sized particles, with differently colored particles upstream and downstream. In the article, the fractal pattern that emerges in the cross-section is observed at each quarter turn of the barrel; in our simulation it can be observed throughout.

Francisco Mercado

### A Multibody Dynamics-Enabled Mobility Analysis Tool for Military Applications

This project demonstrates a modeling, simulation, and visualization framework aimed at enabling physics-based analysis of ground vehicle mobility. This framework, called Chrono, has been built to leverage parallel computing on both distributed and shared memory architectures. Chrono is both modular and extensible. Modularity stems from the design decision to build vertical applications whose goal is to reduce the end-to-end time from vision-to-model-to-solution-to-visualization for a targeted application field. Extensibility is a consequence of the design of the foundation modules, which can be enhanced with new features that benefit all the vertical applications. Two factors motivated the development of Chrono. First, there is a manifest need for modeling approaches and simulation tools that support mobility analysis on deformable terrain. Second, the hardware available today has improved to the point where the sheer amount of computing power, the memory size, and the available software stack (productivity tools and programming languages) support computing on a scale that allows integrating highly accurate vehicle dynamics and physics-based terramechanics models. Although commercial software is available for simulating vehicle and tire models that operate on paved roads, deformable terrain models that match the fidelity of present-day vehicle and tire models have been lacking due to the complexity of soil behavior. This project demonstrates Chrono's ability to handle these difficult mobility situations through several simulations, including: (i) urban operations, (ii) muddy terrain operations, (iii) gravel slope operations, and (iv) river fording.

Daniel Melanz, Hammad Mazhar, Dan Negrut

### Chrono::Render - A Purpose Rendering Capability of Large-Scale Simulation Data

As simulations grow in complexity, the data extracted from the model grows in size. For engineers and scientists, it is difficult and tedious to gain meaningful insights from large data sets; hence visualization becomes critical to computer simulation, since it provides a more natural means to extract the salient information from abstruse data. Additionally, visualization makes it easier to share and communicate the content of a simulation, leading to wider interest in and understanding of its results. Therefore, we have been developing a rendering pipeline called Chrono::Render, which provides a simple means to efficiently create high-quality renderings of arbitrary data. The pipeline uses the open source Blender modeling software as the front end via a plugin. The data is then passed to the Simulation Based Engineering Lab's Euler server via a web interface to render with Pixar's PhotoRealistic RenderMan (PRMan) or an open source alternative such as Aqsis or Pixie.

Contributors: Daniel Kaczmarek, Aaron Bartholomew

### Terramechanics Methods for Real-time Off-Road Vehicle Mobility Simulation on Deformable Terrain

By extending semi-analytical terramechanics methods to general three-dimensional tire and terrain geometries and combining them with a deformable compaction-based terrain model, general purpose tire/terrain mobility scenarios can be simulated. A vertical application was then created with this framework that combines a multibody vehicle in CHRONO::Rigid with the physics-based, 3-D deformable terrain database of CHRONO::Terrain. Using representative suspension hardpoints, spring/damper rates, and accurate mass/inertia information, a representative HMMWV vehicle model was developed. Contact patch force models were developed by extending semi-analytical terramechanics approaches to the general, 3-D case. Leveraging High Performance Computing in the form of parallel CPUs and GPUs makes real-time vehicle mobility simulation possible, which in turn enables operator-in-the-loop simulations.

Contributors: Justin Madsen, Andrew Seidl, Dan Negrut - UW Madison

Prof. Paul Ayers, University of Tennessee-Knoxville

### Compaction-Based Terrain Model for Soft Soil Off-Road Vehicle Mobility Simulations

In an effort to support general 3-D vehicle mobility on non-flat terrain, CHRONO::Terrain is a deformable terrain database system that allows the terrain surface to be described at both macro- and micro-scale resolutions. Inspired by previous work that used a combination of global low-resolution surface elements with localized high-frequency B-Splines to add "bumpiness", the system can capture slopes, hills, and walls, as well as give the driver the appearance of bumpy, non-flat off-road terrain. A soil-compression model tracks the 3-D stress/strain due to vehicle loads, and the terrain surface deforms according to a visco-elastic-plastic approach that considers the effects of generalized 3-D tire and terrain geometries.

Contributors: Justin Madsen, Andrew Seidl, Dan Negrut - UW Madison

Prof. Paul Ayers, George Bozdech, University of Tennessee-Knoxville

Jeff Freeman, Ford Cook-MechSim Inc.

### Implementation of an Index-3 Differential-Algebraic Equation Solver on Parallel Architecture

The Absolute Nodal Coordinate Formulation (ANCF) has been widely used to carry out the dynamics analysis of flexible bodies that undergo large rotation and large deformation. This formulation is consistent with the nonlinear theory of continuum mechanics and is computationally more efficient than other nonlinear finite element formulations. Kinematic constraints that represent mechanical joints and specified motion trajectories can be introduced to build complex flexible mechanisms. As the complexity of a mechanism increases, the system of differential-algebraic equations becomes very large and results in a computational bottleneck. This project helps alleviate this bottleneck using three tools: (1) an implicit time-stepping algorithm, (2) fine-grained parallel processing on the Graphics Processing Unit (GPU), and (3) a novel Constraint-Based Mesh (CBM) approach that exposes parallelism. The combination of these tools results in a fast solution process that scales linearly for large numbers of elements, allowing meaningful engineering problems to be solved.

Daniel Melanz, Radu Serban, Ang Li, Dan Negrut

## Past Projects:

### Power Performance Scaling Analysis of Computational Kernels using CUDA

A power performance scaling analysis of matrix multiplication, matrix transpose, and fast Fourier transform CUDA kernels using different optimization techniques was undertaken to correlate the overall power usage. The effect of varying execution configurations and working set sizes on an NVIDIA K20X device was captured using the NVML API provided by NVIDIA. In the course of the project, a kernel-independent, pluggable code structure was designed to compute the power concurrently with kernel execution using OpenMP. Kernel optimizations observed for power behaviour included tiling using shared memory, memory bank conflict-free access, pinned memory on the host, variation in granularity, and special cases like FFT using a reduction operation. The results exposed a favourable working set size for each kernel, one that completes the computation in the most power-efficient way among the implemented kernel variations and optimizations.

Contributors: Arindam Sinha, Prateek Gupta
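The idea of sampling power concurrently with a running kernel, independent of which kernel is being measured, can be sketched as below. This is an assumption-laden illustration rather than the project's code: the project used OpenMP and NVML, whereas this sketch uses a Python thread, and `read_power_watts` is a hypothetical callable standing in for whatever power query is available. The workload is passed in as a plain callable, which is what makes the structure pluggable.

```python
import threading
import time


def measure_power(workload, read_power_watts, interval_s=0.005):
    """Run workload() while sampling power in a background thread.

    Returns (workload result, list of (timestamp, watts) samples).
    """
    samples = []
    stop = threading.Event()

    def sampler():
        while True:
            samples.append((time.perf_counter(), read_power_watts()))
            # wait() returns True once stop is set, which ends the loop.
            if stop.wait(interval_s):
                break

    thread = threading.Thread(target=sampler)
    thread.start()
    try:
        result = workload()
    finally:
        stop.set()
        thread.join()
    return result, samples


def energy_joules(samples):
    """Integrate power over time with the trapezoidal rule."""
    return sum(
        0.5 * (p0 + p1) * (t1 - t0)
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    )
```

A caller would wrap each kernel variant in a function, pass it as `workload`, and compare the integrated energy across working set sizes.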

### Metronome Synchronization

Metronomes tuned to the same frequency but initialized out of sync can self-synchronize if they are placed on a common base that is free to translate. Video recordings of this phenomenon can be found in many places on the internet. This interactive simulation allows the user to start each metronome individually and then release the base to allow synchronization of the coupled oscillators. The level of synchronization varies with the amount of damping in the metronomes. With a small amount of damping, the possibility of symmetric synchronization is high; this is when n metronomes have uniformly distributed phase shifts of lambda/n. As damping increases, the likelihood of complete synchronization (all metronomes ticking in unison) increases.

Francisco Mercado

### Simulation and Validation of Particle Suspension Using Chrono::Fluid

We employ a Lagrangian-Lagrangian (LL) numerical formalism to study two- and three-dimensional (2D, 3D) pipe flow of dilute suspensions of macroscopic neutrally buoyant rigid bodies at flow regimes with Reynolds numbers (Re) between 0.1 and 1400. A validation study of particle migration over a wide spectrum of Re and average volumetric concentrations demonstrates the good predictive attributes of the LL approach adopted herein. Using a scalable parallel implementation of the approach, 3D direct numerical simulation is used to show that (1) rigid body rotation affects the behavior of a particle laden flow; (2) an increase in neutrally buoyant particle size decreases radial migration; (3) a decrease in inter-particle distance slows down the migration and shifts the stable position further away from the channel axis; (4) rigid body shape influences the stable radial distribution of particles; (5) particle migration is influenced, both quantitatively and qualitatively, by the Reynolds number; and (6) the stable radial particle concentration distribution is affected by the initial concentration. The parallel LL simulation framework developed herein does not impose restrictions on the shape or size of the rigid bodies and was used to simulate 3D flows of dense, colloidal suspensions of up to 30,000 neutrally buoyant ellipsoids.

##### Videos:

Rigid Body Suspension in Channel Flow

Contributors: Arman Pazouki, Dan Negrut

### A Parallel GPU Implementation of the Absolute Nodal Coordinate Formulation With a Frictional/Contact Model for the Simulation of Large Flexible Body Systems

This contribution discusses how a flexible body formalism, specifically the Absolute Nodal Coordinate Formulation (ANCF), is combined with a frictional/contact model using a continuous contact force model to address many-body dynamics problems, i.e., problems with hundreds of thousands of rigid and deformable bodies. Since the computational effort associated with these problems is significant, the analytical framework is implemented to leverage the computational power available on today’s commodity Graphics Processing Unit (GPU) cards. The code developed is validated against ANSYS and FEAP results. The resulting simulation capability is demonstrated in conjunction with hair simulation.

Contributors: Naresh Khude, Daniel Melanz, Dan Negrut