Abstract
The recent development of massively parallel processors, such as graphics processing units (GPUs), has already proven very effective for a vast number of scientific applications. One major benefit of the GPU is that it is already a standard device in most affordable desktop computers. Thus, high-performance parallel computing is now in principle accessible to many scientific users, regardless of their economic resources. Although GPUs, and parallel architectures in general, are highly effective, they make it challenging for software developers to exploit that efficiency. Sequential legacy codes are not always easily parallelized, and the time spent on conversion might not pay off in the end. We present a highly generic C++ library for fast assembly of partial differential equation (PDE) solvers that aims to utilize the computational resources of GPUs. The library requires a minimum of GPU computing knowledge, while still offering the possibility of customizing user-specific solvers at the kernel level if desired. Spatial differential operators are based on matrix-free, flexible-order finite difference approximations. These matrix-free operators minimize both memory consumption and main memory access, two important features for efficient GPU utilization and for enabling the solution of large problems. To solve the large linear systems of equations arising from the discretization of PDEs, the library includes a set of common iterative solvers. All iterative solvers are based on template arguments, such that vector and matrix classes, along with their underlying implementations, can be freely interchanged or new schemes developed without much coding effort. The generic nature of the library, along with a predefined set of interface rules, allows us to set up the components of a PDE solver through type binder definitions.
We encourage this use of parameterized binding objects, as it allows the user to control the assembly of PDE solvers at a high level of abstraction, without necessarily having to change internal code. We illustrate the use of our library by assembling a tool for fast and scalable simulation of fully nonlinear free-surface water waves over uneven depths [1, 2, 3]. The wave model is based on the potential flow formulation, whose computational bottleneck is the efficient solution of a fully three-dimensional Laplace problem. A robust h- or p-multigrid preconditioned defect correction method is applied to keep storage low and algorithmic efficiency high. Performance analysis of the implemented wave model shows that its performance is comparable to that of a dedicated (non-library) reference GPU-based solver. Work in progress also addresses the problem of simulating water waves at very large scales; we have therefore added an MPI layer to support domain decomposition preconditioning using multiple GPUs. The wave tool is intended for efficient analysis of coastal engineering problems and for interactive real-time computing of ship-wave problems. Such applications benefit greatly from high-performance software. We will report our recent progress on the development of the new tool for coastal engineering.
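To make the ideas of matrix-free operators, template-based iterative solvers, and type binders concrete, the following is a minimal CPU-only sketch. It is not the library's API: the names `NegLaplace1D`, `Types1D`, `dot`, and `cg_solve` are hypothetical. The one-dimensional second-order stencil and the conjugate gradient iteration are assumptions chosen for brevity. They stand in for the flexible-order operators and the multigrid preconditioned defect correction method described above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical matrix-free operator: applies -d^2/dx^2 with a second-order
// central stencil and homogeneous Dirichlet boundaries. No matrix is stored;
// the stencil is evaluated on the fly for every grid point.
template <typename T>
struct NegLaplace1D {
    T h;  // grid spacing

    void apply(const std::vector<T>& u, std::vector<T>& out) const {
        const std::size_t n = u.size();
        assert(out.size() == n);
        for (std::size_t i = 0; i < n; ++i) {
            const T left  = (i > 0)     ? u[i - 1] : T(0);
            const T right = (i + 1 < n) ? u[i + 1] : T(0);
            out[i] = (T(2) * u[i] - left - right) / (h * h);
        }
    }
};

// Hypothetical type binder: collects the component types of a solver in one
// place, so they can be exchanged without touching the solver code.
template <typename T>
struct Types1D {
    using value_type    = T;
    using vector_type   = std::vector<T>;
    using operator_type = NegLaplace1D<T>;
};

// Inner product for any indexable vector type with size() and value_type.
template <typename Vec>
typename Vec::value_type dot(const Vec& a, const Vec& b) {
    typename Vec::value_type s = 0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Unpreconditioned conjugate gradient, written only against the types named
// by the binder. Returns the number of iterations performed.
template <typename Types>
int cg_solve(const typename Types::operator_type& A,
             const typename Types::vector_type& b,
             typename Types::vector_type& x,
             typename Types::value_type tol, int max_iter) {
    using T   = typename Types::value_type;
    using Vec = typename Types::vector_type;

    const std::size_t n = b.size();
    Vec r(n), p(n), Ap(n);

    A.apply(x, Ap);
    for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];
    p = r;

    T rr = dot(r, r);
    int it = 0;
    while (it < max_iter && std::sqrt(rr) > tol) {
        A.apply(p, Ap);
        const T alpha = rr / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const T rr_new = dot(r, r);
        const T beta = rr_new / rr;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++it;
    }
    return it;
}
```

In this sketch, changing the precision or the operator amounts to defining a different binder and calling `cg_solve<OtherBinder>(...)`; the solver body stays unchanged. A GPU version would additionally need a device vector type and kernels for `apply` and `dot`, which are not shown here.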