GPU-based Polynomial Finite Element Matrix Assembly for Simplex Meshes
Abstract
In this paper, we present a matrix assembly technique for arbitrary polynomial order finite element simulations on simplex meshes for graphics processing units (GPU). Compared to the current state of the art in GPU-based matrix assembly, we avoid the need for an intermediate sparse matrix and perform assembly directly into the final, GPU-optimized data structure. Thereby, we avoid the resulting 180% to 600% memory overhead, depending on polynomial order, and associated allocation time, while simplifying the assembly code and using a more compact mesh representation. We compare our method with existing algorithms and demonstrate significant speedups.
Supporting Information
Filename | Description |
---|---|
cgf13581-sup-0001-S1.pdf70.6 KB | Supplement Material |
Please note: The publisher is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.
References
- Anzt H., Dongarra J., Flegar G., Quintana-Ortí E. S.: Batched Gauss-Jordan elimination for block-Jacobi preconditioner generation on GPUs. In Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores (2017), PMAM'17, pp. 1–10. doi:10.1145/3026937.3026940. 6
- Bender J., Deul C.: Adaptive cloth simulation using corotational finite elements. Computers & Graphics 37, 7 (2013), 820–829. doi:10.1016/j.cag.2013.04.008. 2
- Babuška I., Guo B. Q.: The h, p and h-p version of the finite element method: Basis theory and applications. Advances in Engineering Software 15, 3–4 (1992), 159–174. doi:10.1016/0965-9978(92)90097-Y. 10
- Bell N., Hoberock J.: Thrust 1.8.1, Mar. 2015. URL: https://thrust.github.io/. 5, 6
- Brooks Jr. F. P.: The Mythical Man-month, anniversary ed. Addison-Wesley Longman Publishing Co., Inc., 1995. ISBN 0-201-83595-9. 2
- CGNS Steering Committee: CGNS standard interface data structures 3.3 (rev 2), 2018. Retrieved 2018-06-08. URL: https://cgns.github.io/CGNS_docs_current/sids/sids.pdf. 4
- Cecka C., Lew A. J., Darve E.: Assembly of finite element methods on graphics processors. International Journal for Numerical Methods in Engineering 85, 5 (2010), 640–669. doi:10.1002/nme.2989. 3
- Davis T.: SuiteSparse: A suite of sparse matrix packages, 2018. Retrieved 2018-06-08. URL: http://faculty.cse.tamu.edu/davis/suitesparse.html. 8
- Di Pietro D. A., Ern A.: Mathematical Aspects of Discontinuous Galerkin Methods. Springer Berlin Heidelberg, 2012. doi: 10.1007/978-3-642-22980-0. 4
10.1007/978‐3‐642‐22980‐0 Google Scholar
- Guo X., Lange M., Gorman G., Mitchell L., Weiland M.: Developing a scalable hybrid MPI/OpenMP unstructured finite element model. Computers & Fluids 110 (2015), 227–234. doi: 10.1016/j.compfluid.2014.09.007. 2
- Grinspun E.: The Basis Refinement Method. PhD thesis, California Institute of Technology, 2003. URL: https://thesis.library.caltech.edu/2324/. 10
- Hatcher A.: Algebraic Topology. Cambridge University Pr., 2002. ISBN 0-521-79540-0. 2, 3
- Hu Y., Zhou Q., Gao X., Jacobson A., Zorin D., Panozzo D.: Tetrahedral meshing in the wild. ACM Transactions on Graphics 37, 4 (2018), 60:1–60:14. doi:10.1145/3197517.3201353. 2
- Irving G., Schroeder C., Fedkiw R.: Volume conserving finite element simulations of deformable models. ACM Transactions on Graphics 26, 3 (2007), 13:1–13:6. doi:10.1145/1276377.1276394. 2
- Koschier D., Lipponer S., Bender J.: Adaptive tetrahedral meshes for brittle fracture simulation. In Proceedings of the 2014 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (2014), SCA ‘14, pp. 57–66. 2
- Komatitsch D., Michéa D., Erlebacher G.: Porting a high-order finite-element earthquake modeling application to NVIDIA graphics cards using CUDA. Journal of Parallel and Distributed Computing 69, 5 (2009), 451–460. doi:10.1016/j.jpdc.2009.01.006. 2
- Kranz J.: Methodik und Richtlinien für die Konstruktion von laseradditiv gefertigten Leichtbaustrukturen. Light Engineering für die Praxis. Springer Vieweg, 2017. doi:10.1007/978-3-662-55339-8. 1
10.1007/978‐3‐662‐55339‐8 Google Scholar
- Liu Y., Jiao S., Wu W., De S.: GPU accelerated fast FEM deformation simulation. In 2008 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) (2008), pp. 606–609. doi:10.1109/APCCAS.2008.4746096. 1
- Liu W., Vinter B.: A framework for general sparse matrixmatrix multiplication on GPUs and heterogeneous processors. Journal of Parallel and Distributed Computing 85 (2015), 47–61. doi:10.1016/j.jpdc.2015.06.010. 3, 7, 10
- Mueller-Roemer J. S., Altenhofen C., Stork A.: Ternary sparse matrix representation for volumetric mesh subdivision and processing on GPUs. Computer Graphics Forum 36, 5 (2017), 59–69. doi:10.1111/cgf.13245. 2, 5
- Merill D.: CUB 1.8.0, Feb. 2018. URL: https://nvlabs.github.io/cub/. 5, 6
- Müller M., Gross M.: Interactive virtual materials. In Proceedings of Graphics Interface 2004 (2004), GI ‘04, pp. 239–246. 2
- NVIDIA Corporation: CUDA C programming guide, May 2018. Version 9.2. URL: http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf. 5
- Reguly I. Z., Giles M. B.: Finite element algorithms and data structures on graphical processing units. International Journal of Parallel Programming 43, 2 (2015), 203–239. doi:10.1007/s10766-013-0301-6. 3
- Thébault L., Petit E., Dinh Q.: Scalable and efficient implementation of 3D unstructured meshes computation: A case study on matrix assembly. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2015), PPoPP 2015, pp. 120–129. doi:10.1145/2688500.2688517. 3
- Vázquez F., Ortega G., Fernández J. J., Garzón E. M.: Improving the performance of the sparse matrix vector product with GPUs. In Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology (Bradford, United Kingdom, 2010), CIT ‘10, pp. 1146–1151. doi:10.1109/CIT.2010.208. 6
- Weber D., Bender J., Schnoes M., Stork A., Fellner D.: Efficient GPU data structures and methods to solve sparse linear systems in dynamics applications. Comput. Graphics Forum 32, 1 (2013), 16–26. doi:10.1111/j.1467-8659.2012.03227.x. 2, 5, 10
- Willberg C., Duczek S., Vivar Perez J. M., Schmicker D., Gabbert U.: Comparison of different higher order finite element schemes for the simulation of Lamb waves. Computer Methods in Applied Mechanics and Engineering 241–244 (2012), 246–261. doi:10.1016/j.cma.2012.06.011. 3
- Weber D., Kalbe T., Stork A., Fellner D., Goesele M.: Interactive deformable models with quadratic bases in Bernstein-Bézier-form. The Visual Computer 27, 6 (2011), 473–483. doi:10.1007/s00371-011-0579-6. 2, 3
- Weber D., Mueller-Roemer J. S., Altenhofen C., Stork A., Fellner D.: Deformation simulation using cubic finite elements and efficient p-multigrid methods. Computers & Graphics 53, 2 (2015), 185–195. doi:10.1016/j.cag.2015.06.010. 3, 7
- Zhu B., Lee M., Quigley E., Fedkiw R.: Codimensional non-Newtonian fluids. ACM Transactions on Graphics 34, 4 (2015), 115:1–115:9. doi:10.1145/2766981. 10
- Zhu B., Quigley E., Cong M., Solomon J., Fedkiw R.: Codimensional surface tension flow on simplicial complexes. ACM Transactions on Graphics 33, 4 (2014), 111:1–111:11. doi:10.1145/2601097.2601201. 10
- Zayer R., Steinberger M., Seidel H.: A GPU-adapted structure for unstructured grids. Computer Graphics Forum 36, 2 (2017), 495–507. doi:10.1111/cgf.13144. 2, 3, 5, 7, 10
- Zayer R., Steinberger M., Seidel H.: Sparse matrix assembly on the GPU through multiplication patterns. In 2017 IEEE High Performance Extreme Computing Conference (HPEC) (2017), pp. 1–8. doi:10.1109/HPEC.2017.8091057. 3, 7, 8, 9, 10
- Zienkiewicz O. C., Taylor R. L.: Finite Element Method: Volume 1, fifth ed. Butterworth-Heinemann, 2000. ISBN 0-750-65049-4. 3