Short Courses of the 25th Regional School of High Performance Computing from Southern Brazil
Keywords: Heterogeneous Parallel Programming, General-Purpose Graphics Processing Units (GPGPUs), C++, Application Parallelization, OpenMP, OpenACC, Multicore Processors, Accelerators, CUDA, OpenCL, SYCL

Synopsis
The Short Courses of the 25th Regional School of High Performance Computing from Southern Brazil (ERAD/RS) present the contributions of Brazilian researchers in the field of parallel computing. The book comprises three chapters, all devoted to high performance computing and converging on a common theme: programming APIs for accelerators.
In the first chapter, titled “High-Performance Programming on GPUs with C++”, the author provides an overview of parallel programming APIs for GPUs in C++, ranging from low-level solutions such as CUDA and OpenCL to more portable options like OpenMP and SYCL.
In the second chapter, “Parallel Programming Directives”, the authors introduce the parallel programming APIs OpenMP and OpenACC. These directive-based APIs are regarded as simpler routes to parallelizing code, offering an accessible entry point to parallelization mechanisms.
In the third chapter, titled “Advanced Multi-GPU Programming with OpenACC”, the authors focus on using the OpenACC API for programming with accelerators, including aspects of heterogeneity.
In addition to these minicourses, whose texts are presented here, ERAD/RS also featured other minicourses delivered exclusively at the event.
Chapters
1. High-Performance Programming on GPUs with C++
2. Parallel Programming Directives
3. Advanced Multi-GPU Programming with OpenACC
