Short Courses of SSCAD 2024

Authors

Arthur F. Lorenzon (ed)
UFRGS
Álvaro Luiz Fazenda (ed)
UNIFESP

Keywords:

Big Data, Machine Learning, Myriad Interface, HPCC Systems, Profiling, Scalability, Parallel Applications, Parallel Scalability Suite, Parallel Programming, MPI, OpenMP Offloading, Architectural Simulation, gem5, Quantum Computing, IBM/Qiskit, Program Analysis and Optimization

Synopsis

This edition of the Short Courses of SSCAD 2024 features six short courses presented during the 25th Brazilian Symposium on High Performance Computing Systems, held from October 23 to 25, 2024, in São Carlos, SP, Brazil. The first chapter covers essential concepts for processing and analyzing massive data volumes, applying machine learning algorithms in practice on the HPCC (High-Performance Computing Cluster) platform. The second chapter introduces the Parallel Scalability Suite, which supports evaluating the behavior of parallel applications through profiling and scalability visualization. The third chapter explores hybrid parallel programming techniques based on the MPI and OpenMP Offloading standards, with an emphasis on accelerator-based parallelism models. The fourth chapter introduces fundamental concepts of architectural simulation with the gem5 simulator and examines how simulation lets designers explore, verify, and optimize architectures by modeling their behavior and interactions with key system components. In light of advances in quantum computing, the fifth chapter demonstrates how to develop algorithms for a quantum computing architecture using the IBM/Qiskit development kit. Finally, the sixth chapter reviews definitions of performance analysis, the key techniques employed, and some tools for analyzing the performance of parallel applications.

Publication date

October 23, 2024

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Publication format: Full Volume

ISBN-13

978-85-7669-610-0