SRISA Proceedings

Supercomputer Job Management System Simulator with External Control Interface

Abstract

Simulators are popular tools for studying supercomputer workload managers as complex multiuser systems. The paper formulates requirements for a simulator of an HPC system that includes geographically distributed supercomputers. Compliance with these requirements can be ensured by implementing an external simulator control interface. Modern HPC workload manager simulators are analyzed against the stated requirements. An architecture for a workload manager simulator with an external control interface is proposed, and the first results of using the Elytra simulator, which implements this architecture, are discussed.

About the Authors

D. Lyakhovets
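Joint Supercomputer Center of the Russian Academy of Sciences (JSCC RAS), branch of the Scientific Research Institute for System Analysis of the Russian Academy of Sciences (SRISA RAS)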
Russian Federation


A. Baranov
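Joint Supercomputer Center of the Russian Academy of Sciences (JSCC RAS), branch of the Scientific Research Institute for System Analysis of the Russian Academy of Sciences (SRISA RAS)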
Russian Federation


A. Kudrin
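Joint Supercomputer Center of the Russian Academy of Sciences (JSCC RAS), branch of the Scientific Research Institute for System Analysis of the Russian Academy of Sciences (SRISA RAS)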
Russian Federation


For citations:


Lyakhovets D., Baranov A., Kudrin A. Supercomputer Job Management System Simulator with External Control Interface. SRISA Proceedings. 2024;14(4):75-83. (In Russ.)

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2225-7349 (Print)
ISSN 3033-6422 (Online)