|
Abstract.
This article supplements the previously published review of Clusterix-like DBMSs developed at KNRTU-KAI with a number of practically important points that may interest specialists and potential customers. These are conservative-type DBMSs, in which the analytically processed data are updated only occasionally. The additional research on Clusterix-New aims to: 1) identify the maximum achievable acceleration δυ of its operation as the database size VDB and the number of working nodes h of the cluster platform grow; 2) determine an appropriate choice of h for a given VDB from the condition of obtaining acceptable efficiency eff = δυ / h; 3) determine ways to improve the performance of Clusterix-New on moving to the Big Data class; 4) compare its latest version with Apache Spark 3.5, which has a high DBMS rating; 5) distinguish it from PerformSys, another original DBMS focused on batch query processing.
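The efficiency criterion eff = δυ / h can be illustrated with a minimal sketch. It assumes that the acceleration δυ has already been measured for several node counts h at a fixed database size. The function names, the threshold, and the sample values are hypothetical and are not taken from the article.

```python
from typing import Dict, Optional


def efficiency(speedup: float, nodes: int) -> float:
    """Return eff = speedup / nodes, the ratio given in the abstract."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes


def largest_acceptable_h(speedups: Dict[int, float], min_eff: float) -> Optional[int]:
    """Return the largest node count whose efficiency is at least min_eff.

    Returns None if no node count meets the threshold.
    """
    acceptable = [h for h, s in speedups.items() if efficiency(s, h) >= min_eff]
    return max(acceptable) if acceptable else None


# Hypothetical accelerations for one database size (not data from the article).
example = {1: 1.0, 2: 1.8, 4: 3.2, 8: 4.8, 16: 6.4}
print(largest_acceptable_h(example, min_eff=0.7))
```

Picking the largest h that still meets the threshold is only one possible reading of "appropriate choice"; the condition actually used in the article may differ.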
Keywords:
Clusterix-New DBMS, maximal achievable acceleration, efficiency, choice of number of nodes, moving to Big Data class, competitiveness, PerformSys DBMS.
DOI 10.14357/20718632250211
EDN SKFVYC
PP. 123-134.
|