Journal “Information Technologies and Computing Systems”: D. R. Potapov, Implementation of the module for determining complex load parameters for self-adapting data containers

Viewing issue 2019 / 01

APPLIED ASPECTS OF COMPUTER SCIENCE

P. B. Bogdanov, O. J. Sudareva JPEG image decoding on the KOMDIV microprocessors

Yu. G. Drevs, N. A. Pavlov Multi-agent simulation of the extremism dynamics

A. A. Zuenko, S. Yu. Yakovlev, A. S. Shemyakin, Yu. A. Oleynik Application of constraint programming technology for planning action in emergency situations

CONTROL SYSTEMS

K. S. Ginsberg The structure identification methodology conceptual basis for the purpose of creating automatic control systems with the required properties

P. A. Kurnikov, N. V. Krapuhina Phase space reconstruction of high-loaded caching mechanism dynamics in information systems

SOFTWARE ENGINEERING

V. B. Gusev, P. V. Kurakin Software instruments for verification of models of complex interactions of economic factors

BIOINFORMATICS AND MEDICINE

E.B. Kleymenova, L.P. Yashina Digital technologies for improving the quality and safety of acute cardiovascular disease management

DATA PROCESSING AND ANALYSIS

D. R. Potapov Implementation of the module for determining complex load parameters for self-adapting data containers

A. V. Vershinina, I. E. Bocharova, E. N. Koshkina, S. N. Osipov Assessment of Innovation Start-Ups in e-Sport Industry

SECURITY ISSUES

V. A. Fedorenko, E. V. Navrotskaya Criteria and algorithm of the evaluation of the uniqueness of the complexes of matching tracks in the traces on the shot bullets


D. R. Potapov Implementation of the module for determining complex load parameters for self-adapting data containers
Abstract. In applications with a large amount of static data, or data that is mostly read, applying a cache greatly improves performance. To achieve maximum efficiency in an adaptive data storage implementation, the cache size can be changed dynamically during execution based on the difference in speed between the main container and the cache, and on the container load. The main load parameter is the set of requested data, which in the general case can be described by a Gaussian distribution. In the real world, however, the container load is mostly a combination of simple loads, because requests to the data storage can come from many applications or different tasks. Thus, the parameters of these simple loads must be identified to achieve maximum cache efficiency. This paper presents the results of implementing a module for determining complex load parameters for self-adapting data containers. It also explains the choice of the EM modification and of k-means++ initialization, and briefly describes the module structure. Clustering quality (for one and many clusters, concept drift, and time frame) and module execution time are analyzed. Based on the test results, it can be concluded that the module determines complex load parameters well enough to be used effectively in self-adapting data containers.

Keywords: data storage, cache efficiency, optimal data storage, adaptive data container, container load, Gaussian mixture model, clustering, EM, k-means.

PP. 87-95.

DOI 10.14357/20718632190108

References

1. Potapov, D. R., M. A. Artemov, and E. S. Baranovskii. 2017. Obzor uslovii adaptatsii samoadaptiruyushchikhsya assotsiativnykh konteinerov dannykh [Review adaptation conditions of adaptive associative data storages]. Vestnik Voronezhskogo gosudarstvennogo universiteta. Seriya: Sistemnyi analiz i informatsionnye tekhnologii 1: 112-119.
2. Zobov, V. V., and K. E. Seleznev. 2014. Instrument dlya modelirovaniya nagruzki na konteinery dannykh [Tool for modeling the load on data containers]. Materialy chetyrnadtsatoi nauchno-metodicheskoi konferentsii «Informatika: problemy, metodologiya, tekhnologii» 3: 154-161.
3. Potapov, D. R. 2018. Existing methods of multidimensional «key-value» storages construction for using in adaptive data storages review. Journal of Applied Informatics 2(74): 69-82.
4. Potapov, D. R., M. A. Artemov, E. S. Baranovskii, and K. E. Seleznev. 2017. Obzor metodov postroenija kontejnerov dannyh «kljuch-znachenie» dlja ispol'zovanija v samoadaptirujushhihsja kontejnerah dannyh [Existing methods of “key-value” storages construction for using in adaptive data storages review]. Kibernetika i programmirovanie 5: 14-45.
5. Potapov, D. R. Issledovanie jeffektivnosti primenenija kesha dlja ispol'zovanija v samoadaptirujushhihsja kontejnerah dannyh [Cache efficiency research for using in adaptive data storage]. (In Russian, unpublished.)
6. Bishop, C. 2006. Pattern Recognition and Machine Learning. Heidelberg: Springer. 738 p.
7. McLachlan, G., and D. Peel. 2004. Finite Mixture Models. NY: John Wiley & Sons. 419 p.
8. Korolev, V. U. 2007. EM-algoritm, ego modifikacii i ih primenenie k zadache razdelenija smesej verojatnostnyh raspredelenij. Teoreticheskij obzor [EM-algorithm, its modifications and their application to the problem of separation of mixtures of probability distributions. Theoretical review]. Moscow: IPI RAN. 102 p.
9. McLachlan, G., and T. Krishnan. 1997. The EM algorithm and extensions. Wiley series in probability and statistics. NY: John Wiley & Sons. 400 p.
10. Aggarwal, C. C., J. Han, J. Wang, and P. S. Yu. 2003. A framework for clustering evolving data streams. Proceedings of the 29th International Conference on Very Large Data Bases. Berlin. 81-92.
11. Liang, P., and D. Klein. 2009. Online EM for unsupervised models. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Boulder. 611-619.
12. Blomer, J., and K. Bujna. 2013. Simple methods for initializing the EM algorithm for Gaussian mixture models. Computing Research Repository. Vol. abs/1312.5946.
13. Baudry, J.-P., and G. Celeux. 2015. EM for Mixtures. Statistics and Computing 25(4): 713-726.
14. Melnykov, V., and I. Melnykov. 2012. Initializing the EM algorithm in Gaussian mixture models with an unknown number of components. Computational Statistics & Data Analysis 56(6): 1381-1395.
15. Biernacki, C., G. Celeux, and G. Govaert. 2003. Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis 41(3-4): 561-575.
16. Meila, M., and D. Heckerman. 1998. An Experimental Comparison of Several Clustering and Initialization Methods. Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence. San Francisco. 386-395.
17. Arthur, D., and S. Vassilvitskii. 2007. K-means++: The Advantages of Careful Seeding. Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms. New Orleans. 1027-1035.
18. Bahmani, B., B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. 2012. Scalable k-means++. Proceedings of the VLDB Endowment 5(7): 622-633.
19. Zhao, W., H. Ma, and Q. He. 2009. Parallel K-means clustering based on MapReduce. Proceedings of the 1st International Conference on Cloud Computing. Heidelberg. 674-679.
20. Xu, Y., W. Qu, Z. Li, C. Ji, Y. Li, and Y. Wu. 2014. Fast Scalable k-means++ Algorithm with MapReduce. Algorithms and Architectures for Parallel Processing. ICA3PP 2014 8631: 15-28.
21. Unsupervised machine learning with multivariate Gaussian mixture model which supports both offline data and real-time data stream. https://github.com/lukapopijac/gaussian-mixture-model
22. Kruglov, V. M., and V. U. Korolev. 1990. Predel'nye teoremy dlja sluchajnyh sum [Limit theorems for random sums]. Moscow: Moscow University Publishing. 269 p.
23. Gmurman, V. E. 2014. Teoriya veroyatnostej i matematicheskaya statistika: uchebnik dlya prikladnogo bakalavriata [Theory of Probability and Mathematical Statistics: A Textbook for Applied Bachelor Degree]. Moscow: Urait. 479 p.
24. Bradley, P. S., U. M. Fayyad, and C. A. Reina. 1999. Scaling EM (Expectation-Maximization) Clustering to Large Databases. Microsoft Research Technical Report MSR-TR-98-35.
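As a rough illustration of the load model in the abstract, where the stream of requested keys is treated as a mixture of simple Gaussian loads with density p(x) = Σ_j π_j N(x | μ_j, σ_j²), here is a minimal Python sketch. It assumes one-dimensional numeric keys and a known number of components k, seeds the component means with k-means++, and then runs plain batch EM. It is not the specific EM modification or the module structure chosen in the paper, which the abstract does not describe; the function names `kmeans_pp_seeds` and `fit_gmm_1d` and all default parameters are hypothetical.

```python
import math
import random

# Hypothetical sketch: 1-D numeric keys, fixed k, plain batch EM
# (not the paper's EM modification, which the abstract does not specify).


def kmeans_pp_seeds(xs, k, rng):
    """k-means++ seeding: the first center is drawn uniformly, each next one
    with probability proportional to D(x)^2, the squared distance to the
    nearest center chosen so far."""
    centers = [rng.choice(xs)]
    while len(centers) < k:
        d2 = [min((x - c) ** 2 for c in centers) for x in xs]
        total = sum(d2)
        if total == 0.0:
            # Every point coincides with a center; fall back to a uniform pick.
            centers.append(rng.choice(xs))
            continue
        r = rng.random() * total
        acc = 0.0
        for x, w in zip(xs, d2):
            acc += w
            if acc >= r:
                centers.append(x)
                break
        else:
            # Guard against floating-point round-off in the cumulative sum.
            centers.append(xs[-1])
    return centers


def fit_gmm_1d(xs, k, iters=200, tol=1e-9, seed=0, min_var=1e-6):
    """Batch EM for a 1-D Gaussian mixture with k-means++ initialization.
    Returns (weights, means, variances) sorted by mean."""
    rng = random.Random(seed)
    n = len(xs)
    means = kmeans_pp_seeds(xs, k, rng)
    mu = sum(xs) / n
    # Start every component with the pooled variance and an equal weight.
    variances = [max(sum((x - mu) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(iters):
        # E-step: responsibilities computed with log-sum-exp to avoid underflow.
        resp = []
        ll = 0.0
        for x in xs:
            logs = [math.log(max(w, 1e-300)) - 0.5 * math.log(2.0 * math.pi * v)
                    - (x - m) ** 2 / (2.0 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(l - top) for l in logs))
            ll += lse
            resp.append([math.exp(l - lse) for l in logs])
        if ll - prev_ll < tol:
            break  # log-likelihood stopped improving
        prev_ll = ll
        # M-step: re-estimate mixture weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            if nj < 1e-12:
                continue  # empty component: keep its previous mean and variance
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, min_var)
    comps = sorted(zip(weights, means, variances), key=lambda c: c[1])
    return [c[0] for c in comps], [c[1] for c in comps], [c[2] for c in comps]
```

For example, on keys drawn from two well-separated Gaussian request streams, `fit_gmm_1d(data, 2)` should recover weights, means and variances close to the generating ones. k-means++ seeding makes it likely that the initial means land in different streams, which is the motivation for that initialization; the log-sum-exp E-step keeps keys far from every component from underflowing to zero probability.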
