|
I. M. Janiszewski, V. V. Arlazarov, D. G. Slugin. Achieving Statistical Dependence of the CNN Response on the Input Data Distortion for OCR Problem |
|
Abstract. The paper proposes an approach to training a convolutional neural network that uses information about the level of distortion in the input data. Training is modified by adding an extra layer, which is removed afterwards, so the architecture of the original network is unchanged. The approach is evaluated on OCR of MNIST images distorted with Gaussian blur, using a network with the LeNet5 architecture. It causes no loss in recognition quality and yields a significant error-free zone in the responses on the test data, which is absent under the traditional approach to training. The network responses are statistically dependent on the distortion level of the input image, and the relationship between them is strong.

Keywords: convolutional neural networks, pattern recognition, machine learning, distortion, Gaussian blur, OCR, MNIST

Pp. 94-101.

DOI: 10.14357/20718632190409
|
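To make the kind of measurement described in the abstract concrete, here is a minimal sketch of how a monotonic dependence between blur level and a scalar response could be quantified. It is an illustration under stated assumptions, not the authors' method: the toy 9x9 image stands in for an MNIST digit, the "response" is simply the peak intensity of the blurred image rather than a CNN output, and Spearman's rank correlation is assumed as the dependence measure, since the abstract does not name one. All function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3*sigma and normalized to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel; pixels outside the image count as 0."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = x + k - radius
                if 0 <= src < width:
                    acc += w * row[src]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Toy 9x9 image with one bright pixel in the centre (stand-in for a digit image).
SIZE = 9
image = [[0.0] * SIZE for _ in range(SIZE)]
image[SIZE // 2][SIZE // 2] = 1.0

sigmas = [0.5, 1.0, 1.5, 2.0, 2.5]
# Hypothetical "response": peak intensity after blurring (NOT a CNN output).
responses = [max(max(row) for row in gaussian_blur(image, s)) for s in sigmas]
rho = spearman(sigmas, responses)
print(rho)
```

In this toy setting the peak intensity falls as the blur grows, so the rank correlation is strongly negative. With a trained network, `responses` would instead hold the network's output for each distorted test image, and `sigmas` the corresponding blur levels.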