D. P. Matalov, E. L. Pliskin
Making a Web service from OCR SDK
Abstract. This article summarizes the authors’ experience of developing a web service (WS) based on a software development kit (SDK) for optical character recognition (OCR) of documents. We consider issues of WS stability and performance, including the ability not to lose data under high load and after a restart; the ability to detect errors promptly and to limit their spread and duration; and deterministic WS behavior when multiple requests are processed in parallel. High WS performance means that, in addition to the cost of the OCR engine itself, the overhead of receiving web requests and sending web responses to clients remains moderate. The described solution can be used to create a web service from any SDK that lets a developer process input documents and obtain output files from them, not necessarily in connection with optical recognition technologies.

Keywords: SDK, SOAP, REST, Java, Web service, optical character recognition, OCR, multithreading.

PP. 32-43.

DOI 10.14357/20718632190204
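The abstract states requirements rather than mechanisms. One common way to address the first two (not losing accepted requests on restart, and bounding how long a failing document can affect a caller) can be sketched in Java: persist each incoming document to a spool directory before processing, run recognition on a fixed-size thread pool, and bound the wait with a timeout. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `SpoolingRecognizer`, the `<id>.req` spool layout, and the `Function<byte[], String>` stand-in for the OCR SDK call are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Hypothetical sketch, not the authors' implementation: the class name, the
// "<id>.req" spool layout and the timeout policy are illustrative assumptions.
class SpoolingRecognizer implements AutoCloseable {
    private final Path spoolDir;                    // accepted requests are persisted here
    private final ExecutorService workers;          // fixed-size pool caps parallel recognition
    private final Function<byte[], String> engine;  // stand-in for the real OCR SDK call
    private final long timeoutMillis;               // how long a caller waits for one document

    SpoolingRecognizer(Path spoolDir, int threads,
                       Function<byte[], String> engine, long timeoutMillis) throws IOException {
        this.spoolDir = Files.createDirectories(spoolDir);
        this.workers = Executors.newFixedThreadPool(threads);
        this.engine = engine;
        this.timeoutMillis = timeoutMillis;
    }

    /** Persists the document, recognizes it, and deletes the spool file only on success. */
    String submit(String id, byte[] document) throws IOException, InterruptedException {
        // id is assumed to be a safe file-name token (no path separators).
        Path file = spoolDir.resolve(id + ".req");
        // From here on the request survives a process restart; no fsync is done,
        // so an operating-system crash is not covered by this sketch.
        Files.write(file, document);
        Future<String> result = workers.submit(() -> engine.apply(document));
        try {
            String text = result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            Files.delete(file); // completed: nothing left to recover
            return text;
        } catch (TimeoutException e) {
            // Stop waiting and request interruption of the worker; the spool file
            // is kept, so the request stays visible to pendingIds().
            result.cancel(true);
            throw new IOException("recognition of " + id + " timed out", e);
        } catch (ExecutionException e) {
            // The engine threw: the error is reported to this caller only.
            throw new IOException("recognition of " + id + " failed", e.getCause());
        }
    }

    /** Ids of requests that were accepted but never completed, e.g. before a restart. */
    List<String> pendingIds() throws IOException {
        List<String> ids = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(spoolDir, "*.req")) {
            for (Path f : files) {
                String name = f.getFileName().toString();
                ids.add(name.substring(0, name.length() - ".req".length()));
            }
        }
        return ids;
    }

    @Override
    public void close() {
        workers.shutdownNow(); // interrupts running tasks; queued tasks are not started
    }
}
```

On startup, a service built along these lines would call `pendingIds()` and resubmit whatever the previous run left behind. Keeping the spool file after a failure or timeout means such requests are not silently lost, at the price of needing some retry limit so that one malformed document cannot fail over and over.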