Projects

Document Visual Question Answering [2019 - Present]
- This project was conceived as a joint effort between CVIT, IIIT Hyderabad and CVC, UAB Barcelona. The primary focus of the project is to motivate the document analysis community to look beyond traditional document analysis tasks and to strive for building systems with true “Document Understanding” capabilities. In partnership with industry partners, we have introduced multiple tasks for QA/VQA on document images and conducted open challenges for these new tasks at leading computer vision and document analysis conferences. More details are available at docvqa.org
Scene Text Understanding [2015 - 2019]
- CVIT has been working in this space for the last few years and made significant contributions to scene text recognition prior to the deep learning wave. The IIIT5k scene text dataset is one of the most widely used datasets in this field. I joined this project recently, and I am looking into unconstrained scene text recognition in a seq2seq framework. More details on our work in this area can be found here
Indian Languages OCR [2013 - Present]
- IIIT Hyderabad has been involved in the development of OCR for Indian languages since the conception of the DLI project by the Government of India. I was fortunate to join this group and contribute towards a crucial technology in the Indian language computing space. Despite the many challenges that Indian languages pose compared to their Latin counterparts, we achieved state-of-the-art recognition accuracies in more than 12 Indian languages. A comprehensive technical report on the use of a CTC-based, segmentation-free approach for OCR of Indian languages is presented in this work. In this approach, text lines are transcribed directly into sequences of Unicode characters, without first segmenting them into smaller units. At this point we are trying to make our system available to the public. We are also looking forward to possible collaborations on digitizing vast amounts of Indian language documents and on the development of assistive technologies for the visually challenged. The OCRs developed as part of this effort are used in other related projects such as the [Government of India project on Information access from Indian language document collections][11] and the Audiobooks project at IIIT Hyderabad.
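As a rough illustration of what segmentation-free transcription with CTC means, the sketch below shows a minimal greedy CTC decoder. It is not the project's code: the function name, the blank index, the toy alphabet, and the per-frame scores are all assumptions made for the example. The idea is that a network emits one score vector per frame of the line image, and the decoder collapses repeated labels and drops blanks to obtain the Unicode string, so no character boundaries are ever needed.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: Sequence[str]) -> str:
    """Greedy (best-path) CTC decoding of one text line.

    frame_scores holds one score vector per frame; index 0 is the blank,
    and index i (i >= 1) corresponds to alphabet[i - 1].
    """
    # Pick the highest-scoring label in every frame.
    best_path: List[int] = [
        max(range(len(scores)), key=lambda i: scores[i])
        for scores in frame_scores
    ]

    decoded: List[str] = []
    previous = BLANK
    for label in best_path:
        # Keep a label only if it is not blank and not a repeat of the
        # immediately preceding frame.
        if label != BLANK and label != previous:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


# Hypothetical three-character alphabet and six frames of scores.
alphabet = ["क", "म", "ल"]
frames = [
    [0.10, 0.80, 0.05, 0.05],
    [0.20, 0.70, 0.05, 0.05],
    [0.90, 0.04, 0.03, 0.03],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.10, 0.10, 0.20, 0.60],
]
transcription = ctc_greedy_decode(frames, alphabet)
```

Greedy decoding is the simplest choice; a real system might use beam search or a language model instead.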
Audio Books for the Visually Challenged [2013]
- This project was an offshoot of the OCR project. Here we worked in collaboration with the Speech lab in IIIT to make audio books in the DAISY standard for the visually challenged. An OCR+TTS workflow was set up, starting from the scanning of the document. Our team developed web-based and desktop-based apps for audio book playback and deployed these applications at various schools for visually impaired children. We did this as a pilot project in 2013 to assess the performance of both the OCR and the speech synthesizer and to collect feedback from the visually challenged community. At present CVIT has a dedicated team that makes audiobooks on a regular basis based on requests from the visually impaired community. More details on this effort can be found here. This project uses the OCRs that were developed as part of my work on Indian languages OCR.
Router Security using raw sockets @ Cisco, Bangalore - Undergrad project [2008]
- The goal was to develop an access control framework for router security. Access Control Lists (ACLs) filter network traffic by controlling whether routed packets are forwarded at the router. The decision whether to drop or forward a packet is based on the filter criteria set using the ACL.
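The forward-or-drop decision described above can be sketched as follows. This is only an illustration of the decision logic, not the framework built in the project, and it does not touch raw sockets. The class names, the rule fields, the first-match evaluation order, and the implicit deny at the end of the list are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class AclRule:
    """One hypothetical ACL entry; None means 'match any'."""
    action: str                      # "permit" or "deny"
    source: str                      # source network in CIDR notation
    protocol: Optional[str] = None   # e.g. "tcp", "udp"
    dest_port: Optional[int] = None


@dataclass(frozen=True)
class Packet:
    source_ip: str
    protocol: str
    dest_port: int


def rule_matches(rule: AclRule, packet: Packet) -> bool:
    """Return True if every criterion set on the rule matches the packet."""
    if ip_address(packet.source_ip) not in ip_network(rule.source):
        return False
    if rule.protocol is not None and rule.protocol != packet.protocol:
        return False
    if rule.dest_port is not None and rule.dest_port != packet.dest_port:
        return False
    return True


def should_forward(acl: List[AclRule], packet: Packet) -> bool:
    """Apply the first matching rule; drop the packet if nothing matches."""
    for rule in acl:
        if rule_matches(rule, packet):
            return rule.action == "permit"
    return False  # assumed implicit deny at the end of the list


# Hypothetical ACL: block telnet from 10.0.0.0/8, allow its other traffic.
acl = [
    AclRule("deny", "10.0.0.0/8", protocol="tcp", dest_port=23),
    AclRule("permit", "10.0.0.0/8"),
]
telnet_allowed = should_forward(acl, Packet("10.1.2.3", "tcp", 23))
web_allowed = should_forward(acl, Packet("10.1.2.3", "tcp", 80))
outside_allowed = should_forward(acl, Packet("192.168.1.1", "udp", 53))
```

Because rules are evaluated in order, the more specific deny entry has to come before the broader permit entry for the same network.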

Minesh Mathew