Learn to read

Javier Sánchez Rois, Gradiant

The ability of new devices to understand their surrounding environment increases year by year. It is now common for new mobile applications and web pages to use what they see and hear to create new ways of interacting with the user, to provide new forms of access and authentication, or simply to gather information about their environment. Demand for the technologies that enable this kind of learning is rising, and technology giants are showing growing interest in such systems. As with object detection and speech recognition, text recognition is no stranger to this demand; indeed, it is one of the keys to the future of smart devices.
Optical Character Recognition (OCR) is a veteran technology. With its origins in the 1970s, its evolution has been fueled by enormous progress in areas such as image processing and pattern recognition. In fact, recognizing text in documents and in images captured under controlled conditions is considered a mature technology. Proof of this is the range of software solutions available, many of them embedded in text processors or included in cloud storage services, alongside engines such as Google’s Tesseract [1]. However, the era of mobile devices poses a new challenge: locating and recognizing text in uncontrolled conditions, where a number of factors (irregular illumination, perspective changes, occlusions, etc.) hamper the process.
Interest in developing algorithms capable of reading text in natural environments has given rise to academic events such as the ICDAR [2] conferences, where many innovative techniques and studies have been presented. Moreover, several new text recognition systems have appeared in recent years: the Naptha plugin [3], for example, can detect and recognize text in any element displayed in a web browser, and Google recently presented a technology [4] capable of reading more than 90% of text CAPTCHAs. Interest in such systems grows even further when we consider the emergence of new smart devices (such as Google Glass) and the rise of augmented reality.
Although building a system able to recognize the text we see on the street every day remains a major challenge for existing technologies, we will soon witness the appearance of new devices able to read for us. GRADIANT wants to be part of this great challenge and, in projects such as MAVEN [5], is currently developing systems capable of recognizing text in natural images.