August 2020 - On-device AI is a form of artificial intelligence (AI) that has quickly become popular in mobile phones. It is described as technology that can “perceive, reason, and take intuitive actions based on awareness of the situation” and is considered more advanced than conventional algorithms. This technology could transform access for people with visual impairments. People with visual impairments may have difficulty identifying packaged foods, both at home and in the store; on-device neural networks running on mobile devices can help with these tasks. Two on-device neural networks, MnasNet and MobileNets, are currently in development to handle tasks such as labeled product recognition in real time. Google has also recently developed Lookout, an Android app that uses on-device neural network models to improve accessibility for people with visual impairments. For tasks such as product recognition, the app combines a product index, object tracking, and optical character recognition (OCR), which together support accurate identification of products.
Google describes the design behind Lookout in depth. The system has the following components: frame cache, frame selector, detector, object tracker, embedder, index searcher, OCR, scorer, and result presenter. The frame cache is the foundation: it manages the “lifecycle” of each image frame and delivers the data to the other parts of the system. Each frame passes through the detector, which attempts to identify the product, while the object tracker simultaneously tracks the product’s shape in real time. The regions of interest from the detector are then sent to the embedder model. The embedder narrows down the possibilities, particularly in cases where two product images may appear similar and the system must determine which product it is. The remaining components, the index searcher and OCR, read the product and extract additional information such as packet size or flavor variant. The scorer ranks the resulting candidates, and the top-scoring candidate is taken as the final result. The result presenter announces the product name via text-to-speech (TTS). On-device models are becoming ubiquitous; as developers integrate them into the fabric of wireless technology, they expand the possibilities for accessibility [Source: Chao Chen via Google AI Blog; Qualcomm].
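
For readers who want a feel for how these stages fit together, the flow from detector to result presenter can be sketched in a few lines of Python. This is a toy illustration only, not Lookout's actual code: every name here (IndexEntry, search_index, score, recognize, present), the cosine-similarity lookup, the OCR weighting, and the sample products are assumptions made up for the example. The detector, embedder, and OCR are stand-in functions, and the frame cache, frame selector, and object tracker are left out.

```python
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


@dataclass(frozen=True)
class IndexEntry:
    """One product in a hypothetical on-device product index."""
    name: str
    embedding: Vector        # reference embedding stored in the index
    label_words: frozenset   # words printed on the package


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm if norm else 0.0


def search_index(query: Vector, index: List[IndexEntry], top_k: int = 3):
    """Index searcher: return the top_k entries most similar to the query."""
    scored = [(cosine(query, entry.embedding), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def score(candidates, ocr_words: set, ocr_weight: float = 0.5):
    """Scorer: add a bonus for package words that OCR actually read."""
    results = []
    for similarity, entry in candidates:
        overlap = len(entry.label_words & ocr_words) / max(len(entry.label_words), 1)
        results.append((similarity + ocr_weight * overlap, entry.name))
    results.sort(reverse=True)
    return results


def recognize(frame, detect, embed, read_text, index):
    """Run one frame through detector, embedder, index search, OCR, and scorer."""
    best = None
    for region in detect(frame):
        candidates = search_index(embed(region), index)
        ocr_words = {word.lower() for word in read_text(region).split()}
        ranked = score(candidates, ocr_words)
        if ranked and (best is None or ranked[0] > best):
            best = ranked[0]
    return best[1] if best else None


def present(name):
    """Result presenter: the string a TTS engine would be asked to speak."""
    return f"Product: {name}" if name else "No product recognized"


# Made-up index: two near-identical cereal boxes and one unrelated product.
INDEX = [
    IndexEntry("Oat Cereal Original", (1.0, 0.0), frozenset({"oat", "cereal", "original"})),
    IndexEntry("Oat Cereal Honey", (0.98, 0.2), frozenset({"oat", "cereal", "honey"})),
    IndexEntry("Tomato Soup", (0.0, 1.0), frozenset({"tomato", "soup"})),
]

# Stand-ins for the real models: the "frame" already carries its embedding and text.
frame = {"embedding": (1.0, 0.1), "text": "Oat Cereal Honey 500g"}
detect = lambda f: [f]                 # treat the whole frame as one region
embed = lambda region: region["embedding"]
read_text = lambda region: region["text"]

print(present(recognize(frame, detect, embed, read_text, INDEX)))
```

In this made-up example the two cereal boxes are almost indistinguishable by embedding alone, and the visual search slightly favors the original variant; the word "honey" read by the OCR stand-in is what tips the scorer toward the honey variant, mirroring the role the article describes for OCR in separating flavor variants.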