Updated with currency and color recognition, Seeing AI is available in 35 countries

[Image: iPhone running the Seeing AI app]

Since we first made Seeing AI available, the app has been downloaded more than 100,000 times and has assisted users with over 3 million tasks. We have never been more humbled by your feedback, and it encourages us to do more! When we first released the app, it launched with features such as identifying products by audibly locating their barcodes, and describing images, text, and the faces of friends and family as they come into view. Today, we’re excited to announce new features that build on these early results, provide new user experiences, and allow us to continue to learn and innovate. These new features, such as currency, handwriting and color recognition, as well as light detection, are now available in the app in 35 countries, including the European Union.

Some of the new features now available in Seeing AI include:

  • Color recognition: Getting dressed in the morning just got easier with this new feature, which describes the color of objects, like garments in your closet.
  • Currency recognition: Seeing AI can now recognize US dollars, Canadian dollars, British pounds and euros. Checking how much change is in your pocket or leaving a cash tip at a restaurant is now much easier.
  • Musical light detector: The app plays a corresponding audible tone when you aim the phone’s camera at a light source. This convenient tool means you no longer have to touch a hot bulb to find out whether a light is switched on, or feel for whether a battery pack’s LED is lit.
  • Handwriting recognition: Expanding on the app’s ability to read printed text, such as on menus or signs, handwriting recognition means you can read personal notes in a greeting card, as well as stylized printed text not usually readable by optical character recognition.
  • Reading documents: Seeing AI can now read documents aloud without VoiceOver, with synchronized word highlighting, and you can change the text size on the Document channel.
  • Ability to choose voices and speed: Personalization is key. When you’re not using VoiceOver, this feature lets you choose which voice is used and how fast it speaks.

With each of these new features, we make sure to protect personal data while ensuring the technology operates effectively and provides users the best experiences with our products. If you have questions, the Microsoft Privacy Statement explains how Microsoft collects, stores and uses personal information.

We continue to hear from you about how Seeing AI brings value to your lives. It’s more than humbling. Cameron Roles, a university lecturer at the Australian National University College of Law, believes there’s never been a better time in history to be a blind person.

“I absolutely love Seeing AI. If my children hand me a note from school or if I pick up a book, I can use Seeing AI to quickly capture that text and just give me a very brief instant overview of what’s on the document,” said Roles of the important capability the app has for reading text and handwriting. “I can quickly be right on top of it.”

“For me, I think we’re coming into a really exciting time,” said Roles. “The growth in artificial intelligence, augmented reality, self-driving cars… I feel that it’s a great time for all of us in society.”

“Technology can be such an enabler of good and such an enabler for people to shrink the world, for the world to come closer together, and for people to be able to achieve so much more than they ever could without it,” said Roles.

We’re excited to share these features and look forward to hearing from you about how Seeing AI is making your world more inclusive. It’s available in Apple’s App Store in 35 countries, and when a new version is released, you’ll see the list of new features the next time you launch the app.

If you have feedback, we want to hear it! Seeing AI started as a prototype just a year ago, and while we’ve been thrilled with the progress, we know we have a long way to go. Please send your thoughts, feedback or questions to seeingai@microsoft.com.

If you have further questions or feedback, please contact the Disability Answer Desk. The Disability Answer Desk is there to assist via phone and chat, and in the United States, we also have an ASL option for our customers with hearing loss (+1 503-427-1234).

Microsoft researchers use visual AI to make India’s roads safer

A nervous student gets into a car. It’s his first time behind the steering wheel. He glances anxiously at his driving instructor on his left. The instructor reassures him and begins a set of instructions. He urges him to turn on the ignition, shift from neutral into first gear, and slowly release the clutch while pressing on the throttle. He also reminds him to be ready to brake if needed and to keep an eye on the rear-view and wing mirrors.

The scenario above is how drivers have been trained at the Institute of Driving and Traffic Research (IDTR), a joint venture between the Department of Transport of State Governments and Maruti Suzuki India Ltd., India’s largest passenger car manufacturer. Founded in 2000, IDTR aims to make Indian roads safer.

India has one of the highest numbers of road accidents in the world. In 2016, 55 road accidents and 17 deaths occurred every 60 minutes, roughly one death every three and a half minutes. The main contributing factors are poor road infrastructure, low awareness of road rules and traffic signs, and distracted and inefficient driving.

One way IDTR tries to address India’s dismal road safety record is by teaching safe driving through scientifically engineered training, testing tracks built to international standards, and simulators. Some aspects of IDTR’s methodology are also used by Maruti Driving Schools, an added service offered countrywide by the dealers of Maruti Suzuki India Ltd.

“At IDTR, our primary focus is on providing quality training to drivers and developing better methods of training. For this, we use technology to a great extent: simulators and cameras. Recently, we have developed an on-board diagnostic (OBD) device used for in-car automation. This enhances the quality of driving training instructions,” says Mahesh Rajoria, Director of IDTR and head of the Driving Schools Division at Maruti Suzuki. The driving schools network comprises IDTR and the Maruti Driving Schools (MDS). Collectively, IDTR and MDS have trained over three million drivers so far.

However, IDTR now has one more tool, one that could change the way trainers teach students to drive: an inconspicuous smartphone mounted on the car’s dashboard that records both the driver and the view of the road through the front windshield. After every session, it provides the instructor with detailed analysis that wasn’t possible earlier. The solution, built by researchers at Microsoft, is called HAMS.

HAMS: Leveraging low-cost tech to tackle road safety

HAMS, which stands for Harnessing AutoMobiles for Safety, is a virtual harness for vehicles that focuses on two factors critical to road safety: the state of the driver, and his or her driving relative to other vehicles.

It employs the front and back cameras of a dashboard-mounted smartphone, the phone’s GPS and inertial sensors, and an On-Board Diagnostics (OBD-II) scanner, which provides information from the vehicle. Much of this data is processed locally on the smartphone itself, with an Azure-based backend used to aggregate and visualize the processed data. The smartphone’s front camera watches the driver while the back camera looks out at the road ahead. From the raw sensor data, HAMS detects various events of interest, such as driver distraction, fatigue and gaze, as well as vehicle ranging, which determines whether a safe separation distance is being maintained from the vehicle in front.

HAMS monitors driver fatigue by detecting eye closures and yawns with the phone’s front camera. The Eye Aspect Ratio (EAR) metric is used to detect eye closure, from which the PERCLOS metric, the percentage of time the eyes are closed, is computed. Yawns are detected using the Mouth Aspect Ratio (MAR) metric, which flags when the mouth remains open continuously for at least one to two seconds. Gaze tracking, done through head pose estimation and eye gaze tracking, enables analysis of mirror-scanning behavior, for example, detecting episodes when a driver stares straight ahead for a prolonged period and so fails to maintain awareness of their surroundings.
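
To make these metrics concrete, here is a minimal sketch of how EAR, MAR, PERCLOS and yawn detection can be computed from 68-point facial landmarks. The landmark indices follow the common dlib-style convention, and the thresholds (0.2 for eye closure, 0.6 for an open mouth) are illustrative assumptions, not values published for HAMS.

```python
import numpy as np

# Indices follow the common 68-point facial-landmark convention (e.g. dlib).
LEFT_EYE  = list(range(36, 42))
RIGHT_EYE = list(range(42, 48))
MOUTH     = [60, 62, 64, 66]   # inner-mouth left corner, top, right corner, bottom

def eye_aspect_ratio(eye):
    """EAR: vertical eye openings relative to the horizontal eye width.
    eye is a (6, 2) array of landmark coordinates."""
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h  = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)

def mouth_aspect_ratio(mouth):
    """MAR: vertical mouth opening relative to the mouth width."""
    v = np.linalg.norm(mouth[1] - mouth[3])   # top vs. bottom
    h = np.linalg.norm(mouth[0] - mouth[2])   # left vs. right corner
    return v / h

def perclos(ear_series, closed_thresh=0.2):
    """PERCLOS: fraction of frames in a window where the eyes are closed.
    The 0.2 threshold is an illustrative assumption."""
    ear = np.asarray(ear_series)
    return float(np.mean(ear < closed_thresh))

def is_yawn(mar_series, fps, open_thresh=0.6, min_seconds=1.0):
    """Flag a yawn when MAR stays above the threshold for >= min_seconds."""
    run = 0
    for mar in mar_series:
        run = run + 1 if mar > open_thresh else 0
        if run >= fps * min_seconds:
            return True
    return False
```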

Vehicle ranging, aimed at preventing tailgating, works by delineating a bounding box around the vehicle in front as seen through the smartphone’s back camera. The distance to the vehicle is then estimated from the size of the bounding box.
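
The geometry behind this kind of estimate is the standard pinhole-camera relation: an object of known real width appears smaller the farther away it is. Below is a sketch assuming a typical car width of 1.8 m and a hypothetical focal length in pixels; HAMS’s actual calibration is not described here.

```python
def estimate_range(bbox_width_px, focal_length_px, real_width_m=1.8):
    """Estimate distance to the lead vehicle from its bounding-box width,
    using the pinhole relation: distance = focal * real_width / apparent_width.
    real_width_m is an assumed typical car width, not a HAMS constant."""
    return focal_length_px * real_width_m / bbox_width_px

# Example: a 1.8 m wide car that appears 180 px wide through a lens with
# a 900 px focal length is roughly 9 m away.
distance_m = estimate_range(bbox_width_px=180, focal_length_px=900)
print(f"~{distance_m:.1f} m to the vehicle in front")
```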

The success of HAMS lies in its architecture, which performs edge-based processing of multimodal sensor data using a hybrid approach that combines machine learning with traditional techniques to balance accuracy with efficiency. Edge-based processing means raw data is handled locally on the smartphone, giving greater efficiency and minimal data usage. It also protects privacy, since only the detections, and no raw images, are uploaded to the cloud.

Effective monitoring leading to actionable feedback

“Since we were already working in this direction—that is, bringing technology into driving training—we were ready to go along when Microsoft researchers discussed HAMS with us,” says Rajoria, recollecting the initial deployment of HAMS with Ashish Mathur from his team.

“HAMS covers parameters in driving instruction that we thought were never possible,” says a jubilant Rajoria. “Take, for example, the parameter of maintaining the correct distance between the vehicle you are driving and the vehicle in front. Now this is a very important parameter as far as driving instruction goes. HAMS is definitely going to help us with that.”

HAMS is already being used in some cars at IDTR, and instructors revisit the footage and analytics after every training session to give feedback to their students in the next session.

The vision behind HAMS

The genesis of the HAMS project goes back to a decade ago, when Principal Researcher Venkat Padmanabhan returned to India after having spent eight-and-a-half years at Microsoft Research in Redmond, USA. “The first thing that hits you, quite literally, is the traffic,” he recalls when asked about how the HAMS project came to be. “Back then, we did a project called Nericell, where we came up with an idea of using smartphones, or what was then considered to be a smartphone, to monitor road and traffic conditions. While we did succeed on many fronts—we did a small-scale deployment and our 2008 paper has garnered well over 1,000 citations and has spawned many efforts—because of the limitations of the hardware, we couldn’t really take it very far.”

All this changed in 2015, when the researchers decided to concentrate their research on road safety, narrowing in on the driver and the driving. The fast-developing IoT ecosystem and advances in smartphone hardware, with faster processors, better cameras and multiple sensors such as accelerometers and gyroscopes, accelerated the effort.

“It was the summer of 2016 when we started building and deploying HAMS in over a dozen cabs at Microsoft Research office in Bengaluru—vehicles that were used to shuttle employees back and forth,” Padmanabhan recalls.

But before HAMS could launch successfully, the Microsoft researchers had to overcome challenges on several fronts. First, since HAMS relies on edge processing that happens directly on the smartphone, the researchers had to figure out how to process information intelligently within the phone’s limits. Second, the algorithms had to be self-calibrating to work in uncontrolled environments where, for instance, the mounting of the phone relative to the driver could vary. The algorithms also required a fair amount of customization for tasks such as vehicle tracking and ranging.

Post-doctoral researcher Akshay Uttama Nambi, who is part of the Microsoft research team that developed HAMS, elaborates on how they overcame the challenges. “Especially with vehicle ranging, where we had to identify the distance between your vehicle and the vehicle in the front, the algorithms had to be efficient enough to track and identify the vehicle in real-time. We developed a hybrid approach where we mix a high computational intensive task with a low computational intensive task. This balances the load on the smartphone.”
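
A minimal sketch of this hybrid pattern might look like the following: an expensive detector (here a hypothetical detect_vehicle function standing in for a DNN) periodically re-acquires the lead vehicle’s bounding box, while a lightweight correlation tracker (OpenCV’s KCF, used purely as a stand-in) follows it on the frames in between. The 15-frame cadence is an assumption, not HAMS’s actual schedule.

```python
import cv2  # the KCF tracker requires the opencv-contrib-python build

DETECT_EVERY = 15  # assumed cadence: run the heavy detector every 15 frames

def run_ranging(frames, detect_vehicle):
    """Hybrid pipeline sketch: a heavy detector periodically fixes the lead
    vehicle's bounding box; a cheap tracker follows it in between.
    detect_vehicle(frame) -> (x, y, w, h) stands in for whatever detector
    is actually used."""
    tracker = None
    for i, frame in enumerate(frames):
        if tracker is None or i % DETECT_EVERY == 0:
            box = detect_vehicle(frame)          # accurate but slow
            tracker = cv2.TrackerKCF_create()
            tracker.init(frame, box)
        else:
            ok, box = tracker.update(frame)      # fast but can drift
            if not ok:
                tracker = None                   # lost track: redetect next frame
                continue
        yield i, box
```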

“We identified certain features which could be reused across multiple detectors. For example, facial landmarks can be computed once for each image, and then could be used for multiple detectors such as for fatigue, gaze, etc. Thus the heavy-lifting done in extracting the facial landmarks could be used for such diverse tasks as tracking the driver’s blinking rate, detecting whether he is yawning, or whether his gaze was directed in the appropriate direction,” Nambi explains.
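
The compute-once, reuse-everywhere pattern Nambi describes might look like the sketch below, where a single landmark pass per frame feeds several cheap detectors. The interfaces and helper names are illustrative, not the actual HAMS API.

```python
def analyze_frame(frame, landmark_model, detectors):
    """Compute facial landmarks once per frame, then fan them out to every
    detector (fatigue, yawn, gaze, ...). landmark_model and the detector
    callables are illustrative stand-ins."""
    landmarks = landmark_model(frame)   # the one heavy step per frame
    if landmarks is None:               # no face found in this frame
        return {}
    # Each detector is a cheap function over the shared landmarks.
    return {name: fn(landmarks) for name, fn in detectors.items()}

# Hypothetical usage, reusing the EAR/MAR helpers sketched earlier:
# results = analyze_frame(frame, landmark_model, {
#     "eyes_closed": lambda lm: eye_aspect_ratio(lm[36:42]) < 0.2,
#     "yawning":     lambda lm: mouth_aspect_ratio(lm[[60, 62, 64, 66]]) > 0.6,
# })
```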

Immense possibilities to be explored

The possibilities for HAMS go far beyond its use as an aid for driver training. For instance, it could potentially be deployed during the issuance of driving licences. Presently, just a single practical test along with a theory exam is needed to get a driving licence in India. “If HAMS is deployed, an applicant with a learner’s licence can be tested over 100 or 1,000 kilometres before the licence is granted,” says Padmanabhan.

Another area where HAMS can be put to use is fleet management, providing stakeholders such as fleet owners or supervisors with intelligent visibility into fleets that may comprise hundreds or thousands of cabs, buses or trucks.

Parents could also possibly use HAMS to monitor the driving of their teenage kids, who might be new drivers.

“Different markets can have dramatically different needs, and this is evident in the innovations in the automotive industry. While self-driving cars are being actively worked on in the West, there is a huge need in India and emerging markets to use AI in existing human-driven cars to help the driver drive safely,” says Sriram Rajamani, Managing Director, Microsoft Research India.

“There is also a huge need to improve safety of fleets such as truck fleets, bus fleets and car fleets. HAMS is an extremely interesting project because it deals with existing vehicles, and existing fleets, and explores improving safety while being frugal in terms of costs,” he adds.