Google Cloud Vision API – a Product Manager overview

How we’ve been using this tool in our products

First of all, let me tell you: I’m not a developer. I’m a Product Manager, and I don’t have a technical background. This article aims to present a high-level concept of the Google Cloud Vision API and its use to strengthen some of our digital products.

Simply put, the Google Cloud Vision API is a tool to decipher images. Using advanced machine learning models, it can interpret images and classify them into many categories. It can also detect and extract text from pictures and scanned documents. As you might expect, this is useful in many situations! I’ll show some cases where we’ve managed to apply this tool to our clients’ products.

Let’s see how the API works. The Google Cloud Vision API has the following interface, which you can try by dragging in an image:

Google Cloud Vision API Interface

Here are the results we got when we submitted this image:

Google Cloud Vision API Data Results

As shown, the API can interpret different aspects of the image, like objects, categories, dominant colors, and safe search. Of course, you can also obtain these results as JSON and process them inside your application.
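For the more technical readers, here is a minimal sketch of what processing that JSON could look like. It assumes the response shape of the API's REST endpoint (a "responses" list with "labelAnnotations" and "safeSearchAnnotation"); the sample values are made up for illustration, and the call to the API itself is left out.

```python
import json

# Hypothetical, trimmed-down response in the shape returned by the Vision API's
# REST endpoint. The labels and scores below are invented for illustration.
SAMPLE_RESPONSE = """
{
  "responses": [
    {
      "labelAnnotations": [
        {"description": "Restaurant", "score": 0.96},
        {"description": "Building", "score": 0.91},
        {"description": "Facade", "score": 0.74}
      ],
      "safeSearchAnnotation": {"adult": "VERY_UNLIKELY", "violence": "UNLIKELY"}
    }
  ]
}
"""


def summarize(raw_json):
    """Return labels (as whole percentages) and safe-search ratings for the first image."""
    first = json.loads(raw_json)["responses"][0]
    labels = {
        item["description"]: round(item["score"] * 100)
        for item in first.get("labelAnnotations", [])
    }
    safe_search = first.get("safeSearchAnnotation", {})
    return labels, safe_search


labels, safe_search = summarize(SAMPLE_RESPONSE)
for name, percent in labels.items():
    print(f"{name}: {percent}%")
print("Adult content:", safe_search.get("adult"))
```

Each label comes with a score between 0 and 1, which is what the interface displays as a percentage.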

Case 1: Inspiration Board

A few years ago, a large international group of clothing stores asked Cinq to create a digital product for them: an application to control the whole clothing management process, from the creative design stage to arranging slots of collections in the stores. The design process of a new collection starts by gathering inspiring material in a type of frame called an “Inspiration Board.” Designers can add images of colors, objects, and fabric samples to it whenever they wish to simulate a new collection. They used to create this board physically, but that was a slow, manual process, and they couldn’t properly share it with other people, such as remote designers. One of the features we developed to solve this problem was therefore an online Inspiration Board. As they had a huge collection of images of clothes, colors, objects, etc., we submitted this material to the Vision API to get a wide range of categories and tags. If they needed inspiring ideas for a collection with an Autumn theme, for example, they could type matching words and get an inspiration board like this:

Google Cloud Vision API generate moodboard by AI

They could edit this board by adding new pictures from other searches, in a collaborative process with other designers around the world.
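The search behind such a board can be sketched in a few lines. This is only an illustration, not our production code: the file names and tags are hypothetical, and it assumes the tags returned by the Vision API were already stored for each image.

```python
from collections import defaultdict

# Hypothetical catalog: image file name -> tags previously returned by the Vision API.
IMAGE_TAGS = {
    "coat_01.jpg": ["Coat", "Wool", "Brown"],
    "leaves_07.jpg": ["Autumn", "Leaf", "Orange"],
    "scarf_03.jpg": ["Scarf", "Autumn", "Knitting"],
    "beach_12.jpg": ["Summer", "Sand", "Blue"],
}


def build_index(image_tags):
    """Map each lowercase tag to the set of images that carry it."""
    index = defaultdict(set)
    for image, tags in image_tags.items():
        for tag in tags:
            index[tag.lower()].add(image)
    return index


def search(index, words):
    """Return images tagged with any of the words, most matches first."""
    hits = defaultdict(int)
    for word in words:
        for image in index.get(word.lower(), ()):
            hits[image] += 1
    # Sort by number of matching words (descending), then by name for stable output.
    return sorted(hits, key=lambda image: (-hits[image], image))


index = build_index(IMAGE_TAGS)
board = search(index, ["autumn", "orange", "wool"])
print(board)
```

Images that match more of the typed words come first, so the most "autumn-like" pictures lead the board.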

Case 2: Accreditation process for food establishments

One of our clients, a financial services company specializing in benefits, offers meal cards (a widespread practice in Brazil) to company employees as one of its main services. Food establishments can be accredited to accept these cards as payment for meals.

They hired Cinq to make the accreditation process easier and faster, making it as digital as possible. One of the stages of the process requires a technician’s visit to verify the food establishment, validating criteria such as having a proven physical address and a commercial facade. This could cause a significant delay due to the technician’s availability and the limited area served.

By using the Google Cloud Vision API, we were able to reduce technician visits by 85%, as the establishments themselves could submit pictures of their place to a website or app to validate the criteria. Our solution could verify images by comparing them with expected tags like #facade, #restaurant, #building, #food, etc. See these examples:

Percentage of tags according to Google Cloud Vision API scan of images in the web
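A minimal sketch of this kind of tag check is shown below. The expected tags come from the list above; the score threshold and the minimum number of matching tags are assumptions chosen for illustration, not the values used in the real product.

```python
# Tags we expect in a valid photo of a food establishment (from the list above).
EXPECTED_TAGS = {"facade", "restaurant", "building", "food"}


def verify_photo(labels, min_score=0.70, min_matches=2):
    """Check a photo's labels against the expected tags.

    labels: list of (tag, score) pairs as returned for one image, score in 0..1.
    min_score and min_matches are illustrative assumptions.
    Returns (approved, matched_tags).
    """
    matched = {
        tag.lower()
        for tag, score in labels
        if tag.lower() in EXPECTED_TAGS and score >= min_score
    }
    return len(matched) >= min_matches, sorted(matched)


# Hypothetical label results for two submitted pictures.
storefront = [("Building", 0.93), ("Facade", 0.88), ("Window", 0.81), ("Food", 0.52)]
cat_photo = [("Cat", 0.98), ("Whiskers", 0.95), ("Building", 0.40)]

print(verify_photo(storefront))
print(verify_photo(cat_photo))
```

A photo that fails the check would simply fall back to the regular technician visit.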

In some cases, we could also compare the address provided with the one obtained through Google location and waive the technician visit if they matched. In the end, this solution saved our client time and money. A great achievement, isn’t it?

Case 3: Using the OCR tool to give fast feedback about identification documents

This case is in full development right now. Our client, a large telephone operator in Latin America, is creating a product to offer personal loans to its customers. We are turning it into a completely digital experience, removing unnecessary bureaucracy, in-person contact, and delays. One step of the process requires customers to send their identification documents. However, sometimes they send the wrong documents or low-quality images. In those cases, the application gives the user fast feedback, requesting a new image or document. Note that we are not verifying the legitimacy of the customer’s documents at this point, since that will be thoroughly investigated later.

Our solution makes use of Google Vision’s OCR to identify words that will validate the document. Check this example:

Google Vision's OCR Results

As shown, words like “Republica,” “Brasil,” “identidade,” etc., help us validate it as an acceptable document. Therefore, if the user submits an incorrect picture (like a house picture or a low-quality image), we aren’t able to match it with the expected words and consequently, a new submission is required:

Low quality image for Google Vision's OCR
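The word matching described above can be sketched as follows. It assumes the OCR text has already been extracted by the Vision API; the expected words come from the example above, while the minimum number of matches is an assumption made for illustration.

```python
import re
import unicodedata

# Words that suggest a Brazilian identity document (taken from the example above).
EXPECTED_WORDS = {"republica", "brasil", "identidade"}


def normalize(text):
    """Lowercase the text and strip accents, so 'REPÚBLICA' matches 'republica'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.lower()


def check_document(ocr_text, min_matches=2):
    """Return (accepted, found_words). min_matches is an illustrative assumption."""
    words = set(re.findall(r"[a-z]+", normalize(ocr_text)))
    found = EXPECTED_WORDS & words
    return len(found) >= min_matches, sorted(found)


# Hypothetical OCR outputs: a readable ID card and a blurry or unrelated picture.
good_scan = "REPÚBLICA FEDERATIVA DO BRASIL\nCARTEIRA DE IDENTIDADE"
bad_scan = "casa 123"

print(check_document(good_scan))
print(check_document(bad_scan))
```

When the check fails, the application can immediately ask the user for a new picture instead of waiting for a manual review.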

The Google Cloud Vision API has been a terrific tool for us, and you can apply it to solve many of your own products’ needs.

You may find it “too technical” to get acquainted with, especially if you aren’t a developer. Nevertheless, I recommend actively studying it. Knowing tools like this gives you possibilities, alternatives, and knowledge to bring insightful options to the table. You can help your customers save money, time, and resources by avoiding the reimplementation of things that are already built and available. You can also quickly identify which existing technical mechanisms can support your digital products, leaving more time to focus on what you actually have to develop to make sure your product works properly.

By Evellyn Zagui de Almeida – Product Owner

