An overview of the Google Cloud Vision API by a Product Manager
First of all, let me tell you: I’m not a developer. I’m a Product Manager, and I don’t have a technical background. This article aims to present a high-level view of the Google Cloud Vision API and how we have used it to strengthen some of our digital products.
In short, the Google Cloud Vision API is a tool for deciphering images. Using advanced machine learning models, it can interpret images and classify them into many categories. It can also detect and extract text, whether it appears inside a picture or in a photo of a document. As you can imagine, this is useful in many situations! I’ll show some cases in which we’ve applied this tool to our customers’ products.
Let’s look at how this API works. The Google Cloud Vision API has this interface, so you can easily try it by dragging in an image.
Examine the results we got when submitting this image:
As shown, the API can interpret different aspects of the image, such as objects, categories, dominant colors, and safe search attributes. Of course, you can also obtain these results as JSON and process them inside your application.
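For the more technical readers, here is a minimal sketch of what "getting the results as JSON" can look like. It builds the request body for the Vision API's REST `images:annotate` method and reads the labels out of a response. The helper names (`build_annotate_request`, `extract_labels`) and the sample response are hypothetical, made up for illustration; the sample is trimmed to two fields per label and is not real API output. Sending the request (authentication, HTTP call) is left out on purpose.

```python
import base64
import json


def build_annotate_request(image_bytes, feature_types, max_results=10):
    """Build the JSON body for a Vision API `images:annotate` request."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # One entry per kind of analysis, e.g. LABEL_DETECTION.
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in feature_types
                ],
            }
        ]
    }


def extract_labels(response_json, min_score=0.0):
    """Return (description, score) pairs from the first response, best first."""
    data = json.loads(response_json)
    annotations = data["responses"][0].get("labelAnnotations", [])
    labels = [
        (item["description"], item["score"])
        for item in annotations
        if item["score"] >= min_score
    ]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


# Hypothetical, trimmed response used only to exercise the parser.
SAMPLE_RESPONSE = json.dumps(
    {
        "responses": [
            {
                "labelAnnotations": [
                    {"description": "Textile", "score": 0.88},
                    {"description": "Clothing", "score": 0.97},
                    {"description": "Pattern", "score": 0.71},
                ]
            }
        ]
    }
)

if __name__ == "__main__":
    body = build_annotate_request(b"fake image bytes", ["LABEL_DETECTION"])
    print(json.dumps(body, indent=2))
    print(extract_labels(SAMPLE_RESPONSE, min_score=0.8))
```

The `min_score` cut-off is an assumption: in practice the team would pick a threshold that fits the product.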
Case 1: Inspiration Board
A few years ago, we were asked to create a digital product for a massive international group of clothing stores. It was an application to manage the whole clothing process, from the creative design stage to allocating collections to slots in the stores. The design process for a new collection starts by gathering inspiring material in a kind of frame called an “Inspiration Board.” Designers could add images of colors, objects, and fabric samples to it whenever they wanted to spark ideas for a new collection. They used to build this board physically. But besides being a slow, manual process, a physical board couldn’t easily be shared with other people, such as remote designers. Therefore, one of the features we developed to solve this problem was an online Inspiration Board. Since they had a huge collection of images of clothes, colors, objects, etc., we submitted all of this material to the Vision API to get a wide range of categories and tags. If they needed inspiring ideas for a collection with the theme “Autumn,” for example, they could type related words and get an inspiration board like this:
They could edit this board by adding new pictures from other searches, in a collaborative process with other designers around the world.
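To give an idea of how a keyword search over tagged images can work, here is a small sketch. It assumes the labels for each image were already fetched from the Vision API and stored; the file names, the labels, and the function names (`build_tag_index`, `search_board`) are all hypothetical, and this is not necessarily how our team implemented the feature.

```python
from collections import defaultdict


def build_tag_index(image_labels):
    """Map each lowercase tag to the set of images that carry it."""
    index = defaultdict(set)
    for image_id, labels in image_labels.items():
        for label in labels:
            index[label.lower()].add(image_id)
    return index


def search_board(index, keywords):
    """Return the images matching at least one keyword, sorted by name."""
    matches = set()
    for keyword in keywords:
        # .get() avoids creating empty entries for unknown keywords.
        matches |= index.get(keyword.lower(), set())
    return sorted(matches)


# Hypothetical stored labels, standing in for saved Vision API results.
SAMPLE_IMAGE_LABELS = {
    "img_001.jpg": ["Autumn", "Leaf", "Orange"],
    "img_002.jpg": ["Wool", "Textile"],
    "img_003.jpg": ["Autumn", "Coat"],
}

if __name__ == "__main__":
    index = build_tag_index(SAMPLE_IMAGE_LABELS)
    print(search_board(index, ["autumn"]))
```

Matching on "any keyword" keeps the board generous; requiring all keywords would be a one-line change (intersect the sets instead of uniting them).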
Case 2: Accreditation process for food establishments
We have this great customer: a financial services company specialized in benefits. One of their leading services is offering meal-ticket cards to company employees, a widespread practice in Brazil. Food establishments can be accredited to accept these cards as payment for meals.
We were hired to make the accreditation process easier and faster, making it as digital as possible. One stage of the process requires a technician to visit the food establishment and validate criteria such as having a verifiable physical address and a commercial facade. This could cause significant delays because of technician availability and the limited geographic area each technician could cover.
By using the Google Cloud Vision API, we were able to avoid 85% of technician visits, because establishment owners could submit pictures of their premises through a website or app to validate the criteria. Our solution evaluated the images by comparing the labels returned with expected tags like #facade, #restaurant, #building, #food, etc. See these examples:
In some cases, we could also compare the address given by the establishment’s owner with the one obtained through Google’s geolocation and waive the technician visit if they matched. In the end, this solution brought significant cost and time savings to our client. What a great achievement, isn’t it?
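The tag comparison in this case can be sketched in a few lines. The expected tags come from the examples above; everything else is an assumption for illustration: the function name `matches_expected_tags`, the confidence threshold of 0.6, and the rule that at least two tags must match. The labels are assumed to arrive as (description, score) pairs taken from the API response.

```python
# Tags mentioned in the article; a real list would likely be longer.
EXPECTED_TAGS = frozenset({"facade", "restaurant", "building", "food"})


def matches_expected_tags(labels, expected=EXPECTED_TAGS, min_score=0.6, min_matches=2):
    """Check whether enough confident labels match the expected tags.

    Returns a (passed, matched_tags) tuple so the caller can explain the result.
    """
    # Keep only labels the model is reasonably confident about (assumed cut-off).
    confident = {description.lower() for description, score in labels if score >= min_score}
    hits = confident & expected
    return len(hits) >= min_matches, sorted(hits)


if __name__ == "__main__":
    # Hypothetical labels for a storefront photo and for an unrelated photo.
    storefront = [("Building", 0.93), ("Facade", 0.81), ("Food", 0.40)]
    unrelated = [("Dog", 0.95), ("Building", 0.55)]
    print(matches_expected_tags(storefront))
    print(matches_expected_tags(unrelated))
```

Returning the matched tags alongside the yes/no answer makes it easier to show the owner why a picture was accepted or rejected.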
Case 3: Using the OCR tool to give fast feedback about identification documents
This case is in full development right now. Our customer, a huge telephone operator in Latin America, is creating a product to offer personal loans to its clients. We are turning it into a fully digital experience, removing unnecessary bureaucracy, in-person contact, and slow processes. One step of the process requires the customer to send identification documents. However, customers sometimes send the wrong documents or low-quality images. In those cases, the application should give the user fast feedback and ask for a new image or document. Note that we are not verifying the legitimacy of the customer’s documents at this point, since they will be thoroughly checked later.
Our solution relies on Google Vision’s OCR to find words that we expect to see in a valid document. Check this example:
As shown, words like “Republica,” “Brasil,” “identidade,” etc., help us accept it as a valid document. Therefore, if the user submits the wrong picture, such as a picture of a house, or a low-quality image, we can’t match it against the expected words and, consequently, we ask for a new upload:
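Here is a rough sketch of that keyword check, assuming the OCR text has already been extracted from the API response as a plain string. The expected words are the ones mentioned above; the function names, the accent-stripping step, and the rule of requiring at least two matching words are my assumptions, not a description of the production code.

```python
import re
import unicodedata

# Words mentioned in the article, stored without accents and in lowercase.
EXPECTED_WORDS = frozenset({"republica", "brasil", "identidade"})


def normalize(text):
    """Lowercase the text and strip accents, so 'REPÚBLICA' becomes 'republica'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.lower()


def looks_like_id_document(ocr_text, expected=EXPECTED_WORDS, min_hits=2):
    """Return True when enough expected words appear in the OCR text."""
    words = set(re.findall(r"[a-z]+", normalize(ocr_text)))
    return len(words & expected) >= min_hits


if __name__ == "__main__":
    # Hypothetical OCR outputs: an ID card and an unrelated picture.
    print(looks_like_id_document("REPÚBLICA FEDERATIVA DO BRASIL\nCARTEIRA DE IDENTIDADE"))
    print(looks_like_id_document("Casa à venda"))
```

A low-quality image usually yields little or no readable text, so it fails the same check and the user is asked for a new upload without any extra logic.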
The Google Cloud Vision API has been a terrific tool, and you can apply it to meet plenty of your product needs.
You may find this “too technical” to bother with, especially if you aren’t a developer. Nevertheless, I recommend actively studying it. Knowing about tools like this gives you possibilities, alternatives, and the knowledge to bring insightful options to the table. You can help your customers save money, time, and resources by not re-implementing things that are already built and available. You can quickly identify how technical mechanisms can support your digital products, freeing time to focus on what you really have to develop to make sure your product works properly.
By Evellyn Zagui de Almeida – Product Owner