An Overview of AWS Rekognition

This article has been cross-posted to LinkedIn and Medium. You can listen to this article, created using AWS Polly.

Facial recognition is all the buzz these days, with concerns over privacy and the potentially inappropriate use of facial recognition to identify criminals, those entering the country illegally, and possibly people who just have unpaid parking tickets. Add to that law enforcement storing images of potentially innocent people for future matching, and the level of concern increases [1][2].

However, image and video analysis has applications beyond law enforcement, including medical, veterinary, athletic, and other uses. This article is not about the general use cases for image recognition, although I will discuss the use of image recognition in medicine. I also won’t address the social concerns about the technology. Rather, this article discusses the features and functions of AWS Rekognition for image and video analysis.

As humans, we take image recognition for granted. It is a natural process extending from our emergence as a baby throughout our entire life. We can easily recognize objects, distinguish between similar objects, and identify the unique characteristics which tell us this is our pet versus a similar animal. Even as babies, we can recognize faces as Mom and Dad, even though we may not know specifically who they are or be able to verbalize that recognition.

As reported by the Brain Blogger in “How the Brain Recognizes Faces,” “The brain identifies items within milliseconds and this fast recognition works because our brain is continually making predictions about objects in our field of view and then comparing these with incoming information.” This means image recognition is as much about being able to recall sufficient information from memory to positively identify Grandma as it is about being able to perceive and predict information about objects to know a car accident is about to happen.

This sets the stage for the complexity of image recognition in both still images and video. An artificial intelligence engine may learn enough to identify an object, but there is always a confidence factor involved: a subtle nuance that distinguishes two similar people or objects may not yet have been learned, so the engine cannot be 100% certain.

If we provided a definition of image recognition, it would be: “Image recognition, from a technology perspective, is a component of computer vision and refers to technologies able to identify places, logos, people, objects, and buildings in an image, whether that image comes from a photograph, a video, or is analyzed in real time from a camera.”

Amazon Rekognition is an image and video analysis product in the Artificial Intelligence/Machine Learning category which uses deep learning to identify objects in an image. It is a highly scalable service capable of quickly analyzing an image or video and identifying the objects it contains.

It requires no knowledge of machine learning to use, and behind the scenes Amazon’s computer vision scientists are constantly analyzing new images and videos, improving upon the deep learning model and adding new features to the service.

Unlike most other AWS services, there is no console interface to Rekognition beyond the demos. While the demos can be used for a test case, they are not a practical way to use the service.

Setting up Rekognition involves having access to AWS, downloading the Software Development Kit (SDK) for the desired programming language and implementing your code.
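
As a minimal sketch of that setup, assuming Python and the boto3 SDK (the region shown is an assumption; use whichever region you normally work in):

# Install the AWS SDK for Python first, for example:  pip install boto3
# Credentials typically come from "aws configure" or environment variables.

import boto3

# The region is an assumption; Rekognition is available in many regions.
rekognition = boto3.client('rekognition', region_name='us-east-1')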

Rekognition can analyze both still frame images and video.

Rekognition for Video includes these features (a short code sketch follows the list):

  • Real-time analysis of streaming video;
  • Person identification and pathing;
  • Face recognition;
  • Facial analysis;
  • Objects, scenes and activities detection;
  • Inappropriate video detection; and,
  • Celebrity recognition.
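
For video stored in S3, the analysis APIs are asynchronous: you start a job and collect the results when it completes (real-time analysis of streaming video uses Kinesis Video Streams instead and is not shown here). A minimal sketch, with placeholder bucket and file names:

import time
import boto3

rekognition = boto3.client('rekognition')

# Start an asynchronous label detection job against a video in S3.
# The bucket and file names are placeholders.
job = rekognition.start_label_detection(
    Video={'S3Object': {'Bucket': 'my-video-bucket', 'Name': 'clip.mp4'}})

# Poll until the job finishes (a production application would use the
# SNS/SQS notification channel instead of polling).
while True:
    result = rekognition.get_label_detection(JobId=job['JobId'])
    if result['JobStatus'] != 'IN_PROGRESS':
        break
    time.sleep(5)

# Each result carries the label plus the timestamp (in milliseconds)
# where it was detected in the video.
for item in result['Labels']:
    print(item['Timestamp'], item['Label']['Name'], item['Label']['Confidence'])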

Rekognition for Images includes these features (again, a short sketch follows the list):

  • Object and scene detection;
  • Facial recognition;
  • Facial analysis;
  • Face comparison;
  • Unsafe image detection;
  • Celebrity recognition; and,
  • Text in Image.
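
As one example from this list, face comparison checks whether the face in one image appears in another. A minimal sketch, assuming two local photos with placeholder file names:

import boto3

rekognition = boto3.client('rekognition')

# The file names are placeholders for a reference photo and a photo to check.
with open('reference.jpg', 'rb') as source, open('candidate.jpg', 'rb') as target:
    response = rekognition.compare_faces(
        SourceImage={'Bytes': source.read()},
        TargetImage={'Bytes': target.read()},
        SimilarityThreshold=80)

# Each match includes a similarity score for the compared faces.
for match in response['FaceMatches']:
    print('Similarity: ' + '{:.2f}'.format(match['Similarity']))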

Image analysis is limited to JPEG and PNG formats.

Although there is some limited interaction with AWS Rekognition in the AWS Console, in practice Rekognition is implemented through the SDK.

When using the SDK, images and video can be submitted directly through the API call, or by passing a reference to an object stored in S3. If submitting the image bytes to the API, the image must be Base64 encoded, which is performed automatically for you in some languages. If passing an S3 object, it is not necessary to perform the Base64 encoding.
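
A rough sketch of both submission styles, using detect_labels with placeholder bucket and file names:

import boto3

rekognition = boto3.client('rekognition')

# Option 1: submit the image bytes directly (boto3 handles the encoding).
with open('photo.jpg', 'rb') as image:
    response = rekognition.detect_labels(Image={'Bytes': image.read()})

# Option 2: reference an object already stored in S3 (names are placeholders).
response = rekognition.detect_labels(
    Image={'S3Object': {'Bucket': 'my-image-bucket', 'Name': 'photo.jpg'}})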

Here is an example: we are going to use object detection to identify the elements in the following image.

[Image: the photo analyzed in this example]

The code we will be using is provided by AWS to demonstrate the use of the API.

Our code is

# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)

import boto3

def detect_labels_local_file(photo):
    client = boto3.client('rekognition')

    # Read the local image and submit its bytes to the DetectLabels API.
    with open(photo, 'rb') as image:
        response = client.detect_labels(Image={'Bytes': image.read()})

    # Print each detected label along with its confidence score.
    print('Detected labels in ' + photo)
    for label in response['Labels']:
        print(label['Name'] + ' : ' + str(label['Confidence']))

    return len(response['Labels'])

def main():
    photo = 'dscf5559.JPG'

    label_count = detect_labels_local_file(photo)
    print("Labels detected: " + str(label_count))

if __name__ == "__main__":
    main()

When we execute this in a shell using Python 3, the results are:

$ python3 rekog1.py
Detected labels in dscf5559.JPG
Walkway : 99.99960327148438
Path : 99.99960327148438
Sidewalk : 99.81764221191406
Pavement : 99.81764221191406
Person : 99.6449966430664
Human : 99.6449966430664
Cobblestone : 92.26229095458984
Furniture : 87.95263671875
Bench : 87.95263671875
Porch : 82.72274780273438
Road : 59.49223327636719
Patio : 55.43474197387695
Labels detected: 12
$

This was relatively simple to make work, and Rekognition did a pretty good job of identifying the objects in the image.

Let’s also try the Celebrity recognition API. Again, the AWS Rekognition documentation has some sample code we can use for this example. Here is our image, the infamous J.R. Ewing from the TV series “Dallas”, as played by Larry Hagman.

[Image: photo of Larry Hagman as J.R. Ewing]

The code we are going to execute is

# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)

import boto3

def recognize_celebrities(photo):
    client = boto3.client('rekognition')

    # Read the local image and submit its bytes to the RecognizeCelebrities API.
    with open(photo, 'rb') as image:
        response = client.recognize_celebrities(Image={'Bytes': image.read()})

    # Print each recognized celebrity, the position of the face's bounding box,
    # and any reference URLs returned by the service.
    print('Detected faces for ' + photo)
    for celebrity in response['CelebrityFaces']:
        print('Name: ' + celebrity['Name'])
        print('Id: ' + celebrity['Id'])
        print('Position:')
        print('   Left: ' + '{:.2f}'.format(celebrity['Face']['BoundingBox']['Left']))
        print('   Top: ' + '{:.2f}'.format(celebrity['Face']['BoundingBox']['Top']))
        print('Info')
        for url in celebrity['Urls']:
            print('   ' + url)
        print()

    return len(response['CelebrityFaces'])

def main():
    photo = 'PB086173.JPG'

    celeb_count = recognize_celebrities(photo)
    print("Celebrities detected: " + str(celeb_count))

if __name__ == "__main__":
    main()

Executing the code with our image, we get a response of:

$ python3 rekog2.py
Detected faces for PB086173.JPG
Name: Larry Hagman
Id: 1y3D0N
Position:
Left: 0.47
Top: 0.23
Info
www.imdb.com/name/nm0001306
Celebrities detected: 1
$

These two examples demonstrate the relative simplicity of interacting with AWS Rekognition, although there are many more capabilities I have not discussed in this article.
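
For instance, the facial analysis feature returns attributes such as an estimated age range and whether the subject is smiling. A minimal sketch, assuming a local photo with a placeholder file name:

import boto3

rekognition = boto3.client('rekognition')

# 'face.jpg' is a placeholder for any local JPEG or PNG photo of a face.
with open('face.jpg', 'rb') as image:
    response = rekognition.detect_faces(
        Image={'Bytes': image.read()}, Attributes=['ALL'])

# Print a couple of the attributes returned for each detected face.
for face in response['FaceDetails']:
    print('Age range: {}-{}'.format(face['AgeRange']['Low'], face['AgeRange']['High']))
    print('Smiling: ' + str(face['Smile']['Value']))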

The cost of Rekognition is based on whether you are analyzing still images or video, the volume of images or video processed, and the amount of face metadata stored, and it can vary by region. The AWS Free Tier also lets new users try out Rekognition at no cost, within monthly limits.

Free tier users can analyze 1,000 minutes of video, and 5,000 images with 1,000 face metadata entries per month for the first 12 months. This is a great time to experiment with Rekognition and see how it can work in your application.

Once the free tier is over, pricing is based upon the image type, region and quantity.

Video analysis is billed per second, but charged per minute, and there is a fee for every 1,000 face metadata entries stored per month. The storage charges for this metadata are pro-rated for partial months.

If you are processing millions of images every month, it can get expensive, but would still be cheaper than “reinventing the wheel” and maintaining the solution. Additionally, AWS Rekognition has a significant database behind it, which would have to be created in other situations.

One “catch” to the pricing model involves the different API calls for different features, like DetectLabels and DetectFaces. If you needed 1,000 images analyzed by both APIs, that would count as 2,000 images processed.
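
As a back-of-the-envelope illustration (the per-image rate below is a placeholder; check the AWS Rekognition pricing page for your region):

# Each API call counts as one image processed, so running two APIs over the
# same photos doubles the billable count.
images = 1000
apis_used = 2                      # e.g. DetectLabels and DetectFaces
price_per_image = 0.001            # placeholder rate; see the AWS pricing page

billable_images = images * apis_used
print(billable_images)                     # 2000 billable images
print(billable_images * price_per_image)   # rough monthly estimate in USD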

Image recognition typically gets a bad reputation for its use in facial recognition systems, but there are a plethora of uses in medicine. These include x-ray analysis, mammography, and helping blind or low-vision people live independent lives.

According to the U.S. National Cancer Institute, “breast cancer is the most common cancer in women and the second leading cause of cancer deaths” [3]. Like other image recognition systems, early Computer Aided Detection (CAD) systems used in detecting possible breast cancers resulted in some women having their cancer misdiagnosed [3] and receiving treatments they didn’t need, or not having their cancer detected early enough to prevent other complications.

Medical research is still underway to improve upon these CAD systems, incorporating deep learning and image recognition to improve the accuracy of cancer detection [4].

Blind and visually impaired people have a unique challenge in navigating the world. They sometimes need to identify or find an object around them, or detect when an obstacle that could harm them is nearby.

There are devices like the WeWalk Smart Cane which build upon the traditional white cane, but more advanced object detection is possible using the individual’s phone and Rekognition. An application like Microsoft’s Seeing AI, while not driven by Rekognition, is a good example of how a smartphone camera can help a visually impaired person navigate and interact with their surroundings. Such an application could be enhanced with the additional capabilities of AWS Rekognition.
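
As a purely illustrative sketch of that idea (this is not how Seeing AI works; the file name, confidence threshold, and Polly voice are assumptions), an app could pass a camera frame to Rekognition and use Polly to speak the most confident labels back to the user:

import boto3

rekognition = boto3.client('rekognition')
polly = boto3.client('polly')

# 'frame.jpg' stands in for a frame captured from the phone camera.
with open('frame.jpg', 'rb') as image:
    labels = rekognition.detect_labels(
        Image={'Bytes': image.read()}, MinConfidence=80)

# Turn the five most confident labels into a short spoken description.
top = sorted(labels['Labels'], key=lambda l: l['Confidence'], reverse=True)[:5]
names = [label['Name'] for label in top]
speech = polly.synthesize_speech(
    Text='I can see ' + ', '.join(names),
    OutputFormat='mp3', VoiceId='Joanna')

with open('description.mp3', 'wb') as out:
    out.write(speech['AudioStream'].read())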

This article examined what image recognition is, why it is hard to implement within a machine, what AWS Rekognition is and several examples of applications within one field.

However, the possible uses for AWS Rekognition extend well beyond what I have discussed here. Aside from medicine, Rekognition could be used in autonomous vehicles, robotic vision, detecting unsafe user-generated content, comparing faces for identification, surveying, law enforcement, and more.

Anyone who needs to analyze images or video should seriously evaluate the capabilities of AWS Rekognition for their application.

[1] Center for Democracy and Technology

[2] Face Off: Law Enforcement Use of Face Recognition Technology

[3] How Deep Learning Could Catch Breast Cancers that Mammograms Miss

[4] Detecting and classifying lesions in mammograms with Deep Learning

Analyzing an Image Loaded from a Local Filesystem

AWS Rekognition

AWS Rekognition Pricing

How the Brain Recognizes Faces

Recognizing Celebrities in an Image

Seeing AI by Microsoft

WeWalk Smart Cane

Chris is a highly skilled Information Technology, AWS Cloud, and Security professional bringing cloud, security, and process engineering leadership to simplify and deliver high-quality products. He is the co-author of more than seven books and author of more than 70 articles and book chapters in technical, management, and information security publications. His extensive technology, information security, and training experience makes him a key resource who can help companies through technical challenges.

Copyright 2019, Chris Hare
