Building a face recognition app in under an hour

Over the weekend, I was flicking through my Amazon AWS console, and I noticed a new service on there called 'Rekognition'.  I guess it was the mangled spelling that caught my attention, but I wondered what this service was? Amazon has a habit of adding new services to their platform with alarming regularity, and this one slipped past my radar somehow.

So I dived in and checked it out, and it turns out that in late 2016, Amazon released their own image recognition engine on their platform.  It not only does facial recognition, but general photo object identification too.  It is still fairly new, so the details were sketchy, but I was immediately excited to try it out.  Long story short, within an hour, I had knocked up a quick sample web page that could grab photos from my PC camera and perform basic facial recognition on it.  Want to know how to do the same? Read on...

I had dabbled in facial recognition technology before, using third party libraries, along with the Microsoft Face API, but the effort of putting together even a rudimentary prototype was fraught with complexity and a steep learning curve.  But while browsing the Rekognition docs (thin as they are), I realised that the AWS API was actually quite simple to use, while seemingly quite powerful.  I couldn't wait, and decided to jump in feet first to knock up a quick prototype.

The Objective

I wanted a 'quick and dirty' single web page that would allow me to grab a photo using my iMac camera, and perform some basic recognition on the photo - basically, I wanted to identify the user sitting in front of the camera.

The Amazon Rekognition service allows you to create one or more collections.  A collection is simply a, well, collection of facial vectors for sample photos that you tell it to save.  NOTE: The service doesn't store the actual photos, but a JSON representation of measurements obtained from a reference photo.

Once you have a collection on Amazon, you can then take a subject photo and have it compare the features of the subject to its reference collection, and return the closest match.  Sounds simply doesn't it?  And it is.  To be honest, coding the front end of this web page to get the camera data actually took longer than the back end to perform the recognition - by a factor of 3 to 1 !!

So, in short, the web page lets you (1) create or delete a collection of facial data on Amazon, (2) upload face data via a captured photo to your collection, and (3) compare new photos to the existing collection to find a match.

Oh, and as a tricky extra (4), I also added in the Amazon Polly service to this demo so that after recognising a photo, the page will broadcast a verbal, customised greeting to the person named in the photo!

The Front End

My first question was what library to use to capture the image using my iMac camera.  After a quick Google search, I found the amazing JPEG Camera library on GitHub by amw, which allows you to use a standard HTML5 canvas to perform the capture, or fallback to a Flash widget for older browsers.  I quickly grabbed the library, and modified the example javascript file for my needs.

The Back End

For the back end, I knocked up a quick Sinatra project, for a lightweight Ruby based framework that could do all the heavy lifting with AWS.  I actually used Sinatra extensively (well, Padrino actually) to build all my web apps, and highly recommend the platform.

Note: Amazon Rekognition example actually promote uploading the source photos used in their API to an Amazon S3 bucket first, then processing them.  I wanted to avoid this double step and send the image data directly to their API instead, which I managed to do.

I also managed to do a similar thing with their Polly greeting.  Instead of saving the audio to an MP3 file and playing that, I managed to encode the MP3 data directly into an <audio> tag on the page and play it from there!

The Code

I have placed all the code for this project on my GitHub page.  Feel free to grab it, fork it and improve it as you like.  I will endeavour to explain the code in more detail here.

The Steps

First things first, you will need an Amazon AWS account.  I won't go into the details of setting that up here, because there are many articles you can find on Google for doing so.

Creating an AWS IAM User

But once you are set up on AWS, the first thing we need to do is to create an Amazon IAM (Identity & Access Management) user which has the permissions to use the Rekognition service.  Oh, we will also set up permissions for Amazon's Polly service as well, because once I got started on these new services, I could not stop.

In the Amazon console, click on 'Services' in the top left corner, then choose 'IAM' from the vast list of Amazon services.  Then, on the left hand side menu, click on 'Users'.  This should show you a list of existing IAM users that you have created on the console, if you have done so in the past.

Click on the 'Add User' blue button on the top of this list to add a new IAM user.

Give the user a recognisable name (more for your own reference), and make sure you tick 'Programmatic Access' as you will be using this IAM in an API call.

Next is the permissions settings.  Make sure you click the THIRD box on the screen, that says 'Attach existing policies directly'.  Then, on the 'Filter: Policy Type' search box below that, type in 'rekognition' (note the Amazonian spelling) to filter only the Rekognition policies. Choose 'AmazonRekognitionFullAccess' from the list by placing a check mark next to it.

Next, change the search filter to 'polly', and place a check mark next to 'AmazonPollyFullAccess'.

Nearly there.  We now have full permission for this IAM for Amazon Rekognition and Amazon Polly.  Click on 'Next: Review' on the bottom right.

On the review page, you should see 2 Managed Policies giving you full access to Rekognition and Polly.  If you don't, go back and re-select the policies again as per the previous step.  If you do, then click 'Create User' on the bottom right.

Now this page is IMPORTANT.  Make a note of the AWS Key and Secret that you are given on this page, as we will need to incorporate it into our application below.  

This is the ONLY time that you will be shown the key/secret for this user, so please copy and paste the info somewhere safe, and download the CSV file from this page with the information in it and keep it safe as well.

Download the Code

Next step, is to download the sample code from my GitHub page so you can modify it as necessary.  Go to this link and either download the code as ZIP file, or perform a 'git clone' to clone it to your working folder.

First thing you need to do is to create a file called '.env' in your working folder, and enter these two lines, substituting your Amazon IAM Key and Secret in there (Note: These are NOT real key details below):

export AWS_KEY=A1B2C3D4E5J6K7L10
export AWS_SECRET=T/9rt344Ur+ln89we3552H5uKp901

You can also just run these two lines on your command shell (Linux and OSX) to set them as environment variable that the app can use.  Windows user can run them too, just replace the 'export' prefix with 'set'.

Now, if you have Ruby installed on your system (Note: No need for full Ruby on Rails, just the basic Ruby language is all you need), then you can run

bundle install

to install all the pre-requisites (Sinatra etc.), then you can type

ruby faceapp.rb

to actually run the app.  This should start up a web browser on port 4567, so you can fire up your browser and go to 

http://localhost:4567

to see the web page and begin testing.

Using the App

The web page itself is fairly simple.  You should see a live streaming image on the top center, which is the feed from your on board camera.

The first thing you will need to do is to create a collection by clicking the link at the very bottom left of the page.  This will create an empty collection on Amazon's servers to hold your image data.  Note that the default name for this collection is 'faceapp_test', but you can change that on the faceapp.rb ruby code (line 17).

Then, to begin adding faces to your collection, ask several people to sit down in front of your PC or table/phone, and make sure their face is in the photo frame ONLY (Multiple faces will make the scan fail).  Once ready, enter their name in the text input box and click the 'Add to collection' button.  You should see a message that their facial data has been added to the database.

Once you have built up several faces in your database, then you can get random people to sit down in front of the camera and click on 'Compare image'.  Hopefully for people who have been already added to the collection, you should get back their name on screen, as well as a verbal greeting personalised to their name.

Please note that the usual way for Amazon Rekognition to work is to upload the JPEG/PNG photo to an Amazon S3 Bucket, then run the processing from there, but I wanted to bypass that double step and actually send the photo data directly to Rekognition as a Base64 encoded byte stream.  Fortunately, the aws-sdk for Ruby allows you to do both methods.

Lets walk through the code now.

First of all, lets take a look at the we page raw HTML itself.

https://github.com/CyberFerret/FaceRekognition-Demo/blob/master/views/faceapp.erb

This is a really simple page that should be self explanatory to anyone familiar with HTML creation.  Just a series of names divs, as well as buttons and links.  Note that we are using jQuery, and also Moment.js for the custom greeting.  Of note is the faceapp.js code, which does all the tricky stuff, and the links to the JPEG camera library.

You may also notice the <audio> tags at the bottom of the file, and you may ask what this is all about - well, this is going to be the placeholder for the audio greeting we send to the user (see below).

Let's break down the main app js file.

https://github.com/CyberFerret/FaceRekognition-Demo/blob/master/public/js/faceapp.js

This sets up the JPEG Camera library to show the camera feed on screen, and process the upload of the images.

The add_to_collection() function is straightforward, in that it takes the captured image from the camera, then does a post to the /upload endpoint along with the user's name as the parameter.  The function will check that you have actually entered a name or it will not continue, as you need a short name as a unique identifier for this facial data.

The upload function simply checks that the call to /upload finished cleanly, and either displays a success message or the error if it doesn't.

The compare_image() function is what gets called when you click the, well, 'Compare image' button.  It simply grabs a frame from the camera, and POSTs the photo data to the /compare endpoint.  This endpoint will return either an error, or else a JSON structure containing the id (name) of the found face, as well as the percentage confidence.

If there is a successful face match, the function will then go ahead and send the name of the found face to the /speech endpoint.  This endpoint calls the Amazon Polly service to convert the custom greeting to an MP3 file that can be played back to the user.

The Amazon Polly service returns the greeting as a binary MP3 stream, and so we take this IO stream and BaseEncode64 it, and place it as an encoded source link in the <audio> placeholder tags on our web page, which we can then do a .play() on the element in order to play the MP3 through the user's speakers using the HTML5 Web Audio API.

This is also the first time I have placed encoded data in the audio src attribute, rather than a link to a physical MP3 file, and I am glad to report that it worked a treat!

Lastly on the app js file is the greetingTime() function.  All this does is work out whether to say 'good morning/afternoon/evening' depending on the user's time of day.  A lot of code for something so simple, but I wanted the custom greeting they hear to be tailored to their time of day.

Lastly, lets look at the Ruby code for the Sinatra app.

https://github.com/CyberFerret/FaceRekognition-Demo/blob/master/faceapp.rb

Pretty straightforward Sinatra stuff here.  The top is just the requires that we need for the various AWS SDK and other libraries.

Then there is a block setting up the AWS authentication configuration, and the default collection name that we will be using (which you can feel free to change).

Then, the rest of the code is simply the endpoints that Sinatra will listen out for.  It listens for a GET on '/' in order to display the actual web page to the end user, and it also listens out for POST calls to /upload, /compare and /speech which the javascript file above posts data to.  Only about 3 or 4 lines of code for each of these endpoints to actually carry out the facial recognition and speech tasks, all documented in the AWS SDK documentation.

That's about all that I can think of to share at this point.  Please have fun with the project, and let me know what you end up building with it.  Personally, I am using this project as a starting block for some amazing new features that I would love to have in our main web app HR Partner.

Good Luck, and enjoy your facial recognition/speech synthesis journey.