Part 7: Image Recognition and Surveillance

Can AI fill the gap left by human weaknesses?
And how far is Orwell from now?

Uwe Weinreich, the author of this blog, usually coaches teams and managers on topics related to strategy, innovation and digital transformation. Now he is seeking a direct confrontation with Artificial Intelligence.

The outcome is uncertain.

Already published:

1. AI and Me – Diary of an experiment

2. Maths, Technology, Embarrassment

3. Learning in the deep blue sea - Azure

4. Experimenting to the Bitter End

5. The difficult path towards a webservice

6. Text Analysis Demystified

7. Image Recognition and Surveillance

8. Bad Jokes and AI Psychos

9. Seven Management Initiatives

10. Interview with Dr. Zeplin (Otto Group)

I must admit it. Sometimes, I'm not very good at recognizing people. It is not about friends, family and close contacts, but about the many more distant acquaintances. OK, whenever I forget my glasses, that mistake explains everything. But even with glasses on it happens to me from time to time that I don't recognize people, confuse them with someone else, or place them in the wrong context. That's annoying and embarrassing.

Artificial intelligence has already shown that it is apparently good at recognizing people. Let's see how it works.

Graeme was kind enough to make himself available as a victim. Thanks Graeme! We're working with this picture:

azure-17

Here it has already been converted to grayscale. Originally it was in color, but gray values are better suited for image analysis. Why? A grayscale image is a data set with three dimensions: x-axis, y-axis and the brightness of each pixel. Colored images add a fourth dimension, the color channel: each pixel is defined by separate values for red, green and blue (RGB), or for cyan, magenta, yellow and black/key (CMYK). Of course, it is easier to calculate with one dimension less.
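For readers who want to see the step in code, here is a minimal sketch of such a conversion. It assumes NumPy and the common ITU-R BT.601 brightness weights; the function name `to_grayscale` and the tiny 2x2 test image are made up for illustration and are not what Azure uses internally.

```python
import numpy as np

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Collapse a (height, width, 3) RGB array into (height, width) brightness values."""
    # ITU-R BT.601 weights: green contributes most to perceived brightness.
    weights = np.array([0.299, 0.587, 0.114])
    gray = rgb[..., :3].astype(np.float64) @ weights
    return np.rint(gray).astype(np.uint8)

# Hypothetical 2x2 image (red, green, blue, white) standing in for the photo.
rgb = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
gray = to_grayscale(rgb)
print(rgb.shape, "->", gray.shape)
```

The color axis disappears from the array shape, which is exactly the "one dimension less" that makes the later calculations simpler.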

First, the image is normalized. We already know this from text analysis: cleaning and counting. Again, there is a nice bar chart, this time not of the word frequencies, but of the brightness values in the image. The first graph shows the absolute frequency of each brightness value, the second one the cumulative frequencies.

azure-18 azure-18b
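The two charts can be reproduced with a few lines. This is only a sketch, assuming an 8-bit grayscale image held in a NumPy array; `brightness_histogram` is a hypothetical helper name and the six-pixel image is invented for the example.

```python
import numpy as np

def brightness_histogram(gray: np.ndarray):
    """Return (absolute counts, cumulative counts) for brightness values 0..255."""
    counts = np.bincount(gray.ravel(), minlength=256)  # one bin per brightness value
    cumulative = np.cumsum(counts)                     # running total from dark to bright
    return counts, cumulative

# Hypothetical mini image: mostly dark pixels, one bright pixel.
gray = np.array([[0, 0, 1],
                 [1, 1, 255]], dtype=np.uint8)
counts, cumulative = brightness_histogram(gray)
print(counts[:2], cumulative[-1])
```

The last cumulative value always equals the number of pixels in the image.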

The image is easier to process if the cumulative values form a straight line from bottom left to top right. Achieving that by transforming the image with a computer is a trifle:

azure-18c azure-18d

Graeme also looks a lot more brilliant and richer in contrast now.
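The transformation that straightens the cumulative curve is usually called histogram equalization. Here is a minimal sketch of the textbook version, assuming an 8-bit grayscale NumPy array; the name `equalize` and the low-contrast sample values are my own, and the course tooling may well do it differently.

```python
import numpy as np

def equalize(gray: np.ndarray) -> np.ndarray:
    """Spread the brightness values so the cumulative histogram becomes roughly linear."""
    counts = np.bincount(gray.ravel(), minlength=256)
    cdf = np.cumsum(counts)
    cdf_min = cdf[cdf > 0][0]          # cumulative count at the darkest value present
    total = gray.size
    if total == cdf_min:               # only one brightness value: nothing to spread
        return gray.copy()
    # Map each old brightness value to its rescaled cumulative share (0..255).
    lookup = np.rint((cdf - cdf_min) / (total - cdf_min) * 255)
    lookup = np.clip(lookup, 0, 255).astype(np.uint8)
    return lookup[gray]

# Hypothetical low-contrast image: all values squeezed between 100 and 120.
flat = np.array([[100, 100],
                 [110, 120]], dtype=np.uint8)
rich = equalize(flat)
print(flat.min(), flat.max(), "->", rich.min(), rich.max())
```

After the transformation the values use the full range from 0 to 255, which is why the picture looks richer in contrast.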

How does recognition work?

So far, we have only played a little with the pixel values. These are exercises that probably every hobby photographer masters with their image software. The real task, however, is to recognize what is in the picture. This requires analysis.

The image we now have is richer in contrast and therefore better suited for detecting edges. For humans this is an intuitive and easy exercise. We can see exactly where the jacket ends and the T-shirt begins. The computer initially only sees a three-dimensional data matrix. It can only detect edges if it relates the individual pixels to each other, i.e. examines small pixel fields for differences. This is done using the Sobel operator. How exactly it works is more something for geeks. Its advantage is that the procedure does not overstrain the computer and works quite well. Here's the result:

azure-19
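For the geeks after all: the Sobel operator slides two small 3x3 grids of weights over the image, one reacting to brightness changes from left to right, the other from top to bottom, and combines both into an edge strength per pixel. The following is a minimal sketch assuming NumPy; the helper names and the artificial test image are made up, and real libraries implement this far more efficiently.

```python
import numpy as np

# The two standard Sobel kernels: horizontal and vertical brightness change.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply a 3x3 kernel to every pixel; border pixels are repeated outward."""
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    height, width = image.shape
    out = np.zeros((height, width))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + height, dx:dx + width]
    return out

def sobel_edges(gray: np.ndarray) -> np.ndarray:
    """Edge strength per pixel: length of the (horizontal, vertical) gradient."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    return np.hypot(gx, gy)

# Hypothetical test image: dark left half, bright right half.
test_image = np.zeros((5, 6), dtype=np.uint8)
test_image[:, 3:] = 255
edges = sobel_edges(test_image)
print(edges[2])
```

Only the two pixel columns next to the dark/bright boundary get a non-zero edge strength; the uniform areas stay at zero.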

It's a little creepy. It becomes more difficult for us and much easier for computers to recognize someone. What is still needed is pattern recognition, which spots the characteristic features of a face within the image and makes them available for further processing. Azure has its own face recognition module (Face API) for this purpose. On Graeme's picture the following section was identified:

azure-22
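A face detection service typically reports such a section as a rectangle. The Face API's detect call, for example, returns a `faceRectangle` with `top`, `left`, `width` and `height` for each face found. The sketch below does not call the service; it only shows how a response of that shape could be used to cut the section out of the image. The sample response, its numbers and the function name `crop_faces` are invented for illustration.

```python
import json
import numpy as np

# Hypothetical response in the shape of a face detection result (values made up).
sample_response = '[{"faceRectangle": {"top": 2, "left": 3, "width": 4, "height": 3}}]'

def crop_faces(image: np.ndarray, response_json: str) -> list:
    """Cut every reported face rectangle out of a grayscale image array."""
    crops = []
    for face in json.loads(response_json):
        rect = face["faceRectangle"]
        top, left = rect["top"], rect["left"]
        crops.append(image[top:top + rect["height"], left:left + rect["width"]])
    return crops

# Hypothetical 8x10 image whose pixel values simply count upwards.
image = np.arange(80, dtype=np.uint8).reshape(8, 10)
crops = crop_faces(image, sample_response)
print(crops[0].shape)
```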

Not bad. But a computer additionally needs to be able to recognize a face when the image is different, the head posture differs, and perhaps the appearance as a whole varies. You guessed it. To do this, the machine must learn again which characteristic features belong to which person. Azure's face recognition masters this with alarming accuracy, as the comparison with this portrait of Graeme's youth shows:

azure-23

Scoring probability: 53%. That's not perfect, but it's amazing.

Training leads to perfection

Not only Microsoft, but also Google, Amazon and IBM train their AI systems with millions of images. They become better and better at recognizing people and things, not only cats, but also faces, objects, situations, etc. If the AI analyzes Graeme's first image not only for faces, but also with the knowledge it has acquired about Graeme and the meaning of image patterns, the following result is obtained within milliseconds:

Graeme wearing a suit and tie smiling at the camera

That's not quite right. He isn't wearing a tie. But it is very close, and kinder than if the algorithm had interpreted the triangular structure under the chin as a double chin.

About the following picture the recognition software says:

a crowd of people watching a football game

The management perspective: a huge, untapped potential

With these examples, Artificial Intelligence penetrates far into the field of skills and activities that were previously reserved for humans. Of course, AI still does not "understand" like a human does. Nevertheless, the technology is able to solve corresponding tasks even without understanding: analyzing, classifying and categorizing data, determining probabilities, and deriving reactions from them. This is enough to allow AI to carry out many routine tasks, often with even greater precision and without any sign of fatigue. Google DeepMind, for example, is now so good at lip-reading that it is superior to trained people, reports Wired.

Every company that has standardized or standardizable processes will have to deal with artificial intelligence in the next few years. The potential is too huge to be left to competitors. Thanks to services like AWS AI, Azure, Google AI and Watson, introducing and using AI has even become easier than traditional IT development.

The perfect time to test AI in a business environment, at the very least, is right now. The potential is immense and far from being fully tapped. In the months and years to come, many companies will certainly enter the market with fascinating solutions.

How close is Orwell?

The year 1984 is long behind us. Back then, AI was still in its infancy. In the meantime, however, it has developed to such an extent that people can be identified relatively well through image recognition, even if the images differ considerably. Situations can be recognized as well, not only football stadiums, but also accidents, threatening situations in crowds and much more.

If you combine image recognition with speech recognition, then it is no longer utopian to automatically transcribe a video of a discussion, to recognize the people in it and to assign the respective text passages to them. It would also be possible to install and train video surveillance in hot spots, such as underground stations in Berlin. If it is well trained, it could automatically trigger an alarm when someone is attacked. No one needs to stare at screens anymore, and video recordings no longer need to be stored en masse, but only for those rare moments when a critical situation arises.

The technology is affordable. However, the examples also make clear that the benefit and harm of such AI applications depend heavily on who uses data from images and sound, with what intention and for what purpose. Detecting and sorting out spoiled raw materials automatically in a flash with the help of AI in food production? Wonderful. Monitoring people in all areas of life without gaps and creating profiles that are more durable than the persons themselves? Scary. Perhaps we should ask ourselves the following questions more often:

Google, Facebook, Amazon and others currently love collecting photos and videos and offer a lot of convenience in return, such as automatic keywording. This works quite well with AI and seems practical at first. It inevitably results in huge amounts of data about the persons who are depicted, including people who do not know that their pictures end up in such data collections. Do we really want to upload every image to cloud storage?
Social media in particular know a lot about us and try to nudge us into keeping the security settings as low as possible. Are we really aware of all the consequences?
Websites also collect data via trackers (our sites don't!). Script and ad blockers limit or prevent this. Yes, they are annoying because you always have to allow certain scripts when pages are not displayed correctly. But isn't it worth the effort after all?

At a time when data is considered the "oil of the 21st century", nothing is free. Everything is paid for, sometimes very, very expensively - with our data.

I have nothing against Artificial Intelligence and I am sure that it will bring a lot of comfort into our lives, solve many problems - e.g. also in medicine - and become a permanent companion. AI is not just going to disappear again.

With a sharp kitchen knife, a chef can create ingenious culinary delights and a murderer can kill people. AI is a tool like any other and we should pay very close attention to how it is used. Let us shape the future with AI rather than let it roll over us.
