Technology does a good job of interpreting and matching the words humans use in search query descriptions of images, but despite all the hype around machine learning, it still doesn't do as well as humans.
The study, "Image Recognition Accuracy Study," published Tuesday by Perficient Digital, tested the image recognition capabilities of Amazon AWS Rekognition, Google Vision, IBM Watson, and Microsoft Azure Computer Vision.
The research measured the raw accuracy rates of descriptions. The data shows how machines, algorithms and technology interpret the way humans describe images in search queries. About 3,000 images were used in four categories: Charts, Landscapes, People and Products.
The image descriptions came in two forms: human hand-tagged and machine-tagged. Human-tagged image descriptions scored much higher overall, about 573. Among the machine-tagged descriptions, Google earned the highest scores in three of the four categories used in the analysis.
Eric Enge, general manager at Perficient Digital and the author of the study, explains in an email to Search Marketing Daily that "each engine assesses what it believes to be the chances its predicted tag is accurate."
"This is called the confidence level, so when there's 90% confidence, it means they strongly believe the tag is accurate," he explains. "A 70% confidence level means they have less belief the tag is accurate."
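To make the idea of a confidence level concrete, here is a minimal Python sketch of how tag accuracy might be measured at a given confidence cutoff. The `Tag` structure, the `accuracy_at_confidence` function, and the sample data are all hypothetical and invented for illustration; they are not taken from the study or from any engine's API.

```python
from typing import NamedTuple, Optional


class Tag(NamedTuple):
    label: str         # word the engine attached to the image
    confidence: float  # engine's self-reported confidence, 0.0 to 1.0
    correct: bool      # hypothetical human judgment of the tag


def accuracy_at_confidence(tags: list[Tag], min_confidence: float) -> Optional[float]:
    """Share of correct tags among those at or above the confidence cutoff."""
    kept = [t for t in tags if t.confidence >= min_confidence]
    if not kept:
        return None  # no tags met the cutoff, so accuracy is undefined
    return sum(t.correct for t in kept) / len(kept)


# Made-up sample data, for illustration only.
sample_tags = [
    Tag("sky", 0.97, True),
    Tag("tree", 0.92, True),
    Tag("snow", 0.74, False),
    Tag("dog", 0.71, True),
]

print(accuracy_at_confidence(sample_tags, 0.9))  # only the two high-confidence tags count
print(accuracy_at_confidence(sample_tags, 0.7))  # all four tags count
```

Raising the cutoff keeps fewer tags, and in this made-up sample those remaining tags are more often correct, which is one reason the number of tags an engine returns matters when comparing scores.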
On tag accuracy, however, Amazon, Google, and Microsoft scored higher than human-tagged images. Google Vision scored 92.4% using six tags. Microsoft Azure followed with 90% using eight tags. Amazon Rekognition came in at No. 3 with 88.7% using 17 tags. Humans came in at 87.7% using five tags. And Watson came in last with 69.3% using three tags.
The vocabulary of each engine varied greatly, Enge said.
In the top five words by platform, the words for human-tagged images were trees, sky, woman, blue and snow. AWS Rekognition's words were person, human, outdoors, nature and plant. Google Vision focused on sky, product, food, tree and nature. Watson focused on color, person, building, nature and food. Microsoft Azure's were outdoor, person, indoor, sky and nature.
The types of words each engine associates with images, and the types it can describe, are also interesting.
For instance, while Google Vision and Microsoft Azure Computer Vision mention “yellow” frequently, neither matches Watson’s love for yellow or red. Watson mentions far more reds, such as Alizarin red, dark red, claret red, and Indian red. The technology also likes lemon yellow, pale yellow, jade green, steel blue, and ash grey.
Watson also loves descriptive words and likes to add context around them. While Watson loves dogs, Google Vision loves cats, followed by Microsoft Azure, according to the findings.