Welcome to our review of the recognition capabilities of two Amazon Web Services AI / ML technologies – Amazon Textract and Amazon Rekognition’s Detect Image Text function. Amazon Web Services supplies world-class AI / ML services, but sometimes it can be difficult to determine which service is best for your use case.
In this post, we compare Textract and Rekognition to understand the best use cases for each. As Amazon explains it, Textract is a “fully managed machine learning service” dedicated to extracting text and other data from documents and images, while Rekognition is a comprehensive object recognition service that can also recognize text in images with the RekognitionDetectImageText call.
We’ll compare real-world results from the two services to identify where each performs best and to bring out any significant limitations that could affect the success of your implementations. In all test cases we used the LINE-level results returned by both RekognitionDetectImageText and Textract for each image.
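For readers who want to reproduce this kind of comparison, here is a minimal Python sketch of pulling only the LINE-level results out of each service's response. It is not the code used for the tests in this post. It assumes that RekognitionDetectImageText corresponds to the Rekognition DetectText API (whose response lists `TextDetections` with a `Type` of `LINE` or `WORD`) and that the Textract results come from DetectDocumentText (whose response lists `Blocks` with a `BlockType`). The function names and the `fetch_lines` helper are our own illustrative choices.

```python
from typing import Any, Dict, List


def rekognition_lines(response: Dict[str, Any]) -> List[str]:
    """Return the text of LINE-type detections from a Rekognition DetectText response."""
    return [
        detection["DetectedText"]
        for detection in response.get("TextDetections", [])
        if detection.get("Type") == "LINE"
    ]


def textract_lines(response: Dict[str, Any]) -> List[str]:
    """Return the text of LINE-type blocks from a Textract DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def fetch_lines(image_bytes: bytes) -> Dict[str, List[str]]:
    """Hypothetical helper: send one image to both services.

    Requires boto3 plus configured AWS credentials and region.
    """
    import boto3  # imported lazily so the pure functions above work without it

    rekognition = boto3.client("rekognition")
    textract = boto3.client("textract")
    return {
        "rekognition": rekognition_lines(
            rekognition.detect_text(Image={"Bytes": image_bytes})
        ),
        "textract": textract_lines(
            textract.detect_document_text(Document={"Bytes": image_bytes})
        ),
    }
```

Keeping the filtering separate from the network calls makes it easy to run the same extraction over saved responses when comparing the two services side by side.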
Example 1. Movie Collection Cover
This sample is an image from a legacy asset (a movie collection). Although the overlaid title is crisp and clear, the shadow and blur around the letters are close enough in tone to present potential recognition problems. The rest of the image is of dated quality and, to add to the challenge, some of the visible text is heavily stylized.
Text Recognition Results
RekognitionDetectImageText | Textract |
---|---|
OEZ (From the “Smokey” part of header) and (From the header) THE (from the header) CB (from the small CB radio) And it (From the “Bandit” in the header) THRIS RIRL (From “TRANS AM”) BANONE (From the license plate) | The (from the header) RII (from the “AM” above the license plate in “TRANS AM”) BAN ONE (From the license plate) |
As you can see, despite the poor quality of the image, RekognitionDetectImageText successfully detected entire words that were presented in a clear font. Stylized text without clear color contrast with its background presented much more of a problem, including the overlaid title text, which one would expect to be more easily read.
In this test, although its results included many inaccuracies, Rekognition recognized more words than Textract, including correctly reading letters embossed on a small object in the image (“CB”).
Example 2. Comic Book Cover
This sample is a piece of artwork that uses many fonts of different weights on different color backgrounds (including a reverse-color logo). The main text has bright colors, outlined edges and colored shadows, but is partly covered by other imagery. This kind of sample should challenge any artificial recognition technology.
Text Recognition Results
RekognitionDetectImageText | Textract |
---|---|
MARVEL LEGACY HOME OF THE BRAUE PART 1 695 MARK WAIO MATTHEW CHRIS SAMNEE WILSON CALRAL (CAPTAIN) AAMICA (AMERICA) SAWWEE’17 (Bottom right corner) MIN (Bottom right corner) | LEGACY HOME OF THE BRAVE PART 1 695 MARK WAID CHRIS SAMNEE MATTHEW WILSON |
Both services recognized many words, with Textract winning the name recognition prize. However, Textract was unable to recognize the stylized text samples, whereas RekognitionDetectImageText did pick those up as text. But Rekognition’s results were very inaccurate, and even the more recognizable text came back jumbled, whereas Textract returned the text it did find accurately and in order.
We have to say that neither service was fully up to the challenge of this very difficult test, but Textract is ahead for producing much more usable output.
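One way to put a number on “more usable output” is to score each service's detected words against a hand-typed transcription of the image. The sketch below is only an illustration of that idea, not the method used for this post: the `word_scores` function is our own hypothetical helper, it compares case-insensitive word counts, and it ignores word order and position.

```python
from collections import Counter
from typing import Iterable, Tuple


def word_scores(expected: Iterable[str], detected: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) over case-insensitive word counts.

    precision: share of detected words that appear in the expected text.
    recall:    share of expected words that were detected.
    """
    expected_words = Counter(w.upper() for line in expected for w in line.split())
    detected_words = Counter(w.upper() for line in detected for w in line.split())
    # Multiset intersection counts each word at most as often as it appears in both.
    matched = sum((expected_words & detected_words).values())
    precision = matched / sum(detected_words.values()) if detected_words else 0.0
    recall = matched / sum(expected_words.values()) if expected_words else 0.0
    return precision, recall


# One misread word out of four, as with "BRAUE" for "BRAVE" above.
print(word_scores(["HOME OF THE BRAVE"], ["HOME OF THE BRAUE"]))  # (0.75, 0.75)
```

A service that returns nothing scores zero on both measures, while one that returns many garbled words can keep a reasonable recall but lose precision, which mirrors the trade-off seen in this example.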
Example 3. Street Signs
Imagery of road signs is a very applicable use case for real-world AI and machine learning technology. While the image is high-quality, the conditions under which it was taken create some slight fuzziness overall, and the lettered surfaces are all angled with respect to the camera. (Note that there are two sets of signs: one in the foreground and one in the background at the bottom right.)
Text Recognition Results
RekognitionDetectImageText | Textract |
---|---|
Grennan Rd (from stop sign facing us) Brace Rd (from stop sign facing us) STOP BRACE RD (from small stop sign in lower right) GRENNAN RD (from small stop sign in lower right) | Rd (from Grennan Rd stop sign facing us) Grennan (from stop sign facing us) Brace Rd (from stop sign facing us) STOP BRACE RD (from small stop sign in lower right) GRENNAN RD (from small stop sign in lower right) |
Here both services do very well, but RekognitionDetectImageText delivered perfect accuracy, while Textract missed the name of one of the streets entirely. Interestingly, although all the signs’ letters are uppercase, both services returned the street names from the foreground signs in title case, but output uppercase for the much smaller signs in the background.
Example 4. Posing with License Plate
This is a very difficult sample for recognition, as the license plate is rotated to an odd angle and also bent into a slight curve.
Text Recognition Results
RekognitionDetectImageText | Textract |
---|---|
NO (from the top of the license plate) OUTATIME | *Textract found nothing on this one. |
Here RekognitionDetectImageText, as it is based on a pure object recognition technology, is able to correctly read the large text despite its skewed perspective, and even pick up a couple of letters from the much smaller and blurrier text. Textract, however, came up completely blank on this one.
Example 5. Angled Ad Copy
This sample represents a very common use case for online imagery. The text is angled and while the outlines are sharp, there’s a lined 3D effect applied to the text which blurs the demarcation between text and background.
Text Recognition Results
RekognitionDetectImageText | Textract |
---|---|
AI THE BEST WAY TO SLANT OR SKEW TEXT IN ILLUSTRATOR | *Textract couldn’t find any text in this at all. |
Again RekognitionDetectImageText shows its strength in recognizing objects in imagery, returning 100% complete and accurate results. Textract unfortunately did not recognize any text in this image.
Example 6. Level Ad Copy
This sample uses the same stylized text as the Angled Ad Copy sample, but this time the text is level rather than angled.
Text Recognition Results
RekognitionDetectImageText | Textract |
---|---|
THE BEST WAY TO SLANT OR SKEW TEXT IN ILLUSTRATOR | *Textract still couldn’t find anything here. |
We have unexpected results from this one. While RekognitionDetectImageText again perfectly recognized the text, Textract was again unable to recognize any part of it. This is unusual as Textract has been able to recognize level, stylized text in other examples. There is sufficient contrast in the color scheme to differentiate the text. Perhaps the lined 3D effect caused issues, but the cause of Textract’s poor performance on these last two samples is currently unknown.
Conclusions
Textract has a proven track record with scanned documents or similar imagery, which is natural because that is the environment it is designed for. RekognitionDetectImageText and Textract performed comparably well on most samples with Textract doing slightly better with imagery similar to scanned documents. But Rekognition is the clear winner at recognizing stylized text on colored backgrounds, from comic book covers to pictures of real-world objects with lettering.
Overall we can say that RekognitionDetectImageText has the wider range of applicability and better peak performance in specialty use cases, while Textract is a more reliable producer in its area of competence.
Thank you for taking this journey with us! We hope you found the information useful in your decision-making about text recognition technologies.
The Nomad Team