Using CAPTCHA to Decipher Old Text

If you think that CAPTCHAs, the squiggly words you have to decipher in order to log in or post comments on many websites, are only there to keep out spammers, think again!

There is actually another use for this annoying feature: correcting mistakes made when scanning old text. Guy Gugliotta of the New York Times explains:

For vintage 19th-century texts in English, O.C.R. programs mess up or miss 10 percent to 30 percent of the words. Only humans can fix the errors. The standard method, called key and verify, uses two transcribers to type the text independently and compares the results. This is time-consuming and extremely expensive.

But in 2006, Dr. von Ahn’s team figured out a way around this obstacle. The ubiquitous Captchas, familiar to even the most casual Web user, were the perfect tools. Captchas, short for “completely automated public Turing test to tell computers and humans apart,” are impossible for machines to decipher, but easy for humans. (The test is named for the British computer pioneer Alan Turing.)

Dr. von Ahn’s group estimated that humans around the world decode at least 200 million Captchas per day, at 10 seconds per Captcha. This works out to about 500,000 hours per day — a lot of applied brainpower being spent on what Dr. von Ahn regards as a fundamentally mindless exercise.

“So we asked, ‘Can we do something useful with this time?’ ” Dr. von Ahn recalled in a telephone interview. Instead of making Captchas out of random words printed in a woozy way, why not ask Web users to translate problem words from archival texts?
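
The estimate quoted above is easy to check. Here is a minimal sketch in Python that uses only the two figures from the article; the constant and function names are made up for illustration.

```python
# Figures quoted in the article (the estimate from Dr. von Ahn's group).
CAPTCHAS_PER_DAY = 200_000_000
SECONDS_PER_CAPTCHA = 10
SECONDS_PER_HOUR = 3600


def hours_spent_per_day(captchas_per_day: int, seconds_each: int) -> float:
    """Total human hours spent solving Captchas in one day."""
    return captchas_per_day * seconds_each / SECONDS_PER_HOUR


hours = hours_spent_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # prints "555,556 hours per day"
```

So the article's "about 500,000 hours" is, if anything, rounded down a little.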


It's like a stealth Amazon Mechanical Turk, except you're not being paid $0.01 to do the task!

Newest 5 Comments

If it really works, that would be good, but it looks complicated. I don't know how much old text you can decipher with CAPTCHA. Maybe if you do that, you'll find more text that is difficult to read, and you'll have to re-CAPTCHA it again and again until you never know what it is about.

It takes at least five matching words to confirm that it is the correct word. Putting in an incorrect word only requires the system to check the word one extra time to confirm it.
It knows ONE word. You have to get that one correct. The second word it doesn't know, and it wants you to tell it what that word is. But it can't tell whether your answer is correct, which is why you can put in whatever you want.