Getting Spammed? Help Scan a Book!
February 6, 2008

Posted by JR Dixey in news.

Humans are apparently much better than OCR scanners at decoding words, so Carnegie Mellon University is putting the unreadable words online for the world to decipher, all in the interest of enhancing their digitizing efforts for the Internet Archive.

They’ve set up ReCAPTCHA, a free CAPTCHA service that lets webmasters add spam-defeating interfaces to their websites. What’s the connection? Well, you’ve seen those small forms that force you to type in a distorted word before you can submit? On a ReCAPTCHA form, the CAPTCHA image holds a second word: one that an OCR scanner couldn’t decipher while scanning a book for the Archive.

If a website user decodes the first word successfully, the system assumes that they also decoded the second word correctly, and their reading becomes a candidate answer. The system then sends the second word out in a second tier of CAPTCHAs, and if everyone in that second set comes up with the same reading, the word is considered decoded and sent back to the database.
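For the curious, here is a rough sketch of how that agreement check might look in Python. It is not ReCAPTCHA's actual code: the names `UnknownWord`, `submit`, and `resolved` are made up for illustration, and the threshold of three matching readings is an assumption, since the service's real number isn't stated here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def normalize(text: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


@dataclass
class UnknownWord:
    """A scanned word that OCR could not read (hypothetical structure)."""
    image_id: str
    readings: List[str] = field(default_factory=list)


def submit(control_answer: str, control_truth: str,
           unknown_guess: str, word: UnknownWord) -> bool:
    """Accept the form only if the known (control) word was typed correctly.

    When it was, the user's reading of the unknown word is trusted
    enough to be recorded as a candidate.
    """
    if normalize(control_answer) != normalize(control_truth):
        return False  # failed the CAPTCHA; discard the guess
    word.readings.append(normalize(unknown_guess))
    return True


def resolved(word: UnknownWord, required: int = 3) -> Optional[str]:
    """Return the agreed reading once enough users gave the same answer.

    `required` is an assumed threshold, not a documented value.
    """
    if len(word.readings) < required:
        return None
    if len(set(word.readings)) == 1:  # every reading is identical
        return word.readings[0]
    return None
```

In this sketch a single disagreeing reading keeps the word unresolved, which mirrors the "all come up with the same reading" rule described above; a real system would presumably need some way to handle words that never reach agreement.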

Their tagline? STOP SPAM. READ BOOKS.