Answers.com

ReCAPTCHA

 

A method of enlisting the general public to help large text digitization projects that are converting thousands of old books into computer-readable text. CAPTCHAs are the distorted words on Web sites that users must type in to prove that they are humans and not computers. Every day, tens of millions of CAPTCHAs are entered, creating a huge pool of human effort to draw on.

In a reCAPTCHA system, images of words that optical character recognition (OCR) software cannot decipher are sent to several people as CAPTCHAs in order to reach a consensus on each word. For more information or to get reCAPTCHA code, visit www.recaptcha.net. See OCR and CAPTCHA.

A reCAPTCHA
A known word (left) is always sent with the unreadable word so that the reCAPTCHA still works as a valid CAPTCHA. After several people enter the same text for the unreadable word, the system considers that word correctly converted.
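A minimal sketch of how a two-word challenge like the one pictured might be graded. It assumes the known control word is on the left (as in the figure) and that the comparison ignores case; the function name `grade_response` is hypothetical, and this is an illustration rather than reCAPTCHA's actual code.

```python
from typing import Optional, Tuple


def grade_response(control_word: str, typed: str) -> Tuple[bool, Optional[str]]:
    """Grade one two-word challenge whose known control word is on the left.

    Returns (passed, reading): reading is the user's guess for the
    unreadable word, kept only when the control word was typed correctly.
    """
    words = typed.split()
    if len(words) != 2:
        return False, None  # malformed answer: fail and record nothing
    control_answer, unknown_answer = words
    # Case-insensitive matching is an assumption made for this sketch.
    if control_answer.casefold() != control_word.casefold():
        return False, None  # control word wrong: do not trust the other word
    return True, unknown_answer  # pass; the second word becomes one vote


print(grade_response("following", "following finding"))
```

A pass admits the user, and the returned reading is only one vote: the word counts as converted once several independent users agree on it.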

Wikipedia: ReCAPTCHA

reCAPTCHA is a system originally developed at Carnegie Mellon University that uses CAPTCHAs to help digitize the text of books while protecting websites from bots attempting to access restricted areas.[1] On September 16, 2009, Google acquired reCAPTCHA.[2] reCAPTCHA is currently digitizing the archives of The New York Times.[3] Twenty years of The New York Times have been digitized, and the project hopes to finish the remaining 110 or so years by 2010.[4]

reCAPTCHA supplies subscribing websites with images of words that optical character recognition (OCR) software has been unable to read. The subscribing websites (whose purposes are generally unrelated to the book digitization project) present these images for humans to decipher as CAPTCHA words, as part of their normal validation procedures. They then return the results to the reCAPTCHA service, which sends the results to the digitization projects.

Users of the system reportedly solve 200 million CAPTCHAs every day (as of September 2009),[5] and its subscribers include popular sites such as Facebook, Ticketmaster, Twitter and StumbleUpon.[6] Craigslist began using reCAPTCHA in June 2008.[7] The U.S. National Telecommunications and Information Administration also uses reCAPTCHA on the website for its digital TV converter box coupon program, part of the U.S. digital television (DTV) transition.[8]


Origin

The reCAPTCHA program originated with Guatemalan computer scientist Luis von Ahn, aided by a MacArthur Fellowship. An early CAPTCHA developer, he realized "he had unwittingly created a system that was frittering away, in ten-second increments, millions of hours of a most precious resource: human brain cycles."[9]

Operation

An example of a reCAPTCHA challenge, containing the words “following finding”.

Scanned text is analyzed by two different optical character recognition programs; where the programs disagree, the questionable word is converted into a CAPTCHA. The word is displayed alongside a control word whose reading is already known. The system assumes that if a human types the control word correctly, that person's answer for the questionable word is also correct. Each OCR program's reading counts as 0.5 points, and each human's reading counts as a full point. Once a given reading reaches 2.5 points, the word is considered resolved. Words that human judges consistently read the same way are recycled as control words.[10]
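The weighted vote described above can be sketched as follows. The weights (0.5 per OCR reading, 1 per human reading) and the 2.5 threshold come from the text; exact string matching, counting human votes in arrival order, and the function names `needs_human` and `resolve_word` are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Optional

OCR_WEIGHT = 0.5    # weight of each OCR program's reading (from the text)
HUMAN_WEIGHT = 1.0  # weight of each human reading (from the text)
THRESHOLD = 2.5     # total weight at which a reading is accepted


def needs_human(ocr_a: str, ocr_b: str) -> bool:
    """A word becomes a CAPTCHA only when the two OCR programs disagree."""
    return ocr_a != ocr_b


def resolve_word(ocr_readings: List[str], human_readings: List[str]) -> Optional[str]:
    """Return the first reading whose weighted votes reach THRESHOLD, or None."""
    scores: Dict[str, float] = defaultdict(float)
    for reading in ocr_readings:
        scores[reading] += OCR_WEIGHT
    # With two OCR programs the OCR votes total at most 1.0,
    # so only a human vote can push a reading over the threshold.
    for reading in human_readings:
        scores[reading] += HUMAN_WEIGHT
        if scores[reading] >= THRESHOLD:
            return reading
    return None  # no consensus yet: keep serving the word


print(resolve_word(["finding", "fimding"], ["finding", "finding"]))
```

Here one OCR program's reading gives "finding" 0.5 points, and two matching human answers bring it to exactly 2.5, so the word is resolved.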

Implementation

reCAPTCHA challenges are served from the reCAPTCHA project's central site, since that site supplies the undecipherable words. Websites embed the challenge through a JavaScript API, and after the form is submitted, the website's server makes a callback to reCAPTCHA to check the user's answer. The reCAPTCHA project provides libraries for various programming languages and applications to make this process easier. reCAPTCHA is a free service (that is, the CAPTCHA images are provided to websites free of charge in return for help with the decipherment),[11] but the reCAPTCHA software itself is not open source.
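A minimal sketch of the server-side half of that callback. The field names (`privatekey`, `remoteip`, `challenge`, `response`) and the reply format (a first line of `true` or `false`, optionally followed by an error code) are assumptions modeled on the legacy verification call and should be checked against the official reCAPTCHA documentation. The HTTP request itself is omitted so the example runs offline, and the helper names are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode


def build_verify_body(private_key: str, remote_ip: str,
                      challenge: str, response: str) -> str:
    """URL-encode the fields the site's server would POST to the CAPTCHA service.

    Field names are assumptions modeled on the legacy verify call.
    """
    return urlencode({
        "privatekey": private_key,  # the site's secret key
        "remoteip": remote_ip,      # IP address of the user who answered
        "challenge": challenge,     # challenge ID echoed back from the form
        "response": response,       # the text the user typed
    })


def parse_verify_reply(reply: str) -> Tuple[bool, Optional[str]]:
    """Parse an assumed reply: 'true' or 'false', then an optional error line."""
    lines = reply.strip().splitlines()
    if not lines:
        return False, "empty-reply"  # hypothetical local error code
    ok = lines[0].strip() == "true"
    error = lines[1].strip() if not ok and len(lines) > 1 else None
    return ok, error


print(build_verify_body("KEY", "203.0.113.7", "abc", "following finding"))
```

Keeping request building and reply parsing separate from the network call makes both easy to test, and lets a site swap in whichever HTTP client its language library provides.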

Mailhide

reCAPTCHA has also created Mailhide,[12] a project that protects email addresses on Web pages from being harvested by spambots. The email address is converted into a form that does not let a crawler see the full address. For example, “noreply@example.com” would be displayed as “nor...@example.com”. A visitor clicks the “...” and solves a CAPTCHA to reveal the full email address.
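The masking step in that example can be sketched as below. The three-character prefix matches the example above; the rule that shows only about half of a short local part is our own assumption, not documented Mailhide behavior, and the CAPTCHA-protected reveal step is not shown. The function name `mask_email` is hypothetical.

```python
def mask_email(address: str, visible: int = 3) -> str:
    """Hide most of an address's local part, e.g. 'noreply@example.com' -> 'nor...@example.com'.

    Shows at most `visible` leading characters, limited to about half of
    the local part (but always at least one character) for short names.
    """
    local, at, domain = address.rpartition("@")
    if not at or not local:
        raise ValueError(f"not an email address: {address!r}")
    shown = min(visible, max(1, len(local) // 2))
    return f"{local[:shown]}...@{domain}"


print(mask_email("noreply@example.com"))
```

Only the masked form would appear in the page's HTML; the full address stays on the server until the visitor solves the CAPTCHA.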

Notes

  1. Luis von Ahn, Ben Maurer, Colin McMillen, David Abraham and Manuel Blum (2008). "reCAPTCHA: Human-Based Character Recognition via Web Security Measures" (PDF). Science 321 (5895): 1465-1468. doi:10.1126/science.1160379. http://www.cs.cmu.edu/~biglou/reCAPTCHA_Science.pdf.
  2. "Teaching computers to read: Google acquires reCAPTCHA". Google. http://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html. Retrieved 2009-09-16.
  3. "Learn more". reCAPTCHA.net. http://recaptcha.net/learnmore.html. Retrieved 2008-11-23.
  4. Luis von Ahn (2009). NOVA ScienceNow s04e01 [Television production]. Event occurs at 46:58. "The New York Times has this huge archive, over 130 years of newspaper archive there. And we've done maybe about 20 years so far of The New York Times in the last few months and I believe we're going to be done next year by just having people do a word at a time."
  5. reCAPTCHA.net. http://recaptcha.net/aboutus.html.
  6. Rubens, Paul (2007-10-02). "Spam weapon helps preserve books". BBC. http://news.bbc.co.uk/2/hi/technology/7023627.stm.
  7. "Fight Spam, Digitize Books". Craigslist Blog. 2008-06. http://blog.craigslist.org/2008/06/fight-spam-digitize-books/.
  8. TV Converter Box Program.
  9. Hutchinson, Alex (March 2009). "Human Resources: The job you didn't even know you had". The Walrus: 15-16.
  10. Timmer, John (2008-08-14). "CAPTCHAs work? for digitizing old, damaged texts, manuscripts". Ars Technica. http://arstechnica.com/news.ars/post/20080814-captchas-workfor-digitizing-old-damaged-texts-manuscripts.html. Retrieved 2008-12-09.
  11. "FAQ". reCAPTCHA.net. http://recaptcha.net/faq.html.
  12. "Mailhide: Free Spam Protection". reCAPTCHA.net. http://mailhide.recaptcha.net/.

Copyrights:

Computer Desktop Encyclopedia. THIS COPYRIGHTED DEFINITION IS FOR PERSONAL USE ONLY.
All other reproduction is strictly prohibited without permission from the publisher.
© 1981-2009 Computer Language Company Inc. All rights reserved.
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "ReCAPTCHA".