Friday, October 5, 2007

Anti-Spam Tool Used in Deciphering Digitized Texts

The BBC has a story about how researchers are using an anti-spam tool to help digitize old books that machines can't read. You've probably seen a captcha - one of those distorted images of text that you have to decipher and type into a box. Having users unscramble things is helping Carnegie Mellon University with its digitizing program. OCR (optical character recognition) is supposed to take scans of text and translate them into something you could edit in a word processor. But with books this old, OCR can get as many as one word in ten wrong - a real pain for the scanning personnel to fix manually. If you take the images of those words, put them up in captchas, and have people from around the Internet take a few minutes and fix them, it goes a lot faster:
Thanks to the adoption of reCAPTCHAs by popular websites like Facebook, Twitter and StumbleUpon, the system is helping to decipher about one million words every day for CMU's book archiving project, according to [Luis von Ahn, a Professor at CMU].
Now, if only we had an equivalent system for deciphering the handwriting of doctors.
