
Parsing text without spaces

Canyoureadthis? I bet you can. How is it that we can read sentences without the spaces?

More to the point, how can we teach computers to read without spaces?
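One baseline answer works only if the computer already has a list of valid words. In that case, dynamic programming over positions in the string is enough: a prefix can be split if some shorter prefix can be split and the leftover piece is a known word. The minimal Python sketch below is not from Mahoney's paper. The function name `segment_with_dictionary` and the tiny word list are hypothetical, chosen for illustration. It returns the split with the fewest words, or `None` when no split exists.

```python
def segment_with_dictionary(text, vocabulary):
    """Split `text` into known words, preferring the fewest words.

    Hypothetical helper for illustration. Matching is case-insensitive
    and the returned words are lowercase. Returns None if no split exists.
    """
    words = {w.lower() for w in vocabulary}
    longest = max(map(len, words), default=0)
    text = text.lower()
    n = len(text)
    # best[i] holds the fewest-word split of text[:i], or None if
    # text[:i] cannot be split into known words.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        # Only look back as far as the longest known word.
        for start in range(max(0, end - longest), end):
            piece = text[start:end]
            if best[start] is not None and piece in words:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


vocab = {"can", "you", "read", "this", "re", "ad"}
print(segment_with_dictionary("Canyoureadthis", vocab))
# → ['can', 'you', 'read', 'this']
```

A word list only goes so far. It has no entries for unfamiliar words, and "fewest words" is a crude way to choose between competing splits. Statistics learned from real text can help with both problems.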

A nifty paper by Matt Mahoney of the Florida Institute of Technology outlines one approach:

http://www.cs.fit.edu/~mmahoney/dissertation/lex1.html

The paper cites research by another scientist showing that infants can split the continuous sound of spoken English into words before they can say their own first word.

Matt shows a way to apply statistical analysis to the breaks between characters and decide which breaks mark word boundaries. If I choose that route for a project I'm working on, I'll need to put together something like it.
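Mahoney's page describes his own method, and the sketch below does not reproduce it. It only illustrates the general idea of scoring the breaks between characters using one simple statistic: the transitional probability of the next character given the previous `order` characters, counted from unsegmented training text. Inside a word, the next letter tends to be predictable. At a word break, many letters can follow. The function names, the toy three-word corpus, and the 0.75 threshold are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def train(corpus, order=1):
    """Count how often each `order`-character context is followed by each character."""
    follow = defaultdict(Counter)
    for text in corpus:
        for i in range(order, len(text)):
            follow[text[i - order:i]][text[i]] += 1
    return follow


def transition_prob(follow, context, nxt):
    """Estimate P(nxt | context); an unseen context scores 0.0."""
    counts = follow.get(context)
    if not counts:
        return 0.0
    return counts[nxt] / sum(counts.values())


def segment_by_statistics(text, follow, order=1, threshold=0.75):
    """Break `text` wherever the next character is hard to predict.

    `order` must match the value used in train(). Gaps closer than
    `order` characters to the start of `text` are never scored.
    """
    if not text:
        return []
    words, start = [], 0
    for i in range(order, len(text)):  # gap between text[i-1] and text[i]
        if transition_prob(follow, text[i - order:i], text[i]) < threshold:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words


# Toy training data: every ordering of three words, with the spaces removed.
corpus = ["".join(p) for p in permutations(["bed", "cup", "fox"])]
model = train(corpus)
print(segment_by_statistics("cupbedbedfox", model))
# → ['cup', 'bed', 'bed', 'fox']
```

In this toy corpus every within-word transition has probability 1.0, and every cross-word transition has 0.5 or less, so a fixed threshold separates them cleanly. With real text the two ranges overlap. A more careful version would likely compare each break to its neighbors or use longer contexts rather than rely on a single cutoff.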

But then I have to figure out how to words in get order the right. And that you do how do?

A puzzle for another day.


About

This page contains a single entry from the blog posted on September 26, 2005 3:37 PM.
