May 10th, 2004


language-modeling defeats the censors

(via boingboing_net), here, deep-linked

When the State Department blacks out words in documents before releasing them to the public, you can use language modeling to figure out what the missing word is. But humans can do a pretty good job of guessing anyway.
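To make the language-modeling idea concrete, here is a toy sketch of my own, not anything taken from the linked work. It ranks candidate fill-ins for a blacked-out word by how often each one appears between the same left and right neighbors in some reference text. The corpus and the name rank_fillers are made up for illustration; a real attempt would need a large corpus and a smoothed model.

```python
from collections import Counter

# Tiny made-up corpus, for illustration only.
corpus = (
    "the minister met the ambassador . "
    "the minister met the president . "
    "the president met the ambassador"
).split()

def rank_fillers(tokens, left, right):
    """Rank words w for the gap in 'left [w] right'.

    Score is count(left, w) * count(w, right), a crude unsmoothed
    bigram estimate. Words with a zero score are dropped.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {w: bigrams[(left, w)] * bigrams[(w, right)] for w in set(tokens)}
    fits = [w for w in scores if scores[w] > 0]
    # Highest score first, alphabetical order to break ties.
    return sorted(fits, key=lambda w: (-scores[w], w))

# "the [REDACTED] met ..."
print(rank_fillers(corpus, "the", "met"))
```

On this toy corpus, "minister" outranks "president" because it shows up between "the" and "met" more often.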

Here's the new bit: new State Department regulations require all published documents to be in Times Roman instead of Courier. A monospace font like Courier tells you exactly how many characters are in the blacked-out word, but nothing more. With a variable-width font like Times Roman, you get the exact pixel length of the word -- substantially more information, since two words with the same number of letters usually have different widths.
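A rough sketch of why pixel width narrows things down more than character count. The per-character widths below are invented for illustration and are NOT real Times Roman metrics, and the word list is hypothetical too; the only assumption that matters is that letters like "i" and "l" are narrower than letters like "m" and "w".

```python
# Invented advance widths in arbitrary units (not real font metrics).
NARROW = set("ijlft")
WIDE = set("mw")

def char_width(ch):
    if ch in NARROW:
        return 3
    if ch in WIDE:
        return 8
    return 5

def word_width(word):
    """Total width of a word under the invented proportional metrics."""
    return sum(char_width(c) for c in word)

def candidates_monospace(words, n_chars):
    """Monospace redaction only reveals the character count."""
    return [w for w in words if len(w) == n_chars]

def candidates_proportional(words, width, tolerance=0):
    """Proportional redaction reveals the width, within some tolerance."""
    return [w for w in words if abs(word_width(w) - width) <= tolerance]

# Hypothetical candidate list: every word has five letters.
vocabulary = ["egypt", "libya", "sudan", "yemen", "syria", "china"]

# Suppose the blacked-out box is as wide as "yemen" would be.
box = word_width("yemen")
print(candidates_monospace(vocabulary, 5))       # all six words still fit
print(candidates_proportional(vocabulary, box))  # only words of that width
```

With these made-up widths, the character count leaves all six candidates standing, while the width leaves one. Real measurements are noisier, hence the tolerance parameter, and the leftover candidates are where a language model would come in.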

Statistics and computers are moving so much faster than our ability to think about their implications. It's another argument for Open Source, even for things like privacy standards: let all the clever-ass hackers think -- and talk publicly -- about how they might break them, rather than assuming you're clever enough to lock them all out.