August 24th, 2004


All roads lead to porn

Our web-crawler corpus of Chinese text (useful in improving speech recognition's language model) has a substantial subsection that includes porn in Chinese. I wouldn't have known -- my Chinese reading skills are for shite -- but my next-desk-neighbor started laughing really hard.

The irony of the whole thing is that this data probably helps our system quite a bit -- although the topic isn't very helpful for certain kinds of dialogue, it's still a pretty good source of statistics about Chinese word frequency.

I'm sure that following links at random on English-language sites would frequently wind up in pornography too, but now I run dangerously close to a long and probably boring digression on web connectedness graphs.

Reading list

Studio Foglio (a Seattle outfit) has released issue #12 of Girl Genius. This comic is easy to like. It's got smart, sassy (and cute) engineer genius girls, robots, talking cats, scary and yet alluring pirate captains, barons, blimps, cannons, and troll-like constructed soldiers. It's a madhouse alternate Europe where every mad scientist was a Dr. Frankenstein, and central Europe is dominated by one Klaus von Wulfenbach. What other comic line comes with its own goggles?
A cartoon about some of the aforementioned troll-construct soldiers. The Foglios like to play with Jäger dialog.

Political stuff

A paper copy of I am New York City arrived in the mail today from _dkg_. It's a neat exploration of all the positive things happening in NYC. He writes:
Check out this paper some friends of mine put out. [note: Mr. Modesty here is one of the organizers himself, as is lapartera] We're aiming to distribute 25,000 of them before and during the RNC, to better highlight the issues that we think are important here (i.e., it's not all about cops vs. protestors!)

I've read most of this and I think it's great. Go read the PDF or the HTML.

I read some more of The Nation on the bus.
