September 2nd, 2004


historical data-management

I've spent the last two weeks (!) trying to figure out how to relate no fewer than five different kinds of truth.

Before anybody thinks I've gone mystic, I should clarify: in speech recognition research, and other machine-learning contexts, truth refers to the right answer. We have hours and hours of conversations, transcribed by listeners at the Linguistic Data Consortium.

Unfortunately, there has been more than one pass at coming up with the right words -- the right truth.

The frustrating thing is that of all the cleverness in data-munging I've done, and all the careful code- and data-archaeology it took to get here, none of it is publishable. I'm just hoping that the other researchers I'm doing this for are grateful enough to put me in as a secondary author.

