Language Computeer
Fists of irony
[state of the inquiry]
Okay, here's the first state address.

Today's inquiry: why are thousands of features the same as five?

The first body chapter of my dissertation is on reranking speech-recognition hypotheses using grammatical structure (parse) information. I'm doing a number of new things (which I won't go into here), but I evaluate how well the reranking works by its word error rate (WER).
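For readers outside speech recognition: WER is the word-level edit distance (substitutions + insertions + deletions) between the chosen hypothesis and the reference transcript, divided by the number of reference words. Below is a minimal sketch of that standard definition; the function names are hypothetical, and the post doesn't say which scoring tool produced the numbers further down.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum number of word substitutions, insertions and deletions
    needed to turn `hypothesis` into `reference` (Levenshtein over words)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a free match if r == h
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """Corpus-level WER: total word errors / total reference words."""
    errors = sum(word_errors(ref, hyp) for ref, hyp in pairs)
    ref_words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / ref_words
```

Note that corpus WER pools errors over all utterances rather than averaging per-utterance rates, so long utterances weigh more.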

I am exploring which features (classes of information) are useful for this reranking; in particular, which features from the parse structure are useful (in speech recognition, it's well established that the recognizer's own scores are worth paying attention to). So I am considering two scenarios:

  • [in addition to speech-recognizer scores] add only the "parse-quality" scalar (parselm)
  • [as above, but also include] a very long (dimension 20k or so) vector of non-local features, like "count of NPs" (fullfeats)
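The two scenarios can be pictured as sparse feature maps: the full space has ~20k dimensions, but any one hypothesis fires only a handful of them. A minimal sketch, assuming each hypothesis comes with a dict of recognizer scores, a parse-quality scalar, and a list of constituent labels from its parse; all names and feature prefixes here are hypothetical, not the dissertation's code.

```python
from collections import Counter
from typing import Dict, List

FeatureVector = Dict[str, float]  # sparse: feature name -> value


def parselm_features(asr_scores: Dict[str, float], parse_quality: float) -> FeatureVector:
    """Scenario 1: recognizer scores plus the single parse-quality scalar."""
    feats = {f"asr:{name}": value for name, value in asr_scores.items()}
    feats["parse:quality"] = parse_quality
    return feats


def fullfeats_features(asr_scores: Dict[str, float], parse_quality: float,
                       constituent_labels: List[str]) -> FeatureVector:
    """Scenario 2: everything from scenario 1, plus non-local counts such as
    "count of NPs" (one feature per constituent label seen in this parse)."""
    feats = parselm_features(asr_scores, parse_quality)
    for label, count in Counter(constituent_labels).items():
        feats[f"count:{label}"] = float(count)
    return feats
```

Building the richer vector on top of the smaller one makes "fullfeats is a superset of parselm" true by construction, which is also a property worth testing.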
These two scenarios (plus two reference points: the recognizer's own baseline as an upper bound and the oracle as a lower bound) currently get these results (lower is better):
 baseline: 0.236361 [baseline]
  parselm: 0.230343
fullfeats: 0.230343
   oracle: 0.161255 [best possible rerank]
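The four rows differ only in how one hypothesis is picked from each n-best list. A minimal sketch, assuming the n-best lists are sorted by recognizer score, per-hypothesis word errors are precomputed, and the learned reranker is a linear scorer; the post doesn't name its learner, so the linear model and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hypothesis:
    features: Dict[str, float]  # sparse feature vector for this candidate
    errors: int                 # word errors against the reference (precomputed)


@dataclass
class NBestList:
    hypotheses: List[Hypothesis]  # assumed sorted by recognizer score, best first
    ref_words: int                # number of words in the reference transcript


Selector = Callable[[NBestList], Hypothesis]


def baseline(nbest: NBestList) -> Hypothesis:
    """Keep the recognizer's own top-1 hypothesis."""
    return nbest.hypotheses[0]


def oracle(nbest: NBestList) -> Hypothesis:
    """Cheat: pick the hypothesis with the fewest errors (best possible rerank)."""
    return min(nbest.hypotheses, key=lambda h: h.errors)


def make_reranker(weights: Dict[str, float]) -> Selector:
    """Pick the hypothesis with the highest linear score w . f."""
    def score(h: Hypothesis) -> float:
        # Features without a learned weight contribute nothing.
        return sum(weights.get(name, 0.0) * value for name, value in h.features.items())

    def select(nbest: NBestList) -> Hypothesis:
        return max(nbest.hypotheses, key=score)

    return select


def selected_wer(nbests: List[NBestList], select: Selector) -> float:
    """Corpus WER of whichever hypothesis `select` picks in each n-best list."""
    errors = sum(select(nb).errors for nb in nbests)
    words = sum(nb.ref_words for nb in nbests)
    return errors / words
```

Any real reranker should land between `selected_wer(nbests, oracle)` and `selected_wer(nbests, baseline)`, which is a cheap sanity check on its own.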
So here's the mystery: why is fullfeats getting exactly the same score as parselm? With 20k additional features in the vector, I'd expect that it might even get worse ("the curse of dimensionality"), but I wouldn't expect the results to be exactly the same.
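Identical corpus WER doesn't by itself prove the two systems made identical choices, since different picks can happen to produce the same error total. A quick first step is to compare, utterance by utterance, which n-best index each system selected; a sketch with hypothetical function names, assuming each system's choices are stored as a list of indices in the same utterance order.

```python
from typing import List, Sequence


def selection_agreement(choices_a: Sequence[int], choices_b: Sequence[int]) -> float:
    """Fraction of utterances on which two rerankers picked the same n-best index."""
    if len(choices_a) != len(choices_b):
        raise ValueError("both systems must be scored on the same utterances")
    if not choices_a:
        return 1.0
    same = sum(a == b for a, b in zip(choices_a, choices_b))
    return same / len(choices_a)


def disagreements(choices_a: Sequence[int], choices_b: Sequence[int]) -> List[int]:
    """Utterance positions where the two systems chose different hypotheses."""
    return [i for i, (a, b) in enumerate(zip(choices_a, choices_b)) if a != b]
```

Total agreement across a whole test set, with 20k extra features available to one system, points upstream at the features rather than at the learner.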

My advisor has suggested that there may be a bug in my code, so that is today's Big Question: work backwards through the pipeline to find out whether these models are "accidentally" producing exactly the same results (which would mean I may have to reconsider which learner I'm using) or whether something more severe has gone wrong. The latter would actually be more of a relief, because I want the improvement to be larger than 0.6 points of absolute WER, and I'm also looking for why there wasn't more of one.
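For scale, here is the arithmetic behind the "0.6" using only the numbers in the results table: the gain over baseline, and how much of the gap to the oracle it closes.

```latex
\begin{aligned}
\text{absolute gain}      &= 0.236361 - 0.230343 = 0.006018 \approx 0.6 \text{ points of WER}\\
\text{relative gain}      &= 0.006018 / 0.236361 \approx 0.025 \quad (2.5\%)\\
\text{oracle headroom}    &= 0.236361 - 0.161255 = 0.075106\\
\text{share of headroom}  &= 0.006018 / 0.075106 \approx 0.080 \quad (8\%)
\end{aligned}
```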

From: trochee Date: April 3rd, 2010 04:28 am (UTC)
Guess what: something more severe has gone wrong. Working backwards through the chain, it appears that the 20k-dimensional feature vector just "happens" to contain only the same five features as the parselm vector. I think I've worked out why, too: a single-line bug, apparently checked in as part of a large check-in in October. By me. At 9:30 at night, which is too late to be checking in huge chunks of code.
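A bug like this shows up immediately if you count how many distinct feature names actually fire across all the vectors. A minimal sketch with hypothetical names, assuming vectors are dicts from feature name to value; the expected minimum is a threshold you would pick yourself, not something from the post.

```python
from collections import Counter
from typing import Dict, Iterable


def feature_inventory(vectors: Iterable[Dict[str, float]]) -> Counter:
    """For each feature name, count how many vectors it fires (is non-zero) in."""
    inventory = Counter()
    for vec in vectors:
        inventory.update(name for name, value in vec.items() if value != 0.0)
    return inventory


def check_feature_space(vectors: Iterable[Dict[str, float]], expected_min_distinct: int) -> None:
    """Fail loudly if far fewer distinct features fired than expected."""
    inventory = feature_inventory(vectors)
    if len(inventory) < expected_min_distinct:
        raise AssertionError(
            f"only {len(inventory)} distinct features fired "
            f"(expected at least {expected_min_distinct}): {sorted(inventory)}"
        )
```

Printing the offending names in the error message matters here: "only 5 features, and they are exactly the parselm ones" is the diagnosis.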

So the good news, I think, is that the mystery is solved. Some re-running of experiments is called for, but at least there's a possible way out, and room for further improvement.
From: trochee Date: April 3rd, 2010 04:28 am (UTC)
I think I need a 'Rubber Ducky' icon.
From: evan Date: April 3rd, 2010 03:59 pm (UTC)
Having worked on long pipelines like these for work, I have concluded that testing is critical. And also that it's really hard, because you're not sure what values the resulting model ought to have!
From: trochee Date: April 3rd, 2010 08:53 pm (UTC)
Yes, exactly! And the data get pretty opaque by the time they've been converted into vectors for the various learning machinery, so it can be quite difficult to express what "reasonable" bounds are.

Post hoc, I can see that it would have been reasonable to assert that "the fullfeats sparse vector should nearly always be longer than the parselm vector, for any given candidate", but it turns out that the "nearly" is a bit tricky to code anyway.
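One way to code the "nearly" is to turn it into a tolerance: measure the fraction of candidates whose fullfeats vector has more non-zero entries than its parselm vector, and fail if that fraction drops below a threshold. A sketch with hypothetical names; the 95% default is an arbitrary placeholder, not a value from the post.

```python
from typing import Dict, Sequence


def fraction_longer(full: Sequence[Dict[str, float]],
                    parselm: Sequence[Dict[str, float]]) -> float:
    """Fraction of candidates whose fullfeats vector has more non-zero
    entries than the corresponding parselm vector."""
    if len(full) != len(parselm):
        raise ValueError("vectors must be paired candidate-by-candidate")
    if not full:
        return 1.0

    def nnz(vec: Dict[str, float]) -> int:
        return sum(1 for v in vec.values() if v != 0.0)

    longer = sum(nnz(f) > nnz(p) for f, p in zip(full, parselm))
    return longer / len(full)


def assert_nearly_always_longer(full, parselm, min_fraction: float = 0.95) -> None:
    """Raise (even under python -O) if the property holds too rarely."""
    frac = fraction_longer(full, parselm)
    if frac < min_fraction:
        raise AssertionError(f"fullfeats longer for only {frac:.1%} of candidates")
```

The buggy pipeline described above would have scored 0% here, far below any sensible threshold, so the exact choice of tolerance matters less than having the check at all.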
From: lapartera Date: April 4th, 2010 07:58 pm (UTC)
Well, I have almost no idea what you're talking about here, but I'm glad to see you hard at work on getting the G-D dissertation finished!!

Let's talk sometime!