Language Computeer
Fists of irony
In brief:
  • 08:36 Waiting for the start of the Web-as-corpus workshop www.sigwac.org.uk/wiki/WAC6 #naacl2010 #
  • 08:42 California, WTH is wrong with you? bit.ly/bV0lUj #
  • 08:48 Guevara: WaCorpus for Norwegian. Norwegian has TWO written standards Bokmål & Nynorsk (& many dialects) #Ididnotknowthat #naacl2010 #WAC6 #
  • 08:56 Guevara ran into Norwegian copyright law working on the web; NoWaC will be free & legal (but research-only) #WAC6 #naacl2010 #
  • 09:04 Duplicate removal was crucial issue for Norwegian data (must go read Broder et al 1997,8) #naacl2010 #WAC6 #
  • 09:15 Now Korean WaCorpus (pres. by Ross Israel). Corpus towards learner particle-error detection in Korean #naacl2010 #WAC6 #
  • 09:34 Invited talk by Patrick Pantel (Yahoo! to Bing) on finding Web knowledge and transferring to search #naacl2010 #WAC6 #
  • 09:37 Yahoo!'s "web-of-objects" sounds a lot like @freebase when Pantel describes it (it's dbpedia-branded) #naacl2010 #WAC6 #
  • 09:45 Pantel main punchline: feature engineering makes a huge difference in entity extraction #thatoneIknew #naacl2010 #WAC6 #
  • 10:21 Pantel's experiments on seed-set prototype removal are fascinating. "prototypicality" can actually be a problem #naacl2010 #WAC6 #
  • 11:13 Goyal et al. talk: using clever trix to sketch cts over v. large data (hashing, conservative updates) #naacl2010 #WAC6 #
  • 11:19 Goyal et al. evaluate approx v exact PMI: good eval for these sorts of sketch-counting #naacl2010 #WAC6 #
  • 11:24 Goyal et al. get almost no loss on Turney SO-PMI by using 8Gb of counters over 60gigaword stream #naacl2010 #WAC6 #
  • 11:33 Dillon: academic prose web corpus with bootcat. with paper handout. #oldschool #naacl2010 #WAC6 #
  • 12:02 Stefan Evert on Google teraword 5gms made easy "but not for computer" #WAC6 #naacl2010 #
  • 12:08 Evert: Web1T5-easy shoves W1T5 db in sqlite (did it myself in mysql last month!) also adds normalization #naacl2010 #WAC6 #
  • 12:11 Evert: "it uses only 211Gb, and we don't worry about that too much." everyone over 30 chuckles uncomfortably #naacl2010 #WAC6 #
  • 12:15 psyched to throw out my own crappy mysql code and get Evert's --actually DOES seem to make it easy on computer #naacl2010 #WAC6 #
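
For anyone wondering what "hashing, conservative updates" in the 11:13 note might look like in practice, here is a minimal sketch. It assumes the technique is a Count-Min-style sketch with conservative update, which is the standard reading of those terms; it is not Goyal et al.'s code, and the class name, the `width`/`depth` defaults, and the `pmi` helper are all made up for illustration.

```python
import hashlib
import math


class ConservativeCountMin:
    """Approximate counter: `depth` hashed rows of `width` cells each.

    Hypothetical illustration only; parameters are arbitrary, not from the talk.
    """

    def __init__(self, width=1 << 16, depth=4):
        self.width = width
        self.depth = depth
        self.tables = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One salted hash per row, so each row maps the key to a different cell.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def estimate(self, key):
        # The smallest cell is the least contaminated by collisions.
        return min(self.tables[row][col] for row, col in self._cells(key))

    def update(self, key, count=1):
        # Conservative update: raise a cell only as far as the new estimate,
        # instead of adding `count` to every cell the key hashes to.
        cells = list(self._cells(key))
        target = min(self.tables[row][col] for row, col in cells) + count
        for row, col in cells:
            if self.tables[row][col] < target:
                self.tables[row][col] = target


def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information (base 2) from raw counts over n events."""
    return math.log2(c_xy * n / (c_x * c_y))


sketch = ConservativeCountMin()
for token in "the cat sat on the mat the end".split():
    sketch.update(token)
print(sketch.estimate("the"))
```

The estimate can overshoot the true count when keys collide in every row, but it never undershoots, which is why comparing approximate against exact PMI (the 11:19 note) is a sensible way to evaluate this kind of counter.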

I often use twitter to mention what's happening or linkdump. I LT here for posterity.
