Machine Learning Meetings and Events
Group Meetings: Group meetings are held Mondays from 11:00am to noon (talks start at 11:10am) in D.L. Pratt 290C unless otherwise noted. Meetings are coordinated by Hugo Larochelle.
Tea Talks: Tea talks are held every Wednesday at 4:00pm in D.L. Pratt 290C. Talks should be simple, accessible, and not exceed 15 minutes. Speakers bring snacks, make tea, and provide a copy of the presented paper.
Group Meeting May 4, 2009: Read the Web: Toward Never-Ending Language Learning.
- Speaker: Tom Mitchell
- Abstract:
We consider the problem of developing a never-ending language learner to learn to read the web. The thesis underlying our research is that (a) the key problem is to develop semi-supervised learning algorithms that achieve dramatically more accurate learning than current approaches, and (b) this kind of accuracy can be achieved by coupling the semi-supervised training of many, many distinct information extraction functions, rather than treating each as an isolated learning task.
While we don't yet have a never-ending learner that reads the web, this talk will present our system that learns productively for several days, starting with a few hundred labeled training examples and 200 million unlabeled web pages, iteratively learning to extract a database containing 40,000 assertions (e.g., playsSport('penguins','hockey'), teamMember('penguins','sidney_crosby')) with an accuracy averaging 85%. We also present a formal analysis showing why and how improved accuracy results from coupling the training of many functions.
The knowledge base of 40,000 facts mentioned above is available for browsing at http://rtw.ml.cmu.edu/readtheweb.html
- Notes: This meeting is held in PT266.