Conference Papers

SPIGA - A Multilingual News Aggregator

Author: Leonhard Hennig, Danuta Ploch, Daniel Prawdzik, Benjamin Armbruster, Christoph Büscher, Holger Düwiger, Ernesto William De Luca, Sahin Albayrak
Source: Proceedings of the Biennial GSCL Conference 2011

News aggregation web sites collect and group news articles from a multitude of sources in order to help users navigate and consume large amounts of news material. In this context, Topic Detection and Tracking (TDT) methods address the challenges of identifying new events in streams of news articles, and of threading together related articles. We propose a novel model for a multilingual news aggregator that groups together news articles in different languages, and thus allows users to get an overview of important events and their reception in different countries. Our model combines a vector space model representation of documents based on a multilingual lexicon of Wikipedia-derived concepts with named entity disambiguation and multilingual clustering methods for TDT. We describe an implementation of our approach on a large-scale, real-life data stream of English and German newswire sources, and present an evaluation of the Named Entity Disambiguation module, which achieves state-of-the-art performance on a German and an English evaluation dataset.
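The core idea of the abstract, representing articles in different languages as vectors over a shared set of concepts and then clustering them, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy `LEXICON` (a real system would derive concepts from Wikipedia), the whitespace tokenizer, the similarity threshold of 0.5, and the single-pass clustering scheme, which is a common TDT baseline and not necessarily the method used in the paper. Named entity disambiguation is omitted entirely.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: surface term (English or German) -> shared concept ID.
# A real lexicon would be derived from Wikipedia; these entries are made up.
LEXICON = {
    "chancellor": "C_CHANCELLOR", "kanzlerin": "C_CHANCELLOR",
    "election": "C_ELECTION", "wahl": "C_ELECTION",
    "berlin": "C_BERLIN",
    "earthquake": "C_EARTHQUAKE", "erdbeben": "C_EARTHQUAKE",
    "japan": "C_JAPAN",
}


def concept_vector(text):
    """Map a document to a bag of language-independent concept counts."""
    tokens = text.lower().split()
    return Counter(LEXICON[t] for t in tokens if t in LEXICON)


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_stream(docs, threshold=0.5):
    """Single-pass clustering: attach each document to the most similar
    existing cluster, or start a new cluster (a "new event") otherwise."""
    clusters = []  # each cluster: {"centroid": Counter, "docs": [indices]}
    for i, text in enumerate(docs):
        vec = concept_vector(text)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["docs"].append(i)
            best["centroid"].update(vec)  # accumulate concept counts
        else:
            clusters.append({"centroid": Counter(vec), "docs": [i]})
    return [c["docs"] for c in clusters]


docs = [
    "chancellor wins election in berlin",
    "kanzlerin gewinnt wahl in berlin",
    "earthquake strikes japan",
    "erdbeben erschüttert japan",
]
print(cluster_stream(docs))  # [[0, 1], [2, 3]]
```

Because both the English and the German article map to the same concept IDs, they end up in the same cluster even though they share almost no surface words. This is the property that lets a concept-based representation group reports on the same event across languages.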