Information Filtering

Contact: Andreas Lommatzsch, Till Plumbaum

 

The "Information Retrieval" cluster addresses key issues in the areas of Information Retrieval and Artificial Intelligence. The most important of these are the efficient use of the semantic information encapsulated in built indices, the optimization of large search spaces so that filtering algorithms can be applied, and the reduction of response time so that complex filtering chains run with sufficient performance. Most of these goals are achieved through the targeted application of different machine learning techniques. Together, these techniques provide high-quality results by taking semantics into account and cut response time by efficiently reducing the search space, which in turn supports good scalability and high user satisfaction.

As part of this research, the CC IRML focuses on multilingual text retrieval and learning to rank.

Multilingual text retrieval allows us to search for documents in any language with a single query in our native language, by representing words as semantic concepts in an interlingua. For example, the English word 'bank' can denote either a financial institution or the side of a river. In German, however, the word 'Bank' can also refer to a bench that you sit on in a park. Semantic concepts allow us to deal with this cross-lingual ambiguity.
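
The idea of matching queries and documents through shared concepts can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tiny hand-written lexicon, the concept names, the sample documents and the Jaccard scoring are hypothetical and do not describe the cluster's actual system.

```python
# Hypothetical lexicon: (language, word) -> language-independent concepts.
# An ambiguous word simply maps to all of its candidate concepts.
LEXICON = {
    ("en", "bank"): {"FINANCIAL_INSTITUTION", "RIVERSIDE"},
    ("en", "money"): {"MONEY"},
    ("en", "river"): {"RIVER"},
    ("de", "bank"): {"FINANCIAL_INSTITUTION", "BENCH"},
    ("de", "geld"): {"MONEY"},
    ("de", "park"): {"PARK"},
    ("de", "ufer"): {"RIVERSIDE"},
    ("de", "fluss"): {"RIVER"},
}

# Hypothetical German documents, stored as (language, text).
DOCUMENTS = {
    "de_finance": ("de", "Die Bank verleiht Geld"),
    "de_park": ("de", "Eine Bank im Park"),
    "de_river": ("de", "Das Ufer am Fluss"),
}


def to_concepts(text, lang):
    """Map a text to the set of concepts of all its known words."""
    concepts = set()
    for token in text.lower().split():
        concepts |= LEXICON.get((lang, token), set())
    return concepts


def search(query, query_lang):
    """Rank documents by Jaccard overlap between query and document concepts."""
    query_concepts = to_concepts(query, query_lang)
    ranked = []
    for doc_id, (doc_lang, text) in DOCUMENTS.items():
        doc_concepts = to_concepts(text, doc_lang)
        union = query_concepts | doc_concepts
        score = len(query_concepts & doc_concepts) / len(union) if union else 0.0
        ranked.append((doc_id, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


finance_hits = search("bank money", "en")
river_hits = search("river bank", "en")
```

In this toy setup the context words resolve the ambiguity: the English query "bank money" ranks the German finance document first, while "river bank" ranks the river document first, even though neither query shares a surface word with the latter.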

Agent Ensembles: Ensemble learning is a current research topic in the fields of statistics, machine learning and information retrieval. Ensemble methods combine multiple models to obtain better results than could be obtained from any of the constituent models alone. The models can be created using several learning algorithms, various parameter settings, or adaptive weights for learning instances. Typical applications for ensemble methods are multilingual systems (combining models in different languages), search engines (combining various quality and relevance models) and recommender systems.
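
How combining models can outperform each constituent model can be sketched with simple majority voting. The three models, their predictions and the labels below are made-up toy values chosen so that each model errs on different instances; majority voting is only one of many possible combination schemes and is not claimed to be the one used here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model prediction lists by majority vote per instance."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, truth):
    """Fraction of instances where the prediction matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Toy ground truth and hypothetical model outputs (binary labels).
truth = [1, 0, 1, 1, 0, 1]
model_outputs = {
    "model_a": [0, 1, 1, 1, 0, 1],  # wrong on instances 0 and 1
    "model_b": [1, 0, 0, 0, 0, 1],  # wrong on instances 2 and 3
    "model_c": [1, 0, 1, 1, 1, 0],  # wrong on instances 4 and 5
}

ensemble = majority_vote(list(model_outputs.values()))
ensemble_accuracy = accuracy(ensemble, truth)
single_accuracies = {name: accuracy(p, truth) for name, p in model_outputs.items()}
```

Because the errors of the three toy models do not overlap, every instance still receives two correct votes, so the ensemble is more accurate than any single model. With an odd number of models and binary labels no ties can occur; other settings would need an explicit tie-breaking rule.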