Information Filtering

Contact:  Christian ScheelAndreas Lommatzsch, Till Plumbaum

 

The "Information Retrieval"-Cluster addresses many important issues in the areas of Information Retrieval and Artificial Intelligence, where the most important ones are dealing with the efficient usage of the semantic information that is encapsulated into built indices, the optimization of large search spaces to allow the application of filtering algorithms, and the reduction of the response time in order to allow complex filtering chains with sufficient performance, and more. Most of these goals are achieved through the intelligent application of different machine learning techniques that together provide high quality results by taking care of semantics, reduce response time by efficiently reducing the search space, and therefore guarantee good scalability together with high user satisfaction.

As part of this research the CC IRML focus on multilingual text retrieval and learning to rank.

Multilingual text retrieval allows us search for documents in any language with a single search query in our native language, by representing words as semantic concepts in an Interlingua. For example, the word 'bank' can signify either the organization which deals with money, or relate to river sides. In German, however, the same word can also refer to a bench that you sit on in a park. Semantic concepts allow us to deal with this cross-lingual ambiguity.

Learning to Rank: Information is not equally relevant to everyone. Machine learning based on feedback from user interaction allows personalized information filtering for each user. It ensures, that preferred results in results lists are on top.Typical applications for the Learning to Rank can be found where information on various criteria (eg relevance to a query, according to the individual user preferences, trustworthiness of information) must be evaluated. Particularly frequent Learning to Rank strategies are therefore used in the personalized search, and when recommendations are generated.

Agent Ensembles: Ensemble learning is a current research topic in the field of statistics, machine learning and information retrieval. Ensemble methods combine multiple models to obtain better results than the results that could be obtained from any of the constituent models. The models can created based on several learning algorithms, various parameter settings based on adaptive weights for learning instances. Typical applications for ensemble methods are multi-lingual systems (combining models in different languages), search engines (combining various quality and relevance models) and recommender systems.