Information retrieval evaluation


Where to find it

Information & Library Science Library

Call Number
ZA3075 .H275 2011
Status
Available

Summary

Evaluation has always played a major role in information retrieval, with the early pioneers such as Cyril Cleverdon and Gerard Salton laying the foundations for most of the evaluation methodologies in use today. The retrieval community has been extremely fortunate to have such a well-grounded evaluation paradigm during a period when most of the human language technologies were just developing. This lecture has the goal of explaining where these evaluation methodologies came from and how they have continued to adapt to the vastly changed environment in the search engine world today.

The lecture begins with a discussion of the early evaluation of information retrieval systems, starting with the Cranfield tests in the early 1960s, continuing with the Lancaster "user" study for MEDLARS, and presenting the various test collection investigations by the SMART project and by groups in Britain. The emphasis in this chapter is on the how and the why of the various methodologies developed. The second chapter covers the more recent "batch" evaluations, examining the methodologies used in the various open evaluation campaigns such as TREC, NTCIR (emphasis on Asian languages), CLEF (emphasis on European languages), and INEX (emphasis on semi-structured data). Here again the focus is on the how and why, and in particular on the evolution of the older evaluation methodologies to handle new information access techniques, including how the test collection techniques were modified and how the metrics were changed to better reflect operational environments. The final chapters look at evaluation issues in user studies, the interactive part of information retrieval, including the search log studies done mainly by the commercial search engines. Here the goal is to show, via case studies, how high-level issues of experimental design affect the final evaluations.

Contents

1. Introduction and early history
   Introduction
   The Cranfield tests
   The MEDLARS evaluation
   The SMART system and early test collections
   The Comparative Systems Laboratory at Case Western University
   Cambridge and the "Ideal" Test Collection
   Additional work in metrics up to 1992
2. "Batch" Evaluation Since 1992
   2.1. Introduction
   2.2. The TREC evaluations
   2.3. The TREC ad hoc tests (1992-1999)
        Building the ad hoc collections
        Analysis of the ad hoc collections
        The TREC ad hoc metrics
   2.4. Other TREC retrieval tasks
        Retrieval from "noisy" text
        Retrieval of non-English documents
        Very large corpus, web retrieval, and enterprise searching
        Domain-specific retrieval tasks
        Pushing the limits of the Cranfield model
   2.5. Other evaluation campaigns
        NTCIR
        CLEF
        INEX
   2.6. Further work in metrics
   2.7. Some advice on using, building and evaluating test collections
        Using existing collections
        Subsetting or modifying existing collections
        Building and evaluating new ad hoc collections
        Dealing with unusual data
        Building web data collections
3. Interactive Evaluation
   Introduction
   Early work
   Interactive evaluation in TREC
   Case studies of interactive evaluation
   Interactive evaluation using log data
4. Conclusion
   Introduction
   Some thoughts on how to design an experiment
   Some recent issues in evaluation of information retrieval
   A personal look at some future challenges
Bibliography
Author's biography
