|  | Cranfield Test 1 | Cranfield Test 2 | SMART Retrieval Experiment | MEDLARS Test | The STAIRS Project | TREC: The Text Retrieval Conference |
| --- | --- | --- | --- | --- | --- | --- |
Introduction | Cranfield 1, led by C. W. Cleverdon at Cranfield, UK, was the first comprehensive evaluation of information retrieval systems. The study began in 1957, and Cleverdon published its report in 1962. | Cranfield 2, the second phase of the Cranfield studies, ran from 1963 to 1966. It was a controlled experiment designed to investigate the components of index languages and how they affect retrieval system performance. | Gerard Salton evaluated the various searching options provided by the SMART retrieval system under laboratory conditions. Introduced in 1964, the system is based on the processing of document abstracts in natural-language form. | From August 1966 to July 1967, the performance of the US National Library of Medicine's Medical Literature Analysis and Retrieval System (MEDLARS) was evaluated. The test used the operational MEDLARS database of biomedical papers, indexed with entries from the Medical Subject Headings (MeSH) thesaurus. | Blair and Maron (1985) published a report on a large-scale experiment designed to assess the retrieval effectiveness of a full-text search and retrieval system, known as the STAIRS (Storage and Information Retrieval System) study. | The TREC studies, launched in 1991 by the National Institute of Standards and Technology (NIST) and funded by the US Defense Advanced Research Projects Agency (DARPA), allowed information retrieval researchers to move from small test collections to much larger ones. |
Scope | The investigation used 18,000 indexed items and 1,200 search topics. The documents were split evenly between research reports and periodical articles, all drawn from high-speed aerodynamics, a subfield of aeronautics. | The Cranfield 2 test was designed for information retrieval research, specifically to assess how well IR systems can use user queries to extract relevant information from a large collection of documents. | Accuracy and relevance are enhanced by techniques such as machine learning, natural language processing (NLP), and semantic analysis; smart retrieval interprets the meaning of queries rather than merely matching keywords. | The MEDLARS test's aims were to assess the existing MEDLARS system and identify areas for improvement. At the time of the test, the document collection available through the MEDLARS service contained roughly 700,000 items. | The STAIRS evaluation's primary goal was to determine how well the system could retrieve all, and only, the documents pertinent to a given request; recall and precision metrics were employed to measure this. | The TREC studies (from TREC-1 in 1992 to TREC-12 in 2003) examined a broad range of information retrieval techniques; Boolean search, statistical and probabilistic indexing, and term-weighting strategies are a few noteworthy examples. |
Methodology | The project used pre-formulated queries created before the actual searching began. In all, 400 queries were created, and each was processed in each of the system's three stages, so the system handled 1,200 search requests in total. | Documents come from a predetermined corpus, which for Cranfield 2 contained 1,400 items (research papers, articles, etc.). The retrieval system is tested with a set of 75 queries, representative of the searches a user might run to find specific information in the corpus. | For this experiment, journal documents were used. Eight individuals with knowledge of the subject, either librarians or library-science students, were asked to formulate 48 distinct search queries in the documentation area using clear, grammatically accurate English. The 48 queries were then run, using the various search options of the SMART system, against a file of 1,268 documents. | From the user community, 21 user groups were chosen, and these users submitted 302 search requests. A system operator formulated each query in MeSH terms and carried out the searches. After each search, users received sample output for relevance assessment; photocopies of the articles, rather than mere references, were provided for this purpose. | The database examined in the STAIRS study comprised nearly 40,000 documents, about 350,000 pages of hard-copy text, used in the defence of a major corporate lawsuit. | The ad hoc retrieval tasks in TREC were evaluated with the trec_eval package, which reports roughly 85 distinct numbers for a run, including recall and precision measurements at various cut-off points and single-value summary measures of recall and precision. |
Results | All four systems performed effectively, with recall ranging from 60% to 90% and an overall average of 80%. Retrieving documents in general aeronautics was found to have success rates 4-5% higher than in specialised fields such as high-speed aerodynamics. | The Cranfield 2 results were surprising, because the best-performing index languages were those made up of uncontrolled single words that appeared in the documents themselves. | A ranking of the recall-precision graphs produced by the various processing techniques showed that changes in the relevance judgements had no effect on the relative performance of the different retrieval methods, even though the groups' overall consistency of relevance agreement was not very high. The ranking of the alternative search methods was the same across four sets of relevance assessments. | In the MEDLARS test, the recall and precision ratios of all 302 queries were analysed, and the individual ratios were averaged. | Of the 51 requests, 40 had their recall and precision values determined, while the remaining 11 were used to test sampling strategies and account for potential bias in assessing retrieval. | Despite many different experimental designs, the performance level remained much the same. For example, some groups generated queries automatically from the topic statements while others did so manually; many systems lacked relevance feedback; and the computing platforms ranged from personal computers to supercomputers. Variations in the precision-recall curves were negligible, yet despite the similarity of the precision-recall results, the actual sets of documents retrieved differed considerably. |
Limitation | One of the primary criticisms of the study was its artificial nature, which bore little relation to real-life retrieval situations. | Each index language was made up of various word, phrase, or combined units, and the queries and documents were constructed in the same manner; as a result, the relative effectiveness of languages with varying levels of specificity was assessed by matching queries to documents. | Changes in the relevance judgements were shown to have no effect on the SMART evaluation outcome: for the collection being studied, the recall-precision result was essentially invariant. | A number of suggestions for improving MEDLARS performance were derived from the test's outcomes; the creation of a new search request form was one of the significant modifications made to MEDLARS as a result of this test. | As the report noted, "It was impossibly difficult for users to predict the exact words, their combinations, and phrases used in all or most of the relevant documents and only in those documents". This was one of the reasons STAIRS did not perform well. | The primary challenge of TREC concerns methodological problems: the conventional laboratory paradigm used in the TREC investigations is extremely difficult to carry over to people looking up material online, as opposed to a traditional library setting. |
Path-breaking approach | The conventional framework of controlled evaluation based on precision and recall will come to incorporate more real-world complexity, such as personalised search, multimodal data, and interactive evaluation. | Cranfield 2 suggested that retrieval systems might not perform very well and are difficult to improve substantially; it most likely encouraged caution in system analysis and experimentation. | In future SMART retrieval research, keyword-based, static models will be replaced by intelligent, interactive, personalised, and multimodal systems driven by deep learning, artificial intelligence, and natural language processing. | The MeSH system, initially developed for MEDLARS, is still in use today to support effective PubMed and MEDLINE searching and information retrieval. | According to the STAIRS project, the future growth of information retrieval lies in scalable, user-centred, and semantic retrieval systems. | As IR develops, TREC will remain a driving force behind innovation, helping researchers and practitioners test and refine systems that will shape information retrieval for future generations. |
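The recall and precision ratios that run through all of these experiments, from MEDLARS to trec_eval, can be sketched in a few lines. This is a minimal illustration with made-up document IDs, not any experiment's actual data: recall is the fraction of relevant documents that were retrieved, and precision is the fraction of retrieved documents that are relevant.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for a single query.

    recall    = |retrieved ∩ relevant| / |relevant|
    precision = |retrieved ∩ relevant| / |retrieved|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical query: 5 documents retrieved, 4 of them among
# the 8 documents judged relevant.
r, p = recall_precision([1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7, 8, 9])
print(f"recall={r:.2f} precision={p:.2f}")  # recall=0.50 precision=0.80
```

Averaging these per-query ratios over a full query set, as the MEDLARS test did over its 302 requests, yields the collection-level figures reported in the table above.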