Abstract:
In this paper, we examine the feasibility of building information retrieval test collections by combining two methods: the pooling strategy and the Naïve-Bayes machine-learning algorithm. Using the proposed approach, we built a new Arabic/English test collection. The collection consists of 600 parallel Arabic/English documents, collected from abstracts of doctoral dissertations hosted mainly in the ProQuest library, and 161 queries covering six topics and nineteen sub-topics. The relevance judgment and score for each document-query pair are determined by the pooling method, in which three search engines (Lucene, Whoosh and Hibernate) are run in two languages (Arabic and English). The resulting judgments are then examined and validated with the Naïve-Bayes algorithm, which yields an F-measure of 0.629 computed over the documents actually selected as relevant. The paper shows empirically that combining machine-learning algorithms with the pooling strategy makes it possible to build information retrieval collections more efficiently and more quickly.
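As an illustration of the two ingredients named above, the following minimal Python sketch shows how a pool can be formed from the ranked results of several search engines and how an F-measure can be computed for a set of selected documents. It is not the authors' implementation: the run names, document identifiers, pool depth, and the use of a per-document vote count as a score are all assumptions made for the example, and the F-measure is the standard balanced form (harmonic mean of precision and recall).

```python
from collections import Counter


def build_pool(runs, depth):
    """Pool the top-`depth` documents of every run for one query.

    `runs` maps a (hypothetical) run name to a ranked list of document ids.
    Returns a dict doc_id -> number of runs that retrieved it within `depth`.
    The vote count is an illustrative score, not the paper's scoring rule.
    """
    votes = Counter()
    for ranking in runs.values():
        # De-duplicate while keeping rank order, then cut at the pool depth.
        for doc_id in list(dict.fromkeys(ranking))[:depth]:
            votes[doc_id] += 1
    return dict(votes)


def f_measure(selected, relevant):
    """Balanced F-measure of a selected set against a reference relevant set."""
    selected, relevant = set(selected), set(relevant)
    true_positives = len(selected & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(selected)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical rankings for a single English query (assumed data).
runs = {
    "lucene_en": ["d1", "d2", "d3"],
    "whoosh_en": ["d2", "d4", "d1"],
    "hibernate_en": ["d5", "d2", "d6"],
}
pool = build_pool(runs, depth=2)
score = f_measure(["d1", "d2", "d4"], ["d2", "d4", "d6"])
```

In this sketch, only the pooled documents would need a relevance judgment, which is what keeps the cost of building a collection manageable; a classifier such as Naïve-Bayes could then be evaluated against those judgments with `f_measure`.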