Evaluation on MIRACL#

MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a WSDM 2023 Cup challenge that focuses on retrieval across 18 different languages. It released a multilingual retrieval dataset containing train and dev splits for 16 "known languages" and dev splits only for 2 "surprise languages". All topics were generated by native speakers of each language, who also judged the relevance between the topics and a given list of documents. The dataset is available on Hugging Face.

Note: we strongly recommend running the MIRACL evaluation on GPUs. For reference, the whole process takes about an hour on a node with 8x A100 40G GPUs.

0. Installation#

First, install the libraries we will use:

% pip install FlagEmbedding pytrec_eval

1. Dataset#

The MIRACL dataset contains a large number of passages and articles in 18 languages, making it a valuable resource for training or evaluating multilingual models. The data can be downloaded from Hugging Face.

| Language | # of Passages | # of Articles |
|---|---|---|
| Arabic (ar) | 2,061,414 | 656,982 |
| Bengali (bn) | 297,265 | 63,762 |
| English (en) | 32,893,221 | 5,758,285 |
| Spanish (es) | 10,373,953 | 1,669,181 |
| Persian (fa) | 2,207,172 | 857,827 |
| Finnish (fi) | 1,883,509 | 447,815 |
| French (fr) | 14,636,953 | 2,325,608 |
| Hindi (hi) | 506,264 | 148,107 |
| Indonesian (id) | 1,446,315 | 446,330 |
| Japanese (ja) | 6,953,614 | 1,133,444 |
| Korean (ko) | 1,486,752 | 437,373 |
| Russian (ru) | 9,543,918 | 1,476,045 |
| Swahili (sw) | 131,924 | 47,793 |
| Telugu (te) | 518,079 | 66,353 |
| Thai (th) | 542,166 | 128,179 |
| Chinese (zh) | 4,934,368 | 1,246,389 |

from datasets import load_dataset

lang = "en"
corpus = load_dataset("miracl/miracl-corpus", lang, trust_remote_code=True)['train']

Each passage in the corpus has three fields: docid, title, and text. In a docid of the form x#y, x is the ID of the Wikipedia article and y is the number of the passage within that article. The title is the name of the article with ID x, and the text is the body of the passage.

corpus[0]
{'docid': '56672809#4',
 'title': 'Glen Tomasetti',
 'text': 'In 1967 Tomasetti was prosecuted after refusing to pay one sixth of her taxes on the grounds that one sixth of the federal budget was funding Australia\'s military presence in Vietnam. In court she argued that Australia\'s participation in the Vietnam War violated its international legal obligations as a member of the United Nations. Public figures such as Joan Baez had made similar protests in the USA, but Tomasetti\'s prosecution was "believed to be the first case of its kind in Australia", according to a contemporary news report. Tomasetti was eventually ordered to pay the unpaid taxes.'}
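To illustrate the docid convention described above, here is a minimal sketch (the helper name split_docid is ours, not part of MIRACL) that splits a docid into its article ID and passage number:

```python
def split_docid(docid: str):
    """Split a MIRACL docid of the form 'x#y' into (article_id, passage_number)."""
    article_id, passage_number = docid.split("#")
    return article_id, int(passage_number)

print(split_docid("56672809#4"))  # ('56672809', 4)
```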

Each entry of the dev set has the following form:

dev = load_dataset('miracl/miracl', lang, trust_remote_code=True)['dev']
dev[0]
{'query_id': '0',
 'query': 'Is Creole a pidgin of French?',
 'positive_passages': [{'docid': '462221#4',
   'text': "At the end of World War II in 1945, Korea was divided into North Korea and South Korea with North Korea (assisted by the Soviet Union), becoming a communist government after 1946, known as the Democratic People's Republic, followed by South Korea becoming the Republic of Korea. China became the communist People's Republic of China in 1949. In 1950, the Soviet Union backed North Korea while the United States backed South Korea, and China allied with the Soviet Union in what was to become the first military action of the Cold War.",
   'title': 'Eighth United States Army'},
  {'docid': '29810#23',
   'text': 'The large size of Texas and its location at the intersection of multiple climate zones gives the state highly variable weather. The Panhandle of the state has colder winters than North Texas, while the Gulf Coast has mild winters. Texas has wide variations in precipitation patterns. El Paso, on the western end of the state, averages of annual rainfall, while parts of southeast Texas average as much as per year. Dallas in the North Central region averages a more moderate per year.',
   'title': 'Texas'},
  {'docid': '3716905#0',
   'text': 'A French creole, or French-based creole language, is a creole language (contact language with native speakers) for which French is the "lexifier". Most often this lexifier is not modern French but rather a 17th-century koiné of French from Paris, the French Atlantic harbors, and the nascent French colonies. French-based creole languages are spoken natively by millions of people worldwide, primarily in the Americas and on archipelagos throughout the Indian Ocean. This article also contains information on French pidgin languages, contact languages that lack native speakers.',
   'title': 'French-based creole languages'},
  {'docid': '22399755#18',
   'text': 'There are many hypotheses on the origins of Haitian Creole. Linguist John Singler suggests that it most likely emerged under French control in colonial years when shifted its economy focused heavily on sugar production. This resulted in a much larger population of enslaved Africans, whose interaction with the French created the circumstances for the dialect to evolve from a pidgin to a Creole. His research and the research of Claire Lefebvre of the Université du Québec à Montréal suggests that Creole, despite drawing 90% of its lexicon from French, is the syntactic cousin of Fon, a Gbe language of the Niger-Congo family spoken in Benin. At the time of the emergence of Haitian Creole, 50% of the enslaved Africans in Haiti were Gbe speakers.',
   'title': 'Haitian literature'}],
 'negative_passages': [{'docid': '1170520#2',
   'text': 'Louisiana Creole is a contact language that arose in the 18th century from interactions between speakers of the lexifier language of Standard French and several substrate or adstrate languages from Africa. Prior to its establishment as a Creole, the precursor was considered a pidgin language. The social situation that gave rise to the Louisiana Creole language was unique, in that the lexifier language was the language found at the contact site. More often the lexifier is the language that arrives at the contact site belonging to the substrate/adstrate languages. Neither the French, the French-Canadians, nor the African slaves were native to the area; this fact categorizes Louisiana Creole as a contact language that arose between exogenous ethnicities. Once the pidgin tongue was transmitted to the next generation as a "lingua franca" (who were then considered the first native speakers of the new grammar), it could effectively be classified as a creole language.',
   'title': 'Louisiana Creole'},
  {'docid': '49823#1',
   'text': 'The precise number of creole languages is not known, particularly as many are poorly attested or documented. About one hundred creole languages have arisen since 1500. These are predominantly based on European languages such as English and French due to the European Age of Discovery and the Atlantic slave trade that arose at that time. With the improvements in ship-building and navigation, traders had to learn to communicate with people around the world, and the quickest way to do this was to develop a pidgin, or simplified language suited to the purpose; in turn, full creole languages developed from these pidgins. In addition to creoles that have European languages as their base, there are, for example, creoles based on Arabic, Chinese, and Malay. The creole with the largest number of speakers is Haitian Creole, with almost ten million native speakers, followed by Tok Pisin with about 4 million, most of whom are second-language speakers.',
   'title': 'Creole language'},
  {'docid': '1651722#10',
   'text': 'Krio is an English-based creole from which descend Nigerian Pidgin English and Cameroonian Pidgin English and Pichinglis. It is also similar to English-based creole languages spoken in the Americas, especially the Gullah language, Jamaican Patois (Jamaican Creole), and Bajan Creole but it has its own distinctive character. It also shares some linguistic similarities with non-English creoles, such as the French-based creole languages in the Caribbean.',
   'title': 'Krio language'},
  {'docid': '540382#4',
   'text': 'Until recently creoles were considered "degenerate" dialects of Portuguese unworthy of attention. As a consequence, there is little documentation on the details of their formation. Since the 20th century, increased study of creoles by linguists led to several theories being advanced. The monogenetic theory of pidgins assumes that some type of pidgin language — dubbed West African Pidgin Portuguese — based on Portuguese was spoken from the 15th to 18th centuries in the forts established by the Portuguese on the West African coast. According to this theory, this variety may have been the starting point of all the pidgin and creole languages. This may explain to some extent why Portuguese lexical items can be found in many creoles, but more importantly, it would account for the numerous grammatical similarities shared by such languages, such as the preposition "na", meaning "in" and/or "on", which would come from the Portuguese contraction "na" meaning "in the" (feminine singular).',
   'title': 'Portuguese-based creole languages'},
  {'docid': '49823#7',
   'text': 'Other scholars, such as Salikoko Mufwene, argue that pidgins and creoles arise independently under different circumstances, and that a pidgin need not always precede a creole nor a creole evolve from a pidgin. Pidgins, according to Mufwene, emerged in trade colonies among "users who preserved their native vernaculars for their day-to-day interactions." Creoles, meanwhile, developed in settlement colonies in which speakers of a European language, often indentured servants whose language would be far from the standard in the first place, interacted extensively with non-European slaves, absorbing certain words and features from the slaves\' non-European native languages, resulting in a heavily basilectalized version of the original language. These servants and slaves would come to use the creole as an everyday vernacular, rather than merely in situations in which contact with a speaker of the superstrate was necessary.',
   'title': 'Creole language'},
  {'docid': '11236157#2',
   'text': 'While many creoles around the world have lexicons based on languages other than Portuguese (e.g. English, French, Spanish, Dutch), it was hypothesized that such creoles were derived from this lingua franca by means of relexification, i.e. the process in which a pidgin or creole incorporates a significant amount of its lexicon from another language while keeping the grammar intact. There is some evidence that relexification is a real process. Pieter Muysken and show that there are languages which derive their grammar and lexicon from two different languages respectively, which could be easily explained with the relexification hypothesis. Also, Saramaccan seems to be a pidgin frozen in the middle of relexification from Portuguese to English. However, in cases of such mixed languages, as call them, there is never a one-to-one relationship between the grammar or lexicon of the mixed language and the grammar or lexicon of the language they attribute it to.',
   'title': 'Monogenetic theory of pidgins'},
  {'docid': '1612877#8',
   'text': 'A mixed language differs from pidgins, creoles and code-switching in very fundamental ways. In most cases, mixed language speakers are fluent, even native, speakers of both languages; however, speakers of Michif (a N-V mixed language) are unique in that many are not fluent in both of the sources languages. Pidgins, on the other hand, develop in a situation, usually in the context of trade, where speakers of two (or more) different languages come into contact and need to find some way to communicate with each other. Creoles develop when a pidgin language becomes a first language for young speakers. While creoles tend to have drastically simplified morphologies, mixed languages often retain the inflectional complexities of one, or both, of parent languages. For instance, Michif retains the complexities of its French nouns and its Cree verbs.',
   'title': 'Mixed language'},
  {'docid': '9606120#4',
   'text': 'While it is classified as a pidgin language, this is inaccurate. Speakers are already fluent in either English and French, and as such it is not used in situations where both parties lack a common tongue. As a whole, Camfranglais sets itself apart from other pidgins and creoles in that it consists of an array of languages, at least one of which is already known by those speaking it. For instance, while it contains elements of borrowing, code-switching, and pidgin languages, it is not a contact language as both parties can be presumed to speak French, the lexifer. Numerous other classifications have been proposed, like ‘pidgin’, ‘argot’, ‘youth language’, a ‘sabir camerounais’, an ‘appropriation vernaculaire du français’ or a ‘hybrid slang’. However, as Camfranglais is more developed than a slang, this too is insufficient. Kießling proposes it be classified as a \'highly hybrid sociolect of the urban youth type", a definition that Stein-Kanjora agrees with.',
   'title': 'Camfranglais'}]}

Each entry has four fields: query_id, query, positive_passages, and negative_passages. Here query_id and query are the ID and text of the query. positive_passages and negative_passages are lists of passages, each with its own docid, title, and text.

This structure is the same across the train, dev, testA, and testB splits.

Then we collect the IDs and text of the queries and the corpus.

corpus_ids = corpus['docid']
corpus_text = []
for doc in corpus:
   corpus_text.append(f"{doc['title']} {doc['text']}".strip())

queries_ids = dev['query_id']
queries_text = dev['query']
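The dev split's annotated passages can also be turned into a qrels-style dict directly, which is handy for quick sanity checks (the evaluation later in this tutorial downloads the official qrels TSV instead). A minimal sketch; build_qrels is our own helper name:

```python
def build_qrels(dev_split):
    """Build {query_id: {docid: relevance}} from MIRACL-style entries."""
    qrels = {}
    for entry in dev_split:
        judgments = {}
        for p in entry["positive_passages"]:
            judgments[p["docid"]] = 1   # judged relevant
        for p in entry["negative_passages"]:
            judgments[p["docid"]] = 0   # judged non-relevant
        qrels[entry["query_id"]] = judgments
    return qrels

# toy entry with made-up docids
print(build_qrels([{"query_id": "0",
                    "positive_passages": [{"docid": "a#1"}],
                    "negative_passages": [{"docid": "b#2"}]}]))
# {'0': {'a#1': 1, 'b#2': 0}}
```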

2. Evaluate from Scratch#

2.1 Embedding#

We use bge-base-en-v1.5 in this demonstration; feel free to swap in your preferred model.

import os 
os.environ['TRANSFORMERS_NO_ADVISORY_WARNINGS'] = 'true'
os.environ['SETUPTOOLS_USE_DISTUTILS'] = ''
from FlagEmbedding import FlagModel

# get the BGE embedding model
model = FlagModel('BAAI/bge-base-en-v1.5')

# get the embedding of the queries and corpus
queries_embeddings = model.encode_queries(queries_text)
corpus_embeddings = model.encode_corpus(corpus_text)

print("shape of the embeddings:", corpus_embeddings.shape)
print("data type of the embeddings: ", corpus_embeddings.dtype)
initial target device: 100%|██████████| 8/8 [00:29<00:00,  3.66s/it]
pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 52.84it/s]
pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 55.15it/s]
pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 56.49it/s]
pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 55.22it/s]
pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 49.22it/s]
pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 54.69it/s]
pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 49.16it/s]
pre tokenize: 100%|██████████| 1/1 [00:00<00:00, 50.77it/s]
Chunks: 100%|██████████| 8/8 [00:10<00:00,  1.27s/it]
pre tokenize: 100%|██████████| 16062/16062 [08:12<00:00, 32.58it/s]  
pre tokenize: 100%|██████████| 16062/16062 [08:44<00:00, 30.60it/s]
pre tokenize: 100%|██████████| 16062/16062 [08:39<00:00, 30.90it/s]
pre tokenize: 100%|██████████| 16062/16062 [09:04<00:00, 29.49it/s]
pre tokenize: 100%|██████████| 16062/16062 [09:27<00:00, 28.29it/s]
pre tokenize: 100%|██████████| 16062/16062 [09:08<00:00, 29.30it/s]
pre tokenize: 100%|██████████| 16062/16062 [08:59<00:00, 29.77it/s]
pre tokenize: 100%|██████████| 16062/16062 [09:04<00:00, 29.50it/s]
Inference Embeddings: 100%|██████████| 16062/16062 [17:10<00:00, 15.59it/s] 
Inference Embeddings: 100%|██████████| 16062/16062 [17:04<00:00, 15.68it/s]
Inference Embeddings: 100%|██████████| 16062/16062 [17:01<00:00, 15.72it/s]
Inference Embeddings: 100%|██████████| 16062/16062 [17:28<00:00, 15.32it/s]
Inference Embeddings: 100%|██████████| 16062/16062 [17:43<00:00, 15.10it/s]
Inference Embeddings: 100%|██████████| 16062/16062 [17:27<00:00, 15.34it/s]
Inference Embeddings: 100%|██████████| 16062/16062 [17:36<00:00, 15.20it/s]
Inference Embeddings: 100%|██████████| 16062/16062 [17:31<00:00, 15.28it/s]
Chunks: 100%|██████████| 8/8 [27:49<00:00, 208.64s/it]
shape of the embeddings: (32893221, 768)
data type of the embeddings:  float16
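The corpus embeddings are large. A quick back-of-the-envelope estimate from the shape and dtype printed above shows why plenty of host RAM is needed, and why casting to float32 for Faiss in the next section doubles the footprint:

```python
n_vectors, dim = 32_893_221, 768  # shape of the English corpus embeddings
gib = 1024 ** 3

print(f"float16: {n_vectors * dim * 2 / gib:.1f} GiB")  # ~47.1 GiB
print(f"float32: {n_vectors * dim * 4 / gib:.1f} GiB")  # ~94.1 GiB
```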

2.2 Indexing#

Create a Faiss index to store the embedding vectors.

import faiss
import numpy as np

# get the length of our embedding vectors, vectors by bge-base-en-v1.5 have length 768
dim = corpus_embeddings.shape[-1]

# create the faiss index and store the corpus embeddings into the vector space
index = faiss.index_factory(dim, 'Flat', faiss.METRIC_INNER_PRODUCT)
corpus_embeddings = corpus_embeddings.astype(np.float32)
# train and add the embeddings to the index
index.train(corpus_embeddings)
index.add(corpus_embeddings)

print(f"total number of vectors: {index.ntotal}")
total number of vectors: 32893221
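Because the embeddings live in a Flat index with METRIC_INNER_PRODUCT, index.search performs an exact inner-product top-k search. For intuition only (Faiss is far faster at this scale), the equivalent brute-force computation can be sketched in NumPy:

```python
import numpy as np

def brute_force_ip_search(queries: np.ndarray, corpus: np.ndarray, k: int):
    """Exact top-k inner-product search, equivalent to a Flat IP index."""
    scores = queries @ corpus.T                    # (n_queries, n_docs)
    indices = np.argsort(-scores, axis=1)[:, :k]   # top-k doc indices per query
    topk_scores = np.take_along_axis(scores, indices, axis=1)
    return topk_scores, indices

# tiny made-up example
corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
queries = np.array([[0.8, 0.6]])
print(brute_force_ip_search(queries, corpus, k=2))
```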

2.3 Searching#

Use the Faiss index to search for each query.

from tqdm import tqdm

query_size = len(queries_embeddings)

all_scores = []
all_indices = []

for i in tqdm(range(0, query_size, 32), desc="Searching"):
    j = min(i + 32, query_size)
    query_embedding = queries_embeddings[i: j]
    score, indice = index.search(query_embedding.astype(np.float32), k=100)
    all_scores.append(score)
    all_indices.append(indice)

all_scores = np.concatenate(all_scores, axis=0)
all_indices = np.concatenate(all_indices, axis=0)
Searching: 100%|██████████| 25/25 [15:03<00:00, 36.15s/it]

Then map the search results back to the docids in the dataset.

results = {}
for idx, (scores, indices) in enumerate(zip(all_scores, all_indices)):
    results[queries_ids[idx]] = {}
    # doc_idx avoids shadowing the faiss `index` defined above
    for score, doc_idx in zip(scores, indices):
        # faiss fills unused slots with -1 when fewer than k results exist
        if doc_idx != -1:
            results[queries_ids[idx]][corpus_ids[doc_idx]] = float(score)
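The result is the nested {query_id: {docid: score}} dict that pytrec_eval expects. A toy run of the same mapping logic, with made-up IDs and scores:

```python
import numpy as np

queries_ids = ['0', '1']
corpus_ids = ['a#0', 'a#1', 'b#0']
all_scores = np.array([[0.9, 0.4], [0.8, 0.7]])
all_indices = np.array([[2, 0], [1, -1]])  # -1 marks an empty result slot

results = {}
for idx, (scores, indices) in enumerate(zip(all_scores, all_indices)):
    results[queries_ids[idx]] = {}
    for score, doc_idx in zip(scores, indices):
        if doc_idx != -1:
            results[queries_ids[idx]][corpus_ids[doc_idx]] = float(score)

print(results)
# {'0': {'b#0': 0.9, 'a#0': 0.4}, '1': {'a#1': 0.8}}
```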

2.4 Evaluation#

Download the qrels file used for evaluation:

endpoint = os.getenv('HF_ENDPOINT', 'https://huggingface.co')
file_name = "qrels.miracl-v1.0-en-dev.tsv"
qrel_url = f"wget {endpoint}/datasets/miracl/miracl/resolve/main/miracl-v1.0-en/qrels/{file_name}"

os.system(qrel_url)
--2024-11-21 10:26:16--  https://hf-mirror.com/datasets/miracl/miracl/resolve/main/miracl-v1.0-en/qrels/qrels.miracl-v1.0-en-dev.tsv
Resolving hf-mirror.com (hf-mirror.com)... 153.121.57.40, 133.242.169.68, 160.16.199.204
Connecting to hf-mirror.com (hf-mirror.com)|153.121.57.40|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 167817 (164K) [text/plain]
Saving to: ‘qrels.miracl-v1.0-en-dev.tsv’

     0K .......... .......... .......... .......... .......... 30%  109K 1s
    50K .......... .......... .......... .......... .......... 61% 44.5K 1s
   100K .......... .......... .......... .......... .......... 91% 69.6K 0s
   150K .......... ...                                        100% 28.0K=2.8s

2024-11-21 10:26:20 (58.6 KB/s) - ‘qrels.miracl-v1.0-en-dev.tsv’ saved [167817/167817]
0

Read the qrels from the file:

qrels_dict = {}
with open(file_name, "r", encoding="utf-8") as f:
    for line in f.readlines():
        qid, _, docid, rel = line.strip().split("\t")
        qid, docid, rel = str(qid), str(docid), int(rel)
        if qid not in qrels_dict:
            qrels_dict[qid] = {}
        qrels_dict[qid][docid] = rel
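The qrels file follows the TREC convention: each tab-separated line carries a query ID, an iteration field (unused here, hence the `_`), a docid, and a relevance label. Parsing one hypothetical line:

```python
line = "0\tQ0\t462221#4\t1"  # hypothetical TREC-style qrels line
qid, _, docid, rel = line.strip().split("\t")
print(qid, docid, int(rel))  # 0 462221#4 1
```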

Finally, use the pytrec_eval library to compute the scores of our selected metrics:

import pytrec_eval
from collections import defaultdict

ndcg_string = "ndcg_cut." + ",".join([str(k) for k in [10,100]])
recall_string = "recall." + ",".join([str(k) for k in [10,100]])

evaluator = pytrec_eval.RelevanceEvaluator(
    qrels_dict, {ndcg_string, recall_string}
)
scores = evaluator.evaluate(results)

all_ndcgs, all_recalls = defaultdict(list), defaultdict(list)
for query_id in scores.keys():
    for k in [10,100]:
        all_ndcgs[f"NDCG@{k}"].append(scores[query_id]["ndcg_cut_" + str(k)])
        all_recalls[f"Recall@{k}"].append(scores[query_id]["recall_" + str(k)])

ndcg, recall = (
    all_ndcgs.copy(),
    all_recalls.copy(),
)

for k in [10,100]:
    ndcg[f"NDCG@{k}"] = round(sum(ndcg[f"NDCG@{k}"]) / len(scores), 5)
    recall[f"Recall@{k}"] = round(sum(recall[f"Recall@{k}"]) / len(scores), 5)

print(ndcg)
print(recall)
defaultdict(<class 'list'>, {'NDCG@10': 0.46073, 'NDCG@100': 0.54336})
defaultdict(<class 'list'>, {'Recall@10': 0.55972, 'Recall@100': 0.83827})
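As a sanity check on what ndcg_cut_k reports, here is a minimal NDCG sketch for a single ranked list with binary relevance, using the linear gain rel / log2(rank + 1). Note that pytrec_eval builds the ideal ranking from all judged documents in the qrels, not just the retrieved list as this toy version does:

```python
import numpy as np

def ndcg_at_k(ranked_rels, k):
    """NDCG@k for one ranked list of relevance labels (needs >= 1 relevant)."""
    def dcg(rels):
        rels = np.asarray(rels, dtype=float)[:k]
        return float(np.sum(rels / np.log2(np.arange(2, rels.size + 2))))
    ideal = sorted(ranked_rels, reverse=True)  # best possible ordering
    return dcg(ranked_rels) / dcg(ideal)

print(round(ndcg_at_k([1, 0, 1, 0], k=4), 4))  # 0.9197
```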

3. Evaluate Using FlagEmbedding#

We provide standalone evaluation for popular datasets and benchmarks. Try running the following code, or the shell scripts provided in the example folder.

import sys

arguments = """- \
    --eval_name miracl \
    --dataset_dir ./miracl/data \
    --dataset_names en \
    --splits dev \
    --corpus_embd_save_dir ./miracl/corpus_embd \
    --output_dir ./miracl/search_results \
    --search_top_k 100 \
    --cache_path ./cache/data \
    --overwrite True \
    --k_values 10 100 \
    --eval_output_method markdown \
    --eval_output_path ./miracl/miracl_eval_results.md \
    --eval_metrics ndcg_at_10 recall_at_100 \
    --embedder_name_or_path BAAI/bge-base-en-v1.5 \
    --devices cuda:0 cuda:1 \
    --embedder_batch_size 1024
""".replace('\n','')

sys.argv = arguments.split()
from transformers import HfArgumentParser

from FlagEmbedding.evaluation.miracl import (
    MIRACLEvalArgs, MIRACLEvalModelArgs,
    MIRACLEvalRunner
)


parser = HfArgumentParser((
    MIRACLEvalArgs,
    MIRACLEvalModelArgs
))

eval_args, model_args = parser.parse_args_into_dataclasses()
eval_args: MIRACLEvalArgs
model_args: MIRACLEvalModelArgs

runner = MIRACLEvalRunner(
    eval_args=eval_args,
    model_args=model_args
)

runner.run()
/root/anaconda3/envs/dev/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
initial target device: 100%|██████████| 2/2 [00:09<00:00,  4.98s/it]
pre tokenize: 100%|██████████| 16062/16062 [18:01<00:00, 14.85it/s]  
You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/root/anaconda3/envs/dev/lib/python3.12/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
pre tokenize: 100%|██████████| 16062/16062 [18:44<00:00, 14.29it/s]
Inference Embeddings:   0%|          | 42/16062 [00:54<8:28:19,  1.90s/it]You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
Inference Embeddings:   0%|          | 43/16062 [00:56<8:22:03,  1.88s/it]/root/anaconda3/envs/dev/lib/python3.12/site-packages/_distutils_hack/__init__.py:54: UserWarning: Reliance on distutils from stdlib is deprecated. Users must rely on setuptools to provide the distutils module. Avoid importing distutils or import setuptools first, and avoid setting SETUPTOOLS_USE_DISTUTILS=stdlib. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
Inference Embeddings: 100%|██████████| 16062/16062 [48:29<00:00,  5.52it/s] 
Inference Embeddings: 100%|██████████| 16062/16062 [48:55<00:00,  5.47it/s]
Chunks: 100%|██████████| 2/2 [1:10:57<00:00, 2128.54s/it]  
pre tokenize: 100%|██████████| 1/1 [00:11<00:00, 11.06s/it]
pre tokenize: 100%|██████████| 1/1 [00:12<00:00, 12.72s/it]
Inference Embeddings: 100%|██████████| 1/1 [00:00<00:00, 32.15it/s]
Inference Embeddings: 100%|██████████| 1/1 [00:00<00:00, 39.80it/s]
Chunks: 100%|██████████| 2/2 [00:31<00:00, 15.79s/it]
Searching: 100%|██████████| 25/25 [00:00<00:00, 26.24it/s]
Qrels not found in ./miracl/data/en/dev_qrels.jsonl. Trying to download the qrels from the remote and save it to ./miracl/data/en.
--2024-11-20 13:00:40--  https://hf-mirror.com/datasets/miracl/miracl/resolve/main/miracl-v1.0-en/qrels/qrels.miracl-v1.0-en-dev.tsv
Resolving hf-mirror.com (hf-mirror.com)... 133.242.169.68, 153.121.57.40, 160.16.199.204
Connecting to hf-mirror.com (hf-mirror.com)|133.242.169.68|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 167817 (164K) [text/plain]
Saving to: ‘./cache/data/miracl/qrels.miracl-v1.0-en-dev.tsv’

     0K .......... .......... .......... .......... .......... 30%  336K 0s
    50K .......... .......... .......... .......... .......... 61%  678K 0s
   100K .......... .......... .......... .......... .......... 91%  362K 0s
   150K .......... ...                                        100% 39.8K=0.7s

2024-11-20 13:00:42 (231 KB/s) - ‘./cache/data/miracl/qrels.miracl-v1.0-en-dev.tsv’ saved [167817/167817]

Loading and Saving qrels: 100%|██████████| 8350/8350 [00:00<00:00, 184554.95it/s]
with open('miracl/search_results/bge-base-en-v1.5/NoReranker/EVAL/eval_results.json', 'r') as content_file:
    print(content_file.read())
{
    "en-dev": {
        "ndcg_at_10": 0.46053,
        "ndcg_at_100": 0.54313,
        "map_at_10": 0.35928,
        "map_at_100": 0.38726,
        "recall_at_10": 0.55972,
        "recall_at_100": 0.83809,
        "precision_at_10": 0.14018,
        "precision_at_100": 0.02347,
        "mrr_at_10": 0.54328,
        "mrr_at_100": 0.54929
    }
}