API
Todo
Add a nice introductory text
tira.io_utils module
- tira.io_utils.all_lines_to_pandas(input_file: str | Iterable[str], load_default_text: bool) DataFrame [source]¶
Todo
add documentation
Todo
this function has two semantics: handling a file and handling file-contents
- tira.io_utils.load_output_of_directory(directory: Path, evaluation: bool = False) Dict | DataFrame [source]¶
- tira.io_utils.parse_jsonl_line(input: str | bytearray | bytes, load_default_text: bool) Dict [source]¶
Deserializes the line using JSON deserialization. Optionally strips the ‘original_query’ and ‘original_document’ fields from the resulting object and converts the qid and docno fields to strings.
- Parameters:
input (str | bytearray | bytes) – A json-serialized string.
load_default_text (bool) – If true, the original_query and original_document fields are removed and the qid and docno values are converted to strings.
- Returns:
The deserialized and (optionally) processed object.
- Return type:
dict
- Example:
>>> parse_jsonl_line('{}', False)
{}
>>> parse_jsonl_line('{"original_query": "xxxx"}', False)
{'original_query': 'xxxx'}
>>> parse_jsonl_line('{"original_query": "xxxx"}', True)
{}
>>> parse_jsonl_line('{"original_query": "xxxx", "qid": 42, "pi": 3.14}', False)
{'original_query': 'xxxx', 'qid': 42, 'pi': 3.14}
>>> parse_jsonl_line('{"original_query": "xxxx", "qid": 42, "pi": 3.14}', True)
{'qid': '42', 'pi': 3.14}
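The behaviour documented for parse_jsonl_line can be approximated with the standard library alone. The following is a minimal sketch and not tira's actual implementation: the name parse_jsonl_line_sketch is hypothetical, and the logic is inferred only from the description and doctest examples above.

```python
import json
from typing import Any, Dict, Union


def parse_jsonl_line_sketch(
    line: Union[str, bytes, bytearray], load_default_text: bool
) -> Dict[str, Any]:
    """Hypothetical stand-in that mirrors the documented behaviour."""
    # json.loads accepts str, bytes, and bytearray input.
    obj = json.loads(line)
    if load_default_text:
        # Drop the original text fields if they are present.
        for field in ("original_query", "original_document"):
            obj.pop(field, None)
        # Normalise the identifier fields to strings.
        for field in ("qid", "docno"):
            if field in obj:
                obj[field] = str(obj[field])
    return obj


# Usage: parse several JSONL lines held in memory (made-up sample data).
sample_lines = [
    '{"qid": 1, "original_query": "hubble telescope"}',
    '{"docno": 17, "original_document": "some text", "score": 0.5}',
]
records = [parse_jsonl_line_sketch(line, True) for line in sample_lines]
```

Applying the sketch line by line, as in the usage lines, yields one dictionary per input line; how tira itself turns such records into a DataFrame is not shown here.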
tira.ir_datasets_util module
- class tira.ir_datasets_util.TirexQuery(query_id, text, title, query, description, narrative)[source]¶
Bases:
NamedTuple
- description: str¶
Alias for field number 4
- narrative: str¶
Alias for field number 5
- query: str¶
Alias for field number 3
- query_id: str¶
Alias for field number 0
- text: str¶
Alias for field number 1
- title: str¶
Alias for field number 2
- tira.ir_datasets_util.register_dataset_from_re_rank_file(ir_dataset_id, df_re_rank, original_ir_datasets_id=None)[source]¶
Load a dynamic ir_datasets integration from a given re_rank_file. The dataset will be registered for the id ir_dataset_id. The original_ir_datasets_id is used to infer the class of documents, qrels, and queries.
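The field numbering listed for TirexQuery follows from the order of the constructor arguments. As a rough illustration, the sketch below defines a local stand-in named TirexQuerySketch (a hypothetical name, not the class shipped in tira) with the same fields in the same order, and shows that positional and named access agree. The sample values are made up.

```python
from typing import NamedTuple


class TirexQuerySketch(NamedTuple):
    """Hypothetical stand-in with the documented field order."""

    query_id: str     # field number 0
    text: str         # field number 1
    title: str        # field number 2
    query: str        # field number 3
    description: str  # field number 4
    narrative: str    # field number 5


# Made-up sample query for illustration only.
example = TirexQuerySketch(
    query_id="1",
    text="hubble telescope achievements",
    title="hubble telescope achievements",
    query="hubble telescope achievements",
    description="Find documents about achievements of the Hubble telescope.",
    narrative="Relevant documents name concrete discoveries.",
)

# Positional and named access refer to the same values.
same_query = example[3] == example.query
same_id = example[0] == example.query_id
```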
tira.pyterrier_integration module
- class tira.pyterrier_integration.PyTerrierAnceIntegration(tira_client)[source]¶
Bases:
object
The pyterrier_ance integration to re-use cached ANCE indices. Wraps https://github.com/terrierteam/pyterrier_ance
- ance_retrieval(dataset: str)[source]¶
Load a cached pyterrier_ance.ANCEIndexer submitted as workshop-on-open-web-search/ows/pyterrier-anceindex from tira.
- References (for citation):
https://arxiv.org/pdf/2007.00808.pdf
https://github.com/microsoft/ANCE/
- Args:
dataset (str): the dataset id, either a tira or an ir_datasets id.
- Returns:
pyterrier_ance.ANCERetrieval: the ANCE index.
- class tira.pyterrier_integration.PyTerrierIntegration(tira_client)[source]¶
Bases:
object
- class tira.pyterrier_integration.PyTerrierSpladeIntegration(tira_client)[source]¶
Bases:
object
The pyt_splade integration to re-use cached Splade indices. Wraps https://github.com/cmacdonald/pyt_splade
- splade_index(dataset: str, approach: str = 'workshop-on-open-web-search/naverlabseurope/Splade (Index)')[source]¶
Load a cached pyt_splade index submitted as the passed approach (default ‘workshop-on-open-web-search/naverlabseurope/Splade (Index)’) from tira.
- References (for citation):
https://github.com/naver/splade?tab=readme-ov-file#cite-scroll
ToDo: Ask Thibault what to cite.
- Args:
dataset (str): the dataset id, either a tira or an ir_datasets id.
approach (str, optional): the approach id, defaults to ‘workshop-on-open-web-search/naverlabseurope/Splade (Index)’.
- Returns:
The PyTerrier index suitable for retrieval.
tira.pyterrier_util module
tira.third_party_integrations module
- tira.third_party_integrations.ensure_pyterrier_is_loaded(boot_packages=('com.github.terrierteam:terrier-prf:-SNAPSHOT',), packages=(), patch_ir_datasets=True)[source]¶
- tira.third_party_integrations.extract_previous_stages_from_docker_image(image: str, command: str | None = None)[source]¶
- tira.third_party_integrations.extract_to_be_executed_notebook_from_command_or_none(command: str)[source]¶
- tira.third_party_integrations.get_input_directory_and_output_directory(default_input, default_output: str = '/tmp/')[source]¶
- tira.third_party_integrations.get_preconfigured_chatnoir_client(config_directory, features=['TARGET_URI'], num_results=10, retries=25, page_size=10)[source]¶
- tira.third_party_integrations.persist_and_normalize_run(run, system_name, default_output=None, output_file=None, depth=1000)[source]¶
- tira.third_party_integrations.register_rerank_data_to_ir_datasets(path_to_rerank_file, ir_dataset_id, original_ir_datasets_id=None)[source]¶
Load a dynamic ir_datasets integration from a given re_rank_file. The dataset will be registered for the id ir_dataset_id. The original_ir_datasets_id is used to infer the class of documents, qrels, and queries.