API

This page documents the Python API of the tira package: I/O helpers, integrations with ir_datasets and PyTerrier, and utilities for interacting with third-party tools.

tira.io_utils module

tira.io_utils.all_environment_variables_for_github_action_or_fail(params)[source]
tira.io_utils.all_lines_to_pandas(input_file: str | Iterable[str], load_default_text: bool) → DataFrame[source]

Todo

add documentation

Todo

this function has two semantics: handling a file and handling file-contents
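
Example (a hedged sketch, since the documentation above is still a todo): each JSONL line becomes one row of the resulting DataFrame.

from tira.io_utils import all_lines_to_pandas

# From a file on disk (the todo above notes that files and file contents
# are both accepted):
df = all_lines_to_pandas("documents.jsonl", load_default_text=True)

# From in-memory file contents:
df = all_lines_to_pandas(['{"qid": "1", "docno": "d-1"}'], load_default_text=False)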

tira.io_utils.load_output_of_directory(directory: Path, evaluation: bool = False) → Dict | DataFrame[source]
tira.io_utils.parse_jsonl_line(input: str | bytearray | bytes, load_default_text: bool) → Dict[source]

Deserializes the line as JSON. Optionally strips the ‘original_query’ and ‘original_document’ fields from the resulting object and converts the qid and docno fields to strings.

Parameters:
  • input (str | bytearray | bytes) – A JSON-serialized string.

  • load_default_text (bool) – If True, the original_query and original_document fields are removed and the qid and docno values are converted to strings.

Returns:

The deserialized and (optionally) processed object.

Return type:

dict

Example:
>>> parse_jsonl_line('{}', False)
{}
>>> parse_jsonl_line('{"original_query": "xxxx"}', False)
{'original_query': 'xxxx'}
>>> parse_jsonl_line('{"original_query": "xxxx"}', True)
{}
>>> parse_jsonl_line('{"original_query": "xxxx", "qid": 42, "pi": 3.14}', False)
{'original_query': 'xxxx', 'qid': 42, 'pi': 3.14}
>>> parse_jsonl_line('{"original_query": "xxxx", "qid": 42, "pi": 3.14}', True)
{'qid': '42', 'pi': 3.14}
tira.io_utils.parse_prototext_key_values(file_name)[source]
tira.io_utils.run_cmd(cmd: List[str], ignore_failure=False)[source]
tira.io_utils.stream_all_lines(input_file: str | Iterable[bytes], load_default_text: bool) → Generator[Dict, Any, Any][source]

Todo

add documentation

Todo

this function has two semantics: handling a file and handling file-contents
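
Example (a hedged sketch mirroring all_lines_to_pandas): lazily yield one dict per JSONL line instead of materializing a DataFrame.

from tira.io_utils import stream_all_lines

for record in stream_all_lines("documents.jsonl", load_default_text=True):
    print(record)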

tira.io_utils.to_prototext(m: List[Dict[str, Any]], upper_k: str = '') → str[source]
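
Example (a hedged sketch; that the output is TIRA’s evaluation prototext format is an assumption based on the function name):

from tira.io_utils import to_prototext

# Serialize a list of measure dictionaries to a prototext string.
print(to_prototext([{"ndcg_cut_10": 0.42, "recall_100": 0.73}]))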

tira.ir_datasets_util module

class tira.ir_datasets_util.DictDocsstore(docs)[source]

Bases: object

get(item)[source]
get_many_iter(docids)[source]
class tira.ir_datasets_util.TirexQuery(query_id, text, title, query, description, narrative)[source]

Bases: NamedTuple

default_text()[source]

description: str

Alias for field number 4

narrative: str

Alias for field number 5

query: str

Alias for field number 3

query_id: str

Alias for field number 0

text: str

Alias for field number 1

title: str

Alias for field number 2

tira.ir_datasets_util.ir_dataset_from_tira_fallback_to_original_ir_datasets()[source]
tira.ir_datasets_util.register_dataset_from_re_rank_file(ir_dataset_id, df_re_rank, original_ir_datasets_id=None)[source]

Load a dynamic ir_datasets integration from a given re_rank_file. The dataset will be registered for the id ir_dataset_id. The original_ir_datasets_id is used to infer the class of documents, qrels, and queries.
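
Example (a hedged sketch; the dataset id and the column layout of the re-rank frame are hypothetical):

import pandas as pd
from tira.ir_datasets_util import register_dataset_from_re_rank_file

df_re_rank = pd.DataFrame([{"qid": "1", "query": "example", "docno": "d-1", "text": "..."}])
register_dataset_from_re_rank_file("my-rerank-dataset", df_re_rank,
                                   original_ir_datasets_id="msmarco-passage/trec-dl-2019/judged")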

tira.ir_datasets_util.static_ir_dataset(directory, existing_ir_dataset=None)[source]
tira.ir_datasets_util.translate_irds_id_to_tirex(dataset)[source]

tira.pyterrier_integration module

class tira.pyterrier_integration.PyTerrierAnceIntegration(tira_client)[source]

Bases: object

The pyterrier_ance integration to re-use cached ANCE indices. Wraps https://github.com/terrierteam/pyterrier_ance

ance_retrieval(dataset: str)[source]

Load a cached pyterrier_ance.ANCEIndexer from tira that was submitted as workshop-on-open-web-search/ows/pyterrier-anceindex.

References (for citation):

https://arxiv.org/pdf/2007.00808.pdf
https://github.com/microsoft/ANCE/

Args:

dataset (str): the dataset id, either a tira or ir_datasets id.

Returns:

pyterrier_ance.ANCERetrieval: the ANCE index.
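
Example (a hedged sketch; requires pyterrier_ance to be installed, and the dataset id is a placeholder):

from tira.rest_api_client import Client
from tira.pyterrier_integration import PyTerrierAnceIntegration

ance = PyTerrierAnceIntegration(Client()).ance_retrieval("msmarco-passage/trec-dl-2019/judged")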

class tira.pyterrier_integration.PyTerrierIntegration(tira_client)[source]

Bases: object

create_rerank_file(run_df=None, run_file=None, irds_dataset_id=None)[source]
doc_features(approach, dataset, file_selection=('/*.jsonl', '/*.jsonl.gz'))[source]
ensure_dataset_is_cached(irds_dataset_id, dataset)[source]
from_retriever_submission(approach, dataset, previous_stage=None, datasets=None)[source]
from_submission(approach, dataset=None, datasets=None)[source]
index(approach, dataset)[source]

Load a PyTerrier index from TIRA.
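
Example (a hedged sketch; the approach and dataset ids are placeholders modeled after typical TIRA ids):

from tira.rest_api_client import Client
from tira.pyterrier_integration import PyTerrierIntegration

pt_tira = PyTerrierIntegration(Client())
index = pt_tira.index("ir-benchmarks/tira-ir-starter/Index (tira-ir-starter-pyterrier)",
                      "msmarco-passage/trec-dl-2019/judged")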

query_features(approach, dataset, file_selection=('/*.jsonl', '/*.jsonl.gz'))[source]
reranker(approach, irds_id=None)[source]
retriever(approach, dataset=None)[source]
transform_documents(approach, dataset, file_selection=('/*.jsonl', '/*.jsonl.gz'), prefix='')[source]
transform_queries(approach, dataset, file_selection=('/*.jsonl', '/*.jsonl.gz'), prefix='')[source]
class tira.pyterrier_integration.PyTerrierSpladeIntegration(tira_client)[source]

Bases: object

The pyt_splade integration to re-use cached Splade indices. Wraps https://github.com/cmacdonald/pyt_splade

splade_index(dataset: str, approach: str = 'workshop-on-open-web-search/naverlabseurope/Splade (Index)')[source]

Load a cached pyt_splade index from tira that was submitted as the passed approach (default: ‘workshop-on-open-web-search/naverlabseurope/Splade (Index)’).

References (for citation):

https://github.com/naver/splade?tab=readme-ov-file#cite-scroll
ToDo: Ask Thibault what to cite.

Args:

dataset (str): the dataset id, either a tira or ir_datasets id.
approach (str, optional): the approach id, defaults to ‘workshop-on-open-web-search/naverlabseurope/Splade (Index)’.

Returns:

The PyTerrier index suitable for retrieval.
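
Example (a hedged sketch using the default approach; the dataset id is a placeholder and pyt_splade must be installed):

from tira.rest_api_client import Client
from tira.pyterrier_integration import PyTerrierSpladeIntegration

index = PyTerrierSpladeIntegration(Client()).splade_index("msmarco-passage/trec-dl-2019/judged")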

tira.pyterrier_util module

tira.third_party_integrations module

tira.third_party_integrations.ensure_pyterrier_is_loaded(boot_packages=('com.github.terrierteam:terrier-prf:-SNAPSHOT',), packages=(), patch_ir_datasets=True)[source]
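
Example (a hedged sketch of the common usage pattern): initialize PyTerrier via TIRA before the first import of pyterrier.

from tira.third_party_integrations import ensure_pyterrier_is_loaded

ensure_pyterrier_is_loaded()  # also patches ir_datasets by default
import pyterrier as pt
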
tira.third_party_integrations.extract_ast_value(v)[source]
tira.third_party_integrations.extract_previous_stages_from_docker_image(image: str, command: str | None = None)[source]
tira.third_party_integrations.extract_previous_stages_from_notebook(notebook: Path)[source]
tira.third_party_integrations.extract_to_be_executed_notebook_from_command_or_none(command: str)[source]
tira.third_party_integrations.get_input_directory_and_output_directory(default_input, default_output: str = '/tmp/')[source]
tira.third_party_integrations.get_output_directory(default_output: str = '/tmp/')[source]
tira.third_party_integrations.get_preconfigured_chatnoir_client(config_directory, features=['TARGET_URI'], num_results=10, retries=25, page_size=10)[source]
tira.third_party_integrations.is_running_as_inference_server()[source]
tira.third_party_integrations.load_ir_datasets()[source]
tira.third_party_integrations.load_rerank_data(default, load_default_text=True)[source]
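
Example (a hedged sketch; the default dataset id is a placeholder): load the re-rank input, falling back to the given default when not running inside TIRA.

from tira.third_party_integrations import load_rerank_data

re_rank_input = load_rerank_data(default="workshop-on-open-web-search/re-ranking-20231027-training")
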
tira.third_party_integrations.normalize_run(run, system_name, depth=1000)[source]
tira.third_party_integrations.parse_ast_extract_assignment(python_line: str)[source]
tira.third_party_integrations.parse_extraction_of_tira_approach(python_line: str)[source]
tira.third_party_integrations.parse_extraction_of_tira_approach_bash(bash_line: str)[source]
tira.third_party_integrations.persist_and_normalize_run(run, system_name, default_output=None, output_file=None, depth=1000)[source]
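
Example (a hedged sketch; the qid/docno/score run layout is an assumption):

import pandas as pd
from tira.third_party_integrations import persist_and_normalize_run

run = pd.DataFrame([{"qid": "1", "docno": "d-1", "score": 1.5}])
persist_and_normalize_run(run, system_name="my-system", default_output="/tmp/")
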
tira.third_party_integrations.register_rerank_data_to_ir_datasets(path_to_rerank_file, ir_dataset_id, original_ir_datasets_id=None)[source]

Load a dynamic ir_datasets integration from a given re_rank_file. The dataset will be registered for the id ir_dataset_id. The original_ir_datasets_id is used to infer the class of documents, qrels, and queries.
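
Example (a hedged sketch; the path and ids are placeholders):

from tira.third_party_integrations import register_rerank_data_to_ir_datasets

register_rerank_data_to_ir_datasets("rerank.jsonl.gz", "my-rerank-dataset",
                                    original_ir_datasets_id="msmarco-passage/trec-dl-2019/judged")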

Module contents