immuneML.util package

Submodules

immuneML.util.AdaptiveImportHelper module

class immuneML.util.AdaptiveImportHelper.AdaptiveImportHelper[source]

Bases: object

static parse_adaptive_germline_to_imgt(dataframe, organism)[source]
static parse_germline(dataframe: pandas.DataFrame, gene_name_replacement: dict, germline_value_replacement: dict)[source]
static preprocess_dataframe(dataframe: pandas.DataFrame, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams)[source]

immuneML.util.CompAIRRHelper module

class immuneML.util.CompAIRRHelper.CompAIRRHelper[source]

Bases: object

static check_compairr_path(compairr_path)[source]
static determine_compairr_path(compairr_path)[source]
static get_cmd_args(compairr_params, input_file_list, result_path)[source]
static get_repertoire_contents(repertoire, compairr_params)[source]
static process_compairr_output_file(subprocess_result, compairr_params, result_path)[source]
static write_repertoire_file(repertoire_dataset, filename, compairr_params)[source]

immuneML.util.CompAIRRParams module

class immuneML.util.CompAIRRParams.CompAIRRParams(compairr_path: pathlib.Path, keep_compairr_input: bool, differences: int, indels: bool, ignore_counts: bool, ignore_genes: bool, threads: int, output_filename: str, log_filename: str)[source]

Bases: object

compairr_path: pathlib.Path
differences: int
ignore_counts: bool
ignore_genes: bool
indels: bool
keep_compairr_input: bool
log_filename: str
output_filename: str
threads: int

immuneML.util.DistanceMetrics module

immuneML.util.DistanceMetrics.jaccard(vector1, vector2, tmp_vector=None)[source]
immuneML.util.DistanceMetrics.morisita_horn(vector1, vector2, *args, **kwargs)[source]

immuneML.util.DocEnumHelper module

class immuneML.util.DocEnumHelper.DocEnumHelper[source]

Bases: object

static get_enum_names(enum)[source]
static get_enum_names_and_values(enum)[source]

immuneML.util.EncoderHelper module

class immuneML.util.EncoderHelper.EncoderHelper[source]

Bases: object

static build_comparison_data(dataset: immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset, params: immuneML.encodings.EncoderParams.EncoderParams, comparison_attributes, sequence_batch_size)[source]
static build_comparison_params(dataset, comparison_attributes) tuple[source]
static check_positive_class_label(class_name, labels)[source]
static get_current_dataset(dataset, context)[source]
static prepare_training_ids(dataset: immuneML.data_model.dataset.Dataset.Dataset, params: immuneML.encodings.EncoderParams.EncoderParams)[source]
static store(encoded_dataset, params: immuneML.encodings.EncoderParams.EncoderParams)[source]
static sync_encoder_with_cache(cache_params: tuple, encoder_memo_func, encoder, param_names)[source]

immuneML.util.FilenameHandler module

class immuneML.util.FilenameHandler.FilenameHandler[source]

Bases: object

static get_dataset_name(class_name: str)[source]
static get_filename(class_name: str, file_type: str)[source]

Converts the class name to snake case and appends the given file type.

Parameters
  • class_name – name of the class that will be stored in the file

  • file_type – file extension: pickle, json

Returns

filename consisting of the class name in snake case concatenated with the file type
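
As a rough illustration of the conversion described above, the sketch below turns a CamelCase class name into snake case and appends the file type. The helper name to_snake_case_filename, the regular expression, and the dot used as a separator are assumptions made for this example, not the library's confirmed implementation.

```python
import re


def to_snake_case_filename(class_name: str, file_type: str) -> str:
    # Hypothetical helper: insert "_" at every boundary between a lowercase
    # letter or digit and a following capital letter, then lowercase the name.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", class_name).lower()
    # Assumption: the name and the file type are joined with a dot.
    return f"{snake}.{file_type}"


print(to_snake_case_filename("RepertoireDataset", "pickle"))
```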

static get_model_name(class_name: str, file_type: str = 'pickle')[source]

immuneML.util.ImportHelper module

class immuneML.util.ImportHelper.ImportHelper[source]

Bases: object

DATASET_FORMAT = 'iml_dataset'
static build_receptor_from_rows(first_row, second_row, identifier, chain_pair, metadata_columns)[source]
static drop_empty_sequences(dataframe: pandas.DataFrame, import_empty_aa_sequences: bool, import_empty_nt_sequences: bool) pandas.DataFrame[source]
static drop_illegal_character_sequences(dataframe: pandas.DataFrame, import_illegal_characters: bool) pandas.DataFrame[source]
static extract_sequence_dataset_params(items=None, params=None) dict[source]
static get_chain_for_row(row)[source]
static get_sequence_filenames(path: pathlib.Path, dataset_name: str)[source]
static import_dataset(import_class, params: dict, dataset_name: str) immuneML.data_model.dataset.Dataset[source]
static import_items(import_class, path, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams)[source]
static import_receptors(df, params) List[immuneML.data_model.receptor.Receptor.Receptor][source]
static import_receptors_by_id(df, identifier, chain_pair, metadata_columns) List[immuneML.data_model.receptor.Receptor.Receptor][source]
static import_repertoire_dataset(import_class, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams, dataset_name: str) immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset[source]

Creates a dataset from the metadata and a list of repertoire files, and exports the dataset as a pickle file.

Parameters
  • import_class – class to use for import

  • params – instance of DatasetImportParams class which includes information on path, columns, result path etc.

  • dataset_name – user-defined name of the dataset

Returns

RepertoireDataset object that was created

static import_sequence(row, metadata_columns=None) immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence[source]
static import_sequence_dataset(import_class, params, dataset_name: str)[source]
static is_illegal_sequence(sequence, legal_alphabet) bool[source]
static junction_to_cdr3(df: pandas.DataFrame, region_type: immuneML.data_model.receptor.RegionType.RegionType)[source]

If RegionType is CDR3, the leading C and the trailing F/W are removed from the sequence to match the IMGT CDR3 definition. This method alters the data in the provided dataframe in place.
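
The trimming step described above can be sketched on a single amino acid string. This is only an illustration: the helper name junction_to_cdr3_string is hypothetical, and it assumes the input is a full IMGT junction that includes both conserved anchor residues. The documented method operates on DataFrame columns instead.

```python
def junction_to_cdr3_string(junction_aa: str) -> str:
    # Hypothetical helper: drop the first and the last residue of the junction.
    # Assumption: the input includes both conserved anchor residues.
    return junction_aa[1:-1]


print(junction_to_cdr3_string("CARGGYFDYW"))  # prints ARGGYFDY
```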

static load_chains(df: pandas.DataFrame)[source]
static load_chains_from_chains(df: pandas.DataFrame) list[source]
static load_chains_from_genes(df: pandas.DataFrame) list[source]
static load_dataset_if_exists(params: dict, processed_params, dataset_name: str)[source]
static load_repertoire_as_object(import_class, metadata_row, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams)[source]
static load_sequence_dataframe(filepath, params, alternative_load_func=None)[source]
static make_new_metadata_file(repertoires: list, metadata: pandas.DataFrame, result_path: pathlib.Path, dataset_name: str) pathlib.Path[source]
static prepare_frame_type_list(params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams) list[source]
static rename_dataframe_columns(df, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams)[source]
static safe_load_dataframe(filepath, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams)[source]
static standardize_none_values(dataframe: pandas.DataFrame)[source]
static store_sequence_items(dataset_filenames: list, items: list, sequence_file_size: int)[source]
static strip_alleles(df: pandas.DataFrame, column_name)[source]
static strip_genes(df: pandas.DataFrame, column_name)[source]
static strip_suffix(df: pandas.DataFrame, column_name, delimiter)[source]

Safely removes everything after a delimiter from a column in the DataFrame

static update_gene_info(df: pandas.DataFrame)[source]

Updates gene info in 2 steps:

  • First, columns are added if they were not present. This is done by going from the highest level of information (alleles) towards the lowest level of information (subgroups) by stripping away suffixes. If gene and subgroup columns were already present, suffixes are still stripped away just in case.

  • Next, if there are None values present, the highest possible level of information is copied in from the lower level information fields. This is done by moving from subgroups towards alleles. So if for one particular receptor only the subgroup was present, the subgroup will be copied into the genes and alleles column.
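
The two steps can be sketched for a single record as follows. This is a simplified illustration on plain values rather than DataFrame columns; the function names and the delimiters ("*" before the allele, "-" before the gene number) are assumptions for this example.

```python
from typing import Optional


def strip_suffix_sketch(value: Optional[str], delimiter: str) -> Optional[str]:
    # Keep everything before the first delimiter; pass None through untouched.
    return None if value is None else value.split(delimiter)[0]


def fill_gene_info(allele=None, gene=None, subgroup=None) -> dict:
    # Step 1: go from alleles towards subgroups by stripping suffixes.
    # Existing gene and subgroup values are stripped as well, just in case.
    gene = strip_suffix_sketch(gene if gene is not None else allele, "*")
    subgroup = strip_suffix_sketch(subgroup if subgroup is not None else gene, "-")
    # Step 2: go from subgroups towards alleles and fill remaining None values.
    gene = gene if gene is not None else subgroup
    allele = allele if allele is not None else gene
    return {"allele": allele, "gene": gene, "subgroup": subgroup}


print(fill_gene_info(allele="TRBV7-2*01"))
print(fill_gene_info(subgroup="TRBV7"))
```

In the second call only the subgroup is known, so it is copied into the gene and allele fields.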

immuneML.util.KmerHelper module

class immuneML.util.KmerHelper.KmerHelper[source]

Bases: object

static create_IMGT_gapped_kmers_from_sequence(sequence: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence, sequence_type: immuneML.environment.SequenceType.SequenceType, k_left: int, max_gap: int, k_right: Optional[int] = None, min_gap: int = 0)[source]
static create_IMGT_kmers_from_sequence(sequence: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence, k: int, sequence_type: immuneML.environment.SequenceType.SequenceType)[source]
static create_all_kmers(k: int, alphabet: list)[source]

Creates all possible k-mers given a k-mer length and an alphabet.

Parameters
  • k (int) – length of k-mer

  • alphabet (list) – list of characters from which to make all possible k-mers

Returns

alphabetically sorted list of k-mers
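
One way to enumerate all k-mers is the Cartesian product of the alphabet with itself. The sketch below uses itertools.product for that; the function name all_kmers is hypothetical, and this is not necessarily how the library implements it.

```python
from itertools import product


def all_kmers(k: int, alphabet: list) -> list:
    # Cartesian product of the alphabet with itself k times, joined into strings.
    kmers = ["".join(letters) for letters in product(alphabet, repeat=k)]
    # Sort explicitly so the result is alphabetical even if the alphabet is not.
    return sorted(kmers)


print(all_kmers(2, ["C", "A"]))  # prints ['AA', 'AC', 'CA', 'CC']
```

The number of k-mers grows as len(alphabet) ** k, so for the 20 amino acids even k = 4 already yields 160000 entries.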

static create_gapped_kmers_from_sequence(sequence: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence, sequence_type: immuneML.environment.SequenceType.SequenceType, k_left: int, max_gap: int, k_right: Optional[int] = None, min_gap: int = 0)[source]
static create_gapped_kmers_from_string(sequence, k_left: int, max_gap: int, k_right: Optional[int] = None, min_gap: int = 0)[source]
static create_kmers_from_sequence(sequence: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence, k: int, sequence_type: immuneML.environment.SequenceType.SequenceType, overlap: bool = True)[source]
static create_kmers_from_string(sequence, k: int, overlap: bool = True)[source]
static create_kmers_within_HD(kmer: str, alphabet: list, distance: int = 1)[source]
static create_sentences_from_repertoire(repertoire: immuneML.data_model.repertoire.Repertoire.Repertoire, k: int, sequence_type: immuneML.environment.SequenceType.SequenceType, overlap: bool = True)[source]

immuneML.util.Logger module

immuneML.util.Logger.log(func)[source]

immuneML.util.NameBuilder module

class immuneML.util.NameBuilder.NameBuilder[source]

Bases: object

static build_name_from_dict(dictionary: dict, level=0)[source]

Creates a name from the dictionary that includes all of its parameters and handles nested dictionaries up to and including a depth of 10.

Parameters
  • dictionary (dict) – dictionary to create the name from

  • level (int) – controls recursion level, user should keep default

Returns

name (str)
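
A recursive name builder with a depth limit might look like the sketch below. The output format ("key_value" pairs joined by underscores), the function name, and raising ValueError beyond the limit are all assumptions for illustration; the actual format produced by build_name_from_dict is not specified here.

```python
def build_name_sketch(dictionary: dict, level: int = 0, max_level: int = 10) -> str:
    # Assumption: nesting deeper than max_level is treated as an error.
    if level > max_level:
        raise ValueError("dictionary is nested too deeply")
    parts = []
    for key, value in dictionary.items():
        if isinstance(value, dict):
            # Recurse into nested dictionaries, increasing the level by one.
            parts.append(f"{key}_{build_name_sketch(value, level + 1, max_level)}")
        else:
            parts.append(f"{key}_{value}")
    # Hypothetical format: all parts joined by underscores.
    return "_".join(parts)


print(build_name_sketch({"k": 3, "model": {"C": 1}}))  # prints k_3_model_C_1
```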

immuneML.util.NumpyHelper module

class immuneML.util.NumpyHelper.NumpyHelper[source]

Bases: object

static get_numpy_representation(obj)[source]

Converts the object to a representation that can be stored in numpy arrays without pickle enabled; if it is an object or a dict, it will be serialized to a JSON string.

static group_structured_array_by(data, field)[source]
static is_nan_or_empty(value)[source]
static is_simple_type(t)[source]

Returns whether the type t is a string or a number, so that it does not use pickle when serialized.
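
The idea behind these two helpers can be sketched with the standard json module alone. The function names below are hypothetical, and serializing arbitrary objects through their attribute dictionary (default=vars) is an assumption of this example rather than the library's documented behavior.

```python
import json


def is_simple_type_sketch(t: type) -> bool:
    # Strings and numbers can be stored directly (bool is a subclass of int).
    return issubclass(t, (str, int, float))


def to_storable(value):
    # Simple values pass through unchanged.
    if is_simple_type_sketch(type(value)):
        return value
    # Dicts are dumped as JSON; for other objects json calls vars() on them.
    # Assumption: objects are serialized through their attribute dictionary.
    return json.dumps(value, default=vars)


class Point:
    def __init__(self):
        self.x = 1


print(to_storable("CASSL"))
print(to_storable({"a": 1}))
print(to_storable(Point()))
```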

immuneML.util.ParameterValidator module

class immuneML.util.ParameterValidator.ParameterValidator[source]

Bases: object

static assert_all_in_valid_list(values: list, valid_values: list, location: str, parameter_name: str)[source]
static assert_all_type_and_value(values, parameter_type, location: str, parameter_name: str, min_inclusive=None, max_inclusive=None)[source]
static assert_in_valid_list(value, valid_values: list, location: str, parameter_name: str)[source]
static assert_keys(keys, valid_keys, location: str, parameter_name: str, exclusive: bool = True)[source]
static assert_keys_present(values: list, expected_values: list, location: str, parameter_name: str)[source]
static assert_type_and_value(value, parameter_type, location: str, parameter_name: str, min_inclusive=None, max_inclusive=None, exact_value=None)[source]

immuneML.util.PathBuilder module

class immuneML.util.PathBuilder.PathBuilder[source]

Bases: object

static build(path, warn_if_exists=False)[source]

immuneML.util.PositionHelper module

class immuneML.util.PositionHelper.PositionHelper[source]

Bases: object

static adjust_position_weights(sequence_position_weights: dict, imgt_positions, limit: int) dict[source]
Parameters
  • sequence_position_weights – weights supplied by the user as to where in the receptor_sequence to implant

  • imgt_positions – IMGT positions present in the specific receptor_sequence

  • limit – how far from the end of the receptor_sequence the motif must start at the latest so that implanting it does not elongate the receptor_sequence

Returns

position_weights for implanting a motif instance into a receptor_sequence
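
To make the role of the three parameters concrete, here is a minimal sketch of one plausible adjustment. It assumes that limit counts the positions at the end of the sequence where a motif may no longer start, that positions without a user-supplied weight get 0, and that the remaining weights are renormalized to sum to 1 (with a uniform fallback). These are assumptions for illustration and may differ from the actual implementation.

```python
def adjust_weights_sketch(sequence_position_weights: dict, imgt_positions: list, limit: int) -> dict:
    # Assumption: the motif may start anywhere except the last `limit` positions.
    last_start = max(0, len(imgt_positions) - limit)
    allowed = imgt_positions[:last_start]
    if not allowed:
        return {}
    # Keep user weights only for positions present in this sequence; others get 0.
    weights = {pos: sequence_position_weights.get(pos, 0.0) for pos in allowed}
    total = sum(weights.values())
    if total == 0:
        # Assumption: fall back to uniform weights when no usable weight remains.
        return {pos: 1.0 / len(allowed) for pos in allowed}
    # Renormalize so that the remaining weights sum to 1.
    return {pos: weight / total for pos, weight in weights.items()}


positions = ["105", "106", "107", "116", "117"]
print(adjust_weights_sketch({"105": 0.5, "106": 0.5, "117": 1.0}, positions, limit=2))
```

Here the weight given for position 117 is dropped because a motif starting there would run past the end of the sequence.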

static build_position_weights(sequence_position_weights: dict, imgt_positions, limit: int) dict[source]
static gen_imgt_positions_from_length(input_length: int)[source]
static gen_imgt_positions_from_sequence(sequence: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence)[source]

immuneML.util.ReflectionHandler module

class immuneML.util.ReflectionHandler.ReflectionHandler[source]

Bases: object

static all_direct_subclasses(cls, drop_part=None, subdirectory=None)[source]
static all_nonabstract_subclass_basic_names(cls, drop_part: str, subdirectory: str = '')[source]
static all_nonabstract_subclasses(cls, drop_part=None, subdirectory=None)[source]
static all_subclasses(cls)[source]
static discover_classes_by_partial_name(class_name_ending: str, subdirectory: str = '')[source]
static exists(class_name: str, subdirectory: str = '')[source]
static get_class_by_name(class_name: str, subdirectory: str = '')[source]
static get_class_from_path(path, class_name: Optional[str] = None)[source]

Obtains the class reference from the given path.

Parameters
  • path (str or pathlib.Path) – path to file where the class is located

  • class_name (str) – name of the class to import from the file; if None, it is assumed that the class name is the same as the file name

Returns

class
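
Loading a class from a file path can be done with the standard importlib machinery, as in the sketch below. The function name load_class_from_path is hypothetical, and this is one possible approach rather than a description of how ReflectionHandler does it.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def load_class_from_path(path, class_name: Optional[str] = None):
    path = Path(path)
    # Default to the file name without its extension, as described above.
    class_name = class_name if class_name is not None else path.stem
    # Build a module spec from the file location and execute the module.
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Look up the class object on the loaded module.
    return getattr(module, class_name)
```

For a file MyEncoder.py that defines a class MyEncoder, load_class_from_path("MyEncoder.py") would return that class; passing class_name selects a differently named class from the same file.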

static get_classes_by_partial_name(class_name_ending: str, subdirectory: str = '')[source]
static import_function(function: str, module)[source]
static import_module(name: str, package: Optional[str] = None)[source]
static is_installed(module_name: str) bool[source]

immuneML.util.RepertoireBuilder module

class immuneML.util.RepertoireBuilder.RepertoireBuilder[source]

Bases: object

Helper class for tests: creates repertoires from a list of lists of sequences and stores them in the given path

static build(sequences: list, path: pathlib.Path, labels: Optional[dict] = None, seq_metadata: Optional[list] = None, subject_ids: Optional[list] = None)[source]

immuneML.util.SequenceAnalysisHelper module

class immuneML.util.SequenceAnalysisHelper.SequenceAnalysisHelper[source]

Bases: object

static compute_overlap_matrix(hp_items: List[immuneML.hyperparameter_optimization.states.HPItem.HPItem])[source]

immuneML.util.StringHelper module

class immuneML.util.StringHelper.StringHelper[source]

Bases: object

static camel_case_to_word_string(camel_case_string: str)[source]
static camel_case_to_words(camel_case_string: str)[source]

immuneML.util.TCRdistHelper module

class immuneML.util.TCRdistHelper.TCRdistHelper[source]

Bases: object

static add_default_allele_to_v_gene(v_gene: str)[source]
static compute_tcr_dist(dataset: immuneML.data_model.dataset.ReceptorDataset.ReceptorDataset, label_names: list, cores: int = 1)[source]
static prepare_tcr_dist_dataframe(dataset: immuneML.data_model.dataset.ReceptorDataset.ReceptorDataset, label_names: list) pandas.DataFrame[source]

Module contents