immuneML.util package¶
Submodules¶
immuneML.util.AdaptiveImportHelper module¶
-
class
immuneML.util.AdaptiveImportHelper.
AdaptiveImportHelper
[source]¶ Bases:
object
-
static
parse_germline
(dataframe: pandas.DataFrame, gene_name_replacement: dict, germline_value_replacement: dict)[source]¶
-
static
preprocess_dataframe
(dataframe: pandas.DataFrame, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams)[source]¶
immuneML.util.DistanceMetrics module¶
immuneML.util.DocEnumHelper module¶
immuneML.util.EncoderHelper module¶
-
class
immuneML.util.EncoderHelper.
EncoderHelper
[source]¶ Bases:
object
-
static
build_comparison_data
(dataset: immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset, params: immuneML.encodings.EncoderParams.EncoderParams, comparison_attributes, sequence_batch_size)[source]¶
-
static
prepare_training_ids
(dataset: immuneML.data_model.dataset.Dataset.Dataset, params: immuneML.encodings.EncoderParams.EncoderParams)[source]¶
-
static
store
(encoded_dataset, params: immuneML.encodings.EncoderParams.EncoderParams)[source]¶
immuneML.util.FilenameHandler module¶
-
class
immuneML.util.FilenameHandler.
FilenameHandler
[source]¶ Bases:
object
-
static
get_filename
(class_name: str, file_type: str)[source]¶ Converts the class name to snake case and appends the given file type.
- Parameters
class_name – name of the class that will be stored in the file
file_type – file extension (pickle, json)
- Returns
filename consisting of the class name in snake case followed by the file type
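The conversion described for get_filename can be sketched as follows. This is an illustration and not the library's implementation: the helper names `to_snake_case` and `get_filename_sketch` are hypothetical, and joining the name and file type with a dot is an assumption.

```python
import re


def to_snake_case(class_name: str) -> str:
    """Hypothetical helper: CamelCase -> snake_case."""
    # Put an underscore before a capitalised word that follows any character.
    step = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", class_name)
    # Put an underscore between a lowercase letter or digit and a capital.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step).lower()


def get_filename_sketch(class_name: str, file_type: str) -> str:
    """Assumption: the file type is appended after a dot."""
    return f"{to_snake_case(class_name)}.{file_type}"


print(get_filename_sketch("RepertoireDataset", "pickle"))
```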
immuneML.util.ImportHelper module¶
-
class
immuneML.util.ImportHelper.
ImportHelper
[source]¶ Bases:
object
-
DATASET_FORMAT
= 'iml_dataset'¶
-
static
build_receptor_from_rows
(first_row, second_row, identifier, chain_pair, metadata_columns)[source]¶
-
static
drop_empty_sequences
(dataframe: pandas.DataFrame, import_empty_aa_sequences: bool, import_empty_nt_sequences: bool) → pandas.DataFrame[source]¶
-
static
drop_illegal_character_sequences
(dataframe: pandas.DataFrame, import_illegal_characters: bool) → pandas.DataFrame[source]¶
-
static
import_dataset
(import_class, params: dict, dataset_name: str) → immuneML.data_model.dataset.Dataset[source]¶
-
static
import_items
(import_class, path, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams)[source]¶
-
static
import_receptors
(df, params) → List[immuneML.data_model.receptor.Receptor.Receptor][source]¶
-
static
import_receptors_by_id
(df, identifier, chain_pair, metadata_columns) → List[immuneML.data_model.receptor.Receptor.Receptor][source]¶
-
static
import_repertoire_dataset
(import_class, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams, dataset_name: str) → immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset[source]¶ Creates a dataset from the metadata and a list of repertoire files, and exports the dataset as a pickle file.
- Parameters
import_class – class to use for import
params – instance of DatasetImportParams class which includes information on path, columns, result path etc.
dataset_name – user-defined name of the dataset
- Returns
RepertoireDataset object that was created
-
static
import_sequence
(row, metadata_columns=None) → immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence[source]¶
-
static
junction_to_cdr3
(df: pandas.DataFrame, region_type: immuneML.data_model.receptor.RegionType.RegionType)[source]¶ If RegionType is CDR3, the leading C and trailing W are removed from the sequence to match the IMGT CDR3 definition. This method alters the data in the provided dataframe.
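The trimming described for junction_to_cdr3 can be sketched as follows. This is a simplified illustration, not the library's code: a plain list stands in for the DataFrame column, the boolean `region_is_cdr3` stands in for the RegionType check, and trimming by position (first and last residue, whatever they are) is an assumption.

```python
from typing import List, Optional


def junction_to_cdr3_sketch(sequences: List[Optional[str]], region_is_cdr3: bool) -> None:
    """Hypothetical stand-in: trims junction sequences in place."""
    if not region_is_cdr3:
        return  # other region types are left untouched
    for i, seq in enumerate(sequences):
        if seq is not None:
            # Drop the first and the last residue of the junction.
            sequences[i] = seq[1:-1]


junctions = ["CARDYGMDVW", None]
junction_to_cdr3_sketch(junctions, region_is_cdr3=True)
print(junctions)
```

Like the documented method, the sketch modifies its input instead of returning a copy.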
-
static
load_repertoire_as_object
(import_class, metadata_row, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams)[source]¶
-
static
make_new_metadata_file
(repertoires: list, metadata: pandas.DataFrame, result_path: pathlib.Path, dataset_name: str) → pathlib.Path[source]¶
-
static
prepare_frame_type_list
(params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams) → list[source]¶
-
static
rename_dataframe_columns
(df, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams)[source]¶
-
static
safe_load_dataframe
(filepath, params: immuneML.IO.dataset_import.DatasetImportParams.DatasetImportParams)[source]¶
-
static
strip_suffix
(df: pandas.DataFrame, column_name, delimiter)[source]¶ Safely removes everything after a delimiter from a column in the DataFrame
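A minimal sketch of this kind of suffix stripping, using a plain list in place of a DataFrame column. The function name is hypothetical, and reading "safely" as "missing or non-string values are passed through unchanged" is an assumption.

```python
from typing import List, Optional


def strip_suffix_sketch(values: List[Optional[str]], delimiter: str) -> List[Optional[str]]:
    """Hypothetical stand-in: keep only the text before the first delimiter."""
    return [
        value.split(delimiter, 1)[0] if isinstance(value, str) else value
        for value in values
    ]


print(strip_suffix_sketch(["TRBV7-2*01", "TRBV19", None], "*"))
```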
-
static
update_gene_info
(df: pandas.DataFrame)[source]¶ Updates gene info in 2 steps:
- First, columns are added if they were not present. This is done by going from the highest level of information (alleles) towards the lowest level of information (subgroups) by stripping away suffixes. If gene and subgroup columns were already present, suffixes are still stripped away just in case.
- Next, if there are None values present, the highest possible level of information is copied in from the lower level information fields. This is done by moving from subgroups towards alleles. So if for one particular receptor only the subgroup was present, the subgroup will be copied into the genes and alleles column.
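The two steps above can be sketched for a single record. This is an illustration under stated assumptions, not the library's implementation: the keys `allele`, `gene` and `subgroup` are hypothetical stand-ins for the DataFrame columns, and the suffix delimiters (`*` before the allele number, `-` before the gene number) follow common IMGT gene naming.

```python
from typing import Dict, Optional


def update_gene_info_sketch(row: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Hypothetical stand-in operating on one record instead of a DataFrame."""
    out = {key: row.get(key) for key in ("allele", "gene", "subgroup")}

    # Step 1: go from alleles towards subgroups, deriving missing fields
    # and stripping suffixes even when the field was already present.
    if out["gene"] is None and out["allele"] is not None:
        out["gene"] = out["allele"]
    if out["gene"] is not None:
        out["gene"] = out["gene"].split("*", 1)[0]
    if out["subgroup"] is None and out["gene"] is not None:
        out["subgroup"] = out["gene"]
    if out["subgroup"] is not None:
        out["subgroup"] = out["subgroup"].split("*", 1)[0].split("-", 1)[0]

    # Step 2: go from subgroups towards alleles, filling remaining None values
    # with the most specific information available.
    if out["gene"] is None:
        out["gene"] = out["subgroup"]
    if out["allele"] is None:
        out["allele"] = out["gene"]
    return out


print(update_gene_info_sketch({"allele": "TRBV7-2*01"}))
print(update_gene_info_sketch({"subgroup": "TRBV7"}))
```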
-
immuneML.util.KmerHelper module¶
-
class
immuneML.util.KmerHelper.
KmerHelper
[source]¶ Bases:
object
-
static
create_IMGT_gapped_kmers_from_sequence
(sequence: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence, k_left: int, max_gap: int, k_right: Optional[int] = None, min_gap: int = 0)[source]¶
-
static
create_IMGT_kmers_from_sequence
(sequence: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence, k: int)[source]¶
-
static
create_all_kmers
(k: int, alphabet: list)[source]¶ Creates all possible k-mers given a k-mer length and an alphabet.
- Parameters
k (int) – length of k-mer
alphabet (list) – list of characters from which to make all possible k-mers
- Returns
alphabetically sorted list of k-mers
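Enumerating every k-mer over an alphabet is a Cartesian product of the alphabet with itself k times. A minimal sketch, assuming single-character alphabet entries (the function name is hypothetical and this is not necessarily how the library does it):

```python
from itertools import product
from typing import List


def create_all_kmers_sketch(k: int, alphabet: List[str]) -> List[str]:
    """Hypothetical stand-in: all len(alphabet) ** k k-mers, sorted alphabetically."""
    return sorted("".join(letters) for letters in product(alphabet, repeat=k))


print(create_all_kmers_sketch(2, ["C", "A"]))
```

The result grows as `len(alphabet) ** k`, so with the 20 amino acids even k = 4 already yields 160,000 k-mers.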
-
static
create_gapped_kmers_from_sequence
(sequence: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence, k_left: int, max_gap: int, k_right: Optional[int] = None, min_gap: int = 0)[source]¶
-
static
create_gapped_kmers_from_string
(sequence, k_left: int, max_gap: int, k_right: Optional[int] = None, min_gap: int = 0)[source]¶
-
static
create_kmers_from_sequence
(sequence: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence, k: int, overlap: bool = True)[source]¶
-
static
create_sentences_from_repertoire
(repertoire: immuneML.data_model.repertoire.Repertoire.Repertoire, k: int, overlap: bool = True)[source]¶
immuneML.util.NameBuilder module¶
-
class
immuneML.util.NameBuilder.
NameBuilder
[source]¶ Bases:
object
-
static
build_name_from_dict
(dictionary: dict, level=0)[source]¶ Creates a name from a dictionary that includes all of its parameters. Nested dictionaries are handled up to a depth of 10 (inclusive).
- Parameters
dictionary (dict) – dictionary to create the name from
level (int) – controls recursion level, user should keep default
- Returns
name (str)
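A depth-limited recursive flattening of this kind could look like the sketch below. The output format (keys and values joined with underscores), the `max_depth` and `separator` parameters, and the fallback of stringifying anything nested deeper than the limit are all assumptions made for illustration; the library's actual naming scheme may differ.

```python
def build_name_sketch(dictionary: dict, level: int = 0, max_depth: int = 10,
                      separator: str = "_") -> str:
    """Hypothetical stand-in: flatten a (nested) dict into a single name."""
    parts = []
    for key, value in dictionary.items():
        if isinstance(value, dict) and level < max_depth:
            # Recurse into nested dictionaries while the depth limit allows it.
            nested = build_name_sketch(value, level + 1, max_depth, separator)
            parts.append(f"{key}{separator}{nested}")
        else:
            # Leaf values (and dicts beyond the limit) are stringified as-is.
            parts.append(f"{key}{separator}{value}")
    return separator.join(parts)


print(build_name_sketch({"k": 3, "model": {"C": 1}}))
```

As in the documented method, `level` only tracks the recursion depth, so callers would normally leave it at its default.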
immuneML.util.NumpyHelper module¶
immuneML.util.ParameterValidator module¶
-
class
immuneML.util.ParameterValidator.
ParameterValidator
[source]¶ Bases:
object
-
static
assert_all_in_valid_list
(values: list, valid_values: list, location: str, parameter_name: str)[source]¶
-
static
assert_all_type_and_value
(values, parameter_type, location: str, parameter_name: str, min_inclusive=None, max_inclusive=None)[source]¶
-
static
assert_keys
(keys, valid_keys, location: str, parameter_name: str, exclusive: bool = True)[source]¶
immuneML.util.PathBuilder module¶
immuneML.util.PositionHelper module¶
-
class
immuneML.util.PositionHelper.
PositionHelper
[source]¶ Bases:
object
-
static
adjust_position_weights
(sequence_position_weights: dict, imgt_positions, limit: int) → dict[source]¶
- Parameters
sequence_position_weights – weights supplied by the user specifying where in the receptor_sequence to implant
imgt_positions – IMGT positions present in the specific receptor_sequence
limit – how far from the end of the receptor_sequence the motif must start, at the latest, so that implanting it does not elongate the receptor_sequence
- Returns
position_weights for implanting a motif instance into a receptor_sequence
-
static
build_position_weights
(sequence_position_weights: dict, imgt_positions, limit: int) → dict[source]¶
-
static
gen_imgt_positions_from_sequence
(sequence: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence)[source]¶
immuneML.util.ReflectionHandler module¶
-
class
immuneML.util.ReflectionHandler.
ReflectionHandler
[source]¶ Bases:
object
-
static
get_class_from_path
(path, class_name: Optional[str] = None)[source]¶ Obtains the class reference from the given path.
- Parameters
path (str or pathlib.Path) – path to file where the class is located
class_name (str) – name of the class to import from the file; if None, it is assumed that the class name is the same as the file name
- Returns
class
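Loading a class from a file path can be done with the standard library's importlib. The sketch below illustrates the documented behaviour, including the fallback to the file name when no class name is given; it is not the library's implementation and the function name is hypothetical.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def get_class_from_path_sketch(path, class_name: Optional[str] = None):
    """Hypothetical stand-in: import a module from a file and return one of its classes."""
    path = Path(path)
    # If no class name is given, assume it matches the file name without extension.
    name = class_name if class_name is not None else path.stem
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, name)
```

Because the file's top-level code is executed on import, this approach should only be used with trusted files.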
immuneML.util.RepertoireBuilder module¶
immuneML.util.SequenceAnalysisHelper module¶
-
class
immuneML.util.SequenceAnalysisHelper.
SequenceAnalysisHelper
[source]¶ Bases:
object
-
static
compute_overlap_matrix
(hp_items: List[immuneML.hyperparameter_optimization.states.HPItem.HPItem])[source]¶
immuneML.util.StringHelper module¶
immuneML.util.TCRdistHelper module¶
-
class
immuneML.util.TCRdistHelper.
TCRdistHelper
[source]¶ Bases:
object
-
static
compute_tcr_dist
(dataset: immuneML.data_model.dataset.ReceptorDataset.ReceptorDataset, labels: list, cores: int = 1)[source]¶
-
static
prepare_tcr_dist_dataframe
(dataset: immuneML.data_model.dataset.ReceptorDataset.ReceptorDataset, labels: list) → pandas.DataFrame[source]¶