immuneML.data_model.dataset package
Submodules
immuneML.data_model.dataset.Dataset module
immuneML.data_model.dataset.ElementDataset module
- class immuneML.data_model.dataset.ElementDataset.ElementDataset(labels: Optional[dict] = None, encoded_data: Optional[immuneML.data_model.encoded_data.EncodedData.EncodedData] = None, filenames: Optional[list] = None, identifier: Optional[str] = None, file_size: int = 50000, name: Optional[str] = None, element_class_name: Optional[str] = None, element_ids: Optional[list] = None)
Bases: immuneML.data_model.dataset.Dataset.Dataset
This is the base class for ReceptorDataset and SequenceDataset and implements the functionality shared by both. The two classes differ only in whether they store paired-chain (ReceptorDataset) or single-chain (SequenceDataset) data.
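The division of labour described above, with shared logic in the base class and subclasses that differ only in the kind of element they hold, can be sketched as follows. This is a minimal illustration, not immuneML code: ToySequence, ToyReceptor, ToyElementDataset, ToyReceptorDataset and ToySequenceDataset are hypothetical names, and the type check on construction is an assumption added to make the element-type difference visible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToySequence:
    """Hypothetical single-chain element."""
    identifier: str
    amino_acid_sequence: str


@dataclass
class ToyReceptor:
    """Hypothetical paired-chain element made of two sequences."""
    identifier: str
    chain_1: ToySequence
    chain_2: ToySequence


class ToyElementDataset:
    """Shared functionality lives here; subclasses only declare their element type."""

    element_type = object

    def __init__(self, elements: List, name: Optional[str] = None):
        elements = list(elements)  # materialise once so generators are not consumed twice
        # Assumed check: reject elements of the wrong kind for this dataset class.
        if not all(isinstance(element, self.element_type) for element in elements):
            raise TypeError(f"{type(self).__name__} expects {self.element_type.__name__} elements")
        self.elements = elements
        self.name = name

    def get_example_count(self) -> int:
        return len(self.elements)

    def get_example_ids(self) -> List[str]:
        return [element.identifier for element in self.elements]


class ToyReceptorDataset(ToyElementDataset):
    element_type = ToyReceptor  # paired-chain data


class ToySequenceDataset(ToyElementDataset):
    element_type = ToySequence  # single-chain data
```

Everything except the element type is inherited, which mirrors the statement that the two subclasses differ only in the data they store.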
- make_subset(example_indices, path, dataset_type: str)
Creates a new dataset object containing only the examples (receptors or receptor sequences) whose indices are given in the example_indices argument.
- Parameters
example_indices (list) – a list of indices of examples (receptors or receptor sequences) to use in the new dataset
path (Path) – the path where the newly created dataset will be stored
dataset_type (str) – the type of the dataset, used as part of the name of the resulting dataset; the values are defined as constants in Dataset
- Returns
a new dataset object of the same type as the original (ReceptorDataset or SequenceDataset) that includes only the examples specified in example_indices
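A minimal sketch of index-based subsetting, assuming a hypothetical ToyElementDataset that keeps its elements in memory and pickles the selected subset into path. The naming scheme f"{name}_{dataset_type}" is an assumption; the documentation above only states that dataset_type becomes part of the new dataset's name.

```python
import pickle
from pathlib import Path
from typing import List, Optional


class ToyElementDataset:
    """Hypothetical stand-in for an element dataset; elements are kept in memory."""

    def __init__(self, elements: List, name: str = "dataset", path: Optional[Path] = None):
        self.elements = list(elements)
        self.name = name
        self.path = path

    def make_subset(self, example_indices: List[int], path: Path, dataset_type: str):
        # Keep only the examples at the requested indices, in the requested order.
        subset = [self.elements[i] for i in example_indices]
        path = Path(path)
        path.mkdir(parents=True, exist_ok=True)
        # Assumed naming scheme: original name followed by the dataset type.
        new_name = f"{self.name}_{dataset_type}"
        with (path / f"{new_name}.pickle").open("wb") as file:
            pickle.dump(subset, file)
        # type(self) keeps the subset the same class as the original dataset.
        return type(self)(subset, name=new_name, path=path)
```

Using type(self) rather than a hard-coded class is what lets a single base-class method return a ReceptorDataset-like or SequenceDataset-like object depending on the caller.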
immuneML.data_model.dataset.ReceptorDataset module
- class immuneML.data_model.dataset.ReceptorDataset.ReceptorDataset(labels: Optional[dict] = None, encoded_data: Optional[immuneML.data_model.encoded_data.EncodedData.EncodedData] = None, filenames: Optional[list] = None, identifier: Optional[str] = None, file_size: int = 50000, name: Optional[str] = None, element_class_name: Optional[str] = None, element_ids: Optional[list] = None)
Bases: immuneML.data_model.dataset.ElementDataset.ElementDataset
A dataset class for receptor (paired-chain) data. All functionality is implemented in the ElementDataset class, except for creating a new dataset and obtaining metadata.
- classmethod build_from_objects(receptors: List[immuneML.data_model.receptor.Receptor.Receptor], file_size: int, path: pathlib.Path, name: Optional[str] = None)
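The file_size parameter suggests that objects are written to disk in batches. The sketch below assumes file_size is the maximum number of objects per file and that batches are stored as pickle files; build_batches and the batchN.pickle naming are hypothetical and are not taken from this page.

```python
import math
import pickle
from pathlib import Path
from typing import List


def build_batches(objects: List, file_size: int, path: Path) -> List[Path]:
    """Hypothetical helper: pickle `objects` into files of at most `file_size` items each."""
    if file_size < 1:
        raise ValueError("file_size must be a positive integer")
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    file_count = math.ceil(len(objects) / file_size)
    width = len(str(file_count))  # zero-pad so file names sort in batch order
    filenames = []
    for index in range(file_count):
        filename = path / f"batch{index + 1:0{width}d}.pickle"
        with filename.open("wb") as file:
            # Slice out the objects that belong to this batch.
            pickle.dump(objects[index * file_size:(index + 1) * file_size], file)
        filenames.append(filename)
    return filenames
```

With five objects and file_size=2 this writes three files holding two, two and one object; a dataset object would then only need to keep the list of filenames.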
immuneML.data_model.dataset.RepertoireDataset module
- class immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset(labels: Optional[dict] = None, encoded_data: Optional[immuneML.data_model.encoded_data.EncodedData.EncodedData] = None, repertoires: Optional[list] = None, identifier: Optional[str] = None, metadata_file: Optional[pathlib.Path] = None, name: Optional[str] = None, metadata_fields: Optional[list] = None, repertoire_ids: Optional[list] = None)
Bases: immuneML.data_model.dataset.Dataset.Dataset
- add_encoded_data(encoded_data: immuneML.data_model.encoded_data.EncodedData.EncodedData)
- get_label_names(refresh=False)
Returns the list of metadata fields that can be used as labels; if refresh=True, the fields are reloaded from disk.
- get_metadata(field_names: list, return_df: bool = False)
Returns the metadata of the repertoires. This is useful in encodings or reports that need repertoire information beyond the label chosen for the ML model (e.g., disease), such as age or HLA.
- Parameters
field_names (list) – list of the metadata fields to return; the fields must be present in the metadata file. To find the available fields, use the get_label_names function.
return_df (bool) – if True, the results are returned as a dataframe where each column corresponds to a field; otherwise they are returned as a dictionary
- Returns
a dictionary where keys are field names and values are lists of field values, one per repertoire; if return_df is True, the same information as a dataframe
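To illustrate the two return forms, here is a sketch that reads a metadata table passed in as a CSV string. The get_metadata function below is a hypothetical stand-in, not the RepertoireDataset method; the CSV layout (one row per repertoire, one column per field) is an assumption, and the dataframe branch requires pandas.

```python
import csv
import io
from typing import Dict, List


def get_metadata(metadata_csv: str, field_names: List[str], return_df: bool = False):
    """Hypothetical sketch: one CSV row per repertoire, one column per metadata field."""
    reader = csv.DictReader(io.StringIO(metadata_csv))
    rows = list(reader)
    available = reader.fieldnames or []
    missing = [field for field in field_names if field not in available]
    if missing:
        raise KeyError(f"fields not present in the metadata: {missing}")
    # Dictionary form: field name -> list of values, one entry per repertoire (row order).
    result: Dict[str, List[str]] = {field: [row[field] for row in rows] for field in field_names}
    if return_df:
        import pandas as pd  # only needed for the dataframe form
        return pd.DataFrame(result)  # one column per requested field
    return result
```

For a two-repertoire table, requesting ["disease", "age"] yields a dictionary with two keys, each mapping to a two-element list; the values stay strings because the csv module performs no type conversion.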
- get_metadata_fields(refresh=False)
Returns the list of all metadata fields, including those that are typically not used as labels, such as filename or identifier.
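The relationship between get_metadata_fields and get_label_names, and the effect of refresh, can be sketched with a hypothetical ToyMetadata class that caches the header of a metadata CSV file. The excluded columns (filename, identifier) are the examples named above; the real exclusion list may contain more fields.

```python
import csv
from pathlib import Path
from typing import List


class ToyMetadata:
    """Hypothetical wrapper around a repertoire metadata CSV file."""

    # Bookkeeping columns that are not offered as labels (assumed set).
    NON_LABEL_FIELDS = {"filename", "identifier"}

    def __init__(self, metadata_file: Path):
        self.metadata_file = Path(metadata_file)
        self._fields = None  # cached header row

    def get_metadata_fields(self, refresh: bool = False) -> List[str]:
        # Read the header only on first use or when a refresh is requested.
        if self._fields is None or refresh:
            with self.metadata_file.open(newline="") as file:
                self._fields = next(csv.reader(file), [])
        return list(self._fields)

    def get_label_names(self, refresh: bool = False) -> List[str]:
        # All metadata fields minus the bookkeeping columns.
        return [field for field in self.get_metadata_fields(refresh)
                if field not in self.NON_LABEL_FIELDS]
```

Without refresh=True, a column added to the file after the first call is not seen, because the cached header is reused.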
- get_repertoire(index: int = -1, repertoire_identifier: str = '') → immuneML.data_model.repertoire.Repertoire.Repertoire
- get_repertoire_ids() → list
Returns a list of repertoire identifiers; equivalent to get_example_ids().
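get_repertoire is listed without a description. The sketch below shows one plausible reading of its signature, assuming that the defaults index=-1 and repertoire_identifier='' mean "not set", that at least one of them must be supplied, and that index takes precedence when both are given; ToyRepertoire and ToyRepertoireDataset are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyRepertoire:
    identifier: str


class ToyRepertoireDataset:
    """Hypothetical container illustrating lookup by index or by identifier."""

    def __init__(self, repertoires: List[ToyRepertoire]):
        self.repertoires = list(repertoires)

    def get_repertoire_ids(self) -> List[str]:
        return [repertoire.identifier for repertoire in self.repertoires]

    def get_example_ids(self) -> List[str]:
        # Same result as get_repertoire_ids(), under the generic dataset name.
        return self.get_repertoire_ids()

    def get_repertoire(self, index: int = -1, repertoire_identifier: str = "") -> ToyRepertoire:
        # Assumption: -1 and "" are "not set" sentinels, so one of them must be given.
        if index == -1 and not repertoire_identifier:
            raise ValueError("either index or repertoire_identifier must be set")
        if index != -1:
            return self.repertoires[index]
        matches = [r for r in self.repertoires if r.identifier == repertoire_identifier]
        if not matches:
            raise KeyError(repertoire_identifier)
        return matches[0]
```

Treating -1 as a sentinel means the last repertoire cannot be requested with index=-1 in this sketch; a separate None default would avoid that ambiguity.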
- make_subset(example_indices, path: pathlib.Path, dataset_type: str)
Creates a new dataset object containing only the examples (repertoires) whose indices are given in the example_indices argument.
- Parameters
example_indices (list) – a list of indices of examples (repertoires) to use in the new dataset
path (Path) – the path where the newly created dataset will be stored
dataset_type (str) – the type of the dataset, used as part of the name of the resulting dataset; the values are defined as constants in Dataset
- Returns
a new RepertoireDataset object that includes only the repertoires specified in example_indices
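Subsetting a repertoire dataset also has to keep the metadata consistent with the selected repertoires. The following is a sketch under the assumption that row i of a CSV metadata file describes repertoire i; make_repertoire_subset and the f"{dataset_type}_metadata.csv" file name are hypothetical.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def make_repertoire_subset(repertoire_ids: List[str], metadata_file: Path,
                           example_indices: List[int], path: Path,
                           dataset_type: str) -> Tuple[List[str], Path]:
    """Hypothetical sketch: keep the selected repertoires and their metadata rows."""
    with Path(metadata_file).open(newline="") as file:
        reader = csv.DictReader(file)
        fieldnames = reader.fieldnames
        rows = list(reader)
    selected_ids = [repertoire_ids[i] for i in example_indices]
    selected_rows = [rows[i] for i in example_indices]  # assumes row i describes repertoire i
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    new_metadata = path / f"{dataset_type}_metadata.csv"  # assumed naming scheme
    with new_metadata.open("w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(selected_rows)
    return selected_ids, new_metadata
```

Writing a fresh metadata file per subset keeps label lookups (for example, through get_metadata) aligned with the repertoires that the new dataset actually contains.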
immuneML.data_model.dataset.SequenceDataset module
- class immuneML.data_model.dataset.SequenceDataset.SequenceDataset(**kwargs)
Bases: immuneML.data_model.dataset.ElementDataset.ElementDataset
A dataset class for sequence (single-chain) data. All functionality is implemented in the ElementDataset class, except for creating a new dataset and obtaining metadata.
- classmethod build_from_objects(sequences: List[immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence], file_size: int, path: pathlib.Path, name: Optional[str] = None)