immuneML.data_model.repertoire package

Submodules

immuneML.data_model.repertoire.Repertoire module

class immuneML.data_model.repertoire.Repertoire.Repertoire(data_filename: Path, metadata_filename: Path, identifier: str)[source]

Bases: DatasetItem

Repertoire object consisting of sequence objects. Each sequence attribute is stored as a list across all sequences and can be loaded separately. Internally, this class relies on numpy to store and load the data.
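The column-wise layout described above can be illustrated with a minimal sketch. It is not the immuneML implementation: the file name, the helper functions `save_columns` and `load_column`, and the example values are all hypothetical. It only shows how one array per attribute can be written with numpy and how a single attribute can be read back without loading the others.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical column-wise data: one list per attribute, aligned by position.
columns = {
    "sequence_aas": ["CASSLGTDTQYF", "CASSPGQGNYGYTF", "CAVRDSNYQLIW"],
    "v_genes": ["TRBV5-1", "TRBV7-9", "TRAV1-2"],
    "counts": [12, 3, 7],
}


def save_columns(path: Path, columns: dict) -> Path:
    """Write each attribute as its own array inside a single .npz archive."""
    np.savez(path, **{name: np.asarray(values) for name, values in columns.items()})
    return path


def load_column(path: Path, name: str) -> np.ndarray:
    """Read one attribute; only the array that is accessed gets decoded."""
    with np.load(path) as archive:
        return archive[name]


with tempfile.TemporaryDirectory() as tmp:
    data_file = save_columns(Path(tmp) / "repertoire_sketch.npz", columns)
    counts = load_column(data_file, "counts")
    v_genes = load_column(data_file, "v_genes")
```

Because every attribute is a separate array, code that only needs `counts` never has to read the sequences themselves.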

FIELDS = ('sequence_aas', 'sequences', 'v_genes', 'j_genes', 'v_subgroups', 'j_subgroups', 'v_alleles', 'j_alleles', 'chains', 'counts', 'region_types', 'frame_types', 'sequence_identifiers', 'cell_ids')
classmethod build(sequence_aas: list = None, sequences: list = None, v_genes: list = None, j_genes: list = None, v_subgroups: list = None, j_subgroups: list = None, v_alleles: list = None, j_alleles: list = None, chains: list = None, counts: list = None, region_types: list = None, frame_types: list = None, custom_lists: dict = None, sequence_identifiers: list = None, path: Path = None, metadata: dict = None, signals: dict = None, cell_ids: List[str] = None, filename_base: str = None)[source]
classmethod build_from_sequence_objects(sequence_objects: list, path: Path, metadata: dict, filename_base: str = None)[source]
classmethod build_like(repertoire, indices_to_keep: list, result_path: Path, filename_base: str = None)[source]
property cells: CellList
A property that creates a list of Cell objects based on the cell_ids field in the following manner:
  • all sequences that have the same cell_id are grouped together

  • they are divided into groups based on the chain

  • all valid combinations of chains are created and used to make a receptor object - this means that if a cell has two beta (b1 and b2) and one alpha chain (a1), two receptor objects will be created: receptor1 (b1, a1), receptor2 (b2, a1)

  • an object of the Cell class is created from all receptors with the same cell_id created as described in the previous steps

To avoid having multiple receptors in the same cell, use one of the preprocessing classes that merge or eliminate multiple sequences. See the documentation of the preprocessing module for more information.

Returns:

a list of objects of Cell class

Return type:

CellList
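The grouping and pairing steps listed for this property can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the record tuples, the `pair_chains` helper, and the choice of treating only beta/alpha pairs as valid are hypothetical. It reproduces the example from the description, where a cell with chains b1, b2 and a1 yields two receptors.

```python
from collections import defaultdict
from itertools import product

# Hypothetical flat records: (cell_id, chain, sequence label).
records = [
    ("cell1", "beta", "b1"),
    ("cell1", "beta", "b2"),
    ("cell1", "alpha", "a1"),
    ("cell2", "alpha", "a2"),
    ("cell2", "beta", "b3"),
]


def pair_chains(records, chain_pair=("beta", "alpha")):
    """Group records by cell_id, split them by chain, and pair every combination."""
    by_cell = defaultdict(lambda: defaultdict(list))
    for cell_id, chain, label in records:
        by_cell[cell_id][chain].append(label)

    first, second = chain_pair
    cells = {}
    for cell_id, chains in by_cell.items():
        # Cartesian product: each first-chain sequence with each second-chain one.
        cells[cell_id] = list(product(chains.get(first, []), chains.get(second, [])))
    return cells


cells = pair_chains(records)
```

Here `cells["cell1"]` holds the pairs (b1, a1) and (b2, a1), matching receptor1 and receptor2 in the description, while `cells["cell2"]` holds a single pair.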

static check_count(sequence_aas: list = None, sequences: list = None, custom_lists: dict = None) → int[source]
free_memory()[source]
get_attribute(attribute)[source]
get_attributes(attributes: list)[source]
get_chains()[source]
get_counts()[source]
get_element_count()[source]
get_j_genes()[source]
get_region_type()[source]
get_sequence_aas()[source]
get_sequence_identifiers()[source]
get_sequence_objects(load_implants: bool = True) → List[ReceptorSequence][source]

Lazily loads sequences from disk to reduce RAM consumption.

Parameters:

load_implants – whether implants should be parsed to objects and converted to ImplantAnnotations; if True, might slow down the loading

Returns:

a list of ReceptorSequence objects
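The lazy-loading idea can be sketched with a small, hypothetical class. `LazySequences`, its `loader` argument, and the `load_calls` counter are assumptions made for illustration and do not describe how Repertoire is implemented: data is read only on first access, cached, and can be dropped again to free memory.

```python
from typing import Callable, List, Optional


class LazySequences:
    """Hypothetical holder that loads data on first access and can release it."""

    def __init__(self, loader: Callable[[], List[str]]):
        self._loader = loader  # stands in for reading from disk
        self._data: Optional[List[str]] = None
        self.load_calls = 0  # counts how often the loader actually ran

    def get(self) -> List[str]:
        # Load only if nothing is cached yet.
        if self._data is None:
            self._data = self._loader()
            self.load_calls += 1
        return self._data

    def free_memory(self) -> None:
        # Drop the cached data; the next get() reloads it.
        self._data = None


lazy = LazySequences(lambda: ["CASSLGTDTQYF", "CAVRDSNYQLIW"])
first = lazy.get()
second = lazy.get()  # served from the cache, no second load
lazy.free_memory()
third = lazy.get()  # triggers a reload
```

The trade-off is that access after freeing pays the loading cost again in exchange for lower memory use in between.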

get_v_genes()[source]
load_data()[source]
static process_custom_lists(custom_lists)[source]
property receptors: List[Receptor]
A property that creates a list of Receptor objects based on the cell_ids field in the following manner:
  • all sequences that have the same cell_id are grouped together

  • they are divided into groups based on the chain

  • all valid combinations of chains are created and used to make a receptor object - this means that if a cell has two beta (b1 and b2) and one alpha chain (a1), two receptor objects will be created: receptor1 (b1, a1), receptor2 (b2, a1)

To avoid having multiple receptors in the same cell, use one of the preprocessing classes that merge or eliminate multiple sequences. See the documentation of the preprocessing module for more information.

Returns:

a list of objects of Receptor class

Return type:

ReceptorList

property sequences

Module contents