immuneML.encodings.filtered_sequence_encoding package

Submodules

immuneML.encodings.filtered_sequence_encoding.SequenceAbundanceEncoder module

class immuneML.encodings.filtered_sequence_encoding.SequenceAbundanceEncoder.SequenceAbundanceEncoder(comparison_attributes, p_value_threshold: float, sequence_batch_size: int, repertoire_batch_size: int, name: Optional[str] = None)

Bases: immuneML.encodings.DatasetEncoder.DatasetEncoder

This encoder represents the repertoires as vectors where:

the first element corresponds to the number of label-associated clonotypes
the second element is the total number of unique clonotypes

Which clonotypes (with features defined by comparison_attributes) are label-associated is determined by a statistical test. The test used is the one-sided Fisher’s exact test.

Reference: Emerson, Ryan O. et al. ‘Immunosequencing Identifies Signatures of Cytomegalovirus Exposure History and HLA-Mediated Effects on the T Cell Repertoire’. Nature Genetics 49, no. 5 (May 2017): 659–65. doi.org/10.1038/ng.3822.
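The selection step described above can be illustrated with a minimal, standard-library sketch. It is not immuneML's implementation: the function name fisher_exact_greater, the toy presence data, and the strict `<` comparison against the threshold are assumptions made for illustration. For each clonotype, a 2x2 table of repertoire counts (label-positive or label-negative, clonotype present or absent) is tested with a one-sided Fisher’s exact test.

```python
from math import comb
from typing import Dict, Set


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided (greater) Fisher's exact test for the table [[a, b], [c, d]].

    a: positive repertoires containing the clonotype
    b: negative repertoires containing the clonotype
    c: positive repertoires lacking the clonotype
    d: negative repertoires lacking the clonotype
    """
    with_clonotype = a + b
    positives = a + c
    total = a + b + c + d
    denominator = comb(total, with_clonotype)
    # Sum hypergeometric probabilities of all tables with at least `a`
    # positive repertoires among those that contain the clonotype.
    upper = min(with_clonotype, positives)
    numerator = sum(
        comb(positives, k) * comb(total - positives, with_clonotype - k)
        for k in range(a, upper + 1)
    )
    return numerator / denominator


# Hypothetical toy data: which repertoires contain which clonotype.
positive = {"r1", "r2", "r3", "r4"}
negative = {"r5", "r6", "r7", "r8"}
presence: Dict[str, Set[str]] = {
    "CASSA": {"r1", "r2", "r3", "r4"},
    "CASSB": {"r1", "r5"},
    "CASSC": {"r5", "r6", "r7", "r8"},
}
p_value_threshold = 0.05

p_values = {}
for clonotype, repertoires in presence.items():
    a = len(repertoires & positive)
    b = len(repertoires & negative)
    p_values[clonotype] = fisher_exact_greater(
        a, b, len(positive) - a, len(negative) - b
    )

# Assumption: a clonotype is kept when its p-value is strictly below the threshold.
label_associated = {name for name, p in p_values.items() if p < p_value_threshold}
print(label_associated)
```

Only the clonotype found exclusively in the positive repertoires passes the threshold here. For larger tables, a library routine such as scipy.stats.fisher_exact with alternative="greater" would be the more practical choice.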

Parameters

comparison_attributes (list) – The attributes to be considered to group receptors into clonotypes. Only the fields specified in comparison_attributes will be considered; all other fields are ignored. A valid comparison value can be any repertoire field name.
p_value_threshold (float) – The p value threshold to be used by the statistical test.
sequence_batch_size (int) – The number of sequences in a batch when comparing sequences across repertoires, typically hundreds of thousands. This does not affect the results of the encoding, only the speed.
repertoire_batch_size (int) – How many repertoires will be loaded at once. This does not affect the result of the encoding, only the speed. This value is a trade-off between the number of repertoires that can fit in RAM at one time and the loading time from disk.
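As a rough illustration of how comparison_attributes and the batch sizes interact, the sketch below groups receptors into clonotypes by projecting each one onto the selected fields, processing the input in batches. The helper names (clonotype_key, unique_clonotypes) and the receptor dictionaries are hypothetical and do not reflect immuneML's internal data model.

```python
from typing import Dict, List, Set, Tuple


def clonotype_key(receptor: Dict[str, str], comparison_attributes: List[str]) -> Tuple[str, ...]:
    # Only the listed fields define the clonotype; all other fields are ignored.
    return tuple(receptor[attribute] for attribute in comparison_attributes)


def unique_clonotypes(
    receptors: List[Dict[str, str]],
    comparison_attributes: List[str],
    batch_size: int,
) -> Set[Tuple[str, ...]]:
    seen: Set[Tuple[str, ...]] = set()
    # Batching bounds how much is processed at once; the resulting set is the same.
    for start in range(0, len(receptors), batch_size):
        batch = receptors[start:start + batch_size]
        seen.update(clonotype_key(r, comparison_attributes) for r in batch)
    return seen


# Hypothetical receptors; "counts" is an example of a field that is ignored.
receptors = [
    {"sequence_aas": "CASSLGTDTQYF", "v_genes": "TRBV7-9", "j_genes": "TRBJ2-3", "counts": "12"},
    {"sequence_aas": "CASSLGTDTQYF", "v_genes": "TRBV7-9", "j_genes": "TRBJ2-3", "counts": "3"},
    {"sequence_aas": "CASSLGTDTQYF", "v_genes": "TRBV5-1", "j_genes": "TRBJ2-3", "counts": "7"},
]

by_sequence_and_genes = unique_clonotypes(receptors, ["sequence_aas", "v_genes", "j_genes"], batch_size=2)
by_sequence_only = unique_clonotypes(receptors, ["sequence_aas"], batch_size=2)
print(len(by_sequence_and_genes), len(by_sequence_only))
```

Choosing fewer comparison attributes merges more receptors into the same clonotype, while changing batch_size leaves the set of clonotypes unchanged.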

YAML specification:

my_sa_encoding:
    SequenceAbundance:
        comparison_attributes:
            - sequence_aas
            - v_genes
            - j_genes
            - chains
            - region_types
        p_value_threshold: 0.05
        sequence_batch_size: 100000
        repertoire_batch_size: 32
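For readers who assemble specifications programmatically, the same structure can be written as a Python dictionary. The sanity checks below are a hypothetical sketch based only on the parameter types listed above; they are not immuneML's own validation, and check_sequence_abundance_params is an invented helper name.

```python
from typing import List

# The YAML specification above, mirrored as a nested dictionary.
spec = {
    "my_sa_encoding": {
        "SequenceAbundance": {
            "comparison_attributes": ["sequence_aas", "v_genes", "j_genes", "chains", "region_types"],
            "p_value_threshold": 0.05,
            "sequence_batch_size": 100000,
            "repertoire_batch_size": 32,
        }
    }
}


def check_sequence_abundance_params(params: dict) -> List[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    attributes = params.get("comparison_attributes")
    if not isinstance(attributes, list) or not attributes:
        problems.append("comparison_attributes must be a non-empty list")
    p_value = params.get("p_value_threshold")
    if not isinstance(p_value, float) or not 0 < p_value <= 1:
        problems.append("p_value_threshold must be a float in (0, 1]")
    for key in ("sequence_batch_size", "repertoire_batch_size"):
        value = params.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} must be a positive integer")
    return problems


problems = check_sequence_abundance_params(spec["my_sa_encoding"]["SequenceAbundance"])
print(problems)
```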

RELEVANT_SEQUENCE_ABUNDANCE = 'relevant_sequence_abundance'

TOTAL_SEQUENCE_ABUNDANCE = 'total_sequence_abundance'
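The two constants plausibly name the two elements of the encoded vector described earlier; that mapping is an assumption inferred from their names, not something this page states. Under that assumption, a minimal sketch of encoding a single repertoire (encode_repertoire is a hypothetical helper, not part of the class) could look like this:

```python
from typing import Dict, Iterable, Set

RELEVANT_SEQUENCE_ABUNDANCE = "relevant_sequence_abundance"
TOTAL_SEQUENCE_ABUNDANCE = "total_sequence_abundance"


def encode_repertoire(clonotypes: Iterable[str], label_associated: Set[str]) -> Dict[str, int]:
    # Duplicates are collapsed so that both counts refer to unique clonotypes.
    unique = set(clonotypes)
    return {
        # First element: number of label-associated clonotypes in the repertoire.
        RELEVANT_SEQUENCE_ABUNDANCE: len(unique & label_associated),
        # Second element: total number of unique clonotypes in the repertoire.
        TOTAL_SEQUENCE_ABUNDANCE: len(unique),
    }


encoded = encode_repertoire(["CASSA", "CASSB", "CASSA", "CASSC"], {"CASSA", "CASSD"})
print(encoded)
```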

static build_object(dataset, **params)

encode(dataset, params: immuneML.encodings.EncoderParams.EncoderParams)

static export_encoder(path: pathlib.Path, encoder) → pathlib.Path

get_additional_files() → List[pathlib.Path]

static get_documentation()

static load_encoder(encoder_file: pathlib.Path)

set_context(context: dict)

store(encoded_dataset, params: immuneML.encodings.EncoderParams.EncoderParams)

immuneML.encodings.filtered_sequence_encoding.SequenceFilterHelper module

class immuneML.encodings.filtered_sequence_encoding.SequenceFilterHelper.SequenceFilterHelper

Bases: object

INVALID_P_VALUE = 2

static build_comparison_data(dataset: immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset, context: dict, comparison_attributes: list, params: immuneML.encodings.EncoderParams.EncoderParams, sequence_batch_size: int)

static filter_sequences(dataset: immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset, comparison_data: immuneML.pairwise_repertoire_comparison.ComparisonData.ComparisonData, label: immuneML.environment.Label.Label, p_value_threshold: float)

static find_label_associated_sequence_p_values(comparison_data: immuneML.pairwise_repertoire_comparison.ComparisonData.ComparisonData, repertoires: List[immuneML.data_model.repertoire.Repertoire.Repertoire], label: immuneML.environment.Label.Label)

static get_relevant_sequences(dataset: immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset, params: immuneML.encodings.EncoderParams.EncoderParams, comparison_data: immuneML.pairwise_repertoire_comparison.ComparisonData.ComparisonData, label: str, p_value_threshold, comparison_attributes: list, sequence_indices_path: pathlib.Path)

Module contents