immuneML.encodings.filtered_sequence_encoding package

Submodules

immuneML.encodings.filtered_sequence_encoding.SequenceAbundanceEncoder module

class immuneML.encodings.filtered_sequence_encoding.SequenceAbundanceEncoder.SequenceAbundanceEncoder(comparison_attributes, p_value_threshold: float, sequence_batch_size: int, repertoire_batch_size: int, name: Optional[str] = None)[source]

Bases: immuneML.encodings.DatasetEncoder.DatasetEncoder

This encoder represents the repertoires as vectors where:

  • the first element corresponds to the number of label-associated clonotypes

  • the second element is the total number of unique clonotypes

Which clonotypes (with features defined by comparison_attributes) are label-associated is determined by a statistical test. The test used is Fisher’s exact test (one-sided).

Reference: Emerson, Ryan O. et al. ‘Immunosequencing Identifies Signatures of Cytomegalovirus Exposure History and HLA-Mediated Effects on the T Cell Repertoire’. Nature Genetics 49, no. 5 (May 2017): 659–65. doi.org/10.1038/ng.3822.
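The encoding described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not immuneML's implementation: the function names (fisher_exact_greater, label_associated_clonotypes, encode_repertoire), the representation of repertoires as sets of clonotype tuples, the toy data, the presence/absence contingency table, and the strict `<` comparison against the threshold are all hypothetical choices made for the example.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test (alternative: 'greater') on the 2x2 table
    [[a, b], [c, d]], where rows are clonotype present/absent and columns are
    positive/negative repertoires. Returns P(X >= a) under the hypergeometric null."""
    n_total = a + b + c + d
    n_pos = a + c          # repertoires with the positive label
    n_present = a + b      # repertoires that contain the clonotype
    denom = comb(n_total, n_present)
    upper = min(n_pos, n_present)
    return sum(comb(n_pos, k) * comb(n_total - n_pos, n_present - k)
               for k in range(a, upper + 1)) / denom


def label_associated_clonotypes(repertoires, labels, p_value_threshold):
    """Return the clonotypes whose presence is associated with the positive label.
    Assumption: a clonotype is kept when its p-value is strictly below the threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    relevant = set()
    for clonotype in set().union(*repertoires):
        a = sum(1 for rep, lab in zip(repertoires, labels) if lab and clonotype in rep)
        b = sum(1 for rep, lab in zip(repertoires, labels) if not lab and clonotype in rep)
        if fisher_exact_greater(a, b, n_pos - a, n_neg - b) < p_value_threshold:
            relevant.add(clonotype)
    return relevant


def encode_repertoire(repertoire, relevant):
    """Two-element vector: [label-associated clonotypes, total unique clonotypes]."""
    return [len(repertoire & relevant), len(repertoire)]


# Toy data (hypothetical): clonotypes are (sequence_aas, v_genes, j_genes) tuples.
public = ("CASSLGTDTQYF", "TRBV7-9", "TRBJ2-3")    # only in positive repertoires
common = ("CASSPGQGYEQYF", "TRBV5-1", "TRBJ2-7")   # in every repertoire
repertoires = (
    [{public, common, (f"private_{i}", "TRBV1", "TRBJ1-1")} for i in range(4)]
    + [{common, (f"private_{i}", "TRBV1", "TRBJ1-1")} for i in range(4, 8)]
)
labels = [True] * 4 + [False] * 4

relevant = label_associated_clonotypes(repertoires, labels, p_value_threshold=0.05)
encoded = [encode_repertoire(rep, relevant) for rep in repertoires]
```

In this toy dataset only the clonotype found in all four positive repertoires and in none of the negative ones passes the threshold (its one-sided p-value is 1/70), so positive repertoires are encoded with a first element of 1 and negative repertoires with 0.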

Parameters
  • comparison_attributes (list) – The attributes used to group receptors into clonotypes. Only the fields specified in comparison_attributes will be considered; all other fields are ignored. Any repertoire field name is a valid comparison value.

  • p_value_threshold (float) – The p-value threshold to be used by the statistical test.

  • sequence_batch_size (int) – The number of sequences in a batch when comparing sequences across repertoires, typically hundreds of thousands. This does not affect the results of the encoding, only the speed.

  • repertoire_batch_size (int) – How many repertoires will be loaded at once. This does not affect the result of the encoding, only the speed. This value is a trade-off between the number of repertoires that can fit in RAM at one time and the loading time from disk.
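Why the batch sizes change only speed and memory use, not the result, can be illustrated with a small sketch. It is not immuneML's loading code: the helper names (batched, count_presence) and the in-memory lists standing in for repertoires on disk are assumptions made for the example.

```python
from collections import Counter
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def count_presence(repertoires, repertoire_batch_size):
    """Count in how many repertoires each clonotype occurs,
    processing repertoire_batch_size repertoires at a time."""
    counts = Counter()
    for batch in batched(repertoires, repertoire_batch_size):
        # Only one batch is held at once; a larger batch means fewer
        # loading rounds but more memory in use.
        for repertoire in batch:
            counts.update(set(repertoire))  # presence, not abundance
    return counts


# Hypothetical toy repertoires; "C" is duplicated but counted once per repertoire.
toy = [["A", "B"], ["A"], ["A", "C", "C"]]
per_batch_size = {size: count_presence(toy, size) for size in (1, 2, 32)}
```

Every batch size yields the same counts because the counts are accumulated across batches; the batch size only determines how much data is processed per step.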

YAML specification:

my_sa_encoding:
    SequenceAbundance:
        comparison_attributes:
            - sequence_aas
            - v_genes
            - j_genes
            - chains
            - region_types
        p_value_threshold: 0.05
        sequence_batch_size: 100000
        repertoire_batch_size: 32
RELEVANT_SEQUENCE_ABUNDANCE = 'relevant_sequence_abundance'
TOTAL_SEQUENCE_ABUNDANCE = 'total_sequence_abundance'
static build_object(dataset, **params)[source]
encode(dataset, params: immuneML.encodings.EncoderParams.EncoderParams)[source]
static export_encoder(path: pathlib.Path, encoder) → pathlib.Path[source]
get_additional_files() → List[pathlib.Path][source]
static get_documentation()[source]
static load_encoder(encoder_file: pathlib.Path)[source]
set_context(context: dict)[source]
store(encoded_dataset, params: immuneML.encodings.EncoderParams.EncoderParams)[source]

immuneML.encodings.filtered_sequence_encoding.SequenceFilterHelper module

class immuneML.encodings.filtered_sequence_encoding.SequenceFilterHelper.SequenceFilterHelper[source]

Bases: object

INVALID_P_VALUE = 2
static build_comparison_data(dataset: immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset, context: dict, comparison_attributes: list, params: immuneML.encodings.EncoderParams.EncoderParams, sequence_batch_size: int)[source]
static filter_sequences(dataset: immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset, comparison_data: immuneML.pairwise_repertoire_comparison.ComparisonData.ComparisonData, label: immuneML.environment.Label.Label, p_value_threshold: float)[source]
static find_label_associated_sequence_p_values(comparison_data: immuneML.pairwise_repertoire_comparison.ComparisonData.ComparisonData, repertoires: List[immuneML.data_model.repertoire.Repertoire.Repertoire], label: immuneML.environment.Label.Label)[source]
static get_relevant_sequences(dataset: immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset, params: immuneML.encodings.EncoderParams.EncoderParams, comparison_data: immuneML.pairwise_repertoire_comparison.ComparisonData.ComparisonData, label: str, p_value_threshold, comparison_attributes: list, sequence_indices_path: pathlib.Path)[source]

Module contents