immuneML.encodings.diversity_encoding package

Submodules

immuneML.encodings.diversity_encoding.EvennessProfileEncoder module

class immuneML.encodings.diversity_encoding.EvennessProfileEncoder.EvennessProfileEncoder(min_alpha: float, max_alpha: float, dimension: int, name: str = None)

Bases: DatasetEncoder

The EvennessProfileEncoder class encodes a repertoire based on the clonal frequency distribution. The evenness for a given repertoire is defined as follows:

\[^{\alpha} \mathrm{E}(\mathrm{f})=\frac{\left(\sum_{\mathrm{i}=1}^{\mathrm{n}} \mathrm{f}_{\mathrm{i}}^{\alpha}\right)^{\frac{1}{1-\alpha}}}{\mathrm{n}}\]

That is, the evenness is the exponential of the Rényi entropy of order alpha, divided by the species richness n (the number of unique sequences in the repertoire).

Reference: Greiff et al. (2015). A bioinformatic framework for immune repertoire diversity profiling enables detection of immunological status. Genome Medicine, 7(1), 49. doi.org/10.1186/s13073-015-0169-8
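The evenness formula above can be illustrated with a minimal sketch in plain Python. The function name `evenness` and its handling of alpha = 1 (where the exponent 1/(1 - alpha) is undefined, so the limit, the exponential of the Shannon entropy, is used instead) are assumptions made for illustration and are not taken from the immuneML source.

```python
import math


def evenness(counts, alpha):
    """Evenness of order alpha for a list of clonal counts (illustrative sketch)."""
    positive = [c for c in counts if c > 0]
    total = sum(positive)
    freqs = [c / total for c in positive]  # clonal frequencies f_i
    n = len(freqs)                         # species richness

    if math.isclose(alpha, 1.0):
        # Assumption: use the alpha -> 1 limit, exp(Shannon entropy).
        hill = math.exp(-sum(f * math.log(f) for f in freqs))
    else:
        hill = sum(f ** alpha for f in freqs) ** (1.0 / (1.0 - alpha))

    # Divide the exponential of the Renyi entropy by the richness.
    return hill / n
```

In this sketch a perfectly even repertoire such as `[5, 5, 5, 5]` has evenness 1 at every alpha, while a repertoire dominated by one clone has evenness below 1 for alpha greater than 0.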

Dataset type:

RepertoireDataset

Specification arguments:

min_alpha (float): minimum alpha value to use
max_alpha (float): maximum alpha value to use
dimension (int): dimension of the output evenness profile vector, i.e., the number of linearly spaced alpha values between min_alpha and max_alpha
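To make the role of the three arguments concrete, the sketch below builds a linearly spaced alpha grid in plain Python. It assumes that both endpoints are included (as `numpy.linspace` does by default); whether immuneML constructs the grid in exactly this way is not confirmed by this page.

```python
min_alpha, max_alpha, dimension = 0, 10, 51

# Assumption: both min_alpha and max_alpha are part of the grid.
alphas = [
    min_alpha + i * (max_alpha - min_alpha) / (dimension - 1)
    for i in range(dimension)
]

step = alphas[1] - alphas[0]  # spacing between consecutive alpha values
```

Under this assumption, the values 0, 10 and 51 give a profile with one evenness value per alpha at a spacing of 0.2.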

YAML specification:

definitions:
    encodings:
        my_evenness_profile:
            EvennessProfile:
                min_alpha: 0
                max_alpha: 10
                dimension: 51

STEP_ENCODED = 'encoded'

STEP_VECTORIZED = 'vectorized'

static build_object(dataset=None, **params)

Creates an instance of the relevant subclass of the DatasetEncoder class using the given parameters. This method is called at parsing time (early in the immuneML run), so that the parameters and the dataset type can be validated here.

The build_object method should do the following:

Check parameters: immuneML should crash if wrong user parameters are specified. The ParameterValidator utility class may be used for parameter testing.

Check the dataset type: immuneML should crash if the wrong dataset type is specified for this encoder. For example, DeepRCEncoder should only work for RepertoireDatasets and crash if the dataset is of another type.

Create an instance of the correct Encoder class, using the given parameters. Return this object. Some encoders have different subclasses depending on the dataset type. Make sure to return an instance of the correct subclass. For instance: KmerFrequencyEncoder has different subclasses for each dataset type. When the dataset is a Repertoire dataset, KmerFreqRepertoireEncoder should be returned.

Parameters:

dataset – Dataset object of the same class as the dataset to be encoded later; in case there are multiple dataset types supported by the encoder, the dataset should be of one of these types and the correct subclass of the encoder should be returned
**params – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the Encoder object

Returns:

the object of the appropriate Encoder class
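The three responsibilities of build_object described above (check parameters, check the dataset type, return the right subclass) can be sketched as follows. All class names here are hypothetical stand-ins written for this example; the sketch does not import immuneML and does not use ParameterValidator, whose exact methods are not shown on this page.

```python
class RepertoireDataset:
    """Hypothetical stand-in for a supported dataset class."""


class SequenceDataset:
    """Hypothetical stand-in for an unsupported dataset class."""


class ToyEvennessEncoder:
    # Maps dataset class names to encoder subclass names.
    dataset_mapping = {"RepertoireDataset": "ToyEvennessRepertoireEncoder"}

    def __init__(self, min_alpha, max_alpha, dimension, name=None):
        self.min_alpha = min_alpha
        self.max_alpha = max_alpha
        self.dimension = dimension
        self.name = name

    @staticmethod
    def build_object(dataset=None, **params):
        # Step 1: check parameters and fail early on invalid values.
        missing = {"min_alpha", "max_alpha", "dimension"} - set(params)
        if missing:
            raise ValueError(f"Missing parameters: {sorted(missing)}")
        if not isinstance(params["dimension"], int) or params["dimension"] < 1:
            raise ValueError("dimension must be a positive integer")
        if params["min_alpha"] > params["max_alpha"]:
            raise ValueError("min_alpha must not exceed max_alpha")

        # Step 2: check the dataset type.
        dataset_type = type(dataset).__name__
        if dataset_type not in ToyEvennessEncoder.dataset_mapping:
            raise ValueError(f"Unsupported dataset type: {dataset_type}")

        # Step 3: instantiate the subclass registered for this dataset type.
        subclass_name = ToyEvennessEncoder.dataset_mapping[dataset_type]
        subclasses = {c.__name__: c for c in ToyEvennessEncoder.__subclasses__()}
        return subclasses[subclass_name](**params)


class ToyEvennessRepertoireEncoder(ToyEvennessEncoder):
    """Hypothetical repertoire-specific subclass."""


encoder = ToyEvennessEncoder.build_object(
    RepertoireDataset(), min_alpha=0, max_alpha=10, dimension=51
)
```

Failing inside build_object means a misconfigured specification is rejected before any encoding work starts.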

dataset_mapping = {'RepertoireDataset': 'EvennessProfileRepertoireEncoder'}

encode(dataset, params: EncoderParams)

This is the main encoding method of the Encoder. It takes in a given dataset, computes an EncodedData object, and returns a copy of the dataset with the attached EncodedData object.

Parameters:

dataset – A dataset object (SequenceDataset, ReceptorDataset or RepertoireDataset)
params – An EncoderParams object containing a few utility parameters which may be used during encoding (e.g., the number of parallel processes to use).

Returns:

A copy of the original dataset, with an EncodedData object added to the dataset.encoded_data field.
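The contract of encode (return a copy of the dataset with encoded data attached, leaving the input unchanged) can be sketched with toy classes. `ToyEncodedData`, `ToyRepertoireDataset` and the single `clonotype_count` feature are hypothetical and only illustrate the copy-and-attach pattern; they are not immuneML's EncodedData or dataset classes.

```python
import copy
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyEncodedData:
    examples: List[List[float]]  # one feature row per repertoire
    example_ids: List[str]
    feature_names: List[str]


@dataclass
class ToyRepertoireDataset:
    repertoires: Dict[str, List[int]]  # repertoire id -> clonal counts
    encoded_data: Optional[ToyEncodedData] = None


def encode(dataset: ToyRepertoireDataset) -> ToyRepertoireDataset:
    example_ids = list(dataset.repertoires)
    # Toy feature: number of clonotypes in each repertoire.
    examples = [[float(len(dataset.repertoires[rid]))] for rid in example_ids]

    # Copy the dataset so that the input object is left untouched.
    encoded_dataset = copy.copy(dataset)
    encoded_dataset.encoded_data = ToyEncodedData(
        examples=examples,
        example_ids=example_ids,
        feature_names=["clonotype_count"],
    )
    return encoded_dataset


original = ToyRepertoireDataset({"rep1": [10, 5, 1], "rep2": [3, 3]})
encoded = encode(original)
```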

immuneML.encodings.diversity_encoding.EvennessProfileRepertoireEncoder module

class immuneML.encodings.diversity_encoding.EvennessProfileRepertoireEncoder.EvennessProfileRepertoireEncoder(min_alpha: float, max_alpha: float, dimension: int, name: str = None)

Bases: EvennessProfileEncoder

encode_repertoire(repertoire, params: EncoderParams)

get_encoded_repertoire(repertoire_id: str, repertoire: bytes | Repertoire, params: EncoderParams)

immuneML.encodings.diversity_encoding.ShannonDiversityEncoder module

class immuneML.encodings.diversity_encoding.ShannonDiversityEncoder.ShannonDiversityEncoder(name: str = None)

Bases: DatasetEncoder

ShannonDiversity encoder calculates the Shannon diversity index for each repertoire in a dataset. The diversity is computed as:

\[diversity = - \sum_{i=1}^{n} p_i \log(p_i)\]

where \(p_i\) is the clonal count of the i-th unique sequence in the repertoire (from the duplicate_count field) divided by the total clonal count, and \(n\) is the total number of clonotypes (unique sequences) in the repertoire.
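As a worked illustration of the formula above, the following sketch computes the Shannon diversity from a list of duplicate counts. The function name is hypothetical, and the use of the natural logarithm is an assumption, since the formula does not state the base of the logarithm.

```python
import math


def shannon_diversity(duplicate_counts):
    """Shannon diversity of a repertoire from its clonal counts (illustrative sketch)."""
    # Skip zero counts, since 0 * log(0) is treated as 0.
    positive = [c for c in duplicate_counts if c > 0]
    total = sum(positive)
    # p_i: clonal count of each unique sequence divided by the total count.
    probabilities = [c / total for c in positive]
    # Assumption: natural logarithm.
    return -sum(p * math.log(p) for p in probabilities)
```

With the natural logarithm, a repertoire of n equally abundant clonotypes has diversity log(n), and a repertoire with a single clonotype has diversity 0.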

Dataset type:

RepertoireDataset

Specification arguments:

No arguments are needed for this encoder.

YAML specification:

definitions:
    encodings:
        shannon_div_enc: ShannonDiversity

static build_object(dataset: Dataset, **params)

The build_object method should do the following:

Check parameters: immuneML should crash if wrong user parameters are specified. The ParameterValidator utility class may be used for parameter testing.

Check the dataset type: immuneML should crash if the wrong dataset type is specified for this encoder. For example, DeepRCEncoder should only work for RepertoireDatasets and crash if the dataset is of another type.

Create an instance of the correct Encoder class, using the given parameters. Return this object. Some encoders have different subclasses depending on the dataset type. Make sure to return an instance of the correct subclass. For instance: KmerFrequencyEncoder has different subclasses for each dataset type. When the dataset is a Repertoire dataset, KmerFreqRepertoireEncoder should be returned.

Parameters:

dataset – Dataset object of the same class as the dataset to be encoded later; in case there are multiple dataset types supported by the encoder, the dataset should be of one of these types and the correct subclass of the encoder should be returned
**params – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the Encoder object

Returns:

the object of the appropriate Encoder class

encode(dataset, params: EncoderParams) → Dataset

This is the main encoding method of the Encoder. It takes in a given dataset, computes an EncodedData object, and returns a copy of the dataset with the attached EncodedData object.

Parameters:

dataset – A dataset object (SequenceDataset, ReceptorDataset or RepertoireDataset)
params – An EncoderParams object containing a few utility parameters which may be used during encoding (e.g., the number of parallel processes to use).

Returns:

A copy of the original dataset, with an EncodedData object added to the dataset.encoded_data field.
