immuneML.encodings.distance_encoding package
Submodules
immuneML.encodings.distance_encoding.DistanceEncoder module
class immuneML.encodings.distance_encoding.DistanceEncoder.DistanceEncoder(distance_metric: immuneML.encodings.distance_encoding.DistanceMetricType.DistanceMetricType, attributes_to_match: list, sequence_batch_size: int, context: Optional[dict] = None, name: Optional[str] = None)
Bases: immuneML.encodings.DatasetEncoder.DatasetEncoder
Encodes a given RepertoireDataset as a distance matrix holding the pairwise distance between every pair of repertoires. The distance is calculated from the presence or absence of the elements defined under attributes_to_match. Thus, if attributes_to_match contains only ‘sequence_aas’, the distance between two repertoires is minimal if they contain the same set of sequence_aas and maximal if they share none of them.
Parameters
- distance_metric (DistanceMetricType) – The metric used to calculate the distance between two repertoires. Names of the different distance metric types are allowed values in the specification.
- attributes_to_match – The attributes to consider when determining whether a sequence is present in both repertoires. Only the fields defined under attributes_to_match are considered; all other fields are ignored. Valid values include any repertoire attribute.
YAML specification:

    my_distance_encoder:
        Distance:
            distance_metric: JACCARD
            sequence_batch_size: 1000
            attributes_to_match:
                - sequence_aas
                - v_genes
                - j_genes
                - chains
                - region_types
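A minimal sketch of the presence/absence distance described above, assuming the JACCARD metric from the YAML example. Each repertoire is represented as a plain list of dicts, a hypothetical stand-in for immuneML's repertoire objects, and the helper names are illustrative only; this is not immuneML's implementation. Every repertoire is reduced to the set of attribute tuples it contains, so identical sets give distance 0 and disjoint sets give distance 1.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple

# Hypothetical stand-in for a repertoire: a list of sequences, each a dict of
# attribute name -> value (immuneML's real data model differs).
Repertoire = List[Dict[str, str]]


def attribute_set(repertoire: Repertoire, attributes_to_match: List[str]) -> Set[Tuple[str, ...]]:
    """Presence/absence view of a repertoire: the set of attribute tuples it contains.

    Two sequences count as the same element only if all listed attributes agree;
    any other field is ignored.
    """
    return {tuple(seq[attr] for attr in attributes_to_match) for seq in repertoire}


def jaccard_distance(a: Set[Hashable], b: Set[Hashable]) -> float:
    """1 - |A & B| / |A | B|: 0.0 for identical sets, 1.0 for disjoint sets."""
    union = a | b
    if not union:  # two empty repertoires are treated as identical
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(repertoires: List[Repertoire], attributes_to_match: List[str]) -> List[List[float]]:
    """Symmetric matrix of pairwise Jaccard distances with zeros on the diagonal."""
    sets = [attribute_set(r, attributes_to_match) for r in repertoires]
    n = len(sets)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = jaccard_distance(sets[i], sets[j])
    return matrix


# Usage: rep1 and rep2 share one (sequence_aas, v_genes) pair out of three distinct
# pairs; rep3 contains CASSL too, but with a different V gene.
rep1 = [{"sequence_aas": "CASSL", "v_genes": "TRBV5-1"}, {"sequence_aas": "CASSP", "v_genes": "TRBV6-1"}]
rep2 = [{"sequence_aas": "CASSL", "v_genes": "TRBV5-1"}, {"sequence_aas": "CASSQ", "v_genes": "TRBV7-2"}]
rep3 = [{"sequence_aas": "CASSL", "v_genes": "TRBV20-1"}]

matrix = distance_matrix([rep1, rep2, rep3], ["sequence_aas", "v_genes"])
# matrix[0][1] is 1 - 1/3 (about 0.667); matrix[0][2] is 1.0 because the V genes differ
```

Listing more attributes makes a match stricter: matching on sequence_aas alone, rep1 and rep3 would share CASSL and be at distance 0.5 instead of 1.0.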
build_distance_matrix(dataset: immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset, params: immuneML.encodings.EncoderParams.EncoderParams, train_repertoire_ids: list)
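The train_repertoire_ids argument suggests that distances are measured against a subset of training repertoires. A minimal sketch, assuming (this page does not confirm it) that rows cover every repertoire in the dataset, that columns are restricted to the training repertoires, and that Jaccard distance is used; distances_to_training and its dict-based input are hypothetical.

```python
from typing import Dict, Hashable, List, Set


def jaccard_distance(a: Set[Hashable], b: Set[Hashable]) -> float:
    """1 - |A & B| / |A | B|; two empty sets are treated as identical."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def distances_to_training(repertoire_sets: Dict[str, Set[Hashable]],
                          train_repertoire_ids: List[str]) -> Dict[str, Dict[str, float]]:
    """Rows: every repertoire; columns: only the training repertoires.

    Hypothetical helper: `repertoire_sets` maps a repertoire identifier to the
    set of elements (e.g. attribute tuples) that the repertoire contains.
    """
    return {
        rep_id: {train_id: jaccard_distance(rep_set, repertoire_sets[train_id])
                 for train_id in train_repertoire_ids}
        for rep_id, rep_set in repertoire_sets.items()
    }


# Usage: two training repertoires and one held-out repertoire.
sets = {"train_1": {"CASSL", "CASSP"}, "train_2": {"CASSL", "CASSQ"}, "test_1": {"CASSP"}}
matrix = distances_to_training(sets, ["train_1", "train_2"])
# matrix["test_1"] == {"train_1": 0.5, "train_2": 1.0}
```

Keeping only training repertoires as columns means every repertoire, held-out or not, is described in the same feature space, defined by the training data alone.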
build_labels(dataset: immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset, params: immuneML.encodings.EncoderParams.EncoderParams) → dict
immuneML.encodings.distance_encoding.DistanceMetricType module
immuneML.encodings.distance_encoding.TCRdistEncoder module
class immuneML.encodings.distance_encoding.TCRdistEncoder.TCRdistEncoder(cores: int, name: Optional[str] = None)
Bases: immuneML.encodings.DatasetEncoder.DatasetEncoder
Encodes the given ReceptorDataset as a distance matrix between all receptors, where the distance is computed using TCRdist from the paper: Dash P, Fiore-Gartland AJ, Hertz T, et al. Quantifiable predictive features define epitope-specific T cell receptor repertoires. Nature. 2017; 547(7661):89-93. doi:10.1038/nature22383.
The implementation uses the TCRdist3 library.
Parameters
- cores (int) – Number of processes to use for the computation.
YAML specification:

    my_tcr_dist_enc: # user-defined name
        TCRdist:
            cores: 4
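For orientation, the per-position mismatch distance and loop weighting of TCRdist, as we read Dash et al. (2017), can be summarized as follows. Here L ranges over the aligned CDR loops and i over aligned positions within a loop; gap positions receive a fixed gap penalty instead of d, and the alignment, gap and trimming details used by TCRdist3 are not shown.

```latex
d(a, b) =
\begin{cases}
  0 & \text{if } a = b,\\
  \min\bigl(4,\ 4 - \mathrm{BLOSUM62}(a, b)\bigr) & \text{if } a \neq b,
\end{cases}
\qquad
\mathrm{TCRdist}(t_1, t_2) = \sum_{L} w_L \sum_{i \in L} d\bigl(t_1[i], t_2[i]\bigr),
\qquad
w_L =
\begin{cases}
  3 & \text{if } L = \mathrm{CDR3},\\
  1 & \text{otherwise.}
\end{cases}
```

Because off-diagonal BLOSUM62 scores are at most 3, every mismatch costs between 1 and 4, with chemically similar substitutions penalized least, and differences in the CDR3 count three times as much as differences in the other loops.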
encode(dataset, params: immuneML.encodings.EncoderParams.EncoderParams)