immuneML.encodings.word2vec package
Submodules
immuneML.encodings.word2vec.W2VRepertoireEncoder module
class immuneML.encodings.word2vec.W2VRepertoireEncoder.W2VRepertoireEncoder(vector_size: int, k: int, model_type: immuneML.encodings.word2vec.model_creator.ModelType.ModelType, name: Optional[str] = None)
Bases: immuneML.encodings.word2vec.Word2VecEncoder.Word2VecEncoder
immuneML.encodings.word2vec.Word2VecEncoder module
class immuneML.encodings.word2vec.Word2VecEncoder.Word2VecEncoder(vector_size: int, k: int, model_type: immuneML.encodings.word2vec.model_creator.ModelType.ModelType, name: Optional[str] = None)
Bases: immuneML.encodings.DatasetEncoder.DatasetEncoder
Word2VecEncoder learns vector representations of the k-mers in a repertoire's sequences from the contexts in which those k-mers appear. It relies on gensim's implementation of Word2Vec, and on KmerHelper for k-mer extraction.
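Word2Vec treats each training "sentence" as a list of "words". A minimal sketch of how sequences could be turned into that input, assuming overlapping k-mers with a step of one; `create_kmers` and `repertoire_to_sentences` are hypothetical helpers standing in for KmerHelper, not its actual API.

```python
from typing import List


def create_kmers(sequence: str, k: int) -> List[str]:
    """Return all overlapping k-mers of `sequence`, left to right (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length L yields L - k + 1 k-mers, and none if L < k.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def repertoire_to_sentences(sequences: List[str], k: int) -> List[List[str]]:
    """Turn each sequence into a 'sentence' whose 'words' are its k-mers."""
    return [create_kmers(sequence, k) for sequence in sequences]


if __name__ == "__main__":
    print(create_kmers("CASTTY", 3))  # ['CAS', 'AST', 'STT', 'TTY']
    print(repertoire_to_sentences(["CASTTY", "CAS"], 3))
```

A list of such sentences is the shape of corpus a Word2Vec trainer consumes; in immuneML the extraction itself is handled by KmerHelper.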
Parameters:
- vector_size (int) – The size of the vector to be learnt.
- model_type (ModelType) – The context used to infer the representation of the sequence. If SEQUENCE is used, the context of a k-mer is defined by the sequence it occurs in (e.g. if the sequence is CASTTY and the k-mer is AST, then its context consists of the k-mers CAS, STT and TTY). If KMER_PAIR is used, the context of a k-mer is defined as all k-mers within an edit distance of one (e.g. for the k-mer CAS, the context includes CAA, CAC, CAD etc.). Valid values for this parameter are the names of the ModelType enum.
- k (int) – The length of the k-mers used for the encoding.
YAML specification:
    encodings:
      my_w2v:
        Word2Vec:
          vector_size: 16
          k: 3
          model_type: SEQUENCE
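The two model_type options differ only in which k-mers count as context. A minimal sketch reproducing the examples from the parameter description, under two assumptions: the alphabet is the 20 standard amino acids, and "edit distance of one" between equal-length k-mers means a single substitution (insertions and deletions would change the length). Both function names are hypothetical and do not come from immuneML.

```python
from typing import List

# Assumed alphabet: the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def create_kmers(sequence: str, k: int) -> List[str]:
    """All overlapping k-mers of `sequence` (hypothetical helper)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def sequence_context(sequence: str, kmer: str, k: int) -> List[str]:
    """SEQUENCE-style context: the other k-mers of the sequence the k-mer occurs in."""
    context = create_kmers(sequence, k)
    if kmer not in context:
        return []
    context.remove(kmer)  # drop one occurrence of the k-mer itself
    return context


def kmer_pair_context(kmer: str, alphabet: str = AMINO_ACIDS) -> List[str]:
    """KMER_PAIR-style context: every k-mer exactly one substitution away from `kmer`."""
    neighbours = []
    for position, original in enumerate(kmer):
        for letter in alphabet:
            if letter != original:
                neighbours.append(kmer[:position] + letter + kmer[position + 1:])
    return neighbours


if __name__ == "__main__":
    print(sequence_context("CASTTY", "AST", 3))  # ['CAS', 'STT', 'TTY']
    pairs = kmer_pair_context("CAS")
    print(len(pairs))  # 57, i.e. 3 positions x 19 alternative letters
    print([p for p in pairs if p.startswith("CA")][:3])  # ['CAA', 'CAC', 'CAD']
```

The SEQUENCE context grows with sequence length, whereas the substitution-based context has a fixed size of k × (alphabet size − 1) regardless of the data.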
DESCRIPTION_LABELS = 'labels'
DESCRIPTION_REPERTOIRES = 'repertoires'
dataset_mapping = {'RepertoireDataset': 'W2VRepertoireEncoder'}
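As listed, dataset_mapping pairs only RepertoireDataset with a concrete encoder, W2VRepertoireEncoder. A minimal sketch of how such a mapping could select an encoder by dataset class name. This is an assumption about how the mapping is used, not immuneML's actual dispatch code, and the two dataset classes below are empty stand-ins.

```python
from typing import Dict


class RepertoireDataset:  # hypothetical stand-in for the immuneML class
    pass


class SequenceDataset:  # hypothetical stand-in for an unmapped dataset type
    pass


DATASET_MAPPING: Dict[str, str] = {"RepertoireDataset": "W2VRepertoireEncoder"}


def resolve_encoder_name(dataset: object, mapping: Dict[str, str] = DATASET_MAPPING) -> str:
    """Look up the encoder class name for the dataset's class name."""
    dataset_type = type(dataset).__name__
    if dataset_type not in mapping:
        raise ValueError(
            f"Word2Vec encoding is not defined for {dataset_type}; supported: {sorted(mapping)}"
        )
    return mapping[dataset_type]


if __name__ == "__main__":
    print(resolve_encoder_name(RepertoireDataset()))  # W2VRepertoireEncoder
```

Keying on class names as strings lets the mapping be declared as plain data without importing the concrete encoder classes.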
encode(dataset, params: immuneML.encodings.EncoderParams.EncoderParams)
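This page does not state how learned k-mer vectors become one vector per repertoire. A minimal sketch of that aggregation step, assuming the vectors of all k-mers in all sequences are summed and k-mers missing from the vocabulary are skipped. The learned vectors are represented by a plain dict in place of a trained gensim model, all function names are hypothetical, and the real encode also trains the model and returns an encoded dataset rather than bare lists.

```python
from typing import Dict, List

Vector = List[float]


def create_kmers(sequence: str, k: int) -> List[str]:
    """All overlapping k-mers of `sequence` (hypothetical helper)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def encode_repertoire(sequences: List[str], kmer_vectors: Dict[str, Vector],
                      k: int, vector_size: int) -> Vector:
    """Sum the vectors of every k-mer in every sequence (assumed aggregation)."""
    repertoire_vector = [0.0] * vector_size
    for sequence in sequences:
        for kmer in create_kmers(sequence, k):
            vector = kmer_vectors.get(kmer)
            if vector is None:  # k-mer not in the learned vocabulary: skip it
                continue
            for i, value in enumerate(vector):
                repertoire_vector[i] += value
    return repertoire_vector


def encode_dataset(repertoires: List[List[str]], kmer_vectors: Dict[str, Vector],
                   k: int, vector_size: int) -> List[Vector]:
    """One vector per repertoire, in input order."""
    return [encode_repertoire(r, kmer_vectors, k, vector_size) for r in repertoires]


if __name__ == "__main__":
    # Toy 2-dimensional vectors; 'TTY' is deliberately missing from the vocabulary.
    vectors = {"CAS": [1.0, 0.0], "AST": [0.0, 1.0], "STT": [0.5, 0.5]}
    print(encode_dataset([["CASTTY"], ["CAS", "AST"]], vectors, k=3, vector_size=2))
    # [[1.5, 1.5], [1.0, 1.0]]
```

Summation keeps the output size fixed at vector_size regardless of repertoire size, but it also scales with the number of sequences, so some form of normalisation is usually applied downstream.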