immuneML.encodings.word2vec package

Submodules

immuneML.encodings.word2vec.W2VRepertoireEncoder module

class immuneML.encodings.word2vec.W2VRepertoireEncoder.W2VRepertoireEncoder(vector_size: int, k: int, model_type: immuneML.encodings.word2vec.model_creator.ModelType.ModelType, name: Optional[str] = None)[source]

Bases: immuneML.encodings.word2vec.Word2VecEncoder.Word2VecEncoder

immuneML.encodings.word2vec.Word2VecEncoder module

class immuneML.encodings.word2vec.Word2VecEncoder.Word2VecEncoder(vector_size: int, k: int, model_type: immuneML.encodings.word2vec.model_creator.ModelType.ModelType, name: Optional[str] = None)[source]

Bases: immuneML.encodings.DatasetEncoder.DatasetEncoder

Word2VecEncoder learns the vector representations of k-mers in the sequences in a repertoire from the context the k-mers appear in. It relies on gensim’s implementation of Word2Vec and KmerHelper for k-mer extraction.

Parameters
  • vector_size (int) – The size of the vector to be learnt.

  • model_type (ModelType) – The context which will be used to infer the representation of the sequence. If SEQUENCE is used, the context of a k-mer is defined by the sequence it occurs in (e.g. if the sequence is CASTTY and the k-mer is AST, then its context consists of the k-mers CAS, STT and TTY). If KMER_PAIR is used, the context of a k-mer is defined as all the k-mers that are within one edit distance of it (e.g. for the k-mer CAS, the context includes CAA, CAC, CAD etc.). Valid values for this parameter are names of the ModelType enum.

  • k (int) – The length of the k-mers used for the encoding.
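The two context definitions for model_type can be illustrated with a small plain-Python sketch. It is not immuneML's KmerHelper or the encoder's actual code: the function names, the 20-letter amino acid alphabet and the use of overlapping k-mers are assumptions made for illustration. For k-mers of equal length, "within one edit distance" reduces to a single substitution, which is what the sketch enumerates.

```python
from typing import List, Set

# Assumption: the alphabet is the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_kmers(sequence: str, k: int) -> List[str]:
    """Return all overlapping k-mers of the sequence, in order (hypothetical helper)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def sequence_context(sequence: str, kmer: str) -> List[str]:
    """SEQUENCE-style context: the other k-mers found in the same sequence."""
    return [other for other in extract_kmers(sequence, len(kmer)) if other != kmer]


def kmer_pair_context(kmer: str, alphabet: str = AMINO_ACIDS) -> Set[str]:
    """KMER_PAIR-style context: all k-mers exactly one substitution away."""
    neighbours = set()
    for position, original in enumerate(kmer):
        for letter in alphabet:
            if letter != original:
                neighbours.add(kmer[:position] + letter + kmer[position + 1:])
    return neighbours


print(sequence_context("CASTTY", "AST"))  # ['CAS', 'STT', 'TTY']
print(len(kmer_pair_context("CAS")))      # 57 (3 positions x 19 alternative letters)
```

With SEQUENCE the context depends on the data, while with KMER_PAIR it depends only on the k-mer and the alphabet.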

YAML specification:

encodings:
    my_w2v:
        Word2Vec:
            vector_size: 16
            k: 3
            model_type: SEQUENCE
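
Because model_type must be the name of a ModelType enum member, a specification like the one above can be checked before it is used. The sketch below is only an illustration: the ModelType class is a hypothetical stand-in (only the member names SEQUENCE and KMER_PAIR come from this documentation), and check_w2v_params is not part of immuneML.

```python
from enum import Enum


class ModelType(Enum):
    # Hypothetical stand-in; only the member names are taken from the documentation.
    SEQUENCE = "sequence"
    KMER_PAIR = "kmer_pair"


def check_w2v_params(params: dict) -> dict:
    """Validate a Word2Vec parameter dict (hypothetical helper, not immuneML API)."""
    for key in ("vector_size", "k"):
        if not isinstance(params[key], int) or params[key] < 1:
            raise ValueError(f"{key} must be a positive integer, got {params[key]!r}")
    name = params["model_type"]
    if name not in ModelType.__members__:
        raise ValueError(f"model_type must be one of {list(ModelType.__members__)}, got {name!r}")
    # Replace the enum member name with the enum member itself.
    return {**params, "model_type": ModelType[name]}


# Same values as in the YAML specification above.
spec = {"vector_size": 16, "k": 3, "model_type": "SEQUENCE"}
checked = check_w2v_params(spec)
print(checked["model_type"])  # ModelType.SEQUENCE
```
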
DESCRIPTION_LABELS = 'labels'
DESCRIPTION_REPERTOIRES = 'repertoires'
static build_object(dataset=None, **params)[source]
dataset_mapping = {'RepertoireDataset': 'W2VRepertoireEncoder'}
encode(dataset, params: immuneML.encodings.EncoderParams.EncoderParams)[source]
static export_encoder(path: pathlib.Path, encoder) → str[source]
get_additional_files() → List[str][source]
static get_documentation()[source]
static load_encoder(encoder_file: pathlib.Path)[source]

Module contents