immuneML.simulation.implants package¶

Submodules¶

immuneML.simulation.implants.ImplantAnnotation module¶

class immuneML.simulation.implants.ImplantAnnotation.ImplantAnnotation(signal_id: str = None, motif_id: str = None, motif_instance: str = None, position: int = None)[source]¶

Bases: object

motif_id: str = None¶

motif_instance: str = None¶

position: int = None¶

signal_id: str = None¶

immuneML.simulation.implants.LigoPWM module¶

class immuneML.simulation.implants.LigoPWM.LigoPWM(identifier: str, file_path: Path, pwm_matrix: PWM, threshold: float)[source]¶

Bases: Motif

Motifs defined by a positional weight matrix and using bionumpy’s PWM internally. For more details on bionumpy’s implementation of PWM, as well as for supported formats, see the documentation at https://bionumpy.github.io/bionumpy/tutorials/position_weight_matrix.html.

Specification arguments:

file_path: path to the file where the PWM is stored
threshold (float): the score threshold used when matching the PWM against a sequence; it determines whether the sequence is considered to contain the motif

YAML specification:

definitions:
    motifs:
        my_custom_pwm: # this will be the identifier of the motif
            file_path: my_pwm_1.csv
            threshold: 2
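To make the role of threshold concrete, the following is a minimal, library-free sketch of log-odds PWM scoring. It is not bionumpy's or immuneML's implementation: the toy matrix, the uniform background, the helper names window_score and contains_motif, and the "score at or above threshold" comparison are all assumptions made for illustration.

```python
import math

# Hypothetical toy PWM: one dict of letter probabilities per motif position.
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency


def window_score(window: str, pwm=TOY_PWM) -> float:
    """Sum of natural-log odds of each letter against the background."""
    return sum(math.log(column[letter] / BACKGROUND)
               for column, letter in zip(pwm, window))


def contains_motif(sequence: str, threshold: float, pwm=TOY_PWM) -> bool:
    """True if any window of the PWM's width scores at or above the threshold."""
    width = len(pwm)
    return any(window_score(sequence[i:i + width], pwm) >= threshold
               for i in range(len(sequence) - width + 1))
```

With this toy matrix, contains_motif("TTACGTT", 2.0) is true because the window ACG scores 3 * ln(2.8), roughly 3.09, while a sequence of only T fails the same threshold.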

classmethod build(identifier: str, file_path, threshold: float)[source]¶

file_path: Path¶

get_all_possible_instances(sequence_type: SequenceType)[source]¶

get_alphabet() → List[str][source]¶

get_max_length() → int[source]¶

instantiate_motif(sequence_type: SequenceType = SequenceType.AMINO_ACID)[source]¶

pwm_matrix: PWM¶

threshold: float¶

immuneML.simulation.implants.Motif module¶

class immuneML.simulation.implants.Motif.Motif(identifier: str)[source]¶

Bases: object

Motifs are the objects that are implanted into sequences during simulation. They are defined under definitions/motifs. There are several motif types, each with its own parameters.

abstract get_all_possible_instances(sequence_type: SequenceType)[source]¶

abstract get_alphabet() → List[str][source]¶

abstract get_max_length() → int[source]¶

identifier: str¶

abstract instantiate_motif(sequence_type: SequenceType = SequenceType.AMINO_ACID) → MotifInstance[source]¶

immuneML.simulation.implants.MotifInstance module¶

class immuneML.simulation.implants.MotifInstance.MotifInstance(instance: str, gap: int)[source]¶

Bases: object

class immuneML.simulation.implants.MotifInstance.MotifInstanceGroup(iterable=(), /)[source]¶

Bases: list

immuneML.simulation.implants.SeedMotif module¶

class immuneML.simulation.implants.SeedMotif.SeedMotif(identifier: str, seed: str = None, min_gap: int = 0, max_gap: int = 0, hamming_distance_probabilities: dict = None, position_weights: dict = None, alphabet_weights: dict = None, all_possible_instances: list = None)[source]¶

Bases: Motif

Describes motifs by seed, possible gaps, allowed hamming distances, positions that can be changed and what they can be changed to.

Specification arguments:

seed (str): An amino acid sequence that represents the basic motif seed. All implanted motifs correspond to the seed, or a modified version thereof, as specified in its instantiation strategy. If this argument is set, seed_chain1 and seed_chain2 arguments are not used.
min_gap (int): The minimum gap length, in case the original seed contains a gap.
max_gap (int): The maximum gap length, in case the original seed contains a gap.
hamming_distance_probabilities (dict): The probability of modifying the given seed with each number of modifications. The keys represent the number of modifications (hamming distance) between the original seed and the implanted motif, and the values represent the probabilities for the respective number of modifications. For example {0: 0.7, 1: 0.3} means that 30% of the time one position will be modified, and the remaining 70% of the time the motif will remain unmodified with respect to the seed. The values of hamming_distance_probabilities must sum to 1.
position_weights (dict): A dictionary containing the relative probabilities of choosing each position for hamming distance modification. The keys represent the position in the seed, where counting starts at 0. If the index of a gap is specified in position_weights, it will be removed. The values represent the relative probabilities for modifying each position when it gets selected for modification. For example {0: 0.6, 1: 0, 2: 0.4} means that when a sequence is selected for a modification (as specified in hamming_distance_probabilities), then 60% of the time the amino acid at index 0 is modified, and the remaining 40% of the time the amino acid at index 2. If the values of position_weights do not sum to 1, the remainder will be redistributed over all positions, including those not specified.
alphabet_weights (dict): A dictionary describing the relative probabilities of choosing each amino acid for hamming distance modification. The keys of the dictionary represent the amino acids and the values are the relative probabilities for choosing this amino acid. If the values of alphabet_weights do not sum to 1, the remainder will be redistributed over all possible amino acids, including those not specified.

YAML specification:

definitions:
    motifs:
        # examples for single chain receptor data
        my_simple_motif: # this will be the identifier of the motif
            seed: AAA # motif is always AAA
        my_gapped_motif:
            seed: AA/A # this motif can be AAA, AA_A, CAA, CA_A, DAA, DA_A, EAA, EA_A
            min_gap: 0
            max_gap: 1
            hamming_distance_probabilities: # it can have a max of 1 substitution
                0: 0.7
                1: 0.3
            position_weights: # note that index 2, the position of the gap, is excluded from position_weights
                0: 1 # only first position can be changed
                1: 0
                3: 0
            alphabet_weights: # the first A can be replaced by C, D or E
                C: 0.4
                D: 0.4
                E: 0.2
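The gapped example above can be read as a three-step sampling procedure: pick a gap length, pick a number of modifications, then pick positions and replacement letters. The following is a minimal sketch of that reading, not immuneML's implementation. The function name instantiate, uniform sampling of the gap length, the returned (instance, gap) tuple, and keeping "/" in the instance string are assumptions for illustration. All weights are treated as relative weights, so no redistribution of missing probability mass is attempted.

```python
import random


def instantiate(seed, min_gap, max_gap, hamming_distance_probabilities,
                position_weights, alphabet_weights, rng=None):
    """Return (instance, gap): a possibly modified seed and the sampled gap length."""
    rng = rng or random.Random()

    # 1. Sample the gap length (assumed uniform); only relevant if the seed has a gap.
    gap = rng.randint(min_gap, max_gap) if "/" in seed else 0

    # 2. Sample how many positions to modify (the Hamming distance to the seed).
    distances = list(hamming_distance_probabilities)
    n_changes = rng.choices(
        distances, weights=[hamming_distance_probabilities[d] for d in distances])[0]

    # 3. Pick distinct positions with non-zero weight, then a replacement for each.
    letters = list(seed)
    candidates = {p: w for p, w in position_weights.items()
                  if w > 0 and letters[p] != "/"}
    alphabet = list(alphabet_weights)
    for _ in range(min(n_changes, len(candidates))):
        positions = list(candidates)
        pos = rng.choices(positions, weights=[candidates[p] for p in positions])[0]
        del candidates[pos]  # each position is modified at most once
        letters[pos] = rng.choices(
            alphabet, weights=[alphabet_weights[a] for a in alphabet])[0]

    return "".join(letters), gap
```

Called with the parameters of my_gapped_motif (seed "AA/A", gaps 0 to 1, position weights {0: 1, 1: 0, 3: 0}, alphabet weights for C, D and E), the sketch only ever changes the first letter, which matches the eight instances listed in the YAML comment.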

all_possible_instances: list = None¶

alphabet_weights: dict = None¶

get_all_possible_instances(sequence_type: SequenceType)[source]¶

get_alphabet() → List[str][source]¶

get_max_length()[source]¶

hamming_distance_probabilities: dict = None¶

instantiate_motif(sequence_type: SequenceType = SequenceType.AMINO_ACID) → MotifInstance[source]¶

max_gap: int = 0¶

min_gap: int = 0¶

position_weights: dict = None¶

seed: str = None¶

set_default_weights(weights, allowed_keys)[source]¶

immuneML.simulation.implants.Signal module¶

class immuneML.simulation.implants.Signal.Signal(id: str, motifs: List[Motif | List[Motif]] = None, sequence_position_weights: dict = None, v_call: str = None, j_call: str = None, clonal_frequency: dict = None, is_present_custom_func: Callable = None)[source]¶

Bases: object

A signal represents a collection of motifs, and optionally, position weights showing where one of the motifs of the signal can occur in a sequence. The signals are defined under definitions/signals.

A signal is associated with a metadata label, which is assigned to a receptor or repertoire. For example, the label can be antigen-specific/disease-associated (for a receptor) or diseased (for a repertoire).

Note

IMGT positions

To use sequence position weights, IMGT positions should be explicitly specified as strings, under quotation marks, to allow for all positions to be properly distinguished.

Specification arguments:

motifs (list): A list of the motifs associated with this signal, each defined either by seed or by position weight matrix. Alternatively, it can be a list of lists of motifs, in which case the motifs in the same sublist (max 2 motifs) have to co-occur in the same sequence.
sequence_position_weights (dict): a dictionary specifying for each IMGT position in the sequence how likely it is for the signal to be there. If the position is not present in the sequence, the probability of the signal occurring at that position will be redistributed to other positions with probabilities that are not explicitly set to 0 by the user.
v_call (str): V gene (with allele, if available) that has to co-occur with one of the motifs for the signal to exist. It can be used in combination with rejection sampling or full sequence implanting, and is otherwise ignored. For rejection sampling, a generated sequence matches if this value is contained in its v_call field.
j_call (str): J gene (with allele, if available) that has to co-occur with one of the motifs for the signal to exist. It can be used in combination with rejection sampling or full sequence implanting, and is otherwise ignored. For rejection sampling, a generated sequence matches if this value is contained in its j_call field.
source_file (str): path to the file that defines the custom signal function; cannot be combined with the arguments listed above (motifs, v_call, j_call, sequence_position_weights)
is_present_func (str): name of the function in source_file that will be used to specify the signal; the function’s signature must be:

def is_present(sequence_aa: str, sequence: str, v_call: str, j_call: str) -> bool:
    # custom implementation where all or some of these arguments can be used
    ...

clonal_frequency (dict): clonal frequency in Ligo is simulated by drawing random numbers from scipy’s zeta distribution, with the distribution parameters provided under clonal_frequency. If clonal frequency should not be simulated, this parameter can be None.

clonal_frequency:
  a: 2 # shape parameter of the distribution
  loc: 0 # 0 by default but can be used to shift the distribution
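To give a feel for what the shape parameter a does, here is a minimal standard-library sketch of drawing clone counts from a zeta-like distribution, where P(k) is proportional to k**(-a). It is not how Ligo samples: scipy exposes this distribution as scipy.stats.zipf, whereas the sketch below truncates the support at a hypothetical max_count so that it can be normalised without scipy. The function names and the seed argument are likewise assumptions.

```python
import random


def truncated_zeta_pmf(a: float, max_count: int) -> dict:
    """P(k) proportional to k**(-a) for k = 1..max_count, normalised to sum to 1."""
    weights = {k: k ** (-a) for k in range(1, max_count + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def sample_clone_counts(n: int, a: float = 2, loc: int = 0,
                        max_count: int = 1000, seed: int = 0) -> list:
    """Draw n clone counts; loc shifts every drawn value by a constant."""
    pmf = truncated_zeta_pmf(a, max_count)
    rng = random.Random(seed)
    draws = rng.choices(list(pmf), weights=list(pmf.values()), k=n)
    return [k + loc for k in draws]
```

With a: 2, a count of 1 is four times as likely as a count of 2, so most simulated clones are singletons and a few are large. Larger values of a make the tail thinner.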

YAML specification:

definitions:
    signals:
        my_signal:
            motifs:
                - my_simple_motif
                - my_gapped_motif
            sequence_position_weights:
                '109': 0.5
                '110': 0.5
            v_call: TRBV1
            j_call: TRBJ1
            clonal_frequency:
                a: 2
                loc: 0
        signal_with_custom_func:
            source_file: signal_func.py
            is_present_func: is_signal_present
            clonal_frequency:
                a: 2
                loc: 0
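The signal_with_custom_func entry above points to a file signal_func.py and a function is_signal_present. A minimal sketch of what such a file might contain is shown below, following the documented signature. The rule itself (the amino acid sequence contains AAA and the V call mentions TRBV1) is purely hypothetical and only illustrates how the arguments can be used.

```python
# signal_func.py (hypothetical contents)

def is_signal_present(sequence_aa: str, sequence: str, v_call: str, j_call: str) -> bool:
    """Hypothetical rule: AAA in the amino acid sequence and TRBV1 in the V call."""
    has_motif = "AAA" in (sequence_aa or "")
    # Substring check, so this also matches calls such as TRBV10 or TRBV1*01.
    has_v_gene = "TRBV1" in (v_call or "")
    return has_motif and has_v_gene
```

The nucleotide sequence and j_call arguments are accepted but unused here, which the documented signature allows.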

clonal_frequency: dict = None¶

get_all_motif_instances(sequence_type: SequenceType)[source]¶

id: str¶

is_present_custom_func: Callable = None¶

j_call: str = None¶

make_motif_instances(count, sequence_type: SequenceType)[source]¶

motifs: List[Motif | List[Motif]] = None¶

sequence_position_weights: dict = None¶

v_call: str = None¶

class immuneML.simulation.implants.Signal.SignalPair(signal1: immuneML.simulation.implants.Signal.Signal, signal2: immuneML.simulation.implants.Signal.Signal)[source]¶

Bases: object

property clonal_frequency¶

property id: str¶

property j_call¶

signal1: Signal¶

signal2: Signal¶

property v_call¶
