immuneML.dsl.instruction_parsers package

Submodules

immuneML.dsl.instruction_parsers.DatasetExportParser module

class immuneML.dsl.instruction_parsers.DatasetExportParser.DatasetExportParser[source]

Bases: object

Specification of the instruction, using a randomly generated dataset as an example:

definitions:
    datasets:
        my_generated_dataset: # a dataset to be exported in the given format
            format: RandomRepertoireDataset
            params:
                result_path: generated_dataset/
                repertoire_count: 100
                sequence_count_probabilities:
                    100: 0.5
                    120: 0.5
                sequence_length_probabilities:
                    12: 0.333
                    13: 0.333
                    14: 0.333
                labels:
                    immune_event_1:
                        yes: 0.5
                        no: 0.5
    preprocessing_sequences:
        my_preprocessing:
            - my_filter:
                ClonesPerRepertoireFilter:
                    lower_limit: 110
                    upper_limit: 200

instructions:
    my_instruction1: # instruction name
        type: DatasetExport
        datasets: # list of datasets to export
            - my_generated_dataset
        preprocessing_sequence: my_preprocessing
        export_formats: # list of formats to export the datasets to
            - AIRR
            - ImmuneML

OPTIONAL_KEYS = ['preprocessing_sequence']

REQUIRED_KEYS = ['type', 'datasets', 'export_formats']

parse(key: str, instruction: dict, symbol_table: immuneML.dsl.symbol_table.SymbolTable.SymbolTable, path: Optional[pathlib.Path] = None) → immuneML.workflows.instructions.dataset_generation.DatasetExportInstruction.DatasetExportInstruction[source]
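
The REQUIRED_KEYS and OPTIONAL_KEYS attributes suggest that the instruction dictionary is checked against these two lists before it is turned into an instruction object. The following is a minimal sketch of such a check, not immuneML's own code: the helper name check_instruction_keys and its error messages are hypothetical, and only the two key lists are taken from the class above.

```python
# Key lists copied from DatasetExportParser; everything else is illustrative.
REQUIRED_KEYS = ["type", "datasets", "export_formats"]
OPTIONAL_KEYS = ["preprocessing_sequence"]


def check_instruction_keys(name: str, instruction: dict) -> None:
    """Hypothetical helper: raise ValueError on missing or unexpected keys."""
    missing = [key for key in REQUIRED_KEYS if key not in instruction]
    allowed = REQUIRED_KEYS + OPTIONAL_KEYS
    unknown = [key for key in instruction if key not in allowed]
    if missing:
        raise ValueError(f"{name}: missing required keys {missing}")
    if unknown:
        raise ValueError(f"{name}: unexpected keys {unknown}")


# The instruction from the specification above, as it would look after YAML loading.
instruction = {
    "type": "DatasetExport",
    "datasets": ["my_generated_dataset"],
    "export_formats": ["AIRR", "ImmuneML"],
}
check_instruction_keys("my_instruction1", instruction)  # no exception raised
```

Leaving out preprocessing_sequence is accepted here because it appears only in OPTIONAL_KEYS.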

immuneML.dsl.instruction_parsers.ExploratoryAnalysisParser module

class immuneML.dsl.instruction_parsers.ExploratoryAnalysisParser.ExploratoryAnalysisParser[source]

Bases: object

The specification consists of a list of analyses to be performed.

Each analysis is defined by a dataset identifier, a report identifier, and optionally an encoding and labels. Each analysis is loaded into an ExploratoryAnalysisUnit object.

DSL example for ExploratoryAnalysisInstruction, assuming that d1, r1, r2 and e1 are defined in the definitions section:

instruction_name:
    type: ExploratoryAnalysis
    number_of_processes: 4
    analyses:
        my_first_analysis:
            dataset: d1
            report: r1
        my_second_analysis:
            dataset: d1
            encoding: e1
            report: r2
            labels:
                - CD
                - CMV

parse(key: str, instruction: dict, symbol_table: immuneML.dsl.symbol_table.SymbolTable.SymbolTable, path: Optional[pathlib.Path] = None) → immuneML.workflows.instructions.exploratory_analysis.ExploratoryAnalysisInstruction.ExploratoryAnalysisInstruction[source]
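
To make the structure of the analyses section concrete, the sketch below maps each named analysis onto a small record with a required dataset and report and an optional encoding and labels. It is an illustration only: AnalysisUnitSketch and build_units are hypothetical names, the record stores identifiers rather than resolved objects, and the real ExploratoryAnalysisUnit may hold different fields.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AnalysisUnitSketch:
    """Hypothetical stand-in for ExploratoryAnalysisUnit; stores identifiers only."""
    dataset: str
    report: str
    encoding: Optional[str] = None
    labels: List[str] = field(default_factory=list)


def build_units(analyses: Dict[str, dict]) -> Dict[str, AnalysisUnitSketch]:
    """Turn the 'analyses' mapping of an instruction into one record per analysis."""
    units = {}
    for name, spec in analyses.items():
        # dataset and report are mandatory for every analysis
        for required in ("dataset", "report"):
            if required not in spec:
                raise ValueError(f"{name}: missing '{required}'")
        units[name] = AnalysisUnitSketch(
            dataset=spec["dataset"],
            report=spec["report"],
            encoding=spec.get("encoding"),        # optional
            labels=list(spec.get("labels", [])),  # optional
        )
    return units


# The 'analyses' section of the DSL example above, after YAML loading.
analyses = {
    "my_first_analysis": {"dataset": "d1", "report": "r1"},
    "my_second_analysis": {
        "dataset": "d1",
        "encoding": "e1",
        "report": "r2",
        "labels": ["CD", "CMV"],
    },
}
units = build_units(analyses)
```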

immuneML.dsl.instruction_parsers.LabelHelper module

class immuneML.dsl.instruction_parsers.LabelHelper.LabelHelper[source]

Bases: object

static check_label_format(labels: list, instruction_name: str, yaml_location: str)[source]

static create_label_config(labels: list, dataset: immuneML.data_model.dataset.Dataset.Dataset, instruction_name: str, yaml_location: str) → immuneML.environment.LabelConfiguration.LabelConfiguration[source]

immuneML.dsl.instruction_parsers.MLApplicationParser module

class immuneML.dsl.instruction_parsers.MLApplicationParser.MLApplicationParser[source]

Bases: object

Specification example for the MLApplication instruction:

instruction_name:
    type: MLApplication
    dataset: d1
    config_path: ./config.zip
    number_of_processes: 4
    label: CD

parse(key: str, instruction: dict, symbol_table: immuneML.dsl.symbol_table.SymbolTable.SymbolTable, path: pathlib.Path) → immuneML.workflows.instructions.ml_model_application.MLApplicationInstruction.MLApplicationInstruction[source]

immuneML.dsl.instruction_parsers.SimulationParser module

class immuneML.dsl.instruction_parsers.SimulationParser.SimulationParser[source]

Bases: object

YAML specification:

definitions:
    datasets:
        my_dataset:
            ...

    motifs:
        m1:
            seed: AAC # "/" character denotes the gap in the seed if present (e.g. AA/C)
            instantiation:
                GappedKmer:
                    # alphabet_weights: probability with which each alphabet letter replaces a seed letter
                    # when a nonzero Hamming distance is allowed
                    alphabet_weights:
                        A: 0.2
                        C: 0.2
                        D: 0.4
                        E: 0.2
                    # position_weights: relative probabilities of choosing each position in the seed for
                    # Hamming distance modification; they are scaled to sum to one
                    position_weights:
                        0: 1
                        1: 0
                        2: 0
                    hamming_distance_probabilities:
                        0: 0.5 # Hamming distance of 0 (no change) with probability 0.5
                        1: 0.5 # Hamming distance of 1 (one letter change) with probability 0.5
                    min_gap: 0
                    max_gap: 1
    signals:
        s1:
            motifs: # list of all motifs for signal which will be uniformly sampled to get a motif instance for implanting
                - m1
            sequence_position_weights: # likelihood of implanting at IMGT position of receptor sequence
                107: 0.5
            implanting: HealthySequence # choose only sequences with no other signals when implanting one of the motifs
    simulations:
        sim1: # one Simulation object consists of a dict of Implanting objects
            i1:
                dataset_implanting_rate: 0.5 # proportion of repertoires in which the signals will be implanted
                repertoire_implanting_rate: 0.01 # proportion of sequences within a repertoire in which the signals will be implanted
                signals:
                    - s1

instructions:
    my_simulation_instruction:
        type: Simulation
        dataset: my_dataset
        simulation: sim1
        export_formats: [AIRR, ImmuneML]

parse(key: str, instruction: dict, symbol_table: immuneML.dsl.symbol_table.SymbolTable.SymbolTable, path: Optional[pathlib.Path] = None) → immuneML.workflows.instructions.SimulationInstruction.SimulationInstruction[source]

parse_exporters(instruction)[source]
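
The GappedKmer parameters in the specification above interact as follows: a Hamming distance is drawn from hamming_distance_probabilities, that many seed positions are chosen according to the scaled position_weights, and each chosen letter is replaced according to alphabet_weights. The sketch below illustrates this reading using only the standard library. It is not immuneML's implementation: the function instantiate_motif is hypothetical, gaps (min_gap, max_gap) are ignored, and excluding the current letter from the replacement options is an assumption made so that the result has the drawn Hamming distance.

```python
import random


def instantiate_motif(seed, alphabet_weights, position_weights,
                      hamming_distance_probabilities, rng):
    """Hypothetical sketch: derive one motif instance from an ungapped seed."""
    # Scale the position weights so that they sum to one.
    total = sum(position_weights.values())
    scaled = {pos: weight / total for pos, weight in position_weights.items()}

    # Draw the number of letters to change.
    distances = list(hamming_distance_probabilities)
    distance = rng.choices(
        distances,
        weights=[hamming_distance_probabilities[d] for d in distances],
    )[0]

    letters = list(seed)
    # Only positions with a nonzero weight can be modified.
    candidates = [pos for pos, weight in scaled.items() if weight > 0]
    for _ in range(min(distance, len(candidates))):
        pos = rng.choices(candidates, weights=[scaled[p] for p in candidates])[0]
        candidates.remove(pos)  # modify each position at most once
        # Assumption: the current letter is excluded so the letter really changes.
        options = [a for a in alphabet_weights if a != letters[pos]]
        letters[pos] = rng.choices(
            options, weights=[alphabet_weights[a] for a in options]
        )[0]
    return "".join(letters)


rng = random.Random(0)  # fixed seed for reproducibility
instance = instantiate_motif(
    seed="AAC",
    alphabet_weights={"A": 0.2, "C": 0.2, "D": 0.4, "E": 0.2},
    position_weights={0: 1, 1: 0, 2: 0},
    hamming_distance_probabilities={0: 0.5, 1: 0.5},
    rng=rng,
)
```

With the weights from the specification, only position 0 can change, so under these assumptions every instance is either the seed AAC itself or one of CAC, DAC and EAC.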

immuneML.dsl.instruction_parsers.SubsamplingParser module

class immuneML.dsl.instruction_parsers.SubsamplingParser.SubsamplingParser[source]

Bases: object

parse(key: str, instruction: dict, symbol_table: immuneML.dsl.symbol_table.SymbolTable.SymbolTable, path: Optional[pathlib.Path] = None) → immuneML.workflows.instructions.subsampling.SubsamplingInstruction.SubsamplingInstruction[source]

immuneML.dsl.instruction_parsers.TrainMLModelParser module

class immuneML.dsl.instruction_parsers.TrainMLModelParser.TrainMLModelParser[source]

Bases: object

parse(key: str, instruction: dict, symbol_table: immuneML.dsl.symbol_table.SymbolTable.SymbolTable, path: Optional[pathlib.Path] = None) → immuneML.workflows.instructions.TrainMLModelInstruction.TrainMLModelInstruction[source]

Module contents