immuneML.workflows.instructions.ml_model_application package

Submodules

immuneML.workflows.instructions.ml_model_application.MLApplicationInstruction module

class immuneML.workflows.instructions.ml_model_application.MLApplicationInstruction.MLApplicationInstruction(dataset: Dataset, label_configuration: LabelConfiguration, hp_setting: HPSetting, metrics: List[ClassificationMetric], number_of_processes: int, name: str)

Bases: Instruction

Instruction that applies previously trained ML models and encoders to new datasets, which do not need to be labeled. If the dataset contains the label that the ML setting was trained for, performance metrics can also be computed.

The predictions are stored in predictions.csv under the result path, in the following format:

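==========  ===================  ===========  ===========
example_id  cmv_predicted_class  cmv_1_proba  cmv_0_proba
==========  ===================  ===========  ===========
e1          1                    0.8          0.2
e2          0                    0.2          0.8
e3          1                    0.78         0.22
==========  ===================  ===========  ===========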

If the label that the ML setting was trained for is present in the provided dataset, the ‘true’ label value is also added to the predictions table:

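==========  ===================  ===========  ===========  ==============
example_id  cmv_predicted_class  cmv_1_proba  cmv_0_proba  cmv_true_class
==========  ===================  ===========  ===========  ==============
e1          1                    0.8          0.2          1
e2          0                    0.2          0.8          0
e3          1                    0.78         0.22         0
==========  ===================  ===========  ===========  ==============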
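As a minimal sketch of how the predictions file might be consumed downstream, the snippet below parses a table with the columns shown above using Python's standard csv module and computes accuracy when the true-class column is present. The inline sample data, the accuracy_from_predictions helper, and the assumption that the file is comma-separated with a header row are illustrative only and are not part of immuneML.

```python
import csv
import io

# Sample content mirroring the example table; a real run would open
# predictions.csv from the result path instead (assumed comma-separated).
SAMPLE = """example_id,cmv_predicted_class,cmv_1_proba,cmv_0_proba,cmv_true_class
e1,1,0.8,0.2,1
e2,0,0.2,0.8,0
e3,1,0.78,0.22,0
"""


def accuracy_from_predictions(handle, label="cmv"):
    """Hypothetical helper: return accuracy for `label`, or None when the
    file has no '<label>_true_class' column or no rows."""
    reader = csv.DictReader(handle)
    predicted_col = f"{label}_predicted_class"
    true_col = f"{label}_true_class"
    # Without the true-class column there is nothing to compare against.
    if reader.fieldnames is None or true_col not in reader.fieldnames:
        return None
    rows = list(reader)
    if not rows:
        return None
    correct = sum(1 for row in rows if row[predicted_col] == row[true_col])
    return correct / len(rows)


accuracy = accuracy_from_predictions(io.StringIO(SAMPLE))
print(accuracy)
```

For a predictions file without the cmv_true_class column, the helper returns None, which matches the case where the dataset was not labeled.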

Specification arguments:

  • dataset: dataset for which examples need to be classified

  • config_path: path to the zip file exported from the MLModelTraining instruction (which includes the trained ML model, encoder, preprocessing, etc.)

  • number_of_processes (int): how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.

  • metrics (list): a list of metrics to compute between the true and predicted classes. These metrics are computed only when the dataset provides the same label, with the same classes, as the label the ML setting was trained for.
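The condition on the metrics argument can be illustrated with a small check. This is only a sketch of one plausible reading, in which "same classes" means the sets of class values are equal; the metrics_applicable function and its arguments are hypothetical and do not reflect how immuneML implements the check internally.

```python
def metrics_applicable(trained_label, trained_classes, dataset_labels):
    """Hypothetical check: True when the new dataset carries the trained
    label with exactly the same set of classes.

    dataset_labels maps each label name in the new dataset to the class
    values observed for it.
    """
    if trained_label not in dataset_labels:
        return False
    # Assumption: class sets must match exactly for metrics to be meaningful.
    return set(dataset_labels[trained_label]) == set(trained_classes)


# Same label and classes: metrics could be computed.
same = metrics_applicable("cmv", [0, 1], {"cmv": [1, 0]})
# Label missing from the new dataset: predictions only.
missing = metrics_applicable("cmv", [0, 1], {"age": [30, 40]})
# Label present but with different classes: predictions only.
different = metrics_applicable("cmv", [0, 1], {"cmv": ["pos", "neg"]})
print(same, missing, different)
```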

YAML specification:

instructions:
    instruction_name:
        type: MLApplication
        dataset: d1
        config_path: ./config.zip
        metrics:
        - accuracy
        - precision
        - recall
        number_of_processes: 4

static get_documentation()
run(result_path: Path)

immuneML.workflows.instructions.ml_model_application.MLApplicationState module

class immuneML.workflows.instructions.ml_model_application.MLApplicationState.MLApplicationState(dataset: immuneML.data_model.datasets.Dataset.Dataset, hp_setting: immuneML.hyperparameter_optimization.HPSetting.HPSetting, label_config: immuneML.environment.LabelConfiguration.LabelConfiguration, pool_size: int, name: str, metrics: list = None, path: pathlib.Path = None, predictions_path: pathlib.Path = None, metrics_path: pathlib.Path = None)

Bases: object

dataset: Dataset
hp_setting: HPSetting
label_config: LabelConfiguration
metrics: list = None
metrics_path: Path = None
name: str
path: Path = None
pool_size: int
predictions_path: Path = None

Module contents