immuneML.workflows.instructions.ml_model_application package
Submodules
immuneML.workflows.instructions.ml_model_application.MLApplicationInstruction module
- class immuneML.workflows.instructions.ml_model_application.MLApplicationInstruction.MLApplicationInstruction(dataset: Dataset, label_configuration: LabelConfiguration, hp_setting: HPSetting, metrics: List[ClassificationMetric], number_of_processes: int, name: str)
Bases: Instruction
Instruction that applies previously trained ML models and encoders to new datasets, which do not need to be labeled. If the dataset provides the same label that the ML setting was trained for, performance metrics can also be computed.
The predictions are stored in predictions.csv in the result path, in the following format:
example_id  cmv_predicted_class  cmv_1_proba  cmv_0_proba
e1          1                    0.8          0.2
e2          0                    0.2          0.8
e3          1                    0.78         0.22
If the label that the ML setting was trained for is present in the provided dataset, the ‘true’ label value is added to the predictions table as an extra column:
example_id  cmv_predicted_class  cmv_1_proba  cmv_0_proba  cmv_true_class
e1          1                    0.8          0.2          1
e2          0                    0.2          0.8          0
e3          1                    0.78         0.22         0
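As an illustration of how the extra column can be used, the following minimal sketch reads the table above and computes the fraction of correct predictions. It assumes that predictions.csv is comma-separated with a header row; the helper name accuracy_from_predictions is hypothetical and not part of immuneML, and the sample rows are copied from the table above.

```python
import csv
import io

# Sample content copied from the table above. A real run would read
# predictions.csv from the result path instead (assumed comma-separated).
PREDICTIONS_CSV = """example_id,cmv_predicted_class,cmv_1_proba,cmv_0_proba,cmv_true_class
e1,1,0.8,0.2,1
e2,0,0.2,0.8,0
e3,1,0.78,0.22,0
"""


def accuracy_from_predictions(csv_text, label="cmv"):
    """Hypothetical helper: fraction of rows where predicted == true class.

    Returns None when the <label>_true_class column is absent, which is
    the case when the dataset did not contain the label.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    true_col = f"{label}_true_class"
    pred_col = f"{label}_predicted_class"
    if not rows or true_col not in rows[0]:
        return None
    correct = sum(1 for row in rows if row[pred_col] == row[true_col])
    return correct / len(rows)


accuracy = accuracy_from_predictions(PREDICTIONS_CSV)
print(accuracy)
```

In the sample, e1 and e2 are predicted correctly and e3 is not, so two of the three rows match.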
Specification arguments:
dataset: dataset for which examples need to be classified
config_path: path to the zip file exported from the MLModelTraining instruction (which includes the trained ML model, encoder, preprocessing, etc.)
number_of_processes (int): how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.
metrics (list): a list of metrics to compute between the true and predicted classes. These metrics will only be computed when the same label with the same classes is provided for the dataset as the original label the ML setting was trained for.
YAML specification:
    instructions:
      instruction_name:
        type: MLApplication
        dataset: d1
        config_path: ./config.zip
        metrics:
        - accuracy
        - precision
        - recall
        number_of_processes: 4
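Before launching an analysis, it can be useful to check a specification like the one above for obvious mistakes. The sketch below is only an illustration: preflight_problems is a hypothetical helper, the rules it applies are assumptions based on the argument descriptions above, and it does not reproduce immuneML's own validation.

```python
import tempfile
import zipfile
from pathlib import Path


def preflight_problems(spec):
    """Hypothetical helper: list problems found in one instruction spec."""
    problems = []
    if spec.get("type") != "MLApplication":
        problems.append("type must be MLApplication")
    if not spec.get("dataset"):
        problems.append("dataset is missing")
    config_path = spec.get("config_path")
    # is_zipfile returns False for missing files and for non-zip files.
    if config_path is None or not zipfile.is_zipfile(config_path):
        problems.append("config_path does not point to a readable zip file")
    processes = spec.get("number_of_processes")
    # Assumption: the sketch treats number_of_processes as required.
    if not isinstance(processes, int) or processes < 1:
        problems.append("number_of_processes must be a positive integer")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in archive; a real config.zip comes from the training instruction.
    zip_path = Path(tmp) / "config.zip"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("placeholder.txt", "stand-in for an exported configuration")

    spec = {
        "type": "MLApplication",
        "dataset": "d1",
        "config_path": str(zip_path),
        "metrics": ["accuracy", "precision", "recall"],
        "number_of_processes": 4,
    }
    ok_problems = preflight_problems(spec)
    missing_problems = preflight_problems(
        {**spec, "config_path": str(Path(tmp) / "absent.zip")}
    )

print(ok_problems)
print(missing_problems)
```

The check only confirms that config_path is a zip archive; whether the archive holds a valid exported model is left to immuneML itself.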
immuneML.workflows.instructions.ml_model_application.MLApplicationState module
- class immuneML.workflows.instructions.ml_model_application.MLApplicationState.MLApplicationState(dataset: immuneML.data_model.datasets.Dataset.Dataset, hp_setting: immuneML.hyperparameter_optimization.HPSetting.HPSetting, label_config: immuneML.environment.LabelConfiguration.LabelConfiguration, pool_size: int, name: str, metrics: list = None, path: pathlib.Path = None, predictions_path: pathlib.Path = None, metrics_path: pathlib.Path = None)
Bases: object
- dataset: Dataset
- label_config: LabelConfiguration
- metrics: list = None
- metrics_path: Path = None
- name: str
- path: Path = None
- pool_size: int
- predictions_path: Path = None
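The signature above has five required fields followed by four optional ones that default to None. The sketch below mirrors that shape with a plain dataclass to show how such a state object could be created and filled in later. It is an assumption for illustration only: ApplicationStateSketch is a hypothetical stand-in, the immuneML types are replaced by Any, and the real class may be implemented differently.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass
class ApplicationStateSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""
    dataset: Any        # stand-in for Dataset
    hp_setting: Any     # stand-in for HPSetting
    label_config: Any   # stand-in for LabelConfiguration
    pool_size: int
    name: str
    metrics: Optional[list] = None
    path: Optional[Path] = None
    predictions_path: Optional[Path] = None
    metrics_path: Optional[Path] = None


# Required fields are given up front; output paths start as None.
state = ApplicationStateSketch(
    dataset="d1",
    hp_setting=None,
    label_config=None,
    pool_size=4,
    name="instruction_name",
)

# Paths can be filled in once the result location is known.
state.path = Path("result")
state.predictions_path = state.path / "predictions.csv"
print(state.predictions_path)
```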