immuneML.workflows.instructions.split_dataset package

Submodules

immuneML.workflows.instructions.split_dataset.SplitDatasetInstruction module

class immuneML.workflows.instructions.split_dataset.SplitDatasetInstruction.SplitDatasetInstruction(state: SplitDatasetState)

Bases: Instruction

This instruction splits a dataset into two parts, as defined by the instruction parameters. It can be used as a first step in clustering to obtain discovery and validation datasets, or to set aside a test dataset for classification.

For classification, the TrainMLModel instruction can be used instead when more complex data splitting is needed (e.g., nested cross-validation with different data splitting strategies).

Specification arguments:

  • dataset (str): name of the dataset to split, as defined previously in the analysis specification

  • split_config (SplitConfig): the split configuration; split_count has to be 1
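
As a rough illustration of the two arguments above, the specification can be written as a Python dictionary and checked before it is used. This is only a sketch: the helper name check_split_dataset_spec is hypothetical and not part of immuneML, and the check covers only what is stated above (dataset is a name given as a string, and split_count has to be 1).

```python
# Hypothetical helper, not part of immuneML: it checks only the two
# documented arguments of the SplitDataset instruction.
def check_split_dataset_spec(spec: dict) -> None:
    # dataset must be the name (a string) of a previously defined dataset
    if not isinstance(spec.get("dataset"), str):
        raise ValueError("dataset must be the name of a previously defined dataset")
    # split_count has to be 1 for this instruction
    split_count = spec.get("split_config", {}).get("split_count")
    if split_count != 1:
        raise ValueError(f"split_count has to be 1, got {split_count}")


spec = {
    "type": "SplitDataset",
    "dataset": "d1",
    "split_config": {
        "split_count": 1,
        "split_strategy": "random",
        "training_percentage": 0.5,
    },
}

check_split_dataset_spec(spec)  # raises ValueError if a constraint is violated
```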

YAML specification:

instructions:
    split_dataset1:
        type: SplitDataset
        dataset: d1
        split_config:
            split_count: 1
            split_strategy: random
            training_percentage: 0.5
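
To make the effect of split_strategy: random with training_percentage: 0.5 more concrete, the following is a minimal conceptual sketch of a single random split using only the Python standard library. It is not immuneML's implementation: the function random_split, its seed parameter, and the use of plain integer ids in place of dataset examples are all assumptions made for illustration.

```python
import random


# Conceptual sketch only, not immuneML code: one random split (split_count: 1)
# of example ids into a training part and a test part.
def random_split(example_ids, training_percentage, seed=0):
    if not 0 < training_percentage < 1:
        raise ValueError("training_percentage must be between 0 and 1")
    ids = list(example_ids)
    # shuffle a copy with a fixed seed so the split is reproducible
    random.Random(seed).shuffle(ids)
    # the first part goes to training, the remainder to test
    cut = round(len(ids) * training_percentage)
    return ids[:cut], ids[cut:]


train_ids, test_ids = random_split(range(10), training_percentage=0.5)
print(len(train_ids), len(test_ids))
```

With ten examples and training_percentage set to 0.5, each part receives five examples, and no example appears in both parts.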

run(result_path: Path) → SplitDatasetState

class immuneML.workflows.instructions.split_dataset.SplitDatasetInstruction.SplitDatasetState(dataset: immuneML.data_model.datasets.Dataset.Dataset, split_config: immuneML.hyperparameter_optimization.config.SplitConfig.SplitConfig, name: str = None, result_path: pathlib.Path = None, train_data_path: pathlib.Path = None, test_data_path: pathlib.Path = None)

Bases: object

dataset: Dataset
name: str = None
result_path: Path = None
split_config: SplitConfig
test_data_path: Path = None
train_data_path: Path = None

Module contents