immuneML.workflows.instructions.split_dataset package
Submodules
immuneML.workflows.instructions.split_dataset.SplitDatasetInstruction module
- class immuneML.workflows.instructions.split_dataset.SplitDatasetInstruction.SplitDatasetInstruction(state: SplitDatasetState)
Bases: Instruction

This instruction splits a dataset into two parts as defined by the instruction parameters. It can be used as a first step in clustering to obtain discovery and validation datasets, or to hold out a test dataset for classification.
For classification, the TrainMLModel instruction supports more complex data splitting (e.g., nested cross-validation with different data splitting strategies).
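To illustrate what a two-way random split driven by a training percentage means, here is a minimal sketch. It assumes the items are shuffled with a seeded random generator and that the first `training_percentage` fraction becomes the training (or discovery) part. `random_split` is a hypothetical helper, not part of immuneML, and immuneML's own shuffling and rounding may differ.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(items: Sequence[T], training_percentage: float,
                 seed: int = 42) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle items and split them into two disjoint parts.

    Illustrates a random split with a training percentage; this is not
    immuneML's implementation.
    """
    if not 0.0 < training_percentage < 1.0:
        raise ValueError("training_percentage must be strictly between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local, seeded RNG keeps the split reproducible
    n_train = int(len(shuffled) * training_percentage)  # round down the training size
    return shuffled[:n_train], shuffled[n_train:]


train, test = random_split([f"repertoire_{i}" for i in range(10)], training_percentage=0.5)
print(len(train), len(test))  # 5 5
```

Using a dedicated `random.Random(seed)` instance instead of the global generator keeps the split reproducible without affecting other code that uses `random`.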
Specification arguments:
- dataset (str): name of the dataset to split, as defined previously in the analysis specification
- split_config (SplitConfig): the split configuration; split_count must be 1, since the instruction produces exactly one split into two datasets
YAML specification:

    instructions:
      split_dataset1:
        type: SplitDataset
        dataset: d1
        split_config:
          split_count: 1
          split_strategy: random
          training_percentage: 0.5
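Once parsed, the YAML specification is an ordinary nested mapping. As a minimal sketch, assuming the parsed form is a plain Python dict, the documented constraints (the dataset must name a previously defined dataset, and split_count must be 1) can be checked as below. `check_split_dataset_instruction` is a hypothetical function for illustration, not an immuneML function; immuneML performs its own specification parsing.

```python
from typing import Any, Dict, List

# The YAML example above, written out as the nested dict a YAML parser would produce.
SPEC: Dict[str, Any] = {
    "instructions": {
        "split_dataset1": {
            "type": "SplitDataset",
            "dataset": "d1",
            "split_config": {
                "split_count": 1,
                "split_strategy": "random",
                "training_percentage": 0.5,
            },
        }
    }
}


def check_split_dataset_instruction(name: str, params: Dict[str, Any],
                                    defined_datasets: List[str]) -> List[str]:
    """Hypothetical checker for the documented constraints; returns a list of problems."""
    problems = []
    if params.get("type") != "SplitDataset":
        problems.append(f"{name}: type must be 'SplitDataset'")
    if params.get("dataset") not in defined_datasets:
        problems.append(f"{name}: dataset {params.get('dataset')!r} is not defined")
    split_config = params.get("split_config", {})
    if split_config.get("split_count") != 1:
        problems.append(f"{name}: split_config.split_count must be 1")
    return problems


for instr_name, instr_params in SPEC["instructions"].items():
    print(instr_name, check_split_dataset_instruction(instr_name, instr_params, ["d1"]))
# prints: split_dataset1 []
```

Collecting problems into a list rather than raising on the first one reports every violated constraint at once.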
- run(result_path: Path) → SplitDatasetState
- class immuneML.workflows.instructions.split_dataset.SplitDatasetInstruction.SplitDatasetState(dataset: immuneML.data_model.datasets.Dataset.Dataset, split_config: immuneML.hyperparameter_optimization.config.SplitConfig.SplitConfig, name: str = None, result_path: pathlib.Path = None, train_data_path: pathlib.Path = None, test_data_path: pathlib.Path = None)
Bases: object
- name: str = None
- result_path: Path = None
- split_config: SplitConfig
- test_data_path: Path = None
- train_data_path: Path = None
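The signatures above suggest a simple contract: run() receives a result path and returns the state, whose train_data_path and test_data_path default to None. The following sketch mirrors that contract with a plain dataclass. `SplitState`, the standalone `run` function, the `train.txt`/`test.txt` file names, and the simplification of `split_config` to a single `training_percentage` are all hypothetical choices for illustration. The real instruction operates on immuneML Dataset objects, and how it stores the resulting datasets is not shown here.

```python
import random
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class SplitState:
    """Hypothetical stand-in for SplitDatasetState (dataset simplified to a list of ids)."""
    dataset: List[str]
    training_percentage: float  # assumption: replaces the real split_config
    name: Optional[str] = None
    result_path: Optional[Path] = None
    train_data_path: Optional[Path] = None
    test_data_path: Optional[Path] = None


def run(state: SplitState, result_path: Path, seed: int = 0) -> SplitState:
    """Hypothetical run(): split the ids, write each part to disk, record the paths."""
    state.result_path = result_path
    result_path.mkdir(parents=True, exist_ok=True)
    ids = list(state.dataset)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * state.training_percentage)
    state.train_data_path = result_path / "train.txt"
    state.test_data_path = result_path / "test.txt"
    state.train_data_path.write_text("\n".join(ids[:n_train]))
    state.test_data_path.write_text("\n".join(ids[n_train:]))
    return state


with tempfile.TemporaryDirectory() as tmp:
    state = run(SplitState(dataset=[f"r{i}" for i in range(4)], training_percentage=0.5,
                           name="split_dataset1"), Path(tmp) / "split_dataset1")
    print(state.train_data_path.name, state.test_data_path.name)  # train.txt test.txt
```

Returning the mutated state, rather than only the paths, matches the `run(result_path: Path) → SplitDatasetState` signature: downstream steps can read every output location from one object.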