immuneML.workflows.instructions.subsampling package

Submodules

immuneML.workflows.instructions.subsampling.SubsamplingInstruction module

class immuneML.workflows.instructions.subsampling.SubsamplingInstruction.SubsamplingInstruction(dataset: Dataset, subsampled_dataset_sizes: List[int], dataset_export_formats: list, result_path: Path = None, name: str = None)[source]

Bases: Instruction

Subsampling is an instruction that subsamples a given dataset and creates multiple smaller datasets according to the parameters provided.

Specification arguments:

  • dataset (str): original dataset which will be used as a basis for subsampling

  • subsampled_dataset_sizes (list): a list of dataset sizes (number of examples) each subsampled dataset should have

  • dataset_export_formats (list): the formats in which to export the subsampled datasets. Valid formats are the class names of any non-abstract class inheriting from DataExporter.

YAML specification:

instructions:
    my_subsampling_instruction: # user-defined name of the instruction
        type: Subsampling # which instruction to execute
        dataset: my_dataset # original dataset to be subsampled, with e.g., 300 examples
        subsampled_dataset_sizes: # how large the subsampled datasets should be; one dataset is created for each list item
            - 200 # one subsampled dataset with 200 examples (200 repertoires if my_dataset is a repertoire dataset)
            - 100 # the other subsampled dataset will have 100 examples
        dataset_export_formats: # in which formats to export the subsampled datasets
            - ImmuneML
            - AIRR
export_dataset(new_dataset, new_dataset_path)[source]
static get_documentation()[source]
run(result_path: Path)[source]
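
The effect of subsampled_dataset_sizes in the YAML specification above can be illustrated with a small standalone sketch. This is not immuneML's implementation: the function name subsample_examples, the seed parameter, and the choice of sampling without replacement are assumptions made for illustration only. The sketch draws one random subset of example identifiers for each requested size, mirroring the 300-example dataset and the sizes 200 and 100 used in the specification.

```python
import random
from typing import List, Sequence


def subsample_examples(example_ids: Sequence[str], sizes: List[int], seed: int = 0) -> List[List[str]]:
    """Draw one random subsample per requested size (hypothetical helper, not immuneML API)."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    pool = list(example_ids)
    subsamples = []
    for size in sizes:
        # Assumption: a subsample cannot be larger than the original dataset.
        if not 0 < size <= len(pool):
            raise ValueError(f"Requested size {size} is not in the range 1..{len(pool)}.")
        # Assumption: examples are drawn without replacement, independently for each size.
        subsamples.append(rng.sample(pool, size))
    return subsamples


# Stand-in for my_dataset with 300 examples (e.g., 300 repertoires).
example_ids = [f"example_{i}" for i in range(300)]
subsamples = subsample_examples(example_ids, sizes=[200, 100])
print([len(s) for s in subsamples])
```

Each list item in sizes yields exactly one subsample, which corresponds to one exported dataset per entry in subsampled_dataset_sizes.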

immuneML.workflows.instructions.subsampling.SubsamplingState module

class immuneML.workflows.instructions.subsampling.SubsamplingState.SubsamplingState(dataset: immuneML.data_model.datasets.Dataset.Dataset, subsampled_dataset_sizes: List[int] = <factory>, dataset_exporters: List[immuneML.IO.dataset_export.DataExporter.DataExporter] = <factory>, result_path: pathlib.Path = None, name: str = None, subsampled_datasets: List[immuneML.data_model.datasets.Dataset.Dataset] = <factory>, subsampled_dataset_paths: dict = <factory>)[source]

Bases: object

dataset: Dataset
dataset_exporters: List[DataExporter]
name: str = None
result_path: Path = None
subsampled_dataset_paths: dict
subsampled_dataset_sizes: List[int]
subsampled_datasets: List[Dataset]
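
The <factory> defaults in the signature above are how Python renders dataclass fields declared with a default factory, which suggests that SubsamplingState is a dataclass. The following is a minimal stand-in sketch under that assumption, not the library's source: the class name SubsamplingStateSketch is hypothetical, and Any replaces the Dataset and DataExporter types so the example runs without immuneML installed.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, List, Optional


@dataclass
class SubsamplingStateSketch:
    """Hypothetical mirror of the documented fields; not the immuneML class."""
    dataset: Any  # stands in for Dataset
    subsampled_dataset_sizes: List[int] = field(default_factory=list)
    dataset_exporters: List[Any] = field(default_factory=list)  # stands in for List[DataExporter]
    result_path: Optional[Path] = None
    name: Optional[str] = None
    subsampled_datasets: List[Any] = field(default_factory=list)
    subsampled_dataset_paths: dict = field(default_factory=dict)


state_a = SubsamplingStateSketch(dataset="my_dataset", subsampled_dataset_sizes=[200, 100])
state_b = SubsamplingStateSketch(dataset="my_dataset")

# Each instance gets its own list, so filling one state does not affect another.
state_a.subsampled_datasets.append("subsample_200")
print(state_b.subsampled_datasets)
```

Using a default factory rather than a shared mutable default gives every state object fresh, independent containers for the subsampled datasets and their paths.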

Module contents