immuneML.workflows.instructions.exploratory_analysis package

Submodules

immuneML.workflows.instructions.exploratory_analysis.ExploratoryAnalysisInstruction module

class immuneML.workflows.instructions.exploratory_analysis.ExploratoryAnalysisInstruction.ExploratoryAnalysisInstruction(exploratory_analysis_units: dict, name: str = None)[source]

Bases: Instruction

Allows exploratory analysis of different datasets using encodings and reports.

The analysis is defined by a dictionary of ExploratoryAnalysisUnit objects, each of which encapsulates a dataset, an optional encoding, and a report to be executed on the (optionally encoded) dataset. Each analysis specified under analyses is independent of all the others.

Specification arguments:

  • analyses (dict): a dictionary of analyses to perform. The keys are the names of different analyses, and the values for each of the analyses are:

    • dataset: dataset on which to perform the exploratory analysis

    • preprocessing_sequence: which preprocessing sequence to apply to the dataset (optional).

    • example_weighting: which example weighting strategy to use before encoding the data (optional).

    • encoding: how to encode the dataset before running the report (optional).

    • labels: if encoding is specified, the relevant labels should be specified here.

    • dim_reduction: which dimensionality reduction method to apply to the encoded dataset (optional).

    • report: which report to run on the dataset. Reports specified here may be of the category Data reports or Encoding reports, depending on whether ‘encoding’ was specified.

  • number_of_processes (int): how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.
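
The rules in the list above can be checked before a specification is passed to immuneML. The following is a minimal sketch in plain Python and is not immuneML's own validation: the function name check_analyses and the message wording are assumptions made for illustration. It treats dataset and report as required, the remaining keys as optional, and only warns when encoding is given without labels.

```python
from __future__ import annotations

# Keys taken from the specification arguments listed above.
ALLOWED_KEYS = {"dataset", "preprocessing_sequence", "example_weighting",
                "encoding", "labels", "dim_reduction", "report"}
REQUIRED_KEYS = {"dataset", "report"}


def check_analyses(analyses: dict) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a dict shaped like the `analyses` argument.

    Hypothetical helper: it mirrors the documented rules, not immuneML's parser.
    """
    errors, warnings = [], []
    for name, spec in analyses.items():
        keys = set(spec)
        for key in sorted(REQUIRED_KEYS - keys):
            errors.append(f"{name}: missing required key '{key}'")
        for key in sorted(keys - ALLOWED_KEYS):
            errors.append(f"{name}: unknown key '{key}'")
        # Labels are recommended, not enforced, when an encoding is given.
        if "encoding" in spec and not spec.get("labels"):
            warnings.append(f"{name}: 'encoding' is set but no 'labels' are listed")
    return errors, warnings


analyses = {
    "my_first_analysis": {"dataset": "d1", "preprocessing_sequence": "p1", "report": "r1"},
    "my_second_analysis": {"dataset": "d1", "encoding": "e1", "report": "r2",
                           "labels": ["celiac", "CMV"]},
    # Deliberately incomplete: no report and no labels.
    "my_third_analysis": {"dataset": "d1", "encoding": "e1", "dim_reduction": "umap"},
}

errors, warnings = check_analyses(analyses)
for message in errors + warnings:
    print(message)
```

Only the deliberately incomplete third entry is reported. The first two pass because every key other than dataset and report is optional.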

YAML specification:

instructions:
    my_expl_analysis_instruction: # user-defined instruction name
        type: ExploratoryAnalysis # which instruction to execute
        analyses: # analyses to perform
            my_first_analysis: # user-defined name of the analysis
                dataset: d1 # dataset to use in the first analysis
                preprocessing_sequence: p1 # preprocessing sequence to use in the first analysis
                report: r1 # which report to generate using the dataset d1
            my_second_analysis: # user-defined name of another analysis
                dataset: d1 # dataset to use in the second analysis - can be the same or different from other analyses
                encoding: e1 # encoding to apply on the specified dataset (d1)
                report: r2 # which report to generate in the second analysis
                labels: # labels present in the dataset d1 which will be included in the encoded data on which report r2 will be run
                    - celiac # name of the first label as present in the column of dataset's metadata file
                    - CMV # name of the second label as present in the column of dataset's metadata file
            my_third_analysis: # user-defined name of another analysis
                dataset: d1 # dataset to use in the third analysis - can be the same or different from other analyses
                encoding: e1 # encoding to apply on the specified dataset (d1)
                dim_reduction: umap # or None; which dimensionality reduction method to apply to encoded d1
                report: r3 # which report to generate in the third analysis
        number_of_processes: 4 # number of parallel processes to create (could speed up the computation)
encode(unit: ExploratoryAnalysisUnit, result_path: Path) Dataset[source]
preprocess_dataset(unit: ExploratoryAnalysisUnit, result_path: Path) Dataset[source]
run(result_path: Path)[source]
run_report(unit: ExploratoryAnalysisUnit, result_path: Path)[source]
run_unit(unit: ExploratoryAnalysisUnit, result_path: Path) ReportResult[source]
weight_examples(unit: ExploratoryAnalysisUnit, result_path: Path)[source]
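
The method names above, read together with the argument descriptions, suggest an order in which one analysis unit is processed: preprocessing, example weighting, encoding, optional dimensionality reduction, then the report. The sketch below illustrates that ordering and the independence of units using stand-in callables. ToyUnit, run_toy_unit and run_all are hypothetical names, every step is simplified to a function from data to data, and none of this reproduces immuneML's actual implementation or signatures.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Step = Callable[[list], list]


@dataclass
class ToyUnit:
    # Stand-in for an analysis unit: only the dataset and the report are mandatory.
    dataset: list
    report: Callable[[list], str]
    preprocessing_sequence: list = field(default_factory=list)
    example_weighting: Optional[Step] = None
    encoder: Optional[Step] = None
    dim_reduction: Optional[Step] = None


def run_toy_unit(unit: ToyUnit) -> str:
    """Apply the optional steps in the assumed order, then run the report."""
    data = unit.dataset
    for preprocessing in unit.preprocessing_sequence:
        data = preprocessing(data)
    # Steps left as None are skipped, matching their optional status.
    for step in (unit.example_weighting, unit.encoder, unit.dim_reduction):
        if step is not None:
            data = step(data)
    return unit.report(data)


def run_all(units: dict) -> dict:
    # No unit reads another unit's output, so each one is run on its own.
    return {name: run_toy_unit(unit) for name, unit in units.items()}


sequences = ["CASSL", "CAS", "CASSLGQ"]
units = {
    "raw_summary": ToyUnit(dataset=sequences,
                           report=lambda d: f"{len(d)} sequences"),
    "length_summary": ToyUnit(dataset=sequences,
                              preprocessing_sequence=[lambda d: [s for s in d if len(s) > 3]],
                              encoder=lambda d: [len(s) for s in d],
                              report=lambda d: f"lengths: {d}"),
}
results = run_all(units)
print(results)
```

Both toy units share the same input list, yet neither affects the other's result, which is the property the instruction describes for analyses listed under analyses.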

immuneML.workflows.instructions.exploratory_analysis.ExploratoryAnalysisState module

class immuneML.workflows.instructions.exploratory_analysis.ExploratoryAnalysisState.ExploratoryAnalysisState(exploratory_analysis_units: dict, result_path: pathlib.Path = None, name: str = None)[source]

Bases: object

exploratory_analysis_units: dict
name: str = None
result_path: Path = None

immuneML.workflows.instructions.exploratory_analysis.ExploratoryAnalysisUnit module

class immuneML.workflows.instructions.exploratory_analysis.ExploratoryAnalysisUnit.ExploratoryAnalysisUnit(dataset: immuneML.data_model.datasets.Dataset.Dataset, report: immuneML.reports.Report.Report, preprocessing_sequence: list = None, encoder: immuneML.encodings.DatasetEncoder.DatasetEncoder = None, example_weighting: immuneML.example_weighting.ExampleWeightingStrategy.ExampleWeightingStrategy = None, label_config: immuneML.environment.LabelConfiguration.LabelConfiguration = None, number_of_processes: int = 1, report_result: immuneML.reports.ReportResult.ReportResult = None, dim_reduction: immuneML.ml_methods.dim_reduction.DimRedMethod.DimRedMethod = None)[source]

Bases: object

dataset: Dataset
dim_reduction: DimRedMethod = None
encoder: DatasetEncoder = None
example_weighting: ExampleWeightingStrategy = None
label_config: LabelConfiguration = None
number_of_processes: int = 1
preprocessing_sequence: list = None
report: Report
report_result: ReportResult = None

Module contents