immuneML.reports.multi_dataset_reports package

Submodules

immuneML.reports.multi_dataset_reports.DiseaseAssociatedSequenceOverlap module

class immuneML.reports.multi_dataset_reports.DiseaseAssociatedSequenceOverlap.DiseaseAssociatedSequenceOverlap(instruction_states: List[TrainMLModelState] = None, name: str = None, result_path: Path = None, number_of_processes: int = 1)

Bases: MultiDatasetReport

The DiseaseAssociatedSequenceOverlap report produces a heatmap showing the overlap of disease-associated sequences (or k-mers) identified by the SequenceAbundanceEncoder, CompAIRRSequenceAbundanceEncoder or KmerAbundanceEncoder across multiple datasets of different sizes (that is, with different numbers of repertoires per dataset).

This report can be used only with MultiDatasetBenchmarkTool.

The overlap is computed by the following equation:

\mathrm{overlap}(X, Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)} \times 100

For details, see: Greiff V, Menzel U, Miho E, et al. Systems Analysis Reveals High Genetic and Antigen-Driven Predetermination of Antibody Repertoires throughout B Cell Development. Cell Reports. 2017;19(7):1467-1478. doi:10.1016/j.celrep.2017.04.054.
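The overlap equation above can be illustrated with a minimal Python sketch. This is not immuneML's implementation: the function name `overlap_percentage`, the handling of empty sets, and the example sequences are assumptions made purely for illustration.

```python
def overlap_percentage(x, y):
    """Percentage overlap of two collections, normalized by the smaller one."""
    x, y = set(x), set(y)
    if not x or not y:
        # Assumption: overlap with an empty set is reported as 0
        # instead of raising a division-by-zero error.
        return 0.0
    return len(x & y) / min(len(x), len(y)) * 100


# Made-up sequences standing in for the disease-associated sequences
# found in two datasets; they are not real results.
dataset_a = {"CASSLGTDTQYF", "CASSPGQGNTEAFF", "CASSLAPGATNEKLFF", "CASSQETQYF"}
dataset_b = {"CASSLGTDTQYF", "CASSQETQYF", "CASRDRGNYGYTF"}

# Two shared sequences, and the smaller set has three elements.
print(overlap_percentage(dataset_a, dataset_b))
```

Normalizing by the size of the smaller set means that a set fully contained in a larger one yields an overlap of 100, regardless of how many additional sequences the larger set contains.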

YAML specification:

definitions:
    reports:
        my_overlap_report: DiseaseAssociatedSequenceOverlap # report has no parameters

classmethod build_object(**kwargs)

Creates an object of a Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, some parameters are provided at parsing time, while other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

immuneML.reports.multi_dataset_reports.MultiDatasetReport module

class immuneML.reports.multi_dataset_reports.MultiDatasetReport.MultiDatasetReport(instruction_states: List[TrainMLModelState] = None, name: str = None, result_path: Path = None, number_of_processes: int = 1)

Bases: Report

Multi dataset reports are special reports that can be specified when running immuneML with the MultiDatasetBenchmarkTool. See Manuscript use case 1: Robustness assessment for an example.

When running the MultiDatasetBenchmarkTool, multi dataset reports can be specified under ‘benchmark_reports’. Example:

my_instruction:
    type: TrainMLModel
    benchmark_reports:
        - my_benchmark_report
    # other parameters...

DOCS_TITLE = 'Multi dataset reports'

__init__(instruction_states: List[TrainMLModelState] = None, name: str = None, result_path: Path = None, number_of_processes: int = 1)

When using the reports with MultiDatasetBenchmarkTool, the arguments defined below are set at runtime by the instruction.

Concrete classes inheriting MultiDatasetReport may include additional parameters that will be set by the user in the form of input arguments.

name (str): user-defined name of the report used in the HTML overview automatically generated by the platform

result_path (Path): location where the report results will be stored

instruction_states (list): a list of states for each instruction that was run as a part of the tool, e.g., TrainMLModelState objects

number_of_processes (int): how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.

immuneML.reports.multi_dataset_reports.PerformanceOverview module

class immuneML.reports.multi_dataset_reports.PerformanceOverview.PerformanceOverview(instruction_states: List[TrainMLModelState] = None, name: str = None, result_path: Path = None, number_of_processes: int = 1)

Bases: MultiDatasetReport

The PerformanceOverview report creates an ROC plot and a precision-recall (PR) plot for the optimal trained models on multiple datasets. The labels on the plots are the names of the datasets, so it helps to choose readable names when defining the datasets. The names must still consist only of letters, numbers and underscores.
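The naming constraint for datasets can be checked ahead of time with a small Python sketch. It only illustrates the "letters, numbers and underscore" rule stated above; the helper `is_valid_dataset_name` is hypothetical and is not part of immuneML, whose own validation may differ.

```python
import re

# Assumption: a valid name is a non-empty run of ASCII letters,
# digits and underscores, as described in the text above.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the whole name matches the allowed characters."""
    return NAME_PATTERN.fullmatch(name) is not None


# Hypothetical dataset names used only for illustration.
for candidate in ["cmv_cohort_1", "CMV cohort #1"]:
    print(candidate, is_valid_dataset_name(candidate))
```

A name such as `cmv_cohort_1` stays readable as a plot label while still satisfying the constraint.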

This report can be used only with MultiDatasetBenchmarkTool, as it plots ROC and PR curves for trained models across datasets. It also requires the task to be immune repertoire classification and cannot be used for receptor or sequence classification. The report uses predictions on the test dataset to assess performance and plot the curves. If the parameter refit_optimal_model is set to True, all data is used to fit the optimal model, so no test dataset remains for assessing performance and the report is not generated.

If the datasets have the same number of examples, the baseline PR curve will be plotted as described in this publication: Saito T, Rehmsmeier M. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLOS ONE. 2015;10(3):e0118432. doi:10.1371/journal.pone.0118432.

If the datasets have different numbers of examples, the baseline PR curve will not be plotted.
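In the cited publication, the baseline of a PR plot is a horizontal line at the fraction of positive examples, P / (P + N). The following Python sketch computes that value. The function name `pr_baseline` and the example labels are assumptions made for illustration, and the sketch does not reproduce how PerformanceOverview draws the baseline internally.

```python
def pr_baseline(labels, positive_class=1):
    """Baseline precision of a PR plot: the fraction of positive examples."""
    labels = list(labels)
    if not labels:
        raise ValueError("labels must contain at least one example")
    positives = sum(1 for label in labels if label == positive_class)
    return positives / len(labels)


# Made-up test labels for two datasets with different class balance.
balanced = [1, 1, 0, 0]
imbalanced = [1, 0, 0, 0]

print(pr_baseline(balanced))    # 2 positives out of 4 examples
print(pr_baseline(imbalanced))  # 1 positive out of 4 examples
```

Because the baseline depends on the class balance of each dataset, one baseline describes all curves in a plot only when the datasets share the same composition.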