immuneML.reports.multi_dataset_reports package

Submodules

immuneML.reports.multi_dataset_reports.DiseaseAssociatedSequenceOverlap module

class immuneML.reports.multi_dataset_reports.DiseaseAssociatedSequenceOverlap.DiseaseAssociatedSequenceOverlap(instruction_states: Optional[List[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState]] = None, name: Optional[str] = None, result_path: Optional[pathlib.Path] = None)

Bases: immuneML.reports.multi_dataset_reports.MultiDatasetReport.MultiDatasetReport

DiseaseAssociatedSequenceOverlap report creates a heatmap showing the overlap of disease-associated sequences, as identified by SequenceAbundance encoders, across multiple datasets of different sizes (different numbers of repertoires per dataset).

This report can be used only with the MultiDatasetBenchmarkTool.

The overlap is computed by the following equation:

\[\mathrm{overlap}(X, Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)} \times 100\]

For details, see Greiff V, Menzel U, Miho E, et al. Systems Analysis Reveals High Genetic and Antigen-Driven Predetermination of Antibody Repertoires throughout B Cell Development. Cell Reports. 2017;19(7):1467-1478. doi:10.1016/j.celrep.2017.04.054.
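
For intuition, the following is a minimal sketch of this computation, assuming the disease-associated sequences recovered from two datasets are available as Python sets (the sequences below are made up):

def sequence_overlap(x: set, y: set) -> float:
    # Overlap coefficient between two sequence sets, in percent:
    # |X ∩ Y| / min(|X|, |Y|) * 100
    return len(x & y) / min(len(x), len(y)) * 100

# Hypothetical disease-associated sequences from two datasets:
dataset_a = {"CASSLGTDTQYF", "CASSPGQGNYEQYF", "CASSIRSSYEQYF"}
dataset_b = {"CASSLGTDTQYF", "CASSIRSSYEQYF"}

print(sequence_overlap(dataset_a, dataset_b))  # 100.0: the smaller set is fully contained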

YAML specification:

reports: # the report is defined with all other reports under definitions/reports
    my_overlap_report: DiseaseAssociatedSequenceOverlap # report has no parameters
classmethod build_object(**kwargs)

immuneML.reports.multi_dataset_reports.MultiDatasetReport module

class immuneML.reports.multi_dataset_reports.MultiDatasetReport.MultiDatasetReport(instruction_states: Optional[List[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState]] = None, name: Optional[str] = None, result_path: Optional[pathlib.Path] = None)

Bases: immuneML.reports.Report.Report

Multi dataset reports are special reports that can be specified when running immuneML with the MultiDatasetBenchmarkTool.

When running the MultiDatasetBenchmarkTool, multi dataset reports can be specified under ‘benchmark_reports’.

When using the reports with the MultiDatasetBenchmarkTool, the arguments defined below are set at runtime by the instruction. Concrete classes inheriting MultiDatasetReport may define additional parameters, which are set by the user as input arguments (a hypothetical subclass skeleton is sketched below).

Parameters
  • name (str) – user-defined name of the report used in the HTML overview automatically generated by the platform

  • result_path (Path) – location where the report results will be stored

  • instruction_states (list) – a list of states for each instruction that was run as a part of the tool, e.g., TrainMLModelState objects

static get_title()
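
To show how these pieces fit together, here is a hypothetical skeleton of a concrete multi-dataset report; MyBenchmarkReport and its threshold parameter are illustrative assumptions, not part of immuneML:

from pathlib import Path

from immuneML.reports.multi_dataset_reports.MultiDatasetReport import MultiDatasetReport


class MyBenchmarkReport(MultiDatasetReport):
    # Hypothetical report: only the documented constructor arguments and the
    # build_object classmethod are taken from this page.

    @classmethod
    def build_object(cls, **kwargs):
        # Concrete reports typically validate user-supplied parameters here
        # before constructing the report object.
        return cls(**kwargs)

    def __init__(self, threshold: float = 0.5, instruction_states: list = None,
                 name: str = None, result_path: Path = None):
        # instruction_states, name and result_path are set at runtime by the
        # instruction; threshold stands in for a user-defined parameter.
        super().__init__(instruction_states=instruction_states, name=name,
                         result_path=result_path)
        self.threshold = threshold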

immuneML.reports.multi_dataset_reports.PerformanceOverview module

class immuneML.reports.multi_dataset_reports.PerformanceOverview.PerformanceOverview(instruction_states: Optional[List[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState]] = None, name: Optional[str] = None, result_path: Optional[pathlib.Path] = None)

Bases: immuneML.reports.multi_dataset_reports.MultiDatasetReport.MultiDatasetReport

PerformanceOverview report creates an ROC plot and a precision-recall (PR) plot for the optimal trained models on multiple datasets. The plots are labeled with the dataset names, so it is worth choosing user-friendly names when defining the datasets, even though names may only combine letters, numbers and the underscore sign.

This report can be used only with the MultiDatasetBenchmarkTool, as it plots ROC and PR curves for trained models across datasets. It requires the task to be immune repertoire classification and cannot be used for receptor or sequence classification. Furthermore, it uses predictions on the test dataset to assess the performance and plot the curves. If the parameter refit_optimal_model is set to True, all data will be used to fit the optimal model, so there will be no test dataset left to assess performance, and the report will not be generated.

If datasets have the same number of examples, the baseline PR curve will be plotted as described in this publication: Saito T, Rehmsmeier M. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLOS ONE. 2015;10(3):e0118432. doi:10.1371/journal.pone.0118432

If the datasets have different numbers of examples, the baseline PR curve will not be plotted.
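
For context, the baseline of a PR plot is the horizontal line at precision equal to the positive class prevalence, which a random classifier attains at every recall level; a single shared baseline is therefore only drawn when the datasets have comparable example counts. A small illustration with made-up counts:

def pr_baseline(num_positives: int, num_negatives: int) -> float:
    # A random classifier's precision equals the positive class prevalence
    # at every recall level, so this value defines the PR baseline.
    return num_positives / (num_positives + num_negatives)

# Made-up repertoire counts for two hypothetical datasets:
print(pr_baseline(50, 50))    # 0.5 for a balanced dataset
print(pr_baseline(20, 180))   # 0.1 for an imbalanced one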

YAML specification:

reports:
    my_performance_report: PerformanceOverview
classmethod build_object(**kwargs)
plot_precision_recall(optimal_hp_items: list, label: immuneML.environment.Label.Label, colors)
plot_roc(optimal_hp_items, label: immuneML.environment.Label.Label, colors) → Tuple[immuneML.reports.ReportOutput.ReportOutput, List[immuneML.reports.ReportOutput.ReportOutput]]

Module contents