immuneML.reports.multi_dataset_reports package

Submodules

immuneML.reports.multi_dataset_reports.DiseaseAssociatedSequenceOverlap module

class immuneML.reports.multi_dataset_reports.DiseaseAssociatedSequenceOverlap.DiseaseAssociatedSequenceOverlap(instruction_states: Optional[List[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState]] = None, name: Optional[str] = None, result_path: Optional[pathlib.Path] = None, number_of_processes: int = 1)[source]

Bases: immuneML.reports.multi_dataset_reports.MultiDatasetReport.MultiDatasetReport

The DiseaseAssociatedSequenceOverlap report creates a heatmap showing the overlap of disease-associated sequences produced by SequenceAbundance encoders across multiple datasets of different sizes (different numbers of repertoires per dataset).

This report can be used only with MultiDatasetBenchmarkTool.

The overlap is computed by the following equation:

\[\mathrm{overlap}(X, Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)} \times 100\]

For details, see Greiff V, Menzel U, Miho E, et al. Systems Analysis Reveals High Genetic and Antigen-Driven Predetermination of Antibody Repertoires throughout B Cell Development. Cell Reports. 2017;19(7):1467-1478. doi:10.1016/j.celrep.2017.04.054.
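
The metric is straightforward to compute; as a minimal sketch (not the immuneML implementation), where X and Y stand for the sets of disease-associated sequences recovered from two datasets:

def overlap(x: set, y: set) -> float:
    """Percentage overlap between two sequence sets, as defined above."""
    return 100 * len(x & y) / min(len(x), len(y))

# 30 shared sequences, |X| = 50, |Y| = 120 -> 30 / 50 * 100 = 60.0
print(overlap(set(range(50)), set(range(20, 140))))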

YAML specification:

reports: # the report is defined with all other reports under definitions/reports
    my_overlap_report: DiseaseAssociatedSequenceOverlap # report has no parameters
classmethod build_object(**kwargs)[source]

immuneML.reports.multi_dataset_reports.MultiDatasetReport module

class immuneML.reports.multi_dataset_reports.MultiDatasetReport.MultiDatasetReport(instruction_states: Optional[List[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState]] = None, name: Optional[str] = None, result_path: Optional[pathlib.Path] = None, number_of_processes: int = 1)[source]

Bases: immuneML.reports.Report.Report

Multi-dataset reports are special reports that can be specified when running immuneML with the MultiDatasetBenchmarkTool.

When running the MultiDatasetBenchmarkTool, multi-dataset reports can be specified under 'benchmark_reports'.

When using the reports with MultiDatasetBenchmarkTool, the arguments defined below are set at runtime by the instruction. Concrete classes inheriting MultiDatasetReport may include additional parameters that the user sets in the form of input arguments; a minimal subclass sketch follows this class entry.

Parameters
  • name (str) – user-defined name of the report used in the HTML overview automatically generated by the platform

  • result_path (Path) – location where the report results will be stored

  • instruction_states (list) – a list of states for each instruction that was run as a part of the tool, e.g., TrainMLModelState objects

  • number_of_processes (int) – how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.

static get_title()[source]
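
To illustrate the contract described above, here is a minimal sketch of a concrete multi-dataset report. It assumes the usual immuneML report interface of a build_object classmethod plus a _generate method returning a ReportResult; the class name, output file, and summary content are purely illustrative, and the exact ReportResult/ReportOutput signatures should be checked against the immuneML version in use.

from immuneML.reports.ReportOutput import ReportOutput
from immuneML.reports.ReportResult import ReportResult
from immuneML.reports.multi_dataset_reports.MultiDatasetReport import MultiDatasetReport


class InstructionCountReport(MultiDatasetReport):  # hypothetical example class

    @classmethod
    def build_object(cls, **kwargs):
        # validate user-supplied YAML parameters here before construction
        return cls(**kwargs)

    def _generate(self) -> ReportResult:
        # instruction_states, name and result_path are set at runtime by
        # MultiDatasetBenchmarkTool (see the parameter list above)
        self.result_path.mkdir(parents=True, exist_ok=True)
        summary = self.result_path / "summary.txt"
        summary.write_text(f"instructions benchmarked: {len(self.instruction_states)}\n")
        return ReportResult(name=self.name,
                            output_tables=[ReportOutput(summary, "benchmark summary")])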

immuneML.reports.multi_dataset_reports.PerformanceOverview module

class immuneML.reports.multi_dataset_reports.PerformanceOverview.PerformanceOverview(instruction_states: Optional[List[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState]] = None, name: Optional[str] = None, result_path: Optional[pathlib.Path] = None, number_of_processes: int = 1)[source]

Bases: immuneML.reports.multi_dataset_reports.MultiDatasetReport.MultiDatasetReport

The PerformanceOverview report creates an ROC plot and a precision-recall plot for the optimal trained models on multiple datasets. The labels on the plots are the dataset names, so it might be good to give datasets user-friendly names, even though names may only consist of letters, numbers, and the underscore sign.

This report can be used only with MultiDatasetBenchmarkTool, as it plots the ROC and PR curves of the trained models across datasets. It requires the task to be immune repertoire classification and cannot be used for receptor or sequence classification. The report uses predictions on the test dataset to assess performance and plot the curves; if the parameter refit_optimal_model is set to True, all data is used to fit the optimal model, so no test dataset remains for assessing performance and the report will not be generated.

If datasets have the same number of examples, the baseline PR curve will be plotted as described in this publication: Saito T, Rehmsmeier M. The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLOS ONE. 2015;10(3):e0118432. doi:10.1371/journal.pone.0118432.

If the datasets have different numbers of examples, the baseline PR curve will not be plotted.
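
The baseline referenced above is the precision of a random classifier: a horizontal line at the fraction of positive examples, which depends on the class balance of each dataset (Saito & Rehmsmeier, 2015). A minimal sketch, independent of the immuneML code:

def pr_baseline(y_true: list) -> float:
    # precision of a random classifier: P / (P + N), constant across recall
    return sum(y_true) / len(y_true)

# 25 positive repertoires among 100 give a baseline precision of 0.25
print(pr_baseline([1] * 25 + [0] * 75))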

YAML specification:

reports:
    my_performance_report: PerformanceOverview
classmethod build_object(**kwargs)[source]
plot_precision_recall(optimal_hp_items: list, label: immuneML.environment.Label.Label, colors)[source]
plot_roc(optimal_hp_items, label: immuneML.environment.Label.Label, colors) → Tuple[immuneML.reports.ReportOutput.ReportOutput, List[immuneML.reports.ReportOutput.ReportOutput]][source]
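
The following sketch illustrates the quantities behind plot_roc and plot_precision_recall using scikit-learn; the immuneML methods instead take optimal_hp_items and a Label and return ReportOutput objects, so this is only a conceptual illustration:

import numpy as np
from sklearn.metrics import auc, precision_recall_curve, roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                    # test-set labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.3])  # model scores

fpr, tpr, _ = roc_curve(y_true, y_score)                        # ROC curve points
precision, recall, _ = precision_recall_curve(y_true, y_score)  # PR curve points
print(f"ROC AUC: {auc(fpr, tpr):.3f}")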

Module contents