immuneML.reports.train_ml_model_reports package

Submodules

immuneML.reports.train_ml_model_reports.CVFeaturePerformance module

class immuneML.reports.train_ml_model_reports.CVFeaturePerformance.CVFeaturePerformance(feature: str = None, state: TrainMLModelState = None, result_path: Path = None, label: Label = None, name: str = None, is_feature_axis_categorical: bool = None, number_of_processes: int = 1)

Bases: TrainMLModelReport

This report plots the average training vs. test performance with respect to a given encoding parameter, which is set explicitly in the feature attribute. It can be used only in combination with the TrainMLModel instruction and can be specified only under ‘reports’.

Specification arguments:

feature (str): name of the encoder parameter with respect to which the training and test performance will be shown. Possible values depend on the encoder with which it is used.
is_feature_axis_categorical (bool): whether the x-axis of the plot, where the feature values are shown, should be categorical; if not set, this is determined automatically from the feature values

YAML specification:

definitions:
    reports:
        report1:
            CVFeaturePerformance:
                feature: p_value_threshold # parameter value of SequenceAbundance encoder
                is_feature_axis_categorical: True # show x-axis as categorical
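
The averaging that such a plot summarizes can be sketched in plain Python. This is a minimal illustration, not immuneML's implementation: the fold results, the function name average_by_feature and the score values are all hypothetical, and only the idea of averaging training and test performance per feature value comes from the description above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (feature value, training score, test score) triples, one per CV fold.
fold_results = [
    (0.001, 0.95, 0.70),
    (0.001, 0.93, 0.74),
    (0.01, 0.90, 0.80),
    (0.01, 0.88, 0.78),
]


def average_by_feature(results):
    """Average training and test scores separately for each feature value."""
    grouped = defaultdict(lambda: {"train": [], "test": []})
    for feature_value, train_score, test_score in results:
        grouped[feature_value]["train"].append(train_score)
        grouped[feature_value]["test"].append(test_score)
    # Sort by feature value so the result follows the order of a numeric x-axis.
    return {
        value: (mean(scores["train"]), mean(scores["test"]))
        for value, scores in sorted(grouped.items())
    }


averages = average_by_feature(fold_results)
for value, (train_avg, test_avg) in averages.items():
    print(f"p_value_threshold={value}: train={train_avg:.2f}, test={test_avg:.2f}")
```

A large gap between the two averages at a given feature value would suggest overfitting for that parameter setting.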

classmethod build_object(**kwargs)

Creates an object of a subclass of the Report class from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the specific direct subclasses of this class, which describe the different types of reports.

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

check_prerequisites()

Checks the prerequisites for generating a report of a specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded when the report works on an encoded dataset). Instructions in immuneML use this function to determine whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped in that situation; no error is raised. See subclasses of the Instruction class for more information on how reports are executed.

Returns: boolean value True if the prerequisites are o.k., and False otherwise.
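
The check-then-skip contract described above can be sketched as follows. ExampleReport and run_report are hypothetical stand-ins written for this illustration and are not part of immuneML; only the behaviour (log, return False, skip without raising) follows the description.

```python
import logging


class ExampleReport:
    """Hypothetical stand-in for a report class; not part of immuneML."""

    def __init__(self, encoded_data=None):
        self.encoded_data = encoded_data

    def check_prerequisites(self) -> bool:
        # Log the reason and return False instead of raising an error.
        if self.encoded_data is None:
            logging.warning("ExampleReport: data is not encoded, skipping the report.")
            return False
        return True

    def generate_report(self):
        return f"report over {len(self.encoded_data)} examples"


def run_report(report):
    """Generate the report only if its prerequisites are met; otherwise skip it."""
    return report.generate_report() if report.check_prerequisites() else None


print(run_report(ExampleReport()))           # skipped, a warning is logged
print(run_report(ExampleReport([1, 2, 3])))  # generated
```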

immuneML.reports.train_ml_model_reports.DiseaseAssociatedSequenceCVOverlap module

class immuneML.reports.train_ml_model_reports.DiseaseAssociatedSequenceCVOverlap.DiseaseAssociatedSequenceCVOverlap(state: TrainMLModelState = None, result_path: Path = None, name: str = None, compare_in_selection: bool = False, compare_in_assessment: bool = False, label: Label = None, number_of_processes: int = 1)

Bases: TrainMLModelReport

DiseaseAssociatedSequenceCVOverlap report makes one heatmap per label showing the overlap of disease-associated sequences (or k-mers) produced by the SequenceAbundanceEncoder, CompAIRRSequenceAbundanceEncoder or KmerAbundanceEncoder between folds of cross-validation (either inner or outer loop of the nested CV). The overlap is computed by the following equation:

\[ \mathrm{overlap}(X, Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)} \times 100 \]

For details, see Greiff V, Menzel U, Miho E, et al. Systems Analysis Reveals High Genetic and Antigen-Driven Predetermination of Antibody Repertoires throughout B Cell Development. Cell Reports. 2017;19(7):1467-1478. doi:10.1016/j.celrep.2017.04.054.
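
The overlap equation can be computed directly on two sets of sequences. The sketch below is only an illustration of the formula: the function name overlap_percentage and the example sequences are hypothetical, and returning 0 for an empty set is an assumption made here, since the formula is undefined in that case.

```python
def overlap_percentage(x, y):
    """Size of the intersection as a percentage of the smaller of the two sets."""
    x, y = set(x), set(y)
    if not x or not y:
        return 0.0  # assumption: treat the overlap with an empty set as 0
    return len(x & y) / min(len(x), len(y)) * 100


# Hypothetical disease-associated sequences found in two CV folds.
fold_1 = {"CASSLGTDTQYF", "CASSPGQGNEQFF", "CASSIRSSYEQYF"}
fold_2 = {"CASSLGTDTQYF", "CASSIRSSYEQYF", "CASSQDRGYEQYF", "CASRTGESGYTF"}

# Two shared sequences, smaller set has three: about 66.7 percent.
print(round(overlap_percentage(fold_1, fold_2), 1))
```

Because the denominator is the size of the smaller set, a fold whose sequences are all contained in a larger fold gets an overlap of 100.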

Specification arguments:

compare_in_selection (bool): whether to compute the overlap over the inner loop of the nested CV - the sequence overlap is shown across CV folds for the model chosen as optimal within that selection
compare_in_assessment (bool): whether to compute the overlap over the optimal models in the outer loop of the nested CV

YAML specification:

definitions:
    reports:
        my_overlap_report: DiseaseAssociatedSequenceCVOverlap # no parameters specified, defaults are used

COMPATIBLE_ENCODERS = (<class 'immuneML.encodings.abundance_encoding.SequenceAbundanceEncoder.SequenceAbundanceEncoder'>, <class 'immuneML.encodings.abundance_encoding.CompAIRRSequenceAbundanceEncoder.CompAIRRSequenceAbundanceEncoder'>, <class 'immuneML.encodings.abundance_encoding.KmerAbundanceEncoder.KmerAbundanceEncoder'>)

classmethod build_object(**kwargs)

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

immuneML.reports.train_ml_model_reports.MLSettingsPerformance module

class immuneML.reports.train_ml_model_reports.MLSettingsPerformance.MLSettingsPerformance(single_axis_labels, x_label_position, y_label_position, name: str = None, state: TrainMLModelState = None, label: Label = None, result_path: Path = None, number_of_processes: int = 1)

Bases: TrainMLModelReport

Report for TrainMLModel instruction: plots the performance for each of the setting combinations as defined under ‘settings’ in the assessment (outer validation) loop.

The performances are grouped by label (horizontal panels), encoding (vertical panels), and ML method (bar color). When multiple data splits are used, the average performance over the data splits is shown with an error bar representing the standard deviation.
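
The bar height and error bar for one setting combination can be sketched as a mean and standard deviation over the data splits. The settings and scores below are hypothetical, and the use of the sample standard deviation (statistics.stdev) is an assumption; the documentation does not say whether the report uses the sample or the population form.

```python
from statistics import mean, stdev

# Hypothetical performance per outer split, keyed by (label, encoding, ML method).
performances = {
    ("disease", "kmer_frequency", "logistic_regression"): [0.80, 0.84, 0.82],
    ("disease", "kmer_frequency", "random_forest"): [0.75, 0.79, 0.77],
}


def summarize(scores):
    """Return (average, standard deviation) of the scores over the data splits."""
    # A spread is only defined when there is more than one split.
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


summary = {setting: summarize(scores) for setting, scores in performances.items()}
for (label, encoding, method), (average, spread) in summary.items():
    print(f"{label} | {encoding} | {method}: {average:.2f} +/- {spread:.2f}")
```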

This report can be used only with the TrainMLModel instruction under ‘reports’.

Specification arguments:

single_axis_labels (bool): whether to use single axis labels. Note that using single axis labels makes the figure unsuited for rescaling, as the label position is given in a fixed distance from the axis. By default, single_axis_labels is False, resulting in standard plotly axis labels.
x_label_position (float): if single_axis_labels is True, this should be a number specifying the x-axis label position relative to the x-axis. The default value is -0.1.
y_label_position (float): same as x_label_position, but for the y-axis.

YAML specification:

definitions:
    reports:
        my_hp_report: MLSettingsPerformance

classmethod build_object(**kwargs)

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

check_prerequisites()

Returns: boolean value True if the prerequisites are o.k., and False otherwise.

std(x)

immuneML.reports.train_ml_model_reports.PerformancePerLabel module

class immuneML.reports.train_ml_model_reports.PerformancePerLabel.PerformancePerLabel(alternative_label: str, metric: str = 'balanced_accuracy', compute_for_selection: bool = True, compute_for_assessment: bool = True, state: TrainMLModelState = None, result_path: Path = None, name: str = None, label: Label = None, number_of_processes: int = 1, plot_on_train: bool = False, plot_on_test: bool = True)

Bases: TrainMLModelReport

Report that shows the performance of the model when the examples are grouped by alternative_label. It can be used to investigate whether the model is learning the alternative_label instead of the label of interest for classification.

Specification arguments:

alternative_label (str): The name of the alternative_label column in the dataset.
metric (str): The metric to use for the report. Default is balanced_accuracy.
compute_for_selection (bool): If True, the report will be computed for the selection. Default is True.
compute_for_assessment (bool): If True, the report will be computed for the assessment. Default is True.
plot_on_test (bool): If True, the report will be plotted on the test data. Default is True.
plot_on_train (bool): If True, the report will be plotted on the training data. Default is False.
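
The grouping idea can be sketched in plain Python: compute the metric separately for each value of the alternative label and compare the groups. This is an illustration under stated assumptions, not immuneML's implementation; the helper names, the batch values and the predictions are hypothetical, and balanced accuracy is written out here as the mean of per-class recall.

```python
from collections import defaultdict


def balanced_accuracy(true_classes, predicted_classes):
    """Mean of the per-class recall."""
    correct, total = defaultdict(int), defaultdict(int)
    for truth, predicted in zip(true_classes, predicted_classes):
        total[truth] += 1
        correct[truth] += int(truth == predicted)
    return sum(correct[c] / total[c] for c in total) / len(total)


def performance_per_group(examples):
    """examples: iterable of (alternative label value, true class, predicted class)."""
    grouped = defaultdict(lambda: ([], []))
    for group, truth, predicted in examples:
        grouped[group][0].append(truth)
        grouped[group][1].append(predicted)
    return {group: balanced_accuracy(t, p) for group, (t, p) in grouped.items()}


# Hypothetical predictions, grouped by a 'batch' column.
examples = [
    ("batch_1", True, True), ("batch_1", False, False),
    ("batch_1", True, True), ("batch_1", False, False),
    ("batch_2", True, False), ("batch_2", False, False),
    ("batch_2", True, True), ("batch_2", False, True),
]
per_batch = performance_per_group(examples)
print(per_batch)
```

A large difference in performance between groups, as between the two hypothetical batches here, is the kind of signal that would prompt a closer look at whether the model depends on the alternative label.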

YAML specification:

reports:
    my_report:
        PerformancePerLabel:
            alternative_label: batch
            metric: balanced_accuracy

classmethod build_object(**kwargs)

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

check_prerequisites()

Returns: boolean value True if the prerequisites are o.k., and False otherwise.

discover_alternative_label_values()

immuneML.reports.train_ml_model_reports.ROCCurveSummary module

class immuneML.reports.train_ml_model_reports.ROCCurveSummary.ROCCurveSummary(name: str = None, state: TrainMLModelState = None, label: Label = None, result_path: Path = None, number_of_processes: int = 1)

Bases: TrainMLModelReport

This report plots ROC curves for all trained ML settings ([preprocessing], encoding, ML model) in the outer loop of cross-validation in the TrainMLModel instruction. If there are multiple splits in the outer loop, this report will make one plot per split. This report is defined only for binary classification. If there are multiple labels defined in the instruction, each label has to have two classes to be included in this report.
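
For readers unfamiliar with how a ROC curve is built for a binary label, the following is a minimal sketch of the standard construction. It is not the code used by the report; the function name roc_points and the example scores are hypothetical.

```python
def roc_points(true_classes, scores):
    """Return (false positive rate, true positive rate) pairs for a binary label.

    Examples are ranked by decreasing score and the decision threshold is
    lowered one distinct score at a time.
    """
    positives = sum(1 for is_positive in true_classes if is_positive)
    negatives = len(true_classes) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("a ROC curve needs examples of both classes")

    ranked = sorted(zip(scores, true_classes), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for index, (score, is_positive) in enumerate(ranked):
        if is_positive:
            true_pos += 1
        else:
            false_pos += 1
        # Emit a point only after the last example sharing this score (ties).
        if index + 1 == len(ranked) or ranked[index + 1][0] != score:
            points.append((false_pos / negatives, true_pos / positives))
    return points


# Hypothetical true classes and predicted probabilities for one split.
curve = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.2])
print(curve)
```

The check for both classes reflects why such a curve is defined only for binary classification with examples of each class.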

YAML specification:

definitions:
    reports:
        my_roc_summary_report: ROCCurveSummary

classmethod build_object(**kwargs)

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

immuneML.reports.train_ml_model_reports.ReferenceSequenceOverlap module

class immuneML.reports.train_ml_model_reports.ReferenceSequenceOverlap.ReferenceSequenceOverlap(reference_path: Path = None, comparison_attributes: list = None, name: str = None, state: TrainMLModelState = None, result_path: Path = None, label: Label = None, number_of_processes: int = 1)

Bases: TrainMLModelReport

The ReferenceSequenceOverlap report compares a list of disease-associated sequences (or k-mers) produced by the SequenceAbundanceEncoder, CompAIRRSequenceAbundanceEncoder or KmerAbundanceEncoder to a list of reference sequences. It outputs a Venn diagram and a list of the sequences found in both the encoder output and the reference list.

The report compares the sequences by their sequence content and the additional comparison_attributes (such as V or J gene), as specified by the user.

Specification arguments:

reference_path (str): path to the reference file in csv format which contains one entry per row and has columns that correspond to the attributes listed under comparison_attributes argument
comparison_attributes (list): list of attributes to use for comparison; all of them have to be present in the reference file where they should be the names of the columns
label (str): name of the label for which the reference sequences/k-mers should be compared to the model; if none, it takes the one label from the instruction; if it is none and multiple labels were specified for the instruction, the report will not be generated
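
Comparing on several attributes at once can be sketched by turning each row into a tuple of the comparison attributes and intersecting the two sets. This is an illustration only, not the report's implementation: the helper to_keys, the in-memory CSV content and the sequences are hypothetical.

```python
import csv
import io


def to_keys(rows, comparison_attributes):
    """Represent each row by the tuple of its comparison attribute values."""
    return {tuple(row[attribute] for attribute in comparison_attributes) for row in rows}


# Hypothetical reference file content; columns match the comparison attributes.
reference_csv = io.StringIO(
    "sequence_aa,v_call,j_call\n"
    "CASSLGTDTQYF,TRBV7-9,TRBJ2-3\n"
    "CASSPGQGNEQFF,TRBV5-1,TRBJ2-1\n"
)
reference_rows = list(csv.DictReader(reference_csv))

# Hypothetical disease-associated sequences taken from an encoder.
model_rows = [
    {"sequence_aa": "CASSLGTDTQYF", "v_call": "TRBV7-9", "j_call": "TRBJ2-3"},
    {"sequence_aa": "CASSPGQGNEQFF", "v_call": "TRBV6-5", "j_call": "TRBJ2-1"},
]

attributes = ["sequence_aa", "v_call", "j_call"]
shared = to_keys(reference_rows, attributes) & to_keys(model_rows, attributes)
shared_by_sequence_only = to_keys(reference_rows, ["sequence_aa"]) & to_keys(model_rows, ["sequence_aa"])
print(shared)
print(shared_by_sequence_only)
```

In this example the second sequence matches on sequence content alone but not once the V gene is included, which shows how the choice of comparison_attributes changes the reported overlap.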

YAML specification:

definitions:
    reports:
        my_reference_overlap_report:
            ReferenceSequenceOverlap:
                reference_path: reference_sequences.csv  # example usage with SequenceAbundanceEncoder or CompAIRRSequenceAbundanceEncoder
                comparison_attributes:
                    - sequence_aa
                    - v_call
                    - j_call
        my_reference_overlap_report_with_kmers:
            ReferenceSequenceOverlap:
                reference_path: reference_kmers.csv  # example usage with KmerAbundanceEncoder
                comparison_attributes:
                    - k-mer

classmethod build_object(**kwargs)

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

check_prerequisites()

Returns: boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.train_ml_model_reports.TrainMLModelReport module

class immuneML.reports.train_ml_model_reports.TrainMLModelReport.TrainMLModelReport(name: str = None, state: TrainMLModelState = None, label: Label = None, result_path: Path = None, number_of_processes: int = 1)

Bases: Report

Train ML model reports plot general statistics or export data of multiple models simultaneously when running the TrainMLModel instruction.

In the TrainMLModel instruction, train ML model reports can be specified under ‘reports’. Example:

my_instruction:
    type: TrainMLModel
    reports:
        - my_train_ml_model_report
    # other parameters...

DOCS_TITLE = 'Train ML model reports'

__init__(name: str = None, state: TrainMLModelState = None, label: Label = None, result_path: Path = None, number_of_processes: int = 1)

The arguments defined below are set at runtime by the instruction.

Concrete classes inheriting from TrainMLModelReport may include additional parameters that will be set by the user in the form of input arguments.

name (str): user-defined name of the report used in the HTML overview automatically generated by the platform
state (TrainMLModelState): a state object that includes all the information, trained models, encodings and datasets from the nested cross-validation procedure used to train the optimal model
result_path (Path): location where the report results will be stored
number_of_processes (int): how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.
