immuneML.reports.train_ml_model_reports package¶
Submodules¶
immuneML.reports.train_ml_model_reports.CVFeaturePerformance module¶
- class immuneML.reports.train_ml_model_reports.CVFeaturePerformance.CVFeaturePerformance(feature: str = None, state: TrainMLModelState = None, result_path: Path = None, label: Label = None, name: str = None, is_feature_axis_categorical: bool = None, number_of_processes: int = 1)[source]¶
Bases: TrainMLModelReport
This report plots the average training vs. test performance with respect to a given encoding parameter, which is set explicitly in the feature attribute. It can be used only in combination with the TrainMLModel instruction and can only be specified under ‘reports’.
Specification arguments:
feature: name of the encoder parameter w.r.t. which the performance across training and test will be shown. Possible values depend on the encoder on which it is used.
is_feature_axis_categorical (bool): if the x-axis of the plot where features are shown should be categorical; alternatively it is automatically determined based on the feature values
YAML specification:
definitions:
  reports:
    report1:
      CVFeaturePerformance:
        feature: p_value_threshold # parameter value of SequenceAbundance encoder
        is_feature_axis_categorical: True # show x-axis as categorical
- classmethod build_object(**kwargs)[source]¶
Creates an object of a subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, some parameters are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
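The mapping from specification parameters to a report object can be illustrated with a minimal sketch. FeatureReportSketch is a hypothetical stand-in, not an immuneML class, and the assumption that build_object simply forwards the keyword arguments to the constructor is made only for illustration; the real method may validate or convert the values first.

```python
class FeatureReportSketch:
    """Hypothetical stand-in for a report class; not part of immuneML."""

    def __init__(self, feature=None, is_feature_axis_categorical=None, name=None):
        self.feature = feature
        self.is_feature_axis_categorical = is_feature_axis_categorical
        self.name = name

    @classmethod
    def build_object(cls, **kwargs):
        # Assumption: keyword arguments from the specification are passed
        # straight to the constructor without further validation.
        return cls(**kwargs)


# Parameters as they might appear under the report's key in the specification.
specification = {"feature": "p_value_threshold", "is_feature_axis_categorical": True}
report = FeatureReportSketch.build_object(name="report1", **specification)
print(report.name, report.feature, report.is_feature_axis_categorical)
```

Parameters that are only known at runtime (such as the state or the result path) would be set on the object later by the instruction.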
- check_prerequisites()[source]¶
Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.
- Returns:
boolean value True if the prerequisites are o.k., and False otherwise.
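The "log and skip, never raise" behaviour described for check_prerequisites() can be sketched as follows. Everything here is hypothetical (the SketchReport class, the run_reports helper, and the specific prerequisite being checked); it only illustrates the control flow, not how immuneML instructions are implemented.

```python
import logging

logger = logging.getLogger("report_sketch")


class SketchReport:
    """Hypothetical stand-in for a report; not an immuneML class."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = state

    def check_prerequisites(self):
        # Assumed prerequisite for this sketch: a state object must be set.
        if self.state is None:
            logger.warning("%s: no state available, skipping report.", self.name)
            return False
        return True

    def generate_report(self):
        return f"{self.name} generated"


def run_reports(reports):
    """Generate only the reports whose prerequisites hold; skip the rest."""
    results = []
    for report in reports:
        if report.check_prerequisites():
            results.append(report.generate_report())
    return results


print(run_reports([SketchReport("a", state={}), SketchReport("b")]))
```

The second report is skipped with a logged warning, and the remaining reports still run.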
immuneML.reports.train_ml_model_reports.DiseaseAssociatedSequenceCVOverlap module¶
- class immuneML.reports.train_ml_model_reports.DiseaseAssociatedSequenceCVOverlap.DiseaseAssociatedSequenceCVOverlap(state: TrainMLModelState = None, result_path: Path = None, name: str = None, compare_in_selection: bool = False, compare_in_assessment: bool = False, label: Label = None, number_of_processes: int = 1)[source]¶
Bases: TrainMLModelReport
DiseaseAssociatedSequenceCVOverlap report makes one heatmap per label showing the overlap of disease-associated sequences (or k-mers) produced by the SequenceAbundanceEncoder, CompAIRRSequenceAbundanceEncoder or KmerAbundanceEncoder between folds of cross-validation (either inner or outer loop of the nested CV). The overlap is computed by the following equation:
\[overlap(X,Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)} \times 100\]
For details, see Greiff V, Menzel U, Miho E, et al. Systems Analysis Reveals High Genetic and Antigen-Driven Predetermination of Antibody Repertoires throughout B Cell Development. Cell Reports. 2017;19(7):1467-1478. doi:10.1016/j.celrep.2017.04.054.
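The overlap equation can be written as a short function. This is a minimal sketch, not the code used by the report: the function name, the example sequences, and the choice to return 0 when one set is empty are assumptions made for illustration.

```python
def overlap_percentage(x, y):
    """Overlap of two collections, as a percentage of the smaller one."""
    set_x, set_y = set(x), set(y)
    smaller = min(len(set_x), len(set_y))
    if smaller == 0:
        # Assumption: report 0% instead of dividing by zero for an empty set.
        return 0.0
    return len(set_x & set_y) / smaller * 100


# Hypothetical disease-associated sequences selected in two CV folds.
fold_1 = {"CASSLGTDTQYF", "CASSPGQGAYEQYF", "CASSIRSSYEQYF"}
fold_2 = {"CASSLGTDTQYF", "CASSIRSSYEQYF", "CASSQDRGNTEAFF", "CASSLAGGYNEQFF"}

print(overlap_percentage(fold_1, fold_2))
```

Because the denominator is the size of the smaller set, a fold whose sequences are fully contained in another fold has an overlap of 100 regardless of how large the other fold is.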
Specification arguments:
compare_in_selection (bool): whether to compute the overlap over the inner loop of the nested CV - the sequence overlap is shown across CV folds for the model chosen as optimal within that selection
compare_in_assessment (bool): whether to compute the overlap over the optimal models in the outer loop of the nested CV
YAML specification:
definitions:
  reports:
    my_overlap_report: DiseaseAssociatedSequenceCVOverlap # report has no parameters
- COMPATIBLE_ENCODERS = (<class 'immuneML.encodings.abundance_encoding.SequenceAbundanceEncoder.SequenceAbundanceEncoder'>, <class 'immuneML.encodings.abundance_encoding.CompAIRRSequenceAbundanceEncoder.CompAIRRSequenceAbundanceEncoder'>, <class 'immuneML.encodings.abundance_encoding.KmerAbundanceEncoder.KmerAbundanceEncoder'>)¶
- classmethod build_object(**kwargs)[source]¶
Creates an object of a subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, some parameters are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
immuneML.reports.train_ml_model_reports.MLSettingsPerformance module¶
- class immuneML.reports.train_ml_model_reports.MLSettingsPerformance.MLSettingsPerformance(single_axis_labels, x_label_position, y_label_position, name: str = None, state: TrainMLModelState = None, label: Label = None, result_path: Path = None, number_of_processes: int = 1)[source]¶
Bases: TrainMLModelReport
Report for TrainMLModel instruction: plots the performance for each of the setting combinations as defined under ‘settings’ in the assessment (outer validation) loop.
The performances are grouped by label (horizontal panels), encoding (vertical panels), and ML method (bar color). When multiple data splits are used, the average performance over the data splits is shown with an error bar representing the standard deviation.
This report can be used only with TrainMLModel instruction under ‘reports’.
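The grouping and averaging described above can be sketched in a few lines. The function name, the record layout, and the use of the sample standard deviation are assumptions made for illustration; the report itself may compute the spread differently.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_performance(records):
    """Group (label, encoding, ml_method, performance) records and
    return {(label, encoding, ml_method): (mean, standard deviation)}."""
    grouped = defaultdict(list)
    for label, encoding, ml_method, performance in records:
        grouped[(label, encoding, ml_method)].append(performance)

    summary = {}
    for key, values in grouped.items():
        # Assumption: sample standard deviation; with a single split there
        # is nothing to estimate, so the error bar is set to 0.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[key] = (mean(values), spread)
    return summary


# Hypothetical performances from two outer-loop splits.
records = [
    ("disease", "kmer_frequency", "logistic_regression", 0.80),
    ("disease", "kmer_frequency", "logistic_regression", 0.90),
    ("disease", "kmer_frequency", "random_forest", 0.75),
]
for key, (avg, spread) in summarize_performance(records).items():
    print(key, round(avg, 3), round(spread, 3))
```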
Specification arguments:
single_axis_labels (bool): whether to use single axis labels. Note that using single axis labels makes the figure unsuited for rescaling, as the label position is given in a fixed distance from the axis. By default, single_axis_labels is False, resulting in standard plotly axis labels.
x_label_position (float): if single_axis_labels is True, this should be a number specifying the x-axis label position relative to the x-axis. The default value for x_label_position is -0.1.
y_label_position (float): same as x_label_position, but for the y-axis.
YAML specification:
definitions:
  reports:
    my_hp_report: MLSettingsPerformance
- classmethod build_object(**kwargs)[source]¶
Creates an object of a subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, some parameters are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
- check_prerequisites()[source]¶
Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.
- Returns:
boolean value True if the prerequisites are o.k., and False otherwise.
immuneML.reports.train_ml_model_reports.ROCCurveSummary module¶
- class immuneML.reports.train_ml_model_reports.ROCCurveSummary.ROCCurveSummary(name: str = None, state: TrainMLModelState = None, label: Label = None, result_path: Path = None, number_of_processes: int = 1)[source]¶
Bases: TrainMLModelReport
This report plots ROC curves for all trained ML settings ([preprocessing], encoding, ML model) in the outer loop of cross-validation in the TrainMLModel instruction. If there are multiple splits in the outer loop, this report will make one plot per split. This report is defined only for binary classification. If there are multiple labels defined in the instruction, each label has to have two classes to be included in this report.
YAML specification:
definitions:
  reports:
    my_roc_summary_report: ROCCurveSummary
- classmethod build_object(**kwargs)[source]¶
Creates an object of a subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, some parameters are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
immuneML.reports.train_ml_model_reports.ReferenceSequenceOverlap module¶
- class immuneML.reports.train_ml_model_reports.ReferenceSequenceOverlap.ReferenceSequenceOverlap(reference_path: Path = None, comparison_attributes: list = None, name: str = None, state: TrainMLModelState = None, result_path: Path = None, label: Label = None, number_of_processes: int = 1)[source]¶
Bases: TrainMLModelReport
The ReferenceSequenceOverlap report compares a list of disease-associated sequences (or k-mers) produced by the SequenceAbundanceEncoder, CompAIRRSequenceAbundanceEncoder or KmerAbundanceEncoder to a list of reference sequences. It outputs a Venn diagram and a list of sequences found both in the encoder and reference list.
The report compares the sequences by their sequence content and the additional comparison_attributes (such as V or J gene), as specified by the user.
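Comparing entries by several attributes at once amounts to treating each entry as a tuple of attribute values and intersecting the two sets. The sketch below illustrates this under stated assumptions: the helper names, the in-memory CSV, and the example sequences and gene names are hypothetical, and this is not the report's own implementation.

```python
import csv
import io


def load_entries(csv_file, comparison_attributes):
    """Read a CSV file and return a set of tuples, one per row,
    holding the values of the comparison attributes."""
    reader = csv.DictReader(csv_file)
    return {tuple(row[attr] for attr in comparison_attributes) for row in reader}


def split_overlap(model_entries, reference_entries):
    """Return the three regions of a two-set Venn diagram."""
    return {
        "model_only": model_entries - reference_entries,
        "shared": model_entries & reference_entries,
        "reference_only": reference_entries - model_entries,
    }


comparison_attributes = ["sequence_aa", "v_call", "j_call"]

# Hypothetical reference file content, kept in memory for the example.
reference_csv = io.StringIO(
    "sequence_aa,v_call,j_call\n"
    "CASSLGTDTQYF,TRBV7-9,TRBJ2-3\n"
    "CASSIRSSYEQYF,TRBV19,TRBJ2-7\n"
)
reference_entries = load_entries(reference_csv, comparison_attributes)

# Hypothetical sequences selected by the encoder.
model_entries = {
    ("CASSLGTDTQYF", "TRBV7-9", "TRBJ2-3"),
    ("CASSIRSSYEQYF", "TRBV20-1", "TRBJ2-7"),  # same sequence, different V gene
}

regions = split_overlap(model_entries, reference_entries)
print({name: len(entries) for name, entries in regions.items()})
```

In this example the second model entry does not count as shared, because it matches the reference on sequence and J gene but not on V gene.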
Specification arguments:
reference_path (str): path to the reference file in csv format which contains one entry per row and has columns that correspond to the attributes listed under comparison_attributes argument
comparison_attributes (list): list of attributes to use for comparison; all of them have to be present in the reference file where they should be the names of the columns
label (str): name of the label for which the reference sequences/k-mers should be compared to the model; if none, it takes the one label from the instruction; if it is none and multiple labels were specified for the instruction, the report will not be generated
YAML specification:
definitions:
  reports:
    my_reference_overlap_report:
      ReferenceSequenceOverlap:
        reference_path: reference_sequences.csv # example usage with SequenceAbundanceEncoder or CompAIRRSequenceAbundanceEncoder
        comparison_attributes:
          - sequence_aa
          - v_call
          - j_call
    my_reference_overlap_report_with_kmers:
      ReferenceSequenceOverlap:
        reference_path: reference_kmers.csv # example usage with KmerAbundanceEncoder
        comparison_attributes:
          - k-mer
- classmethod build_object(**kwargs)[source]¶
Creates an object of a subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, some parameters are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
- check_prerequisites()[source]¶
Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.
- Returns:
boolean value True if the prerequisites are o.k., and False otherwise.
immuneML.reports.train_ml_model_reports.TrainMLModelReport module¶
- class immuneML.reports.train_ml_model_reports.TrainMLModelReport.TrainMLModelReport(name: str = None, state: TrainMLModelState = None, label: Label = None, result_path: Path = None, number_of_processes: int = 1)[source]¶
Bases: Report
Train ML model reports plot general statistics or export data of multiple models simultaneously when running the TrainMLModel instruction.
In the TrainMLModel instruction, train ML model reports can be specified under ‘reports’. Example:
my_instruction:
  type: TrainMLModel
  reports:
    - my_train_ml_model_report
  # other parameters...
- DOCS_TITLE = 'Train ML model reports'¶
- __init__(name: str = None, state: TrainMLModelState = None, label: Label = None, result_path: Path = None, number_of_processes: int = 1)[source]¶
The arguments defined below are set at runtime by the instruction.
Concrete classes inheriting TrainMLModelReport may include additional parameters that will be set by the user in the form of input arguments.
name (str): user-defined name of the report used in the HTML overview automatically generated by the platform
state (TrainMLModelState): a state object that includes all the information, trained models, encodings and datasets from the nested cross-validation procedure used to train the optimal model.
result_path (Path): location where the report results will be stored
number_of_processes (int): how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.