immuneML.reports.train_ml_model_reports package
Submodules
immuneML.reports.train_ml_model_reports.CVFeaturePerformance module
- class immuneML.reports.train_ml_model_reports.CVFeaturePerformance.CVFeaturePerformance(feature: Optional[str] = None, state: Optional[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState] = None, result_path: Optional[pathlib.Path] = None, label: Optional[immuneML.environment.Label.Label] = None, name: Optional[str] = None, is_feature_axis_categorical: Optional[bool] = None, number_of_processes: int = 1)[source]
Bases:
immuneML.reports.train_ml_model_reports.TrainMLModelReport.TrainMLModelReport
This report plots the average training vs. test performance with respect to a given encoding parameter, which is set explicitly in the feature attribute. It can be used only in combination with the TrainMLModel instruction and can be specified only under ‘reports’.
- Parameters
feature – name of the encoder parameter with respect to which the performance across training and test will be shown. Possible values depend on the encoder on which it is used.
is_feature_axis_categorical (bool) – if the x-axis of the plot where features are shown should be categorical; alternatively it is automatically determined based on the feature values.
YAML specification:
    report1:
        CVFeaturePerformance:
            feature: p_value_threshold # parameter value of SequenceAbundance encoder
            is_feature_axis_categorical: True # show x-axis as categorical
- check_prerequisites()[source]
Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.
- Returns
boolean value True if the prerequisites are o.k., and False otherwise.
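The skip-instead-of-fail behaviour described above can be sketched as follows. This is only an illustration, not immuneML's actual code: `SketchReport`, `run_reports`, and the `data_is_encoded` flag are hypothetical stand-ins, and only the method names check_prerequisites() and generate_report() are taken from the text above.

```python
import logging

logger = logging.getLogger("report_sketch")


class SketchReport:
    """Hypothetical stand-in for a report; not an immuneML class."""

    def __init__(self, name, data_is_encoded):
        self.name = name
        self.data_is_encoded = data_is_encoded

    def check_prerequisites(self):
        # Return False (rather than raising) when the report cannot run.
        if not self.data_is_encoded:
            logger.warning("%s: data is not encoded, report will be skipped.", self.name)
            return False
        return True

    def generate_report(self):
        return f"result of {self.name}"


def run_reports(reports):
    """Run only the reports whose prerequisites hold; skip the rest without raising."""
    results = {}
    for report in reports:
        if report.check_prerequisites():
            results[report.name] = report.generate_report()
    return results


results = run_reports([
    SketchReport("report_a", data_is_encoded=True),
    SketchReport("report_b", data_is_encoded=False),
])
```

In this sketch the second report is logged and skipped, so `results` contains an entry for `report_a` only.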
immuneML.reports.train_ml_model_reports.DiseaseAssociatedSequenceCVOverlap module
- class immuneML.reports.train_ml_model_reports.DiseaseAssociatedSequenceCVOverlap.DiseaseAssociatedSequenceCVOverlap(state: Optional[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None, compare_in_selection: bool = False, compare_in_assessment: bool = False, label: Optional[immuneML.environment.Label.Label] = None, number_of_processes: int = 1)[source]
Bases:
immuneML.reports.train_ml_model_reports.TrainMLModelReport.TrainMLModelReport
DiseaseAssociatedSequenceCVOverlap report makes one heatmap per label showing the overlap of disease-associated sequences produced by the SequenceAbundance encoder between folds of cross-validation (either inner or outer loop of the nested CV). The overlap is computed by the following equation:
\[\mathrm{overlap}(X,Y) = \frac{|X \cap Y|}{\min(|X|, |Y|)} \times 100\]
For details, see Greiff V, Menzel U, Miho E, et al. Systems Analysis Reveals High Genetic and Antigen-Driven Predetermination of Antibody Repertoires throughout B Cell Development. Cell Reports. 2017;19(7):1467-1478. doi:10.1016/j.celrep.2017.04.054.
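As a worked illustration of the overlap equation, the sketch below computes the percentage overlap between the sequence sets of two folds. The function name `overlap`, the example sequences, and the handling of empty sets are assumptions made for this example; the report's own implementation may differ.

```python
def overlap(x, y):
    """Percentage overlap of two sequence collections, relative to the smaller one."""
    x, y = set(x), set(y)
    smaller = min(len(x), len(y))
    if smaller == 0:
        # Assumption: the formula is undefined for an empty set; 0.0 is returned here.
        return 0.0
    return len(x & y) / smaller * 100


# Made-up sequences standing in for disease-associated sequences from two CV folds
fold_1 = {"CASSLGTDTQYF", "CASSPGQGNYGYTF", "CASSIRSSYEQYF"}
fold_2 = {"CASSLGTDTQYF", "CASSIRSSYEQYF", "CASRGDSNQPQHF", "CASSQETQYF"}

print(overlap(fold_1, fold_2))
```

Here two of the three sequences in the smaller set are shared, so the overlap is two thirds of 100. Normalizing by the smaller set means that a fold whose sequences are fully contained in another fold scores 100 regardless of the size difference.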
- Parameters
compare_in_selection (bool) – whether to compute the overlap over the inner loop of the nested CV; the sequence overlap is shown across CV folds for the model chosen as optimal within that selection
compare_in_assessment (bool) – whether to compute the overlap over the optimal models in the outer loop of the nested CV
YAML specification:
    reports: # the report is defined with all other reports under definitions/reports
        my_overlap_report: DiseaseAssociatedSequenceCVOverlap # report has no parameters
immuneML.reports.train_ml_model_reports.MLSettingsPerformance module
- class immuneML.reports.train_ml_model_reports.MLSettingsPerformance.MLSettingsPerformance(single_axis_labels, x_label_position, y_label_position, name: Optional[str] = None, state: Optional[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState] = None, label: Optional[immuneML.environment.Label.Label] = None, result_path: Optional[pathlib.Path] = None, number_of_processes: int = 1)[source]
Bases:
immuneML.reports.train_ml_model_reports.TrainMLModelReport.TrainMLModelReport
Report for TrainMLModel instruction: plots the performance for each of the setting combinations as defined under ‘settings’ in the assessment (outer validation) loop.
The performances are grouped by label (horizontal panels), encoding (vertical panels), and ML method (bar color). When multiple data splits are used, the average performance over the data splits is shown with an error bar representing the standard deviation.
This report can be used only with TrainMLModel instruction under ‘reports’.
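The averaging over data splits described above can be sketched as follows. The records, setting names, and scores are hypothetical, and the choice of the sample standard deviation is an assumption (the text does not say whether the sample or population form is used).

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical per-split performances: (label, encoding, ML method, score)
records = [
    ("disease", "kmer_freq", "logistic_regression", 0.80),
    ("disease", "kmer_freq", "logistic_regression", 0.90),
    ("disease", "kmer_freq", "svm", 0.70),
    ("disease", "kmer_freq", "svm", 0.70),
]

# Collect the scores of all data splits for each setting combination.
grouped = defaultdict(list)
for label, encoding, ml_method, score in records:
    grouped[(label, encoding, ml_method)].append(score)

summary = {}
for key, scores in grouped.items():
    # Assumption: sample standard deviation; a single split gets no error bar (0.0).
    spread = stdev(scores) if len(scores) > 1 else 0.0
    summary[key] = (mean(scores), spread)
```

Each entry of `summary` then corresponds to one bar: the mean gives the bar height and the second value gives the error bar.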
- Parameters
single_axis_labels (bool) – whether to use single axis labels. Note that using single axis labels makes the figure unsuited for rescaling, as the label position is given in a fixed distance from the axis. By default, single_axis_labels is False, resulting in standard plotly axis labels.
x_label_position (float) – if single_axis_labels is True, this should be a number specifying the x axis label position relative to the x axis. The default value is -0.1.
y_label_position (float) – same as x_label_position, but for the y axis.
YAML specification:
my_hp_report: MLSettingsPerformance
- check_prerequisites()[source]
Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.
- Returns
boolean value True if the prerequisites are o.k., and False otherwise.
immuneML.reports.train_ml_model_reports.MLSubseqPerformance module
- class immuneML.reports.train_ml_model_reports.MLSubseqPerformance.MLSubseqPerformance(name: Optional[str] = None, state: Optional[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState] = None, label: Optional[immuneML.environment.Label.Label] = None, result_path: Optional[pathlib.Path] = None, number_of_processes: int = 1)[source]
Bases:
immuneML.reports.train_ml_model_reports.MLSettingsPerformance.MLSettingsPerformance
Report for TrainMLModel: similar to MLSettingsPerformance, this report plots the performance of certain combinations of encodings and ML methods.
Similarly to MLSettingsPerformance, the performances are grouped by label (horizontal panels). However, the bar color is determined by the ML method class (thus several ML methods with different parameters may be grouped together) and the vertical panel grouping is determined by the subsequence size used for motif recovery. This subsequence size is either the k-mer size or the kernel size (DeepRC).
This report can only be used to plot the results for setting combinations using k-mer encoding with continuous k-mers (in combination with any ML method), or DeepRC encoding + ML method.
This report can only be used with TrainMLModel instruction under ‘reports’.
YAML specification:
my_hp_report: MLSubseqPerformance
- check_prerequisites()[source]
Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.
- Returns
boolean value True if the prerequisites are o.k., and False otherwise.
immuneML.reports.train_ml_model_reports.ROCCurveSummary module
- class immuneML.reports.train_ml_model_reports.ROCCurveSummary.ROCCurveSummary(name: Optional[str] = None, state: Optional[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState] = None, label: Optional[immuneML.environment.Label.Label] = None, result_path: Optional[pathlib.Path] = None, number_of_processes: int = 1)[source]
Bases:
immuneML.reports.train_ml_model_reports.TrainMLModelReport.TrainMLModelReport
This report plots ROC curves for all trained ML settings ([preprocessing], encoding, ML model) in the outer loop of cross-validation in the TrainMLModel instruction. If there are multiple splits in the outer loop, this report will make one plot per split. This report is defined only for binary classification. If there are multiple labels defined in the instruction, each label has to have two classes to be included in this report.
Arguments: there are no arguments for this report.
YAML specification:
    reports:
        my_roc_summary_report: ROCCurveSummary
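The restriction to binary classification can be pictured with a small sketch: only labels with exactly two classes are kept. The helper `labels_for_roc` and the example labels are hypothetical and are not part of immuneML.

```python
def labels_for_roc(label_classes):
    """Keep only labels with exactly two classes (binary classification)."""
    return [name for name, classes in label_classes.items() if len(set(classes)) == 2]


# Hypothetical labels and their classes
label_classes = {
    "CMV": [True, False],
    "age_group": ["child", "adult", "senior"],
}
included = labels_for_roc(label_classes)
```

In this example only the `CMV` label would be included; `age_group` has three classes and would be left out.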
immuneML.reports.train_ml_model_reports.ReferenceSequenceOverlap module
- class immuneML.reports.train_ml_model_reports.ReferenceSequenceOverlap.ReferenceSequenceOverlap(reference_path: Optional[pathlib.Path] = None, comparison_attributes: Optional[list] = None, name: Optional[str] = None, state: Optional[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState] = None, result_path: Optional[pathlib.Path] = None, label: Optional[immuneML.environment.Label.Label] = None, number_of_processes: int = 1)[source]
Bases:
immuneML.reports.train_ml_model_reports.TrainMLModelReport.TrainMLModelReport
The ReferenceSequenceOverlap report compares a list of disease-associated sequences produced by the SequenceAbundance encoder to a list of reference receptor sequences. It outputs a Venn diagram and a list of receptor sequences found both in the encoder output and in the reference.
The report compares the sequences by their sequence content and the additional comparison_attributes (such as V or J gene), as specified by the user.
- Parameters
reference_path (str) – path to the reference file in csv format which contains one entry per row and has columns that correspond to the attributes listed under comparison_attributes argument
comparison_attributes (list) – list of attributes to use for comparison; all of them have to be present in the reference file where they should be the names of the columns
label (str) – name of the label for which the reference sequences should be compared to the model; if none, it takes the one label from the instruction; if it is none and multiple labels were specified for the instruction, the report will not be generated
YAML specification:
    reports: # the report is defined with all other reports under definitions/reports
        my_reference_overlap_report:
            ReferenceSequenceOverlap:
                reference_path: reference.csv # a reference file with columns listed under comparison_attributes
                comparison_attributes:
                    - sequence_aas
                    - v_genes
                    - j_genes
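The comparison by sequence content plus comparison_attributes can be sketched as a set intersection over attribute tuples. The CSV content, the sequences, and the helper `as_keys` are made up for this illustration; the report's own matching logic may differ.

```python
import csv
import io

comparison_attributes = ["sequence_aas", "v_genes", "j_genes"]

# Hypothetical reference file content (would normally be read from reference_path)
reference_csv = io.StringIO(
    "sequence_aas,v_genes,j_genes\n"
    "CASSLGTDTQYF,TRBV7-9,TRBJ2-3\n"
    "CASSIRSSYEQYF,TRBV19,TRBJ2-7\n"
)


def as_keys(rows, attributes):
    """Reduce each row to a tuple of the attributes used for comparison."""
    return {tuple(row[attr] for attr in attributes) for row in rows}


reference = as_keys(csv.DictReader(reference_csv), comparison_attributes)

# Hypothetical disease-associated sequences taken from the encoder
model_sequences = as_keys(
    [
        {"sequence_aas": "CASSLGTDTQYF", "v_genes": "TRBV7-9", "j_genes": "TRBJ2-3"},
        {"sequence_aas": "CASSIRSSYEQYF", "v_genes": "TRBV5-1", "j_genes": "TRBJ2-7"},
    ],
    comparison_attributes,
)

shared = reference & model_sequences          # found in both lists
only_reference = reference - model_sequences  # Venn diagram: reference only
only_model = model_sequences - reference      # Venn diagram: model only
```

Because the V gene is part of the comparison key, the second sequence does not count as shared even though its amino acid sequence matches a reference entry.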
- check_prerequisites()[source]
Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.
- Returns
boolean value True if the prerequisites are o.k., and False otherwise.
immuneML.reports.train_ml_model_reports.TrainMLModelReport module
- class immuneML.reports.train_ml_model_reports.TrainMLModelReport.TrainMLModelReport(name: Optional[str] = None, state: Optional[immuneML.hyperparameter_optimization.states.TrainMLModelState.TrainMLModelState] = None, label: Optional[immuneML.environment.Label.Label] = None, result_path: Optional[pathlib.Path] = None, number_of_processes: int = 1)[source]
Bases:
immuneML.reports.Report.Report
Train ML model reports plot general statistics or export data of multiple models simultaneously when running the TrainMLModel instruction.
In the TrainMLModel instruction, train ML model reports can be specified under ‘reports’.
When using the reports with TrainMLModel instruction, the arguments defined below are set at runtime by the instruction. Concrete classes inheriting TrainMLModelReport may include additional parameters that will be set by the user in the form of input arguments.
- Parameters
name (str) – user-defined name of the report used in the HTML overview automatically generated by the platform
state (TrainMLModelState) – a state object that includes all the information, trained models, encodings and datasets from the nested cross-validation procedure used to train the optimal model.
result_path (Path) – location where the report results will be stored
number_of_processes (int) – how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.