immuneML.reports.encoding_reports package
Submodules
immuneML.reports.encoding_reports.DesignMatrixExporter module
- class immuneML.reports.encoding_reports.DesignMatrixExporter.DesignMatrixExporter(dataset: Dataset = None, result_path: Path = None, file_format: str = None, number_of_processes: int = 1, name: str = None)
Bases: EncodingReport
Exports the design matrix and related information of a given encoded Dataset to files. If the encoded data have more than two dimensions (such as when using the OneHot encoder with option Flatten=False), the data are exported to different formats to make them easier to import into external software.
Specification arguments:
file_format (str): the format and extension of the file in which the design matrix is stored. The supported formats are npy, csv, pt, hdf5, npy.zip, csv.zip and hdf5.zip.
Note: when using hdf5 or hdf5.zip output formats, make sure the ‘hdf5’ dependency is installed.
YAML specification:
    definitions:
      reports:
        my_dme_report:
          DesignMatrixExporter:
            file_format: csv
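To make the file_format options concrete, here is a minimal, stdlib-only sketch. It is not immuneML's implementation: the helper names split_file_format and design_matrix_to_csv are hypothetical, and the csv layout (an example_id column followed by one column per feature) is an assumption about what a two-dimensional export could look like.

```python
import csv
import io

# Values accepted by file_format according to the DesignMatrixExporter docs.
SUPPORTED_FORMATS = ("npy", "csv", "pt", "hdf5", "npy.zip", "csv.zip", "hdf5.zip")


def split_file_format(file_format: str) -> tuple:
    """Hypothetical helper: split a file_format value into (base format, zipped flag).

    For example, 'csv.zip' -> ('csv', True) and 'npy' -> ('npy', False).
    """
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file_format {file_format!r}; expected one of {SUPPORTED_FORMATS}")
    zipped = file_format.endswith(".zip")
    base = file_format[: -len(".zip")] if zipped else file_format
    return base, zipped


def design_matrix_to_csv(matrix, feature_names, example_ids) -> str:
    """Hypothetical helper: serialize a 2D design matrix (rows = examples, columns = features) as CSV text."""
    if any(len(row) != len(feature_names) for row in matrix):
        raise ValueError("every row must have one value per feature")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["example_id", *feature_names])  # header row
    for example_id, row in zip(example_ids, matrix):
        writer.writerow([example_id, *row])
    return buffer.getvalue()


print(split_file_format("csv.zip"))
print(design_matrix_to_csv([[0.5, 0.25], [0.0, 1.0]], ["AAA", "AAC"], ["rep1", "rep2"]))
```

A csv file can only hold a two-dimensional table, which is consistent with the description above that higher-dimensional encodings are exported in other formats.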
- classmethod build_object(**kwargs)
Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the report type, the parameters provided here are supplied at parsing time, while other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
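The split between parsing-time and runtime parameters can be illustrated with a toy class. This is a minimal sketch: ToyReport is hypothetical and simply forwards the keyword arguments to its constructor, whereas real report classes may also validate or convert them.

```python
from pathlib import Path
from typing import Optional


class ToyReport:
    """Hypothetical stand-in for a Report subclass; not immuneML's class."""

    def __init__(self, dataset=None, result_path: Optional[Path] = None,
                 file_format: Optional[str] = None, name: Optional[str] = None,
                 number_of_processes: int = 1):
        self.dataset = dataset          # set at runtime by the instruction
        self.result_path = result_path  # set at runtime by the instruction
        self.file_format = file_format  # user-specified, known at parsing time
        self.name = name
        self.number_of_processes = number_of_processes

    @classmethod
    def build_object(cls, **kwargs):
        # Parsing time: only the user-supplied parameters are available here.
        return cls(**kwargs)


# Parameters as they might arrive from the YAML specification.
report = ToyReport.build_object(file_format="csv", name="my_dme_report")

# Runtime: the instruction fills in the data-dependent parameters later.
report.dataset = [[0.5, 0.25]]
report.result_path = Path("results") / report.name
```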
- check_prerequisites()
Checks the prerequisites for generating a report of the specific class (e.g., whether the MLMethod instance is of the class required by the report, or whether the data have been encoded when the report describes an encoded dataset). The immuneML instructions use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped; no error is raised. See the subclasses of the Instruction class for more information on how the reports are executed.
- Returns:
True if the prerequisites are met, and False otherwise.
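The "log and skip, never raise" behavior described for check_prerequisites() can be sketched as follows. ToyDataset, ToyEncodingReport and run_report are hypothetical stand-ins, not immuneML classes; the only prerequisite checked here (that encoded data are present) is an assumption chosen to match what an encoding report would need.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("toy_instruction")


@dataclass
class ToyDataset:
    # Hypothetical stand-in for Dataset; encoded_data stays None until encoding.
    encoded_data: Optional[list] = None


class ToyEncodingReport:
    def __init__(self, dataset: ToyDataset, name: str):
        self.dataset = dataset
        self.name = name

    def check_prerequisites(self) -> bool:
        # An encoding report needs encoded data to work with.
        if self.dataset is None or self.dataset.encoded_data is None:
            logger.warning("%s: dataset is not encoded", self.name)
            return False
        return True

    def generate_report(self) -> str:
        return f"{self.name}: {len(self.dataset.encoded_data)} examples"


def run_report(report: ToyEncodingReport) -> Optional[str]:
    """Call generate_report() only if the prerequisites hold; otherwise log and skip."""
    if not report.check_prerequisites():
        logger.info("skipping report %s", report.name)
        return None  # no exception is raised
    return report.generate_report()


print(run_report(ToyEncodingReport(ToyDataset(), "unencoded")))
print(run_report(ToyEncodingReport(ToyDataset([[1, 0], [0, 1]]), "encoded")))
```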
immuneML.reports.encoding_reports.EncodingReport module
- class immuneML.reports.encoding_reports.EncodingReport.EncodingReport(dataset: Dataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)
Bases: Report
Encoding reports show features or statistics of an encoded dataset, and in some cases export relevant sequences or tables.
When running the TrainMLModel instruction, encoding reports can be specified inside the ‘selection’ or ‘assessment’ specification under the key ‘reports/encoding’. Example:
    my_instruction:
      type: TrainMLModel
      selection:
        reports:
          encoding:
            - my_encoding_report
        # other parameters...
      assessment:
        reports:
          encoding:
            - my_encoding_report
        # other parameters...
      # other parameters...
Alternatively, when running the ExploratoryAnalysis instruction, encoding reports can be specified under ‘report’. Example:
    my_instruction:
      type: ExploratoryAnalysis
      analyses:
        my_first_analysis:
          report: my_encoding_report
          # other parameters...
      # other parameters...
- DOCS_TITLE = 'Encoding reports'
- __init__(dataset: Dataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)
The arguments defined below are set at runtime by the instruction. Concrete classes inheriting EncodingReport may include additional parameters that will be set by the user in the form of input arguments.
dataset (Dataset): an encoded dataset whose encoded_data attribute is set to an EncodedData object
result_path (Path): path where the results (plots, tables, etc.) will be stored
name (str): user-defined name of the report, shown later in the HTML overview
number_of_processes (int): how many processes to create at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.
immuneML.reports.encoding_reports.FeatureComparison module
- class immuneML.reports.encoding_reports.FeatureComparison.FeatureComparison(dataset: Dataset = None, result_path: Path = None, comparison_label: str = None, color_grouping_label: str = None, row_grouping_label=None, column_grouping_label=None, opacity: float = 0.7, show_error_bar=True, log_scale: bool = False, keep_fraction: int = 1, number_of_processes: int = 1, name: str = None)
Bases: FeatureReport
Encoding a dataset results in a numeric matrix, where the rows are examples (e.g., sequences, receptors, repertoires) and the columns are features. For example, when the KmerFrequency encoder is used, the features are the k-mers (AAA, AAC, etc.) and the feature values are the frequencies per k-mer.
This report separates the examples based on a binary metadata label and plots the mean value of each feature in one example group against the other example group (for example, plot the mean feature values of ‘sick’ repertoires on the x axis and of ‘healthy’ repertoires on the y axis to spot consistent differences). The plot can be separated into different colors or facets using other metadata labels (for example, plot the average feature values of ‘cohort1’, ‘cohort2’ and ‘cohort3’ in different colors to spot biases).
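The core computation behind such a plot can be sketched in plain Python, assuming the encoded data are available as a list of rows with one label value per example. This is an illustration, not immuneML's implementation; compare_feature_means is a hypothetical helper. Each feature yields one point whose x and y coordinates are its means in the two classes, and the standard deviations correspond to the optional error bars.

```python
from statistics import mean, stdev


def compare_feature_means(matrix, labels, feature_names, classes):
    """Hypothetical helper: per feature, mean and standard deviation in each of two classes.

    matrix: list of rows (examples) x columns (features)
    labels: one label value per example, e.g. 'sick' or 'healthy'
    classes: the two classes of the binary comparison label
    Returns {feature: {class: (mean, stdev)}}.
    """
    if len(classes) != 2:
        raise ValueError("comparison_label must have exactly 2 classes")
    result = {}
    for j, feature in enumerate(feature_names):
        result[feature] = {}
        for group in classes:
            values = [row[j] for row, label in zip(matrix, labels) if label == group]
            # stdev needs at least two values; use 0.0 for a single example.
            spread = stdev(values) if len(values) > 1 else 0.0
            result[feature][group] = (mean(values), spread)
    return result


matrix = [[0.25, 1.0], [0.75, 3.0], [0.0, 2.0], [0.5, 2.0]]
labels = ["sick", "sick", "healthy", "healthy"]
points = compare_feature_means(matrix, labels, ["AAA", "AAC"], ("sick", "healthy"))
print(points["AAA"])
```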
Alternatively, to plot features without comparing them across a binary label, see the FeatureValueBarplot report, which plots a simple bar chart per feature (average across examples), or the FeatureDistribution report, which plots the distribution of each feature across examples rather than only the mean value in a bar plot.
Specification arguments:
comparison_label (str): Mandatory label. This label is used to split the encoded data matrix and define the x and y axes of the plot. This label is only allowed to have 2 classes (for example: sick and healthy, binding and non-binding).
color_grouping_label (str): Optional label used to color the points in the scatterplot. This cannot be the same as comparison_label.
row_grouping_label (str): Optional label used to group scatterplots into different row facets. This cannot be the same as comparison_label.
column_grouping_label (str): Optional label used to group scatterplots into different column facets. This cannot be the same as comparison_label.
show_error_bar (bool): Whether to show the error bar (standard deviation) for the points, both in the x and y dimension.
log_scale (bool): Whether to plot the x and y axes in log10 scale (log_scale = True) or linear scale (log_scale = False). By default, log_scale is False.
keep_fraction (float): The total number of features may be very large, and only the features that differ most across the classes of comparison_label may be of interest. When keep_fraction is set below 1, only that fraction of the features, namely those differing most across the comparison label classes, is plotted (the produced .csv file still contains all data). By default, keep_fraction is 1, meaning that all features are plotted.
opacity (float): a value between 0 and 1 that sets the opacity of the data points, making overlapping points easier to see. By default, opacity is 0.7.
YAML specification:
    definitions:
      reports:
        my_comparison_report:
          FeatureComparison:
            # compare the different classes defined in the label disease
            comparison_label: disease
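The keep_fraction filter can be sketched as ranking features by how much their two class means differ and keeping the top fraction. Both the ranking metric (absolute difference of means) and the rounding (ceil, keeping at least one feature) are assumptions made for this illustration; select_features_to_plot is a hypothetical helper, not immuneML's code.

```python
import math


def select_features_to_plot(class_means, keep_fraction=1.0):
    """Hypothetical helper: keep the fraction of features whose class means differ most.

    class_means: {feature: (mean_in_class_1, mean_in_class_2)}
    Assumed metric: absolute difference of the two means; assumed rounding: ceil.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(class_means,
                    key=lambda f: abs(class_means[f][0] - class_means[f][1]),
                    reverse=True)
    n_keep = max(1, math.ceil(keep_fraction * len(ranked)))
    return ranked[:n_keep]


means = {"AAA": (0.5, 0.25), "AAC": (2.0, 2.0), "ACA": (0.0, 1.0), "ACC": (0.75, 0.5)}
print(select_features_to_plot(means, keep_fraction=0.5))
```

Only the plotted subset changes; in line with the description above, a full table of all features would still be written out.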
- classmethod build_object(**kwargs)
Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the report type, the parameters provided here are supplied at parsing time, while other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
- check_prerequisites()
Checks the prerequisites for generating a report of the specific class (e.g., whether the MLMethod instance is of the class required by the report, or whether the data have been encoded when the report describes an encoded dataset). The immuneML instructions use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped; no error is raised. See the subclasses of the Instruction class for more information on how the reports are executed.
- Returns:
True if the prerequisites are met, and False otherwise.
immuneML.reports.encoding_reports.FeatureDistribution module
- class immuneML.reports.encoding_reports.FeatureDistribution.FeatureDistribution(dataset: Dataset = None, result_path: Path = None, color_grouping_label: str = None, row_grouping_label=None, column_grouping_label=None, mode: str = 'auto', x_title: str = None, y_title: str = None, number_of_processes: int = 1, name: str = None)
Bases: FeatureReport
Encoding a dataset results in a numeric matrix, where the rows are examples (e.g., sequences, receptors, repertoires) and the columns are features. For example, when the KmerFrequency encoder is used, the features are the k-mers (AAA, AAC, etc.) and the feature values are the frequencies per k-mer.
This report plots the distribution of feature values. For each feature, a violin plot is created to show the distribution of feature values across all examples. The violin plots can be separated into different colors or facets using metadata labels (for example: plot the feature distributions of ‘cohort1’, ‘cohort2’ and ‘cohort3’ in different colors to spot biases).
See also: the FeatureValueBarplot report, which plots a simple bar chart per feature (average across examples) rather than the entire distribution, or the FeatureComparison report, which compares features across binary metadata labels (e.g., plots the feature values of ‘sick’ repertoires on the x axis and of ‘healthy’ repertoires on the y axis).
Specification arguments:
color_grouping_label (str): The label used to color the plots.
row_grouping_label (str): The label used to group the plots into different row facets.
column_grouping_label (str): The label used to group the plots into different column facets.
mode (str): either ‘normal’, ‘sparse’ or ‘auto’ (default). In the ‘normal’ mode, there is a normal box plot for each column of the encoded data matrix; in the ‘sparse’ mode, all zero cells are removed before the data are passed to the box plots. If mode is set to ‘auto’, it is automatically set to ‘sparse’ if the density of the matrix is below 0.01.
x_title (str): x-axis label
y_title (str): y-axis label
YAML specification:
    definitions:
      reports:
        my_fdistr_report:
          FeatureDistribution:
            mode: sparse
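The mode logic can be sketched with plain lists: density is the fraction of non-zero cells, ‘auto’ switches to ‘sparse’ below the 0.01 threshold, and ‘sparse’ drops zero cells before a feature's values are plotted. The helper names are hypothetical, and falling back to ‘normal’ when ‘auto’ does not switch to ‘sparse’ is an assumption of this sketch.

```python
def matrix_density(matrix) -> float:
    """Fraction of non-zero cells in a 2D matrix given as a list of rows."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return nonzero / total if total else 0.0


def resolve_mode(mode: str, matrix, threshold: float = 0.01) -> str:
    """Hypothetical helper: map 'auto' to 'sparse' below the density threshold, else 'normal' (assumed)."""
    if mode not in ("normal", "sparse", "auto"):
        raise ValueError(f"unknown mode {mode!r}")
    if mode != "auto":
        return mode
    return "sparse" if matrix_density(matrix) < threshold else "normal"


def column_values(matrix, column: int, mode: str) -> list:
    """Values of one feature column; in 'sparse' mode the zero cells are dropped."""
    values = [row[column] for row in matrix]
    return [v for v in values if v != 0] if mode == "sparse" else values


# 10 examples x 100 features with a single non-zero cell: density 0.001.
sparse_matrix = [[5] + [0] * 99] + [[0] * 100 for _ in range(9)]
mode = resolve_mode("auto", sparse_matrix)
print(mode, column_values(sparse_matrix, 0, mode))
```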
- classmethod build_object(**kwargs)
Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the report type, the parameters provided here are supplied at parsing time, while other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
immuneML.reports.encoding_reports.FeatureReport module
- class immuneML.reports.encoding_reports.FeatureReport.FeatureReport(dataset: Dataset = None, result_path: Path = None, color_grouping_label: str = None, row_grouping_label=None, column_grouping_label=None, name: str = None, number_of_processes: int = 1)
Bases: EncodingReport
Base class for reports that plot the reshaped feature values of an encoded dataset.
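One common way to reshape feature values for plotting is to turn the wide matrix into a long table with one record per example and feature, to which metadata labels can be attached for coloring or faceting. The sketch below assumes that meaning of "reshaped"; it is not taken from the immuneML source, and to_long_format is a hypothetical helper.

```python
def to_long_format(matrix, feature_names, example_ids, metadata=None):
    """Hypothetical helper: one record per (example, feature) pair.

    metadata: optional {example_id: {label_name: value}}, e.g. labels used for colors or facets.
    """
    metadata = metadata or {}
    records = []
    for example_id, row in zip(example_ids, matrix):
        labels = metadata.get(example_id, {})
        for feature, value in zip(feature_names, row):
            records.append({"example_id": example_id, "feature": feature,
                            "value": value, **labels})
    return records


records = to_long_format([[0.5, 0.25]], ["AAA", "AAC"], ["rep1"],
                         {"rep1": {"cohort": "cohort1"}})
for record in records:
    print(record)
```

In this long form, color_grouping_label, row_grouping_label and column_grouping_label each simply name one of the attached label columns.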
- check_prerequisites()
Checks the prerequisites for generating a report of the specific class (e.g., whether the MLMethod instance is of the class required by the report, or whether the data have been encoded when the report describes an encoded dataset). The immuneML instructions use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped; no error is raised. See the subclasses of the Instruction class for more information on how the reports are executed.
- Returns:
True if the prerequisites are met, and False otherwise.
immuneML.reports.encoding_reports.FeatureValueBarplot module
- class immuneML.reports.encoding_reports.FeatureValueBarplot.FeatureValueBarplot(dataset: RepertoireDataset = None, result_path: Path = None, color_grouping_label: str = None, row_grouping_label=None, column_grouping_label=None, x_title: str = None, y_title: str = None, show_error_bar=True, name: str = None, plot_all_features: bool = True, number_of_processes: int = 1, plot_top_n: int = None, plot_bottom_n: int = None)
Bases: FeatureReport
Encoding a dataset results in a numeric matrix, where the rows are examples (e.g., sequences, receptors, repertoires) and the columns are features. For example, when the KmerFrequency encoder is used, the features are the k-mers (AAA, AAC, etc.) and the feature values are the frequencies per k-mer.
This report plots the mean feature values per feature. A bar plot is created where the average feature value across all examples is shown, with optional error bars. The bar plots can be separated into different colors or facets using metadata labels (for example: plot the average feature values of ‘cohort1’, ‘cohort2’ and ‘cohort3’ in different colors to spot biases).
See also: the FeatureDistribution report, which plots the distribution of each feature across examples rather than only showing the mean value in a bar plot, or the FeatureComparison report, which compares features across binary metadata labels (e.g., plots the feature values of ‘sick’ repertoires on the x axis and of ‘healthy’ repertoires on the y axis).
Specification arguments:
color_grouping_label (str): The label used to color the bars.
row_grouping_label (str): The label that is used to group bars into different row facets.
column_grouping_label (str): The label that is used to group bars into different column facets.
show_error_bar (bool): Whether to show the error bar (standard deviation) for the bars.
x_title (str): x-axis label
y_title (str): y-axis label
plot_top_n (int): separately plot the n features with the largest average values (useful when there are too many features to plot at the same time)
plot_bottom_n (int): separately plot the n features with the smallest average values (useful when there are too many features to plot at the same time)
plot_all_features (bool): whether to plot all features (this might be slow for a large number of features). By default, plot_all_features is True.
YAML specification:
    definitions:
      reports:
        my_fvb_report:
          FeatureValueBarplot:
            # timepoint, disease_status and age_group are metadata labels
            column_grouping_label: timepoint
            row_grouping_label: disease_status
            color_grouping_label: age_group
            plot_all_features: true
            plot_top_n: 10
            plot_bottom_n: 5
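A minimal sketch of how plot_top_n and plot_bottom_n could select features: compute each feature's mean across examples, rank the features, and take the n largest and n smallest. The helper names are hypothetical, and the tie handling (original column order is kept among equal means) is a property of this sketch, not a documented immuneML behavior.

```python
from statistics import mean


def feature_means(matrix, feature_names):
    """Average value of each feature (column) across all examples (rows)."""
    return {f: mean(row[j] for row in matrix) for j, f in enumerate(feature_names)}


def top_and_bottom_features(means, top_n=None, bottom_n=None):
    """Hypothetical helper: return (n largest, n smallest) feature names by average value."""
    ranked = sorted(means, key=means.get, reverse=True)  # stable: ties keep column order
    top = ranked[:top_n] if top_n else []
    bottom = ranked[::-1][:bottom_n] if bottom_n else []
    return top, bottom


matrix = [[1.0, 0.0, 4.0, 2.0], [3.0, 0.0, 2.0, 2.0]]
names = ["AAA", "AAC", "ACA", "ACC"]
print(top_and_bottom_features(feature_means(matrix, names), top_n=2, bottom_n=1))
```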
- classmethod build_object(**kwargs)
Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the report type, the parameters provided here are supplied at parsing time, while other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
immuneML.reports.encoding_reports.Matches module
- class immuneML.reports.encoding_reports.Matches.Matches(dataset: RepertoireDataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)
Bases: EncodingReport
Reports the number of matches that were found when using one of the following encoders:
MatchedSequences encoder
MatchedReceptors encoder
MatchedRegex encoder
Report results are:
A table containing all matches, where the rows correspond to the Repertoires, and the columns correspond to the objects to match (regular expressions or receptor sequences).
The repertoire sizes (read frequencies and the number of unique sequences per repertoire), for each of the chains. This can be used to calculate the percentage of matched sequences in a repertoire.
When using MatchedSequences encoder or MatchedReceptors encoder, tables describing the chains and receptors (ids, chains, V and J genes and sequences).
When using MatchedReceptors encoder or using MatchedRegex encoder with chain pairs, tables describing the paired matches (where a match was found in both chains) per repertoire.
YAML specification:
    definitions:
      reports:
        my_match_report: Matches
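The percentage of matched sequences mentioned above can be computed from the match table and the repertoire sizes. This sketch assumes that each cell of the match table counts matched unique sequences and that no sequence is counted for more than one target; percent_matched and the target names in the example are hypothetical.

```python
def percent_matched(match_table, unique_sequences):
    """Hypothetical helper: percentage of unique sequences per repertoire that were matched.

    match_table: {repertoire_id: {target: n_matches}} (rows = repertoires, columns = targets)
    unique_sequences: {repertoire_id: number of unique sequences}
    """
    percentages = {}
    for repertoire_id, row in match_table.items():
        size = unique_sequences[repertoire_id]
        total = sum(row.values())  # assumes no sequence is counted for two targets
        percentages[repertoire_id] = 100.0 * total / size if size else 0.0
    return percentages


table = {"rep1": {"target_A": 2, "target_B": 3}, "rep2": {"target_A": 0, "target_B": 1}}
print(percent_matched(table, {"rep1": 50, "rep2": 4}))
```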
- classmethod build_object(**kwargs)
Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the report type, the parameters provided here are supplied at parsing time, while other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
- check_prerequisites()
Checks the prerequisites for generating a report of the specific class (e.g., whether the MLMethod instance is of the class required by the report, or whether the data have been encoded when the report describes an encoded dataset). The immuneML instructions use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped; no error is raised. See the subclasses of the Instruction class for more information on how the reports are executed.
- Returns:
True if the prerequisites are met, and False otherwise.
immuneML.reports.encoding_reports.RelevantSequenceExporter module
- class immuneML.reports.encoding_reports.RelevantSequenceExporter.RelevantSequenceExporter(dataset: RepertoireDataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)
Bases: EncodingReport
Exports the sequences that are extracted as label-associated when using the SequenceAbundanceEncoder or CompAIRRSequenceAbundanceEncoder in AIRR-compliant format.
YAML specification:
    definitions:
      reports:
        my_relevant_sequences: RelevantSequenceExporter
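AIRR-compliant rearrangement data are tab-separated files whose columns use AIRR field names. The sketch below writes sequence records in that style with the standard csv module. The chosen columns are a small subset of AIRR Rearrangement fields picked for illustration; which columns RelevantSequenceExporter actually writes is not stated here, so both the column choice and the helper to_airr_tsv are assumptions.

```python
import csv
import io

# Assumed subset of AIRR Rearrangement field names, for illustration only.
AIRR_COLUMNS = ["sequence_id", "junction_aa", "v_call", "j_call"]


def to_airr_tsv(sequences) -> str:
    """Hypothetical helper: write records (dicts keyed by AIRR field names) as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=AIRR_COLUMNS, delimiter="\t",
                            lineterminator="\n", extrasaction="ignore")
    writer.writeheader()
    for record in sequences:
        writer.writerow(record)  # missing fields are written as empty strings
    return buffer.getvalue()


print(to_airr_tsv([{"sequence_id": "s1", "junction_aa": "CASSLGQF",
                    "v_call": "TRBV5-1", "j_call": "TRBJ2-1"}]))
```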
- classmethod build_object(**kwargs)
Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the report type, the parameters provided here are supplied at parsing time, while other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.
- Parameters:
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns:
the object of the appropriate report class
- check_prerequisites()
Checks the prerequisites for generating a report of the specific class (e.g., whether the MLMethod instance is of the class required by the report, or whether the data have been encoded when the report describes an encoded dataset). The immuneML instructions use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped; no error is raised. See the subclasses of the Instruction class for more information on how the reports are executed.
- Returns:
True if the prerequisites are met, and False otherwise.