immuneML.reports.encoding_reports package
Submodules
immuneML.reports.encoding_reports.DesignMatrixExporter module
class immuneML.reports.encoding_reports.DesignMatrixExporter.DesignMatrixExporter(dataset: Optional[immuneML.data_model.dataset.Dataset.Dataset] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None, file_format: Optional[str] = None)
Bases: immuneML.reports.encoding_reports.EncodingReport.EncodingReport
Exports the design matrix and related information of a given encoded Dataset to csv files. If the encoded data has more than 2 dimensions (such as when using the OneHot encoder with option Flatten=False), the data are then exported to different formats to facilitate their import with external software.
- Parameters
file_format (str) – the format and extension of the file to store the design matrix. The supported formats are: npy, csv, hdf5, npy.zip, csv.zip or hdf5.zip.
YAML specification:
my_dme_report:
    DesignMatrixExporter:
        file_format: csv
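As a rough illustration of the file_format parameter, the sketch below checks a value against the formats listed above. The helper names (validate_file_format, is_zipped) and the case-insensitive normalization are assumptions made for this example; they are not part of immuneML.

```python
# Hypothetical helpers, not part of immuneML: they only check a file_format
# value against the formats listed in the documentation above.
SUPPORTED_FORMATS = ("npy", "csv", "hdf5", "npy.zip", "csv.zip", "hdf5.zip")


def validate_file_format(file_format: str) -> str:
    """Return the normalized format, or raise ValueError if it is unsupported."""
    # Assumption: surrounding whitespace and letter case are ignored.
    normalized = file_format.strip().lower()
    if normalized not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported file_format {file_format!r}; "
            f"expected one of {', '.join(SUPPORTED_FORMATS)}"
        )
    return normalized


def is_zipped(file_format: str) -> bool:
    """True for the compressed variants (npy.zip, csv.zip, hdf5.zip)."""
    return validate_file_format(file_format).endswith(".zip")
```

Checking the value before running an analysis gives an early, readable error instead of a failure at export time.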
classmethod build_object(**kwargs)
Creates an object of a subclass of the Report class from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are supplied at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the specific direct subclasses of this class, which describe the different types of reports.
- Parameters
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns
the object of the appropriate report class
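The split between parsing-time and runtime parameters can be sketched as follows. ToyReport is a hypothetical stand-in written for this example, not an immuneML class, and the placeholder dataset is a plain dictionary rather than a real Dataset object.

```python
from pathlib import Path
from typing import Optional


class ToyReport:
    """Hypothetical stand-in for a report class; not an immuneML class."""

    def __init__(self, dataset=None, result_path: Optional[Path] = None,
                 name: Optional[str] = None, file_format: Optional[str] = None):
        self.dataset = dataset
        self.result_path = result_path
        self.name = name
        self.file_format = file_format

    @classmethod
    def build_object(cls, **kwargs):
        # Only the user-specified parameters are known at parsing time.
        return cls(**kwargs)


# Parsing time: the parameters come from the specification.
report = ToyReport.build_object(name="my_dme_report", file_format="csv")

# Runtime: the remaining attributes are filled in (here by hand; in immuneML
# the instruction would do this).
report.dataset = {"encoded_data": [[0.1, 0.2]]}  # placeholder, not a real Dataset
report.result_path = Path("results")
```

Keeping the runtime attributes optional in the constructor is what allows the object to be created before any data is available.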
check_prerequisites()
Checks the prerequisites for generating a report of a specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded so that a report on the encoded dataset can be made). The instructions in immuneML use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped in that situation; no error is raised. See the subclasses of the class Instruction for more information on how the reports are executed.
- Returns
True if the prerequisites are met, and False otherwise.
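The check-then-generate flow described above might look like the following minimal sketch. ToyEncodingReport, run_report and the dictionary-based dataset are hypothetical and only mirror the described behavior (log, skip, raise nothing); the actual prerequisite checks differ per report.

```python
import logging


class ToyEncodingReport:
    """Hypothetical report; mirrors the check-then-generate flow only."""

    def __init__(self, dataset=None, name="toy_report"):
        self.dataset = dataset
        self.name = name

    def check_prerequisites(self) -> bool:
        # Assumed prerequisite for this example: the dataset carries encoded data.
        if self.dataset is None or self.dataset.get("encoded_data") is None:
            logging.warning("%s: dataset is not encoded, skipping report.", self.name)
            return False
        return True

    def generate_report(self) -> str:
        return f"{self.name}: {len(self.dataset['encoded_data'])} examples"


def run_report(report):
    """Run the report only if its prerequisites hold; a skip raises no error."""
    if report.check_prerequisites():
        return report.generate_report()
    return None


skipped = run_report(ToyEncodingReport(dataset={"encoded_data": None}))
generated = run_report(ToyEncodingReport(dataset={"encoded_data": [[1, 0], [0, 1]]}))
```

Returning a boolean instead of raising lets one analysis contain reports that apply to only some of its steps.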
dataset: immuneML.data_model.dataset.Dataset.Dataset = None
file_format: str = None
name: str = None
result_path: pathlib.Path = None
immuneML.reports.encoding_reports.EncodingReport module
class immuneML.reports.encoding_reports.EncodingReport.EncodingReport(dataset: Optional[immuneML.data_model.dataset.Dataset.Dataset] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None)
Bases: immuneML.reports.Report.Report
Encoding reports show some type of features or statistics about an encoded dataset, or may in some cases export relevant sequences or tables.
When running the TrainMLModel instruction, encoding reports can be specified inside the ‘selection’ or ‘assessment’ specification under the key ‘reports:encoding’. Alternatively, when running the ExploratoryAnalysis instruction, encoding reports can be specified under ‘reports’.
When using the reports with instructions such as ExploratoryAnalysis or TrainMLModel, the arguments defined below are set at runtime by the instruction. Concrete classes inheriting EncodingReport may include additional parameters that will be set by the user in the form of input arguments.
- Parameters
dataset (Dataset) – an encoded dataset whose encoded_data attribute is set to an instance of EncodedData
result_path (Path) – path where the results will be stored (plots, tables, etc.)
name (str) – user-defined name of the report that will be shown in the HTML overview later
dataset: immuneML.data_model.dataset.Dataset.Dataset = None
name: str = None
result_path: pathlib.Path = None
immuneML.reports.encoding_reports.FeatureDistribution module
class immuneML.reports.encoding_reports.FeatureDistribution.FeatureDistribution(dataset: Optional[immuneML.data_model.dataset.Dataset.Dataset] = None, result_path: Optional[pathlib.Path] = None, color_grouping_label: Optional[str] = None, row_grouping_label=None, column_grouping_label=None, mode: str = 'auto', x_title: Optional[str] = None, y_title: Optional[str] = None, name: Optional[str] = None)
Bases: immuneML.reports.encoding_reports.FeatureReport.FeatureReport
Plots a boxplot for each feature in the encoded data matrix. Can be used in combination with any encoding and dataset type. Each boxplot represents a feature and shows the distribution of values for that feature. For example, when the KmerFrequency encoder is used, the features are the k-mers (AAA, AAC, etc.) and the feature values are the frequencies per k-mer.
Two modes can be used: in the ‘normal’ mode, a regular boxplot is drawn for each column of the encoded dataset matrix; in the ‘sparse’ mode, all zero cells are removed before the data are passed to the boxplots. If mode is set to ‘auto’, it is automatically set to ‘sparse’ when the density of the matrix is below 0.01.
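The mode selection can be illustrated with a small pure-Python sketch. It assumes that density means the fraction of non-zero cells and that 'auto' falls back to 'normal' at or above the threshold; the function names are hypothetical and do not come from immuneML.

```python
def matrix_density(matrix):
    """Fraction of non-zero cells in a matrix given as a list of rows."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        return 0.0
    non_zero = sum(1 for row in matrix for value in row if value != 0)
    return non_zero / total


def resolve_mode(mode, matrix, threshold=0.01):
    """Map 'auto' to 'sparse' or 'normal'; pass explicit modes through."""
    if mode != "auto":
        return mode
    return "sparse" if matrix_density(matrix) < threshold else "normal"


def non_zero_values(matrix, column):
    """Values of one feature column with zero cells dropped ('sparse' mode)."""
    return [row[column] for row in matrix if row[column] != 0]


# 200 examples x 2 features with a single non-zero cell: density 1/400
sparse_matrix = [[0.0, 0.0] for _ in range(200)]
sparse_matrix[3][1] = 0.5

# 3 of 4 cells are non-zero: density 0.75
dense_matrix = [[0.2, 0.0], [0.1, 0.4]]
```

With these inputs, resolve_mode("auto", sparse_matrix) selects 'sparse' and resolve_mode("auto", dense_matrix) selects 'normal'.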
Optional metadata labels can be specified to divide the boxplots into groups based on color, row facets or column facets. These labels are specified in the metadata file for repertoire datasets, or as metadata columns for sequence and receptor datasets.
Alternatively, when only the mean feature values are of interest (as opposed to showing the complete distribution, as done here), please consider using FeatureValueBarplot instead. When comparing the feature values between two subsets of the data, please use FeatureComparison.
- Parameters
color_grouping_label (str) – The label that is used to color each bar, at each level of the grouping_label.
row_grouping_label (str) – The label that is used to group bars into different row facets.
column_grouping_label (str) – The label that is used to group bars into different column facets.
mode (str) – either ‘normal’, ‘sparse’ or ‘auto’ (default)
x_title (str) – x-axis label
y_title (str) – y-axis label
YAML specification:
my_fdistr_report:
    FeatureDistribution:
        mode: sparse
classmethod build_object(**kwargs)
Creates an object of a subclass of the Report class from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are supplied at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the specific direct subclasses of this class, which describe the different types of reports.
- Parameters
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns
the object of the appropriate report class
immuneML.reports.encoding_reports.FeatureValueBarplot module
class immuneML.reports.encoding_reports.FeatureValueBarplot.FeatureValueBarplot(dataset: Optional[immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset] = None, result_path: Optional[pathlib.Path] = None, color_grouping_label: Optional[str] = None, row_grouping_label=None, column_grouping_label=None, x_title: Optional[str] = None, y_title: Optional[str] = None, show_error_bar=True, name: Optional[str] = None)
Bases: immuneML.reports.encoding_reports.FeatureReport.FeatureReport
Plots a barplot of the feature values in a given encoded data matrix, averaged across examples. Can be used in combination with any encoding and dataset type. Each bar in the barplot represents the mean value of a given feature, and the different features are shown along the x-axis. For example, when the KmerFrequency encoder is used, the features are the k-mers (AAA, AAC, etc.) and the feature values are the frequencies per k-mer.
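The quantities behind each bar can be sketched with the standard library. The matrix and feature names below are made-up illustration data, and the use of the sample standard deviation for the error bar is an assumption; the documentation only says that the error bar shows a standard deviation.

```python
from statistics import mean, stdev

# Made-up encoded matrix: rows are examples, columns are features (k-mers).
feature_names = ["AAA", "AAC", "AAG"]
encoded = [
    [0.50, 0.25, 0.25],
    [0.30, 0.50, 0.20],
    [0.40, 0.30, 0.30],
]


def summarize_features(matrix, names):
    """Return {feature: (mean, sample standard deviation)} across examples."""
    columns = list(zip(*matrix))  # transpose: one tuple of values per feature
    # stdev needs at least two examples per feature.
    return {name: (mean(col), stdev(col)) for name, col in zip(names, columns)}


summary = summarize_features(encoded, feature_names)
```

Each entry of summary corresponds to one bar: the first value is the bar height and the second is the length of its error bar.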
Optional metadata labels can be specified to divide the barplot into groups based on color, row facets or column facets. In this case, the average feature values in each group are plotted. These labels are specified in the metadata file for repertoire datasets, or as metadata columns for sequence and receptor datasets.
Alternatively, when the distribution of feature values is of interest (as opposed to showing only the mean, as done here), please consider using FeatureDistribution instead. When comparing the feature values between two subsets of the data, please use FeatureComparison.
- Parameters
color_grouping_label (str) – The label that is used to color each bar, at each level of the grouping_label.
row_grouping_label (str) – The label that is used to group bars into different row facets.
column_grouping_label (str) – The label that is used to group bars into different column facets.
show_error_bar (bool) – Whether to show the error bar (standard deviation) for the bars.
x_title (str) – x-axis label
y_title (str) – y-axis label
YAML specification:
my_fvb_report:
    FeatureValueBarplot:
        # timepoint, disease_status and age_group are metadata labels
        column_grouping_label: timepoint
        row_grouping_label: disease_status
        color_grouping_label: age_group
classmethod build_object(**kwargs)
Creates an object of a subclass of the Report class from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are supplied at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the specific direct subclasses of this class, which describe the different types of reports.
- Parameters
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns
the object of the appropriate report class
immuneML.reports.encoding_reports.Matches module
class immuneML.reports.encoding_reports.Matches.Matches(dataset: Optional[immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None)
Bases: immuneML.reports.encoding_reports.EncodingReport.EncodingReport
Reports the number of matches that were found when using one of the following encoders:
MatchedSequences encoder
MatchedReceptors encoder
MatchedRegex encoder
Report results are:
A table containing all matches, where the rows correspond to the Repertoires, and the columns correspond to the objects to match (regular expressions or receptor sequences).
The repertoire sizes (read frequencies and the number of unique sequences per repertoire), for each of the chains. This can be used to calculate the percentage of matched sequences in a repertoire.
When using MatchedSequences encoder or MatchedReceptors encoder, tables describing the chains and receptors (ids, chains, V and J genes and sequences).
When using MatchedReceptors encoder or using MatchedRegex encoder with chain pairs, tables describing the paired matches (where a match was found in both chains) per repertoire.
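The percentage of matched sequences mentioned above could be computed along these lines. The repertoire identifiers and counts are invented for illustration, and the choice of the number of unique sequences as the denominator is an assumption; read frequencies could be used instead, depending on how the matches were counted.

```python
def matched_percentage(match_count: int, repertoire_size: int) -> float:
    """Percentage of a repertoire that matched; sizes must be positive."""
    if repertoire_size <= 0:
        raise ValueError("repertoire_size must be positive")
    return 100.0 * match_count / repertoire_size


# Invented example values, as they might be read from the exported tables.
matches_per_repertoire = {"rep_1": 12, "rep_2": 3}
unique_sequences_per_repertoire = {"rep_1": 4000, "rep_2": 1500}

percentages = {
    rep: matched_percentage(matches_per_repertoire[rep],
                            unique_sequences_per_repertoire[rep])
    for rep in matches_per_repertoire
}
```

Normalizing by repertoire size makes match counts comparable between repertoires of different depth.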
YAML Specification:
my_match_report: Matches
classmethod build_object(**kwargs)
Creates an object of a subclass of the Report class from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are supplied at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the specific direct subclasses of this class, which describe the different types of reports.
- Parameters
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns
the object of the appropriate report class
check_prerequisites()
Checks the prerequisites for generating a report of a specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded so that a report on the encoded dataset can be made). The instructions in immuneML use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped in that situation; no error is raised. See the subclasses of the class Instruction for more information on how the reports are executed.
- Returns
True if the prerequisites are met, and False otherwise.
immuneML.reports.encoding_reports.RelevantSequenceExporter module
class immuneML.reports.encoding_reports.RelevantSequenceExporter.RelevantSequenceExporter(dataset: Optional[immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None)
Bases: immuneML.reports.encoding_reports.EncodingReport.EncodingReport
Exports, in AIRR-compliant format, the sequences that were identified as label-associated by the SequenceAbundance encoder.
Arguments: there are no arguments for this report.
YAML specification:
my_relevant_sequences: RelevantSequenceExporter
COLUMN_MAPPING = {'j_genes': 'j_call', 'sequence_aas': 'cdr3_aa', 'sequences': 'cdr3', 'v_genes': 'v_call'}
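A mapping like COLUMN_MAPPING can be applied to tabular rows as in the sketch below, which renames internal column names to AIRR field names and writes a tab-separated table. The helper functions and the example row are hypothetical, and the tab-separated output is an assumption; the documentation does not state how the report itself writes its files.

```python
import csv
import io

# Copied from the class attribute documented above.
COLUMN_MAPPING = {"j_genes": "j_call", "sequence_aas": "cdr3_aa",
                  "sequences": "cdr3", "v_genes": "v_call"}


def rename_columns(rows, mapping):
    """Rename dictionary keys via mapping; unmapped keys are kept as-is."""
    return [{mapping.get(key, key): value for key, value in row.items()}
            for row in rows]


def to_tsv(rows):
    """Serialize a non-empty list of row dictionaries as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented example row using the internal column names.
internal_rows = [
    {"sequence_aas": "CASSLGTDTQYF", "v_genes": "TRBV7-9", "j_genes": "TRBJ2-3"},
]
airr_rows = rename_columns(internal_rows, COLUMN_MAPPING)
tsv_text = to_tsv(airr_rows)
```

Keeping the mapping in one dictionary means the export columns can be changed in a single place.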
classmethod build_object(**kwargs)
Creates an object of a subclass of the Report class from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are supplied at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the specific direct subclasses of this class, which describe the different types of reports.
- Parameters
**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
- Returns
the object of the appropriate report class
check_prerequisites()
Checks the prerequisites for generating a report of a specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded so that a report on the encoded dataset can be made). The instructions in immuneML use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped in that situation; no error is raised. See the subclasses of the class Instruction for more information on how the reports are executed.
- Returns
True if the prerequisites are met, and False otherwise.