immuneML.reports.encoding_reports package

Submodules

immuneML.reports.encoding_reports.DesignMatrixExporter module

class immuneML.reports.encoding_reports.DesignMatrixExporter.DesignMatrixExporter(dataset: Optional[immuneML.data_model.dataset.Dataset.Dataset] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None, file_format: Optional[str] = None)[source]

Bases: immuneML.reports.encoding_reports.EncodingReport.EncodingReport

Exports the design matrix and related information of a given encoded Dataset to csv files. If the encoded data has more than 2 dimensions (such as when using the OneHot encoder with option Flatten=False), the data are then exported to different formats to facilitate their import with external software.

Parameters
  • file_format (str) – the format and extension of the file in which the design matrix is stored. The supported formats are: npy, csv, hdf5, npy.zip, csv.zip or hdf5.zip.
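As a rough illustration of what a csv or csv.zip export of a two-dimensional design matrix can look like, the sketch below uses only the Python standard library. The function name export_design_matrix, its arguments and the file name design_matrix are hypothetical and not part of immuneML; the actual exporter may lay out its files differently.

```python
import csv
import zipfile
from pathlib import Path


def export_design_matrix(matrix, feature_names, result_path, file_format="csv"):
    """Hypothetical helper (not immuneML API): write a 2D matrix to csv or csv.zip."""
    if file_format not in ("csv", "csv.zip"):
        raise ValueError(f"this sketch only handles csv and csv.zip, got {file_format!r}")

    result_path = Path(result_path)
    result_path.mkdir(parents=True, exist_ok=True)
    csv_path = result_path / "design_matrix.csv"

    # One header row with the feature names, then one row per example.
    with csv_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(feature_names)
        writer.writerows(matrix)

    if file_format == "csv":
        return csv_path

    # For csv.zip, wrap the csv file in a compressed archive and drop the original.
    zip_path = result_path / "design_matrix.csv.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.write(csv_path, arcname=csv_path.name)
    csv_path.unlink()
    return zip_path
```

Data with more than two dimensions does not fit a single csv table, which is presumably why the binary formats (npy, hdf5) are offered as alternatives.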

YAML specification:

my_dme_report:
    DesignMatrixExporter:
        file_format: csv
classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are supplied at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns

the object of the appropriate report class
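The split between parse-time and runtime parameters can be sketched as follows. SketchReport is a hypothetical stand-in written for this example and is not an immuneML class; only the idea of passing the specification's keyword arguments to the constructor is taken from the description above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SketchReport:
    """Hypothetical stand-in for a report class; not part of immuneML."""
    dataset: Optional[object] = None       # set at runtime by the instruction
    result_path: Optional[Path] = None     # set at runtime by the instruction
    name: Optional[str] = None
    file_format: Optional[str] = None      # user-supplied in the specification

    @classmethod
    def build_object(cls, **kwargs):
        # Parameters parsed from the specification are passed straight to the
        # constructor; runtime parameters keep their None defaults for now.
        return cls(**kwargs)


report = SketchReport.build_object(name="my_dme_report", file_format="csv")
# Later, the caller fills in what is only known at runtime.
report.dataset = object()
report.result_path = Path("results")
```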

check_prerequisites()[source]

Checks the prerequisites for generating a report of a specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded before a report on the encoded dataset is made). The instructions in immuneML use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped; no error is raised. See the subclasses of the Instruction class for more information on how reports are executed.

Returns

True if the prerequisites are met, and False otherwise.
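The check-then-skip behaviour described for check_prerequisites() can be sketched like this. SketchEncodingReport and run_report are hypothetical names for this example, and the specific check (an encoded_data attribute being present) is an assumption about what an encoding report might require.

```python
import logging


class SketchEncodingReport:
    """Hypothetical report; not immuneML code."""

    def __init__(self, dataset=None):
        self.dataset = dataset

    def check_prerequisites(self) -> bool:
        # Assumed prerequisite: the dataset exists and has already been encoded.
        if self.dataset is None or getattr(self.dataset, "encoded_data", None) is None:
            logging.warning("%s: dataset is not encoded, skipping report.",
                            type(self).__name__)
            return False
        return True

    def generate_report(self) -> str:
        return "report generated"


def run_report(report):
    # Mirrors the described behaviour: skip quietly instead of raising an error.
    if report.check_prerequisites():
        return report.generate_report()
    return None
```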

dataset: immuneML.data_model.dataset.Dataset.Dataset = None
file_format: str = None
name: str = None
result_path: pathlib.Path = None

immuneML.reports.encoding_reports.EncodingReport module

class immuneML.reports.encoding_reports.EncodingReport.EncodingReport(dataset: Optional[immuneML.data_model.dataset.Dataset.Dataset] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None)[source]

Bases: immuneML.reports.Report.Report

Encoding reports show some type of features or statistics about an encoded dataset, or may in some cases export relevant sequences or tables.

When running the TrainMLModel instruction, encoding reports can be specified inside the ‘selection’ or ‘assessment’ specification under the key ‘reports:encoding’. Alternatively, when running the ExploratoryAnalysis instruction, encoding reports can be specified under ‘reports’.

When using the reports with instructions such as ExploratoryAnalysis or TrainMLModel, the arguments defined below are set at runtime by the instruction. Concrete classes inheriting EncodingReport may include additional parameters that will be set by the user in the form of input arguments.

Parameters
  • dataset (Dataset) – an encoded dataset where encoded_data attribute is set to an instance of EncodedData object

  • result_path (Path) – path where the results will be stored (plots, tables, etc.)

  • name (str) – user-defined name of the report that will be shown in the HTML overview later

dataset: immuneML.data_model.dataset.Dataset.Dataset = None
static get_title()[source]
name: str = None
result_path: pathlib.Path = None

immuneML.reports.encoding_reports.FeatureDistribution module

class immuneML.reports.encoding_reports.FeatureDistribution.FeatureDistribution(dataset: Optional[immuneML.data_model.dataset.Dataset.Dataset] = None, result_path: Optional[pathlib.Path] = None, color_grouping_label: Optional[str] = None, row_grouping_label=None, column_grouping_label=None, mode: str = 'auto', x_title: Optional[str] = None, y_title: Optional[str] = None, name: Optional[str] = None)[source]

Bases: immuneML.reports.encoding_reports.FeatureReport.FeatureReport

Plots a boxplot for each feature in the encoded data matrix. Can be used in combination with any encoding and dataset type. Each boxplot represents a feature and shows the distribution of values for that feature. For example, when the KmerFrequency encoder is used, the features are the k-mers (AAA, AAC, etc.) and the feature values are the frequencies per k-mer.

Two modes can be used: in the ‘normal’ mode, a regular boxplot is drawn for each column of the encoded dataset matrix; in the ‘sparse’ mode, all zero cells are removed before the data is passed to the boxplots. If mode is set to ‘auto’, it is automatically set to ‘sparse’ when the density of the matrix is below 0.01.
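The mode selection can be sketched in plain Python as below. The function names are hypothetical, density is assumed to mean the fraction of non-zero cells, and falling back to ‘normal’ when the density is 0.01 or higher is also an assumption; none of this is taken from the immuneML source.

```python
def matrix_density(matrix):
    """Fraction of non-zero cells (this definition of density is an assumption)."""
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    return nonzero / total if total else 0.0


def resolve_mode(matrix, mode="auto", threshold=0.01):
    # 'auto' switches to 'sparse' when the density falls below the threshold.
    if mode == "auto":
        return "sparse" if matrix_density(matrix) < threshold else "normal"
    return mode


def values_per_feature(matrix, mode):
    # Collect the values each boxplot would receive, one list per column.
    columns = [list(column) for column in zip(*matrix)]
    if mode == "sparse":
        columns = [[v for v in column if v != 0] for column in columns]
    return columns
```

In ‘sparse’ mode a feature whose values are all zero ends up with an empty list, so a plotting routine would have nothing to draw for it.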

Optional metadata labels can be specified to divide the boxplots into groups based on color, row facets or column facets. These labels are specified in the metadata file for repertoire datasets, or as metadata columns for sequence and receptor datasets.

Alternatively, when only the mean feature values are of interest (as opposed to showing the complete distribution, as done here), please consider using FeatureValueBarplot instead. When comparing the feature values between two subsets of the data, please use FeatureComparison.

Parameters
  • color_grouping_label (str) – The label that is used to color each boxplot, at each level of the grouping_label.

  • row_grouping_label (str) – The label that is used to group boxplots into different row facets.

  • column_grouping_label (str) – The label that is used to group boxplots into different column facets.

  • mode (str) – either ‘normal’, ‘sparse’ or ‘auto’ (default)

  • x_title (str) – x-axis label

  • y_title (str) – y-axis label

YAML specification:

my_fdistr_report:
    FeatureDistribution:
        mode: sparse
classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are supplied at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns

the object of the appropriate report class

immuneML.reports.encoding_reports.FeatureValueBarplot module

class immuneML.reports.encoding_reports.FeatureValueBarplot.FeatureValueBarplot(dataset: Optional[immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset] = None, result_path: Optional[pathlib.Path] = None, color_grouping_label: Optional[str] = None, row_grouping_label=None, column_grouping_label=None, x_title: Optional[str] = None, y_title: Optional[str] = None, show_error_bar=True, name: Optional[str] = None)[source]

Bases: immuneML.reports.encoding_reports.FeatureReport.FeatureReport

Plots a barplot of the feature values in a given encoded data matrix, averaged across examples. Can be used in combination with any encoding and dataset type. Each bar in the barplot represents the mean value of a given feature, and the different features are shown along the x-axis. For example, when the KmerFrequency encoder is used, the features are the k-mers (AAA, AAC, etc.) and the feature values are the frequencies per k-mer.

Optional metadata labels can be specified to divide the barplot into groups based on color, row facets or column facets. In this case, the average feature values in each group are plotted. These labels are specified in the metadata file for repertoire datasets, or as metadata columns for sequence and receptor datasets.
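The per-group averaging can be sketched with the standard library as follows. The function grouped_feature_means and its arguments are hypothetical, and the use of the population standard deviation for the error bar is an assumption; immuneML may compute it differently.

```python
from collections import defaultdict
from statistics import mean, pstdev


def grouped_feature_means(matrix, feature_names, group_labels):
    """Mean and standard deviation of every feature within each group."""
    rows_by_group = defaultdict(list)
    for row, label in zip(matrix, group_labels):
        rows_by_group[label].append(row)

    summary = {}
    for label, rows in rows_by_group.items():
        # zip(*rows) turns the rows of one group into per-feature columns.
        summary[label] = {
            name: (mean(column), pstdev(column))
            for name, column in zip(feature_names, zip(*rows))
        }
    return summary
```

Each (mean, standard deviation) pair corresponds to one bar and its optional error bar within a group.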

Alternatively, when the distribution of feature values is of interest (as opposed to showing only the mean, as done here), please consider using FeatureDistribution instead. When comparing the feature values between two subsets of the data, please use FeatureComparison.

Parameters
  • color_grouping_label (str) – The label that is used to color each bar, at each level of the grouping_label.

  • row_grouping_label (str) – The label that is used to group bars into different row facets.

  • column_grouping_label (str) – The label that is used to group bars into different column facets.

  • show_error_bar (bool) – Whether to show the error bar (standard deviation) for the bars.

  • x_title (str) – x-axis label

  • y_title (str) – y-axis label

YAML specification:

my_fvb_report:
    FeatureValueBarplot: # timepoint, disease_status and age_group are metadata labels
        column_grouping_label: timepoint
        row_grouping_label: disease_status
        color_grouping_label: age_group
classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are supplied at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns

the object of the appropriate report class

immuneML.reports.encoding_reports.Matches module

class immuneML.reports.encoding_reports.Matches.Matches(dataset: Optional[immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None)[source]

Bases: immuneML.reports.encoding_reports.EncodingReport.EncodingReport

Reports the number of matches that were found when using one of the following encoders: MatchedSequences, MatchedReceptors or MatchedRegex.

Report results are:

  • A table containing all matches, where the rows correspond to the Repertoires, and the columns correspond to the objects to match (regular expressions or receptor sequences).

  • The repertoire sizes (read frequencies and the number of unique sequences per repertoire), for each of the chains. This can be used to calculate the percentage of matched sequences in a repertoire.

  • When using MatchedSequences encoder or MatchedReceptors encoder, tables describing the chains and receptors (ids, chains, V and J genes and sequences).

  • When using MatchedReceptors encoder or using MatchedRegex encoder with chain pairs, tables describing the paired matches (where a match was found in both chains) per repertoire.
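The percentage of matched sequences mentioned above can be derived from the match counts and the repertoire sizes roughly as follows. The function and the dict-based inputs are hypothetical; the report itself writes these numbers to tables whose exact layout is not described here.

```python
def matched_percentage(match_counts, repertoire_sizes):
    """Percentage of sequences matched per repertoire.

    Both arguments are hypothetical dicts keyed by repertoire id.
    """
    percentages = {}
    for repertoire_id, n_matches in match_counts.items():
        size = repertoire_sizes[repertoire_id]
        # Guard against empty repertoires to avoid division by zero.
        percentages[repertoire_id] = 100.0 * n_matches / size if size else 0.0
    return percentages
```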

YAML specification:

my_match_report: Matches
classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are supplied at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns

the object of the appropriate report class

check_prerequisites()[source]

Checks the prerequisites for generating a report of a specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded before a report on the encoded dataset is made). The instructions in immuneML use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped; no error is raised. See the subclasses of the Instruction class for more information on how reports are executed.

Returns

True if the prerequisites are met, and False otherwise.

immuneML.reports.encoding_reports.RelevantSequenceExporter module

class immuneML.reports.encoding_reports.RelevantSequenceExporter.RelevantSequenceExporter(dataset: Optional[immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None)[source]

Bases: immuneML.reports.encoding_reports.EncodingReport.EncodingReport

Exports, in AIRR-compliant format, the sequences that were identified as label-associated by the SequenceAbundance encoder.

Arguments: there are no arguments for this report.

YAML specification:

my_relevant_sequences: RelevantSequenceExporter
COLUMN_MAPPING = {'j_genes': 'j_call', 'sequence_aas': 'cdr3_aa', 'sequences': 'cdr3', 'v_genes': 'v_call'}
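The effect of COLUMN_MAPPING can be illustrated by renaming the keys of a single record. The helper to_airr_columns and the example row are made up for this sketch; only the mapping itself is taken from the class attribute shown above.

```python
COLUMN_MAPPING = {
    "j_genes": "j_call",
    "sequence_aas": "cdr3_aa",
    "sequences": "cdr3",
    "v_genes": "v_call",
}


def to_airr_columns(record):
    # Rename known keys; keys without a mapping are kept unchanged.
    return {COLUMN_MAPPING.get(key, key): value for key, value in record.items()}


# Illustrative values only, not taken from any dataset.
row = {"sequence_aas": "CASSLGTDTQYF", "v_genes": "TRBV7-9", "j_genes": "TRBJ2-3"}
airr_row = to_airr_columns(row)
```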
classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are supplied at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are supplied at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns

the object of the appropriate report class

check_prerequisites()[source]

Checks the prerequisites for generating a report of a specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded before a report on the encoded dataset is made). The instructions in immuneML use this function to decide whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped; no error is raised. See the subclasses of the Instruction class for more information on how reports are executed.

Returns

True if the prerequisites are met, and False otherwise.

Module contents