immuneML.reports.encoding_reports package

Submodules

immuneML.reports.encoding_reports.DesignMatrixExporter module

class immuneML.reports.encoding_reports.DesignMatrixExporter.DesignMatrixExporter(dataset: Dataset = None, result_path: Path = None, file_format: str = None, number_of_processes: int = 1, name: str = None)[source]

Bases: EncodingReport

Exports the design matrix and related information of a given encoded Dataset to csv files. If the encoded data has more than 2 dimensions (such as when using the OneHot encoder with option Flatten=False), the data are then exported to different formats to facilitate their import with external software.

Parameters:
  • file_format (str) – the format and extension of the file to store the design matrix. The supported formats are: npy, csv, hdf5, npy.zip, csv.zip or hdf5.zip. Note: when using hdf5 or hdf5.zip output formats, make sure the ‘hdf5’ dependency is installed.
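
As an illustration of how the file_format values listed above are structured, the following minimal sketch validates a value and splits it into a base format and a "zipped" flag. The helper name parse_file_format is hypothetical and is not part of immuneML; it only mirrors the list of supported formats given in this description.

```python
from typing import Tuple

# Formats listed in the parameter description above.
SUPPORTED_FORMATS = ("npy", "csv", "hdf5", "npy.zip", "csv.zip", "hdf5.zip")


def parse_file_format(file_format: str) -> Tuple[str, bool]:
    """Hypothetical helper: return (base format, whether the output is zipped)."""
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported file_format '{file_format}'; expected one of {SUPPORTED_FORMATS}"
        )
    zipped = file_format.endswith(".zip")
    base_format = file_format[: -len(".zip")] if zipped else file_format
    return base_format, zipped


print(parse_file_format("csv"))       # ('csv', False)
print(parse_file_format("hdf5.zip"))  # ('hdf5', True)
```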

YAML specification:

my_dme_report:
    DesignMatrixExporter:
        file_format: csv

classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class
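
The split between parsing-time and runtime parameters described above can be sketched as follows. SketchExporter is a hypothetical class written only for this illustration and is not part of immuneML; setting the runtime attributes by plain assignment is likewise an assumption about how an instruction might supply them.

```python
from pathlib import Path


class SketchExporter:
    """Hypothetical report class used only for illustration; not part of immuneML."""

    def __init__(self, dataset=None, result_path: Path = None,
                 file_format: str = None, name: str = None):
        self.dataset = dataset
        self.result_path = result_path
        self.file_format = file_format
        self.name = name

    @classmethod
    def build_object(cls, **kwargs):
        # Parsing time: only the user-specified parameters are known.
        return cls(**kwargs)


# Parameters as they might appear under the report key in the specification.
report = SketchExporter.build_object(file_format="csv", name="my_dme_report")

# Runtime: the remaining attributes are supplied later (assumed mechanism).
report.dataset = object()  # placeholder standing in for an encoded dataset
report.result_path = Path("results") / report.name
```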

check_prerequisites()[source]

Checks the prerequisites for generating a report of the specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded when the report requires an encoded dataset). Instructions in immuneML use this function to determine whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped in that situation; no error is raised. See the subclasses of the Instruction class for more information on how reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.
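
The check-then-generate behaviour described above can be sketched in a few lines. Everything here is hypothetical (the class SketchReport, the function run_report and the stand-in dataset objects); it only illustrates the pattern of logging and skipping instead of raising an error, and is not immuneML's implementation.

```python
import logging
from types import SimpleNamespace


class SketchReport:
    """Hypothetical stand-in for an encoding report; not an immuneML class."""

    def __init__(self, dataset=None, name="sketch_report"):
        self.dataset = dataset
        self.name = name

    def check_prerequisites(self) -> bool:
        # Prerequisite assumed here: the dataset exists and has been encoded.
        if self.dataset is None or getattr(self.dataset, "encoded_data", None) is None:
            logging.warning("%s: dataset is not encoded, skipping the report.", self.name)
            return False
        return True

    def generate_report(self) -> str:
        return f"{self.name}: report generated"


def run_report(report):
    """Generate the report only if its prerequisites hold; otherwise skip without raising."""
    if report.check_prerequisites():
        return report.generate_report()
    return None


skipped = run_report(SketchReport())  # no dataset: logged and skipped
generated = run_report(SketchReport(SimpleNamespace(encoded_data=object())))
```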

immuneML.reports.encoding_reports.EncodingReport module

class immuneML.reports.encoding_reports.EncodingReport.EncodingReport(dataset: Dataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)[source]

Bases: Report

Encoding reports show some type of features or statistics about an encoded dataset, or may in some cases export relevant sequences or tables.

When running the TrainMLModel instruction, encoding reports can be specified inside the ‘selection’ or ‘assessment’ specification under the key ‘reports/encoding’. Example:

my_instruction:
    type: TrainMLModel
    selection:
        reports:
            encoding:
                - my_encoding_report
        # other parameters...
    assessment:
        reports:
            encoding:
                - my_encoding_report
        # other parameters...
    # other parameters...

Alternatively, when running the ExploratoryAnalysis instruction, encoding reports can be specified under ‘report’. Example:

my_instruction:
    type: ExploratoryAnalysis
    analyses:
        my_first_analysis:
            report: my_encoding_report
            # other parameters...
    # other parameters...
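
To make the two placements above concrete, the sketch below mirrors both YAML examples as Python dictionaries and collects the report names each instruction refers to. The helper referenced_reports and the set defined_reports are hypothetical and not part of immuneML. Note that the ‘report’ key of an ExploratoryAnalysis analysis is simply collected as is; the sketch does not check that it names an encoding report.

```python
def referenced_reports(instruction: dict) -> set:
    """Hypothetical helper: collect the report names referenced by one instruction."""
    names = set()
    if instruction.get("type") == "TrainMLModel":
        # Encoding reports live under selection/assessment -> reports -> encoding.
        for section in ("selection", "assessment"):
            reports = instruction.get(section, {}).get("reports", {})
            names.update(reports.get("encoding", []))
    elif instruction.get("type") == "ExploratoryAnalysis":
        # Each analysis refers to a single report under the key 'report'.
        for analysis in instruction.get("analyses", {}).values():
            if "report" in analysis:
                names.add(analysis["report"])
    return names


train_instruction = {
    "type": "TrainMLModel",
    "selection": {"reports": {"encoding": ["my_encoding_report"]}},
    "assessment": {"reports": {"encoding": ["my_encoding_report"]}},
}
exploratory_instruction = {
    "type": "ExploratoryAnalysis",
    "analyses": {"my_first_analysis": {"report": "my_encoding_report"}},
}

# Assumed to be defined elsewhere in the specification.
defined_reports = {"my_encoding_report"}
for instruction in (train_instruction, exploratory_instruction):
    missing = referenced_reports(instruction) - defined_reports
    print(instruction["type"], "missing reports:", sorted(missing))
```
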

__init__(dataset: Dataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)[source]

The arguments defined below are set at runtime by the instruction. Concrete classes inheriting EncodingReport may include additional parameters that will be set by the user in the form of input arguments.

  • dataset (Dataset) – an encoded dataset where encoded_data attribute is set to an instance of EncodedData object

  • result_path (Path) – path where the results will be stored (plots, tables, etc.)

  • name (str) – user-defined name of the report that will be shown in the HTML overview later

  • number_of_processes (int) – how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.

static get_title()[source]

immuneML.reports.encoding_reports.FeatureComparison module

class immuneML.reports.encoding_reports.FeatureComparison.FeatureComparison(dataset: Dataset = None, result_path: Path = None, comparison_label: str = None, color_grouping_label: str = None, row_grouping_label=None, column_grouping_label=None, opacity: float = 0.7, show_error_bar=True, log_scale: bool = False, keep_fraction: int = 1, number_of_processes: int = 1, name: str = None)[source]

Bases: FeatureReport

Compares the feature values in a given encoded data matrix across the two values of a metadata label. These labels are specified in the metadata file for repertoire datasets, or as metadata columns for sequence and receptor datasets. Can be used in combination with any encoding and dataset type. This report produces a scatterplot, where each point represents one feature, and the values on the x and y axes are the average feature values across two subsets of the data. For example, when the KmerFrequency encoder is used and the comparison_label represents a disease (true/false), the features are the k-mers (AAA, AAC, etc.) and their x and y positions in the scatterplot are determined by their average frequencies in the subsets of the data where disease=true and disease=false, respectively.

Optional metadata labels can be specified to divide the scatterplot into groups based on color, row facets or column facets.

Alternatively, when the feature values are of interest without comparing them between labelled subgroups of the data, please use FeatureValueBarplot or FeatureDistribution instead.
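
The scatterplot coordinates described above can be illustrated with a small sketch that averages each feature within the two classes of the comparison label. The function class_means and the toy data are hypothetical; this is not how immuneML computes the plot internally, only an instance of the averaging the description refers to.

```python
from statistics import mean


def class_means(matrix, labels, feature_names, x_class, y_class):
    """Return {feature: (mean in x_class, mean in y_class)} for a 2-class label."""
    if set(labels) != {x_class, y_class}:
        raise ValueError("the comparison label must have exactly the two given classes")
    points = {}
    for j, feature in enumerate(feature_names):
        x = mean(row[j] for row, label in zip(matrix, labels) if label == x_class)
        y = mean(row[j] for row, label in zip(matrix, labels) if label == y_class)
        points[feature] = (x, y)
    return points


# Toy encoded data: 4 examples (rows) and 2 k-mer features (columns).
feature_names = ["AAA", "AAC"]
labels = ["true", "true", "false", "false"]  # values of the label 'disease'
matrix = [
    [0.5, 0.0],
    [1.0, 0.5],
    [0.25, 0.75],
    [0.25, 0.25],
]

points = class_means(matrix, labels, feature_names, x_class="true", y_class="false")
print(points)  # {'AAA': (0.75, 0.25), 'AAC': (0.25, 0.5)}
```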

Parameters:
  • comparison_label (str) – Mandatory label. This label is used to split the encoded data matrix and define the x and y axes of the plot. This label is only allowed to have 2 classes (for example: sick and healthy, binding and non-binding).

  • color_grouping_label (str) – Optional label that is used to color the points in the scatterplot. This cannot be the same as comparison_label.

  • row_grouping_label (str) – Optional label that is used to group scatterplots into different row facets. This cannot be the same as comparison_label.

  • column_grouping_label (str) – Optional label that is used to group scatterplots into different column facets. This cannot be the same as comparison_label.

  • show_error_bar (bool) – Whether to show the error bar (standard deviation) for the points, both in the x and y dimension.

  • log_scale (bool) – Whether to plot the x and y axes on a log10 scale (log_scale = True) or on a linear scale (log_scale = False). By default, log_scale is False.

  • keep_fraction (float) – The total number of features may be very large, and only the features differing significantly across comparison labels may be of interest. When keep_fraction is set below 1, only the fraction of features that differs the most across comparison labels is kept for plotting. By default, keep_fraction is 1, meaning that all features are plotted.

  • opacity (float) – a value between 0 and 1 setting the opacity of the data points; lower values make overlapping points easier to see
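
The effect of the keep_fraction parameter can be sketched as follows. The ranking criterion used here, the absolute difference between the two class means, is an assumption made for illustration; the description does not state how immuneML measures which features differ the most. The function keep_most_different and the toy points are hypothetical.

```python
import math


def keep_most_different(points, keep_fraction=1.0):
    """Keep the fraction of features whose two class means differ the most.

    points maps each feature to (mean in class x, mean in class y).
    Assumption: 'differs the most' is measured by the absolute difference.
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in the interval (0, 1]")
    n_keep = max(1, math.ceil(len(points) * keep_fraction))
    ranked = sorted(points, key=lambda f: abs(points[f][0] - points[f][1]), reverse=True)
    return {feature: points[feature] for feature in ranked[:n_keep]}


toy_points = {
    "AAA": (0.25, 0.75),
    "AAC": (0.5, 0.25),
    "AAD": (0.5, 0.5),
    "AAE": (0.0, 1.0),
}

kept_half = keep_most_different(toy_points, keep_fraction=0.5)
print(list(kept_half))  # ['AAE', 'AAA']
```

With keep_fraction left at 1, the function returns all features, matching the default behaviour described above.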

YAML specification:

my_comparison_report:
    FeatureComparison: # compare the different classes defined in the label disease
        comparison_label: disease

add_diagonal(figure)[source]

classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks the prerequisites for generating a report of the specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded when the report requires an encoded dataset). Instructions in immuneML use this function to determine whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped in that situation; no error is raised. See the subclasses of the Instruction class for more information on how reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.encoding_reports.FeatureDistribution module

class immuneML.reports.encoding_reports.FeatureDistribution.FeatureDistribution(dataset: Dataset = None, result_path: Path = None, color_grouping_label: str = None, row_grouping_label=None, column_grouping_label=None, mode: str = 'auto', x_title: str = None, y_title: str = None, number_of_processes: int = 1, name: str = None)[source]

Bases: FeatureReport

Plots a boxplot for each feature in the encoded data matrix. Can be used in combination with any encoding and dataset type. Each boxplot represents a feature and shows the distribution of values for that feature. For example, when the KmerFrequency encoder is used, the features are the k-mers (AAA, AAC, etc.) and the feature values are the frequencies per k-mer.

Two modes can be used: in the ‘normal’ mode, there is a regular boxplot for each column of the encoded data matrix; in the ‘sparse’ mode, all zero cells are removed before the data are passed to the boxplots. If mode is set to ‘auto’, it is automatically set to ‘sparse’ when the density of the matrix is below 0.01.
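
A minimal sketch of the mode selection described above is given below. It assumes that the density of the matrix means the fraction of non-zero cells; the functions resolve_mode and column_values are hypothetical and are not immuneML code.

```python
def resolve_mode(matrix, mode="auto", threshold=0.01):
    """Return 'normal' or 'sparse'; 'auto' picks 'sparse' when density < threshold."""
    if mode not in ("normal", "sparse", "auto"):
        raise ValueError("mode must be 'normal', 'sparse' or 'auto'")
    if mode != "auto":
        return mode
    total = sum(len(row) for row in matrix)
    nonzero = sum(1 for row in matrix for value in row if value != 0)
    # Assumption: density is the fraction of non-zero cells in the matrix.
    density = nonzero / total if total else 0.0
    return "sparse" if density < threshold else "normal"


def column_values(matrix, column, mode):
    """Values of one feature as they would be passed to a boxplot."""
    values = [row[column] for row in matrix]
    # In 'sparse' mode, zero cells are dropped before plotting.
    return [v for v in values if v != 0] if mode == "sparse" else values


sparse_matrix = [[0] * 199 + [1]]   # density 1/200 = 0.005
dense_matrix = [[1, 0], [0, 1]]     # density 2/4 = 0.5

print(resolve_mode(sparse_matrix))  # sparse
print(resolve_mode(dense_matrix))   # normal
```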

Optional metadata labels can be specified to divide the boxplots into groups based on color, row facets or column facets. These labels are specified in the metadata file for repertoire datasets, or as metadata columns for sequence and receptor datasets.

Alternatively, when only the mean feature values are of interest (as opposed to showing the complete distribution, as done here), please consider using FeatureValueBarplot instead. When comparing the feature values between two subsets of the data, please use FeatureComparison.

Parameters:
  • color_grouping_label (str) – The label that is used to color each boxplot, at each level of the grouping_label.

  • row_grouping_label (str) – The label that is used to group boxplots into different row facets.

  • column_grouping_label (str) – The label that is used to group boxplots into different column facets.

  • mode (str) – either ‘normal’, ‘sparse’ or ‘auto’ (default)

  • x_title (str) – x-axis label

  • y_title (str) – y-axis label

YAML specification:

my_fdistr_report:
    FeatureDistribution:
        mode: sparse

classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

immuneML.reports.encoding_reports.FeatureReport module

class immuneML.reports.encoding_reports.FeatureReport.FeatureReport(dataset: Dataset = None, result_path: Path = None, color_grouping_label: str = None, row_grouping_label=None, column_grouping_label=None, name: str = None, number_of_processes: int = 1)[source]

Bases: EncodingReport

Base class for reports that plot something about the reshaped feature values of any dataset.

check_prerequisites()[source]

Checks the prerequisites for generating a report of the specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded when the report requires an encoded dataset). Instructions in immuneML use this function to determine whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped in that situation; no error is raised. See the subclasses of the Instruction class for more information on how reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

std(x)[source]

immuneML.reports.encoding_reports.FeatureValueBarplot module

class immuneML.reports.encoding_reports.FeatureValueBarplot.FeatureValueBarplot(dataset: RepertoireDataset = None, result_path: Path = None, color_grouping_label: str = None, row_grouping_label=None, column_grouping_label=None, x_title: str = None, y_title: str = None, show_error_bar=True, name: str = None, plot_all_features: bool = True, number_of_processes: int = 1, plot_top_n: int = None, plot_bottom_n: int = None)[source]

Bases: FeatureReport

Plots a barplot of the feature values in a given encoded data matrix, averaged across examples. Can be used in combination with any encoding and dataset type. Each bar in the barplot represents the mean value of a given feature, and the different features are shown along the x-axis. For example, when the KmerFrequency encoder is used, the features are the k-mers (AAA, AAC, etc.) and the feature values are the frequencies per k-mer.

Optional metadata labels can be specified to divide the barplot into groups based on color, row facets or column facets. In this case, the average feature values in each group are plotted. These labels are specified in the metadata file for repertoire datasets, or as metadata columns for sequence and receptor datasets.

Alternatively, when the distribution of feature values is of interest (as opposed to showing only the mean, as done here), please consider using FeatureDistribution instead. When comparing the feature values between two subsets of the data, please use FeatureComparison.

Parameters:
  • color_grouping_label (str) – The label that is used to color each bar, at each level of the grouping_label.

  • row_grouping_label (str) – The label that is used to group bars into different row facets.

  • column_grouping_label (str) – The label that is used to group bars into different column facets.

  • show_error_bar (bool) – Whether to show the error bar (standard deviation) for the bars.

  • x_title (str) – x-axis label

  • y_title (str) – y-axis label

  • plot_top_n (int) – plot separately the n features with the largest average values (useful when there are too many features to plot at the same time)

  • plot_bottom_n (int) – plot separately the n features with the smallest average values (useful when there are too many features to plot at the same time)

  • plot_all_features (bool) – whether to plot all features (may be slow when the number of features is large)
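
The selection implied by plot_top_n and plot_bottom_n can be sketched as follows: features are ranked by their mean value across examples, and the n largest and n smallest are picked. The function select_features and the toy data are hypothetical, and how immuneML breaks ties or arranges the resulting plots is not covered here.

```python
from statistics import mean


def select_features(matrix, feature_names, plot_top_n=None, plot_bottom_n=None):
    """Return (mean per feature, top-n feature names, bottom-n feature names)."""
    means = {
        name: mean(row[j] for row in matrix)
        for j, name in enumerate(feature_names)
    }
    # Rank features from the largest to the smallest mean value.
    ranked = sorted(means, key=means.get, reverse=True)
    top = ranked[:plot_top_n] if plot_top_n else []
    bottom = ranked[-plot_bottom_n:] if plot_bottom_n else []
    return means, top, bottom


# Toy encoded data: 2 examples (rows) and 3 k-mer features (columns).
toy_features = ["AAA", "AAC", "AAD"]
toy_matrix = [
    [0.5, 0.25, 0.0],
    [1.0, 0.75, 0.5],
]

means, top, bottom = select_features(toy_matrix, toy_features, plot_top_n=1, plot_bottom_n=1)
print(means)   # {'AAA': 0.75, 'AAC': 0.5, 'AAD': 0.25}
print(top)     # ['AAA']
print(bottom)  # ['AAD']
```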

YAML specification:

my_fvb_report:
    FeatureValueBarplot: # timepoint, disease_status and age_group are metadata labels
        column_grouping_label: timepoint
        row_grouping_label: disease_status
        color_grouping_label: age_group
        plot_all_features: true
        plot_top_n: 10
        plot_bottom_n: 5

classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

immuneML.reports.encoding_reports.Matches module

class immuneML.reports.encoding_reports.Matches.Matches(dataset: RepertoireDataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)[source]

Bases: EncodingReport

Reports the number of matches that were found when using one of the following encoders:

  • MatchedSequences encoder

  • MatchedReceptors encoder

  • MatchedRegex encoder

Report results are:

  • A table containing all matches, where the rows correspond to the Repertoires, and the columns correspond to the objects to match (regular expressions or receptor sequences).

  • The repertoire sizes (read frequencies and the number of unique sequences per repertoire), for each of the chains. This can be used to calculate the percentage of matched sequences in a repertoire.

  • When using MatchedSequences encoder or MatchedReceptors encoder, tables describing the chains and receptors (ids, chains, V and J genes and sequences).

  • When using MatchedReceptors encoder or using MatchedRegex encoder with chain pairs, tables describing the paired matches (where a match was found in both chains) per repertoire.
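
The percentage of matched sequences mentioned above can be computed from one row of the matches table and the corresponding repertoire size. The sketch below is an assumption-laden illustration: the function percentage_matched and the numbers are hypothetical, and summing the match counts across columns assumes that each sequence matches at most one reference object.

```python
def percentage_matched(match_counts, repertoire_size):
    """Percentage of a repertoire's sequences that matched any reference object.

    match_counts: the matches of one repertoire (one row of the matches table).
    repertoire_size: e.g. the number of unique sequences in that repertoire.
    """
    if repertoire_size <= 0:
        raise ValueError("repertoire_size must be positive")
    return 100.0 * sum(match_counts) / repertoire_size


# Hypothetical row: 3, 0 and 2 matches to three reference sequences,
# in a repertoire of 200 unique sequences.
print(percentage_matched([3, 0, 2], 200))  # 2.5
```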

YAML specification:

my_match_report: Matches

classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks the prerequisites for generating a report of the specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded when the report requires an encoded dataset). Instructions in immuneML use this function to determine whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped in that situation; no error is raised. See the subclasses of the Instruction class for more information on how reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.encoding_reports.RelevantSequenceExporter module

class immuneML.reports.encoding_reports.RelevantSequenceExporter.RelevantSequenceExporter(dataset: RepertoireDataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)[source]

Bases: EncodingReport

Exports, in AIRR-compliant format, the sequences that are extracted as label-associated when using the SequenceAbundanceEncoder or CompAIRRSequenceAbundanceEncoder.

Arguments: there are no arguments for this report.

YAML specification:

my_relevant_sequences: RelevantSequenceExporter

COLUMN_MAPPING = {'j_genes': 'j_call', 'sequence_aas': 'cdr3_aa', 'sequences': 'cdr3', 'v_genes': 'v_call'}
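
As an illustration of what the COLUMN_MAPPING attribute expresses, the sketch below renames the keys of a record to their AIRR column names. Only the mapping itself is taken from this page; the function to_airr_columns and the example record are hypothetical, and how immuneML applies the mapping internally is not shown here.

```python
COLUMN_MAPPING = {'j_genes': 'j_call', 'sequence_aas': 'cdr3_aa',
                  'sequences': 'cdr3', 'v_genes': 'v_call'}


def to_airr_columns(record: dict) -> dict:
    """Rename keys according to COLUMN_MAPPING; unmapped keys are kept as they are."""
    return {COLUMN_MAPPING.get(key, key): value for key, value in record.items()}


# Hypothetical record, for illustration only.
record = {"sequence_aas": "CASSLGTDTQYF", "v_genes": "TRBV5-1", "j_genes": "TRBJ2-3"}
print(to_airr_columns(record))
# {'cdr3_aa': 'CASSLGTDTQYF', 'v_call': 'TRBV5-1', 'j_call': 'TRBJ2-3'}
```
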

classmethod build_object(**kwargs)[source]

Creates an object of the Report subclass from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the direct subclasses of this class, which describe the different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks the prerequisites for generating a report of the specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded when the report requires an encoded dataset). Instructions in immuneML use this function to determine whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped in that situation; no error is raised. See the subclasses of the Instruction class for more information on how reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

Module contents