immuneML.reports.data_reports package

Submodules

immuneML.reports.data_reports.AminoAcidFrequencyDistribution module

immuneML.reports.data_reports.CompAIRRClusteringReport module

immuneML.reports.data_reports.DataReport module

class immuneML.reports.data_reports.DataReport.DataReport(dataset: Dataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)

Bases: Report

Data reports show some type of features or statistics about a given dataset.

When running the TrainMLModel instruction, data reports can be specified inside the ‘selection’ or ‘assessment’ specification under the keys ‘reports/data’ (current cross-validation split) or ‘reports/data_splits’ (train/test sub-splits). Example:

definitions:
    reports:
        my_data_report: SequenceCountDistribution
my_instruction:
    type: TrainMLModel
    selection:
        reports:
            data:
                - my_data_report
        # other parameters...
    assessment:
        reports:
            data:
                - my_data_report
        # other parameters...
    # other parameters...

Alternatively, when running the ExploratoryAnalysis instruction, data reports can be specified under ‘report’. Example:

my_instruction:
    type: ExploratoryAnalysis
    analyses:
        my_first_analysis:
            report: my_data_report
            # other parameters...
    # other parameters...
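The link between a report defined under 'definitions' and the places that reference it can be illustrated with a small consistency check. This is a minimal sketch over plain Python dicts that mirror the YAML above; the helper names `referenced_reports` and `undefined_reports` and the report name `my_other_report` are hypothetical, and this is not how immuneML itself validates a specification.

```python
# Report definitions and instructions mirrored as plain dicts (keys follow the YAML examples).
definitions = {"reports": {"my_data_report": "SequenceCountDistribution"}}

train_instruction = {
    "type": "TrainMLModel",
    "selection": {"reports": {"data": ["my_data_report"]}},
    # "my_other_report" is deliberately not defined, to show what the check catches.
    "assessment": {"reports": {"data": ["my_data_report"],
                               "data_splits": ["my_other_report"]}},
}

exploratory_instruction = {
    "type": "ExploratoryAnalysis",
    "analyses": {"my_first_analysis": {"report": "my_data_report"}},
}


def referenced_reports(instruction: dict) -> set:
    """Collect the data report names an instruction refers to."""
    names = set()
    if instruction.get("type") == "ExploratoryAnalysis":
        for analysis in instruction.get("analyses", {}).values():
            if "report" in analysis:
                names.add(analysis["report"])
    elif instruction.get("type") == "TrainMLModel":
        for stage in ("selection", "assessment"):
            reports = instruction.get(stage, {}).get("reports", {})
            for key in ("data", "data_splits"):
                names.update(reports.get(key, []))
    return names


def undefined_reports(definitions: dict, instruction: dict) -> list:
    """Return referenced report names that are missing from the definitions."""
    defined = set(definitions.get("reports", {}))
    return sorted(referenced_reports(instruction) - defined)


print(undefined_reports(definitions, exploratory_instruction))  # []
print(undefined_reports(definitions, train_instruction))        # ['my_other_report']
```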

DOCS_TITLE = 'Data reports'

__init__(dataset: Dataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)

The arguments defined below are set at runtime by the instruction. Concrete classes inheriting DataReport may include additional parameters that will be set by the user in the form of input arguments.

dataset (Dataset): a dataset object (can be repertoire, receptor or sequence dataset, depending on the specific report)
result_path (Path): location where the results (plots, tables, etc.) will be stored
name (str): user-defined name of the report used in the HTML overview automatically generated by the platform
number_of_processes (int): how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.

immuneML.reports.data_reports.GLIPH2Exporter module

class immuneML.reports.data_reports.GLIPH2Exporter.GLIPH2Exporter(dataset: ReceptorDataset = None, result_path: Path = None, name: str = None, condition: str = None, number_of_processes: int = 1)

Bases: DataReport

Report which exports the receptor data to GLIPH2 format so that it can be used directly in the GLIPH2 tool. Currently, the report accepts only receptor datasets.

GLIPH2 publication: Huang H, Wang C, Rubelt F, Scriba TJ, Davis MM. Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. Nature Biotechnology. Published online April 27, 2020:1-9. doi:10.1038/s41587-020-0505-4

Specification arguments:

condition (str): name of the parameter present in the receptor metadata in the dataset; condition can be anything which can be processed in GLIPH2, such as tissue type or treatment.

YAML specification:

definitions:
    reports:
        my_gliph2_exporter:
            GLIPH2Exporter:
                condition: epitope # for instance, epitope parameter is present in receptors' metadata with values such as "MtbLys" for Mycobacterium tuberculosis (as shown in the original paper).
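To make the role of 'condition' concrete, the sketch below writes a toy paired-chain receptor to a tab-separated table. It assumes the GLIPH2 input columns are CDR3b, TRBV, TRBJ, CDR3a, subject:condition and count; check the GLIPH2 documentation before relying on that layout. The dictionary keys, the function `to_gliph2` and the receptor values are made up for illustration and are not immuneML's data model or implementation.

```python
import csv
import io

# Assumed GLIPH2 column layout (an assumption, see the lead-in).
GLIPH2_COLUMNS = ["CDR3b", "TRBV", "TRBJ", "CDR3a", "subject:condition", "count"]

# Toy receptor with made-up values; "epitope" plays the role of the condition parameter.
receptors = [
    {"cdr3_beta": "CASSLAPGATNEKLFF", "trbv": "TRBV7-9", "trbj": "TRBJ1-4",
     "cdr3_alpha": "CAVRDSNYQLIW", "subject": "s1", "epitope": "MtbLys", "count": 1},
]


def to_gliph2(receptors: list, condition: str) -> str:
    """Serialize receptors as tab-separated text, joining subject and condition with ':'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(GLIPH2_COLUMNS)
    for receptor in receptors:
        writer.writerow([
            receptor["cdr3_beta"], receptor["trbv"], receptor["trbj"],
            receptor["cdr3_alpha"],
            f'{receptor["subject"]}:{receptor[condition]}',  # value looked up by the condition name
            receptor["count"],
        ])
    return buffer.getvalue()


print(to_gliph2(receptors, condition="epitope"))
```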

classmethod build_object(**kwargs)

Creates an object of a subclass of the Report class from the given parameters so that it can be used in the analysis. Depending on the type of the report, the parameters given here are provided at parsing time, while the other necessary parameters (e.g., the subset of the data from which the report should be created) are provided at runtime. For more details, see the specific direct subclasses of this class, which describe the different types of reports.

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class
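The split between parse-time and runtime parameters can be sketched as follows. `ToyReport` is a hypothetical stand-in class, not part of immuneML, and the validation it performs is only an example of what a build_object implementation might check.

```python
from pathlib import Path


class ToyReport:
    """Hypothetical stand-in for a report class; not part of immuneML."""

    def __init__(self, name: str = None, condition: str = None, result_path: Path = None):
        self.name = name
        self.condition = condition
        self.result_path = result_path  # left unset until runtime

    @classmethod
    def build_object(cls, **kwargs):
        # Parse-time parameters come from the user's specification and are validated here.
        if not isinstance(kwargs.get("condition"), str):
            raise ValueError("condition must be a string")
        return cls(**kwargs)


# Parsing time: only user-specified parameters are known.
report = ToyReport.build_object(name="my_report", condition="epitope")

# Runtime: the caller (an instruction, in immuneML's terms) fills in the rest.
report.result_path = Path("results") / report.name
```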

check_prerequisites()

Checks the prerequisites for generating a report of a specific class (e.g., whether the class of the MLMethod instance is the one required by the report, or whether the data has been encoded when the report works on an encoded dataset). The instructions in immuneML use this function to determine whether to call generate_report() in a given situation. Each report subclass has its own set of prerequisites. If the report cannot be run, this is logged and the report is skipped in that situation. No error is raised. See the subclasses of the Instruction class for more information on how the reports are executed.

Returns: True if the prerequisites are met, and False otherwise.
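The skip-instead-of-fail behaviour described above can be sketched like this. `ToyReceptorReport` and `run_report` are hypothetical names used only for illustration; the actual logic lives in immuneML's instruction classes and may differ.

```python
import logging


class ToyReceptorReport:
    """Hypothetical report that only works on receptor datasets."""

    def __init__(self, dataset_kind: str):
        self.dataset_kind = dataset_kind

    def check_prerequisites(self) -> bool:
        if self.dataset_kind != "receptor":
            # Log the reason and signal that the report should be skipped.
            logging.warning("ToyReceptorReport: expected a receptor dataset, got %s; skipping.",
                            self.dataset_kind)
            return False
        return True

    def generate_report(self) -> str:
        return f"report for {self.dataset_kind} dataset"


def run_report(report):
    """Hypothetical driver: generate the report only if its prerequisites hold."""
    if report.check_prerequisites():
        return report.generate_report()
    return None  # skipped without raising an error


print(run_report(ToyReceptorReport("receptor")))    # report for receptor dataset
print(run_report(ToyReceptorReport("repertoire")))  # None
```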

immuneML.reports.data_reports.LabelOverlap module

class immuneML.reports.data_reports.LabelOverlap.LabelOverlap(dataset: Dataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1, column_label: str = None, row_label: str = None)

Bases: DataReport

This report creates a heatmap where the columns are the values of one label and rows are the values of another label, and the cells contain the number of samples that have both label values. It works for any dataset type.

Specification arguments:

column_label (str): Name of the label to be used as columns in the heatmap.
row_label (str): Name of the label to be used as rows in the heatmap.

YAML specification:

definitions:
    reports:
        my_data_report:
            LabelOverlap:
                column_label: epitope
                row_label: batch
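The counts behind such a heatmap can be sketched with the standard library. The sample dictionaries and the function `label_overlap` are illustrative assumptions, not immuneML's implementation, and the plotting step is omitted.

```python
from collections import Counter

# Toy samples, each carrying two label values.
samples = [
    {"epitope": "A", "batch": "b1"},
    {"epitope": "A", "batch": "b1"},
    {"epitope": "B", "batch": "b1"},
    {"epitope": "B", "batch": "b2"},
]


def label_overlap(samples: list, column_label: str, row_label: str):
    """Count the samples for every (row value, column value) pair."""
    counts = Counter((s[row_label], s[column_label]) for s in samples)
    rows = sorted({s[row_label] for s in samples})
    columns = sorted({s[column_label] for s in samples})
    # Counter returns 0 for pairs that never occur together.
    matrix = [[counts[(row, col)] for col in columns] for row in rows]
    return rows, columns, matrix


rows, columns, matrix = label_overlap(samples, column_label="epitope", row_label="batch")
print(rows, columns, matrix)  # ['b1', 'b2'] ['A', 'B'] [[2, 1], [0, 1]]
```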

classmethod build_object(**kwargs)

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

check_prerequisites()

Returns: True if the prerequisites are met, and False otherwise.

immuneML.reports.data_reports.MotifGeneralizationAnalysis module

immuneML.reports.data_reports.ReceptorDatasetOverview module

class immuneML.reports.data_reports.ReceptorDatasetOverview.ReceptorDatasetOverview(batch_size: int, dataset: ReceptorDataset = None, result_path: Path = None, number_of_processes: int = 1, name: str = None)

Bases: DataReport

This report plots the length distribution per chain for a receptor (paired-chain) dataset.

Specification arguments:

batch_size (int): how many receptors to load at once; 50 000 by default

YAML specification:

definitions:
    reports:
        my_receptor_overview_report: ReceptorDatasetOverview
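Loading receptors in batches while accumulating a per-chain length distribution might look like the sketch below. The receptor representation (a dict mapping chain name to sequence), the toy sequences and the helper names are assumptions made for illustration; immuneML's own data structures and loading code are different.

```python
from collections import Counter, defaultdict
from itertools import islice


def batches(items, batch_size: int):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def length_distribution_per_chain(receptors, batch_size: int = 50_000) -> dict:
    """Count sequence lengths per chain, processing one batch at a time."""
    distribution = defaultdict(Counter)
    for batch in batches(receptors, batch_size):
        for receptor in batch:
            for chain, sequence in receptor.items():
                distribution[chain][len(sequence)] += 1
    return {chain: dict(counter) for chain, counter in distribution.items()}


# Toy paired-chain receptors with made-up sequences.
receptors = [
    {"alpha": "CAVSDY", "beta": "CASSLGTF"},
    {"alpha": "CAVRNF", "beta": "CASSF"},
    {"alpha": "CAAF", "beta": "CASSQGYF"},
]

result = length_distribution_per_chain(receptors, batch_size=2)
print(result)
```

The batch size only bounds how many receptors are held in memory at once; the resulting counts do not depend on it.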

classmethod build_object(**kwargs)

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

check_prerequisites()

Returns: True if the prerequisites are met, and False otherwise.

immuneML.reports.data_reports.RecoveredSignificantFeatures module

immuneML.reports.data_reports.RepertoireClonotypeSummary module

immuneML.reports.data_reports.SequenceCountDistribution module

immuneML.reports.data_reports.SequenceLengthDistribution module

class immuneML.reports.data_reports.SequenceLengthDistribution.SequenceLengthDistribution(dataset: Dataset = None, batch_size: int = 1, result_path: Path = None, number_of_processes: int = 1, region_type: RegionType = RegionType.IMGT_CDR3, sequence_type: SequenceType = SequenceType.AMINO_ACID, name: str = None, label: str = None, split_by_label: bool = False, plot_frequencies: bool = False)

Bases: DataReport

Generates a histogram of the lengths of the sequences in a dataset.

Specification arguments:

sequence_type (str): whether to check the length of amino acid or nucleotide sequences; default value is ‘amino_acid’
region_type (str): which part of the sequence to examine; e.g., IMGT_CDR3
split_by_label (bool): whether to split the plots by a label. If set to True, the dataset must either contain a single label, or the label of interest must be specified under ‘label’. By default, split_by_label is False.
label (str): if split_by_label is set to True, a label can be specified here.
plot_frequencies (bool): if set to True, the plot will show the frequencies of the sequence lengths instead of the counts. By default, plot_frequencies is False.

YAML specification:

definitions:
    reports:
        my_sld_report:
            SequenceLengthDistribution:
                sequence_type: amino_acid
                region_type: IMGT_CDR3
                label: label_1
                split_by_label: True
                plot_frequencies: True
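The effect of split_by_label and plot_frequencies on the underlying numbers can be sketched as follows. The function `length_histogram`, its signature and the toy sequences are assumptions for illustration only; the report itself also handles region and sequence types and produces the plot.

```python
from collections import Counter, defaultdict


def length_histogram(sequences, labels=None, plot_frequencies=False) -> dict:
    """Count sequence lengths, optionally per label value and as frequencies."""
    groups = defaultdict(Counter)
    for index, sequence in enumerate(sequences):
        # Without labels, all sequences fall into a single group.
        key = labels[index] if labels is not None else "all"
        groups[key][len(sequence)] += 1

    result = {}
    for key, counter in groups.items():
        total = sum(counter.values())
        result[key] = {
            length: (count / total if plot_frequencies else count)
            for length, count in sorted(counter.items())
        }
    return result


# Toy amino acid sequences with a made-up binary label.
sequences = ["CASSF", "CASSLF", "CASSQF", "CAF"]
labels = ["pos", "pos", "neg", "neg"]

print(length_histogram(sequences))
# {'all': {3: 1, 5: 1, 6: 2}}
print(length_histogram(sequences, labels=labels, plot_frequencies=True))
# {'pos': {5: 0.5, 6: 0.5}, 'neg': {3: 0.5, 6: 0.5}}
```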

classmethod build_object(**kwargs)

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

check_prerequisites()

Returns: True if the prerequisites are met, and False otherwise.

immuneML.reports.data_reports.SequencesWithSignificantKmers module

immuneML.reports.data_reports.SignificantFeatures module

immuneML.reports.data_reports.SignificantKmerPositions module

immuneML.reports.data_reports.SimpleDatasetOverview module

class immuneML.reports.data_reports.SimpleDatasetOverview.SimpleDatasetOverview(dataset: Dataset = None, result_path: Path = None, number_of_processes: int = 1, name: str = None)

Bases: DataReport

Generates a simple text-based overview of the properties of any dataset, including the dataset name, size, and metadata labels.

YAML specification:

definitions:
    reports:
        my_overview: SimpleDatasetOverview

UNKNOWN_CHAIN = 'unknown'

classmethod build_object(**kwargs)

Parameters: **kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object
Returns: the object of the appropriate report class

immuneML.reports.data_reports.VJGeneDistribution module

Module contents