immuneML.reports.data_reports package

Submodules

immuneML.reports.data_reports.CytoscapeNetworkExporter module

class immuneML.reports.data_reports.CytoscapeNetworkExporter.CytoscapeNetworkExporter(dataset: Optional[immuneML.data_model.dataset.Dataset.Dataset] = None, result_path: Optional[pathlib.Path] = None, chains=('alpha', 'beta'), drop_duplicates=True, additional_node_attributes=[], additional_edge_attributes=[], number_of_processes: int = 1, name: Optional[str] = None)[source]

Bases: immuneML.reports.data_reports.DataReport.DataReport

This report exports the Receptor sequences to .sif format, such that they can directly be imported as a network in Cytoscape, to visualize chain sharing between the different receptors in a dataset (for example, for TCRs: how often one alpha chain is shared with multiple beta chains, and vice versa).

The Receptor sequences can be provided as a ReceptorDataset, or a RepertoireDataset (containing paired sequence information). In the latter case, one .sif file is exported per Repertoire.

YAML specification:

my_cyto_export: CytoscapeNetworkExporter
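To make the exported format concrete, here is a minimal sketch of writing paired chains as a Cytoscape .sif file. It is not immuneML's implementation. In SIF, each line holds a source node, an interaction type, and a target node. The helper name `write_sif`, the interaction label `pair`, and the tuple-based input are assumptions made for illustration only.

```python
from pathlib import Path
import tempfile


def write_sif(pairs, path, interaction="pair", drop_duplicates=True):
    """Write (alpha, beta) node-name pairs as tab-separated SIF lines.

    Hypothetical helper: `pairs` is an iterable of (alpha_name, beta_name) strings.
    """
    seen = set()
    lines = []
    for alpha, beta in pairs:
        if drop_duplicates and (alpha, beta) in seen:
            continue  # keep only the first occurrence of each receptor
        seen.add((alpha, beta))
        lines.append(f"{alpha}\t{interaction}\t{beta}")
    Path(path).write_text("\n".join(lines) + "\n")
    return lines


# Made-up node names: alpha chain a1 is shared by two beta chains.
pairs = [("a1", "b1"), ("a1", "b2"), ("a1", "b1")]
with tempfile.TemporaryDirectory() as tmp:
    sif_lines = write_sif(pairs, Path(tmp) / "receptors.sif")
```

When such a file is imported as a network, a node with several edges corresponds to a chain that is shared between multiple receptors.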

classmethod build_object(**kwargs)[source]

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns: boolean value True if the prerequisites are o.k., and False otherwise.
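The skip-without-error behaviour described above can be sketched as follows. `SketchReport` and `run_report` are hypothetical stand-ins written for this example; they are not immuneML classes or functions, and the prerequisite checked here (a dataset being present) is only an illustration.

```python
import logging


class SketchReport:
    """Hypothetical stand-in for a report; not an immuneML class."""

    def __init__(self, dataset=None):
        self.dataset = dataset

    def check_prerequisites(self) -> bool:
        # Illustrative prerequisite: the report needs a dataset to work on.
        return self.dataset is not None

    def generate_report(self) -> str:
        return f"report on {len(self.dataset)} examples"


def run_report(report):
    """Run the report only if its prerequisites hold; otherwise log and skip."""
    if report.check_prerequisites():
        return report.generate_report()
    logging.warning("%s skipped: prerequisites not met", type(report).__name__)
    return None


skipped = run_report(SketchReport())  # logged and skipped, no exception raised
result = run_report(SketchReport(dataset=["CASSL", "CASSF"]))
```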

export_receptorlist(receptors, result_path: pathlib.Path)[source]

get_formatted_edge_metadata(seq1, seq2)[source]

get_formatted_node_metadata(seq: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence)[source]

get_shared_name(seq: immuneML.data_model.receptor.receptor_sequence.ReceptorSequence.ReceptorSequence)[source]

Returns a string representation of the given receptor chain, consisting of the chain, the sequence, and the V and J genes. For example: *a*s=AMREGPEHSGYALN*v=V7-3*j=J41
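As an illustration of the layout in the example string, the sketch below assembles the same representation from its four parts. The function `format_shared_name` and its plain-string arguments are assumptions made for this example; the actual method takes a ReceptorSequence object.

```python
def format_shared_name(chain: str, sequence: str, v_gene: str, j_gene: str) -> str:
    """Hypothetical formatter reproducing the layout of the documented example."""
    return f"*{chain}*s={sequence}*v={v_gene}*j={j_gene}"


name = format_shared_name("a", "AMREGPEHSGYALN", "V7-3", "J41")
```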

immuneML.reports.data_reports.DataReport module

class immuneML.reports.data_reports.DataReport.DataReport(dataset: Optional[immuneML.data_model.dataset.Dataset.Dataset] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None, number_of_processes: int = 1)[source]

Bases: immuneML.reports.Report.Report

Data reports show some type of features or statistics about a given dataset.

When running the TrainMLModel instruction, data reports can be specified under the key ‘data_reports’, to run the report on the whole dataset, or inside the ‘selection’ or ‘assessment’ specification under the keys ‘reports/data’ (current cross-validation split) or ‘reports/data_splits’ (train/test sub-splits).

Alternatively, when running the ExploratoryAnalysis instruction, data reports can be specified under ‘reports’.

When using the reports with instructions such as ExploratoryAnalysis or TrainMLModel, the arguments defined below are set at runtime by the instruction. Concrete classes inheriting DataReport may include additional parameters that will be set by the user in the form of input arguments.

Parameters

dataset (Dataset) – a dataset object (can be repertoire, receptor or sequence dataset, depending on the specific report)
result_path (Path) – location where the results (plots, tables, etc.) will be stored
name (str) – user-defined name of the report used in the HTML overview automatically generated by the platform
number_of_processes (int) – how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.

static get_title()[source]

immuneML.reports.data_reports.GLIPH2Exporter module

class immuneML.reports.data_reports.GLIPH2Exporter.GLIPH2Exporter(dataset: Optional[immuneML.data_model.dataset.ReceptorDataset.ReceptorDataset] = None, result_path: Optional[pathlib.Path] = None, name: Optional[str] = None, condition: Optional[str] = None, number_of_processes: int = 1)[source]

Bases: immuneML.reports.data_reports.DataReport.DataReport

This report exports the receptor data to GLIPH2 format so that it can be used directly in the GLIPH2 tool. Currently, the report accepts only receptor datasets.

GLIPH2 publication: Huang H, Wang C, Rubelt F, Scriba TJ, Davis MM. Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. Nature Biotechnology. Published online April 27, 2020:1-9. doi:10.1038/s41587-020-0505-4

Parameters

condition (str) – name of the parameter present in the receptor metadata in the dataset; condition can be anything which can be processed in GLIPH2, such as tissue type or treatment.

YAML specification:

my_gliph2_exporter: # user-defined name
    GLIPH2Exporter:
        condition: epitope # for instance, epitope parameter is present in receptors' metadata with values such as "MtbLys" for Mycobacterium tuberculosis (as shown in the original paper).
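The role of the condition parameter can be illustrated with a short sketch that looks up the named metadata field for each receptor and writes tab-separated rows. The column layout (CDR3b, TRBV, TRBJ, CDR3a, subject:condition, count), the dictionary-based receptor records, the helper `to_gliph2_rows`, and the example sequences are all assumptions made for illustration. This is not the exporter's actual code, and the layout should be checked against the GLIPH2 documentation.

```python
import csv
import io

# Assumed column layout for illustration; verify against the GLIPH2 documentation.
COLUMNS = ["CDR3b", "TRBV", "TRBJ", "CDR3a", "subject:condition", "count"]


def to_gliph2_rows(receptors, condition):
    """Turn hypothetical receptor dicts into rows; `condition` names a metadata key."""
    rows = []
    for r in receptors:
        rows.append([r["cdr3b"], r["trbv"], r["trbj"], r["cdr3a"],
                     f"{r['subject']}:{r['metadata'][condition]}", r.get("count", 1)])
    return rows


# Made-up receptor record, used only to exercise the sketch.
receptors = [{"cdr3b": "CASSLAPGATNEKLFF", "trbv": "TRBV7-3", "trbj": "TRBJ1-4",
              "cdr3a": "CAMREGPEHSGYALNF", "subject": "s1",
              "metadata": {"epitope": "MtbLys"}}]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(COLUMNS)
writer.writerows(to_gliph2_rows(receptors, condition="epitope"))
gliph2_text = buffer.getvalue()
```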

classmethod build_object(**kwargs)[source]

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns: boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.data_reports.ReceptorDatasetOverview module

class immuneML.reports.data_reports.ReceptorDatasetOverview.ReceptorDatasetOverview(batch_size: int, dataset: Optional[immuneML.data_model.dataset.ReceptorDataset.ReceptorDataset] = None, result_path: Optional[pathlib.Path] = None, number_of_processes: int = 1, name: Optional[str] = None)[source]

Bases: immuneML.reports.data_reports.DataReport.DataReport

This report plots the length distribution per chain for a receptor (paired-chain) dataset.

Parameters: batch_size (int) – how many receptors to load at once; 50 000 by default

YAML specification:

reports:
    my_receptor_overview_report: ReceptorDatasetOverview
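The per-chain length counting that underlies such a plot, together with the effect of batch_size, can be sketched as below. The helper names and the representation of a receptor as a dict mapping chain name to sequence are assumptions for this example, not immuneML's data model.

```python
from collections import Counter, defaultdict
from itertools import islice


def batches(iterable, batch_size):
    """Yield lists of at most `batch_size` items."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def length_distribution_per_chain(receptors, batch_size=50_000):
    """Count sequence lengths per chain; `receptors` are dicts mapping chain -> sequence."""
    counts = defaultdict(Counter)
    for batch in batches(receptors, batch_size):  # only one batch is held at a time
        for receptor in batch:
            for chain, sequence in receptor.items():
                counts[chain][len(sequence)] += 1
    return {chain: dict(counter) for chain, counter in counts.items()}


# Made-up sequences, used only to exercise the sketch.
receptors = [{"alpha": "CAVRD", "beta": "CASSLGF"},
             {"alpha": "CAMRE", "beta": "CASSF"},
             {"alpha": "CAV", "beta": "CASSLGT"}]
distribution = length_distribution_per_chain(receptors, batch_size=2)
```

The batch size changes only how many receptors are processed at once, not the resulting counts.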

classmethod build_object(**kwargs)[source]

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns: boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.data_reports.SequenceLengthDistribution module

class immuneML.reports.data_reports.SequenceLengthDistribution.SequenceLengthDistribution(dataset: Optional[immuneML.data_model.dataset.RepertoireDataset.RepertoireDataset] = None, batch_size: int = 1, result_path: Optional[pathlib.Path] = None, number_of_processes: int = 1, name: Optional[str] = None)[source]

Bases: immuneML.reports.data_reports.DataReport.DataReport

Generates a histogram of the lengths of the sequences in a RepertoireDataset.

YAML specification:

my_sld_report: SequenceLengthDistribution
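The data behind such a histogram can be sketched in a few lines. Whether the report plots raw counts or relative frequencies is not stated here; the sketch assumes relative frequencies, and the function name and list-of-lists input are hypothetical.

```python
from collections import Counter


def sequence_length_histogram(repertoires):
    """Return {length: relative frequency} over all sequences in all repertoires."""
    counts = Counter(len(seq) for repertoire in repertoires for seq in repertoire)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


# Made-up sequences, used only to exercise the sketch.
repertoires = [["CASSL", "CASSLG"], ["CASSF", "CAS"]]
histogram = sequence_length_histogram(repertoires)
```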

classmethod build_object(**kwargs)[source]

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns: boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.data_reports.SimpleDatasetOverview module

class immuneML.reports.data_reports.SimpleDatasetOverview.SimpleDatasetOverview(dataset: Optional[immuneML.data_model.dataset.Dataset.Dataset] = None, result_path: Optional[pathlib.Path] = None, number_of_processes: int = 1, name: Optional[str] = None)[source]

Bases: immuneML.reports.data_reports.DataReport.DataReport

Generates a simple text-based overview of the properties of any dataset, including the dataset name, size, and metadata labels.

YAML specification:

reports:
    my_overview: SimpleDatasetOverview

UNKNOWN_CHAIN = 'unknown'

classmethod build_object(**kwargs)[source]

Module contents