immuneML.reports.data_reports package

Submodules

immuneML.reports.data_reports.AminoAcidFrequencyDistribution module

class immuneML.reports.data_reports.AminoAcidFrequencyDistribution.AminoAcidFrequencyDistribution(dataset: SequenceDataset = None, imgt_positions: bool = None, relative_frequency: bool = None, split_by_label: bool = None, label: str = None, result_path: Path = None, number_of_processes: int = 1, name: str = None)[source]

Bases: DataReport

Generates a barplot showing the relative frequency of each amino acid at each position in the sequences of a dataset.

Parameters:
  • imgt_positions (bool) – Whether to use IMGT positional numbering or sequence index numbering. When imgt_positions is True, IMGT positions are used, meaning sequences of unequal length are aligned according to their IMGT positions. By default imgt_positions is True.

  • relative_frequency (bool) – Whether to plot relative frequencies (true) or absolute counts (false) of the positional amino acids. By default, relative_frequency is True.

  • split_by_label (bool) – Whether to split the plots by a label. If set to true, the Dataset must either contain a single label, or alternatively the label of interest can be specified under ‘label’. By default, split_by_label is False.

  • label (str) – If split_by_label is set to True, the label to split by can be specified here.
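
As an illustration of what this report computes, the following is a minimal sketch of positional amino acid counting using sequence index numbering (not IMGT positions). The function name `positional_aa_frequencies` and the choice to normalise by the number of sequences covering each position are assumptions for this example, not the immuneML implementation.

```python
from collections import Counter, defaultdict


def positional_aa_frequencies(sequences, relative_frequency=True):
    """Count amino acids at each 0-based sequence index position.

    Hypothetical helper for illustration; not part of immuneML.
    """
    counts = defaultdict(Counter)
    for sequence in sequences:
        for position, amino_acid in enumerate(sequence):
            counts[position][amino_acid] += 1

    if not relative_frequency:
        # Absolute counts per position.
        return {pos: dict(c) for pos, c in counts.items()}

    # Relative frequencies: divide by the number of sequences that
    # reach this position (an assumption made for this sketch).
    return {
        pos: {aa: n / sum(c.values()) for aa, n in c.items()}
        for pos, c in counts.items()
    }


relative = positional_aa_frequencies(["CAS", "CAT", "CS"])
absolute = positional_aa_frequencies(["CAS", "CAT", "CS"], relative_frequency=False)
```

With sequence index numbering, sequences of unequal length are compared from the left end only, which is one reason IMGT positional numbering may be preferable for aligning them.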

YAML specification:

my_aa_freq_report:
  AminoAcidFrequencyDistribution:
    relative_frequency: False
    split_by_label: True
    label: CMV
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class
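
The split between parse-time and runtime parameters described above can be sketched as follows. `ToyReport` is a hypothetical stand-in class written for this example; it is not an immuneML class and its constructor arguments are assumptions.

```python
from pathlib import Path


class ToyReport:
    """Hypothetical stand-in for a report class; not part of immuneML."""

    def __init__(self, relative_frequency=True, name=None,
                 dataset=None, result_path=None):
        self.relative_frequency = relative_frequency
        self.name = name
        self.dataset = dataset
        self.result_path = result_path

    @classmethod
    def build_object(cls, **kwargs):
        # Parse time: only the user-specified parameters are known.
        return cls(**kwargs)


# Parameters as they might appear in a user specification.
spec = {"relative_frequency": False}
report = ToyReport.build_object(**spec, name="my_aa_freq_report")

# Runtime: the remaining attributes are filled in later.
report.dataset = ["CAS", "CAT"]
report.result_path = Path("results")
```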

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.
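
The skip-instead-of-fail behaviour described above might look roughly like the sketch below. `ToyEncodedDataReport` and `run_report` are hypothetical names, and representing a dataset as a dictionary is an assumption made to keep the example self-contained.

```python
import logging


class ToyEncodedDataReport:
    """Hypothetical report that requires encoded data; not part of immuneML."""

    def __init__(self, dataset, name):
        self.dataset = dataset
        self.name = name

    def check_prerequisites(self):
        if self.dataset.get("encoded_data") is None:
            # Log the reason and signal that the report should be skipped.
            logging.warning("%s: dataset is not encoded, skipping report.", self.name)
            return False
        return True

    def generate_report(self):
        return f"{self.name}: {len(self.dataset['encoded_data'])} encoded examples"


def run_report(report):
    """Run the report only if its prerequisites hold; never raise."""
    if report.check_prerequisites():
        return report.generate_report()
    return None


skipped = run_report(ToyEncodedDataReport({"encoded_data": None}, "r"))
produced = run_report(ToyEncodedDataReport({"encoded_data": [1, 2]}, "r"))
```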

immuneML.reports.data_reports.CytoscapeNetworkExporter module

class immuneML.reports.data_reports.CytoscapeNetworkExporter.CytoscapeNetworkExporter(dataset: Dataset = None, result_path: Path = None, chains=('alpha', 'beta'), drop_duplicates=True, additional_node_attributes=[], additional_edge_attributes=[], number_of_processes: int = 1, name: str = None)[source]

Bases: DataReport

This report exports the Receptor sequences to .sif format, such that they can directly be imported as a network in Cytoscape, to visualize chain sharing between the different receptors in a dataset (for example, for TCRs: how often one alpha chain is shared with multiple beta chains, and vice versa).

The Receptor sequences can be provided as a ReceptorDataset, or a RepertoireDataset (containing paired sequence information). In the latter case, one .sif file is exported per Repertoire.

YAML specification:

my_cyto_export: CytoscapeNetworkExporter
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

export_receptorlist(receptors, result_path: Path)[source]
get_formatted_edge_metadata(seq1, seq2)[source]
get_formatted_node_metadata(seq: ReceptorSequence)[source]
get_shared_name(seq: ReceptorSequence)[source]

Returns a string containing a representation of the given receptor chain, with the chain, sequence, V and J genes. For example: *a*s=AMREGPEHSGYALN*v=V7-3*j=J41
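
A minimal sketch of how such a name could be assembled, and how two chain names could be written as one line of a .sif file, is given below. The functions `shared_name` and `sif_line`, the relation label "pair", and the beta chain values are assumptions for illustration; only the alpha chain string is taken from the example above.

```python
def shared_name(chain, sequence, v_gene, j_gene):
    """Format a chain as '*<chain>*s=<sequence>*v=<V gene>*j=<J gene>'."""
    return f"*{chain}*s={sequence}*v={v_gene}*j={j_gene}"


def sif_line(source, target, relation="pair"):
    """One SIF edge: source node, relation type, target node (tab-delimited)."""
    return f"{source}\t{relation}\t{target}"


alpha = shared_name("a", "AMREGPEHSGYALN", "V7-3", "J41")
beta = shared_name("b", "CASSLGTDTQYF", "V5-1", "J2-3")  # made-up beta chain
edge = sif_line(alpha, beta)
```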

immuneML.reports.data_reports.DataReport module

class immuneML.reports.data_reports.DataReport.DataReport(dataset: Dataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)[source]

Bases: Report

Data reports show some type of features or statistics about a given dataset.

When running the TrainMLModel instruction, data reports can be specified inside the ‘selection’ or ‘assessment’ specification under the keys ‘reports/data’ (current cross-validation split) or ‘reports/data_splits’ (train/test sub-splits). Example:

my_instruction:
    type: TrainMLModel
    selection:
        reports:
            data:
                - my_data_report
        # other parameters...
    assessment:
        reports:
            data:
                - my_data_report
        # other parameters...
    # other parameters...

Alternatively, when running the ExploratoryAnalysis instruction, data reports can be specified under ‘report’. Example:

my_instruction:
    type: ExploratoryAnalysis
    analyses:
        my_first_analysis:
            report: my_data_report
            # other parameters...
    # other parameters...
__init__(dataset: Dataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)[source]

The arguments defined below are set at runtime by the instruction. Concrete classes inheriting DataReport may include additional parameters that will be set by the user in the form of input arguments.

  • dataset (Dataset) – a dataset object (can be a repertoire, receptor or sequence dataset, depending on the specific report)

  • result_path (Path) – location where the results (plots, tables, etc.) will be stored

  • name (str) – user-defined name of the report used in the HTML overview automatically generated by the platform

  • number_of_processes (int) – how many processes should be created at once to speed up the analysis. For personal machines, 4 or 8 is usually a good choice.

static get_title()[source]

immuneML.reports.data_reports.GLIPH2Exporter module

class immuneML.reports.data_reports.GLIPH2Exporter.GLIPH2Exporter(dataset: ReceptorDataset = None, result_path: Path = None, name: str = None, condition: str = None, number_of_processes: int = 1)[source]

Bases: DataReport

Report that exports the receptor data to GLIPH2 format so that it can be used directly in the GLIPH2 tool. Currently, the report accepts only receptor datasets.

GLIPH2 publication: Huang H, Wang C, Rubelt F, Scriba TJ, Davis MM. Analyzing the Mycobacterium tuberculosis immune response by T-cell receptor clustering with GLIPH2 and genome-wide antigen screening. Nature Biotechnology. Published online April 27, 2020:1-9. doi:10.1038/s41587-020-0505-4

Parameters:
  • condition (str) – name of the parameter present in the receptor metadata in the dataset; condition can be anything which can be processed in GLIPH2, such as tissue type or treatment.

YAML specification:

my_gliph2_exporter: # user-defined name
    GLIPH2Exporter:
        condition: epitope # for instance, epitope parameter is present in receptors' metadata with values such as "MtbLys" for Mycobacterium tuberculosis (as shown in the original paper).
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.data_reports.ReceptorDatasetOverview module

class immuneML.reports.data_reports.ReceptorDatasetOverview.ReceptorDatasetOverview(batch_size: int, dataset: ReceptorDataset = None, result_path: Path = None, number_of_processes: int = 1, name: str = None)[source]

Bases: DataReport

This report plots the length distribution per chain for a receptor (paired-chain) dataset.

Parameters:

batch_size (int) – how many receptors to load at once; 50 000 by default
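
The idea of loading receptors in batches while accumulating a per-chain length distribution can be sketched as below. Representing a receptor as a dictionary from chain name to sequence, and the helper names `batches` and `length_distribution_per_chain`, are assumptions for this example.

```python
from collections import Counter
from itertools import islice


def batches(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def length_distribution_per_chain(receptors, batch_size=50_000):
    """Count sequence lengths per chain, processing one batch at a time."""
    distribution = {}
    for batch in batches(receptors, batch_size):
        for receptor in batch:
            for chain, sequence in receptor.items():
                distribution.setdefault(chain, Counter())[len(sequence)] += 1
    return distribution


receptors = [
    {"alpha": "CAVRD", "beta": "CASSLG"},
    {"alpha": "CAVSD", "beta": "CASSPGQ"},
    {"alpha": "CAV", "beta": "CASSLG"},
]
dist = length_distribution_per_chain(receptors, batch_size=2)
```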

YAML specification:

reports:
    my_receptor_overview_report: ReceptorDatasetOverview
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.data_reports.RecoveredSignificantFeatures module

class immuneML.reports.data_reports.RecoveredSignificantFeatures.RecoveredSignificantFeatures(dataset: RepertoireDataset = None, groundtruth_sequences_path: Path = None, trim_leading_trailing: bool = None, p_values: List[float] = None, k_values: List[int] = None, label: dict = None, compairr_path: Path = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)[source]

Bases: DataReport

Compares a given collection of groundtruth implanted signals (sequences or k-mers) to the significant label-associated k-mers or sequences according to Fisher’s exact test.

Internally uses the KmerAbundanceEncoder for calculating significant k-mers, and SequenceAbundanceEncoder or CompAIRRSequenceAbundanceEncoder to calculate significant full sequences (depending on whether the argument compairr_path was set).

This report creates two plots:

  • the first plot is a bar chart showing what percentage of the groundtruth implanted signals were found to be significant.

  • the second plot is a bar chart showing what percentage of the k-mers/sequences found to be significant match the groundtruth implanted signals.

To compare k-mers or sequences of differing lengths, the groundtruth sequences or long k-mers are split into k-mers of the given size through a sliding window approach. When comparing ‘full_sequences’ to groundtruth sequences, a match is only registered if both sequences are of equal length.
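
The sliding window split and the two percentages shown in the plots can be sketched as follows. The helper names and the choice to compute both percentages over k-mers (rather than over whole implanted signals) are assumptions for this example and may differ from the immuneML implementation.

```python
def split_into_kmers(sequence, k):
    """Return the set of k-mers obtained by sliding a window of size k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def recovery_percentages(groundtruth_sequences, significant_kmers, k):
    """Return (% of groundtruth k-mers found significant,
               % of significant k-mers matching the groundtruth)."""
    truth_kmers = set()
    for sequence in groundtruth_sequences:
        truth_kmers |= split_into_kmers(sequence, k)

    significant = set(significant_kmers)
    found = truth_kmers & significant

    pct_truth_recovered = 100 * len(found) / len(truth_kmers) if truth_kmers else 0.0
    pct_significant_true = 100 * len(found) / len(significant) if significant else 0.0
    return pct_truth_recovered, pct_significant_true


kmers = split_into_kmers("CASSL", 3)
recovered, precise = recovery_percentages(["CASSL"], {"CAS", "SSL", "QQQ", "GGG"}, k=3)
```

A sequence shorter than k yields an empty set, so it contributes nothing to the comparison.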

Parameters:
  • groundtruth_sequences_path (str) – Path to a file containing the true implanted (sub)sequences, e.g., full sequences or k-mers. The file should contain one sequence per line, without a header, and without V or J genes.

  • trim_leading_trailing (bool) – Whether to trim the leading and trailing first positions from the provided groundtruth sequences, e.g., the leading C and trailing Y/F amino acids. This is necessary for comparing full sequences when the main dataset is imported using settings that also trim the leading and trailing positions.

  • p_values (list) – The p-value thresholds to be used by Fisher’s exact test. Each p-value specified here will become one panel in the output figure.

  • k_values (list) – Length of the k-mers (number of amino acids) created by the KmerAbundanceEncoder. When using a full sequence encoding (SequenceAbundanceEncoder or CompAIRRSequenceAbundanceEncoder), specify ‘full_sequence’ here. Each value specified under k_values will represent one bar in the output figure.

  • label (dict) – A label configuration. One label should be specified, and the positive_class for this label should be defined. See the YAML specification below for an example.

  • compairr_path (str) – If ‘full_sequence’ is listed under k_values, the path to the CompAIRR executable may be provided. If the compairr_path is specified, the CompAIRRSequenceAbundanceEncoder will be used to compute the significant sequences. If the path is not specified and ‘full_sequence’ is listed under k_values, SequenceAbundanceEncoder will be used.

YAML specification:

my_recovered_significant_features_report:
    RecoveredSignificantFeatures:
        groundtruth_sequences_path: path/to/groundtruth/sequences.txt
        trim_leading_trailing: False
        p_values:
            - 0.1
            - 0.01
            - 0.001
            - 0.0001
        k_values:
            - 3
            - 4
            - 5
            - full_sequence
        compairr_path: path/to/compairr # can be specified if 'full_sequence' is listed under k_values
        label: # Define a label, and the positive class for that given label
            CMV:
                positive_class: +
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.data_reports.RepertoireClonotypeSummary module

class immuneML.reports.data_reports.RepertoireClonotypeSummary.RepertoireClonotypeSummary(dataset: Dataset = None, result_path: Path = None, name: str = None, number_of_processes: int = 1, color_by_label: str = None)[source]

Bases: DataReport

Shows the number of distinct clonotypes per repertoire in a given dataset as a bar plot.

Parameters:

color_by_label (str) – name of the label used to color the plot (for example, a disease label), or None
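
Counting distinct clonotypes per repertoire amounts to counting unique entries. In the sketch below, a clonotype is represented as a (sequence, V gene, J gene) tuple; this definition, the function name, and the example values are assumptions for illustration.

```python
def distinct_clonotype_counts(repertoires):
    """Map each repertoire identifier to its number of distinct clonotypes."""
    return {rep_id: len(set(clonotypes)) for rep_id, clonotypes in repertoires.items()}


repertoires = {
    "rep1": [("CASSL", "V5-1", "J2-3"), ("CASSL", "V5-1", "J2-3"), ("CASSP", "V7-2", "J1-1")],
    "rep2": [("CASSL", "V5-1", "J2-3")],
}
clonotype_counts = distinct_clonotype_counts(repertoires)
```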

YAML specification:

my_clonotype_summary_rep:
  RepertoireClonotypeSummary:
    color_by_label: celiac
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

immuneML.reports.data_reports.SequenceLengthDistribution module

class immuneML.reports.data_reports.SequenceLengthDistribution.SequenceLengthDistribution(dataset: RepertoireDataset | SequenceDataset = None, batch_size: int = 1, result_path: Path = None, number_of_processes: int = 1, sequence_type: SequenceType = SequenceType.AMINO_ACID, name: str = None)[source]

Bases: DataReport

Generates a histogram of the lengths of the sequences in a repertoire or sequence dataset.

Parameters:

sequence_type (str) – whether to check the length of amino acid or nucleotide sequences; default value is ‘amino_acid’
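
The data behind such a histogram can be sketched as a simple count of sequence lengths. The record layout (a dictionary with ‘amino_acid’ and ‘nucleotide’ keys) and the function name are assumptions for this example.

```python
from collections import Counter


def sequence_length_distribution(records, sequence_type="amino_acid"):
    """Return {length: number of sequences}, sorted by length."""
    lengths = Counter(len(record[sequence_type]) for record in records)
    return dict(sorted(lengths.items()))


records = [
    {"amino_acid": "CASSL", "nucleotide": "TGTGCCAGCAGCTTA"},
    {"amino_acid": "CASS", "nucleotide": "TGTGCCAGCAGC"},
    {"amino_acid": "CATSL", "nucleotide": "TGTGCCACCAGCTTA"},
]
aa_lengths = sequence_length_distribution(records)
nt_lengths = sequence_length_distribution(records, sequence_type="nucleotide")
```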

YAML specification:

my_sld_report:
    SequenceLengthDistribution:
        sequence_type: amino_acid
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.data_reports.SequencesWithSignificantKmers module

class immuneML.reports.data_reports.SequencesWithSignificantKmers.SequencesWithSignificantKmers(dataset: RepertoireDataset = None, reference_sequences_path: Path = None, p_values: List[float] = None, k_values: List[int] = None, label: dict = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)[source]

Bases: DataReport

Given a list of reference sequences, this report writes out the subsets of reference sequences containing significant k-mers (as computed by the KmerAbundanceEncoder using Fisher’s exact test).

For each combination of p-value and k-mer size given, a file is written containing all sequences containing a significant k-mer of the given size at the given p-value.
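
The selection step can be sketched as a substring filter applied once per combination of p-value and k-mer size. The significant k-mers are assumed to be already computed; the function name and the example data are hypothetical.

```python
def sequences_with_significant_kmers(reference_sequences, significant_kmers):
    """Keep the reference sequences that contain at least one significant k-mer."""
    return [
        sequence for sequence in reference_sequences
        if any(kmer in sequence for kmer in significant_kmers)
    ]


reference = ["CASSLGF", "CATQQYF", "CAWGGF"]

# Hypothetical precomputed results, keyed by (p-value threshold, k).
significant = {
    (0.01, 3): {"SSL", "QQY"},
    (0.001, 3): {"SSL"},
}

# One list of sequences per combination, each of which would go to its own file.
selected = {
    key: sequences_with_significant_kmers(reference, kmers)
    for key, kmers in significant.items()
}
```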

Parameters:
  • reference_sequences_path (str) – Path to a file containing the reference sequences. The file should contain one sequence per line, without a header, and without V or J genes.

  • p_values (list) – The p-value thresholds to be used by Fisher’s exact test. Each p-value specified here will become one panel in the output figure.

  • k_values (list) – Length of the k-mers (number of amino acids) created by the KmerAbundanceEncoder. Each k-mer length will become one panel in the output figure.

  • label (dict) – A label configuration. One label should be specified, and the positive_class for this label should be defined. See the YAML specification below for an example.

YAML specification:

my_sequences_with_significant_kmers:
    SequencesWithSignificantKmers:
        reference_sequences_path: path/to/reference/sequences.txt
        p_values:
            - 0.1
            - 0.01
            - 0.001
            - 0.0001
        k_values:
            - 3
            - 4
            - 5
        label: # Define a label, and the positive class for that given label
            CMV:
                positive_class: +
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.data_reports.SignificantFeatures module

class immuneML.reports.data_reports.SignificantFeatures.SignificantFeatures(dataset: RepertoireDataset = None, p_values: List[float] = None, k_values: List[int] = None, label: dict = None, compairr_path: Path = None, log_scale: bool = False, result_path: Path = None, name: str = None, number_of_processes: int = 1)[source]

Bases: DataReport

Plots a boxplot of the number of significant features (label-associated k-mers or sequences) per Repertoire according to Fisher’s exact test, across different classes for the given label.

Internally uses the KmerAbundanceEncoder for calculating significant k-mers, and SequenceAbundanceEncoder or CompAIRRSequenceAbundanceEncoder to calculate significant full sequences (depending on whether the argument compairr_path was set).
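
To make the role of Fisher’s exact test and the p-value thresholds concrete, the sketch below computes a one-sided p-value from a 2x2 table of repertoire counts and counts how many features pass each threshold. The one-sided alternative, the strict comparison against the threshold, and all names and counts are assumptions for this example; the encoders may differ in these details.

```python
from math import comb


def fisher_exact_greater(pos_with, neg_with, pos_without, neg_without):
    """One-sided Fisher's exact test p-value (feature enriched in positives).

    Arguments are repertoire counts: positive/negative class, with/without
    the feature.
    """
    total = pos_with + neg_with + pos_without + neg_without
    n_positive = pos_with + pos_without
    n_with = pos_with + neg_with
    upper = min(n_positive, n_with)
    # Sum hypergeometric probabilities of tables at least as extreme.
    favourable = sum(
        comb(n_positive, x) * comb(total - n_positive, n_with - x)
        for x in range(pos_with, upper + 1)
    )
    return favourable / comb(total, n_with)


# Hypothetical contingency tables for two k-mers.
tables = {"SSL": (3, 0, 0, 3), "QQY": (1, 1, 1, 1)}
p_by_kmer = {kmer: fisher_exact_greater(*t) for kmer, t in tables.items()}

# Number of significant k-mers at each threshold.
n_significant = {
    threshold: sum(p < threshold for p in p_by_kmer.values())
    for threshold in (0.1, 0.01)
}
```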

Parameters:
  • p_values (list) – The p-value thresholds to be used by Fisher’s exact test. Each p-value specified here will become one panel in the output figure.

  • k_values (list) – Length of the k-mers (number of amino acids) created by the KmerAbundanceEncoder. When using a full sequence encoding (SequenceAbundanceEncoder or CompAIRRSequenceAbundanceEncoder), specify ‘full_sequence’ here. Each value specified under k_values will represent one boxplot in the output figure.

  • label (dict) – A label configuration. One label should be specified, and the positive_class for this label should be defined. See the YAML specification below for an example.

  • compairr_path (str) – If ‘full_sequence’ is listed under k_values, the path to the CompAIRR executable may be provided. If the compairr_path is specified, the CompAIRRSequenceAbundanceEncoder will be used to compute the significant sequences. If the path is not specified and ‘full_sequence’ is listed under k_values, SequenceAbundanceEncoder will be used.

  • log_scale (bool) – Whether to plot the y axis in log10 scale (log_scale = True) or continuous scale (log_scale = False). By default, log_scale is False.

YAML specification:

my_significant_features_report:
    SignificantFeatures:
        p_values:
            - 0.1
            - 0.01
            - 0.001
            - 0.0001
        k_values:
            - 3
            - 4
            - 5
            - full_sequence
        compairr_path: path/to/compairr # can be specified if 'full_sequence' is listed under k_values
        label: # Define a label, and the positive class for that given label
            CMV:
                positive_class: +
        log_scale: False
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.data_reports.SignificantKmerPositions module

class immuneML.reports.data_reports.SignificantKmerPositions.SignificantKmerPositions(dataset: RepertoireDataset = None, reference_sequences_path: Path = None, p_values: List[float] = None, k_values: List[int] = None, label: dict = None, compairr_path: Path = None, result_path: Path = None, name: str = None, number_of_processes: int = 1)[source]

Bases: DataReport

Plots the number of significant k-mers (as computed by the KmerAbundanceEncoder using Fisher’s exact test) observed at each IMGT position of a given list of reference sequences. This report creates a stacked bar chart, where each bar represents an IMGT position, and each segment of the stack represents the observed frequency of one ‘significant’ k-mer at that position.
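
The counts behind such a stacked bar chart can be sketched as follows. For simplicity, this example assigns each k-mer to its 0-based start index in the sequence instead of an IMGT position; that simplification, the function name, and the example data are assumptions and do not reflect the immuneML implementation.

```python
from collections import Counter, defaultdict


def significant_kmer_positions(reference_sequences, significant_kmers, k):
    """Map each start position to a Counter of significant k-mers seen there."""
    counts = defaultdict(Counter)
    for sequence in reference_sequences:
        for start in range(len(sequence) - k + 1):
            kmer = sequence[start:start + k]
            if kmer in significant_kmers:
                counts[start][kmer] += 1
    return counts


positions = significant_kmer_positions(
    ["CASSLGF", "CASSPGF", "SSLCA"], {"SSL", "ASS"}, k=3
)
```

Each key of the result corresponds to one bar, and each entry of its Counter to one segment of that bar.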

Parameters:
  • reference_sequences_path (str) – Path to a file containing the reference sequences. The file should contain one sequence per line, without a header, and without V or J genes.

  • p_values (list) – The p-value thresholds to be used by Fisher’s exact test. Each p-value specified here will become one panel in the output figure.

  • k_values (list) – Length of the k-mers (number of amino acids) created by the KmerAbundanceEncoder. Each k-mer length will become one panel in the output figure.

  • label (dict) – A label configuration. One label should be specified, and the positive_class for this label should be defined. See the YAML specification below for an example.

YAML specification:

my_significant_kmer_positions_report:
    SignificantKmerPositions:
        reference_sequences_path: path/to/reference/sequences.txt
        p_values:
            - 0.1
            - 0.01
            - 0.001
            - 0.0001
        k_values:
            - 3
            - 4
            - 5
        label: # Define a label, and the positive class for that given label
            CMV:
                positive_class: +
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

check_prerequisites()[source]

Checks prerequisites for the generation of the report of specific class (e.g., if the class of the MLMethod instance is the one required by the report, if the data has been encoded to make a report of encoded dataset). In the instructions in immuneML, this function is used to determine whether to call generate_report() in the specific situation. Each report subclass has its own set of prerequisites. If the report cannot be run, the information on this will be logged and the report skipped in the specific situation. No error will be raised. See subclasses of the class Instruction for more information on how the reports are executed.

Returns:

boolean value True if the prerequisites are o.k., and False otherwise.

immuneML.reports.data_reports.SimpleDatasetOverview module

class immuneML.reports.data_reports.SimpleDatasetOverview.SimpleDatasetOverview(dataset: Dataset = None, result_path: Path = None, number_of_processes: int = 1, name: str = None)[source]

Bases: DataReport

Generates a simple text-based overview of the properties of any dataset, including the dataset name, size, and metadata labels.

YAML specification:

reports:
    my_overview: SimpleDatasetOverview
UNKNOWN_CHAIN = 'unknown'
classmethod build_object(**kwargs)[source]

Creates the object of the subclass of the Report class from the parameters so that it can be used in the analysis. Depending on the type of the report, the parameters provided here will be provided in parsing time, while the other necessary parameters (e.g., subset of the data from which the report should be created) will be provided at runtime. For more details, see specific direct subclasses of this class, describing different types of reports.

Parameters:

**kwargs – keyword arguments that will be provided by users in the specification (if immuneML is used as a command line tool) or in the dictionary when calling the method from the code, and which should be used to create the report object

Returns:

the object of the appropriate report class

Module contents