immuneML Galaxy tools

If you are unfamiliar with Galaxy, we recommend first reading Introduction to Galaxy.

Overview of Galaxy tool functionalities

Each immuneML Galaxy tool provides an interface to run a specific immuneML instruction or workflow. To quickly test different immuneML functionalities, the tools provide button-based interfaces with limited options. Alternatively, a tool may take a YAML file as input; this file is identical to the YAML file used with the command-line interface.

| Galaxy tool | immuneML instruction | Interface type |
|---|---|---|
| Create Dataset with Reports | Creates dataset and runs optional reports with ExploratoryAnalysis instruction | Button or YAML-based |
| Simulate a Random Dummy Dataset | Creates dataset with random dataset import | Button or YAML-based |
| Simulate Immune Events with LIgO | Modifies dataset with LigoSim instruction | Button or YAML-based |
| Train ML Classifiers | YAML-based interface for training a classifier with TrainMLModel instruction | YAML-based |
| Train Receptor Classifier (Simplified Interface) | Simplified interface for training a classifier with TrainMLModel instruction with sequence/receptor dataset | Button-based |
| Train Repertoire Classifier (Simplified Interface) | Simplified interface for training a classifier with TrainMLModel instruction with repertoire dataset | Button-based |
| Apply ML Classifier | Applies an ML classifier with MLApplication instruction | Button-based |
| Train Generative Model | Trains a generative model with TrainGenModel instruction | Button or YAML-based |
| Apply Generative Model | Creates dataset with ApplyGenModel by applying a trained generative model | Button-based |
| Clustering | Clusters a dataset with Clustering instruction | Button or YAML-based |
| Run immuneML with any YAML specification | Runs any instruction (recommended for e.g., advanced ExploratoryAnalysis, or instructions not covered by other tools) | YAML-based |

immuneML datasets in Galaxy

In Galaxy, an immuneML dataset is a special type of history element, which internally contains an immuneML dataset stored in AIRR format. Datasets can be imported from files using the Create Dataset with Reports tool. Some other tools also produce (synthetic) immuneML datasets.

Tips for importing data:

  • If your dataset contains many files, consider using a Galaxy collection as input.

  • For quick testing of Galaxy, a dataset of random sequences can be generated using the Simulate a Random Dummy Dataset tool.

  • See How to import data into immuneML for general information about datasets in immuneML.

When running a YAML-based tool, the tool will ask you to select a dataset from the Galaxy history, and the YAML should contain the following snippet to ensure the selected dataset is imported:

definitions:
  datasets:
    dataset:
      format: AIRR
      params:
        path: dataset.yaml
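
To show where this snippet sits in a complete specification, here is a minimal sketch of a YAML file that runs one exploratory report on the selected dataset. The names my_report, my_instruction and my_analysis are placeholders chosen for this illustration, and the choice of the SequenceLengthDistribution report is an assumption. The exact keys under the instruction may differ between immuneML versions, so check the YAML specification documentation for the version installed on your Galaxy server.

```yaml
definitions:
  datasets:
    dataset:                  # must match the snippet above so the selected history element is imported
      format: AIRR
      params:
        path: dataset.yaml
  reports:
    my_report: SequenceLengthDistribution   # placeholder name; report choice is illustrative
instructions:
  my_instruction:             # placeholder instruction name
    type: ExploratoryAnalysis
    analyses:
      my_analysis:            # placeholder analysis name
        dataset: dataset      # refers to the dataset key defined under definitions
        reports: [my_report]  # assumed key; some versions may expect a single "report" entry instead
```

The dataset key under definitions is the only part taken directly from the required snippet; everything else can be adapted to the instruction you want to run.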

Galaxy tool input and output

Galaxy tools produce their output as history elements which can be viewed, downloaded, or used as input for subsequent tools. immuneML tools will output the following history elements:

  • A summary HTML file showing the results (or error in the case of a failed run). For tools generating datasets, the dataset element also serves as the HTML summary.

  • An archive containing the zipped folder with all internally generated results (identical to the results you get when running immuneML on the command line).

  • Each button-based tool will also return the YAML file that was generated based on the user options to run immuneML.

  • Classifiers or generative models generated by the respective tools (these may be used as input for subsequent tools).
