How to run any AIRR ML analysis in Galaxy
To run any YAML-based immuneML analysis in Galaxy, use the tool Run immuneML with YAML specification. For creating datasets, simulating synthetic data, implanting synthetic immune signals, or training ML models, the analysis-specific Galaxy tools are generally recommended instead of this tool. Those tools can export the relevant output files as Galaxy history elements.
However, when you want to run the ExploratoryAnalysis instruction, or other analyses that do not have a corresponding Galaxy tool, this generic tool can be used.
An example Galaxy history showing how to use this tool can be found here.
Creating the YAML specification
This Galaxy tool takes as input an immuneML dataset from the Galaxy history, optional additional files, and a YAML specification file. For details on writing the YAML specification, see How to specify an analysis with YAML.
When writing an analysis specification for Galaxy, it can be assumed that all files selected under ‘Additional files’ are present in the current working directory. A path to an additional file thus consists only of the filename.
The following YAML specification shows an example of how to run the ExploratoryAnalysis instruction inside Galaxy:
definitions:
  datasets:
    dataset: # user-defined dataset name
      format: ImmuneML # the default format used by the 'Create dataset' galaxy tool is ImmuneML
      params:
        path: dataset.iml_dataset # specify the dataset name, the default name used by
                                  # the 'Create dataset' galaxy tool is dataset.iml_dataset
  encodings:
    my_sequence_matches:
      MatchedSequences:
        reference:
          params:
            path: reference_sequences.tsv # this file must be selected from the galaxy history as an 'additional file'
          format: AIRR
  reports:
    my_seq_lengths: SequenceLengthDistribution # reports without parameters
    my_matches: Matches
instructions:
  my_instruction: # user-defined instruction name
    type: ExploratoryAnalysis
    analyses:
      my_analysis_1: # user-defined analysis name
        dataset: dataset
        report: my_seq_lengths
      my_analysis_2:
        dataset: dataset
        encoding: my_sequence_matches
        report: my_matches
All files referenced in the YAML can be found in the example Galaxy history.
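Before submitting a specification, it can help to check that every analysis refers to names that are actually defined, and that file paths are bare filenames. The following is a minimal sketch in Python and is not part of immuneML or Galaxy: the function names `check_analyses` and `non_local_paths` are hypothetical, and the example specification is written as a plain dictionary rather than loaded from a file (loading it would need a YAML parser such as PyYAML).

```python
from pathlib import PurePosixPath

# The example specification, written as a plain Python dict (an assumption made
# here to keep the sketch free of third-party dependencies).
spec = {
    "definitions": {
        "datasets": {
            "dataset": {"format": "ImmuneML", "params": {"path": "dataset.iml_dataset"}},
        },
        "encodings": {
            "my_sequence_matches": {
                "MatchedSequences": {
                    "reference": {
                        "params": {"path": "reference_sequences.tsv"},
                        "format": "AIRR",
                    }
                }
            }
        },
        "reports": {
            "my_seq_lengths": "SequenceLengthDistribution",
            "my_matches": "Matches",
        },
    },
    "instructions": {
        "my_instruction": {
            "type": "ExploratoryAnalysis",
            "analyses": {
                "my_analysis_1": {"dataset": "dataset", "report": "my_seq_lengths"},
                "my_analysis_2": {
                    "dataset": "dataset",
                    "encoding": "my_sequence_matches",
                    "report": "my_matches",
                },
            },
        }
    },
}

# Maps a key used inside an analysis to the definitions section it refers to.
SECTION_FOR_KEY = {"dataset": "datasets", "encoding": "encodings", "report": "reports"}


def check_analyses(spec):
    """Return one message per analysis key that references an undefined name."""
    definitions = spec.get("definitions", {})
    problems = []
    for instr_name, instr in spec.get("instructions", {}).items():
        for analysis_name, analysis in instr.get("analyses", {}).items():
            for key, section in SECTION_FOR_KEY.items():
                ref = analysis.get(key)
                if ref is not None and ref not in definitions.get(section, {}):
                    problems.append(
                        f"{instr_name}/{analysis_name}: {key} '{ref}' "
                        f"is not defined under definitions/{section}"
                    )
    return problems


def non_local_paths(node):
    """Collect every 'path' value that is not a bare filename."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "path" and isinstance(value, str):
                # A bare filename has no directory component.
                if PurePosixPath(value).name != value:
                    found.append(value)
            else:
                found.extend(non_local_paths(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(non_local_paths(item))
    return found


print(check_analyses(spec))
print(non_local_paths(spec))
```

Both calls return an empty list for the example specification. This check only covers the `dataset`, `encoding` and `report` keys of the analyses; it does not replace the validation that immuneML itself performs when the tool runs.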
This Galaxy tool will produce the following history elements:
Summary: immuneML analysis: an HTML page that allows you to browse through all results.
ImmuneML Analysis Archive: a .zip file containing the complete output folder as it was produced by immuneML. This folder contains the output of the instruction that was used, including all raw data files. Furthermore, the folder contains the complete YAML specification file for the immuneML run, the HTML output and a log file.
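After downloading the archive from the Galaxy history, its contents can be inspected without unpacking it. The sketch below uses only the Python standard library; the function name `files_by_suffix` is hypothetical, and no assumption is made about the names of the files inside the archive, which are simply grouped by extension.

```python
import sys
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def files_by_suffix(archive_path):
    """Map each file extension in a .zip archive to the member names that have it."""
    grouped = defaultdict(list)
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():  # skip directory entries
                continue
            suffix = PurePosixPath(info.filename).suffix.lower()
            grouped[suffix].append(info.filename)
    return dict(grouped)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_archive.py <path to the downloaded .zip file>
    for suffix, names in sorted(files_by_suffix(sys.argv[1]).items()):
        print(suffix or "(no extension)", len(names))
        for name in names:
            print("   ", name)
```

Looking up the `.yaml` and `.html` entries in the result is one way to locate the specification file and the HTML output within the archive.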