LIgO simulation quickstart
==========================

As LIgO is now a part of immuneML, it can be used directly for simulation.

How to run LIgO
---------------------------------

You can run LIgO from the command line using the following command:

.. code-block:: console

  ligo specs.yaml output_folder

where

* **specs.yaml** is the YAML file in which the user describes the simulation parameters. Please see :doc:`specification` for more information about LIgO parameters.

* **output_folder** is the output folder name defined by the user (the folder should not exist before the run).

How to explore LIgO results
---------------------------------

The output folder structure is the same for all LIgO runs. The output folder should include:

- **index.html**: main output file, which gives an overview of the simulation: a link to the full specification, the LIgO version used, some general information on the dataset, and a link to the dataset exported in the standard AIRR format
- **full_specs.yaml** file: includes the specification and the default values of any parameters that were left unfilled
- **inst1** folder: this folder name is the same as the name given to the instruction by the user (here, inst1); all results are located here, and the simulated dataset is located under ``inst1/exported_dataset/airr/``
- **HTML_output** folder: presentation of figures and reports, if specified

How to use LIgO for receptor-level simulation
-------------------------------------------------

Simulation of a TCR dataset containing two immune signals
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this quickstart tutorial, we will simulate a dataset of 300 productive TRB receptors: 100 TRBs containing signal 1, 100 TRBs containing signal 2, and 100 TRBs containing no immune signal (background receptors); see the illustration below. Signal 1 consists of a 2-mer {AS} and TRBV7, i.e., only TRBs containing both TRBV7 and the 2-mer {AS} contain signal 1. Signal 2 consists of two gapped k-mers, {G.G} and {G..G}.
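
To make the two signal definitions concrete, here is a minimal Python sketch of what "containing signal 1" and "containing signal 2" mean for a single receptor. It is an illustration only and not LIgO's own matching code: the function names ``has_signal1`` and ``has_signal2`` are hypothetical, and treating the TRBV7 condition as a prefix match on the V gene name is an assumption.

.. code-block:: python

  import re

  # Signal 2: the gapped k-mers G.G and G..G, i.e. two G residues separated
  # by one or two arbitrary residues.
  SIGNAL2_PATTERN = re.compile(r"G.{1,2}G")


  def has_signal1(v_call, junction_aa):
      """Signal 1 requires both the TRBV7 gene and the 2-mer AS.

      Assumption: the V gene condition is checked as a prefix of v_call,
      so that alleles such as TRBV7-2*02 count as TRBV7.
      """
      return v_call.startswith("TRBV7") and "AS" in junction_aa


  def has_signal2(junction_aa):
      """Signal 2 requires at least one occurrence of G.G or G..G."""
      return SIGNAL2_PATTERN.search(junction_aa) is not None


  if __name__ == "__main__":
      examples = [
          ("TRBV10-1*01", "CARPDRGGGYTF"),
          ("TRBV7-2*02", "CASSRGHFQETQYF"),
          ("TRBV7-8*01", "CASSSPGGVRIYSTDTQYF"),
      ]
      for v_call, junction_aa in examples:
          print(v_call, junction_aa,
                has_signal1(v_call, junction_aa), has_signal2(junction_aa))
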
Signal-specific TRBs will be generated using the rejection sampling strategy and the default OLGA model (humanTRB).

.. image:: ../_static/images/quickstart_receptor-level.png

LIgO reports each simulated TRB as a triple of TRBV gene name, CDR3 amino acid sequence, and TRBJ gene name. If you also want to report the generation probabilities (pgen) of the simulated receptors according to the default OLGA humanTRB model, set the *export_p_gens* parameter to true. Please keep in mind that pgen evaluation may take time.

Step 1: YAML specification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We need to define the YAML file describing the simulation parameters.

- First, we define the immune signals 1 and 2 in the **definitions** section. You can read more about the YAML file parameters in :doc:`specification`.

- Second, we define the number of TRBs per signal in the **simulations** section. You can read more about the YAML file parameters in :doc:`specification`.

- Finally, we define the technical parameters of the simulation in the **instructions** section. You can read more about the YAML file parameters in :doc:`specification`.

Here is the complete YAML specification for the simulation:

.. collapse:: receptor_ligo_quickstart.yaml

  .. code-block:: yaml

    definitions:
      motifs:
        motif1:
          seed: AS
        motif2:
          seed: G/G
          max_gap: 2
          min_gap: 1
      signals:
        signal1:
          v_call: TRBV7
          motifs: [motif1]
        signal2:
          motifs: [motif2]
      simulations:
        sim1:
          is_repertoire: false
          paired: false
          sequence_type: amino_acid
          simulation_strategy: RejectionSampling
          remove_seqs_with_signals: true # remove signal-specific AIRs from the background
          sim_items:
            sim_item1: # group of AIRs with the same parameters
              generative_model:
                chain: beta
                default_model_name: humanTRB
                model_path: null
                type: OLGA
              number_of_examples: 100
              signals:
                signal1: 1
            sim_item2:
              generative_model:
                chain: beta
                default_model_name: humanTRB
                model_path: null
                type: OLGA
              number_of_examples: 100
              signals:
                signal2: 1
            sim_item3:
              generative_model:
                chain: beta
                default_model_name: humanTRB
                model_path: null
                type: OLGA
              number_of_examples: 100
              signals: {} # no signal
    instructions:
      my_sim_inst:
        export_p_gens: false
        max_iterations: 100
        number_of_processes: 4
        sequence_batch_size: 1000
        simulation: sim1
        type: LigoSim

Step 2: Running LIgO
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After saving the YAML specification to a file (e.g., quickstart_receptor.yaml), you can run the simulation by following these steps:

#. Activate the virtual environment where you have installed LIgO, for example

   .. code-block:: console

     source ligo_env/bin/activate

#. Navigate to the directory where the YAML specification (quickstart_receptor.yaml) was saved.

#. Execute the following command:

   .. code-block:: console

     ligo quickstart_receptor.yaml quickstart_output_receptor

All results will be located in quickstart_output_receptor. Note that the output folder (quickstart_output_receptor) should not exist prior to the run.

Step 3: Understanding the output
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The simulated dataset is located under quickstart_output_receptor/my_sim_inst/exported_dataset/airr/batch1.tsv, where my_sim_inst is the instruction name used in the specification above. In the output, each row represents one AIR. Some of the columns are shown in the table below:

.. list-table:: Simulated receptors in AIRR format
  :header-rows: 1

  * - v_call
    - j_call
    - junction_aa
    - signal1
    - signal2
    - signal1_position
    - signal2_position
  * - TRBV10-1*01
    - TRBJ2-5*01
    - CARPDRGGGYTF
    - 0
    - 1
    - m000000000000
    - m000000100000
  * - TRBV7-2*02
    - TRBJ2-5*01
    - CASSRGHFQETQYF
    - 1
    - 0
    - m01000000000000
    - m00000000000000
  * - TRBV7-8*01
    - TRBJ2-3*01
    - CASSSPGGVRIYSTDTQYF
    - 1
    - 0
    - m0100000000000000000
    - m0000000000000000000

Next steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can find more information about YAML parameters in :doc:`specification`. Other tutorials on how to use LIgO can be found under :doc:`tutorials`.

How to use LIgO for repertoire-level simulation
-------------------------------------------------

Simulation of BCR repertoires labeled with two immune events
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In this quickstart tutorial, we will generate a dataset of 20 BCR repertoires, with each repertoire containing 6 BCRs. Out of these, 10 repertoires will be labeled as immune event 1 and will consist of 30% BCRs with signal 1 and 30% BCRs with signal 2. The remaining 10 repertoires will be labeled as immune event 2 and will consist of 50% BCRs with signal 1 and 50% BCRs with signal 2. Signal 1 is composed of a 2-mer {AA}, while signal 2 is composed of a 2-mer {GG}. Signal-specific BCRs will be generated using the signal implantation strategy, where any implanting position is allowed, and the default OLGA model (humanIGH).

Step 1: YAML specification
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LIgO simulation starts with defining the YAML file with the simulation parameters.

- First, we define the immune signals 1 and 2 in the **definitions** section. You can read more about the YAML file parameters in :doc:`specification`.

- Second, we define the immune events and the repertoire parameters, such as the number of repertoires and the number of BCRs in the repertoire, in the **simulations** section. You can read more about the YAML file parameters in :doc:`specification`.

- Finally, we define the technical parameters of the simulation in the **instructions** section. You can read more about the YAML file parameters in :doc:`specification`.

Here is the complete YAML specification for the simulation:

.. collapse:: repertoire_ligo_quickstart.yaml

  .. code-block:: yaml

    definitions:
      motifs:
        motif1:
          seed: AA
        motif2:
          seed: GG
      signals:
        signal1:
          motifs: [motif1]
        signal2:
          motifs: [motif2]
      simulations:
        sim1:
          is_repertoire: true
          paired: false
          sequence_type: amino_acid
          simulation_strategy: Implanting
          remove_seqs_with_signals: true # remove signal-specific AIRs from the background
          sim_items:
            AIRR1: # group of AIRs with the same parameters
              immune_events:
                ievent1: True
                ievent2: False
              signals: {signal1: 0.3, signal2: 0.3}
              number_of_examples: 10
              is_noise: False
              receptors_in_repertoire_count: 6
              generative_model:
                chain: heavy
                default_model_name: humanIGH
                model_path: null
                type: OLGA
            AIRR2:
              immune_events:
                ievent1: False
                ievent2: True
              signals: {signal1: 0.5, signal2: 0.5}
              number_of_examples: 10
              is_noise: False
              receptors_in_repertoire_count: 6
              generative_model:
                chain: heavy
                default_model_name: humanIGH
                model_path: null
                type: OLGA
    instructions:
      my_sim_inst:
        export_p_gens: false
        max_iterations: 100
        number_of_processes: 4
        sequence_batch_size: 1000
        simulation: sim1
        type: LigoSim

Step 2: Running LIgO
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After saving the YAML specification to a file (e.g., quickstart_repertoire.yaml), you can run the simulation by following these steps:

#. Activate the virtual environment where you have installed LIgO, for example

   .. code-block:: console

     source ligo_env/bin/activate

#. Navigate to the directory where the YAML specification (quickstart_repertoire.yaml) was saved.

#. Execute the following command:

   .. code-block:: console

     ligo quickstart_repertoire.yaml quickstart_output_repertoire

All results will be located in quickstart_output_repertoire.
Note that the output folder (quickstart_output_repertoire) should not exist prior to the run.

Next steps
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can find more information about YAML parameters in :doc:`specification`. Other tutorials on how to use LIgO can be found under :doc:`tutorials`.
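
Once a run has finished, the exported AIRR tables can also be inspected programmatically. The following is a minimal Python sketch, not part of LIgO itself: it reads tab-separated rows with the standard ``csv`` module, counts how many receptors carry each signal, and decodes a position column. The helper names (``read_rows``, ``count_signals``, ``signal_starts``) are hypothetical, and the sample rows are copied from the receptor-level table earlier on this page. The reading of the position strings (an ``m`` prefix followed by one digit per residue, with ``1`` marking where a signal starts) is an assumption inferred from that table.

.. code-block:: python

  import csv
  import io
  from collections import Counter

  # Rows copied from the receptor-level example table, written as
  # tab-separated text so the sketch runs without a LIgO output folder.
  SAMPLE_TSV = "\n".join([
      "\t".join(["v_call", "j_call", "junction_aa", "signal1", "signal2",
                 "signal1_position", "signal2_position"]),
      "\t".join(["TRBV10-1*01", "TRBJ2-5*01", "CARPDRGGGYTF", "0", "1",
                 "m000000000000", "m000000100000"]),
      "\t".join(["TRBV7-2*02", "TRBJ2-5*01", "CASSRGHFQETQYF", "1", "0",
                 "m01000000000000", "m00000000000000"]),
      "\t".join(["TRBV7-8*01", "TRBJ2-3*01", "CASSSPGGVRIYSTDTQYF", "1", "0",
                 "m0100000000000000000", "m0000000000000000000"]),
  ]) + "\n"


  def read_rows(handle):
      """Read a tab-separated table into a list of dicts keyed by column name."""
      return list(csv.DictReader(handle, delimiter="\t"))


  def count_signals(rows, signal_names=("signal1", "signal2")):
      """Count the receptors flagged for each signal.

      Assumption: the signal columns hold 0 or 1, as in the example table.
      """
      counts = Counter()
      for row in rows:
          for name in signal_names:
              counts[name] += int(row[name])
      return dict(counts)


  def signal_starts(mask):
      """Return the 0-based indices marked with '1' in a position string.

      Assumption: the string is an 'm' prefix followed by one digit per residue.
      """
      digits = mask[1:] if mask.startswith("m") else mask
      return [index for index, flag in enumerate(digits) if flag == "1"]


  if __name__ == "__main__":
      rows = read_rows(io.StringIO(SAMPLE_TSV))
      print(count_signals(rows))
      for row in rows:
          print(row["junction_aa"],
                signal_starts(row["signal1_position"]),
                signal_starts(row["signal2_position"]))

To apply the same helpers to real output, open an exported table (for example, the ``batch1.tsv`` file of the receptor-level quickstart) and pass the file handle to ``read_rows`` instead of the in-memory sample.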