immuneML data model

The immuneML data model supports three types of datasets that can be used for analyses:

  1. Repertoire dataset (RepertoireDataset) - one example in the dataset is one repertoire, typically coming from one subject.

  2. Receptor dataset (ReceptorDataset) - one example is one receptor with both chains set.

  3. Sequence dataset (SequenceDataset) - one example is one receptor sequence with single-chain information.

Useful functions in the dataset classes include retrieving metadata from a RepertoireDataset with the get_metadata function, obtaining the number of examples in the dataset, checking the possible labels, and making subsets.
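To make these operations concrete, here is a minimal, self-contained sketch of a repertoire-style dataset. It is a toy stand-in and does not import immuneML: the class ToyRepertoireDataset and the method names get_example_count, get_label_names and make_subset are assumptions chosen for illustration (only get_metadata is named in the text above), and the real signatures may differ, so consult the immuneML source before relying on them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepertoireDataset:
    """Illustrative stand-in, NOT the immuneML RepertoireDataset class."""

    # One metadata row per repertoire, i.e. per example in the dataset.
    metadata: list = field(default_factory=list)
    # Label name -> list of possible values for that label.
    labels: dict = field(default_factory=dict)

    def get_metadata(self, field_names):
        """Return the requested metadata fields as {field: [value per example]}."""
        return {name: [row[name] for row in self.metadata] for name in field_names}

    def get_example_count(self):
        """Number of examples (repertoires) in the dataset (hypothetical name)."""
        return len(self.metadata)

    def get_label_names(self):
        """Names of the labels available for this dataset (hypothetical name)."""
        return list(self.labels)

    def make_subset(self, example_indices):
        """Build a new dataset containing only the selected examples (hypothetical name)."""
        return ToyRepertoireDataset(
            metadata=[self.metadata[i] for i in example_indices],
            labels=dict(self.labels),
        )


# Toy data: three repertoires, one per subject, with a single binary label.
dataset = ToyRepertoireDataset(
    metadata=[
        {"subject_id": "s1", "disease": True},
        {"subject_id": "s2", "disease": False},
        {"subject_id": "s3", "disease": True},
    ],
    labels={"disease": [True, False]},
)

# Keep only the first and third repertoire.
subset = dataset.make_subset([0, 2])
```

Returning a new dataset object from make_subset, rather than filtering in place, keeps the original dataset intact, which is convenient when the same data is split several times (for example into training and test sets).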

The UML diagram below shows these classes and their underlying dependencies.


UML diagram showing the immuneML data model, where white classes are abstract and define the interface only, while green are concrete and used throughout the codebase.

Implementation details for ReceptorDataset and SequenceDataset are available in ElementDataset.
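The relationship between an abstract interface, a shared implementation, and the two element-based dataset types can be sketched as follows. This is a simplified illustration under stated assumptions, not the immuneML code: the Toy-prefixed classes, their method names, and the toy amino acid strings are all hypothetical, and the actual class hierarchy may contain more levels or different members.

```python
from abc import ABC, abstractmethod


class ToyDataset(ABC):
    """Abstract class: defines the interface only and cannot be instantiated."""

    @abstractmethod
    def get_example_count(self):
        ...

    @abstractmethod
    def make_subset(self, example_indices):
        ...


class ToyElementDataset(ToyDataset):
    """Shared implementation for datasets whose examples are single elements."""

    def __init__(self, elements):
        self.elements = list(elements)

    def get_example_count(self):
        return len(self.elements)

    def make_subset(self, example_indices):
        # type(self) preserves the concrete subclass of the original dataset.
        return type(self)(self.elements[i] for i in example_indices)


class ToyReceptorDataset(ToyElementDataset):
    """One example is one receptor with both chains set."""


class ToySequenceDataset(ToyElementDataset):
    """One example is one receptor sequence with single-chain information."""


# Toy examples: receptors as (chain, chain) pairs, sequences as single strings.
receptors = ToyReceptorDataset([("CASSL", "CAVRD"), ("CASSP", "CAVNT")])
sequences = ToySequenceDataset(["CASSL", "CASSP", "CASSQ"])
sequence_subset = sequences.make_subset([0, 2])
```

Because both subclasses inherit their behavior from the shared parent, they differ only in what a single example represents, which mirrors why the implementation details live in one place.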