immuneML.data_model.encoded_data package


immuneML.data_model.encoded_data.EncodedData module

class immuneML.data_model.encoded_data.EncodedData.EncodedData(examples, labels: Optional[dict] = None, example_ids: Optional[list] = None, feature_names: Optional[list] = None, feature_annotations: Optional[pandas.DataFrame] = None, encoding: Optional[str] = None, info: Optional[dict] = None)

Bases: object

When a dataset is encoded, the result is stored in an object of the EncodedData class. Its main attributes are:

  • examples – a matrix of example_count x feature_count elements (a numpy array or a sparse matrix). Most encodings follow this two-dimensional format, but there are exceptions: for instance, immuneML.encodings.onehot.OneHotEncoder.OneHotEncoder produces a numpy array with more than two dimensions.

  • feature_names – a list of feature names with feature_count elements

  • feature_annotations – a data frame consisting of annotations for each unique feature

  • example_ids – a list of example (repertoire/sequence/receptor) IDs; its length must equal example_count, the number of examples in the examples matrix

  • labels – a dict of labels where label names are keys and the values are lists of values for the label across examples: {label_name1: […], label_name2: […]}. Each list associated with a label has to have values for all examples.
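The size constraints listed above can be checked with a short sketch. Note that check_encoded_data is a hypothetical helper, not part of the immuneML API: it takes the fields as plain arguments instead of an EncodedData object and relies only on NumPy. The example IDs, feature names and the "CMV" label are made-up values for illustration.

```python
import numpy as np


def check_encoded_data(examples, example_ids=None, feature_names=None, labels=None):
    """Hypothetical helper: verify the size constraints described for EncodedData fields.

    Returns example_count (the size of the first dimension of examples) if all checks pass.
    """
    example_count = examples.shape[0]

    # example_ids must contain one ID per example
    if example_ids is not None and len(example_ids) != example_count:
        raise ValueError(f"expected {example_count} example IDs, got {len(example_ids)}")

    # feature_names must have feature_count entries; this check only applies to the
    # two-dimensional case (encodings such as one-hot may produce higher-dimensional arrays)
    if feature_names is not None and examples.ndim == 2 and len(feature_names) != examples.shape[1]:
        raise ValueError(f"expected {examples.shape[1]} feature names, got {len(feature_names)}")

    # every label must provide a value for each example
    if labels is not None:
        for name, values in labels.items():
            if len(values) != example_count:
                raise ValueError(f"label '{name}' has {len(values)} values, expected {example_count}")

    return example_count


# 3 examples (e.g. repertoires) x 4 features; all values are illustrative only
examples = np.array([[1, 0, 2, 0],
                     [0, 3, 1, 1],
                     [2, 2, 0, 0]])
example_ids = ["rep1", "rep2", "rep3"]
feature_names = ["CAS", "ASS", "SSL", "SLG"]
labels = {"CMV": [True, False, True]}

print(check_encoded_data(examples, example_ids, feature_names, labels))
```

Because the helper only reads examples.shape and examples.ndim, the same checks should also work for a scipy.sparse matrix, which exposes both attributes.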

Module contents