immuneML.analysis.data_manipulation package
Submodules
immuneML.analysis.data_manipulation.DataReshaper module
- class immuneML.analysis.data_manipulation.DataReshaper.DataReshaper
Bases: object
- static reshape(dataset: Dataset, labels=None)
Takes the 2D matrix of values from the encoded data and reshapes it to long format, retaining the column and row annotations, which makes the data easier to plot. Filter the data first where possible: the resulting data frame has shape (matrix.shape[0] * matrix.shape[1], labels.shape[0] + feature_annotations.shape[1] + 1), so memory usage grows with the product of the number of examples and the number of features.
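As a rough illustration of the long format described above, the following sketch reshapes a small matrix in plain Python. It is not the immuneML implementation: the helper name to_long_format, the input structures, and the "value" column name are all assumptions made for the example.

```python
def to_long_format(matrix, labels, feature_annotations):
    """Hypothetical helper (not part of immuneML).

    matrix: list of rows, one per example, one value per feature.
    labels: dict mapping label name -> list with one value per example.
    feature_annotations: list of dicts, one per feature (column).
    """
    rows = []
    for i, example_values in enumerate(matrix):
        for j, value in enumerate(example_values):
            # Row annotations: the label values of example i.
            row = {name: values[i] for name, values in labels.items()}
            # Column annotations: the annotations of feature j.
            row.update(feature_annotations[j])
            # The single matrix cell this long-format row represents.
            row["value"] = value
            rows.append(row)
    return rows


matrix = [[0.1, 0.9, 0.0],
          [0.4, 0.2, 0.4]]
labels = {"disease": ["yes", "no"]}
feature_annotations = [{"feature": "AAA"}, {"feature": "AAC"}, {"feature": "ACA"}]

long_rows = to_long_format(matrix, labels, feature_annotations)
```

Here the 2 x 3 matrix yields 6 rows, each with 1 label column, 1 feature annotation column, and 1 value column, which matches the shape formula above and shows why an unfiltered matrix can become very large.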
immuneML.analysis.data_manipulation.NormalizationType module
- class immuneML.analysis.data_manipulation.NormalizationType.NormalizationType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Different normalization types for vectors.
RELATIVE_FREQUENCY: Each value is divided by the sum of all values (L1 normalization).
L2: Each value is divided by the L2 (Euclidean) norm of the vector, which keeps the direction of the vector and discards its magnitude.
MAX: Each value is divided by the maximum value in the vector.
BINARY: Each value is set to 1 if it is greater than 0, otherwise it is set to 0.
NONE: No normalization is applied.
Used in encodings such as GeneFrequencyEncoder and KmerFrequencyEncoder.
- BINARY = 'binary'
- L2 = 'l2'
- MAX = 'max'
- NONE = 'none'
- RELATIVE_FREQUENCY = 'l1'
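The arithmetic behind each normalization type described above can be sketched in plain Python. This is a minimal illustration, not how the immuneML encoders apply normalization: the normalize function is a hypothetical helper, it takes the enum's string values rather than the enum members, and leaving the vector unchanged when the divisor is zero is an assumption of the sketch.

```python
import math


def normalize(vector, normalization_type):
    """Return a normalized copy of `vector` (hypothetical helper, not immuneML API)."""
    if normalization_type == "l1":  # RELATIVE_FREQUENCY: divide by the sum
        denominator = sum(vector)
    elif normalization_type == "l2":  # divide by the Euclidean norm
        denominator = math.sqrt(sum(v * v for v in vector))
    elif normalization_type == "max":  # divide by the largest value
        denominator = max(vector)
    elif normalization_type == "binary":  # 1 if the value is positive, else 0
        return [1 if v > 0 else 0 for v in vector]
    elif normalization_type == "none":  # no normalization
        return list(vector)
    else:
        raise ValueError(f"Unknown normalization type: {normalization_type}")
    # Assumption of this sketch: leave the vector unchanged if the divisor is zero.
    if denominator == 0:
        return list(vector)
    return [v / denominator for v in vector]


counts = [1.0, 3.0, 0.0]
relative = normalize(counts, "l1")    # [0.25, 0.75, 0.0]
binary = normalize(counts, "binary")  # [1, 1, 0]
unit = normalize([3.0, 4.0], "l2")    # vector with Euclidean length 1
```

For non-negative values such as counts, dividing by the sum is the same as dividing by the L1 norm, which is why RELATIVE_FREQUENCY carries the value 'l1'.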