Decomposition¶

Data Fusion by Matrix Factorization (DFMF)¶

class imml.decomposition.DFMF(n_components: int = 10, max_iter: int = 100, init_type: str | list = 'random_c', n_run: int = 1, stopping=None, stopping_system=None, verbose=0, compute_err=False, callback=None, random_state: int = None, n_jobs=1, fill_value=0)¶

Bases: TransformerMixin, BaseEstimator

Data Fusion by Matrix Factorization (DFMF). [1] [2]

DFMF is a data fusion approach based on penalized matrix tri-factorization that simultaneously factorizes multiple data matrices to reveal hidden associations.
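As a sketch of what tri-factorization means here, following the general formulation in the DFMF literature: each relation matrix between object types i and j is approximated by a product of three factors. The symbols below are illustrative notation and are not attribute names of this class.

```latex
R_{ij} \approx G_i \, S_{ij} \, G_j^{\top}, \qquad G_i \ge 0,\; G_j \ge 0
```

Because the factor G_i is shared by every relation matrix that involves object type i, the matrices are factorized jointly rather than one at a time.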

This method can handle both block-wise and single-wise missing data.
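The two missing-data patterns can be illustrated with a small, library-independent sketch. It assumes that "block-wise" means a whole modality is missing for a sample and "single-wise" means only some features are missing; the helper name `missing_pattern` is hypothetical and not part of imml.

```python
import math


def missing_pattern(sample_mods):
    """Label each modality of one sample as 'observed', 'single-wise' or 'block-wise'.

    sample_mods: list of feature lists, one per modality; NaN marks a missing value.
    """
    labels = []
    for feats in sample_mods:
        n_missing = sum(math.isnan(v) for v in feats)
        if n_missing == 0:
            labels.append("observed")
        elif n_missing == len(feats):
            # Every feature of this modality is missing for the sample.
            labels.append("block-wise")
        else:
            # Only some features are missing.
            labels.append("single-wise")
    return labels


nan = float("nan")
sample_a = [[1.0, 2.0], [0.5, nan, 0.1]]   # second modality partially missing
sample_b = [[nan, nan], [0.3, 0.2, 0.9]]   # first modality entirely missing
patterns = [missing_pattern(s) for s in (sample_a, sample_b)]
```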

Parameters:

n_components (int, default=10) -- Number of components to keep.
max_iter (int, default=100) -- Maximum number of iterations to perform.
init_type (str or list of str, default='random_c') -- The algorithm used to initialize the latent matrix factors. Options are 'random', 'random_c' and 'random_vcol'. If a list is given, its first item is used for fit and its second for transform.
n_run (int, default=1) -- Number of runs of the factorization algorithm.
stopping (tuple (target_matrix, eps), default=None) -- Terminate iteration if the reconstruction error of the target matrix improves by less than eps.
stopping_system (float, default=None) -- Terminate iteration if the reconstruction error of the fused system improves by less than this value. compute_err must be set to True so that the error of the fused system is computed.
compute_err (bool, default=False) -- Compute the reconstruction error of every relation matrix if True.
callback (callable, default=None) -- An optional user-supplied function to call after each iteration. Called as callback(G, S, cur_iter), where S and G are current latent estimates.
fill_value (float, default=0) -- Value to use to initially fill missing values.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
verbose (int, default=0) -- Verbosity mode.
n_jobs (int, default=1) -- Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
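The interplay between a per-iteration callback and an improvement-based stopping rule can be sketched without the library. This is a toy loop under stated assumptions: the error sequence is made up, `run_iterations` and `record` are hypothetical names, the G and S arguments are placeholders, and the iteration counter is assumed to start at 0. It is not the DFMF implementation.

```python
def run_iterations(errors, eps, max_iter=100, callback=None):
    """Toy loop: stop once the error improves by less than eps.

    errors: hypothetical reconstruction errors, one per iteration.
    Returns the number of iterations performed.
    """
    prev = None
    n_done = 0
    for cur_iter, err in enumerate(errors[:max_iter]):
        n_done = cur_iter + 1
        if callback is not None:
            # Placeholders stand in for the current latent estimates G and S.
            callback(None, None, cur_iter)
        if prev is not None and prev - err < eps:
            break
        prev = err
    return n_done


seen = []


def record(G, S, cur_iter):
    """Callback with the (G, S, cur_iter) call shape described above."""
    seen.append(cur_iter)


n_done = run_iterations([10.0, 6.0, 5.0, 4.95, 4.94], eps=0.1, callback=record)
```

In this sketch the loop stops at the fourth iteration, because the error only drops from 5.0 to 4.95 there, which is less than eps.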

fuser_¶

Model.

Type:: Dfmf object

transformer_¶

Object for transforming unseen data.

Type:: DfmfTransform object

t_¶

Type:: fusion.ObjectType

ts_¶

Type:: list of fusion.ObjectType

References

Joint Non-Negative Matrix Factorization (JNMF)¶

class imml.decomposition.JNMF(n_components: int = 10, init_W=None, init_V=None, init_H=None, l1_W: float = 1e-10, l1_V: float = 1e-10, l1_H: float = 1e-10, l2_W: float = 1e-10, l2_V: float = 1e-10, l2_H: float = 1e-10, weights=None, beta_loss: list = None, p: float = 1.0, tol: float = 1e-10, max_iter: int = 100, verbose=0, random_state: int = None, engine: str = 'r')¶

Bases: TransformerMixin, BaseEstimator

Joint Non-Negative Matrix Factorization (JNMF). [3] [4] [5] [6] [7] [8] [9] [10]

JNMF decomposes the input matrices into low-dimensional factor matrices.

It can handle both modality-wise and feature-wise missing data.
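How the factor matrices might combine can be sketched in plain Python. The sketch assumes the common joint NMF form in which each modality k is approximated as (W + V_k) H_k^T, with W shared across modalities and V_k, H_k specific to modality k. That form is an assumption inferred from the documented shapes, and the helper names are hypothetical.

```python
def matmul_t(A, B):
    """Return A @ B.T for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row_a, row_b)) for row_b in B] for row_a in A]


def reconstruct(W, V_k, H_k):
    """Approximate one modality as (W + V_k) @ H_k.T (assumed model form)."""
    shared_plus_specific = [
        [w + v for w, v in zip(row_w, row_v)] for row_w, row_v in zip(W, V_k)
    ]
    return matmul_t(shared_plus_specific, H_k)


# 2 samples, 1 component, 3 features in this modality.
W = [[1.0], [2.0]]             # shared factors, shape (n_samples, n_components)
V_k = [[0.5], [0.0]]           # modality-specific factors, same shape as W
H_k = [[1.0], [2.0], [0.0]]    # loadings, shape (n_features_k, n_components)
X_hat = reconstruct(W, V_k, H_k)
```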

Parameters:

n_components (int, default=10) -- Number of components to keep.
init_W (array-like, default=None) -- The initial values of the factor matrix W, which has n_samples rows and n_components columns.
init_V (array-like, default=None) -- A list containing the initial values of the factor matrices V, one per modality.
init_H (array-like, default=None) -- A list containing the initial values of the factor matrices H, one per modality.
l1_W (float, default=1e-10) -- Parameter for L1 regularization. It also acts as a small positive constant that prevents division by zero, so it should not be set to 0.
l1_V (float, default=1e-10) -- Parameter for L1 regularization. It also acts as a small positive constant that prevents division by zero, so it should not be set to 0.
l1_H (float, default=1e-10) -- Parameter for L1 regularization. It also acts as a small positive constant that prevents division by zero, so it should not be set to 0.
l2_W (float, default=1e-10) -- Parameter for L2 regularization.
l2_V (float, default=1e-10) -- Parameter for L2 regularization.
l2_H (float, default=1e-10) -- Parameter for L2 regularization.
weights (list, default=None) -- Weight vector.
beta_loss (str, default='Frobenius') -- Loss function to use. One of ["Frobenius", "KL", "IS", "PLTF"].
p (float, default=1.0) -- The parameter of Probabilistic Latent Tensor Factorization (p=0: Frobenius, p=1: KL, p=2: IS).
tol (float, default=1e-10) -- Tolerance of the stopping condition.
max_iter (int, default=100) -- Maximum number of iterations to perform.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
verbose (int, default=0) -- Verbosity mode.
engine (str, default='r') -- Engine to use for computing the model. Currently only 'r' is supported.
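The correspondence between p and the three named losses can be checked with a scalar sketch. It assumes the standard power-parameterized divergence family, in which p=0 gives half the squared error, p=1 the Kullback-Leibler divergence and p=2 the Itakura-Saito divergence. The function name `pltf_divergence` is hypothetical, and the scaling used by the underlying engine may differ.

```python
import math


def pltf_divergence(x, y, p):
    """Divergence between a positive data value x and its reconstruction y."""
    if p == 1:
        # Kullback-Leibler (generalized form for non-normalized values).
        return x * math.log(x / y) - x + y
    if p == 2:
        # Itakura-Saito.
        return x / y - math.log(x / y) - 1
    # General case; p = 0 reduces to (x - y) ** 2 / 2.
    return (
        (x ** (2 - p) - y ** (2 - p)) / ((1 - p) * (2 - p))
        - (x - y) * y ** (1 - p) / (1 - p)
    )


frobenius_like = pltf_divergence(3.0, 1.0, 0)    # half squared error
kl = pltf_divergence(3.0, 1.0, 1)
itakura_saito = pltf_divergence(3.0, 1.0, 2)
```

Each divergence is zero when x equals y and positive otherwise, which is what makes it usable as a reconstruction loss.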

H_¶

List of modality-specific factor matrices H.

Type:: list of n_mods array-like objects of shape (n_features_i, n_components)

V_¶

List of modality-specific factor matrices V.

Type:: list of n_mods array-like objects of shape (n_samples, n_components)

reconstruction_err_¶

Beta-divergence between the training data X and the reconstructed data WH from the fitted model.

Type:: list of float

observed_reconstruction_err_¶

Beta-divergence between the observed values and the reconstructed data WH from the fitted model.

Type:: list of float

missing_reconstruction_err_¶

Beta-divergence between the missing values and the reconstructed data WH from the fitted model.

Type:: list of float

relchange_¶

The relative change of the error.

Type:: list of float
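A relative-change trace and a tolerance check can be sketched as follows. The definition used here, |e_prev - e_cur| / e_cur, is one common choice and is an assumption; the exact formula used by the engine is not stated in this reference, and both function names are hypothetical.

```python
def relative_changes(errors):
    """Relative change between consecutive errors: |e_prev - e_cur| / e_cur."""
    return [abs(prev - cur) / cur for prev, cur in zip(errors, errors[1:])]


def first_converged(errors, tol):
    """Index into errors of the first iteration whose relative change is <= tol."""
    for i, change in enumerate(relative_changes(errors), start=1):
        if change <= tol:
            return i
    return None


errors = [8.0, 4.0, 2.0, 2.0]          # made-up reconstruction errors
changes = relative_changes(errors)
stop_at = first_converged(errors, tol=1e-10)
```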

References

Multi-Omics Factor Analysis (MOFA)¶

class imml.decomposition.MOFA(n_components: int = 10, impute: bool = True, data_options: dict = None, data_matrix: dict = None, model_options: dict = None, train_options: dict = None, stochastic_options: dict = None, covariates: dict = None, smooth_options: dict = None, random_state: int = None, verbose=False)¶

Bases: TransformerMixin, BaseEstimator

Multi-Omics Factor Analysis (MOFA). [11] [12] [13]

MOFA is a factor analysis model that provides a general framework for the unsupervised integration of incomplete multi-modal datasets (originally, multi-omics datasets). Intuitively, MOFA can be viewed as a versatile and statistically rigorous generalization of principal component analysis to multi-modal data. Given several data matrices with measurements of multiple data types on the same or on overlapping sets of samples, MOFA infers an interpretable low-dimensional representation in terms of a few latent factors.

It can handle both modality-wise and feature-wise missing data.

Parameters:

n_components (int, default=10) -- Number of components to keep.
impute (bool, default=True) -- True if missing values should be imputed.
data_options (dict, default=None) -- Data processing options, such as scale_views and scale_groups.
data_matrix (dict, default=None) -- Keys such as likelihoods, view_names, etc.
model_options (dict, default=None) -- Model options, such as ard_factors or ard_weights.
train_options (dict, default=None) -- Keys such as iter, tolerance.
stochastic_options (dict, default=None) -- Stochastic variational inference options, such as learning rate or batch size.
covariates (dict, default=None) -- Slot to store sample covariates for training in MEFISTO. Keys are sample_cov and covariates_names.
smooth_options (dict, default=None) -- Options for smooth inference, such as scale_cov or model_groups.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
verbose (bool, default=False) -- Verbosity mode.
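The option dictionaries could be assembled along these lines. The keys are the ones named in the parameter descriptions above; every value is hypothetical and chosen only for illustration, and how the keys are forwarded to the underlying library is an assumption.

```python
# Keys come from the parameter descriptions; values are hypothetical.
data_options = {"scale_views": True, "scale_groups": False}
model_options = {"ard_factors": True, "ard_weights": True}
train_options = {"iter": 500, "tolerance": 0.01}

# Keyword arguments matching the documented constructor signature.
mofa_kwargs = {
    "n_components": 5,
    "data_options": data_options,
    "model_options": model_options,
    "train_options": train_options,
    "random_state": 42,
}
```

With imml installed, these would presumably be passed as MOFA(**mofa_kwargs); options left as None keep the defaults.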

mofa_¶

Entry point to the original library. This can be used for data analysis and explainability.

Type:: mofa object

factors_¶

Factors computed by the model.

Type:: array-like of shape (n_samples, n_components)

weights_¶

Weights of the MOFA model.

Type:: list of n_mods array-like objects of shape (n_features_i, n_components)
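Given the documented shapes, a modality can be approximated from the factors and its weights as Z W_i^T, which is the usual factor analysis reconstruction. The sketch below uses that product to fill missing entries. It is plain Python with made-up numbers and hypothetical helper names, and it is not a description of how the impute option is implemented.

```python
import math


def reconstruct(Z, W_i):
    """Return Z @ W_i.T for matrices stored as lists of rows."""
    return [[sum(z * w for z, w in zip(z_row, w_row)) for w_row in W_i] for z_row in Z]


def fill_missing(X_i, Z, W_i):
    """Replace NaN entries of X_i with the corresponding reconstructed values."""
    X_hat = reconstruct(Z, W_i)
    return [
        [x_hat if math.isnan(x) else x for x, x_hat in zip(row, row_hat)]
        for row, row_hat in zip(X_i, X_hat)
    ]


nan = float("nan")
Z = [[1.0, 0.0], [0.0, 2.0]]                  # factors: (n_samples, n_components)
W_i = [[1.0, 1.0], [2.0, 0.0], [0.0, 3.0]]    # weights: (n_features_i, n_components)
X_i = [[1.0, nan, 0.0], [nan, 0.0, 6.0]]      # one modality with missing entries
X_filled = fill_missing(X_i, Z, W_i)
```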

References