Decomposition

Data Fusion by Matrix Factorization (DFMF)

class imml.decomposition.DFMF(n_components: int = 10, max_iter: int = 100, init_type: str | list = 'random_c', n_run: int = 1, stopping=None, stopping_system=None, verbose=0, compute_err=False, callback=None, random_state: int = None, n_jobs=1, fill_value=0)[source]

Bases: TransformerMixin, BaseEstimator

Data Fusion by Matrix Factorization (DFMF). [1] [2]

DFMF is a data fusion approach based on penalized matrix tri-factorization that simultaneously factorizes several data matrices to reveal hidden associations.

This method can handle both block-wise and single-wise missing data.
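As a rough illustration of the two missingness patterns, the sketch below builds three modalities with NumPy. It assumes that missing entries are encoded as NaN and that "block-wise" means a sample is absent from a whole modality, while "single-wise" means isolated entries are absent; neither convention is stated in this page.

```python
import numpy as np

rng = np.random.default_rng(42)
# Three modalities measured on the same 20 samples, 10 features each.
Xs = [rng.random((20, 10)) for _ in range(3)]

# Block-wise missing (assumed meaning): samples 0-4 were never measured
# in the second modality, so their entire rows are NaN there.
Xs[1][:5, :] = np.nan

# Single-wise missing (assumed meaning): isolated entries are absent
# in the first modality.
Xs[0][7, 2] = np.nan
Xs[0][12, 5] = np.nan

# Count fully missing rows and isolated missing entries.
block_missing_rows = int(np.isnan(Xs[1]).all(axis=1).sum())
single_missing_entries = int(np.isnan(Xs[0]).sum())
```

The third modality is left complete, so a list like this mixes all three situations a fusion method may encounter.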

Parameters:
  • n_components (int, default=10) -- Number of components to keep.

  • max_iter (int, default=100) -- Maximum number of iterations to perform.

  • init_type (str or list of str, default='random_c') -- The algorithm used to initialize the latent matrix factors. Options are 'random', 'random_c' and 'random_vcol'. If a list is given, its first item is used for fit and its second for transform.

  • n_run (int, default=1) -- Number of runs of the factorization algorithm.

  • stopping (tuple (target_matrix, eps), default=None) -- Terminate iteration if the reconstruction error of target matrix improves by less than eps.

  • stopping_system (float, default=None) -- Terminate iteration if the reconstruction error of the fused system improves by less than this value. compute_err must be set to True so that the error of the fused system is computed.

  • compute_err (bool, default=False) -- Compute the reconstruction error of every relation matrix if True.

  • callback (callable, default=None) -- An optional user-supplied function to call after each iteration. Called as callback(G, S, cur_iter), where S and G are current latent estimates.

  • fill_value (float, default=0) -- Value to use to initially fill missing values.

  • random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.

  • verbose (bool, default=False) -- Verbosity mode.

  • n_jobs (int, default=1) -- Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
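The stopping and stopping_system rules both describe the same idea: stop once the reconstruction error improves by less than a threshold. A minimal, library-independent sketch of that rule follows. The helper run_until_converged, its step argument and the callback(cur_iter, err) signature are hypothetical names introduced for illustration; they are not part of DFMF, whose callback is called as callback(G, S, cur_iter).

```python
def run_until_converged(step, max_iter=100, eps=1e-4, callback=None):
    """Call step() until the error improves by less than eps or max_iter is hit.

    step is a hypothetical function that performs one update and returns
    the current reconstruction error.
    """
    prev_err = float("inf")
    for cur_iter in range(1, max_iter + 1):
        err = step()
        if callback is not None:
            callback(cur_iter, err)  # hypothetical signature
        # Stop when the improvement over the previous iteration is below eps.
        if prev_err - err < eps:
            break
        prev_err = err
    return cur_iter, err


# Usage with a made-up error sequence standing in for a real factorization.
errors = iter([10.0, 5.0, 4.0, 3.9999, 3.0])
history = []
n_iter, final_err = run_until_converged(
    lambda: next(errors),
    eps=1e-3,
    callback=lambda i, e: history.append((i, e)),
)
```

Here the loop stops at the fourth iteration, because the error only drops from 4.0 to 3.9999 there.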

fuser_

Model.

Type:

Dfmf object

transformer_

Object for transforming unseen data.

Type:

DfmfTransform object

t_
Type:

fusion.ObjectType

ts_
Type:

list of fusion.ObjectType

References

See also

DFMFImputer

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.decomposition import DFMF
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = DFMF(n_components=5)
>>> transformed_Xs = transformer.fit_transform(Xs)

Joint Non-Negative Matrix Factorization (JNMF)

class imml.decomposition.JNMF(n_components: int = 10, init_W=None, init_V=None, init_H=None, l1_W: float = 1e-10, l1_V: float = 1e-10, l1_H: float = 1e-10, l2_W: float = 1e-10, l2_V: float = 1e-10, l2_H: float = 1e-10, weights=None, beta_loss: list = None, p: float = 1.0, tol: float = 1e-10, max_iter: int = 100, verbose=0, random_state: int = None, engine: str = 'r')[source]

Bases: TransformerMixin, BaseEstimator

Joint Non-Negative Matrix Factorization (JNMF). [3] [4] [5] [6] [7] [8] [9] [10]

JNMF decomposes the input matrices into low-dimensional factor matrices.

It can handle both modality-wise and feature-wise missing data.
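To make the idea of a joint factorization concrete, here is a simplified NumPy sketch in which every modality X_i is approximated by a shared factor W times a modality-specific factor H_i. It uses the classic multiplicative updates for the Frobenius loss on complete data. It is not the estimator's implementation: it ignores missing values, regularization, the V factors and the R engine, and the function name joint_nmf is an assumption made for this example.

```python
import numpy as np


def joint_nmf(Xs, n_components=5, max_iter=100, eps=1e-10, seed=0):
    """Simplified joint NMF: X_i ~= W @ H_i.T with a shared non-negative W."""
    rng = np.random.default_rng(seed)
    n_samples = Xs[0].shape[0]
    W = rng.random((n_samples, n_components))
    Hs = [rng.random((X.shape[1], n_components)) for X in Xs]
    errors = []
    for _ in range(max_iter):
        # Update the shared factor using all modalities at once.
        numer = sum(X @ H for X, H in zip(Xs, Hs))
        denom = W @ sum(H.T @ H for H in Hs) + eps
        W *= numer / denom
        # Update each modality-specific factor with W held fixed.
        for i, X in enumerate(Xs):
            Hs[i] *= (X.T @ W) / (Hs[i] @ (W.T @ W) + eps)
        # Total squared reconstruction error over all modalities.
        errors.append(sum(np.sum((X - W @ H.T) ** 2) for X, H in zip(Xs, Hs)))
    return W, Hs, errors


rng = np.random.default_rng(42)
Xs = [rng.uniform(size=(20, 10)) for _ in range(3)]
W, Hs, errors = joint_nmf(Xs, n_components=5)
```

The small constant eps in the denominators plays the same role the parameter list attributes to the L1 terms: it keeps the element-wise division well defined.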

Parameters:
  • n_components (int, default=10) -- Number of components to keep.

  • init_W (array-like, default=None) -- The initial values of factor matrix W, which has n_samples-rows and n_components-columns.

  • init_V (array-like, default=None) -- A list containing the initial values of multiple factor matrices.

  • init_H (array-like, default=None) -- A list containing the initial values of multiple factor matrices.

  • l1_W (float, default=1e-10) -- Parameter for L1 regularization. It also acts as a small positive constant that prevents division by zero, so it should not be set to 0.

  • l1_V (float, default=1e-10) -- Parameter for L1 regularization. It also acts as a small positive constant that prevents division by zero, so it should not be set to 0.

  • l1_H (float, default=1e-10) -- Parameter for L1 regularization. It also acts as a small positive constant that prevents division by zero, so it should not be set to 0.

  • l2_W (float, default=1e-10) -- Parameter for L2 regularization.

  • l2_V (float, default=1e-10) -- Parameter for L2 regularization.

  • l2_H (float, default=1e-10) -- Parameter for L2 regularization.

  • weights (list, default=None) -- Weight vector.

  • beta_loss (str, default='Frobenius') -- One of ["Frobenius", "KL", "IS", "PLTF"].

  • p (float, default=1.0) -- The parameter of Probabilistic Latent Tensor Factorization (p=0: Frobenius, p=1: KL, p=2: IS).

  • tol (float, default=1e-10) -- Tolerance of the stopping condition.

  • max_iter (int, default=100) -- Maximum number of iterations to perform.

  • random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.

  • verbose (bool, default=False) -- Verbosity mode.

  • engine (str, default='r') -- Engine to use for computing the model. Currently only 'r' is supported.
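The three named losses in beta_loss correspond to well-known divergences between a data matrix X and its reconstruction Y. The sketch below writes them out with their textbook definitions (squared Euclidean distance, generalized Kullback-Leibler and Itakura-Saito). The function name divergence and the factor 0.5 in the Frobenius case are choices made for this example; the constants and scaling used by the estimator's engine may differ.

```python
import numpy as np


def divergence(X, Y, loss="Frobenius"):
    """Textbook divergence between X and its reconstruction Y.

    KL and IS assume strictly positive entries in both X and Y.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if loss == "Frobenius":
        # Half the squared Frobenius norm of the residual.
        return 0.5 * np.sum((X - Y) ** 2)
    if loss == "KL":
        # Generalized Kullback-Leibler divergence.
        return np.sum(X * np.log(X / Y) - X + Y)
    if loss == "IS":
        # Itakura-Saito divergence.
        return np.sum(X / Y - np.log(X / Y) - 1.0)
    raise ValueError(f"unknown loss: {loss!r}")


X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = 2.0 * X
values = {name: divergence(X, Y, name) for name in ("Frobenius", "KL", "IS")}
```

All three are zero when Y equals X and positive otherwise, which is why any of them can serve as a reconstruction error.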

H_

List of modality-specific factor matrices, one per modality.

Type:

list of n_mods array-like objects of shape (n_features_i, n_components)

V_

List of modality-specific factor matrices, one per modality.

Type:

list of n_mods array-like objects of shape (n_samples, n_components)

reconstruction_err_

Beta-divergence between the training data X and the reconstructed data WH from the fitted model.

Type:

list of float

observed_reconstruction_err_

Beta-divergence between the observed values and the reconstructed data WH from the fitted model.

Type:

list of float

missing_reconstruction_err_

Beta-divergence between the missing values and the reconstructed data WH from the fitted model.

Type:

list of float

relchange_

The relative change of the error.

Type:

list of float

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.decomposition import JNMF
>>> Xs = [pd.DataFrame(np.random.default_rng(42).uniform(size=(20, 10))) for i in range(3)]
>>> transformer = JNMF(n_components=5)
>>> transformed_Xs = transformer.fit_transform(Xs)

Multi-Omics Factor Analysis (MOFA)

class imml.decomposition.MOFA(n_components: int = 10, impute: bool = True, data_options: dict = None, data_matrix: dict = None, model_options: dict = None, train_options: dict = None, stochastic_options: dict = None, covariates: dict = None, smooth_options: dict = None, random_state: int = None, verbose=False)[source]

Bases: TransformerMixin, BaseEstimator

Multi-Omics Factor Analysis (MOFA). [11] [12] [13]

MOFA is a factor analysis model that provides a general framework for the unsupervised integration of incomplete multi-modal datasets (originally, multi-omic datasets). Intuitively, MOFA can be viewed as a versatile and statistically rigorous generalization of principal component analysis to multi-modal data. Given several data matrices with measurements of multiple data types on the same or on overlapping sets of samples, MOFA infers an interpretable low-dimensional representation in terms of a few latent factors.

It can handle both modality-wise and feature-wise missing data.
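The analogy with principal component analysis can be made concrete with plain NumPy: running PCA on the column-wise concatenation of complete, centered modalities already yields one set of sample factors and one block of feature weights per modality. This is only the baseline that MOFA generalizes, not MOFA itself, since it has no probabilistic model, no sparsity priors and no handling of missing values.

```python
import numpy as np

rng = np.random.default_rng(42)
Xs = [rng.random((20, 10)) for _ in range(3)]
n_components = 5

# Center each modality and concatenate along the feature axis.
X = np.hstack([X_i - X_i.mean(axis=0) for X_i in Xs])

# PCA via singular value decomposition of the concatenated matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
factors = U[:, :n_components] * s[:n_components]  # (n_samples, n_components)
loadings = Vt[:n_components].T                    # (total_features, n_components)

# Split the loadings back into one weight matrix per modality.
split_points = np.cumsum([X_i.shape[1] for X_i in Xs])[:-1]
weights = np.split(loadings, split_points, axis=0)
```

The resulting shapes mirror the factors_ and weights_ attributes: one (n_samples, n_components) matrix and a list of (n_features_i, n_components) matrices.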

Parameters:
  • n_components (int, default=10) -- Number of components to keep.

  • impute (bool, default=True) -- True if missing values should be imputed.

  • data_options (dict, default=None) -- Data processing options, such as scale_views and scale_groups.

  • data_matrix (dict, default=None) -- Keys such as likelihoods, view_names, etc.

  • model_options (dict, default=None) -- Model options, such as ard_factors or ard_weights.

  • train_options (dict, default=None) -- Keys such as iter, tolerance.

  • stochastic_options (dict, default=None) -- Stochastic variational inference options, such as learning rate or batch size.

  • covariates (dict, default=None) -- Slot to store sample covariates for training in MEFISTO. Keys are sample_cov and covariates_names.

  • smooth_options (dict, default=None) -- Options for smooth inference, such as scale_cov or model_groups.

  • random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.

  • verbose (bool, default=False) -- Verbosity mode.
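A possible way to assemble the option dictionaries is sketched below. The key names are the ones listed in the parameter descriptions; the values are illustrative assumptions only, and the exact set of accepted keys and value types should be checked against the underlying MOFA library. The variable mofa_kwargs is a name introduced for this example.

```python
# Option dictionaries using the keys named in the parameter list.
# All values are illustrative assumptions, not recommended settings.
data_options = {"scale_views": True, "scale_groups": False}
model_options = {"ard_factors": True, "ard_weights": True}
train_options = {"iter": 500, "tolerance": 0.01}

# Keyword arguments that could then be passed to the estimator.
mofa_kwargs = dict(
    n_components=5,
    data_options=data_options,
    model_options=model_options,
    train_options=train_options,
    random_state=42,
)

# With imml installed, the estimator would presumably be built as:
# from imml.decomposition import MOFA
# transformer = MOFA(**mofa_kwargs)
```

Keeping the options in separate dictionaries mirrors the constructor signature, so each group can be changed without touching the others.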

mofa_

Entry point object, as in the original library. It can be used for data analysis and explainability.

Type:

mofa object

factors_

Factors computed by the model.

Type:

array-like of shape (n_samples, n_components)

weights_

Weights of the MOFA model.

Type:

list of n_mods array-like objects of shape (n_features_i, n_components)

References

See also

MOFAImputer

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.decomposition import MOFA
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> transformer = MOFA(n_components=5)
>>> transformed_Xs = transformer.fit_transform(Xs)