Cluster¶

Doubly Aligned Incomplete Multi-view Clustering (DAIMC)¶

class imml.cluster.DAIMC(n_clusters: int = 8, alpha: float = 1, beta: float = 1, random_state: int = None, engine: str = 'python', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

Doubly Aligned Incomplete Multi-view Clustering (DAIMC). [1] [2] [3]

The DAIMC algorithm integrates weighted semi-nonnegative matrix factorization (semi-NMF) to address incomplete multi-view clustering challenges. It leverages instance alignment information to learn a unified latent feature matrix across views and employs L2,1-Norm regularized regression to establish a consensus basis matrix, minimizing the impact of missing instances.
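The L2,1 norm mentioned above can be illustrated with a few lines of NumPy. This is a minimal sketch, not the library's implementation: the helper name `l21_norm` is hypothetical, and it assumes the common row-wise convention (the sum of the Euclidean norms of the rows). Penalizing this quantity drives whole rows of a coefficient matrix toward zero.

```python
import numpy as np

def l21_norm(B):
    """Sum of the Euclidean norms of the rows of B (row-wise convention assumed)."""
    B = np.asarray(B, dtype=float)
    return float(np.sqrt((B ** 2).sum(axis=1)).sum())

B = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
print(l21_norm(B))  # row norms are 5, 0 and 1
```

A row that is entirely zero contributes nothing to the norm, which is why the penalty favors row-sparse solutions.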

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
alpha (float, default=1) -- Nonnegative value.
beta (float, default=1) -- Define the trade-off between sparsity and accuracy of regression for the i-th modality.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=python) -- Engine to use for computing the model. Current options are 'matlab' or 'python'.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Common latent feature matrix to be used as input for the KMeans clustering step.

Type:: array-like of shape (n_samples, n_clusters)

U_¶

Basis matrices.

Type:: list of n_mods array-like of shape (n_features_i, n_clusters)

B_¶

Regression coefficient matrices.

Type:: list of n_mods array-like of shape (n_features_i, n_clusters)

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import DAIMC
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = DAIMC(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

Fitted estimator.
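The expected layout of Xs can be sketched with NumPy alone. The example below builds three modalities that share the same samples and marks some samples as unobserved. It assumes that a sample missing from a modality is encoded as an all-NaN row; this encoding is an assumption made for illustration and should be checked against the library's data conventions. The `observed` mask is a hypothetical helper, not part of the API.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 6

# Three modalities with the same samples but different numbers of features.
Xs = [rng.random((n_samples, n_features)) for n_features in (4, 3, 5)]

# Assumption: a sample that is missing from a modality is an all-NaN row.
Xs[0][[1, 4]] = np.nan
Xs[2][[0]] = np.nan

# observed[i, m] is True when sample i is present in modality m.
observed = np.column_stack([~np.isnan(X).all(axis=1) for X in Xs])
print(observed.sum(axis=0))  # number of observed samples per modality
```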

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-likes objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → DAIMC¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
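The four request values can be summarized with a toy model. The function below is a simplified sketch of the behavior described above, not scikit-learn's routing code; the name `route` and its arguments are hypothetical.

```python
def route(request, name, provided):
    """Toy model of how a meta-estimator treats one metadata item.

    request  -- True, False, None, or a str alias
    name     -- parameter name on the sub-estimator, e.g. "Xs"
    provided -- dict of metadata the user passed to the meta-estimator
    """
    if request is True:
        # Requested: forwarded if the user supplied it, otherwise ignored.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never forwarded.
        return {}
    if request is None:
        # Not requested, and supplying it is an error.
        if name in provided:
            raise ValueError(f"{name!r} was provided but not requested")
        return {}
    # String alias: the value is looked up under the alias instead.
    return {name: provided[request]} if request in provided else {}

print(route(True, "Xs", {"Xs": "data"}))
print(route("views", "Xs", {"views": "data"}))
```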

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → DAIMC¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Efficient and Effective Incomplete Multi-view Clustering (EE-IMVC)¶

class imml.cluster.EEIMVC(n_clusters: int = 8, kernel: callable = None, lambda_reg: float = 1.0, qnorm: float = 2.0, random_state: int = None, engine: str = 'python', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

Efficient and Effective Incomplete Multi-view Clustering (EE-IMVC). [4] [5]

EE-IMVC imputes missing views with a consensus clustering matrix that is regularized with prior knowledge.

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
kernel (callable, default=None) -- Specifies the kernel type to be used in the algorithm. By default, a dot product kernel is used.
lambda_reg (float, default=1.) -- Regularization parameter. The algorithm demonstrated stable performance across a wide range of this hyperparameter.
qnorm (float, default=2.) -- Regularization parameter. The algorithm demonstrated stable performance across a wide range of this hyperparameter.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=python) -- Engine to use for computing the model. Current options are 'matlab' or 'python'.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Consensus clustering matrix to be used as input for the KMeans clustering step.

Type:: array-like of shape (n_samples, n_clusters)

WP_¶

p-th permutation matrix.

Type:: array-like of shape (n_clusters, n_clusters, n_mods)

HP_¶

Missing part of the p-th base clustering matrix.

Type:: array-like of shape (n_samples, n_clusters, n_mods)

beta_¶

Adaptive weights of clustering matrices.

Type:: array-like of shape (n_mods,)

loss_¶

Values of the loss function.

Type:: array-like of shape (n_iter_,)

n_iter_¶

Number of iterations.

Type:: int

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import EEIMVC
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = EEIMVC(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-likes objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → EEIMVC¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → EEIMVC¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Incomplete Multiview Spectral Clustering With Adaptive Graph Learning (IMSCAGL)¶

class imml.cluster.IMSCAGL(n_clusters: int = 8, lambda1: float = 0.1, lambda2: float = 1000, lambda3: float = 100, k: int = 5, neighbor_mode: str = 'KNN', weight_mode: str = 'Binary', max_iter: int = 100, miu: float = 0.01, rho: float = 1.1, random_state: int = None, engine: str = 'matlab', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

Incomplete Multiview Spectral Clustering With Adaptive Graph Learning (IMSCAGL). [6] [7] [8] [9]

IMSCAGL utilizes graph learning and spectral clustering techniques to derive a unified representation for incomplete multiview clustering.

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
lambda1 (float, default=0.1) -- Penalty parameter for learning model of the multi-modal subspace clustering.
lambda2 (float, default=1000) -- Penalty parameter for learning model of the multi-modal subspace clustering.
lambda3 (float, default=100) -- Penalty parameter for learning the consensus representation from those cluster indicator matrices of all views.
k (int, default=5) -- Parameter k of KNN graph.
neighbor_mode (str, default='KNN') -- Indicates how to construct the graph. Options are 'KNN' (default) and 'Supervised'.
weight_mode (str, default='Binary') -- Indicates how to assign weights for each edge in the graph. Options are 'Binary' (default), 'Cosine' and 'HeatKernel'.
max_iter (int, default=100) -- Maximum number of iterations.
miu (float, default=0.01) -- Constant for updating variables during the learning process.
rho (float, default=1.1) -- Constant for updating variables during the learning process.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=matlab) -- Engine to use for computing the model. Currently only 'matlab' is supported.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.
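To make the k, neighbor_mode='KNN' and weight_mode='Binary' options concrete, the sketch below builds a binary k-nearest-neighbor graph with NumPy. It is an illustration under stated assumptions, not the code the MATLAB engine runs: the function name `knn_binary_graph` is hypothetical, distances are assumed to be Euclidean, and the graph is symmetrized by keeping an edge when either endpoint selects the other.

```python
import numpy as np

def knn_binary_graph(X, k=5):
    """Symmetric 0/1 adjacency matrix of a k-nearest-neighbor graph."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of rows.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbor
    # Indices of the k closest samples for every row.
    nn = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    # Keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)

X = np.random.default_rng(0).random((20, 10))
W = knn_binary_graph(X, k=5)
print(W.shape, int(W.sum(axis=1).min()))
```

With binary weights every retained edge has weight 1; the 'Cosine' and 'HeatKernel' modes would replace that constant with a similarity value.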

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Consensus representation matrix to be used as input for the KMeans clustering step.

Type:: array-like of shape (n_samples, n_clusters)

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import IMSCAGL
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = IMSCAGL(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-likes objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → IMSCAGL¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → IMSCAGL¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Self-representation Subspace Clustering for Incomplete Multi-view Data (IMSR)¶

class imml.cluster.IMSR(n_clusters: int = 8, lbd: float = 1, gamma: float = 1, random_state: int = None, engine: str = 'python', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

Self-representation Subspace Clustering for Incomplete Multi-view Data (IMSR). [10] [11]

IMSR performs feature extraction, imputation and self-representation learning to obtain a low-rank regularized consensus coefficient matrix.

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
lbd (float, default=1) -- Positive trade-off parameter of the objective function. A value between 0 and 1 is recommended.
gamma (float, default=1) -- Positive trade-off parameter of the objective function. A value between 0 and 1 is recommended.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=python) -- Engine to use for computing the model. Current options are 'matlab' or 'python'.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Consensus clustering matrix to be used as input for the KMeans clustering step.

Type:: array-like of shape (n_samples, n_clusters)

loss_¶

Values of the loss function.

Type:: array-like of shape (n_iter_,)

n_iter_¶

Number of iterations.

Type:: int

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import IMSR
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = IMSR(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-likes objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → IMSR¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → IMSR¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Integrate Any Omics (IntegrAO)¶

class imml.cluster.IntegrAO(Xs, model: object = None, n_clusters: int = 8, neighbor_size: int = None, hidden_channels: int = 128, embedding_dims: int = 50, fusing_iteration: int = 20, mu: float = 0.5, learning_rate: float = 0.001, weight_decay: float = 0.0001, random_state: int = None)[source]¶

Bases: object

Integrate Any Omics (IntegrAO). [12] [13]

IntegrAO first combines partially overlapping sample graphs from diverse sources and then uses graph neural networks to produce unified sample embeddings.

This class provides training, validation, testing, and prediction logic compatible with the Lightning Trainer.

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities. It will be used to create the neural network architecture.
model (nn.Module, default=None) -- Deep learning model. If None, it will select IntegrAOModule.
n_clusters (int, default=8) -- The number of clusters to generate.
neighbor_size (int, default=None) -- Number of neighbors to use. If None, it will use N/6.
hidden_channels (int, default=128) -- Hidden dimension size.
embedding_dims (int, default=50) -- Size of the shared embedding space where modalities are projected.
fusing_iteration (int, default=20) -- Number of iterations for fusing.
mu (float, default=0.5) -- Normalization factor to scale similarity kernel.
learning_rate (float, default=1e-3) -- Learning rate for the optimizer.
weight_decay (float, default=1e-4) -- Weight decay used by the optimizer.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.

embedding_¶

Common latent feature matrix.

Type:: array-like of shape (n_samples, n_clusters)

cluster_model_¶

Scikit-learn SpectralClustering object.

Type:: SpectralClustering

fused_networks_¶

Modality-specific graphs.

Type:: list of array-like of shape (n_samples_i, n_samples_i)

References

Example

>>> import numpy as np
>>> import torch
>>> from imml.cluster import IntegrAO
>>> from lightning import Trainer
>>> from torch.utils.data import DataLoader
>>> from imml.load import IntegrAODataset
>>> Xs = [torch.from_numpy(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = IntegrAO(Xs=Xs, random_state=42)
>>> train_data = IntegrAODataset(Xs=Xs, neighbor_size=estimator.neighbor_size, networks=estimator.fused_networks_)
>>> train_dataloader = DataLoader(dataset=train_data)
>>> trainer = Trainer(max_epochs=2, logger=False, enable_checkpointing=False)
>>> trainer.fit(estimator, train_dataloader)
>>> labels = trainer.predict(estimator, train_dataloader)[0]

training_step(batch, batch_idx=None)[source]¶: Method required for training using Lightning Trainer.

validation_step(batch, batch_idx=None)[source]¶: Method required for validating using Lightning Trainer.

test_step(batch, batch_idx=None)[source]¶: Method required for testing using Lightning Trainer.

predict_step(batch, batch_idx=None)[source]¶: Method required for predicting using Lightning Trainer.

configure_optimizers()[source]¶: Method required for training using Lightning Trainer.

Late Fusion Incomplete Multi-View Clustering (LF-IMVC)¶

class imml.cluster.LFIMVC(n_clusters: int = 8, kernel: callable = DotProduct(sigma_0=1) + WhiteKernel(noise_level=1), lambda_reg: float = 1.0, max_iter: int = 200, random_state: int = None, engine: str = 'python', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

Late Fusion Incomplete Multi-View Clustering (LF-IMVC). [14] [15]

LF-IMVC jointly learns a consensus clustering matrix, imputes each incomplete base matrix, and optimizes the corresponding permutation matrices to integrate the incomplete clustering matrices generated by incomplete views.
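The alignment role of the permutation matrices can be sketched as an orthogonal Procrustes problem. This is an illustrative sketch only: it assumes each matrix is constrained to be orthogonal and is chosen to maximize trace(H.T @ H_p @ W), which has a closed-form solution through an SVD. The names `align`, `H`, `H_p` and `R` are hypothetical and do not come from the library.

```python
import numpy as np

def align(H, H_p):
    """Orthogonal W maximizing trace(H.T @ H_p @ W)."""
    U, _, Vt = np.linalg.svd(H_p.T @ H)
    return U @ Vt

rng = np.random.default_rng(0)
# A consensus matrix with orthonormal columns (20 samples, 3 clusters).
H, _ = np.linalg.qr(rng.standard_normal((20, 3)))
# A base matrix equal to H up to an unknown orthogonal transformation R.
R, _ = np.linalg.qr(rng.standard_normal((3, 3)))
H_p = H @ R.T

W = align(H, H_p)
print(np.allclose(H_p @ W, H))
```

In this noise-free setting the recovered W undoes R exactly, so the aligned base matrix coincides with the consensus matrix.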

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
kernel (callable, default=kernels.Sum(kernels.DotProduct(), kernels.WhiteKernel())) -- Specifies the kernel type to be used in the algorithm.
lambda_reg (float, default=1.) -- Regularization parameter. The algorithm demonstrated stable performance across a wide range of this hyperparameter.
max_iter (int, default=200) -- Maximum number of iterations.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=python) -- Engine to use for computing the model. Current options are 'matlab' or 'python'.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.
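The default kernel in the signature, DotProduct(sigma_0=1) + WhiteKernel(noise_level=1), can be written out with NumPy to show what Gram matrix it produces on a single modality. This is a sketch of the formula only (the function name `default_kernel` is hypothetical) and it does not describe how the estimator handles missing samples.

```python
import numpy as np

def default_kernel(X):
    """Gram matrix of DotProduct(sigma_0=1) + WhiteKernel(noise_level=1)."""
    X = np.asarray(X, dtype=float)
    sigma_0, noise_level = 1.0, 1.0
    dot = sigma_0 ** 2 + X @ X.T          # DotProduct: sigma_0^2 + <x, y>
    white = noise_level * np.eye(len(X))  # WhiteKernel: noise on the diagonal
    return dot + white

X = np.random.default_rng(42).random((5, 3))
K = default_kernel(X)
print(K.shape)
```

The white-noise term adds 1 to every diagonal entry, so all eigenvalues of the Gram matrix are at least 1 and the matrix is well conditioned.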

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Consensus clustering matrix to be used as input for the KMeans clustering step.

Type:: np.array

WP_¶

p-th permutation matrix.

Type:: array-like of shape (n_clusters, n_clusters, n_mods)

HP_¶

Missing part of the p-th base clustering matrix.

Type:: array-like of shape (n_samples, n_clusters, n_mods)

loss_¶

Values of the loss function.

Type:: array-like of shape (n_iter_,)

n_iter_¶

Number of iterations.

Type:: int

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import LFIMVC
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = LFIMVC(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-likes objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → LFIMVC¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → LFIMVC¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Multiple Kernel K-Means with Incomplete Kernels (MKKM-IK)¶

class imml.cluster.MKKMIK(n_clusters: int = 8, kernel_initialization: str = 'zeros', kernel: callable = None, qnorm: float = 2.0, random_state: int = None, engine: str = 'matlab', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

Multiple Kernel K-Means with Incomplete Kernels (MKKM-IK). [16] [17]

MKKM-IK integrates imputation and clustering into a single optimization procedure. Thus, the clustering result guides the missing kernel imputation, and the latter is used to conduct the subsequent clustering. Both procedures will be performed until convergence.
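The clustering half of this alternation can be sketched with NumPy. The example assumes the standard multiple kernel k-means relaxation, in which the base kernels are combined with squared weights and the relaxed partition matrix consists of the top eigenvectors of the combined kernel. The imputation step is omitted, and the function names are hypothetical rather than taken from the library.

```python
import numpy as np

def combined_kernel(Ks, gamma):
    """Weighted sum of base kernels, using squared weights (assumed MKKM form)."""
    return sum(g ** 2 * K for g, K in zip(gamma, Ks))

def relaxed_partition(K, n_clusters):
    """Top eigenvectors of K: the relaxed kernel k-means solution."""
    _, vecs = np.linalg.eigh(K)  # eigenvalues are returned in ascending order
    return vecs[:, -n_clusters:]

rng = np.random.default_rng(42)
Xs = [rng.random((20, 10)) for _ in range(3)]
Ks = [X @ X.T for X in Xs]             # dot product kernels, one per modality
gamma = np.full(len(Ks), 1 / len(Ks))  # uniform kernel weights
H = relaxed_partition(combined_kernel(Ks, gamma), n_clusters=2)
print(H.shape)
```

In the full procedure this step would alternate with re-estimating the missing kernel entries and the kernel weights until the objective stops changing.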

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
kernel (callable, default=None) -- Specifies the kernel type to be used in the algorithm. By default, a dot product kernel is used.
kernel_initialization (str, default='zeros') -- Specifies the algorithm used to initialize the kernel. It should be one of 'zeros', 'mean', 'knn', 'em' or 'laplacian'.
qnorm (float, default=2.) -- Regularization parameter. The algorithm demonstrated stable performance across a wide range of this hyperparameter.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=matlab) -- Engine to use for computing the model. The only option currently available is 'matlab'.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.
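As a rough illustration of what a 'zeros' kernel initialization could look like with the default dot product kernel, the sketch below builds one modality's kernel and leaves zeros wherever a sample is missing. It is not the MKKM-IK implementation: the helper name zero_filled_linear_kernel and the convention that a missing sample is a row containing NaN are assumptions made for this example.

```python
import numpy as np


def zero_filled_linear_kernel(X):
    """Dot-product kernel of one modality, zero wherever a sample is missing.

    Assumption: a missing sample is a row of X that contains NaN.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X).any(axis=1)        # True for samples present in this modality
    K = np.zeros((X.shape[0], X.shape[0]))
    K_obs = X[observed] @ X[observed].T        # kernel among observed samples only
    K[np.ix_(observed, observed)] = K_obs      # rows/columns of missing samples stay at 0
    return K, observed


rng = np.random.default_rng(42)
X = rng.random((6, 4))
X[[1, 4]] = np.nan                             # samples 1 and 4 are missing in this modality
K, observed = zero_filled_linear_kernel(X)
print(K.shape, observed)
```

In the algorithm described above, entries such as the zero rows and columns here would then be re-estimated as the clustering and imputation steps alternate.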

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Consensus clustering matrix to be used as input for the KMeans clustering step.

Type:: array-like of shape (n_samples, n_clusters)

gamma_¶

Kernel weights.

Type:: array-like of shape (n_mods,)

KA_¶

Kernel sub-matrix.

Type:: array-like of shape (n_samples, n_mods)

loss_¶

Values of the loss function.

Type:: array-like of shape (n_iter_,)

n_iter_¶

Number of iterations.

Type:: int

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import MKKMIK
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = MKKMIK(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-like objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-like objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → MKKMIK¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MKKMIK¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Multi Omic Clustering by Non-Exhaustive Types (MONET)¶

class imml.cluster.MONET(n_clusters: int = None, num_repeats: int = 15, similarity_mode: str = 'corr', init_modules: dict = None, iters: int = 500, num_of_seeds: int = 10, num_of_samples_in_seed: int = 10, min_mod_size: int = 10, max_sams_per_action: int = 10, percentile_remove_edge: int = None, random_state: int = None, verbose: bool = False, n_jobs: int = None)[source]¶

Bases: BaseEstimator, ClassifierMixin

Multi Omic Clustering by Non-Exhaustive Types (MONET). [18] [19]

MONET operates in two phases to extract meaningful information from multi-omics datasets. In the first phase, it constructs an edge-weighted graph for each omic, where the nodes represent individual samples and the weights indicate the similarity between samples within that omic. In the second phase, MONET detects modules by finding dense subgraphs that are shared across multiple omic graphs.

The resulting output comprises a collection of modules, each representing a subset of samples. These modules are mutually exclusive, meaning that samples are assigned to only one module. It is important to note that not all samples are necessarily assigned to a module; those remaining unassigned are referred to as "lonely" samples. Each module, denoted as M, is characterized by its constituent samples, referred to as samples(M), and the set of omics it encompasses, denoted as omics(M). Intuitively, samples(M) exhibit similarity with one another specifically within the omics(M) context.
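To make the module structure concrete, the sketch below turns a mapping of module names to modules into one label per sample. The Module dataclass is a hypothetical stand-in, not the class used by the library, and the label -1 for "lonely" samples is an assumption of this example.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical stand-in for a MONET module: samples(M) and omics(M)."""
    samples: set = field(default_factory=set)
    views: set = field(default_factory=set)


def modules_to_labels(modules, sample_ids, lonely_label=-1):
    """Map each sample to the index of its module; unassigned samples get lonely_label."""
    label_of = {}
    for label, name in enumerate(sorted(modules)):
        for sample in modules[name].samples:
            if sample in label_of:
                # Modules are mutually exclusive, so a repeated sample is an error.
                raise ValueError(f"sample {sample!r} belongs to more than one module")
            label_of[sample] = label
    return [label_of.get(sample, lonely_label) for sample in sample_ids]


modules = {
    "m0": Module(samples={"s1", "s2"}, views={0, 1}),
    "m1": Module(samples={"s4"}, views={1}),
}
labels = modules_to_labels(modules, ["s1", "s2", "s3", "s4"])
print(labels)  # → [0, 0, -1, 1]
```

Here "s3" belongs to no module, so it is reported as lonely.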

Parameters:

n_clusters (Ignored) -- Ignored.
num_repeats (int (default=15)) -- Times the algorithm will be repeated in order to avoid suboptimal (local maximum) solutions. The best solution will be returned.
similarity_mode (str (default='corr')) -- One of ['prob', 'corr']. If 'corr', the weighting scheme is computed based on correlation; if 'prob', a probabilistic formulation is used.
init_modules (dict (default=None)) -- An optional module initialization for MONET. A dict mapping module names to sample ids. All modules are initialized to cover all views. Set to None to use MONET's seed finding algorithm for initialization.
iters (int (default=500)) -- Maximal number of iterations.
num_of_seeds (int (default=10)) -- Number of seeds to create in MONET's module initialization algorithm.
num_of_samples_in_seed (int (default=10)) -- Number of samples to put in each seed created by MONET's module initialization algorithm.
min_mod_size (int (default=10)) -- Minimal size (number of samples) for a MONET module.
max_sams_per_action (int (default=10)) -- Maximal number of samples in a single MONET action (maximal number of samples added to a module or replaced between modules in a single action).
percentile_remove_edge (int (default=None)) -- Only edges with weight percentile above (for positive weights) or below (for negative weights) this percentile are kept in the graph. For example, percentile_remove_edge=90 keeps only the 10% of edges with the highest positive weight and the lowest negative weight in the graph. None keeps all edges in the graph.
random_state (int (default=None)) -- Determines the randomness. Use an int to make the randomness deterministic.
verbose (bool, default=False) -- Verbosity mode.
n_jobs (int (default=None)) -- The number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors.
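One possible reading of percentile_remove_edge is sketched below on a flat array of edge weights. The helper filter_edges is hypothetical, and treating positive and negative weights as two separate distributions is an assumption; the library may compute the thresholds differently.

```python
import numpy as np


def filter_edges(weights, percentile_remove_edge=None):
    """Zero out edges that fall short of the percentile threshold (illustrative only)."""
    w = np.asarray(weights, dtype=float)
    if percentile_remove_edge is None:
        return w.copy()                        # None keeps every edge
    kept = np.zeros_like(w)
    pos, neg = w > 0, w < 0
    if pos.any():
        # Keep the strongest positive edges.
        threshold = np.percentile(w[pos], percentile_remove_edge)
        mask = pos & (w >= threshold)
        kept[mask] = w[mask]
    if neg.any():
        # Keep the most negative edges (mirror image of the positive case).
        threshold = np.percentile(w[neg], 100 - percentile_remove_edge)
        mask = neg & (w <= threshold)
        kept[mask] = w[mask]
    return kept


weights = np.concatenate([np.arange(1, 11) / 10, -np.arange(1, 11) / 10])
kept = filter_edges(weights, percentile_remove_edge=90)
print(kept[kept != 0])
```

With ten positive and ten negative weights, a value of 90 leaves one edge of each sign, which matches the "10% of edges" description above.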

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

glob_var_¶

Module names to Module objects mapping. Every module instance includes its set of samples (under the "samples" attribute) and its set of views (the "views" attribute).

Type:: dict

total_weight_¶

Sum of the weights (similarity between samples within the module) of all modules.

Type:: float

mod_graphs_¶

Graph of each modality.

Type:: list of dataframes of shape (n_samples, n_samples)

mod_views_¶

Views used for each module.

Type:: list of length n_mods.

n_clusters_¶

Number of clusters.

Type:: int

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import MONET
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = MONET()
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-like objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples_i, n_features_i)
A list of different views.
y (array-like, shape (n_samples,)) -- Labels for each sample. Only used by supervised algorithms.

Returns:

self

Return type:

returns an instance of self.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-like objects) --

Xs length: n_mods
Xs[i] shape: (n_samples_i, n_features_i)

A list of different views.

Returns:

labels -- The predicted data.

Return type:

ndarray, shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → MONET¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → MONET¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Multi-Reconstruction Graph Convolutional Network (MRGCN)¶

class imml.cluster.MRGCN(n_clusters: int = 8, Xs=None, k_num: int = 10, learning_rate: float = 0.001, reg2: float = 1.0, reg3: float = 1.0)[source]¶

Bases: LightningModule

Multi-Reconstruction Graph Convolutional Network (MRGCN). [20] [21]

MRGCN encodes and reconstructs data and similarity relationships from multiple sources simultaneously, consolidating them into a shared latent embedding space. Additionally, MRGCN utilizes an indicator matrix to represent the presence of missing modalities, effectively merging the processing of complete and incomplete multi-modal data within a single unified framework.

Incomplete samples should be filled with 0.
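A minimal sketch of that preparation step follows, assuming missing samples are marked with NaN rows before being zero-filled. The helper zero_fill_with_indicator is hypothetical; whether MRGCNDataset builds its own indicator matrix is not stated here, so the indicator is shown only to illustrate the idea.

```python
import numpy as np


def zero_fill_with_indicator(Xs):
    """Replace missing samples by zero rows and record which samples were observed."""
    filled, observed = [], []
    for X in Xs:
        X = np.asarray(X, dtype=float)
        missing = np.isnan(X).any(axis=1)      # assumption: a missing sample is a NaN row
        X_filled = X.copy()
        X_filled[missing] = 0.0
        filled.append(X_filled)
        observed.append(~missing)
    # indicator[i, m] is 1 when sample i is available in modality m.
    indicator = np.stack(observed, axis=1).astype(int)
    return filled, indicator


rng = np.random.default_rng(42)
Xs = [rng.random((5, 3)) for _ in range(2)]
Xs[0][1] = np.nan                              # sample 1 is missing in modality 0
Xs[1][3] = np.nan                              # sample 3 is missing in modality 1
filled, indicator = zero_fill_with_indicator(Xs)
print(indicator)
```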

This class provides training, validation, testing, and prediction logic compatible with the Lightning Trainer.

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
Xs (list of array-likes objects, default=None) -- Multi-modal dataset. It will be used to create the neural network architecture.
k_num (int, default=10) -- Number of neighbors to use.
learning_rate (float, default=1e-3) -- Learning rate.
reg2 (float, default=1.) -- Trade-off parameter to control the graph structure reconstruction.
reg3 (float, default=1.) -- Trade-off parameter to control the self-supervised learning mechanism.

kmeans_¶

Scikit-learn KMeans object.

Type:: KMeans object

References

Example

>>> import numpy as np
>>> import torch
>>> from imml.cluster import MRGCN
>>> from lightning import Trainer
>>> from torch.utils.data import DataLoader
>>> from imml.load import MRGCNDataset
>>> Xs = [torch.from_numpy(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> train_data = MRGCNDataset(Xs=Xs)
>>> train_dataloader = DataLoader(dataset=train_data)
>>> trainer = Trainer(max_epochs=2, logger=False, enable_checkpointing=False)
>>> estimator = MRGCN(Xs=Xs, n_clusters=2)
>>> trainer.fit(estimator, train_dataloader)
>>> labels = trainer.predict(estimator, train_dataloader)[0]

configure_optimizers()[source]¶: Method required for training using Lightning Trainer.

training_step(batch, batch_idx=None)[source]¶: Method required for training using Lightning Trainer.

validation_step(batch, batch_idx=None)[source]¶: Method required for validating using Lightning Trainer.

test_step(batch, batch_idx=None)[source]¶: Method required for testing using Lightning Trainer.

predict_step(batch, batch_idx=None)[source]¶: Method required for predicting using Lightning Trainer.

on_fit_end()[source]¶: Method required for training using Lightning Trainer.

NEighborhood based Multi-Omics clustering (NEMO)¶

class imml.cluster.NEMO(n_clusters: int | list = 8, num_neighbors=None, num_neighbors_ratio: int = 6, metric='sqeuclidean', random_state: int = None, engine: str = 'python', verbose=False)[source]¶

Bases: BaseEstimator, ClassifierMixin

NEighborhood based Multi-Omics clustering (NEMO). [22] [23]

NEMO is a method for clustering data from multiple modalities. The algorithm operates in three main stages. First, it constructs a similarity matrix for each modality that represents the similarities between samples. Then, it merges these per-modality matrices into a single one, combining the information from all modalities. Finally, it clusters this integrated network, grouping similar samples together based on their multi-modal data patterns.
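The first two stages can be sketched in a few lines of NumPy. This is a deliberately simplified stand-in, not NEMO's actual similarity formula: it uses a binary k-nearest-neighbour affinity on squared Euclidean distances and a plain average, and the function names are hypothetical. The clustering stage is omitted; any method that accepts a precomputed affinity matrix could follow.

```python
import numpy as np


def knn_affinity(X, k):
    """Stage 1 (simplified): symmetric k-nearest-neighbour affinity for one modality."""
    X = np.asarray(X, dtype=float)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                              # a sample is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros_like(d2)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, nearest.ravel()] = 1.0
    return np.maximum(A, A.T)                                 # symmetrise


def merge_affinities(affinities):
    """Stage 2 (simplified): average the per-modality affinities."""
    return np.mean(affinities, axis=0)


rng = np.random.default_rng(42)
base = np.repeat([0.0, 10.0], 5)[:, None]      # two well-separated groups of 5 samples
Xs = [base + rng.random((10, 4)) for _ in range(3)]
merged = merge_affinities([knn_affinity(X, k=3) for X in Xs])
print(merged.shape)
```

On this toy data the merged affinity has no edges between the two groups, which is the structure the final clustering stage would pick up.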

Parameters:

n_clusters (int or list-of-int, default=8) -- The number of clusters to generate. If it is a list, the algorithm estimates the number of clusters, choosing from the range given by the list.
num_neighbors (list or int, default=None) -- The number of neighbors to use for each modality. It can be a number, a list of numbers or None. If it is a number, that number of neighbors is used for all modalities. If it is a list, the number of neighbors for each modality is taken from that list. If it is None, each modality uses the number of samples divided by num_neighbors_ratio.
num_neighbors_ratio (int, default=6) -- Ratio used to set the number of neighbors when num_neighbors is None: each modality uses the number of samples divided by num_neighbors_ratio.
metric (str or list-of-str, default="sqeuclidean") -- Distance metric to compute. Must be one of the metrics available in scipy.spatial.distance.pdist. If multiple arrays are provided, an equal number of metrics may be supplied.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default='python') -- Engine to use for computing the model. Must be one of ["python", "r"].
verbose (bool, default=False) -- Verbosity mode.
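The three accepted forms of num_neighbors can be summarised with a small helper. resolve_num_neighbors is a hypothetical function written for this illustration; in particular, rounding the division to the nearest integer is an assumption, since the description only says "divided by".

```python
def resolve_num_neighbors(num_neighbors, n_samples, n_mods, num_neighbors_ratio=6):
    """Return one neighbor count per modality (illustrative only)."""
    if num_neighbors is None:
        # Assumption: round to the nearest integer and never go below 1.
        k = max(1, round(n_samples / num_neighbors_ratio))
        return [k] * n_mods
    if isinstance(num_neighbors, int):
        return [num_neighbors] * n_mods        # same value for every modality
    num_neighbors = list(num_neighbors)
    if len(num_neighbors) != n_mods:
        raise ValueError("expected one number of neighbors per modality")
    return num_neighbors


print(resolve_num_neighbors(None, n_samples=20, n_mods=3))       # → [3, 3, 3]
print(resolve_num_neighbors(5, n_samples=20, n_mods=3))          # → [5, 5, 5]
print(resolve_num_neighbors([2, 3, 4], n_samples=20, n_mods=3))  # → [2, 3, 4]
```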

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

The final representation of the data to be used as input for the clustering step.

Type:: array-like of shape (n_samples, n_clusters)

n_clusters_¶

Final number of clusters.

Type:: int

num_neighbors_¶

Final number of neighbors.

Type:: int

affinity_matrix_¶

Affinity matrix.

Type:: np.array (n_samples, n_samples)

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import NEMO
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = NEMO(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-like objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-like objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → NEMO¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → NEMO¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Online Multi-View Clustering (OMVC)¶

class imml.cluster.OMVC(n_clusters: int = 8, max_iter: int = 200, tol: float = 0.0001, decay: float = 1, block_size: int = 50, n_pass: int = 1, random_state: int = None, engine: str = 'matlab', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

Online Multi-View Clustering (OMVC). [24] [25]

OMVC aims to learn latent feature matrices for all views while driving them towards a consensus. To enhance the robustness of these learned matrices, it incorporates lasso regularization. Additionally, to mitigate the impact of incomplete data, it introduces dynamic weight adjustment.

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
max_iter (int, default=200) -- Maximum number of iterations.
tol (float, default=1e-4) -- Tolerance of the stopping condition.
block_size (int, default=50) -- Size of the chunk.
n_pass (int, default=1) -- Number of passes.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=matlab) -- Engine to use for computing the model. The only option currently available is 'matlab'.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.
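To illustrate how block_size and n_pass interact in an online setting, the sketch below enumerates the chunks of samples that would be visited. iter_blocks is a hypothetical helper; how the MATLAB engine actually slices the data is not documented here, so contiguous blocks in the original sample order are an assumption.

```python
def iter_blocks(n_samples, block_size=50, n_pass=1):
    """Yield (pass index, sample index range) for each chunk, pass after pass."""
    for pass_idx in range(n_pass):
        for start in range(0, n_samples, block_size):
            stop = min(start + block_size, n_samples)   # the last block may be shorter
            yield pass_idx, range(start, stop)


blocks = list(iter_blocks(120, block_size=50, n_pass=2))
for pass_idx, block in blocks:
    print(pass_idx, block.start, block.stop)
```

With 120 samples and a block size of 50, each pass visits three chunks, the last one holding the remaining 20 samples.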

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Common consensus, latent feature matrix across all the views to be used as input for the KMeans clustering step.

Type:: array-like of shape (n_samples, n_clusters)

U_¶

Basis matrix.

Type:: list of n_mods array-like of shape (n_samples, n_clusters)

V_¶

Latent feature matrix.

Type:: list of n_mods array-like of shape (n_features_i, n_clusters)

loss_¶

Values of the loss function.

Type:: array-like of shape (n_iter_,)

n_iter_¶

Number of iterations.

Type:: int

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import OMVC
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = OMVC(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-like objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-like objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → OMVC¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → OMVC¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

One-Pass Incomplete Multi-View Clustering (OPIMC)¶

class imml.cluster.OPIMC(n_clusters: int = 8, alpha: float = 10, num_passes: int = 1, max_iter: int = 30, tol: float = 1e-06, block_size: int = 250, random_state: int = None, engine: str = 'matlab', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

One-Pass Incomplete Multi-View Clustering (OPIMC). [26] [27] [28]

OPIMC addresses large-scale incomplete multi-view clustering. It accounts for missing instances by combining regularized matrix factorization with weighted matrix factorization.

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
alpha (float, default=10) -- Nonnegative parameter.
max_iter (int, default=30) -- Maximum number of iterations.
tol (float, default=1e-6) -- Tolerance of the stopping condition.
block_size (int, default=250) -- Size of the chunk.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=matlab) -- Engine to use for computing the model. The only option currently available is 'matlab'.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Consensus clustering matrix to be used as input for the KMeans clustering step.

Type:: array-like of shape (n_samples, n_clusters)

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import OPIMC
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = OPIMC(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-like objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self

Return type:

Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-like objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → OPIMC¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → OPIMC¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

One-Stage Incomplete Multi-View Clustering via Late Fusion (OS-LF-IMVC)¶

class imml.cluster.OSLFIMVC(n_clusters: int = 8, kernel: callable = DotProduct(sigma_0=1) + WhiteKernel(noise_level=1), lambda_reg: float = 1.0, random_state: int = None, engine: str = 'matlab', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

One-Stage Incomplete Multi-View Clustering via Late Fusion (OS-LF-IMVC). [29] [30]

OS-LF-IMVC integrates the processes of imputing incomplete views and clustering into a cohesive optimization procedure. This approach enables the direct utilization of the learned consensus partition matrix to enhance the final clustering task.

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
kernel (callable, default=kernels.Sum(kernels.DotProduct(), kernels.WhiteKernel())) -- Specifies the kernel type to be used in the algorithm.
lambda_reg (float, default=1.) -- Regularization parameter. The algorithm demonstrated stable performance across a wide range of values of this hyperparameter.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=matlab) -- Engine to use for computing the model. Currently only 'matlab' is supported.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.
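
As a rough illustration of the kernel parameter, the sketch below evaluates the default callable from the class signature on one toy modality. It relies only on scikit-learn's sklearn.gaussian_process.kernels module. How OSLFIMVC applies the kernel internally (for example, how missing samples are handled) is not shown here, and the toy array X is an assumption made for the example.

```python
import numpy as np
from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel

# Same expression as the default in the OSLFIMVC signature.
kernel = DotProduct(sigma_0=1) + WhiteKernel(noise_level=1)

# Toy modality (illustrative only): 5 samples, 3 features.
X = np.random.default_rng(42).random((5, 3))

# Calling a scikit-learn kernel on X returns the (n_samples, n_samples) Gram matrix.
K = kernel(X)

# For this particular kernel: K = X X^T + sigma_0^2 + noise_level * I.
K_manual = X @ X.T + 1.0 + np.eye(X.shape[0])
```

Any other callable with the same calling convention, such as RBF() from the same module, could presumably be passed instead.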

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Consensus clustering matrix to be used as input for the KMeans clustering step.

Type:: np.array

WP_¶

Permutation matrices; the p-th slice along the last axis corresponds to the p-th modality.

Type:: array-like of shape (n_clusters, n_clusters, n_mods)

C_¶

Centroids.

Type:: array-like of shape (n_clusters, n_clusters)

beta_¶

Adaptive weights of clustering matrices.

Type:: array-like of shape (n_mods,)

loss_¶

Values of the loss function.

Type:: array-like of shape (n_iter_,)

n_iter_¶

Number of iterations.

Type:: int

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import OSLFIMVC
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = OSLFIMVC(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self -- Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-likes objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → OSLFIMVC¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → OSLFIMVC¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Projective Incomplete Multi-View Clustering (PIMVC)¶

class imml.cluster.PIMVC(n_clusters: int = 8, dele: float = 0.1, lamb: int = 100000, beta: int = 1, k: int = 3, neighbor_mode: str = 'KNN', weight_mode: str = 'Binary', max_iter: int = 100, random_state: int = None, engine: str = 'matlab', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

Projective Incomplete Multi-View Clustering (PIMVC). [31] [32]

PIMVC simultaneously learns a projection matrix for each modality and a unified feature representation shared across the incomplete modalities, which is then used for clustering. Essentially, PIMVC transforms the traditional multi-modality matrix factorization model into a multi-modality projection learning model. By mapping the modality-specific objective losses into a common subspace of equal dimension, it keeps a single modality from dominating consensus representation learning when the modalities have very different dimensionalities. To capture the geometric structure of the data, PIMVC also incorporates a graph regularization penalty term.

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
dele (float, default=0.1) -- Nonnegative value.
lamb (float, default=100000) -- Penalty parameter. Should be greater than 0.
beta (float, default=1) -- Trade-off parameter.
k (int, default=3) -- Parameter k of KNN graph.
neighbor_mode (str, default='KNN') -- Indicates how to construct the graph. Options are 'KNN' (default) and 'Supervised'.
weight_mode (str, default='Binary') -- Indicates how to assign weights for each edge in the graph. Options are 'Binary' (default), 'Cosine' and 'HeatKernel'.
max_iter (int, default=100) -- Maximum number of iterations.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=matlab) -- Engine to use for computing the model. Currently only 'matlab' is supported.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.
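
To make the k, neighbor_mode and weight_mode parameters more concrete, here is a minimal NumPy sketch of a k-nearest-neighbour graph with 'Binary' or 'HeatKernel' weights, followed by the graph Laplacian commonly used in graph regularization. This is not the MATLAB code that PIMVC runs. The function name knn_graph, the bandwidth t, and the heat-kernel form exp(-d^2 / (2 t^2)) are assumptions made for illustration, and the 'Cosine' and 'Supervised' options are omitted.

```python
import numpy as np


def knn_graph(X, k=3, weight_mode="Binary", t=1.0):
    """Illustrative symmetric k-NN graph (hypothetical helper, not imml API)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances, clipped at 0 for numerical safety.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour
    # Indices of the k closest samples for every row.
    idx = np.argsort(d2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = idx.ravel()
    W = np.zeros((n, n))
    if weight_mode == "Binary":
        W[rows, cols] = 1.0
    elif weight_mode == "HeatKernel":
        W[rows, cols] = np.exp(-d2[rows, cols] / (2.0 * t ** 2))
    else:
        raise ValueError(f"unsupported weight_mode: {weight_mode}")
    # Keep an edge if either endpoint selected the other.
    return np.maximum(W, W.T)


X = np.random.default_rng(42).random((20, 10))
W = knn_graph(X, k=3)
# Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix.
L = np.diag(W.sum(axis=1)) - W
```

Symmetrizing with the element-wise maximum is one common choice; averaging W and its transpose would be another.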

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Consensus clustering matrix to be used as input for the KMeans clustering step.

Type:: array-like of shape (n_samples, n_clusters)

loss_¶

Values of the loss function.

Type:: array-like of shape (n_iter_,)

n_iter_¶

Number of iterations.

Type:: int

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import PIMVC
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = PIMVC(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self -- Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-likes objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → PIMVC¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → PIMVC¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Scalable Incomplete Multiview Clustering with Adaptive Data Completion (SIMC-ADC)¶

class imml.cluster.SIMCADC(n_clusters: int = 8, lambda_parameter: float = 1, n_anchors: int = None, beta: float = 1, gamma: float = 1, eps: float = 1e-25, random_state: int = None, engine: str = 'python', verbose=False, clean_space: bool = True)[source]¶

Bases: BaseEstimator, ClassifierMixin

Scalable Incomplete Multiview Clustering with Adaptive Data Completion (SIMC-ADC). [33] [34]

The SIMC-ADC algorithm captures the complementary information from different views by building a view-specific anchor graph. The anchor graph construction and a structure alignment are jointly optimized to enhance clustering quality.

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate.
lambda_parameter (float, default=1) -- Balance the influence between anchor graph generation and alignment term.
n_anchors (int, default=None) -- Number of anchors. If None, use n_clusters.
beta (float, default=1) -- Balance the influence between anchor graph generation and alignment term.
gamma (float, default=1) -- Balance the influence between anchor graph generation and alignment term.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
engine (str, default=python) -- Engine to use for computing the model. Current options are 'matlab' or 'python'.
verbose (bool, default=False) -- Verbosity mode.
clean_space (bool, default=True) -- If engine is 'matlab' and clean_space is True, the session will be closed after fitting the model.

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

Consensus clustering matrix to be used as input for the KMeans clustering step.

Type:: array-like of shape (n_samples, n_clusters)

V_¶

Common latent feature matrix.

Type:: array-like of shape (n_clusters, n_clusters)

A_¶

Learned anchors.

Type:: array-like of shape (n_clusters, n_clusters)

Z_¶

Modality-specific anchor graph.

Type:: array-like of shape (n_clusters, n_samples)

loss_¶

Values of the loss function.

Type:: array-like of shape (n_iter_,)

n_iter_¶

Number of iterations.

Type:: int

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import SIMCADC
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = SIMCADC(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)
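
The example above uses fully observed data. To experiment with incomplete data, missing modalities can be simulated first. The sketch below assumes that a sample missing from a modality is encoded as an all-NaN row; this is an assumption about the expected input convention and should be checked against the library's data-format documentation. The names observed and Xs_incomplete and the 30% missing rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_mods = 20, 3
Xs = [rng.random((n_samples, 10)) for _ in range(n_mods)]

# observed[i, m] is True when sample i is available in modality m.
observed = rng.random((n_samples, n_mods)) > 0.3

# Guarantee that every sample keeps at least one modality.
empty = np.flatnonzero(~observed.any(axis=1))
observed[empty, rng.integers(0, n_mods, size=empty.size)] = True

# Replace the unobserved rows of each modality with NaN.
Xs_incomplete = []
for m, X in enumerate(Xs):
    X = X.copy()
    X[~observed[:, m]] = np.nan
    Xs_incomplete.append(X)
```

Under that assumption, Xs_incomplete could be passed to fit_predict in place of Xs.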

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self -- Fitted estimator.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-likes objects) --

Xs length: n_mods
Xs[i] shape: (n_samples, n_features_i)

A list of different modalities.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray of shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → SIMCADC¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SIMCADC¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object

Subtyping Tool for Multi-Omic Data (SUMO)¶

class imml.cluster.SUMO(n_clusters: int = 8, method: str | list = None, missing: list = None, neighbours: float = 0.1, alpha: float = 0.5, sparsity: list = None, repetitions: int = 60, cluster_method: str = 'max_value', max_iter: int = 500, tol: float = 1e-05, subsample: float = 0.05, calc_cost: int = 20, h_init: int = None, rep: int = 5, random_state: int = None, verbose: bool = False, n_jobs: int = 1)[source]¶

Bases: BaseEstimator, ClassifierMixin

Subtyping Tool for Multi-Omic Data (SUMO). [35] [36]

SUMO, originally designed for molecular subtyping in multi-omics datasets, utilizes a state-of-the-art nonnegative matrix factorization (NMF) algorithm to identify clusters of samples with similar characteristics.

The authors strongly recommend removing features and samples with a large fraction of missing values (>10%), applying a log transform or a variance-stabilizing transform when using count data as input, and standardizing each input feature. For more information, see the SUMO documentation: https://python-sumo.readthedocs.io/en/latest/index.html.
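
The preprocessing recommendations above could be sketched as follows for a single count-valued modality. The helper name preprocess_counts, the log2(x + 1) transform, and the toy data are assumptions chosen for illustration; they are not taken from SUMO or imml.

```python
import numpy as np


def preprocess_counts(X, max_missing=0.1):
    """Illustrative preprocessing for one count modality (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # 1. Drop features, then samples, with more than max_missing NaN values.
    keep_features = np.isnan(X).mean(axis=0) <= max_missing
    X = X[:, keep_features]
    keep_samples = np.isnan(X).mean(axis=1) <= max_missing
    X = X[keep_samples]
    # 2. Log-transform the counts (the +1 avoids log(0)).
    X = np.log2(X + 1.0)
    # 3. Standardize each feature, ignoring any remaining NaN values.
    mean = np.nanmean(X, axis=0)
    std = np.nanstd(X, axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (X - mean) / std, keep_samples, keep_features


rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(30, 6)).astype(float)
counts[:15, 0] = np.nan  # feature 0 is 50% missing and gets dropped
counts[3, 1:] = np.nan   # sample 3 has no remaining values and gets dropped
X_clean, kept_samples, kept_features = preprocess_counts(counts)
```

The returned boolean masks matter in the multi-modal setting: if a sample is removed from one modality, the same rows would need to be handled consistently in the other modalities to keep them aligned.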

Parameters:

n_clusters (int, default=8) -- The number of clusters to generate. If it is not provided, it will use the default one from the algorithm.
method (str or list of str, default='euclidean') -- Either one method of sample-sample similarity calculation, or a list of methods, one for every modality (available methods: ['euclidean', 'cosine', 'pearson', 'spearman']).
missing (float or list of float, default=[0.1]) -- Acceptable fraction of available values for assessment of distance/similarity between pairs of samples: either one value or a list with one value for every modality.
neighbours (float, default=0.1) -- Fraction of nearest neighbours to use for sample similarity calculation using Euclidean distance similarity.
alpha (float, default=0.5) -- Hyperparameter of the RBF similarity kernel, for Euclidean distance similarity.
sparsity (float or list of float, default=[0.1]) -- Either one value or a list of sparsity penalty values for the H matrix (SUMO will try the different values and select the best results).
repetitions (int, default=60) -- Number of repetitions.
cluster_method (str, default="max_value") -- Method of cluster extraction. Options are 'max_value' or 'spectral'.
max_iter (int, default=500) -- Maximum number of iterations for factorization.
tol (float, default=1e-5) -- If objective cost function value fluctuation is smaller than this value, stop iterations before reaching max_iter.
subsample (float, default=0.05) -- Fraction of samples randomly removed from each run, cannot be greater than 0.5.
calc_cost (int, default=20) -- Number of steps between every calculation of objective cost function.
h_init (int, default=None) -- Index of the adjacency matrix to use for H matrix initialization (by default, the average adjacency is used).
rep (int, default=5) -- Number of times the consensus matrix is created for the purpose of assessing clustering quality.
random_state (int, default=None) -- Determines the randomness. Use an int to make the randomness deterministic.
verbose (bool, default=False) -- Verbosity mode.
n_jobs (int, default=1) -- Number of threads to run in parallel.
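
As a small illustration of cluster_method='max_value', the sketch below assumes that this option assigns each sample to the column of the factor matrix H holding its largest entry. That reading is an assumption based on the option's name and is not confirmed by this page; the matrix H is made up for the example.

```python
import numpy as np

# Toy factor matrix (illustrative): 3 samples, 2 clusters.
H = np.array([
    [0.9, 0.1],
    [0.2, 0.7],
    [0.6, 0.5],
])

# Each sample gets the index of the column with its largest value.
labels = np.argmax(H, axis=1)
```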

labels_¶

Labels of each point in training data.

Type:: array-like of shape (n_samples,)

embedding_¶

The final spectral representation of the data to be used as input for the KMeans clustering step.

Type:: array-like of shape (n_samples, n_clusters)

graph_¶

Multi-modal graph.

Type:: MultiplexNet

nmf_¶

The nonnegative matrix factorization (NMF) object.

Type:: UnsupervisedSumoNMF

similarity_¶

Adjacency matrices, one per modality.

Type:: dict of length n_mods, with mods as keys and an array-like of shape (n_samples,n_samples) as values.

cophenet_list_¶

Object created by SUMO

Type:: ndarray of shape (rep,).

pac_list_¶

Object created by SUMO

Type:: ndarray of shape (rep,).

References

Example

>>> import numpy as np
>>> import pandas as pd
>>> from imml.cluster import SUMO
>>> Xs = [pd.DataFrame(np.random.default_rng(42).random((20, 10))) for i in range(3)]
>>> estimator = SUMO(n_clusters = 2)
>>> labels = estimator.fit_predict(Xs)

fit(Xs, y=None)[source]¶

Fit the model to the input data.

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

self -- Returns an instance of self.

fit_predict(Xs, y=None)[source]¶

Fit the model and return clustering results. Convenience method; equivalent to calling fit(Xs) followed by predict(Xs).

Parameters:

Xs (list of array-likes objects) --
- Xs length: n_mods
- Xs[i] shape: (n_samples, n_features_i)
A list of different modalities.
y (Ignored) -- Not used, present here for API consistency by convention.

Returns:

labels -- Index of the cluster each sample belongs to.

Return type:

ndarray, shape (n_samples,)

set_fit_request(*, Xs: bool | None | str = '$UNCHANGED$') → SUMO¶

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: Xs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for Xs parameter in fit.
Returns:: self -- The updated object.
Return type:: object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SUMO¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sample_weight parameter in score.
Returns:: self -- The updated object.
Return type:: object