genericROM.BasicAlgorithms.Clustering
- class ClusteringToolbox(clusteringAlgo=None)[source]
Bases: object
Class for clustering problems.
- clusteringAlgo
Object containing a clustering algorithm. All clustering algorithms available in Scikit-Learn can be used. If defined by the user, the clustering algorithm must follow Scikit-Learn’s API for clustering.
- clusters
clusters[k] is an array containing the indices of points belonging to cluster k.
- Type
dict
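A user-defined algorithm following Scikit-Learn's clustering conventions could look like the minimal sketch below. The class name NearestSeedClustering and its seeds parameter are hypothetical, and which methods ClusteringToolbox actually calls is not stated here; the sketch assumes the usual fit, predict, fit_predict and labels_ members.

```python
import numpy as np


class NearestSeedClustering:
    """Hypothetical clusterer: assigns each sample to the nearest fixed seed."""

    def __init__(self, seeds):
        self.seeds = np.asarray(seeds, dtype=float)

    def fit(self, X, y=None):
        # Scikit-Learn clusterers store training labels in `labels_` and return self.
        self.labels_ = self.predict(X)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared Euclidean distance from every sample to every seed.
        d2 = ((X[:, None, :] - self.seeds[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    def fit_predict(self, X, y=None):
        return self.fit(X).labels_


algo = NearestSeedClustering(seeds=[[0.0, 0.0], [10.0, 10.0]])
labels = algo.fit_predict([[0.1, 0.2], [9.5, 10.1], [0.3, -0.1]])
```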
- ClusterRenumbering(clusterIdPermutation)[source]
Changes the numbering of clusters.
- Parameters
clusterIdPermutation (list) – List of integers such that clusterIdPermutation[k] is the new index for cluster k.
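The effect of such a permutation on a clusters dictionary can be sketched as follows. The helper renumber_clusters is hypothetical and only illustrates the convention stated above (clusterIdPermutation[k] is the new index of cluster k); it is not the library's implementation.

```python
import numpy as np


def renumber_clusters(clusters, cluster_id_permutation):
    # Old cluster k becomes cluster cluster_id_permutation[k]; members are unchanged.
    return {cluster_id_permutation[k]: members for k, members in clusters.items()}


clusters = {0: np.array([0, 3]), 1: np.array([1, 2]), 2: np.array([4])}
renumbered = renumber_clusters(clusters, [2, 0, 1])
# Old cluster 0 is now cluster 2, old cluster 1 is now cluster 0, and so on.
```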
- ReadClusteringResults(resultsFile)[source]
Reads clustering results from a text file.
- Parameters
resultsFile (str) – Name of the txt file containing the clustering results.
- WriteClusteringResults(outputFileName)[source]
Writes clustering results to a text file.
- Parameters
outputFileName (str) – Name of the txt file in which clustering results are written.
- fit(X, **kwargs)[source]
Computes clusters.
- Parameters
X (array of shape (n_samples, n_features)) – Training instances to cluster. Note: if your clustering algorithm works on a distance matrix, then X is the distance matrix of shape (n_samples, n_samples).
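When the clustering algorithm expects a distance matrix, X must be built beforehand. A minimal sketch with NumPy, assuming a Euclidean dissimilarity (the helper name euclidean_distance_matrix is hypothetical; any other dissimilarity could be used instead):

```python
import numpy as np


def euclidean_distance_matrix(X):
    X = np.asarray(X, dtype=float)
    # Pairwise differences, shape (n_samples, n_samples, n_features).
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = euclidean_distance_matrix(X)  # symmetric, zero diagonal, shape (3, 3)
```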
- fit_predict(X, returnLabels=False, **kwargs)[source]
Computes clusters and predicts cluster index for each sample.
- Parameters
X (array of shape (n_samples, n_features)) – Training instances to cluster. Note: if your clustering algorithm works on a distance matrix, then X is the distance matrix of shape (n_samples, n_samples).
returnLabels (boolean) – If True, returns labels. If False, only updates the object’s attributes (self.clusters).
- Returns
labels – Array of integers containing the index of the cluster each sample belongs to. Returned only if returnLabels is True.
- Return type
1D array of length n_samples
- predict(X, returnLabels=False)[source]
Predicts the cluster index for each sample in X.
- Parameters
X (array of shape (n_samples, n_features)) – Training instances to cluster. Note: if your clustering algorithm works on a distance matrix, then X is the distance matrix of shape (n_samples, n_samples).
returnLabels (boolean) – If True, returns labels. If False, only updates the object’s attributes (self.clusters).
- Returns
labels – Array of integers containing the index of the cluster each sample belongs to. Returned only if returnLabels is True.
- Return type
1D array of length n_samples
- predictTest(X, returnLabels=False)[source]
Predicts the cluster index for each sample in X, where X contains new unseen data.
- Parameters
X (array of shape (n_samples, n_features)) – New, unseen instances to assign to clusters. Note: if your clustering algorithm works on a distance matrix, then X is the distance matrix of shape (n_samples, n_clusters).
returnLabels (boolean) – If True, returns labels. If False, only updates the object’s attributes (self.clusters).
- Returns
labels – Array of integers containing the index of the cluster each sample belongs to. Returned only if returnLabels is True.
- Return type
1D array of length n_samples
- GetAdjacentClustersFromLabelsVector(labels, localNbSnapshots=None)[source]
Returns a dictionary whose keys are cluster indices and whose values are the indices of the adjacent clusters, where adjacency is determined from the data used in the clustering (through the labels).
- Parameters
labels (1D array of integers) – labels[j] = k if example j belongs to cluster k.
localNbSnapshots (1D array or list of integers) – localNbSnapshots[j] is the size of the j-th group of values for which adjacency is well-defined.
- Returns
adjacentClusters (dict) – adjacentClusters[k] is an array containing the indices of the clusters adjacent to cluster k.
snapshotsOfAdjacentClusters (dict) – snapshotsOfAdjacentClusters[k] is an array containing the indices of points belonging to cluster k and its adjacent clusters.
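The exact definition of adjacency is not given here. The sketch below assumes that samples are ordered (for instance in time) within each group, and that two clusters are adjacent when two consecutive samples of the same group carry their labels. The function name and this rule are assumptions for illustration only.

```python
import numpy as np


def adjacent_clusters_from_labels(labels, local_nb_snapshots=None):
    labels = np.asarray(labels)
    if local_nb_snapshots is None:
        # Assumption: without group sizes, all samples form a single group.
        local_nb_snapshots = [len(labels)]
    adjacent = {int(k): set() for k in np.unique(labels)}
    start = 0
    for size in local_nb_snapshots:
        group = labels[start:start + size]
        # Consecutive samples with different labels make their clusters adjacent.
        for a, b in zip(group[:-1], group[1:]):
            if a != b:
                adjacent[int(a)].add(int(b))
                adjacent[int(b)].add(int(a))
        start += size
    adjacent_arrays = {k: np.array(sorted(v), dtype=int) for k, v in adjacent.items()}
    # Indices of points in cluster k or in one of its adjacent clusters.
    snapshots = {
        k: np.flatnonzero(np.isin(labels, [k, *adjacent_arrays[k]]))
        for k in adjacent_arrays
    }
    return adjacent_arrays, snapshots


labels = [0, 0, 1, 1, 2, 2, 0]
adj, snaps = adjacent_clusters_from_labels(labels, local_nb_snapshots=[4, 3])
```

In this example the transition between samples 3 and 4 crosses a group boundary, so clusters 1 and 2 are not marked as adjacent.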
- GetClustersFromLabelsVector(labels)[source]
Returns a dictionary containing clustering results.
- Parameters
labels (1D array of integers) – labels[j] = k if example j belongs to cluster k.
- Returns
clusters – clusters[k] is an array containing the indices of points belonging to cluster k.
- Return type
dict
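The conversion from a labels vector to a clusters dictionary can be sketched as follows (clusters_from_labels is a hypothetical standalone helper, not the method itself):

```python
import numpy as np


def clusters_from_labels(labels):
    labels = np.asarray(labels)
    # For each cluster index k, collect the positions j where labels[j] == k.
    return {int(k): np.flatnonzero(labels == k) for k in np.unique(labels)}


clusters = clusters_from_labels([1, 0, 1, 2, 0])
```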
- GetLabelsVectorFromClusters(clusters)[source]
Returns the labels vector corresponding to a clusters dictionary.
- Parameters
clusters (dict) – clusters[k] is an array containing the indices of points belonging to cluster k.
- Returns
labels – labels[j] = k if example j belongs to cluster k.
- Return type
1D array of integers
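The inverse conversion can be sketched in the same way. This hypothetical helper assumes that the clusters form a partition of the indices 0 to n_samples - 1:

```python
import numpy as np


def labels_from_clusters(clusters):
    # Assumption: every index in range(n_samples) belongs to exactly one cluster.
    n_samples = sum(len(members) for members in clusters.values())
    labels = np.full(n_samples, -1, dtype=int)
    for k, members in clusters.items():
        labels[np.asarray(members, dtype=int)] = k
    return labels


labels = labels_from_clusters({0: [1, 4], 1: [0, 2], 2: [3]})
```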
- class KMedoids(nClusters, nIter=100, init='k-meds++', algo='PAM', squaredDist=False, runs=10)[source]
Bases: object
Class for k-medoids clustering.
- nClusters
Number of clusters.
- Type
int
- nIter
Maximum number of iterations.
- Type
int, default 100
- init
Medoid initialization method. If ‘random’, the initial medoids are selected at random. If ‘k-meds++’, the method described in the following article is used: Hae-Sang Park, Chi-Hyuck Jun, “A simple and fast algorithm for K-medoids clustering”, 2009. If ‘multipleRuns’, the clustering algorithm is run self.runs times with random initial medoids, and the best solution in terms of the cost function is returned.
- Type
str, ‘k-meds++’, ‘random’ or ‘multipleRuns’, default ‘k-meds++’
- medoids
Array of integers containing the ids of the medoids.
- Type
1D array of length nClusters
- algo
Algorithm for k-medoids. Park & Jun’s algorithm is simpler and faster but explores a smaller search space than PAM (Partitioning around medoids).
- Type
‘ParkJun’ or ‘PAM’, default ‘PAM’
- squaredDist
Whether the cost function and the medoid update rule use squared dissimilarities.
- Type
boolean, default False.
- runs
Number of times the clustering algorithm is run when using init=’multipleRuns’.
- Type
integer, default 10.
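The role of squaredDist in the cost function can be illustrated with the standard k-medoids objective, the sum of dissimilarities between each sample and its nearest medoid. The function below is a hypothetical sketch; the exact cost function used by the class is an assumption.

```python
import numpy as np


def kmedoids_cost(dist_matrix, medoids, squared_dist=False):
    # Dissimilarities from every sample to each medoid, shape (n_samples, n_clusters).
    d = np.asarray(dist_matrix, dtype=float)[:, medoids]
    if squared_dist:
        d = d ** 2
    # Each sample contributes its dissimilarity to the nearest medoid.
    return d.min(axis=1).sum()


D = np.array([[0.0, 2.0, 5.0], [2.0, 0.0, 3.0], [5.0, 3.0, 0.0]])
cost = kmedoids_cost(D, [0, 2])
cost_sq = kmedoids_cost(D, [0, 2], squared_dist=True)
```

With init=’multipleRuns’, a cost of this kind is what allows the runs to be compared and the best one kept.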
- InitializeMedoids(distanceMatrix)[source]
Initial medoids selection. Method described in Hae-Sang Park, Chi-Hyuck Jun, “A simple and fast algorithm for K-medoids clustering”, 2009.
- Parameters
distanceMatrix (2D array of shape (n_samples,n_samples)) –
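As a rough sketch of the initialization in Park and Jun's article: each sample j receives the score v_j = sum_i d_ij / sum_l d_il, and the nClusters samples with the smallest scores become the initial medoids. The function name is hypothetical, and the library's implementation may differ in details such as tie-breaking.

```python
import numpy as np


def park_jun_initial_medoids(dist_matrix, n_clusters):
    d = np.asarray(dist_matrix, dtype=float)
    # p[i, j] = d[i, j] / sum_l d[i, l]; assumes no row of d sums to zero.
    p = d / d.sum(axis=1, keepdims=True)
    # v[j] is small for samples that are central with respect to all others.
    v = p.sum(axis=0)
    return np.argsort(v, kind="stable")[:n_clusters]


points = np.array([0.0, 1.0, 2.0, 10.0])
D = np.abs(points[:, None] - points[None, :])
medoids = park_jun_initial_medoids(D, n_clusters=2)
```

Note that this rule favors central samples, so the initial medoids may all lie in the same dense region; the subsequent iterations are expected to spread them out.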
- fit_PAM(distMatrix, printCostFct=False, verbose=False)[source]
Implementation of Partitioning Around Medoids (PAM) algorithm for k-medoids.
- Parameters
distMatrix (2D array of shape (n_samples,n_samples)) –
printCostFct (boolean) –
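The swap phase of PAM can be sketched as below: at each iteration, every (medoid, non-medoid) swap is evaluated and the one that lowers the cost the most is applied, until no swap improves the cost. This is a simplified illustration under stated assumptions (unsquared dissimilarities, initial medoids supplied by the caller, hypothetical function name), not the code of fit_PAM.

```python
import numpy as np


def pam_swap(dist_matrix, medoids, n_iter=100):
    d = np.asarray(dist_matrix, dtype=float)
    medoids = list(medoids)
    cost = d[:, medoids].min(axis=1).sum()
    for _ in range(n_iter):
        best_cost, best_swap = cost, None
        for pos in range(len(medoids)):
            for candidate in range(d.shape[0]):
                if candidate in medoids:
                    continue
                # Cost obtained if medoid number `pos` is replaced by `candidate`.
                trial = medoids.copy()
                trial[pos] = candidate
                trial_cost = d[:, trial].min(axis=1).sum()
                if trial_cost < best_cost:
                    best_cost, best_swap = trial_cost, (pos, candidate)
        if best_swap is None:
            break  # no swap lowers the cost: local optimum reached
        medoids[best_swap[0]] = best_swap[1]
        cost = best_cost
    labels = d[:, medoids].argmin(axis=1)
    return np.array(medoids), labels, cost


points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(points[:, None] - points[None, :])
medoids, labels, cost = pam_swap(D, medoids=[0, 1])
```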
- fit_ParkJun(distMatrix, printCostFct=False, verbose=False)[source]
Implementation of k-medoids clustering based on the Voronoi iteration approach (Park and Jun 2009). This code is a slightly modified version of the code presented in: “NumPy/SciPy recipes for data science: k-Medoids clustering”, C. Bauckhage.
- Parameters
distMatrix (2D array of shape (n_samples,n_samples)) –
printCostFct (boolean) –
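The Voronoi iteration alternates two steps: assign each sample to its nearest medoid, then replace each medoid by the cluster member that minimizes the sum of dissimilarities to the other members. A minimal sketch, assuming unsquared dissimilarities and caller-supplied initial medoids (the function name is hypothetical and this is not the code of fit_ParkJun):

```python
import numpy as np


def kmedoids_voronoi(dist_matrix, initial_medoids, n_iter=100):
    d = np.asarray(dist_matrix, dtype=float)
    medoids = np.array(initial_medoids, dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest medoid for every sample.
        labels = d[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for k in range(len(medoids)):
            members = np.flatnonzero(labels == k)
            if members.size == 0:
                continue  # keep the previous medoid for an empty cluster
            # Update step: member with the smallest total dissimilarity to its cluster.
            within = d[np.ix_(members, members)].sum(axis=0)
            new_medoids[k] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # medoids are stable: converged
        medoids = new_medoids
    labels = d[:, medoids].argmin(axis=1)
    return medoids, labels


points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(points[:, None] - points[None, :])
medoids, labels = kmedoids_voronoi(D, initial_medoids=[0, 3])
```

Because each medoid is only ever replaced by a member of its own cluster, this scheme explores fewer configurations than the PAM swaps, which is consistent with the remark on algo above.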
- fit_predict(distMatrix, printCostFct=False, verbose=False)[source]
Computes clusters and gives cluster assignments.
- Parameters
distMatrix (2D array of shape (n_samples,n_samples)) –
- Returns
labels – Array of integers containing the index of the cluster each sample belongs to.
- Return type
1D array of length n_samples
- predict(distMatrix)[source]
Gives cluster assignments for training data.
- Parameters
distMatrix (2D array of shape (n_samples,n_samples)) –
- Returns
labels – Array of integers containing the index of the cluster each sample belongs to.
- Return type
1D array of length n_samples
- predictTest(distMatrix)[source]
Gives cluster assignments for test data.
- Parameters
distMatrix (2D array of shape (n_samples,n_clusters)) – Matrix such that distMatrix[i,k] is the distance between the i-th test example and the k-th medoid.
- Returns
labels – Array of integers containing the index of the cluster each sample belongs to.
- Return type
1D array of length n_samples
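Given the (n_samples, n_clusters) matrix described above, assigning test data plausibly reduces to picking the nearest medoid for each row. A sketch under that assumption, with hypothetical names and one-dimensional points for readability:

```python
import numpy as np


def assign_to_nearest_medoid(dist_to_medoids):
    # Row i holds the distances from test example i to each medoid.
    return np.asarray(dist_to_medoids).argmin(axis=1)


test_points = np.array([0.5, 9.0, 11.5])
medoid_positions = np.array([1.0, 11.0])
# Distances between test examples and medoids, shape (3, 2).
dist = np.abs(test_points[:, None] - medoid_positions[None, :])
labels = assign_to_nearest_medoid(dist)
```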