kmeans_clust • clustersimulation

Estimate the number of clusters using K-Means clustering, selecting the best solution by its silhouette value. Before searching for the best solution, the Duda-Hart test is used to check whether there are at least two clusters.
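The silhouette-based selection described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helper names mean_silhouette and pick_k are hypothetical, the silhouette is computed by hand from Euclidean distances, and nstart = 10 is an arbitrary choice.

```r
# Mean silhouette width of a clustering (hypothetical helper, base R only).
mean_silhouette <- function(x, cl) {
  d <- as.matrix(dist(x))          # pairwise Euclidean distances
  ks <- sort(unique(cl))
  s <- numeric(nrow(d))
  for (i in seq_len(nrow(d))) {
    own <- cl == cl[i]
    own[i] <- FALSE                # exclude the point itself
    if (!any(own)) next            # singleton cluster: silhouette stays 0
    a <- mean(d[i, own])           # mean distance within own cluster
    b <- min(vapply(ks[ks != cl[i]],
                    function(k) mean(d[i, cl == k]),
                    numeric(1)))   # mean distance to nearest other cluster
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Fit K-Means for each k in cmin:cmax and keep the k with the highest
# mean silhouette (hypothetical helper mirroring the cmin/cmax arguments).
pick_k <- function(x, cmin = 2, cmax = 5) {
  ks <- cmin:cmax
  sil <- vapply(ks, function(k) {
    fit <- kmeans(x, centers = k, nstart = 10)
    mean_silhouette(x, fit$cluster)
  }, numeric(1))
  list(nclust = ks[which.max(sil)], silhouette = setNames(sil, ks))
}

# Toy data: two well-separated groups of 50 points in two dimensions.
set.seed(1)
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 6), ncol = 2))
res <- pick_k(toy)
res$nclust
```

Because one K-Means model is fitted and scored for every candidate k, the cost of this loop grows with the width of the cmin:cmax range.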

Usage

kmeans_clust(data, cmin = 2, cmax = 5, alpha = 0.05, debug = FALSE)

Arguments

data: a dataframe with only the indicators to be used in the clustering as columns.
cmin: the minimum number of clusters to test. Defaults to 2.
cmax: the maximum number of clusters to test. Defaults to 5. Note that increasing cmax can greatly increase the computation time.
alpha: the significance level for the Duda-Hart test. Defaults to 0.05.
debug: logical indicating whether the silhouette values for each model should be returned. Defaults to FALSE.
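To illustrate what alpha controls, here is a minimal sketch of a Duda-Hart check based on the usual normal approximation for the critical value. It assumes the test compares the within-cluster sum of squares of a two-cluster K-Means split against the total sum of squares; whether kmeans_clust computes it exactly this way is an assumption, and the helper name duda_hart is hypothetical.

```r
# Duda-Hart check for "one cluster" versus "at least two clusters"
# (hypothetical helper, base R only).
duda_hart <- function(x, alpha = 0.05) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  fit <- kmeans(x, centers = 2, nstart = 10)
  w1 <- sum(scale(x, scale = FALSE)^2)  # total SS around the grand mean
  w2 <- fit$tot.withinss                # within-cluster SS of the 2-cluster split
  dh <- w2 / w1
  z <- qnorm(1 - alpha)
  # Critical value from the normal approximation; a smaller alpha lowers it,
  # so stronger evidence is needed to reject a single cluster.
  crit <- 1 - 2 / (pi * p) - z * sqrt(2 * (1 - 8 / (pi^2 * p)) / (n * p))
  list(dh = dh, critical = crit, at_least_two = dh < crit)
}

# Toy data: two well-separated groups of 50 points in two dimensions.
set.seed(1)
toy_dh <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
                matrix(rnorm(100, mean = 6), ncol = 2))
dh_res <- duda_hart(toy_dh, alpha = 0.05)
dh_res$at_least_two
```

In this sketch a ratio below the critical value is taken as evidence for at least two clusters, which is the situation in which searching over cmin:cmax makes sense.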

Value

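A list containing the estimated number of clusters (nclust) and, as the example output shows, the cluster assigned to each observation (preds).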

Examples

X <- sim_clust(2, 100, dmin = 0.5, rmin = 0.3, nind = 10)
X <- X[, 1:(ncol(X) - 1)] # excluding the last column
kmeans_clust(X)
#> $nclust
#> [1] 2
#> 
#> $preds
#>   [1] 2 1 2 1 1 1 1 2 2 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
#>  [38] 1 1 1 1 1 1 1 2 2 2 1 1 1 1 2 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2
#>  [75] 2 1 2 1 2 1 1 2 1 2 2 1 1 2 2 1 1 1 2 1 1 1 1 1 1 2
#>