Estimate the number of clusters using K-means clustering, selecting the
best solution by its silhouette value. Before the best solution is selected, the
Duda-Hart test is used to check whether there are at least two clusters.
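The Duda-Hart check can be sketched in base R as below. This is an illustration under stated assumptions, not the package's own code. `duda_hart_sketch` and its `nstart` argument are hypothetical, and the two-cluster split is assumed to come from `kmeans()`. The single-cluster hypothesis is rejected when the ratio of the two-cluster within-cluster sum of squares to the total sum of squares falls below a critical value taken from Duda and Hart's normal approximation.

```r
# Hypothetical sketch of a Duda-Hart test (not necessarily the package's
# implementation). H0: the data form a single cluster.
duda_hart_sketch <- function(x, alpha = 0.05, nstart = 10) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  # Je(1) = total sum of squares around the overall mean (fit$totss)
  # Je(2) = within-cluster sum of squares of a 2-cluster K-means solution
  fit <- kmeans(x, centers = 2, nstart = nstart)
  ratio <- fit$tot.withinss / fit$totss
  z <- qnorm(1 - alpha)
  # Critical value from Duda and Hart's normal approximation
  crit <- 1 - 2 / (pi * p) - z * sqrt(2 * (1 - 8 / (pi^2 * p)) / (n * p))
  # Reject H0 (i.e. accept at least two clusters) when the ratio is small
  list(ratio = ratio, critical = crit, more_than_one = ratio < crit)
}
```

A smaller `alpha` gives a smaller critical value, which makes it harder to conclude that more than one cluster is present.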
Arguments
- data
a data frame containing only the indicators to be used in the clustering, as columns.
- cmin
the minimum number of clusters to test. Defaults to 2.
- cmax
the maximum number of clusters to test. Defaults to 5. Note that increasing
cmax
will greatly increase the computation time.
- alpha
the alpha value for the Duda-Hart test. Defaults to 0.05.
- debug
logical indicating whether the silhouette values for each model should be returned.
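The silhouette-based selection over `cmin:cmax` can be sketched in base R as follows. The helper names `mean_silhouette` and `select_k_sketch`, and the `nstart` argument, are hypothetical. The output names `nclust` and `preds` only mirror the printed example, and the Duda-Hart step is left out. Both `cmin` and `cmax` must be at least 2, because the silhouette is undefined for a single cluster.

```r
# Hypothetical helper: average silhouette width of partition `cl`,
# given a full distance matrix `d`. Singletons get a silhouette of 0.
mean_silhouette <- function(d, cl) {
  s <- vapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    if (sum(own) == 1) return(0)
    # a: mean distance to the other members of i's own cluster
    # (d[i, i] is 0, so dividing by size - 1 excludes i itself)
    a <- sum(d[i, own]) / (sum(own) - 1)
    # b: smallest mean distance to the members of any other cluster
    others <- setdiff(unique(cl), cl[i])
    b <- min(vapply(others, function(k) mean(d[i, cl == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

# Hypothetical sketch: fit K-means for each k in cmin:cmax and keep the
# k with the largest average silhouette width.
select_k_sketch <- function(x, cmin = 2, cmax = 5, debug = FALSE, nstart = 10) {
  stopifnot(cmin >= 2, cmax >= cmin)
  x <- as.matrix(x)
  d <- as.matrix(dist(x))  # Euclidean distances, computed once
  ks <- cmin:cmax
  fits <- lapply(ks, function(k) kmeans(x, centers = k, nstart = nstart))
  sil <- vapply(fits, function(f) mean_silhouette(d, f$cluster), numeric(1))
  names(sil) <- ks
  best <- which.max(sil)
  out <- list(nclust = ks[best], preds = fits[[best]]$cluster)
  # With debug = TRUE, also return the silhouette value of every model
  if (debug) out$silhouette <- sil
  out
}
```

In this sketch each extra value of k adds one K-means fit and one pass over the n by n distance matrix, which is one reason a large cmax is slow.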
Examples
X <- sim_clust(2, 100, dmin = 0.5, rmin = 0.3, nind = 10)
X <- X[, 1:(ncol(X) - 1)] # excluding the last column
kmeans_clust(X)
#> $nclust
#> [1] 2
#>
#> $preds
#> [1] 2 1 2 1 1 1 1 2 2 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
#> [38] 1 1 1 1 1 1 1 2 2 2 1 1 1 1 2 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2
#> [75] 2 1 2 1 2 1 1 2 1 2 2 1 1 2 2 1 1 1 2 1 1 1 1 1 1 2
#>