
Estimate the number of clusters with K-Means clustering, selecting the best solution by its silhouette value. Before searching for the best solution, the Duda-Hart test is used to check whether there are at least two clusters.
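The two-step procedure described above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: `sketch_kmeans_clust` is a hypothetical name, the toy data are made up, and the use of `stats::kmeans()`, `cluster::silhouette()` and `fpc::dudahart2()` is an assumption about how such a procedure could be assembled. The Duda-Hart step is skipped when the `fpc` package is not installed.

```r
# Hypothetical sketch; kmeans_clust() itself may differ in its details.
sketch_kmeans_clust <- function(data, cmin = 2, cmax = 5, alpha = 0.05) {
  x <- as.matrix(data)

  # Step 1: Duda-Hart test of one cluster against two.
  # Assumption: fpc::dudahart2() is used; skipped if fpc is unavailable.
  if (requireNamespace("fpc", quietly = TRUE)) {
    two <- stats::kmeans(x, centers = 2, nstart = 10)
    dh <- fpc::dudahart2(x, two$cluster, alpha = alpha)
    if (isTRUE(dh$cluster1)) {
      # No significant evidence against a single cluster.
      return(list(nclust = 1L, preds = rep(1L, nrow(x))))
    }
  }

  # Step 2: fit K-Means for each candidate k and keep the fit with the
  # highest average silhouette width.
  d <- stats::dist(x)
  best <- list(nclust = NA_integer_, preds = NULL, sil = -Inf)
  for (k in seq(cmin, cmax)) {
    fit <- stats::kmeans(x, centers = k, nstart = 10)
    sil <- mean(cluster::silhouette(fit$cluster, d)[, "sil_width"])
    if (sil > best$sil) {
      best <- list(nclust = k, preds = fit$cluster, sil = sil)
    }
  }
  best[c("nclust", "preds")]
}

# Toy data (made up for illustration): two well-separated groups.
set.seed(1)
toy <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 6), ncol = 2))
res <- sketch_kmeans_clust(toy)
res$nclust
```

Because every candidate k requires its own K-Means fit and silhouette computation, the cost of a loop like this grows with `cmax`, which is consistent with the warning on that argument.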

Usage

kmeans_clust(data, cmin = 2, cmax = 5, alpha = 0.05, debug = FALSE)

Arguments

data

a dataframe with only the indicators to be used in the clustering as columns.

cmin

the minimum number of clusters to test. Defaults to 2.

cmax

the maximum number of clusters to test. Defaults to 5. Note that increasing cmax greatly increases the computation time.

alpha

the significance level (alpha) for the Duda-Hart test. Defaults to 0.05.

debug

logical indicating whether the silhouette values for each model should be returned. Defaults to FALSE.

Value

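the estimated number of clusters. As the example below shows, the result is a list with the elements nclust (the estimated number of clusters) and preds (the predicted cluster of each observation).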

Examples

X <- sim_clust(2, 100, dmin = 0.5, rmin = 0.3, nind = 10)
X <- X[, 1:(ncol(X) - 1)] # excluding the last column
kmeans_clust(X)
#> $nclust
#> [1] 2
#> 
#> $preds
#>   [1] 2 1 2 1 1 1 1 2 2 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1
#>  [38] 1 1 1 1 1 1 1 2 2 2 1 1 1 1 2 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 2
#>  [75] 2 1 2 1 2 1 1 2 1 2 2 1 1 2 2 1 1 1 2 1 1 1 1 1 1 2
#>