Note: Weight scaling in cluster analysis

Joe Song

2023-08-19

The function Ckmeans.1d.dp() can perform optimal weighted univariate clustering. Depending on the application, weights can indicate sample size, certainty, or signal intensity. Relative values of weights always affect the cluster results. Absolute values of weights affect the number of clusters when that number must be estimated.
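To make the role of relative weights concrete, here is a minimal pure-Python sketch of optimal weighted univariate clustering. It is not the Ckmeans.1d.dp implementation or API. The function name weighted_kmeans_1d is hypothetical, and the sketch uses a straightforward O(kn²) dynamic program over the sorted data, assuming the objective is the total weighted within-cluster sum of squares.

```python
def weighted_kmeans_1d(x, w, k):
    """Optimal weighted 1-D k-means via an O(k * n^2) dynamic program.

    Hypothetical helper (not the Ckmeans.1d.dp API). Minimizes the total
    weighted within-cluster sum of squares. Returns (labels, objective);
    labels are 1-based and increase with x.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ws = [w[i] for i in order]
    n = len(xs)

    # Prefix sums of w, w*x and w*x^2 give each segment's cost in O(1).
    sw, swx, swx2 = [0.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(n):
        sw[i + 1] = sw[i] + ws[i]
        swx[i + 1] = swx[i] + ws[i] * xs[i]
        swx2[i + 1] = swx2[i] + ws[i] * xs[i] ** 2

    def cost(i, j):
        """Weighted sum of squares of sorted points i..j-1 around their weighted mean."""
        total = sw[j] - sw[i]
        if total <= 0:
            return 0.0
        s = swx[j] - swx[i]
        return max(swx2[j] - swx2[i] - s * s / total, 0.0)  # clamp rounding noise

    inf = float("inf")
    # D[m][j]: best cost of splitting the first j sorted points into m clusters.
    D = [[inf] * (n + 1) for _ in range(k + 1)]
    B = [[0] * (n + 1) for _ in range(k + 1)]  # start index of the last cluster
    D[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = D[m - 1][i] + cost(i, j)
                if c < D[m][j]:
                    D[m][j], B[m][j] = c, i

    # Backtrack the cluster boundaries and map labels back to input order.
    labels = [0] * n
    j = n
    for m in range(k, 0, -1):
        i = B[m][j]
        for t in range(i, j):
            labels[order[t]] = m
        j = i
    return labels, D[k][n]


x = [0.0, 4.0, 10.0]
print(weighted_kmeans_1d(x, [1, 1, 1], 2)[0])    # [1, 1, 2]
print(weighted_kmeans_1d(x, [1, 1, 0.1], 2)[0])  # [1, 2, 2]
```

With equal weights, the point at 4 joins the point at 0. Giving the point at 10 a relative weight of 0.1 makes it cheap to share a cluster with 4, so the partition changes even though the data are the same.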

The linear scale of weights is consequential when estimating the number of clusters

When the number of clusters must be estimated, the linear scale of weights heavily influences the estimated number of clusters \(k\). The reason is that linear scaling of the weights has a nonlinear effect on the Bayesian information criterion (BIC) used to select \(k\). A larger scale tends to favor more clusters.
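The direction of this effect can be sketched with a toy BIC. The sketch rests on assumptions that are ours, not the package's exact computation: a hard partition, a Gaussian model with its own mean and variance per cluster plus mixing proportions, weights treated as frequency counts, and the total weight \(N\) used as the sample size in the penalty \(p \log N\). Under these assumptions, multiplying all weights by \(c\) multiplies the log-likelihood by \(c\), while the penalty grows only as \(p \log(cN)\). The hypothetical helper weighted_bic compares one cluster against a fixed two-cluster split of the same four points.

```python
import math


def weighted_bic(clusters, scale=1.0):
    """Toy BIC of a fixed hard partition (hypothetical helper, not the package's formula).

    Assumes a Gaussian per cluster (own mean and variance), mixing proportions,
    weights as frequency counts, and total weight N as the sample size.
    clusters: list of (xs, ws) pairs; every weight is multiplied by `scale`.
    """
    N = sum(sum(ws) for _, ws in clusters) * scale
    loglik = 0.0
    for xs, ws in clusters:
        ws = [wi * scale for wi in ws]
        Nj = sum(ws)
        mu = sum(wi * xi for wi, xi in zip(ws, xs)) / Nj
        # Weighted MLE of the variance; it must be > 0 for the log below.
        var = sum(wi * (xi - mu) ** 2 for wi, xi in zip(ws, xs)) / Nj
        loglik += Nj * math.log(Nj / N) - 0.5 * Nj * (math.log(2 * math.pi * var) + 1)
    k = len(clusters)
    p = 3 * k - 1  # k means, k variances, k - 1 mixing proportions
    return -2 * loglik + p * math.log(N)


x = [0.0, 1.0, 2.0, 3.0]
w = [1.0, 1.0, 1.0, 1.0]
one = [(x, w)]
two = [(x[:2], w[:2]), (x[2:], w[2:])]

for c in (1, 10, 100):
    delta = weighted_bic(two, c) - weighted_bic(one, c)  # negative favors k = 2
    print(c, round(delta, 2), "k=2" if delta < 0 else "k=1")
# prints:
# 1 3.27 k=1
# 10 2.14 k=1
# 100 -71.28 k=2
```

At \(c = 1\) and \(c = 10\) the single cluster has the lower BIC. At \(c = 100\) the two-cluster split wins, although the data and the relative weights are unchanged.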

Here is a guideline on how to scale the weights:

Linear weight scaling has no influence when the number of clusters is given

When the user gives an exact number of clusters \(k\), linear weight scaling does not influence the cluster analysis in theory. Multiplying all weights by the same positive constant multiplies the weighted within-cluster sum of squares of every candidate clustering by that constant, so the optimal clustering is unchanged. The clustering results are therefore expected to be identical for any linear scaling of weights. However, very large weights can cause numerical overflow and should be scaled down linearly to a more tractable range.
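When \(k\) is fixed, the invariance can be checked directly. The sketch below uses best_split, a hypothetical brute-force helper that is practical only for a handful of points. It relies on the fact that optimal 1-D k-means clusters are contiguous in sorted order, so it only needs to try every set of \(k-1\) cut points. Scaling the weights up by 1e6 or down by 1e-6 leaves the chosen cuts unchanged and only rescales the objective, which is also why scaling very large weights down is safe.

```python
from itertools import combinations


def weighted_sse(xs, ws):
    """Weighted within-cluster sum of squares of one cluster."""
    total = sum(ws)
    mean = sum(wi * xi for wi, xi in zip(ws, xs)) / total
    return sum(wi * (xi - mean) ** 2 for wi, xi in zip(ws, xs))


def best_split(x, w, k):
    """Exhaustively try all contiguous k-cluster splits of the sorted data.

    Hypothetical helper for illustration. Returns (objective, cut positions).
    """
    pairs = sorted(zip(x, w))
    xs = [p[0] for p in pairs]
    ws = [p[1] for p in pairs]
    best = None
    for cuts in combinations(range(1, len(xs)), k - 1):
        bounds = (0, *cuts, len(xs))
        obj = sum(weighted_sse(xs[a:b], ws[a:b]) for a, b in zip(bounds, bounds[1:]))
        if best is None or obj < best[0]:
            best = (obj, cuts)
    return best


x = [1.0, 1.2, 3.5, 3.9, 8.0, 8.4]
w = [2.0, 1.0, 5.0, 1.0, 3.0, 1.0]

for c in (1.0, 1e6, 1e-6):
    obj, cuts = best_split(x, [c * wi for wi in w], k=3)
    # The cuts stay (2, 4) for every c; obj / c stays (up to rounding) the same.
    print(c, cuts, obj / c)
```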