Version 4.3.5
2023-08-19
- Created version 4.3.5 from 4.3.4.2
Version 4.3.4.2
2023-08-19
- Updated Ckmeans.1d.dp-package.Rd to remove the table that listed the
package current version and initial version, which is redundant.
- Dropped C++11 specification.
- Added documentation for the plot.MultiChannelClusters() function.
2022-04-09
- Fixed MultiChannel.WUC() bug 1: input x is now checked for sortedness.
If x is not sorted, it is sorted first and the weight matrix Y is
rearranged accordingly.
- Added S3 method plot.MultiChannelClusters() to visualize results from
MultiChannel.WUC().
- Updated MultiChannel.WUC() with more examples and added
visualization of results.
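The fix for bug 1 amounts to sorting x through an index permutation and applying the same permutation to Y. The following is a minimal sketch of that idea, not the package's code: the helper name sort_x_and_rearrange_Y is hypothetical, and it assumes that row i of Y holds the weights of point x[i] across all channels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (illustration only): if x is unsorted, sort it and
// apply the same permutation to the rows of the weight matrix Y.
// Assumes Y.size() == x.size(), with Y[i] the channel weights of x[i].
void sort_x_and_rearrange_Y(std::vector<double>& x,
                            std::vector<std::vector<double>>& Y) {
    if (std::is_sorted(x.begin(), x.end())) return;  // already sorted

    // order[r] is the original index of the point with rank r.
    std::vector<std::size_t> order(x.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    // Build the sorted copies, moving each weight row with its point.
    std::vector<double> xs(x.size());
    std::vector<std::vector<double>> Ys(Y.size());
    for (std::size_t r = 0; r < order.size(); ++r) {
        xs[r] = x[order[r]];
        Ys[r] = Y[order[r]];
    }
    x.swap(xs);
    Y.swap(Ys);
}
```

Sorting indices rather than values keeps each point attached to its weights, which is what the bug report calls for.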
2022-04-08
- MultiChannel.WUC() bug 3: Three well-separated clusters in three
channels are not put into three clusters.
- Added test_that(“Test number-of-clusters selection”, {…}) to
illustrate bug 3.
2022-04-07
- Created version 4.3.4.2 from 4.3.4.1
- MultiChannel.WUC() bug 1: input x should be checked for sortedness.
If x is not sorted, sort it first and rearrange the weight matrix Y
accordingly.
- MultiChannel.WUC() bug 2: A normal cluster is not split into two parts
correctly. See test case Example 2.
Version 4.3.4.1
2022-04-01
- Created version 4.3.4.1 from 4.3.4
- Fixed a bug in weighted variance calculation in
weighted_select_levels.cpp and MCW_functions.cpp. The bug led to
negative variance for data with a total weight less than 1. It
affects the determination of the number of clusters in weighted data for
both single-channel and multi-channel weighted clustering. The fix
will substantially change the number of clusters for data with very
low total weight, which we understand to be rare in applications. A
change in the number of clusters when the total weight is large is
unlikely.
Version 4.3.4
2022-01-30
- Created version 4.3.4 from 4.3.3.1
Version 4.3.3.1
2022-01-30
- Removed “LazyData: true” from DESCRIPTION.
2022-01-18
- Updated REFERENCES.bib, CITATION, and README.md.
2020-11-07
- Added table of contents and changed plot colors in vignette
“Tutorial: Optimal univariate clustering.”
- Added table of contents in vignette “Tutorial: Adaptive versus
regular histograms.”
2020-11-04
- Sped up the plot.Ckmeans.1d.dp() function to use histogram-like
vertical lines. Previous versions drew a circle for each point, taking
excessive time for input with a large number of points.
- Updated vignette “Tutorial: Optimal univariate clustering.”
2020-08-30
- Edited DESCRIPTION, CITATION.
2020-07-25
- Created version 4.3.3.1 from 4.3.3
Version 4.3.3
2020-07-21
- Created version 4.3.3 from 4.3.2.1
Version 4.3.2.1 (not publicly released)
2020-07-18
- Updated REFERENCES.bib, CITATION.
- Updated README.md text and badges.
- Updated user manuals.
2020-03-15
- Created version 4.3.2.1 from 4.3.2
- Updated REFERENCES.bib, CITATION, and manual Rd files.
Version 4.3.2
2020-03-13
- Changed the random data in test_MC_WUC.R from uniform integer to
standard normal to avoid examples with multiple optimal solutions, which
occur with integer data.
2020-01-20
- Updated a vignette to illustrate how to find boundaries between
consecutive clusters.
2020-01-03
- Internally, a specialized version for unweighted Euclidean (L2)
distance based univariate clustering is added. In previous versions,
unweighted and weighted multiple metric clustering algorithms were
implemented in a unified code framework, which is good for software
engineering but carries unnecessary overhead. The new specialized
version can speed up the unweighted L2 algorithm by about 20%. This is
perhaps the most popular task, thus benefiting most users. There is no
change in the user interface.
2020-01-02
- Created version 4.3.2 from 4.3.0
- Revised CITATION file
- Revised Readme.md file
Version 4.3.1 (not publicly released)
2019-12-08
- Version 4.3.1 was created.
Version 4.3.0
2019-09-06
- Updated package documentation.
- Introduced NEWS.md in place of the plain-text NEWS.
- Changed the package title from “Optimal and Fast Univariate
Clustering” to “Optimal, Fast, and Reproducible Univariate
Clustering”.
- Created README.md to introduce the package.
2019-09-02
- Added the optimal multi-channel weighted univariate clustering
function, called “MultiChannel.WUC” for short for now. Added related R
document and testthat cases. The example in the “MultiChannel.WUC” R
document illustrates how to run the function.
- Added source files: MCW_main.cpp, MCW_functions.cpp,
MCW_functions.h, MCW_fill_SMAWK.cpp
- Added R file: MultiChannel.WUC.R
- Added Rd file: MultiChannel.WUC.Rd
- Added testthat file: test_MC_WUC.R
- Modified DESCRIPTION file.
- Added two imports in the NAMESPACE file.
Version 4.2.3 (not publicly released)
2018-09-26
- Removed the unnecessary version requirement for Rcpp introduced in the
previous version.
2018-09-24
- Version created to remove typos in DESCRIPTION.
- Updated vignette on weight scaling.
Version 4.2.2
2018-09-21
- Modified the package to use Rcpp interface instead of the old-style
C interface.
Version 4.2.1
2017-06-10
- Added a new vignette “Tutorial: Linear weight scaling in cluster
analysis”.
- Re-organized manuals and updated documentation.
Version 4.2.0
2017-05-29
- Increased the log likelihood calculation to long double precision in
the C++ source file weighted_select_levels.cpp.
- Replaced the std::accumulate() function with for-loop addition in the
C++ source file weighted_select_levels.cpp. This resolved a numerical
overflow issue that occurred when the weight values are large.
- Now R function plotBIC() automatically adjusts the “k*=” text
position, so that the text label is placed entirely within the BIC curve
area and will not extend into the figure margin.
- The cluster size has been changed from integer to double to
accommodate weighted cluster size in both R and C++ code.
- Any weight vector is now forced to equal weights in the new R function
Ckmedian.1d.dp.
- Introduced S3 methods print and plot for Ckmedian.1d.dp and
Cksegs.1d.dp objects.
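The entry on std::accumulate() above does not state why the overflow occurred. One well-known pitfall, offered here only as a possible explanation, is that std::accumulate() accumulates in the type of its initial value, so an int literal forces every partial sum through int. The sketch below uses hypothetical function names and shows the narrowing with small values to avoid undefined behavior; an explicit loop with a wide accumulator avoids the problem.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// std::accumulate uses the type of the initial value as the accumulator.
// With the int literal 0, each partial sum is converted back to int.
double sum_with_int_init(const std::vector<double>& w) {
    return std::accumulate(w.begin(), w.end(), 0);
}

// An explicit loop makes the accumulator type visible and wide.
long double sum_with_loop(const std::vector<double>& w) {
    long double total = 0.0L;
    for (double v : w) total += v;
    return total;
}
```

For weights {0.5, 0.5, 0.5}, the first function returns 0 because each partial sum is truncated to int, while the loop returns 1.5.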
2017-03-18
- Introduced Cksegs.1d.dp() function for k-segments clustering of y
with or without x. Only method=“quadratic” guarantees optimality.
- Expanded k-median clustering to work with all possible methods. Only
unweighted solution guarantees optimality.
Version 4.1.0
2017-03-02
- Introduced function Ckmedian.1d.dp() for k-median unweighted
clustering.
Version 4.0.2
2017-02-16
- Fixed symbol encoding used in NEWS.
- Updated documentation.
Version 4.0.1
2017-02-16
- Fixed a warning message in the use of ‘R_registerRoutines’ and
‘R_useDynamicSymbols’.
- Fixed a memory leak issue: invalid read of size 8.
Version 4.0.0
2017-02-11
- Removed some examples for future use.
Version 3.4.15
- Minor changes.
Version 3.4.14
2017-01-03
- Minor changes in documentation files.
2017-01-02
- Changed package title to “Optimal and Fast Univariate
Clustering”.
2016-12-27
- When the input vector x is empty, function Ckmeans.1d.dp now
generates a warning message instead of stopping with an error.
Ckmeans.1d.dp.R is modified.
- Appropriate warning messages are now printed when input x does not
have an appropriate type. ahist.R is modified.
Version 3.4.13
2016-12-19
- Revised the comparison function in sorting so that the code can be
compiled by C++98, as requested by a user.
2016-12-06
- Expanded the ahist() function to support weighted adaptive
histograms.
2016-12-04
- Expanded the vignette of adaptive histograms to a tutorial.
- Expanded the vignette of optimal univariate k-means clustering to a
tutorial.
- Updated the time course example in the Ckmeans.1d.dp function.
2016-10-22
- Added a vignette to visualize examples of adaptive histograms.
- Added a vignette to visualize examples of optimal univariate k-means
clustering.
2016-10-21
- Added an equal bin width histogram example to contrast with the
adaptive histogram.
2016-10-16
- Moved ahist() function from visualize.R to a new R file
ahist.R.
2016-10-15
- Added a breaks argument to ahist() so as to use the default
graphics::hist() but with the capacity to add sticks to the histogram
generated.
- Added a skip.empty.bin.color argument to ahist() to gain more
control over colors of the histogram bars.
2016-10-12
- Added a data argument to ahist() to provide raw data for
visualization.
- Allow x to ahist() to be an object of the class “Ckmeans.1d.dp” to
avoid recomputing the clustering if it has already been done. This
requires the data for clustering to be provided via the data
argument.
2016-10-11
- Added an argument add.sticks=TRUE to ahist() to turn on or off the
sticks just above the horizontal axis.
- Added an argument style to ahist() for different styles of adaptive
histogram.
2016-10-01
- Added a new function plot() to visualize the clusters.
- Added a new function plotBIC() to show the Bayesian information
criterion as a function of number of clusters.
2016-09-30
- Updated examples of ahist().
- Added sticks to ahist() to show the original input data.
2016-09-27
- Fixed ahist() when there is only a single bin detected.
- Made ahist() run much faster than the previous version.
- Updated previous examples and added more examples to illustrate the
use of ahist() better.
2016-09-25
- Introduced a new function ahist() to generate adaptive histograms
corresponding to the optimal univariate k-means clustering.
2016-09-24
- Known issue: the loglinear option may generate an optimal clustering
different from that of linear and quadratic.
- Updated the default method for estimating the number of clusters k.
The main difference arises when there are duplicates in the data.
Otherwise, the estimated k is the same as in previous versions. Added an
argument estimate.k to function Ckmeans.1d.dp() to use the BIC method of
version 3.4.12 or earlier to estimate k for compatibility.
Version 3.4.12
2016-08-20
- The weighted univariate k-means now runs in \(O(kn)\), down from \(O(kn^2)\). This is a result of integrating
weighted and unweighted k-means clustering into a unified dynamic
programming function without sacrificing performance. This also fixed a
bug in the previous loglinear-time weighted k-means implementation.
Version 3.4.9
2016-07-19
- If the input array is already sorted, sorting is not performed
again.
- Added an option method to select either the linear or loglinear
algorithm.
2016-07-16
- Implemented linear recursive algorithm based on the method described
in (Aggarwal et al., 1987)
Version 3.4.8
2016-06-01
Implemented an iterative O(nlgn+kn) algorithm. This version
completely eliminates the involved divide-and-conquer strategy reported
in the literature and further reduced the overhead.
This implementation was later determined to be incorrect.
Version 3.4.7
2016-05-30
Implemented an O(nlgn+kn) algorithm combining divide-and-conquer
and dynamic programming. The space is still O(kn). The runtime is now
practical for very large sample sizes for any number of clusters.
This implementation was later determined to be incorrect.
Version 3.4.6
2016-05-25
- Implemented an O(kn lg n) algorithm, speeding up the program
greatly.
Version 3.4.5
2016-05-22
- \(s[j,i]\) is now computed in
constant time based on pre-computed sums of input x and its squares from
0 to i.
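The constant-time computation can be sketched with prefix sums. This sketch assumes that \(s[j,i]\) denotes the sum of squared deviations of x[j..i] from their mean, which the entry does not spell out; under that assumption \(s[j,i] = \sum_{t=j}^{i} x_t^2 - \left(\sum_{t=j}^{i} x_t\right)^2 / (i-j+1)\). The struct name PrefixSums and its members are hypothetical, not the package's identifiers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustration only: prefix sums of x and x^2 give any s[j,i] in O(1).
struct PrefixSums {
    std::vector<long double> sum_x;   // sum_x[i]  = x[0] + ... + x[i]
    std::vector<long double> sum_x2;  // sum_x2[i] = x[0]^2 + ... + x[i]^2

    explicit PrefixSums(const std::vector<double>& x)
        : sum_x(x.size()), sum_x2(x.size()) {
        long double s = 0.0L, s2 = 0.0L;
        for (std::size_t i = 0; i < x.size(); ++i) {
            s += x[i];
            s2 += static_cast<long double>(x[i]) * x[i];
            sum_x[i] = s;
            sum_x2[i] = s2;
        }
    }

    // Sum of squared deviations of x[j..i] (0-based, j <= i) from their mean.
    long double ssq(std::size_t j, std::size_t i) const {
        const long double n = static_cast<long double>(i - j + 1);
        const long double s = sum_x[i] - (j > 0 ? sum_x[j - 1] : 0.0L);
        const long double s2 = sum_x2[i] - (j > 0 ? sum_x2[j - 1] : 0.0L);
        const long double d = s2 - s * s / n;
        return d > 0.0L ? d : 0.0L;  // clamp tiny negative rounding error
    }
};
```

Building the two arrays takes one pass over x, after which each query costs two subtractions and one division regardless of the range length.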
Version 3.4.4
2016-05-22
- Incorporated a numerically stable method for computing sample
variance when selecting the number of clusters.
- Improved documentation.
- Removed a typo in describing time complexity.
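The entry on numerically stable variance does not name the method adopted. As an illustration of what such a method can look like, the sketch below uses Welford's one-pass algorithm, a standard choice that avoids subtracting two large, nearly equal sums. The function name sample_variance is hypothetical and this is not necessarily the method the package uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Welford's one-pass algorithm for the unbiased sample variance.
// Returns 0 for fewer than two observations.
double sample_variance(const std::vector<double>& x) {
    if (x.size() < 2) return 0.0;
    double mean = 0.0;  // running mean of the values seen so far
    double m2 = 0.0;    // running sum of squared deviations from the mean
    std::size_t n = 0;
    for (double v : x) {
        ++n;
        const double delta = v - mean;
        mean += delta / static_cast<double>(n);
        m2 += delta * (v - mean);  // uses the updated mean
    }
    return m2 / static_cast<double>(n - 1);
}
```

For inputs such as 1e9+4, 1e9+7, 1e9+13, 1e9+16, the naive formula based on the sum of squares loses most of its significant digits, whereas this update keeps the intermediate quantities small.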
2016-05-18
- The Ckmeans.1d.dp() function now returns “totss”, “tot.withinss”, and
“betweenss” statistics to summarize the optimal clustering
obtained.
- print.Ckmeans.1d.dp() now prints the above statistics.
Version 3.4.3
2016-05-15
- Upgraded to support C++11.
- Introduced optimal k-means clustering for weighted data
Version 3.4.2
2016-05-14
- Implemented backward filling of the dynamic programming matrix to
utilize lower bounds for the optimal cluster boundary. This step
reduced the runtime by at least half (two or more times faster).
2016-05-07
- Implemented mathematically proven tighter ranges when searching for
cluster boundaries. The runtime of the function is greatly reduced. Most
notably, the runtime is roughly constant as the number of clusters
increases beyond k=2.
- Integrated all test cases into one single file.
Version 3.4.0
2016-05-07
- Substantial runtime reduction. Added code to check for an upper
bound for the sum of within-cluster squared distances. This reduced the
runtime by half when clustering 100000 points (from the standard normal
distribution) into 10 clusters.
- Eliminated the unnecessary calculation of (n-1) elements in the
dynamic programming matrix that are not needed for the final result.
This resulted in an enormous reduction in runtime when the number of
clusters is 2: assigning one million points into two clusters took half
a second on an iMac with a 2.93 GHz Intel Core i7 processor.
- Included a reference to the first description of the dynamic
programming solution by Richard Bellman (1973).
Version 3.3.3
2016-05-03
- Fixed a bug on cluster assignment when there is only one cluster.
This was a bug introduced in version 3.3.2.
Version 3.3.2
2016-05-03
- Added automatic test cases.
- Removed an incorrect warning message when the number of clusters is
equal to the number of unique elements in the input vector.
- Changed from 1-based to 0-based C implementation.
- Optimized the code by reducing overhead. This gave a 22% reduction in
runtime when repeatedly clustering seven points into two clusters one
million times.
Version 3.3.1
2015-02-10
- Fixed a problem that prevented Windows compilation (the size_t type
is now forced to unsigned long in the max() function).
Version 3.3.0
2015-02-09
- Added automated test cases into the package.
- Changed the code to not issue a warning message when the number of
clusters is estimated to be 1.
- When the lower bound of the number of clusters is greater than the
number of unique elements in the input vector, both the min and max
numbers of clusters are set to the number of unique input values.
- When the upper bound of the number of clusters is greater than the
unique number of elements in the input vector, the max number of
clusters is set to the number of unique elements in the input
vector.
- Use warning() instead of cat() to display warning messages.
- Incorporate changes suggested by a user to speed up the code.
- Revised the examples and documentation to improve usability of the
package in general.
- Started the NEWS file.
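The two rules above for adjusting the range of the number of clusters can be sketched as follows. The helper name clamp_k_range and its signature are hypothetical and written only to mirror the wording of the entries; it is not the package's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustration only: clamp the requested range [k_min, k_max] to the
// number of unique values in x, following the two rules described above.
// x is taken by value so the caller's data is left untouched.
std::pair<std::size_t, std::size_t>
clamp_k_range(std::vector<double> x, std::size_t k_min, std::size_t k_max) {
    std::sort(x.begin(), x.end());
    const std::size_t n_unique = static_cast<std::size_t>(
        std::unique(x.begin(), x.end()) - x.begin());

    // Lower bound too large: both bounds collapse to the unique count.
    if (k_min > n_unique) return {n_unique, n_unique};
    // Upper bound too large: only the upper bound is reduced.
    if (k_max > n_unique) k_max = n_unique;
    return {k_min, k_max};
}
```

With x = {1, 1, 2, 3}, which has three unique values, a requested range of 5 to 8 becomes 3 to 3, and a requested range of 2 to 8 becomes 2 to 3.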
Version 3.02
2014-03-24 and earlier
- The program now automatically determines the number of clusters from
a given range.
- The code is optimized for further speedup.
Version 1.0
2010-10-26. Version 1.0 is released to CRAN.