Getting Started with NNS: Clustering and Regression

Fred Viole

Clustering and Regression

Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for a more thorough description and definition, please view the References.

NNS Partitioning NNS.part

NNS.part is both a partitional and hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants and assigns a quadrant identification at each partition.

NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.

library(NNS)

x=seq(-5,5,.1); y=x^3

NNS.part(x,y,Voronoi = TRUE)

## $dt
##         x        y quadrant
##   1: -5.0 -125.000     q444
##   2: -4.9 -117.649     q444
##   3: -4.8 -110.592     q444
##   4: -4.7 -103.823     q444
##   5: -4.6  -97.336     q444
##  ---                       
##  97:  4.6   97.336     q111
##  98:  4.7  103.823     q111
##  99:  4.8  110.592     q111
## 100:  4.9  117.649     q111
## 101:  5.0  125.000     q111
## 
## $regression.points
##    quadrant     x        y
## 1:      q44 -4.10 -72.6110
## 2:      q42 -2.80 -22.2880
## 3:      q41 -1.20  -3.6000
## 4:      q14  1.30   4.2250
## 5:      q13  2.85  23.3985
## 6:      q11  4.10  72.6110
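
As a rough illustration of what a single partition step does, the base R sketch below splits the data at the means of \(x\) and \(y\) and computes the mean of each resulting quadrant. The helper partition_once, the choice of the mean as the split point, and the quadrant numbering (1 when both values are above their means, 4 when both are at or below) are assumptions made for illustration and inferred from the output above. This is not the NNS.part implementation, which repeats the split inside each quadrant.

```r
# Illustrative only: one mean-based split into four quadrants (base R, no NNS needed).
x <- seq(-5, 5, 0.1)
y <- x^3

# Hypothetical helper, not part of the NNS package.
partition_once <- function(x, y) {
  x_bar <- mean(x)
  y_bar <- mean(y)
  # Assumed numbering: 1 = both above, 2 = x at/below and y above,
  # 3 = x above and y at/below, 4 = both at/below their means.
  quadrant <- ifelse(x >  x_bar & y >  y_bar, 1L,
              ifelse(x <= x_bar & y >  y_bar, 2L,
              ifelse(x >  x_bar & y <= y_bar, 3L, 4L)))
  d <- data.frame(x = x, y = y, quadrant = quadrant)
  # Quadrant means play the role of regression points in this sketch.
  means <- aggregate(cbind(x, y) ~ quadrant, data = d, FUN = mean)
  list(quadrant = quadrant, means = means)
}

p <- partition_once(x, y)
print(p$means)
```

Because \(y = x^3\) is increasing and centered at the origin, nearly every observation lands in quadrant 1 or quadrant 4, which matches the q1... and q4... labels in the output above.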

X-only Partitioning

NNS.part also offers partitioning based on \(x\) values only, selected with type="XONLY". This variant uses the entire bandwidth in its regression point derivation and shares the same limit condition as partitioning on both \(x\) and \(y\) values.

NNS.part(x,y,Voronoi = TRUE,type="XONLY")

## $dt
##         x        y quadrant
##   1: -5.0 -125.000    q1111
##   2: -4.9 -117.649    q1111
##   3: -4.8 -110.592    q1111
##   4: -4.7 -103.823    q1111
##   5: -4.6  -97.336    q1111
##  ---                       
##  97:  4.6   97.336    q2222
##  98:  4.7  103.823    q2222
##  99:  4.8  110.592    q2222
## 100:  4.9  117.649    q2222
## 101:  5.0  125.000    q2222
## 
## $regression.points
##    quadrant     x       y
## 1:     q111 -4.40 -87.032
## 2:     q112 -3.10 -31.093
## 3:     q121 -1.80  -6.588
## 4:     q122 -0.55  -0.363
## 5:     q211  0.65   0.507
## 6:     q212  1.90   7.657
## 7:     q221  3.15  32.382
## 8:     q222  4.40  87.032
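
The idea of splitting on \(x\) alone can be sketched in base R as follows. The helper xonly_partition and its labels (1 for the lower half of a cell, 2 for the upper half) are hypothetical and chosen to resemble the output above. The sketch assumes each cell is split at its own mean of \(x\), so its cell boundaries and means need not match the NNS.part results exactly.

```r
# Illustrative only: repeated mean splits on x alone (base R, no NNS needed).
# Hypothetical helper, not part of the NNS package.
xonly_partition <- function(x, order = 3) {
  label <- rep("q", length(x))
  for (i in seq_len(order)) {
    # Mean of x within each current cell, aligned with the observations.
    cell_mean <- ave(x, label, FUN = mean)
    # Append 1 for the lower part of the cell, 2 for the upper part.
    label <- paste0(label, ifelse(x <= cell_mean, 1, 2))
  }
  label
}

x <- seq(-5, 5, 0.1)
y <- x^3

cells <- xonly_partition(x, order = 3)
d <- data.frame(x = x, y = y, cells = cells)

# Cell means of x and y, analogous to regression points.
reg_points <- aggregate(cbind(x, y) ~ cells, data = d, FUN = mean)
print(reg_points)
```

Three rounds of splitting produce eight cells, the same count as the eight regression points listed above.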

Clusters Used in Regression

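The quadrant means from each partition order serve as the regression points for NNS.reg at the same order:

for(i in 1:3){NNS.part(x,y,order=i,Voronoi = TRUE); NNS.reg(x,y,order=i)}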

NNS Regression NNS.reg

NNS.reg can fit any \(f(x)\), for both uni- and multivariate cases. NNS.reg returns a self-explanatory list of values, shown below.

Univariate:

NNS.reg(x,y,order=4,noise.reduction = 'off')

## $R2
## [1] 0.9975512
## 
## $MSE
## [1] 0.2216172
## 
## $Prediction.Accuracy
## [1] 0.04950495
## 
## $equation
## NULL
## 
## $Segments
## [1] 17
## 
## $derivative
##     Coefficient X.Lower.Range X.Upper.Range
##  1:      66.860         -5.00         -4.60
##  2:      58.670         -4.60         -4.10
##  3:      43.090         -4.10         -3.60
##  4:      33.860         -3.60         -3.00
##  5:      25.540         -3.00         -2.80
##  6:      21.620         -2.80         -2.60
##  7:      15.380         -2.60         -2.00
##  8:       9.060         -2.00         -1.35
##  9:       2.685         -1.35         -0.55
## 10:       0.725         -0.55          0.65
## 11:       3.245          0.65          1.45
## 12:      10.120          1.45          2.10
## 13:      16.760          2.10          2.70
## 14:      24.410          2.70          3.00
## 15:      34.570          3.00          3.65
## 16:      51.290          3.65          4.60
## 17:      66.860          4.60          5.00
## 
## $Point
## NULL
## 
## $Point.est
## numeric(0)
## 
## $regression.points
##         x         y
##  1: -5.00 -125.0000
##  2: -4.60  -98.2560
##  3: -4.10  -68.9210
##  4: -3.60  -47.3760
##  5: -3.00  -27.0600
##  6: -2.80  -21.9520
##  7: -2.60  -17.6280
##  8: -2.00   -8.4000
##  9: -1.35   -2.5110
## 10: -0.55   -0.3630
## 11:  0.65    0.5070
## 12:  1.45    3.1030
## 13:  2.10    9.6810
## 14:  2.70   19.7370
## 15:  3.00   27.0600
## 16:  3.65   49.5305
## 17:  4.60   98.2560
## 18:  5.00  125.0000
## 
## $partition
##         x        y quadrant
##   1: -5.0 -125.000    q4444
##   2: -4.9 -117.649    q4444
##   3: -4.8 -110.592    q4444
##   4: -4.7 -103.823    q4444
##   5: -4.6  -97.336    q4442
##  ---                       
##  97:  4.6   97.336    q1113
##  98:  4.7  103.823    q1111
##  99:  4.8  110.592    q1111
## 100:  4.9  117.649    q1111
## 101:  5.0  125.000    q1111
## 
## $Fitted
##   [1] -125.00000 -118.31400 -111.62800 -104.94200  -98.25600  -92.38900
##   [7]  -86.52200  -80.65500  -74.78800  -68.92100  -64.61200  -60.30300
##  [13]  -55.99400  -51.68500  -47.37600  -43.99000  -40.60400  -37.21800
##  [19]  -33.83200  -30.44600  -27.06000  -24.50600  -21.95200  -19.79000
##  [25]  -17.62800  -16.09000  -14.55200  -13.01400  -11.47600   -9.93800
##  [31]   -8.40000   -7.49400   -6.58800   -5.68200   -4.77600   -3.87000
##  [37]   -2.96400   -2.37675   -2.10825   -1.83975   -1.57125   -1.30275
##  [43]   -1.03425   -0.76575   -0.49725   -0.32675   -0.25425   -0.18175
##  [49]   -0.10925   -0.03675    0.03575    0.10825    0.18075    0.25325
##  [55]    0.32575    0.39825    0.47075    0.66925    0.99375    1.31825
##  [61]    1.64275    1.96725    2.29175    2.61625    2.94075    3.60900
##  [67]    4.62100    5.63300    6.64500    7.65700    8.66900    9.68100
##  [73]   11.35700   13.03300   14.70900   16.38500   18.06100   19.73700
##  [79]   22.17800   24.61900   27.06000   30.51700   33.97400   37.43100
##  [85]   40.88800   44.34500   47.80200   52.09500   57.22400   62.35300
##  [91]   67.48200   72.61100   77.74000   82.86900   87.99800   93.12700
##  [97]   98.25600  104.94200  111.62800  118.31400  151.74400
## 
## $Fitted.xy
##         x        y    y.hat
##   1: -5.0 -125.000 -125.000
##   2: -4.9 -117.649 -118.314
##   3: -4.8 -110.592 -111.628
##   4: -4.7 -103.823 -104.942
##   5: -4.6  -97.336  -98.256
##  ---                       
##  97:  4.6   97.336   98.256
##  98:  4.7  103.823  104.942
##  99:  4.8  110.592  111.628
## 100:  4.9  117.649  118.314
## 101:  5.0  125.000  151.744
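
The relationship between $regression.points, $derivative and $Fitted in the output above can be checked by hand. The sketch below copies the first four regression points from that output, computes the slope of each connecting segment, and linearly interpolates between the points with base R's approx. It only reproduces the arithmetic and makes no claim about how NNS.reg computes these values internally.

```r
# First four rows of $regression.points, copied from the output above.
rp <- data.frame(
  x = c(-5.00, -4.60, -4.10, -3.60),
  y = c(-125.000, -98.256, -68.921, -47.376)
)

# Slope of each line segment between consecutive regression points.
slopes <- diff(rp$y) / diff(rp$x)
print(round(slopes, 3))   # 66.86 58.67 43.09, the first three $derivative coefficients

# Linear interpolation along the segments at two of the original x values.
fit <- approx(rp$x, rp$y, xout = c(-4.9, -4.5))$y
print(round(fit, 3))      # -118.314 -92.389, matching $Fitted entries 2 and 6
```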

Multivariate:

f= function(x,y) x^3+3*y-y^3-3*x
y=x; z=expand.grid(x,y)
g=f(z[,1],z[,2])
NNS.reg(z,g,order='max')

Inter/Extrapolation

NNS.reg can interpolate or extrapolate to any point of interest. The point.est parameter, NNS.reg(x,y,point.est=...), accepts data of any size with the same number of columns as \(x\), and the estimates are retrieved with $Point.est.

NNS.reg(iris[,1:4],iris[,5],point.est=iris[1:10,1:4])$Point.est

##  [1] 1 1 1 1 1 1 1 1 1 1
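
To read the estimates above, it helps to recall that iris[,5] is a factor. The base R snippet below shows the integer codes behind its levels. It assumes, without confirmation from the source, that the point estimates are expressed on this integer scale, in which case a value of 1 corresponds to the first level.

```r
# Levels of the response and the integer code attached to each one.
species <- iris[, 5]
codes <- setNames(seq_along(levels(species)), levels(species))
print(codes)

# Integer codes of the first ten observations, the rows passed to point.est above.
first_ten <- as.integer(species[1:10])
print(first_ten)
```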

NNS Dimension Reduction Regression

NNS.reg also provides a dimension reduction regression via the parameter NNS.reg(x,y,type="CLASS"), which reduces all regressors to a single dimension using the returned equation $equation.

NNS.reg(iris[,1:4],iris[,5],type = "CLASS")$equation

## [1] "Synthetic Independent Variable X* = (0.3480*X1  0.3525*X2  0.3769*X3  0.3899*X4)/4"

Threshold

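NNS.reg(x,y,type="CLASS",threshold=...) reduces the regressors further by setting the minimum absolute correlation a regressor must have to be retained. In the example below, X1 (0.3480) falls under the threshold of .35, so its coefficient becomes zero and the denominator drops from 4 to 3.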

NNS.reg(iris[,1:4],iris[,5],type = "CLASS",threshold=.35)$equation

## [1] "Synthetic Independent Variable X* = (0.0000*X1  0.3525*X2  0.3769*X3  0.3899*X4)/3"
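
The returned equation can be applied by hand to see what the synthetic variable looks like. The sketch below copies the four coefficients from the output above, zeroes those under the threshold, and forms \(X^*\) for every row of iris. It assumes the terms in the printed equation are added together and that the denominator is the number of retained regressors, as the /4 and /3 in the two outputs suggest. The variable names are illustrative.

```r
# Coefficients copied from the $equation output above (X1..X4).
w <- c(0.3480, 0.3525, 0.3769, 0.3899)
threshold <- 0.35

# Zero out regressors whose absolute coefficient is under the threshold.
w_kept <- ifelse(abs(w) >= threshold, w, 0)

# Assumed form: X* = (sum of weighted regressors) / (number of retained regressors).
X <- as.matrix(iris[, 1:4])
x_star <- as.vector(X %*% w_kept) / sum(w_kept != 0)

print(w_kept)
print(head(x_star))
```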

The point.est=... parameter operates in the same manner as in the full regression above, and its estimates are again retrieved with $Point.est.

NNS.reg(iris[,1:4],iris[,5],type = "CLASS",threshold=.35,point.est=iris[1:10,1:4])$Point.est

##  [1] 1.024040 1.000000 1.000000 1.000000 1.058668 1.350253 1.027714
##  [8] 1.026445 1.000000 1.000000

References

If the user is so motivated, detailed arguments and further examples are provided within the following:

* Nonlinear Nonparametric Statistics: Using Partial Moments

* Deriving Nonlinear Correlation Coefficients from Partial Moments

* New Nonparametric Curve-Fitting Using Partitioning, Regression and Partial Derivative Estimation

* Clustering and Curve Fitting by Line Segments

* Classification Using NNS Clustering Analysis