Sampling

Simple Random Sampling

The objective of this example is to survey an area of 46.8 ha using the simple random sampling method. The target error is 20%. Ten plots of 3000 m² each were measured in a pilot inventory. The data collected are shown below:

data_acs_pilot
#>    TOTAL_AREA PLOT_AREA VWB  VWB_m3ha
#> 1        46.8      3000  41 136.66667
#> 2        46.8      3000  33 110.00000
#> 3        46.8      3000  24  80.00000
#> 4        46.8      3000  31 103.33333
#> 5        46.8      3000  10  33.33333
#> 6        46.8      3000  32 106.66667
#> 7        46.8      3000  62 206.66667
#> 8        46.8      3000  16  53.33333
#> 9        46.8      3000  66 220.00000
#> 10       46.8      3000  25  83.33333
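
The per-hectare column can be checked by hand. The sketch below rebuilds the pilot table in base R (the name `pilot` is only illustrative) and assumes that VWB_m3ha is the plot volume scaled from the plot area in square meters to one hectare (10000 m²), which matches the values printed above.

```r
# Rebuild the pilot data shown above (values copied from the printed table)
pilot <- data.frame(
  TOTAL_AREA = 46.8,   # total area, in hectares
  PLOT_AREA  = 3000,   # plot area, in square meters
  VWB        = c(41, 33, 24, 31, 10, 32, 62, 16, 66, 25)
)

# Assumed conversion: scale the plot volume to one hectare (10000 m2)
pilot$VWB_m3ha <- pilot$VWB * 10000 / pilot$PLOT_AREA

round(pilot$VWB_m3ha, 5)
```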

Now we’ll calculate the inventory variables for a 20% error, considering a finite population, with the sprs function. The plot area must be given in square meters, and the total area in hectares:

sprs(data_acs_pilot, "VWB", 3000, 46.8, error = 20, pop = "fin")
#>                                        Variables    Values
#> 1              Total number of sampled plots (n)   10.0000
#> 2                    Number of maximum plots (N)  156.0000
#> 3                      Variance Quoeficient (VC)   53.2670
#> 4                                      t-student    2.2622
#> 5                         recalculated t-student    2.0452
#> 6  Number of samples regarding the admited error   25.0000
#> 7                                  Variance (S2)  328.0000
#> 8                         Standard deviation (s)   18.1108
#> 9                                       Mean (Y)   34.0000
#> 10               Standard error of the mean (Sy)    5.5405
#> 11                                Absolute Error   12.5335
#> 12                            Relative Error (%)   36.8634
#> 13                  Estimated Total Value (Yhat) 5304.0000
#> 14                                   Total Error 1955.2326
#> 15             Inferior Confidence Interval (m3)   21.4665
#> 16             Superior Confidence Interval (m3)   46.5335
#> 17          Inferior Confidence Interval (m3/ha)   71.5549
#> 18          Superior Confidence Interval (m3/ha)  155.1118
#> 19       inferior Total Confidence Interval (m3) 3348.7674
#> 20       Superior Total Confidence Interval (m3) 7259.2326
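
The main rows of this table can be reproduced with a few lines of base R. This is only a hand check, not the internals of sprs: it assumes a 95% confidence level (`alpha = 0.05`) and the usual finite population correction, which together match the printed values. All variable names are illustrative.

```r
# Pilot volumes per plot (m3), copied from data_acs_pilot
vwb <- c(41, 33, 24, 31, 10, 32, 62, 16, 66, 25)

plot_area  <- 3000   # square meters
total_area <- 46.8   # hectares
alpha      <- 0.05   # assumed significance level (95% confidence)

n      <- length(vwb)
N      <- total_area * 10000 / plot_area   # maximum number of plots
y_mean <- mean(vwb)
s2     <- var(vwb)
cv     <- sqrt(s2) / y_mean * 100          # coefficient of variation (%)

# Standard error of the mean with the finite population correction
se <- sqrt(s2 / n * (1 - n / N))

t_val     <- qt(1 - alpha / 2, df = n - 1)
abs_error <- t_val * se
rel_error <- abs_error / y_mean * 100
y_total   <- y_mean * N                    # estimated total volume (m3)

round(c(N = N, CV = cv, Sy = se, t = t_val,
        abs_error = abs_error, rel_error = rel_error, Yhat = y_total), 4)
```

The confidence limits per hectare follow from the same scaling used for the plots, for example `(y_mean - abs_error) * 10000 / plot_area`.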

These results show that 25 plots are required to meet the desired error, so 15 more plots must be measured in addition to the 10 pilot plots. After a new survey, these are the new data:

data_acs_def
#>    TOTAL_AREA PLOT_AREA VWB
#> 1        46.8      3000  41
#> 2        46.8      3000  33
#> 3        46.8      3000  24
#> 4        46.8      3000  31
#> 5        46.8      3000  10
#> 6        46.8      3000  32
#> 7        46.8      3000  62
#> 8        46.8      3000  16
#> 9        46.8      3000  66
#> 10       46.8      3000  25
#> 11       46.8      3000  44
#> 12       46.8      3000   7
#> 13       46.8      3000  57
#> 14       46.8      3000  22
#> 15       46.8      3000  31
#> 16       46.8      3000  40
#> 17       46.8      3000  43
#> 18       46.8      3000  27
#> 19       46.8      3000  17
#> 20       46.8      3000  50
#> 21       46.8      3000  38
#> 22       46.8      3000  20
#> 23       46.8      3000  35
#> 24       46.8      3000  31
#> 25       46.8      3000  26

Now the definitive inventory can be done:

sprs(data_acs_def, "VWB", 3000, 46.8, error = 20, pop = "fin")
#>                                        Variables    Values
#> 1              Total number of sampled plots (n)   25.0000
#> 2                    Number of maximum plots (N)  156.0000
#> 3                      Variance Quoeficient (VC)   45.4600
#> 4                                      t-student    2.0639
#> 5                         recalculated t-student    2.0930
#> 6  Number of samples regarding the admited error   20.0000
#> 7                                  Variance (S2)  226.6933
#> 8                         Standard deviation (s)   15.0563
#> 9                                       Mean (Y)   33.1200
#> 10               Standard error of the mean (Sy)    2.7595
#> 11                                Absolute Error    5.6952
#> 12                            Relative Error (%)   17.1957
#> 13                  Estimated Total Value (Yhat) 5166.7200
#> 14                                   Total Error  888.4555
#> 15             Inferior Confidence Interval (m3)   27.4248
#> 16             Superior Confidence Interval (m3)   38.8152
#> 17          Inferior Confidence Interval (m3/ha)   91.4159
#> 18          Superior Confidence Interval (m3/ha)  129.3841
#> 19       inferior Total Confidence Interval (m3) 4278.2645
#> 20       Superior Total Confidence Interval (m3) 6055.1755

The relative error is now 17.2%, below the 20% target, so the desired error was met.
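
The "Number of samples regarding the admited error" rows (25 for the pilot data, 20 for the definitive data) are consistent with the usual finite-population sample size formula applied twice: once with the t value of the measured sample, and once with a t value recalculated from the first result. The sketch below is an assumption about the procedure that happens to reproduce the printed numbers, not the sprs source code; `required_plots` and `cv` are hypothetical helpers.

```r
# Hypothetical helper: plots required for a target relative error.
# cv and error are percentages; N is the maximum number of plots.
required_plots <- function(cv, n, N, error, alpha = 0.05) {
  size <- function(t_val) {
    ceiling(t_val^2 * cv^2 / (error^2 + t_val^2 * cv^2 / N))
  }
  n1 <- size(qt(1 - alpha / 2, df = n - 1))  # first pass, t from the sample
  t2 <- qt(1 - alpha / 2, df = n1 - 1)       # recalculated t
  c(t_recalculated = round(t2, 4), plots = size(t2))
}

# Coefficient of variation in percent
cv <- function(x) sd(x) / mean(x) * 100

# Volumes copied from data_acs_pilot and data_acs_def
vwb_pilot <- c(41, 33, 24, 31, 10, 32, 62, 16, 66, 25)
vwb_def   <- c(vwb_pilot, 44, 7, 57, 22, 31, 40, 43, 27, 17, 50,
               38, 20, 35, 31, 26)

res_pilot <- required_plots(cv(vwb_pilot), length(vwb_pilot), N = 156, error = 20)
res_def   <- required_plots(cv(vwb_def), length(vwb_def), N = 156, error = 20)

res_pilot
res_def
```

With 25 plots measured and 20 required, no further sampling is needed.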

The area values can also be supplied as column names:

sprs(data_acs_def, "VWB", "PLOT_AREA", "TOTAL_AREA", 
     error = 20, pop = "fin")
#>                                        Variables    Values
#> 1              Total number of sampled plots (n)   25.0000
#> 2                    Number of maximum plots (N)  156.0000
#> 3                      Variance Quoeficient (VC)   45.4600
#> 4                                      t-student    2.0639
#> 5                         recalculated t-student    2.0930
#> 6  Number of samples regarding the admited error   20.0000
#> 7                                  Variance (S2)  226.6933
#> 8                         Standard deviation (s)   15.0563
#> 9                                       Mean (Y)   33.1200
#> 10               Standard error of the mean (Sy)    2.7595
#> 11                                Absolute Error    5.6952
#> 12                            Relative Error (%)   17.1957
#> 13                  Estimated Total Value (Yhat) 5166.7200
#> 14                                   Total Error  888.4555
#> 15             Inferior Confidence Interval (m3)   27.4248
#> 16             Superior Confidence Interval (m3)   38.8152
#> 17          Inferior Confidence Interval (m3/ha)   91.4159
#> 18          Superior Confidence Interval (m3/ha)  129.3841
#> 19       inferior Total Confidence Interval (m3) 4278.2645
#> 20       Superior Total Confidence Interval (m3) 6055.1755

It’s also possible to run several simple random sampling inventories at once. To demonstrate this, we’ll use the example dataset for stratified sampling, but compute simple random sampling statistics. We still use the sprs function, with the .groups argument, to run one inventory per stratum:

sprs(data_ace_def, "VWB", "PLOT_AREA", "STRATA_AREA",
     .groups = "STRATA", error = 20, pop = "fin")
#>                                        Variables  STRATA1   STRATA2   STRATA3
#> 1              Total number of sampled plots (n)  14.0000   20.0000   23.0000
#> 2                    Number of maximum plots (N) 144.0000  164.0000  142.0000
#> 3                      Variance Quoeficient (VC)  24.4785   15.8269   16.7813
#> 4                                      t-student   2.1604    2.0930    2.0739
#> 5                         recalculated t-student   2.4469    4.3027    4.3027
#> 6  Number of samples regarding the admited error   9.0000   11.0000   12.0000
#> 7                                  Variance (S2)   2.1829    3.6161    5.3192
#> 8                         Standard deviation (s)   1.4774    1.9016    2.3063
#> 9                                       Mean (Y)   6.0357   12.0150   13.7435
#> 10               Standard error of the mean (Sy)   0.3752    0.3984    0.4402
#> 11                                Absolute Error   0.8105    0.8339    0.9130
#> 12                            Relative Error (%)  13.4288    6.9409    6.6431
#> 13                  Estimated Total Value (Yhat) 869.1429 1970.4600 1951.5739
#> 14                                   Total Error 116.7157  136.7670  129.6455
#> 15             Inferior Confidence Interval (m3)   5.2252   11.1811   12.8305
#> 16             Superior Confidence Interval (m3)   6.8462   12.8489   14.6565
#> 17          Inferior Confidence Interval (m3/ha)  52.2519  111.8105  128.3048
#> 18          Superior Confidence Interval (m3/ha)  68.4624  128.4895  146.5647
#> 19       inferior Total Confidence Interval (m3) 752.4271 1833.6930 1821.9284
#> 20       Superior Total Confidence Interval (m3) 985.8586 2107.2270 2081.2194
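
To illustrate what running one inventory per group means, the base R sketch below splits the volumes by stratum and computes the relative error of each group separately. It is not how .groups is implemented; it assumes a 95% confidence level and a finite population, and `one_stratum` is a hypothetical helper. The volumes are copied from the data_ace_def table listed in the Stratified Random Sampling section.

```r
# Volumes (m3 per plot) copied from data_ace_def, in row order
vwb <- c(
  7.90, 3.80, 4.40, 6.25, 5.55, 8.10, 6.10, 6.60, 7.40, 5.35,
  5.90, 4.65, 4.25, 8.25,                                        # stratum 1
  10.20, 15.25, 13.40, 13.60, 14.20, 9.85, 10.20, 11.55, 9.25, 11.30,
  13.95, 12.70, 10.15, 14.90, 10.80, 11.55, 13.90, 9.20, 12.45, 11.90,  # stratum 2
  10.65, 12.15, 14.60, 10.90, 16.55, 17.90, 13.35, 14.90, 9.70, 15.20,
  13.45, 12.40, 14.45, 13.55, 12.30, 15.65, 14.20, 17.80, 14.80, 9.35,
  12.60, 13.80, 15.85                                            # stratum 3
)
strata      <- rep(1:3, times = c(14, 20, 23))
strata_area <- c(14.4, 16.4, 14.2)   # hectares, one value per stratum
plot_area   <- 1000                  # square meters

# Hypothetical helper: simple random sampling statistics for one group
one_stratum <- function(y, area_ha) {
  n  <- length(y)
  N  <- area_ha * 10000 / plot_area
  se <- sqrt(var(y) / n * (1 - n / N))   # finite population correction
  abs_error <- qt(0.975, df = n - 1) * se
  c(n = n, N = N, mean = mean(y), rel_error = abs_error / mean(y) * 100)
}

# One column per stratum, one row per statistic
by_stratum <- mapply(one_stratum, split(vwb, strata), strata_area)
round(by_stratum, 4)
```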

Stratified Random Sampling

The objective of this example is to survey an area using the stratified random sampling method. The area was divided into 3 strata: one with 14.4 ha and 7 plots, another with 16.4 ha and 8 plots, and another with 14.2 ha and 7 plots. The plots have an area of 1000 square meters. In total, 22 plots were sampled for the pilot inventory. The data is shown below:

data_ace_pilot
#>    STRATA STRATA_AREA PLOT_AREA   VWB VWB_m3ha
#> 1       1        14.4      1000  7.90     79.0
#> 2       1        14.4      1000  3.80     38.0
#> 3       1        14.4      1000  4.40     44.0
#> 4       1        14.4      1000  6.25     62.5
#> 5       1        14.4      1000  5.55     55.5
#> 6       1        14.4      1000  8.10     81.0
#> 7       1        14.4      1000  6.10     61.0
#> 8       2        16.4      1000 10.20    102.0
#> 9       2        16.4      1000 15.25    152.5
#> 10      2        16.4      1000 13.40    134.0
#> 11      2        16.4      1000 13.60    136.0
#> 12      2        16.4      1000 14.20    142.0
#> 13      2        16.4      1000  9.85     98.5
#> 14      2        16.4      1000 10.20    102.0
#> 15      2        16.4      1000 11.55    115.5
#> 16      3        14.2      1000 10.65    106.5
#> 17      3        14.2      1000 12.15    121.5
#> 18      3        14.2      1000 14.60    146.0
#> 19      3        14.2      1000 10.90    109.0
#> 20      3        14.2      1000 16.55    165.5
#> 21      3        14.2      1000 17.90    179.0
#> 22      3        14.2      1000 13.35    133.5

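We’ll calculate the statistics with a target error of 5%, considering a finite population, using the strs function. Area values can be supplied as numeric vectors or as column names. The plot area must be in square meters, and the strata areas in hectares. Note that the call below passes a plot area of 3000 m², whereas the PLOT_AREA column of data_ace_pilot lists 1000 m²; the output reflects the value passed in the call: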

strs(data_ace_pilot, "VWB", 3000, c(14.4, 16.4, 14.2), 
     strata = "STRATA", error = 5, pop = "fin")
#> $Table1
#>                                             Variables  STRATA 1  STRATA 2
#> 1                                         STRATA_AREA   14.4000   16.4000
#> 2                                           Plot Area 3000.0000 3000.0000
#> 3            Number of sampled plots per stratum (nj)    7.0000    8.0000
#> 4                   Total number of sampled plots (n)   22.0000   22.0000
#> 5            Number of maximum plots per stratum (Nj)   48.0000   54.6667
#> 6                         Number of maximum plots (N)  150.0000  150.0000
#> 7                                     Nj/N Ratio (Pj)    0.3200    0.3644
#> 8                                   Stratum sum (Eyj)   42.1000   98.2500
#> 9                        Stratum quadratic sum (Eyj2)  268.8950 1237.2275
#> 10                        Mean of Yi per stratum (Yj)    6.0143   12.2812
#> 11                                              PjSj2    0.8370    1.5929
#> 12                                               PjSj    0.5175    0.7619
#> 13                                               PjYj    1.9246    4.4758
#> 14                                          t-student    2.0796    2.0796
#> 15                             recalculated t-student    2.0129    2.0129
#> 16      Number of samples regarding the admited error   45.0000   45.0000
#> 17 Optimal number of samples per stratum (nj optimal)   11.0000   16.0000
#> 18              Optimal number of samples (n optimal)   46.0000   46.0000
#> 19               Total value of Y per stratum (Yhatj)  288.6857  671.3750
#>     STRATA 3
#> 1    14.2000
#> 2  3000.0000
#> 3     7.0000
#> 4    22.0000
#> 5    47.3333
#> 6   150.0000
#> 7     0.3156
#> 8    96.1000
#> 9  1365.5500
#> 10   13.7286
#> 11    2.4316
#> 12    0.8760
#> 13    4.3321
#> 14    2.0796
#> 15    2.0129
#> 16   45.0000
#> 17   19.0000
#> 18   46.0000
#> 19  649.8190
#> 
#> $Table2
#>                                  Variables     value
#> 1                                t-student    2.0796
#> 2          Standard error of the mean (Sy)    0.4228
#> 3                      Stratified Variance    4.8614
#> 4            Stratified Standard Deviation    2.1554
#> 5                Variance Quoeficient (VC)   20.0829
#> 6                      Stratified Mean (Y)   10.7325
#> 7                           Absolute Error    0.8793
#> 8                       Relative Error (%)    8.1925
#> 9             Estimated Total Value (Yhat) 1609.8798
#> 10                             Total Error  131.8894
#> 11       Inferior Confidence Interval (m3)    9.8533
#> 12       Superior Confidence Interval (m3)   11.6118
#> 13    Inferior Confidence Interval (m3/ha)   32.8442
#> 14    Superior Confidence Interval (m3/ha)   38.7060
#> 15 inferior Total Confidence Interval (m3) 1477.9904
#> 16 Superior Total Confidence Interval (m3) 1741.7691

Comparing the optimal number of samples per stratum with the number of plots already measured in the first table, we can see that in order to achieve the desired error, we must sample 24 additional plots: 4 in stratum 1, 8 in stratum 2, and 12 in stratum 3.
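
The allocation can be reproduced from the rows printed in Table1 and Table2. The sketch below assumes an optimal (Neyman) allocation with a recalculated t value and a 95% confidence level; this matches the printed results but is not taken from the strs source code. Variable names are illustrative, and the inputs are the rounded values from the tables.

```r
# Values read from Table1 and Table2 of the pilot run
n_pilot <- c(7, 8, 7)                  # plots already measured per stratum
pj_sj   <- c(0.5175, 0.7619, 0.8760)   # PjSj row
pj_sj2  <- c(0.8370, 1.5929, 2.4316)   # PjSj2 row
y_strat <- 10.7325                     # stratified mean
N       <- 150                         # maximum number of plots
error   <- 5                           # target error (%)

# Required total sample size for a given t value (finite population)
size <- function(t_val) {
  E <- error / 100 * y_strat           # target error in units of the mean
  ceiling(t_val^2 * sum(pj_sj)^2 / (E^2 + t_val^2 * sum(pj_sj2) / N))
}

n1         <- size(qt(0.975, df = sum(n_pilot) - 1))  # first pass
n_required <- size(qt(0.975, df = n1 - 1))            # with recalculated t

# Distribute the plots proportionally to PjSj, rounding up per stratum
n_optimal <- ceiling(n_required * pj_sj / sum(pj_sj))
extra     <- n_optimal - n_pilot

n_required
n_optimal
extra
```

Because each stratum is rounded up, the per-stratum values add up to 46, one more than the 45 plots required overall.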

After a new survey, the new data is shown below:

data_ace_def
#>    STRATA STRATA_AREA PLOT_AREA   VWB VWB_m3ha
#> 1       1        14.4      1000  7.90     79.0
#> 2       1        14.4      1000  3.80     38.0
#> 3       1        14.4      1000  4.40     44.0
#> 4       1        14.4      1000  6.25     62.5
#> 5       1        14.4      1000  5.55     55.5
#> 6       1        14.4      1000  8.10     81.0
#> 7       1        14.4      1000  6.10     61.0
#> 8       1        14.4      1000  6.60     66.0
#> 9       1        14.4      1000  7.40     74.0
#> 10      1        14.4      1000  5.35     53.5
#> 11      1        14.4      1000  5.90     59.0
#> 12      1        14.4      1000  4.65     46.5
#> 13      1        14.4      1000  4.25     42.5
#> 14      1        14.4      1000  8.25     82.5
#> 15      2        16.4      1000 10.20    102.0
#> 16      2        16.4      1000 15.25    152.5
#> 17      2        16.4      1000 13.40    134.0
#> 18      2        16.4      1000 13.60    136.0
#> 19      2        16.4      1000 14.20    142.0
#> 20      2        16.4      1000  9.85     98.5
#> 21      2        16.4      1000 10.20    102.0
#> 22      2        16.4      1000 11.55    115.5
#> 23      2        16.4      1000  9.25     92.5
#> 24      2        16.4      1000 11.30    113.0
#> 25      2        16.4      1000 13.95    139.5
#> 26      2        16.4      1000 12.70    127.0
#> 27      2        16.4      1000 10.15    101.5
#> 28      2        16.4      1000 14.90    149.0
#> 29      2        16.4      1000 10.80    108.0
#> 30      2        16.4      1000 11.55    115.5
#> 31      2        16.4      1000 13.90    139.0
#> 32      2        16.4      1000  9.20     92.0
#> 33      2        16.4      1000 12.45    124.5
#> 34      2        16.4      1000 11.90    119.0
#> 35      3        14.2      1000 10.65    106.5
#> 36      3        14.2      1000 12.15    121.5
#> 37      3        14.2      1000 14.60    146.0
#> 38      3        14.2      1000 10.90    109.0
#> 39      3        14.2      1000 16.55    165.5
#> 40      3        14.2      1000 17.90    179.0
#> 41      3        14.2      1000 13.35    133.5
#> 42      3        14.2      1000 14.90    149.0
#> 43      3        14.2      1000  9.70     97.0
#> 44      3        14.2      1000 15.20    152.0
#> 45      3        14.2      1000 13.45    134.5
#> 46      3        14.2      1000 12.40    124.0
#> 47      3        14.2      1000 14.45    144.5
#> 48      3        14.2      1000 13.55    135.5
#> 49      3        14.2      1000 12.30    123.0
#> 50      3        14.2      1000 15.65    156.5
#> 51      3        14.2      1000 14.20    142.0
#> 52      3        14.2      1000 17.80    178.0
#> 53      3        14.2      1000 14.80    148.0
#> 54      3        14.2      1000  9.35     93.5
#> 55      3        14.2      1000 12.60    126.0
#> 56      3        14.2      1000 13.80    138.0
#> 57      3        14.2      1000 15.85    158.5

Now we’ll run the inventory again, this time with the definitive data:

strs(data_ace_def, "VWB", "PLOT_AREA", "STRATA_AREA", 
     strata = "STRATA", error = 5, pop = "fin")
#> $Table1
#>                                             Variables  STRATA 1  STRATA 2
#> 1                                         STRATA_AREA   14.4000   16.4000
#> 2                                           Plot Area 1000.0000 1000.0000
#> 3            Number of sampled plots per stratum (nj)   14.0000   20.0000
#> 4                   Total number of sampled plots (n)   57.0000   57.0000
#> 5            Number of maximum plots per stratum (Nj)  144.0000  164.0000
#> 6                         Number of maximum plots (N)  450.0000  450.0000
#> 7                                     Nj/N Ratio (Pj)    0.3200    0.3644
#> 8                                   Stratum sum (Eyj)   84.5000  240.3000
#> 9                        Stratum quadratic sum (Eyj2)  538.3950 2955.9100
#> 10                        Mean of Yi per stratum (Yj)    6.0357   12.0150
#> 11                                              PjSj2    0.6985    1.3179
#> 12                                               PjSj    0.4728    0.6930
#> 13                                               PjYj    1.9314    4.3788
#> 14                                          t-student    2.0032    2.0032
#> 15                             recalculated t-student    2.0141    2.0141
#> 16      Number of samples regarding the admited error   46.0000   46.0000
#> 17 Optimal number of samples per stratum (nj optimal)   12.0000   17.0000
#> 18              Optimal number of samples (n optimal)   47.0000   47.0000
#> 19               Total value of Y per stratum (Yhatj)  869.1429 1970.4600
#>     STRATA 3
#> 1    14.2000
#> 2  1000.0000
#> 3    23.0000
#> 4    57.0000
#> 5   142.0000
#> 6   450.0000
#> 7     0.3156
#> 8   316.1000
#> 9  4461.3350
#> 10   13.7435
#> 11    1.6785
#> 12    0.7278
#> 13    4.3368
#> 14    2.0032
#> 15    2.0141
#> 16   46.0000
#> 17   18.0000
#> 18   47.0000
#> 19 1951.5739
#> 
#> $Table2
#>                                  Variables     value
#> 1                                t-student    2.0032
#> 2          Standard error of the mean (Sy)    0.2339
#> 3                      Stratified Variance    3.6949
#> 4            Stratified Standard Deviation    1.8936
#> 5                Variance Quoeficient (VC)   17.7851
#> 6                      Stratified Mean (Y)   10.6471
#> 7                           Absolute Error    0.4685
#> 8                       Relative Error (%)    4.4003
#> 9             Estimated Total Value (Yhat) 4791.1768
#> 10                             Total Error  210.8250
#> 11       Inferior Confidence Interval (m3)   10.1786
#> 12       Superior Confidence Interval (m3)   11.1156
#> 13    Inferior Confidence Interval (m3/ha)  101.7856
#> 14    Superior Confidence Interval (m3/ha)  111.1556
#> 15 inferior Total Confidence Interval (m3) 4580.3518
#> 16 Superior Total Confidence Interval (m3) 5002.0018

The relative error is now 4.4%, below the 5% target, so the desired error was met.
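
The values in Table2 can be derived from the per-stratum rows of Table1. The sketch below is a hand check under stated assumptions: a 95% confidence level, n - 1 degrees of freedom, and the standard error formula for optimal allocation with a finite population. These choices reproduce the printed numbers, but they are inferred from the output, not from the strs source code.

```r
# Per-stratum rows copied from Table1 of the definitive run
nj     <- c(14, 20, 23)                 # sampled plots per stratum
Nj     <- c(144, 164, 142)              # maximum plots per stratum
sum_y  <- c(84.5, 240.3, 316.1)         # stratum sum (Eyj)
sum_y2 <- c(538.395, 2955.91, 4461.335) # stratum quadratic sum (Eyj2)

n  <- sum(nj)
N  <- sum(Nj)
pj <- Nj / N                            # stratum weights
yj <- sum_y / nj                        # stratum means
sj2 <- (sum_y2 - sum_y^2 / nj) / (nj - 1)  # stratum variances

y_strat <- sum(pj * yj)                 # stratified mean

# Standard error of the stratified mean (assumed formula, see lead-in)
se <- sqrt(sum(pj * sqrt(sj2))^2 / n - sum(pj * sj2) / N)

t_val     <- qt(0.975, df = n - 1)
abs_error <- t_val * se
rel_error <- abs_error / y_strat * 100
y_total   <- y_strat * N                # estimated total volume (m3)

round(c(Y = y_strat, Sy = se, abs_error = abs_error,
        rel_error = rel_error, Yhat = y_total), 4)
```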

Systematic Sampling

Now we’ll survey an area of 18 hectares in which 18 plots of 200 m² each were systematically sampled:

data_as
#>    TOTAL_AREA PLOT_AREA VWB VWB_m3ha
#> 1          10       200   6      300
#> 2          10       200   8      400
#> 3          10       200   9      450
#> 4          10       200  10      500
#> 5          10       200  13      650
#> 6          10       200  12      600
#> 7          10       200  18      900
#> 8          10       200  19      950
#> 9          10       200  20     1000
#> 10         10       200  20     1000
#> 11         10       200  24     1200
#> 12         10       200  23     1150
#> 13         10       200  26     1300
#> 14         10       200  30     1500
#> 15         10       200  31     1550
#> 16         10       200  31     1550
#> 17         10       200  33     1650
#> 18         10       200  32     1600

First, let’s see what error we would get if we used the simple random sampling method:

sprs(data_as, "VWB", 200, 18)
#>                                        Variables     Values
#> 1              Total number of sampled plots (n)    18.0000
#> 2                    Number of maximum plots (N)   900.0000
#> 3                      Variance Quoeficient (VC)    44.6505
#> 4                                      t-student     2.1098
#> 5                         recalculated t-student     1.9873
#> 6  Number of samples regarding the admited error    79.0000
#> 7                                  Variance (S2)    81.9771
#> 8                         Standard deviation (s)     9.0541
#> 9                                       Mean (Y)    20.2778
#> 10               Standard error of the mean (Sy)     2.1341
#> 11                                Absolute Error     4.5025
#> 12                            Relative Error (%)    22.2042
#> 13                  Estimated Total Value (Yhat) 18250.0000
#> 14                                   Total Error  4052.2580
#> 15             Inferior Confidence Interval (m3)    15.7753
#> 16             Superior Confidence Interval (m3)    24.7803
#> 17          Inferior Confidence Interval (m3/ha)   788.7634
#> 18          Superior Confidence Interval (m3/ha)  1239.0143
#> 19       inferior Total Confidence Interval (m3) 14197.7420
#> 20       Superior Total Confidence Interval (m3) 22302.2580

We got a 22.2% error. Now let’s calculate the sampling error using the method of successive differences, with the ss_diffs function. To use this function, the rows must be in the order in which the plots were measured, the plot area must be in square meters, and the total area must be in hectares:

ss_diffs(data_as, "VWB", 200, 18)
#>                                        Variables     Values
#> 1              Total number of sampled plots (n)    18.0000
#> 2                    Number of maximum plots (N)   900.0000
#> 3                      Variance Quoeficient (VC)    44.6505
#> 4                                      t-student     2.1098
#> 5                         recalculated t-student     1.9873
#> 6  Number of samples regarding the admited error    79.0000
#> 7                                  Variance (S2)    81.9771
#> 8                         Standard deviation (S)     9.0541
#> 9                                       Mean (Y)    20.2778
#> 10               Standard error of the mean (Sy)     0.4041
#> 11                                Absolute Error     0.8527
#> 12                            Relative Error (%)     4.2050
#> 13                  Estimated Total Value (Yhat) 18250.0000
#> 14                                   Total Error   767.4046
#> 15             Inferior Confidence Interval (m3)    19.4251
#> 16             Superior Confidence Interval (m3)    21.1304
#> 17          Inferior Confidence Interval (m3/ha)   971.2553
#> 18          Superior Confidence Interval (m3/ha)  1056.5225
#> 19       inferior Total Confidence Interval (m3) 17482.5954
#> 20       Superior Total Confidence Interval (m3) 19017.4046

We got a 4.2% error, considerably lower than the 22.2% obtained with the simple random sampling estimator.
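
The difference between the two results comes only from how the standard error of the mean is estimated. The sketch below compares both estimators in base R. It assumes a 95% confidence level and, for the successive differences estimator, a finite population correction, because that combination reproduces the printed standard errors (2.1341 and 0.4041); it is not the ss_diffs source code.

```r
# Volumes in the order in which the plots were measured (from data_as)
vwb <- c(6, 8, 9, 10, 13, 12, 18, 19, 20, 20, 24, 23, 26, 30, 31, 31, 33, 32)

plot_area  <- 200   # square meters
total_area <- 18    # hectares
n <- length(vwb)
N <- total_area * 10000 / plot_area   # maximum number of plots

# Simple random sampling estimator (no correction, as in the sprs output)
se_random <- sd(vwb) / sqrt(n)

# Successive differences: squared differences between neighboring plots
se_diffs <- sqrt(sum(diff(vwb)^2) / (2 * n * (n - 1)) * (1 - n / N))

t_val      <- qt(0.975, df = n - 1)
rel_random <- t_val * se_random / mean(vwb) * 100
rel_diffs  <- t_val * se_diffs / mean(vwb) * 100

round(c(se_random = se_random, se_diffs = se_diffs,
        rel_random = rel_random, rel_diffs = rel_diffs), 4)
```

Because the volumes increase steadily along the sampling line, neighboring plots differ very little, so the successive differences estimator yields a much smaller standard error than the overall variance does.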
