NAME
s.kcv - Randomly partition sites into test/train sets.
(GRASS Sites Program)
SYNOPSIS
s.kcv
s.kcv help
s.kcv [-dq] k=value sites=name
DESCRIPTION
s.kcv randomly divides a sites list into k sets of
test/train data (for k-fold cross validation).
Test partitions are mutually exclusive. That is, a site will
appear in only one test partition and k-1 training partitions.
The program generates a random point using the selected
random number generator and then finds the closest site to
it. This site is removed from the candidate list (meaning
that it will not be selected for any other test set) and
saved in the first test partition file. This is repeated
until enough points have been selected for the test partition.
The number of sites chosen for test partitions
depends upon the number of sites available and the number
of partitions chosen (this number is made as consistent as
possible while ensuring that all sites will be chosen for
testing). This process of filling up a test partition is
done k times.
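The selection procedure described above can be sketched in Python as
follows. This is an illustration only, not the program's source. The
function name kcv_partition, the (east, north) tuple representation of
sites, the region argument, and the rule that hands leftover sites to
the first partitions are all assumptions made for the example.

```python
import random


def kcv_partition(sites, k, region, seed=None):
    """Split sites into k mutually exclusive test partitions.

    sites  -- list of (east, north) tuples (hypothetical representation)
    region -- (west, east, south, north) bounds for the random points
    """
    if k < 1 or k > len(sites):
        raise ValueError("k must be between 1 and the number of sites")
    rng = random.Random(seed)
    west, east, south, north = region
    candidates = list(sites)
    base, extra = divmod(len(candidates), k)
    partitions = []
    for i in range(k):
        # Assumed sizing rule: the first `extra` partitions get one more site,
        # so sizes differ by at most one and every site is used exactly once.
        size = base + (1 if i < extra else 0)
        test = []
        for _ in range(size):
            # Draw a random point inside the region.
            x = rng.uniform(west, east)
            y = rng.uniform(south, north)
            # Find the remaining candidate closest to that point.
            nearest = min(
                candidates,
                key=lambda s: (s[0] - x) ** 2 + (s[1] - y) ** 2,
            )
            # Remove it so it cannot land in any other test partition.
            candidates.remove(nearest)
            test.append(nearest)
        partitions.append(test)
    return partitions
```

Because each chosen site is removed from the candidate list, the k test
partitions returned by this sketch never share a site.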
Flags:
-d Use drand48() (default is rand()).
-q Run quietly. Don't report progress.
Parameters:
k=value Positive integer value indicating the
number of partitions.
sites=name Name of the sites list to partition; also
used as the basename for the output files.
Test/train pairs are saved as sites lists using name as a
basename. Test sites are saved in name-test.i while training
sites are saved in name-train.i, where i ranges from
zero to k.
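As a rough illustration of how test/train pairs relate, the sketch below
builds each training set as the complement of its test partition and
forms the two file names for a given index. The helper names make_pairs
and pair_names and the basename "wells" are hypothetical. The index i
is left to the caller.

```python
def make_pairs(partitions):
    """Pair each test partition with the sites from all other partitions."""
    pairs = []
    for i, test in enumerate(partitions):
        # Training sites are every site that is not in test partition i.
        train = [s for j, other in enumerate(partitions) if j != i for s in other]
        pairs.append((test, train))
    return pairs


def pair_names(name, i):
    """Return the (test, train) file names for basename `name` and index i."""
    return f"{name}-test.{i}", f"{name}-train.{i}"
```

For example, pair_names("wells", 2) yields "wells-test.2" and
"wells-train.2".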
NOTES
Existing files are silently overwritten.
An ideal random sites generator will follow a Poisson
distribution, but the partitions produced here can only be
as random as the original sites. This program simply divides
sites up in a random manner.
Be warned that random number generation occurs over the
intervals defined by the current region.
This program may not work properly with Lat-long data.
SEE ALSO
s.rand and g.region
AUTHOR
James Darrell McCauley, Purdue University