Title: | Statistical Functions for Probability Distributions and Regression |
---|---|
Description: | A collection of miscellaneous statistical functions for probability distributions: 'dbern()', 'pbern()', 'qbern()', 'rbern()' for the Bernoulli distribution, and 'distr2name()', 'name2distr()' for distribution names; probability density estimation: 'densityfun()'; most frequent value estimation: 'mfv()', 'mfv1()'; other statistical measures of location: 'cv()' (coefficient of variation), 'midhinge()', 'midrange()', 'trimean()'; construction of histograms: 'histo()', 'find_breaks()'; calculation of the Hellinger distance: 'hellinger()'; use of classical kernels: 'kernelfun()', 'kernel_properties()'; univariate piecewise-constant regression: 'picor()'. |
Authors: | Paul Poncet [aut, cre], The R Core Team [aut, cph] (C function 'BinDist' copied from package 'stats'), The R Foundation [cph] (C function 'BinDist' copied from package 'stats'), Adrian Baddeley [ctb] (C function 'BinDist' copied from package 'stats') |
Maintainer: | Paul Poncet <[email protected]> |
License: | GPL-3 |
Version: | 0.2.3 |
Built: | 2025-01-06 03:31:14 UTC |
Source: | https://github.com/paulponcet/statip |
bandwidth computes the bandwidth to be used in the densityfun function.
bandwidth(x, rule)
x |
numeric. The data from which the estimate is to be computed. |
rule |
character. A rule to choose the bandwidth. See |
A numeric value.
Compute the coefficient of variation of a numeric vector x, defined as the ratio between the standard deviation and the mean.
cv(x, na_rm = FALSE, ...)
x |
numeric. A numeric vector. |
na_rm |
logical. Should missing values be removed before computing the coefficient of variation? |
... |
Additional arguments to be passed to |
A numeric value, the coefficient of variation.
https://en.wikipedia.org/wiki/Coefficient_of_variation.
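As a rough illustration of the definition above, the ratio can be computed with base R alone. This is a sketch under that definition, not the package's implementation; the helper name cv_sketch is hypothetical.

```r
# Hypothetical helper (not part of the package): coefficient of variation
# computed as the standard deviation divided by the mean.
cv_sketch <- function(x, na_rm = FALSE) {
  stats::sd(x, na.rm = na_rm) / mean(x, na.rm = na_rm)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
cv_sketch(x)
# A missing value propagates unless it is removed first.
cv_sketch(c(x, NA))
cv_sketch(c(x, NA), na_rm = TRUE)
```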
Density, distribution function, quantile function and random generation for the Bernoulli distribution.
dbern(x, prob, log = FALSE)
qbern(p, prob, lower.tail = TRUE, log.p = FALSE)
pbern(q, prob, lower.tail = TRUE, log.p = FALSE)
rbern(n, prob)
x |
numeric. Vector of quantiles. |
prob |
Probability of success on each trial. |
log |
logical. If |
p |
numeric in |
lower.tail |
logical. If |
log.p |
logical. If |
q |
numeric. Vector of quantiles. |
n |
number of observations.
If |
See the help page of the Binomial distribution.
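Since a Bernoulli variable is a binomial variable with a single trial, the four functions can be sketched on top of the binomial functions of package stats. The wrapper names below are hypothetical, and the package's own code may differ.

```r
# Hypothetical wrappers: Bernoulli(prob) is Binomial(size = 1, prob).
dbern_sketch <- function(x, prob, log = FALSE) {
  stats::dbinom(x, size = 1, prob = prob, log = log)
}
pbern_sketch <- function(q, prob, lower.tail = TRUE, log.p = FALSE) {
  stats::pbinom(q, size = 1, prob = prob, lower.tail = lower.tail, log.p = log.p)
}
qbern_sketch <- function(p, prob, lower.tail = TRUE, log.p = FALSE) {
  stats::qbinom(p, size = 1, prob = prob, lower.tail = lower.tail, log.p = log.p)
}
rbern_sketch <- function(n, prob) {
  stats::rbinom(n, size = 1, prob = prob)
}

dbern_sketch(c(0, 1), prob = 0.3)      # P(X = 0) and P(X = 1)
pbern_sketch(0, prob = 0.3)            # P(X <= 0)
qbern_sketch(c(0.5, 0.9), prob = 0.3)  # quantiles at 50% and 90%
set.seed(1)
draws <- rbern_sketch(10, prob = 0.3)  # ten draws, each 0 or 1
```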
Return a function performing kernel density estimation.
The difference between density and densityfun is similar to that between approx and approxfun.
densityfun(x, bw = "nrd0", adjust = 1, kernel = "gaussian", weights = NULL, window = kernel, width, n = 512, from, to, cut = 3, na.rm = FALSE, ...)
x |
numeric. The data from which the estimate is to be computed. |
bw |
numeric. The smoothing bandwidth to be used.
See the eponymous argument of |
adjust |
numeric. The bandwidth used is actually |
kernel, window |
character. A string giving the smoothing kernel to be used.
Authorized kernels are listed in |
weights |
numeric. A vector of non-negative observation weights,
hence of same length as |
width |
this exists for compatibility with S;
if given, and |
n |
The number of equally spaced points at which the density
is to be estimated.
See the eponymous argument of |
from, to |
The left and right-most points of the grid at which the
density is to be estimated;
the defaults are |
cut |
By default, the values of |
na.rm |
logical. If |
... |
Additional arguments for (non-default) methods. |
A function that can be called to generate a density.
Adapted from the density function of package stats. The C code of BinDist is copied from package stats and authored by the R Core Team with contributions from Adrian Baddeley.
density and approxfun from package stats.
x <- rlnorm(1000, 1, 1)
f <- densityfun(x, from = 0)
curve(f(x), xlim = c(0, 20))
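The analogy with approx and approxfun can be made concrete with base R: density returns the estimate on a grid, and approxfun turns that grid into a callable function. The sketch below only illustrates the idea. It is an assumption that densityfun behaves similarly, and f_sketch is a hypothetical name.

```r
set.seed(42)
x <- rlnorm(1000, 1, 1)

# Kernel density estimate evaluated on a grid starting at 0.
d <- stats::density(x, from = 0)

# Linear interpolation of the grid values; 0 is returned outside the grid.
f_sketch <- stats::approxfun(d$x, d$y, yleft = 0, yright = 0)

f_sketch(c(1, 2, 5))  # estimated density at three points
f_sketch(-1)          # outside the grid
```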
The function distr2name() converts abbreviated distribution names to proper distribution names (e.g. "norm" becomes "Gaussian"). The function name2distr() does the reciprocal operation.
distr2name(x)
name2distr(x)
x |
character. A vector of abbreviated distribution names or proper distribution names. |
A character vector of the same length as x. Elements of x that are not recognized are kept unchanged (yet in lowercase).
distr2name(c("norm", "dnorm", "rhyper", "ppois"))
name2distr(c("Cauchy", "Gaussian", "Generalized Extreme Value"))
The function erf() encodes the error function, defined as erf(x) = 2 * F(x * sqrt(2)) - 1, where F is the Gaussian distribution function.
erf(x, ...)
x |
numeric. A vector of input values. |
... |
Additional arguments to be passed to |
A numeric vector of the same length as x.
https://en.wikipedia.org/wiki/Error_function.
pnorm from package stats.
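The definition given above translates directly into a one-line function built on pnorm. This is a minimal sketch of the formula under the hypothetical name erf_sketch, not the package's source.

```r
# Hypothetical helper: erf(x) = 2 * F(x * sqrt(2)) - 1,
# with F the standard Gaussian distribution function (pnorm).
erf_sketch <- function(x) {
  2 * stats::pnorm(x * sqrt(2)) - 1
}

erf_sketch(c(-Inf, -1, 0, 1, Inf))
```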
The function find_breaks() isolates a piece of code of the function truehist() from package MASS that is used to compute the set of breakpoints to be applied for the construction of the histogram.
find_breaks(x, nbins = "Scott", h, x0 = -h/1000)
x |
numeric. A vector. |
nbins |
integer or character. The suggested number of bins.
Either a positive integer, or a character string naming a rule:
|
h |
numeric. The bin width, a strictly positive number (takes precedence over nbins). |
x0 |
numeric. Shift for the bins -
the breaks are at |
A numeric vector.
histo() in this package; truehist() from package MASS; hist() from package graphics.
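The breakpoint computation can be sketched in base R along the lines of truehist(): choose a bin width h, then place breaks at x0 plus integer multiples of h so that they cover the data. The function below is a hypothetical reconstruction. The rule names it accepts ("scott", "freedman-diaconis", "fd") are an assumption modelled on truehist(), and the package's own code may differ.

```r
# Hypothetical sketch of the breakpoint computation (not the package source).
breaks_sketch <- function(x, nbins = "Scott", h, x0 = -h / 1000) {
  if (missing(h)) {
    if (is.character(nbins)) {
      # Assumed rule names, modelled on MASS::truehist().
      nbins <- switch(casefold(nbins),
                      scott = grDevices::nclass.scott(x),
                      "freedman-diaconis" = ,
                      fd = grDevices::nclass.FD(x),
                      stop("unknown rule for 'nbins'"))
    }
    # Bin width taken from a "pretty" partition with about nbins cells.
    h <- diff(pretty(x, nbins))[1L]
  }
  if (!is.finite(h) || h <= 0) stop("'h' must be strictly positive")
  # x0 is evaluated lazily, so its default sees the value of h set above.
  first <- floor((min(x) - x0) / h)
  last <- ceiling((max(x) - x0) / h)
  x0 + h * (first:last)
}

set.seed(1)
x <- rnorm(100)
b_rule <- breaks_sketch(x)            # bin width derived from Scott's rule
b_fixed <- breaks_sketch(x, h = 0.5)  # explicit bin width
```

An explicit h bypasses nbins entirely, which matches the precedence stated for the h argument.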
Estimate the Hellinger distance between two random samples whose underlying distributions are continuous.
hellinger(x, y, lower = -Inf, upper = Inf, method = 1, ...)
x |
numeric. A vector giving the first sample. |
y |
numeric. A vector giving the second sample. |
lower |
numeric. Lower limit passed to |
upper |
numeric. Upper limit passed to |
method |
integer. If |
... |
Additional parameters to be passed to |
Probability density functions are estimated with densityfun. Then numeric integration is performed with integrate.
A numeric value, the Hellinger distance.
https://en.wikipedia.org/wiki/Hellinger_distance.
HellingerDist in package distrEx.
x <- rnorm(200, 0, 2)
y <- rnorm(1000, 10, 15)
hellinger(x, y, -Inf, Inf)
hellinger(x, y, -Inf, Inf, method = 2)
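The two-step procedure (density estimation, then numerical integration) can be sketched with base R. For robustness this sketch replaces integrate with a plain Riemann sum on a common grid and uses stats::density instead of densityfun, so it only approximates what hellinger() computes. The name hellinger_sketch and the grid size are assumptions.

```r
# Hypothetical sketch: H^2 = (1/2) * integral of (sqrt(f) - sqrt(g))^2,
# with f and g estimated by kernel density estimation on a shared grid.
hellinger_sketch <- function(x, y, n = 2048) {
  # Pad the pooled range by three bandwidths so both estimates fit in the grid.
  pad <- 3 * max(stats::bw.nrd0(x), stats::bw.nrd0(y))
  lo <- min(x, y) - pad
  hi <- max(x, y) + pad
  f <- stats::density(x, from = lo, to = hi, n = n)$y
  g <- stats::density(y, from = lo, to = hi, n = n)$y
  step <- (hi - lo) / (n - 1)
  # Riemann sum in place of integrate().
  sqrt(0.5 * sum((sqrt(f) - sqrt(g))^2) * step)
}

set.seed(123)
x <- rnorm(200, 0, 2)
y <- rnorm(1000, 10, 15)
d_xy <- hellinger_sketch(x, y)
d_xx <- hellinger_sketch(x, x)  # identical samples
```

The distance lies between 0 (identical densities) and 1 (densities with disjoint supports), up to discretisation error.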
A simplified version of hist() from package graphics.
histo(x, breaks, ...)
x |
numeric. A vector. |
breaks |
numeric. A vector of breakpoints to build the histogram,
possibly given by |
... |
Additional parameters (currently not used). |
An object of class "histogram", which can be plotted by plot.histogram from package graphics. This object is a list with components:
breaks: the n+1 cell boundaries;
counts: n integers giving the number of x inside each cell;
xname: a string with the actual x argument name.
find_breaks() in this package; truehist() from package MASS; hist() from package graphics.
The generic function kernelfun creates a smoothing kernel function.
kernel_properties(name, derivative = FALSE)
kernelfun(name, ...)
## S3 method for class 'function'
kernelfun(name, ...)
## S3 method for class 'character'
kernelfun(name, derivative = FALSE, ...)
.kernelsList()
name |
character.
The name of the kernel to be used.
Authorized kernels are listed in |
derivative |
logical. If |
... |
Additional arguments to be passed to the kernel function. |
A function.
density in package stats.
kernel_properties("gaussian")
k <- kernelfun("epanechnikov")
curve(k(x), xlim = c(-1, 1))
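For reference, a smoothing kernel is simply a function that integrates to one. The sketch below writes out the textbook Epanechnikov kernel on [-1, 1] in base R. Whether kernelfun("epanechnikov") uses this exact scaling is an assumption (stats::density, for instance, rescales its kernels), and epan_sketch is a hypothetical name.

```r
# Hypothetical helper: textbook Epanechnikov kernel,
# K(u) = 3/4 * (1 - u^2) on [-1, 1] and 0 elsewhere.
epan_sketch <- function(u) {
  ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
}

# A kernel should integrate to one over its support.
total_mass <- stats::integrate(epan_sketch, lower = -1, upper = 1)$value
epan_sketch(c(-2, -0.5, 0, 0.5, 2))
```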
This function computes a lagged vector, shifting it back or forward.
lagk(x, k, na = FALSE, cst = FALSE)
x |
A vector. |
k |
integer. The number of lags.
If |
na |
logical. If |
cst |
logical.
If |
A vector of the same type and length as x.
v <- sample(1:10)
print(v)
lagk(v, 1)
lagk(v, 1, na = TRUE)
lagk(v, -2)
lagk(v, -3, na = TRUE)
lagk(v, -3, na = FALSE, cst = TRUE)
lagk(v, -3, na = FALSE)
The function mfv() returns the most frequent value(s) (or mode(s)) found in a vector. The function mfv1 returns the first of these values, so that mfv1(x) is identical to mfv(x)[[1L]].
mfv(x, ...)
## Default S3 method:
mfv(x, na_rm = FALSE, ...)
## S3 method for class 'tableNA'
mfv(x, na_rm = FALSE, ...)
mfv1(x, na_rm = FALSE, ...)
x |
Vector of observations (of type numeric, integer, character, factor, or logical). |
... |
Additional arguments (currently not used). |
na_rm |
logical. If |
See David Smith's blog post here to understand the philosophy followed in the code of mfv for the treatment of missing values.
The function mfv returns a vector of the same type as x. One should be aware that this vector can be of length > 1, in case of multiple modes. mfv1 always returns a vector of length 1 (the first of the modes found).
mfv() calls the function tabulate.
Dutta S. and Goswami A. (2010). Mode estimation for discrete distributions. Mathematical Methods of Statistics, 19(4):374–384.
# Basic examples:
mfv(integer(0))                       # NaN
mfv(c(3, 3, 3, 2, 4))                 # 3
mfv(c(TRUE, FALSE, TRUE))             # TRUE
mfv(c("a", "a", "b", "a", "d"))       # "a"
mfv(c("a", "a", "b", "b", "d"))       # c("a", "b")
mfv1(c("a", "a", "b", "b", "d"))      # "a"
# With missing values:
mfv(c(3, 3, 3, 2, NA))                # 3
mfv(c(3, 3, 2, NA))                   # NA
mfv(c(3, 3, 2, NA), na_rm = TRUE)     # 3
mfv(c(3, 3, 2, 2, NA))                # NA
mfv(c(3, 3, 2, 2, NA), na_rm = TRUE)  # c(2, 3)
mfv1(c(3, 3, 2, 2, NA), na_rm = TRUE) # 2
# With only missing values:
mfv(c(NA, NA))                        # NA
mfv(c(NA, NA), na_rm = TRUE)          # NaN
# With factors
mfv(factor(c("a", "b", "a")))
mfv(factor(c("a", "b", "a", NA)))
mfv(factor(c("a", "b", "a", NA)), na_rm = TRUE)
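The core idea, counting occurrences with tabulate and keeping the values that reach the maximum count, can be sketched in a few lines of base R. The helper mode_sketch is hypothetical. It assumes a non-empty input, returns modes in order of first appearance, and does not reproduce the package's treatment of missing values.

```r
# Hypothetical helper: most frequent value(s) of a non-empty vector.
mode_sketch <- function(x) {
  ux <- unique(x)                      # distinct values, in order of appearance
  counts <- tabulate(match(x, ux))     # number of occurrences of each one
  ux[counts == max(counts)]            # keep every value reaching the maximum
}

mode_sketch(c(3, 3, 3, 2, 4))            # single mode
mode_sketch(c("a", "a", "b", "b", "d"))  # two modes
mode_sketch(c(TRUE, FALSE, TRUE))
```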
Compute the midhinge of a numeric vector x, defined as the average of the first and third quartiles.
midhinge(x, na_rm = FALSE, ...)
x |
numeric. A numeric vector. |
na_rm |
logical. Should missing values be removed before computing the midhinge? |
... |
Additional arguments to be passed to |
A numeric value, the midhinge.
https://en.wikipedia.org/wiki/Midhinge.
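The definition above can be sketched with quantile from package stats. The helper name midhinge_sketch is hypothetical, and the use of the default quantile type is an assumption, so results may differ slightly from the package if it uses another quantile definition.

```r
# Hypothetical helper: average of the first and third quartiles.
midhinge_sketch <- function(x, na_rm = FALSE) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = na_rm, names = FALSE)
  mean(q)
}

midhinge_sketch(1:9)  # quartiles are 3 and 7, so the midhinge is 5
midhinge_sketch(c(1:9, NA), na_rm = TRUE)
```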
Compute the mid-range of a numeric vector x, defined as the mean of the minimum and the maximum.
midrange(x, na_rm = FALSE)
x |
numeric. A numeric vector. |
na_rm |
logical. Should missing values be removed before computing the mid-range? |
A numeric value, the mid-range.
https://en.wikipedia.org/wiki/Mid-range.
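A minimal base R sketch of this definition follows. The helper name midrange_sketch is hypothetical and the code is not taken from the package.

```r
# Hypothetical helper: mean of the minimum and the maximum.
midrange_sketch <- function(x, na_rm = FALSE) {
  mean(range(x, na.rm = na_rm))
}

midrange_sketch(c(1, 4, 10))  # (1 + 10) / 2 = 5.5
midrange_sketch(c(1, 4, 10, NA), na_rm = TRUE)
```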
picor looks for a piecewise-constant function as a regression function. The regression is necessarily univariate. This is essentially a wrapper for rpart (regression tree) and isoreg.
picor(formula, data, method, min_length = 0, ...)
## S3 method for class 'picor'
knots(Fn, ...)
## S3 method for class 'picor'
predict(object, newdata, ...)
## S3 method for class 'picor'
plot(x, ...)
## S3 method for class 'picor'
print(x, ...)
formula |
formula of the model to be fitted. |
data |
optional data frame. |
method |
character. If |
min_length |
integer. The minimal distance between two consecutive knots. |
... |
Additional arguments to be passed to |
object, x, Fn |
An object of class |
newdata |
data.frame to be passed to the |
An object of class "picor", which is a list composed of the following elements:
formula: the formula passed as an argument;
x: the numeric vector of predictors;
y: the numeric vector of responses;
knots: a numeric vector (possibly of length 0), the knots found;
values: a numeric vector (of length length(knots)+1), the constant values taken by the regression function between the knots.
## Not run:
s <- stats::stepfun(c(-1,0,1), c(1., 2., 4., 3.))
x <- stats::rnorm(1000)
y <- s(x)
p <- picor(y ~ x, data.frame(x = x, y = y))
print(p)
plot(p)
## End(Not run)
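To give a feel for piecewise-constant fitting without the package, the sketch below uses isoreg from package stats directly, one of the two functions picor is said to wrap. It is only an illustration: isoreg requires a nondecreasing fit, so the step function here is monotone (unlike the one in the example above), and how picor selects or configures its method is not shown.

```r
set.seed(1)
# A nondecreasing step function with jumps at -1, 0 and 1.
s <- stats::stepfun(c(-1, 0, 1), c(1, 2, 3, 4))
x <- stats::rnorm(200)
y <- s(x)

# Isotonic regression: the fit is piecewise constant and nondecreasing.
fit <- stats::isoreg(x, y)

# Fitted values are stored in the order of the sorted predictor.
fitted_sorted <- fit$yf

# Step-function view of the fit and the locations of its jumps.
sf <- stats::as.stepfun(fit)
stats::knots(sf)
```

Because the data are noise-free and already monotone, the fitted values reproduce the responses exactly once both are put in the same order.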
Plots a loess object adjusted on one unique explanatory variable.
## S3 method for class 'loess'
plot(x, ...)
x |
An object of class |
... |
Additional graphical arguments. |
loess from package stats.
reg <- loess(dist ~ speed, cars)
plot(reg)
Default method of the predict generic function, which can be used when the model object is empty.
## Default S3 method:
predict(object, newdata, ...)
object |
A model object, possibly empty. |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
... |
Additional arguments. |
A vector of predictions.
predict from package stats.
stats::predict(NULL)
stats::predict(NULL, newdata = data.frame(x = 1:2, y = 2:3))
Count the occurrences of each factor level or value in a vector.
tableNA(x)
x |
numeric. An atomic vector or a factor. |
An object of class "tableNA", which is the result of tabulate() with three attributes:
type_of_x: the result of typeof(x);
is_factor_x: the result of is.factor(x);
levels: the result of levels(x).
The number of missing values is always reported.
tableNA(c(1,2,2,1,3))
tableNA(c(1,2,2,1,3, NA))
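A comparable count, with missing values always reported, can be obtained in base R with table and its useNA argument. This is a different route from the tabulate-based object described above and carries none of its attributes; it is shown only to illustrate the expected counts.

```r
x_na <- c(1, 2, 2, 1, 3, NA)
x_ok <- c(1, 2, 2, 1, 3)

# useNA = "always" adds an NA cell even when no value is missing.
counts_na <- table(x_na, useNA = "always")
counts_ok <- table(x_ok, useNA = "always")

counts_na
counts_ok
```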
Compute the trimean of a numeric vector x.
trimean(x, na_rm = FALSE, ...)
x |
numeric. A numeric vector. |
na_rm |
logical. Should missing values be removed before computing the trimean? |
... |
Additional arguments to be passed to |
A numeric value, the trimean.
https://en.wikipedia.org/wiki/Trimean
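Tukey's trimean is the weighted average (Q1 + 2 * Q2 + Q3) / 4 of the three quartiles. A minimal sketch with quantile from package stats is given below. The helper name trimean_sketch is hypothetical, and the use of the default quantile type is an assumption, so values may differ slightly from the package's.

```r
# Hypothetical helper: (Q1 + 2 * median + Q3) / 4.
trimean_sketch <- function(x, na_rm = FALSE) {
  q <- stats::quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = na_rm, names = FALSE)
  sum(q * c(1, 2, 1)) / 4
}

trimean_sketch(1:9)                  # quartiles 3, 5, 7
trimean_sketch(c(1, 2, 3, 4, 100))   # robust to the outlier 100
```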