Title: | Lighten your R Model Outputs |
---|---|
Description: | The strip function deletes components of R model outputs that are useless for specific purposes, such as predict[ing], print[ing], summary[izing], etc. |
Authors: | Paul Poncet [aut, cre] |
Maintainer: | Paul Poncet <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.0 |
Built: | 2024-11-04 03:18:33 UTC |
Source: | https://github.com/paulponcet/strip |
The strip function deletes components of R model outputs that are useless for specific purposes, such as predict[ing], print[ing], summary[izing], etc.
The idea is to prevent the size of the model output from growing with the size of the training dataset. This is useful when the output has to be saved for later use and its size on disk should stay small.
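To make the motivation concrete, the following sketch fits the same linear model on simulated data sets of increasing size and reports the serialized size of each fit. It uses only base R; the helper `fit_size()` and the simulated data are illustrative assumptions, not part of the package.

```r
# Hypothetical helper: fit y ~ x on n simulated rows and return the
# size (in bytes) of the serialized model object.
fit_size <- function(n) {
  set.seed(1)
  d <- data.frame(x = rnorm(n))
  d$y <- 2 * d$x + rnorm(n)
  fit <- lm(y ~ x, data = d)
  length(serialize(fit, NULL))
}

# The model has two coefficients in every case, yet the stored object
# gets larger as the training set grows.
sizes <- vapply(c(100, 1000, 10000), fit_size, integer(1))
print(sizes)
```

The coefficients are all that a prediction strictly needs, but the fitted object also carries per-row components (residuals, fitted values, the model frame, and so on), which is what drives the growth.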
This package was inspired by Nina Zumel's post ‘Trimming the Fat from glm() Models in R’ on the Win-Vector Blog.
    strip(object, keep, ...)

    strip_(object, keep, ...)

    ## Default S3 method:
    strip_(object, keep, ...)

    ## S3 method for class 'gam'
    strip_(object, keep, ...)

    ## S3 method for class 'glm'
    strip_(object, keep, ...)

    ## S3 method for class 'kmeans'
    strip_(object, keep, ...)

    ## S3 method for class 'lm'
    strip_(object, keep, ...)

    ## S3 method for class 'loess'
    strip_(object, keep, ...)

    ## S3 method for class 'randomForest'
    strip_(object, keep, ...)

    ## S3 method for class 'train'
    strip_(object, keep, use_trim = FALSE, ...)
object | result of an R model, see 'Details'. |
---|---|
keep | character. A vector of values among |
... | Additional arguments to be passed to other methods. |
use_trim | boolean. For the |
If keep="predict", components inside the list object are kept if they are needed by the predict method; otherwise they are set to NULL.

If keep=c("predict", "print"), components are kept if they are needed by either the predict or the print method.

If keep="everything", object is returned with no modifications.
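The idea of setting unneeded components to NULL can be sketched by hand for an lm fit using base R only. The list of components dropped here is an assumption chosen for illustration; it is not necessarily the list the package's lm method uses.

```r
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

full <- lm(mpg ~ wt + hp, data = train)

# Components whose size grows with the number of training rows.
# Hand-picked for this sketch (an assumption, not the package's own list).
drop <- c("residuals", "fitted.values", "effects", "model")

slim <- full
for (nm in drop) slim[[nm]] <- NULL  # assigning NULL removes the element
slim$qr$qr <- NULL                   # keep the pivot, drop the n-row matrix

# Predictions on new data are unaffected by the removed components.
p_full <- predict(full, newdata = test)
p_slim <- predict(slim, newdata = test)
print(all.equal(p_full, p_slim))

# The slimmed object serializes to fewer bytes.
print(c(full = length(serialize(full, NULL)),
        slim = length(serialize(slim, NULL))))

# The trade-off: information not needed for prediction is gone.
print(is.null(slim$residuals))
```

The trade-off is that methods relying on the removed components, such as those that need residuals, can no longer be expected to work on the slimmed object.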
Currently the models supported are limited to the following list:

- lm and glm, the linear and generalized linear regression functions from package stats;
- loess, the local polynomial regression function from package stats;
- randomForest, from package randomForest.
There is also a strip function for 'train' objects built with the caret package.
Further developments of the package should include additional models, and should enable additional keep values (e.g. keep="summary", keep="anova", etc.).
A list of the same class as object is returned.
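Because the class is preserved, S3 dispatch still reaches the right method after the stripped object has been saved and reloaded. The sketch below assumes the strip package is installed and skips the call otherwise; the temporary file path is only illustrative.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

if (requireNamespace("strip", quietly = TRUE)) {
  # Keep only what predict() needs, then round-trip through disk.
  small <- strip::strip(fit, keep = "predict")
  path <- tempfile(fileext = ".rds")
  saveRDS(small, path)
  reloaded <- readRDS(path)

  print(class(reloaded))                            # same class as 'fit'
  print(predict(reloaded, newdata = mtcars[1:3, ]))
} else {
  message("Package 'strip' is not installed; skipping the example.")
}
```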
The method for glm objects is adapted from Nina Zumel's post on Win-Vector Blog. The method for randomForest objects is adapted from ReKa's answer on StackExchange.
See Nina Zumel's post on Win-Vector Blog for further insight, examples, and motivations; ReKa's answer on StackExchange for reducing the size of a randomForest object; this discussion for limiting the ‘footprint’ of regression and classification objects within the caret package.
    data("mtcars")
    set.seed(110)
    i = sample(2, nrow(mtcars), replace = TRUE, prob=c(0.8, 0.2))

    r1 = lm(mpg ~ ., data = mtcars[i==1,])
    r2 = strip(r1, keep = "predict")

    # Estimate the objects' size as the size of their serialization
    length(serialize(r1, NULL))
    length(serialize(r2, NULL))

    # Check that predictions are the same
    p1 = predict(r1, newdata = mtcars[i==2,])
    p2 = predict(r2, newdata = mtcars[i==2,])
    identical(p1, p2) # TRUE