| Title: | Various Common Statistical Utilities |
|---|---|
| Description: | Utilities for simplifying common statistical operations including probability density functions, cumulative distribution functions, Kolmogorov-Smirnov tests, principal component analysis plots, and prediction plots. |
| Authors: | Zach Peagler [aut, cre, cph] |
| Maintainer: | Zach Peagler <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.1.2 |
| Built: | 2026-06-09 08:21:20 UTC |
| Source: | https://github.com/zachpeagler/ztils |
A function for calculating the pseudo R^2 of a glm object
glm_pseudor2(mod)

| Argument | Description |
|---|---|
| mod | The model for which to calculate the pseudo R^2 |

The pseudo R^2 value of the model
gmod <- glm(Sepal.Length ~ Petal.Length + Species, data = iris)
glm_pseudor2(gmod)
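The description does not say which pseudo R^2 is computed. For reference, one common deviance-based definition is 1 - (residual deviance / null deviance). The sketch below assumes that definition and uses only base R, so it may not match glm_pseudor2 exactly; the helper name deviance_pseudo_r2 is hypothetical.

```r
# Hypothetical helper: deviance-based pseudo R^2.
# This is an assumed definition, not necessarily the one glm_pseudor2 uses.
deviance_pseudo_r2 <- function(mod) {
  1 - mod$deviance / mod$null.deviance
}

gmod <- glm(Sepal.Length ~ Petal.Length + Species, data = iris)
pseudo_r2 <- deviance_pseudo_r2(gmod)
pseudo_r2
```

For a Gaussian glm with an identity link, as in this example, the quantity coincides with the ordinary R^2 reported by lm for the same formula.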
This function gets the cumulative distribution function for selected distributions against a continuous, non-negative input variable. Possible distributions include "normal", "lognormal", "gamma", "exponential", "cauchy", "t", "weibull", "logistic", and "all".
multicdf_cont(var, seq_length = 50, distributions = "all")

| Argument | Description |
|---|---|
| var | The variable of which to get the CDF |
| seq_length | The length of sequence to fit the distribution to |
| distributions | The distributions to fit var against |

A dataframe with x, the real cumulative density, and the CDF of each selected distribution, with length (nrows) equal to seq_length + 1.
multicdf_cont(iris$Petal.Length)
multicdf_cont(iris$Sepal.Length, 100, c("normal", "lognormal"))
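To make the shape of the returned dataframe concrete, here is a minimal base R sketch of the same idea for two distributions. It is not the package's implementation: the helper name cdf_sketch, the column names, and the use of simple moment estimates for the parameters are all assumptions.

```r
# Hypothetical helper sketching the idea behind multicdf_cont.
# Parameters are estimated by simple moments (an assumption).
cdf_sketch <- function(var, seq_length = 50) {
  # seq_length equal steps from min to max give seq_length + 1 points
  x <- seq(min(var), max(var), length.out = seq_length + 1)
  data.frame(
    x = x,
    cdf_empirical = ecdf(var)(x),  # empirical CDF of the data
    cdf_normal = pnorm(x, mean = mean(var), sd = sd(var)),
    cdf_lognormal = plnorm(x, meanlog = mean(log(var)), sdlog = sd(log(var)))
  )
}

cdf_df <- cdf_sketch(iris$Petal.Length)
head(cdf_df)
```

The lognormal column takes log(var), which is one reason the input needs to be strictly positive for that distribution.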
This function extends 'multicdf_cont' and gets the cumulative distribution functions (CDFs) for selected distributions against a continuous variable. Possible distributions include any combination of "normal", "lognormal", "gamma", "exponential", and "all" (which uses all of the listed distributions). It then plots the result using 'ggplot2' and a 'scico' palette. If var_name is specified, it is used for the plot labels; otherwise var is used.
multicdf_plot(var, seq_length = 50, distributions = "all", palette = "oslo", var_name = NULL)

| Argument | Description |
|---|---|
| var | The variable for which to plot CDFs |
| seq_length | The number of points over which to fit var |
| distributions | The distributions to fit var against |
| palette | The color palette to use on the graph |
| var_name | The variable name to use in the plot labels |

A plot showing the CDF of the selected variable against the selected distributions over the selected sequence length
multicdf_plot(iris$Sepal.Length)
multicdf_plot(iris$Sepal.Length, seq_length = 100, distributions = c("normal", "lognormal", "gamma"), palette = "bilbao", var_name = "Sepal Length (cm)")
This function gets the distance and p-value from a Kolmogorov-Smirnov (KS) test for selected distributions against a continuous input variable. Possible distributions include "normal", "lognormal", "gamma", "exponential", and "all".
multiks_cont(var, distributions = "all")

| Argument | Description |
|---|---|
| var | The variable to perform KS tests against |
| distributions | The distributions to test var against |

A dataframe with the distance and p-value for each KS test performed
multiks_cont(iris$Sepal.Length)
multiks_cont(iris$Sepal.Length, c("normal", "lognormal"))
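A rough base R equivalent for two distributions can be built on stats::ks.test. This is only a sketch under stated assumptions: the helper name ks_sketch, the column names, and the moment-based parameter estimates are not taken from the package.

```r
# Hypothetical helper sketching the idea behind multiks_cont.
ks_sketch <- function(var) {
  # Parameters are estimated from the data by simple moments (an assumption)
  tests <- list(
    normal = ks.test(var, "pnorm", mean = mean(var), sd = sd(var)),
    lognormal = ks.test(var, "plnorm",
                        meanlog = mean(log(var)), sdlog = sd(log(var)))
  )
  data.frame(
    distribution = names(tests),
    distance = vapply(tests, function(t) unname(t$statistic), numeric(1)),
    p_value = vapply(tests, function(t) t$p.value, numeric(1)),
    row.names = NULL
  )
}

# ks.test warns about ties in the data, which iris contains; silence it here
ks_df <- suppressWarnings(ks_sketch(iris$Sepal.Length))
ks_df
```

Because the parameters are estimated from the same data being tested, the p-values from this sketch are approximate and tend to be conservative.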
This function gets the probability density functions (PDFs) for selected distributions against a continuous, non-negative input variable. Possible distributions include "normal", "lognormal", "gamma", "exponential", and "all".
multipdf_cont(var, seq_length = 50, distributions = "all")

| Argument | Description |
|---|---|
| var | The variable of which to get the PDF |
| seq_length | The length of sequence to fit the distribution to |
| distributions | The distributions to fit var against |

A dataframe with x, the real density, and the PDF of each selected distribution, with length (nrows) equal to seq_length + 1.
multipdf_cont(iris$Petal.Length)
multipdf_cont(iris$Sepal.Length, 100, c("normal", "lognormal"))
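The following base R sketch shows one way such a dataframe could be assembled, using a kernel density estimate as the "real" density. The helper name pdf_sketch, the column names, the kernel estimate, and the moment-based parameter estimates are assumptions, not the package's code.

```r
# Hypothetical helper sketching the idea behind multipdf_cont.
pdf_sketch <- function(var, seq_length = 50) {
  # Kernel density estimate on seq_length + 1 equally spaced points
  kde <- density(var, n = seq_length + 1, from = min(var), to = max(var))
  data.frame(
    x = kde$x,
    dens_empirical = kde$y,
    pdf_normal = dnorm(kde$x, mean = mean(var), sd = sd(var)),
    pdf_lognormal = dlnorm(kde$x, meanlog = mean(log(var)), sdlog = sd(log(var)))
  )
}

pdf_df <- pdf_sketch(iris$Petal.Length)
head(pdf_df)
```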
This function extends 'multipdf_cont' and gets the probability density functions (PDFs) for selected distributions against a continuous variable. Possible distributions include any combination of "normal", "lognormal", "gamma", "exponential", and "all" (which uses all of the listed distributions). It then plots the result using 'ggplot2' and a 'scico' palette. If var_name is specified, it is used for the plot labels; otherwise var is used.
multipdf_plot(var, seq_length = 50, distributions = "all", palette = "oslo", var_name = NULL)

| Argument | Description |
|---|---|
| var | The variable for which to plot PDFs |
| seq_length | The number of points over which to fit var |
| distributions | The distributions to fit var against |
| palette | The color palette to use on the graph |
| var_name | The variable name to use in the plot labels. If no name is provided, the function uses the name supplied in var |

A plot showing the PDF of the selected variable against the selected distributions over the selected sequence length
multipdf_plot(iris$Sepal.Length)
multipdf_plot(iris$Sepal.Length, seq_length = 100, distributions = c("normal", "lognormal", "gamma"), palette = "bilbao", var_name = "Sepal Length (cm)")
This function returns a dataframe subsetted to exclude observations with extreme values of the specified variable. Extremes are values more than 3 times the interquartile range (IQR) below the first quartile or above the third quartile.
no_extremes(data, var)

| Argument | Description |
|---|---|
| data | The data to subset |
| var | The variable to subset by |

A dataframe without entries containing extremes in the selected variable.
no_extremes(iris, Sepal.Length)
This function returns a dataframe subsetted to exclude observations with outlying values of the specified variable. Outliers are values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile.
no_outliers(data, var)

| Argument | Description |
|---|---|
| data | The data to subset |
| var | The variable to subset by |

A dataframe without entries containing outliers in the selected variable.
no_outliers(iris, Sepal.Length)
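no_extremes and no_outliers differ only in the IQR multiplier (3 versus 1.5). The base R sketch below shows the filtering rule on a small made-up dataset, assuming the cut-offs are measured from the first and third quartiles (the usual Tukey fence convention). The helper iqr_filter is hypothetical, and unlike the package functions it takes the column name as a string.

```r
# Hypothetical helper: keep rows whose value lies within
# [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 for outliers, k = 3 for extremes.
iqr_filter <- function(data, column, k) {
  q <- quantile(data[[column]], c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  keep <- data[[column]] >= q[[1]] - k * iqr &
    data[[column]] <= q[[2]] + k * iqr
  data[keep, , drop = FALSE]
}

# Illustrative data: 22 lies beyond the 1.5 * IQR fence but inside the 3 * IQR fence
df <- data.frame(id = 1:8, value = c(10, 11, 12, 13, 14, 15, 16, 22))
no_out <- iqr_filter(df, "value", k = 1.5)  # drops the row with value 22
no_ext <- iqr_filter(df, "value", k = 3)    # keeps all rows
```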
This function uses a dataframe, PCA variables, and a scaled boolean to generate a dataframe with principal components as columns.
pca_data(data, pcavars, scaled = FALSE)

| Argument | Description |
|---|---|
| data | The dataframe to add principal components to |
| pcavars | The variables to include in the principal component analysis |
| scaled | A boolean (TRUE or FALSE) indicating if the pcavars are already scaled |

A dataframe containing the original data with the principal components added as columns.
pca_data(iris, iris[,c(1:4)], FALSE)
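As a rough illustration of the idea, the sketch below appends principal component scores to a dataframe using stats::prcomp. It assumes that scaled = FALSE means the variables still need scaling, and that the new columns are named PC1, PC2, and so on; the helper name pca_sketch is hypothetical and the package may compute or name things differently.

```r
# Hypothetical helper sketching the idea behind pca_data.
pca_sketch <- function(data, pcavars, scaled = FALSE) {
  # Scale the variables only if they are not already scaled (assumed meaning)
  vars <- if (scaled) pcavars else scale(pcavars)
  scores <- prcomp(vars)$x  # matrix of scores with columns PC1, PC2, ...
  cbind(data, as.data.frame(scores))
}

pca_df <- pca_sketch(iris, iris[, 1:4])
head(pca_df)
```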
This function uses a group, PCA variables, and a 'scaled' boolean to generate a biplot using 'ggplot2' and 'scico'.
If scaled is set to TRUE, the variables are treated as already scaled and are used as-is. If scaled is set to FALSE, the variables are scaled before the analysis.
pca_plot(group, pcavars, scaled = FALSE, palette = "oslo")

| Argument | Description |
|---|---|
| group | The group variable (column) |
| pcavars | The variables to include in the principal component analysis |
| scaled | A boolean (TRUE or FALSE) indicating if the pcavars are already scaled |
| palette | A color palette to use on the plot, with each group assigned to a color |

A plot showing PC1 on the x axis, PC2 on the y axis, colored by group, with vectors and labels showing the individual PCA variables.
pca_plot(iris$Species, iris[,c(1:4)])
pca_plot(iris$Species, iris[,c(1:4)], FALSE, "bilbao")
This function uses a model, dataframe, and supplied predictor, response, and group variables to make predictions based off the model over a user-defined length with options to predict over the confidence or prediction interval and to apply a mathematical correction. It then graphs both the real data and the specified interval using 'ggplot2'. You can also choose the color palette from 'scico' palettes.
predict_plot(mod, data, rvar, pvar, group = NULL, length = 50, interval = "confidence", correction = "normal", palette = "oslo")

| Argument | Description |
|---|---|
| mod | the model used for predictions |
| data | the data used to render the "real" points on the graph and for aggregating groups to determine prediction limits (should be the same as the data used in the model) |
| rvar | the response variable (y variable / variable the model is predicting) |
| pvar | the predictor variable (x variable / variable the model will predict against) |
| group | the group; should be a factor; one response curve will be made for each group |
| length | the length of the variable over which to predict (higher = more resolution, essentially) |
| interval | the type of interval to predict ("confidence" or "prediction") |
| correction | the type of correction to apply to the prediction ("normal", "exponential", or "logit") |
| palette | the color palette used to color the graph, with each group corresponding to a color |

A plot showing the real data and the model's predicted 95% CI or PI over a number of groups, with optional corrections.
## Example 1
mod1 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
predict_plot(mod1, iris, Sepal.Length, Petal.Length, Species)
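The prediction step behind a plot like this can be sketched in base R with stats::predict. The grid construction below (one sequence of predictor values per group, spanning that group's observed range) is an assumption about how the prediction limits are determined, and the names grid_length, newdata, and pred_df are illustrative only. No correction is applied and nothing is plotted.

```r
mod1 <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# One grid of predictor values per group, spanning that group's observed range
# (an assumed reading of how the prediction limits are determined)
grid_length <- 50
newdata <- do.call(rbind, lapply(split(iris, iris$Species), function(g) {
  data.frame(
    Petal.Length = seq(min(g$Petal.Length), max(g$Petal.Length),
                       length.out = grid_length),
    Species = g$Species[1]
  )
}))

# 95% confidence interval; use interval = "prediction" for a prediction interval
pred <- predict(mod1, newdata = newdata, interval = "confidence", level = 0.95)
pred_df <- cbind(newdata, as.data.frame(pred))
head(pred_df)
```

The columns fit, lwr, and upr hold the fitted values and the interval bounds, which could then be drawn as a line and ribbon over the raw points.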