Title: | Bayesian Multinomial Logistic Normal Regression |
---|---|
Description: | Provides methods for fitting and inspection of Bayesian Multinomial Logistic Normal Models using MAP estimation and Laplace Approximation as developed in Silverman et. Al. (2022) <https://www.jmlr.org/papers/v23/19-882.html>. Key functionality is implemented in C++ for scalability. 'fido' replaces the previous package 'stray'. |
Authors: | Justin Silverman [aut], Kim Roche [ctb], Michelle Nixon [ctb, cre] |
Maintainer: | Michelle Nixon <[email protected]> |
License: | GPL-3 |
Version: | 1.1.1 |
Built: | 2024-10-31 05:44:11 UTC |
Source: | https://github.com/jsilve24/fido |
Compute the ALR of a matrix
alr(x, d = NULL)
alr(x, d = NULL)
x |
A matrix where the rows are the samples |
d |
Index of column used as a reference. Defaults to last column |
matrix
Compute the ALR of an array
alr_array(x, d = dim(x)[parts], parts)
alr_array(x, d = dim(x)[parts], parts)
x |
multidimensional array in simplex |
d |
Index of column used as a reference. Defaults to last column |
parts |
index of dimension of 'x' that represents parts |
array
Compute the inverse ALR of a matrix
alrInv(y, d = NULL)
alrInv(y, d = NULL)
y |
An ALR transformed matrix |
d |
Index of column used as a reference. Defaults to last column |
matrix
Compute the ALR of an array
alrInv_array(y, d = dim(y)[coords] + 1, coords)
alrInv_array(y, d = dim(y)[coords] + 1, coords)
y |
multidimensional ALR transformed array |
d |
Index of column used as a reference. Defaults to last column |
coords |
index of dimension of 'x' that represents coordinates |
array
Convert object of class orthusfit to a list
## S3 method for class 'orthusfit' as.list(x, ...)
## S3 method for class 'orthusfit' as.list(x, ...)
x |
an object of class orthusfit |
... |
currently unused |
A list of the converted orthusfit object
Convert object of class pibblefit to a list
## S3 method for class 'pibblefit' as.list(x, ...)
## S3 method for class 'pibblefit' as.list(x, ...)
x |
an object of class pibblefit |
... |
currently unused |
A list from the converted pibblefit object.
convert list to orthusfit
as.orthusfit(object)
as.orthusfit(object)
object |
list object |
An orthusfit object
convert list to pibblefit
as.pibblefit(object)
as.pibblefit(object)
object |
list object |
A pibblefit object
Basset (A Lazy Learner) - non-linear regression models in fido
basset( Y = NULL, X, upsilon = NULL, Theta = NULL, Gamma = NULL, Xi = NULL, linear = NULL, init = NULL, pars = c("Eta", "Lambda", "Sigma"), newdata = NULL, ... ) ## S3 method for class 'bassetfit' refit(m, pars = c("Eta", "Lambda", "Sigma"), ...)
basset( Y = NULL, X, upsilon = NULL, Theta = NULL, Gamma = NULL, Xi = NULL, linear = NULL, init = NULL, pars = c("Eta", "Lambda", "Sigma"), newdata = NULL, ... ) ## S3 method for class 'bassetfit' refit(m, pars = c("Eta", "Lambda", "Sigma"), ...)
Y |
D x N matrix of counts (if NULL uses priors only) |
X |
Q x N matrix of covariates (cannot be NULL) |
upsilon |
dof for inverse wishart prior (numeric must be > D) (default: D+3) |
Theta |
A function from dimensions dim(X) -> (D-1)xN (prior mean of gaussian process). For an additive GP model, can be a list of functions from dimensions dim(X) -> (D-1)xN + a (optional) matrix of size (D-1)xQ for the prior of a linear component if desired. |
Gamma |
A function from dimension dim(X) -> NxN (kernel matrix of gaussian process). For an additive GP model, can be a list of functions from dimension dim(X) -> NxN + a QxQ prior covariance matrix if a linear component is specified. It is assumed that the order matches the order of Theta. |
Xi |
(D-1)x(D-1) prior covariance matrix (default: ALR transform of diag(1)*(upsilon-D)/2 - this is essentially iid on "base scale" using Aitchison terminology) |
linear |
A vector denoting which rows of X should be used if a linear component is specified. Default is all rows. |
init |
(D-1) x Q initialization for Eta for optimization |
pars |
character vector of posterior parameters to return |
newdata |
Default is |
... |
other arguments passed to pibble (which is used internally to fit the basset model) |
m |
object of class bassetfit |
the full model is given by:
Where is short hand for the Gram matrix of the Kernel function.
Alternatively can be used to fit an additive GP of the form:
Where is short hand for the Gram matrix of the Kernel function.
Default behavior is to use MAP estimate for uncollaping the LTP model if laplace approximation is not preformed.
an object of class bassetfit
Check vector/matrix/data.frame for expected dimensions or throw error
check_dims(x, d, par)
check_dims(x, d, par)
x |
object to check |
d |
expected dimensions |
par |
character name of x (for error message) |
nothing if no error, otherwise throws error
y <- c(1,3,4) check_dims(y, 3, "y")
y <- c(1,3,4) check_dims(y, 3, "y")
Compute the CLR of an array
clr_array(x, parts)
clr_array(x, parts)
x |
multidimensional array in index |
parts |
index of dimension of 'x' that represents parts |
array
Orthus: Returned as array of dimension (D-1+P) x Q x iter (if in ALR or ILR) otherwise (D+P) x Q x iter.
## S3 method for class 'orthusfit' coef(object, ...)
## S3 method for class 'orthusfit' coef(object, ...)
object |
an object of class orthusfit |
... |
other options passed to coef.orthusfit (see details) |
Other arguments:
use_names if column and row names were passed for Y and X in
call to pibble
, should these names be applied to output
array.
Array of dimension (D-1) x Q x iter
Pibble: Returned as array of dimension (D-1) x Q x iter (if in ALR or ILR) otherwise DxQxiter (if in proportions or clr).
## S3 method for class 'pibblefit' coef(object, ...)
## S3 method for class 'pibblefit' coef(object, ...)
object |
an object of class pibblefit |
... |
other options passed to coef.pibblefit (see details) |
Other arguments:
'use_names' if column and row names were passed for Y and X in
call to pibble
, should these names be applied to output
array.
Array of dimension (D-1) x Q x iter
See details for model. Notation: N
is number of samples,
D
is the dimension of the response, Q
is number
of covariates.
conjugateLinearModel(Y, X, Theta, Gamma, Xi, upsilon, n_samples = 2000L)
conjugateLinearModel(Y, X, Theta, Gamma, Xi, upsilon, n_samples = 2000L)
Y |
matrix of dimension D x N |
X |
matrix of covariates of dimension Q x N |
Theta |
matrix of prior mean of dimension D x Q |
Gamma |
covariance matrix of dimension Q x Q |
Xi |
covariance matrix of dimension D x D |
upsilon |
scalar (must be > D-1) degrees of freedom for InvWishart prior |
n_samples |
number of samples to draw (default: 2000) |
This function provides a means of sampling from the posterior distribution of
Lambda
and Sigma
.
List with components
Lambda Array of dimension (D-1) x Q x n_samples (posterior samples)
Sigma Array of dimension (D-1) x (D-1) x n_samples (posterior samples)
sim <- pibble_sim() eta.hat <- t(alr(t(sim$Y+0.65))) fit <- conjugateLinearModel(eta.hat, sim$X, sim$Theta, sim$Gamma, sim$Xi, sim$upsilon, n_samples=2000)
sim <- pibble_sim() eta.hat <- t(alr(t(sim$Y+0.65))) fit <- conjugateLinearModel(eta.hat, sim$X, sim$Theta, sim$Gamma, sim$Xi, sim$upsilon, n_samples=2000)
Convert orthus covariance matricies between representations
oilrvar2ilrvar(Sigma, s, V1, V2) oilrvar2clrvar(Sigma, s, V) oclrvar2ilrvar(Sigma, s, V) oalrvar2clrvar(Sigma, s, d1) oclrvar2alrvar(Sigma, s, d2) oalrvar2alrvar(Sigma, s, d1, d2) oalrvar2ilrvar(Sigma, s, d1, V2) oilrvar2alrvar(Sigma, s, V1, d2)
oilrvar2ilrvar(Sigma, s, V1, V2) oilrvar2clrvar(Sigma, s, V) oclrvar2ilrvar(Sigma, s, V) oalrvar2clrvar(Sigma, s, d1) oclrvar2alrvar(Sigma, s, d2) oalrvar2alrvar(Sigma, s, d1, d2) oalrvar2ilrvar(Sigma, s, d1, V2) oilrvar2alrvar(Sigma, s, V1, d2)
Sigma |
covariance matrix arrat in specified transformed space (dim(Sigma)[3]=iter) |
s |
first s rows and colums of Sigma are transformed |
V1 |
ILR contrast matrix of basis Sigma is already in |
V2 |
ILR contrast matrix of basis Sigma is desired in |
V |
ILR contrast matrix (i.e., transformation matrix of ILR) |
d1 |
alr reference element Sigma is already expressed with respec to |
d2 |
alr reference element Sigma is to be expressed with respect to |
matrix
Create a default ILR base
create_default_ilr_base(D)
create_default_ilr_base(D)
D |
the number of parts (e.g., number of columns in untransformed data) |
A matrix
Provides methods for fitting and inspection of Bayesian Multinomial Logistic Normal Models using MAP estimation and Laplace Approximation. Key functionality is implemented in C++ for scalability.
Maintainer: Michelle Nixon [email protected] [contributor]
Authors:
Justin Silverman [email protected]
Other contributors:
Kim Roche [email protected] [contributor]
Useful links:
These are a collection of convenience functions for transforming fido fit objects to a number of different representations including ILR bases, CLR coordinates, ALR coordinates, and proportions.
to_proportions(m) to_alr(m, d) to_ilr(m, V = NULL) to_clr(m) ## S3 method for class 'pibblefit' to_proportions(m) ## S3 method for class 'orthusfit' to_proportions(m) ## S3 method for class 'pibblefit' to_alr(m, d) ## S3 method for class 'orthusfit' to_alr(m, d) ## S3 method for class 'pibblefit' to_ilr(m, V = NULL) ## S3 method for class 'orthusfit' to_ilr(m, V = NULL) ## S3 method for class 'pibblefit' to_clr(m) ## S3 method for class 'orthusfit' to_clr(m)
to_proportions(m) to_alr(m, d) to_ilr(m, V = NULL) to_clr(m) ## S3 method for class 'pibblefit' to_proportions(m) ## S3 method for class 'orthusfit' to_proportions(m) ## S3 method for class 'pibblefit' to_alr(m, d) ## S3 method for class 'orthusfit' to_alr(m, d) ## S3 method for class 'pibblefit' to_ilr(m, V = NULL) ## S3 method for class 'orthusfit' to_ilr(m, V = NULL) ## S3 method for class 'pibblefit' to_clr(m) ## S3 method for class 'orthusfit' to_clr(m)
m |
object of class pibblefit or orthusfit (e.g., output of |
d |
(integer) multinomial category to take as new alr reference |
V |
(matrix) contrast matrix for ILR basis to transform into to (defaults to
|
For orthus, transforms only appleid to log-ratio parameters
Note: that there is a degeneracy of representations for a covariance
matrix represented in terms of proportions. As such the function
to_proportions
does not attempt to transform parameters Sigma
or prior Xi and instead just removes them from the pibblefit object returned.
object
Gather Multidimensional Array to Tidy Tibble
gather_array(a, value, ..., .id = NULL)
gather_array(a, value, ..., .id = NULL)
a |
multidimensional array |
value |
unquoted name of column with values (defaults to "var") |
... |
unquoted dimension names (defaults to "dim_1", "dim_2", etc...) |
.id |
if specified, name for column created with name of a captured |
data.frame
spread_array
a <- array(1:100, dim =c(10, 5, 2)) gather_array(a, sequence, A, B, C)
a <- array(1:100, dim =c(10, 5, 2)) gather_array(a, sequence, A, B, C)
Designed to be partially specified. (see examples)
SE(X, sigma = 1, rho = median(as.matrix(dist(t(X)))), jitter = 1e-10) LINEAR(X, sigma = 1, c = rep(0, nrow(X)))
SE(X, sigma = 1, rho = median(as.matrix(dist(t(X)))), jitter = 1e-10) LINEAR(X, sigma = 1, c = rep(0, nrow(X)))
X |
covariate (dimension Q x N; i.e., covariates x samples) |
sigma |
scalar parameter |
rho |
scalar bandwidth parameter |
jitter |
small scalar to add to off-diagonal of gram matrix (for numerical underflow issues) |
c |
vector parameter defining intercept for linear kernel |
Gram matrix G is given by
SE (squared exponential):
LINEAR:
Gram Matrix (N x N) (e.g., the Kernel evaluated at each pair of points)
Takes idea from Wu et al. (citation below) and calculates IQLR for Lambda, potentially useful if you believe there is an invariant group of categories (e.g., taxa / genes) that are not changing (in absolute abundance) between samples. IQLR is defined as
for i in 1,...,D.
IQVF are the CLR coordinates whose variance is within the inter-quantile range
(defined by probs
argument to this function).
A different IQVF is fit for each posteior sample as the IQVFs are calculted
based on posterior estimates for Lambda. The variance of a CLR coordinate
is defined as the norm of each row of Lambda[,focus.cov] (i.e.,
the covariation in Eta, explained by those covariates). This definition of
variance allows uses to exclude variation from technical / trivial sources
in calculation of IQVF/IQLR.
lambda_to_iqlr(m, focus.cov = NULL, probs = c(0.25, 0.75))
lambda_to_iqlr(m, focus.cov = NULL, probs = c(0.25, 0.75))
m |
object of class pibblefit (e.g., output of |
focus.cov |
vector of integers or characters specifying columns (covariates) of Lambda to include in calculating IQLR (if NULL, default, then uses all covariates) |
probs |
bounds for categories (i.e., features / genes / taxa) to include in calculation of iqlr (smaller bounds means more stringent inclusion criteria) |
Primarily intended for doing differential expression analysis under assumption that only small group of categories (e.g., taxa / genes) are changing
array of dimension (D, Q, iter) where D is number of taxa, Q is number of covariates, and iter is number of posterior samples.
Jia R. Wu, Jean M. Macklaim, Briana L. Genge, Gregory B. Gloor (2017) Finding the center: corrections for asymmetry in high-throughput sequencing datasets. arxiv:1704.01841v1
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X) # Use first two covariates to define iqlr, just show first 5 samples lambda_to_iqlr(fit, 1:2)[,,1:5]
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X) # Use first two covariates to define iqlr, just show first 5 samples lambda_to_iqlr(fit, 1:2)[,,1:5]
Log of Multivarate Gamma Function - Gamma_p(a)
lmvgamma(a, p)
lmvgamma(a, p)
a |
defined by Gamma_p(a) |
p |
defined by Gamma_p(a) |
Numeric
https://en.wikipedia.org/wiki/Multivariate_gamma_function
Derivative of Log of Multivariate Gamma Function - Gamma_p(a)
lmvgamma_deriv(a, p)
lmvgamma_deriv(a, p)
a |
defined by Gamma_p(a) |
p |
defined by Gamma_p(a) |
Numeric
https://en.wikipedia.org/wiki/Multivariate_gamma_function
Functions providing access to the Log Likelihood, Gradient, and Hessian
of the collapsed pibble model. Note: These are convenience functions
but are not as optimized as direct coding of the PibbleCollapsed
C++ class due to a lack of Memoization. By contrast function optimPibbleCollapsed
is much more optimized and massively cuts down on repeated calculations.
A more efficient Rcpp module based implementation of these functions
may following if the future. For model details see optimPibbleCollapsed
documentation
loglikPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, eta, sylv = FALSE) gradPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, eta, sylv = FALSE) hessPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, eta, sylv = FALSE)
loglikPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, eta, sylv = FALSE) gradPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, eta, sylv = FALSE) hessPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, eta, sylv = FALSE)
Y |
D x N matrix of counts |
upsilon |
(must be > D) |
ThetaX |
D-1 x N matrix formed by Theta*X (Theta is Prior mean for regression coefficients) |
KInv |
Inverse of K for LTP (for Pibble defined as KInv = solve(Xi)) |
AInv |
Inverse of A for LTP (for Pibble defined as AInv = solve(diag(N)+ X'GammaX) ) |
eta |
matrix (D-1)xN of parameter values at which to calculate quantities |
sylv |
(default:false) if true and if N < D-1 will use sylvester determinant identity to speed computation |
see below
loglikPibbleCollapsed - double
gradPibbleCollapsed - vector
hessPibbleCollapsed- matrix
D <- 10 Q <- 2 N <- 30 # Simulate Data Sigma <- diag(sample(1:8, D-1, replace=TRUE)) Sigma[2, 3] <- Sigma[3,2] <- -1 Gamma <- diag(sqrt(rnorm(Q)^2)) Theta <- matrix(0, D-1, Q) Phi <- Theta + t(chol(Sigma))%*%matrix(rnorm(Q*(D-1)), nrow=D-1)%*%chol(Gamma) X <- matrix(rnorm(N*(Q-1)), Q-1, N) X <- rbind(1, X) Eta <- Phi%*%X + t(chol(Sigma))%*%matrix(rnorm(N*(D-1)), nrow=D-1) Pi <- t(alrInv(t(Eta))) Y <- matrix(0, D, N) for (i in 1:N) Y[,i] <- rmultinom(1, sample(5000:10000), prob = Pi[,i]) # Priors upsilon <- D+10 Xi <- Sigma*(upsilon-D) # Precompute KInv <- solve(Xi) AInv <- solve(diag(N)+ t(X)%*%Gamma%*%X) ThetaX <- Theta%*%X loglikPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, Eta) gradPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, Eta)[1:5] hessPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, Eta)[1:5,1:5]
D <- 10 Q <- 2 N <- 30 # Simulate Data Sigma <- diag(sample(1:8, D-1, replace=TRUE)) Sigma[2, 3] <- Sigma[3,2] <- -1 Gamma <- diag(sqrt(rnorm(Q)^2)) Theta <- matrix(0, D-1, Q) Phi <- Theta + t(chol(Sigma))%*%matrix(rnorm(Q*(D-1)), nrow=D-1)%*%chol(Gamma) X <- matrix(rnorm(N*(Q-1)), Q-1, N) X <- rbind(1, X) Eta <- Phi%*%X + t(chol(Sigma))%*%matrix(rnorm(N*(D-1)), nrow=D-1) Pi <- t(alrInv(t(Eta))) Y <- matrix(0, D, N) for (i in 1:N) Y[,i] <- rmultinom(1, sample(5000:10000), prob = Pi[,i]) # Priors upsilon <- D+10 Xi <- Sigma*(upsilon-D) # Precompute KInv <- solve(Xi) AInv <- solve(diag(N)+ t(X)%*%Gamma%*%X) ThetaX <- Theta%*%X loglikPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, Eta) gradPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, Eta)[1:5] hessPibbleCollapsed(Y, upsilon, ThetaX, KInv, AInv, Eta)[1:5,1:5]
High Resolution (hourly and daily) sampling of 4 in vitro artificial gut models with many technical replicates to identify technical variation.
data(mallard)
data(mallard)
A list containing "otu_table", "sample_data", "tax_table", and "refseq".
This data is at the sequence variant level. Data at the family level
processed as in Silverman et al. 2018 is given in mallard_family
Silverman et al. "Dynamic linear models guide design and analysis of microbiota studies within artificial human guts". Microbiome 2018 6:202
High Resolution (hourly and daily) sampling of 4 in vitro artificial gut models with many technical replicates to identify technical variation.
data(mallard_family)
data(mallard_family)
A list containing "otu_table", "sample_data", "tax_table", and "refseq".
This data is at the family level and processed as in Silverman et al. 2018. Data at the sequence
variant level without preprocessing is given in mallard
Silverman et al. "Dynamic linear models guide design and analysis of microbiota studies within artificial human guts". Microbiome 2018 6:202
Mock communities and calibration samples created for measuring and validating model of PCR bias. This data has been preprocessed as in the original manuscript.
a data.frame metadata associated with the counts matrix 'Y'
Justin D. Silverman, Rachael J. Bloom, Sharon Jiang, Heather K. Durand, Sayan Mukherjee, Lawrence A. David. (2019) Measuring and Mitigating PCR Bias in Microbiome Data. bioRxiv 604025; doi: https://doi.org/10.1101/604025
Closure operator
miniclo(x)
miniclo(x)
x |
vector or matrix (rows are samples, parts are columns) of data in simplex |
x with row entries divided by sum of row (converts vectors to row matricies)
x <- matrix(runif(30), 10, 3) x <- miniclo(x)
x <- matrix(runif(30), 10, 3) x <- miniclo(x)
Array version of miniclo
.
miniclo_array(x, parts)
miniclo_array(x, parts)
x |
multidimensional array |
parts |
index of dimension of |
array
x <- array(1:100, dim=c(10, 5, 2)) miniclo_array(x, parts=2)
x <- array(1:100, dim=c(10, 5, 2)) miniclo_array(x, parts=2)
This function is deprecated, please use pibble
instead.
mongrel( Y = NULL, X = NULL, upsilon = NULL, Theta = NULL, Gamma = NULL, Xi = NULL, init = NULL, pars = c("Eta", "Lambda", "Sigma"), ... )
mongrel( Y = NULL, X = NULL, upsilon = NULL, Theta = NULL, Gamma = NULL, Xi = NULL, init = NULL, pars = c("Eta", "Lambda", "Sigma"), ... )
Y |
D x N matrix of counts (if NULL uses priors only) |
X |
Q x N matrix of covariates (design matrix) (if NULL uses priors only, must be present to sample Eta) |
upsilon |
dof for inverse wishart prior (numeric must be > D) (default: D+3) |
Theta |
(D-1) x Q matrix of prior mean for regression parameters (default: matrix(0, D-1, Q)) |
Gamma |
QxQ prior covariance matrix (default: diag(Q)) |
Xi |
(D-1)x(D-1) prior covariance matrix (default: ALR transform of diag(1)*(upsilon-D)/2 - this is essentially iid on "base scale" using Aitchison terminology) |
init |
(D-1) x N initialization for Eta for optimization |
pars |
character vector of posterior parameters to return |
... |
arguments passed to |
An object of class pibblefit
Intended to be called internally by package
name(m, ...)
name(m, ...)
m |
object |
... |
other arguments to be passed |
object of same class but with names applied to dimensions
To avoid confusion, assigned default names to multinomial categories (c1 etc...) and zdimensions (z1 etc...)
## S3 method for class 'orthusfit' name(m, ...)
## S3 method for class 'orthusfit' name(m, ...)
m |
object of class orthusfit |
... |
currently ignored |
object of class orthusfit
S3 for pibblefit apply names to pibblefit object
## S3 method for class 'pibblefit' name(m, ...)
## S3 method for class 'pibblefit' name(m, ...)
m |
object of class pibblefit |
... |
currently ignored |
object of class pibblefit
Generic method for getting and setting dimension names of fit object
## S3 method for class 'pibblefit' names_covariates(m) ## S3 method for class 'pibblefit' names_samples(m) ## S3 method for class 'pibblefit' names_categories(m) ## S3 method for class 'pibblefit' names_coords(m) ## S3 replacement method for class 'pibblefit' names_covariates(m) <- value ## S3 replacement method for class 'pibblefit' names_samples(m) <- value ## S3 replacement method for class 'pibblefit' names_categories(m) <- value names_covariates(m) names_samples(m) names_categories(m) names_coords(m) names_covariates(m) <- value names_samples(m) <- value names_categories(m) <- value
## S3 method for class 'pibblefit' names_covariates(m) ## S3 method for class 'pibblefit' names_samples(m) ## S3 method for class 'pibblefit' names_categories(m) ## S3 method for class 'pibblefit' names_coords(m) ## S3 replacement method for class 'pibblefit' names_covariates(m) <- value ## S3 replacement method for class 'pibblefit' names_samples(m) <- value ## S3 replacement method for class 'pibblefit' names_categories(m) <- value names_covariates(m) names_samples(m) names_categories(m) names_coords(m) names_covariates(m) <- value names_samples(m) <- value names_categories(m) <- value
m |
object |
value |
character vector (or NULL) |
names_coords
is different than names_categories
.
names_categories
provides access to the basic names of each multinomial
category. In contrast, names_coords
provides access to the
names of the coordinates in which an object is represented. These coordinate
names are based on the category names. For example, category names may be,
(OTU1, ..., OTUD) where as coordinate names could be (log(OTU1/OTUD), etc...)
if object is in default coordinate system.
A vector of names
Generic method for accessing model fit dimensions
## S3 method for class 'pibblefit' ncategories(m) ## S3 method for class 'pibblefit' nsamples(m) ## S3 method for class 'pibblefit' ncovariates(m) ## S3 method for class 'pibblefit' niter(m) ## S3 method for class 'orthusfit' ncategories(m) ## S3 method for class 'orthusfit' nsamples(m) ## S3 method for class 'orthusfit' ncovariates(m) ## S3 method for class 'orthusfit' niter(m) ncategories(m) nsamples(m) ncovariates(m) niter(m)
## S3 method for class 'pibblefit' ncategories(m) ## S3 method for class 'pibblefit' nsamples(m) ## S3 method for class 'pibblefit' ncovariates(m) ## S3 method for class 'pibblefit' niter(m) ## S3 method for class 'orthusfit' ncategories(m) ## S3 method for class 'orthusfit' nsamples(m) ## S3 method for class 'orthusfit' ncovariates(m) ## S3 method for class 'orthusfit' niter(m) ncategories(m) nsamples(m) ncovariates(m) niter(m)
m |
An object of class pibblefit |
An alternative approach to accessing these dimensions is to
access them directly from the pibblefit object using list indexing.
* ncategories
is equivalent to m$D
* nsamples
is equivalent to m$N
* ncovariates
is equivalent to m$Q
integer
See details for model. Should likely be followed by function
uncollapsePibble
. Notation: N
is number of samples,
D
is number of multinomial categories, and Q
is number
of covariates.
optimPibbleCollapsed( Y, upsilon, ThetaX, KInv, AInv, init, n_samples = 2000L, calcGradHess = TRUE, b1 = 0.9, b2 = 0.99, step_size = 0.003, epsilon = 1e-06, eps_f = 1e-10, eps_g = 1e-04, max_iter = 10000L, verbose = FALSE, verbose_rate = 10L, decomp_method = "cholesky", optim_method = "lbfgs", eigvalthresh = 0, jitter = 0, multDirichletBoot = -1, useSylv = TRUE, ncores = -1L, seed = -1L )
optimPibbleCollapsed( Y, upsilon, ThetaX, KInv, AInv, init, n_samples = 2000L, calcGradHess = TRUE, b1 = 0.9, b2 = 0.99, step_size = 0.003, epsilon = 1e-06, eps_f = 1e-10, eps_g = 1e-04, max_iter = 10000L, verbose = FALSE, verbose_rate = 10L, decomp_method = "cholesky", optim_method = "lbfgs", eigvalthresh = 0, jitter = 0, multDirichletBoot = -1, useSylv = TRUE, ncores = -1L, seed = -1L )
Y |
D x N matrix of counts |
upsilon |
(must be > D) |
ThetaX |
D-1 x N matrix formed by Theta*X (Theta is Prior mean for regression coefficients) |
KInv |
D-1 x D-1 precision matrix (inverse of Xi) |
AInv |
N x N precision matrix given by |
init |
D-1 x N matrix of initial guess for eta used for optimization |
n_samples |
number of samples for Laplace Approximation (=0 very fast as no inversion or decomposition of Hessian is required) |
calcGradHess |
if n_samples=0 should Gradient and Hessian still be calculated using closed form solutions? |
b1 |
(ADAM) 1st moment decay parameter (recommend 0.9) "aka momentum" |
b2 |
(ADAM) 2nd moment decay parameter (recommend 0.99 or 0.999) |
step_size |
(ADAM) step size for descent (recommend 0.001-0.003) |
epsilon |
(ADAM) parameter to avoid divide by zero |
eps_f |
(ADAM) normalized function improvement stopping criteria |
eps_g |
(ADAM) normalized gradient magnitude stopping criteria |
max_iter |
(ADAM) maximum number of iterations before stopping |
verbose |
(ADAM) if true will print stats for stopping criteria and iteration number |
verbose_rate |
(ADAM) rate to print verbose stats to screen |
decomp_method |
decomposition of hessian for Laplace approximation 'eigen' (more stable-slightly, slower) or 'cholesky' (less stable, faster, default) |
optim_method |
(default:"lbfgs") or "adam" |
eigvalthresh |
threshold for negative eigenvalues in decomposition of negative inverse hessian (should be <=0) |
jitter |
(default: 0) if >=0 then adds that factor to diagonal of Hessian before decomposition (to improve matrix conditioning) |
multDirichletBoot |
if >0 then it overrides laplace approximation and samples eta efficiently at MAP estimate from pseudo Multinomial-Dirichlet posterior. |
useSylv |
(default: true) if N<D-1 uses Sylvester Determinant Identity to speed up calculation of log-likelihood and gradients. |
ncores |
(default:-1) number of cores to use, if ncores==-1 then uses default from OpenMP typically to use all available cores. |
seed |
(random seed for Laplace approximation – integer) |
Notation: Let denote the J-th row of a matrix Z.
Model:
Where , K is a (D-1)x(D-1) covariance
matrix,
is a Q x Q covariance matrix, and
is ALRInv_D
transform.
Gradient and Hessian calculations are fast as they are computed using closed form solutions. That said, the Hessian matrix can be quite large [N*(D-1) x N*(D-1)] and storage may be an issue.
Note: Warnings about large negative eigenvalues can either signal
that the optimizer did not reach an optima or (more commonly in my experience)
that the prior / degrees of freedom for the covariance (given by parameters
upsilon
and KInv
) were too specific and at odds with the observed data.
If you get this warning try the following.
Try restarting the optimization using a different initial guess for eta
Try decreasing (or even increasing )step_size
(by increments of 0.001 or 0.002)
and increasing max_iter
parameters in optimizer. Also can try
increasing b1
to 0.99 and decreasing eps_f
by a few orders
of magnitude
Try relaxing prior assumptions regarding covariance matrix. (e.g., may want
to consider decreasing parameter upsilon
closer to a minimum value of
D)
Try adding small amount of jitter (e.g., set jitter=1e-5
) to address
potential floating point errors.
List containing (all with respect to found optima)
LogLik - Log Likelihood of collapsed model (up to proportionality constant)
Gradient - (if calcGradHess
=true)
Hessian - (if calcGradHess
=true) of the POSITIVE LOG POSTERIOR
Pars - Parameter value of eta at optima
Samples - (D-1) x N x n_samples array containing posterior samples of eta based on Laplace approximation (if n_samples>0)
Timer - Vector of Execution Times
logInvNegHessDet - the log determinant of the covariacne of the Laplace approximation, useful for calculating marginal likelihood
logMarginalLikelihood - A calculation of the log marginal likelihood based on the laplace approximation
S. Ruder (2016) An overview of gradient descent optimization algorithms. arXiv 1609.04747
JD Silverman K Roche, ZC Holmes, LA David, S Mukherjee. Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes. 2022, Journal of Machine Learning
sim <- pibble_sim() # Fit model for eta fit <- optimPibbleCollapsed(sim$Y, sim$upsilon, sim$Theta%*%sim$X, sim$KInv, sim$AInv, random_pibble_init(sim$Y))
sim <- pibble_sim() # Fit model for eta fit <- optimPibbleCollapsed(sim$Y, sim$upsilon, sim$Theta%*%sim$X, sim$KInv, sim$AInv, random_pibble_init(sim$Y))
This function is largely a more user friendly wrapper around
optimPibbleCollapsed
and
uncollapsePibble
for fitting orthus models.
See details for model specification.
Notation: N
is number of samples, P
is the number of dimensions
of observations in the second dataset,
D
is number of multinomial categories, Q
is number
of covariates, iter
is the number of samples of eta
(e.g.,
the parameter n_samples
in the function
optimPibbleCollapsed
)
orthus( Y = NULL, Z = NULL, X = NULL, upsilon = NULL, Theta = NULL, Gamma = NULL, Xi = NULL, init = NULL, pars = c("Eta", "Lambda", "Sigma"), ... )
orthus( Y = NULL, Z = NULL, X = NULL, upsilon = NULL, Theta = NULL, Gamma = NULL, Xi = NULL, init = NULL, pars = c("Eta", "Lambda", "Sigma"), ... )
Y |
D x N matrix of counts (if NULL uses priors only) |
Z |
P x N matrix of counts (if NULL uses priors only - must be present/absent if Y is present/absent) |
X |
Q x N matrix of covariates (design matrix) (if NULL uses priors only, must be present to sample Eta) |
upsilon |
dof for inverse wishart prior (numeric must be > D) (default: D+3) |
Theta |
(D-1+P) x Q matrix of prior mean for regression parameters (default: matrix(0, D-1+P, Q)) |
Gamma |
QxQ prior covariance matrix (default: diag(Q)) |
Xi |
(D-1+P)x(D-1+P) prior covariance matrix (default: ALR transform of diag(1)*(upsilon-D)/2 - this is essentially iid on "base scale" using Aitchison terminology) |
init |
(D-1) x Q initialization for Eta for optimization |
pars |
character vector of posterior parameters to return |
... |
arguments passed to |
the full model is given by:
Where is a Q x Q covariance matrix, and
is
ALRInv_D transform.
That is, the orthus model models the latent multinomial log-ratios (Eta) and
the observations of the second dataset jointly as a linear model. This allows
Sigma to also describe the covariation between the two datasets.
Default behavior is to use MAP estimate for uncollaping the LTP model if laplace approximation is not preformed.
an object of class pibblefit
JD Silverman K Roche, ZC Holmes, LA David, S Mukherjee. Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes. 2019, arXiv e-prints, arXiv:1903.11695
fido_transforms
provide convenience methods for
transforming the representation of pibblefit objects (e.g., conversion to
proportions, alr, clr, or ilr coordinates.)
access_dims
provides convenience methods for accessing
dimensions of pibblefit object
sim <- orthus_sim() fit <- orthus(sim$Y, sim$Z, sim$X)
sim <- orthus_sim() fit <- orthus(sim$Y, sim$Z, sim$X)
Log-Ratio transforms for orthus objects
oglr(x, s, V) oglrInv(x, s, V) oalr(x, s, d = NULL) oalrInv(y, s, d = NULL) oilr(x, s, V = NULL) oilrInv(y, s, V = NULL) oclr(x, s) oclrInv(x, s)
oglr(x, s, V) oglrInv(x, s, V) oalr(x, s, d = NULL) oalrInv(y, s, d = NULL) oilr(x, s, V = NULL) oilrInv(y, s, V = NULL) oclr(x, s) oclrInv(x, s)
x |
orthus data array (e.g., first s rows are multinomial parameters or log-ratios) |
s |
first s rows of x are transformed |
V |
transformation matrix (defines transform) |
d |
for ALR, which component (integer position) to take as reference (default is ncol(x)) for alrInv corresponds to column position in untransformed matrix. |
y |
orthus data array (e.g., first s rows are multinomial parameters or log-ratios) |
A matrix
Simulate simple orthus dataset and priors (for testing)
orthus_sim( D = 10, P = 10, N = 30, Q = 2, use_names = TRUE, true_priors = FALSE )
orthus_sim( D = 10, P = 10, N = 30, Q = 2, use_names = TRUE, true_priors = FALSE )
D |
number of multinomial categories |
P |
number of dimensions of second dataset Z |
N |
number of samples |
Q |
number of covariates (first one is an intercept, must be > 1) |
use_names |
should samples, covariates, and categories be named |
true_priors |
should Xi and upsilon be chosen to have mean at true simulated value |
list
sim <- orthus_sim()
sim <- orthus_sim()
Combines them all into a single tibble, see example for formatting and
column headers. Primarily designed to be used by
summary.orthusfit
.
orthus_tidy_samples(m, use_names = FALSE, as_factor = FALSE)
orthus_tidy_samples(m, use_names = FALSE, as_factor = FALSE)
m |
an object of class orthusfit |
use_names |
should dimension indices be replaced by dimension names if provided in data used to fit pibble model. |
as_factor |
if use_names should names be returned as factor? |
tibble
sim <- orthus_sim() fit <- orthus(sim$Y, sim$Z, sim$X) fit_tidy <- orthus_tidy_samples(fit, use_names=TRUE) head(fit_tidy)
sim <- orthus_sim() fit <- orthus(sim$Y, sim$Z, sim$X) fit_tidy <- orthus_tidy_samples(fit, use_names=TRUE) head(fit_tidy)
Create orthusfit object
orthusfit( D, N, Q, P, coord_system, iter = NULL, alr_base = NULL, ilr_base = NULL, Eta = NULL, Lambda = NULL, Sigma = NULL, Sigma_default = NULL, Z = NULL, Y = NULL, X = NULL, upsilon = NULL, Theta = NULL, Xi = NULL, Xi_default = NULL, Gamma = NULL, init = NULL, names_categories = NULL, names_samples = NULL, names_Zdimensions = NULL, names_covariates = NULL )
orthusfit( D, N, Q, P, coord_system, iter = NULL, alr_base = NULL, ilr_base = NULL, Eta = NULL, Lambda = NULL, Sigma = NULL, Sigma_default = NULL, Z = NULL, Y = NULL, X = NULL, upsilon = NULL, Theta = NULL, Xi = NULL, Xi_default = NULL, Gamma = NULL, init = NULL, names_categories = NULL, names_samples = NULL, names_Zdimensions = NULL, names_covariates = NULL )
D |
number of multinomial categories |
N |
number of samples |
Q |
number of covariates |
P |
Dimension of second dataset (e.g., nrows(Z) ) |
coord_system |
coordinate system objects are represented in (options include "alr", "clr", "ilr", and "proportions") |
iter |
number of posterior samples |
alr_base |
integer category used as reference (required if coord_system=="alr") |
ilr_base |
(D x D-1) contrast matrix (required if coord_system=="ilr") |
Eta |
Array of samples of Eta |
Lambda |
Array of samples of Lambda |
Sigma |
Array of samples of Sigma (null if coord_system=="proportions") |
Sigma_default |
Array of samples of Sigma in alr base D, used if coord_system=="proportions" |
Z |
PxN matrix of real valued observations |
Y |
DxN matrix of observed counts |
X |
QxN design matrix |
upsilon |
scalar prior dof of inverse wishart prior |
Theta |
prior mean of Lambda |
Xi |
Matrix of prior covariance for inverse wishart (null if coord_system=="proportions") |
Xi_default |
Matrix of prior covariance for inverse wishart in alr base D (used if coord_system=="proportions") |
Gamma |
QxQ covariance matrix prior for Lambda |
init |
matrix initial guess for Lambda used for optimization |
names_categories |
character vector |
names_samples |
character vector |
names_Zdimensions |
character vector |
names_covariates |
character vector |
object of class orthusfit
Mock communities and calibration samples created for measuring and validating model of PCR bias. This data has been preprocessed as in the original manuscript.
data(pcrbias_mock)
data(pcrbias_mock)
an matrix Y (counts for each community member) and a data.frame metadata
Justin D. Silverman, Rachael J. Bloom, Sharon Jiang, Heather K. Durand, Sayan Mukherjee, Lawrence A. David. (2019) Measuring and Mitigating PCR Bias in Microbiome Data. bioRxiv 604025; doi: https://doi.org/10.1101/604025
This function is largely a more user friendly wrapper around
optimPibbleCollapsed
and
uncollapsePibble
.
See details for model specification.
Notation: N
is number of samples,
D
is number of multinomial categories, Q
is number
of covariates, iter
is the number of samples of eta
(e.g.,
the parameter n_samples
in the function
optimPibbleCollapsed
)
pibble( Y = NULL, X = NULL, upsilon = NULL, Theta = NULL, Gamma = NULL, Xi = NULL, init = NULL, pars = c("Eta", "Lambda", "Sigma"), newdata = NULL, ... ) ## S3 method for class 'pibblefit' refit(m, pars = c("Eta", "Lambda", "Sigma"), ...)
pibble( Y = NULL, X = NULL, upsilon = NULL, Theta = NULL, Gamma = NULL, Xi = NULL, init = NULL, pars = c("Eta", "Lambda", "Sigma"), newdata = NULL, ... ) ## S3 method for class 'pibblefit' refit(m, pars = c("Eta", "Lambda", "Sigma"), ...)
Y |
D x N matrix of counts (if NULL uses priors only) |
X |
Q x N matrix of covariates (design matrix) (if NULL uses priors only, must be present to sample Eta) |
upsilon |
dof for inverse wishart prior (numeric must be > D) (default: D+3) |
Theta |
(D-1) x Q matrix of prior mean for regression parameters (default: matrix(0, D-1, Q)) |
Gamma |
QxQ prior covariance matrix (default: diag(Q)) |
Xi |
(D-1)x(D-1) prior covariance matrix (default: ALR transform of diag(1)*(upsilon-D)/2 - this is essentially iid on "base scale" using Aitchison terminology) |
init |
(D-1) x N initialization for Eta for optimization |
pars |
character vector of posterior parameters to return |
newdata |
Default is |
... |
arguments passed to |
m |
object of class pibblefit |
the full model is given by:
Where is a Q x Q covariance matrix, and
is
ALRInv_D transform.
Default behavior is to use MAP estimate for uncollaping the LTP model if laplace approximation is not preformed.
an object of class pibblefit
JD Silverman K Roche, ZC Holmes, LA David, S Mukherjee. Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes. 2019, arXiv e-prints, arXiv:1903.11695
fido_transforms
provide convenience methods for
transforming the representation of pibblefit objects (e.g., conversion to
proportions, alr, clr, or ilr coordinates.)
access_dims
provides convenience methods for accessing
dimensions of pibblefit object
Generic functions including summary
,
print
,
coef
,
as.list
,
predict
,
name
, and
sample_prior
name_dims
Plotting functions provided by plot
and ppc
(posterior predictive checks)
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X)
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X)
Simulate simple pibble dataset and priors (for testing)
pibble_sim(D = 10, N = 30, Q = 2, use_names = TRUE, true_priors = FALSE)
pibble_sim(D = 10, N = 30, Q = 2, use_names = TRUE, true_priors = FALSE)
D |
number of multinomial categories |
N |
number of samples |
Q |
number of covariates (first one is an intercept, must be > 1) |
use_names |
should samples, covariates, and categories be named |
true_priors |
should Xi and upsilon be chosen to have mean at true simulated value |
list
sim <- pibble_sim()
sim <- pibble_sim()
Combines them all into a single tibble, see example for formatting and
column headers. Primarily designed to be used by
summary.pibblefit
.
pibble_tidy_samples(m, use_names = FALSE, as_factor = FALSE)
pibble_tidy_samples(m, use_names = FALSE, as_factor = FALSE)
m |
an object of class pibblefit |
use_names |
should dimension indices be replaced by dimension names if provided in data used to fit pibble model. |
as_factor |
if use_names should names be returned as factor? |
tibble
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X) fit_tidy <- pibble_tidy_samples(fit, use_names=TRUE) head(fit_tidy)
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X) fit_tidy <- pibble_tidy_samples(fit, use_names=TRUE) head(fit_tidy)
Create pibblefit object
pibblefit( D, N, Q, coord_system, iter = NULL, alr_base = NULL, ilr_base = NULL, Eta = NULL, Lambda = NULL, Sigma = NULL, Sigma_default = NULL, Y = NULL, X = NULL, upsilon = NULL, Theta = NULL, Xi = NULL, Xi_default = NULL, Gamma = NULL, init = NULL, names_categories = NULL, names_samples = NULL, names_covariates = NULL )
pibblefit( D, N, Q, coord_system, iter = NULL, alr_base = NULL, ilr_base = NULL, Eta = NULL, Lambda = NULL, Sigma = NULL, Sigma_default = NULL, Y = NULL, X = NULL, upsilon = NULL, Theta = NULL, Xi = NULL, Xi_default = NULL, Gamma = NULL, init = NULL, names_categories = NULL, names_samples = NULL, names_covariates = NULL )
D |
number of multinomial categories |
N |
number of samples |
Q |
number of covariates |
coord_system |
coordinate system objects are represented in (options include "alr", "clr", "ilr", and "proportions") |
iter |
number of posterior samples |
alr_base |
integer category used as reference (required if coord_system=="alr") |
ilr_base |
(D x D-1) contrast matrix (required if coord_system=="ilr") |
Eta |
Array of samples of Eta |
Lambda |
Array of samples of Lambda |
Sigma |
Array of samples of Sigma (null if coord_system=="proportions") |
Sigma_default |
Array of samples of Sigma in alr base D, used if coord_system=="proportions" |
Y |
DxN matrix of observed counts |
X |
QxN design matrix |
upsilon |
scalar prior dof of inverse wishart prior |
Theta |
prior mean of Lambda |
Xi |
Matrix of prior covariance for inverse wishart (null if coord_system=="proportions") |
Xi_default |
Matrix of prior covariance for inverse wishart in alr base D (used if coord_system=="proportions") |
Gamma |
QxQ covariance matrix prior for Lambda |
init |
matrix initial guess for Lambda used for optimization |
names_categories |
character vector |
names_samples |
character vector |
names_covariates |
character vector |
object of class pibblefit
Plot Summaries of Posterior Distribution of pibblefit Parameters
## S3 method for class 'pibblefit' plot(x, ...)
## S3 method for class 'pibblefit' plot(x, ...)
x |
an object of class pibblefit |
... |
other arguments passed to plot.pibblefit (see details) |
Other arguments:
'par' parameter to plot (options: Lambda, Eta, and Sigma) (default="Lambda")
'focus.cov' vector of covariates to include in plot (plots all if NULL)
'focus.coord' vector of coordinates to include in plot (plots all if NULL)
'focus.sample' vector of samples to include in plot (plots all if NULL)
'use_names' if TRUE, uses dimension names found in data as plot labels rather than using dimension integer indices.
ggplot object
sim <- pibble_sim(N=10, D=4, Q=3) fit <- pibble(sim$Y, sim$X) plot(fit, par="Lambda") plot(fit, par="Sigma")
sim <- pibble_sim(N=10, D=4, Q=3) fit <- pibble(sim$Y, sim$X) plot(fit, par="Lambda") plot(fit, par="Sigma")
Generic method for visualizing posterior predictive checks
ppc(m, ...)
ppc(m, ...)
m |
object |
... |
other arguments passed that control visualization |
A plot
Generic Method to Plot Posterior Predictive Summaries
## S3 method for class 'pibblefit' ppc_summary(m, from_scratch = FALSE, ...) ppc_summary(m, ...)
## S3 method for class 'pibblefit' ppc_summary(m, from_scratch = FALSE, ...) ppc_summary(m, ...)
m |
model object |
from_scratch |
should predictions of Y come from fitted Eta or from predictions of Eta from posterior of Lambda? (default: false) |
... |
other arguments to pass |
vector
Visualization of Posterior Predictive Check of fit model
## S3 method for class 'pibblefit' ppc(m, ...)
## S3 method for class 'pibblefit' ppc(m, ...)
m |
an object of class pibblefit |
... |
other options passed to ppc (see details) |
ppc.pibblefit accepts the following additional arguments:
"type" type of plot (options "lines", "points", "bounds")
"iter" number of samples from posterior predictive distribution to plot (currently must be <= m$iter) if type=="lines" default is 50, if type=="ribbon" default is to use all available iterations.
"from_scratch" should predictions of Y come from fitted Eta or from predictions of Eta from posterior of Lambda? (default: false)
ggplot object
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X) ppc(fit)
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X) ppc(fit)
Predict using basset
## S3 method for class 'bassetfit' predict( object, newdata = NULL, response = "Lambda", size = NULL, use_names = TRUE, summary = FALSE, iter = NULL, from_scratch = FALSE, ... )
## S3 method for class 'bassetfit' predict( object, newdata = NULL, response = "Lambda", size = NULL, use_names = TRUE, summary = FALSE, iter = NULL, from_scratch = FALSE, ... )
object |
An object of class pibblefit |
newdata |
An optional matrix for which to evaluate prediction. |
response |
Options = "Lambda":Mean of regression, "Eta", "Y": counts |
size |
the number of counts per sample if response="Y" (as vector or matrix), default if newdata=NULL and response="Y" is to use colsums of m$Y. Otherwise uses median colsums of object$Y as default. If passed as a matrix should have dimensions ncol(newdata) x iter. |
use_names |
if TRUE apply names to output |
summary |
if TRUE, posterior summary of predictions are returned rather than samples |
iter |
number of iterations to return if NULL uses object$iter |
from_scratch |
should predictions of Y come from fitted Eta or from predictions of Eta from posterior of Lambda? (default: false) |
... |
other arguments passed to summarise_posterior |
currently only implemented for pibblefit objects in coord_system "default" "alr", or "ilr".
(if summary==FALSE) array D x N x iter; (if summary==TRUE) tibble with calculated posterior summaries
Predict response from new data
## S3 method for class 'pibblefit' predict( object, newdata = NULL, response = "LambdaX", size = NULL, use_names = TRUE, summary = FALSE, iter = NULL, from_scratch = FALSE, ... )
## S3 method for class 'pibblefit' predict( object, newdata = NULL, response = "LambdaX", size = NULL, use_names = TRUE, summary = FALSE, iter = NULL, from_scratch = FALSE, ... )
object |
An object of class pibblefit |
newdata |
An optional matrix for which to evaluate predictions. If NULL (default), the original data of the model is used. |
response |
Options = "LambdaX":Mean of regression, "Eta", "Y": counts |
size |
the number of counts per sample if response="Y" (as vector or matrix), default if newdata=NULL and response="Y" is to use colsums of m$Y. Otherwise uses median colsums of m$Y as default. If passed as a matrix should have dimensions ncol(newdata) x iter. |
use_names |
if TRUE apply names to output |
summary |
if TRUE, posterior summary of predictions are returned rather than samples |
iter |
number of iterations to return if NULL uses object$iter |
from_scratch |
should predictions of Y come from fitted Eta or from predictions of Eta from posterior of Lambda? (default: false) |
... |
other arguments passed to summarise_posterior |
currently only implemented for pibblefit objects in coord_system "default" "alr", or "ilr".
(if summary==FALSE) array D x N x iter; (if summary==TRUE) tibble with calculated posterior summaries
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X) predict(fit)[,,1:2] # just show 2 samples
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X) predict(fit)[,,1:2] # just show 2 samples
Print dimensions and coordinate system information for orthusfit object.
## S3 method for class 'orthusfit' print(x, summary = FALSE, ...)
## S3 method for class 'orthusfit' print(x, summary = FALSE, ...)
x |
an object of class orthusfit |
summary |
if true also calculates and prints summary |
... |
other arguments to pass to summary function |
No direct return, prints out summary
summary.orthusfit
summarizes posterior intervals
sim <- orthus_sim() fit <- orthus(sim$Y, sim$Z, sim$X) print(fit)
sim <- orthus_sim() fit <- orthus(sim$Y, sim$Z, sim$X) print(fit)
Print dimensions and coordinate system information for pibblefit object.
## S3 method for class 'pibblefit' print(x, summary = FALSE, ...)
## S3 method for class 'pibblefit' print(x, summary = FALSE, ...)
x |
an object of class pibblefit |
summary |
if true also calculates and prints summary |
... |
other arguments to pass to summary function |
No direct return, prints out summary
summary.pibblefit
summarizes posterior intervals
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X) print(fit)
sim <- pibble_sim() fit <- pibble(sim$Y, sim$X) print(fit)
Generic Method to Calculate R2 for Fitted Model
r2(m, ...) ## S3 method for class 'pibblefit' r2(m, covariates = NULL, ...) ## S3 method for class 'bassetfit' r2(m, covariates = NULL, components = NULL, ...)
r2(m, ...) ## S3 method for class 'pibblefit' r2(m, covariates = NULL, ...) ## S3 method for class 'bassetfit' r2(m, covariates = NULL, components = NULL, ...)
m |
model object |
... |
other arguments to pass |
covariates |
vector of indices for covariates to include in calculation of R2 (default:NULL means include all covariates by default). When non-null, all covariates not specified are set to zero for prediction. |
components |
vector of indices for components of the GP model to include in the calculation of R2, i.e. which elements in the list of Theta/Gamma should be used for calculating R2 (default:NULL means to include all components by default). When non-null, all components not specified are removed for prediction. |
Calculates Posterior over Linear Model R2 as:
where
is defined in terms of trace of variances
Method of calculating R2 is multivariate version of the Bayesian R2 proposed by Gelman, Goodrich, Gabry, and Vehtari, 2019
Calculates Posterior over Basset Model R2 as:
Method of calculating R2 is multivariate version of the Bayesian R2 proposed by Gelman, Goodrich, Gabry, and Vehtari, 2019
vector
Randomly initializes based on ALR transform of counts plus random pseudocounts uniformily distributed between 0 and 1.
random_pibble_init(Y)
random_pibble_init(Y)
Y |
matrix (D x N) of counts |
Notation: N
is number of samples and
D
is number of multinomial categories
(D-1) x N matrix
Y <- matrix(sample(1:100, 100), 10, 10) random_pibble_init(Y)
Y <- matrix(sample(1:100, 100), 10, 10) random_pibble_init(Y)
Generic method for fitting model from passed model fit object
refit(m, ...)
refit(m, ...)
m |
object |
... |
other arguments passed that control fitting |
object of the same class as m
Intended to be called internally by package
req(m, r)
req(m, r)
m |
object |
r |
vector of elements to test for |
throws error if required element is not present
require elements to be non-null in orthusfit or throw error
## S3 method for class 'orthusfit' req(m, r)
## S3 method for class 'orthusfit' req(m, r)
m |
object |
r |
vector of elements to test for |
None, throws an error if NULL
require elements to be non-null in pibblefit or throw error
## S3 method for class 'pibblefit' req(m, r)
## S3 method for class 'pibblefit' req(m, r)
m |
object |
r |
vector of elements to test for |
Nothing, throws an error if NULL
OTU data and metadata for 1,359 samples in a Crohn's disease study
data(RISK_CCFA)
data(RISK_CCFA)
An otu table, sample data table, and taxonomy table.
Study is described here: https://pubmed.ncbi.nlm.nih.gov/24629344/. Data was obtained from https://github.com/twbattaglia/MicrobeDS.
Gevers D, et al. The treatment-naive microbiome in new-onset Crohn's disease. Cell Host Microbe. 2014 Mar 12;15(3):382-392. doi: 10.1016/j.chom.2014.02.005. PMID: 24629344; PMCID: PMC4059512.
OTU data and metadata for 1,359 samples in a Crohn's disease study
data(RISK_CCFA)
data(RISK_CCFA)
A matrix otu table.
Study is described here: https://pubmed.ncbi.nlm.nih.gov/24629344/. Data was obtained from https://github.com/twbattaglia/MicrobeDS.
Gevers D, et al. The treatment-naive microbiome in new-onset Crohn's disease. Cell Host Microbe. 2014 Mar 12;15(3):382-392. doi: 10.1016/j.chom.2014.02.005. PMID: 24629344; PMCID: PMC4059512.
OTU data and metadata for 1,359 samples in a Crohn's disease study
data(RISK_CCFA)
data(RISK_CCFA)
A sample data table.
Study is described here: https://pubmed.ncbi.nlm.nih.gov/24629344/. Data was obtained from https://github.com/twbattaglia/MicrobeDS.
Gevers D, et al. The treatment-naive microbiome in new-onset Crohn's disease. Cell Host Microbe. 2014 Mar 12;15(3):382-392. doi: 10.1016/j.chom.2014.02.005. PMID: 24629344; PMCID: PMC4059512.
OTU data and metadata for 1,359 samples in a Crohn's disease study
data(RISK_CCFA)
data(RISK_CCFA)
A taxonomy table.
Study is described here: https://pubmed.ncbi.nlm.nih.gov/24629344/. Data was obtained from https://github.com/twbattaglia/MicrobeDS.
Gevers D, et al. The treatment-naive microbiome in new-onset Crohn's disease. Cell Host Microbe. 2014 Mar 12;15(3):382-392. doi: 10.1016/j.chom.2014.02.005. PMID: 24629344; PMCID: PMC4059512.
Generic method for sampling from prior distribution of object
sample_prior(m, n_samples = 2000L, ...)
sample_prior(m, n_samples = 2000L, ...)
m |
object |
n_samples |
number of samples to produce |
... |
other arguments to be passed |
object of the same class
Note this can be used to sample from prior and then predict can
be called to get counts or LambdaX (predict.pibblefit
)
## S3 method for class 'pibblefit' sample_prior( m, n_samples = 2000L, pars = c("Eta", "Lambda", "Sigma"), use_names = TRUE, ... )
## S3 method for class 'pibblefit' sample_prior( m, n_samples = 2000L, pars = c("Eta", "Lambda", "Sigma"), use_names = TRUE, ... )
m |
object of class pibblefit |
n_samples |
number of samples to produce |
pars |
parameters to sample |
use_names |
should names be used if available |
... |
currently ignored |
Could be greatly speed up in the future if needed by sampling directly from cholesky form of inverse wishart (currently implemented as header in this library - see MatDist.h).
A pibblefit object
# Sample prior of already fitted pibblefit object sim <- pibble_sim() attach(sim) fit <- pibble(Y, X) head(sample_prior(fit)) # Sample prior as part of model fitting m <- pibblefit(N=as.integer(sim$N), D=as.integer(sim$D), Q=as.integer(sim$Q), iter=2000L, upsilon=upsilon, Xi=Xi, Gamma=Gamma, Theta=Theta, X=X, coord_system="alr", alr_base=D) m <- sample_prior(m) plot(m) # plot prior distribution (defaults to parameter Lambda)
# Sample prior of already fitted pibblefit object sim <- pibble_sim() attach(sim) fit <- pibble(Y, X) head(sample_prior(fit)) # Sample prior as part of model fitting m <- pibblefit(N=as.integer(sim$N), D=as.integer(sim$D), Q=as.integer(sim$Q), iter=2000L, upsilon=upsilon, Xi=Xi, Gamma=Gamma, Theta=Theta, X=X, coord_system="alr", alr_base=D) m <- sample_prior(m) plot(m) # plot prior distribution (defaults to parameter Lambda)
store_coord
stores coordinate information for pibblefit object
and can be reapplied with function reapply_coord
. Some coordinate
systems are not useful for computation and this makes it simple keep
returned object from computations in the same coordinate system as the input.
(Likely most useful inside of a package)
store_coord(m) reapply_coord(m, l)
store_coord(m) reapply_coord(m, l)
m |
object of class pibblefit |
l |
object returned by function |
store_coord
list with important information to identify c
coordinate system of pibblefit object. reapply_coord
pibblefit object
in coordinate system previously stored.
Shortcut for summarize variable with quantiles and mean
summarise_posterior(data, var, ...)
summarise_posterior(data, var, ...)
data |
tidy data frame |
var |
variable name (unquoted) to be summarised |
... |
other expressions to pass to summarise |
Notation: pX
refers to the X
% quantile
data.frame
d <- data.frame("a"=sample(1:10, 50, TRUE), "b"=rnorm(50)) # Summarize posterior for b over grouping of a and also calcuate # minmum of b (in addition to normal statistics returned) d <- dplyr::group_by(d, a) summarise_posterior(d, b, mean.b = mean(b), min=min(b))
d <- data.frame("a"=sample(1:10, 50, TRUE), "b"=rnorm(50)) # Summarize posterior for b over grouping of a and also calcuate # minmum of b (in addition to normal statistics returned) d <- dplyr::group_by(d, a) summarise_posterior(d, b, mean.b = mean(b), min=min(b))
Default calculates median, mean, 50% and 95% credible interval
## S3 method for class 'orthusfit' summary( object, pars = NULL, use_names = TRUE, as_factor = FALSE, gather_prob = FALSE, ... )
## S3 method for class 'orthusfit' summary( object, pars = NULL, use_names = TRUE, as_factor = FALSE, gather_prob = FALSE, ... )
object |
an object of class orthusfit |
pars |
character vector (default: c("Eta", "Lambda", "Sigma")) |
use_names |
should summary replace dimension indices with orthusfit
names if names Y and X were named in call to |
as_factor |
if use_names and as_factor then returns names as factors (useful for maintaining orderings when plotting) |
gather_prob |
if TRUE then prints quantiles in long format rather than wide (useful for some plotting functions) |
... |
other expressions to pass to summarise (using name 'val' unquoted is probably what you want) |
A list
Default calculates median, mean, 50% and 95% credible interval
## S3 method for class 'pibblefit' summary( object, pars = NULL, use_names = TRUE, as_factor = FALSE, gather_prob = FALSE, ... )
## S3 method for class 'pibblefit' summary( object, pars = NULL, use_names = TRUE, as_factor = FALSE, gather_prob = FALSE, ... )
object |
an object of class pibblefit |
pars |
character vector (default: c("Eta", "Lambda", "Sigma")) |
use_names |
should summary replace dimension indices with pibblefit
names if names Y and X were named in call to |
as_factor |
if use_names and as_factor then returns names as factors (useful for maintaining orderings when plotting) |
gather_prob |
if TRUE then prints quantiles in long format rather than wide (useful for some plotting functions) |
... |
other expressions to pass to summarise (using name 'val' unquoted is probably what you want) |
A list
See details for model. Should likely be called following
optimPibbleCollapsed
. Notation: N
is number of samples,
D
is number of multinomial categories, Q
is number
of covariates, iter
is the number of samples of eta
(e.g.,
the parameter n_samples
in the function optimPibbleCollapsed
)
uncollapsePibble( eta, X, Theta, Gamma, Xi, upsilon, seed, ret_mean = FALSE, ncores = -1L )
uncollapsePibble( eta, X, Theta, Gamma, Xi, upsilon, seed, ret_mean = FALSE, ncores = -1L )
eta |
array of dimension (D-1) x N x iter (e.g., |
X |
matrix of covariates of dimension Q x N |
Theta |
matrix of prior mean of dimension (D-1) x Q |
Gamma |
covariance matrix of dimension Q x Q |
Xi |
covariance matrix of dimension (D-1) x (D-1) |
upsilon |
scalar (must be > D) degrees of freedom for InvWishart prior |
seed |
seed to use for random number generation |
ret_mean |
if true then uses posterior mean of Lambda and Sigma corresponding to each sample of eta rather than sampling from posterior of Lambda and Sigma (useful if Laplace approximation is not used (or fails) in optimPibbleCollapsed) |
ncores |
(default:-1) number of cores to use, if ncores==-1 then uses default from OpenMP typically to use all available cores. |
Notation: Let denote the J-th row of a matrix Z.
While the collapsed model is given by:
Where ,
is a (D-1)x(D-1) covariance
matrix,
is a Q x Q covariance matrix, and
is ALRInv_D
transform.
The uncollapsed model (Full pibble model) is given by:
This function provides a means of sampling from the posterior distribution of
Lambda
and Sigma
given posterior samples of Eta
from
the collapsed model.
List with components
Lambda Array of dimension (D-1) x Q x iter (posterior samples)
Sigma Array of dimension (D-1) x (D-1) x iter (posterior samples)
The number of cores used
Timer
JD Silverman K Roche, ZC Holmes, LA David, S Mukherjee. Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes. 2019, arXiv e-prints, arXiv:1903.11695
sim <- pibble_sim() # Fit model for eta fit <- optimPibbleCollapsed(sim$Y, sim$upsilon, sim$Theta%*%sim$X, sim$KInv, sim$AInv, random_pibble_init(sim$Y)) # Finally obtain samples from Lambda and Sigma fit2 <- uncollapsePibble(fit$Samples, sim$X, sim$Theta, sim$Gamma, sim$Xi, sim$upsilon, seed=2849)
sim <- pibble_sim() # Fit model for eta fit <- optimPibbleCollapsed(sim$Y, sim$upsilon, sim$Theta%*%sim$X, sim$KInv, sim$AInv, random_pibble_init(sim$Y)) # Finally obtain samples from Lambda and Sigma fit2 <- uncollapsePibble(fit$Samples, sim$X, sim$Theta, sim$Gamma, sim$Xi, sim$upsilon, seed=2849)
See details for model. Should likely be called following
optimPibbleCollapsed
. Notation: N
is number of samples,
D
is number of multinomial categories, Q
is number
of covariates, iter
is the number of samples of eta
(e.g.,
the parameter n_samples
in the function optimPibbleCollapsed
)
uncollapsePibble_sigmaKnown( eta, X, Theta, Gamma, GammaComb, Xi, sigma, upsilon, seed, ret_mean = FALSE, linear = FALSE, ncores = -1L )
uncollapsePibble_sigmaKnown( eta, X, Theta, Gamma, GammaComb, Xi, sigma, upsilon, seed, ret_mean = FALSE, linear = FALSE, ncores = -1L )
eta |
array of dimension (D-1) x N x iter (e.g., |
X |
matrix of covariates of dimension Q x N |
Theta |
matrix of prior mean of dimension (D-1) x Q |
Gamma |
covariance matrix of dimension Q x Q |
GammaComb |
summed covariance matrix across additive components of dimension Q x Q. |
Xi |
covariance matrix of dimension (D-1) x (D-1) |
sigma |
known covariance matrix of dimension (D-1) x (D-1) x iter |
upsilon |
scalar (must be > D) degrees of freedom for InvWishart prior |
seed |
seed to use for random number generation |
ret_mean |
if true then uses posterior mean of Lambda and Sigma corresponding to each sample of eta rather than sampling from posterior of Lambda and Sigma (useful if Laplace approximation is not used (or fails) in optimPibbleCollapsed) |
linear |
Boolean. Is this for a linear parameter? |
ncores |
(default:-1) number of cores to use, if ncores==-1 then uses default from OpenMP typically to use all available cores. |
Notation: Let denote the J-th row of a matrix Z.
While the collapsed model is given by:
Where ,
is a (D-1)x(D-1) covariance
matrix,
is a Q x Q covariance matrix, and
is ALRInv_D
transform.
The uncollapsed model (Full pibble model) is given by:
This function provides a means of sampling from the posterior distribution of
Lambda
and Sigma
given posterior samples of Eta
from
the collapsed model.
List with components
Lambda Array of dimension (D-1) x Q x iter (posterior samples)
Sigma Array of dimension (D-1) x (D-1) x iter (posterior samples)
The number of cores used
Timer
JD Silverman K Roche, ZC Holmes, LA David, S Mukherjee. Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes. 2019, arXiv e-prints, arXiv:1903.11695
Intended to be called internally by package or object creator
verify(m, ...)
verify(m, ...)
m |
object |
... |
other arguments to be passed to verify |
throws error if verify test fails
Simple verification of passed bassetfit object
## S3 method for class 'bassetfit' verify(m, ...)
## S3 method for class 'bassetfit' verify(m, ...)
m |
an object of class bassetfit |
... |
not used |
throws error if any verification tests fail
Simple verification of passed orthusfit object
## S3 method for class 'orthusfit' verify(m, ...)
## S3 method for class 'orthusfit' verify(m, ...)
m |
an object of class orthusfit |
... |
not used |
throws error if any verification tests fail
Simple verification of passed pibblefit object
## S3 method for class 'pibblefit' verify(m, ...)
## S3 method for class 'pibblefit' verify(m, ...)
m |
an object of class pibblefit |
... |
not used |
throws error if any verification tests fail
Mock communities and calibration samples created for measuring and validating model of PCR bias. This data has been preprocessed as in the original manuscript.
an matrix Y (counts for each community member)
Justin D. Silverman, Rachael J. Bloom, Sharon Jiang, Heather K. Durand, Sayan Mukherjee, Lawrence A. David. (2019) Measuring and Mitigating PCR Bias in Microbiome Data. bioRxiv 604025; doi: https://doi.org/10.1101/604025