setup_model_MegaLMM.Rd
Sets up the MegaLMM model, selects starting values, and pre-calculates matrices for the Gibbs sampler.
setup_model_MegaLMM(
Y,
formula,
extra_regressions = NULL,
data,
relmat = NULL,
cis_genotypes = NULL,
Lambda_fixed = NULL,
run_parameters = MegaLMM_control(),
posteriorSample_params = c("Lambda", "U_F", "F", "delta", "tot_F_prec", "F_h2",
"tot_Eta_prec", "resid_h2", "B1", "B2_F", "B2_R", "U_R", "cis_effects",
"Lambda_m_eff", "Lambda_pi", "B2_R_pi", "B2_F_pi"),
posteriorMean_params = c(),
posteriorFunctions = list(),
run_ID = "MegaLMM_run"
)
either a) a n x p matrix of data (n individuals x p traits), or b) a list describing
the observation_model, data, and associated parameters. This list should contain:
i) observation_model
: a function modeled after missing_observation_model
(see code by typing this into the console) that draws posterior samples of Eta conditional on
the observations (Y) and the current state of the MegaLMM model (current_state).
The function should have the same form as missing_data,
and must return Eta even if current_state is NULL.
ii) observations
: a data.frame containing the observaition-level data and associated covariates.
This must include a column ID
that is also present in data
iii) any other parameters necessary for observation_model
RHS of a model. The syntax is similar to lmer.
Random effects are specified by (1+factor | group), with the left side of the '|' a design
matrix, and the right side a random effect factor (group). For each random effect factor, a
covariance matrix (K_mats
) or precision matrix (K_inv_mats
) can be provided.
Unlike in lmer
, each variable or covariate in the design matrix is given an
independent random effect (ie no covariance among random effects is modeled), so two bars '||'
gives an identical model to one bar.
Note: the speed of the model will decrease dramatically with the number of random effects (multiplied
by h2_divisions).
Fixed effects only apply to the model residuals (ie X1) below, and are not regularized (prior precision = 0)
Optional. A list including either:
i) the matrix X (n x b) of regression coeffients, or
ii) two matrices U (n x m) and V (m x b) such that X = U*V
also, logical variables resids
and fixed
specify whether these coefficients apply to either or
both of the model residuals or the factors
data.frame with n rows containing columns corresponding to the fixed and random effects
Optional. A list of covariance matrices for random effects. If none provided for any of the random effects, K is assumed to be the identity.
Optional. A list of n x ci matrices of length p giving cis-effect coefficients for each trait
Optional. A matrix of the first k rows of Lambda that are fixed
See MegaLMM_control
A character vector giving names of parameters to save all posterior samples
A character vector giving names of parameters to save only the posterior mean.
A list of named and quoted functions that that will calculate statistics based on the current_state.
The functions can use variables in current_state
, data_matrices
, priors
, or in the user's environment.
A unique identifier for this model. The code will create a folder with this name to hold all posterior samples and diagnostic information during the run.
An object of class MegaLMM_state with components:
current_state: a list of parameters in the current iteration of the sampler. Initially empty
Posterior: a list of arrays of posterior samples. Initially empty
RNG: current state of R's Random number generator (for re-starting chaings)
traitnames: vector of trait names (from colnames of Y)
run_parameters, run_variables, data_matrices, priors: input data and parameters
The first step in fitting a MegaLMM model. This function sets up the model matrices based on the fixed and random effect formulas provided. This function must be followed by calls to set_priors_MegaLMM, initialize_variables_MegaLMM and initialize_MegaLMM, before the Gibbs sampler can be run with sample_MegaLMM.
The model is specified as:
y_i = g(eta_i)
Eta = rbind(eta_1,...,eta_n) = X1*B1 + X2_R*B2_R + F*Lambda + Z*U_R + E_R
F = X2_F * B2_F + Z*U_F + E_F
For sampling, we reparameterize as:
Qt*Eta = Qt*X*B + Qt*F*Lambda + Qt*ZL*U_R + Qt*E_R
Qt*F = Qt*X_F * B_F + Qt*ZL*U_F + Qt*E_F
where LTL = K and ZL = Z*L
We sample the quantities Qt*Eta, Qt*F, B, Lambda, U_R, U_F. We then back-calculate Eta and F.
Note: In the original MegaLMM paper, we set y_i = eta_i
, so replaced Eta
with Y
above.