Sets up the MegaLMM model, selects starting values, and pre-calculates matrices for the Gibbs sampler.

setup_model_MegaLMM(
  Y,
  formula,
  extra_regressions = NULL,
  data,
  relmat = NULL,
  cis_genotypes = NULL,
  Lambda_fixed = NULL,
  run_parameters = MegaLMM_control(),
  posteriorSample_params = c("Lambda", "U_F", "F", "delta", "tot_F_prec", "F_h2",
    "tot_Eta_prec", "resid_h2", "B1", "B2_F", "B2_R", "U_R", "cis_effects",
    "Lambda_m_eff", "Lambda_pi", "B2_R_pi", "B2_F_pi"),
  posteriorMean_params = c(),
  posteriorFunctions = list(),
  run_ID = "MegaLMM_run"
)

Arguments

Y

either a) a n x p matrix of data (n individuals x p traits), or b) a list describing the observation_model, data, and associated parameters. This list should contain: i) observation_model: a function modeled after missing_observation_model (see code by typing this into the console) that draws posterior samples of Eta conditional on the observations (Y) and the current state of the MegaLMM model (current_state). The function should have the same form as missing_data, and must return Eta even if current_state is NULL. ii) observations: a data.frame containing the observaition-level data and associated covariates. This must include a column ID that is also present in data iii) any other parameters necessary for observation_model

formula

RHS of a model. The syntax is similar to lmer. Random effects are specified by (1+factor | group), with the left side of the '|' a design matrix, and the right side a random effect factor (group). For each random effect factor, a covariance matrix (K_mats) or precision matrix (K_inv_mats) can be provided. Unlike in lmer, each variable or covariate in the design matrix is given an independent random effect (ie no covariance among random effects is modeled), so two bars '||' gives an identical model to one bar. Note: the speed of the model will decrease dramatically with the number of random effects (multiplied by h2_divisions). Fixed effects only apply to the model residuals (ie X1) below, and are not regularized (prior precision = 0)

extra_regressions

Optional. A list including either: i) the matrix X (n x b) of regression coeffients, or ii) two matrices U (n x m) and V (m x b) such that X = U*V also, logical variables resids and fixed specify whether these coefficients apply to either or both of the model residuals or the factors

data

data.frame with n rows containing columns corresponding to the fixed and random effects

relmat

Optional. A list of covariance matrices for random effects. If none provided for any of the random effects, K is assumed to be the identity.

cis_genotypes

Optional. A list of n x ci matrices of length p giving cis-effect coefficients for each trait

Lambda_fixed

Optional. A matrix of the first k rows of Lambda that are fixed

run_parameters

See MegaLMM_control

posteriorSample_params

A character vector giving names of parameters to save all posterior samples

posteriorMean_params

A character vector giving names of parameters to save only the posterior mean.

posteriorFunctions

A list of named and quoted functions that that will calculate statistics based on the current_state. The functions can use variables in current_state, data_matrices, priors, or in the user's environment.

run_ID

A unique identifier for this model. The code will create a folder with this name to hold all posterior samples and diagnostic information during the run.

Value

An object of class MegaLMM_state with components:

  • current_state: a list of parameters in the current iteration of the sampler. Initially empty

  • Posterior: a list of arrays of posterior samples. Initially empty

  • RNG: current state of R's Random number generator (for re-starting chaings)

  • traitnames: vector of trait names (from colnames of Y)

  • run_parameters, run_variables, data_matrices, priors: input data and parameters

Details

The first step in fitting a MegaLMM model. This function sets up the model matrices based on the fixed and random effect formulas provided. This function must be followed by calls to set_priors_MegaLMM, initialize_variables_MegaLMM and initialize_MegaLMM, before the Gibbs sampler can be run with sample_MegaLMM.

The model is specified as:

y_i = g(eta_i)

Eta = rbind(eta_1,...,eta_n) = X1*B1 + X2_R*B2_R + F*Lambda + Z*U_R + E_R

F = X2_F * B2_F + Z*U_F + E_F

For sampling, we reparameterize as:

Qt*Eta = Qt*X*B + Qt*F*Lambda + Qt*ZL*U_R + Qt*E_R

Qt*F = Qt*X_F * B_F + Qt*ZL*U_F + Qt*E_F

where LTL = K and ZL = Z*L

We sample the quantities Qt*Eta, Qt*F, B, Lambda, U_R, U_F. We then back-calculate Eta and F.

Note: In the original MegaLMM paper, we set y_i = eta_i, so replaced Eta with Y above.