Function to create run_parameters list for initializing MegaLMM model

MegaLMM_control(
  which_sampler = list(Y = 1, F = 1),
  run_sampler_times = 1,
  scale_Y = c(T, F),
  K = 20,
  h2_divisions = 100,
  h2_step_size = NULL,
  drop0_tol = 1e-14,
  K_eigen_tol = 1e-10,
  burn = 100,
  thin = 2,
  max_NA_groups = Inf,
  svd_K = TRUE,
  verbose = TRUE,
  save_current_state = TRUE,
  diagonalize_ZtZ_Kinv = TRUE,
  ...
)

Arguments

which_sampler

List with two elements (Y and F) specifying which sampling function to use for the observations (Y) and factors (F). Each is an integer from 1 to 4: 1-3 are block updaters and 4 is a single-site updater. MegaLMM uses 1-3 depending on the data dimensions. MegaBayesC uses 4, which updates each coefficient individually.

run_sampler_times

For which_sampler==4, the single-site sampler can be repeated this many times per iteration to help take larger steps.

scale_Y

Should the Y values be centered and scaled? Recommended, except for simulated data.

K

Number of factors.

h2_divisions

A scalar or a vector with length equal to the number of random effects. In MegaLMM, random effects are parameterized as proportions of the total variance of all random effects plus residuals. The prior on the variance components is discrete: for each variance component proportion, it is constructed over h2_divisions equally spaced values spanning the interval [0,1). If h2_divisions is a scalar, the prior for each variance component has this number of divisions. If a vector, its length should equal the number of variance components, in the order of the random effects specified in the model.
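As a rough illustration of the discrete prior described above, the following base-R sketch builds a grid of candidate variance proportions. It is not MegaLMM's internal code: the names h2_grid_1d and h2_grid are hypothetical, and the filter that keeps only vectors summing to less than 1 is an assumption made so that some variance remains for the residual.

```r
# Hypothetical sketch of a discrete h2 grid (not MegaLMM's internal code).
h2_divisions <- c(4, 4)  # one entry per random effect (kept small for display)

# Equally spaced values spanning [0, 1) for each variance component.
h2_grid_1d <- lapply(h2_divisions, function(n) {
  seq(0, 1, length.out = n + 1)[-(n + 1)]
})

# All combinations across random effects, one row per candidate h2 vector.
h2_grid <- as.matrix(expand.grid(h2_grid_1d))

# Assumption: proportions must leave some variance for the residual,
# so keep only vectors whose sum is below 1.
h2_grid <- h2_grid[rowSums(h2_grid) < 1, , drop = FALSE]

print(h2_grid)
```

The number of candidate vectors grows quickly with the number of random effects, which is one reason a vector-valued h2_divisions with coarser grids for some components can be useful.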

h2_step_size

Either NULL, or a scalar in the range (0,1]. If NULL, h2's will be sampled based on the marginal probability over all possible h2 vectors. If a scalar, a Metropolis-Hastings update step will be used for each h2 vector. The trial value will be selected uniformly from all possible h2 vectors within this Euclidean distance from the current vector.
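The proposal step described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: propose_h2 is a hypothetical helper, the grid is built by hand, the current vector is itself allowed as a candidate, and the Metropolis-Hastings accept/reject step is omitted.

```r
# Hypothetical sketch of the proposal step (not MegaLMM's internal code).
set.seed(1)

# Candidate h2 vectors for two random effects on a grid with spacing 0.1.
vals <- seq(0, 0.9, by = 0.1)
h2_grid <- as.matrix(expand.grid(vals, vals))
# Assumption: keep only vectors that leave some residual variance.
h2_grid <- h2_grid[rowSums(h2_grid) < 1 - 1e-8, , drop = FALSE]

propose_h2 <- function(current, h2_grid, h2_step_size) {
  # Euclidean distance from the current vector to every candidate.
  d <- sqrt(rowSums(sweep(h2_grid, 2, current)^2))
  # Small slack guards against floating-point error at the boundary.
  candidates <- which(d <= h2_step_size + 1e-8)
  # Pick one candidate uniformly at random.
  h2_grid[candidates[sample.int(length(candidates), 1)], ]
}

current <- c(0.3, 0.2)
trial <- propose_h2(current, h2_grid, h2_step_size = 0.2)
print(trial)
```

A smaller h2_step_size restricts proposals to nearby grid points, which trades larger moves for cheaper evaluation of fewer candidates.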

drop0_tol

A scalar giving the tolerance for the drop0() function, which will be applied to various (possibly) sparse symmetric matrices to try to fix numerical errors and increase sparsity.
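To show what such a tolerance does, here is a small example using drop0() from the Matrix package on a toy matrix. The matrix M is made up for illustration; which matrices MegaLMM applies this to internally is not shown here.

```r
library(Matrix)

# A sparse matrix whose off-diagonal entries are numerical noise.
M <- sparseMatrix(i = c(1, 1, 2, 2), j = c(1, 2, 1, 2),
                  x = c(1, 1e-16, 1e-16, 2))

# Stored entries with absolute value at or below the tolerance are dropped.
M_clean <- drop0(M, tol = 1e-14)

n_before <- length(M@x)        # stored entries before cleaning
n_after  <- length(M_clean@x)  # stored entries after cleaning
print(c(before = n_before, after = n_after))
```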

K_eigen_tol

A scalar giving the minimum eigenvalue of a K matrix allowed. During pre-processing, eigenvalues of each K matrix will be calculated using svd(K). Only eigenvectors of K with corresponding eigenvalues greater than this value will be kept. If smaller eigenvalues exist, the model will be transformed to reduce the rank of K, by multiplying Z by the remaining eigenvectors of K. This transformation is undone before posterior samples are recorded, so posterior samples of U_F and U_R are untransformed.
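The rank reduction described above can be illustrated with a small base-R sketch. The matrices K and Z are simulated, and the names Z_reduced and K_reduced are hypothetical. Treating the covariance of the transformed random effect as a diagonal matrix of the retained eigenvalues is an assumption about how the transformed model is set up.

```r
set.seed(1)
K_eigen_tol <- 1e-10

# Rank-deficient K: 5 x 5 but only rank 3.
A <- matrix(rnorm(15), nrow = 5, ncol = 3)
K <- tcrossprod(A)

# Hypothetical 7 x 5 design matrix with some levels observed twice.
Z <- diag(5)[c(1, 1, 2, 3, 4, 5, 5), ]

# Eigenvalues and eigenvectors of the symmetric PSD matrix K via svd().
s <- svd(K)
keep <- s$d > K_eigen_tol

# Reduced-rank model: Z multiplied by the retained eigenvectors,
# with a diagonal covariance holding the retained eigenvalues.
Z_reduced <- Z %*% s$u[, keep, drop = FALSE]
K_reduced <- diag(s$d[keep], nrow = sum(keep))

# The implied covariance Z K Z' is unchanged up to numerical error.
max_diff <- max(abs(Z %*% K %*% t(Z) -
                    Z_reduced %*% K_reduced %*% t(Z_reduced)))
print(sum(keep))
print(max_diff)
```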

burn

Burn-in length of the MCMC chain.

thin

Thinning rate of the MCMC chain.
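As a simple illustration of how burn and thin interact, the sketch below lists which iterations would be retained under one common convention: discard the first burn iterations, then keep every thin-th one. Whether MegaLMM applies exactly this rule is an assumption, and n_iter is a hypothetical total number of iterations.

```r
burn <- 100
thin <- 2
n_iter <- 110  # hypothetical total number of MCMC iterations

iters <- seq_len(n_iter)

# Assumed rule: keep iteration i if it is past the burn-in and the
# number of post-burn-in iterations is a multiple of thin.
kept <- iters[iters > burn & (iters - burn) %% thin == 0]
print(kept)
```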

max_NA_groups

If 0, all NAs will be imputed during sampling. If Inf, all NAs will be marginalized over. If in (0,Inf), up to this many groups of columns will be separately sampled. The minimum number of NAs in each column not in one of these groups will be imputed.

svd_K

If TRUE, the diagonalization of ZKZt for the first random effect is accomplished using this algorithm: https://math.stackexchange.com/questions/67231/singular-value-decomposition-of-product-of-matrices which doesn't require forming ZKZt. If FALSE, the SVD of ZKZt for the first random effect is calculated directly. TRUE is generally faster if the same genomes are repeated several times.
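To give a feel for why ZKZt need not be formed, the following base-R sketch uses a Cholesky factor of K. This is a simpler, related technique chosen for illustration and is not necessarily the linked algorithm or the one MegaLMM uses. The data are simulated, with each of three genomes observed twice.

```r
set.seed(1)

# Hypothetical example: 3 genomes, each observed twice.
Z <- diag(3)[rep(1:3, each = 2), ]   # 6 x 3 incidence matrix
A <- matrix(rnorm(9), 3, 3)
K <- crossprod(A) + diag(3)          # 3 x 3 positive-definite matrix

# Factor K = R'R, so that Z K Z' = (Z R')(Z R')'.
R <- chol(K)
B <- Z %*% t(R)                      # 6 x 3; the 6 x 6 product is never formed

# Left singular vectors of B are eigenvectors of Z K Z', and the
# squared singular values are its non-zero eigenvalues.
s <- svd(B)
eig_from_svd <- s$d^2

# Direct computation on the full product, for comparison only.
eig_direct <- eigen(Z %*% K %*% t(Z), symmetric = TRUE)$values[1:3]

print(all.equal(eig_from_svd, eig_direct))
```

The saving comes from working with a matrix whose second dimension is the number of distinct genomes rather than the number of observations, which matters most when genomes are heavily replicated.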

verbose

Should progress during initialization and sampling be printed?

save_current_state

Should the current state of the sampler be saved every time the function sample_MegaLMM is called?

See also