MegaLMM_control.Rd
Create the run_parameters list used to initialize a MegaLMM model
MegaLMM_control(
which_sampler = list(Y = 1, F = 1),
run_sampler_times = 1,
scale_Y = c(T, F),
K = 20,
h2_divisions = 100,
h2_step_size = NULL,
drop0_tol = 1e-14,
K_eigen_tol = 1e-10,
burn = 100,
thin = 2,
max_NA_groups = Inf,
svd_K = TRUE,
verbose = TRUE,
save_current_state = TRUE,
diagonalize_ZtZ_Kinv = TRUE,
...
)
List with two elements (Y and F) specifying which sampling function to use for the observations (Y) and factors (F). Each is an integer from 1 to 4. Options 1-3 are block updaters; option 4 is a single-site updater. MegaLMM uses 1-3 depending on data dimensions. MegaBayesC uses 4, which updates each coefficient individually.
For which_sampler==4, we can repeat the single-site sampler multiple times to help take larger steps each iteration.
Should the Y values be centered and scaled? Recommended, except for simulated data.
number of factors
A scalar or vector of length equal to the number of random effects. In MegaLMM, random
effects are parameterized as proportions of the total variance of all random effects plus residuals.
The prior on each variance component proportion is discrete, constructed from h2_divisions
equally spaced values spanning the interval [0,1). If h2_divisions is a scalar, the prior for each
variance component has this number of divisions. If a vector, its length should equal the number
of variance components, in the order of the random effects specified in the model.
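As a rough illustration of the discrete prior described above, the following sketch builds a grid of equally spaced values on [0,1) for a single variance component. The exact grid points MegaLMM uses internally are not stated here, so the spacing rule (0, 1/n, ..., (n-1)/n) and the helper name make_h2_grid are assumptions made for this example.

```r
# Hypothetical helper: n_div equally spaced values on [0, 1).
# The spacing rule is an assumption, not taken from the MegaLMM source.
make_h2_grid <- function(n_div) {
  seq(0, 1, length.out = n_div + 1)[seq_len(n_div)]
}

h2_grid <- make_h2_grid(100)
length(h2_grid)  # number of candidate values for this variance component
range(h2_grid)   # smallest and largest candidate; the upper end stays below 1
```

A larger h2_divisions gives a finer grid at the cost of more candidate values to evaluate.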
Either NULL, or a scalar in the range (0,1]. If NULL, h2's will be sampled based on the marginal probability over all possible h2 vectors. If a scalar, a Metropolis-Hastings update step will be used for each h2 vector. The trial value will be selected uniformly from all possible h2 vectors within this Euclidean distance of the current vector.
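The proposal rule described above can be sketched as follows. This is only an illustration of choosing a trial vector uniformly within a Euclidean radius, not the MegaLMM implementation: the function name propose_h2, the row-per-vector grid layout, and the restriction to vectors summing to less than 1 are all assumptions, and the Metropolis-Hastings accept/reject step is omitted.

```r
# Hypothetical sketch of the proposal step; not the MegaLMM implementation.
# h2_grid: matrix with one candidate h2 vector per row.
propose_h2 <- function(h2_grid, current_index, step_size) {
  current <- h2_grid[current_index, ]
  # Euclidean distance from the current vector to every candidate
  dists <- sqrt(rowSums(sweep(h2_grid, 2, current)^2))
  candidates <- which(dists <= step_size)
  # Uniform draw among candidates (the current vector is included: distance 0)
  candidates[sample.int(length(candidates), 1)]
}

# Two random effects, 10 divisions each.
# Assumption: only vectors whose proportions sum to less than 1 are valid.
grid <- as.matrix(expand.grid(h2_1 = (0:9) / 10, h2_2 = (0:9) / 10))
grid <- grid[rowSums(grid) < 1, , drop = FALSE]

set.seed(1)
new_index <- propose_h2(grid, current_index = 1, step_size = 0.2)
grid[new_index, ]
```

A smaller step size restricts the trial values to near neighbours of the current vector, whereas NULL (per the description above) considers every h2 vector at once.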
A scalar giving the tolerance for the drop0() function that will be applied
to various symmetric (possibly sparse) matrices to try to fix numerical errors and increase sparsity.
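To show what such a tolerance does, here is a small example using drop0() from the Matrix package on a toy matrix. The matrix and its values are made up for illustration; which matrices MegaLMM applies this to is not shown here.

```r
library(Matrix)

# A symmetric sparse matrix with an off-diagonal entry that is numerical noise
M <- Matrix(c(1,     1e-16, 0,
              1e-16, 2,     0,
              0,     0,     3), nrow = 3, sparse = TRUE)

# Entries whose magnitude falls within the tolerance are treated as zero
# and removed from the sparse structure
M_clean <- drop0(M, tol = 1e-14)

sum(as.matrix(M) != 0)        # nonzero entries before cleaning
sum(as.matrix(M_clean) != 0)  # nonzero entries after cleaning
```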
A scalar giving the minimum eigenvalue of a K matrix allowed. During pre-processing,
eigenvalues of each K matrix will be calculated using svd(K). Only eigenvectors of K with
corresponding eigenvalues greater than this value will be kept. If smaller eigenvalues exist,
the model will be transformed to reduce the rank of K, by multiplying Z by the remaining
eigenvectors of K. This transformation is undone before posterior samples are recorded,
so posterior samples of U_F and U_R are untransformed.
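The rank-reduction step described above can be sketched in base R. The variable names (Z_star, K_star) and the toy data are assumptions for illustration, and the sketch covers only the forward transformation, not the back-transformation of posterior samples.

```r
set.seed(1)
n <- 8; q <- 5
Z <- matrix(rnorm(n * q), n, q)

# Build a rank-deficient K (rank 3) so that some eigenvalues are close to 0
A <- matrix(rnorm(q * 3), q, 3)
K <- tcrossprod(A)

K_eigen_tol <- 1e-10
s <- svd(K)                        # for symmetric PSD K, s$d holds the eigenvalues
keep <- s$d > K_eigen_tol          # eigenvalues above the tolerance
U_k <- s$u[, keep, drop = FALSE]   # corresponding eigenvectors
Z_star <- Z %*% U_k                # transformed design matrix
K_star <- diag(s$d[keep], nrow = sum(keep))  # reduced-rank K is diagonal

# The covariance implied by the random effect is (nearly) unchanged
max(abs(Z %*% K %*% t(Z) - Z_star %*% K_star %*% t(Z_star)))
```

Because the dropped eigenvalues are numerically zero, Z K Z' and Z_star K_star Z_star' agree up to rounding error while the random effect has fewer columns.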
Burn-in length of the MCMC chain
thinning rate of the MCMC chain
If 0, all NAs will be imputed during sampling. If Inf, all NAs will be marginalized over. If in (0,Inf), up to this many groups of columns will be separately sampled. The minimum number of NAs in each column not in one of these groups will be imputed.
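One way to picture the groups of columns mentioned above is to group traits that share the same missingness pattern. The sketch below does this on a toy matrix; treating an identical NA pattern as the grouping rule is an assumption for illustration, and how MegaLMM merges patterns when there are more of them than max_NA_groups is not shown.

```r
# Toy trait matrix with 4 observations (rows) and 4 traits (columns).
# matrix() fills by column, so each line below is one trait.
Y <- matrix(c(1, NA, 3,  4,
              2, NA, 1,  5,
              1,  2, 3, NA,
              4,  5, 6, NA), nrow = 4)

# One string per column encoding which rows are missing
na_pattern <- apply(is.na(Y), 2, function(x) paste(as.integer(x), collapse = ""))

# Columns sharing a pattern form one group
groups <- split(seq_len(ncol(Y)), na_pattern)
length(groups)  # number of distinct missingness patterns
groups
```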
If TRUE, the diagonalization of ZKZt for the first random effect is accomplished using this algorithm: https://math.stackexchange.com/questions/67231/singular-value-decomposition-of-product-of-matrices which doesn't require forming ZKZt. If FALSE, the SVD of ZKZt for the first random effect is calculated directly. TRUE is generally faster if the same genomes are repeated several times.
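To illustrate why ZKZt never needs to be formed, the sketch below uses a Cholesky factor of K. This is a simple stand-in technique chosen for the example and is not necessarily the algorithm at the linked page or the one MegaLMM uses; the toy Z and K are also made up.

```r
set.seed(1)
n <- 12; q <- 4                      # 12 observations of 4 genomes
Z <- matrix(0, n, q)
Z[cbind(seq_len(n), rep(seq_len(q), each = 3))] <- 1  # each genome repeated 3 times
A <- matrix(rnorm(q * q), q, q)
K <- crossprod(A) + diag(q)          # positive-definite K

L <- t(chol(K))                      # K = L %*% t(L)
s <- svd(Z %*% L)                    # SVD of an n x q matrix, never n x n
eig_from_factor <- s$d^2             # nonzero eigenvalues of ZKZt

# Direct route for comparison: form the n x n matrix and decompose it
ZKZt <- Z %*% K %*% t(Z)
eig_direct <- eigen(ZKZt, symmetric = TRUE)$values[seq_len(q)]
all.equal(eig_from_factor, eig_direct)
```

When n is much larger than q, as when the same genomes are observed repeatedly, the factored route works with far smaller matrices than the direct one.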
Should progress during initialization and sampling be printed?
Should the current state of the sampler be saved every time the function sample_MegaLMM is called?
MegaLMM_init, sample_MegaLMM, print.MegaLMM_state
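A minimal usage sketch follows. The argument names come from the usage section above; the values are illustrative only, not recommendations. The call is guarded so the snippet still runs where the MegaLMM package is not installed.

```r
# Arguments named in the usage section; values are illustrative only
control_args <- list(
  K = 20,             # number of factors
  h2_divisions = 20,  # grid size for each variance component proportion
  burn = 0,
  thin = 2,
  scale_Y = TRUE      # center and scale Y
)

# Guarded so the snippet still runs where MegaLMM is not installed
if (requireNamespace("MegaLMM", quietly = TRUE)) {
  run_parameters <- do.call(MegaLMM::MegaLMM_control, control_args)
  str(run_parameters, max.level = 1)
}
```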