simtrait: Simulate Complex Traits from Genotypes
Introduction
Sample usage
Load libraries required for this vignette
Simulate an admixed population
Simulate a trait from random coefficients (RC) model
Compare sample covariance of trait to theoretical expectation
Simulated trait without ancestral allele frequencies
Simulated trait from infinitesimal model
Simulated trait from fixed effect sizes (FES) model
Environment group effects

Model

We assume the linear polygenic model for a quantitative trait:
$$\mathbf{y} = \dots$$

Algorithm
Constructing environment and residual effects
Constructing coefficients
Scaling coefficients
Scaling using known ancestral allele frequencies
Scaling using a known kinship matrix
Centering the trait

Centering using known ancestral allele frequencies

This is the preferred approach as it is the only case that guarantees success. Given our model, we obtain the desired overall trait mean $\mu$ by choosing the intercept to be
$$\alpha = \dots$$

Centering without ancestral allele frequencies

How NOT to center the trait vector

Now let's discuss why the obvious way of centering the trait without known ancestral allele frequencies doesn't work. Why not use the sample allele frequencies as
$$\alpha = \mu - 2 \mathbf{\hat{p}}' \boldsymbol{\beta} \quad ?$$
Centering the trait this way is equivalent to centering genotypes at each locus:
$$\mathbf{y} = \mathbf{1}_n \alpha + \sum_{i=1}^m (\mathbf{x}_i - 2 \hat{p}_i \mathbf{1}_n) \beta_i + \boldsymbol{\epsilon}.$$
However, this operation introduces a distortion in the covariance of the genotypes [@Ochoa083923]:
$$\Cov \left( \mathbf{x}_i - 2 \hat{p}_i \mathbf{1}_n \right) = \dots$$

References