- sim_pedigree optimized for sparse case: internal C++ code now identifies all close relatives in a single scan of the sparse structure, which is much faster than the repeated queries performed before.
- kinship_fam runtime optimized for sparse case: internal C++ code has a new structure for extracting rows considerably faster (at the expense of somewhat more memory).
- kinship_fam and sim_pedigree added option sparse to handle sparse kinship matrices, which reduce memory usage when applicable, though the sparse algorithms are currently slower otherwise.
- pop_recomb and geno_last_gen_admix_recomb further optimized based on code profiling and smarter coding.
- pop_recomb and geno_last_gen_admix_recomb added option indexes_chr_ends to provide precalculated chromosome end indexes, which can improve runtime when these functions are called in a loop.
- Added function indexes_chr to precalculate chromosome end indexes, to provide to the above functions if desired.
- pop_recomb option indexes_loci now must be a range (start and end indexes) rather than take on arbitrary values.
- Added function geno_last_gen_admix_recomb that efficiently simulates admixed families with LD in the ancestors.
  - Added package dependencies rlang, Matrix, and methods, which are used by the new function.
- recomb_haplo_inds now accepts Matrix-package objects (including sparse matrices) as the input haplotype matrix.
- recomb_admix_inds now accepts inputs with a single ancestry (which may happen when used through geno_last_gen_admix_recomb). Previously it required at least two ancestries.
- pop_recomb added option indexes_loci to simulate only a portion of the available genome.
- Added functions tidy_recomb_map_inds and recomb_founder_blocks_inherited to create and manipulate tidy versions of our recombination data structure, which are more useful in some cases. They are currently focused on identifying founder blocks that are inherited by focal individuals.
  - Added dplyr and tidyselect as new package dependencies, which are used by both of these new functions that manipulate tidy tables.
- pop_recomb added support for option haps to be a BEDMatrix object, and added option loci_on_cols to accept a transposed haps input.
- Added function pop_recomb to simulate genotypes with linkage disequilibrium (LD) given a population of haplotypes, using a Li-Stephens-like model of haplotype copying.
- Added function bim_add_posg to calculate genetic positions from base pair positions and a genetic map.
- README.md page
- recomb_map_hg.
- cran-comments.md
- Added function recomb_admix_inds to produce true population ancestry dosage matrices that parallel genotype matrices, useful for regression models that incorporate local ancestry.
- Added function fam_ancestors to construct simple ancestor pedigrees for a single person with a desired number of generations and automatic names.
- Added function recomb_last_gen, a wrapper around recomb_fam that processes data in discrete generations and returns the recombination breaks/blocks of the final generation only, to reduce memory usage.
  - Analogous to the previous *_last_gen functions and their corresponding *_fam versions.
- Added function recomb_haplo_inds to construct the haplotypes of descendant individuals given the haplotypes of the ancestors.
- Added function recomb_geno_inds to construct a standard genotype matrix from the haplotypes of individuals (a complex nested list structure).
- recomb_fam and recomb_init_founders slight change in input and output formats: each chromosome list now has column posg indicating the end of each recombination block in genetic position. The column used to be called end; it was renamed to match the notation in the recombination map, where pos is position in base pairs and posg is in genetic distance.
- recomb_init_founders argument lengs may now be a recombination map for simplicity, from which the desired chromosome lengths are extracted, rather than having to extract them in a separate step.
- Added function recomb_map_inds to map recombination breaks from genetic positions to base pair coordinates.
- Added function recomb_map_fix_ends_chr to shift and extrapolate a genetic map to chromosome ends.
- Added function recomb_map_simplify_chr to simplify genetic maps by removing rows that can be interpolated to within a desired error.
- Added recomb_map_hg38 and recomb_map_hg37, which were created from existing maps processed by the above two functions.
- First set of updates for simulating with recombination!
- Added functions recomb_fam and recomb_init_founders for simulating recombination breaks for a pedigree!
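
The genetic-map entries above (bim_add_posg, recomb_map_inds) describe converting between base pair positions (pos) and genetic positions (posg). The following is a minimal base-R sketch of that idea using linear interpolation. It is not the package's implementation: the toy map values, the centiMorgan unit, and the helper names pos_to_posg and posg_to_pos are all assumptions made for illustration.

```r
# Toy genetic map for one chromosome (values are made up for illustration).
# pos = position in base pairs; posg = genetic position (assumed here to be in cM).
map_chr <- data.frame(
    pos  = c(1, 1000000, 2000000, 3000000),
    posg = c(0, 1.0, 3.0, 3.5)
)

# Hypothetical helper: base pairs -> genetic position by linear interpolation.
# rule = 2 clamps queries outside the map to the value at the nearest end.
pos_to_posg <- function(pos, map) {
    approx(x = map$pos, y = map$posg, xout = pos, rule = 2)$y
}

# Hypothetical helper: genetic position -> base pairs (the reverse mapping).
# Requires posg to be strictly increasing, as it is in this toy map.
posg_to_pos <- function(posg, map) {
    round(approx(x = map$posg, y = map$pos, xout = posg, rule = 2)$y)
}

# Map two base pair positions to genetic positions, then back again.
posg_query <- pos_to_posg(c(500000, 1500000), map_chr)
pos_back <- posg_to_pos(posg_query, map_chr)
```

A real map may contain flat stretches or need extrapolation at chromosome ends, which this sketch does not handle beyond clamping.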
- sim_pedigree error is now ~100x less likely than before. The error arises when not everybody can be paired; previously this unlikely error, expected 0.4% of the time, actually occurred on CRAN, and it is now expected 0.002% of the time.
- build_vignettes = TRUE instead of build_opts = c() (which did not build vignettes anymore).
- draw_couples_nearest removed unnecessary checks (redundant with unit tests).
- cran-comments.md.
- par() in vignette examples.
- Removed draw_ prefix from genotype functions:
  - draw_geno_fam -> geno_fam
  - draw_geno_last_gen -> geno_last_gen
- Added function admix_last_gen. Same deal as the previous *_last_gen functions (less coding in practice, memory savings).
- drop = FALSE in some necessary cases.
- Added function kinship_last_gen for calculating kinship for the last generation only, of a pedigree with non-overlapping generations, saving lots of memory when the number of generations is large (behavior resembles the original function, though internally it is a wrapper around the more general kinship_fam).
- draw_geno_fam (for the other *_fam functions, which are respectively sources for *_last_gen functions).
- Reimplemented draw_geno_fam in C++ (using Rcpp).
  - New version is much faster and uses about half as much memory as the previous pure-R version!
- Added function draw_geno_last_gen for drawing genotypes for the last generation only, of a pedigree with non-overlapping generations, saving lots of memory when the number of generations is large (behavior resembles the original function, though internally it is a wrapper around the more general draw_geno_fam).
- sim_pedigree now returns ids (list of IDs separated by generation) among its list elements, after fam but before kinship_local.
- sim_pedigree removed verbose option (it was a holdout from the original code, which could get stuck in some situations; the new code does not get stuck).
- simfam-package).
- sim_pedigree now returns parents of founders as NA (used to be 0).
- NA parents correctly as missing (i.e., those individuals with missing parents are treated as founders), and by default the empty string ('') and zero (0) are also treated as missing (previously only 0 was treated as missing).
- sim_pedigree now assigns IDs without g prefix (format is just \d+-\d+ with two integers denoting generation and index, separated by a dash).
- sim_pedigree, kinship_fam, admix_fam, draw_geno_fam, draw_sex, prune_fam.
- sim_pedigree: made n the first and only mandatory argument (used to be second); G is now second (used to be first) and defaults to G = length(n).
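
The missing-parent convention and ID format noted above for sim_pedigree can be illustrated with a small base-R sketch. It does not call the package: the toy table, its column names (id, pat, mat), the helper is_missing_parent, and the rule that a founder has both parents missing are assumptions made for illustration.

```r
# Toy pedigree table; the column names id/pat/mat are an assumption of this sketch.
# IDs follow the generation-index format described above (e.g. "2-1").
fam <- data.frame(
    id  = c("1-1", "1-2", "1-3", "2-1", "2-2"),
    pat = c(NA, "0", "", "1-1", "1-1"),
    mat = c(NA, "0", "", "1-2", "1-3"),
    stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a parent ID counts as missing.
# NA is always missing; '' and 0 are missing by default, as described above.
is_missing_parent <- function(x, missing_vals = c("", 0)) {
    is.na(x) | x %in% missing_vals
}

# Assumption of this sketch: a founder is an individual with both parents missing.
is_founder <- is_missing_parent(fam$pat) & is_missing_parent(fam$mat)
founders <- fam$id[is_founder]

# Check that every ID matches the \d+-\d+ format (generation, dash, index).
ids_ok <- grepl("^\\d+-\\d+$", fam$id, perl = TRUE)
```

Passing a different missing_vals vector to the helper changes which placeholder values are treated as missing; NA is treated as missing regardless.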