count_lines, het_reencode_bed, and probably all or most read_* functions), the shared internal code now works as intended when the file without extension matches a directory name, so that the directory is ignored and the extension is added as expected. Before, in this scenario the file was treated as if it existed as-is, even though it is a directory, so no extension was added, and then there would be a fatal attempt to read the directory. For count_lines, for example, the error is Error: basic_filebuf::underflow error reading the file: Is a directory since it involves C++ code, but pure-R functions may have different but related error messages.
Rcpp code to use standard C++ library, avoiding various pitfalls and making code shorter and more readable!
symlink to create symbolic links (ought to work across operating systems).
R.utils for functions required by symlink.
het_reencode_bed added option make_bim_fam to create symbolic links to bim and fam files automatically.
het_reencode_bed that reencodes a BED file to have heterozygote indicators, which is useful to scan for dominance effects.
write_bed_cpp corrected variable names (had n_ind and m_loci reversed; no bug resulted from previous mixup) and one word in an error message.
n_ind > m_loci) to unit tests, confirmed that existing code works in that case too.
sim_and_write_plink that facilitates simulating and writing data in small chunks to save memory.
sprintf usage (see below).
sprintf, all of which then went to stop, with direct calls to stop.
* checking compiled code ... WARNING
File ‘genio/libs/genio.so’:
Found ‘sprintf’, possibly from ‘sprintf’ (C)
Objects: ‘read_bed_cpp.o’, ‘write_bed_cpp.o’
Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs nor [v]sprintf.
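As a rough illustration of the sprintf removal described above, the sketch below builds an error message with std::string concatenation and raises it directly, so no fixed-size character buffer is involved. The helper names open_error_message and fail_open are hypothetical, and std::runtime_error stands in for Rcpp's stop so the snippet compiles without R; the message text is modeled on the write_bed_cpp error quoted elsewhere in this changelog.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: assembles the message with std::string concatenation,
// so there is no fixed-size char buffer and no call to sprintf.
std::string open_error_message(const std::string& path, const std::string& reason) {
    return "Could not open BED file `" + path + "` for writing: " + reason;
}

// Stand-in for a direct call to stop(): raises the error immediately
// instead of first formatting into a buffer (assumption: plain C++ exception).
void fail_open(const std::string& path, const std::string& reason) {
    throw std::runtime_error(open_error_message(path, reason));
}
```

Because the message is a std::string, its length grows with the path, which avoids the overflow risk of a fixed-size buffer.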
cran-comments.md
write_plink added option write_phen to streamline writing simulation outputs more (as phen files are often required).
cran-comments.md
read_bim, write_bim, and geno_to_char: reversed columns "ref" and "alt" in BIM table
read_bim now returns a tibble with allele names "alt" and "ref" in that order (columns still ordered as they appear in input file)
write_bim writes tables with column "alt" before "ref"
geno_to_char reverses the role of "alt" and "ref" correspondingly so that the output remains the same as before these changes (the original outputs were correct as validated against the plink1 "ped" text genotypes).
write_bed now checks if output directory exists prior to attempting to open the file for writing in the C++ part of the code.
*** buffer overflow detected ***: terminated
Aborted (core dumped)
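The write_bed entry above describes checking that the output directory exists before the file is opened. The changelog does not say how or where that check is implemented, so the following is only a sketch of an equivalent check in C++, assuming a Unix-like system with POSIX stat and "/" as the path separator. The names parent_dir, dir_exists, and can_write_to are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>

// Returns the directory part of a path ("." if the path contains no '/').
std::string parent_dir(const std::string& path) {
    const std::size_t pos = path.find_last_of('/');
    if (pos == std::string::npos) return ".";
    if (pos == 0) return "/";
    return path.substr(0, pos);
}

// True if `dir` exists and is a directory (POSIX stat; Unix-like systems only).
bool dir_exists(const std::string& dir) {
    struct stat info;
    return stat(dir.c_str(), &info) == 0 && S_ISDIR(info.st_mode);
}

// Hypothetical pre-flight check to run before opening the output file for writing.
bool can_write_to(const std::string& file) {
    return dir_exists(parent_dir(file));
}
```

Running a check like this first lets the caller report a clear error message rather than failing inside the file-opening code.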
write_grm added the same options added yesterday to read_grm (see there) to write GRM-like formats produced by plink2, particularly data produced by --make-king with bin or bin4 options.
read_grm edited documentation only, particularly added parsing examples for various plink2 --make-king outputs.
read_grm added several options to facilitate reading GRM-like formats produced by plink2, particularly data produced by --make-king with bin or bin4 options. Added options:
ext to specify alternate shared extensions (like "grm" or "king").
shape to specify whether the input is a full "square" matrix, a "triangle" with diagonal (default for GRM) or a "strict" triangle without diagonal (for KING-robust).
size_bytes to parse bin4/GRM (4) or bin (8) plink2 data.
comment to control comment characters in the <ext>.id file.
vec_to_mat_sym and mat_sym_to_vec added option strict to exclude diagonal in their transformations.
read_tab_generic added option comment to set comment characters.
count_lines and all read_* functions, which use add_ext_read internally to sort out file paths:
ext = NA finds files that end in a .gz extension that was not specified (before those files were incorrectly not found).
read_matrix( 'my-file', ext = NA ) now finds and reads my-file.gz if it exists and my-file does not exist.
README fixed GitHub installation instructions to build vignette, explained how to view it.
read_eigenvec and write_eigenvec have new option plink2 for better handling of files with headers in the default style of plink2.
write_bed, write_plink, and count_lines fixed a bug: write (or read) failed if output path started with "~/" on Unix systems.
Problem was the path wasn't expanded in C++ code.
write_plink( '~/test', X ) failed with message:
Writing: ~/test.bed
Error in write_bed_cpp(file, X, append = append) :
Could not open BED file `~/test.bed` for writing: No such file or directory
Calls: write_plink -> write_bed -> write_bed_cpp
Execution halted
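The failure above arises because "~" is a shell and R convention that C++ file streams do not interpret. The changelog does not state on which side the fix was made (in R, base path.expand performs this expansion), so the sketch below only illustrates what the expansion amounts to if done in C++. The function expand_tilde is hypothetical, and reading the HOME environment variable is an assumption that holds on typical Unix systems.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Expands a leading "~" or "~/" using the given home directory.
// Other paths, including "~user/..." forms, are returned unchanged.
std::string expand_tilde(const std::string& path, const std::string& home) {
    if (path == "~") return home;
    if (path.compare(0, 2, "~/") == 0) return home + path.substr(1);
    return path;
}

// Convenience overload that reads HOME from the environment (assumption:
// Unix-like system); the path is returned unchanged if HOME is not set.
std::string expand_tilde(const std::string& path) {
    const char* home = std::getenv("HOME");
    return home ? expand_tilde(path, std::string(home)) : path;
}
```

Taking the home directory as an explicit argument keeps the core logic independent of the environment, which makes it easy to test.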
read_eigenvec fixed this warning:
value argument of names<- must be a character vector as of tibble 3.0.0.
write_bed/plink with append = TRUE debugged to write in "binary" mode.
append option was introduced in 1.0.15.9000 (2020-07-03).
readr::read_table2 with readr::read_table
read_table2() was deprecated in readr 2.0.0. Please use read_table() instead.
readr (>= 2.0.0, already on CRAN).
pryr::object_size with lobstr::obj_size (a suggested package used in vignette only; the former was recently superseded by the latter)
pryr::object_size output (now of class lobstr_bytes), which triggered a CRAN warning.
read_bed and read_plink no longer stop with an error if the input BED file has non-zero padding bits.
plink2 binary and the BEDMatrix R package load this file without complaining about the non-zero pads, so I decided to match that behavior. I verified that genio's data agrees with BEDMatrix after the fix.
read_bed now reads file even if it doesn't have a BED extension (as long as it exists).
ext option.
read_* functions to clarify behavior regarding file and ext options.
real_path to add_ext_read to make the distinction from add_ext clearer.
read_* functions use add_ext_read while all write_* functions use add_ext.
Only function count_lines switched from add_ext to add_ext_read (in addition to read_bed, which led to the earlier change), but count_lines didn't have a default extension so this change is less likely to matter.
NEWS.md slightly to improve its automatic parsing.
lfa from suggested packages (no connection anymore since lfa comparison was removed from vignette in version 1.0.19.9000).
count_lines now returns value as integer instead of double (a very minor bug/annoyance fix).
geno_to_char to convert genotype numeric codes (allele dosages such as 0, 1, 2) into character codes such as 'A/A', 'A/G', 'G/G' (depending on locus).
read_matrix and write_matrix, intended for admixture inference data.
read_bed, which previously incorrectly stated that the numerical genotypes (allele dosages) counted alternative alleles (allele 2 in BIM table), whereas the truth is that they count reference alleles (allele 1).
genio package doc
read_bed added a missing file check in R code.
lfa comparison.
lfa fork doesn't have function read.bed anymore, previously the slowest and most memory-hungry competitor, which genio::read_plink was being compared to.
read_eigenvec added Plink 2 support via comment option, which by default now treats data after # as comments.
This enables automatically parsing eigenvec files generated by Plink 2, whose header line starts with # (this header is ignored).
Previously, parsing Plink 2 eigenvec files generated warnings and resulted in the first row being an additional row with all NA values.
count_lines, uses C++ code (via Rcpp) to count file lines extremely quickly.
Intended for counting numbers of individuals (from FAM and equivalent files) or numbers of loci (from BIM and equivalent files) when these files are extremely large and no other information is needed from those files.
write_eigenvec and read_eigenvec to read and write Plink/GCTA eigenvector files.
write_plink, write_bed, and write_bim now have append option, for writing extremely large files in parts.
validate_tab_generic.
read_grm and write_grm to read and write GCTA's binary genetic relatedness matrix (GRM) format.
require_files_grm, delete_files_grm, require_files_phen, and delete_files_phen.
tidy_kinship to transform a square symmetric matrix into a long-format table that is easy to sort and add annotations to
man/figures/
read_phen and write_phen, a phenotype format (very similar to Plink's FAM) used by GCTA and EMMAX.
write_plink returns the data it wrote, invisibly as a list.
Most useful for auto-generated data.
include <cerrno> to my cpp code.
read_plink now includes row and column names automatically.
read_bed accepts either row and column names or just their numbers.
write_plink checks these row and column names against the BIM and FAM tables for consistency, if these are all present.
BEDMatrix in testing, since it leaves temporary files open and on Windows they do not get deleted and leave confusing error messages behind.
read_bed and read_plink!
Now all Plink reading and writing operations are supported.
BEDMatrix, snpStats, and lfa.
ind_to_fam, sex_to_int, sex_to_char.
write_plink now returns NULL invisibly.
require_files_plink, delete_files_plink.
make_fam, make_bim, and write_plink functions.
read_fam bug (used to require phenotypes to be integers, now can be double numbers).
verbose option to write_bed.
write_fam, write_bim, write_ind, write_snp functions.
read_* code, updated docs and tests.
write_bed error message for invalid data, documentation.
write_bed tests.
write_bed written in Rcpp and thoroughly tested against BEDMatrix package.
read_bim, read_fam, read_ind, and read_snp functions.
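For readers curious about the count_lines entry earlier in this changelog, which counts lines of FAM or BIM files in C++, here is a minimal sketch of line counting with the standard library. It is not the package's actual implementation. The names count_lines_stream and count_lines_file are hypothetical, and returning -1 for an unreadable file is an illustrative choice (the package reports an error instead).

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Counts lines in a stream; a final line without a trailing newline still counts.
long count_lines_stream(std::istream& in) {
    long n = 0;
    std::string line;
    while (std::getline(in, line)) ++n;
    return n;
}

// Opens `path` and counts its lines; returns -1 if the file cannot be opened
// (illustrative convention only).
long count_lines_file(const std::string& path) {
    std::ifstream in(path);
    if (!in) return -1;
    return count_lines_stream(in);
}
```

Since each line of a FAM file is one individual and each line of a BIM file is one locus, the count gives the matrix dimensions without parsing any fields.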