symlink to create symbolic links (ought to work across operating systems).
R.utils for functions required by symlink.
het_reencode_bed added option make_bim_fam to create symbolic links to bim and fam files automatically.
het_reencode_bed that reencodes a BED file to have heterozygote indicators, which is useful to scan for dominance effects.
write_bed_cpp corrected variable names (had n_ind and m_loci reversed; no bug resulted from previous mixup) and one word in an error message.
Added a case (n_ind > m_loci) to unit tests, confirmed that existing code works in that case too.
sim_and_write_plink that facilitates simulating and writing data in small chunks to save memory.
sprintf usage (see below).
Replaced uses of sprintf, all of which then went to stop, with direct calls to stop.
* checking compiled code ... WARNING
File ‘genio/libs/genio.so’:
Found ‘sprintf’, possibly from ‘sprintf’ (C)
Objects: ‘read_bed_cpp.o’, ‘write_bed_cpp.o’
Compiled code should not call entry points which might terminate R nor
write to stdout/stderr instead of to the console, nor use Fortran I/O
nor system RNGs nor [v]sprintf.
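The kind of change described above (dropping sprintf in favor of passing the message straight to the error call) can be sketched in plain C++. This is a minimal illustration under stated assumptions, not genio's actual code: the names bed_open_error and open_for_writing are hypothetical, and std::runtime_error stands in for the stop call so that the sketch compiles without R or Rcpp.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of genio): build the message with
// std::string concatenation instead of formatting into a char buffer
// with sprintf, so there is no fixed-size buffer to overflow.
std::string bed_open_error(const std::string& path, int err) {
    return "Could not open BED file `" + path + "` for writing: " +
           std::strerror(err);
}

// Hypothetical helper: open a file for binary writing, or throw.
// Assumption: in an Rcpp package the throw below would instead be a
// direct call to stop with the same message.
std::FILE* open_for_writing(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (f == nullptr) {
        throw std::runtime_error(bed_open_error(path, errno));
    }
    return f;
}
```

A caller would wrap open_for_writing in a try block (or let the exception propagate to the R interface layer) and close the returned handle with std::fclose when done.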
cran-comments.md
write_plink added option write_phen to streamline writing simulation outputs more (as phen files are often required).
cran-comments.md
read_bim, write_bim, and geno_to_char: reversed columns "ref" and "alt" in BIM table.
read_bim now returns a tibble with allele names "alt" and "ref" in that order (columns still ordered as they appear in input file).
write_bim writes tables with column "alt" before "ref".
geno_to_char reverses the role of "alt" and "ref" correspondingly so that the output remains the same as before these changes (the original outputs were correct as validated against the plink1 "ped" text genotypes).
write_bed now checks if output directory exists prior to attempting to open the file for writing in the C++ part of the code.
*** buffer overflow detected ***: terminated
Aborted (core dumped)
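A check of this kind can be sketched in standard C++17 with std::filesystem. This is only an illustration under assumptions: the changelog does not show how genio performs the check (it may well live on the R side), and the helper name require_output_dir is hypothetical.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of genio): throw a clear error if the
// directory that would contain `path` does not exist, so the failure
// is reported before any attempt to open the file for writing.
void require_output_dir(const std::string& path) {
    const fs::path dir = fs::path(path).parent_path();
    // An empty parent means the file goes in the current working directory.
    if (dir.empty()) {
        return;
    }
    if (!fs::is_directory(dir)) {
        throw std::runtime_error(
            "Output directory does not exist: " + dir.string());
    }
}
```

Calling such a helper first turns a missing directory into an ordinary, catchable error instead of a failure deep inside the file-writing code.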
write_grm added the same options added yesterday to read_grm (see there) to write GRM-like formats produced by plink2, particularly data produced by --make-king with bin or bin4 options.
read_grm edited documentation only, particularly added parsing examples for various plink2 --make-king outputs.
read_grm added several options to facilitate reading GRM-like formats produced by plink2, particularly data produced by --make-king with bin or bin4 options. Added options:
ext to specify alternate shared extensions (like "grm" or "king").
shape to specify whether the input is a full "square" matrix, a "triangle" with diagonal (default for GRM) or a "strict" triangle without diagonal (for KING-robust).
size_bytes to parse bin4/GRM (4) or bin (8) plink2 data.
comment to control comment characters in the <ext>.id file.
vec_to_mat_sym and mat_sym_to_vec added option strict to exclude diagonal in their transformations.
read_tab_generic added option comment to set comment characters.
count_lines and all read_* functions, which use add_ext_read internally to sort out file paths:
ext = NA finds files that end in a .gz extension that was not specified (before those files were incorrectly not found).
read_matrix( 'my-file', ext = NA ) now finds and reads my-file.gz if it exists and my-file does not exist.
README fixed GitHub installation instructions to build vignette, explained how to view it.
read_eigenvec and write_eigenvec have new option plink2 for better handling of files with headers in the default style of plink2.
write_bed, write_plink, and count_lines fixed a bug: write (or read) failed if output path started with "~/" on Unix systems. Problem was the path wasn't expanded in C++ code.
write_plink( '~/test', X ) failed with message:
Writing: ~/test.bed
Error in write_bed_cpp(file, X, append = append) :
Could not open BED file `~/test.bed` for writing: No such file or directory
Calls: write_plink -> write_bed -> write_bed_cpp
Execution halted
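The failure above arises because a leading "~" is expanded by the shell or by R, not by the C or C++ file-opening functions, which pass the path to the operating system as written. The sketch below shows one way to expand it in C++. It is an illustration under assumptions, not the fix genio applied (the changelog does not say where the expansion was added); the function expand_tilde is hypothetical, and the one-argument overload assumes a Unix system where the HOME environment variable is set.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of genio): replace a leading "~"
// (alone, or followed by "/") with the given home directory.
// Paths such as "~other/x" or "data/~/x" are returned unchanged.
std::string expand_tilde(const std::string& path, const std::string& home) {
    if (path == "~") {
        return home;
    }
    if (path.rfind("~/", 0) == 0) {   // true only if path starts with "~/"
        return home + path.substr(1); // keep the "/" and everything after it
    }
    return path;
}

// Convenience overload that reads HOME from the environment (Unix
// assumption). If HOME is unset, the path is returned unchanged.
std::string expand_tilde(const std::string& path) {
    const char* home = std::getenv("HOME");
    return home ? expand_tilde(path, std::string(home)) : path;
}
```

With a home directory of /home/user, the path ~/test.bed becomes /home/user/test.bed, which the file-opening call can then resolve.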
read_eigenvec fixed this warning: "The value argument of names<- must be a character vector as of tibble 3.0.0."
write_bed/plink with append = TRUE debugged to write in "binary" mode.
append option was introduced in 1.0.15.9000 (2020-07-03).
Replaced readr::read_table2 with readr::read_table.
read_table2() was deprecated in readr 2.0.0. Please use read_table() instead.
readr (>= 2.0.0, already on CRAN).
Replaced pryr::object_size with lobstr::obj_size (a suggested package used in vignette only; the former was recently superseded by the latter).
pryr::object_size output (now of class lobstr_bytes), which triggered a CRAN warning.
read_bed and read_plink no longer stop with an error if the input BED file has non-zero padding bits.
The plink2 binary and the BEDMatrix R package load this file without complaining about the non-zero pads, so I decided to match that behavior. I verified that genio's data agrees with BEDMatrix after the fix.
read_bed now reads file even if it doesn't have a BED extension (as long as it exists).
ext option.
read_* functions to clarify behavior regarding file and ext options.
Renamed real_path to add_ext_read to make the distinction from add_ext clearer.
All read_* functions use add_ext_read while all write_* functions use add_ext.
Only function count_lines switched from add_ext to add_ext_read (in addition to read_bed, which led to the earlier change), but count_lines didn't have a default extension so this change is less likely to matter.
NEWS.md slightly to improve its automatic parsing.
Removed lfa from suggested packages (no connection anymore since lfa comparison was removed from vignette in version 1.0.19.9000).
count_lines now returns value as integer instead of double (a very minor bug/annoyance fix).
geno_to_char to convert genotype numeric codes (allele dosages such as 0, 1, 2) into character codes such as 'A/A', 'A/G', 'G/G' (depending on locus).
read_matrix and write_matrix, intended for admixture inference data.
read_bed, which previously incorrectly stated that the numerical genotypes (allele dosages) counted alternative alleles (allele 2 in BIM table), whereas the truth is that they count reference alleles (allele 1).
genio package doc
read_bed added a missing file check in R code.
lfa comparison.
lfa fork doesn't have function read.bed anymore, previously the slowest and most memory-hungry competitor, which genio::read_plink was being compared to.
read_eigenvec added Plink 2 support via comment option, which by default now treats data after # as comments. This enables automatically parsing eigenvec files generated by Plink 2, whose header line starts with # (this header is ignored). Previously, parsing Plink 2 eigenvec files generated warnings and resulted in the first row being an additional row with all NA values.
count_lines, uses C++ code (via Rcpp) to count file lines extremely quickly. Intended for counting numbers of individuals (from FAM and equivalent files) or numbers of loci (from BIM and equivalent files) when these files are extremely large and no other information is needed from those files.
write_eigenvec and read_eigenvec to read and write Plink/GCTA eigenvector files.
write_plink, write_bed, and write_bim now have append option, for writing extremely large files in parts.
validate_tab_generic.
read_grm and write_grm to read and write GCTA's binary genetic relatedness matrix (GRM) format.
require_files_grm, delete_files_grm, require_files_phen, and delete_files_phen.
tidy_kinship to transform a square symmetric matrix into a long-format table that is easy to sort and add annotations to.
man/figures/
read_phen and write_phen, a phenotype format (very similar to Plink's FAM) used by GCTA and EMMAX.
write_plink returns the data it wrote, invisibly as a list. Most useful for auto-generated data.
Added include <cerrno> to my cpp code.
read_plink now includes row and column names automatically.
read_bed accepts either row and column names or just their numbers.
write_plink checks these row and column names against the BIM and FAM tables for consistency, if these are all present.
BEDMatrix in testing, since it leaves temporary files open and on Windows they do not get deleted and leave confusing error messages behind.
read_bed and read_plink!
Now all Plink reading and writing operations are supported.
BEDMatrix, snpStats, and lfa.
ind_to_fam, sex_to_int, sex_to_char.
write_plink now returns NULL invisibly.
require_files_plink, delete_files_plink.
make_fam, make_bim, and write_plink functions.
Fixed read_fam bug (used to require phenotypes to be integers, now can be double numbers).
Added verbose option to write_bed.
write_fam, write_bim, write_ind, write_snp functions.
read_* code, updated docs and tests.
write_bed error message for invalid data, documentation.
write_bed tests.
write_bed written in Rcpp and thoroughly tested against BEDMatrix package.
read_bim, read_fam, read_ind, and read_snp functions.