Changes in version 1.1.6.9000

- For file reading functions that automatically add a default or user-specified file extension (count_lines, het_reencode_bed, and probably all or most read_* functions), the shared internal code now works as intended when the path without extension matches a directory name: the directory is ignored and the extension is added as expected. Before, in this scenario the path was treated as if the file existed as-is, even though it is a directory, so no extension was added, and the subsequent attempt to read the directory failed with a fatal error. For count_lines, for example, the error is "Error: basic_filebuf::underflow error reading the file: Is a directory" since it involves C++ code, but pure-R functions may have different but related error messages.

Changes in version 1.1.5.9000

- Modernized Rcpp code to use standard C++ library, avoiding various pitfalls and making code shorter and more readable!

Changes in version 1.1.4.9000

- Added function symlink to create symbolic links (ought to work across operating systems).
- Added package dependency R.utils for functions required by symlink.
- Function het_reencode_bed added option make_bim_fam to create symbolic links to bim and fam files automatically.

Changes in version 1.1.3.9000

- Added function het_reencode_bed that reencodes a BED file to have heterozygote indicators, which is useful to scan for dominance effects.
- Internal changes with no user-facing effects
  - C++ function write_bed_cpp corrected variable names (had n_ind and m_loci reversed; no bug resulted from previous mixup) and one word in an error message.
  - Added wide genotype matrix (n_ind > m_loci) to unit tests, confirmed that existing code works in that case too.

Changes in version 1.1.2.9000

- Added function sim_and_write_plink that facilitates simulating and writing data in small chunks to save memory.

Changes in version 1.1.2 (2023-01-06)

- 6th CRAN submission
- Fixed an R-devel warning about sprintf usage (see below).
  - Solution was to replace calls to sprintf, all of which then went to stop, with direct calls to stop.

        * checking compiled code ... WARNING
        File ‘genio/libs/genio.so’:
          Found ‘sprintf’, possibly from ‘sprintf’ (C)
            Objects: ‘read_bed_cpp.o’, ‘write_bed_cpp.o’

        Compiled code should not call entry points which might terminate R nor
        write to stdout/stderr instead of to the console, nor use Fortran I/O
        nor system RNGs nor [v]sprintf.

- Ran spellcheck, made one correction.
- Updated cran-comments.md

Changes in version 1.1.1.9000

- Function write_plink added option write_phen to streamline writing simulation outputs more (as phen files are often required).

Changes in version 1.1.1 (2022-04-27)

- 5th CRAN submission
- Ran spellcheck (no changes)
- Updated cran-comments.md

Changes in version 1.1.0.9000

- Functions read_bim, write_bim, and geno_to_char: reversed columns "ref" and "alt" in BIM table
  - Previous versions treated the first allele column from the (headerless) BIM file as "ref", second as "alt", in part because the plink 1.9 documentation was unclear about their identities (they were simply called alleles "1/clear bits/usually minor" and "2/set bits/usually major").
  - New version has first allele column as "alt", second as "ref", after seeing plink 2.0 documentation define them explicitly this way, and after noticing reverse correspondence with ref/alt values in VCF files (thanks Ochoa Lab members Amika Sood and Tiffany Tu for reporting these issues!).
  - read_bim now returns a tibble with allele names "alt" and "ref" in that order (columns still ordered as they appear in input file)
  - write_bim writes tables with column "alt" before "ref"
  - geno_to_char reverses the role of "alt" and "ref" correspondingly so that the output remains the same as before these changes (the original outputs were correct as validated against the plink1 "ped" text genotypes).
  - All documentation was updated to reflect these changes.

Changes in version 1.0.32.9000

- Function write_bed now checks if output directory exists prior to attempting to open the file for writing in the C++ part of the code.
  - The original code crashed "ruthlessly" in RStudio if the path contains a directory that does not exist, triggering an error such as this one on a terminal:

        *** buffer overflow detected ***: terminated
        Aborted (core dumped)

  - The new code produces an ordinary (fatal) error message in R without the buffer overflow.
  - Bug reported by Richel Bilderbeek (thanks!)

Changes in version 1.0.31.9000

- Function write_grm added the same options added in version 1.0.30.9000 to read_grm (see there) to write GRM-like formats produced by plink2, particularly data produced by --make-king with bin or bin4 options.
- Function read_grm edited documentation only, particularly added parsing examples for various plink2 --make-king outputs.

Changes in version 1.0.30.9000

- Function read_grm added several options to facilitate reading GRM-like formats produced by plink2, particularly data produced by --make-king with bin or bin4 options. Added options:
  - ext to specify alternate shared extensions (like "grm" or "king").
  - shape to specify whether the input is a full "square" matrix, a "triangle" with diagonal (default for GRM) or a "strict" triangle without diagonal (for KING-robust).
  - size_bytes to parse bin4/GRM (4) or bin (8) plink2 data.
  - comment to control comment characters in the .id file.
- Internal functions vec_to_mat_sym and mat_sym_to_vec added option strict to exclude diagonal in their transformations.
- Internal function read_tab_generic added option comment to set comment characters.

Changes in version 1.0.29.9000

- Bug fix in count_lines and all read_* functions, which use add_ext_read internally to sort out file paths:
  - Now setting ext = NA finds files that end in a .gz extension that was not specified (before those files were incorrectly not found).
  - Example: read_matrix( 'my-file', ext = NA ) now finds and reads my-file.gz if it exists and my-file does not exist.
- README fixed GitHub installation instructions to build vignette, explained how to view it.

Changes in version 1.0.28.9000

- Functions read_eigenvec and write_eigenvec have new option plink2 for better handling of files with headers in the default style of plink2.

Changes in version 1.0.27.9000

- Functions write_bed, write_plink, and count_lines fixed a bug: write (or read) failed if the path started with "~/" on Unix systems. Problem was the path wasn't expanded in C++ code.
  - For example, write_plink( '~/test', X ) failed with message:

        Writing: ~/test.bed
        Error in write_bed_cpp(file, X, append = append) :
          Could not open BED file `~/test.bed` for writing: No such file or directory
        Calls: write_plink -> write_bed -> write_bed_cpp
        Execution halted

  - Thanks to Bingsong Zhang for reporting the bug!

Changes in version 1.0.26.9000

- Function read_eigenvec fixed this warning:
  - "The value argument of names<- must be a character vector as of tibble 3.0.0."

Changes in version 1.0.25 (2021-07-26)

- 4th CRAN submission
- write_bed and write_plink with append = TRUE debugged to write in "binary" mode.
  - Fixed rare error observed in Windows only, where "binary" mode makes a difference, and only when written bytes matched certain special characters (such as newlines).
  - Bug probably present since append option was introduced in 1.0.15.9000 (2020-07-03).
- Internally replaced readr::read_table2 with readr::read_table
  - Fixes warning message: read_table2() was deprecated in readr 2.0.0. Please use read_table() instead.
  - Reported by Richel Bilderbeek (thanks again!)
  - Requires readr (>= 2.0.0, already on CRAN).
- Added tests for verbosity
- Replaced pryr::object_size with lobstr::obj_size (a suggested package used in vignette only; the former was recently superseded by the latter)
  - One-line vignette update for a change in former pryr::object_size output (now of class lobstr_bytes), which triggered a CRAN warning.

Changes in version 1.0.24.9000

- Functions read_bed and read_plink no longer stop with an error if the input BED file has non-zero padding bits.
  - A real-life example (link below, also part of tests now) reported by Richel Bilderbeek (Thanks!) caused the error.
    - https://github.com/kausmees/GenoCAE/tree/master/example_tiny
  - I verified that both the plink2 binary and the BEDMatrix R package load this file without complaining about the non-zero pads, so I decided to match that behavior. I verified that genio's data agrees with BEDMatrix after the fix.

Changes in version 1.0.23.9000

- Function read_bed now reads file even if it doesn't have a BED extension (as long as it exists).
  - For additional flexibility, this function also has a new ext option.
  - Thanks to Richel Bilderbeek for reporting this problem.
- Updated documentation for some read_* functions to clarify behavior regarding file and ext options.
- Internal changes
  - Renamed internal function real_path to add_ext_read to make the distinction from add_ext clearer.
  - Verified that all read_* functions use add_ext_read while all write_* functions use add_ext. Only function count_lines switched from add_ext to add_ext_read (in addition to read_bed, which led to the earlier change), but count_lines didn't have a default extension so this change is less likely to matter.
- Reformatted this NEWS.md slightly to improve its automatic parsing.

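The path handling for reading described in several entries above (use the file as-is if it exists, otherwise add the extension, otherwise look for a .gz version, and never mistake a directory for a file) can be illustrated with a minimal base-R sketch. The function name resolve_read_path and the order in which candidates are tried are assumptions made for illustration only; this is not the code of genio's internal add_ext_read.

```r
# Hypothetical helper for illustration; NOT genio's actual add_ext_read.
# Picks the path to read given a base path and an optional extension.
resolve_read_path <- function(file, ext = NA) {
    # TRUE only for an existing path that is not a directory
    is_file <- function(path) file.exists(path) && !dir.exists(path)

    # Candidates in assumed order of preference:
    # as-is, with extension added, then .gz versions of both
    candidates <- file
    if (!is.na(ext))
        candidates <- c(candidates, paste0(file, '.', ext))
    candidates <- c(candidates, paste0(candidates, '.gz'))

    for (path in candidates)
        if (is_file(path))
            return(path)

    stop('Could not find file: ', file)
}

# Usage: a directory with the same name as the base path is ignored
base <- tempfile()
dir.create(base)
writeLines('example', paste0(base, '.bim'))
resolve_read_path(base, 'bim')
```

In this sketch the directory check happens before any file is opened, so a directory that shares the base name fails the first candidate and the extension is tried next.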
Changes in version 1.0.23 (2021-06-11)

- 3rd CRAN submission
- Removed lfa from suggested packages (no connection anymore since lfa comparison was removed from vignette in version 1.0.19.9000).
- Fixed a dead link in the vignette.
- Removed "LazyData: true" from DESCRIPTION (to avoid a new "note" on R-devel).

Changes in version 1.0.22.9000

- Function count_lines now returns value as integer instead of double (a very minor bug/annoyance fix).

Changes in version 1.0.21.9000

- Added function geno_to_char to convert genotype numeric codes (allele dosages such as 0, 1, 2) into character codes such as 'A/A', 'A/G', 'G/G' (depending on locus).
- Added functions read_matrix and write_matrix, intended for admixture inference data.
- Fixed documentation for read_bed, which previously incorrectly stated that the numerical genotypes (allele dosages) counted alternative alleles (allele 2 in BIM table), whereas the truth is that they count reference alleles (allele 1).

Changes in version 1.0.20.9000

- Documentation updates:
  - Added GRM examples to README and genio package doc
  - Clarified language and code examples throughout, normalized style (full conversion to roxygen markdown, including fixing some cases where old non-markdown notation did not work anymore)
  - Spellchecked package

Changes in version 1.0.19.9000

- Function read_bed added a missing file check in R code.
  - A check in the underlying C++ code already existed, but it could suffer from a buffer overflow if the erroneous file path was very long. Such buffer overflows are now completely avoided.
- Vignette: Removed lfa comparison.
  - My latest lfa fork doesn't have function read.bed anymore, previously the slowest and most memory-hungry competitor, which genio::read_plink was being compared to.

Changes in version 1.0.18.9000

- Function read_eigenvec added Plink 2 support via comment option, which by default now treats data after # as comments.
  This enables automatically parsing eigenvec files generated by Plink 2, whose header line starts with # (this header is ignored). Previously, parsing Plink 2 eigenvec files generated warnings and resulted in the first row being an additional row with all NA values.

Changes in version 1.0.17.9000

- Added count_lines, which uses C++ code (via Rcpp) to count file lines extremely quickly. Intended for counting numbers of individuals (from FAM and equivalent files) or numbers of loci (from BIM and equivalent files) when these files are extremely large and no other information is needed from those files.

Changes in version 1.0.16.9000

- Added write_eigenvec and read_eigenvec to write and read Plink/GCTA eigenvector files.

Changes in version 1.0.15.9000

- Functions write_plink, write_bed, and write_bim now have append option, for writing extremely large files in parts.

Changes in version 1.0.14.9000

- Improved error message in validate_tab_generic.

Changes in version 1.0.13.9000

- Added read_grm and write_grm to read and write GCTA's binary genetic relatedness matrix (GRM) format.
- Also added auxiliary functions require_files_grm, delete_files_grm, require_files_phen, and delete_files_phen.

Changes in version 1.0.13

- Added tidy_kinship to transform a square symmetric matrix into a long-format table that is easy to sort and add annotations to.

Changes in version 1.0.12 (2019-12-17)

- Second CRAN submission
- Moved logo to man/figures/
- Minor Roxygen-related updates.

Changes in version 1.0.11.9000

- Fixed a "buffer overflow" bug that occurred when input files started with "~/" on Unix systems.

Changes in version 1.0.11

- Added read_phen and write_phen, a phenotype format (very similar to Plink's FAM) used by GCTA and EMMAX.
- Now write_plink returns the data it wrote, invisibly as a list. Most useful for auto-generated data.

Changes in version 1.0.10 (2019-05-28)

- CRAN submission follow ups, fixing issues that arose on other systems:
  - Added include to my cpp code.
  - Fixed a "heap buffer overflow" detected by valgrind that only occurred for data with fewer than 9 individuals (which included many of my toy tests).
  - Edited a test within vignette to allow for small machine precision-level errors.

Changes in version 1.0.9 (2019-05-24)

- CRAN-requested edits, resubmission
- DESCRIPTION edits
- Changed examples, vignettes, and tests to write files to the default temporary directory.

Changes in version 1.0.8

- First CRAN submission
- Genotype matrix row and column names from BIM/FAM files
  - read_plink now includes row and column names automatically.
  - read_bed accepts either row and column names or just their numbers.
  - write_plink checks these row and column names against the BIM and FAM tables for consistency, if these are all present.
- Added memory estimation and comparisons sections to vignette.
- Windows debugging
  - Now BED writing is in binary mode, like reading already was.
  - Reduced comparisons to BEDMatrix in testing, since it leaves temporary files open and on Windows they do not get deleted and leave confusing error messages behind.

Changes in version 1.0.7.9000

- Added read_bed and read_plink! Now all Plink reading and writing operations are supported.
- Added package documentation summarizing main read and write functions.
- Added vignette comparing our BED reader and writer to those of BEDMatrix, snpStats, and lfa.

Changes in version 1.0.6.9000

- Added ind_to_fam, sex_to_int, sex_to_char.
- 2019-05-13: added ORCID to author info

Changes in version 1.0.5.9000

- write_plink now returns NULL invisibly.
- Added require_files_plink, delete_files_plink.
- Removed "Fatal: " prefix from stop messages.

Changes in version 1.0.4.9000

- Added make_fam, make_bim, and write_plink functions.
- Fixed read_fam bug (used to require phenotypes to be integers, now can be double numbers).
- Added verbose option to write_bed.

Changes in version 1.0.3.9000

- Added write_fam, write_bim, write_ind, write_snp functions.
- Refactored read_* code, updated docs and tests.

Changes in version 1.0.2.9000

- Improved write_bed error message for invalid data, documentation.
- Extended write_bed tests.

Changes in version 1.0.1.9000

- Added an efficient write_bed written in Rcpp and thoroughly tested against BEDMatrix package.

Changes in version 1.0.0.9000

- First GitHub release! Includes read_bim, read_fam, read_ind, and read_snp functions.