Search anything and hit enter
  • Teams
  • Members
  • Projects
  • Events
  • Calls
  • Jobs
  • publications
  • Software
  • Tools
  • Network
  • Equipment

A little guide for advanced search:

  • Tip 1. You can use quotes "" to search for an exact expression.
    Example: "cell division"
  • Tip 2. You can use + symbol to restrict results containing all words.
    Example: +cell +stem
  • Tip 3. You can use + and - symbols to force inclusion or exclusion of specific words.
    Example: +cell -stem
e.g. searching for members in projects tagged cancer
Search for
Count
IN
OUT
Content 1
  • member
  • team
  • department
  • center
  • program_project
  • nrc
  • whocc
  • project
  • software
  • tool
  • patent
  • Administrative Staff
  • Assistant Professor
  • Associate Professor
  • Clinical Research Assistant
  • Full Professor
  • Graduate Student
  • Lab assistant
  • Non-permanent Researcher
  • Permanent Researcher
  • Pharmacist
  • PhD Student
  • Physician
  • Post-doc
  • Project Manager
  • Research Associate
  • Research Engineer
  • Retired scientist
  • Technician
  • Undergraduate Student
  • Veterinary
  • Visiting Scientist
  • Deputy Director of Center
  • Deputy Director of Department
  • Deputy Director of National Reference Center
  • Deputy Head of Facility
  • Director of Center
  • Director of Department
  • Director of Institute
  • Director of National Reference Center
  • Group Leader
  • Head of Facility
  • Head of Operations
  • Head of Structure
  • Honorary President of the Departement
  • Labex Coordinator
Content 2
  • member
  • team
  • department
  • center
  • program_project
  • nrc
  • whocc
  • project
  • software
  • tool
  • patent
  • Administrative Staff
  • Assistant Professor
  • Associate Professor
  • Clinical Research Assistant
  • Full Professor
  • Graduate Student
  • Lab assistant
  • Non-permanent Researcher
  • Permanent Researcher
  • Pharmacist
  • PhD Student
  • Physician
  • Post-doc
  • Project Manager
  • Research Associate
  • Research Engineer
  • Retired scientist
  • Technician
  • Undergraduate Student
  • Veterinary
  • Visiting Scientist
  • Deputy Director of Center
  • Deputy Director of Department
  • Deputy Director of National Reference Center
  • Deputy Head of Facility
  • Director of Center
  • Director of Department
  • Director of Institute
  • Director of National Reference Center
  • Group Leader
  • Head of Facility
  • Head of Operations
  • Head of Structure
  • Honorary President of the Departement
  • Labex Coordinator
Search
Go back
Scroll to top
Share
© Research
Publication : Bioinformatics (Oxford, England)

Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr

Scientific Fields
Diseases
Organisms
Applications
Technique

Published in Bioinformatics (Oxford, England) - 15 Aug 2018

Privé F, Aschard H, Ziyatdinov A, Blum MGB

Link to Pubmed [PMID] – 29617937

Bioinformatics 2018 Aug;34(16):2781-2787

Motivation: Genome-wide datasets produced for association studies have dramatically increased in size over the past few years, with modern datasets commonly including millions of variants measured in dozens of thousands of individuals. This increase in data size is a major challenge severely slowing down genomic analyses, leading to some software becoming obsolete and researchers having limited access to diverse analysis tools.

Results: Here we present two R packages, bigstatsr and bigsnpr, allowing for the analysis of large scale genomic data to be performed within R. To address large data size, the packages use memory-mapping for accessing data matrices stored on disk instead of in RAM. To perform data pre-processing and data analysis, the packages integrate most of the tools that are commonly used, either through transparent system calls to existing software, or through updated or improved implementation of existing methods. In particular, the packages implement fast and accurate computations of principal component analysis and association studies, functions to remove single nucleotide polymorphisms in linkage disequilibrium and algorithms to learn polygenic risk scores on millions of single nucleotide polymorphisms. We illustrate applications of the two R packages by analyzing a case-control genomic dataset for celiac disease, performing an association study and computing polygenic risk scores. Finally, we demonstrate the scalability of the R packages by analyzing a simulated genome-wide dataset including 500 000 individuals and 1 million markers on a single desktop computer.

Availability and implementation: https://privefl.github.io/bigstatsr/ and https://privefl.github.io/bigsnpr/.

Supplementary information: Supplementary data are available at Bioinformatics online.

https://www.ncbi.nlm.nih.gov/pubmed/29617937