Software

Our gitlab group

https://gitlab.inria.fr/ml_genetics/public
For miscellaneous software, code, article data

DNADNA: Deep Neural Networks for DNA

There's a quickstart tutorial that you can try out! Please let us know what you think; we welcome feedback.

DNADNA is a package for deep learning inference in population genetics. DNADNA provides utility functions to improve development of neural networks for population genetics and is currently based on PyTorch. In particular, it already implements several neural networks that allow inferring demographic and adaptive history from genetic data. Pretrained networks can be used directly on real/simulated genetic polymorphism data for prediction. Implemented networks can also be optimized based on user-specified training sets and/or tasks. Finally, any user can implement new architectures and tasks, while benefiting from DNADNA input/output, network optimization, and test environment.
Dual-licensed under the GNU Lesser General Public License, and the compatible CeCILL-C.

T Sanchez*, EM Bray*, P Jobic, J Guez, A-C Letournel, G Charpiat, J Cury°, F Jay° (2023) dnadna: a deep learning framework for population genetic inference. Bioinformatics 39(1), btac765. Link

There is also a repository specific to the code from the Sanchez et al. (2020) paper: repo; prediction notebook

Artificial Genomes

https://gitlab.inria.fr/ml_genetics/public/artificial_genomes

Contributors: Burak Yelmen, Flora Jay, Aurélien Decelle, Antoine Szatkownik, Guillaume Charpiat, Cyril Furtlehner

Repository containing (i) neural networks (GAN, RBM, VAE, DDPM) designed, implemented, and trained for genome generation, (ii) real and generated genomic datasets, (iii) implementations of various evaluation metrics and their visualization. This repository aims to facilitate benchmarking in generative genomics.

References: Yelmen et al. PLoS Genetics 2021, Yelmen et al. PLoS Comp Bio 2023, Szatkownik et al. PMLR 2024, Szatkownik et al. bioRxiv 2024

Supervised machine learning for demographic inference

https://github.com/amquelin/SML_demographic_inference

Developer: Arnaud Quelin, including previous code by Flora Jay and Simon Boitard

Supervision: Frédéric Austerlitz & Flora Jay

Simulation-based inference using machine learning methods (MLP, RF, XGBoost) to infer two-population demographic models from summary statistics.
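To give a flavour of what simulation-based inference from summary statistics involves, here is a toy sketch. It is not code from the repository above and makes several simplifying assumptions: a single constant-size population with one parameter (the scaled mutation rate theta) instead of a two-population model, two hand-picked summary statistics, and a plain k-nearest-neighbour regressor standing in for the MLP, RF, or XGBoost learners. All function names are hypothetical, and only the Python standard library is used.

```python
import math
import random
import statistics


def poisson(rng, lam):
    """Draw a Poisson(lam) count by counting rate-1 arrivals in [0, lam)."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_summary(theta, rng, n_samples=10, n_loci=50):
    """Simulate segregating-site counts at independent loci; return (mean, variance)."""
    counts = []
    for _ in range(n_loci):
        # Total branch length of a standard coalescent tree: while k lineages
        # remain, the waiting time is exponential with rate k(k-1)/2.
        total_length = sum(
            k * rng.expovariate(k * (k - 1) / 2) for k in range(2, n_samples + 1)
        )
        # Mutations fall on the tree at rate theta/2 per unit branch length.
        counts.append(poisson(rng, theta / 2 * total_length))
    return statistics.fmean(counts), statistics.pvariance(counts)


def build_reference_table(rng, n_sims=1000, theta_max=10.0):
    """Draw theta from a uniform prior and pair it with simulated statistics."""
    table = []
    for _ in range(n_sims):
        theta = rng.uniform(0.0, theta_max)
        table.append((simulate_summary(theta, rng), theta))
    return table


def knn_predict(table, query, k=25):
    """Estimate theta as the mean parameter of the k closest simulations."""
    # Standardise each statistic so that no single one dominates the distance.
    columns = list(zip(*(stats for stats, _ in table)))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) or 1.0 for col in columns]

    def scale(stats):
        return [(s - m) / sd for s, m, sd in zip(stats, means, sds)]

    target = scale(query)
    nearest = sorted(table, key=lambda row: math.dist(scale(row[0]), target))[:k]
    return statistics.fmean(theta for _, theta in nearest)


if __name__ == "__main__":
    rng = random.Random(0)
    table = build_reference_table(rng)
    observed = simulate_summary(4.0, rng)  # stands in for statistics from real data
    print(f"estimated theta: {knn_predict(table, observed):.2f}")
```

The general pattern is the same whatever the learner: simulate under parameters drawn from a prior, reduce each simulation to summary statistics, fit a regressor from statistics to parameters, and apply it to the statistics computed on observed data.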

Bacterial SLiMulator

https://github.com/jeanrjc/BacterialSlimulations

J Cury, B Haller, G Achaz, F Jay (2022). Simulation of bacterial populations with SLiM. 10.24072/pcjournal.72 - Peer Community Journal, Volume 2, article no. e7. (recommended by PCI EvolBiol in 2021)

Simulation of genomic data from bacterial populations based on SLiM, incorporating models of demographic changes, horizontal gene transfers, and circular DNA. This bacterial modeling has been integrated into SLiM (https://messerlab.org/slim/).

tfa: factor analysis of temporal samples

https://github.com/bcm-uga/tfa

with Olivier François (TIMC-IMAG)

The R package tfa implements a factor analysis algorithm for temporal DNA or ancient DNA samples, adjusting individual scores for the effect of allele frequency drift through time. The adjusted scores provide geometric representations of temporal samples consistent with estimates of ancestry proportions and have an interpretation similar to principal component analysis. Based on the adjusted factors, the program can also estimate ancestry proportions for a target population or a subset of target individuals given specified source populations. Licensed under the GNU General Public License v3.0.

PoPS: Predictions of Population Structure Version 1.2

Inferring population genetic structure and its relationships with environmental variables, and predicting changes in population structure.

POPS is a method I developed during my PhD. It has a nice and friendly graphical interface. I also provide some post-processing R scripts to plot colorful maps.

The POPS program performs inference of ancestry distribution models. It uses a TESS-like interface to compute individual cluster membership and admixture proportions based on multilocus genotype data and their correlation with environmental and geographical variables. Similarly to species distribution models, POPS provides routines to project cluster memberships and admixture proportions under scenarios of environmental change. Typical uses of POPS are for evaluating how the population genetic structure of a species could be modified by climate change, or testing hypotheses about local adaptation and ecological speciation.

For more information or to download the software, go to the POPS page.

Flora Jay - PhD - population genetics
