Our GitLab group
https://gitlab.inria.fr/ml_genetics/public
For miscellaneous software, code, and article data.
DNADNA: Deep Neural Networks for DNA
https://gitlab.com/mlgenetics/dnadna
There's a quickstart tutorial that you can try out! Please let us know what you think; we welcome feedback.
DNADNA is a package for deep learning inference in population genetics. DNADNA provides utility functions to improve development of neural networks for population genetics and is currently based on PyTorch. In particular, it already implements several neural networks that allow inferring demographic and adaptive history from genetic data. Pretrained networks can be used directly on real/simulated genetic polymorphism data for prediction. Implemented networks can also be optimized based on user-specified training sets and/or tasks. Finally, any user can implement new architectures and tasks, while benefiting from DNADNA input/output, network optimization, and test environment.
Dual-licensed under the GNU Lesser General Public License and the compatible CeCILL-C.
T Sanchez*, EM Bray*, P Jobic, J Guez, A-C Letournel, G Charpiat, J Cury°, F Jay° (2023) dnadna: a deep learning framework for population genetic inference. Bioinformatics 39(1), btac765. Link
There is also a repo specific to the code from the Sanchez et al. (2020) paper: repo; prediction notebook
Artificial Genomes
https://gitlab.inria.fr/ml_genetics/public/artificial_genomes
Repository containing (i) neural networks (GAN, RBM, VAE, DDPM) designed, implemented, and trained for genome generation, (ii) real and generated genomic datasets, (iii) implementations of various evaluation metrics and their visualization. This repository aims to facilitate benchmarking in generative genomics.
Supervised machine learning for demographic inference
https://github.com/amquelin/SML_demographic_inference
Bacterial SLiMulator
Simulation of genomic data from bacterial populations based on SLiM, incorporating models of demographic changes, horizontal gene transfers, and circular DNA. This bacterial modeling has been integrated into SLiM (https://messerlab.org/slim/).
tfa: factor analysis of temporal samples
https://github.com/bcm-uga/tfa
with Olivier François (TIMC-IMAG)
The R package tfa implements a factor analysis algorithm for temporal DNA or ancient DNA samples, adjusting individual scores for the effect of allele frequency drift through time. The adjusted scores provide geometric representations of temporal samples that are consistent with estimates of ancestry proportions and can be interpreted in much the same way as principal component analysis. Based on the adjusted factors, the program can also estimate ancestry proportions for a target population or a subset of target individuals given specified source populations. License: GNU General Public License v3.0.
PoPS: Predictions of Population Structure (version 1.2)
Inferring population genetic structure and its relationships with environmental variables, and predicting changes in population structure.
For more information or to download the software, go to the PoPS page.