The 200 Mammals Project started in 2012 with the main goal of understanding the human genome, base by base. By comparing more than 240 mammalian genomes at single-base resolution, the evolutionary constraint on each base can be measured, and strong constraint is likely to indicate that the base has a function. Of the genomes included in the project, 131 were generated by us using DISCOVAR-de novo and combined with the 110 mammalian genomes present in NCBI in March 2017. As this data set is analyzed, it will represent the largest eutherian nuclear genome phylogeny and make it possible to perform genotype–phenotype correlations across hundreds of mammalian species. It will allow the study of the evolution of genome structure and provide reference genomes that can be used for species conservation. It will also provide a very detailed map of evolutionary constraint, which can be used with human genome-wide association study (GWAS) catalogs and other data sets to investigate patterns of constraint in disease-associated regions in any of the mammalian genomes studied. In addition, the data set will allow the study of accelerated regions (genomic regions or positions that may be under positive selection for novel functions) in any of the sequenced mammalian genomes.
Three of the 68 genomes sequenced at the Lindblad-Toh Lab:
Western spotted skunk (Spilogale gracilis)
Two-toed sloth (Choloepus didactylus)
Arctic fox (Vulpes lagopus)
Armstrong, J., Hickey, G., Diekhans, M., Fiddes, I.T., Novak, A.M., Deran, A., Fang, Q., Xie, D., Feng, S., Stiller, J., Genereux, D., Johnson, J., Marinescu, V.D., Alföldi, J., Harris, R.S., Lindblad-Toh, K., Haussler, D., Karlsson, E., Jarvis, E.D., Zhang, G., Paten, B., 2020. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 587, 246–251.