Dark DNA: Vertebrate Evolution at a Crossroads?
Chris King


A recent study of sand rats, desert-dwelling gerbils native to North Africa and the Middle East, focused on the gene Pdx1, which regulates insulin production. Sand rats readily develop type 2 diabetes in laboratory conditions where food is plentiful, hence their scientific name, Psammomys obesus. Although Pdx1 is essential for survival, it appeared to be missing from the species' genome. On further investigation, the researchers found that the gene was not actually missing but obscured: they could detect the downstream effects that Pdx1 produces in the sand rat. The masked DNA sequence was rich in the bases guanine (G) and cytosine (C), which are notoriously hard to sequence and analyse, and it was evolving very rapidly (Hargreaves A et al. 2018 Genome sequence of a diabetes-prone rodent reveals a mutation hotspot around the ParaHox gene cluster PNAS doi:10.1073/pnas.1702930114).

Fig 1: (A) Juvenile sand rat P. obesus. (B) Cladogram of representative murid rodents indicating the phylogenetic position of sand rat. (C) GC content of genes around the ParaHox cluster of sand rat and other rodents (Mus musculus, Rattus norvegicus, Chinchilla lanigera) revealing a chromosomal hotspot of GC skew in sand rat (shaded in gray). Genes shown in inferred ancestral gene order; parentheses around Rfc3 indicate this gene has been transposed to a different genomic location in sand rat. Sand rat GC values based on transcriptome and genome sequences; when partial, only alignable sequence is compared. (D) Unrooted phylogenetic trees inferred from synonymous changes (dS) only from concatenated alignments of 26 genes in the mutational hotspot (Upper) and 100 random genes (Lower).

Looking more deeply at the sand rat genome and comparing it with those of other rodents, the investigators found that it carries far more mutations in a hotspot surrounding Pdx1 than its closest relatives do in the same region. All the genes within this mutation hotspot now have very GC-rich DNA and have mutated to such a degree that they are hard to detect using standard methods.
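To make "GC-rich" concrete: GC content is simply the fraction of bases in a stretch of DNA that are G or C. The short Python sketch below computes it, along with a sliding-window profile and the average length of uninterrupted G/C runs (the kind of measure plotted in Fig 2). It is a minimal illustration on made-up toy sequences, not the pipeline Hargreaves et al. or Hron et al. actually used; the function names and default window sizes are hypothetical choices for illustration.

```python
import re


def gc_content(seq: str) -> float:
    """Fraction of G or C among the A/C/G/T bases in seq (case-insensitive).

    Ambiguous bases such as N are ignored rather than counted.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_windows(seq: str, window: int = 100, step: int = 50) -> list[float]:
    """GC content of successive windows along seq (a simple sliding-window profile).

    If seq is shorter than one window, the whole sequence is scored once.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(len(seq) - window, 0)
    return [gc_content(seq[i:i + window]) for i in range(0, last_start + 1, step)]


def mean_gc_stretch(seq: str) -> float:
    """Average length of uninterrupted runs of G/C bases (0.0 if there are none)."""
    runs = re.findall(r"[GC]+", seq.upper())
    return sum(len(r) for r in runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequences, not real sand rat or mouse DNA.
    typical = "ATGCATTAGCAATTGACTAG"
    gc_rich = "ATGGCCGCGGGCCGCAGCGG"
    for name, s in [("typical", typical), ("GC-rich", gc_rich)]:
        print(name, round(gc_content(s), 2), round(mean_gc_stretch(s), 2))
```

Running it prints a higher GC fraction and longer G/C runs for the GC-rich toy sequence. In a real analysis one would score many genes or windows along a chromosome and look for a cluster of neighbouring genes that all stand out, as in panel C of Fig 1.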

Dark DNA appears to be rare, but it has also been detected in birds. In a second study (Hron T et al. 2015 Hidden genes in birds Genome Biology 16:164 doi:10.1186/s13059-015-0724-z), scientists investigated 274 genes that had been reported as absent from many bird genomes, even though they are present in almost all other vertebrates. Once again, GC-rich DNA was involved: some, though not all, of these genes were eventually found as highly GC-rich, rapidly evolving sequences that are, again, hard to detect. This suggests that another evolutionary hotspot arose in the ancestor of birds.

Fig 2: Dot plots of selected avian genes, compared with their vertebrate orthologs. GC-content and average length of G/C stretches in coding sequences of chicken MMP14 and LPPR2 (reported as missing in birds [1]), and genes from the EPO and EPOR loci are shown. If available, orthologous genes from other birds, turtles, mammals, lizards and crocodilians are included in the plots. The blue dots show the distribution of chicken RefSeq genes.

So far, hidden genes haven't been found outside of these two cases.

The real advance in understanding concerns not the hidden genes themselves but these hotspots. Hargreaves and colleagues' results indicate that many genes sit within these seemingly empty regions and still produce the proteins the animal needs. This hints at a deeper process in which several neighbouring genes may mutate together.

Could there be an underlying process driving this kind of evolution? If so, learning more about dark DNA could give us a clue to it. The sand rat may have undergone a rapid evolutionary jump linked to food scarcity and low fluid intake in desert conditions, acting on its insulin-regulating genes, which would explain why the hotspot arose.

Today, human-driven climate change is speeding up evolutionary processes among many species. More cases of dark DNA could be one result (Perry P 2018 "Dark DNA" Is Changing the Way We See Evolution bigthink.com/articles/dark-dna-is-changing-the-way-we-see-evolution).