Since the release of the Arabidopsis thaliana genome in 2000, more than 1000 plant genomes have been released, and as technologies have advanced, about 50 of them have reached chromosome-scale assemblies. As the "backbone" of existing global forest ecosystems (4 billion hectares), conifers account for 39% of the world's forests and about 45% of the world's annual lumber production, providing all of the world's softwood, the principal building timber of temperate regions. However, the genomes of conifers (615 of the roughly 1000 gymnosperm species) remain poorly assembled, and their genes have been only partially identified and annotated since the first Norway spruce genome was released 8 years ago. These genomes are huge and highly repetitive (70%-80% repetitive sequence), which poses great challenges for assembling the full-length reference genome that many studies seek in order to investigate conifers' unique genome features, evolutionary trajectory, adaptability, development, reproduction, and complex traits.
Recently, an international research team led by the Beijing Advanced Innovation Center for Tree Breeding by Molecular Design at Beijing Forestry University, the Umeå Plant Science Centre at the Swedish University of Agricultural Sciences, and Michigan Technological University reported a 25.4 Gb chromosome-level assembly of Chinese pine (Pinus tabuliformis), a widespread indigenous conifer and an economically and ecologically important hard pine in northern China. The team also revealed the genetic basis of key features of Chinese pine evolution. The study, published online in the journal Cell, provides an important reference for an in-depth understanding of conifer evolution.
To assemble such a huge genome with acceptable computing resources and time, the team developed a new technical strategy that optimizes the assembly process, which allowed the project to achieve a breakthrough in assembling a 25 Gb genome. In addition, massive RNA-seq data from 760 biological samples, covering 11 tissues under many normal and stress conditions, were generated to aid gene identification and to map gene, exon, and intron boundaries precisely. This led to the identification of most genes, including very long ones.
Many long introns were observed in the Chinese pine genome. The average intron length in Chinese pine is 10 kb, compared with an average of 0.5 kb across 57 other sequenced angiosperms. Interestingly, these extraordinarily long introns showed no detrimental effect on transcription; on the contrary, larger genes have evolved to exhibit higher expression levels. These efficient transcription systems may have been a driving force behind the huge genome expansion, and they may ensure that the genes remain highly functional.
Based on these high-quality resources, the researchers found that continuous expansion and slow removal of transposable elements (TEs) contribute to the one-way genome expansion of Chinese pine. They observed a recent large-scale burst of TEs that began 4 to 6 million years ago (MYA), and a considerably lower rate of LTR removal mediated by unequal recombination in Chinese pine than in plants with small genomes. The finding that all TEs in Chinese pine, regardless of insertion age, have similar methylation levels supports the hypothesis that large, repeat-rich conifer genomes may become locked down by epigenetic silencing, which reduces the frequency of repeat removal.
In the Chinese pine genome, large-scale gene duplication (91.2%) was found, arising mainly through dispersed duplication (DSD), and the significantly expanded gene families are particularly enriched in members involved in biotic and abiotic stress responses. These expansions may have contributed to the wide distribution of conifers in temperate and boreal regions. The researchers also observed a distinctive reproductive regulatory network that lacks many of the key floral signal integrators shared by most flowering plants. These results reveal that, compared with angiosperms, conifers follow a distinctive evolutionary trajectory in their adaptation to harsh conditions and in their reproductive regulation.
The genomic resources from this study provide unprecedented opportunities for a panoramic view of the complex characteristics of conifer giga-genomes, their methylation patterns, and their gene expression and regulation, as well as for evolutionary studies that require a full-length, chromosome-level assembly. In addition, the complete genome and accurate annotation of the gene space offer opportunities to study conifer-specific traits of interest through comparative genomics, GWAS, and genomics-assisted breeding. The technical strategy adopted in this project also provides an important, feasible path toward high-quality assembly and annotation of the huge genomes of other conifer species.