Large-scale long terminal repeat insertions found to produce a significant set of novel transcripts in cotton
Transposable elements (TEs), especially long terminal repeat (LTR) retrotransposons, are known to play important roles in shaping basic genome structure and influencing the expression of functional genes. Insertion of TE or LTR fragments may also create novel transcription start sites (TSSs) that initiate transcription in the host genome. In maize, a study combining de novo and homology-based strategies suggested that new intergenic transcripts were created by LTR retrotransposon insertions.
Although these studies predicted that transposon insertions could produce new transcripts, they did not reveal the evolutionary, regulatory and functional mechanisms behind these transcripts. Furthermore, no systematic genome-wide study of how extensively intergenic transcripts are produced has been reported so far.
In a study published in the journal Science China Life Sciences, Yuxian Zhu and colleagues applied extremely deep sequencing (from 10 G to over 100 G of data per cotton sample) and discovered more than 10,000 novel genes, most of which had not been identified in previous genome assemblies and annotations. Most of these transcripts were protein-coding and arose from LTR insertions in various ways.
The team found that the additional transcripts appeared mainly in regions annotated as intergenic in the previously published genome. In the 100 G data set, a total of 10,284 new intergenic genes were discovered: 10,032 protein-coding genes and 252 lncRNA genes. By contrast, the number of genic genes did not increase significantly between the shallower and deeper data sets. In general, the new intergenic transcripts were expressed at very low levels, and most were single-exon transcripts.
Because of their low expression levels, these new intergenic transcripts appeared only when sequencing depth reached 30 G to 100 G. ChIP-seq analysis with antibodies against H3K4me3, H3K27ac and H3K9me2 suggested that most of these new transcripts are probably not transcribed by RNA polymerase II: only 30% of the intergenic transcripts carried one or both transcription activation markers, whereas more than 70% of genic genes carried them.
MNase-seq analysis revealed that in genes lacking the transcription activation markers, the +1 and -1 nucleosomes sat significantly closer together (only 117 ± 1.4 bp apart), whereas genes carrying the activation markers showed a much wider gap (about 403.5 ± 46.0 bp). Genes lacking either of the two activation markers tended to form their -1 nucleosome in close proximity to the +1 nucleosome, which may impede the binding of RNA polymerase.
Evolutionary analysis showed that the genic genes originated during one of the whole-genome duplication events, around 130.8 or 16 million years ago (MYA), whereas the intergenic (ITG) transcripts evolved around 2.3 MYA as a result of the most recent retrotransposon insertion event.
Characterizing these weakly transcribed intergenic transcripts will help us understand the biological roles of retrotransposons during speciation and diversification. The study may also help elucidate the mechanisms underlying intergenic transcript expression and cotton fiber development.
More information: Yan Yang et al, Large-scale long terminal repeat insertions produced a significant set of novel transcripts in cotton, Science China Life Sciences (2023). DOI: 10.1007/s11427-022-2341-8
Provided by Science China Press