snoRNAs and Long-term Risk of Diabetic Complications

Jean E. Schaffer, M.D.

Project Overview:

From the approximately 1,000 subjects enrolled in the Medalist Study, Joslin investigators provided us with high-quality DNA samples for this case-control study of snoRNA sequence variation and extremes of phenotype for diabetic retinopathy: 253 cases with no or mild non-proliferative retinopathy (NPDR) and 127 controls with severe proliferative diabetic retinopathy (PDR), matched (2:1) for sex, duration of diabetes, lipids, and blood pressure. Although our Longer Life Foundation grant provided funding for sequencing of 192 samples, we successfully leveraged this award to obtain additional funds from the NIH Diabetes Complications Consortium (U24 DK076169) to sequence additional samples in an effort to maximize the statistical power of our approach. For each DNA sample, we assessed DNA purity by absorbance with a UV spectrometer (OD at 260 and 280 nm) and verified quantification of input DNA using a fluorescence-based method.
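As an illustration of the absorbance-based purity check described above, the following minimal Python sketch computes the A260/A280 ratio for a sample. A ratio near 1.8 is generally regarded as indicating DNA that is largely free of protein contamination. The function names and the 1.7 to 2.0 acceptance window are assumptions made for this example; the report does not state the cutoff that was actually applied.

```python
def a260_a280_ratio(a260: float, a280: float) -> float:
    """Return the ratio of absorbance at 260 nm to absorbance at 280 nm."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


def passes_purity(a260: float, a280: float,
                  low: float = 1.7, high: float = 2.0) -> bool:
    """Flag a sample as acceptable if its A260/A280 ratio lies in [low, high].

    The default window is an illustrative assumption, not a value from the report.
    """
    return low <= a260_a280_ratio(a260, a280) <= high
```

A sample with readings of 0.9 at 260 nm and 0.5 at 280 nm would pass under these assumed limits, whereas one with readings of 0.6 and 0.5 would be flagged for possible contamination.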

Following identification of genomic coordinates for the 332 annotated or predicted human snoRNAs across the genome (hg19 build, UCSC and Ensembl databases), we used Illumina’s DesignStudio application to design amplicons for sequencing these regions. Based on criteria including specificity, GC nucleotide content, length, and minimal probe-probe interactions, we successfully designed amplicons to sequence 294 snoRNAs. The remaining 38 snoRNAs failed in the design process for reasons that included the presence of highly related homologs, regions with greater than 80% GC content, homopolymer and repetitive elements, and the presence of known single nucleotide polymorphisms in the region of probe hybridization. Probes for the amplicons were synthesized and provided by Illumina, multiplexed in a 96-well format that was optimized for input of 250 ng DNA from an individual subject in each well. We used reagents supplied with the TruSeq Custom Amplicon Library Preparation Kit to generate indexed sequencing libraries from each subject (well) and to normalize and pool the libraries prior to submission for MiSeq analysis.

Thus far, we have completed library preparation and sequencing for three of four plates. At present, data from the first plate, containing 95 samples from Joslin Medalist cases (and one well to control for library generation procedures), have been aligned and analyzed. We report this interim analysis below while sequencing, alignment, and analysis are in progress for the remaining cases and matched controls.

Our approach to sequencing is providing high-quality data on the majority of snoRNA targets. The DNA samples yielded on average 162,000 total reads and 151,000 mapped reads (average mapped percentage of 93%). Greater than 98% of reads corresponded to anticipated targets. On average, there were 458 reads/snoRNA/sample. Since each indexed sequencing library contained DNA from a single subject with 2 alleles, we estimated that 20 reads would be sufficient to identify variant sequences. All of the samples had >93% of the targeted snoRNAs covered by ≥20 reads, and nearly all of the targeted snoRNAs had >20 reads in 90-100% of samples.
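One way to reason about the 20-read threshold is with a simple binomial model, sketched below in Python. The sketch assumes that each read samples one of the subject's two alleles independently with equal probability and that sequencing error is ignored. It also assumes a hypothetical calling rule requiring a minimum number of variant-supporting reads. Neither assumption is taken from the report, which does not describe the variant-calling criteria.

```python
from math import comb


def prob_detect_het(depth: int, min_alt_reads: int,
                    alt_fraction: float = 0.5) -> float:
    """Probability that at least `min_alt_reads` of `depth` reads carry the
    variant allele at a heterozygous site, under a binomial sampling model.

    `alt_fraction` is the expected fraction of reads from the variant allele
    (0.5 for an ideal heterozygote); this is a modeling assumption.
    """
    # Probability of seeing fewer than min_alt_reads variant reads (a "miss").
    miss = sum(
        comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - miss


# With 20 reads and an assumed requirement of 3 variant-supporting reads,
# a heterozygous variant would be missed in roughly 2 of every 10,000 cases.
p = prob_detect_het(depth=20, min_alt_reads=3)
```

Under this idealized model, `p` is approximately 0.9998. The figure would be lower if amplification favors one allele, which can be explored by lowering `alt_fraction`.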

While a central goal of this project will be to compare case sequences to those from matched Joslin Medalist controls, at this interim analysis, comparison of the snoRNA sequences from these 95 cases to the hg19 build of the human genome (reference sequence) and data available from the 1000 Genomes Project supports our hypothesis that there is substantial variation in snoRNA sequences. We found 54 variants in 35 snoRNAs, including single and dinucleotide substitutions and deletions of one or two nucleotides. While a number of these variants relative to the reference sequence were previously reported in the 1000 Genomes data, for 8 snoRNAs, variant sequences were observed that have not been previously described. For example, snord13 had seven single nucleotide variants in six Joslin cases that were not observed in 1000 Genomes. In a second example, 6 of the 95 Joslin cases carried the same nucleotide variant in snord45B that was not observed in 1000 Genomes. Some of these variants fall within the antisense element that interacts with the snoRNA targets through nucleotide complementarity, whereas others have potential to disrupt interactions of the snoRNA with critical protein partners. In a third example, one of the Joslin cases harbored a variant nucleotide close to the end of the snoRNA in a region that has potential to perturb processing of the snoRNA from its precursor intron lariat. Together, these preliminary findings indicate that variation in snoRNA sequences is found in the human genome, setting the stage for us to determine whether variation in these non-coding elements is associated with altered susceptibility to diabetic complications.
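The comparison against 1000 Genomes described above amounts to separating observed variants into previously reported and previously undescribed sets. The Python sketch below shows one minimal way to express that step. The `Variant` type, the function name, and the coordinates in the toy data are hypothetical placeholders chosen for illustration; they are not variants from this study, and the actual analysis pipeline is not described in the report.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Variant(NamedTuple):
    """A variant keyed by position and alleles relative to the reference."""
    chrom: str
    pos: int
    ref: str
    alt: str


def split_known_novel(
    observed: Iterable[Variant], known: Iterable[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Partition observed variants into (previously reported, novel)."""
    known_set = set(known)
    reported: List[Variant] = []
    novel: List[Variant] = []
    for v in observed:
        (reported if v in known_set else novel).append(v)
    return reported, novel


# Toy placeholder data (NOT real study variants or real coordinates).
catalog = [Variant("chrA", 100, "A", "G")]
calls = [Variant("chrA", 100, "A", "G"), Variant("chrA", 142, "C", "T")]
reported, novel = split_known_novel(calls, catalog)
```

Matching on exact position and alleles is a deliberate simplification. A real comparison would also need consistent normalization of deletions and dinucleotide substitutions before lookup.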

Progress Report:

We have recently discovered that snoRNAs, a class of small RNA molecules that do not encode proteins, can function as critical mediators of oxidant stress in the setting of high levels of metabolites. Because oxidant stress is a key pathway in the development of diabetic complications, this project is testing the hypothesis that genetic variation in these RNAs underlies differences in susceptibility to diabetic complications. In the first year of this project, we have sequenced the DNA regions that encode these small RNAs in 400 people with diabetes who have marked differences in the degree of eye complications, despite all having had type 1 diabetes for 50 years. We have discovered rare variants in these non-coding RNA sequences that are associated with relative protection from diabetic retinopathy. Current work is focused on sequencing in another group of diabetic subjects to validate these findings. We will test the effects of variant sequences on the expression and function of these non-coding RNAs. Our study is the first to identify sequence variants in these non-coding RNAs that are related to diabetic complications. In the era of personalized medicine, these variant sequences may provide prognostic biomarkers for guiding individualized disease management.