Using Next-Generation Sequencing (NGS) to Survey the Transcriptome in Different Seed Compartments, Tissues, and Regions Throughout Soybean Seed Development


The Affymetrix soybean genome array was used to study gene activity in different compartments of the soybean seed at various stages of development (see the Browse link). The array was designed using publicly available ESTs (click here for more details about the soybean array). Most of these ESTs originate from reproductive and vegetative organs; very few come from libraries constructed from soybean seeds throughout development. As a result, genes active during many stages of seed development are most likely under-represented on the array. To uncover additional genes active during soybean seed development, we used next-generation sequencing to survey the transcriptome in different seed compartments, tissues, and regions across soybean seed development.


  • Soybean plants were grown in the UCLA Plant Growth Center with a 16:8 light-dark cycle at 22°C.
  • Soybean seeds (Williams 82) were fixed in 3:1 ethanol:acetic acid for 16-20 hours at 4°C, then dehydrated through 75%, 85%, and 95% ethanol, followed by three washes in 100% ethanol, one hour per wash. They were then infiltrated with 3:1, 1:1, and 1:3 ethanol:xylenes, followed by three changes of 100% xylenes, two hours per step. Finally, they were infiltrated with molten paraffin at 60°C for 1-4 days, depending on the size of the tissue.
  • Laser capture microdissection was used to isolate specific compartments of the seed from tissue sectioned at 5-10 µm. Total RNA was isolated with the Ambion RNaqueous-Micro kit (Grand Island, NY). Approximately 20 ng of total RNA was amplified with the Nugen Ovation Pico WTA system v.1 (Nugen, San Carlos, CA). Double-stranded cDNA was generated using the WT-Ovation Exon Module (Nugen, San Carlos, CA). One microgram of double-stranded cDNA was fragmented using NEB Fragmentase (NEB, Ipswich, MA) for 15 minutes at 37°C, following the standard protocol. The Illumina TruSeq DNA Sample Prep kit (Illumina, San Diego, CA) was used to prepare the Illumina library, with modifications. Specifically, the Covaris shearing step was omitted, the final PCR enrichment step was performed using Agilent Pfu Turbo Cx DNA polymerase (Agilent, Santa Clara, CA) instead of the TruSeq PCR mix, and 12 cycles of PCR were performed.
  • Total RNA was isolated from whole seeds and seed parts using the Concert Plant RNA Reagent (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. Total RNA was treated with RNase-free DNaseI (Ambion, Austin, TX) and subjected to two rounds of poly-A+ RNA selection using oligo-dT25 magnetic beads (Dynabeads; Invitrogen, Carlsbad, CA). Approximately 100 ng of poly-A+ RNA was used for mRNA-Seq library preparation according to the Illumina mRNA-Seq protocol. Adapter-ligated cDNAs were size-selected on an agarose gel, and the purified cDNAs were amplified by PCR for 15 cycles.

Data Analysis

Briefly, raw reads generated by the Illumina sequencer are processed to remove low-quality reads and rRNA reads. Filtered reads are then mapped to several references using Bowtie (Langmead et al., Genome Biology, 2009). The references include the entire assembled genome sequence (version 1.0), the predicted gene models, and the predicted transcripts. Including all three references allows gene models to be identified from reads that map to exon-exon junctions, novel exons (e.g., within predicted introns), and novel untranslated regions. Read counts for Glycine max gene models (v1.1) were computed using BedTools. Reads per kilobase per million mapped reads (RPKM) values were calculated according to Mortazavi et al. (2008). Raw unprocessed sequences and normalized RPKM values for each gene model have been submitted to NCBI Gene Expression Omnibus (
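As a rough illustration of the RPKM normalization mentioned above, the following Python sketch applies the commonly cited form of the Mortazavi et al. (2008) definition: read count times 10^9, divided by the product of gene model length in bases and the total number of mapped reads. It is not the pipeline used for this data set. The function name, gene IDs, read counts, lengths, and library size are all hypothetical values chosen only to show the arithmetic.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene model per million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical inputs (not taken from the soybean data set):
# per-gene read counts, gene model lengths in bases, and library size.
counts = {"gene_A": 500, "gene_B": 50}
lengths_bp = {"gene_A": 2000, "gene_B": 500}
total_mapped = 10_000_000

rpkm_values = {
    gene: rpkm(counts[gene], lengths_bp[gene], total_mapped)
    for gene in counts
}

for gene, value in sorted(rpkm_values.items()):
    print(f"{gene}\t{value:.2f}")
```

Dividing by gene length and library size makes values comparable across gene models of different lengths and across libraries of different sequencing depth, which is the purpose of the normalization step described above.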

The major conclusions to date are:

  • Using next-generation sequencing technology, we estimate that there are at least 51,000 diverse mRNAs required for the differentiation of all soybean seed compartments, regions, and tissues across development (i.e., genes required to "make a soybean seed").
  • Seed-stage and compartment-specific up-regulated mRNA sets (including transcription factor and metabolic-pathway mRNAs) have been identified that may play important roles in seed differentiation and/or function.
  • Large quantitative changes in gene activity appear to be responsible for the majority of stage-specific physiological and functional events that program soybean seed formation.
  • Analysis of RNA-Seq data produced the same conceptual results as similar data obtained previously using Affymetrix GeneChip microarrays.
  • Gene expression patterns observed in different soybean seed compartments, tissues, and regions during development are similar to those in Arabidopsis seeds, suggesting that major spatial and temporal gene regulatory programs are conserved in flowering plants.

Data Set

Click here to see the complete list of soybean RNA-Seq data sets generated using next-generation sequencing.

RNA-Seq Analysis Tools

Click to browse the mRNA profiles of compartments during soybean seed development by gene ID, transcription factor, or metabolic pathway.