Using Next-Generation Sequencing (NGS) to Survey the Transcriptome in Different Seed Compartments, Tissues, and Regions Throughout Soybean Seed Development


The Affymetrix soybean genome array was used to study the activity of genes in different compartments of the soybean seed at various stages of development (see the Browse link). The soybean array was designed using publicly available ESTs (click here for more details about the soybean array). Most of these ESTs originate from reproductive and vegetative organs, and very few come from libraries constructed from soybean seeds throughout development. As a result, genes active during many stages of seed development are most likely under-represented on the array. To uncover additional genes active during soybean seed development, we used next-generation sequencing to survey the transcriptome in different seed compartments, tissues, and regions across soybean seed development.


  • Soybean plants were grown in the UCLA Plant Growth Center with a 16:8 light-dark cycle at 22°C.
  • Soybean seeds (Williams 82) were fixed in 3:1 ethanol:acetic acid for 16-20 hours at 4°C, then dehydrated through 75%, 85%, and 95% ethanol, followed by three washes in 100% ethanol, at one-hour intervals. They were then infiltrated with 3:1, 1:1, and 1:3 ethanol:xylenes, followed by three changes of 100% xylenes, at two-hour intervals. Finally, they were infiltrated with molten paraffin at 60°C for 1-4 days, depending on the size of the tissue.
  • Laser capture microdissection was used to isolate specific compartments of the seed after tissues were cut into 5-10 µm sections. Total RNA was isolated with the Ambion RNAqueous-Micro kit (Grand Island, NY). Approximately 20 nanograms of total RNA were amplified with the Nugen Ovation Pico WTA system v.1 (Nugen, San Carlos, CA). Double-stranded cDNA was generated using the WT-Ovation Exon Module (Nugen, San Carlos, CA). One microgram of double-stranded cDNA was fragmented using NEB Fragmentase (NEB, Ipswich, MA) for 15 minutes at 37°C according to the standard protocol. The Illumina TruSeq DNA Sample Prep kit (Illumina, San Diego, CA) was used to prepare the Illumina library with the following modifications: the Covaris shearing step was omitted, the final PCR enrichment step was performed using Agilent Pfu Turbo Cx DNA polymerase (Agilent, Santa Clara, CA) instead of the TruSeq PCR mix, and 12 cycles of PCR were performed.
  • Total RNA was isolated from whole seeds and seed parts using the Concert Plant RNA Reagent (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. Total RNA was treated with RNase-free DNaseI (Ambion, Austin, TX) and subjected to two rounds of poly-A+ RNA selection using oligo-dT25 magnetic beads (Dynabeads; Invitrogen, Carlsbad, CA). Approximately 100 nanograms of poly-A+ RNA were subjected to mRNA-Seq library preparation according to the Illumina mRNA-Seq protocol. Adapter-ligated cDNAs were size selected on an agarose gel, and purified cDNAs were amplified by PCR for 15 cycles.

Data Analysis

Briefly, raw reads generated by the Illumina sequencing machine are processed to remove low-quality and rRNA reads. Filtered reads are then mapped to several references using Bowtie (Langmead et al., 2009). The references used for mapping include the entire assembled genome sequence (version 1.0), the predicted gene models, and the predicted transcripts. Mapping against all three references allows gene models to be identified from reads that map to exon-exon junctions, novel exons (e.g., within predicted introns), and novel untranslated regions. Read counts for Glycine max gene models (v1.1) were computed using BedTools. Reads per kilobase per million (RPKM) values were calculated according to Mortazavi et al. (2008). Raw unprocessed sequences and normalized RPKM values for each gene model have been submitted to the NCBI Gene Expression Omnibus (
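The RPKM normalization mentioned above can be sketched in a few lines. This is a minimal illustration, not the pipeline actually used for this study: it assumes the common form of the Mortazavi et al. definition, RPKM = 10^9 x C / (N x L), where C is the number of reads assigned to a gene model, L is the gene model length in base pairs, and N is the total number of mapped reads in the library. The function name, gene names, counts, and lengths below are all hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene model per million mapped reads.

    Assumes RPKM = 1e9 * C / (N * L); which reads count toward N
    (e.g., all mapped reads vs. reads on gene models) is an assumption here.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical per-gene read counts (e.g., as produced by a counting tool)
# and gene model lengths in base pairs.
counts = {"gene_a": 500, "gene_b": 500, "gene_c": 0}
lengths = {"gene_a": 1000, "gene_b": 2000, "gene_c": 1500}
total_mapped = 10_000_000  # hypothetical library size

rpkm_values = {
    gene: rpkm(counts[gene], lengths[gene], total_mapped) for gene in counts
}

for gene, value in rpkm_values.items():
    print(f"{gene}\t{value:.2f}")
```

In this toy example, gene_a and gene_b have the same read count, but gene_b is twice as long, so its RPKM is half that of gene_a. This length correction is what makes values comparable between gene models within a library, while the per-million term makes them comparable between libraries of different depth.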

The major conclusions to date are:

  • Using next-generation sequencing technology, we estimate that there are at least 51,000 diverse mRNAs required for the differentiation of all soybean seed compartments, regions, and tissues across development (i.e., genes required to "make a soybean seed").
  • Seed-stage and compartment-specific up-regulated mRNA sets (including transcription factor and metabolic-pathway mRNAs) have been identified that may play important roles in seed differentiation and/or function.
  • Large quantitative changes in gene activity appear to be responsible for the majority of stage-specific physiological and functional events that program soybean seed formation.
  • Analysis of RNA-Seq data produced the same conceptual results as similar data obtained previously using Affymetrix GeneChip microarrays.
  • Gene expression patterns observed in different soybean seed compartments, tissues, and regions during development are similar to those in Arabidopsis seeds, suggesting that major spatial and temporal gene regulatory programs are conserved in flowering plants.

Data Set

Click on a GEO accession number to download the data for that dataset.

Study Dataset GEO Accessions
Transcriptome Profiling of the Soybean Life Cycle
Globular Stage Seeds GSM721725
Heart Stage Seeds GSM721726
Cotyledon Stage Seeds GSM721727
Early Maturation Stage Seeds GSM721728
Dry Seeds GSM721729
Trifoliate leaves GSM721730
Roots GSM721731
Stems GSM721732
Floral Buds GSM721733
Whole seedlings six days after imbibition GSM721734
Transcriptome Profiling of Soybean Seed Compartments Using LCM
Globular Stage Embryo Proper GSM721717
Globular Stage Suspensor GSM721718
Early Maturation Seed Coat Parenchyma GSM721719
Transcriptome Profiling of Soybean Embryonic Cotyledon Before and After Germination
Mid-Maturation Cotyledon GSM721277
Late-Maturation Cotyledon GSM721278
Seedling Cotyledon GSM721280
Transcriptome Profiling of Soybean Early Maturation Seed Parts
Embryonic Cotyledons GSM1213856
Embryonic Axis GSM1213857
Seed Coat GSM1213855
Transcriptome Profiling of Soybean Seed Compartments at Early Maturation Stage Using LCM
Axis Epidermis (3 BRs) GSM1398252/GSM1398253/GSM1398254
Axis Stele (3 BRs) GSM1398262/GSM1398263/GSM1398264
Axis Vascular (3 BRs) GSM1123207/GSM1123208/GSM1398265
Axis Parenchyma (3 BRs) GSM1123204/GSM1123205/GSM1123206
Plumule (3 BRs) GSM1398255/GSM1398256/GSM1398257
Root Tip (3 BRs) GSM1123218/GSM1123219/GSM1398258
Shoot Meristem (3 BRs) GSM1398259/GSM1398260/GSM1398261
Cotyledon Abaxial Parenchyma (3 BRs) GSM1398266/GSM1398267/GSM1398268
Cotyledon Adaxial Parenchyma (3 BRs) GSM1398272/GSM1398273/GSM1398274
Cotyledon Abaxial Epidermis (3 BRs) GSM1123209/GSM1123210/GSM1123211
Cotyledon Adaxial Epidermis (3 BRs) GSM1398269/GSM1398270/GSM1398271
Cotyledon Vascular Bundle (2 BRs) GSM1123212/GSM1123213/GSM1123214
Endosperm (3 BRs) GSM1123214/GSM1398276/GSM1398277
Hilum (3 BRs) GSM1123215/GSM1123216/GSM1123217
Seed Coat Parenchyma (3 BRs) GSM1398279/GSM1398280/GSM1123225
Seed Coat Hourglass (3 BRs) GSM1123220/GSM1123221/GSM1123222
Seed Coat Palisade (2 BRs) GSM1123223/GSM1123224/GSM1398278
Transcriptome Profiling of Soybean Seed Compartments at Cotyledon Stage Using LCM
Embryo Proper (3 BRs) GSM1385450/GSM1385451/GSM1385452
Cotyledon (3 BRs) GSM1385456/GSM1385457/GSM1385458
Axis (3 BRs) GSM1385453/GSM1385454/GSM1385455
Endosperm (3 BRs) GSM1385459/GSM1385460/GSM1385461
Seed Coat Endothelium (3 BRs) GSM1385462/GSM1385463/GSM1385464
Seed Coat Inner Integument (3 BRs) GSM1385471/GSM1385472/GSM1385473
Seed Coat Outer Integument (3 BRs) GSM1385475/GSM1385476/GSM1385477
Seed Coat Hilum (3 BRs) GSM1385468/GSM1385469/GSM1385460
Seed Coat Epidermis (3 BRs) GSM1385465/GSM1385466/GSM1385467
Suspensor (3 BRs) GSM1385477/GSM1385478/GSM1385479
Transcriptome Profiling of Soybean Seed Compartments at Heart Stage Using LCM
Embryo Proper (3 BRs) GSM1380799/GSM1380800/GSM1380801
Endosperm (3 BRs) GSM1380802/GSM1380803/GSM1380804
Seed Coat Endothelium (3 BRs) GSM1380805/GSM1380806/GSM1380807
Seed Coat Epidermis (3 BRs) GSM1380808/GSM1380809/GSM1380810
Seed Coat Hilum (3 BRs) GSM1380811/GSM1380812/GSM1380813
Seed Coat Inner Integument (4 BRs) GSM1380814/GSM1380815/GSM1380816
Seed Coat Outer Integument (3 BRs) GSM1380817/GSM1380818/GSM1380819
Suspensor (3 BRs) GSM1380820/GSM1380821/GSM1380822
Transcriptome Profiling of Soybean Seed Compartments at Globular Stage Using LCM
Embryo Proper (3 BRs) GSM1380774/GSM1380775/GSM1380776
Endosperm (3 BRs) GSM1380777/GSM1380778/GSM1380779
Seed Coat Endothelium (3 BRs) GSM1380780/GSM1380781/GSM1380782
Seed Coat Epidermis (3 BRs) GSM1380783/GSM1380784/GSM1380785
Seed Coat Hilum (3 BRs) GSM1380786/GSM1380787/GSM1380788
Seed Coat Inner Integument (4 BRs) GSM1380789/GSM1380790/GSM1380791/GSM1380792
Seed Coat Outer Integument (3 BRs) GSM1380793/GSM1380794/GSM1380795
Suspensor (3 BRs) GSM1380796/GSM1380797/GSM1380798

Note: LCM - Laser Capture Microdissection; BR - Biological Replicate.
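For readers who prefer to work with the listing programmatically, the rows above can be parsed into a label, a stated replicate count, and a list of accessions. The sketch below is only an illustration: the helper names are hypothetical, the row format is inferred from this listing, and the link pattern assumes the standard NCBI GEO accession viewer URL rather than anything specified on this page.

```python
import re

# Assumed GEO accession viewer URL pattern (not stated on this page).
GEO_BASE = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc="

# A dataset row: label, optional "(N BRs)", then one or more GSM accessions
# separated by "/". Study title rows have no accession and do not match.
ROW = re.compile(
    r"^(?P<label>.+?)(?:\s+\((?P<n>\d+) BRs?\))?\s+(?P<acc>GSM\d+(?:/GSM\d+)*)$"
)


def parse_row(line):
    """Return a dict for a dataset row, or None for any other line."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    n = match.group("n")
    return {
        "label": match.group("label"),
        "replicates": int(n) if n else None,
        "accessions": match.group("acc").split("/"),
    }


def geo_url(accession):
    """Build a GEO accession viewer link for one accession."""
    return GEO_BASE + accession


rows = [
    "Transcriptome Profiling of the Soybean Life Cycle",
    "Globular Stage Seeds GSM721725",
    "Axis Epidermis (3 BRs) GSM1398252/GSM1398253/GSM1398254",
]

for line in rows:
    parsed = parse_row(line)
    if parsed is None:
        continue  # study title or other non-dataset line
    for accession in parsed["accessions"]:
        print(parsed["label"], geo_url(accession))
```

Comparing the "replicates" field with the length of "accessions" is a quick way to spot rows where the stated number of biological replicates and the number of listed accessions disagree.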

RNA-Seq Analysis Tools

Click to browse the mRNA profiles of compartments during soybean seed development by gene ID, transcription factor, or metabolic pathway.