About This Project

Soybean is the most important oilseed crop in the world, with production in 2001/02 of about 184.3 million tons of beans from 79.4 million hectares. Soybean seeds have high nutritional value: they are rich in proteins and oils, serve as a major source of food for humans, and have recently emerged as a promising resource for biofuel. Soybean seed development is triggered by a double-fertilization process that leads to the differentiation of the embryo, endosperm, and seed coat, the major regions of the seed, all of which are essential for seed viability and plant reproduction. Many different developmental and physiological events occur within each soybean seed region during development, and these events are programmed, in part, by the activity of different genes. Seed development, therefore, is the result of a mosaic of different gene expression programs occurring in parallel in different seed compartments. What these programs are, and how they are organized into unique regulatory circuits within the plant genome to “make a seed”, remain major unanswered questions.

The long-term objective of this project is to identify the regulatory networks required for the development of soybean seeds and the seeds of flowering plants. Our research goals are to:

  • Identify the spectrum of genes that are active in different seed compartments (embryo, endosperm, seed coat), regions (embryo proper, suspensor), and tissue types (epidermis, inner integument, outer integument, endothelium, hilum) at major developmental stages (globular, heart, cotyledon, and early maturation).
  • Characterize the spectrum of genes that are active in embryo regions (embryo proper, suspensor, cotyledon, axis) and tissues (epidermis, parenchyma, root and shoot meristems, vascular bundle) at different developmental stages.
  • Describe the cellular processes that occur in different seed compartments, regions, and tissues using bioinformatic approaches.
  • Determine the distribution of transcription factor mRNAs within a seed at different developmental periods, and identify transcription factor mRNAs that are unique for specific compartments, regions, and tissues.
  • Estimate the number of genes required to program all of seed development (i.e., the number of genes required to “make a seed”).
  • Down-regulate compartment-, region-, and tissue-specific transcription factors to determine their roles in seed development.

To date, we have:

  • Finished profiling, using laser capture microdissection (LCM) and GeneChip technologies, mRNAs from every soybean and Arabidopsis seed compartment, region, and tissue throughout development, providing the first comprehensive global atlas of gene activity during all of seed development.
  • Completed 166 GeneChip experiments on 76 soybean and Arabidopsis seed compartments, and uploaded all RNA profiling datasets to GEO and to this interactive NSF-sponsored seed genes website, which can be used by the plant research community (Figure 1).
  • Provided the first comprehensive insights into gene activity in every part of a seed throughout seed development, a novel resource for understanding the spectrum of processes required to “make a seed”.
  • Generated ~65 different transgenic soybean lines containing either seed promoter/GUS constructs, to characterize promoter transcriptional specificity, or promoter/RNAi constructs that target seed-compartment-specific transcription factor mRNAs. A complete list of the RNAi lines is also available.
  • Sequenced, using high-throughput sequencing technology, ~700 million reads (53.2 Gb) of mRNAs from soybean seeds at five major developmental stages, three soybean seed regions captured by LCM, and soybean seed cotyledons from maturation through germination (Figure 1).

We found:

  • At least 54,000 genes are active during soybean seed development (Figure 2).
  • In both soybean and Arabidopsis, a similar number of mRNAs is required to “make a seed” at each developmental stage, from fertilization through maturation (Figures 3 and 4).
  • Different seed compartments require approximately the same number of genes to carry out their specific roles in seed development (Figures 3 and 4).
  • Most of the diverse seed mRNAs are shared by different seed compartments at all developmental stages (Figures 3 and 4).
  • Each seed compartment has a small set of compartment-, region-, and tissue-specific mRNAs, including those encoding transcription factors (Figures 3 and 4).
  • The majority of shared mRNAs are regulated quantitatively.
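
The distinction drawn above between shared and compartment-specific mRNAs can be illustrated with simple set operations. The following is a minimal sketch, not the project's actual analysis pipeline: the function name, the compartment labels, and the gene identifiers (g1, g2, ...) are hypothetical, and it assumes each compartment has already been reduced to a set of detected gene identifiers.

```python
from typing import Dict, Set, Tuple


def summarize_compartments(
    detected: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Dict[str, Set[str]]]:
    """Return (union, shared, specific) for per-compartment gene sets.

    union    : genes detected in at least one compartment
    shared   : genes detected in every compartment
    specific : for each compartment, genes detected only there
    """
    union = set().union(*detected.values())
    shared = set.intersection(*detected.values()) if detected else set()

    specific = {}
    for name, genes in detected.items():
        # Pool everything detected in the other compartments.
        others = set().union(
            *(g for other, g in detected.items() if other != name)
        )
        specific[name] = genes - others
    return union, shared, specific


# Hypothetical example data: gene IDs and compartments are placeholders.
example = {
    "embryo_proper": {"g1", "g2", "g3", "g4"},
    "suspensor": {"g1", "g2", "g3", "g5"},
    "endosperm": {"g1", "g2", "g6"},
}

union, shared, specific = summarize_compartments(example)
print(len(union))                         # 6
print(sorted(shared))                     # ['g1', 'g2']
print(sorted(specific["embryo_proper"]))  # ['g4']
```

In this toy example, g3 is detected in two of the three compartments, so it counts toward the union but is neither shared by all compartments nor specific to one. Whether a shared mRNA differs in abundance between compartments is a separate, quantitative question that presence/absence sets do not capture.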

Figure 1. Summary of All Soybean and Arabidopsis Experiments Carried Out To Date. Summary of GeneChip, qRT-PCR, NextGen Sequencing, promoter-GUS, and RNAi experiments carried out to date, including GEO and GenBank accession numbers for relevant datasets. A list of all seed compartments, tissues, and regions used for soybean (left) and Arabidopsis (right) LCM and GeneChip experiments. PREGLOB, GLOB, HRT, LCOT, BCOT, EM, and MG refer to pre-globular, globular, heart, linear-cotyledon, bent-cotyledon, early-maturation, and mature green-stage seeds, respectively.
Figure 2. Soybean gene models detected within mRNA populations during soybean seed development and in specific seed compartments and tissues through RNASeq profiling experiments. Whole seed, whole cotyledons, and LCM refer to mRNA sequences obtained from the entire seed, cotyledons only, and specific seed compartments and regions, respectively. Abbreviations are defined in the legend to Figure 1. Union refers to the union of diverse mRNA sets detected throughout seed development.
Figure 3. Gene activity in soybean globular-stage seed. (A-B) Globular-stage whole-mount seed and longitudinal paraffin section used for LCM. (B) ep, epd, ent, es, hi, ii, oi, and sus refer to embryo proper, epidermis, endothelium, endosperm, hilum, inner integument, outer integument, and suspensor of globular-stage seed, respectively. (C-F) Summary of GeneChip experiments with mRNAs from globular-stage seed compartments, tissues, and regions captured using LCM. BR, TF, and WS refer to biological replicates, transcription factor, and whole seed, respectively. (G) Soybean gene models represented in globular-stage seed mRNA populations using GeneChip and RNASeq experiments. ND refers to not determined.
Figure 4. Gene activity in Arabidopsis globular-stage seed. (A) Globular-stage seed longitudinal paraffin section showing tissues and regions captured using LCM. (B) Summary of GeneChip experiments with mRNAs from globular-stage seed regions and tissues captured using LCM. GSC, PEN, MCE, EP, SUS, CZE, and CZSC refer to general seed coat, peripheral endosperm, micropylar endosperm, embryo proper, suspensor, chalazal endosperm, and chalazal seed coat, respectively. Other abbreviations are defined in the legend to Figure 3.