All data summaries are stored in Excel files with clickable navigation. Each file includes an index sheet with a clickable button for every table, and every table sheet has a button that returns to the index sheet. The files can be downloaded from this page:

1) RNAseq-Data-Summary-S1-4-7-15.xlsx: Soybean RNA-Seq Data (Tables S1 - S9)
2) RNAseq-Data-Summary-S2-4-7-15.xlsx: Specific mRNAs, Including Transcription Factor and Enzyme mRNAs, in Seed Regions, Subregions, and Tissues at Each Developmental Stage (Tables S10 - S33)
3) RNAseq-Data-Summary-S3-4-8-15.xlsx: Hierarchical Clustering of Most Varying mRNAs, Enzyme mRNAs, and Transcription Factor mRNAs in All Seed Regions, Subregions, and Tissues from Globular to Early-Maturation Stage (Tables S34 - S41)

## RNA-Seq Sequence Data Analysis

RNA-Seq reads were normalized, and pairwise comparisons between seed regions or developmental stages were carried out with the edgeR package v3.0.8 (1) to identify genes that are significantly differentially expressed in a distinct seed region or at a particular developmental stage.

## Identifying Region-Specific mRNAs

Region-specific mRNAs were identified for the globular, heart, cotyledon, and early-maturation stages as mRNAs that are at least five-fold higher, with statistical significance (FDR < 0.001, Benjamini-Hochberg multiple testing correction), in a particular seed region relative to all other seed regions at a given developmental stage. Lists of region-specific mRNAs, including those encoding enzymes in metabolic pathways and transcription factors (TFs), and overrepresented gene ontology (GO) terms are shown in Dataset S2. For example, mRNAs identified as globular suspensor-specific are at least five-fold higher, with statistical significance, relative to all other regions (embryo proper, endosperm, endothelium, inner integument, outer integument, epidermis, and hilum) at the globular stage.
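
The region-specific criterion above amounts to intersecting the genes that pass both cutoffs in every pairwise comparison. The following is a minimal sketch of that logic only, not the authors' pipeline. The input layout (a dictionary of per-comparison results holding linear fold changes and FDR values), the function name `region_specific`, and the toy values are all assumptions made for illustration. In practice the fold changes and FDR values would come from the edgeR pairwise comparisons.

```python
FOLD_CUTOFF = 5.0    # from the text: at least five-fold higher
FDR_CUTOFF = 0.001   # from the text: FDR < 0.001


def region_specific(comparisons, fold_cutoff=FOLD_CUTOFF, fdr_cutoff=FDR_CUTOFF):
    """Return the genes that pass both cutoffs against every other region.

    comparisons: {other_region: {gene: (fold_change, fdr)}}, where
    fold_change is target / other on a linear scale (hypothetical layout).
    A gene missing from any comparison is treated as not region-specific.
    """
    specific = None
    for results in comparisons.values():
        passing = {
            gene
            for gene, (fold_change, fdr) in results.items()
            if fold_change >= fold_cutoff and fdr < fdr_cutoff
        }
        # Keep only genes that have passed in every comparison so far.
        specific = passing if specific is None else specific & passing
    return specific if specific is not None else set()


# Toy, made-up values for a target region compared with two other regions.
suspensor_vs = {
    "embryo_proper": {"geneA": (8.0, 1e-5), "geneB": (6.0, 1e-4), "geneC": (2.0, 1e-6)},
    "endosperm": {"geneA": (5.5, 2e-4), "geneB": (7.0, 0.01), "geneC": (9.0, 1e-6)},
}
specific_genes = region_specific(suspensor_vs)
print(sorted(specific_genes))
```

In the toy data only `geneA` passes both cutoffs in both comparisons: `geneB` fails the FDR cutoff against endosperm, and `geneC` fails the fold-change cutoff against embryo proper.
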
## Identifying Stage-Specific mRNAs

Stage-specific mRNAs were identified in all seed regions, subregions, and tissues from the globular stage to the early-maturation stage as mRNAs that are at least five-fold higher, with statistical significance (FDR < 0.001, Benjamini-Hochberg multiple testing correction), at a particular developmental stage relative to all other developmental stages within a given seed region. For globular-, heart-, and cotyledon-stage-specific mRNAs in the embryo proper, stage-specific mRNAs are at least five-fold higher (FDR < 0.001) at a particular developmental stage relative to the two other developmental stages and to at least one of the embryo tissues at the early-maturation stage. For example, globular-stage-specific mRNAs in the embryo proper are at least five-fold higher (FDR < 0.001) relative to the heart-stage embryo proper, the cotyledon-stage embryo proper, and at least one of the early-maturation-stage embryo proper tissues. Lists of stage-specific mRNAs, including those encoding enzymes in metabolic pathways and transcription factors (TFs), and overrepresented GO terms are shown in Dataset S3.

## Identifying Unique, Mosaic, and Shared mRNAs in Seed Regions at a Specific Developmental Stage

mRNAs detected in a specific seed region, subregion, or tissue and not detected in any other sample, with statistical significance (ANOVA, P < 0.05), at a given developmental stage are defined as unique mRNAs. mRNAs detected in all seed regions, subregions, and tissues at a given developmental stage are defined as shared mRNAs. In addition, mRNAs detected in two or more seed regions, subregions, or tissues with statistical significance (ANOVA, P < 0.05) were identified as mosaic mRNAs.

## Hierarchical Clustering

Unsupervised hierarchical clustering was carried out using dChip 2011 software (2). To cluster the most varying mRNAs throughout development, genes detected in at least one region, subregion, or tissue from the globular to the early-maturation stage were selected.
These detected genes were then used to identify mRNAs that vary quantitatively (ANOVA, P < 0.001) between regions, subregions, or tissues throughout development. The selected genes were ordered by standard deviation across all regions, subregions, and tissues, and the 15,000 most varying mRNAs were selected to study mRNA accumulation using hierarchical clustering. To cluster enzyme mRNAs in metabolic pathways and TF mRNAs throughout development, 5,156 quantitatively varying enzyme mRNAs and 4,756 quantitatively varying TF mRNAs (ANOVA, P < 0.001) were identified. The 15,000 most varying mRNAs, the quantitatively varying enzyme mRNAs, and the quantitatively varying TF mRNAs were then analyzed using hierarchical clustering. Normalization of RNA-Seq reads across all samples and distance measurement between two genes and clusters were performed as described by Le et al. (3).

Bar plots of clusters were generated in R (http://www.r-project.org/) using z-score-normalized RNA-Seq read counts of all the mRNAs in each cluster identified by hierarchical clustering. To identify mRNA sets that were up-regulated at least 5-fold within each cluster (P < 0.05), samples were compared according to the criteria previously described by Le et al. (3) with the following modifications: E/B > 5, Student’s t-test P < 0.05, and 1,000 permutations to obtain a false discovery rate (FDR), where E and B represent the experimental groups and the baseline, respectively. The identities of mRNAs at least 5-fold up-regulated in the clusters, including those encoding transcription factors and enzymes in metabolic pathways, are listed in Dataset S4.

To determine the biological relationships between different soybean seed regions, subregions, and tissues throughout development, bootstrap analysis of the clustering was performed on normalized values of the 15,000 most varying mRNAs (ANOVA, P < 0.001) for biological replicates using the pvclust package (4) with default settings.
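
The two numerical steps described above, ranking genes by standard deviation to keep the most varying ones and z-score normalizing each gene's values for the bar plots, can be sketched as follows. This is an illustrative sketch rather than the dChip or R code actually used. The function names `top_variable` and `zscore`, the dictionary layout, and the use of the sample standard deviation (n - 1 denominator) are assumptions, since the text does not state which estimator was applied.

```python
from statistics import mean, stdev


def top_variable(expr, n):
    """Return the names of the n genes with the largest standard deviation.

    expr: {gene: [normalized value per sample]} (hypothetical layout).
    Uses the sample standard deviation, which is an assumption.
    """
    ranked = sorted(expr, key=lambda gene: stdev(expr[gene]), reverse=True)
    return ranked[:n]


def zscore(values):
    """Center values on their mean and scale by their standard deviation."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        # A gene with no variation carries no pattern; return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Toy, made-up expression values for three genes across three samples.
expr = {
    "g1": [1.0, 1.0, 1.0],
    "g2": [0.0, 5.0, 10.0],
    "g3": [2.0, 3.0, 4.0],
}
most_varying = top_variable(expr, 2)
scaled = {gene: zscore(expr[gene]) for gene in most_varying}
print(most_varying)
print(scaled["g3"])
```

With the real data, `n` would be 15,000 and the input would hold the normalized values for every region, subregion, and tissue.
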
Probability values (p-values) were calculated for each cluster. In addition, principal component analysis (PCA) was carried out on normalized RNA-Seq read counts for all 40 seed regions, subregions, and tissues throughout development using the scatterplot3d software package (http://www.r-project.org/) with some modifications.

## Transcription Factor Annotation and Functional Classification of Metabolic Pathway Genes

To identify soybean transcription factors, we used three databases: SoyDB (http://casp.rnet.missouri.edu/soydb/), SoybeanTFDB (http://soybeantfdb.psc.riken.jp/), and PlantTFDB (http://planttfdb.cbi.pku.edu.cn/index.php?sp=Gma). By combining the transcription factors from the three databases, we identified 5,547 transcription factors in the soybean genome. Furthermore, 6,538 genes encoding enzymes in metabolic pathways were classified into 473 metabolic pathways using SoyCyc 4.0 (http://www.soybase.org:8082/). The transcription factors and metabolic pathway enzymes are listed in Dataset S1.

## Gene Ontology (GO) Term Enrichment Analysis

Overrepresented GO terms were identified in region- and stage-specific mRNAs and in co-regulated mRNAs in the clusters identified by hierarchical clustering of the 15,000 most varying mRNAs throughout development, using GO annotation from SoyBase (http://soybase.org/genomeannotation/index.php). The GO term enrichment analysis was performed using the goseq software package (http://www.r-project.org/) with the following parameters: (i) reference: soybean genome locus, (ii) statistical test method: hypergeometric, and (iii) significance level: FDR (Benjamini-Hochberg multiple testing correction) < 0.05. Enriched GO terms (FDR < 0.05) of region-specific mRNAs, stage-specific mRNAs, and mRNAs in the clusters identified by hierarchical clustering are listed in Dataset S2, Dataset S3, and Dataset S4, respectively.

## REFERENCES

1. Robinson MD & Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11(3):R25.
2. Li C & Wong WH (2001) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol 2(8):research0032.1-research0032.11.
3. Le BH, et al. (2007) Using genomics to study legume seed development. Plant Physiol 144(2):562-574.
4. Suzuki R & Shimodaira H (2006) Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22(12):1540-1542.