All datasets are stored in Excel files with clickable navigation buttons. Each file includes an index sheet that lists a button for every table, and every table sheet has a button that returns to the index sheet. Files can be downloaded from this page:

1) GeneChip-Dataset-S1-1-14-13.xlsx: Soybean GeneChip Data (Tables S1 - S5)
2) GeneChip-Dataset-S2-1-14-13.xlsx: Region-, Subregion-, and Tissue-Specific mRNAs at Each Individual Developmental Stage Identified by GeneChip (Tables S7 - S25)
3) GeneChip-Dataset-S3-2-28-13.xlsx: Hierarchical Clustering of Most Varying mRNAs, Enzyme mRNAs, and Transcription Factor mRNAs at Each Individual Developmental Stage (Tables S27 - S62)
4) GeneChip-Dataset-S4-3-14-13.xlsx: Hierarchical Clustering of Most Varying mRNAs, Enzyme mRNAs, and Transcription Factor mRNAs in All Seed Regions, Subregions, and Tissues from Globular to Early-Maturation Stage (Tables S63 - S79)

## GeneChip Data Processing and Analysis

Signal intensities and detection calls were determined as described by Le et al. {Le, 2010 #1}. All image files (.cel) and tab-delimited text files (.txt) containing data analyzed with MAS 5.0 {Redman, 2004 #2} were deposited in the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) as described in Materials and Methods. We also created an interactive web site (http://seedgenenetwork.net/soybean) for browsing mRNA profiles and comparing gene activity in different seed regions, subregions, and tissues throughout soybean development. Each probe set was manually assigned a consensus detection call of PP, AA, or INS as described by Le et al. {Le, 2010 #1}.

Identifying Region-, Subregion-, or Tissue-Specific mRNAs and Shared mRNAs at a Specific Developmental Stage. Region-, subregion-, or tissue-specific mRNAs and shared mRNAs were identified using the criteria described by Le et al. {Le, 2010 #1}.
Identifying Shared mRNAs by Mosaic Combination at a Specific Developmental Stage. mRNAs shared in a mosaic pattern at a specific developmental stage were identified using the criteria described by Le et al. {Le, 2010 #1}.

Identifying Embryo Proper-, Suspensor-, and Seed Coat Tissue-Specific mRNAs at Globular Stage. Probe sets with a detection call of PP in embryo proper and AA in suspensor were classified as embryo proper-specific mRNAs, and probe sets with a detection call of PP in suspensor and AA in embryo proper were classified as suspensor-specific mRNAs. Similarly, mRNAs specific to each seed coat layer were identified by filtering for probe sets with a detection call of PP in one seed coat tissue and AA in all other seed coat tissues.

## Hierarchical Clustering

Unsupervised hierarchical clustering was carried out using dChip 2004 software {Li, 2001 #10}. To cluster the most varying mRNAs at each developmental stage and throughout development from the globular to the early-maturation stage, probe sets with a consensus call of PP in at least one region, subregion, or tissue were selected from the 37,593 probe sets represented on the Soybean Genome Array. These probe sets were then used to identify mRNAs that vary quantitatively between regions, subregions, or tissues at each developmental stage and throughout development. The selected probe sets were ranked by the standard deviation of their signal intensities across all regions, subregions, and tissues, and the top 4,000 most varying mRNAs were selected for hierarchical clustering of mRNA accumulation. To cluster enzyme mRNAs in metabolic pathways and transcription factor (TF) mRNAs at each developmental stage and throughout development, quantitatively varying mRNAs (ANOVA, P < 0.001) were selected from the 3,559 enzyme mRNAs and 2,832 TF mRNAs, respectively, represented on the Soybean Genome Array, as previously described.
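The selection of the most varying probe sets described above (keep probe sets called PP in at least one sample, rank by standard deviation, take the top N) can be sketched as follows. This is a minimal illustration, not the dChip procedure: the function name, the dictionary-based data layout, and the toy values are hypothetical, and the use of the sample standard deviation on untransformed signals is an assumption.

```python
from statistics import stdev

def top_varying_probe_sets(signals, calls, n_top=4000):
    """Return the IDs of the n_top probe sets with the largest signal SD.

    signals: dict mapping probe set ID -> list of signal intensities,
             one per region/subregion/tissue (hypothetical layout).
    calls:   dict mapping probe set ID -> list of consensus detection
             calls ("PP", "AA", or "INS") in the same sample order.
    """
    # Keep only probe sets called PP in at least one sample.
    present = [p for p in signals if "PP" in calls[p]]
    # Rank by sample standard deviation, most variable first (assumption).
    ranked = sorted(present, key=lambda p: stdev(signals[p]), reverse=True)
    return ranked[:n_top]

# Toy data, invented for illustration only.
signals = {
    "probe_a": [10.0, 200.0, 15.0],
    "probe_b": [50.0, 55.0, 52.0],
    "probe_c": [5.0, 900.0, 4.0],
    "probe_d": [100.0, 400.0, 100.0],
}
calls = {
    "probe_a": ["AA", "PP", "AA"],
    "probe_b": ["PP", "PP", "PP"],
    "probe_c": ["AA", "INS", "AA"],  # never PP, so it is excluded
    "probe_d": ["PP", "PP", "PP"],
}

top = top_varying_probe_sets(signals, calls, n_top=2)
print(top)
```

In this toy example `probe_c` has the largest spread but is dropped because it is never called PP, which is why the detection-call filter is applied before ranking.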
The top 4,000 most varying mRNAs, the quantitatively varying enzyme mRNAs, and the quantitatively varying TF mRNAs were then analyzed using hierarchical clustering. Normalization of the signal intensities of each probe set across all samples and distance measurements between genes and between clusters were performed as described by Le et al. {Le, 2010 #2}. Bar plots of clusters were generated with an R package (Reference?) using the standardized signal values of all mRNAs in each cluster identified by hierarchical clustering. To identify mRNA sets that were up-regulated at least 2-fold within each cluster (P < 0.05), samples were compared using the filtering criteria described by Le et al. {Le, 2010 #2}. The identities of mRNAs up-regulated at least 2-fold in the clusters are listed in Dataset S3 and Dataset S4.

To determine the biological relationships between different soybean seed regions, subregions, and tissues throughout development, bootstrap analysis of the clustering was performed on the MAS 5.0-normalized top 4,000 most varying mRNAs (ANOVA, P < 0.001) for all soybean GeneChip biological replicates using the pvclust package {Suzuki, 2006 #3} with default settings. Probability values (p-values) were calculated for each cluster, and p-values of 100 are marked with * at the branch. Principal component analysis (PCA) was carried out on robust multichip average (RMA)-normalized data for all 40 seed regions, subregions, and tissues throughout development using the MultiExperiment Viewer (MeV version 4.4) (http://www.tm4.org/mev/) with default settings.

## Gene Ontology (GO) Term Enrichment Analysis

Region-specific mRNAs at a specific developmental stage, and co-regulated mRNA sets in the clusters identified by hierarchical clustering of the top 4,000 most varying mRNAs at each developmental stage and throughout development, were analyzed for GO term enrichment using AgriGO 1.0 (http://bioinfo.cau.edu.cn/agriGO/) {Du, 2010 #4}.
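For a single GO term, a hypergeometric enrichment test of the kind used in this analysis can be sketched as follows. This is a minimal illustration, not AgriGO's implementation: the function names and toy counts are hypothetical, any multiple-testing adjustment is omitted, and the thresholds simply mirror the 0.01 significance level and minimum of 3 mapping entries reported for this analysis.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_annotated, n_query, n_hits):
    """Upper-tail hypergeometric probability P(X >= n_hits).

    n_background: genes in the reference set
    n_annotated:  reference genes annotated with the GO term
    n_query:      genes in the query list (e.g. one cluster)
    n_hits:       query genes annotated with the GO term
    """
    upper = min(n_annotated, n_query)
    total = comb(n_background, n_query)
    # Sum the probability of observing n_hits or more annotated genes.
    tail = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_query - i)
        for i in range(n_hits, upper + 1)
    )
    return tail / total

def is_enriched(p_value, n_hits, alpha=0.01, min_entries=3):
    """Apply a significance level and a minimum number of mapping entries."""
    return n_hits >= min_entries and p_value < alpha

# Toy counts, invented for illustration only.
p = hypergeom_enrichment_p(n_background=20, n_annotated=5, n_query=4, n_hits=3)
print(round(p, 4), is_enriched(p, n_hits=3))
```

With these toy counts the tail probability is 155/4845, which is above 0.01, so the term would not be reported as enriched even though it meets the minimum of 3 mapping entries.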
GO term enrichment analysis was carried out with the following parameters: (i) reference: soybean genome locus, (ii) statistical test method: hypergeometric, (iii) significance level: 0.01, and (iv) minimum number of mapping entries: 3. Enriched GO terms with P < 0.01 are listed in Dataset S3 and Dataset S4.

## REFERENCES

1. Le BH, et al. (2007) Using genomics to study legume seed development. Plant Physiology 144(2):562-574.
2. Redman JC, et al. (2004) Development and evaluation of an Arabidopsis whole genome Affymetrix probe array. Plant Journal 38(3):545-561.
3. Suzuki R & Shimodaira H (2006) Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22(12):1540-1542.
4. Du Z, et al. (2010) agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Research 38(Web Server issue):W64-W70.