|BGI WEGO Web Gene Ontology Annotation Plotting|
Fig 1. A sample figure of WEGO output, from the rice genome paper published in Science.
The GO (Gene Ontology) project began as a collaboration between FlyBase, the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD), and it has since grown well beyond those three databases. Many GO resources and tools now help biologists analyse anything from a few genes to large-scale datasets.
WEGO (Web Gene Ontology Annotation Plot) is a tool for plotting GO annotation results. It has been used in several major genome projects, such as the rice genome project [Yu, J. et al. Science 296, 79-92 (2002); Yu, J. et al. PLoS Biol 3, e38 (2005)] and the silkworm genome project [Xia, Q. et al. Science 306, 1937-40 (2004)]. It has become an everyday tool for downstream gene annotation analysis, especially for comparative genomics tasks. WEGO, along with two other tools, External to GO Query and GO Archive Query, is freely available to all users. Suggestions are welcome at email@example.com. Here is a sample output generated by WEGO (Fig. 1).
There are three steps to working with WEGO. The first is to upload the annotation result(s). The input file(s) can be in WEGO native format or, if you use InterProScan as the annotation tool, its results can be uploaded directly: WEGO accepts the InterProScan text, raw and XML output formats as input. Next, you are redirected to a webpage showing a hierarchical GO tree that includes all the GO terms contained in the uploaded files. On this page you can choose any GO terms of interest to display in the output histogram. The last step is to set up the figure, for example the figure caption, the histogram color(s) and the legend description. Currently, WEGO supports SVG, PNG, GIF, JPEG, PostScript and EPS as output graph formats. You can also have the results sent to you by email.
Ye J, Fang L, et al. Nucleic Acids Res., 2006, 34(Web Server issue), W293-W297
[Input of WEGO]
Currently, WEGO accepts the WEGO native format and the InterProScan text, raw and XML output formats as input. The WEGO native format is a simple tab-delimited text file with one gene record per line. The first column is the gene name and the remaining columns are GO IDs in the format GO:0000015. The annotation columns can be empty if no annotation result is available for the gene. Comment lines, which start with an exclamation point (!), are supported. A sample file in WEGO native format can be downloaded from the WEGO homepage.
The InterProScan text, raw and XML output formats are all accepted for the convenience of users, so that InterProScan annotation results can be uploaded to WEGO without any conversion. We plan to accept the output formats of other GO annotation tools as well. Requests are welcome at WEGO@genomics.org.cn.
Fig2. WEGO native format. The first column is the protein ID; the following column(s) are the GO IDs.
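The native format described above is simple enough to read with a few lines of code. The following is a minimal sketch in Python, not WEGO's own parser: the function name `parse_wego_native`, the gene names in the sample and the choice to merge repeated gene names are all assumptions made for illustration.

```python
from typing import Dict, List


def parse_wego_native(text: str) -> Dict[str, List[str]]:
    """Parse WEGO native format text into {gene name: [GO IDs]}."""
    annotations: Dict[str, List[str]] = {}
    for line in text.splitlines():
        # Skip blank lines and comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        columns = line.split("\t")
        gene = columns[0].strip()
        # Remaining columns hold GO IDs; empty columns mean "no annotation".
        go_ids = [c.strip() for c in columns[1:] if c.strip()]
        # Assumption: if a gene appears on several lines, its GO IDs are merged.
        annotations.setdefault(gene, []).extend(go_ids)
    return annotations


# Hypothetical sample data, written only to exercise the parser.
sample = (
    "! comment line, ignored by the parser\n"
    "gene_a\tGO:0000015\tGO:0005515\n"
    "gene_b\n"
    "gene_c\tGO:0000015\n"
)
result = parse_wego_native(sample)
print(result)
```

A gene with no annotation columns, such as `gene_b` here, is kept with an empty list so that it still counts towards the total number of genes in the dataset.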
[Output of WEGO]
SVG is the default output format of WEGO because it is widely supported by commercial and open source software such as CorelDRAW, Illustrator, Inkscape and ImageMagick. With an SVG plug-in, SVG graphs can be viewed in a browser. SVG is also easy to convert to other graph formats and well suited to publication. WEGO also supports other graph formats: the bitmap formats PNG, JPEG and GIF, which are suitable for on-screen display, and the vector formats PostScript and EPS. The output file is compressed for download. Some useful links to SVG tools are listed on the WEGO homepage.
[Uses of WEGO]
There are two ways to work with WEGO. The first is to upload the annotation files (up to three files at a time). The input files must be in one of the formats described above. Selecting the GO archive version is optional, but we suggest using the same version that was used for the annotation. The second way is to enter the job ID of an analysis performed on the WEGO web site within the last three days. With this job ID, users can change almost all of the settings from their prior session. Even the GO archive version can be changed without re-uploading the input files.
1. First, upload the input files or enter the job ID. The form for this is on the WEGO homepage.
Fig3. Step 1 of WEGO.
After the files are uploaded, a progress window that refreshes automatically every 5 seconds shows the job ID. Users can use the job ID to re-edit the analysis within three days.
Fig4. Step 2 of WEGO, GO tree edit page.
2. The user is then redirected to a webpage showing a hierarchical GO tree that includes all the GO terms contained in the uploaded files. Any GO IDs that do not exist in the GO archive are listed on the "view error" page. Another tool, GO Archive Query, was developed to help users deal with this frequent error, especially those who do not know which GO version was used in the annotation. At the top of the page are the ontology type selection box, the GO level input box and the view error button. Users can switch among the three ontology trees via the ontology type selection box. The number entered in the GO level input box limits the depth of the GO tree displayed on the page.
On this page the user can choose any GO terms of interest to display in the output histogram. (The second level is selected by default.) The hierarchical GO tree is the main body of the page. Each line of the GO tree represents a GO term. From left to right, each line shows the selection toolbar, the number of genes associated with the GO term, the percentage of genes associated with the GO term, the Pearson Chi-Square test p-value for each pair of input datasets, the GO ID and the GO term annotation. If there is only one input dataset, the p-value column is omitted. If there are three input datasets, the three p-value columns give the p-values for each pair of datasets in the order one-two, one-three, two-three. Unlike Fisher's exact test, the Pearson Chi-Square test is appropriate for a 2x2 matrix only when all of the expected counts are greater than 5. WEGO therefore reports 'Ml' (meaningless) if any of the expected counts is less than 5, and 'Na' if the p-value is not available. Red arrows mark items with a statistically significant relationship, that is, a p-value below 0.05. The 'arrowed' button selects all of the significant items at once.
Users can switch among the three ontology trees to choose the GO terms of interest. The selections are saved on the server automatically and are available in the "summary". The output figure can also be previewed before its settings are adjusted. When the selection is complete, the user clicks the "plot" button to open the export decoration page, where the output figure is configured.
3. The output title, legend style, data marks, colors, and figure width and height can be set on this page. (The default output width and height have been optimized.) The anonymous terms filter is designed to exclude uninformative items: all GO terms whose names include unknown, unknowing or obsolete are dropped from the output histogram. The user can untick the box in front of the filter to disable it. Users can also decide whether to set the y-axis to log scale.
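The p-value column described above can be illustrated with a small calculation. This is a sketch in Python under stated assumptions, not WEGO's actual code: it assumes the 2x2 table for a term is built from the genes annotated and not annotated with that term in each of two datasets, that no continuity correction is applied, and that 'Na' is returned when a row or column of the table is empty. The function name `chi_square_2x2` and the counts are hypothetical.

```python
import math
from itertools import combinations
from typing import Union


def chi_square_2x2(hits_a: int, total_a: int,
                   hits_b: int, total_b: int) -> Union[float, str]:
    """Pearson chi-square p-value for one GO term in two datasets.

    Returns 'Na' for a degenerate table and 'Ml' if any expected count < 5.
    """
    table = [
        [hits_a, total_a - hits_a],
        [hits_b, total_b - hits_b],
    ]
    grand = total_a + total_b
    row_sums = [total_a, total_b]
    col_sums = [hits_a + hits_b, grand - hits_a - hits_b]
    # Assumption: an empty row or column makes the p-value unavailable.
    if grand == 0 or 0 in row_sums or 0 in col_sums:
        return "Na"
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / grand
            if expected < 5:
                return "Ml"  # test is not meaningful for small expected counts
            statistic += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical (annotated genes, total genes) for one term in three datasets.
datasets = [(30, 100), (10, 100), (2, 100)]
# combinations() yields the pairs in the order one-two, one-three, two-three.
p_values = [chi_square_2x2(*a, *b) for a, b in combinations(datasets, 2)]
print(p_values)
```

A pair would be marked with a red arrow in this sketch when the returned value is a number below 0.05.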
Fig5. Step 3 of WEGO, export decoration setting page.
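The effect of the anonymous terms filter can be sketched as follows. This is an illustration in Python, not WEGO's implementation: the case-insensitive substring match, the function name `filter_terms` and the placeholder IDs `GO:9999998` and `GO:9999999` are assumptions.

```python
from typing import Iterable, List, Tuple

# Words that mark a term as uninformative, as listed in the text above.
FILTER_WORDS = ("unknown", "unknowing", "obsolete")


def filter_terms(terms: Iterable[Tuple[str, str]],
                 enabled: bool = True) -> List[Tuple[str, str]]:
    """Drop (GO ID, term name) pairs whose name contains a filter word."""
    if not enabled:
        # Corresponds to unticking the filter on the settings page.
        return list(terms)
    return [
        (go_id, name)
        for go_id, name in terms
        # Assumption: matching is a case-insensitive substring test.
        if not any(word in name.lower() for word in FILTER_WORDS)
    ]


terms = [
    ("GO:0005488", "binding"),
    ("GO:0003824", "catalytic activity"),
    ("GO:9999998", "molecular function unknown"),  # placeholder ID
    ("GO:9999999", "obsolete example term"),       # placeholder ID
]
kept = filter_terms(terms)
print(kept)
```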
We currently support the SVG, PNG, GIF, JPG, EPS and PS output formats. The default output file format is SVG. Please contact us if you need support for other output formats. Users can also have the results sent to them by email.
Fig6. Step 4 of WEGO, output format conversion.
[External to GO Query]
External to GO Query translates between GO and other annotation vocabularies. It is an interface to a database built from the GO consortium's external2go mappings. We caution that these mappings are neither complete nor exact. External to GO Query is designed to help biologists better understand their annotation results; it deals only with the associations between GO and the other vocabularies. Users can query either a GO ID or a category from one of the external systems included in the database, and the corresponding external entries or GO IDs are returned. A GO ID can be entered as GO:0000015, 0000015 or just 15. The user can choose a specific database to search; otherwise the input is searched against all external database indexes, which takes more time.
Fig7. External to GO Query.
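The three accepted ways of writing a GO ID (GO:0000015, 0000015 or 15) can be reduced to one canonical form before a lookup. The following Python sketch shows one way to do that; the function name `normalize_go_id` and the decision to reject anything longer than seven digits are assumptions, not a description of how the query tools work internally.

```python
import re

# Optional "GO:" prefix followed by one to seven digits.
_GO_ID = re.compile(r"^(?:GO:)?(\d{1,7})$", re.IGNORECASE)


def normalize_go_id(raw: str) -> str:
    """Return the canonical GO:NNNNNNN form of a user-supplied GO ID."""
    match = _GO_ID.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognisable GO ID: {raw!r}")
    # Zero-pad the numeric part to seven digits.
    return f"GO:{int(match.group(1)):07d}"


for text in ("GO:0000015", "0000015", "15"):
    print(text, "->", normalize_go_id(text))
```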
[GO Archive Query]
GO Archive Query helps users find the versions of the GO archive in which a particular GO term exists, so that they can choose the proper version of the GO archive for plotting. The ontology used in the downstream analysis should be the same as the one used in the annotation. A frequent error is that some GO terms cannot be found in the ontology because the version used in the WEGO analysis differs from the one used in the annotation. WEGO lists all of these GO terms on the "view error" page. We strongly suggest that users who do not know which ontology version was used in the annotation look up these GO terms in GO Archive Query. A GO ID can be entered as GO:0000015, 0000015 or just 15. The versions of the Gene Ontology that contain the GO ID are returned.
Fig8. GO Archive Query.
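The idea behind choosing an archive version can be sketched as a simple set lookup. The Python below is illustrative only: the version labels `archive-A` to `archive-C`, the term sets and both function names are hypothetical, and the real tool queries its own database of GO archive releases.

```python
from typing import Dict, Iterable, List, Set


def versions_containing(go_id: str, archives: Dict[str, Set[str]]) -> List[str]:
    """List the archive versions in which a single GO ID exists."""
    return sorted(v for v, ids in archives.items() if go_id in ids)


def versions_covering_all(go_ids: Iterable[str],
                          archives: Dict[str, Set[str]]) -> List[str]:
    """List the archive versions that contain every one of the given GO IDs."""
    wanted = set(go_ids)
    return sorted(v for v, ids in archives.items() if wanted <= ids)


# Toy data with placeholder version labels and a placeholder ID (GO:9999998).
archives = {
    "archive-A": {"GO:0000015", "GO:9999998"},
    "archive-B": {"GO:0000015"},
    "archive-C": {"GO:0005488"},
}
print(versions_containing("GO:0000015", archives))
print(versions_covering_all(["GO:0000015", "GO:9999998"], archives))
```

A version returned by `versions_covering_all` for the IDs listed on the "view error" page would be a candidate for the version used in the original annotation.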
1. Xia, Q. et al. A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science 306, 1937-40 (2004).
2. Yu, J. et al. A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296, 79-92 (2002).
3. Yu, J. et al. The Genomes of Oryza sativa: A History of Duplications. PLoS Biol 3, e38 (2005).
External to GO Query
GO Archive Query
Gene ontology documents
GO flat file format guide
OBO flat file format guide
XML version guide
MySQL version guide
The Gene Ontology homepage
The Sequence Ontology
Saccharomyces Genome Database
Mouse Genome Informatics
The InterProScan homepage
The GOA homepage
Clusters of Orthologous Groups
GO Term Finder
Inkscape SVG Editor
Adobe SVG viewer
Apache SVG Tools