Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experimental technologies, and computational analyses. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.[1] Information contained in biological databases includes gene function, structure, localization (both cellular and chromosomal), clinical effects of mutations, and similarities of biological sequences and structures.
Relational database concepts from computer science and information retrieval concepts from digital libraries are important for understanding biological databases. Biological database design, development, and long-term management is a core area of the discipline of bioinformatics.[2] Data contents include gene sequences, textual descriptions, attributes and ontology classifications, citations, and tabular data. These are often described as semi-structured data, and can be represented as tables, key-delimited records, and XML structures. Cross-references among databases are common and use database accession numbers.
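To make "key-delimited record" concrete, the following minimal Python sketch parses a simplified, hypothetical entry in which each line starts with a two-letter key, loosely modeled on the flat-file style used by EMBL and UniProt. The record contents and accession numbers are invented for illustration; real flat files have many more line types and continuation rules than this sketch handles.

```python
from collections import defaultdict

# A hypothetical key-delimited record: two-letter key, spaces, then a value.
# "DR" lines hold cross-references in the form "DATABASE; ACCESSION".
RECORD = """\
ID   EXAMPLE_GENE
AC   X00001
DE   Hypothetical example protein.
DR   PDB; 9XYZ
DR   PFAM; PF99999
//"""

def parse_record(text: str) -> dict[str, list[str]]:
    """Group the values of a key-delimited record by their key."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):  # "//" terminates the record
            break
        key, _, value = line.partition(" ")
        fields[key].append(value.strip())
    return dict(fields)

def cross_references(fields: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Turn 'DATABASE; ACCESSION' values into (database, accession) pairs."""
    refs = []
    for value in fields.get("DR", []):
        database, accession = (part.strip() for part in value.split(";", 1))
        refs.append((database, accession))
    return refs

fields = parse_record(RECORD)
print(fields["AC"])              # ['X00001']
print(cross_references(fields))  # [('PDB', '9XYZ'), ('PFAM', 'PF99999')]
```

Keeping every value in a list per key lets repeated keys such as `DR` accumulate naturally, which is typical of semi-structured records.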
Overview
Biological databases have become an important tool in helping scientists understand and explain a host of biological phenomena, from the structure of biomolecules and their interactions, to the whole metabolism of organisms, to the evolution of species. This knowledge aids the fight against disease, assists in the development of medications, and helps reveal basic relationships among species in the history of life.
Biological knowledge is distributed among many general and specialized databases, which sometimes makes it difficult to ensure that information is consistent. Biological databases cross-reference other databases using accession numbers as one way of linking related knowledge together.
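Because a database stores only the accession numbers of the records it links to, a cross-reference can go stale when the target record is removed or its accession changes. The Python sketch below uses small in-memory dictionaries as stand-ins for two hypothetical databases, follows every cross-reference, and reports those whose target cannot be found. The database names and accessions are invented; this is one simple consistency check, not how any particular database performs validation.

```python
# Hypothetical stand-ins for two databases, keyed by accession number.
# Each record lists its cross-references as (database, accession) pairs.
DATABASES = {
    "GENES": {
        "G001": {"name": "geneA", "xrefs": [("PROTEINS", "P100")]},
        "G002": {"name": "geneB", "xrefs": [("PROTEINS", "P999")]},  # target missing
    },
    "PROTEINS": {
        "P100": {"name": "proteinA", "xrefs": [("GENES", "G001")]},
    },
}

def dangling_references(databases: dict) -> list[tuple[str, str, str, str]]:
    """Return (source_db, source_acc, target_db, target_acc) for every
    cross-reference whose target database or accession does not exist."""
    broken = []
    for source_db, records in databases.items():
        for source_acc, record in records.items():
            for target_db, target_acc in record["xrefs"]:
                if target_acc not in databases.get(target_db, {}):
                    broken.append((source_db, source_acc, target_db, target_acc))
    return broken

print(dangling_references(DATABASES))
# [('GENES', 'G002', 'PROTEINS', 'P999')]
```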
An important resource for finding biological databases is a special yearly issue of the journal Nucleic Acids Research (NAR). The Database Issue of NAR is freely available, and categorizes many of the publicly available online databases related to biology and bioinformatics.
Output
Biological data comes in many formats, including text, sequence data, protein structures, and links. Each type is available from particular sources, for example:
- Text is provided by PubMed and OMIM.
- Sequence data are provided by GenBank (DNA) and UniProt (protein).
- Protein structures are provided by PDB, SCOP, and CATH.
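Sequence data are commonly exchanged as plain text; GenBank and UniProt, for example, both offer sequences for download in FASTA format, where a header line beginning with `>` is followed by one or more lines of sequence. The minimal Python parser below is a sketch that skips blank lines and assumes well-formed input; the example sequences are invented.

```python
from typing import Iterable, Iterator

def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:  # finish the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)

EXAMPLE = """>seq1 hypothetical DNA fragment
ATGGCC
ATTGTA
>seq2 hypothetical peptide
MKTAYIAK
"""

for header, seq in read_fasta(EXAMPLE.splitlines()):
    print(header.split()[0], len(seq))
# seq1 12
# seq2 8
```

Writing the parser as a generator keeps memory use flat, which matters when reading large sequence downloads one record at a time.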
Example public databases for molecular biology
(from www.kokocinski.net)
Primary sequence databases
The International Nucleotide Sequence Database (INSD) consists of the following three databases:
- DDBJ (DNA Data Bank of Japan)
- EMBL Nucleotide DB (European Molecular Biology Laboratory)
- GenBank [1] (National Center for Biotechnology Information)
Alongside these, the main protein sequence resource is:
- UniProtKB (Universal Protein Resource Knowledgebase)
The four largest databases are GenBank (the U.S. collection of nucleotide sequence data), EMBL (Europe’s collection of nucleotide sequence data), DDBJ (the DNA Data Bank of Japan), and UniProt (the Universal Protein Resource). GenBank is a service provided by NCBI that stores sequence data and “biological sequence related data.” EMBL is a service provided by the EBI (European Bioinformatics Institute) and provides a collection of nucleotide sequence data. DDBJ is a nucleotide sequence database. UniProt is a comprehensive, high-quality universal protein resource; its UniProt Knowledgebase (UniProtKB) includes translations of sequences from EMBL, GenBank, and DDBJ.
These databanks represent current knowledge about the sequences of all sequenced organisms and are the source for many other databases.
GenBank, EMBL, and DDBJ work very closely with one another and exchange their stored data, so a sequence record found in any one of them can also be found in the other two.
Meta-databases
Strictly speaking, a meta-database can be considered a database of databases, rather than any one integration project or technology. Meta-databases collect data from different sources and usually make it available in a new and more convenient form, or with an emphasis on a particular disease or organism.
- Entrez[2] (National Center for Biotechnology Information)
- euGenes (Indiana University)
- GeneCards (Weizmann Inst.)
- SOURCE (Stanford University)
- mGen, containing four of the world's biggest databases (GenBank, RefSeq, EMBL, and DDBJ), for easy, program-friendly gene extraction
- Bioinformatic Harvester[3] (Karlsruhe Institute of Technology) - Integrating 26 major protein/gene resources.
- MetaBase[4] (KOBIC) - A user contributed database of biological databases.
- ConsensusPathDB - A molecular functional interaction database, integrating information from 12 other databases.
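A meta-database of this kind typically merges records that describe the same entity but come from different sources. The Python sketch below assumes, purely for illustration, that every source can be keyed by a shared gene symbol; real integration projects must also reconcile synonyms, versions, and conflicting values. The source names and records are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-source records, each keyed by a shared gene symbol.
SOURCES = {
    "sequence_db": {"GENE1": {"length": 1200}, "GENE2": {"length": 850}},
    "disease_db": {"GENE1": {"disease": "example disorder"}},
    "pathway_db": {"GENE2": {"pathway": "example pathway"}},
}

def integrate(sources: dict[str, dict[str, dict]]) -> dict[str, dict]:
    """Merge per-source records into one view per gene, remembering
    which source supplied each field (provenance)."""
    merged = defaultdict(dict)
    for source_name, records in sources.items():
        for symbol, fields in records.items():
            for field, value in fields.items():
                merged[symbol][field] = {"value": value, "source": source_name}
    return dict(merged)

view = integrate(SOURCES)
print(view["GENE1"])
# {'length': {'value': 1200, 'source': 'sequence_db'},
#  'disease': {'value': 'example disorder', 'source': 'disease_db'}}
```

Recording the source of every field keeps the integrated view traceable back to the original databases. If two sources supply the same field, this sketch lets the later one overwrite the earlier; a real system would need an explicit conflict policy.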
Genome Databases
These databases collect organism genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. These databases may hold the genomes of many species, or a single model organism genome.
- CAMERA Resource for microbial genomics and metagenomics
- Corn, the Maize Genetics and Genomics Database
- Ensembl provides automatic annotation databases for human, mouse, other vertebrate and eukaryote genomes.
- ERIC (Enteropathogen Resource Integration Center) Curated database containing annotated genome data for five enteropathogens - Escherichia coli, Shigella, Salmonella, Yersinia enterocolitica, and Y. pestis.
- Flybase, genome of the model organism Drosophila melanogaster
- MGI Mouse Genome (Jackson Lab.)
- JGI Genomes of the DOE-Joint Genome Institute provides databases of many eukaryote and microbial genomes.
- National Microbial Pathogen Data Resource. A manually curated database of annotated genome data for the pathogens Campylobacter, Chlamydia, Chlamydophila, Haemophilus, Listeria, Mycoplasma, Neisseria, Staphylococcus, Streptococcus, Treponema, Ureaplasma, and Vibrio.
- Saccharomyces Genome Database, genome of the yeast model organism.
- Viral Bioinformatics Resource Center Curated database containing annotated genome data for eleven virus families.
- The SEED platform for microbial genome analysis includes all complete microbial genomes, and most partial genomes. The platform is used to annotate microbial genomes using subsystems.
- Xenbase, genomes of the model organisms Xenopus tropicalis and Xenopus laevis
- Wormbase, genome of the model organism Caenorhabditis elegans
- Zebrafish Information Network, genome of this fish model organism.
- TAIR, The Arabidopsis Information Resource.
- UCSC Malaria Genome Browser, genomes of malaria-causing species (Plasmodium falciparum and others)
- RGD Rat Genome Database: Genomic and phenotype data for Rattus norvegicus
Genome Browsers
Genome browsers enable researchers to visualize and browse entire genomes (most browsers host many complete genomes) together with annotation data, including gene predictions and structures, proteins, expression, regulation, variation, and comparative analyses. Annotation data usually come from multiple, diverse sources.
- Integrated Microbial Genomes (IMG) system by the DOE-Joint Genome Institute
- UCSC Genome Bioinformatics Genome Browser and Tools (UCSC)
- Ensembl The Ensembl Genome Browser (Sanger Institute and EBI)
- GBrowse The GMOD GBrowse Project
- Pathway Tools Genome Browser
- X:Map A genome browser that shows Affymetrix Exon Microarray hit locations alongside gene, transcript, and exon data, using the Google Maps API
- Viral Genome Organizer (VGO) A genome browser providing visualization and analysis tools for annotated whole genomes from the eleven virus families in the VBRC (Viral Bioinformatics Resource Center) databases
- Apollo Genome Annotation Curation Tool A cross-platform, Java-based standalone genome viewer with enterprise-level functionality and customizations; the standard for many model organism databases.
- SEED viewer for visualizing and interrogating the SEED database of complete microbial genomes
- Integrated Genome Browser (IGB) A cross-platform, Java-based desktop genome viewer.
- Argo Genome Browser A free and open-source standalone Java-based genome browser for visualizing and manually annotating whole genomes.
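Whenever a user scrolls or zooms, a genome browser has to find the annotated features that overlap the region on screen. The Python sketch below shows that core query under simple assumptions: features are held in memory as (chromosome, start, end, name) records with 1-based, inclusive coordinates (the convention used by GFF files), and the track contents are hypothetical. It uses a plain linear scan for clarity; production browsers typically rely on indexed data files instead.

```python
from typing import NamedTuple

class Feature(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (GFF-style coordinates)
    end: int    # 1-based, inclusive
    name: str

def overlaps(feature: Feature, chrom: str, start: int, end: int) -> bool:
    """True if the feature shares at least one base with chrom:start-end."""
    return feature.chrom == chrom and feature.start <= end and feature.end >= start

def features_in_view(features, chrom: str, start: int, end: int) -> list[Feature]:
    """Linear scan returning features that overlap the viewed region, sorted by start."""
    return sorted((f for f in features if overlaps(f, chrom, start, end)),
                  key=lambda f: f.start)

# Hypothetical annotation track.
TRACK = [
    Feature("chr1", 100, 500, "geneA"),
    Feature("chr1", 450, 900, "geneB"),
    Feature("chr1", 1200, 1500, "geneC"),
    Feature("chr2", 100, 400, "geneD"),
]

print([f.name for f in features_in_view(TRACK, "chr1", 480, 1000)])
# ['geneA', 'geneB']
```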
Protein sequence databases
- UniProt[5] Universal Protein Resource (UniProt Consortium: EBI, Expasy, PIR)
- PIR Protein Information Resource (Georgetown University Medical Center (GUMC))
- Swiss-Prot[6] Protein Knowledgebase (Swiss Institute of Bioinformatics)
- PEDANT Protein Extraction, Description and ANalysis Tool (Forschungszentrum f. Umwelt & Gesundheit)
- PROSITE Database of Protein Families and Domains
- DIP Database of Interacting Proteins (Univ. of California)
- Pfam Protein families database of alignments and HMMs (Sanger Institute)
- PRINTS A compendium of protein fingerprints (Manchester University)
- ProDom Comprehensive set of Protein Domain Families (INRA/CNRS)
- SignalP 3.0 Server for signal peptide prediction (including cleavage site prediction), based on artificial neural networks and HMMs
- SUPERFAMILY Library of HMMs representing superfamilies and database of (superfamily and family) annotations for all completely sequenced organisms
- Annotation Clearing House a project from the National Microbial Pathogen Data Resource
Protein structure Databases
- Protein Data Bank[7] (PDB) (Research Collaboratory for Structural Bioinformatics (RCSB))
- Protein Model Portal[8] (PMP) Meta database that combines several databases of protein structure models (Biozentrum, Basel, Switzerland)
- CATH Protein Structure Classification
- SCOP Structural Classification of Proteins
- SWISS-MODEL Server and Repository for Protein Structure Models
- ModBase Database of Comparative Protein Structure Models (Sali Lab, UCSF)
Protein-protein interactions
- BioGRID [9] A General Repository for Interaction Datasets (Samuel Lunenfeld Research Institute)
- STRING A database of known and predicted protein-protein interactions (EMBL)
- DIP Database of Interacting Proteins
- BIND Biomolecular Interaction Network Database
Signaling Pathway Databases
- Netpath - A curated resource of signal transduction pathways in humans
- Reactome
- NCI-Nature Pathway Interaction Database
Metabolic pathway Databases
- BioCyc Database Collection including EcoCyc and MetaCyc
- KEGG PATHWAY Database[10] (Univ. of Kyoto)
- MANET database [11] (University of Illinois)
- Reactome[12] (Cold Spring Harbor Laboratory, EBI, Gene Ontology Consortium)
Microarray databases
- ArrayExpress (European Bioinformatics Institute)
- Gene Expression Omnibus (National Center for Biotechnology Information)
- GPX (Scottish Centre for Genomic Technology and Informatics)
- maxd (Univ. of Manchester)
- Stanford Microarray Database (SMD) (Stanford University)
Mathematical Model Databases
PCR / Real time PCR primer Databases
Specialized databases (in alphabetical order)
- AntibodyLink.org Antibody database.
- BIOMOVIE (ETH Zurich) movies related to biology and biotechnology
- CGAP Cancer Genes (National Cancer Institute)
- Clone Registry Clone Collections (National Center for Biotechnology Information)
- Connectivity map Transcriptional expression data and correlation tools for drugs
- CTD The Comparative Toxicogenomics Database describes chemical-gene-disease interactions
- DBGET H. sapiens (Univ. of Kyoto)
- DiProDB A database to collect and analyse thermodynamic, structural and other dinucleotide properties.
- Edinburgh Mouse Atlas
- GreenPhylDB (A phylogenomic database for plant comparative genomics)
- GyDB The Gypsy Database of Mobile Genetic Elements (Universitat de València)
- Genome Database for Rosaceae (International Genomics and Genetics Database for Rosaceous crops)
- GDB Human Genome Database (Human Genome Organisation)
- HGMD disease-causing mutations (HGMD Human Gene Mutation Database)
- HUGO (Official Human Genome Database: HUGO Gene Nomenclature Committee)
- HvrBase++ Human and primate mitochondrial DNA
- INTERFEROME The Database of Interferon Regulated Genes
- List of SNP databases
- NCBI-UniGene (National Center for Biotechnology Information)
- OMIM Inherited Diseases (Online Mendelian Inheritance in Man)
- OrthoMaM (A database of Orthologous Mammalian Markers)
- p53 The p53 Knowledgebase
- PhenCode linking human mutations with phenotype
- Plasma Proteome Database Human plasma proteins along with their isoforms
- PolygenicPathways Genes and risk factors implicated in Alzheimer's disease, Bipolar disorder or Schizophrenia
- SHMPD The Singapore Human Mutation and Polymorphism Database
- SciClyc An open-access database of shared antibodies, cell cultures, and documents for biomedical research.
- XTractor Discovering newer scientific relations across PubMed abstracts. A tool to obtain manually annotated relationships for proteins, diseases, drugs, and biological processes as they are published in PubMed.
Wiki style databases
- Gene Wiki
- OpenWetWare
- PDBWiki
- Proteopedia
- Topsan
- WikiGenes
- WikiPathways
- YTPdb
- CHDwiki
- GyDB
- WikiProfessional
Problems Associated with Protein Databases
Because of the three-dimensional nature of protein structures, discovery in protein structure has not advanced as quickly as discovery in sequence data, so less structural information is available. Nonetheless, structural data can be accessed through the RCSB Protein Data Bank (http://www.pdb.org), SCOP (Structural Classification of Proteins) ([13]), and CATH ([14]).
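Structures in the Protein Data Bank have long been distributed in the fixed-column PDB text format, in which each atom is an `ATOM` (or `HETATM`) line and each field occupies a set range of columns. The Python sketch below reads atom coordinates using those column positions and computes the centroid of the C-alpha atoms. It ignores alternate locations, multiple models, and the newer mmCIF format, and the two example atoms are invented.

```python
from typing import NamedTuple

class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float

def parse_atoms(lines) -> list[Atom]:
    """Read ATOM/HETATM records by fixed column position
    (Python slices are 0-based; the format's columns are 1-based)."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms

def ca_centroid(atoms: list[Atom]) -> tuple[float, float, float]:
    """Mean position of the C-alpha atoms."""
    ca = [a for a in atoms if a.name == "CA"]
    n = len(ca)
    return (sum(a.x for a in ca) / n,
            sum(a.y for a in ca) / n,
            sum(a.z for a in ca) / n)

EXAMPLE = [
    "ATOM      1  CA  GLY A   1       1.000   2.000   3.000",
    "ATOM      2  CA  ALA A   2       3.000   4.000   5.000",
]
print(ca_centroid(parse_atoms(EXAMPLE)))  # (2.0, 3.0, 4.0)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in this format can run together with no space between them.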
Frequently Used
Species-specific databases are available for some species, mainly those often used in research. These databases provide extensive detail on the species in question. For example, Colibase ([15]) is an E. coli database. Other popular species-specific databases include Flybase ([16]) for Drosophila and Wormbase ([17]) for nematodes.
Harvesting
Given the large amount of information available, it is difficult to gather everything needed in one place. However, one website working toward that goal is the Bioinformatic Harvester, at http://harvester.embl.de/.
Be Careful
With so much information available, users must be wary of erroneous data.
References
- ^ Altman RB (March 2004). "Building successful biological databases". Brief. Bioinformatics 5 (1): 4–5. PMID 15153301. http://bib.oxfordjournals.org/cgi/pmidlookup?view=long&pmid=15153301.
- ^ Bourne P (August 2005). "Will a biological database be different from a biological journal?". PLoS Comput. Biol. 1 (3): 179–81. PMID 16158097.
- http://www.avatar.se/molbioinfo2001/databases.html
See also
- Biobank
- Gene bank
- NCBI
- dbSNP
- PubMed
- Interactome
- Biological data
- MetaBase
- Virtual library of biology
External links
- Genome Proteome Search Engine to search across biological databases
- DBD: Database of Biological Databases
- CAMERA Cyberinfrastructure for Metagenomics, free data repository and bioinformatics tools for metagenomics.
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors (see full disclaimer)




