Course Resources



AmiGO is the ontological browser for Gene Ontology.

http://www.godatabase.org

Gene Ontology Lecture

Gene Ontology: Hands-on Annotation Workshop



ArrayExpress is a public repository for microarray data that aims to store well-annotated data in accordance with MGED recommendations.

http://www.ebi.ac.uk/arrayexpress/

ArrayExpress Lecture

 

ArrayExpress Hands-On Workshop



BioMart is a federated Open Source search tool that permits rapid queries against large volumes of biological data. It has been designed to give researchers easy, interactive access both to the wealth of data available on the Internet and to in-house data integration.

http://www.ebi.ac.uk/biomart

Introduction to BioMart Lecture

Using BioMart Hands-On Workshop

 



Sequence alignments provide a powerful way to compare novel sequences with previously characterized genes. Both functional and evolutionary information can be inferred from well-designed queries and alignments. BLAST 2.0 (Basic Local Alignment Search Tool) provides a method for rapid searching of nucleotide and protein databases.

http://www.ncbi.nlm.nih.gov/BLAST/
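
To make the idea of a local alignment concrete, here is a minimal sketch in Python. It is not BLAST's algorithm: BLAST uses a fast seeded heuristic, whereas this sketch computes the exact local alignment score by dynamic programming (the Smith-Waterman method) that such heuristics approximate. The function name and the scoring values (match, mismatch, gap) are illustrative assumptions, not BLAST defaults.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Scoring values are illustrative assumptions, not BLAST defaults.
    """
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared region "ACGT" is found even though the flanking residues differ.
print(local_alignment_score("XXACGTXX", "YYACGTYY"))
```

Because scores are reset to zero rather than allowed to go negative, the method rewards the best-matching region and ignores unrelated flanking sequence, which is what "local" means in BLAST's name.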

 

NCBI Field Guide

BLAST QuickStart Lecture and Hands-on Workshop

Making Sense of DNA and Protein Sequences Lecture and Hands-On Workshop

Structural Analysis QuickStart and Hands-On Workshop

BLAST—Beyond Point and Click,
an advanced Hands-On Workshop

 

Cn3D

Cn3D is a helper application for your web browser that allows you to view 3-dimensional structures from NCBI's Entrez retrieval service. Cn3D runs on Windows, Macintosh, and Unix. Cn3D simultaneously displays structure, sequence, and alignment, and now has powerful annotation and alignment editing features.

http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml

 

NCBI Field Guide

Making Sense of DNA and Protein Sequences Lecture and Hands-On Workshop

Structural Analysis QuickStart and Hands-On Workshop



Proteins often contain several modules or domains, each with a distinct evolutionary origin and function. NCBI's Conserved Domain Database is a collection of multiple sequence alignments for ancient domains and full-length proteins.

http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml

 

 

NCBI Field Guide

Making Sense of DNA and Protein Sequences Lecture and Hands-On Workshop

Structural Analysis QuickStart Lecture and Hands-On Workshop



EMBOSS is an open source software suite of over 200 applications for the in silico analysis of biological problems ranging from nucleic acid and protein sequence analysis through the creation and indexing of your own data.

http://emboss.sourceforge.net

Introduction to EMBOSS Lecture

Programming with EMBOSS Workshop

 

 



Ensembl is a comparative genome browser for eukaryotes. Fourteen completed eukaryotic genomes are currently available, along with genomic resources for the Cow and Opossum, which are still being assembled. Both assembled and pre-assembly databases are available for the latter two organisms.

http://www.ensembl.org

Ensembl Lecture

Ensembl Hands-On Workshop



The NCBI Entrez portal provides integrated access to nucleotide and protein sequence data from more than 130,000 organisms, along with 3D protein structures, genomic mapping information, PubMed MEDLINE, and more. Sequence data are combined from various sources, including GenBank, EMBL, DDBJ, RefSeq, PIR-International, PRF, Swiss-Prot, and PDB. Entrez can be searched with a wide variety of text terms, such as author name, journal name, gene or protein name, organism, and unique identifier (e.g., accession number, sequence ID, PubMed ID), as well as other terms that depend on the database being searched.

http://www.ncbi.nlm.nih.gov/Entrez/
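
As an illustration of searching by text terms, the following Python sketch assembles a field-qualified Entrez query and the corresponding request URL. It assumes NCBI's E-utilities ESearch service, which is not mentioned on this page, and the helper names `build_entrez_term` and `build_esearch_url` are hypothetical. The sketch only builds the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: NCBI's E-utilities ESearch endpoint (not part of the original page).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_term(**fields):
    """Join field-qualified terms with AND.

    Underscores in keyword names become spaces, so Gene_Name="BRCA1"
    yields BRCA1[Gene Name].
    """
    parts = [f"{value}[{field.replace('_', ' ')}]" for field, value in fields.items()]
    return " AND ".join(parts)


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for the given database and query term."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


term = build_entrez_term(Organism="Homo sapiens", Gene_Name="BRCA1")
url = build_esearch_url("nucleotide", term)
print(term)
print(url)
```

Which field names are valid depends on the database being searched, so the two used here should be checked against the target database before relying on them.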

 



Expression Profiler is a newly created Open Source resource: a web-based environment for analyzing mainly two types of data, gene expression and sequences. The system is designed to be extensible to other data types; support for protein-protein interaction (PPI) data is currently being added.

http://www.ebi.ac.uk/expressionprofiler/

 

ArrayExpress Lecture

ArrayExpress Hands-On Workshop

GenBank

GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.

http://www.ncbi.nlm.nih.gov/Genbank/

 

NCBI Field Guide

Making Sense of DNA and Protein Sequences Lecture and Workshop

 



Having recently replaced LocusLink, GENE is a major access point to NCBI’s databases and sequence information, along with MapViewer (chromosome-related access point) and Entrez (text-based access). Gene, as its name implies, provides a gene-based view of the data from a wide range of genomes. It supplies key connections in the nexus of map, sequence, expression, structure, functional, and homology data. Each record represents a single gene from a given organism. The minimum set of data in a gene record includes a unique identifier or GeneID assigned by NCBI, a preferred symbol, and any of sequence information, map information, or official nomenclature from an authority list. A gene record can also include expression, structure, functional, and homology data, when available.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene

 

NCBI Field Guide

Entrez GENE QuickStart Lecture and Workshop



Gene 3D is a member database of EBI’s InterProScan consolidated proteomic resources, which will be covered in the Proteomics lecture/workshop series during Bioinformatics Week. It contains over 850,000 protein sequences from completed genomes, clustered into protein families and annotated with CATH domains.

http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)

 



NCBI provides several genomic biology tools and resources, including organism-specific pages with links to many web sites and databases relevant to each species, including incomplete genome assembly projects. Completed genomes are available through resource-specific portals, including Plant Genome Central, viruses, microbial resources, plasmids, organelles, SARS, and Influenza, to name a few.

http://www.ncbi.nlm.nih.gov/Genomes/

 

NCBI Field Guide



Gene Ontology is a controlled vocabulary that can be applied to all organisms even as the knowledge of gene and protein roles is changing.

http://www.geneontology.org

 

Gene Ontology Lecture

Gene Ontology: Hands-on Annotation Workshop

 



IntAct provides a freely available, open source database system and analysis tools for protein interaction data. All interactions are derived from literature curation or direct user submissions and are freely available.

http://www.ebi.ac.uk/intact/index.jsp

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)

 



InterPro is a consolidated database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.

http://www.ebi.ac.uk/interpro/index.html

 

Proteomics Lecture

Proteomics Hands-On Workshop

 



NCBI provides different ways for scientists to access and view sequence-related information. The Map Viewer is a powerful graphical interface which supports search and display of genomic information and expression by chromosomal position. Regions of interest can be retrieved by text queries (e.g. gene or marker name) or by sequence alignment (BLAST). View results at the whole genome level, and select what to display in more detail. Multiple options exist to configure your display, download data, navigate to related data, and analyze supporting information using the tools provided.

http://www.ncbi.nlm.nih.gov/mapview/

 

NCBI Field Guide

MMDB

NCBI's structure database is called MMDB (Molecular Modeling DataBase), and it is a subset of three-dimensional structures obtained from the Protein Data Bank. It was designed for flexibility, and as such, is capable of archiving conventional structural data as well as future descriptions of biomolecules, such as those generated by electron microscopy (surface models).

http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml

 

NCBI Field Guide



The EBI Macromolecular Structure Database is a European project for the collection, management and distribution of data about macromolecular structures, derived in part from the Protein Data Bank (PDB).

http://www.ebi.ac.uk/msd/index.html

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)



The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System was designed to classify proteins (and their genes) modeled on the divergence of function.

https://panther.appliedbiosystems.com/

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)



Pfam is a collection of protein families and domains. Pfam contains multiple protein alignments of these families. Pfam is a semi-automatic protein family database, which aims to be comprehensive as well as accurate.

http://www.sanger.ac.uk/Software/Pfam/index.shtml

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)

 



The Protein Information Resource SuperFamily (PIRSF) is a classification system based on the evolutionary relationships of whole proteins.

http://pir.georgetown.edu/pirsf/

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)



PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterize a protein family. Fingerprints can encode protein folds and functionalities more flexibly and powerfully than single motifs can, their full diagnostic potency deriving from the mutual context provided by motif neighbors.

http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/
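
The notion that several motifs together are more diagnostic than any one alone can be sketched in Python as follows. This is a simplification under stated assumptions: the motifs are written as regular expressions and must occur in order without overlapping, which is not how PRINTS itself scores matches. The function name `match_fingerprint`, the toy motifs, and the toy sequences are all hypothetical and are not real PRINTS entries.

```python
import re


def match_fingerprint(sequence, motifs):
    """Return the start position of each motif if all occur in order, else None.

    Each motif is searched for after the end of the previous match, so the
    motifs must appear in the given order and must not overlap.
    """
    positions = []
    offset = 0
    for motif in motifs:
        hit = re.compile(motif).search(sequence, offset)
        if hit is None:
            return None  # one missing motif means the fingerprint does not match
        positions.append(hit.start())
        offset = hit.end()
    return positions


# Hypothetical three-motif fingerprint (illustrative only).
toy_fingerprint = ["GD[ST]G", "C..C", "W[KR]"]

print(match_fingerprint("MAGDSGKLCAACPPWKE", toy_fingerprint))  # all three motifs present
print(match_fingerprint("MAGDSGKLCAACPP", toy_fingerprint))     # third motif absent
```

A sequence that happens to contain one short motif by chance is rejected unless the neighboring motifs are also present, which is the "mutual context" the description refers to.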

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)



ProDom is a comprehensive set of protein domain families automatically generated from the SWISS-PROT and TrEMBL sequence databases.

http://prodes.toulouse.inra.fr/prodom/current/html/home.php

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)



Prosite is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs.

http://us.expasy.org/prosite/

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)

 



The Reference Sequence (RefSeq) collection aims to provide a comprehensive, integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products, for major research organisms.

http://www.ncbi.nlm.nih.gov/RefSeq/

 

NCBI Field Guide



SCOP, a Structural Classification Of Proteins, is a database which aims to provide a broad survey of all known protein folds to facilitate a comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.

http://scop.mrc-lmb.cam.ac.uk/scop/

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)



SRS, the Sequence Retrieval System, is a comprehensive web-based, cross-database search interface to more than 150 resources covering information related to protein and nucleic acid sequences. This includes the biological, clinical and patent literature, sequence and mutation databanks, biological resource catalogues holding cell line information, metabolic pathways and more. For a complete listing of the searchable databanks available through SRS, which will be part of the lecture/workshop offered during Bioinformatics Week, see here (http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+top+-newId).

http://srs.ebi.ac.uk/srsbin/cgi-bin/wgetz?-page+srsq2+-noSession

 

Sequence Retrieval System (SRS) Lecture

Sequence Retrieval System (SRS) Hands-On Workshop



SNP stands for "single nucleotide polymorphism". A key aspect of research in genetics is associating sequence variations with heritable phenotypes. The most common variations are single nucleotide polymorphisms (SNPs), which occur approximately once every 100 to 300 bases. In collaboration with the National Human Genome Research Institute, the National Center for Biotechnology Information has established the dbSNP database to serve as a central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.

http://www.ncbi.nlm.nih.gov/SNP/
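
The frequency quoted above (roughly one SNP every 100 to 300 bases) translates into a simple back-of-the-envelope estimate, sketched here in Python. The function name `expected_snp_range` is hypothetical, and the result is only a rough expectation; actual SNP density varies from region to region.

```python
def expected_snp_range(sequence_length, min_spacing=100, max_spacing=300):
    """Return (low, high) rough SNP counts for a sequence of the given length in bases.

    Assumes one SNP every min_spacing to max_spacing bases, as quoted in the text.
    """
    low = sequence_length // max_spacing   # sparsest case: one SNP per 300 bases
    high = sequence_length // min_spacing  # densest case: one SNP per 100 bases
    return low, high


low, high = expected_snp_range(30_000)
print(f"A 30,000-base region is expected to hold roughly {low} to {high} SNPs")
```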

 

NCBI Field Guide



SMART (Simple Modular Architecture Research Tool) is a web tool for the identification and annotation of protein domains. It provides a platform for the comparative study of complex domain architectures in genes and proteins.

http://smart.embl-heidelberg.de/

 

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)



The purpose of Superfamily is to provide structural (and hence implied functional) assignments to protein sequences at the superfamily level. A superfamily contains all proteins for which there is structural evidence of a common evolutionary ancestor.

http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)



TIGRFAMs are protein families based on Hidden Markov Models (HMMs), created by TIGR, The Institute for Genomic Research.

http://www.tigr.org/TIGRFAMs/index.shtml

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)



UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and map location.

http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene

 

NCBI Field Guide



UniProt (Universal Protein Resource) is the world's most comprehensive catalogue of information on proteins. It is a central repository of protein sequence and function. Because it was created by merging the data in Swiss-Prot, TrEMBL and PIR-PSD, individual UniProt Knowledgebase entries may contain more information than was available in any single source database.

http://www.ebi.ac.uk/uniprot/

 

Proteomics Lecture

Proteomics Hands-On Workshop

(through InterPro)

Last Updated: September 13, 2007