DurandLab Schema

When we describe the structure of a relational database, we refer to its schema: the set of tables, fields, and relationships between them that organize our data. The schema might take the form of pictures, descriptions, or code that generates the database skeleton. This document provides a little of each to help you design code, queries, and new data structures around the DurandLab databases.
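
To illustrate the "code" form of a schema, here is a minimal sketch that generates a two-table skeleton linked by a foreign key. The table and column names (`protein`, `domain_hit`, `start_pos`, etc.) are hypothetical and do not describe the real DurandLab layout, and SQLite from Python's standard library stands in for the MySQL server the lab databases run on.

```python
import sqlite3

# Hypothetical skeleton: a protein table and a domain-hit table that points to it.
SCHEMA = """
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    accession  TEXT NOT NULL UNIQUE,
    sequence   TEXT NOT NULL
);
CREATE TABLE domain_hit (
    hit_id     INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
    domain     TEXT NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL
);
"""

def create_skeleton(conn: sqlite3.Connection) -> None:
    """Create the empty tables, i.e. the database 'skeleton'."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_skeleton(conn)
print([row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")])
# prints ['domain_hit', 'protein']
```

The foreign key is what encodes the "relationship" part of the schema: a `domain_hit` row cannot refer to a protein that does not exist.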

Although we've split the structure into several schemas covering different data sources, each schema can access the tables of the others, so together they should be viewed as a single interconnected database.
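
In MySQL, a query can reach a table in another schema by qualifying it as `database.table`. The sketch below shows the same idea with SQLite, where an attached database plays the role of a second schema; the schema name `swissprot`, the tables, and the toy identifiers are all hypothetical.

```python
import sqlite3

# Stand-in for two MySQL schemas: the in-memory "main" database plus an attached one.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS swissprot")

# Hypothetical tables with toy rows, one in each schema.
conn.execute("CREATE TABLE main.protein (accession TEXT PRIMARY KEY, length INTEGER)")
conn.execute("CREATE TABLE swissprot.xref (accession TEXT, sp_id TEXT)")
conn.executemany("INSERT INTO main.protein VALUES (?, ?)",
                 [("prot_A", 100), ("prot_B", 250)])
conn.executemany("INSERT INTO swissprot.xref VALUES (?, ?)",
                 [("prot_A", "SP_1")])

# Qualify each table with its schema name to join across schemas in one query.
rows = conn.execute("""
    SELECT p.accession, p.length, x.sp_id
    FROM main.protein AS p
    JOIN swissprot.xref AS x ON x.accession = p.accession
""").fetchall()
print(rows)  # [('prot_A', 100, 'SP_1')]
```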

Note: Since we're constantly looking for ways to bring in new biological data, these schemas change often, and new ones will be added. Once you have the general idea, the best way to explore the databases is to fire up a client such as mysqlcc and browse them.
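
If you would rather explore from a script than from a GUI client, a small helper that lists every table and its columns does much of the same job. This is a sketch against SQLite, not the lab's MySQL servers; on MySQL the equivalent information comes from `SHOW TABLES`, `DESCRIBE`, or the `information_schema.COLUMNS` table. The `pssm` table in the usage line is hypothetical.

```python
import sqlite3

def describe(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each table name to its column names, in declaration order."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info yields one row per column; field 1 is the column name.
    return {t: [col[1] for col in conn.execute(f'PRAGMA table_info("{t}")')]
            for t in tables}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pssm (pssm_id INTEGER PRIMARY KEY, name TEXT)")
print(describe(conn))  # {'pssm': ['pssm_id', 'name']}
```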

CDART


CDART (Conserved Domain Architecture Retrieval Tool) is an NCBI tool that extracts protein domain information for the sequences in NCBI's RefSeq database by detecting domain models (PSSMs, position-specific scoring matrices) within protein sequences (PIGs). We use this data to generate the domain architecture information in DurandLab.
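
A protein's domain architecture is commonly summarized as the ordered list of its domains from N- to C-terminus. Assuming domain hits arrive as `(protein_id, domain_name, start, end)` tuples (a hypothetical format, not the actual CDART table layout), a minimal sketch groups the hits by protein and orders them by start position. The proteins and coordinates below are toy values.

```python
from collections import defaultdict

# Hypothetical hit format: (protein_id, domain_name, start, end), 1-based coordinates.
Hit = tuple[str, str, int, int]

def architectures(hits: list[Hit]) -> dict[str, tuple[str, ...]]:
    """Return each protein's domain names ordered by start (then end) position."""
    by_protein: dict[str, list[Hit]] = defaultdict(list)
    for hit in hits:
        by_protein[hit[0]].append(hit)
    return {
        pid: tuple(h[1] for h in sorted(phits, key=lambda h: (h[2], h[3])))
        for pid, phits in by_protein.items()
    }

hits = [
    ("protA", "SH2", 150, 230),
    ("protA", "SH3", 80, 140),
    ("protB", "PH", 10, 110),
]
print(architectures(hits))
# {'protA': ('SH3', 'SH2'), 'protB': ('PH',)}
```

Because each architecture is a tuple, it can serve directly as a dictionary key if you want to group proteins that share the same architecture.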

SwissProt


SwissProt is meant to be a non-redundant database of well-studied protein sequences. In addition to sequences, it contains a great deal of metadata about each protein, including journal references and expression data. SwissProt now supplies the majority of our sequence data, but it was imported primarily for its database cross-references, which we use to eliminate redundant proteins and to determine which protein sequences are well-studied and complete. Python code for interacting with SwissProt is in the SwissProt module.
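
As a rough sketch of how cross-references can collapse redundant sequences, assume a simplified input in which each external protein identifier points to at most one SwissProt accession (real cross-reference data is messier). Keeping one representative per SwissProt entry then looks like the following. The function name, the "smallest ID wins" rule, and the toy identifiers are hypothetical, not the SwissProt module's API.

```python
from collections import defaultdict
from typing import Optional

def collapse_by_swissprot(
        xrefs: dict[str, Optional[str]]) -> tuple[dict[str, str], list[str]]:
    """Pick one representative external ID per SwissProt accession.

    xrefs maps an external protein ID to the SwissProt accession that
    cross-references it, or None if no SwissProt entry does.
    Returns (representative per SwissProt accession, IDs with no SwissProt entry).
    """
    groups: dict[str, list[str]] = defaultdict(list)
    unmatched: list[str] = []
    for ext_id, sp_acc in xrefs.items():
        if sp_acc is None:
            unmatched.append(ext_id)
        else:
            groups[sp_acc].append(ext_id)
    # Deterministic choice: the lexicographically smallest ID in each group.
    reps = {sp_acc: min(ids) for sp_acc, ids in groups.items()}
    return reps, sorted(unmatched)

xrefs = {"ext2": "SP_1", "ext1": "SP_1", "ext3": "SP_2", "ext4": None}
reps, unmatched = collapse_by_swissprot(xrefs)
print(reps)       # {'SP_1': 'ext1', 'SP_2': 'ext3'}
print(unmatched)  # ['ext4']
```

IDs with no SwissProt entry are returned separately rather than dropped, since lacking a SwissProt entry is itself a hint that a sequence may not be well-studied.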

GenBank


NCBI GenBank is probably the best-known sequence repository. We don't currently use its sequence information, but we are importing tables as needed to help us examine NCBI identifiers.
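
NCBI RefSeq accessions encode the molecule type in a two-letter prefix, for example `NP_` for curated proteins, `XP_` for predicted proteins, and `NM_` for curated mRNAs. The sketch below, which is not part of the DurandLab code, splits an accession into prefix, number, and optional version; its prefix table covers only a few common cases, and `NP_123456.7` is a toy accession used purely for its shape.

```python
import re
from typing import NamedTuple, Optional

# Partial list of RefSeq accession prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NP": "protein (curated)",
    "XP": "protein (predicted)",
    "NM": "mRNA (curated)",
    "XM": "mRNA (predicted)",
    "NC": "genomic (complete molecule)",
}

class RefSeqId(NamedTuple):
    prefix: str
    number: str
    version: Optional[int]
    kind: str

# Two capital letters, underscore, digits, optional ".version".
_PATTERN = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")

def parse_refseq(accession: str) -> Optional[RefSeqId]:
    """Split a RefSeq-shaped accession into its parts; return None otherwise."""
    m = _PATTERN.match(accession.strip())
    if m is None:
        return None
    prefix, number, version = m.groups()
    return RefSeqId(prefix, number,
                    int(version) if version else None,
                    REFSEQ_PREFIXES.get(prefix, "other/unknown"))

print(parse_refseq("NP_123456.7"))
# RefSeqId(prefix='NP', number='123456', version=7, kind='protein (curated)')
```

Returning `None` for non-RefSeq strings (such as SwissProt accessions) lets the same helper be used to sort a mixed list of identifiers by source.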

DurandLab


DurandLab is our database for locally generated information that doesn't fit neatly into any single other database. In time, it should probably be split into separate databases by project or function. Code for interacting with it is in the DurandLab module.

Yeast DataSets

Other Sources



bobsedge@andrew.cmu.edu