Chemical Shift Correlated Protein Database (CSCPDB v.001)
contains 18,379,788 chemical shifts (CSs), which are correlated with the
secondary structures (α, β, turn, and unassigned) of 5,014 proteins. To
capture the effect of sequence on CSs, we allow CS queries based on
heptapeptides (HPPs) as the elementary protein sequence unit. The CSs
have two origins: experimental and theoretical. Experimental CSs
were downloaded from the BioMagResBank (BMRB) (1) (ftp://ftp.bmrb.wisc.edu/). We
aligned each BMRB sequence pairwise with the corresponding Protein Data Bank
(PDB) (2) sequence and kept only the CSs that could be matched to a PDB
structure entry. BMRB entries containing fewer than 50 CSs were excluded.
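A heptapeptide query unit can be pictured as a seven-residue window slid along a protein sequence. The Python sketch below is illustrative only: it assumes each window is keyed to its central (fourth) residue and that residues within three positions of either terminus, which have no complete window, are skipped. The function name `heptapeptides` is hypothetical and not part of CSCPDB.

```python
from typing import List, Tuple


def heptapeptides(sequence: str, window: int = 7) -> List[Tuple[int, str]]:
    """Return (center_index, window) pairs for a sliding odd-length window.

    Hypothetical helper: each window is keyed to its central residue, and
    residues lacking a full window near either terminus are skipped.
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    half = window // 2  # 3 flanking residues on each side for a heptapeptide
    return [
        (i, sequence[i - half:i + half + 1])
        for i in range(half, len(sequence) - half)
    ]


# Usage: a 9-residue sequence yields three complete heptapeptides.
print(heptapeptides("ACDEFGHIK"))
# prints [(3, 'ACDEFGH'), (4, 'CDEFGHI'), (5, 'DEFGHIK')]
```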
For each set of CSs that share the same nucleus (e.g., CA), amino acid type,
and secondary structure assignment (taken from the PDB file), we calculated
the average and the standard deviation (stdev). Only experimental CSs within
five stdev of the corresponding average were retained in the database. We also
corrected some outlier CSs, possibly caused by referencing errors in BMRB, in
a manner similar to that recently proposed for the RefDB database (3). We
calculated theoretical CSs from PDB X-ray structures with the SHIFTX program (4),
which combines structural coordinates, empirically derived CS hypersurfaces,
and semi-classical corrections for ring currents, solvent effects, and hydrogen
bonding. The average and stdev of the theoretical CSs were calculated for the
same data subsets as described for the experimental CSs. For both types of CSs,
the values of the C-terminal residue were excluded.
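The grouping and five-stdev cut described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Shift` record and `filter_outliers` function are hypothetical, the population standard deviation is used, and the cut is applied in a single pass; the text does not say which stdev estimator was used or whether the filter was iterated.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, NamedTuple, Tuple


class Shift(NamedTuple):
    residue: str         # amino acid type, e.g. "A"
    nucleus: str         # atom name, e.g. "CA"
    ss: str              # secondary structure label, e.g. "helix"
    value: float         # chemical shift in ppm
    is_c_terminal: bool  # True for the chain's C-terminal residue


def filter_outliers(shifts: List[Shift], n_sd: float = 5.0) -> List[Shift]:
    """Keep shifts within n_sd population stdevs of their group mean.

    Groups are (amino acid, nucleus, secondary structure); C-terminal
    residues are dropped before any statistics are computed.
    """
    kept = [s for s in shifts if not s.is_c_terminal]
    groups: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for s in kept:
        groups[(s.residue, s.nucleus, s.ss)].append(s.value)
    # Mean and population stdev per group (single pass, no iteration).
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}
    result = []
    for s in kept:
        avg, sd = stats[(s.residue, s.nucleus, s.ss)]
        if abs(s.value - avg) <= n_sd * sd:
            result.append(s)
    return result
```

With the population stdev, no value in a group of fewer than 27 CSs can lie more than five stdev from the group mean, so in this single-pass form the cut only removes values from reasonably large groups.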
To select the protein structure entries used for theoretical CS calculations,
we culled PDB sequences with the PISCES program (5), which reduces redundancy
while maintaining maximum structural diversity, using the following
criteria:
- Sequence percentage identity: ≤ 25%
- Resolution: < 3.0 Å
- R-factor: < 0.3
- Sequence length: > 20 residues
- Non-X-ray entries included; CA-only entries excluded; sequences compared with BLAST, by PDB chain
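The culling criteria listed above can be expressed as per-chain threshold checks followed by redundancy removal. The sketch below is not PISCES itself, which relies on PSI-BLAST alignments; it uses a hypothetical `Chain` record, treats non-X-ray entries as having no resolution or R-factor (so they pass those two checks), and substitutes a crude ungapped identity function, `toy_identity`, for real alignments. Chains are visited best resolution first and kept only if their identity to every chain already kept is at most 25%.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Chain:
    chain_id: str                 # hypothetical identifier, e.g. "1ABC_A"
    sequence: str                 # one-letter amino acid sequence
    resolution: Optional[float]   # in Å; None for non-X-ray entries
    r_factor: Optional[float]     # None for non-X-ray entries
    ca_only: bool                 # True if only CA coordinates are deposited


def passes_thresholds(c: Chain, max_res: float = 3.0,
                      max_r: float = 0.3, min_len: int = 20) -> bool:
    """Apply the per-chain criteria; non-X-ray entries skip res/R checks."""
    if c.ca_only or len(c.sequence) <= min_len:
        return False
    if c.resolution is not None and c.resolution >= max_res:
        return False
    if c.r_factor is not None and c.r_factor >= max_r:
        return False
    return True


def toy_identity(a: str, b: str) -> float:
    """Ungapped percent identity; a crude stand-in for BLAST alignments."""
    n = min(len(a), len(b))
    return 100.0 * sum(x == y for x, y in zip(a, b)) / n if n else 0.0


def cull(chains: List[Chain], identity: Callable[[str, str], float],
         max_identity: float = 25.0) -> List[Chain]:
    """Greedy culling: best resolution first, keep if <= max_identity to all kept."""
    candidates = [c for c in chains if passes_thresholds(c)]
    # Sort X-ray chains by resolution; non-X-ray chains (None) go last.
    candidates.sort(key=lambda c: (c.resolution is None, c.resolution or 0.0))
    selected: List[Chain] = []
    for c in candidates:
        if all(identity(c.sequence, s.sequence) <= max_identity for s in selected):
            selected.append(c)
    return selected
```

Passing the identity function in as a parameter keeps the greedy loop independent of how sequences are actually compared, so a real alignment-based score could replace `toy_identity` without other changes.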
We annotated the proteins with:
- Calculated CSs
- Primary and secondary protein structure
- PDB connectivity and heteroatom annotation
- DSSP-calculated backbone hydrogen-bond energies and solvent accessibility
- SCOP (6) homology family classification
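One possible per-residue record holding the annotations listed above is sketched below. The class name, field names, the single hydrogen-bond energy field (DSSP actually reports several donor and acceptor energies per residue), and the choice of one record per residue are all assumptions for illustration, not the CSCPDB schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ResidueAnnotation:
    pdb_chain: str                 # hypothetical key, e.g. "1ABC_A"
    position: int                  # residue number in the chain
    residue: str                   # primary structure: one-letter code
    secondary_structure: str       # e.g. "helix", "strand", "turn", "unassigned"
    # Calculated CSs, nucleus name -> ppm (a fresh dict per record).
    calculated_cs: Dict[str, float] = field(default_factory=dict)
    hbond_energy: Optional[float] = None           # DSSP backbone H-bond energy, kcal/mol
    solvent_accessibility: Optional[float] = None  # DSSP accessibility, Å^2
    heteroatom_contact: bool = False               # from PDB connectivity/HETATM records
    scop_family: Optional[str] = None              # SCOP family identifier
```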
We re-referenced the experimental CSs to the averages of the corresponding
theoretical CSs; CSCPDB contains both the re-referenced experimental CSs and
the theoretical CSs.
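The text does not spell out how the experimental CSs were referenced to the theoretical averages. One plausible reading, sketched below, is a constant offset per BMRB entry and nucleus: the difference between the mean experimental CS and the mean of the matching theoretical averages is subtracted from every experimental value. The function name `rereference` and the per-entry, per-nucleus pairing are hypothetical.

```python
from statistics import mean
from typing import List


def rereference(experimental: List[float], theoretical_avg: List[float]) -> List[float]:
    """Subtract a constant offset so the experimental mean matches the
    mean of the paired theoretical averages (one entry, one nucleus)."""
    if not experimental or len(experimental) != len(theoretical_avg):
        raise ValueError("expected paired, non-empty lists")
    offset = mean(experimental) - mean(theoretical_avg)
    return [v - offset for v in experimental]
```

For example, experimental CA shifts of 57.0, 60.0 and 55.0 ppm paired with theoretical averages of 55.0, 58.0 and 53.0 ppm give an offset of 2.0 ppm, so the re-referenced values are 55.0, 58.0 and 53.0 ppm (up to floating-point rounding).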
The CSCPDB database was implemented in the open-source MySQL 4.0 database
system (http://www.mysql.com/). The procedures for extracting and correlating
information from the PDB, BMRB, SCOP, and DSSP databases, as well as the web
interface, were built with the Borland Delphi 7 Professional package (www.borland.com).
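A heptapeptide-keyed query against a relational table might look like the sketch below. It uses Python's built-in `sqlite3` module as a runnable stand-in for MySQL 4.0; the table name `shift`, its columns, the index, and both function names are hypothetical. The SQL is plain enough to carry over to MySQL.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of CSs keyed by heptapeptide (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE shift ("
        " hpp TEXT NOT NULL,"      # heptapeptide centered on the residue
        " nucleus TEXT NOT NULL,"  # atom name, e.g. 'CA'
        " ss TEXT NOT NULL,"       # secondary structure label
        " origin TEXT NOT NULL,"   # 'exp' or 'theo'
        " cs REAL NOT NULL)"       # chemical shift in ppm
    )
    conn.execute("CREATE INDEX idx_shift_hpp ON shift (hpp, nucleus)")
    return conn


def query_hpp(conn: sqlite3.Connection, hpp: str,
              nucleus: str) -> List[Tuple[str, int, float]]:
    """Count and average CSs per origin for one heptapeptide and nucleus."""
    return conn.execute(
        "SELECT origin, COUNT(*), AVG(cs) FROM shift"
        " WHERE hpp = ? AND nucleus = ?"
        " GROUP BY origin ORDER BY origin",
        (hpp, nucleus),
    ).fetchall()
```

After loading rows with `conn.executemany("INSERT INTO shift VALUES (?, ?, ?, ?, ?)", rows)`, `query_hpp` returns one tuple per CS origin holding the number of matching CSs and their mean for the requested heptapeptide and nucleus.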
References:
1. Seavey, B. R.; Farr, E. A.; Westler, W. M.; Markley, J. L., "A relational database for sequence-specific protein NMR data." J. Biomol. NMR 1991, 1, (3), 217-36.
2. Berman, H. M.; Battistuz, T.; Bhat, T. N.; Bluhm, W. F.; Bourne, P. E.; Burkhardt, K.; Feng, Z.; Gilliland, G. L.; Iype, L.; Jain, S.; Fagan, P.; Marvin, J.; Padilla, D.; Ravichandran, V.; Schneider, B.; Thanki, N.; Weissig, H.; Westbrook, J. D.; Zardecki, C., "The Protein Data Bank." Acta Crystallogr. D Biol. Crystallogr. 2002, 58, (Pt 6 No 1), 899-907.
3. Zhang, H.; Neal, S.; Wishart, D. S., "RefDB: a database of uniformly referenced protein chemical shifts." J. Biomol. NMR 2003, 25, (3), 173-95.
4. Neal, S.; Nip, A. M.; Zhang, H.; Wishart, D. S., "Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts." J. Biomol. NMR 2003, 26, (3), 215-40.
5. Wang, G.; Dunbrack, R. L., Jr., "PISCES: a protein sequence culling server." Bioinformatics 2003, 19, (12), 1589-91.
6. Murzin, A. G.; Brenner, S. E.; Hubbard, T.; Chothia, C., "SCOP: a structural classification of proteins database for the investigation of sequences and structures." J. Mol. Biol. 1995, 247, (4), 536-40.