

     The Chemical Shift Correlated Protein Database (CSCPDB v.001) contains 18,379,788 chemical shifts (CSs) correlated with the secondary structures (α, β, turn, and unassigned) of 5,014 proteins. To capture the effect of sequence on CSs, CSs can be queried by heptapeptide (HPP), which serves as the elementary protein sequence unit. The CSs have two origins: experimental and theoretical.

     Experimental CSs were downloaded from the BioMagResBank (BMRB)(1) (ftp://ftp.bmrb.wisc.edu/). We aligned each BMRB sequence pairwise with the corresponding Protein Data Bank (PDB)(2) sequence and entered only those CSs that could be matched to a PDB structure entry. BMRB entries containing fewer than 50 CSs were excluded. For each set of CSs sharing the same nucleus of the same amino acid (for example, the CA of alanine) and the same secondary structure assignment (taken from the PDB file), we calculated the average and the standard deviation (stdev). The experimental CSs in the database are those within five stdev of this average. We also corrected some outlier CSs, possibly caused by referencing errors in BMRB, in a manner similar to that recently proposed for the RefDB database (3).

     Theoretical CSs were calculated from PDB X-ray structures with the SHIFTX program (4), which uses structural coordinates, empirically derived CS hypersurfaces, and semi-classical corrections for ring currents, solvent effects, and hydrogen bonding. The average and stdev of the theoretical CSs were calculated for the same subsets as described for the experimental CSs. For both types of CSs, the values for the C-terminal residue were excluded.
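     The experimental-CS filtering steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the CSCPDB code: the record layout (dictionaries with the hypothetical keys entry, residue, nucleus, ss, shift and c_terminal) is invented for the example, the sample standard deviation is assumed, and the order of the steps (entry-size check first, then C-terminal removal, then the five-stdev cut) is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_SHIFTS_PER_ENTRY = 50  # BMRB entries with fewer CSs are excluded
MAX_SIGMAS = 5.0           # keep CSs within five stdev of their group average


def filter_shifts(records):
    """Return the records that pass the entry-size, C-terminus and five-stdev filters.

    Each record is a dict with the hypothetical keys
    'entry', 'residue', 'nucleus', 'ss', 'shift' and 'c_terminal'.
    """
    # Count CSs per BMRB entry (assumed: counted before any other filtering).
    per_entry = defaultdict(int)
    for r in records:
        per_entry[r["entry"]] += 1

    # Drop small entries and C-terminal residues.
    candidates = [
        r for r in records
        if per_entry[r["entry"]] >= MIN_SHIFTS_PER_ENTRY and not r["c_terminal"]
    ]

    # Group by (amino acid, nucleus, secondary structure), e.g. ('ALA', 'CA', 'alpha').
    groups = defaultdict(list)
    for r in candidates:
        groups[(r["residue"], r["nucleus"], r["ss"])].append(r)

    kept = []
    for members in groups.values():
        values = [r["shift"] for r in members]
        if len(values) < 2:
            # A sample stdev needs at least two values; keep singletons unfiltered.
            kept.extend(members)
            continue
        mu, sd = mean(values), stdev(values)
        kept.extend(r for r in members if abs(r["shift"] - mu) <= MAX_SIGMAS * sd)
    return kept
```

Note that the cut is a single pass: an extreme outlier inflates the stdev of its own group, so only values far outside the bulk of a reasonably large group are removed.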

     For the protein structure entries used in the theoretical CS calculations, we used the PISCES(5) program to cull PDB sequences, reducing redundancy while maintaining maximum structural diversity, with the following criteria:

  • Sequence percentage identity: ≤ 25%
  • Resolution: < 3.0 Å
  • R-factor: < 0.3
  • Sequence length: > 20 residues
  • Non-X-ray entries included; CA-only entries excluded; PDB culled by chain
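     PISCES performs the culling itself, including the pairwise ≤ 25% identity comparison, so only the per-chain thresholds above can be illustrated. The following is a minimal sketch assuming hypothetical chain metadata, and assuming that non-X-ray entries, which have no resolution or R-factor, bypass those two checks:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chain:
    chain_id: str                # hypothetical identifier, e.g. PDB ID plus chain letter
    length: int                  # number of residues
    resolution: Optional[float]  # in angstroms; None for non-X-ray entries (assumption)
    r_factor: Optional[float]    # None for non-X-ray entries (assumption)
    ca_only: bool                # True if the model contains only CA atoms


def passes_criteria(c: Chain) -> bool:
    """Apply the per-chain thresholds; the identity culling is left to PISCES."""
    if c.ca_only or c.length <= 20:
        return False
    if c.resolution is None:
        # Non-X-ray entries are included without resolution/R-factor checks.
        return True
    return c.resolution < 3.0 and c.r_factor is not None and c.r_factor < 0.3
```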

     We annotated the proteins with:

  • Calculated CSs
  • Primary and secondary protein structure
  • PDB connectivity and heteroatom annotation
  • DSSP-calculated backbone hydrogen-bonding energies and solvent accessibility
  • SCOP(6) homology family classification
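     One way to hold the annotations listed above for a single residue is a small record type. The field names, units and the way connectivity and heteroatoms are represented are illustrative assumptions, not the CSCPDB schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResidueAnnotation:
    """One annotated residue; all field names are hypothetical."""
    chain_id: str                  # PDB ID plus chain identifier
    seq_index: int                 # position in the chain sequence
    residue: str                   # primary structure: three-letter code, e.g. 'ALA'
    ss_class: str                  # secondary structure: 'alpha', 'beta', 'turn' or 'unassigned'
    calc_shifts: Dict[str, float] = field(default_factory=dict)  # nucleus -> calculated CS (ppm)
    bonded_to: List[int] = field(default_factory=list)           # connectivity partners (seq indices)
    heteroatoms: List[str] = field(default_factory=list)         # nearby HETATM group names
    hbond_energies: List[float] = field(default_factory=list)    # DSSP backbone H-bond energies (kcal/mol)
    accessibility: Optional[float] = None  # DSSP solvent accessibility (square angstroms)
    scop_family: Optional[str] = None      # SCOP family classification
```

Using `default_factory` gives every record its own lists and dictionaries, so appending to one residue's annotations never changes another's.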

     We re-referenced the experimental CSs against the averages of the corresponding theoretical CSs. CSCPDB contains both the re-referenced experimental CSs and the theoretical CSs.
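     The text does not spell out exactly how the theoretical averages are used for referencing. One plausible scheme, shown here purely as an assumption, computes for each BMRB entry and each nucleus type the mean difference between the experimental CSs and the theoretical averages of their (amino acid, nucleus, secondary structure) groups, then subtracts that offset from every CS of that nucleus:

```python
from collections import defaultdict
from statistics import mean


def rereference(entry_shifts, theo_avg):
    """Apply a per-nucleus offset to one BMRB entry (assumed scheme).

    entry_shifts: list of (residue, nucleus, ss, shift) tuples for one entry.
    theo_avg: dict mapping (residue, nucleus, ss) -> average theoretical CS (ppm).
    """
    # Collect experimental-minus-theoretical differences per nucleus type.
    diffs = defaultdict(list)
    for residue, nucleus, ss, shift in entry_shifts:
        key = (residue, nucleus, ss)
        if key in theo_avg:
            diffs[nucleus].append(shift - theo_avg[key])

    offsets = {nucleus: mean(d) for nucleus, d in diffs.items()}

    # Nuclei with no theoretical counterpart are left unchanged (offset 0).
    return [(res, nuc, ss, shift - offsets.get(nuc, 0.0))
            for res, nuc, ss, shift in entry_shifts]
```

Averaging over all matched residues in an entry makes the offset robust to scatter in individual CSs, which is why a systematic referencing error shows up as a consistent shift of the whole nucleus type.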

     The CSCPDB database was implemented in the open-source MySQL 4.0 (http://www.mysql.com/). The procedures for extracting and correlating information from the PDB, BMRB, SCOP, and DSSP databases, as well as the web interface, were built with the Borland Delphi 7 Professional package (www.borland.com).
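     CSCPDB itself runs on MySQL; the sketch below uses Python's built-in sqlite3 module instead so that it is self-contained. It illustrates one way a heptapeptide query could be served: each CS row stores the HPP centred on its residue, so a query becomes an indexed equality lookup. The table and column names, and the choice of centring the HPP on the residue, are hypothetical.

```python
import sqlite3


def hpp_at(sequence, i):
    """Return the heptapeptide centred on residue i (0-based), or None within 3 residues of a terminus."""
    if 3 <= i < len(sequence) - 3:
        return sequence[i - 3:i + 4]
    return None


def build_db():
    conn = sqlite3.connect(":memory:")
    # One row per CS; origin is 'experimental' or 'theoretical'.
    conn.execute("""CREATE TABLE cs (
        chain TEXT, seq_index INTEGER, hpp TEXT, nucleus TEXT,
        ss TEXT, origin TEXT, value REAL)""")
    conn.execute("CREATE INDEX cs_hpp ON cs (hpp, nucleus)")
    return conn


def add_shift(conn, chain, sequence, i, nucleus, ss, origin, value):
    # Residues near the termini get a NULL hpp and are never matched by query_hpp.
    conn.execute("INSERT INTO cs VALUES (?, ?, ?, ?, ?, ?, ?)",
                 (chain, i, hpp_at(sequence, i), nucleus, ss, origin, value))


def query_hpp(conn, hpp, nucleus):
    """All CSs of one nucleus at the central residue of matching heptapeptides."""
    cur = conn.execute(
        "SELECT origin, ss, value FROM cs WHERE hpp = ? AND nucleus = ? "
        "ORDER BY origin, value", (hpp, nucleus))
    return cur.fetchall()
```

For example, residue 4 (0-based) of the hypothetical sequence MKTAYIAKQR has the HPP KTAYIAK, so its experimental and theoretical CA shifts are returned together by `query_hpp(conn, "KTAYIAK", "CA")`.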

References:
1. Seavey, B. R.; Farr, E. A.; Westler, W. M.; Markley, J. L., "A relational database for sequence-specific protein NMR data." J. Biomol. NMR 1991, 1, (3), 217-36.
2. Berman, H. M.; Battistuz, T.; Bhat, T. N.; Bluhm, W. F.; Bourne, P. E.; Burkhardt, K.; Feng, Z.; Gilliland, G. L.; Iype, L.; Jain, S.; Fagan, P.; Marvin, J.; Padilla, D.; Ravichandran, V.; Schneider, B.; Thanki, N.; Weissig, H.; Westbrook, J. D.; Zardecki, C., "The Protein Data Bank." Acta Crystallogr. D Biol. Crystallogr. 2002, 58, (Pt 6 No 1), 899-907.
3. Zhang, H.; Neal, S.; Wishart, D. S., "RefDB: a database of uniformly referenced protein chemical shifts." J. Biomol. NMR 2003, 25, (3), 173-95.
4. Neal, S.; Nip, A. M.; Zhang, H.; Wishart, D. S., "Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts." J. Biomol. NMR 2003, 26, (3), 215-40.
5. Wang, G.; Dunbrack, R. L., Jr., "PISCES: a protein sequence culling server." Bioinformatics 2003, 19, (12), 1589-91.
6. Murzin, A. G.; Brenner, S. E.; Hubbard, T.; Chothia, C., "SCOP: a structural classification of proteins database for the investigation of sequences and structures." J. Mol. Biol. 1995, 247, (4), 536-40.

 


CSCPDB Copyright © 2005