PDB {SMRUCC.genomics.Data.RCSB.PDB} .NET clr documentation

PDB

Description

The RCSB PDB file format is a standardized text-based format used to represent 3D structural data of biological macromolecules, such as proteins, nucleic acids, and viruses. Managed by the Research Collaboratory for Structural Bioinformatics (RCSB), it serves as the primary format for entries in the Protein Data Bank (PDB), a global repository for experimentally determined structures. Below is a detailed introduction:


### Key Features

1. Text-Based Structure: Plain text file (`.pdb` extension) with a fixed-column format, meaning data is organized into specific columns for consistency. Each line begins with a record type (e.g., `ATOM`, `HETATM`, `HEADER`) that defines the data it contains.
2. Core Components:
   - Atomic Coordinates: Stored in `ATOM` (standard residues) and `HETATM` (heteroatoms, e.g., water, ligands) records.
   - Metadata: Includes details like the title (`TITLE`), experimental method (`EXPDTA`), authors (`AUTHOR`), and biological source (`SOURCE`).
   - Sequence Information: Provided in `SEQRES` lines.
   - Secondary Structure: Annotated in `HELIX`, `SHEET`, and `TURN` records.
   - Connectivity: Bonds between atoms are listed in `CONECT` lines.
   - Crystallographic Data: Unit cell parameters (`CRYST1`), symmetry operations, and resolution.
3. Example ATOM/HETATM Line:

    ATOM   2301  CA  SER A 301      26.417  24.105  34.560  1.00 30.97           C  
    HETATM 9101  O   HOH A 910      10.500  20.100  30.500  1.00 25.00           O  

- Columns 1-6: Record type (e.g., `ATOM`).
- Columns 7-11: Atom serial number.
- Columns 13-16: Atom name (e.g., `CA` for alpha carbon).
- Columns 18-20: Residue name (e.g., `SER` for serine).
- Column 22: Chain identifier (e.g., `A`).
- Columns 23-26: Residue number.
- Columns 31-54: X, Y, Z coordinates (eight columns each).
- Columns 55-60: Occupancy.
- Columns 61-66: Temperature factor (B-factor).
- Columns 77-78: Element symbol (e.g., `C`, `O`).
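
The column layout maps directly onto string slices. The following Python sketch is illustrative only and is not part of the `SMRUCC.genomics` API; the `AtomRecord` class and `parse_atom_line` function are hypothetical names, and the slice offsets assume the standard fixed-column layout of `ATOM`/`HETATM` records (occupancy in columns 55-60, B-factor in columns 61-66).

```python
from dataclasses import dataclass


@dataclass
class AtomRecord:
    """Hypothetical container for the fields of one ATOM/HETATM line."""
    record: str       # columns 1-6
    serial: int       # columns 7-11
    name: str         # columns 13-16
    res_name: str     # columns 18-20
    chain_id: str     # column 22
    res_seq: int      # columns 23-26
    x: float          # columns 31-38
    y: float          # columns 39-46
    z: float          # columns 47-54
    occupancy: float  # columns 55-60
    b_factor: float   # columns 61-66
    element: str      # columns 77-78


def parse_atom_line(line: str) -> AtomRecord:
    """Slice one ATOM/HETATM line; 1-based column N is Python index N-1."""
    line = line.ljust(80)  # pad short lines so every slice stays in range
    return AtomRecord(
        record=line[0:6].strip(),
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain_id=line[21],
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


example = "ATOM   2301  CA  SER A 301      26.417  24.105  34.560  1.00 30.97           C  "
atom = parse_atom_line(example)
print(atom.name, atom.res_name, atom.chain_id, atom.res_seq, (atom.x, atom.y, atom.z))
```

Slicing by position rather than splitting on whitespace matters here: adjacent fields may touch (for example, a four-digit residue number directly after the chain identifier), so whitespace splitting can merge or shift fields.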

### Common Record Types

| Record | Description |
|--------|-------------|
| `HEADER` | Molecular type, deposition date, and PDB ID (e.g., `1ABC`). |
| `TITLE` | Title of the structure. |
| `COMPND` | Molecular components in the entry (e.g., protein, ligand, ion). |
| `SEQRES` | Amino acid/nucleotide sequence of the macromolecule. |
| `ATOM` | 3D coordinates of standard residues (e.g., amino acids in a protein). |
| `HETATM` | Coordinates of heteroatoms (non-standard residues: ligands, water, ions). |
| `HELIX` | Details of α-helices. |
| `SHEET` | Details of β-sheets. |
| `CONECT` | Bonds between atoms not covered by standard residue templates. |
| `REMARK` | Annotations, experimental details, or warnings. |
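
Because every line starts with its record name in the first six columns, a file can be summarized by record type with very little code. This is a minimal Python sketch, not the library's parser; `count_record_types` is a hypothetical helper name, and the sample lines are taken from the illustrative snippet in this page rather than from a real entry.

```python
from collections import Counter


def count_record_types(pdb_text: str) -> Counter:
    """Tally lines by record name, i.e. the first six columns of each line."""
    counts = Counter()
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record:  # ignore blank lines
            counts[record] += 1
    return counts


sample = "\n".join([
    "HEADER    HYDROLASE",
    "TITLE     CRYSTAL STRUCTURE OF EXAMPLE ENZYME",
    "ATOM      1  N   SER A   1      10.000  20.000  30.000  1.00 25.00           N",
    "ATOM      2  CA  SER A   1      11.000  21.000  31.000  1.00 26.00           C",
    "HETATM 1001  O   HOH A1001      40.000  50.000  60.000  1.00 30.00           O",
])
record_counts = count_record_types(sample)
print(record_counts)
```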

### Limitations

- Column Width Restrictions: Legacy format limits data fields (e.g., residue numbers up to 9999, atom serial numbers up to 99,999).
- Sparse Connectivity Data: Bonds are often inferred rather than explicitly listed.
- No Support for Large Structures: Superseded by the mmCIF/PDBx format (more flexible, supports larger datasets).
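
The column-width limits follow from the field sizes: a five-column serial field holds at most 99,999 and a four-column residue field at most 9999. As a rough illustration, a check like the one below could decide whether a structure still fits the legacy format. The function name and the sample counts are made up for this sketch.

```python
# Largest plain decimal values the fixed-width fields can hold.
MAX_ATOM_SERIAL = 99_999     # columns 7-11, five digits
MAX_RESIDUE_NUMBER = 9_999   # columns 23-26, four digits


def fits_legacy_pdb(atom_count: int, max_residue_number: int) -> bool:
    """Return True if serial and residue numbers fit the legacy column widths."""
    return (atom_count <= MAX_ATOM_SERIAL
            and max_residue_number <= MAX_RESIDUE_NUMBER)


small_ok = fits_legacy_pdb(atom_count=4_500, max_residue_number=321)
large_ok = fits_legacy_pdb(atom_count=150_000, max_residue_number=321)
print(small_ok, large_ok)
```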


### Modernization: mmCIF/PDBx Format

The PDB now prioritizes the mmCIF format (Macromolecular Crystallographic Information File), which uses a flexible, key-value-based structure without column limits. Legacy PDB files are automatically converted to mmCIF for archiving.
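
To make the key-value idea concrete, the sketch below reads single-line `_category.item value` pairs from a small mmCIF-style text. It is deliberately naive and assumes a simplified input: `loop_` tables are skipped, multi-line values are not handled, and `parse_simple_cif_items` is a hypothetical name. The sample text is hand-written for illustration, not a real entry.

```python
def parse_simple_cif_items(text: str) -> dict:
    """Collect single-line `_category.item value` pairs; loop_ blocks are skipped."""
    items = {}
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            in_loop = False  # treat a blank line or `#` separator as the end of a loop
            continue
        if line == "loop_":
            in_loop = True
            continue
        if in_loop or not line.startswith("_"):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line as the value
        if len(parts) == 2:
            key, value = parts
            items[key] = value.strip().strip("'\"")
    return items


sample_cif = """\
data_1ABC
#
_entry.id 1ABC
_struct.title 'CRYSTAL STRUCTURE OF EXAMPLE ENZYME'
#
loop_
_atom_site.group_PDB
_atom_site.id
ATOM 1
ATOM 2
#
"""
cif_items = parse_simple_cif_items(sample_cif)
print(cif_items)
```

Values are located by name rather than by column position, which is why the format has no fixed-width limits. For real files, a dedicated mmCIF parser is the safer choice.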


### Tools for Viewing/Editing

- Visualization: PyMOL, Chimera, VMD, RCSB PDB Viewer.
- Analysis: BioPython, MDAnalysis.
- Database Access: RCSB PDB website (search, download, and explore entries).


### Example PDB File Snippet

 HEADER    HYDROLASE                             15-JUL-98   1ABC              
 TITLE     CRYSTAL STRUCTURE OF EXAMPLE ENZYME                                 
 COMPND    MOL_ID: 1;                                                           
 COMPND   2 MOLECULE: EXAMPLE ENZYME; CHAIN: A;                                 
 SEQRES   1 A  321  SER GLY LEU ARG TYR ...                                      
 ATOM      1  N   SER A   1      10.000  20.000  30.000  1.00 25.00           N  
 ATOM      2  CA  SER A   1      11.000  21.000  31.000  1.00 26.00           C  
 HETATM 1001  O   HOH A1001      40.000  50.000  60.000  1.00 30.00           O  
 HELIX    1  ALA A 10 THR A 20  1                                            
 CONECT 1001 1002
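
The `CONECT` line at the end of the snippet can be read the same way as the coordinate records. In the sketch below, which is an illustration and not the library's own reader, `parse_conect` is a hypothetical helper. It assumes the standard layout in which columns 7-11 hold the source atom serial and columns 12-31 hold up to four bonded serials in five-column fields.

```python
from typing import List, Tuple


def parse_conect(line: str) -> Tuple[int, List[int]]:
    """Return (atom serial, bonded atom serials) from one CONECT line."""
    line = line.ljust(31)  # pad so all four bonded-atom fields can be sliced
    source = int(line[6:11])
    bonded = []
    for start in range(11, 31, 5):  # columns 12-16, 17-21, 22-26, 27-31
        field = line[start:start + 5].strip()
        if field:
            bonded.append(int(field))
    return source, bonded


bond = parse_conect("CONECT 1001 1002")
print(bond)  # (1001, [1002])
```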

### Use Cases

- Studying protein-ligand interactions.
- Analyzing enzyme active sites.
- Visualizing mutations in diseases.
- Teaching structural biology concepts.

For more details, visit the RCSB PDB and explore entries like 1ATP.

A PDB file holds the structural data of a protein complex; a single file may include data for multiple proteins and metabolite compounds.

Declare

            
# namespace SMRUCC.genomics.Data.RCSB.PDB
export class PDB {
   ANISOU: ANISOU;
   # Populate out the multiple structure models inside current pdb data file
   AtomStructures: iterates(Atom);
   Author: Author;
   CAVEAT: CAVEAT;
   CISPEP: CISPEP;
   Compound: Compound;
   Conect: CONECT;
   crystal1: CRYST1;
   DbRef: DbReference;
   Experiment: ExperimentData;
   Formula: Formula;
   Header: Header;
   Helix: Helix;
   Het: Het;
   HetName: HetName;
   HETSYN: HETSYN;
   Journal: Journal;
   Keywords: Keywords;
   Links: Link;
   Master: Master;
   Matrix1: MTRIX123;
   Matrix2: MTRIX123;
   Matrix3: MTRIX123;
   MaxSpace: Point3D;
   MDLTYP: MDLTYP;
   MinSpace: Point3D;
   MODRES: MODRES;
   # number of models inside current pdb file
   NUMMDL: NUMMDL;
   Origin1: ORIGX123;
   Origin2: ORIGX123;
   Origin3: ORIGX123;
   Remark: Remark;
   Revisions: Revision;
   Scale1: SCALE123;
   Scale2: SCALE123;
   Scale3: SCALE123;
   seqadv: SEQADV;
   Sequence: Sequence;
   Sheet: Sheet;
   SIGATM: SIGATM;
   SIGUIJ: SIGUIJ;
   Site: Site;
   Source: Source;
   # the input data text of this pdb object
   SourceText: string;
   SPLIT: SPLIT;
   SPRSDE: SPRSDE;
   SSBOND: SSBOND;
   Title: Title;
}
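
The `AtomStructures` and `NUMMDL` members refer to multiple structure models inside one file. In the PDB text format, models are delimited by `MODEL` and `ENDMDL` records. The following Python sketch shows that grouping in isolation. It is not the `SMRUCC.genomics` implementation, `split_models` is a hypothetical name, and the coordinates of the second model are invented for the example.

```python
from typing import List


def split_models(pdb_text: str) -> List[List[str]]:
    """Group ATOM/HETATM lines by MODEL ... ENDMDL blocks.

    A file without MODEL records yields a single implicit model.
    """
    models = []
    current = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []            # start collecting a new model
        elif record == "ENDMDL":
            models.append(current)  # close the current model
            current = []
        elif record in ("ATOM", "HETATM"):
            current.append(line)
    if current:                     # coordinates outside any MODEL block
        models.append(current)
    return models


two_models = "\n".join([
    "MODEL        1",
    "ATOM      1  N   SER A   1      10.000  20.000  30.000  1.00 25.00           N",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   SER A   1      10.500  20.500  30.500  1.00 25.00           N",
    "ENDMDL",
])
models = split_models(two_models)
print(len(models), [len(m) for m in models])
```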

        

.NET clr type reference tree

  1. use by property member ANISOU: ANISOU
  2. use by property member AtomStructures: iterates(Atom)
  3. use by property member Author: Author
  4. use by property member CAVEAT: CAVEAT
  5. use by property member CISPEP: CISPEP
  6. use by property member Compound: Compound
  7. use by property member Conect: CONECT
  8. use by property member crystal1: CRYST1
  9. use by property member DbRef: DbReference
  10. use by property member Experiment: ExperimentData
  11. use by property member Formula: Formula
  12. use by property member Header: Header
  13. use by property member Helix: Helix
  14. use by property member Het: Het
  15. use by property member HetName: HetName
  16. use by property member HETSYN: HETSYN
  17. use by property member Journal: Journal
  18. use by property member Keywords: Keywords
  19. use by property member Links: Link
  20. use by property member Master: Master
  21. use by property member Matrix1: MTRIX123
  22. use by property member Matrix2: MTRIX123
  23. use by property member Matrix3: MTRIX123
  24. use by property member MaxSpace: Point3D
  25. use by property member MDLTYP: MDLTYP
  26. use by property member MinSpace: Point3D
  27. use by property member MODRES: MODRES
  28. use by property member NUMMDL: NUMMDL
  29. use by property member Origin1: ORIGX123
  30. use by property member Origin2: ORIGX123
  31. use by property member Origin3: ORIGX123
  32. use by property member Remark: Remark
  33. use by property member Revisions: Revision
  34. use by property member Scale1: SCALE123
  35. use by property member Scale2: SCALE123
  36. use by property member Scale3: SCALE123
  37. use by property member seqadv: SEQADV
  38. use by property member Sequence: Sequence
  39. use by property member Sheet: Sheet
  40. use by property member SIGATM: SIGATM
  41. use by property member SIGUIJ: SIGUIJ
  42. use by property member Site: Site
  43. use by property member Source: Source
  44. use by property member SPLIT: SPLIT
  45. use by property member SPRSDE: SPRSDE
  46. use by property member SSBOND: SSBOND
  47. use by property member Title: Title
