Benchmark_GASS module - README

This module includes two separate experiments, NOS and TRYPSIN_like, as described below.

====== NOS experiment ======

data/ --- PDB lists, extracted positions of train and test sites, and processed files for training and evaluating the NOS models
    PDB lists of train and test datasets:
        test_neg_PDBs.txt
        train_neg_PDBs.txt
        train_pos_PDBs.txt
    ptf/ : Extracted positions of train and test sites
    numpy/ and pytables/ : numpy arrays and pytables of extracted local protein boxes in voxel representation

code/ ---
    Extract all sites of a defined amino acid type from PDB structures
    Extract test site boxes
    Store extracted examples as pytables
    Define and train the 3DCNN models
    Evaluate test site probability scores using the trained 3DCNN models
    Record a representative probability score for each site and generate summary results files for NOS site detection
    Evaluate AUC scores for classifying positive NOS structures against random structures, at the residue site level and the structure level respectively

results/ ---
    detect_prob/ : Predicted probability scores of test sites
    detect_results/ : Summary of detection results using probability scores from detect_prob
        detection_summary_true_sites.txt
            Final summary of detection results. Each row in this file corresponds to a true site annotated in CSA. The final column records our detection result for the site: "True" represents a successful detection and "False" represents an unsuccessful detection.
        detection_summary_fp_sites_neg_pdb.txt
        detection_summary_fp_sites_pos_pdb.txt
    weights/ : Trained model weights

====== TRYPSIN_like ======

data/ --- PDB files, extracted positions of test sites (ptf files), and processed files for evaluating the TRYPSIN_like models
    pdb/: PDB files of TRYPSIN_like enzymes
    ptf/: Extracted positions of test sites
    numpy/: numpy arrays and pytables of extracted local protein boxes in voxel representation
    SCOP PDB list and CSA annotation of TRYPSIN_like families:
        SCOP_PDB.txt
        csa_pdb_HIS_SER_annotation.txt
        SCOP_PDB_csa_key_res.txt

code/
    Extract test site boxes
    Evaluate test site probability scores using the trained 3DCNN models
    Record a representative probability score for each site and generate summary results files for TRYPSIN_like site detection

results/
    Summary files of performance in detecting TRYPSIN_like sites in 1,447 enzymes.
    detected_TRYPSIN_CSA_site_summary.txt
        Final summary of detection results. Each row in this file corresponds to a true site annotated in CSA. The final column records our detection result for the site: "True" represents a successful detection and "False" represents an unsuccessful detection.
    detected_TRYPSIN_nonCSA_site_summary.txt
        Sites detected by our models but not annotated in CSA. Literature confirmed that these are true TRYPSIN_like sites.
    /post_prob: Predicted probability scores of test sites
    /weights: Trained model weights of the TRYPSIN_SER and TRYPSIN_HIS models, copied from the PROSITE module
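
Reading the detection summary files (illustrative sketch)

The summary files described above store one true CSA site per row, with "True" or "False" in the final column. As a rough illustration of how such a file could be tallied, the Python sketch below counts detected sites. It assumes the columns are separated by whitespace, which this README does not state. The function name detection_rate and the example rows are hypothetical and are not part of this module.

```python
import io


def detection_rate(lines):
    """Count rows whose final column is "True" or "False".

    Returns a (detected, total) tuple. Assumes whitespace-separated
    columns; the real file layout may differ.
    """
    detected = 0
    total = 0
    for line in lines:
        fields = line.split()
        # Skip blank lines and any row (e.g. a header) whose last
        # column is not a True/False detection flag.
        if not fields or fields[-1] not in ("True", "False"):
            continue
        total += 1
        if fields[-1] == "True":
            detected += 1
    return detected, total


# Hypothetical rows for illustration only; the real columns are not
# documented in this README.
example = io.StringIO(
    "1abc A 57 HIS True\n"
    "1abc A 195 SER False\n"
    "2xyz B 64 HIS True\n"
)
detected, total = detection_rate(example)
print(f"{detected}/{total} sites detected")
```

To apply the same function to a real file, pass an open file handle, for example open("detection_summary_true_sites.txt"), in place of the in-memory example.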