Benchmark_GASS module - README

This module includes two separate experiments, NOS and TRYPSIN_like, as described below.

====== NOS experiment ======

data/ --- PDB lists, extracted positions of train and test sites, and processed files for training and evaluating the NOS models

    PDB lists of train and test datasets:
        test_neg_PDBs.txt
        train_neg_PDBs.txt
        train_pos_PDBs.txt
    ptf/ : Extracted positions of train and test sites
    numpy/ and pytables/ : numpy arrays and pytables of extracted local protein boxes in voxel representation

code/ ---

    Extract all sites of the defined amino acid type from PDB structures:
        get_all_res_for_defined_type_from_pdb_NOS_test.py
    Extract test site boxes:
        cut_box_site_NOS_detect.py
    Store extracted examples as pytables:
        store_pytable_NOS.py
    Define and train the 3DCNN models:
        FSCNN_sulfur_NOS.py
        layers.py
    Evaluate test site probability scores using the trained 3DCNN models:
        post_train_eval_FSCNN_NOS_detect.py
    Record a representative probability score for each site and generate summary result files for NOS site detection:
        get_NOS_fold_pos_pdb.py
    Evaluate AUC scores for classifying positive NOS structures against random structures, at the residue-site level and at the structure level:
        AUC_ROC.py

results/ ---

    detect_prob/ : Predicted probability scores of test sites
    detect_results/ : Summary of detection results using the probability scores from detect_prob
        detection_summary_true_sites.txt
            Final summary of detection results. Each row corresponds to a true site annotated in CSA, and the final column records our detection result for the site: "True" represents a successful detection and "False" represents an unsuccessful detection.
        detection_summary_fp_sites_neg_pdb.txt
        detection_summary_fp_sites_pos_pdb.txt
    weights/ : Trained model weights

====== TRYPSIN_like ======

data/ --- PDB files, extracted positions of test sites (ptf files), and processed files for evaluating the TRYPSIN_like models

    pdb/ : PDB files of TRYPSIN_like enzymes
    ptf/ : Extracted positions of test sites
    numpy/ : numpy arrays and pytables of extracted local protein boxes in voxel representation
    SCOP PDB list and CSA annotation of TRYPSIN_like families:
        SCOP_PDB.txt
        csa_pdb_HIS_SER_annotation.txt
        SCOP_PDB_csa_key_res.txt

code/ ---

    Extract test site boxes:
        cut_box_site_TRYPSIN.py
    Evaluate test site probability scores using the trained 3DCNN models:
        layers.py
        eval_prob_score_numpy.py
    Record a representative probability score for each site and generate summary result files for TRYPSIN_like site detection:
        eval_TRYPSIN_detection.py

results/ --- Summary files of the performance in detecting TRYPSIN_like sites in 1,447 enzymes

    detected_TRYPSIN_CSA_site_summary.txt
        Final summary of detection results. Each row corresponds to a true site annotated in CSA, and the final column records our detection result for the site: "True" represents a successful detection and "False" represents an unsuccessful detection.
    detected_TRYPSIN_nonCSA_site_summary.txt
        Sites detected by our models but not annotated in CSA. Literature confirmed that these are true TRYPSIN_like sites.
    post_prob/ : Predicted probability scores of test sites
    weights/ : Trained model weights of the TRYPSIN_SER and TRYPSIN_HIS models, copied from the PROSITE module
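
The detection summary files described above store one annotated site per row, with "True" or "False" in the final column. As a rough illustration of how such a file could be tallied, here is a minimal Python sketch. It is not part of the module: the function name detection_rate is hypothetical, and it assumes the columns are whitespace-delimited, which this README does not state.

```python
from typing import Iterable, Tuple


def detection_rate(rows: Iterable[str]) -> Tuple[int, int]:
    """Return (detected, total) over rows whose final column is True or False."""
    detected = 0
    total = 0
    for row in rows:
        fields = row.split()  # assumption: whitespace-delimited columns
        if not fields or fields[-1] not in ("True", "False"):
            continue  # skip blank lines and rows without a True/False label
        total += 1
        if fields[-1] == "True":
            detected += 1
    return detected, total


if __name__ == "__main__":
    # Placeholder rows for illustration only; not taken from the real results files.
    example_rows = [
        "pdb_a site_1 True",
        "pdb_b site_2 False",
        "pdb_c site_3 True",
    ]
    detected, total = detection_rate(example_rows)
    print(f"{detected}/{total} annotated sites detected")
```

To apply it to a real summary, pass an open file handle for detection_summary_true_sites.txt or detected_TRYPSIN_CSA_site_summary.txt in place of example_rows. If the files turn out to be tab- or comma-separated with spaces inside fields, the split call would need adjusting.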