PROSITE module - README

data/ --- Datasets for training and evaluating the 10 PROSITE functional families.

data_process/ --- Scripts that generate the training datasets:
    cut_box_site_noNMR_std_sulfur.py
    cut_box_site_fn_curated.py
    cut_box_site_fp_curated.py
    store_pytable.py

models/ --- Model definitions and training scripts:
    Voxel_3DCNN.py
    Voxel_SVM.py
    FEATURE_SVM.py
    FEATURE_1DCNN.py
    layers.py
    utils.py

evaluate/ --- Evaluation scripts, run after training:

    Step (1): PROSITE true positive and true negative dataset. Evaluate the probability score of each test site using the model whose test fold contains that site in 5-fold cross-validation. Only the 3DCNN and 1DCNN have scripts for this step; the SVM models evaluate and save probability estimates for the test examples directly in Voxel_SVM.py and FEATURE_SVM.py.
        eval_tp_tn_3DCNN.py
        eval_tp_tn_1DCNN.py

    Step (2): Using the generated probability scores, evaluate the precision and recall for each individual functional site at a user-specified probability threshold, and find the probability threshold that yields a precision of 0.99.
        PR_tp_tn_3DCNN.py
        PR_tp_tn_1DCNN.py
        PR_tp_tn_FEATURE_SVM.py
        PR_tp_tn_Voxel_SVM.py

    Step (3): Summarize the means and standard deviations of the precision and recall values for all functional sites at the probability thresholds determined in Step (2).
        summarize_tp_fn_fold_PR_SD_CNN.py
        summarize_tp_fn_fold_PR_SD_SVM.py

    Step (4): Evaluate the probability score of each PROSITE false negative and PROSITE false positive site using the five trained fold models.
        eval_fn_1DCNN.py
        eval_fn_3DCNN.py
        eval_fn_FEATURE_SVM.py
        eval_fn_Voxel_SVM.py
        eval_fp_1DCNN.py
        eval_fp_3DCNN.py
        eval_fp_FEATURE_SVM.py
        eval_fp_Voxel_SVM.py

    Step (5): Using the generated probability scores, evaluate the performance on each functional family at the threshold determined in Step (2).
        PR_fn_1DCNN.py
        PR_fn_3DCNN.py
        PR_fn_FEATURE_SVM.py
        PR_fn_Voxel_SVM.py
        PR_fp_1DCNN.py
        PR_fp_3DCNN.py
        PR_fp_FEATURE_SVM.py
        PR_fp_Voxel_SVM.py

results/ --- Prediction probability scores for each site and trained model weights for the 10 functional sites:
    - results/weights stores the trained weights of the 10 functional site models for each method
    - results/prob_score contains predicted probability scores generated in Step (1)
    - results/FN_prob contains predicted probability scores generated in Step (4)
    - results/FP_prob contains predicted probability scores generated in Step (4)
    - results/TP_TN_results contains summary files generated in Step (3)
    - results/FN_results contains summary files generated in Step (5)
    - results/FP_results contains summary files generated in Step (5)
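Step (2) of the evaluation amounts to sweeping a cutoff over the predicted probability scores. The sketch below is a minimal, stdlib-only illustration, assuming each test site comes with a probability score and a binary label (1 = true positive site, 0 = true negative site). The function names `precision_recall_at` and `threshold_for_precision`, and the rule of returning the lowest observed cutoff that reaches the target precision, are hypothetical choices for illustration, not necessarily what the PR_tp_tn_*.py scripts do.

```python
from typing import Optional, Sequence, Tuple


def precision_recall_at(scores: Sequence[float], labels: Sequence[int],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when every site with score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Assumed convention: precision is 1.0 when nothing is called positive.
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


def threshold_for_precision(scores: Sequence[float], labels: Sequence[int],
                            target: float = 0.99) -> Optional[float]:
    """Lowest observed score which, used as a cutoff, reaches the target precision.

    Recall can only drop as the cutoff rises, so the first cutoff that meets the
    target (scanning upward) keeps recall as high as possible under this rule.
    Returns None if no cutoff reaches the target.
    """
    for t in sorted(set(scores)):
        precision, _ = precision_recall_at(scores, labels, t)
        if precision >= target:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical toy scores and labels, not taken from the PROSITE datasets.
    scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.3]
    labels = [1, 1, 1, 0, 1, 0]
    t = threshold_for_precision(scores, labels, target=0.99)
    print(t, precision_recall_at(scores, labels, t))  # 0.8 (1.0, 0.75)
```

Precision is not monotonic in the threshold: in the toy data it is 0.8 at a cutoff of 0.6 but 0.75 at 0.7, which is why the sketch checks every candidate cutoff instead of bisecting.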
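Step (3) reduces the five per-fold precision and recall values of each functional site to a mean and a standard deviation. A minimal sketch, assuming the per-fold values have already been computed at the Step (2) threshold and are held in a plain dictionary; `summarize_folds` is a hypothetical helper, not a function from the summarize_tp_fn_fold_PR_SD_*.py scripts, and the use of the sample standard deviation is likewise an assumption.

```python
import statistics
from typing import Dict, List, Tuple


def summarize_folds(fold_metrics: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, standard deviation) of each metric across cross-validation folds."""
    summary = {}
    for name, values in fold_metrics.items():
        # statistics.stdev is the sample SD (n - 1 denominator) and needs at least
        # two folds. Whether the original scripts use this or the population SD
        # (statistics.pstdev, or numpy.std's default ddof=0) is an assumption.
        summary[name] = (statistics.mean(values), statistics.stdev(values))
    return summary


if __name__ == "__main__":
    # Hypothetical per-fold values for one functional site at its Step (2) threshold.
    folds = {
        "precision": [1.0, 1.0, 0.98, 1.0, 0.99],
        "recall": [0.6, 0.7, 0.8, 0.7, 0.7],
    }
    for metric, (mean, sd) in summarize_folds(folds).items():
        print(f"{metric}: {mean:.3f} +/- {sd:.3f}")
```

Switching to the population form only changes the denominator from n - 1 to n, which matters noticeably with just five folds.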