###########################
#    README FOR ZifNet    #
###########################

Copyright Jiajian Liu and Gary Stormo
May be copied for noncommercial purposes.

Authors: Jiajian Liu and Gary Stormo
Department of Genetics
Washington University in St. Louis
Campus Box 8232
St. Louis, MO 63110

Updated: September, 2008

stormo@ural.wustl.edu
jjliu@ural.wustl.edu

##############
#Introduction#
##############

ZifNet is a package for predicting the DNA-binding model of any given
C2H2 zinc finger, based on the back-propagation algorithm. It has two
parts:

1) identify the optimal DNA-zinc finger interaction model with the core
   C programs of ZifNet;
2) predict DNA-binding weight matrix models for zinc fingers using the
   auxiliary programs and the DNA-zinc finger interaction model derived
   in the first step.

The input encoding, output encoding, network graph structure, and
back-propagation algorithm for our multiple neural net are described in
our paper (Bioinformatics, 24:1850-7, 2008).

To avoid overfitting, we use a cross-validation procedure to identify
the optimal model for DNA-zinc finger interactions. During learning, the
performance on the training, test and validation datasets is reported at
each learning epoch in terms of accuracy, sensitivity, specificity,
positive predictive value, negative predictive value and correlation.
The best model is chosen based on its predictive performance on the
validation set.

After the optimal DNA-zinc finger interaction model is determined, the
auxiliary Perl programs can be used to identify the weight matrix models
for the given zinc finger proteins. Details of the method can be found
in our published paper (Bioinformatics, 24:1850-7, 2008).
############################################
#Installation and Compilation for C codes: #
############################################

1) download both core and auxiliary programs
2) gunzip zifNet.tar.gz
3) tar -xvf zifNet.tar
4) gcc -g -Wall recognitiontrain_320_v4.c protein_DNA_recognition_v8.c protein_DNA_load_ann_320_3.c backprop_v9.c -lm -o ZifNet

Usage:
./ZifNet -n weightmodel -t trainingdata -1 testdata -2 validationdata -e learning_epoch_num -h hidden_unit_num

  -t <training data>
  -1 <test data>
  -2 <validation data>
  -e <number of training epochs>
  -h <number of hidden units>
  -n <name of NN model>

############################################################
# Prediction of DNA binding profiles with the defined model#
############################################################

1) download auxiliary programs
2) gunzip *.tar.gz
3) tar -xvf *.tar

command:
perl run_generate_prediction_code_pipeline_new.pl seqFile result_dir weightmodel

- seqFile: each line in this file contains the amino acid residues at
  positions -1, +3 and +6 of a zinc finger;
- result_dir: name of the directory used to store the predicted results;
- weightmodel: the defined neural net model for DNA-zinc finger
  interactions.

The pipeline includes 2 binary C programs and 5 short Perl programs:

a) two C binary programs
   a1) generate_64_sequences
   a2) compute_DNA_preference_for_given_protein_mnn_1_240nodes
b) five Perl programs
   b1) convet_energymatrix.pl
   b2) generate_interactionPairs_order_based_scores.pl
   b3) make_seqFile_from_sortedfile.pl
   b4) make_weightmatrix_from_inputSeq_alignmentcount.pl
   b5) make_weightmatrix_from_inputSeq_pseudocount.pl