User guide for ProCon
ProCon is a tool for locating and visualizing evolutionary conservation in protein sequences. The method can identify three types of conservation: identity (type I), physicochemical similarity (type II), and covariant conservation (type III). Conserved sites of types I and II are located by entropy calculation, and type III sites are identified by calculating mutual information. The interaction networks formed by covariant pairs can also be identified. All three types of conservation can be visualized in a representative protein structure. The tool performs an exhaustive analysis, the results of which can be used, for example, to identify different types of conserved residues, study protein-protein interactions, explain the consequences of disease-causing mutations, and design mutants for protein engineering.
1 Configure the Java runtime environment
Download the program first (versions are available for Windows and Mac).
To run the program, make sure that JRE 1.6 is installed beforehand. (You may download it from here.)
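One way to confirm that a Java runtime is present before starting ProCon is to inspect the banner printed by `java -version`. The following is a minimal sketch, not part of ProCon itself; the helper names are hypothetical, and it assumes the `java` executable is on the system path.

```python
import re
import subprocess


def parse_java_version(banner):
    """Return (major, minor) parsed from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return major, minor


def installed_java_version():
    """Run `java -version`; return (major, minor), or None if Java is not found."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return None
    # The JVM normally prints its version banner to stderr, not stdout.
    return parse_java_version(result.stderr or result.stdout)


def at_least_jre_1_6(version):
    """True if the version is 1.6 or later (Java 9+ reports 9, 10, 11, ...)."""
    return version is not None and version >= (1, 6)
```

Calling `at_least_jre_1_6(installed_java_version())` reports whether the installed runtime is at least version 1.6. Whether ProCon behaves correctly on runtimes much newer than 1.6 is not stated in this guide.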
2 Run ProCon System
- Extract the downloaded file ProCon-V1.1-windows.rar into the current folder.
- The source file is in the "bin" folder.
- Double-click the "ProCon-V1.1-Windows.jar" file. The graphical user interface will be shown:

Graphical user interface of ProCon
3 Import a FASTA sequence file
Click the menu "File->open a FASTA file" (or use the shortcut Ctrl + O) to open the file wizard, then choose a FASTA format file to open. (An example file called "input.txt" is available in the "examples" folder.)
Open a FASTA format sequence
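Before importing a file, it can help to check that it parses as FASTA and that all sequences have the same length. The sketch below is a small stand-alone reader, not ProCon's own parser. The equal-length check assumes the input is a multiple sequence alignment, which the per-position analysis and the gap percentage setting suggest but this guide does not state explicitly.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_aligned(records):
    """True if all sequences have the same length (assumed alignment input)."""
    return len({len(seq) for _, seq in records}) <= 1
```

For example, `parse_fasta(open("input.txt").read())` would return the records of the example file, with the first record corresponding to the default reference sequence.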
4 Configure parameters
Click the menu "settings->Gap Percentage" to set this parameter. The default value is 30%.
Set gap percentage parameter.
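The guide does not spell out how the gap percentage is applied. A plausible reading, assumed here and not confirmed by the source, is that alignment columns whose share of gap characters exceeds the threshold are excluded from the analysis. Under that assumption, the filter could be sketched as follows (function names are hypothetical):

```python
def gap_fraction(column, gap_chars="-."):
    """Fraction of characters in an alignment column that are gaps."""
    return sum(ch in gap_chars for ch in column) / len(column)


def columns_within_gap_limit(sequences, max_gap_percent=30.0):
    """Return 0-based indices of columns whose gap percentage is <= the limit.

    Assumption: the threshold is inclusive and applies per column.
    """
    kept = []
    for index, column in enumerate(zip(*sequences)):
        if 100.0 * gap_fraction(column) <= max_gap_percent:
            kept.append(index)
    return kept
```

With the default of 30%, a column in which one of four sequences has a gap (25%) is kept, while a column with three gaps out of four (75%) is dropped.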
Click the menu "settings->P values" to choose the p1 and p2 parameters. The default values are 0.01 and 0.05.
Set p value parameters.
After setting the parameters, click the "Apply" button. The system will recalculate the results according to the new parameters.
5 View the results
There are seven tab pages. The first page, "sequence", shows the sequence details. The chosen sequence is displayed and used as the reference. The default reference is the first sequence.
Sequence page
The "entropy" page displays the information at each position using column charts. The red columns show the information calculated with the 20-letter alphabet, while the blue ones show the information calculated with the 6-letter alphabet. The green columns below show the gap frequency at the corresponding positions.
Entropy page.
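To illustrate what the three column series represent, the sketch below computes the Shannon entropy of one alignment column with a 20-letter and a 6-letter alphabet, together with the gap frequency. It is an illustration only: the 6-group reduction used here is a hypothetical physicochemical grouping, since this guide does not list the grouping ProCon uses, and the chart may plot a quantity derived from the entropy rather than the entropy itself.

```python
import math
from collections import Counter

# Hypothetical 6-group reduction of the 20 amino acids (an assumption;
# the grouping ProCon actually uses is not given in this guide).
GROUPS = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def shannon_entropy(symbols):
    """Shannon entropy in bits of the symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def column_entropies(column, gap_chars="-."):
    """Return (entropy_20, entropy_6, gap_frequency) for one column."""
    # Only the 20 standard residues are counted; gaps and other symbols are skipped.
    residues = [aa for aa in column.upper() if aa in TO_GROUP]
    entropy_20 = shannon_entropy(residues)
    entropy_6 = shannon_entropy(TO_GROUP[aa] for aa in residues)
    gap_frequency = sum(ch in gap_chars for ch in column) / len(column)
    return entropy_20, entropy_6, gap_frequency
```

A fully conserved column such as "LLLL" has zero entropy in both alphabets (type I). A column such as "LIVM" has high entropy with 20 letters but zero entropy with the grouping above, which is the pattern expected for physicochemical conservation (type II).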
The "aa distribution" page displays the amino acid distribution at a single position using column charts. The top diagram shows the results based on the 20 amino acids, and the diagram below shows the results based on the 6-letter alphabet. The scroll bar at the bottom is used to choose the position.
aa distribution page.
The "covariant aa" page displays the covariant amino acid pairs with the highest mutual information values. The details are shown in both tables and diagrams.
Covariant aa page.
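Mutual information between two alignment columns measures how strongly the residue at one position predicts the residue at the other. The following is a minimal sketch of the standard definition, offered as an illustration and not as ProCon's implementation. Details such as gap handling, any normalization, and the significance testing linked to the p value settings are assumptions or are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b, gap_chars="-."):
    """Mutual information in bits between two alignment columns.

    Assumption: sequences with a gap in either column are ignored.
    """
    pairs = [
        (a, b)
        for a, b in zip(col_a, col_b)
        if a not in gap_chars and b not in gap_chars
    ]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts.
        mi += p_ab * math.log2(count * n / (left[a] * right[b]))
    return mi


def top_covariant_pairs(sequences, top_n=10):
    """Return up to top_n (mi, i, j) tuples with the highest mutual information."""
    columns = ["".join(col) for col in zip(*sequences)]
    scored = [
        (mutual_information(columns[i], columns[j]), i, j)
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:top_n]
```

Two columns that always change together, such as "AAVV" and "DDKK", give 1 bit of mutual information, whereas a fully conserved column gives 0 against any other column because it carries no variation to share.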
The "triplets" page shows the triplets corresponding to the different p values. The associations between these sites are shown in a diagram and are also stored in a file in the "output" folder.
Triplets page.
For the "structure" page, a PDB ID is required. The protein structure is downloaded from the PDB database.
Protein structure download page
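If the download from within ProCon fails, for example because of network restrictions, it can be useful to check that the PDB ID is valid and that the entry can be fetched manually. The sketch below builds the public RCSB file download address for a classic four-character PDB ID. It is independent of ProCon; how ProCon itself retrieves structures is not described in this guide, and the function names are hypothetical.

```python
import re
import urllib.request


def pdb_download_url(pdb_id):
    """Return the RCSB download URL for a four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    # Classic PDB IDs are a digit followed by three alphanumeric characters.
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id}.pdb"


def download_pdb(pdb_id, destination):
    """Download the PDB file to the given path (requires network access)."""
    urllib.request.urlretrieve(pdb_download_url(pdb_id), destination)
    return destination
```

For example, `download_pdb("1crn", "1CRN.pdb")` would save the entry 1CRN to the current folder.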
Highlighted triplet in protein structure

