Query         035532
Match_columns 112
No_of_seqs    118 out of 829
Neff          6.2 
Searched_HMMs 46136
Date          Fri Mar 29 03:47:07 2013
Command       hhsearch -i /work/01045/syshi/csienesis_hhblits_a3m/035532.a3m -d /work/01045/syshi/HHdatabase/Cdd.hhm -o /work/01045/syshi/hhsearch_cdd/035532hhsearch_cdd -cpu 12 -v 0 

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 PF00031 Cystatin:  Cystatin do  99.7 1.1E-17 2.4E-22  110.0   8.9   61   29-100     1-61  (94)
  2 cd00042 CY Substituted updates  99.7 9.1E-17   2E-21  107.6   9.5   59   29-99      1-59  (105)
  3 smart00043 CY Cystatin-like do  99.7 5.6E-17 1.2E-21  109.1   6.9   63   27-100     2-64  (107)
  4 PF07430 PP1:  Phloem filament   99.3 3.2E-12 6.9E-17   95.5   7.0   74   25-112   111-186 (202)
  5 TIGR01638 Atha_cystat_rel Arab  98.5 3.1E-07 6.6E-12   61.7   6.3   54   39-103    11-64  (92)
  6 PF07430 PP1:  Phloem filament   98.5 3.2E-07   7E-12   68.8   6.2   75   25-112     4-80  (202)
  7 PF06907 Latexin:  Latexin;  In  90.9     5.3 0.00012   30.8  10.3   67   38-112     4-73  (220)
  8 COG3360 Uncharacterized conser  74.8      19 0.00042   23.0   6.5   46   42-99     18-65  (71)
  9 TIGR01572 A_thl_para_3677 Arab  72.6     6.4 0.00014   31.1   4.1   57   43-111    42-100 (265)
 10 PF00666 Cathelicidins:  Cathel  72.5     6.1 0.00013   25.0   3.3   28   43-79      4-31  (67)
 11 PF07311 Dodecin:  Dodecin;  In  71.0      23  0.0005   22.2   7.3   48   39-98     12-61  (66)
 12 PF08194 DIM:  DIM protein;  In  65.8     7.8 0.00017   21.7   2.4   23    1-23      1-23  (36)
 13 PF10731 Anophelin:  Thrombin i  60.0     8.7 0.00019   24.0   2.1   21    1-23      1-21  (65)
 14 PF15240 Pro-rich:  Proline-ric  52.9     9.7 0.00021   28.5   1.7   13    6-18      2-14  (179)
 15 PF05679 CHGN:  Chondroitin N-a  50.7      57  0.0012   27.6   6.3   48   41-99    161-210 (499)
 16 PF12276 DUF3617:  Protein of u  39.7      34 0.00074   23.9   2.9   13   87-99    128-140 (162)
 17 TIGR03481 HpnM hopanoid biosyn  35.4   1E+02  0.0022   22.8   5.0   32   80-112   127-160 (198)
 18 PTZ00459 mucin-associated surf  34.7      28 0.00061   27.6   1.9   19    1-19      1-21  (291)
 19 PRK02710 plastocyanin; Provisi  33.4      39 0.00085   22.8   2.2   17    1-17      1-17  (119)
 20 CHL00132 psaF photosystem I su  29.8 1.7E+02  0.0036   22.1   5.2   26   27-55     25-50  (185)
 21 PF08884 Flagellin_D3:  Flagell  27.1      88  0.0019   20.8   3.0   23   89-112    37-59  (90)
 22 TIGR02105 III_needle type III   26.1      90   0.002   19.8   2.8   21   38-58     31-51  (72)
 23 COG5510 Predicted small secret  24.9      79  0.0017   18.4   2.1   17    1-17      1-17  (44)
 24 PRK10081 entericidin B membran  23.1      79  0.0017   18.7   2.0   18    1-18      1-18  (48)
 25 PF00041 fn3:  Fibronectin type  21.7      80  0.0017   18.4   1.9   16   84-99     62-77  (85)
 26 cd01781 AF6_RA_repeat2 Ubiquit  21.4 1.3E+02  0.0027   20.5   2.9   33   39-80     24-56  (100)

No 1  
>PF00031 Cystatin:  Cystatin domain;  InterPro: IPR000010 Peptide proteinase inhibitors can be found as single domain proteins or as single or multiple domains within proteins; these are referred to as either simple or compound inhibitors, respectively. In many cases they are synthesised as part of a larger precursor protein, either as a prepropeptide or as an N-terminal domain associated with an inactive peptidase or zymogen. This domain prevents access of the substrate to the active site. Removal of the N-terminal inhibitor domain either by interaction with a second peptidase or by autocatalytic cleavage activates the zymogen. Other inhibitors interact direct with proteinases using a simple noncovalent lock and key mechanism; while yet others use a conformational change-based trapping mechanism that depends on their structural and thermodynamic properties.  The cystatins are cysteine proteinase inhibitors belonging to MEROPS inhibitor family I25, clan IH [, , ]. They mainly inhibit peptidases belonging to peptidase families C1 (papain family) and C13 (legumain family). The cystatin family includes:   The Type 1 cystatins, which are intracellular cystatins that are present in the cytosol of many cell types, but can also appear in body fluids at significant concentrations. They are single-chain polypeptides of about 100 residues, which have neither disulphide bonds nor carbohydrate side chains.  The Type 2 cystatins, which are mainly extracellular secreted polypeptides synthesised with a 19-28 residue signal peptide. They are broadly distributed and found in most body fluids.  The Type 3 cystatins, which are multidomain proteins. The mammalian representatives of this group are the kininogens. There are three different kininogens in mammals: H- (high molecular mass, IPR002395 from INTERPRO) and L- (low molecular mass) kininogen which are found in a number of species, and T-kininogen that is found only in rat.  Unclassified cystatins. These are cystatin-like proteins found in a range of organisms: plant phytocystatins, fetuin in mammals, insect cystatins and a puff adder venom cystatin which inhibits metalloproteases of the MEROPS peptidase family M12 (astacin/adamalysin). Also a number of the cystatins-like proteins have been shown to be devoid of inhibitory activity.   All true cystatins inhibit cysteine peptidases of the papain family (MEROPS peptidase family C1), and some also inhibit legumain family enzymes (MEROPS peptidase family C13). These peptidases play key roles in physiological processes, such as intracellular protein degradation (cathepsins B, H and L), are pivotal in the remodelling of bone (cathepsin K), and may be important in the control of antigen presentation (cathepsin S, mammalian legumain). Moreover, the activities of such peptidases are increased in pathophysiological conditions, such as cancer metastasis and inflammation. Additionally, such peptidases are essential for several pathogenic parasites and bacteria. Thus in animals cystatins not only have capacity to regulate normal body processes and perhaps cause disease when down-regulated, but in other organisms may also participate in defence against biotic and abiotic stress. ; GO: 0004869 cysteine-type endopeptidase inhibitor activity; PDB: 3L0R_B 2W9P_K 2W9Q_A 3S67_A 3QRD_D 1R4C_G 3GAX_A 1TIJ_B 1G96_A 3NX0_A ....
Probab=99.75  E-value=1.1e-17  Score=109.97  Aligned_cols=61  Identities=34%  Similarity=0.602  Sum_probs=58.4

Q ss_pred             CceeecCCCCCCHHHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEeEeeeeceEEEEEEEEecC
Q 035532           29 GGRSEVKDVKKNKEVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQKQVVSGIKYYLTIEATTG  100 (112)
Q Consensus        29 GG~~~~~~~~~d~~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~~QVVaG~nY~l~v~v~~~  100 (112)
                      |||++++  ++||+++++++||+.+||+++         ++.+.+++.+|++|++|||+|++|+|+++++++
T Consensus         1 Gg~~~~~--~~dp~v~~~~~~al~~~N~~~---------~~~~~~~~~~v~~a~~QvV~G~~Y~i~~~~~~t   61 (94)
T PF00031_consen    1 GGPSPVD--PNDPEVQEAAEFALDKFNEQS---------NSGYKFKLVKVISATTQVVAGINYYIEFEVGET   61 (94)
T ss_dssp             SSEEEEC--TTSHHHHHHHHHHHHHHHHHS---------TTSEEEEEEEEEEEEEEESSSEEEEEEEEEEEE
T ss_pred             CCCccCC--CCCHHHHHHHHHHHHHHHHhC---------cccCcceeeeeeEEEEeecCCceEEEEEEEEcc
Confidence            8999998  799999999999999999998         679999999999999999999999999999994


No 2  
>cd00042 CY Substituted updates: Jan 30, 2002
Probab=99.71  E-value=9.1e-17  Score=107.64  Aligned_cols=59  Identities=44%  Similarity=0.707  Sum_probs=55.9

Q ss_pred             CceeecCCCCCCHHHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEeEeeeeceEEEEEEEEec
Q 035532           29 GGRSEVKDVKKNKEVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQKQVVSGIKYYLTIEATT   99 (112)
Q Consensus        29 GG~~~~~~~~~d~~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~~QVVaG~nY~l~v~v~~   99 (112)
                      |||.+++  .+||+++++++||+++||+.+         ++.+ +++.+|++|++|||+|++|+|++++.+
T Consensus         1 gg~~~~~--~~d~~~~~~~~~a~~~~N~~~---------~~~~-~~~~~i~~~~~QvvaG~~y~i~~~~~~   59 (105)
T cd00042           1 GGPSDIP--ANDPEVQELADFAVAEYNKKS---------NDKY-LEFFKVLSAKSQVVAGTNYYITVEAGD   59 (105)
T ss_pred             CCCccCC--CCCHHHHHHHHHHHHHHHhhc---------Cccc-eeEEEEEEEEEEEEeeeEEEEEEEEec
Confidence            8999997  799999999999999999998         5677 999999999999999999999999998


No 3  
>smart00043 CY Cystatin-like domain. Cystatins are a family of cysteine protease inhibitors that occur mainly as single domain proteins. However some extracellular proteins such as  kininogen, His-rich glycoprotein and fetuin also contain these domains.
Probab=99.69  E-value=5.6e-17  Score=109.11  Aligned_cols=63  Identities=38%  Similarity=0.595  Sum_probs=58.5

Q ss_pred             ccCceeecCCCCCCHHHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEeEeeeeceEEEEEEEEecC
Q 035532           27 LVGGRSEVKDVKKNKEVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQKQVVSGIKYYLTIEATTG  100 (112)
Q Consensus        27 ~~GG~~~~~~~~~d~~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~~QVVaG~nY~l~v~v~~~  100 (112)
                      ++|||++++  .+||+++++++||+.+||+++         ++.+.+++.+|++|++|||+|++|+|++++.+.
T Consensus         2 ~~Gg~~~~~--~~d~~~~~~~~~a~~~~N~~~---------~~~~~~~~~~v~~a~~QvvaG~~y~l~~~v~~t   64 (107)
T smart00043        2 CLGGPSDVP--PNDPEVQEAADFAVAEYNKKS---------NDKYELRVIKVVSAKSQVVAGTNYYLKVEVGET   64 (107)
T ss_pred             CCCCCccCC--CCCHHHHHHHHHHHHHHHHhc---------ccchhhhhhhhheeeeeeecceEEEEEEEEEec
Confidence            689999997  689999999999999999998         557888899999999999999999999999984


No 4  
>PF07430 PP1:  Phloem filament protein PP1;  InterPro: IPR009994 This domain represents a conserved region approximately 200 residues long, four copies of which are found within the plant phloem filament protein PP1. This is one of the constituents of the proteinaceous filaments found in the sieve elements of Cucurbita phloem [].
Probab=99.34  E-value=3.2e-12  Score=95.50  Aligned_cols=74  Identities=34%  Similarity=0.471  Sum_probs=65.1

Q ss_pred             ccccCceeecCCCCCCHHHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEeEeeee--ceEEEEEEEEecCCC
Q 035532           25 DRLVGGRSEVKDVKKNKEVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQKQVVS--GIKYYLTIEATTGEN  102 (112)
Q Consensus        25 ~~~~GG~~~~~~~~~d~~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~~QVVa--G~nY~l~v~v~~~~~  102 (112)
                      .+....|.+++|+ ++|.+|+++.|||.|||+ .           +..++|.+|.+++.|=++  |++|+|.+.+.+ ..
T Consensus       111 ~p~~~~Wi~I~ni-n~p~VQeLgkFAV~EhNK-~-----------gd~LkF~KV~eGw~q~l~~d~ikYrLhI~AkD-g~  176 (202)
T PF07430_consen  111 TPQSKKWIPIPNI-NNPFVQELGKFAVIEHNK-A-----------GDKLKFEKVYEGWYQDLGNDGIKYRLHIVAKD-GL  176 (202)
T ss_pred             CcccCCCEECCCC-CcHHHHHHHHHHHHHHhh-c-----------CCceEEEEEeeEEEEeccCCCceEEEEEEeec-CC
Confidence            4556889999886 999999999999999994 3           358999999999999775  699999999999 45


Q ss_pred             CeeeeEEEEC
Q 035532          103 GEIQMFDSIV  112 (112)
Q Consensus       103 g~~~~y~a~V  112 (112)
                      |+...|+|+|
T Consensus       177 G~~~~YeAvV  186 (202)
T PF07430_consen  177 GRLGNYEAVV  186 (202)
T ss_pred             CCcCceEEEE
Confidence            9999999986


No 5  
>TIGR01638 Atha_cystat_rel Arabidopsis thaliana cystatin-related protein. This model represents a family similar in sequence and probably homologous to a large family of cysteine proteinase inhibitors, or cystatins, as described by pfam model pfam00031. Cystatins may help plants resist attack by insects.
Probab=98.53  E-value=3.1e-07  Score=61.68  Aligned_cols=54  Identities=22%  Similarity=0.355  Sum_probs=48.7

Q ss_pred             CCHHHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEeEeeeeceEEEEEEEEecCCCC
Q 035532           39 KNKEVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQKQVVSGIKYYLTIEATTGENG  103 (112)
Q Consensus        39 ~d~~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~~QVVaG~nY~l~v~v~~~~~g  103 (112)
                      +.+-+..++++|+++||...           ...+++++|++|..|..+|.+|+||+++.+..+|
T Consensus        11 ~rd~~~~la~~al~k~N~~~-----------~t~lEfV~vVrAn~~~~~g~~~yITF~Ard~~d~   64 (92)
T TIGR01638        11 NRDLLERLSYVASKKYNDTK-----------FLNLELVEVVRANYRGGAKSKSYITFEARDKPDG   64 (92)
T ss_pred             HHHHHHHHHHHHHHHhhhhc-----------CceEEEEEEEEEEeeccceEEEEEEEEEecCCCC
Confidence            45668999999999999987           5899999999999999999999999999986555


No 6  
>PF07430 PP1:  Phloem filament protein PP1;  InterPro: IPR009994 This domain represents a conserved region approximately 200 residues long, four copies of which are found within the plant phloem filament protein PP1. This is one of the constituents of the proteinaceous filaments found in the sieve elements of Cucurbita phloem [].
Probab=98.49  E-value=3.2e-07  Score=68.80  Aligned_cols=75  Identities=27%  Similarity=0.379  Sum_probs=67.5

Q ss_pred             ccccCceeecCCCCCCHHHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEe--EeeeeceEEEEEEEEecCCC
Q 035532           25 DRLVGGRSEVKDVKKNKEVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQ--KQVVSGIKYYLTIEATTGEN  102 (112)
Q Consensus        25 ~~~~GG~~~~~~~~~d~~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~--~QVVaG~nY~l~v~v~~~~~  102 (112)
                      ....+||.+++|+ .+|.+|+++.|++++++.+.           ...+++..|.+.+  .|.+.+++|+|.+++.+ --
T Consensus         4 ~~~~~~w~~ip~v-~~~~~q~v~~~~veq~k~~~-----------~~~l~~~~v~egwy~el~~~~~~yrlhv~a~d-~l   70 (202)
T PF07430_consen    4 VPFSPKWIKIPDV-KEPCLQEVAKFAVEQFKIQY-----------GDSLKFRSVVEGWYFELCPNSLKYRLHVKAID-FL   70 (202)
T ss_pred             cccCcccccCCcc-cchHHHHHHHHHHHHHhhhc-----------ccceeeeeeeeceeecccccceeEEEeehhhh-hh
Confidence            3457999999997 89999999999999999987           5689999999998  88999999999999998 67


Q ss_pred             CeeeeEEEEC
Q 035532          103 GEIQMFDSIV  112 (112)
Q Consensus       103 g~~~~y~a~V  112 (112)
                      |..-+|||++
T Consensus        71 ~r~l~~e~ii   80 (202)
T PF07430_consen   71 GRSLKYEAII   80 (202)
T ss_pred             ccccceeeee
Confidence            8888999875


No 7  
>PF06907 Latexin:  Latexin;  InterPro: IPR009684 This family consists of several animal specific latexin and proteins related to latexin that belong to MEROPS proteinase inhibitor family I47, clan I- [].  Latexin, a protein possessing inhibitory activity against rat carboxypeptidase A1 (CPA1) and CPA2 (MEROPS peptidase family M14A), is expressed in a neuronal subset in the cerebral cortex and cells in other neural and non-neural tissues of rat [, ]. OCX-32, the 32 kDa eggshell matrix protein, is present at high levels in the uterine fluid during the terminal phase of eggshell formation, and is localised predominantly in the outer eggshell. The timing of OCX-32 secretion into the uterine fluid suggests that it may play a role in the termination of mineral deposition []. OCX-32 protein possesses limited identity (32%) to two unrelated proteins: latexin and to a skin protein that is encoded by a retinoic acid receptor-responsive gene, TIG1. Tazarotene Induced Gene 1 (TIG1) is a putative 228 transmembrane protein with a small N-terminal intracellular region, a single membrane-spanning hydrophobic region, and a large C-terminal extracellular region containing a glycosylation signal. TIG1 is up-regulated by retinoic acid receptor but not by retinoid X receptor-specific synthetic retinoids []. TIG1 may be a tumour suppressor gene whose diminished expression is involved in the malignant progression of prostate cancer [].; PDB: 1WNH_A 2BO9_B.
Probab=90.91  E-value=5.3  Score=30.78  Aligned_cols=67  Identities=16%  Similarity=0.184  Sum_probs=47.8

Q ss_pred             CCCHHHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEeEeeee--ceEEEEEEEEecCCCC-eeeeEEEEC
Q 035532           38 KKNKEVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQKQVVS--GIKYYLTIEATTGENG-EIQMFDSIV  112 (112)
Q Consensus        38 ~~d~~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~~QVVa--G~nY~l~v~v~~~~~g-~~~~y~a~V  112 (112)
                      ++.-..+++|+-|..-+|-..+.        -...+.+.+|.+|+..++.  |-+|+|++.+.+-..| +..++.|.|
T Consensus         4 p~h~~a~rAA~va~hy~N~~~GS--------P~~l~~l~~V~~a~~e~ip~~G~Ky~L~FSte~~~~~e~~g~CsA~V   73 (220)
T PF06907_consen    4 PSHRPAQRAARVAQHYINYRAGS--------PSRLFVLQQVQKARAEDIPGEGCKYDLVFSTEEYIEGEHLGNCSAEV   73 (220)
T ss_dssp             TTSHHHHHHHHHHHHHHHHHH-B--------TTB-EEEEEEEEEEEEEETTTEEEEEEEEEEEETTT---EEEEEEEE
T ss_pred             CcchHHHHHHHHHHHHhccccCC--------CceeeehhhhhhhhheeccCCCCEEEEEEEhHHhhcCCceeEeEEEE
Confidence            45666889999999999988732        1355678889999999885  7899999999984222 445665544


No 8  
>COG3360 Uncharacterized conserved protein [Function unknown]
Probab=74.84  E-value=19  Score=23.00  Aligned_cols=46  Identities=17%  Similarity=0.311  Sum_probs=35.0

Q ss_pred             HHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEeEeeeec--eEEEEEEEEec
Q 035532           42 EVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQKQVVSG--IKYYLTIEATT   99 (112)
Q Consensus        42 ~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~~QVVaG--~nY~l~v~v~~   99 (112)
                      .+.++++-|+..-...            ...+.+.+|++-+-+|+.|  ..|.++++++=
T Consensus        18 S~d~Ai~~Ai~RA~~t------------~~~l~wfeV~~~rg~v~~g~v~hyqv~lkVgF   65 (71)
T COG3360          18 SIDAAIANAIARAADT------------LDNLDWFEVVETRGHVVDGAVAHYQVTLKVGF   65 (71)
T ss_pred             cHHHHHHHHHHHHHhh------------hhcceEEEEEeecccEeecceEEEEEEEEEEE
Confidence            3666777777765544            4679999999999999988  46888888763


No 9  
>TIGR01572 A_thl_para_3677 Arabidopsis paralogous family TIGR01572. This model describes a paralogous family of hypothetical proteins in Arabidopsis thaliana. No homologs are detected from other species. Length heterogeneity within the family is attributable partly to a 21-residue repeat present in from zero to three tandem copies. The central region of the repeat resembles the pattern [VIF][FY][QK]GX[LM]P[DEK]XXXDDAL.
Probab=72.57  E-value=6.4  Score=31.14  Aligned_cols=57  Identities=19%  Similarity=0.223  Sum_probs=48.2

Q ss_pred             HHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEeEeeeeceEEEEEEEEecCCC--CeeeeEEEE
Q 035532           43 VQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQKQVVSGIKYYLTIEATTGEN--GEIQMFDSI  111 (112)
Q Consensus        43 v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~~QVVaG~nY~l~v~v~~~~~--g~~~~y~a~  111 (112)
                      +.-.|+.++.-||-..           +..+++..|.+.-+|..+=..|+||+++-+ .+  +..+.|+..
T Consensus        42 vklyAr~GLH~YN~~~-----------GTNlel~~v~K~N~~~~~~~syyITL~A~D-P~s~~s~qTFQtr  100 (265)
T TIGR01572        42 VKIYARVGLHRYNFLE-----------GTNLELDHVDKFNKRMCALSSYYITLLAVD-PDSRFLQQTFQVR  100 (265)
T ss_pred             HHHHHHhhhhhhhhcc-----------CccceehhhhhhccchhhheeeeEEEEEec-CCccccceEEEEE
Confidence            4778899999999886           688999999999999999999999999998 55  466677654


No 10 
>PF00666 Cathelicidins:  Cathelicidin;  InterPro: IPR001894 The precursor sequences of a number of antimicrobial peptides secreted by neutrophils (polymorphonuclear leukocytes) upon activation have been found to be evolutionarily related and are collectively known as cathelicidins []. Structurally, these proteins consist of three domains: a signal sequence, a conserved region of about 100 residues that contains four cysteines involved in two disulphide bonds, and a highly divergent C-terminal section of variable size. It is in this C-terminal section that the antibacterial peptides are found; they are proteolytically processed from their precursor by enzymes such as elastase. This structure is shown in the following schematic representation:  +---+--------------------------------+--------------------+ |Sig| Propeptide C C C C | Antibacterial pep. | +---+----------------|--|--|--|------+--------------------+ | | | | +--+ +--+ 'C': conserved cysteine involved in a disulphide bond. ; GO: 0006952 defense response, 0005576 extracellular region; PDB: 1KWI_A 1PFP_A 1LXE_A 1N5P_A 1N5H_A.
Probab=72.46  E-value=6.1  Score=24.99  Aligned_cols=28  Identities=14%  Similarity=0.105  Sum_probs=19.9

Q ss_pred             HHHHHHHHHHHHHhhhhhcccccccccccceeEeEEE
Q 035532           43 VQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVL   79 (112)
Q Consensus        43 v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~   79 (112)
                      ++++...||..||+++         .+...|++.+..
T Consensus         4 Y~eav~~Av~~yN~~s---------~~~nlfRLLe~~   31 (67)
T PF00666_consen    4 YEEAVLRAVDFYNQGS---------SGENLFRLLELD   31 (67)
T ss_dssp             CHHHHHHHHHHHHHCS----------SSEEEEEEEE-
T ss_pred             HHHHHHHHHHHHhcCC---------CccCceeeeecc
Confidence            4789999999999998         335556555443


No 11 
>PF07311 Dodecin:  Dodecin;  InterPro: IPR009923 This entry represents proteins with a Dodecin-like topology. Dodecin flavoprotein is a small dodecameric flavin-binding protein from Halobacterium salinarium (Halobacterium halobium) that contains two flavins stacked in a single binding pocket between two tryptophan residues to form an aromatic tetrade []. Dodecin binds riboflavin, although it appears to have a broad specificity for flavins. Lumichrome, a molecule associated with flavin metabolism, appears to be a ligand of dodecin, which could act as a waste-trapping device. ; PDB: 2VYX_L 2DEG_F 2V18_K 2V19_D 2UX9_B 2CZ8_E 2V21_F 2CC8_A 2CCB_A 2VX9_A ....
Probab=70.98  E-value=23  Score=22.24  Aligned_cols=48  Identities=17%  Similarity=0.302  Sum_probs=39.6

Q ss_pred             CCHHHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEeEeeeec--eEEEEEEEEe
Q 035532           39 KNKEVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQKQVVSG--IKYYLTIEAT   98 (112)
Q Consensus        39 ~d~~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~~QVVaG--~nY~l~v~v~   98 (112)
                      +...+.++.+-|+.+-++.-            ..++.++|..-+-.|..|  +.|+.+++++
T Consensus        12 S~~S~edAv~~Av~~A~kTl------------~ni~~~eV~e~~~~v~dg~i~~y~v~lkv~   61 (66)
T PF07311_consen   12 SPKSWEDAVQNAVARASKTL------------RNIRWFEVKEQRGHVEDGKITEYQVNLKVS   61 (66)
T ss_dssp             ESSHHHHHHHHHHHHHHHHS------------SSEEEEEEEEEEEEEETTCEEEEEEEEEEE
T ss_pred             CCCCHHHHHHHHHHHHhhch------------hCcEEEEEEEEEEEEeCCcEEEEEEEEEEE
Confidence            34568899999999888764            679999999999999988  5788888875


No 12 
>PF08194 DIM:  DIM protein;  InterPro: IPR013172 Drosophila immune-induced molecules (DIMs) are short proteins induced during the immune response of Drosophila []. This entry includes DIMs 1 to 4 and DIM23.
Probab=65.83  E-value=7.8  Score=21.70  Aligned_cols=23  Identities=30%  Similarity=0.301  Sum_probs=10.7

Q ss_pred             CchhhHHHHHHHHHHHHchhccC
Q 035532            1 MAKLARALILSLLLLSITASVNG   23 (112)
Q Consensus         1 ma~~~~llll~~~~l~~~~~~~~   23 (112)
                      |-++...++|.++.|+.+.++++
T Consensus         1 Mk~l~~a~~l~lLal~~a~~~~p   23 (36)
T PF08194_consen    1 MKCLSLAFALLLLALAAAVPATP   23 (36)
T ss_pred             CceeHHHHHHHHHHHHhcccCCC
Confidence            44554445555444444433434


No 13 
>PF10731 Anophelin:  Thrombin inhibitor from mosquito;  InterPro: IPR018932  Members of this family are all inhibitors of thrombin, the peptidase that is at the end of the blood coagulation cascade and which creates the clot by cleaving fibrinogen. The interaction between thrombin and fibrinogen involves two different areas of contact - via the thrombin active site and via a second substrate-binding site known as an exosite. The inhibitor acts by blocking the exosite, rather than by interacting with the active site. The inhibitors are from mosquitoes that feed on human blood and which, by inhibiting thrombin, prevent the blood from clotting and keep it flowing. 
Probab=60.04  E-value=8.7  Score=24.02  Aligned_cols=21  Identities=29%  Similarity=0.487  Sum_probs=12.8

Q ss_pred             CchhhHHHHHHHHHHHHchhccC
Q 035532            1 MAKLARALILSLLLLSITASVNG   23 (112)
Q Consensus         1 ma~~~~llll~~~~l~~~~~~~~   23 (112)
                      ||.+  |+++++|++++.+.+++
T Consensus         1 MA~K--l~vialLC~aLva~vQ~   21 (65)
T PF10731_consen    1 MASK--LIVIALLCVALVAIVQS   21 (65)
T ss_pred             Ccch--hhHHHHHHHHHHHHHhc
Confidence            7777  66666666665554433


No 14 
>PF15240 Pro-rich:  Proline-rich
Probab=52.88  E-value=9.7  Score=28.50  Aligned_cols=13  Identities=31%  Similarity=0.549  Sum_probs=8.9

Q ss_pred             HHHHHHHHHHHHc
Q 035532            6 RALILSLLLLSIT   18 (112)
Q Consensus         6 ~llll~~~~l~~~   18 (112)
                      +|+|||+.||+|.
T Consensus         2 LlVLLSvALLALS   14 (179)
T PF15240_consen    2 LLVLLSVALLALS   14 (179)
T ss_pred             hhHHHHHHHHHhh
Confidence            3677777777665


No 15 
>PF05679 CHGN:  Chondroitin N-acetylgalactosaminyltransferase;  InterPro: IPR008428 This family represents Chondroitin N-acetylgalactosaminyltransferase. Proteins have a type II transmembrane topology. The enzyme is involved in the biosynthetic initiation and elongation of chondroitin sulphate and is the key enzyme responsible for the selective chain assembly of chondroitin/dermatan sulphate on the linkage region tetrasaccharide common to various proteoglycans containing chondroitin/dermatan sulphate or heparin/heparan sulphate chains. ; GO: 0016758 transferase activity, transferring hexosyl groups, 0032580 Golgi cisterna membrane
Probab=50.71  E-value=57  Score=27.63  Aligned_cols=48  Identities=19%  Similarity=0.403  Sum_probs=40.8

Q ss_pred             HHHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEEEeEee--eeceEEEEEEEEec
Q 035532           41 KEVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLEAQKQV--VSGIKYYLTIEATT   99 (112)
Q Consensus        41 ~~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~a~~QV--VaG~nY~l~v~v~~   99 (112)
                      .++.++.+.|++..|+++           ...+++.+++.....+  .-|+-|.|++.+..
T Consensus       161 ~dl~~vi~~a~~~ln~~~-----------~~~~~~~~l~~GY~R~dp~rG~~Y~Ldl~l~~  210 (499)
T PF05679_consen  161 EDLDDVIEQAMEELNRKS-----------RRVLEFRDLINGYRRFDPTRGMDYILDLLLKY  210 (499)
T ss_pred             HHHHHHHHHHHHHHhccc-----------cccEEeeeeeeEEEEecCCCCceEEEEEEEee
Confidence            689999999999999987           3678889999887665  56999999988765


No 16 
>PF12276 DUF3617:  Protein of unknown function (DUF3617);  InterPro: IPR022061  This family of proteins is found in bacteria. Proteins in this family are typically between 155 and 179 amino acids in length. There is a single completely conserved residue C that may be functionally important. 
Probab=39.69  E-value=34  Score=23.87  Aligned_cols=13  Identities=31%  Similarity=0.253  Sum_probs=7.5

Q ss_pred             eceEEEEEEEEec
Q 035532           87 SGIKYYLTIEATT   99 (112)
Q Consensus        87 aG~nY~l~v~v~~   99 (112)
                      .+..|..++.+..
T Consensus       128 ~~~~~~~~~~~~~  140 (162)
T PF12276_consen  128 SPTSYTGTMTMTS  140 (162)
T ss_pred             CCCeEEEEEEEEe
Confidence            4556666665554


No 17 
>TIGR03481 HpnM hopanoid biosynthesis associated membrane protein HpnM. The genomes containing members of this family share the machinery for the biosynthesis of hopanoid lipids. Furthermore, the genes of this family are usually located proximal to other components of this biological process. The proteins are members of the pfam05494 family of putative transporters known as "toluene tolerance protein Ttg2D", although it is unlikely that the members included here have anything to do with toluene per-se.
Probab=35.42  E-value=1e+02  Score=22.82  Aligned_cols=32  Identities=13%  Similarity=0.194  Sum_probs=24.5

Q ss_pred             EEeEeee--eceEEEEEEEEecCCCCeeeeEEEEC
Q 035532           80 EAQKQVV--SGIKYYLTIEATTGENGEIQMFDSIV  112 (112)
Q Consensus        80 ~a~~QVV--aG~nY~l~v~v~~~~~g~~~~y~a~V  112 (112)
                      ...++++  .|..+.+...+.. .+|+++.|+.++
T Consensus       127 ~V~t~i~~~~g~~i~V~y~l~~-~~g~WkV~DV~i  160 (198)
T TIGR03481       127 IVRSTIVSDGGDPVKFDYIMRQ-GQGKWRIVDILA  160 (198)
T ss_pred             EEEEEEEcCCCCcEEEEEEEEe-cCCCeEEEEEEE
Confidence            3456666  5678888888887 689999998764


No 18 
>PTZ00459 mucin-associated surface protein (MASP); Provisional
Probab=34.71  E-value=28  Score=27.58  Aligned_cols=19  Identities=26%  Similarity=0.354  Sum_probs=15.0

Q ss_pred             Cchhh--HHHHHHHHHHHHch
Q 035532            1 MAKLA--RALILSLLLLSITA   19 (112)
Q Consensus         1 ma~~~--~llll~~~~l~~~~   19 (112)
                      |||+.  |+||+-+|+++-+.
T Consensus         1 MaMmMTGRVLLVCALCVLWCg   21 (291)
T PTZ00459          1 MAMMMTGRVLLVCALCVLWCG   21 (291)
T ss_pred             CccchhchHHHHHHHHHHhcC
Confidence            78776  99998888887764


No 19 
>PRK02710 plastocyanin; Provisional
Probab=33.39  E-value=39  Score=22.82  Aligned_cols=17  Identities=29%  Similarity=0.350  Sum_probs=9.8

Q ss_pred             CchhhHHHHHHHHHHHH
Q 035532            1 MAKLARALILSLLLLSI   17 (112)
Q Consensus         1 ma~~~~llll~~~~l~~   17 (112)
                      |++.+++++.++++++.
T Consensus         1 ~~~~~~~~~~~~~~~~~   17 (119)
T PRK02710          1 MAKRLRSIAAALVAVVS   17 (119)
T ss_pred             CchhHHHHHHHHHHHHH
Confidence            66666555555555444


No 20 
>CHL00132 psaF photosystem I subunit III; Validated
Probab=29.78  E-value=1.7e+02  Score=22.07  Aligned_cols=26  Identities=19%  Similarity=0.167  Sum_probs=19.5

Q ss_pred             ccCceeecCCCCCCHHHHHHHHHHHHHHH
Q 035532           27 LVGGRSEVKDVKKNKEVQELGKFSVEEFN   55 (112)
Q Consensus        27 ~~GG~~~~~~~~~d~~v~~~a~fAv~~~N   55 (112)
                      -+.|..+-+   ++|.+++-+.-+++...
T Consensus        25 d~agLtpCs---es~aF~kR~~~~~k~Le   50 (185)
T CHL00132         25 DVAGLTPCS---ESPAFQKRLNNSVKKLE   50 (185)
T ss_pred             cccCCccCc---cCHHHHHHHHHHHHHHH
Confidence            467888886   68888888877776543


No 21 
>PF08884 Flagellin_D3:  Flagellin D3 domain;  InterPro: IPR014981 This domain is found in the central portion bacterial flagellin FliC, it contains a structural motif called a beta-folium fold []. Although no specific function is assigned its deletion leads to a reduction in filament stability []. ; PDB: 1IO1_A 1UCU_A 3A5X_A.
Probab=27.09  E-value=88  Score=20.76  Aligned_cols=23  Identities=26%  Similarity=0.469  Sum_probs=13.8

Q ss_pred             eEEEEEEEEecCCCCeeeeEEEEC
Q 035532           89 IKYYLTIEATTGENGEIQMFDSIV  112 (112)
Q Consensus        89 ~nY~l~v~v~~~~~g~~~~y~a~V  112 (112)
                      -||+.++.-.. ..++-..||+.|
T Consensus        37 gkYYv~V~g~~-~~~knG~Yev~V   59 (90)
T PF08884_consen   37 GKYYVEVTGTT-ATAKNGYYEVTV   59 (90)
T ss_dssp             --EEEEEEEET--SS--EEEEEEE
T ss_pred             CCeEEEEEeec-cCCCCccEEEEE
Confidence            58888888776 467778888764


No 22 
>TIGR02105 III_needle type III secretion apparatus needle protein. Type III secretion systems translocate proteins, usually virulence factors, out across both inner and outer membranes of certain Gram-negative bacteria and further across the plasma membrane and into the cytoplasm of the host cell. This protein, termed YscF in Yersinia, and EscF, PscF, EprI, etc. in other systems, forms the needle of the injection apparatus.
Probab=26.15  E-value=90  Score=19.83  Aligned_cols=21  Identities=10%  Similarity=0.237  Sum_probs=18.0

Q ss_pred             CCCHHHHHHHHHHHHHHHhhh
Q 035532           38 KKNKEVQELGKFSVEEFNRSQ   58 (112)
Q Consensus        38 ~~d~~v~~~a~fAv~~~N~~~   58 (112)
                      ++||+..--.+|++.+||.--
T Consensus        31 ~~nP~~La~~Q~~~~qYs~~~   51 (72)
T TIGR02105        31 PNDPELMAELQFALNQYSAYY   51 (72)
T ss_pred             CCCHHHHHHHHHHHHHHHHHH
Confidence            489999999999999998643


No 23 
>COG5510 Predicted small secreted protein [Function unknown]
Probab=24.87  E-value=79  Score=18.41  Aligned_cols=17  Identities=41%  Similarity=0.327  Sum_probs=7.0

Q ss_pred             CchhhHHHHHHHHHHHH
Q 035532            1 MAKLARALILSLLLLSI   17 (112)
Q Consensus         1 ma~~~~llll~~~~l~~   17 (112)
                      |.++..+++++.++.++
T Consensus         1 mmk~t~l~i~~vll~s~   17 (44)
T COG5510           1 MMKKTILLIALVLLAST   17 (44)
T ss_pred             CchHHHHHHHHHHHHHH
Confidence            34443344444433333


No 24 
>PRK10081 entericidin B membrane lipoprotein; Provisional
Probab=23.14  E-value=79  Score=18.74  Aligned_cols=18  Identities=44%  Similarity=0.488  Sum_probs=8.6

Q ss_pred             CchhhHHHHHHHHHHHHc
Q 035532            1 MAKLARALILSLLLLSIT   18 (112)
Q Consensus         1 ma~~~~llll~~~~l~~~   18 (112)
                      |.++...++++++++++.
T Consensus         1 MmKk~i~~i~~~l~~~~~   18 (48)
T PRK10081          1 MVKKTIAAIFSVLVLSTV   18 (48)
T ss_pred             ChHHHHHHHHHHHHHHHH
Confidence            545544444554444443


No 25 
>PF00041 fn3:  Fibronectin type III domain;  InterPro: IPR003961 Fibronectins are multi-domain glycoproteins found in a soluble form in plasma, and in an insoluble form in loose connective tissue and basement membranes. They contain multiple copies of 3 repeat regions (types I, II and III), which bind to a variety of substances including heparin, collagen, DNA, actin, fibrin and fibronectin receptors on cell surfaces. The wide variety of these substances means that fibronectins are involved in a number of important functions: e.g., wound healing; cell adhesion; blood coagulation; cell differentiation and migration; maintenance of the cellular cytoskeleton; and tumour metastasis. The role of fibronectin in cell differentiation is demonstrated by the marked reduction in the expression of its gene when neoplastic transformation occurs. Cell attachment has been found to be mediated by the binding of the tetrapeptide RGDS to integrins on the cell surface, although related sequences can also display cell adhesion activity. Plasma fibronectin occurs as a dimer of 2 different subunits, linked together by 2 disulphide bonds near the C terminus. The difference in the 2 chains occurs in the type III repeat region and is caused by alternative splicing of the mRNA from one gene. The observation that, in a given protein, an individual repeat of one of the 3 types (e.g., the first FnIII repeat) shows much less similarity to its subsequent tandem repeats within that protein than to its equivalent repeat between fibronectins from other species, has suggested that the repeating structure of fibronectin arose at an early stage of evolution. It also seems to suggest that the structure is subject to high selective pressure. The fibronectin type III repeat region is an approximately 100 amino acid domain, different tandem repeats of which contain binding sites for DNA, heparin and the cell surface.
The superfamily of sequences believed to contain FnIII repeats represents 45 different families, the majority of which are involved in cell surface binding in some manner, or are receptor protein tyrosine kinases, or cytokine receptors.; GO: 0005515 protein binding; PDB: 1UEM_A 1TDQ_A 1X5I_A 2IC2_B 2IBG_C 2IBB_A 3R8Q_A 2FNB_A 1FNH_A 2EDB_A ....
Probab=21.72  E-value=80  Score=18.42  Aligned_cols=16  Identities=19%  Similarity=0.399  Sum_probs=13.9

Q ss_pred             eeeeceEEEEEEEEec
Q 035532           84 QVVSGIKYYLTIEATT   99 (112)
Q Consensus        84 QVVaG~nY~l~v~v~~   99 (112)
                      ++..|+.|.+.|.+..
T Consensus        62 ~L~p~t~Y~~~v~a~~   77 (85)
T PF00041_consen   62 GLQPGTTYEFRVRAVN   77 (85)
T ss_dssp             SCCTTSEEEEEEEEEE
T ss_pred             cCCCCCEEEEEEEEEe
Confidence            4668999999999888


No 26 
>cd01781 AF6_RA_repeat2 Ubiquitin domain of AF-6, second repeat. The AF-6 protein (also known as afadin and canoe) is a multidomain cell junction protein that contains two N-terminal Ras-associating (RA) domains in addition to FHA (forkhead-associated), DIL (class V myosin homology region), and PDZ domains and a proline-rich region. AF6 acts downstream of the Egfr (Epidermal Growth Factor-receptor)/Ras signalling pathway and provides a link from Egfr to cytoskeletal elements.
Probab=21.39  E-value=1.3e+02  Score=20.46  Aligned_cols=33  Identities=9%  Similarity=0.094  Sum_probs=26.5

Q ss_pred             CCHHHHHHHHHHHHHHHhhhhhcccccccccccceeEeEEEE
Q 035532           39 KNKEVQELGKFSVEEFNRSQQRQGKVIRNVAFGRLRFSQVLE   80 (112)
Q Consensus        39 ~d~~v~~~a~fAv~~~N~~~~~~~~~~~~~~~~~~~~~kV~~   80 (112)
                      ++...+++..-|++.|+.+.         ++...|.+.+|+.
T Consensus        24 ~~~~a~~vV~eALeKygL~~---------e~p~~Y~LveV~~   56 (100)
T cd01781          24 INDNADRIVGEALEKYGLEK---------SDPDDYCLVEVSN   56 (100)
T ss_pred             CCccHHHHHHHHHHHhCCCc---------cCccceEEEEEec
Confidence            56678899999999999876         4467888888863


Done!