Query         FBpp0301686 type=protein; loc=3L:join(7012190..7012289,7012739..7013277,7033869..7033964,7034503..7034538,7034635..7034703,7034972..7035038,7035170..7035315,7035744..7035881,7038005..7038151,7041018..7041131,7043466..7043720,7044926..7045003,7045109..7045355,7045534..7045611,7046334..7046488,7046586..7046678,7048842..7048964,7050173..7050187,7050565..7050695,7051242..7051421,7051486..7051722,7051977..7052052); ID=FBpp0301686; name=Mp-PR; parent=FBgn0260660,FBtr0309957; dbxref=FlyBase:FBpp0301686,FlyBase_Annotation_IDs:CG42543-PR,REFSEQ:NP_001246651,GB_protein:AFH04322; MD5=be7e823c399e9b8774f85338b65532db; length=1039; release=r6.06; species=Dmel;
Match_columns 1039
No_of_seqs    759 out of 2752
Neff          6.3 
Searched_HMMs 16187

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
   1 PF06482 Endostatin:  Collagena 100.0   9E-76 5.5E-80  630.6  14.1  259  742-1000    1-285 (286)
   2 PF07588 DUF1554:  Protein of u  99.7 3.2E-18   2E-22  165.6   8.1  126  840-975     2-133 (135)
   3 PF02210 Laminin_G_2:  Laminin   98.6 1.8E-07 1.1E-11   87.1  13.5  114   98-221     4-124 (127)
   4 PF13385 Laminin_G_3:  Concanav  98.5   6E-07 3.7E-11   85.2  14.4  129   87-228    18-148 (152)
   5 PF00054 Laminin_G_1:  Laminin   97.7 0.00019 1.2E-08   67.3  13.2  114   98-222     4-126 (131)
   6 PF00354 Pentaxin:  Pentaxin fa  97.7 0.00052 3.2E-08   69.3  16.6  125   86-225    25-157 (194)
   7 PF02973 Sialidase:  Sialidase,  97.5 0.00089 5.5E-08   66.9  15.0  134   86-227    31-177 (189)
   8 PF07953 Toxin_R_bind_N:  Clost  95.1    0.11 6.6E-06   51.0  12.2  123   51-185    18-158 (195)
   9 PF06439 DUF1080:  Domain of Un  92.2   0.038 2.4E-06   53.7   3.2  104   78-186    44-157 (185)
  10 PF01410 COLFI:  Fibrillar coll  90.1   0.029 1.8E-06   57.9  -0.3   36  679-714     2-37  (233)
  11 PF14099 Polysacc_lyase:  Polys  78.0       5 0.00031   39.2  10.0   78  101-181    84-173 (212)
  12 PF16346 DUF4975:  Domain of un  76.5      13 0.00078   35.9  12.1  112   64-182    37-150 (176)
  13 PF16323 DUF4959:  Domain of un  63.5      74  0.0046   31.3  15.0   20    4-23      1-20  (225)
  14 PF00722 Glyco_hydro_16:  Glyco  61.2      35  0.0022   31.6  11.7   31  152-182   106-136 (177)
  15 PF02057 Glyco_hydro_59:  Glyco  58.1      42  0.0026   38.3  13.2   50  150-205   605-654 (669)
  16 PF08787 Alginate_lyase2:  Algi  55.0      61  0.0038   32.0  12.8   93   89-182    86-190 (236)
  17 PF12988 DUF3872:  Domain of un  42.9      52  0.0032   30.0   8.8   36  130-167    97-132 (133)
  18 PF07622 DUF1583:  Protein of u  33.3      16   0.001   39.6   4.2   37  150-186    83-119 (411)
  19 PF05018 DUF667:  Protein of un  28.6 1.2E+02  0.0075   28.7   9.5   31  149-180   121-164 (185)
  20 PF11267 DUF3067:  Domain of un  28.2      13 0.00078   32.5   1.8   14 1025-1038   42-55  (98)
  21 PF02018 CBM_4_9:  Carbohydrate  25.9   2E+02   0.012   24.0  10.5   22  155-176   101-126 (131)
  22 PF00337 Gal-bind_lectin:  Gala  20.3 2.8E+02   0.017   23.8  14.4  101   79-182     4-109 (134)

No 1
>PF06482 Endostatin: Collagenase NC10 and Endostatin; InterPro: IPR010515 NC10 stands for Non-helical region 10 and is taken from P39059 from SWISSPROT. A mutation in this region in P39060 from SWISSPROT is associated with an increased risk of prostrate cancer. This domain is cleaved from the precursor and forms endostatin. Endostatin is a key tumour suppressor and has been used highly successfully to treat cancer. It is a potent angiogenesis inhibitor []. Endostatin also binds a zinc ion near the N terminus; this is likely to be of structural rather than functional importance according to [].; GO: 0005198 structural molecule activity, 0007155 cell adhesion, 0031012 extracellular matrix; PDB: 1DY2_A 1DY1_A 1DY0_A 1KOE_A 3N3F_B 1BNL_D 3HSH_E 3HON_A. Probab=100.00 E-value=9e-76 Score=630.60 Aligned_cols=259 Identities=47% Similarity=0.861 Sum_probs=180.6 Q ss_pred CceeeecCHHHHhhhccCCCCCcEEEEeccceEEEEEccCceeeccccccccCCCCCCCCC----CCCCCc-ccccCC-- Q FBpp0301686 742 PGAVTFQNIDEMTKKSALNPPGTLAYITEEEALLVRVNKGWQYIALGTLVPIATPAPPTTV----APSMRF-DLQSKN-- 814 (1039) Q Consensus 742 pg~~~~~~~~~m~~~~~~s~~Gtl~y~~~~~~l~vrv~~G~~~i~lg~~~p~~~~~~~~~~----~~~~~~-~~~~~~-- 814 (1039) +|+++|+|+++|+++++..+||||+|++|++||||||++|||+||||+++|++...++.++ .+|..+ .....+ T Consensus 1 sGV~vf~T~~~Ml~~a~~~pEGTLayV~e~~eLYVRVrnGWRkV~LG~~ip~~~~~~~~~va~~~p~P~v~~~~~~~~~~ 80 (286) T PF06482_consen 1 SGVTVFRTYETMLATAHRVPEGTLAYVIEREELYVRVRNGWRKVQLGELIPIPSDTPDNEVASTQPPPVVSSPPQSSPPS 80 (286) T ss_dssp --EEEESSHHHHHCHGGGS-TTEEEEETTTTEEEEEETTEEEEE-EEEEEE----------------------------- T ss_pred CCcEEecCHHHHHhhcccCCCeEEEEEEecceEEEEecCCeeeeccCCcccCCCCcccccccccCCCCccccCccccccc Confidence 4799999999999999999999999999999999999999999999999998775543211 111111 111000 Q ss_pred ----ccC-C-------------CCCCCCCCCCCCceeEEEEcCCCCCCCCCccchhhHHHHHHHhhcCCCCceeEEEecc Q FBpp0301686 815 ----LLN-S-------------PPPLLNTPTWYPRMLRVAALNEPSTGDLQGIRGADFACYRQGRRAGLLGTFKAFLSSR 876 (1039) Q Consensus 815 
----~~~-~-------------~~~~~~~~~~~~~~l~l~a~~~~~~G~~~Gi~GAD~~C~~~a~~~g~~gt~rA~Ls~~ 876 (1039) ..+ . ++..........+.|||||||+|++|||+||+|||++||+|||++|+.||||||||++ T Consensus 81 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~l~liAlN~P~~G~m~Gi~gAD~~C~~qAr~~gl~gtfRAfLSs~ 160 (286) T PF06482_consen 81 SHPRPPSTAPDPHYPPQPRRPPPPSPSAHTHHDDGPLHLIALNEPLSGNMRGIRGADFQCFRQARAAGLTGTFRAFLSSR 160 (286) T ss_dssp -----------------------------S--TTS-EEEEE-SS-B-SBSSHHHHHHHHHHHHHHHTT--S-EEESS-BT T ss_pred ccCCcccCCCCccCCCCccccCCCCCccccccCCCceEEEEcCCCCCCCccccccccHHHHHHHHHcCCCCceEEeeecc Confidence 000 0 0000001122234599999999999999999999999999999999999999999999 Q ss_pred ccCcccccCCCCC-CCccccCCCcEEecCccccccCCCCcccCCCceeecCCCccCCCCCCCCceEEecCCCCcccccCc Q FBpp0301686 877 VQNLDTIVRPADR-DLPVVNTRGDVLFNSWKGIFNGQGGFFSQAPRIYSFSGKNVMTDSTWPMKMVWHGSLPNGERSMDT 955 (1039) Q Consensus 877 ~~~~~~~V~~~dr-~~p~vn~~g~vl~~~~~~l~~~~~~~~~~~~~i~~f~~~~~~~~~~~~~k~vW~Gs~~~g~~~~~~ 955 (1039) +|||++||++.|| ++||||+||||||+||++||+++++.|+.+++||||||+|||+|++||+|+|||||+++|++..++ T Consensus 161 ~qdL~~iV~~~dr~~~PivNlkgevLf~sw~~lf~g~~~~~~~~~~iySFdGr~v~~d~~wP~K~vWhGs~~~G~r~~~~ 240 (286) T PF06482_consen 161 LQDLYSIVRRADRDNVPIVNLKGEVLFNSWESLFSGSGGPFNPNAPIYSFDGRDVLTDPAWPQKMVWHGSDPRGRRLTDS 240 (286) T ss_dssp TB-GGGGS-GGGTSS--EE-TTS-EEES-HHHHTSSS-SB--TTS--BBTTS-BTTTSTTSSS-EEE--B-TTS-B-TTS T ss_pred cccHhhhccHhhCCCCCeEeCcCCEeecCHHHHhCCCCCCCCCCCcEEeECCccccCCCCcceEEEEeCCCCCCccCCcC Confidence 9999999999999 899999999999999999999998899999999999999999999999999999999999999999 Q ss_pred ccCcccCCCCCceeecccCCcccccccccccccCCcEEEEEeccc Q FBpp0301686 956 YCDAWHSGDHLKGSFASNLDGHKLLEQKRQSCDSKLIILCVEALS 1000 (1039) Q Consensus 956 ~C~~W~s~~~~~~g~as~~~~~~~~~~~~~~C~~~~~~lCve~~~ 1000 (1039) ||++|+|++.+++|+||+|++++||.|+.+||+++||||||||+. 
T Consensus 241 ~C~~Wrs~~~~~~G~As~l~~g~ll~q~~~sC~~~~ivLCiE~~~ 285 (286) T PF06482_consen 241 YCEAWRSSDPAVTGQASSLQSGKLLDQQPYSCSNSFIVLCIENSF 285 (286) T ss_dssp BHHHHB---TTSEEEEEEGGGTBSS--EEEETTS-BB-EEEESS- T ss_pred cccccccCCCCceEeeeecCCCCcccCCcccCCCceEEEEEeccc Confidence 999999999999999999999999999999999999999999974
No 2
>PF07588 DUF1554: Protein of unknown function (DUF1554); InterPro: IPR011448 This is a domain that occurs in 1-2 copies in a family of proteins identified in Leptospira interrogans and other bacteria. The function of the proteins is not known. Probab=99.72 E-value=3.2e-18 Score=165.57 Aligned_cols=126 Identities=20% Similarity=0.314 Sum_probs=97.4 Q ss_pred CCCCCCCCccchhhHHHHHHHhhcC--CCCceeEEEeccccCcccc-cCCCCCCCccccCCCcEEecCccccccCCCC-c Q FBpp0301686 840 EPSTGDLQGIRGADFACYRQGRRAG--LLGTFKAFLSSRVQNLDTI-VRPADRDLPVVNTRGDVLFNSWKGIFNGQGG-F 915 (1039) Q Consensus 840 ~~~~G~~~Gi~GAD~~C~~~a~~~g--~~gt~rA~Ls~~~~~~~~~-V~~~dr~~p~vn~~g~vl~~~~~~l~~~~~~-~ 915 (1039) ..|+|||+||.|||++|++.+.+.. ..++|||||++.+...+.+ +.++-. ....||||.+|.+|.+. ++. + T Consensus 2 ~~~~GnlGGi~GADa~C~~d~~~p~~~~~~~yKAml~~~~~~~R~a~~t~n~~----~g~~DWVl~pnt~Y~r~-dgt~i 76 (135) T PF07588_consen 2 NTYNGNLGGISGADAKCNADANKPSPGGGGTYKAMLVDGSNSTRRACVTANCG----DGQIDWVLKPNTTYYRS-DGTTI 76 (135) T ss_pred ccccCcccchhhHhHHHHcCCCCCCCCCCcCeEEEEEcCccccceeecCCCCC----CCcccceecCCceEEec-CCCEE Confidence 4689999999999999999887764 5679999999977522222 222222 22889999999999998 555 7 Q ss_pred ccCCCc-eeecCCCccCCCCCC-CCceEEecCCCCcccccCcccCcccCCCCCceeecccCC Q FBpp0301686 916 FSQAPR-IYSFSGKNVMTDSTW-PMKMVWHGSLPNGERSMDTYCDAWHSGDHLKGSFASNLD 975 (1039) Q Consensus 916 ~~~~~~-i~~f~~~~~~~~~~~-~~k~vW~Gs~~~g~~~~~~~C~~W~s~~~~~~g~as~~~ 975 (1039) |+++.. ||+|+ |++++- ..+.+|||++.+++... .+|++|+++....+|.....+ T Consensus 77 ~tTn~~glf~f~----l~~~i~~~~~~~WTGl~~~Wt~~~-~~C~~Wt~~s~~~~G~~G~~n 133 (135) T PF07588_consen 77 FTTNSNGLFDFP----LSNPISGTSGTIWTGLNSDWTTAT-NNCNNWTSGSSGVTGAYGSSN 133 (135) T ss_pred EecCCCceEccc----ccceecCCCccEEEeECCCCeeCC-CcccCCcCCCCcccccccccc Confidence 777766 99997 454443 48999999999987774 899999999988777766544
No 3
>PF02210 Laminin_G_2: Laminin G domain; InterPro: IPR001791 Laminins are large heterotrimeric glycoproteins involved in basement membrane function []. The Laminin G or LNS domain (for Laminin-alpha, Neurexin and Sex hormone-binding globulin) is an around 180 amino acid long domain found in a large and diverse set of extracellular proteins [, ]. The laminin globular (G) domain can be found in one to several copies in various laminin family members, including a large number of extracellular proteins. The C terminus of the laminin alpha chain contains a tandem repeat of five laminin G domains, which are critical for heparin-binding and cell attachment activity []. Laminin alpha4 is distributed in a variety of tissues including peripheral nerves, dorsal root ganglion, skeletal muscle and capillaries; in the neuromuscular junction, it is required for synaptic specialisation []. The structure of the laminin-G domain has been predicted to resemble that of pentraxin []. Laminin G domains can vary in their function, and a variety of binding functions have been ascribed to different LamG modules. For example, the laminin alpha1 and alpha2 chains each have five C-terminal laminin G domains, where only domains LG4 and LG5 contain binding sites for heparin, sulphatides and the cell surface receptor dystroglycan []. Laminin G-containing proteins appear to have a wide variety of roles in cell adhesion, signalling, migration, assembly and differentiation. Proteins with laminin-G domains include: Laminin. Merosin. Agrin. Neurexins. Vitamin K dependent protein S. Sex steroid binding protein SBP/SHBG. Drosophila proteins Slit, Crumbs, Fat. Several proteoglycan precursors. ; PDB: 3POY_A 3QCW_B 3R05_B 3ASI_A 3MW4_B 3SH4_A 3SH5_A 2C5D_A 4RA0_A 1H30_A .... 
Probab=98.63 E-value=1.8e-07 Score=87.15 Aligned_cols=114 Identities=14% Similarity=0.210 Sum_probs=84.6 Q ss_pred CCCCceEEEEEeCCCCeEEEEEEEcccccCceEEEEEEEeccCCcceeEEEeeeccCCCCceEEEEEEeCCeEEEEEccc Q FBpp0301686 98 SLKGGYLFSVVNPLDTVVQLGVHLSPVVKNSYNVSLVYTQADQNIGRKLASFGVAHVPDKWNSIALQVLSDKVSFYYDCE 177 (1039) Q Consensus 98 ~~~~g~LfsI~~~~d~~~qlgL~lsg~~~~~~~i~L~Y~~~~~~~~q~~~sF~v~laDg~WHrlaLsV~g~~VtLyVDC~ 177 (1039) ....++||.+.+. +...+|.|.|... .|.|.|...... .....+...+.|++||+|.|..+.+.++|+|||. T Consensus 4 ~~~~glLl~~~~~-~~~~~~~l~l~~g-----~l~~~~~~~~~~--~~~~~~~~~~~dg~wh~v~v~~~~~~~~l~vd~~ 75 (127) T PF02210_consen 4 RQPNGLLLYIGSQ-NSSDFLALELRNG-----RLVFRYNLGGEE--EISLFSPRNVNDGEWHSVSVRRSGNNVTLSVDDN 75 (127) T ss_dssp SSSSEEEEEEEES-TTSEEEEEEEETT-----EEEEEEESSSSE--EEEEESSSCCTSSSEEEEEEEEETTEEEEEETTS T ss_pred CCCCEEEEEEcCC-CCCeEEEEEEECC-----EEEEEEEeeccc--eeeccccccccchheeeeeeeeeeeeeeeccCCc Confidence 4567899999997 4578999999854 477888765221 1333446788899999999999999999999999 Q ss_pred cceeeeccCCCcceeccCCeeEEEEecCccc-------CcccccceeeeEE Q FBpp0301686 178 LRNTTLVTREPIELVFDSASTLYIGQAGSII-------GGKFEGYLEKINV 221 (1039) Q Consensus 178 ~~~t~~l~r~~~~l~~~~~~~l~IGq~g~~~-------~~~F~G~LQ~L~i 221 (1039) .............+ .....+|||...... ...|+|-|++|+| T Consensus 76 ~~~~~~~~~~~~~~--~~~~~l~iGg~~~~~~~~~~~~~~~F~GCi~~l~i 124 (127) T PF02210_consen 76 RVMSSPSSSSSQQL--NFDGSLYIGGIPNDFSSPGSDTQPGFVGCIRDLKI 124 (127) T ss_dssp EEEEEESSSTTHCB--ESEEEEEESSTTTTCTTTTSSTTSB-EEEEEEEEE T ss_pred ceeeeccCCCcccc--ccCCcEEECcccCccccccccCCCCCEEEECeEEE Confidence 98887665543333 335669999874422 4679999999987
No 4
>PF13385 Laminin_G_3: Concanavalin A-like lectin/glucanases superfamily; PDB: 1N1Y_A 1MZ6_A 1MZ5_A 2A75_A 1N1S_A 1WCS_A 1N1T_A 1N1V_A 2FHR_A 2AGS_A .... Probab=98.52 E-value=6e-07 Score=85.18 Aligned_cols=129 Identities=17% Similarity=0.322 Sum_probs=81.3 Q ss_pred ceEEEEEEeecCCCCc--eEEEEEeCCCCeEEEEEEEcccccCceEEEEEEEeccCCcceeEEEeeeccCCCCceEEEEE Q FBpp0301686 87 EFAILITFRQSSLKGG--YLFSVVNPLDTVVQLGVHLSPVVKNSYNVSLVYTQADQNIGRKLASFGVAHVPDKWNSIALQ 164 (1039) Q Consensus 87 eFSllaT~R~~~~~~g--~LfsI~~~~d~~~qlgL~lsg~~~~~~~i~L~Y~~~~~~~~q~~~sF~v~laDg~WHrlaLs 164 (1039) +|||.++||+...... .+|. ...++ ..+.|.+.... .+.|.+....... ..+.....+..++||+|+|. T Consensus 18 ~~Ti~~w~k~~~~~~~~~~~~~--~~~~~-~~~~~~~~~~~----~~~~~~~~~~~~~--~~~~~~~~~~~~~W~~va~~ 88 (152) T PF13385_consen 18 DFTISFWVKPDSPSSSDQFIFS--SSNDN-SGFSLFIDSSG----SLQFRVSNGNGTW--YSVTSDTPLSPGQWHHVAIT 88 (152) T ss_dssp SEEEEEEEEESS-SSSEEEEEE--SCCTS-EEEEEEEETTS----CEEEEECCSECCS--CEEE-CS---TT-EEEEEEE T ss_pred CEEEEEEEEECCCCCcceEEEE--eCCCC-CEEEEEEeCCC----EEEEEEECCCCce--EEEEecccccCCcEEEEEEE Confidence 9999999999766653 3343 32233 67777774431 5677665543222 23344556678999999999 Q ss_pred EeCCeEEEEEccccceeeeccCCCcceeccCCeeEEEEecCcccCcccccceeeeEEeeCcccc Q FBpp0301686 165 VLSDKVSFYYDCELRNTTLVTREPIELVFDSASTLYIGQAGSIIGGKFEGYLEKINVYGNPDAI 228 (1039) Q Consensus 165 V~g~~VtLyVDC~~~~t~~l~r~~~~l~~~~~~~l~IGq~g~~~~~~F~G~LQ~L~i~~dp~~~ 228 (1039) +.+..++||||.+++.+...... ........++||.... ....|.|.|.+|+|+..+... T Consensus 89 ~~~~~~~~yvnG~~~~~~~~~~~---~~~~~~~~~~iG~~~~-~~~~f~G~i~~v~i~~~alt~ 148 (152) T PF13385_consen 89 YDGGTVRLYVNGKLVGSSTNTGN---FSSSSSSPLTIGASSW-GSRYFKGYIDEVRIYDRALTP 148 (152) T ss_dssp EETTEEEEEETTEEEEEETCES----SSTCCCCEEEESS-TT-TT---EEEEEEEEEESS---H T ss_pred EccceeeeEEcceEEEEEEeeec---ccCCCceeEEEeecCC-CCCceEEEEEEEEEECCcCCh Confidence 99999999999999988755432 1123467899998852 457899999999999876543
No 5
>PF00054 Laminin_G_1: Laminin G domain; InterPro: IPR001791 Laminins are large heterotrimeric glycoproteins involved in basement membrane function []. The Laminin G or LNS domain (for Laminin-alpha, Neurexin and Sex hormone-binding globulin) is an around 180 amino acid long domain found in a large and diverse set of extracellular proteins [, ]. The laminin globular (G) domain can be found in one to several copies in various laminin family members, including a large number of extracellular proteins. The C terminus of the laminin alpha chain contains a tandem repeat of five laminin G domains, which are critical for heparin-binding and cell attachment activity []. Laminin alpha4 is distributed in a variety of tissues including peripheral nerves, dorsal root ganglion, skeletal muscle and capillaries; in the neuromuscular junction, it is required for synaptic specialisation []. The structure of the laminin-G domain has been predicted to resemble that of pentraxin []. Laminin G domains can vary in their function, and a variety of binding functions have been ascribed to different LamG modules. For example, the laminin alpha1 and alpha2 chains each have five C-terminal laminin G domains, where only domains LG4 and LG5 contain binding sites for heparin, sulphatides and the cell surface receptor dystroglycan []. Laminin G-containing proteins appear to have a wide variety of roles in cell adhesion, signalling, migration, assembly and differentiation. Proteins with laminin-G domains include: Laminin. Merosin. Agrin. Neurexins. Vitamin K dependent protein S. Sex steroid binding protein SBP/SHBG. Drosophila proteins Slit, Crumbs, Fat. Several proteoglycan precursors. ; PDB: 2C5D_A 4RA0_A 1H30_A 1LHW_A 1KDK_A 1LHU_A 1KDM_A 1LHO_A 1D2S_A 1F5F_A .... 
Probab=97.71 E-value=0.00019 Score=67.34 Aligned_cols=114 Identities=18% Similarity=0.228 Sum_probs=76.3 Q ss_pred CCCCceEEEEEeCCCCeEEEEEEEcccccCceEEEEEEEeccCCcceeEEEeeeccCCCCceEEEEEEeCCeEEEEEccc Q FBpp0301686 98 SLKGGYLFSVVNPLDTVVQLGVHLSPVVKNSYNVSLVYTQADQNIGRKLASFGVAHVPDKWNSIALQVLSDKVSFYYDCE 177 (1039) Q Consensus 98 ~~~~g~LfsI~~~~d~~~qlgL~lsg~~~~~~~i~L~Y~~~~~~~~q~~~sF~v~laDg~WHrlaLsV~g~~VtLyVDC~ 177 (1039) ....+.||-.-+. +...+|.|.|..- .|.|+|...++ ...+.....+.|++||+|.+......++|.||.+ T Consensus 4 ~~~~GlLly~~~~-~~~df~~l~L~~G-----~l~~~~~~G~g---~~~~~~~~~i~dg~wh~i~~~r~~~~~~l~Vd~~ 74 (131) T PF00054_consen 4 TSPNGLLLYLGSK-DESDFLALELVNG-----RLEFRYNLGGG---PVTLQSPQKINDGKWHTIEVERNGRNGSLQVDGE 74 (131) T ss_dssp SSSSEEEEEEESS-TTSSEEEEEEETT-----EEEEEEESSSE---EEEEEECSETTSSSEEEEEEEEETTEEEEEETTS T ss_pred CCCCceEEECCcC-CCCCEEEEEeeCC-----EEEEEEecCCc---ceeeccCccccCCCeEEEEeeccceeEEEEEcCc Confidence 3466788866554 4457888888654 58888865543 2344445568899999999999999999999988 Q ss_pred cceeeeccCCCc-ceeccCCeeEEEEecC-c-------ccCcccccceeeeEEe Q FBpp0301686 178 LRNTTLVTREPI-ELVFDSASTLYIGQAG-S-------IIGGKFEGYLEKINVY 222 (1039) Q Consensus 178 ~~~t~~l~r~~~-~l~~~~~~~l~IGq~g-~-------~~~~~F~G~LQ~L~i~ 222 (1039) +..+...+.... .+.+ ...||||--. . .....|+|=|++|+|- T Consensus 75 ~~~~~~~~~~~~~~l~~--~~~lyvGG~p~~~~~~~~~~~~~~f~GCi~~~~in 126 (131) T PF00054_consen 75 EPVTGSSPSGATSQLDF--SDPLYVGGLPSSSSPPRIFVSSPGFKGCIRDLKIN 126 (131) T ss_dssp EEEEEEECSSSSSTEEE--CSEEEESSSSTTTGCGSSCSCCSB-EEEEEEEEET T ss_pred cceeEEeeccccccccc--CCcEEEeCCCchhccccccCcCCCeeEEEEEeEEC Confidence 874432222111 2433 3459998754 1 1124589999999884
No 6
>PF00354 Pentaxin: Pentaxin family; InterPro: IPR001759 Pentaxins (or pentraxins) [, ] are a family of proteins which show, under electron microscopy, a discoid arrangement of five noncovalently bound subunits. Proteins of the pentaxin family are involved in acute immunological responses []. Three of the principal members of the pentaxin family are serum proteins: namely, C-reactive protein (CRP) [], serum amyloid P component protein (SAP) [], and female protein (FP) []. CRP is expressed during acute phase response to tissue injury or inflammation in mammals. The protein resembles antibody and performs several functions associated with host defence: it promotes agglutination, bacterial capsular swelling and phagocytosis, and activates the classical complement pathway through its calcium-dependent binding to phosphocholine. CRPs have also been sequenced in an invertebrate, Limulus polyphemus (Atlantic horseshoe crab), where they are a normal constituent of the hemolymph. SAP is a vertebrate protein that is a precursor of amyloid component P. It is found in all types of amyloid deposits, in glomerular basement membrane and in elastic fibres in blood vessels. SAP binds to various lipoprotein ligands in a calcium-dependent manner, and it has been suggested that, in mammals, this may have important implications in atherosclerosis and amyloidosis. FP is a SAP homologue found in Mesocricetus auratus (Golden hamster). The concentration of this plasma protein is altered by sex steroids and stimuli that elicit an acute phase response. Pentaxin proteins expressed in the nervous system are neural pentaxin I (NPI) and II (NPII) []. NPI and NPII are homologous and can exist within one species. It is suggested that both proteins mediate the uptake of synaptic macromolecules and play a role in synaptic plasticity. Apexin, a sperm acrosomal protein, is a homologue of NPII found in Cavia porcellus (Guinea pig) []. 
PTX3 (or TSG-14) protein is a cytokine-induced protein that is homologous to CRPs and SAPs, but its function is not yet known.; PDB: 2A3W_F 4AVV_C 4AYU_A 4AVT_J 3KQR_C 3D5O_D 2A3X_G 1SAC_D 2W08_B 1GYK_B .... Probab=97.67 E-value=0.00052 Score=69.34 Aligned_cols=125 Identities=18% Similarity=0.243 Sum_probs=81.9 Q ss_pred cceEEEEEEeecCCCCceEEEEEeCCCCeEEEEEEEcccccCceEEEEEEEeccCCcceeEEEeeeccCCCCceEEEEEE Q FBpp0301686 86 YEFAILITFRQSSLKGGYLFSVVNPLDTVVQLGVHLSPVVKNSYNVSLVYTQADQNIGRKLASFGVAHVPDKWNSIALQV 165 (1039) Q Consensus 86 ~eFSllaT~R~~~~~~g~LfsI~~~~d~~~qlgL~lsg~~~~~~~i~L~Y~~~~~~~~q~~~sF~v~laDg~WHrlaLsV 165 (1039) .+|+|-+++|....+.++|||..... ....|-+..+.. ..+.|+. .+ ....|.+...+.+||+|.++- T Consensus 25 ~~fTvC~w~~~~~~~~~tifSYa~~~-~~nell~~~~~~----~~~~l~i--~g-----~~~~~~~~~~~~~WhhvC~tW 92 (194) T PF00354_consen 25 SAFTVCFWVRTDLSNSGTIFSYATSS-NDNELLLFISSN----GGFELYI--NG-----SSISFSVSPSDLQWHHVCVTW 92 (194) T ss_dssp SEEEEEEEEEESGSS-EEEEEEEETT-EEEEEEEEEETT----TEEEEEE--TT-----EEEEEEECCECSSEEEEEEEE T ss_pred ccEEEEEEEEEcCCCceEEEEeecCC-CCcceEEEEeCC----CeEEEEE--CC-----cEEEEecccCCCCCEEEEEEc Confidence 39999999999888889999988863 333333323322 1344443 22 233455556677999999987 Q ss_pred eC--CeEEEEEccccceeeeccCCCcceeccCCeeEEEEecCc------ccCcccccceeeeEEeeCc Q FBpp0301686 166 LS--DKVSFYYDCELRNTTLVTREPIELVFDSASTLYIGQAGS------IIGGKFEGYLEKINVYGNP 225 (1039) Q Consensus 166 ~g--~~VtLyVDC~~~~t~~l~r~~~~l~~~~~~~l~IGq~g~------~~~~~F~G~LQ~L~i~~dp 225 (1039) +. ..+.||+|-+...+..+.+. ..+...+.|.|||.-. .....|.|+|.+|+|-... T Consensus 93 ~s~~G~~~ly~dG~~~~~~~~~~g---~~i~~gG~~vlGQeQd~~gGgf~~~qsf~G~is~~~iWd~v 157 (194) T PF00354_consen 93 DSSTGSWQLYVDGERVSSGGLAKG---YSIPPGGTLVLGQEQDSYGGGFDSSQSFVGEISDVNIWDRV 157 (194) T ss_dssp ETTTTEEEEEETTEEEEEEESSTT-----B-SSEEEEESS-BSBTTBTCSGGGB--EEEEEEEEESS- T ss_pred ccCCccEEEEeCCEeeeeeeccCC---ceECCCCEEEEEEcccCCCCccCcccccceEEeeEEEEeee Confidence 65 79999999998776555432 3356789999999621 1235799999999998775
No 7
>PF02973 Sialidase: Sialidase, N-terminal domain; InterPro: IPR004124 O-Glycosyl hydrolases (3.2.1. from EC) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families [, ]. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. Sialidases (GH33 from CAZY) hydrolyse alpha-(2->3)-, alpha-(2->6)-, alpha-(2->8)-glycosidic linkages of terminal sialic residues in oligosaccharides, glycoproteins, glycolipids, colominic acid and synthetic substrates. Sialidases may act as pathogenic factors in microbial infections []. The 1.8 A structure of trans-sialidase from leech (Macrobdella decora, Q27701 from SWISSPROT) in complex with 2-deoxy-2, 3-didehydro-NeuAc was solved. The refined model comprising residues 81-769 has a catalytic beta-propeller domain, a N-terminal lectin-like domain and an irregular beta-stranded domain inserted into the catalytic domain [].; GO: 0004308 exo-alpha-sialidase activity, 0005975 carbohydrate metabolic process; PDB: 4FQ4_A 4FPJ_A 4FPO_B 4FOW_A 2VW2_A 4FPL_A 4FPE_A 4FPC_A 2VW0_A 4FOY_A .... Probab=97.50 E-value=0.00089 Score=66.89 Aligned_cols=134 Identities=13% Similarity=0.169 Sum_probs=85.6 Q ss_pred cceEEEEEEeecCCCC-ceEEEEEeCCCCeEEEEEEEcccccCceEEEEEEEeccCCcceeEEEe-eeccCCCCce---- Q FBpp0301686 86 YEFAILITFRQSSLKG-GYLFSVVNPLDTVVQLGVHLSPVVKNSYNVSLVYTQADQNIGRKLASF-GVAHVPDKWN---- 159 (1039) Q Consensus 86 ~eFSllaT~R~~~~~~-g~LfsI~~~~d~~~qlgL~lsg~~~~~~~i~L~Y~~~~~~~~q~~~sF-~v~laDg~WH---- 159 (1039) -+++|++.||+...+. ..||+|.+......||-|.+... +|-+..++.++......... 
.+.+..++|+ T Consensus 31 ~~gTI~v~Fk~t~~~~~qsLfsiSns~~~n~yF~lyi~~~-----~lG~E~R~~~~~~~y~~~~~~~a~v~~~~~~~~~~ 105 (189) T PF02973_consen 31 EEGTIVVEFKPTSKSGIQSLFSISNSKKGNEYFHLYIRNN-----TLGFELRDQSGNQFYLSSRPAPASVWGGYWNSVTF 105 (189) T ss_dssp SSEEEEEEEEESSSSSEEEEEEEE-SSTTTEEEEEEEETT-----EEEEEEEETTTTBCEEEEET-SSB--TECTCEEEE T ss_pred cceEEEEEEecCCCCceeEEEEecCCCCCCceEEEEEECC-----EEEEEEccCCCCccccccccchhhccccccCCceE Confidence 3899999999965553 34999999888889999999775 58888888775442222221 1455578997 Q ss_pred -EEEEEEe--CCeEEEEEcccccee--eeccCCCcceeccCCeeEEEEecCccc--CcccccceeeeEEeeCccc Q FBpp0301686 160 -SIALQVL--SDKVSFYYDCELRNT--TLVTREPIELVFDSASTLYIGQAGSII--GGKFEGYLEKINVYGNPDA 227 (1039) Q Consensus 160 -rlaLsV~--g~~VtLyVDC~~~~t--~~l~r~~~~l~~~~~~~l~IGq~g~~~--~~~F~G~LQ~L~i~~dp~~ 227 (1039) .++|.+. ...++||+|. ...+ ....+...+ ++.--.++||...+.. .--|.|.|.+|+||-.+.. T Consensus 106 ntva~~ad~~~~~yklY~NG-~l~~~s~~~~~Fi~d--i~~~n~~~lG~t~R~~~~~y~F~G~I~n~~iYn~aLs 177 (189) T PF02973_consen 106 NTVAFVADSPNKGYKLYVNG-VLSVFSKKSGKFISD--IPGLNSVQLGGTKRAGSNAYGFNGTIDNLKIYNRALS 177 (189) T ss_dssp EEEEEEEETTTTEEEEEETT-EEEEEEESTSS-GGG--STT--EEEESSEEETTEEES--EEEEEEEEEESS--- T ss_pred EEEEEeecCCCceEEEEEcc-EEEEEecchhhHhhc--CCCCceEEEeeeEeCCCcccCcccEEEEEEEECCcCC Confidence 6777775 5799999999 3222 222333233 3344578888753322 2358999999999988754
No 8
>PF07953 Toxin_R_bind_N: Clostridium neurotoxin, N-terminal receptor binding; InterPro: IPR012928 The Clostridium neurotoxin family is composed of tetanus neurotoxin and seven serotypes of botulinum neurotoxin. The structure of the botulinum neurotoxin reveals a four domain protein. The N-terminal catalytic domain (IPR000395 from INTERPRO), the central translocation domain and two receptor binding domains []. This domain is the N-terminal receptor binding domain, which is comprised of two seven-stranded beta-sheets sandwiched together to form a jelly roll motif []. The role of this domain in receptor binding appears to be indirect. ; GO: 0004222 metalloendopeptidase activity, 0050827 toxin receptor binding, 0009405 pathogenesis, 0051609 inhibition of neurotransmitter uptake, 0005576 extracellular region; PDB: 3RSJ_B 3FUQ_A 1DFQ_A 1A8D_A 1FV3_A 1YXW_A 1DLL_A 1D0H_A 1YYN_A 1FV2_A .... Probab=95.06 E-value=0.11 Score=51.03 Aligned_cols=123 Identities=12% Similarity=0.215 Sum_probs=76.2 Q ss_pred CcEEEccCCCCCcceEEcCCcc---ccCcccccCCCCCc-ceEEEEEEeecCC-C------CceEEEEEeCCCCeEEEEE Q FBpp0301686 51 AGIEFGEAEDGFPAFRFLQTAD---VKSPYRMLLPEKLY-EFAILITFRQSSL-K------GGYLFSVVNPLDTVVQLGV 119 (1039) Q Consensus 51 ~GV~~v~G~d~~pAy~f~~~a~---l~~pt~~~fp~~~~-eFSllaT~R~~~~-~------~g~LfsI~~~~d~~~qlgL 119 (1039) ..|.++. -..-||.+..... ....+..+|-+.++ .|||.+|+|-.+. + .-+|+.=.. 
+.-=..| T Consensus 18 ~~v~l~~--in~n~~~L~~s~~s~v~v~~~n~i~yn~~~nnFSIsFWlRi~k~~~~~~~~neytII~~~~---nnsGWkI 92 (195) T PF07953_consen 18 GDVQLNY--INNNQFKLYSSNQSEVIVIQNNNIFYNSMYNNFSISFWLRIPKYDNNINLHNEYTIINCMK---NNSGWKI 92 (195) T ss_dssp TTEEEES--SSTTEEEEESSTTCEEEEEEETTGSCSCSSSEEEEEEEEEEECHHSCHHTTSEEEEEEEEE---TTEEEEE T ss_pred CCEEEEE--cCcceEEEccCCcccEEEEecceEEEeccccceeEEEEEEcCCcccccccccceEEEEecc---CCCceEE Confidence 4455554 3345677775544 22335677777888 9999999995221 1 223444332 2334445 Q ss_pred EEcccccCceEEEEEEEeccCCcceeEEEeeec----cCC--CCceEEEEEEeC-CeEEEEEccccceeeecc Q FBpp0301686 120 HLSPVVKNSYNVSLVYTQADQNIGRKLASFGVA----HVP--DKWNSIALQVLS-DKVSFYYDCELRNTTLVT 185 (1039) Q Consensus 120 ~lsg~~~~~~~i~L~Y~~~~~~~~q~~~sF~v~----laD--g~WHrlaLsV~g-~~VtLyVDC~~~~t~~l~ 185 (1039) .|... .+.+...+..+. ++...|++. ++| .+||.+++++.. ....||+|.+++....+. T Consensus 93 ~l~~n-----~lI~tl~D~ng~--~k~i~f~y~~~~~~sdyiNkW~fItIt~~rL~~~~IYING~L~~~~~I~ 158 (195) T PF07953_consen 93 SLRNN-----NLIWTLQDSNGN--EKNIYFRYSESISISDYINKWHFITITNNRLGNSKIYINGKLIDNESIK 158 (195) T ss_dssp EEETT-----EEEEEEEETTSE--EEEEEEESSSTSSTTSSTTSEEEEEEEEETTSEEEEEETTEEEEEEE-T T ss_pred EEeCC-----eEEEEEEccCCc--eEEEEEEeeccCChhHhccceEEEEEEeccCCcceEEECCEEEeccchh Confidence 55443 455555554443 356677532 222 799999999999 777999999999887554
No 9
>PF06439 DUF1080: Domain of Unknown Function (DUF1080); InterPro: IPR010496 This is a family of proteins of unknown function.; PDB: 3IMM_B 4JQT_A 3OSD_A 4QHZ_B 3HBK_A 3H3L_A 3NMB_A 3S5Q_A 4HXC_A 3U1X_A .... Probab=92.23 E-value=0.038 Score=53.75 Aligned_cols=104 Identities=20% Similarity=0.330 Sum_probs=60.7 Q ss_pred cccCCCCCc-ceEEEEEEee-cCCCCceEEEEEe---CCCCeEEEEEEEcccccCceEEEEEEEeccCCcc-----eeEE Q FBpp0301686 78 RMLLPEKLY-EFAILITFRQ-SSLKGGYLFSVVN---PLDTVVQLGVHLSPVVKNSYNVSLVYTQADQNIG-----RKLA 147 (1039) Q Consensus 78 ~~~fp~~~~-eFSllaT~R~-~~~~~g~LfsI~~---~~d~~~qlgL~lsg~~~~~~~i~L~Y~~~~~~~~-----q~~~ 147 (1039) ..++.+..+ +|.|-+.+|. ...+.+++|...+ ..+....+.+.|..... . ....... +... .... T Consensus 44 ~~l~~~~~~~df~l~~d~k~~~~~~~gi~~r~~~~~~~~~~~~gy~~~i~~~~~---~-~~~~~~~-G~~~~~~~~~~~~ 118 (185) T PF06439_consen 44 GYLYTDKEFSDFTLEVDFKITPGGNSGILFRAQDNGENQDPNNGYEVQIDNSGR---G-SKLNRST-GSIYGEIEKNVEA 118 (185) T ss_dssp EEEEESSEBSSEEEEEEEEEGTTSEEEEEEEESTECESSGGGTSEEEEEE-TTT---C-STTTTST-TSBTTTBETTB-S T ss_pred eEEEECCCCCCEEEEEEEEEcCCCcceEEEEeccccCCCccceEEEEEEeCccC---c-cccCccc-ceEeeeeeccccc Confidence 344444445 9999999995 5567777777661 11334445555554322 0 0000011 1110 1122 Q ss_pred EeeeccCCCCceEEEEEEeCCeEEEEEccccceeeeccC Q FBpp0301686 148 SFGVAHVPDKWNSIALQVLSDKVSFYYDCELRNTTLVTR 186 (1039) Q Consensus 148 sF~v~laDg~WHrlaLsV~g~~VtLyVDC~~~~t~~l~r 186 (1039) +..+.+..++||+|.|.|.+++|++|||-.++.+...++ T Consensus 119 ~~~~~~~~~~W~~~~I~~~g~~i~v~vNG~~v~~~~d~~ 157 (185) T PF06439_consen 119 SVNVAFKPGQWNTVRIEVKGNRITVYVNGKLVLEFTDPD 157 (185) T ss_dssp SSCGS--TTSEEEEEEEEETTEEEEEETTEEEEEEETTS T ss_pred ccccccCCCceEEEEEEEECCEEEEEECCEEEEEEEcCC Confidence 234555678999999999999999999999998775554
No 10
>PF01410 COLFI: Fibrillar collagen C-terminal domain; InterPro: IPR000885 Collagens contain a large number of globular domains in between the regions of triple helical repeats IPR008160 from INTERPRO. These domains are involved in binding diverse substrates. One of these domains is found at the C terminus of fibrillar collagens. The exact function of this domain is unknown.; GO: 0005201 extracellular matrix structural constituent, 0005581 collagen trimer; PDB: 4AEJ_A 4AE2_A 4AK3_A. Probab=90.09 E-value=0.029 Score=57.89 Aligned_cols=36 Identities=28% Similarity=0.422 Sum_probs=25.4 Q ss_pred CCchHHHHHHHHHHHhhcCCCCCCCCCCCCCCCCCC Q FBpp0301686 679 ARSSLDELKALRELQDLRDRPDGTAEPPRQPGHSHK 714 (1039) Q Consensus 679 ~~~~~dtLksL~~~~~~~~~P~Gt~~~Pa~~~~d~~ 714 (1039) +.+++..|++|++.++.++.|.||+.+|||+|+|+. T Consensus 2 ~~~i~~~l~~l~~~i~~~~~P~Gtk~~PArtC~dl~ 37 (233) T PF01410_consen 2 DEEIFAALDSLKEEIESIKKPDGTKENPARTCRDLK 37 (233) T ss_dssp -----HHHHHHHHHHHHHHS--SSSSS-BSSHHHHH T ss_pred HHHHHHHHHHHHHHHHhccCCCCCccChHHHHHHHH Confidence 346788999999988888899999999999999863
No 11
>PF14099 Polysacc_lyase: Polysaccharide lyase; InterPro: IPR025975 This family includes heparin lyase I (4.2.2.7 from EC). Heparin lyase I depolymerises heparin by cleaving the glycosidic linkage next to an iduronic acid moiety [, ]. The structure of heparin lyase I consists of a beta-jelly roll domain with a long, deep substrate-binding groove and an unusual thumb domain containing many basic residues extending from the main body of the enzyme []. This family also includes glucuronan lyase, (4.2.2.14 from EC) []. The structure glucuronan lyase is a beta-jelly roll [].; PDB: 2ZZJ_A 3IKW_A 3ILR_A 3INA_A 3IMN_A 3IN9_A. Probab=77.99 E-value=5 Score=39.19 Aligned_cols=78 Identities=10% Similarity=0.087 Sum_probs=44.2 Q ss_pred CceEEEEEeCCCC----eEEEEEEEcccccCceEEEEEEEeccCCcce--eEEEe-eeccCCCCceEEEEEEeC-----C Q FBpp0301686 101 GGYLFSVVNPLDT----VVQLGVHLSPVVKNSYNVSLVYTQADQNIGR--KLASF-GVAHVPDKWNSIALQVLS-----D 168 (1039) Q Consensus 101 ~g~LfsI~~~~d~----~~qlgL~lsg~~~~~~~i~L~Y~~~~~~~~q--~~~sF-~v~laDg~WHrlaLsV~g-----~ 168 (1039) ...|+.+....+. .+-|.|.+... ...+.+.+......... ....+ ..++.-|+||+|.|.|.= . T Consensus 84 ~~~i~Q~~~~~~~~~~~~P~~~l~~~~g---~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~g~W~~~~~~i~~s~~~~G 160 (212) T PF14099_consen 84 WFIIFQWHGSPDGEQSGSPPLALRISGG---RLYLRVRNDDNTSNPNTSNIARYIPSAPIPRGKWHDFVVHIKWSPDGDG 160 (212) T ss_dssp EEEEEEEEEE-TTSSSEEECEEEEEETT---EEEEEEEEE--ETTCEEEEEEEEEECCCS-TTSEEEEEEEEEE-CCTCE T ss_pred eeEEEEEEeCCCCCCCCCCeEEEEEECC---EEEEEEEcCCCCcccccceeeEeeccCccCCCCEEEEEEEEEECCCCCE Confidence 4457777765444 67888888543 23333333333101101 11222 446667999999999922 5 Q ss_pred eEEEEEcccccee Q FBpp0301686 169 KVSFYYDCELRNT 181 (1039) Q Consensus 169 ~VtLyVDC~~~~t 181 (1039) .|.|++|-+++-. T Consensus 161 ~i~vw~nG~~v~~ 173 (212) T PF14099_consen 161 YIEVWVNGKLVVD 173 (212) T ss_dssp EEEEEECCEEEEE T ss_pred EEEEEECCEEEEE Confidence 7999999977643
No 12
>PF16346 DUF4975: Domain of unknown function (DUF4975) Probab=76.46 E-value=13 Score=35.89 Aligned_cols=112 Identities=13% Similarity=0.073 Sum_probs=73.1 Q ss_pred ceEEcCCccccCcccccCCCCCc-ceEEEEEEeecCCCCceEEEEEeCCCCeEEEEEEEcccccCceEEEEEEEeccCCc Q FBpp0301686 64 AFRFLQTADVKSPYRMLLPEKLY-EFAILITFRQSSLKGGYLFSVVNPLDTVVQLGVHLSPVVKNSYNVSLVYTQADQNI 142 (1039) Q Consensus 64 Ay~f~~~a~l~~pt~~~fp~~~~-eFSllaT~R~~~~~~g~LfsI~~~~d~~~qlgL~lsg~~~~~~~i~L~Y~~~~~~~ 142 (1039) .|.|... +..+|+ .|+ .+-|.+|++.......+=|++....+....+.|.+++..++..++.|.=....+.. T Consensus 37 ~~~l~~~------~~v~f~-~L~~~~kIs~ti~~~~~~~~FGi~f~~~~d~~~~Y~i~~np~~~~~~~~~f~~~~~~~~~ 109 (176) T PF16346_consen 37 GYTLSGN------AYVLFN-RLPGTNKISATIKFSEGTDKFGISFRRDSDSEEGYYIRFNPENNNRNRLNFENEGNIGKG 109 (176) T ss_pred eEEEecc------eEEEec-cCCCceEEEEEEEeCCCCCeEEEEEEECCCccccEEEEEeeccccceEEEEEecCccccc Confidence 4777652 234454 455 88899999988777767666666667888899999986433335555222221111 Q ss_pred -ceeEEEeeeccCCCCceEEEEEEeCCeEEEEEccccceee Q FBpp0301686 143 -GRKLASFGVAHVPDKWNSIALQVLSDKVSFYYDCELRNTT 182 (1039) Q Consensus 143 -~q~~~sF~v~laDg~WHrlaLsV~g~~VtLyVDC~~~~t~ 182 (1039) .+....+.+.+..++=.+|.|-++++-+.||||-+-..+. T Consensus 110 ~~~~~~~~~~~~~a~~~y~v~I~~d~SV~v~YVNd~vAlTt 150 (176) T PF16346_consen 110 FIQGIDEYPFELPADNEYHVKIVIDNSVCVVYVNDEVALTT 150 (176) T ss_pred ccccccceeeecCCCCEEEEEEEEcCCEEEEEECCeEEEEE Confidence 1111233455567888999999999999999988776553
No 13
>PF16323 DUF4959: Domain of unknown function (DUF4959) Probab=63.47 E-value=74 Score=31.34 Aligned_cols=20 Identities=15% Similarity=0.223 Sum_probs=11.7 Q ss_pred hhhHHHHHHHHHHHHhhhcc Q FBpp0301686 4 LQGVMFALAMICTLLVPVLG 23 (1039) Q Consensus 4 ~~~~~~~~~~~~~~~~~~~~ 23 (1039) |+..++++++++++|+.|.. T Consensus 1 mk~~~~~~~~~~~~l~sC~~ 20 (225) T PF16323_consen 1 MKKYLLLLLLALLLLASCKE 20 (225) T ss_pred ChhhHHHHHHHHHhEEecCC Confidence 34555566666666656665
No 14
>PF00722 Glyco_hydro_16: Glycosyl hydrolases family 16; InterPro: IPR000757 O-Glycosyl hydrolases (3.2.1. from EC) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families [, ]. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. Glycoside hydrolase family 16 GH16 from CAZY comprises enzymes with a number of known activities; lichenase (3.2.1.73 from EC); xyloglucan xyloglucosyltransferase (2.4.1.207 from EC); agarase (3.2.1.81 from EC); kappa-carrageenase (3.2.1.83 from EC); endo-beta-1,3-glucanase (3.2.1.39 from EC); endo-beta-1,3-1,4-glucanase (3.2.1.6 from EC); endo-beta-galactosidase (3.2.1.103 from EC).; GO: 0004553 hydrolase activity, hydrolyzing O-glycosyl compounds, 0005975 carbohydrate metabolic process; PDB: 3ILN_A 1UMZ_A 1UN1_B 3HR9_A 1MVE_A 3H0O_A 1ZM1_A 3AXD_B 2R49_A 3AXE_A .... Probab=61.21 E-value=35 Score=31.61 Aligned_cols=31 Identities=19% Similarity=0.372 Sum_probs=27.7 Q ss_pred ccCCCCceEEEEEEeCCeEEEEEccccceee Q FBpp0301686 152 AHVPDKWNSIALQVLSDKVSFYYDCELRNTT 182 (1039) Q Consensus 152 ~laDg~WHrlaLsV~g~~VtLyVDC~~~~t~ 182 (1039) ...+..||...|....+.|.+|||-+++.+. T Consensus 106 ~~~~~~~H~y~v~W~~~~i~fyvDg~~~~~~ 136 (177) T PF00722_consen 106 FDDSNDFHTYGVEWTPDSIRFYVDGKLVRTV 136 (177) T ss_dssp STTTTSEEEEEEEEETTEEEEEETTEEEEEE T ss_pred ccccccceEEEEEEeeeeeeeccCCeeEEee Confidence 3457899999999999999999999998875
No 15
>PF02057 Glyco_hydro_59: Glycosyl hydrolase family 59; InterPro: IPR001286 O-Glycosyl hydrolases (3.2.1. from EC) are a widespread group of enzymes that hydrolyse the glycosidic bond between two or more carbohydrates, or between a carbohydrate and a non-carbohydrate moiety. A classification system for glycosyl hydrolases, based on sequence similarity, has led to the definition of 85 different families [, ]. This classification is available on the CAZy (CArbohydrate-Active EnZymes) web site. Glycoside hydrolase family 59 GH59 from CAZY comprises enzymes with only one known activity; galactocerebrosidase (3.2.1.46 from EC). Globoid cell leukodystrophy (Krabbe disease) is a severe, autosomal recessive disorder that results from deficiency of galactocerebrosidase (GALC) activity [, , ]. GALC is responsible for the lysosomal catabolism of certain galactolipids, including galactosylceramide and psychosine [].; GO: 0004336 galactosylceramidase activity, 0006683 galactosylceramide catabolic process; PDB: 4CCC_A 3ZR6_A 4CCE_A 4CCD_A 3ZR5_A. Probab=58.12 E-value=42 Score=38.33 Aligned_cols=50 Identities=14% Similarity=0.210 Sum_probs=32.1 Q ss_pred eeccCCCCceEEEEEEeCCeEEEEEccccceeeeccCCCcceeccCCeeEEEEecC Q FBpp0301686 150 GVAHVPDKWNSIALQVLSDKVSFYYDCELRNTTLVTREPIELVFDSASTLYIGQAG 205 (1039) Q Consensus 150 ~v~laDg~WHrlaLsV~g~~VtLyVDC~~~~t~~l~r~~~~l~~~~~~~l~IGq~g 205 (1039) .+.+..++||+|+|.|+++.++-|||-..+-+.... .+...+.+-||..+ T Consensus 605 ~~~~~~~~WhtLtL~~~g~~~~g~lng~~l~~~~~~------~~p~~G~aaiGT~~ 654 (669) T PF02057_consen 605 KAGVGAGRWHTLTLTVKGSSITGSLNGTVLWTNVPV------SFPKNGWAAIGTSS 654 (669) T ss_dssp E-S--SS-EEEEEEEEETTEEEEEETTEEEEEEEE--------SS---EEEEEESS T ss_pred EeccCCCceEEEEEEEEccEEEEEECCEEEEEeccc------CCCCCceEEEEcCC Confidence 455667899999999999999999999877543221 12345777788763
No 16
>PF08787 Alginate_lyase2: Alginate lyase; InterPro: IPR014895 Alginate lyases are enzymes that degrade the linear polysaccharide alginate. They cleave the glycosidic linkage of alginate through a beta-elimination reaction. This region forms an all beta fold, which is different to the all alpha fold of IPR008397 from INTERPRO. ; PDB: 1UAI_A 1VAV_B 3ZPY_B 4BE3_B 4Q8L_B 4Q8K_A 1J1T_A 2Z42_A 2ZAC_A 2ZAB_A .... Probab=55.02 E-value=61 Score=32.01 Aligned_cols=93 Identities=10% Similarity=-0.032 Sum_probs=56.0 Q ss_pred EEEEEEeec--CCC------CceEEEEEeCC--CCeEEEEEEEccccc-CceEEEEEEEecc-CCcceeEEEeeeccCCC Q FBpp0301686 89 AILITFRQS--SLK------GGYLFSVVNPL--DTVVQLGVHLSPVVK-NSYNVSLVYTQAD-QNIGRKLASFGVAHVPD 156 (1039) Q Consensus 89 SllaT~R~~--~~~------~g~LfsI~~~~--d~~~qlgL~lsg~~~-~~~~i~L~Y~~~~-~~~~q~~~sF~v~laDg 156 (1039) .|.++++.. +.+ +-++-+|.... .....|.|....... ..-.|.+.+.... .........+ -.+.-| T Consensus 86 ~l~a~l~V~~~~~~~~~~~~~viigQIH~~~~~~~~pllkl~~~~~~~~~~g~v~~~~~~~~~~~~~~~~~~~-~~i~LG 164 (236) T PF08787_consen 86 TLEATLAVTQVPSGGKSNNPRVIIGQIHGKDGGSNPPLLKLYYRKEPGNEKGSVYAYVKDNNPDGGDISSNVY-GGIPLG 164 (236) T ss_dssp EEEEEEEEEE-TTTSCTTTCEEEEEEEEESSSTSCEEEEEEEEEESTTTESSEEEEEEESSTCTTSEEEEEEE-EEEETT T ss_pred EEEEEEEEEecCCCCCceeeeEEEEEEecCCCCCCccEEEEEEEEeeccCCCeEEEEEeccCCCCCceEEEeE-cCccCC Confidence 788888852 222 23566777763 367777887742110 0115677776321 1111122222 133457 Q ss_pred CceEEEEEEeCCeEEEEEccccceee Q FBpp0301686 157 KWNSIALQVLSDKVSFYYDCELRNTT 182 (1039) Q Consensus 157 ~WHrlaLsV~g~~VtLyVDC~~~~t~ 182 (1039) +|-++.|.|.++.|+++|+++..... T Consensus 165 ~~F~y~I~v~~~~l~V~~ng~~~~~~ 190 (236) T PF08787_consen 165 EKFSYEIRVSNGTLTVYVNGEGKSTT 190 (236) T ss_dssp -EEEEEEEEETTEEEEEETTEEEEEE T ss_pred CEEEEEEEEeCCEEEEEEECCCceEE Confidence 99999999999999999999987664
No 17
>PF12988 DUF3872: Domain of unknown function, B. Theta Gene description (DUF3872); InterPro: IPR024355 This entry represents proteins of unknown function found primarily in Bacteroides species. The Bacteroides thetaiotaomicron gene coding for this protein is located in a conjugate transposon and appears to be upregulated in the presence of host or other bacterial species compared to growth in pure culture [, ].; PDB: 2L7Q_A 4LBA_A 2L3B_A. Probab=42.90 E-value=52 Score=29.98 Aligned_cols=36 Identities=11% Similarity=0.068 Sum_probs=25.8 Q ss_pred EEEEEEEeccCCcceeEEEeeeccCCCCceEEEEEEeC Q FBpp0301686 130 NVSLVYTQADQNIGRKLASFGVAHVPDKWNSIALQVLS 167 (1039) Q Consensus 130 ~i~L~Y~~~~~~~~q~~~sF~v~laDg~WHrlaLsV~g 167 (1039) ..+|+|++..... +...|-+....|+..++.+++.+ T Consensus 97 ~FrLyYtS~s~~~--q~i~v~veDnfGq~~~l~f~Fn~ 132 (133) T PF12988_consen 97 VFRLYYTSTSADQ--QSIDVYVEDNFGQEQQLTFSFNN 132 (133) T ss_dssp EEEEEEEESSSS---EEEEEEEEETTSEEEEEEEEES- T ss_pred eEEEEEecCCCCC--ceEEEEEEeCCCcEEEEEEEecC Confidence 6789999876443 66666666667888888888764
No 18
>PF07622 DUF1583: Protein of unknown function (DUF1583); InterPro: IPR011475 Most of the Rhodopirellula baltica hypothetical proteins that have this domain also match PF07619 from PFAM. Probab=33.30 E-value=16 Score=39.60 Aligned_cols=37 Identities=19% Similarity=0.273 Sum_probs=33.0 Q ss_pred eeccCCCCceEEEEEEeCCeEEEEEccccceeeeccC Q FBpp0301686 150 GVAHVPDKWNSIALQVLSDKVSFYYDCELRNTTLVTR 186 (1039) Q Consensus 150 ~v~laDg~WHrlaLsV~g~~VtLyVDC~~~~t~~l~r 186 (1039) .+++-++.|.++.|.+.++.|.|.+|-+.+....|+- T Consensus 83 ~~~l~~~~wN~v~l~~~g~~v~~~lN~~~i~~~~~~~ 119 (411) T PF07622_consen 83 PLPLKDNAWNRVKLQRSGDTVQLHLNGQLIYERPLDP 119 (411) T ss_pred CCCCCcccccEEEEEEeCCEEEEEECCeEEEEEecCC Confidence 3567789999999999999999999999999987754
No 19
>PF05018 DUF667: Protein of unknown function (DUF667); InterPro: IPR007714 This family of proteins are highly conserved in eukaryotes. Some proteins in the family are annotated as transcription factors. However, there is currently no support for this in the literature. Probab=28.64 E-value=1.2e+02 Score=28.66 Aligned_cols=31 Identities=26% Similarity=0.671 Sum_probs=20.5 Q ss_pred eeeccCCCCceEEEEEEeC-------------CeEEEEEccccce Q FBpp0301686 149 FGVAHVPDKWNSIALQVLS-------------DKVSFYYDCELRN 180 (1039) Q Consensus 149 F~v~laDg~WHrlaLsV~g-------------~~VtLyVDC~~~~ 180 (1039) ..+.+.++ |..|.|-+.. .+|+++=+|.... T Consensus 121 iPL~l~~~-W~~l~iDL~~l~~~~f~~~~~~l~~i~i~ancrlRr 164 (185) T PF05018_consen 121 IPLRLPPG-WNNLCIDLQSLTSSAFGTTYRSLDSIQICANCRLRR 164 (185) T ss_pred ecCCCCCC-eEEEEEeHHHHHHHHhhccceeEeEEEEeccEEEEE Confidence 34444455 9999998876 4566777775543
No 20
>PF11267 DUF3067: Domain of unknown function (DUF3067); InterPro: IPR021420 This family of proteins has no known function. ; PDB: 2LJW_A. Probab=28.22 E-value=13 Score=32.46 Aligned_cols=14 Identities=36% Similarity=0.650 Sum_probs=11.4 Q ss_pred cChHHHHHHHHhhh Q FBpp0301686 1025 KTADEYAAHLENLL 1038 (1039) Q Consensus 1025 ~~~~~~~~~~~~~~ 1038 (1039) +||.||.+||+.+. T Consensus 42 ltE~eY~~hL~~ia 55 (98) T PF11267_consen 42 LTEEEYLEHLDAIA 55 (98) T ss_dssp S-HHHHHHHHHHHH T ss_pred CCHHHHHHHHHHHH Confidence 69999999999863
No 21
>PF02018 CBM_4_9: Carbohydrate binding domain; InterPro: IPR003305 The 1,4-beta-glucanase CenC from Cellulomonas fimi contains two cellulose-binding domains, CBD(N1) and CBD(N2), arranged in tandem at its N terminus. These homologous CBDs are distinct in their selectivity for binding amorphous and not crystalline cellulose []. Multidimensional heteronuclear nuclear magnetic resonance (NMR) spectroscopy was used to determine the tertiary structure of the 152 amino acid N-terminal cellulose-binding domain from C. fimi 1,4-beta-glucanase CenC (CBDN1) []. The tertiary structure of CBDN1 is strikingly similar to that of the bacterial 1,3-1,4-beta-glucanases, as well as other sugar-binding proteins with jelly-roll folds.; GO: 0016798 hydrolase activity, acting on glycosyl bonds; PDB: 3OEA_B 2ZEX_B 3OEB_A 2ZEY_A 2ZEW_A 2W5F_A 2WZE_A 2WYS_A 1GUI_A 1ULP_A .... Probab=25.93 E-value=2e+02 Score=24.05 Aligned_cols=22 Identities=14% Similarity=0.505 Sum_probs=14.5 Q ss_pred CCCceEEEEEEeCC----eEEEEEcc Q FBpp0301686 155 PDKWNSIALQVLSD----KVSFYYDC 176 (1039) Q Consensus 155 Dg~WHrlaLsV~g~----~VtLyVDC 176 (1039) .++|+++.+.+.-. .+.|||-. T Consensus 101 ~~~W~~~~~~ft~~~~~~~~~l~~~~ 126 (131) T PF02018_consen 101 TGEWQKYSGTFTAPSDDKNVRLYFES 126 (131) T ss_dssp TSSEEEEEEEEEECSSEEEEEEEEEE T ss_pred CCCeEEEEEEEEECCCCCeEEEEEEe Confidence 57888887776653 56666544
No 22
>PF00337 Gal-bind_lectin: Galactoside-binding lectin; InterPro: IPR001079 Galectins (also known as galaptins or S-lectin) are a family of proteins defined by having at least one characteristic carbohydrate recognition domain (CRD) with an affinity for beta-galactosides and sharing certain sequence elements. Members of the galectins family are found in mammals, birds, amphibians, fish, nematodes, sponges, and some fungi. Galectins are known to carry out intra- and extracellular functions through glycoconjugate-mediated recognition. From the cytosol they may be secreted by non-classical pathways, but they may also be targeted to the nucleus or specific sub-cytosolic sites. Within the same peptide chain some galectins have a CRD with only a few additional amino acids, whereas others have two CRDs joined by a link peptide, and one (galectin-3) has one CRD joined to a different type of domain [, ]. The galectin carbohydrate recognition domain (CRD) is a beta-sandwich of about 135 amino acids. The two sheets are slightly bent with 6 strands forming the concave side and 5 strands forming the convex side. The concave side forms a groove in which carbohydrate is bound, and which is long enough to hold about a linear tetrasaccharide [, ].; GO: 0030246 carbohydrate binding; PDB: 2WSU_B 2WT0_A 2WT1_A 2WT2_B 2WSV_A 2YMZ_A 4LBQ_D 4NO4_A 4GA9_A 3M2M_F .... Probab=20.34 E-value=2.8e+02 Score=23.76 Aligned_cols=101 Identities=11% Similarity=0.027 Sum_probs=63.1 Q ss_pred ccCCCCCc-ceEEEEEEeecCCCCceEEEEEeC--CCCeEEEEEEEcccccCceEEEEEEEeccCCcceeEEE--eeecc Q FBpp0301686 79 MLLPEKLY-EFAILITFRQSSLKGGYLFSVVNP--LDTVVQLGVHLSPVVKNSYNVSLVYTQADQNIGRKLAS--FGVAH 153 (1039) Q Consensus 79 ~~fp~~~~-eFSllaT~R~~~~~~g~LfsI~~~--~d~~~qlgL~lsg~~~~~~~i~L~Y~~~~~~~~q~~~s--F~v~l 153 (1039) .-||..|. .=+|.++-+.......+-+.+... .+....+.|.++.+-+ ...|.+-+...+. +.+-.. -..++ T Consensus 4 ~~l~~~l~~G~~i~I~G~~~~~~~~F~i~l~~~~~~~~~~~i~lh~~~r~~-~~~iv~Ns~~~g~--W~~ee~~~~~~pf 80 (134) T PF00337_consen 4 GPLPGGLEPGDSIIIRGTVPPDAERFSINLQTGPSNEPQDDIALHFNPRFD-ENVIVRNSRINGK--WGQEERSEGPFPF 80 (134) T ss_dssp EEETTEEETTEEEEEEEEEBTTSSBEEEEEEECTTTTTTTEEEEEEEEETT-TTEEEEEEEETTE--E-SEEEEESSTSS T ss_pred EECCCCCCCCCEEEEEEEECCCCCEEEEEeCCCcCCCCCcEEEEEEEEEeC-chhhhhhheeecc--cccccccceeeee Confidence 44566775 556666666666666787777775 1122334444444322 1356666555542 222222 35566 Q ss_pred CCCCceEEEEEEeCCeEEEEEccccceee Q FBpp0301686 154 VPDKWNSIALQVLSDKVSFYYDCELRNTT 182 (1039) Q Consensus 154 aDg~WHrlaLsV~g~~VtLyVDC~~~~t~ 182 (1039) ..++=-.|.|.+..+.+.+|||-..+... T Consensus 81 ~~g~~F~l~I~~~~~~f~I~vng~~~~~f 109 (134) T PF00337_consen 81 RPGQPFELRIVVTEDGFEIYVNGKHFCEF 109 (134) T ss_dssp TTTSEEEEEEEEESSEEEEEETTEEEEEE T ss_pred cCCCcEEEEEEEecceEEEEECCeEEEEe