Query gi|254780700|ref|YP_003065113.1| serine protease DO-like protease [Candidatus Liberibacter asiaticus str. psy62]
Match_columns 489
No_of_seqs 319 out of 5140
Neff 6.7
Searched_HMMs 39220
Date Sun May 29 20:46:20 2011
Command /home/congqian_1/programs/hhpred/hhsearch -i 254780700.hhm -d /home/congqian_1/database/cdd/Cdd.hhm

No Hit Prob E-value P-value Score SS Cols Query HMM Template HMM
1 TIGR02037 degP_htrA_DO proteas 100.0 0 0 996.3 27.0 440 38-485 1-484 (484)
2 PRK10942 serine endoprotease; 100.0 0 0 836.6 29.4 438 30-486 31-474 (474)
3 PRK10139 serine endoprotease; 100.0 0 0 824.5 28.7 415 36-486 38-455 (455)
4 PRK10898 serine endoprotease; 100.0 0 0 654.0 19.6 308 35-380 42-354 (355)
5 TIGR02038 protease_degS peripl 100.0 0 0 635.1 16.6 310 35-378 42-357 (358)
6 COG0265 DegQ Trypsin-like seri 100.0 0 0 473.5 14.1 306 38-377 33-341 (347)
7 KOG1320 consensus 100.0 0 0 318.9 9.7 303 40-377 130-469 (473)
8 KOG1421 consensus 100.0 2E-36 5.1E-41 250.9 18.9 391 39-489 53-475 (955)
9 KOG1320 consensus 100.0 7.7E-29 2E-33 202.5 14.1 351 106-472 84-459 (473)
10 KOG1421 consensus 99.8 3.1E-19 7.9E-24 141.3 14.1 352 107-488 548-931 (955)
11 cd00987 PDZ_serine_protease PD 99.7 3.9E-19 1E-23 140.6 0.8 90 283-372 1-90 (90)
12 PRK10779 zinc metallopeptidase 99.7 1.4E-16 3.5E-21 124.4 7.0 156 309-475 129-284 (449)
13 cd00991 PDZ_archaeal_metallopr 99.6 3.1E-16 7.9E-21 122.1 -0.1 75 299-373 3-77 (79)
14 cd00986 PDZ_LON_protease PDZ d 99.6 3.5E-16 9E-21 121.8 -0.3 73 305-378 7-79 (79)
15 cd00990 PDZ_glycyl_aminopeptid 99.5 9.1E-15 2.3E-19 112.8 0.2 79 283-376 1-79 (80)
16 TIGR00054 TIGR00054 membrane-a 99.4 1.8E-13 4.7E-18 104.5 4.6 157 305-475 134-294 (463)
17 TIGR02037 degP_htrA_DO proteas 99.4 5.8E-14 1.5E-18 107.7 1.0 299 66-371 39-483 (484)
18 KOG3209 consensus 99.3 3.9E-12 1E-16 96.0 4.6 154 310-468 782-981 (984)
19 cd00989 PDZ_metalloprotease PD 99.3 4.8E-13 1.2E-17 101.8 -0.9 66 307-373 13-78 (79)
20 pfam00089 Trypsin Trypsin. 99.2 8.1E-11 2.1E-15 87.6 8.5 139 109-248 25-196 (218)
21 cd00987 PDZ_serine_protease PD 99.1 5.1E-10 1.3E-14 82.5 9.7 80 392-473 1-86 (90)
22 KOG3209 consensus 99.1 7.9E-11 2E-15 87.7 5.4 160 306-471 674-839 (984)
23 PRK10779 zinc metallopeptidase 99.1 6.1E-12 1.6E-16 94.7 -0.9 70 308-378 223-292 (449)
24 PRK10139 serine endoprotease; 99.1 1.2E-09 3.2E-14 80.0 10.1 81 391-473 266-352 (455)
25 PRK10942 serine endoprotease; 99.0 2.2E-09 5.5E-14 78.5 9.8 81 391-473 288-374 (474)
26 cd00988 PDZ_CTP_protease PDZ d 99.0 2.9E-11 7.4E-16 90.4 -0.1 68 305-373 12-82 (85)
27 PRK10898 serine endoprotease; 99.0 4.6E-09 1.2E-13 76.4 10.5 87 392-488 257-349 (355)
28 COG0793 Prc Periplasmic protea 99.0 4.1E-10 1E-14 83.1 4.4 83 281-375 98-182 (406)
29 TIGR00054 TIGR00054 membrane-a 99.0 8.3E-11 2.1E-15 87.5 -0.2 71 308-379 233-303 (463)
30 cd00991 PDZ_archaeal_metallopr 98.9 4.5E-09 1.1E-13 76.5 8.1 66 408-475 9-74 (79)
31 KOG3834 consensus 98.8 3.3E-08 8.4E-13 70.9 9.1 151 303-466 12-164 (462)
32 cd00989 PDZ_metalloprotease PD 98.8 6.2E-08 1.6E-12 69.2 8.6 61 410-473 13-73 (79)
33 TIGR02860 spore_IV_B stage IV 98.7 1.8E-09 4.7E-14 79.0 0.3 58 317-374 142-200 (423)
34 smart00228 PDZ Domain present 98.7 9.8E-10 2.5E-14 80.7 -1.1 59 306-364 26-84 (85)
35 cd00986 PDZ_LON_protease PDZ d 98.7 1.3E-07 3.4E-12 67.1 8.6 70 408-488 7-76 (79)
36 PRK11186 carboxy-terminal prot 98.7 2.2E-08 5.6E-13 72.1 4.2 60 306-366 257-324 (673)
37 KOG3580 consensus 98.6 3.8E-08 9.7E-13 70.6 4.5 75 296-372 209-286 (1027)
38 cd00136 PDZ PDZ domain, also c 98.6 3.3E-09 8.4E-14 77.3 -1.0 54 307-361 14-69 (70)
39 KOG3605 consensus 98.6 1.6E-08 4E-13 73.0 1.7 124 308-455 675-801 (829)
40 cd00988 PDZ_CTP_protease PDZ d 98.6 5.8E-07 1.5E-11 63.0 9.7 82 394-489 4-85 (85)
41 COG3975 Predicted protease wit 98.6 2.8E-08 7E-13 71.5 2.3 65 307-379 463-527 (558)
42 pfam00595 PDZ PDZ domain (Also 98.6 9.8E-09 2.5E-13 74.3 -0.2 56 306-361 24-79 (80)
43 cd00990 PDZ_glycyl_aminopeptid 98.5 9.3E-07 2.4E-11 61.7 8.8 71 392-473 1-71 (80)
44 TIGR02038 protease_degS peripl 98.5 4.3E-07 1.1E-11 63.9 6.9 78 394-473 264-347 (358)
45 cd00190 Tryp_SPc Trypsin-like 98.4 6.1E-06 1.6E-10 56.5 10.9 136 109-247 25-206 (232)
46 smart00228 PDZ Domain present 98.4 2.6E-06 6.7E-11 58.8 8.8 74 392-470 12-85 (85)
47 KOG3580 consensus 98.4 2.6E-07 6.6E-12 65.2 3.7 60 305-364 428-489 (1027)
48 cd00136 PDZ PDZ domain, also c 98.3 3.2E-06 8.2E-11 58.3 8.2 68 394-467 3-70 (70)
49 smart00020 Tryp_SPc Trypsin-li 98.3 8.2E-06 2.1E-10 55.7 9.8 136 109-247 26-206 (229)
50 TIGR03279 cyano_FeS_chp putati 98.3 1.4E-07 3.6E-12 66.9 0.8 38 310-347 2-39 (433)
51 COG0793 Prc Periplasmic protea 98.3 7E-06 1.8E-10 56.1 9.1 19 418-437 351-369 (406)
52 COG0265 DegQ Trypsin-like seri 98.2 2E-05 5.1E-10 53.2 10.5 85 394-488 251-339 (347)
53 COG3591 V8-like Glu-specific e 98.2 5.6E-06 1.4E-10 56.7 7.5 139 111-253 66-228 (251)
54 COG3480 SdrC Predicted secrete 98.2 2.5E-07 6.3E-12 65.4 0.5 71 306-377 130-201 (342)
55 TIGR02860 spore_IV_B stage IV 98.2 2.6E-06 6.8E-11 58.8 5.2 18 113-130 117-134 (423)
56 cd00992 PDZ_signaling PDZ doma 98.1 4.4E-07 1.1E-11 63.8 -0.6 68 392-466 12-81 (82)
57 cd00992 PDZ_signaling PDZ doma 98.1 1.9E-05 4.9E-10 53.3 7.8 55 305-361 25-81 (82)
58 pfam00863 Peptidase_C4 Peptida 98.1 7.1E-05 1.8E-09 49.7 10.2 126 121-250 40-172 (233)
59 PRK11186 carboxy-terminal prot 98.0 5.5E-05 1.4E-09 50.4 9.3 16 176-191 272-287 (673)
60 pfam00595 PDZ PDZ domain (Also 98.0 3.8E-05 9.7E-10 51.4 8.2 72 390-466 8-79 (80)
61 KOG3129 consensus 97.9 2.1E-06 5.2E-11 59.5 0.6 68 308-375 141-210 (231)
62 pfam10459 Peptidase_S46 Peptid 97.8 6.5E-05 1.7E-09 50.0 6.9 25 108-132 45-69 (696)
63 TIGR01713 typeII_sec_gspC gene 97.8 3.3E-06 8.4E-11 58.2 -0.4 190 145-373 88-280 (281)
64 COG3031 PulC Type II secretory 97.6 9.2E-06 2.3E-10 55.4 -0.7 68 306-373 207-274 (275)
65 TIGR01713 typeII_sec_gspC gene 97.5 0.00048 1.2E-08 44.4 7.4 63 409-473 212-275 (281)
66 TIGR03279 cyano_FeS_chp putati 97.5 0.00017 4.3E-09 47.3 4.5 16 134-149 46-61 (433)
67 pfam05579 Peptidase_S32 Equine 97.4 0.0017 4.3E-08 40.9 8.9 113 111-248 1-114 (426)
68 PRK09681 putative type II secr 97.3 2.6E-05 6.6E-10 52.5 -1.4 64 311-374 252-318 (319)
69 pfam04495 GRASP55_65 GRASP55/6 97.2 0.0085 2.2E-07 36.4 11.0 84 283-373 26-112 (280)
70 PRK09681 putative type II secr 97.2 0.0025 6.3E-08 39.9 8.1 59 413-473 253-312 (319)
71 KOG3553 consensus 97.2 5.7E-05 1.5E-09 50.3 -0.8 35 305-339 58-92 (124)
72 pfam10459 Peptidase_S46 Peptid 97.1 0.00028 7E-09 45.9 1.9 40 154-194 197-251 (696)
73 COG3480 SdrC Predicted secrete 96.9 0.0049 1.3E-07 38.0 7.5 17 178-194 145-162 (342)
74 KOG3542 consensus 96.9 0.00026 6.8E-09 46.1 0.7 39 303-341 559-597 (1283)
75 KOG3542 consensus 96.5 0.003 7.7E-08 39.3 3.9 64 403-469 556-619 (1283)
76 KOG3553 consensus 96.5 0.0043 1.1E-07 38.3 4.5 49 407-456 57-105 (124)
77 KOG3532 consensus 96.5 0.00027 6.8E-09 46.0 -1.8 45 411-455 400-444 (1051)
78 KOG3129 consensus 96.4 0.015 3.8E-07 34.9 6.7 66 410-475 140-205 (231)
79 KOG3532 consensus 96.2 0.018 4.7E-07 34.3 6.5 48 305-352 397-444 (1051)
80 COG3031 PulC Type II secretory 96.2 0.024 6.1E-07 33.6 6.8 59 412-472 209-268 (275)
81 KOG1892 consensus 96.1 0.00086 2.2E-08 42.8 -0.7 76 394-471 945-1021(1629)
82 TIGR00225 prc C-terminal proce 96.1 0.0092 2.3E-07 36.2 4.4 70 307-377 67-146 (361)
83 KOG2921 consensus 96.0 0.011 2.9E-07 35.7 4.7 58 303-360 217-278 (484)
84 COG3975 Predicted protease wit 95.9 0.027 6.9E-07 33.3 6.1 71 393-473 438-516 (558)
85 pfam09342 DUF1986 Domain of un 95.9 0.052 1.3E-06 31.4 7.5 91 108-200 27-137 (267)
86 KOG3605 consensus 95.6 0.0022 5.7E-08 40.2 -0.5 49 417-467 681-732 (829)
87 KOG3571 consensus 95.1 0.094 2.4E-06 29.8 6.5 71 283-363 261-338 (626)
88 KOG3571 consensus 95.0 0.0071 1.8E-07 36.9 0.5 76 391-468 260-338 (626)
89 KOG0606 consensus 94.9 0.051 1.3E-06 31.5 4.8 33 309-341 661-693 (1205)
90 KOG4371 consensus 94.9 0.032 8.1E-07 32.8 3.6 140 322-468 1185-1328(1332)
91 COG0750 Predicted membrane-ass 94.9 0.0042 1.1E-07 38.4 -0.9 58 311-369 134-195 (375)
92 pfam03761 DUF316 Domain of unk 94.7 0.31 7.9E-06 26.5 8.2 128 108-245 68-249 (280)
93 KOG0606 consensus 94.6 0.0057 1.5E-07 37.5 -0.7 72 393-466 638-713 (1205)
94 KOG3606 consensus 94.6 0.0058 1.5E-07 37.5 -0.8 65 274-340 164-229 (358)
95 KOG3550 consensus 94.4 0.0068 1.7E-07 37.1 -0.7 60 408-469 114-174 (207)
96 KOG3551 consensus 94.3 0.032 8.2E-07 32.8 2.6 60 411-472 112-172 (506)
97 KOG2921 consensus 94.3 0.0065 1.7E-07 37.2 -1.0 51 405-455 216-267 (484)
98 TIGR00225 prc C-terminal proce 94.1 0.15 3.8E-06 28.5 5.6 12 180-191 84-95 (361)
99 KOG3651 consensus 94.1 0.017 4.2E-07 34.6 0.7 20 430-449 367-386 (429)
100 KOG1892 consensus 94.1 0.13 3.3E-06 28.9 5.2 58 307-364 961-1019(1629)
101 KOG3552 consensus 94.1 0.14 3.6E-06 28.7 5.4 54 411-469 77-132 (1298)
102 KOG3550 consensus 94.0 0.22 5.6E-06 27.4 6.3 58 305-362 114-172 (207)
103 KOG0609 consensus 93.8 0.28 7.1E-06 26.8 6.6 12 444-455 485-496 (542)
104 KOG3606 consensus 93.5 0.3 7.6E-06 26.6 6.3 70 395-468 180-252 (358)
105 pfam02122 Peptidase_S39 Peptid 92.8 0.41 1E-05 25.7 6.1 116 121-248 43-166 (203)
106 pfam11874 DUF3394 Domain of un 92.6 0.27 7E-06 26.8 5.0 82 343-437 64-150 (183)
107 KOG3549 consensus 92.5 0.016 4.2E-07 34.6 -1.4 13 364-376 361-373 (505)
108 COG5233 GRH1 Peripheral Golgi 92.4 0.15 3.8E-06 28.5 3.4 13 314-326 195-207 (417)
109 KOG0609 consensus 91.1 0.063 1.6E-06 30.9 0.3 21 436-456 511-531 (542)
110 pfam00949 Peptidase_S7 Peptida 89.7 0.18 4.6E-06 28.0 1.7 21 224-244 109-129 (150)
111 KOG3834 consensus 88.1 0.059 1.5E-06 31.1 -1.7 17 40-57 55-71 (462)
112 pfam02907 Peptidase_S29 Hepati 87.7 0.32 8.2E-06 26.4 1.9 124 119-266 20-145 (149)
113 KOG3938 consensus 86.3 0.41 1E-05 25.7 1.8 56 308-363 151-209 (334)
114 pfam03510 Peptidase_C24 2C end 85.7 1.4 3.6E-05 22.3 4.3 58 113-181 3-60 (105)
115 PRK08927 fliI flagellum-specif 84.3 2.7 6.9E-05 20.5 5.2 69 112-192 18-89 (441)
116 pfam11874 DUF3394 Domain of un 83.6 0.21 5.4E-06 27.6 -0.7 30 305-334 121-150 (183)
117 TIGR03496 FliI_clade1 flagella 82.0 3.3 8.5E-05 19.9 4.9 68 132-211 21-90 (411)
118 cd01727 LSm8 The eukaryotic Sm 81.7 3.1 8E-05 20.1 4.7 56 133-190 10-70 (74)
119 pfam00548 Peptidase_C3 3C cyst 81.2 3.6 9.2E-05 19.7 9.2 122 121-247 36-167 (170)
120 PRK00737 small nuclear ribonuc 79.8 1.8 4.5E-05 21.7 2.9 33 132-164 14-46 (72)
121 pfam08605 Rad9_Rad53_bind Fung 78.8 3.1 7.8E-05 20.2 3.8 65 122-190 4-70 (131)
122 PRK13528 outer membrane recept 78.4 1.8 4.7E-05 21.6 2.6 26 1-26 3-28 (727)
123 cd01731 archaeal_Sm1 The archa 77.4 2.4 6.1E-05 20.8 3.0 32 132-163 10-41 (68)
124 pfam00944 Peptidase_S3 Alphavi 77.3 1.2 3E-05 22.8 1.4 99 146-259 32-132 (157)
125 TIGR03497 FliI_clade2 flagella 76.3 5.1 0.00013 18.7 5.3 65 118-194 6-71 (413)
126 cd01717 Sm_B The eukaryotic Sm 75.3 2.7 6.9E-05 20.5 2.8 31 133-163 11-41 (79)
127 cd01729 LSm7 The eukaryotic Sm 74.9 3.1 7.8E-05 20.1 3.0 32 133-164 13-44 (81)
128 cd01728 LSm1 The eukaryotic Sm 73.4 6 0.00015 18.3 4.8 64 125-190 4-72 (74)
129 TIGR03498 FliI_clade3 flagella 72.4 6.3 0.00016 18.1 4.7 50 132-193 23-72 (418)
130 cd01719 Sm_G The eukaryotic Sm 72.2 3.8 9.6E-05 19.6 2.9 31 133-163 11-41 (72)
131 cd00600 Sm_like The eukaryotic 71.6 4.1 0.0001 19.3 3.0 33 132-164 6-38 (63)
132 pfam05416 Peptidase_C37 Southa 71.4 6.6 0.00017 18.0 6.9 25 351-375 443-468 (535)
133 PRK04192 V-type ATP synthase s 71.4 6.6 0.00017 18.0 5.7 68 112-194 4-73 (585)
134 cd01730 LSm3 The eukaryotic Sm 71.1 3.9 0.0001 19.4 2.8 31 133-163 12-42 (82)
135 pfam01423 LSM LSM domain. The 69.9 4.9 0.00012 18.9 3.0 33 132-164 8-40 (66)
136 COG1958 LSM1 Small nuclear rib 69.8 4.8 0.00012 18.9 3.0 32 133-164 18-49 (79)
137 pfam05580 Peptidase_S55 SpoIVB 69.2 3.2 8.3E-05 20.0 2.0 43 218-264 171-213 (219)
138 TIGR01230 agmatinase agmatinas 68.7 2.8 7.1E-05 20.4 1.6 68 148-223 53-138 (296)
139 smart00651 Sm snRNP Sm protein 67.7 5.8 0.00015 18.4 3.0 33 132-164 8-40 (67)
140 PRK05922 type III secretion sy 67.2 8.2 0.00021 17.4 5.5 69 113-193 21-90 (434)
141 cd01722 Sm_F The eukaryotic Sm 67.1 5.4 0.00014 18.6 2.8 33 132-164 11-43 (68)
142 COG0821 gcpE 1-hydroxy-2-methy 66.6 0.38 9.7E-06 25.9 -3.2 64 267-334 112-175 (361)
143 PRK08972 fliI flagellum-specif 66.5 8.4 0.00021 17.4 5.0 71 110-193 22-93 (440)
144 PRK00366 ispG 4-hydroxy-3-meth 66.1 0.41 1E-05 25.7 -3.1 30 415-444 306-339 (367)
145 TIGR01171 rplB_bact ribosomal 65.2 8.9 0.00023 17.2 5.7 136 122-277 48-194 (279)
146 cd01726 LSm6 The eukaryotic Sm 64.6 6.3 0.00016 18.1 2.7 33 132-164 10-42 (67)
147 PRK07196 fliI flagellum-specif 63.3 9.7 0.00025 17.0 5.1 70 112-193 18-88 (434)
148 COG0260 PepB Leucyl aminopepti 61.8 1.6 4.2E-05 21.9 -0.7 41 297-339 290-330 (485)
149 pfam02601 Exonuc_VII_L Exonucl 61.5 10 0.00026 16.8 3.9 21 427-447 255-276 (295)
150 PRK00286 xseA exodeoxyribonucl 60.8 11 0.00027 16.7 4.2 21 427-447 396-417 (443)
151 PRK05015 aminopeptidase B; Pro 60.7 4.7 0.00012 19.0 1.5 41 297-339 228-268 (424)
152 pfam01727 consensus 60.5 8.1 0.00021 17.4 2.7 32 217-248 22-53 (81)
153 KOG3627 consensus 59.5 6.2 0.00016 18.2 1.9 136 110-248 39-227 (256)
154 cd06168 LSm9 The eukaryotic Sm 58.0 9.8 0.00025 16.9 2.8 32 133-164 11-42 (75)
155 cd01732 LSm5 The eukaryotic Sm 56.2 12 0.00031 16.4 3.0 32 132-163 13-44 (76)
156 TIGR00074 hypC_hupF hydrogenas 56.1 8.2 0.00021 17.4 2.1 44 145-188 5-53 (88)
157 COG3127 Predicted ABC-type tra 56.1 6.8 0.00017 17.9 1.7 25 418-442 602-627 (829)
158 PRK02118 V-type ATP synthase s 55.9 13 0.00033 16.2 5.1 49 133-194 27-75 (432)
159 pfam01732 DUF31 Domain of unkn 55.9 6.3 0.00016 18.1 1.5 23 109-131 3-35 (68)
160 TIGR02068 cya_phycin_syn cyano 55.8 12 0.00031 16.3 3.0 93 232-332 439-548 (876)
161 PRK05688 fliI flagellum-specif 54.1 14 0.00035 16.0 5.1 70 112-193 28-101 (451)
162 PRK07721 fliI flagellum-specif 53.6 14 0.00036 15.9 4.9 87 113-211 18-109 (435)
163 KOG4407 consensus 53.2 1.3 3.3E-05 22.5 -2.4 12 392-403 1156-1167(1973)
164 PRK06315 type III secretion sy 52.6 15 0.00037 15.8 4.6 70 112-193 24-94 (442)
165 pfam00883 Peptidase_M17 Cytoso 52.6 2.8 7.2E-05 20.4 -0.7 29 310-339 135-163 (312)
166 pfam06003 SMN Survival motor n 52.1 15 0.00038 15.8 2.9 44 132-175 72-116 (264)
167 COG5640 Secreted trypsin-like 51.8 15 0.00038 15.7 8.3 60 114-174 66-142 (413)
168 COG0298 HypC Hydrogenase matur 50.9 11 0.00029 16.5 2.1 43 145-188 5-47 (82)
169 PRK13579 gcvT glycine cleavage 50.9 9.4 0.00024 17.1 1.7 24 257-280 199-222 (371)
170 PRK00913 leucyl aminopeptidase 47.3 4.1 0.00011 19.3 -0.6 29 310-339 306-334 (491)
171 KOG2597 consensus 46.6 6.6 0.00017 18.0 0.4 40 298-339 313-352 (513)
172 cd00433 Peptidase_M17 Cytosol 45.7 4.4 0.00011 19.1 -0.6 29 310-339 289-317 (468)
173 pfam04551 GcpE GcpE protein. I 45.0 0.44 1.1E-05 25.5 -5.9 39 416-455 296-338 (345)
174 cd04643 CBS_pair_30 The CBS do 42.7 16 0.00041 15.5 1.9 20 227-246 22-41 (116)
175 cd04641 CBS_pair_28 The CBS do 41.3 16 0.0004 15.6 1.6 19 228-246 23-41 (120)
176 pfam00947 Pico_P2A Picornaviru 40.3 22 0.00056 14.7 2.3 37 219-261 76-117 (127)
177 smart00116 CBS Domain in cysta 39.7 19 0.00048 15.1 1.8 20 227-246 21-40 (49)
178 PRK04196 V-type ATP synthase s 38.2 24 0.00062 14.4 6.5 78 115-194 7-97 (460)
179 PRK04972 hypothetical protein; 38.1 4.8 0.00012 18.9 -1.4 51 140-194 228-278 (558)
180 TIGR00758 UDG_fam4 uracil-DNA 37.1 19 0.00049 15.0 1.6 39 281-319 39-77 (185)
181 cd01724 Sm_D1 The eukaryotic S 37.0 25 0.00064 14.3 3.0 34 131-164 10-43 (90)
182 KOG1781 consensus 35.6 8.2 0.00021 17.4 -0.5 29 132-160 27-55 (108)
183 PRK08262 hypothetical protein; 35.2 27 0.00069 14.1 4.6 51 1-51 3-57 (489)
184 cd04602 CBS_pair_IMPDH_2 This 35.2 23 0.00058 14.6 1.7 15 232-246 93-107 (114)
185 TIGR02124 hypE hydrogenase exp 35.1 4.7 0.00012 18.9 -1.8 55 429-486 278-336 (345)
186 cd04614 CBS_pair_1 The CBS dom 35.0 24 0.00061 14.5 1.7 12 233-244 76-87 (96)
187 pfam01079 Hint Hint module. Th 34.6 24 0.00062 14.4 1.7 12 178-189 104-115 (214)
188 COG4956 Integral membrane prot 34.2 14 0.00037 15.9 0.5 43 329-371 268-311 (356)
189 pfam06893 consensus 34.1 23 0.00059 14.5 1.6 32 169-200 46-77 (341)
190 cd01720 Sm_D2 The eukaryotic S 33.6 24 0.0006 14.5 1.5 37 128-164 10-46 (87)
191 PRK06300 enoyl-(acyl carrier p 33.1 11 0.00028 16.7 -0.3 24 291-316 201-224 (298)
192 TIGR00337 PyrG CTP synthase; I 32.7 11 0.00027 16.7 -0.3 82 283-383 108-194 (571)
193 TIGR00739 yajC preprotein tran 32.7 21 0.00053 14.8 1.1 13 178-190 36-48 (86)
194 cd05701 S1_Rrp5_repeat_hs10 S1 32.6 30 0.00076 13.8 2.3 44 144-189 3-55 (69)
195 COG4784 Putative Zn-dependent 32.5 25 0.00063 14.3 1.5 63 354-421 373-438 (479)
196 cd04615 CBS_pair_2 The CBS dom 32.5 29 0.00075 13.9 1.9 19 228-246 23-41 (113)
197 COG3338 Cah Carbonic anhydrase 32.3 30 0.00077 13.8 5.2 24 140-163 123-148 (250)
198 PRK13605 endoribonuclease SymE 32.0 18 0.00045 15.3 0.7 57 123-196 1-57 (113)
199 TIGR01379 thiL thiamine-monoph 31.8 31 0.00078 13.8 2.2 48 144-192 113-160 (336)
200 cd01721 Sm_D3 The eukaryotic S 31.2 31 0.0008 13.7 3.9 33 132-164 10-42 (70)
201 cd04610 CBS_pair_ParBc_assoc T 30.0 32 0.00082 13.6 1.7 15 232-246 86-100 (107)
202 pfam01455 HupF_HypC HupF/HypC 29.5 33 0.00085 13.5 2.0 41 145-188 5-45 (67)
203 KOG3460 consensus 28.6 23 0.00059 14.5 0.8 28 133-160 16-43 (91)
204 TIGR03431 PhnD phosphonate ABC 28.6 35 0.00089 13.4 2.8 14 1-14 1-14 (288)
205 pfam04083 Abhydro_lipase ab-hy 28.3 35 0.00089 13.4 1.8 19 113-131 15-33 (62)
206 pfam02743 Cache_1 Cache domain 28.2 27 0.00068 14.2 1.1 12 233-244 20-31 (81)
207 PRK13484 putative iron-regulat 28.1 35 0.0009 13.4 2.0 17 2-18 1-17 (682)
208 cd04621 CBS_pair_8 The CBS dom 28.1 35 0.0009 13.4 1.9 19 228-246 23-41 (135)
209 cd04592 CBS_pair_EriC_assoc_eu 27.9 36 0.00091 13.3 1.8 21 226-246 21-41 (133)
210 cd04801 CBS_pair_M50_like This 27.6 35 0.00088 13.4 1.6 19 228-246 24-42 (114)
211 pfam04225 OapA Opacity-associa 26.5 20 0.00051 14.9 0.2 28 347-374 39-66 (85)
212 KOG1387 consensus 26.5 23 0.00058 14.6 0.5 28 429-456 392-423 (465)
213 PRK07807 inositol-5-monophosph 26.5 38 0.00096 13.2 1.7 20 109-128 119-141 (479)
214 COG4810 EutS Ethanolamine util 26.3 25 0.00064 14.3 0.7 11 121-131 23-33 (121)
215 cd04606 CBS_pair_Mg_transporte 26.3 38 0.00097 13.2 1.7 15 232-246 87-101 (109)
216 TIGR01975 isoAsp_dipep beta-as 26.3 29 0.00074 13.9 1.0 86 290-383 171-261 (391)
217 cd04627 CBS_pair_14 The CBS do 26.1 38 0.00098 13.1 1.6 18 229-246 99-116 (123)
218 cd04583 CBS_pair_ABC_OpuCA_ass 26.0 39 0.00098 13.1 1.7 14 233-246 89-102 (109)
219 pfam00789 UBX UBX domain. This 25.3 40 0.001 13.0 1.9 29 130-158 4-32 (81)
220 pfam06838 Alum_res Aluminium r 25.0 40 0.001 13.0 1.6 12 232-243 227-238 (405)
221 cd04642 CBS_pair_29 The CBS do 25.0 40 0.001 13.0 1.8 18 229-246 24-41 (126)
222 pfam08669 GCV_T_C Glycine clea 24.9 40 0.001 13.0 2.2 31 230-260 35-67 (95)
223 cd04582 CBS_pair_ABC_OpuCA_ass 24.4 41 0.0011 12.9 1.7 16 231-246 84-99 (106)
224 cd04635 CBS_pair_22 The CBS do 24.2 42 0.0011 12.9 1.8 17 230-246 25-41 (122)
225 cd04596 CBS_pair_DRTGG_assoc T 23.4 43 0.0011 12.8 1.8 19 228-246 24-42 (108)
226 KOG0340 consensus 23.3 24 0.0006 14.5 0.1 187 150-367 69-267 (442)
227 PRK13861 type IV secretion sys 23.2 44 0.0011 12.8 2.8 12 1-12 1-12 (293)
228 cd02558 PSRA_1 PSRA_1: Pseudou 23.2 44 0.0011 12.8 3.1 28 113-140 1-28 (246)
229 TIGR02545 ATP_syn_fliI flagell 23.2 44 0.0011 12.8 4.8 228 113-389 5-286 (439)
230 PRK12696 flgH flagellar basal 22.9 38 0.00098 13.2 1.1 11 1-11 1-11 (238)
231 cd01725 LSm2 The eukaryotic Sm 22.9 44 0.0011 12.8 3.7 33 132-164 11-43 (81)
232 TIGR03219 salicylate_mono sali 22.8 44 0.0011 12.7 2.8 24 130-153 131-155 (414)
233 COG2985 Predicted permease [Ge 22.6 45 0.0011 12.7 2.7 17 179-195 249-265 (544)
234 cd04607 CBS_pair_NTP_transfera 22.6 45 0.0011 12.7 1.7 14 233-246 93-106 (113)
235 cd04587 CBS_pair_CAP-ED_DUF294 22.4 45 0.0012 12.7 1.8 15 232-246 92-106 (113)
236 TIGR01687 moaD_arch MoaD famil 22.2 20 0.0005 15.0 -0.5 22 431-452 57-79 (93)
237 KOG1775 consensus 22.2 46 0.0012 12.7 1.8 28 132-159 17-44 (84)
238 cd04632 CBS_pair_19 The CBS do 22.0 46 0.0012 12.6 1.8 20 228-247 23-42 (128)
239 PRK08472 fliI flagellum-specif 21.8 46 0.0012 12.6 5.7 70 113-194 21-92 (435)
240 cd01723 LSm4 The eukaryotic Sm 21.7 47 0.0012 12.6 3.9 32 132-163 11-42 (76)
241 PRK05713 hypothetical protein; 21.6 47 0.0012 12.6 4.7 61 118-189 65-128 (312)
242 cd04629 CBS_pair_16 The CBS do 21.4 47 0.0012 12.6 1.9 19 228-246 23-41 (114)
243 cd04597 CBS_pair_DRTGG_assoc2 21.4 47 0.0012 12.6 1.7 16 231-246 91-106 (113)
244 TIGR00115 tig trigger factor; 21.4 41 0.001 13.0 0.9 59 108-172 213-274 (475)
245 cd04639 CBS_pair_26 The CBS do 20.7 49 0.0012 12.5 1.8 19 228-246 23-41 (111)
246 TIGR00441 gmhA phosphoheptose 20.7 49 0.0012 12.5 5.2 44 428-473 106-149 (186)
247 PRK06936 type III secretion sy 20.6 49 0.0013 12.5 4.6 70 112-193 24-95 (439)
248 TIGR02624 rhamnu_1P_ald rhamnu 20.6 49 0.0013 12.5 2.7 48 132-181 66-113 (273)
249 PRK09511 nirD nitrite reductas 20.5 49 0.0013 12.4 1.9 25 183-207 36-62 (108)
250 cd04620 CBS_pair_7 The CBS dom 20.4 50 0.0013 12.4 1.9 13 233-245 95-107 (115)
251 PRK08594 enoyl-(acyl carrier p 20.3 24 0.00061 14.5 -0.4 22 292-316 170-191 (256)
252 smart00166 UBX Domain present 20.2 50 0.0013 12.4 1.9 28 131-158 3-30 (80)
253 PRK13254 cytochrome c-type bio 20.2 50 0.0013 12.4 7.2 15 1-15 1-15 (149)
254 cd02005 TPP_PDC_IPDC Thiamine 20.2 50 0.0013 12.4 4.6 36 436-471 141-176 (183)
255 PRK10002 outer membrane protei 20.1 50 0.0013 12.4 2.1 10 1-10 1-10 (362)

No 1
>TIGR02037 degP_htrA_DO protease Do; InterPro: IPR011782 Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity.
Over 20 families (denoted S1 - S66) of serine proteases have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans, and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins.
Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This family consists of serine peptidases belonging to MEROPS peptidase family S1, subfamily S1C (protease Do, clan PA(S)). They are variously designated DegP, DegQ, heat shock protein HtrA, MucD and protease DO. The ortholog in Pseudomonas aeruginosa is designated MucD and is found in an operon that controls mucoid phenotype. This family also includes the DegQ (HhoA) paralog in Escherichia coli, which can rescue a DegP mutant, but not the smaller DegS paralog, which cannot. Members of this family are located in the periplasm and have separable functions as both protease and chaperone. Members have a trypsin domain and two copies of a PDZ domain. This protein protects bacteria from thermal and other stresses and may be important for the survival of bacterial pathogens. The chaperone function is dominant at low temperatures, whereas the proteolytic activity is turned on at elevated temperatures.; GO: 0004252 serine-type endopeptidase activity, 0006508 proteolysis.
Probab=100.00 E-value=0 Score=996.30 Aligned_cols=440 Identities=35% Similarity=0.589 Sum_probs=391.2

Q ss_pred CCHHHHHHHHCCCEEEEEEEEEEEECCCCCCCCC-CCCCCCCCCCHHHH-----HHHHCC-CCCCCCC-----CCCC---
Q ss_conf 8988999984895089999999872255555544-55678887700254-----455301-3577666-----6754---
Q gi|254780700|r 38 VDLPPVIARVSPSIVSVMVEPKKKVSVEQMFNAY-GFGNLPEDHPLKNY-----FRKDFH-KFFSGEE-----PILS--- 102 (489)
Q Consensus 38 ~~~~~~~~~~~paVV~i~~~~~~~~~~~~~~~~~-~~~~~~~~~~~~~~-----~~~~~~-~~~~~~~-----~~~~--- 102 (489)
Q Consensus 38 ++|++++|++.||||+|+++...+....+.....
..+.+|.+.||++| |++||+ +.++... ++.+ T Consensus 1 ~sfa~~ve~~~PaVVnI~~~~~~~~~~~~~~~~~D~~p~~~~g~~fddf~Fd~~F~~FFg~~~~p~~~~~~~~~~~~~~g 80 (484) T TIGR02037 1 PSFAPLVEKVAPAVVNISTEGTVKRRNGPGFDPLDRMPENPGGSPFDDFEFDEFFDQFFGDDEMPNQPGGREFPQPEFVG 80 (484) T ss_pred CCHHHHHHHHCCCEEEEEEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHCCCCCCCCCCCCCCCCCCCCCC T ss_conf 97579899846972899988998514678887656788777778555666874105664888788888888888520037 Q ss_pred -CCCCCCCCEEEEEECCC-C--EEEECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECC-CCCCCCCCCCC Q ss_conf -45552234027897599-6--2985101047871437962898067401112334443289996067-66765565567 Q gi|254780700|r 103 -DTVERLMFGSGFFITDD-G--YILTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSD-RKFIPVEFEDA 177 (489) Q Consensus 103 -~~~~~~~~GsG~ii~~~-G--~ilTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~-~~~~~~~lg~s 177 (489) ++++.+++||||||++| | |||||||||++|++|+|+|+|||+|+||+||.|+++|||||||+++ ++||+++|||| T Consensus 81 ~~~~~~~~LGSGvIi~~d~Gk~YilTNnHVv~gA~~I~V~L~DgrefkAklvG~D~~~D~AvlKi~~~D~~Lp~~~~GDS 160 (484) T TIGR02037 81 ERERKVRGLGSGVIISADKGKFYILTNNHVVDGADEITVTLSDGREFKAKLVGKDPRTDIAVLKIEAKDKKLPVVKLGDS 160 (484) T ss_pred CCCEEEEECCCCEEEECCCCEEEEEECCEEECCCCEEEEEECCCCEEEEEEECCCCCEEEEEEEEECCCCCCCEEEECCC T ss_conf 64037764144189847898699987543636853799994599485568866677213899998278897456773485 Q ss_pred CCCCCCCEEEEECCCCC-CCCCCCCCCCCCCCCC-CC-CCCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCCCC Q ss_conf 31112414675236655-3111125874431122-33-443420233233201347703540343035551234455322 Q gi|254780700|r 178 NNIRVGEAVFTIGNPFR-LRGTVSAGIVSALDRD-IP-DRPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQF 254 (489) Q Consensus 178 ~~~~~G~~v~aiG~P~g-~~~tvt~GiiSa~~R~-~~-~~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~g~ 254 (489) |+||+||||+||||||| |++|||+|||||++|+ ++ ..|++|||||||||||||||||||++||||||||||||++|| T Consensus 161 D~LrVGd~V~AIGNPFGNlg~TVT~GIVSAlgRs~~~~~~y~~FIQTDAAINpGNSGGPLvN~~GEvIGINTaI~S~sGG 240 (484) T TIGR02037 161 DKLRVGDWVLAIGNPFGNLGQTVTSGIVSALGRSGLGIGDYENFIQTDAAINPGNSGGPLVNLRGEVIGINTAIYSPSGG 240 (484) T 
ss_pred CCCEECCEEEEEECCCCCCCCEEEEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEEEEECCCC T ss_conf 55224349999327742458425788998321688887774765022423374788775356785388888887617888 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHH Q ss_conf 2222321123321100100002333333433200-034216676441764444113201111121134671167888752 Q gi|254780700|r 255 HMGVGLIIPLSIIKKAIPSLISKGRVDHGWFGIM-TQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICML 333 (489) Q Consensus 255 ~~GigfaIP~~~~~~i~~~l~~~g~v~rg~lGv~-~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~i 333 (489) |+|||||||+|+++++++||+++|+|+||||||+ +|++|+|+|++|||+.++||||++|.|+|||+|||||+||||+++ T Consensus 241 ~~GIGFAIP~n~a~~v~~ql~~~G~V~RG~LGV~~~q~~~~d~A~~lGl~~~~GALV~~V~~gSPA~kAGlk~GDvI~~~ 320 (484) T TIGR02037 241 NVGIGFAIPSNMAKNVVDQLIEGGKVQRGWLGVTQIQEITSDLAKSLGLEKQEGALVAQVLPGSPAEKAGLKAGDVILSV 320 (484) T ss_pred EECCHHHHHHHHHHHHHHHHHHCCEEEECEEEEEECCCCCHHHHHHHCCCCCCCEEEEEECCCCCHHCCCCCCCCEEEEE T ss_conf 10101021268899999999838928702151010775797999970888536558885448970100675326689985 Q ss_pred CCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCC--------CCCCCCCCCC----CCCCCCCCCCEEEEEC Q ss_conf 431478743101222035667520101204781665125565--------5876310000----1246545254698728 Q gi|254780700|r 334 DGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLG--------SSPTAKNDMH----LEVGDKELLGMVLQDI 401 (489) Q Consensus 334 ng~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~--------~~p~~~~~~~----~~~~~~~~lGl~v~~l 401 (489) ||++|+++.||++.|+.++||++++|+|+|+||+++++|+|+ ..+......+ .........|+++.+| T Consensus 321 nGk~i~~~~~L~~~i~~~~pG~~~~L~i~R~Gk~~~~~V~l~~Ld~~~a~~~~~~~~~~~~~~~~~~~~~~~~Gl~v~~L 400 (484) T TIGR02037 321 NGKKIKSFADLRRAIGTLKPGKKVTLTILRKGKEKTITVTLGELDEKTASAAPEEKASSERSTEPGVGRLGFPGLSVANL 400 (484) T ss_pred CCEEECCHHHHHHHHHCCCCCCEEEEEEEECCEEEEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEECCC T ss_conf 88640587999898740589877999999788688899998107850025664555555432265345532250162379 Q ss_pred 
CHHHCCE--------EEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCC Q ss_conf 9657152--------00799960688978982999888999889999389999999999886259956999997177643 Q gi|254780700|r 402 NDGNKKL--------VRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQ 473 (489) Q Consensus 402 ~~~~~~~--------~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~ 473 (489) +++.++. .||+|+.+.++|+|+++|||+||+|++||+++|+|++||.++|+++++.+.+.++|.|+|++. T Consensus 401 ~~~~~~~l~~~~~~~~Gv~V~~v~~~s~Aa~~Gl~~GDvI~~vN~~~V~s~~e~~~~l~~~~k~~~k~~~L~i~Rg~~-- 478 (484) T TIGR02037 401 TPEIAKKLLNLAGVSKGVVVTKVVSGSPAARAGLQPGDVILSVNQQPVSSVAELNKVLARAKKGGRKKVALLIERGGA-- 478 (484) T ss_pred CHHHHHHHCCCCCCCCCEEEEEECCCCHHHHCCCCCCCEEEEECCCCCCCHHHHHHHHHHHCCCCCEEEEEEEEECCE-- T ss_conf 989999871322677748999733888899717876618995088014678999999997328870479999998780-- Q ss_pred CCCCCCEEEEEE Q ss_conf 346884368887 Q gi|254780700|r 474 SGNDNMSRFVSL 485 (489) Q Consensus 474 ~~~~~~~rFVal 485 (489) .+|+++ T Consensus 479 ------~~~~~~ 484 (484) T TIGR02037 479 ------TIFVTL 484 (484) T ss_pred ------EEEEEC T ss_conf ------689769 No 2 >PRK10942 serine endoprotease; Provisional Probab=100.00 E-value=0 Score=836.56 Aligned_cols=438 Identities=29% Similarity=0.461 Sum_probs=370.4 Q ss_pred CCCCCCCCCCHHHHHHHHCCCEEEEEEEEEEEECCCCCCCCCCCCCCCCCCCHHHHHHHHCCCCC---CCCCCCCCCCCC Q ss_conf 11347555898899998489508999999987225555554455678887700254455301357---766667544555 Q gi|254780700|r 30 EAKLPPSSVDLPPVIARVSPSIVSVMVEPKKKVSVEQMFNAYGFGNLPEDHPLKNYFRKDFHKFF---SGEEPILSDTVE 106 (489) Q Consensus 30 ~~~~~~~~~~~~~~~~~~~paVV~i~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~---~~~~~~~~~~~~ 106 (489) .+...+..+||++++|+++||||||+++.........+.. 
.....+..+.||.+++..|+..+| .+..+..+++++ T Consensus 31 ~~~~~~~~psfa~~v~~~~PaVVnI~~~~~~~~~~~~~~~-~~~~ffg~~~p~~~~~~pf~~~~~~~~~~~~~~~~~~~~ 109 (474) T PRK10942 31 SATTAQQMPSLAPMLEKVMPSVVSINVEGSTTVNTPRMPR-NFQQFFGDDSPFCQDGSPFQSSPFCQGGQGGNGGGQQQK 109 (474) T ss_pred CCCCCCCCCCHHHHHHHHCCCEEEEEEEEEEECCCCCCCH-HHHHCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC T ss_conf 7544567998799998758957999988776325777871-344304667743334674334643235667667886544 Q ss_pred CCCCEEEEEECCC-CEEEECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCE Q ss_conf 2234027897599-629851010478714379628980674011123344432899960676676556556731112414 Q gi|254780700|r 107 RLMFGSGFFITDD-GYILTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEA 185 (489) Q Consensus 107 ~~~~GsG~ii~~~-G~ilTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~ 185 (489) .+++|||||||++ ||||||||||++|++|+|+|.||++|+|+++|.|+.+|||||||+..++||+++||||+.+++||| T Consensus 110 ~~~lGSG~ii~~d~GyIvTN~HVV~~a~~i~V~l~dg~~~~A~vvG~D~~~DlAvlki~~~~~l~~~~lgdS~~l~vGd~ 189 (474) T PRK10942 110 FMALGSGVIIDADKGYVVTNNHVVDNATVIKVQLSDGRKFDAKVVGKDPRSDIALIQIQNPKNLTAIKMADSDALRVGDY 189 (474) T ss_pred CCCCCCEEEEECCCCEEEECHHHHCCCCEEEEEECCCCEEEEEEEEECCCCCEEEEEECCCCCCCEEECCCCCCCCCCCE T ss_conf 35676779997998889807688499848999917999998899984487667999932788895466158775557888 Q ss_pred EEEECCCCCCCCCCCCCCCCCCCCCC--CCCCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCCCCCCCCCCCCC Q ss_conf 67523665531111258744311223--3443420233233201347703540343035551234455322222232112 Q gi|254780700|r 186 VFTIGNPFRLRGTVSAGIVSALDRDI--PDRPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQFHMGVGLIIP 263 (489) Q Consensus 186 v~aiG~P~g~~~tvt~GiiSa~~R~~--~~~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~g~~~GigfaIP 263 (489) |+|||||||+++|||.|||||++|+. 
.+.|.+||||||+||||||||||||++||||||||+|++++|+++||||||| T Consensus 190 ViAiGnP~Gl~~tvT~GIVSa~~R~~~~~~~~~~~IQTDAaINpGNSGGPLvn~~GeVIGINtaI~s~~gg~~GigFAIP 269 (474) T PRK10942 190 TVAIGNPFGLGETVTSGIVSALGRSGLNVENYENFIQTDAAINRGNSGGALVNLNGELIGINTAILAPDGGNIGIGFAIP 269 (474) T ss_pred EEEEECCCCCCCCCCCEEEEEECCCCCCCCCCCCEEEEECCCCCCCCCCCEECCCCCEEEEEEEEECCCCCCCCCCCCCC T ss_conf 99975799878750111687404677786661453786335478998870564799786355578626787554000157 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCC Q ss_conf 33211001000023333334332000342166764417644441132011111211346711678887524314787431 Q gi|254780700|r 264 LSIIKKAIPSLISKGRVDHGWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQD 343 (489) Q Consensus 264 ~~~~~~i~~~l~~~g~v~rg~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~ 343 (489) +|++++++++|+++|++.|||||+.+++++++++++++++...|++|++|.++|||++||||+||+|+++||++|.+..| T Consensus 270 ~n~a~~v~~~L~~~G~v~rg~lGv~~~~v~~~la~~lgl~~~~GalV~~V~~~sPA~kAGL~~GDVI~~vdG~~I~~~~d 349 (474) T PRK10942 270 SNMVKNLTSQMVEYGQVKRGELGIMGTELNSELAKAMKVDAQRGAFVSQVLPNSSAAKAGIKAGDVITSLNGKPISSFAA 349 (474) T ss_pred HHHHHHHHHHHHHCCCCCCEEEEEEEEECCHHHHHHCCCCCCCCCEEEECCCCCCHHHCCCCCCCEEEEECCEECCCHHH T ss_conf 89999899999744753210331598853725677617776777265201779936776999899999989989689999 Q ss_pred EEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCCCCCCCCCCCCCCCCCCCEEEEECCHHHCCEEEEEEEEECCCCHHH Q ss_conf 01222035667520101204781665125565587631000012465452546987289657152007999606889789 Q gi|254780700|r 344 FVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSPTAKNDMHLEVGDKELLGMVLQDINDGNKKLVRIVALNPNREREVE 423 (489) Q Consensus 344 l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~~~~~~~~~~~~~~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa 423 (489) |.+.++.+++|++++|+|+|+||+++++|+|++.+....... ....|+...+++... 
...|+++..+.++|+|+ T Consensus 350 L~~~v~~~~pG~~V~l~v~R~Gk~~~v~Vtl~~~~~~~~~~~-----~~~~gl~~~~l~~~~-~~~GVvV~~V~~~S~Aa 423 (474) T PRK10942 350 LRAQVGTMPVGSKMTLGLLRDGKPVNVNLELQQSSQNQVDSS-----TIFNGIEGAEMSNKG-KDKGVVVDNVKPGTPAA 423 (474) T ss_pred HHHHHHCCCCCCEEEEEEEECCEEEEEEEEECCCCCCCCCCC-----CCCCCCCCCCCCCCC-CCCCEEEEEECCCCHHH T ss_conf 999996189888899999999989999999667875543345-----455686512355567-88866999947999799 Q ss_pred HCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCCCCCCCCEEEEEEE Q ss_conf 829998889998899993899999999998862599569999971776433468843688875 Q gi|254780700|r 424 AKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQSGNDNMSRFVSLK 486 (489) Q Consensus 424 ~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~~~~~~~~rFVal~ 486 (489) ++||++||+|++||+++|+|++||++++++ ++..++|+|+|+ +++.||.|+ T Consensus 424 ~aGLr~GDVI~~VN~~~V~s~~dl~~~l~~----~~~~v~L~V~Rg--------g~~~fv~lk 474 (474) T PRK10942 424 QIGLKKGDVIIGANQQPVKNIAELRKILDS----KPSVLALNIQRG--------DSSIYLLMQ 474 (474) T ss_pred HCCCCCCCEEEEECCEECCCHHHHHHHHHH----CCCEEEEEEEEC--------CEEEEEEEC T ss_conf 859999988997799884999999999960----898389999989--------957999969 No 3 >PRK10139 serine endoprotease; Provisional Probab=100.00 E-value=0 Score=824.50 Aligned_cols=415 Identities=30% Similarity=0.498 Sum_probs=362.2 Q ss_pred CCCCHHHHHHHHCCCEEEEEEEEEEEECCCCCCCCCCCCCCCCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCEEEEE Q ss_conf 55898899998489508999999987225555554455678887700254455301357766667544555223402789 Q gi|254780700|r 36 SSVDLPPVIARVSPSIVSVMVEPKKKVSVEQMFNAYGFGNLPEDHPLKNYFRKDFHKFFSGEEPILSDTVERLMFGSGFF 115 (489) Q Consensus 36 ~~~~~~~~~~~~~paVV~i~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsG~i 115 (489) ..+||++++|+++||||||+++...... .++.++|++||+..++. 
++.++.+++||||| T Consensus 38 ~~psfa~~v~~~~paVVni~~~~~~~~~----------------~~~~~~f~~ff~~~~~~-----~~~~~~~~~GSG~i 96 (455) T PRK10139 38 PLPSLAPMLEKVLPAVVSVRVEGTASQG----------------QKIPEEFKKFFGDDLPD-----QPAQPFEGLGSGVI 96 (455) T ss_pred CCCCHHHHHHHHCCCEEEEEEEEECCCC----------------CCCCHHHHHHHCCCCCC-----CCCCCCCCCCCEEE T ss_conf 8998699998658956999997750457----------------78977788753657777-----88886675767799 Q ss_pred EC-CCCEEEECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCC Q ss_conf 75-99629851010478714379628980674011123344432899960676676556556731112414675236655 Q gi|254780700|r 116 IT-DDGYILTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFR 194 (489) Q Consensus 116 i~-~~G~ilTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g 194 (489) || +|||||||||||++|++|+|+|+||++|+|+++|.|+.+|||||||+.+++|++++||||+.+++||||+||||||| T Consensus 97 ids~dG~IvTN~HVV~~a~~i~V~l~dg~~~~A~vvG~D~~~DlAvlki~~~~~l~~~~~gdS~~l~vG~~ViAiGnP~G 176 (455) T PRK10139 97 IDAAKGYVLTNNHVINQAQKISIQLNDGREFDAKLIGSDDQSDIALLQIQNPSKLTQIAIADSDKLRVGDFAVAVGNPFG 176 (455) T ss_pred EECCCCEEECCHHHHCCCCEEEEEECCCCEEEEEEEEECCCCCEEEEEECCCCCCCEEECCCCCCCCCCCEEEEEECCCC T ss_conf 98999889828799499868999917999998999983578667999952688895445578665768998999867998 Q ss_pred CCCCCCCCCCCCCCCCCC--CCCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCC Q ss_conf 311112587443112233--443420233233201347703540343035551234455322222232112332110010 Q gi|254780700|r 195 LRGTVSAGIVSALDRDIP--DRPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQFHMGVGLIIPLSIIKKAIP 272 (489) Q Consensus 195 ~~~tvt~GiiSa~~R~~~--~~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~g~~~GigfaIP~~~~~~i~~ 272 (489) +++|+|.|||||++|+.. 
+.+.+||||||+||||||||||||++||||||||+|+++++++.|||||||+|+++++++ T Consensus 177 l~~tvt~GIvSa~~R~~~~~~~~~~~IQTDAaINpGNSGGPLvn~~GeVIGINtai~s~~gg~~GigFAIP~n~a~~v~~ 256 (455) T PRK10139 177 LGQTATSGIISALGRSGLNLEGLENFIQTDASINRGNSGGALLNLNGELIGINTAILAPGGGSVGIGFAIPSNMARTLAQ 256 (455) T ss_pred CCCCEEEEEEEECCCCCCCCCCCCCEEEEECCCCCCCCCCCHHHCCCCEEEEEEEEECCCCCCCCEEEECCHHHHHHHHH T ss_conf 88733555773125676676560354785044378888871100378055634689825887442477416999988776 Q ss_pred CCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCC Q ss_conf 00023333334332000342166764417644441132011111211346711678887524314787431012220356 Q gi|254780700|r 273 SLISKGRVDHGWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRS 352 (489) Q Consensus 273 ~l~~~g~v~rg~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~ 352 (489) +|+++|++.|||||+.+|+++++++++++|+...|++|++|.|+|||++||||+||+|+++||++|++..||.+.++.++ T Consensus 257 ~L~~~G~v~rg~LGv~~~~lt~~~a~~~gl~~~~GalV~~V~~~sPA~kAGLk~GDVI~~vnG~~V~~~~dL~~~v~~~~ 336 (455) T PRK10139 257 QLIDFGEIKRGLLGIKGTEMSADIAKAFNLDVQRGAFVSEVLPNSGSAKAGVKSGDIITSLNGKPLNSFAELRSRIATTE 336 (455) T ss_pred HHHCCCCCCCCEEEEEEEECCHHHHHHCCCCCCCCCEEEEECCCCCHHHCCCCCCCEEEEECCEECCCHHHHHHHHHCCC T ss_conf 65026710331566887652655665416777777356654478836876999999999989989689999999996089 Q ss_pred CCCCEEEEECCCCCEEEECCCCCCCCCCCCCCCCCCCCCCCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCCCCCE Q ss_conf 67520101204781665125565587631000012465452546987289657152007999606889789829998889 Q gi|254780700|r 353 PKEQVKISLCKEGSKHSVAVVLGSSPTAKNDMHLEVGDKELLGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQKGMT 432 (489) Q Consensus 353 ~G~~v~l~v~R~g~~~~~~V~l~~~p~~~~~~~~~~~~~~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDi 432 (489) ||++++|+++|+||+++++|+|+..+....... .......|.++.+.... 
....++++..+.++|+|+++||++||+ T Consensus 337 pG~~v~l~v~R~Gk~~~~~vtl~~~~~~~~~~~--~~~~~l~g~~l~~~~~~-~~~~GVvV~~V~~gS~Aa~aGLr~GDV 413 (455) T PRK10139 337 PGTKVKLGLLRNGKPLEVEVTLDTSTSSSASAE--MIAPALQGATLSDGQLK-DGTKGIKIDEVVKGSPAAQAGLQKDDV 413 (455) T ss_pred CCCEEEEEEEECCEEEEEEEEECCCCCCCCCCC--CCCCCCCCCCCCCCCCC-CCCCCEEEEEECCCCHHHHCCCCCCCE T ss_conf 888899999999979999999578887653321--14534466644423344-699747999847899899869999999 Q ss_pred EEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCCCCCCCCEEEEEEE Q ss_conf 998899993899999999998862599569999971776433468843688875 Q gi|254780700|r 433 IVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQSGNDNMSRFVSLK 486 (489) Q Consensus 433 Il~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~~~~~~~~rFVal~ 486 (489) |++||+++|+|++||++++++ ++..++|+|+| .+++.||.|+ T Consensus 414 I~~VN~~~V~sv~d~~~~l~~----~~~~v~L~V~R--------gg~~~fv~LR 455 (455) T PRK10139 414 IIGVNRDRVNSIAEMRKVLAA----KPAIIALQIVR--------GNESIYLLLR 455 (455) T ss_pred EEEECCEECCCHHHHHHHHHC----CCCEEEEEEEE--------CCEEEEEEEC T ss_conf 997799873999999999855----89728999998--------9968999959 No 4 >PRK10898 serine endoprotease; Provisional Probab=100.00 E-value=0 Score=653.99 Aligned_cols=308 Identities=29% Similarity=0.508 Sum_probs=284.5 Q ss_pred CCCCCHHHHHHHHCCCEEEEEEEEEEEECCCCCCCCCCCCCCCCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCEEEE Q ss_conf 55589889999848950899999998722555555445567888770025445530135776666754455522340278 Q gi|254780700|r 35 PSSVDLPPVIARVSPSIVSVMVEPKKKVSVEQMFNAYGFGNLPEDHPLKNYFRKDFHKFFSGEEPILSDTVERLMFGSGF 114 (489) Q Consensus 35 ~~~~~~~~~~~~~~paVV~i~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsG~ 114 (489) ....+|++++++++||||+|++...... 
...+.+..++|||| T Consensus 42 ~~~~s~~~~v~~a~PaVV~I~~~~~~~~--------------------------------------~~~~~~~~~lGSGv 83 (355) T PRK10898 42 ETPASYNQAVRRAAPAVVNVYNRSLNST--------------------------------------SHNQLEIRTLGSGV 83 (355) T ss_pred CCCCCHHHHHHHHCCCEEEEEEEECCCC--------------------------------------CCCCCCCCCCEEEE T ss_conf 6864489999974897799995542567--------------------------------------77766657750089 Q ss_pred EECCCCEEEECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCC Q ss_conf 97599629851010478714379628980674011123344432899960676676556556731112414675236655 Q gi|254780700|r 115 FITDDGYILTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFR 194 (489) Q Consensus 115 ii~~~G~ilTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g 194 (489) ||+++||||||||||+++++|+|+|+||++|+|+++|.|+.+|||||||+.+.++|+++|++|+.+++||||+||||||| T Consensus 84 Ii~~~G~IlTN~HVV~~a~~i~V~l~dG~~~~A~vvg~D~~tDLAvLkI~~~~~lp~~~~~~s~~l~vGd~ViAIGnP~g 163 (355) T PRK10898 84 IMDQRGYILTNKHVINDADQIIVALQDGRVFEALLVGSDSLTDLAVLKINATGGLPTIPINPKRTPHIGDVVLAIGNPYN 163 (355) T ss_pred EECCCCEEEECHHHHCCCCEEEEEECCCCEEEEEEEEECCCCCEEEEEECCCCCCCCCCCCCCCCCCCCCEEEEECCCCC T ss_conf 99199479947588599967999908999998899972577787999962677887231488654547987999537755 Q ss_pred CCCCCCCCCCCCCCCCC--CCCCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCCCC---CCCCCCCCCCCCCCC Q ss_conf 31111258744311223--3443420233233201347703540343035551234455322---222232112332110 Q gi|254780700|r 195 LRGTVSAGIVSALDRDI--PDRPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQF---HMGVGLIIPLSIIKK 269 (489) Q Consensus 195 ~~~tvt~GiiSa~~R~~--~~~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~g~---~~GigfaIP~~~~~~ 269 (489) +++|+|.|||||++|.. 
.+.+.+||||||+||||||||||+|.+||||||||++++.+++ ..|||||||++++++ T Consensus 164 l~~tvT~GIVSa~~R~~~~~~~~~~~IQTDAaINpGNSGGpLvn~~G~vIGInt~~~~~s~~~~~~~GigFAIP~~~a~~ 243 (355) T PRK10898 164 LGQTITQGIISATGRIGLNPTGRQNFLQTDASINHGNSGGALVNSLGELMGINTLSFDKSNDGETPEGIGFAIPFQLATK 243 (355) T ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCEECCCCCEEEEEEEEECCCCCCCCCCCEEEEECHHHHHH T ss_conf 67744235322346434587665433786043378988884372688499999888604778756654378717899999 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEEC Q ss_conf 01000023333334332000342166764417644441132011111211346711678887524314787431012220 Q gi|254780700|r 270 AIPSLISKGRVDHGWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIA 349 (489) Q Consensus 270 i~~~l~~~g~v~rg~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~ 349 (489) ++++|+++|++.|||||+..+++++..++.++++...|++|.+|.|+|||++||||+||+|+++||++|.+..+|...+. T Consensus 244 v~~~Li~~G~v~rg~LGi~~~~~~~~~~~~~~~~~~~Gv~V~~V~~~sPA~~AGL~~GDvI~~idg~~v~~~~~l~~~l~ 323 (355) T PRK10898 244 IMDKLIRDGRVIRGYIGIGGREIAPLHAQGGGIDQLQGIVVNEVSPDGPAANAGIQVNDLIISVNNKPAISALETMDQVA 323 (355) T ss_pred HHHHHHHCCEEECCEEEEEEEECCHHHHHHCCCCCCCCCEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHH T ss_conf 99999974908552424774437988996548987775289887999958985999899999989989389999999997 Q ss_pred CCCCCCCEEEEECCCCCEEEECCCCCCCCCC Q ss_conf 3566752010120478166512556558763 Q gi|254780700|r 350 SRSPKEQVKISLCKEGSKHSVAVVLGSSPTA 380 (489) Q Consensus 350 ~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~~ 380 (489) .++||++++++++|+|++++++|+|+++|.. T Consensus 324 ~~~pGd~v~l~v~R~G~~~~~~VtL~e~P~~ 354 (355) T PRK10898 324 EIRPGSVIPVVVMRDDKQLTLQVTIQEYPAT 354 (355) T ss_pred HCCCCCEEEEEEEECCEEEEEEEEECCCCCC T ss_conf 1899798999999999999999997888999 No 5 >TIGR02038 protease_degS periplasmic serine peptidase DegS; InterPro: IPR011783 Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes . 
They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins.
Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This family consists of the periplasmic serine protease DegS (HhoB). They belong to MEROPS peptidase family S1, subfamily S1C (protease Do, clan PA(S)). They are shorter paralogs of protease Do (HtrA, DegP) and DegQ (HhoA). They are found in Escherichia coli and several of the gammaproteobacteria. DegS contains a trypsin domain and a single copy of the PDZ domain (in contrast to DegP with two copies). A critical role of DegS is to sense stress by detecting misfolded proteins in the periplasm. DegS then cleaves the periplasmic domain of RseA, a transmembrane protein and inhibitor of sigmaE, activating the sigmaE-driven expression of periplasmic proteases/chaperones. GO: 0004252 serine-type endopeptidase activity, 0006508 proteolysis. Probab=100.00 E-value=0 Score=635.10 Aligned_cols=310 Identities=30% Similarity=0.534 Sum_probs=284.6 Q ss_pred CCCCCHHHHHHHHCCCEEEEEEEEEEEECCCCCCCCCCCCCCCCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCEEEE Q ss_conf 55589889999848950899999998722555555445567888770025445530135776666754455522340278 Q gi|254780700|r 35 PSSVDLPPVIARVSPSIVSVMVEPKKKVSVEQMFNAYGFGNLPEDHPLKNYFRKDFHKFFSGEEPILSDTVERLMFGSGF 114 (489) Q Consensus 35 ~~~~~~~~~~~~~~paVV~i~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsG~ 114 (489) ....||...|+|++|||||||.......+.. 
....+-+-+++|||| T Consensus 42 ~~~~SF~~AVR~AAPAVVNvYnr~~~~~r~~----------------------------------ndnd~L~i~~LGSGV 87 (358) T TIGR02038 42 EVEISFNKAVRRAAPAVVNVYNRSLSENRSL----------------------------------NDNDQLSIQGLGSGV 87 (358) T ss_pred CHHHHHHHHHHHCCCCEEEEEECCCCCCCCC----------------------------------CCCCCCEECCCCCEE T ss_conf 2456688642330786487752574436786----------------------------------756651461256557 Q ss_pred EECCCCEEEECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCC Q ss_conf 97599629851010478714379628980674011123344432899960676676556556731112414675236655 Q gi|254780700|r 115 FITDDGYILTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFR 194 (489) Q Consensus 115 ii~~~G~ilTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g 194 (489) |+|++||||||||||++||+|-|-|.|||.|+|++||.|++||||||||+++..||.+|+-.....++||-|+||||||. T Consensus 88 Ims~~GYIlTN~Hvi~~ADqIvVALQdGr~f~A~lVG~D~~TDLAVLki~a~nGLptiP~N~~~~p~vGDVVLAIGNPYN 167 (358) T TIGR02038 88 IMSKEGYILTNKHVIKKADQIVVALQDGRVFEAELVGSDSLTDLAVLKIEADNGLPTIPVNADRQPHVGDVVLAIGNPYN 167 (358) T ss_pred EECCCCCEEECHHHHHCCCEEEEEECCCCEEEEEEECCCCCCCEEEEEEEECCCCCCCCCCCCCCCCCCCEEEECCCCCC T ss_conf 97288745133575404483998713893577676357775451688995169885112077886850437872368851 Q ss_pred CCCCCCCCCCCCCCCCCC---CCCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCC---CCCCCCCCCCCCCCCC Q ss_conf 311112587443112233---4434202332332013477035403430355512344553---2222223211233211 Q gi|254780700|r 195 LRGTVSAGIVSALDRDIP---DRPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSG---QFHMGVGLIIPLSIIK 268 (489) Q Consensus 195 ~~~tvt~GiiSa~~R~~~---~~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~---g~~~GigfaIP~~~~~ 268 (489) |+||+|+|||||++|... .++.+||||||+||.|||||+|+|..||+|||||+.|-.+ +..+||+||||.+++. 
T Consensus 168 LGQtitqGIISAtGR~~~G~s~Grq~FlQTDAaIN~GNSGGALvnt~GeLvGInT~sf~~~~~G~~~~Gi~FAIP~~LA~ 247 (358) T TIGR02038 168 LGQTITQGIISATGRNGLGSSVGRQNFLQTDAAINAGNSGGALVNTAGELVGINTLSFQKAADGEEAEGINFAIPIKLAS 247 (358) T ss_pred CCCCEEEEEEEEECCCCCCCCCCCHHHHHHHHHHHCCCCCCEEECCCCCEEEEEHHHHCCCCCCCCCCCCCEECCHHHHH T ss_conf 45311111224330011468531024442018770889653002268652210101001457877655563104178999 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEE Q ss_conf 00100002333333433200034216676441764444113201111121134671167888752431478743101222 Q gi|254780700|r 269 KAIPSLISKGRVDHGWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQI 348 (489) Q Consensus 269 ~i~~~l~~~g~v~rg~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i 348 (489) +|+++|++.|+|-||||||..+|++|-.++-+|++.-+|++|++|.|+|||++|||++.|+|+++||+++.+..++...+ T Consensus 248 ~im~Kli~dGRVIRGy~Gv~G~~I~s~~~~~lg~~~l~Givv~~vdPnGPAA~Ag~l~~Dvilk~dg~~~~g~~~~md~v 327 (358) T TIGR02038 248 KIMDKLIRDGRVIRGYIGVDGEDINSLVAQGLGLEDLRGIVVTGVDPNGPAARAGILVRDVILKVDGKEVIGAEELMDRV 327 (358) T ss_pred HHHHHHHHCCCEEEEEEECCCCCCCHHHHHHCCCCCCCEEEEECCCCCCHHHHHCCCCCCEEEEECCCCCCCHHHHHHHH T ss_conf 99998863797896875028703662666407875224078853489876765067715578986795367565545554 Q ss_pred CCCCCCCCEEEEECCCCCEEEECCCCCCCC Q ss_conf 035667520101204781665125565587 Q gi|254780700|r 349 ASRSPKEQVKISLCKEGSKHSVAVVLGSSP 378 (489) Q Consensus 349 ~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p 378 (489) +..+||++|.++++|+||++++.|+++|.| T Consensus 328 A~~~PG~~v~~tvlR~Gk~l~LpV~I~E~~ 357 (358) T TIGR02038 328 AETRPGSKVLVTVLRKGKQLELPVTIDEKP 357 (358) T ss_pred HCCCCCCEEEEEEECCCCEEEEEEEEECCC T ss_conf 317999778999970696787007850015 No 6 >COG0265 DegQ Trypsin-like serine proteases, typically periplasmic, contain C-terminal PDZ domain [Posttranslational modification, protein turnover, chaperones] Probab=100.00 E-value=0 Score=473.52 Aligned_cols=306 Identities=38% Similarity=0.681 Sum_probs=277.4 Q ss_pred 
CCHHHHHHHHCCCEEEEEEEEEEEECCCCCCCCCCCCCCCCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCEEEEEEC Q ss_conf 89889999848950899999998722555555445567888770025445530135776666754455522340278975 Q gi|254780700|r 38 VDLPPVIARVSPSIVSVMVEPKKKVSVEQMFNAYGFGNLPEDHPLKNYFRKDFHKFFSGEEPILSDTVERLMFGSGFFIT 117 (489) Q Consensus 38 ~~~~~~~~~~~paVV~i~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsG~ii~ 117 (489) .+|+++++++.|+||+|.+...... ..|| ... .......++|||||++ T Consensus 33 ~~~~~~~~~~~~~vV~~~~~~~~~~--------------------~~~~---------~~~---~~~~~~~~~gSg~i~~ 80 (347) T COG0265 33 LSFATAVEKVAPAVVSIATGLTAKL--------------------RSFF---------PSD---PPLRSAEGLGSGFIIS 80 (347) T ss_pred CCHHHHHHHCCCCEEEEEECCCCCC--------------------HHHC---------CCC---CCHHHHCCCCCEEEEC T ss_conf 3578887641687799994133340--------------------2212---------467---4212220334469987 Q ss_pred CCCEEEECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCC Q ss_conf 99629851010478714379628980674011123344432899960676676556556731112414675236655311 Q gi|254780700|r 118 DDGYILTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFRLRG 197 (489) Q Consensus 118 ~~G~ilTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g~~~ 197 (489) ++|||+||+|||.+++++.|++.||++++|+++|.|+.+|+|+||++....++++.|++|+.+++|||++|||||||+.+ T Consensus 81 ~~g~ivTn~hVi~~a~~i~v~l~dg~~~~a~~vg~d~~~dlavlki~~~~~~~~~~~~~s~~l~vg~~v~aiGnp~g~~~ 160 (347) T COG0265 81 SDGYIVTNNHVIAGAEEITVTLADGREVPAKLVGKDPISDLAVLKIDGAGGLPVIALGDSDKLRVGDVVVAIGNPFGLGQ 160 (347) T ss_pred CCCEEEEEEEECCCCCEEEEECCCCCEEEEEEECCCCCCCEEEEEEECCCCCCEEEECCCCCCCCCCEEEECCCCCCCCC T ss_conf 87169973005156534688836895886688404665526999960678762267133666635874997637767666 Q ss_pred CCCCCCCCCCCCC-CC--CCCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCC Q ss_conf 1125874431122-33--44342023323320134770354034303555123445532222223211233211001000 Q gi|254780700|r 198 TVSAGIVSALDRD-IP--DRPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQFHMGVGLIIPLSIIKKAIPSL 274 (489) 
Q Consensus 198 tvt~GiiSa~~R~-~~--~~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~g~~~GigfaIP~~~~~~i~~~l 274 (489) |+|.||||+++|. +. ..+.+||||||+||||||||||+|.+|++||||++++++++++.||+||||++.+++++++| T Consensus 161 tvt~Givs~~~r~~v~~~~~~~~~IqtdAain~gnsGgpl~n~~g~~iGint~~~~~~~~~~gigfaiP~~~~~~v~~~l 240 (347) T COG0265 161 TVTSGIVSALGRTGVGSAGGYVNFIQTDAAINPGNSGGPLVNIDGEVVGINTAIIAPSGGSSGIGFAIPVNLVAPVLDEL 240 (347) T ss_pred CEEEEEEECCCCCCCCCCCCCCCEEEECCCCCCCCCCCCEECCCCCEEEEEEEEEECCCCCCCEEEEECHHHHHHHHHHH T ss_conf 33124682244464335667554475146437898888501467718988866640478655425893188899999988 Q ss_pred CCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCC Q ss_conf 02333333433200034216676441764444113201111121134671167888752431478743101222035667 Q gi|254780700|r 275 ISKGRVDHGWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPK 354 (489) Q Consensus 275 ~~~g~v~rg~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G 354 (489) +.+|++.|||+|+..++++++.+ +|++...|++|..|.++|||+++|++.||+|+++||+++.+..++...+....+| T Consensus 241 ~~~G~v~~~~lgv~~~~~~~~~~--~g~~~~~G~~V~~v~~~spa~~agi~~Gdii~~~ng~~v~~~~~l~~~v~~~~~g 318 (347) T COG0265 241 ISKGKVVRGYLGVIGEPLTADIA--LGLPVAAGAVVLGVLPGSPAAKAGIKAGDIITAVNGKPVASLSDLVAAVASNRPG 318 (347) T ss_pred HHCCCCCCCCCCEEEEECCCHHC--CCCCCCCCCEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHCCCCC T ss_conf 65697434635427687451102--6876788768865179985787378778779978998855788888887326999 Q ss_pred CCEEEEECCCCCEEEECCCCCCC Q ss_conf 52010120478166512556558 Q gi|254780700|r 355 EQVKISLCKEGSKHSVAVVLGSS 377 (489) Q Consensus 355 ~~v~l~v~R~g~~~~~~V~l~~~ 377 (489) +.+.++++|+|+++++.+++.++ T Consensus 319 ~~v~~~~~r~g~~~~~~v~l~~~ 341 (347) T COG0265 319 DEVALKLLRGGKERELAVTLGDR 341 (347) T ss_pred CEEEEEEEECCEEEEEEEECCCC T ss_conf 76889999788357776861555 No 7 >KOG1320 consensus Probab=100.00 E-value=0 Score=318.87 Aligned_cols=303 Identities=25% Similarity=0.339 Sum_probs=244.9 Q ss_pred 
HHHHHHHHCCCEEEEEEEEEEEECCCCCCCCCCCCCCCCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCEEEEEECCC Q ss_conf 88999984895089999999872255555544556788877002544553013577666675445552234027897599 Q gi|254780700|r 40 LPPVIARVSPSIVSVMVEPKKKVSVEQMFNAYGFGNLPEDHPLKNYFRKDFHKFFSGEEPILSDTVERLMFGSGFFITDD 119 (489) Q Consensus 40 ~~~~~~~~~paVV~i~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsG~ii~~~ 119 (489) ..++.++..+|||.|....-.. + ...|-.+. -....|||||++.| T Consensus 130 v~~~~~~cd~Avv~Ie~~ef~~-----------------~----------~~~~e~~~--------ip~l~~S~~Vv~gd 174 (473) T KOG1320 130 VAAVFEECDLAVVYIESEEFWK-----------------G----------MNPFELGD--------IPSLNGSGFVVGGD 174 (473) T ss_pred HHHHHHCCCCEEEEEEECCCCC-----------------C----------CCCCCCCC--------CCCCCCCEEEECCC T ss_conf 7766321553078875123567-----------------7----------76444678--------76556508998388 Q ss_pred CEEEECHHCCCCCCE-----------EEEECCCC--EEEEECCCCCCCCCCEEEEEEECCCC-CCCCCCCCCCCCCCCCE Q ss_conf 629851010478714-----------37962898--06740111233444328999606766-76556556731112414 Q gi|254780700|r 120 GYILTSNHIVEDGAS-----------FSVILSDD--TELPAKLVGTDALFDLAVLKVQSDRK-FIPVEFEDANNIRVGEA 185 (489) Q Consensus 120 G~ilTn~hvv~~a~~-----------i~V~~~dg--~~~~a~vvg~D~~~DlAvlki~~~~~-~~~~~lg~s~~~~~G~~ 185 (489) |+|+||+||+..... 
|+|.+++| ..+.+.+.|.|+..|+|+++++.+++ +++++++.|..++.|+| T Consensus 175 ~i~VTnghV~~~~~~~y~~~~~~l~~vqI~aa~~~~~s~ep~i~g~d~~~gvA~l~ik~~~~i~~~i~~~~s~~~~~G~~ 254 (473) T KOG1320 175 GIIVTNGHVVRVEPRIYAHSSTVLLRVQIDAAIGPGNSGEPVIVGVDKVAGVAFLKIKTPENILYVIPLGVSSHFRTGVE 254 (473) T ss_pred CEEEEEEEEEEEEECCCCCCCCEEEEEEEEEEECCCCCCCCEEECCCCCCCEEEEEEECCCCCCCEEECCEEEEEECCEE T ss_conf 47998358998773134577740126999970068766797686455336548999744886254376343313305718 Q ss_pred EEEECCCCCCCCCCCCCCCCCCCCCCCC-------CCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCCCCCCCC Q ss_conf 6752366553111125874431122334-------434202332332013477035403430355512344553222222 Q gi|254780700|r 186 VFTIGNPFRLRGTVSAGIVSALDRDIPD-------RPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQFHMGV 258 (489) Q Consensus 186 v~aiG~P~g~~~tvt~GiiSa~~R~~~~-------~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~g~~~Gi 258 (489) +.|+|+||++.+|+|+|++|+..|.... ....|+|||++||+|||||||+|++|++||+|++...+.+.+.|+ T Consensus 255 ~~a~~~~f~~~nt~~qg~vs~~~R~~~~lg~~~g~~i~~~~qtd~ai~~~nsg~~ll~~DG~~IgVn~~~~~ri~~~~~i 334 (473) T KOG1320 255 VSAIGNGFGLLNTLTQGMVSGQLRKSFKLGLETGVLISKINQTDAAINPGNSGGPLLNLDGEVIGVNTRKVTRIGFSHGI 334 (473) T ss_pred EEECCCCEECCCEEEEEEECCCCCCCCCCCCCCCCEECCEEECCHHHHCCCCCCCEEECCCCEEEEEEEEEEEEECCCCC T ss_conf 77304770003444641030232576556866551101130254444064689967953684850351037774001363 Q ss_pred CCCCCCCCCCCCCCCCCCCCCC---------CCCCCCCCCCCCHHHHHH-----HCCCC--CCCCEEEECCCCCCCCCCC Q ss_conf 3211233211001000023333---------334332000342166764-----41764--4441132011111211346 Q gi|254780700|r 259 GLIIPLSIIKKAIPSLISKGRV---------DHGWFGIMTQNLTQELAI-----PLGLR--GTKGSLITAVVKESPADKA 322 (489) Q Consensus 259 gfaIP~~~~~~i~~~l~~~g~v---------~rg~lGv~~~~v~~~la~-----~lgl~--~~~GvlV~~V~~~sPA~~A 322 (489) +|++|++.+..++.+..++... .+.|+|+....++..+.. .+-.+ ...|++|.+|.|++|+... 
T Consensus 335 S~~~p~d~vl~~v~r~~e~~~~lr~~~~~~p~~~~~g~~s~~i~~g~vf~~~~~~~~~~~~~~q~v~is~Vlp~~~~~~~ 414 (473) T KOG1320 335 SFKIPIDTVLVIVLRLGEFQISLRPVKPLVPVHQYIGLPSYYIFAGLVFVPLTKSYIFPSGVVQLVLVSQVLPGSINGGY 414 (473) T ss_pred EECCCCHHHHHHHHHHHCCCEEECCCCCCCCCCCCCCCEEEEEECCEEEEECCCCCCCCCCCEEEEEEEEECCCCCCCCC T ss_conf 64057336210132221003231345676665666784049996537885257786665563358999886469976100 Q ss_pred CCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCC Q ss_conf 7116788875243147874310122203566752010120478166512556558 Q gi|254780700|r 323 GMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSS 377 (489) Q Consensus 323 GLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~ 377 (489) ++++||+|++|||++|.+..|+...+.....++++.+..+|..|..++.+...+. T Consensus 415 ~~~~g~~V~~vng~~V~n~~~l~~~i~~~~~~~~v~vl~~~~~e~~tl~Il~~~~ 469 (473) T KOG1320 415 GLKPGDQVVKVNGKPVKNLKHLYELIEECSTEDKVAVLDRRSAEDATLEILPEHK 469 (473) T ss_pred CCCCCCEEEEECCEEEECHHHHHHHHHHCCCCCEEEEEEECCCCCEEEEEECCCC T ss_conf 2367878998889885256879999875276766999971476302589502335 No 8 >KOG1421 consensus Probab=100.00 E-value=2e-36 Score=250.89 Aligned_cols=391 Identities=20% Similarity=0.299 Sum_probs=303.1 Q ss_pred CHHHHHHHHCCCEEEEEEEEEEEECCCCCCCCCCCCCCCCCCCHHHHHHHHCCCCCCCCCCCCCCCCCCCCCEEEEEECC Q ss_conf 98899998489508999999987225555554455678887700254455301357766667544555223402789759 Q gi|254780700|r 39 DLPPVIARVSPSIVSVMVEPKKKVSVEQMFNAYGFGNLPEDHPLKNYFRKDFHKFFSGEEPILSDTVERLMFGSGFFITD 118 (489) Q Consensus 39 ~~~~~~~~~~paVV~i~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~GsG~ii~~ 118 (489) ++..-+.+|-++||+|....... 
||- .....+-++||++++ T Consensus 53 ~w~~~ia~VvksvVsI~~S~v~~-----------------------fdt----------------esag~~~atgfvvd~ 93 (955) T KOG1421 53 DWRNTIANVVKSVVSIRFSAVRA-----------------------FDT----------------ESAGESEATGFVVDK 93 (955) T ss_pred HHHHHHHHHCCCEEEEEEHHEEE-----------------------CCC----------------CCCCCCCEEEEEEEC T ss_conf 55555654130079998022210-----------------------223----------------566643115999935 Q ss_pred -CCEEEECHHCCCCCC-EEEEECCCCEEEEECCCCCCCCCCEEEEEEECCC----CCCCCCCCCCCCCCCCCEEEEECCC Q ss_conf -962985101047871-4379628980674011123344432899960676----6765565567311124146752366 Q gi|254780700|r 119 -DGYILTSNHIVEDGA-SFSVILSDDTELPAKLVGTDALFDLAVLKVQSDR----KFIPVEFEDANNIRVGEAVFTIGNP 192 (489) Q Consensus 119 -~G~ilTn~hvv~~a~-~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~----~~~~~~lg~s~~~~~G~~v~aiG~P 192 (489) .||||||+||+.-.. .-.+.|.+..+.+--.+..||-+|+.+++-+... ....+.+. .+..++|.....+||- T Consensus 94 ~~gyiLtnrhvv~pgP~va~avf~n~ee~ei~pvyrDpVhdfGf~r~dps~ir~s~vt~i~la-p~~akvgseirvvgND 172 (955) T KOG1421 94 KLGYILTNRHVVAPGPFVASAVFDNHEEIEIYPVYRDPVHDFGFFRYDPSTIRFSIVTEICLA-PELAKVGSEIRVVGND 172 (955) T ss_pred CCCEEEEECCCCCCCCCEEEEEECCCCCCCCCCCCCCCHHHCCEEECCHHHCCEEEEECCCCC-CCCCCCCCCEEEECCC T ss_conf 644698713556788712687740224687322017863113313127435001233044337-4311268725983277 Q ss_pred CCCCCCCCCCCCCCCCCCCCCC----CC----CEEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCCCCCCCCCCCCCC Q ss_conf 5531111258744311223344----34----202332332013477035403430355512344553222222321123 Q gi|254780700|r 193 FRLRGTVSAGIVSALDRDIPDR----PG----TFTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQFHMGVGLIIPL 264 (489) Q Consensus 193 ~g~~~tvt~GiiSa~~R~~~~~----~~----~~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~g~~~GigfaIP~ 264 (489) -|---++-+|-+|.++|..+++ |. .|+|.-+.-..|.||.|++|..|..|..|.--.. ..+-+|+.|. 
T Consensus 173 agEklsIlagflSrldr~apdyg~~~yndfnTfy~QaasstsggssgspVv~i~gyAVAl~agg~~----ssas~ffLpL 248 (955) T KOG1421 173 AGEKLSILAGFLSRLDRNAPDYGEDTYNDFNTFYIQAASSTSGGSSGSPVVDIPGYAVALNAGGSI----SSASDFFLPL 248 (955) T ss_pred CCCEEEEEHHHHHHCCCCCCCCCCCCCCCCCCEEEEEHHCCCCCCCCCCEECCCCEEEEEECCCCC----CCCCCCEEEC T ss_conf 531577630135531478864343210024310340221477887898144564227755037731----1466541451 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHCCC------------CCCCCEE-EECCCCCCCCCCCCCHHHHHHH Q ss_conf 32110010000233333343320003421667644176------------4444113-2011111211346711678887 Q gi|254780700|r 265 SIIKKAIPSLISKGRVDHGWFGIMTQNLTQELAIPLGL------------RGTKGSL-ITAVVKESPADKAGMKVGDVIC 331 (489) Q Consensus 265 ~~~~~i~~~l~~~g~v~rg~lGv~~~~v~~~la~~lgl------------~~~~Gvl-V~~V~~~sPA~~AGLk~GDvI~ 331 (489) |.+.|-+..+.++..+.||-|-+++-.-.-+-+..||| +...|+| |..|.++|||++. |++||+++ T Consensus 249 drV~RaL~clq~n~PItRGtLqvefl~k~~de~rrlGL~sE~eqv~r~k~P~~tgmLvV~~vL~~gpa~k~-Le~GDill 327 (955) T KOG1421 249 DRVVRALRCLQNNTPITRGTLQVEFLHKLFDECRRLGLSSEWEQVVRTKFPERTGMLVVETVLPEGPAEKK-LEPGDILL 327 (955) T ss_pred CCHHHHHHHHHCCCCCCCCEEEEEEEHHHHHHHHHCCCCHHHHHHHHHCCCCCCEEEEEEEECCCCCHHHC-CCCCCEEE T ss_conf 10110004441599754436999984012688876189588899887428653304999873369803302-57786799 Q ss_pred HHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCCCCCCCCCCCCCCCCCCCEEEEECCHHHCCEE-- Q ss_conf 524314787431012220356675201012047816651255655876310000124654525469872896571520-- Q gi|254780700|r 332 MLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSPTAKNDMHLEVGDKELLGMVLQDINDGNKKLV-- 409 (489) Q Consensus 332 ~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~~~~~~~~~~~~~~~lGl~v~~l~~~~~~~~-- 409 (489) +||+.-+.++.++...+ +...|+.+.|+|+|.|++.++.++.+.+....... 
..++.|..+++++++.+..+ T Consensus 328 avN~t~l~df~~l~~iL-Degvgk~l~LtI~Rggqelel~vtvqdlh~itp~R-----~levcGav~hdlsyq~ar~y~l 401 (955) T KOG1421 328 AVNSTCLNDFEALEQIL-DEGVGKNLELTIQRGGQELELTVTVQDLHGITPDR-----FLEVCGAVFHDLSYQLARLYAL 401 (955) T ss_pred EECCEEHHHHHHHHHHH-HHCCCCEEEEEEEECCEEEEEEEEECCCCCCCCCE-----EEEECCEEECCCCHHHHHHCCC T ss_conf 98333168899999877-52358508999984888999999743346788751-----7997124731777899950146 Q ss_pred ---EEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCCCCCCCCEEEEEEE Q ss_conf ---07999606889789829998889998899993899999999998862599569999971776433468843688875 Q gi|254780700|r 410 ---RIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQSGNDNMSRFVSLK 486 (489) Q Consensus 410 ---gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~~~~~~~~rFVal~ 486 (489) ++++.+.. +|++.+.++. |-+|.+||++++-+++.|.++++++.++.+..+-..... ....+++..++ T Consensus 402 P~~GvyVa~~~-gsf~~~~~~y-~~ii~~vanK~tPdLdaFidvlk~L~dg~rV~vry~hl~-------dkh~p~v~~v~ 472 (955) T KOG1421 402 PVEGVYVASPG-GSFRHRGPRY-GQIIDSVANKPTPDLDAFIDVLKELPDGARVPVRYHHLT-------DKHSPRVTTVT 472 (955) T ss_pred CCCCEEECCCC-CCCCCCCCCC-EEEEEEECCCCCCCHHHHHHHHHHCCCCCEEEEEEEEEC-------CCCCCEEEEEE T ss_conf 66727974677-7732358710-078876158869977899999973667876668999704-------77883379999 Q ss_pred CCC Q ss_conf 289 Q gi|254780700|r 487 IDK 489 (489) Q Consensus 487 ldk 489 (489) +|+ T Consensus 473 iDr 475 (955) T KOG1421 473 IDR 475 (955) T ss_pred EEC T ss_conf 712 No 9 >KOG1320 consensus Probab=99.96 E-value=7.7e-29 Score=202.52 Aligned_cols=351 Identities=21% Similarity=0.258 Sum_probs=252.8 Q ss_pred CCCCCEEEEEECCCCEEEECHHCCC---CCCEEEEECCCC--EEEEECCCCCCCCCCEEEEEEECCC---CCCCCCCCCC Q ss_conf 5223402789759962985101047---871437962898--0674011123344432899960676---6765565567 Q gi|254780700|r 106 ERLMFGSGFFITDDGYILTSNHIVE---DGASFSVILSDD--TELPAKLVGTDALFDLAVLKVQSDR---KFIPVEFEDA 177 (489) Q Consensus 106 ~~~~~GsG~ii~~~G~ilTn~hvv~---~a~~i~V~~~dg--~~~~a~vvg~D~~~DlAvlki~~~~---~~~~~~lg~s 177 (489) 
+....||||.+... .++||+|+++ ++....|. ..| ++|.|++...=.++|+|++-|+..+ ...++++++. T Consensus 84 q~~~~~s~f~i~~~-~lltn~~~v~~~~~~~~v~v~-~~gs~~k~~a~v~~~~~~cd~Avv~Ie~~ef~~~~~~~e~~~i 161 (473) T KOG1320 84 QFSSGGSGFAIYGK-KLLTNAHVVAPNNDHKFVTVK-KHGSPRKYKAFVAAVFEECDLAVVYIESEEFWKGMNPFELGDI 161 (473) T ss_pred HHCCCCCCHHHCCC-CEEECCCCCCCCCCCCCCEEC-CCCCCHHHHHHHHHHHHCCCCEEEEEEECCCCCCCCCCCCCCC T ss_conf 01246642211042-004447655642354320102-4799666644677663215530788751235677764446787 Q ss_pred CCCCCCCEEEEECCCCCCCCCCCCCCCCCCCCCC-CC--CCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCCCC Q ss_conf 3111241467523665531111258744311223-34--43420233233201347703540343035551234455322 Q gi|254780700|r 178 NNIRVGEAVFTIGNPFRLRGTVSAGIVSALDRDI-PD--RPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQF 254 (489) Q Consensus 178 ~~~~~G~~v~aiG~P~g~~~tvt~GiiSa~~R~~-~~--~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~g~ 254 (489) -.+ .+.++.+| |.+.+||.|+|++..... .. ..-..+|+||++||||||||.+.-..++.|+....+- ..+ T Consensus 162 p~l--~~S~~Vv~---gd~i~VTnghV~~~~~~~y~~~~~~l~~vqI~aa~~~~~s~ep~i~g~d~~~gvA~l~ik-~~~ 235 (473) T KOG1320 162 PSL--NGSGFVVG---GDGIIVTNGHVVRVEPRIYAHSSTVLLRVQIDAAIGPGNSGEPVIVGVDKVAGVAFLKIK-TPE 235 (473) T ss_pred CCC--CCCEEEEC---CCCEEEEEEEEEEEEECCCCCCCCEEEEEEEEEEECCCCCCCCEEECCCCCCCEEEEEEE-CCC T ss_conf 655--65089983---884799835899877313457774012699997006876679768645533654899974-488 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCCCC-HHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHH Q ss_conf 22223211233211001000023333-334332000342-1667644176444411320111112113467116788875 Q gi|254780700|r 255 HMGVGLIIPLSIIKKAIPSLISKGRV-DHGWFGIMTQNL-TQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICM 332 (489) Q Consensus 255 ~~GigfaIP~~~~~~i~~~l~~~g~v-~rg~lGv~~~~v-~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ 332 (489) -|++.||.-...++.......+.. -.++++...|-+ +....+.+.|....|+++.+..+-+.|-+. 
++.||.|++ T Consensus 236 --~i~~~i~~~~s~~~~~G~~~~a~~~~f~~~nt~~qg~vs~~~R~~~~lg~~~g~~i~~~~qtd~ai~~-~nsg~~ll~ 312 (473) T KOG1320 236 --NILYVIPLGVSSHFRTGVEVSAIGNGFGLLNTLTQGMVSGQLRKSFKLGLETGVLISKINQTDAAINP-GNSGGPLLN 312 (473) T ss_pred --CCCCEEECCEEEEEECCEEEEECCCCEECCCEEEEEEECCCCCCCCCCCCCCCCEECCEEECCHHHHC-CCCCCCEEE T ss_conf --62543763433133057187730477000344464103023257655686655110113025444406-468996795 Q ss_pred HCCCCCCCC-C-----CEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCCCCCCCCCCC--CCCCCCCCEEEEECCHH Q ss_conf 243147874-3-----101222035667520101204781665125565587631000012--46545254698728965 Q gi|254780700|r 333 LDGRIIKSH-Q-----DFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSPTAKNDMHLE--VGDKELLGMVLQDINDG 404 (489) Q Consensus 333 ing~~I~~~-~-----~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~~~~~~~~~--~~~~~~lGl~v~~l~~~ 404 (489) +||..|.-. . .+...++.+.|+|++.+.+.|.+ ++..++............. ....-.-|+.+..++.. T Consensus 313 ~DG~~IgVn~~~~~ri~~~~~iS~~~p~d~vl~~v~r~~---e~~~~lr~~~~~~p~~~~~g~~s~~i~~g~vf~~~~~~ 389 (473) T KOG1320 313 LDGEVIGVNTRKVTRIGFSHGISFKIPIDTVLVIVLRLG---EFQISLRPVKPLVPVHQYIGLPSYYIFAGLVFVPLTKS 389 (473) T ss_pred CCCCEEEEEEEEEEEEECCCCCEECCCCHHHHHHHHHHH---CCCEEECCCCCCCCCCCCCCCEEEEEECCEEEEECCCC T ss_conf 368485035103777400136364057336210132221---00323134567666566678404999653788525778 Q ss_pred HC----CEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCC Q ss_conf 71----520079996068897898299988899988999938999999999988625995699999717764 Q gi|254780700|r 405 NK----KLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDM 472 (489) Q Consensus 405 ~~----~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~ 472 (489) .. ...++++..+.+++++...++.+||+|.+|||++|+++.++.++++.-.+. ..+.++.+|+... 
T Consensus 390 ~~~~~~~~q~v~is~Vlp~~~~~~~~~~~g~~V~~vng~~V~n~~~l~~~i~~~~~~--~~v~vl~~~~~e~ 459 (473) T KOG1320 390 YIFPSGVVQLVLVSQVLPGSINGGYGLKPGDQVVKVNGKPVKNLKHLYELIEECSTE--DKVAVLDRRSAED 459 (473) T ss_pred CCCCCCCEEEEEEEEECCCCCCCCCCCCCCCEEEEECCEEEECHHHHHHHHHHCCCC--CEEEEEEECCCCC T ss_conf 666556335899988646997610023678789988898852568799998752767--6699997147630 No 10 >KOG1421 consensus Probab=99.83 E-value=3.1e-19 Score=141.25 Aligned_cols=352 Identities=16% Similarity=0.168 Sum_probs=236.9 Q ss_pred CCCCEEEEEECC-CCEEEECHHCCC-CCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCC Q ss_conf 223402789759-962985101047-871437962898067401112334443289996067667655655673111241 Q gi|254780700|r 107 RLMFGSGFFITD-DGYILTSNHIVE-DGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGE 184 (489) Q Consensus 107 ~~~~GsG~ii~~-~G~ilTn~hvv~-~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~ 184 (489) ...-|||.|++. .|++++...|+- ++.+++|++.|....+|++.-.++...+|..|-++ +....++|-+ ..++.|| T Consensus 548 ~i~kgt~~i~d~~~g~~vvsr~~vp~d~~d~~vt~~dS~~i~a~~~fL~~t~n~a~~kydp-~~~~~~kl~~-~~v~~gD 625 (955) T KOG1421 548 DIYKGTALIMDTSKGLGVVSRSVVPSDAKDQRVTEADSDGIPANVSFLHPTENVASFKYDP-ALEVQLKLTD-TTVLRGD 625 (955) T ss_pred HHHCCCEEEEECCCCCEEEECCCCCCHHHCEEEEECCCCCCCCEEEEECCCCCEEEECCCH-HHHHHHCCCE-EEEECCC T ss_conf 2205705999736882567424377303113775212554441256733753046860286-6742101310-2674078 Q ss_pred EEEEECCCCCCC-----CCCCCCCCCCCCCCCCCC--CCCE--EEEEEEE-ECCCCCCEEEECCCEEEEEECCCCCCC-C Q ss_conf 467523665531-----111258744311223344--3420--2332332-013477035403430355512344553-2 Q gi|254780700|r 185 AVFTIGNPFRLR-----GTVSAGIVSALDRDIPDR--PGTF--TQIDAPI-NQGNSGGPCFNALGHVIGVNAMIVTSG-Q 253 (489) Q Consensus 185 ~v~aiG~P~g~~-----~tvt~GiiSa~~R~~~~~--~~~~--iqtDa~I-npGnSGGpl~n~~G~viGint~i~~~~-g 253 (489) .+--+|.-..+. -|||.=++--.-++..-+ +.++ |-.+..+ ..++| |-|.|.+|+|+++=-..+.+. 
+ T Consensus 626 ~~~f~g~~~~~r~ltaktsv~dvs~~~~ps~~~pr~r~~n~e~Is~~~nlsT~c~s-g~ltdddg~vvalwl~~~ge~~~ 704 (955) T KOG1421 626 ECTFEGFTEDLRALTAKTSVTDVSVVIIPSSVMPRFRATNLEVISFMDNLSTSCLS-GRLTDDDGEVVALWLSVVGEDVG 704 (955) T ss_pred CEEEECCCCCCHHHCCCCEEEEEEEEEECCCCCCCEEECCEEEEEEECCCCCCCCC-EEEECCCCEEEEEEEEEECCCCC T ss_conf 22574226522212025423345789702777852330234899975143456442-17977997299998643120027 Q ss_pred CC-CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCC-------------CCEEEECCCCCCCC Q ss_conf 22-22232112332110010000233333343320003421667644176444-------------41132011111211 Q gi|254780700|r 254 FH-MGVGLIIPLSIIKKAIPSLISKGRVDHGWFGIMTQNLTQELAIPLGLRGT-------------KGSLITAVVKESPA 319 (489) Q Consensus 254 ~~-~GigfaIP~~~~~~i~~~l~~~g~v~rg~lGv~~~~v~~~la~~lgl~~~-------------~GvlV~~V~~~sPA 319 (489) +. .-.-|-.-+..++++++.|+.++.+.--.+|+.+..++-.-|..+||+.. +=..|+.|.+.-+- T Consensus 705 ~kd~~y~~gl~~~~~l~vl~rlk~g~~~rp~i~~vef~~i~laqar~lglp~e~imk~e~es~~~~ql~~ishv~~~~~k 784 (955) T KOG1421 705 GKDYTYKYGLSMSYILPVLERLKLGPSARPTIAGVEFSHITLAQARTLGLPSEFIMKSEEESTIPRQLYVISHVRPLLHK 784 (955) T ss_pred CCEEEEEECCCHHHHHHHHHHHHCCCCCCCEEECCCEEEEEEEHHHCCCCCHHHHHHHHHCCCCCCEEEEEEEECCCCCC T ss_conf 84258993464588999999973689998456014320488000020499889986555327985137999842257650 Q ss_pred CCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCCCCCCCCCCCCCCCCCCCEEEE Q ss_conf 34671167888752431478743101222035667520101204781665125565587631000012465452546987 Q gi|254780700|r 320 DKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSPTAKNDMHLEVGDKELLGMVLQ 399 (489) Q Consensus 320 ~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~~~~~~~~~~~~~~~lGl~v~ 399 (489) - |..||+|+++||+.|...+||.. + .++...++|+|+++++++.+-+.. 
......-++|..++ T Consensus 785 --i-l~~gdiilsvngk~itr~~dl~d-~------~eid~~ilrdg~~~~ikipt~p~~-------et~r~vi~~gailq 847 (955) T KOG1421 785 --I-LGVGDIILSVNGKMITRLSDLHD-F------EEIDAVILRDGIEMEIKIPTYPEY-------ETSRAVIWMGAILQ 847 (955) T ss_pred --C-CCCCCEEEEECCEEEEEEHHHHH-H------HHHHEEEEECCCEEEEEECCCCCC-------CCCEEEEEEECCCC T ss_conf --1-13464899956767765022334-5------531204541581899982455511-------35458999703136 Q ss_pred ECCHHHCC-----EEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCCC Q ss_conf 28965715-----2007999606889789829998889998899993899999999998862599569999971776433 Q gi|254780700|r 400 DINDGNKK-----LVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQS 474 (489) Q Consensus 400 ~l~~~~~~-----~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~~ 474 (489) +....... ..++++.....+|||.. +|+.-..|++|||..+.+++||.+.+.+..++.-..+.. . T Consensus 848 ~ph~av~~q~edlp~gvyvt~rg~gspalq-~l~aa~fitavng~~t~~lddf~~~~~~ipdnsyv~v~~---------m 917 (955) T KOG1421 848 PPHSAVFEQVEDLPEGVYVTSRGYGSPALQ-MLRAAHFITAVNGHDTNTLDDFYHMLLEIPDNSYVQVKQ---------M 917 (955) T ss_pred CCHHHHHHHHHCCCCCEEEEECCCCCHHHH-HCCHHEEEEEECCCCCCCHHHHHHHHHHCCCCCEEEEEE---------E T ss_conf 843889998740677538851135886674-212000688735613676889999983279886389997---------2 Q ss_pred CCCCCEEEEEEECC Q ss_conf 46884368887528 Q gi|254780700|r 475 GNDNMSRFVSLKID 488 (489) Q Consensus 475 ~~~~~~rFVal~ld 488 (489) ...+.+--|+++.+ T Consensus 918 tfd~vp~~~s~k~n 931 (955) T KOG1421 918 TFDGVPSIVSVKPN 931 (955) T ss_pred CCCCCCEEEEECCC T ss_conf 23798468995568 No 11 >cd00987 PDZ_serine_protease PDZ domain of tryspin-like serine proteases, such as DegP/HtrA, which are oligomeric proteins involved in heat-shock response, chaperone function, and apoptosis. May be responsible for substrate recognition and/or binding, as most PDZ domains bind C-terminal polypeptides, though binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. 
In this subfamily of protease-associated PDZ domains a C-terminal beta-strand forms the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. Probab=99.73 E-value=3.9e-19 Score=140.62 Aligned_cols=90 Identities=40% Similarity=0.671 Sum_probs=87.0 Q ss_pred CCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEEC Q ss_conf 43320003421667644176444411320111112113467116788875243147874310122203566752010120 Q gi|254780700|r 283 GWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLC 362 (489) Q Consensus 283 g~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~ 362 (489) ||||+.++++++++++.++++...|++|++|.++|||++|||++||+|++|||++|.+..++...+...++|+++.++++ T Consensus 1 p~lGi~~~~l~~~~~~~~~~~~~~Gv~V~~V~~~spA~~aGl~~GDiI~~ing~~i~~~~~~~~~l~~~~~g~~v~~~v~ 80 (90) T cd00987 1 PWLGVTVQDLTPDLAEELGLKDTKGVLVASVDPGSPAAKAGLKPGDVILAVNGKPVKSVADLRRALAELKPGDKVTLTVL 80 (90) T ss_pred CCCCEEEEECCHHHHHHCCCCCCCEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHCCCCCEEEEEEE T ss_conf 94117987699999998498999779999989999599829999989999999993899999999982699987999999 Q ss_pred CCCCEEEECC Q ss_conf 4781665125 Q gi|254780700|r 363 KEGSKHSVAV 372 (489) Q Consensus 363 R~g~~~~~~V 372 (489) |+|+.++++| T Consensus 81 R~g~~~~~~v 90 (90) T cd00987 81 RGGKELTVTV 90 (90) T ss_pred ECCEEEEEEC T ss_conf 9999999789 No 12 >PRK10779 zinc metallopeptidase; Provisional Probab=99.67 E-value=1.4e-16 Score=124.39 Aligned_cols=156 Identities=16% Similarity=0.143 Sum_probs=90.1 Q ss_pred EEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCCCCCCCCCCCC Q ss_conf 32011111211346711678887524314787431012220356675201012047816651255655876310000124 Q gi|254780700|r 309 LITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSPTAKNDMHLEV 388 (489) Q Consensus 309 lV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~~~~~~~~~~ 388 (489) 
.|.+|.++|||++|||++||.|+++||+++.++.+....+...--.+.+.+.+.|.++.......+.-...... . . T Consensus 129 ~i~~v~~~s~a~~agl~~GD~i~~idg~~~~~~~~~~~~l~~~~g~~~~~i~v~~~~~~~~~~~~~~~~~~~~~--~--~ 204 (449) T PRK10779 129 VVGEIAPNSIAAQAQIAPGTELKAVDGIETPDWDAVRLQLVSKIGDEQTTVTVAPFGSDQRRDKTLDLRHWAFE--P--D 204 (449) T ss_pred EECCCCCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHCCCCCEEEEEECCCCCCEEEEEECCHHCCCC--C--C T ss_conf 00431468888873888887899989998576898899998850577606999407864113332020110246--5--4 Q ss_pred CCCCCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEE Q ss_conf 65452546987289657152007999606889789829998889998899993899999999998862599569999971 Q gi|254780700|r 389 GDKELLGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKY 468 (489) Q Consensus 389 ~~~~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r 468 (489) .......+.+.+..+. ..-++..+.++|||+++||++||.|++|||++|++++|+...+++. ..+.+.+.+.| T Consensus 205 ~~~~~~~lgi~~~~p~----~~~vi~~V~~~spA~~AGL~~GD~I~~Ing~~i~s~~~l~~~i~~~---~~~~i~l~v~R 277 (449) T PRK10779 205 KQDPVSSLGIRPRGPQ----IEPVLEEVQPNSAASKAGLQAGDRIVKVDGQPLTQWVTFVMLVRDN---PGKPLALEIER 277 (449) T ss_pred CCCCHHHCCCCCCCCC----CCCEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHC---CCCEEEEEEEE T ss_conf 3451122153336777----7743542079998997489888779999998716599999999868---99869999997 Q ss_pred CCCCCCC Q ss_conf 7764334 Q gi|254780700|r 469 DPDMQSG 475 (489) Q Consensus 469 ~~~~~~~ 475 (489) ++..... T Consensus 278 ~g~~~~~ 284 (449) T PRK10779 278 QGSPLSL 284 (449) T ss_pred CCCEEEE T ss_conf 8958999 No 13 >cd00991 PDZ_archaeal_metalloprotease PDZ domain of archaeal zinc metalloprotases, presumably membrane-associated or integral membrane proteases, which may be involved in signalling and regulatory mechanisms. May be responsible for substrate recognition and/or binding, as most PDZ domains bind C-terminal polypeptides, and binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. 
In this subfamily of protease-associated PDZ domains a C-terminal beta-strand forms the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. Probab=99.56 E-value=3.1e-16 Score=122.14 Aligned_cols=75 Identities=25% Similarity=0.423 Sum_probs=70.2 Q ss_pred HCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCC Q ss_conf 417644441132011111211346711678887524314787431012220356675201012047816651255 Q gi|254780700|r 299 PLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVV 373 (489) Q Consensus 299 ~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~ 373 (489) .|+++...|++|..|.++|||++|||++||+|++|||++|.+..||.+.+...+||++++++++|+|++++...+ T Consensus 3 ~l~~e~~~Gv~V~~V~~gsPA~~AGL~~GDVI~~Ing~~I~~~~d~~~~l~~~~pG~~v~v~v~R~g~~lT~~~~ 77 (79) T cd00991 3 VLSAEAVAGVVIVGVIVGSPAENAVLHTGDVIYSINGTPITTLEDFMEALKPTKPGEVITVTVLPSTTKLTNVST 77 (79) T ss_pred CCCCCCCCCEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHCCCCCCEEEEEEEECCEEEEEEEE T ss_conf 567445597799996789969986999888999989999879999999996189999899999989999777771 No 14 >cd00986 PDZ_LON_protease PDZ domain of ATP-dependent LON serine proteases. Most PDZ domains bind C-terminal polypeptides, though binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. In this bacterial subfamily of protease-associated PDZ domains a C-terminal beta-strand is thought to form the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. 
Probab=99.56 E-value=3.5e-16 Score=121.76 Aligned_cols=73 Identities=27% Similarity=0.447 Sum_probs=69.9 Q ss_pred CCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCC Q ss_conf 44113201111121134671167888752431478743101222035667520101204781665125565587 Q gi|254780700|r 305 TKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSP 378 (489) Q Consensus 305 ~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p 378 (489) ..|++|.+|.++|||+.+ |++||+|++|||++|.+..+|...+..+++||+|+++++|+|++++++++|+++| T Consensus 7 ~~Gv~V~~V~~gsPA~~~-Lk~GDvI~~vdGk~v~~~~~l~~~i~~~~~Gd~V~l~v~R~gk~~~~~vtL~~~P 79 (79) T cd00986 7 YHGVYVTSVVEGMPAAGK-LKAGDHIIAVDGKPFKEAEELIDYIQSKKEGDTVKLKVKREEKELPEDLILKTFP 79 (79) T ss_pred CCEEEEEEECCCCCHHHC-CCCCCEEEEECCEECCCHHHHHHHHHCCCCCCEEEEEEEECCEEEEEEEEEECCC T ss_conf 781899996799973770-7789999999998957999999999659999989999999999999999972489 No 15 >cd00990 PDZ_glycyl_aminopeptidase PDZ domain associated with archaeal and bacterial M61 glycyl-aminopeptidases. May be responsible for substrate recognition and/or binding, as most PDZ domains bind C-terminal polypeptides, and binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. In this subfamily of protease-associated PDZ domains a C-terminal beta-strand is presumed to form the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. Probab=99.45 E-value=9.1e-15 Score=112.77 Aligned_cols=79 Identities=23% Similarity=0.395 Sum_probs=69.2 Q ss_pred CCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEEC Q ss_conf 43320003421667644176444411320111112113467116788875243147874310122203566752010120 Q gi|254780700|r 283 GWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLC 362 (489) Q Consensus 283 g~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~ 362 (489) ||||+.++. 
...+++|++|.++|||++|||++||+|++|||..++++.++ +...++|++|+|++. T Consensus 1 P~lGi~~~~------------~~g~~~V~~V~~~sPA~~AGl~~GD~IvaidG~~v~~~~~~---~~~~~~G~~v~l~v~ 65 (80) T cd00990 1 PYLGLTLDK------------EEGLGKVTFVRDDSPADKAGLVAGDELVAVNGWRVDALQDR---LKEYQAGDPVELTVF 65 (80) T ss_pred CCCCEEEEC------------CCCCEEEEEECCCCHHHHCCCCCCCEEEEECCEEEHHHHHH---HHHCCCCCEEEEEEE T ss_conf 956669865------------69959999988899699859998999999999992378999---973699898999999 Q ss_pred CCCCEEEECCCCCC Q ss_conf 47816651255655 Q gi|254780700|r 363 KEGSKHSVAVVLGS 376 (489) Q Consensus 363 R~g~~~~~~V~l~~ 376 (489) |+|+.+++++||++ T Consensus 66 R~g~l~~~~vtL~~ 79 (80) T cd00990 66 RDDRLIEVPLTLAD 79 (80) T ss_pred ECCEEEEEEEEECC T ss_conf 99999998999359 No 16 >TIGR00054 TIGR00054 membrane-associated zinc metalloprotease, putative; InterPro: IPR004387 Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys and at least one other residue is required for catalysis, which may play an electrophillic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site . The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases . Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. 
Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This family contains putative zinc metallopeptidases belonging to MEROPS peptidase family M50 (S2P protease family, clan MM). The N-terminal region contains a perfectly conserved motif HEXGH, where the Glu is the active site and the His residues coordinate the metal cation. The family of bacterial and plant proteins also includes a region that hits the PDZ domain (IPR001478 from INTERPRO), found in a number of proteins targeted to the membrane by binding to a peptide ligand. The family includes EcfE, which is a homolog of human site-2 protease (S2P), a membrane-bound zinc metalloprotease involved in regulated intramembrane proteolysis. In Escherichia coli EcfE activates the sigma(E) pathway of stress response through a site-2 cleavage of anti-sigma(E), RseA.; GO: 0004222 metalloendopeptidase activity, 0006508 proteolysis, 0016021 integral to membrane.
Probab=99.41 E-value=1.8e-13 Score=104.45 Aligned_cols=157 Identities=19% Similarity=0.251 Sum_probs=112.2 Q ss_pred CCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECC----CCCEEEECCCCCCCCCC Q ss_conf 44113201111121134671167888752431478743101222035667520101204----78166512556558763 Q gi|254780700|r 305 TKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCK----EGSKHSVAVVLGSSPTA 380 (489) Q Consensus 305 ~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R----~g~~~~~~V~l~~~p~~ 380 (489) ..+-.|..+.++|.|.+|+|.+||.|+++|++.+.++.+++..+.....|+. .++|.+ ...+++.++.|.+.--. T Consensus 134 ~~~~vi~~~~~~S~a~~a~~~~Gd~il~~~~~~~~~f~~~~~~~~~~~~g~~-~~~I~~~PF~S~~e~~~~L~L~~~~~~ 212 (463) T TIGR00054 134 EVGPVIEELDKNSIALEAGIEPGDEILSVNGKKIPGFKDVRKQIADIVAGEP-MVEILAAPFNSDIEREVKLDLRNWTFE 212 (463) T ss_pred CCCCCCCCCCHHHHHHHHCCCCCCEEEEECCCCCCCCHHHHHHHHHHHCCCC-CCEEEECCCCCCCHHHCCCCCEEEEEE T ss_conf 1366456554457998711689847874077667880889999999751785-415776577754112000033123862 Q ss_pred CCCCCCCCCCCCCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCC Q ss_conf 10000124654525469872896571520079996068897898299988899988999938999999999988625995 Q gi|254780700|r 381 KNDMHLEVGDKELLGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRD 460 (489) Q Consensus 381 ~~~~~~~~~~~~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~ 460 (489) ... 
...+.-+-+.+..| ....+...+.++|+|++|||+.||.|++|||.+.++|.|+...+++ ...+ T Consensus 213 ~~~------~~~~~~lgl~~~~P----~ie~vl~~~~~N~~A~~AGLk~GD~I~~i~g~~l~~w~d~v~~v~~---np~~ 279 (463) T TIGR00054 213 VEK------EDAVEQLGLKPRGP----KIEPVLSDVTPNSPAEKAGLKEGDKIISIDGEKLKSWRDFVSLVKE---NPGK 279 (463) T ss_pred CCC------CCHHHHCCCCCCCC----CCEEEECCCCCCCHHHHCCCCCCCEEEEECCCCCCCHHHHHHHHHH---CCCC T ss_conf 112------55255414424787----2012331267885377534656888985568123442458999986---8995 Q ss_pred EEEEEEEECCCCCCC Q ss_conf 699999717764334 Q gi|254780700|r 461 SVLLQIKYDPDMQSG 475 (489) Q Consensus 461 ~VLL~V~r~~~~~~~ 475 (489) .+-+.|+|++..... T Consensus 280 ~~~i~v~R~G~~l~~ 294 (463) T TIGR00054 280 SLEIKVERNGETLSI 294 (463) T ss_pred EEEEEEEECCCEEEE T ss_conf 699999727814634 No 17 >TIGR02037 degP_htrA_DO protease Do; InterPro: IPR011782 Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes . They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence . Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases . Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base . The geometric orientations of the catalytic residues are similar between families, despite different protein folds . The linear arrangements of the catalytic residues commonly reflect clan relationships. 
For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This family consists of serine peptidases belonging to MEROPS peptidase family S1, subfamily S1C (protease Do, clan PA(S)). They are variously designated DegP, DegQ, heat shock protein HtrA, MucD and protease DO. The ortholog in Pseudomonas aeruginosa is designated MucD and is found in an operon that controls mucoid phenotype. This family also includes the DegQ (HhoA) paralog in Escherichia coli which can rescue a DegP mutant, but not the smaller DegS paralog, which cannot. Members of this family are located in the periplasm and have separable functions as both protease and chaperone.
Members have a trypsin domain and two copies of a PDZ domain. This protein protects bacteria from thermal and other stresses and may be important for the survival of bacterial pathogens . The chaperone function is dominant at low temperatures, whereas the proteolytic activity is turned on at elevated temperatures .; GO: 0004252 serine-type endopeptidase activity, 0006508 proteolysis. Probab=99.39 E-value=5.8e-14 Score=107.66 Aligned_cols=299 Identities=21% Similarity=0.283 Sum_probs=167.8 Q ss_pred CCCCCCCCCCCCCCCCHHHHHH-HHCCCCC----CCCCC-----CCCC----CCCCCCCEEE--EEECCCCEEEECHHCC Q ss_conf 5555445567888770025445-5301357----76666-----7544----5552234027--8975996298510104 Q gi|254780700|r 66 QMFNAYGFGNLPEDHPLKNYFR-KDFHKFF----SGEEP-----ILSD----TVERLMFGSG--FFITDDGYILTSNHIV 129 (489) Q Consensus 66 ~~~~~~~~~~~~~~~~~~~~~~-~~~~~~~----~~~~~-----~~~~----~~~~~~~GsG--~ii~~~G~ilTn~hvv 129 (489) ++....++...+.+.+|+.||. +.+..+. ..+.+ .... ..--.+.-.| .|+..+ ||+.++.-| T Consensus 39 ~~~~g~~fddf~Fd~~F~~FFg~~~~p~~~~~~~~~~~~~~g~~~~~~~~LGSGvIi~~d~Gk~YilTNn-HVv~gA~~I 117 (484) T TIGR02037 39 ENPGGSPFDDFEFDEFFDQFFGDDEMPNQPGGREFPQPEFVGERERKVRGLGSGVIISADKGKFYILTNN-HVVDGADEI 117 (484) T ss_pred CCCCCCCCCCCCCCCCHHHHCCCCCCCCCCCCCCCCCCCCCCCCCEEEEECCCCEEEECCCCEEEEEECC-EEECCCCEE T ss_conf 7777785556668741056648887888888888885200376403776414418984789869998754-363685379 Q ss_pred C----CCCEEEEEC--CCCEEEEECCCCCCCC-CCEEEEEEECCCCCC----CCCCCCCCCCCCCCEEE-----EECCC- Q ss_conf 7----871437962--8980674011123344-432899960676676----55655673111241467-----52366- Q gi|254780700|r 130 E----DGASFSVIL--SDDTELPAKLVGTDAL-FDLAVLKVQSDRKFI----PVEFEDANNIRVGEAVF-----TIGNP- 192 (489) Q Consensus 130 ~----~a~~i~V~~--~dg~~~~a~vvg~D~~-~DlAvlki~~~~~~~----~~~lg~s~~~~~G~~v~-----aiG~P- 192 (489) . +..++.-++ .|- +.+-=|+-.|.. .+|=.|++-+..+|. .+.+|+.-- .+|++|. 
|.|-- T Consensus 118 ~V~L~DgrefkAklvG~D~-~~D~AvlKi~~~D~~Lp~~~~GDSD~LrVGd~V~AIGNPFG-Nlg~TVT~GIVSAlgRs~ 195 (484) T TIGR02037 118 TVTLSDGREFKAKLVGKDP-RTDIAVLKIEAKDKKLPVVKLGDSDKLRVGDWVLAIGNPFG-NLGQTVTSGIVSALGRSG 195 (484) T ss_pred EEEECCCCEEEEEEECCCC-CEEEEEEEEECCCCCCCEEEECCCCCCEECCEEEEEECCCC-CCCCEEEEEEEEEECCCC T ss_conf 9994599485568866677-21389999827889745677348555224349999327742-458425788998321688 Q ss_pred CCCC------CC---C-------------------CCCCCCCCC----------------------------CCCCCCCC Q ss_conf 5531------11---1-------------------258744311----------------------------22334434 Q gi|254780700|r 193 FRLR------GT---V-------------------SAGIVSALD----------------------------RDIPDRPG 216 (489) Q Consensus 193 ~g~~------~t---v-------------------t~GiiSa~~----------------------------R~~~~~~~ 216 (489) .+.+ || + -+=|+|.-+ |-+. + T Consensus 196 ~~~~~y~~FIQTDAAINpGNSGGPLvN~~GEvIGINTaI~S~sGG~~GIGFAIP~n~a~~v~~ql~~~G~V~RG~L---G 272 (484) T TIGR02037 196 LGIGDYENFIQTDAAINPGNSGGPLVNLRGEVIGINTAIYSPSGGNVGIGFAIPSNMAKNVVDQLIEGGKVQRGWL---G 272 (484) T ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEEEEECCCCEECCHHHHHHHHHHHHHHHHHHCCEEEECEE---E T ss_conf 8877747650224233747887753567853888888876178881010102126889999999983892870215---1 Q ss_pred CE-EEE---E---------------EEEECCC--------CCCEEEECCCEEEEEECCCCCCCCCCCCCCCCCCCCCCC- Q ss_conf 20-233---2---------------3320134--------770354034303555123445532222223211233211- Q gi|254780700|r 217 TF-TQI---D---------------APINQGN--------SGGPCFNALGHVIGVNAMIVTSGQFHMGVGLIIPLSIIK- 268 (489) Q Consensus 217 ~~-iqt---D---------------a~InpGn--------SGGpl~n~~G~viGint~i~~~~g~~~GigfaIP~~~~~- 268 (489) -- ||. | |.|.||. 
.|=-++-.+|+-|---...-..= +.+..|=.++...+| T Consensus 273 V~~~q~~~~d~A~~lGl~~~~GALV~~V~~gSPA~kAGlk~GDvI~~~nGk~i~~~~~L~~~i-~~~~pG~~~~L~i~R~ 351 (484) T TIGR02037 273 VTQIQEITSDLAKSLGLEKQEGALVAQVLPGSPAEKAGLKAGDVILSVNGKKIKSFADLRRAI-GTLKPGKKVTLTILRK 351 (484) T ss_pred EEECCCCCHHHHHHHCCCCCCCEEEEEECCCCCHHCCCCCCCCEEEEECCEEECCHHHHHHHH-HCCCCCCEEEEEEEEC T ss_conf 010775797999970888536558885448970100675326689985886405879998987-4058987799999978 Q ss_pred -------CCCCCC--------C-----CCCC-----CCCCCCCCCCCCCHHHHHHHCCC-CC-CCCEEEECCCCCCCCCC Q ss_conf -------001000--------0-----2333-----33343320003421667644176-44-44113201111121134 Q gi|254780700|r 269 -------KAIPSL--------I-----SKGR-----VDHGWFGIMTQNLTQELAIPLGL-RG-TKGSLITAVVKESPADK 321 (489) Q Consensus 269 -------~i~~~l--------~-----~~g~-----v~rg~lGv~~~~v~~~la~~lgl-~~-~~GvlV~~V~~~sPA~~ 321 (489) =.+.+| . +... -..+|+|+.+.+|+++.++.|.. .. ..|++|+.|.++|||++ T Consensus 352 Gk~~~~~V~l~~Ld~~~a~~~~~~~~~~~~~~~~~~~~~~~~Gl~v~~L~~~~~~~l~~~~~~~~Gv~V~~v~~~s~Aa~ 431 (484) T TIGR02037 352 GKEKTITVTLGELDEKTASAAPEEKASSERSTEPGVGRLGFPGLSVANLTPEIAKKLLNLAGVSKGVVVTKVVSGSPAAR 431 (484) T ss_pred CEEEEEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEECCCCHHHHHHHCCCCCCCCCEEEEEECCCCHHHH T ss_conf 86888999981078500256645555554322653455322501623799899998713226777489997338888997 Q ss_pred CCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCC--EEEEECCCCCEEEEC Q ss_conf 67116788875243147874310122203566752--010120478166512 Q gi|254780700|r 322 AGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQ--VKISLCKEGSKHSVA 371 (489) Q Consensus 322 AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~--v~l~v~R~g~~~~~~ 371 (489) +|||+||||++||+++|++..||...|....-++. +.|.|+|++..+-+. 
T Consensus 432 ~Gl~~GDvI~~vN~~~V~s~~e~~~~l~~~~k~~~k~~~L~i~Rg~~~~~~~ 483 (484) T TIGR02037 432 AGLQPGDVILSVNQQPVSSVAELNKVLARAKKGGRKKVALLIERGGATIFVT 483 (484) T ss_pred CCCCCCCEEEEECCCCCCCHHHHHHHHHHHCCCCCEEEEEEEEECCEEEEEE T ss_conf 1787661899508801467899999999732887047999999878068976 No 18 >KOG3209 consensus Probab=99.27 E-value=3.9e-12 Score=95.97 Aligned_cols=154 Identities=19% Similarity=0.316 Sum_probs=92.9 Q ss_pred EECCCCCCCCCCCC-CHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEEC----------CCC---- Q ss_conf 20111112113467-116788875243147874310122203566752010120478166512----------556---- Q gi|254780700|r 310 ITAVVKESPADKAG-MKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVA----------VVL---- 374 (489) Q Consensus 310 V~~V~~~sPA~~AG-Lk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~----------V~l---- 374 (489) |.++.++|||++.| |++||.|++|||+.|.+.++-.-.-.-+..|-+|+|+|.-..+.-... .+. T Consensus 782 iGrIieGSPAdRCgkLkVGDrilAVNG~sI~~lsHadiv~LIKdaGlsVtLtIip~ee~~~~~~~~sa~~~s~~t~~~~~ 861 (984) T KOG3209 782 IGRIIEGSPADRCGKLKVGDRILAVNGQSILNLSHADIVSLIKDAGLSVTLTIIPPEEAGPPTSMTSAEKQSPFTQNGPY 861 (984) T ss_pred CCCCCCCCHHHHHCCCCCCCEEEEECCEEEECCCCHHHHHHHHHCCCEEEEEECCHHCCCCCCCCCCHHHCCCCCCCCCH T ss_conf 43125698167505543265688754703303672568888873685589997480104898777544115841114887 Q ss_pred -------CCCCCCCCCCCCC-------CC-----CC-----------CCCCEEEEECCHHHCCEEEEEEEEECCCCHHHH Q ss_conf -------5587631000012-------46-----54-----------525469872896571520079996068897898 Q gi|254780700|r 375 -------GSSPTAKNDMHLE-------VG-----DK-----------ELLGMVLQDINDGNKKLVRIVALNPNREREVEA 424 (489) Q Consensus 375 -------~~~p~~~~~~~~~-------~~-----~~-----------~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~ 424 (489) +.++......... .. +. 
.-+|++++ ....-...+.++....++||.+ T Consensus 862 ~q~~glp~~~~s~~~~~pqpdt~~~~~~~~r~~qn~~~~~VelErG~kGFGFSiR---GGreynM~LfVLRlAeDGPA~r 938 (984) T KOG3209 862 EQQYGLPGPRPSVYEEHPQPDTFQGLSINDRMSQNGDLYTVELERGAKGFGFSIR---GGREYNMDLFVLRLAEDGPAIR 938 (984) T ss_pred HHCCCCCCCCCCCCCCCCCCCCCCCEECCCCCCCCCCEEEEEEECCCCCCCEEEE---CCCCCCCCEEEEEECCCCCCCC T ss_conf 6706999997321233799752123102653435688268886116665514750---6643466448999436787212 Q ss_pred CC-CCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEE Q ss_conf 29-998889998899993899999999998862599569999971 Q gi|254780700|r 425 KG-IQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKY 468 (489) Q Consensus 425 ~G-L~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r 468 (489) .| ++.||-|++|||++.+++.. .++++-+|.+++. ++|..+| T Consensus 939 dGrm~VGDqi~eINGesTkgmtH-~rAIelIk~gg~~-vll~Lr~ 981 (984) T KOG3209 939 DGRMRVGDQITEINGESTKGMTH-DRAIELIKQGGRR-VLLLLRR 981 (984) T ss_pred CCCEEECCEEEEECCCCCCCCCH-HHHHHHHHHCCEE-EEEEECC T ss_conf 68314243678865844688727-8899998718838-9999626 No 19 >cd00989 PDZ_metalloprotease PDZ domain of bacterial and plant zinc metalloprotases, presumably membrane-associated or integral membrane proteases, which may be involved in signalling and regulatory mechanisms. May be responsible for substrate recognition and/or binding, as most PDZ domains bind C-terminal polypeptides, and binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. In this subfamily of protease-associated PDZ domains a C-terminal beta-strand forms the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. 
Probab=99.25 E-value=4.8e-13 Score=101.79 Aligned_cols=66 Identities=32% Similarity=0.544 Sum_probs=61.1 Q ss_pred CEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCC Q ss_conf 1132011111211346711678887524314787431012220356675201012047816651255 Q gi|254780700|r 307 GSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVV 373 (489) Q Consensus 307 GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~ 373 (489) +.+|.+|.++|||++|||++||+|++|||++|.++.++...+.. .+|++++++++|+|+.++++++ T Consensus 13 ~~vV~~V~~~spA~~AGl~~GD~I~~ing~~v~~~~~~~~~i~~-~~~~~i~l~v~R~g~~~~~~vt 78 (79) T cd00989 13 EPVIGEVVPGSPAAKAGLKAGDRILAINGQKIKSWEDLVDAVQE-NPGKPLTLTVERNGETITLTLT 78 (79) T ss_pred CCEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHH-CCCCEEEEEEEECCEEEEEEEE T ss_conf 99999989999899859999999999999995899999999985-8998899999999999899987 No 20 >pfam00089 Trypsin Trypsin. Probab=99.21 E-value=8.1e-11 Score=87.58 Aligned_cols=139 Identities=27% Similarity=0.381 Sum_probs=83.9 Q ss_pred CCEEEEEECCCCEEEECHHCCCCCCEEEEECCC-------CE--EEEE-CCCC---CCC--CCCEEEEEEECCC----CC Q ss_conf 340278975996298510104787143796289-------80--6740-1112---334--4432899960676----67 Q gi|254780700|r 109 MFGSGFFITDDGYILTSNHIVEDGASFSVILSD-------DT--ELPA-KLVG---TDA--LFDLAVLKVQSDR----KF 169 (489) Q Consensus 109 ~~GsG~ii~~~G~ilTn~hvv~~a~~i~V~~~d-------g~--~~~a-~vvg---~D~--~~DlAvlki~~~~----~~ 169 (489) -..+|++|+++ ||||++|.+.+....+|.+.. +. .++. +++- +++ ..||||||++.+- .. 
T Consensus 25 ~~C~GtLIs~~-~VLTaAhCv~~~~~~~v~~g~~~~~~~~~~~~~~~v~~i~~hp~~~~~~~~DiAll~L~~~v~~~~~v 103 (218) T pfam00089 25 HFCGGSLISEN-WVLTAAHCVSNAKSVRVVLGAHNIVLREGGEQKFDVEKIIVHPNYNPDTLNDIALLKLKSPVTLGDTV 103 (218) T ss_pred EEEEEEEEECC-EEEECHHHCCCCCCCEEEEEECCCCCCCCCCEEEEEEEEEECCCCCCCCCCCEEEECCCCCEECCCCE T ss_conf 89899997199-99979455689987389992033455799838999999998887678876545723034624835745 Q ss_pred CCCCCCCC-CCCCCCCEEEEECCCC----CCCC---CCCCCCCCCCC--CCCCC-CCCCEEEEEE---EEECCCCCCEEE Q ss_conf 65565567-3111241467523665----5311---11258744311--22334-4342023323---320134770354 Q gi|254780700|r 170 IPVEFEDA-NNIRVGEAVFTIGNPF----RLRG---TVSAGIVSALD--RDIPD-RPGTFTQIDA---PINQGNSGGPCF 235 (489) Q Consensus 170 ~~~~lg~s-~~~~~G~~v~aiG~P~----g~~~---tvt~GiiSa~~--R~~~~-~~~~~iqtDa---~InpGnSGGpl~ 235 (489) .++.|.++ ....+|+.+.+.|... +... +++.-+++... +.... -..+++.+++ .+.+|+|||||+ T Consensus 104 ~pi~l~~~~~~~~~g~~~~~~Gwg~~~~~~~~~~l~~~~~~i~~~~~C~~~~~~~~~~~~~Ca~~~~~~~c~GDsGgPl~ 183 (218) T pfam00089 104 RPICLPTASSDLPVGTTCTVSGWGNTKTLGLPDTLQEVTVPVVSRETCRSAYGGTVTDTMICAGAGGKDACQGDSGGPLV 183 (218) T ss_pred EEEECCCCCCCCCCCCEEEEEEECCCCCCCCCCEEEEEEEEEECHHHHHHHCCCCCCCCCEEECCCCCCCCCCCCCCEEE T ss_conf 67774766566569989999970766889944066676777708999864247976866587357998788884488149 Q ss_pred ECCCEEEEEECCC Q ss_conf 0343035551234 Q gi|254780700|r 236 NALGHVIGVNAMI 248 (489) Q Consensus 236 n~~G~viGint~i 248 (489) +.+++++||.|.- T Consensus 184 ~~~~~lvGi~S~g 196 (218) T pfam00089 184 CSDGELIGIVSWG 196 (218) T ss_pred ECCCEEEEEEEEC T ss_conf 2499899999858 No 21 >cd00987 PDZ_serine_protease PDZ domain of tryspin-like serine proteases, such as DegP/HtrA, which are oligomeric proteins involved in heat-shock response, chaperone function, and apoptosis. May be responsible for substrate recognition and/or binding, as most PDZ domains bind C-terminal polypeptides, though binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. 
In this subfamily of protease-associated PDZ domains a C-terminal beta-strand forms the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. Probab=99.13 E-value=5.1e-10 Score=82.49 Aligned_cols=80 Identities=20% Similarity=0.305 Sum_probs=68.2 Q ss_pred CCCCEEEEECCHHHCC------EEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEE Q ss_conf 5254698728965715------2007999606889789829998889998899993899999999998862599569999 Q gi|254780700|r 392 ELLGMVLQDINDGNKK------LVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQ 465 (489) Q Consensus 392 ~~lGl~v~~l~~~~~~------~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~ 465 (489) .++|+.+.++++..+. ..|+++..+.++|||+++||++||+|++||+++|.+..||.+++++.+. +..+.+. T Consensus 1 p~lGi~~~~l~~~~~~~~~~~~~~Gv~V~~V~~~spA~~aGl~~GDiI~~ing~~i~~~~~~~~~l~~~~~--g~~v~~~ 78 (90) T cd00987 1 PWLGVTVQDLTPDLAEELGLKDTKGVLVASVDPGSPAAKAGLKPGDVILAVNGKPVKSVADLRRALAELKP--GDKVTLT 78 (90) T ss_pred CCCCEEEEECCHHHHHHCCCCCCCEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHCCC--CCEEEEE T ss_conf 94117987699999998498999779999989999599829999989999999993899999999982699--9879999 Q ss_pred EEECCCCC Q ss_conf 97177643 Q gi|254780700|r 466 IKYDPDMQ 473 (489) Q Consensus 466 V~r~~~~~ 473 (489) +.|++... T Consensus 79 v~R~g~~~ 86 (90) T cd00987 79 VLRGGKEL 86 (90) T ss_pred EEECCEEE T ss_conf 99999999 No 22 >KOG3209 consensus Probab=99.13 E-value=7.9e-11 Score=87.65 Aligned_cols=160 Identities=18% Similarity=0.252 Sum_probs=96.1 Q ss_pred CCEEEECCCCCCCCCCCC-CHHHHHHHHHCCCCCCCCCC--EEEEECCCCCCCCEEEEECCCCCEEEEC--CCCCCCCCC Q ss_conf 411320111112113467-11678887524314787431--0122203566752010120478166512--556558763 Q gi|254780700|r 306 KGSLITAVVKESPADKAG-MKVGDVICMLDGRIIKSHQD--FVWQIASRSPKEQVKISLCKEGSKHSVA--VVLGSSPTA 380 (489) Q Consensus 306 ~GvlV~~V~~~sPA~~AG-Lk~GDvI~~ing~~I~~~~~--l~~~i~~~~~G~~v~l~v~R~g~~~~~~--V~l~~~p~~ 380 (489) +-++|..+.+.+.|++-| |+.||.|+.|||.+|...++ ...++....-...|.|+|.|.-..-.-. 
-.....+.. T Consensus 674 qpi~iG~Iv~lGaAe~DGRL~~gDElv~iDG~pV~GksH~~vv~Lm~~AArnghV~LtVRRkv~~~~~~rsp~~s~~~~~ 753 (984) T KOG3209 674 QPIYIGAIVPLGAAEEDGRLREGDELVCIDGIPVEGKSHSEVVDLMEAAARNGHVNLTVRRKVRTGPARRSPRNSAAPSG 753 (984) T ss_pred CEEEEEEEEECCCCCCCCCCCCCCEEEEECCEECCCCCHHHHHHHHHHHHHCCCEEEEEEEEEEECCCCCCCCCCCCCCC T ss_conf 80587456544553445764678727986580336733889999999997559468999620231554578655668889 Q ss_pred CCCCCCCCCCCCCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCC-CCCCCEEEEECCEECCCHHHHHHHHHHHHHCCC Q ss_conf 1000012465452546987289657152007999606889789829-998889998899993899999999998862599 Q gi|254780700|r 381 KNDMHLEVGDKELLGMVLQDINDGNKKLVRIVALNPNREREVEAKG-IQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKR 459 (489) Q Consensus 381 ~~~~~~~~~~~~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~G-L~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~ 459 (489) .-+........+-+|+.+ ++...+...+ +-.+.++|||+++| |++||.|++|||+.|.++.. ..+++-+|+.+ T Consensus 754 ~yDV~lhR~ENeGFGFVi--~sS~~kp~sg--iGrIieGSPAdRCgkLkVGDrilAVNG~sI~~lsH-adiv~LIKdaG- 827 (984) T KOG3209 754 PYDVVLHRKENEGFGFVI--MSSQNKPESG--IGRIIEGSPADRCGKLKVGDRILAVNGQSILNLSH-ADIVSLIKDAG- 827 (984) T ss_pred CEEEEEECCCCCCEEEEE--EECCCCCCCC--CCCCCCCCHHHHHCCCCCCCEEEEECCEEEECCCC-HHHHHHHHHCC- T ss_conf 700698604677602899--8436689877--43125698167505543265688754703303672-56888887368- Q ss_pred CEEEEEEEECCC Q ss_conf 569999971776 Q gi|254780700|r 460 DSVLLQIKYDPD 471 (489) Q Consensus 460 ~~VLL~V~r~~~ 471 (489) .+|.|+|-.... 
T Consensus 828 lsVtLtIip~ee 839 (984) T KOG3209 828 LSVTLTIIPPEE 839 (984) T ss_pred CEEEEEECCHHC T ss_conf 558999748010 No 23 >PRK10779 zinc metallopeptidase; Provisional Probab=99.11 E-value=6.1e-12 Score=94.74 Aligned_cols=70 Identities=26% Similarity=0.498 Sum_probs=63.0 Q ss_pred EEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCC Q ss_conf 13201111121134671167888752431478743101222035667520101204781665125565587 Q gi|254780700|r 308 SLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSP 378 (489) Q Consensus 308 vlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p 378 (489) .+|.+|.|+|||++||||+||.|++|||++|.++.|+...+.. .+|++++++++|+|+..+++++....+ T Consensus 223 ~vi~~V~~~spA~~AGL~~GD~I~~Ing~~i~s~~~l~~~i~~-~~~~~i~l~v~R~g~~~~~~v~p~~~~ 292 (449) T PRK10779 223 PVLEEVQPNSAASKAGLQAGDRIVKVDGQPLTQWVTFVMLVRD-NPGKPLALEIERQGSPLSLTLIPDTKP 292 (449) T ss_pred CEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHH-CCCCEEEEEEEECCCEEEEEEEEEEEC T ss_conf 4354207999899748988877999999871659999999986-899869999997895899999640351 No 24 >PRK10139 serine endoprotease; Provisional Probab=99.09 E-value=1.2e-09 Score=80.04 Aligned_cols=81 Identities=15% Similarity=0.172 Sum_probs=64.5 Q ss_pred CCCCCEEEEECCHHHCCE------EEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEE Q ss_conf 452546987289657152------00799960688978982999888999889999389999999999886259956999 Q gi|254780700|r 391 KELLGMVLQDINDGNKKL------VRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLL 464 (489) Q Consensus 391 ~~~lGl~v~~l~~~~~~~------~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL 464 (489) ..++|+.+++++++..+. 
.|+++.++.++|||+++||++||+|+++||++|++..||...+...+.+ ..+-| T Consensus 266 rg~LGv~~~~lt~~~a~~~gl~~~~GalV~~V~~~sPA~kAGLk~GDVI~~vnG~~V~~~~dL~~~v~~~~pG--~~v~l 343 (455) T PRK10139 266 RGLLGIKGTEMSADIAKAFNLDVQRGAFVSEVLPNSGSAKAGVKSGDIITSLNGKPLNSFAELRSRIATTEPG--TKVKL 343 (455) T ss_pred CCEEEEEEEECCHHHHHHCCCCCCCCCEEEEECCCCCHHHCCCCCCCEEEEECCEECCCHHHHHHHHHCCCCC--CEEEE T ss_conf 3156688765265566541677777735665447883687699999999998998968999999999608988--88999 Q ss_pred EEEECCCCC Q ss_conf 997177643 Q gi|254780700|r 465 QIKYDPDMQ 473 (489) Q Consensus 465 ~V~r~~~~~ 473 (489) .|.|++... T Consensus 344 ~v~R~Gk~~ 352 (455) T PRK10139 344 GLLRNGKPL 352 (455) T ss_pred EEEECCEEE T ss_conf 999999799 No 25 >PRK10942 serine endoprotease; Provisional Probab=99.04 E-value=2.2e-09 Score=78.48 Aligned_cols=81 Identities=17% Similarity=0.191 Sum_probs=65.4 Q ss_pred CCCCCEEEEECCHHHCC------EEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEE Q ss_conf 45254698728965715------200799960688978982999888999889999389999999999886259956999 Q gi|254780700|r 391 KELLGMVLQDINDGNKK------LVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLL 464 (489) Q Consensus 391 ~~~lGl~v~~l~~~~~~------~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL 464 (489) ..++|+..++++++... ..|+++..+.++|||+++||++||+|+++||++|++..||...+...+.+ ..+-| T Consensus 288 rg~lGv~~~~v~~~la~~lgl~~~~GalV~~V~~~sPA~kAGL~~GDVI~~vdG~~I~~~~dL~~~v~~~~pG--~~V~l 365 (474) T PRK10942 288 RGELGIMGTELNSELAKAMKVDAQRGAFVSQVLPNSSAAKAGIKAGDVITSLNGKPISSFAALRAQVGTMPVG--SKMTL 365 (474) T ss_pred CEEEEEEEEECCHHHHHHCCCCCCCCCEEEECCCCCCHHHCCCCCCCEEEEECCEECCCHHHHHHHHHCCCCC--CEEEE T ss_conf 1033159885372567761777677726520177993677699989999998998968999999999618988--88999 Q ss_pred EEEECCCCC Q ss_conf 997177643 Q gi|254780700|r 465 QIKYDPDMQ 473 (489) Q Consensus 465 ~V~r~~~~~ 473 (489) .|.|++... 
T Consensus 366 ~v~R~Gk~~ 374 (474) T PRK10942 366 GLLRDGKPV 374 (474) T ss_pred EEEECCEEE T ss_conf 999999899 No 26 >cd00988 PDZ_CTP_protease PDZ domain of C-terminal processing-, tail-specific-, and tricorn proteases, which function in posttranslational protein processing, maturation, and disassembly or degradation, in Bacteria, Archaea, and plant chloroplasts. May be responsible for substrate recognition and/or binding, as most PDZ domains bind C-terminal polypeptides, and binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. In this subfamily of protease-associated PDZ domains a C-terminal beta-strand forms the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. Probab=99.03 E-value=2.9e-11 Score=90.45 Aligned_cols=68 Identities=32% Similarity=0.563 Sum_probs=53.7 Q ss_pred CCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCC--CCEEEEECCCCCCCCEEEEECC-CCCEEEECCC Q ss_conf 4411320111112113467116788875243147874--3101222035667520101204-7816651255 Q gi|254780700|r 305 TKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSH--QDFVWQIASRSPKEQVKISLCK-EGSKHSVAVV 373 (489) Q Consensus 305 ~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~--~~l~~~i~~~~~G~~v~l~v~R-~g~~~~~~V~ 373 (489) ..|++|.+|.++|||++|||++||+|++|||+++.+. .++...+. 
-++|+.|+|++.| +++.++++++ T Consensus 12 ~~~~~V~~v~~gsPA~~aGl~~GD~I~~Vng~~v~~~~~~~~~~~lr-g~~Gt~V~l~v~R~~~~~~~~~l~ 82 (85) T cd00988 12 DGGLVITSVLPGSPAAKAGIKAGDIIVAIDGEPVDGLSLEDVVKLLR-GKAGTKVRLTLKRGDGEPREVTLT 82 (85) T ss_pred CCEEEEEEECCCCHHHHHCCCCCCEEEEECCEECCCCCHHHHHHHHC-CCCCCEEEEEEECCCCCEEEEEEE T ss_conf 99899999689995898089999999999999978999999999865-999988999999099989999999 No 27 >PRK10898 serine endoprotease; Provisional Probab=99.02 E-value=4.6e-09 Score=76.42 Aligned_cols=87 Identities=16% Similarity=0.181 Sum_probs=66.3 Q ss_pred CCCCEEEEECCHHHCC------EEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEE Q ss_conf 5254698728965715------2007999606889789829998889998899993899999999998862599569999 Q gi|254780700|r 392 ELLGMVLQDINDGNKK------LVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQ 465 (489) Q Consensus 392 ~~lGl~v~~l~~~~~~------~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~ 465 (489) .++|+..+++++.... ..++++..+.++|||+++||++||+|+++||++|.+..|+.+.+.+.+.+ ..+-|. T Consensus 257 g~LGi~~~~~~~~~~~~~~~~~~~Gv~V~~V~~~sPA~~AGL~~GDvI~~idg~~v~~~~~l~~~l~~~~pG--d~v~l~ 334 (355) T PRK10898 257 GYIGIGGREIAPLHAQGGGIDQLQGIVVNEVSPDGPAANAGIQVNDLIISVNNKPAISALETMDQVAEIRPG--SVIPVV 334 (355) T ss_pred CEEEEEEEECCHHHHHHCCCCCCCCCEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHCCCC--CEEEEE T ss_conf 242477443798899654898777528988799995898599989999998998938999999999718997--989999 Q ss_pred EEECCCCCCCCCCCEEEEEEECC Q ss_conf 97177643346884368887528 Q gi|254780700|r 466 IKYDPDMQSGNDNMSRFVSLKID 488 (489) Q Consensus 466 V~r~~~~~~~~~~~~rFVal~ld 488 (489) |.|++.. +-+.++|. 
T Consensus 335 v~R~G~~--------~~~~VtL~ 349 (355) T PRK10898 335 VMRDDKQ--------LTLQVTIQ 349 (355) T ss_pred EEECCEE--------EEEEEEEC T ss_conf 9999999--------99999978 No 28 >COG0793 Prc Periplasmic protease [Cell envelope biogenesis, outer membrane] Probab=98.99 E-value=4.1e-10 Score=83.14 Aligned_cols=83 Identities=30% Similarity=0.456 Sum_probs=61.5 Q ss_pred CCCCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCC--CEEEEECCCCCCCCEE Q ss_conf 33433200034216676441764444113201111121134671167888752431478743--1012220356675201 Q gi|254780700|r 281 DHGWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQ--DFVWQIASRSPKEQVK 358 (489) Q Consensus 281 ~rg~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~--~l~~~i~~~~~G~~v~ 358 (489) +.+++|+.++.-+ ..++.|.++.++|||++|||++||+|++|||+++.... +....+ .-++|+.|+ T Consensus 98 ~~~GiG~~i~~~~-----------~~~~~V~s~~~~~PA~kagi~~GD~I~~IdG~~~~~~~~~~av~~i-rG~~Gt~V~ 165 (406) T COG0793 98 EFGGIGIELQMED-----------IGGVKVVSPIDGSPAAKAGIKPGDVIIKIDGKSVGGVSLDEAVKLI-RGKPGTKVT 165 (406) T ss_pred ECCCCEEEEEEEC-----------CCCCEEECCCCCCCHHHHCCCCCCEEEEECCEECCCCCHHHHHHHH-CCCCCCEEE T ss_conf 0146038999843-----------7982695068899267608998888999899976677777899972-689997568 Q ss_pred EEECCCCCEEEECCCCC Q ss_conf 01204781665125565 Q gi|254780700|r 359 ISLCKEGSKHSVAVVLG 375 (489) Q Consensus 359 l~v~R~g~~~~~~V~l~ 375 (489) |++.|.+....+.+++. T Consensus 166 L~i~r~~~~k~~~v~l~ 182 (406) T COG0793 166 LTILRAGGGKPFTVTLT 182 (406) T ss_pred EEEEECCCCCEEEEEEE T ss_conf 99996689953689999 No 29 >TIGR00054 TIGR00054 membrane-associated zinc metalloprotease, putative; InterPro: IPR004387 Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. 
The known metal ligands are His, Glu, Asp or Lys and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases. Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
This family contains putative zinc metallopeptidases belonging to MEROPS peptidase family M50 (S2P protease family, clan MM). The N-terminal region of contains a perfectly conserved motif HEXGH, where the Glu is the active site and the His residues coordinate the metal cation. The family of bacterial and plant proteins also includes a region that hits the PDZ domain (IPR001478 from INTERPRO), found in a number of proteins targeted to the membrane by binding to a peptide ligand . The family includes EcfE, which is a homolog of human site-2 protease (S2P), a membrane-bound zinc metalloprotease involved in regulated intramembrane proteolysis. In Escherichia coli EcfE activates the sigma(E) pathway of stress response through a site-2 cleavage of anti-sigma(E), RseA.; GO: 0004222 metalloendopeptidase activity, 0006508 proteolysis, 0016021 integral to membrane. Probab=98.96 E-value=8.3e-11 Score=87.52 Aligned_cols=71 Identities=31% Similarity=0.543 Sum_probs=64.0 Q ss_pred EEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCCC Q ss_conf 132011111211346711678887524314787431012220356675201012047816651255655876 Q gi|254780700|r 308 SLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSPT 379 (489) Q Consensus 308 vlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~ 379 (489) ..+.+|.++|||++||||+||.|+++||.+.++|.|+...+.. .|++.+.+++.|+|+..+.+++....++ T Consensus 233 ~vl~~~~~N~~A~~AGLk~GD~I~~i~g~~l~~w~d~v~~v~~-np~~~~~i~v~R~G~~l~~~l~p~~~~~ 303 (463) T TIGR00054 233 PVLSDVTPNSPAEKAGLKEGDKIISIDGEKLKSWRDFVSLVKE-NPGKSLEIKVERNGETLSISLTPEAKKD 303 (463) T ss_pred EEECCCCCCCHHHHCCCCCCCEEEEECCCCCCCHHHHHHHHHH-CCCCEEEEEEEECCCEEEEEEEEEEECC T ss_conf 2331267885377534656888985568123442458999986-8995699999727814634787530079 No 30 >cd00991 PDZ_archaeal_metalloprotease PDZ domain of archaeal zinc metalloprotases, presumably membrane-associated or integral membrane proteases, which may be involved in signalling and regulatory mechanisms. 
May be responsible for substrate recognition and/or binding, as most PDZ domains bind C-terminal polypeptides, and binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. In this subfamily of protease-associated PDZ domains a C-terminal beta-strand forms the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. Probab=98.94 E-value=4.5e-09 Score=76.49 Aligned_cols=66 Identities=12% Similarity=0.175 Sum_probs=57.7 Q ss_pred EEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCCCC Q ss_conf 20079996068897898299988899988999938999999999988625995699999717764334 Q gi|254780700|r 408 LVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQSG 475 (489) Q Consensus 408 ~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~~~ 475 (489) ..|+++..+.++|||+++||++||+|++|||++|.+++||.+++...+.+ ..+-+.|.|++...+. T Consensus 9 ~~Gv~V~~V~~gsPA~~AGL~~GDVI~~Ing~~I~~~~d~~~~l~~~~pG--~~v~v~v~R~g~~lT~ 74 (79) T cd00991 9 VAGVVIVGVIVGSPAENAVLHTGDVIYSINGTPITTLEDFMEALKPTKPG--EVITVTVLPSTTKLTN 74 (79) T ss_pred CCCEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHCCCCC--CEEEEEEEECCEEEEE T ss_conf 59779999678996998699988899998999987999999999618999--9899999989999777 No 31 >KOG3834 consensus Probab=98.82 E-value=3.3e-08 Score=70.94 Aligned_cols=151 Identities=19% Similarity=0.209 Sum_probs=97.1 Q ss_pred CCCCCEEEECCCCCCCCCCCCCHH-HHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCCCCC Q ss_conf 444411320111112113467116-7888752431478743101222035667520101204781665125565587631 Q gi|254780700|r 303 RGTKGSLITAVVKESPADKAGMKV-GDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSPTAK 381 (489) Q Consensus 303 ~~~~GvlV~~V~~~sPA~~AGLk~-GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~~~ 381 (489) ....|..|.+|.++|||.+|||.+ -|.|++|||..++.-.|....+..... ++|+++++-......-.+.+.. 
T Consensus 12 ggteg~hvlkVqedSpa~~aglepffdFIvSI~g~rL~~dnd~Lk~llk~~s-ekVkltv~n~kt~~~R~v~I~p----- 85 (462) T KOG3834 12 GGTEGYHVLKVQEDSPAHKAGLEPFFDFIVSINGIRLNKDNDTLKALLKANS-EKVKLTVYNSKTQEVRIVEIVP----- 85 (462) T ss_pred CCCEEEEEEEEECCCHHHHCCCCHHHHHHHEECCCCCCCCHHHHHHHHHHCC-CCEEEEEEECCCCEEEEEEECC----- T ss_conf 7730478998633786775675034444100174002676589999887424-1217998855542367898424----- Q ss_pred CCCCCCCCCCCCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCC-CCCEEEEECCEECCCHHHHHHHHHHHHHCCCC Q ss_conf 00001246545254698728965715200799960688978982999-88899988999938999999999988625995 Q gi|254780700|r 382 NDMHLEVGDKELLGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQ-KGMTIVSVNTHEVSCIKDVERLIGKAKEKKRD 460 (489) Q Consensus 382 ~~~~~~~~~~~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~-~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~ 460 (489) ...+...++|+.++=.+..........+.++.++|+|+.+||+ -+|.|+.+-..-....+||..+++.. .++ T Consensus 86 ----s~~wggqllGvsvrFcsf~~A~~~vwHvl~V~p~SPaalAgl~~~~DYivG~~~~~~~~~eDl~~lIesh---e~k 158 (462) T KOG3834 86 ----SNNWGGQLLGVSVRFCSFDGAVESVWHVLSVEPNSPAALAGLRPYTDYIVGIWDAVMHEEEDLFTLIESH---EGK 158 (462) T ss_pred ----CCCCCCCCCCEEEEECCCCCCHHHEEEEEECCCCCHHHHCCCCCCCCEEECCHHHHCCCHHHHHHHHHHC---CCC T ss_conf ----3123454013588741576531112344643899878850553365357435455234157899999860---278 Q ss_pred EEEEEE Q ss_conf 699999 Q gi|254780700|r 461 SVLLQI 466 (489) Q Consensus 461 ~VLL~V 466 (489) .+-|.| T Consensus 159 pLklyV 164 (462) T KOG3834 159 PLKLYV 164 (462) T ss_pred CCCEEE T ss_conf 741467 No 32 >cd00989 PDZ_metalloprotease PDZ domain of bacterial and plant zinc metalloprotases, presumably membrane-associated or integral membrane proteases, which may be involved in signalling and regulatory mechanisms. May be responsible for substrate recognition and/or binding, as most PDZ domains bind C-terminal polypeptides, and binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. 
In this subfamily of protease-associated PDZ domains a C-terminal beta-strand forms the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. Probab=98.75 E-value=6.2e-08 Score=69.22 Aligned_cols=61 Identities=11% Similarity=0.192 Sum_probs=52.6 Q ss_pred EEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCC Q ss_conf 0799960688978982999888999889999389999999999886259956999997177643 Q gi|254780700|r 410 RIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQ 473 (489) Q Consensus 410 gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~ 473 (489) ..++..+.++|+|+++||++||+|++|||+++.+++|+.+++.+. ....+.+.|+|++... T Consensus 13 ~~vV~~V~~~spA~~AGl~~GD~I~~ing~~v~~~~~~~~~i~~~---~~~~i~l~v~R~g~~~ 73 (79) T cd00989 13 EPVIGEVVPGSPAAKAGLKAGDRILAINGQKIKSWEDLVDAVQEN---PGKPLTLTVERNGETI 73 (79) T ss_pred CCEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHC---CCCEEEEEEEECCEEE T ss_conf 999999899998998599999999999999958999999999858---9988999999999998 No 33 >TIGR02860 spore_IV_B stage IV sporulation protein B; InterPro: IPR014219 Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes . They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence . Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases . Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. 
Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
SpoIVB, the stage IV sporulation protein B of endospore-forming bacteria such as Bacillus subtilis, is a serine proteinase expressed in the spore (rather than mother cell) compartment, that participates in a proteolytic activation cascade for Sigma-K. It appears to be universal among endospore-forming bacteria and occurs nowhere else. The members of this entry belong to MEROPS peptidase family S55 (SpoIVB peptidase, clan PA).. Probab=98.73 E-value=1.8e-09 Score=78.97 Aligned_cols=58 Identities=26% Similarity=0.490 Sum_probs=53.0 Q ss_pred CCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCC-CCCCEEEEECCCCCEEEECCCC Q ss_conf 211346711678887524314787431012220356-6752010120478166512556 Q gi|254780700|r 317 SPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRS-PKEQVKISLCKEGSKHSVAVVL 374 (489) Q Consensus 317 sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~-~G~~v~l~v~R~g~~~~~~V~l 374 (489) |||+.||||.||+|++|||++|++..|+...+.... .|+.++|+|.|+++..+..+.- T Consensus 142 sPg~~AGi~~GD~I~~iNg~~i~~~~d~~~~i~~~g~~g~~l~l~i~R~~~~i~~~~~p 200 (423) T TIGR02860 142 SPGEEAGIQIGDIILKINGEKIKNMEDIAKLINKAGKTGEKLKLTIKRGGKIIETKIKP 200 (423) T ss_pred CCHHHCCEEEEEEEEEECCCHHCCHHHHHHHHHHHHHCCCEEEEEEEECCCEEEEEEEE T ss_conf 63654784561089998881103534568888754305954899998589089986613 No 34 >smart00228 PDZ Domain present in PSD-95, Dlg, and ZO-1/2. Also called DHR (Dlg homologous region) or GLGF (relatively well conserved tetrapeptide in these domains). Some PDZs have been shown to bind C-terminal polypeptides; others appear to bind internal (non-C-terminal) polypeptides. Different PDZs possess different binding specificities. 
Probab=98.73 E-value=9.8e-10 Score=80.69 Aligned_cols=59 Identities=29% Similarity=0.445 Sum_probs=31.6 Q ss_pred CCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCC Q ss_conf 41132011111211346711678887524314787431012220356675201012047 Q gi|254780700|r 306 KGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKE 364 (489) Q Consensus 306 ~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~ 364 (489) .|++|..|.++|||+++||++||+|++|||+++.+..+..........++.+.|++.|. T Consensus 26 ~gv~I~~v~~~s~A~~~Gl~~GD~I~~vng~~v~~~~~~~~~~~~~~~~~~v~l~v~r~ 84 (85) T smart00228 26 GGVVVSSVVPGSPAAKAGLKVGDVILEVNGTSVEGLTHLEAVDLLKKAGGKVTLTVLRG 84 (85) T ss_pred CCEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCCCHHHHHHHHHCCCCEEEEEEEEC T ss_conf 98999998799947876898999999999999899989999999877999799999949 No 35 >cd00986 PDZ_LON_protease PDZ domain of ATP-dependent LON serine proteases. Most PDZ domains bind C-terminal polypeptides, though binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. In this bacterial subfamily of protease-associated PDZ domains a C-terminal beta-strand is thought to form the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. Probab=98.68 E-value=1.3e-07 Score=67.12 Aligned_cols=70 Identities=17% Similarity=0.190 Sum_probs=57.3 Q ss_pred EEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCCCCCCCCEEEEEEEC Q ss_conf 20079996068897898299988899988999938999999999988625995699999717764334688436888752 Q gi|254780700|r 408 LVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQSGNDNMSRFVSLKI 487 (489) Q Consensus 408 ~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~~~~~~~~rFVal~l 487 (489) ..|+++..+.++|||+.+ |++||+|++|||++|++.+||.+++...+.+ ..+-|.+.|++.. 
.-+.++| T Consensus 7 ~~Gv~V~~V~~gsPA~~~-Lk~GDvI~~vdGk~v~~~~~l~~~i~~~~~G--d~V~l~v~R~gk~--------~~~~vtL 75 (79) T cd00986 7 YHGVYVTSVVEGMPAAGK-LKAGDHIIAVDGKPFKEAEELIDYIQSKKEG--DTVKLKVKREEKE--------LPEDLIL 75 (79) T ss_pred CCEEEEEEECCCCCHHHC-CCCCCEEEEECCEECCCHHHHHHHHHCCCCC--CEEEEEEEECCEE--------EEEEEEE T ss_conf 781899996799973770-7789999999998957999999999659999--9899999999999--------9999997 Q ss_pred C Q ss_conf 8 Q gi|254780700|r 488 D 488 (489) Q Consensus 488 d 488 (489) . T Consensus 76 ~ 76 (79) T cd00986 76 K 76 (79) T ss_pred E T ss_conf 2 No 36 >PRK11186 carboxy-terminal protease; Provisional Probab=98.66 E-value=2.2e-08 Score=72.07 Aligned_cols=60 Identities=28% Similarity=0.408 Sum_probs=46.2 Q ss_pred CCEEEECCCCCCCCCCCC-CHHHHHHHHH--CCCCCCCCC-----CEEEEECCCCCCCCEEEEECCCCC Q ss_conf 411320111112113467-1167888752--431478743-----101222035667520101204781 Q gi|254780700|r 306 KGSLITAVVKESPADKAG-MKVGDVICML--DGRIIKSHQ-----DFVWQIASRSPKEQVKISLCKEGS 366 (489) Q Consensus 306 ~GvlV~~V~~~sPA~~AG-Lk~GDvI~~i--ng~~I~~~~-----~l~~~i~~~~~G~~v~l~v~R~g~ 366 (489) ..+.|.++.|||||+++| |++||+|++| +++++.+.. +....|. 
-+.|++|.|+|+|.++ T Consensus 257 ~~~~Iv~~i~GgPA~k~g~L~~gD~Ii~V~q~~~~~~dviG~~lddvV~lIR-G~kGT~V~L~I~r~~~ 324 (673) T PRK11186 257 DYTVIKSLVAGGPAAKSKKLSVGDKIVGVGQDGKEIVDVIGWRLDDVVALIK-GPKGSKVRLEILPAGK 324 (673) T ss_pred CEEEEEEECCCCHHHHHCCCCCCCEEEEECCCCCCCCCCCCCCHHHHHHHHC-CCCCCEEEEEEEECCC T ss_conf 9899997068995887389998999998257898742023765999999853-8998879999997888 No 37 >KOG3580 consensus Probab=98.63 E-value=3.8e-08 Score=70.55 Aligned_cols=75 Identities=15% Similarity=0.334 Sum_probs=47.8 Q ss_pred HHHHCCCCCCCCEEEECCCCCCCCCCCC-CHHHHHHHHHCCCCCCCCC--CEEEEECCCCCCCCEEEEECCCCCEEEECC Q ss_conf 7644176444411320111112113467-1167888752431478743--101222035667520101204781665125 Q gi|254780700|r 296 LAIPLGLRGTKGSLITAVVKESPADKAG-MKVGDVICMLDGRIIKSHQ--DFVWQIASRSPKEQVKISLCKEGSKHSVAV 372 (489) Q Consensus 296 la~~lgl~~~~GvlV~~V~~~sPA~~AG-Lk~GDvI~~ing~~I~~~~--~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V 372 (489) -.|.+||.-..-++|.++...|-|++-| |+.||+|++|||..-.+++ |-+.+|.. ...++.|.|+|+....-+.+ T Consensus 209 ~nEEyGlrLgSqIFvKeit~~gLAardgnlqEGDiiLkINGtvteNmSLtDar~LIEk--S~GKL~lvVlRD~~qtLiNi 286 (1027) T KOG3580 209 ANEEYGLRLGSQIFVKEITRTGLAARDGNLQEGDIILKINGTVTENMSLTDARKLIEK--SRGKLQLVVLRDSQQTLINI 286 (1027) T ss_pred CCHHHCCCCCCHHHHHHHCCCCHHHCCCCCCCCCEEEEECCEEECCCCCHHHHHHHHH--CCCCEEEEEEECCCCEEEEC T ss_conf 5555440100212343330231111248865563799977674034440567889874--36736899993278515506 No 38 >cd00136 PDZ PDZ domain, also called DHR (Dlg homologous region) or GLGF (after a conserved sequence motif). Many PDZ domains bind C-terminal polypeptides, though binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. Heterodimerization through PDZ-PDZ domain interactions adds to the domain's versatility, and PDZ domain-mediated interactions may be modulated dynamically through target phosphorylation. Some PDZ domains play a role in scaffolding supramolecular complexes. PDZ domains are found in diverse signaling proteins in bacteria, archebacteria, and eurkayotes. 
This CD contains two distinct structural subgroups with either a N- or C-terminal beta-strand forming the peptide-binding groove base. The circular permutation placing the strand on the N-terminus appears to be found in Eumetazoa only, while the C-terminal variant is found in all three kingdoms of life, and seems to co-occur with protease domains. PDZ domains have been named after PSD95(pos Probab=98.62 E-value=3.3e-09 Score=77.34 Aligned_cols=54 Identities=30% Similarity=0.515 Sum_probs=28.9 Q ss_pred CEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCC--CEEEEECCCCCCCCEEEEE Q ss_conf 113201111121134671167888752431478743--1012220356675201012 Q gi|254780700|r 307 GSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQ--DFVWQIASRSPKEQVKISL 361 (489) Q Consensus 307 GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~--~l~~~i~~~~~G~~v~l~v 361 (489) +++|..|.++|||++|||++||+|++|||+++.++. +....+.. .+|++++|++ T Consensus 14 ~i~V~~v~~~spA~~aGL~~GD~I~~ing~~v~~~~~~~~~~~l~~-~~g~~v~l~v 69 (70) T cd00136 14 GVVVLSVEPGSPAERAGLQAGDVILAVNGTDVKNLTLEDVAELLKK-EVGEKVTLTV 69 (70) T ss_pred CEEEEECCCCCHHHHCCCCCCCEEEEECCEECCCCCHHHHHHHHCC-CCCCEEEEEE T ss_conf 8999980998979987999899999999999689989999999628-9879799998 No 39 >KOG3605 consensus Probab=98.59 E-value=1.6e-08 Score=72.98 Aligned_cols=124 Identities=19% Similarity=0.320 Sum_probs=86.7 Q ss_pred EEEECCCCCCCCCCCC-CHHHHHHHHHCCCCCCCC--CCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCCCCCCCC Q ss_conf 1320111112113467-116788875243147874--3101222035667520101204781665125565587631000 Q gi|254780700|r 308 SLITAVVKESPADKAG-MKVGDVICMLDGRIIKSH--QDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSPTAKNDM 384 (489) Q Consensus 308 vlV~~V~~~sPA~~AG-Lk~GDvI~~ing~~I~~~--~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~~~~~~ 384 (489) |+|.....++||+++| |..||.|++|||..+-.. 
+.-+.+|...+--+.|+|+|.+---..++.+ .+|+ T Consensus 675 VViAnmm~~GpAarsgkLnIGDQiiaING~SLVGLPLstcQs~Ik~~KnQT~VkltiV~cpPV~~V~I---~RPd----- 746 (829) T KOG3605 675 VVIANMMHGGPAARSGKLNIGDQIMSINGTSLVGLPLSTCQSIIKGLKNQTAVKLNIVSCPPVTTVLI---RRPD----- 746 (829) T ss_pred HHHHHCCCCCHHHHCCCCCCCCEEEEECCCEECCCCHHHHHHHHHCCCCCCEEEEEEECCCCCEEEEE---ECCC----- T ss_conf 99875136771654387663222576447211066079999998615554058887761898347885---0665----- Q ss_pred CCCCCCCCCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHH Q ss_conf 01246545254698728965715200799960688978982999888999889999389999999999886 Q gi|254780700|r 385 HLEVGDKELLGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAK 455 (489) Q Consensus 385 ~~~~~~~~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k 455 (489) .+.-+|+.|++ | ++.....++.|++.|+|+|..|++|||+.|--.- ..+++..+. T Consensus 747 -----~kyQLGFSVQN---------G-iICSLlRGGIAERGGVRVGHRIIEINgQSVVA~p-HekIV~lLs 801 (829) T KOG3605 747 -----LRYQLGFSVQN---------G-IICSLLRGGIAERGGVRVGHRIIEINGQSVVATP-HEKIVQLLS 801 (829) T ss_pred -----CHHHCCCEEEC---------C-EEEHHHCCCCHHCCCCEEEEEEEEECCCEEEECC-HHHHHHHHH T ss_conf -----13110614307---------6-7510320551001670110068998793677562-899999998 No 40 >cd00988 PDZ_CTP_protease PDZ domain of C-terminal processing-, tail-specific-, and tricorn proteases, which function in posttranslational protein processing, maturation, and disassembly or degradation, in Bacteria, Archaea, and plant chloroplasts. May be responsible for substrate recognition and/or binding, as most PDZ domains bind C-terminal polypeptides, and binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. In this subfamily of protease-associated PDZ domains a C-terminal beta-strand forms the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. 
Probab=98.59 E-value=5.8e-07 Score=63.00 Aligned_cols=82 Identities=22% Similarity=0.222 Sum_probs=58.3 Q ss_pred CCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCC Q ss_conf 54698728965715200799960688978982999888999889999389999999999886259956999997177643 Q gi|254780700|r 394 LGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQ 473 (489) Q Consensus 394 lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~ 473 (489) +|+.+... .-++.+..+.++|||+++||++||+|++|||+++.+. .+.++.+.++......|.|.|+|+. T Consensus 4 iGi~~~~~------~~~~~V~~v~~gsPA~~aGl~~GD~I~~Vng~~v~~~-~~~~~~~~lrg~~Gt~V~l~v~R~~--- 73 (85) T cd00988 4 IGLELKYD------DGGLVITSVLPGSPAAKAGIKAGDIIVAIDGEPVDGL-SLEDVVKLLRGKAGTKVRLTLKRGD--- 73 (85) T ss_pred EEEEEEEE------CCEEEEEEECCCCHHHHHCCCCCCEEEEECCEECCCC-CHHHHHHHHCCCCCCEEEEEEECCC--- T ss_conf 99999997------9989999968999589808999999999999997899-9999999865999988999999099--- Q ss_pred CCCCCCEEEEEEECCC Q ss_conf 3468843688875289 Q gi|254780700|r 474 SGNDNMSRFVSLKIDK 489 (489) Q Consensus 474 ~~~~~~~rFVal~ldk 489 (489) +..+=+.|.-+| T Consensus 74 ----~~~~~~~l~R~k 85 (85) T cd00988 74 ----GEPREVTLTRLK 85 (85) T ss_pred ----CCEEEEEEEECC T ss_conf ----989999999897 No 41 >COG3975 Predicted protease with the C-terminal PDZ domain [General function prediction only] Probab=98.56 E-value=2.8e-08 Score=71.45 Aligned_cols=65 Identities=31% Similarity=0.454 Sum_probs=40.3 Q ss_pred CEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCCC Q ss_conf 1132011111211346711678887524314787431012220356675201012047816651255655876 Q gi|254780700|r 307 GSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSPT 379 (489) Q Consensus 307 GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~ 379 (489) +.+|+.|.++|||++|||.+||.|++|||. ...+.+.++++.+++.+.|.+..+++.|+++..+. 
T Consensus 463 ~~~i~~V~~~gPA~~AGl~~Gd~ivai~G~--------s~~l~~~~~~d~i~v~~~~~~~L~e~~v~~~~~~~ 527 (558) T COG3975 463 HEKITFVFPGGPAYKAGLSPGDKIVAINGI--------SDQLDRYKVNDKIQVHVFREGRLREFLVKLGGDPT 527 (558) T ss_pred EEEEEECCCCCHHHHCCCCCCCEEEEECCC--------CCCCCCCCCCCCEEEEECCCCCEEEEECCCCCCCC T ss_conf 069984478981675158875679997673--------55522144266248998257823885213688766 No 42 >pfam00595 PDZ PDZ domain (Also known as DHR or GLGF). PDZ domains are found in diverse signaling proteins. Probab=98.55 E-value=9.8e-09 Score=74.31 Aligned_cols=56 Identities=20% Similarity=0.383 Sum_probs=27.1 Q ss_pred CCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEE Q ss_conf 41132011111211346711678887524314787431012220356675201012 Q gi|254780700|r 306 KGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISL 361 (489) Q Consensus 306 ~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v 361 (489) .|++|.+|.|+|||+++||++||+|++|||+++.+..+..........++.++|+| T Consensus 24 ~~~~V~~V~~~~~A~~~gL~~GD~Il~VNg~~v~~~~~~~~~~~l~~~~~~v~L~V 79 (80) T pfam00595 24 PGIFVSEVLPGGAAEAGGLQVGDRILSINGQDLENMSHDEAVLALKGSGGEVTLTI 79 (80) T ss_pred CCEEEEEECCCCCHHHCCCCCCCEEEEECCEECCCCCHHHHHHHHHCCCCEEEEEE T ss_conf 89899997789805548799999999999999899989999999974999299998 No 43 >cd00990 PDZ_glycyl_aminopeptidase PDZ domain associated with archaeal and bacterial M61 glycyl-aminopeptidases. May be responsible for substrate recognition and/or binding, as most PDZ domains bind C-terminal polypeptides, and binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. In this subfamily of protease-associated PDZ domains a C-terminal beta-strand is presumed to form the peptide-binding groove base, a circular permutation with respect to PDZ domains found in Eumetazoan signaling proteins. 
Probab=98.50 E-value=9.3e-07 Score=61.71 Aligned_cols=71 Identities=20% Similarity=0.243 Sum_probs=54.6 Q ss_pred CCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCC Q ss_conf 52546987289657152007999606889789829998889998899993899999999998862599569999971776 Q gi|254780700|r 392 ELLGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPD 471 (489) Q Consensus 392 ~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~ 471 (489) +++|++++. ..-.+.+..|.++|||+++||.+||.|++|||.++++++++ ++..+ .+..|.|.+.|++. T Consensus 1 P~lGi~~~~------~~g~~~V~~V~~~sPA~~AGl~~GD~IvaidG~~v~~~~~~---~~~~~--~G~~v~l~v~R~g~ 69 (80) T cd00990 1 PYLGLTLDK------EEGLGKVTFVRDDSPADKAGLVAGDELVAVNGWRVDALQDR---LKEYQ--AGDPVELTVFRDDR 69 (80) T ss_pred CCCCEEEEC------CCCCEEEEEECCCCHHHHCCCCCCCEEEEECCEEEHHHHHH---HHHCC--CCCEEEEEEEECCE T ss_conf 956669865------69959999988899699859998999999999992378999---97369--98989999999999 Q ss_pred CC Q ss_conf 43 Q gi|254780700|r 472 MQ 473 (489) Q Consensus 472 ~~ 473 (489) .. T Consensus 70 l~ 71 (80) T cd00990 70 LI 71 (80) T ss_pred EE T ss_conf 99 No 44 >TIGR02038 protease_degS periplasmic serine peptidase DegS; InterPro: IPR011783 Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes . They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence . Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases . Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. 
Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This family consists of the periplasmic serine protease DegS (HhoB). They belong to MEROPS peptidase family S1, subfamily S1C (protease Do, clan PA(S)).
They are a shorter paralogs of protease Do (HtrA, DegP) and DegQ (HhoA). They are found in Escherichia coli and several of the gammaproteobacteria. DegS contains a trypsin domain and a single copy of PDZ domain (in contrast to DegP with two copies). A critical role of this DegS is to sense stress by detecting misfolded proteins in the periplasm. DegS then cleaves the periplasmic domain of RseA, a transmembrane protein and inhibitor of sigmaE, activating the sigmaE-driven expression of periplasmic proteases/chaperones , , .; GO: 0004252 serine-type endopeptidase activity, 0006508 proteolysis. Probab=98.49 E-value=4.3e-07 Score=63.85 Aligned_cols=78 Identities=19% Similarity=0.208 Sum_probs=62.2 Q ss_pred CCEEEEECCHHHCC------EEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEE Q ss_conf 54698728965715------200799960688978982999888999889999389999999999886259956999997 Q gi|254780700|r 394 LGMVLQDINDGNKK------LVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIK 467 (489) Q Consensus 394 lGl~v~~l~~~~~~------~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~ 467 (489) +|..-+++++-..+ .-||+++++.|++|||++||+++|+|+++|++++.+.+++-+.+.+.++++ .|++.|- T Consensus 264 ~Gv~G~~I~s~~~~~lg~~~l~Givv~~vdPnGPAA~Ag~l~~Dvilk~dg~~~~g~~~~md~vA~~~PG~--~v~~tvl 341 (358) T TIGR02038 264 IGVDGEDINSLVAQGLGLEDLRGIVVTGVDPNGPAARAGILVRDVILKVDGKEVIGAEELMDRVAETRPGS--KVLVTVL 341 (358) T ss_pred EECCCCCCCHHHHHHCCCCCCCEEEEECCCCCCHHHHHCCCCCCEEEEECCCCCCCHHHHHHHHHCCCCCC--EEEEEEE T ss_conf 50287036626664078752240788534898767650677155789867953675655455543179997--7899997 Q ss_pred ECCCCC Q ss_conf 177643 Q gi|254780700|r 468 YDPDMQ 473 (489) Q Consensus 468 r~~~~~ 473 (489) |.+..+ T Consensus 342 R~Gk~l 347 (358) T TIGR02038 342 RKGKQL 347 (358) T ss_pred CCCCEE T ss_conf 069678 No 45 >cd00190 Tryp_SPc Trypsin-like serine protease; Many of these are synthesized as inactive precursor zymogens that are cleaved during limited proteolysis to generate their active forms. 
Alignment contains also inactive enzymes that have substitutions of the catalytic triad residues. Probab=98.40 E-value=6.1e-06 Score=56.51 Aligned_cols=136 Identities=23% Similarity=0.335 Sum_probs=73.6 Q ss_pred CCEEEEEECCCCEEEECHHCCCC--CCEEEEECCC---------CEEEEEC-CCC------CCCCCCEEEEEEECCC--- Q ss_conf 34027897599629851010478--7143796289---------8067401-112------3344432899960676--- Q gi|254780700|r 109 MFGSGFFITDDGYILTSNHIVED--GASFSVILSD---------DTELPAK-LVG------TDALFDLAVLKVQSDR--- 167 (489) Q Consensus 109 ~~GsG~ii~~~G~ilTn~hvv~~--a~~i~V~~~d---------g~~~~a~-vvg------~D~~~DlAvlki~~~~--- 167 (489) ...+|.+|+++ ||||++|.+.+ ...+.|.+-. +..+..+ ++- .....||||||++.+- T Consensus 25 ~~CgGtLIs~~-~VLTAAHCv~~~~~~~~~V~~G~~~~~~~~~~~~~~~v~~i~~Hp~y~~~~~~~DIALl~L~~~v~~~ 103 (232) T cd00190 25 HFCGGSLISPR-WVLTAAHCVYSSAPSNYTVRLGSHDLSSNEGGGQVIKVKKVIVHPNYNPSTYDNDIALLKLKRPVTLS 103 (232) T ss_pred EEEEEEEECCC-EEEECHHHCCCCCCCCEEEEEEEEECCCCCCCCEEEEEEEEEECCCCCCCCCCCCEEEEECCCCEEEC T ss_conf 99999995399-99989666789998657999956345888999689999999989988888877758877057732822 Q ss_pred -CCCCCCCCC-CCCCCCCCEEEEECCCCCCCC----------CCCCCCCCCCC--CCCC---CCCCCEEEE-----EEEE Q ss_conf -676556556-731112414675236655311----------11258744311--2233---443420233-----2332 Q gi|254780700|r 168 -KFIPVEFED-ANNIRVGEAVFTIGNPFRLRG----------TVSAGIVSALD--RDIP---DRPGTFTQI-----DAPI 225 (489) Q Consensus 168 -~~~~~~lg~-s~~~~~G~~v~aiG~P~g~~~----------tvt~GiiSa~~--R~~~---~~~~~~iqt-----Da~I 225 (489) ...|+.|.. ...+..|+.+.+.|. |... .+..-+++... +... .....+|-+ +... 
T Consensus 104 ~~v~picLp~~~~~~~~~~~~~~~Gw--G~~~~~~~~~~~L~~~~~~v~~~~~C~~~~~~~~~~~~~~iCa~~~~~~~~~ 181 (232) T cd00190 104 DNVRPICLPSSGYNLPAGTTCTVSGW--GRTSEGGPLPDVLQEVNVPIVSNAECKRAYSYGGTITDNMLCAGGLEGGKDA 181 (232) T ss_pred CCCCEEECCCCCCCCCCCCEEEEECC--CCCCCCCCCCCEEEEEEEEECCHHHHHHHHCCCCCCCCCEEEECCCCCCCCC T ss_conf 56741677986666679978998532--5436898778855899998768999867635688658873772747999740 Q ss_pred ECCCCCCEEEEC---CCEEEEEECC Q ss_conf 013477035403---4303555123 Q gi|254780700|r 226 NQGNSGGPCFNA---LGHVIGVNAM 247 (489) Q Consensus 226 npGnSGGpl~n~---~G~viGint~ 247 (489) -.|.|||||+-. .+.++||.|. T Consensus 182 C~GDsGgPL~~~~~~~~~l~Gi~S~ 206 (232) T cd00190 182 CQGDSGGPLVCNDNGRGVLVGIVSW 206 (232) T ss_pred CCCCCCCEEEEEECCCEEEEEEEEE T ss_conf 4340389479978991999999998 No 46 >smart00228 PDZ Domain present in PSD-95, Dlg, and ZO-1/2. Also called DHR (Dlg homologous region) or GLGF (relatively well conserved tetrapeptide in these domains). Some PDZs have been shown to bind C-terminal polypeptides; others appear to bind internal (non-C-terminal) polypeptides. Different PDZs possess different binding specificities. Probab=98.39 E-value=2.6e-06 Score=58.83 Aligned_cols=74 Identities=19% Similarity=0.144 Sum_probs=55.1 Q ss_pred CCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECC Q ss_conf 5254698728965715200799960688978982999888999889999389999999999886259956999997177 Q gi|254780700|r 392 ELLGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDP 470 (489) Q Consensus 392 ~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~ 470 (489) ..+|+++...... ..++++..+.++|+|+++||++||+|++|||.++.+..+ .+++..++.. ...+.|.+.|+. 
T Consensus 12 ~~~G~~~~~~~~~---~~gv~I~~v~~~s~A~~~Gl~~GD~I~~vng~~v~~~~~-~~~~~~~~~~-~~~v~l~v~r~~ 85 (85) T smart00228 12 GGLGFSLVGGKDE---GGGVVVSSVVPGSPAAKAGLKVGDVILEVNGTSVEGLTH-LEAVDLLKKA-GGKVTLTVLRGG 85 (85) T ss_pred CCCCEEEEECCCC---CCCEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCCCH-HHHHHHHHCC-CCEEEEEEEECC T ss_conf 9668899841578---998999998799947876898999999999999899989-9999998779-997999999496 No 47 >KOG3580 consensus Probab=98.39 E-value=2.6e-07 Score=65.24 Aligned_cols=60 Identities=22% Similarity=0.298 Sum_probs=47.2 Q ss_pred CCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCC--CEEEEECCCCCCCCEEEEECCC Q ss_conf 44113201111121134671167888752431478743--1012220356675201012047 Q gi|254780700|r 305 TKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQ--DFVWQIASRSPKEQVKISLCKE 364 (489) Q Consensus 305 ~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~--~l~~~i~~~~~G~~v~l~v~R~ 364 (489) .-|++|..|.++|||++-||+.||.|++||..+..+.. +-...|....+|+.|++--++. T Consensus 428 DVGIFVaGvqegspA~~eGlqEGDQIL~VN~vdF~nl~REeAVlfLL~lPkGEevtilaQ~k 489 (1027) T KOG3580 428 DVGIFVAGVQEGSPAEQEGLQEGDQILKVNTVDFRNLVREEAVLFLLELPKGEEVTILAQSK 489 (1027) T ss_pred CEEEEEEECCCCCCHHHCCCCCCCEEEEECCCCCHHHHHHHHHHHHHCCCCCCEEEEHHHHH T ss_conf 23588741126883011130003626775363301042788899986289976776134356 No 48 >cd00136 PDZ PDZ domain, also called DHR (Dlg homologous region) or GLGF (after a conserved sequence motif). Many PDZ domains bind C-terminal polypeptides, though binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. Heterodimerization through PDZ-PDZ domain interactions adds to the domain's versatility, and PDZ domain-mediated interactions may be modulated dynamically through target phosphorylation. Some PDZ domains play a role in scaffolding supramolecular complexes. PDZ domains are found in diverse signaling proteins in bacteria, archebacteria, and eurkayotes. This CD contains two distinct structural subgroups with either a N- or C-terminal beta-strand forming the peptide-binding groove base. 
The circular permutation placing the strand on the N-terminus appears to be found in Eumetazoa only, while the C-terminal variant is found in all three kingdoms of life, and seems to co-occur with protease domains. PDZ domains have been named after PSD95(pos Probab=98.34 E-value=3.2e-06 Score=58.28 Aligned_cols=68 Identities=24% Similarity=0.373 Sum_probs=50.6 Q ss_pred CCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEE Q ss_conf 54698728965715200799960688978982999888999889999389999999999886259956999997 Q gi|254780700|r 394 LGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIK 467 (489) Q Consensus 394 lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~ 467 (489) +|+.+....+ .++++..+.++|||+++||++||.|++|||+++.++. +.++.+.++......+.|.|+ T Consensus 3 lG~~l~~~~~-----~~i~V~~v~~~spA~~aGL~~GD~I~~ing~~v~~~~-~~~~~~~l~~~~g~~v~l~v~ 70 (70) T cd00136 3 LGFSIRGGTE-----GGVVVLSVEPGSPAERAGLQAGDVILAVNGTDVKNLT-LEDVAELLKKEVGEKVTLTVR 70 (70) T ss_pred EEEEEEECCC-----CCEEEEECCCCCHHHHCCCCCCCEEEEECCEECCCCC-HHHHHHHHCCCCCCEEEEEEC T ss_conf 3189996698-----9899998099897998799989999999999968998-999999962898797999989 No 49 >smart00020 Tryp_SPc Trypsin-like serine protease. Many of these are synthesised as inactive precursor zymogens that are cleaved during limited proteolysis to generate their active forms. A few, however, are active as single chain molecules, and others are inactive due to substitutions of the catalytic triad residues. 
Probab=98.31 E-value=8.2e-06 Score=55.67 Aligned_cols=136 Identities=22% Similarity=0.280 Sum_probs=73.7 Q ss_pred CCEEEEEECCCCEEEECHHCCCCC--CEEEEECC------C--CEEEEEC-CCC------CCCCCCEEEEEEECCCC--- Q ss_conf 340278975996298510104787--14379628------9--8067401-112------33444328999606766--- Q gi|254780700|r 109 MFGSGFFITDDGYILTSNHIVEDG--ASFSVILS------D--DTELPAK-LVG------TDALFDLAVLKVQSDRK--- 168 (489) Q Consensus 109 ~~GsG~ii~~~G~ilTn~hvv~~a--~~i~V~~~------d--g~~~~a~-vvg------~D~~~DlAvlki~~~~~--- 168 (489) -..+|.+|+++ ||||++|.+.+. ..+.|.+- + ...+..+ ++- .....||||||++.+-. T Consensus 26 ~~CgGsLIs~~-~VLTAAhCv~~~~~~~~~V~~G~~~~~~~~~~~~~~v~~i~~Hp~y~~~~~~nDIALl~L~~~v~~~~ 104 (229) T smart00020 26 HFCGGSLISPR-WVLTAAHCVYGSDPSNIRVRLGSHDLSSGEEGQVIKVSKVIIHPNYNPSTYDNDIALLKLKSPVTLSD 104 (229) T ss_pred EEEEEEEEECC-EEEECCEECCCCCCCCEEEEEEEEECCCCCCCEEEEEEEEEECCCCCCCCCCCCEEEEECCCCCCCCC T ss_conf 89899997399-99969461368888757999744374579985899999999898998988757589994587618025 Q ss_pred -CCCCCCCC-CCCCCCCCEEEEECCCCCCCC-----------CCCCCCCCCCC--CCCCC---CCCCEEEE-----EEEE Q ss_conf -76556556-731112414675236655311-----------11258744311--22334---43420233-----2332 Q gi|254780700|r 169 -FIPVEFED-ANNIRVGEAVFTIGNPFRLRG-----------TVSAGIVSALD--RDIPD---RPGTFTQI-----DAPI 225 (489) Q Consensus 169 -~~~~~lg~-s~~~~~G~~v~aiG~P~g~~~-----------tvt~GiiSa~~--R~~~~---~~~~~iqt-----Da~I 225 (489) ..|+.|.. ...+..|+.+.+.|. |... .+..-++|... +.... ....+|-+ +... 
T Consensus 105 ~v~picLp~~~~~~~~~~~~~v~Gw--G~~~~~~~~~~~~L~~~~~~v~~~~~C~~~~~~~~~~~~~~iCa~~~~~~~~~ 182 (229) T smart00020 105 NVRPICLPSSNYNVPAGTTCTVSGW--GRTSEGAGSLPDTLQEVNVPIVSNATCRRAYSGGGAITDNMLCAGGLEGGKDA 182 (229) T ss_pred CEEEEECCCCCCCCCCCCEEEEECC--CCCCCCCCCCCCEEEEEEEEECCHHHHHHHCCCCCCCCCCCCCCCCCCCCCCC T ss_conf 2677655886665569987998411--53028888888643799998848899877626888657664822568999654 Q ss_pred ECCCCCCEEEECC--CEEEEEECC Q ss_conf 0134770354034--303555123 Q gi|254780700|r 226 NQGNSGGPCFNAL--GHVIGVNAM 247 (489) Q Consensus 226 npGnSGGpl~n~~--G~viGint~ 247 (489) -.|.|||||+-.+ +.++||.|. T Consensus 183 C~GDsGgPL~~~~~~~~l~Gi~S~ 206 (229) T smart00020 183 CQGDSGGPLVCNDGRWVLVGIVSW 206 (229) T ss_pred CCCCCCCCEEEECCCEEEEEEEEE T ss_conf 356668705998992999999988 No 50 >TIGR03279 cyano_FeS_chp putative FeS-containing Cyanobacterial-specific oxidoreductase. Members of this protein family are predicted FeS-containing oxidoreductases of unknown function, apparently restricted to and universal across the Cyanobacteria. The high trusted cutoff score for this model, 700 bits, excludes homologs from other lineages. This exclusion seems justified because a significant number of sequence positions are simultaneously unique to and invariant across the Cyanobacteria, suggesting a specialized, conserved function, perhaps related to photosynthesis. A distantly related protein family, TIGR03278, in universal in and restricted to archaeal methanogens, and may be linked to methanogenesis. Probab=98.31 E-value=1.4e-07 Score=66.92 Aligned_cols=38 Identities=26% Similarity=0.463 Sum_probs=18.1 Q ss_pred EECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEE Q ss_conf 20111112113467116788875243147874310122 Q gi|254780700|r 310 ITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQ 347 (489) Q Consensus 310 V~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~ 347 (489) |.+|.|+|+|+++||++||.|++|||+++++.-|++.. 
T Consensus 2 I~~V~pgSiA~e~Gie~GD~llsING~~i~DiiDy~f~ 39 (433) T TIGR03279 2 ISAVLPGSIAEELGFEPGDALVSINGVAPRDLIDYQFL 39 (433) T ss_pred EEEECCCCHHHHHCCCCCCEEEEECCCCCCCCEEEEEC T ss_conf 41577999789838999988998899455551434112 No 51 >COG0793 Prc Periplasmic protease [Cell envelope biogenesis, outer membrane] Probab=98.29 E-value=7e-06 Score=56.12 Aligned_cols=19 Identities=32% Similarity=0.352 Sum_probs=9.4 Q ss_pred CCCHHHHCCCCCCCEEEEEC Q ss_conf 88978982999888999889 Q gi|254780700|r 418 REREVEAKGIQKGMTIVSVN 437 (489) Q Consensus 418 ~~s~Aa~~GL~~GDiIl~VN 437 (489) .+..-...|+.+ |+.+..- T Consensus 351 ~G~~i~~~GI~P-DI~v~~~ 369 (406) T COG0793 351 SGRSIEGKGITP-DIEVPQA 369 (406) T ss_pred CCCCCCCCCCCC-CEEECCC T ss_conf 995323557589-8761467 No 52 >COG0265 DegQ Trypsin-like serine proteases, typically periplasmic, contain C-terminal PDZ domain [Posttranslational modification, protein turnover, chaperones] Probab=98.24 E-value=2e-05 Score=53.23 Aligned_cols=85 Identities=19% Similarity=0.266 Sum_probs=62.8 Q ss_pred CCEEEEECCHHHC----CEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEEC Q ss_conf 5469872896571----520079996068897898299988899988999938999999999988625995699999717 Q gi|254780700|r 394 LGMVLQDINDGNK----KLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYD 469 (489) Q Consensus 394 lGl~v~~l~~~~~----~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~ 469 (489) +|..+.+++.... ...|+++..+.++++|+++|++.||+|+++||+++.+..++...+.... 
....+.+.+.|+ T Consensus 251 lgv~~~~~~~~~~~g~~~~~G~~V~~v~~~spa~~agi~~Gdii~~~ng~~v~~~~~l~~~v~~~~--~g~~v~~~~~r~ 328 (347) T COG0265 251 LGVIGEPLTADIALGLPVAAGAVVLGVLPGSPAAKAGIKAGDIITAVNGKPVASLSDLVAAVASNR--PGDEVALKLLRG 328 (347) T ss_pred CCEEEEECCCHHCCCCCCCCCCEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHCCC--CCCEEEEEEEEC T ss_conf 542768745110268767887688651799857873787787799789988557888888873269--997688999978 Q ss_pred CCCCCCCCCCEEEEEEECC Q ss_conf 7643346884368887528 Q gi|254780700|r 470 PDMQSGNDNMSRFVSLKID 488 (489) Q Consensus 470 ~~~~~~~~~~~rFVal~ld 488 (489) +..+.+.+++. T Consensus 329 --------g~~~~~~v~l~ 339 (347) T COG0265 329 --------GKERELAVTLG 339 (347) T ss_pred --------CEEEEEEEECC T ss_conf --------83577768615 No 53 >COG3591 V8-like Glu-specific endopeptidase [Amino acid transport and metabolism] Probab=98.23 E-value=5.6e-06 Score=56.73 Aligned_cols=139 Identities=19% Similarity=0.132 Sum_probs=79.2 Q ss_pred EEEEEECCCCEEEECHHCCCCC----CEEEEEC----CCCE-EEE--ECCC----CCCCCCCEEEEEEECC--------- Q ss_conf 0278975996298510104787----1437962----8980-674--0111----2334443289996067--------- Q gi|254780700|r 111 GSGFFITDDGYILTSNHIVEDG----ASFSVIL----SDDT-ELP--AKLV----GTDALFDLAVLKVQSD--------- 166 (489) Q Consensus 111 GsG~ii~~~G~ilTn~hvv~~a----~~i~V~~----~dg~-~~~--a~vv----g~D~~~DlAvlki~~~--------- 166 (489) -|+|+|.++ .+|||.||+... .++.+-. .++. .+. .... |.=-+.|.+..++... 
T Consensus 66 ~~~~lI~pn-tvLTa~Hc~~s~~~G~~~~~~~p~g~~~~~~~~~~~~~~~~~~~~g~~~~~d~~~~~v~~~~~~~g~~~~ 144 (251) T COG3591 66 TAATLIGPN-TVLTAGHCIYSPDYGEDDIAAAPPGVNSDGGPFYGITKIEIRVYPGELYKEDGASYDVGEAALESGINIG 144 (251) T ss_pred EEEEEECCC-EEEEEEEEEECCCCCHHHHHHCCCCCCCCCCCCCCEEEEEEEECCCCEECCCCCEEECCHHHHCCCCCCC T ss_conf 467998576-5788436984278784551005776167788877453678886388204068732541678755688866 Q ss_pred CCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEEEECCCCCCEEEECCCEEEEEEC Q ss_conf 66765565567311124146752366553111125874431122334434202332332013477035403430355512 Q gi|254780700|r 167 RKFIPVEFEDANNIRVGEAVFTIGNPFRLRGTVSAGIVSALDRDIPDRPGTFTQIDAPINQGNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 167 ~~~~~~~lg~s~~~~~G~~v~aiG~P~g~~~tvt~GiiSa~~R~~~~~~~~~iqtDa~InpGnSGGpl~n~~G~viGint 246 (489) .......+.-....++++.+-.+|+|.+--.+-++ ..| ..+ +......+++-|+-+-||+||.|+++.+.++||+-+ T Consensus 145 ~~~~~~~~~~~~~~~~~d~i~v~GYP~dk~~~~~~-~e~-t~~-v~~~~~~~l~y~~dT~pG~SGSpv~~~~~~vigv~~ 221 (251) T COG3591 145 DVVNYLKRNTASEAKANDRITVIGYPGDKPNIGTM-WES-TGK-VNSIKGNKLFYDADTLPGSSGSPVLISKDEVIGVHY 221 (251) T ss_pred CCCCCCCCCCCCCCCCCCEEEEEECCCCCCCCEEE-EEE-CCE-EEEEECCEEEEEECCCCCCCCCCEEECCCEEEEEEE T ss_conf 32044434431122037746787446898763267-340-542-678823268888020578899825702665999997 Q ss_pred CCCCCCC Q ss_conf 3445532 Q gi|254780700|r 247 MIVTSGQ 253 (489) Q Consensus 247 ~i~~~~g 253 (489) ......+ T Consensus 222 ~g~~~~~ 228 (251) T COG3591 222 NGPGANG 228 (251) T ss_pred CCCCCCC T ss_conf 3777666 No 54 >COG3480 SdrC Predicted secreted protein containing a PDZ domain [Signal transduction mechanisms] Probab=98.22 E-value=2.5e-07 Score=65.36 Aligned_cols=71 Identities=23% Similarity=0.425 Sum_probs=65.1 Q ss_pred CCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECC-CCCEEEECCCCCCC Q ss_conf 4113201111121134671167888752431478743101222035667520101204-78166512556558 Q gi|254780700|r 306 KGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCK-EGSKHSVAVVLGSS 377 (489) Q Consensus 306 
~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R-~g~~~~~~V~l~~~ 377 (489) .|+++..|..+|||.. -|+.||-|+++||+++.+..|+...+...++|++|++++.| ++++.....++... T Consensus 130 ~gvyv~~v~~~~~~~g-kl~~gD~i~avdg~~f~s~~e~i~~v~~~k~Gd~VtI~~~r~~~~~~~~~~tl~~~ 201 (342) T COG3480 130 AGVYVLSVIDNSPFKG-KLEAGDTIIAVDGEPFTSSDELIDYVSSKKPGDEVTIDYERHNETPEIVTITLIKN 201 (342) T ss_pred EEEEEEECCCCCCHHC-EECCCCEEEEECCEECCCHHHHHHHHHCCCCCCEEEEEEEECCCCCCEEEEEEEEE T ss_conf 3279997147863102-23268768855894457889999998546889769999995169872689999960 No 55 >TIGR02860 spore_IV_B stage IV sporulation protein B; InterPro: IPR014219 Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes . They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence . Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases . Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base . The geometric orientations of the catalytic residues are similar between families, despite different protein folds . The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) , . Peptidases are grouped into clans and families. 
Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. SpoIVB, the stage IV sporulation protein B of endospore-forming bacteria such as Bacillus subtilis, is a serine proteinase expressed in the spore (rather than mother cell) compartment, that participates in a proteolytic activation cascade for Sigma-K. It appears to be universal among endospore-forming bacteria and occurs nowhere else. The members of this entry belong to MEROPS peptidase family S55 (SpoIVB peptidase, clan PA).
Probab=98.19 E-value=2.6e-06 Score=58.81 Aligned_cols=18 Identities=17% Similarity=0.322 Sum_probs=8.5 Q ss_pred EEEECCCCEEEECHHCCC Q ss_conf 789759962985101047 Q gi|254780700|r 113 GFFITDDGYILTSNHIVE 130 (489) Q Consensus 113 G~ii~~~G~ilTn~hvv~ 130 (489) ||=...+|-.+-=+|-|+ T Consensus 117 GVkL~T~GVLVVG~s~i~ 134 (423) T TIGR02860 117 GVKLNTKGVLVVGFSDIE 134 (423) T ss_pred EEEEECCCEEEEEEEEEE T ss_conf 169833857999886540 No 56 >cd00992 PDZ_signaling PDZ domain found in a variety of Eumetazoan signaling molecules, often in tandem arrangements. May be responsible for specific protein-protein interactions, as most PDZ domains bind C-terminal polypeptides, and binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. In this subfamily of PDZ domains an N-terminal beta-strand forms the peptide-binding groove base, a circular permutation with respect to PDZ domains found in proteases. Probab=98.09 E-value=4.4e-07 Score=63.80 Aligned_cols=68 Identities=24% Similarity=0.254 Sum_probs=41.5 Q ss_pred CCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECC--CHHHHHHHHHHHHHCCCCEEEEEE Q ss_conf 525469872896571520079996068897898299988899988999938--999999999988625995699999 Q gi|254780700|r 392 ELLGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVS--CIKDVERLIGKAKEKKRDSVLLQI 466 (489) Q Consensus 392 ~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~--s~~dl~~iL~~~k~~~~~~VLL~V 466 (489) ..+|+++..-... ..++++..+.++|+|++.+|++||.|++|||+++. +.+++.+++++. . ..+.|.| T Consensus 12 ~~lG~~l~~~~~~---~~~~~I~~v~~~s~A~~~~L~~GD~Il~INg~~v~~~~~~~v~~~l~~~---~-~~v~L~V 81 (82) T cd00992 12 GGLGFSLRGGKDS---GGGIFVSRVEPGGPAERGGLRVGDRILEVNGVSVEGLTHEEAVELLKNS---G-DEVTLTV 81 (82) T ss_pred CCCCEEEECCCCC---CCCEEEEEECCCCCHHHCCCCCCCEEEEECCEECCCCCHHHHHHHHHCC---C-CEEEEEE T ss_conf 9617899622579---9999999986899034348999999898999999999899999999849---9-9599998 No 57 >cd00992 PDZ_signaling PDZ domain found in a variety of Eumetazoan signaling molecules, often in tandem arrangements. 
May be responsible for specific protein-protein interactions, as most PDZ domains bind C-terminal polypeptides, and binding to internal (non-C-terminal) polypeptides and even to lipids has been demonstrated. In this subfamily of PDZ domains an N-terminal beta-strand forms the peptide-binding groove base, a circular permutation with respect to PDZ domains found in proteases. Probab=98.09 E-value=1.9e-05 Score=53.30 Aligned_cols=55 Identities=22% Similarity=0.462 Sum_probs=43.4 Q ss_pred CCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCC--CCEEEEECCCCCCCCEEEEE Q ss_conf 4411320111112113467116788875243147874--31012220356675201012 Q gi|254780700|r 305 TKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSH--QDFVWQIASRSPKEQVKISL 361 (489) Q Consensus 305 ~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~--~~l~~~i~~~~~G~~v~l~v 361 (489) ..|++|.+|.|+|||+++||++||+|++|||+++.+. .+....+.. .++.++|.+ T Consensus 25 ~~~~~I~~v~~~s~A~~~~L~~GD~Il~INg~~v~~~~~~~v~~~l~~--~~~~v~L~V 81 (82) T cd00992 25 GGGIFVSRVEPGGPAERGGLRVGDRILEVNGVSVEGLTHEEAVELLKN--SGDEVTLTV 81 (82) T ss_pred CCCEEEEEECCCCCHHHCCCCCCCEEEEECCEECCCCCHHHHHHHHHC--CCCEEEEEE T ss_conf 999999998689903434899999989899999999989999999984--999599998 No 58 >pfam00863 Peptidase_C4 Peptidase family C4. This peptidase is present in the nuclear inclusion protein of potyviruses. Probab=98.06 E-value=7.1e-05 Score=49.68 Aligned_cols=126 Identities=18% Similarity=0.288 Sum_probs=78.1 Q ss_pred EEEECHHCCC-CCCEEEEECCCCEEE-E---ECCCCCCCCCCEEEEEEECCCCCCCCCC-CCCCCCCCCCEEEEECCCCC Q ss_conf 2985101047-871437962898067-4---0111233444328999606766765565-56731112414675236655 Q gi|254780700|r 121 YILTSNHIVE-DGASFSVILSDDTEL-P---AKLVGTDALFDLAVLKVQSDRKFIPVEF-EDANNIRVGEAVFTIGNPFR 194 (489) Q Consensus 121 ~ilTn~hvv~-~a~~i~V~~~dg~~~-~---a~vvg~D~~~DlAvlki~~~~~~~~~~l-g~s~~~~~G~~v~aiG~P~g 194 (489) ||+||+|.-. +-..++|...-|.-. + .--+..=+..|+.+||. 
|+++||.+- -.-..++.||.|.-+|.-|- T Consensus 40 ~IItn~HLfkrnnG~L~i~s~hG~f~v~Nt~~l~v~~i~g~DliiIrm--PkDfpPf~~~l~FR~P~~~ervclVg~nFq 117 (233) T pfam00863 40 YIITNAHLFKRNNGTLTIRSQHGEFTVKNTTQLKVHPIEGRDIVIIRL--PKDFPPFPQKLKFRAPTEGERVCLVGTNFQ 117 (233) T ss_pred EEEECHHHEECCCCEEEEEECCCEEECCCCCEEEEEEECCCCEEEEEC--CCCCCCCCCCCCCCCCCCCCEEEEEEEEEC T ss_conf 998754514248972999961326984887468788727942899969--998898630122589998886999965652 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCEEEEEEEEECCCCCCEEEECC-CEEEEEECCCCC Q ss_conf 31111258744311223344342023323320134770354034-303555123445 Q gi|254780700|r 195 LRGTVSAGIVSALDRDIPDRPGTFTQIDAPINQGNSGGPCFNAL-GHVIGVNAMIVT 250 (489) Q Consensus 195 ~~~tvt~GiiSa~~R~~~~~~~~~iqtDa~InpGnSGGpl~n~~-G~viGint~i~~ 250 (489) -....+ .||......+.....|...=..-..|.-|.|+++.+ |.+|||.++-.. T Consensus 118 ~k~~~s--~vSesS~i~p~~~~~fWkHwIsTk~G~CGlPlVs~~Dg~IVGiHsl~~~ 172 (233) T pfam00863 118 DKSISS--TVSESSAIFPEGNSGFWKHWISTKDGMCGLPLVSTKDGKIVGIHSLANN 172 (233) T ss_pred CCCEEE--EECCCEEEEECCCCCEEEEEEECCCCCCCCCEEECCCCCEEEEEECCCC T ss_conf 784248--9868605776499987689874899867884698357939988840267 No 59 >PRK11186 carboxy-terminal protease; Provisional Probab=98.03 E-value=5.5e-05 Score=50.42 Aligned_cols=16 Identities=19% Similarity=0.638 Sum_probs=10.9 Q ss_pred CCCCCCCCCEEEEECC Q ss_conf 6731112414675236 Q gi|254780700|r 176 DANNIRVGEAVFTIGN 191 (489) Q Consensus 176 ~s~~~~~G~~v~aiG~ 191 (489) .+..+++||.++++|- T Consensus 272 k~g~L~~gD~Ii~V~q 287 (673) T PRK11186 272 KSKKLSVGDKIVGVGQ 287 (673) T ss_pred HHCCCCCCCEEEEECC T ss_conf 7389998999998257 No 60 >pfam00595 PDZ PDZ domain (Also known as DHR or GLGF). PDZ domains are found in diverse signaling proteins. 
Probab=98.01 E-value=3.8e-05 Score=51.43 Aligned_cols=72 Identities=25% Similarity=0.245 Sum_probs=50.8 Q ss_pred CCCCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEE Q ss_conf 54525469872896571520079996068897898299988899988999938999999999988625995699999 Q gi|254780700|r 390 DKELLGMVLQDINDGNKKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQI 466 (489) Q Consensus 390 ~~~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V 466 (489) ....+|+++..-.. ...++++..+.++|+|++.||++||+|++|||+++.++. ..+++..++..+ ..+.|.| T Consensus 8 ~~~~lG~~l~~~~~---~~~~~~V~~V~~~~~A~~~gL~~GD~Il~VNg~~v~~~~-~~~~~~~l~~~~-~~v~L~V 79 (80) T pfam00595 8 KRGGLGFSLVGGSD---KGPGIFVSEVLPGGAAEAGGLQVGDRILSINGQDLENMS-HDEAVLALKGSG-GEVTLTI 79 (80) T ss_pred CCCCCCEEEECCCC---CCCCEEEEEECCCCCHHHCCCCCCCEEEEECCEECCCCC-HHHHHHHHHCCC-CEEEEEE T ss_conf 99960889975478---998989999778980554879999999999999989998-999999997499-9299998 No 61 >KOG3129 consensus Probab=97.94 E-value=2.1e-06 Score=59.51 Aligned_cols=68 Identities=24% Similarity=0.351 Sum_probs=56.5 Q ss_pred EEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEE--EECCCCCCCCEEEEECCCCCEEEECCCCC Q ss_conf 132011111211346711678887524314787431012--22035667520101204781665125565 Q gi|254780700|r 308 SLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVW--QIASRSPKEQVKISLCKEGSKHSVAVVLG 375 (489) Q Consensus 308 vlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~--~i~~~~~G~~v~l~v~R~g~~~~~~V~l~ 375 (489) ++|.+|.|+|||++|||+.||.|+++.+..-.++..|.+ .+.....+..+.++++|.|+...++++.. T Consensus 141 a~V~sV~~~SPA~~aGl~~gD~il~fGnV~sgn~~~lq~i~~~v~~~e~~~v~v~v~R~g~~v~L~ltP~ 210 (231) T KOG3129 141 AVVDSVVPGSPADEAGLCVGDEILKFGNVHSGNFLPLQNIAAVVQSNEDQIVSVTVIREGQKVVLSLTPK 210 (231) T ss_pred EEEEECCCCCHHHHHCCCCCCEEEEECCCCCCCCHHHHHHHHHHHHCCCCCEEEEEECCCCEEEEEECCC T ss_conf 8875227898345407543765788533246552258898999874437623579961797788996764 No 62 >pfam10459 Peptidase_S46 Peptidase S46. Dipeptidyl-peptidase 7 (DPP-7) is the best characterized member of this family. 
It is a serine peptidase that is located on the cell surface and is predicted to have two N-terminal transmembrane domains. Probab=97.84 E-value=6.5e-05 Score=49.95 Aligned_cols=25 Identities=32% Similarity=0.401 Sum_probs=21.0 Q ss_pred CCCEEEEEECCCCEEEECHHCCCCC Q ss_conf 2340278975996298510104787 Q gi|254780700|r 108 LMFGSGFFITDDGYILTSNHIVEDG 132 (489) Q Consensus 108 ~~~GsG~ii~~~G~ilTn~hvv~~a 132 (489) .+-+||-+||++|+|+||+|++.++ T Consensus 45 ~gGCsasfVS~~GLvlTNHHC~~~~ 69 (696) T pfam10459 45 LGGCSASFVSPDGLVLTNHHCAYGA 69 (696) T ss_pred CCCEEEEEECCCCEEEECCCHHHHH T ss_conf 8952688985896466632114778 No 63 >TIGR01713 typeII_sec_gspC general secretion pathway protein C; InterPro: IPR001639 The general (type II) secretion pathway (GSP) within Gram-negative bacteria is a signal sequence-dependent process responsible for protein export , , . The process has two stages: exoproteins are first translocated across the inner membrane by the general signal-dependent export pathway (GEP), and then across the outer membrane by a species-specific accessory mechanism. A number of molecules are involved in the GSP; one of these is known as the 'C' protein, the most probable location of which is the inner membrane . This suggests that protein C is part of the GEP apparatus, aiding trans-location of exoproteins from the cytoplasm to the periplasm, prior to transport across the outer membrane. The size of the 'C' protein is around 270 to 300 amino acids. It apparently contains a single transmembrane domain located in the N-terminal section. The short N-terminal domain is predicted to be cytoplasmic and the large C-terminal domain periplasmic. 
The gene encoding the 'C' protein has been sequenced in a variety of bacteria such as Aeromonas (exeC); Erwinia (outC); Escherichia coli (yheE or gspC); Klebsiella pneumoniae (pulC); or Vibrio cholerae (epsC).; GO: 0008565 protein transporter activity, 0015628 protein secretion by the type II secretion system, 0015627 type II protein secretion system complex. Probab=97.80 E-value=3.3e-06 Score=58.20 Aligned_cols=190 Identities=12% Similarity=0.151 Sum_probs=104.8 Q ss_pred EEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEE Q ss_conf 74011123344432899960676676556556731112414675236655311112587443112233443420233233 Q gi|254780700|r 145 LPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFRLRGTVSAGIVSALDRDIPDRPGTFTQIDAP 224 (489) Q Consensus 145 ~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g~~~tvt~GiiSa~~R~~~~~~~~~iqtDa~ 224 (489) ..+-+++.|...-||+|.=.. +. ...-++++-.-..|..+.+|-+ +.=|++..|| ++. + T Consensus 88 l~G~~~s~d~~rs~aii~~g~-~q-~~~g~ne~~~G~~Gaki~~i~~--------DrVi~~~~Gr-----~E~-l----- 146 (281) T TIGR01713 88 LTGIVASSDRERSIAIIEEGS-EQ-VSLGINESLEGYKGAKIAKIEP--------DRVIFEYNGR-----YEK-L----- 146 (281) T ss_pred EEEEEEECCCCCEEEEEECCC-EE-EEEECCCCCCCCCCCEEEEECC--------CEEEEECCCC-----EEE-E----- T ss_conf 877886048652178882388-53-6631576578988627788728--------7889702782-----556-6----- Q ss_pred EECCCCCCEEEECCCEEEEEECCCCCCCCCCCCCCCCCCCC--CCCCCCCCCCCCC-CCCCCCCCCCCCCCHHHHHHHCC Q ss_conf 20134770354034303555123445532222223211233--2110010000233-33334332000342166764417 Q gi|254780700|r 225 INQGNSGGPCFNALGHVIGVNAMIVTSGQFHMGVGLIIPLS--IIKKAIPSLISKG-RVDHGWFGIMTQNLTQELAIPLG 301 (489) Q Consensus 225 InpGnSGGpl~n~~G~viGint~i~~~~g~~~GigfaIP~~--~~~~i~~~l~~~g-~v~rg~lGv~~~~v~~~la~~lg 301 (489) +|.+.+++.=.-.+ +.....-+.--.=|+|.. ..+++.++|.+.- ..-.-|+-+. ++-. 
T Consensus 147 --------~L~~~~~~~~atas-vsn~~~~~P~~~~~~P~~~~~~~~~~~~l~~~p~~~~~~Y~~~s--Pv~~------- 208 (281) T TIGR01713 147 --------ELKNTKGEKSATAS-VSNLTRVRPENSSSLPSEEVKLRRIIEELTKEPQQKIFDYIRLS--PVMK------- 208 (281) T ss_pred --------EEECCCCCCCCEEE-ECCCCCCCCCCCCCCCCCCCCHHHHHHHHHHCHHHHHHCEEEEE--EEEE------- T ss_conf --------52258876663132-05678888887223774342068999987515566663004676--7763------- Q ss_pred CCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCC Q ss_conf 644441132011111211346711678887524314787431012220356675201012047816651255 Q gi|254780700|r 302 LRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVV 373 (489) Q Consensus 302 l~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~ 373 (489) =++..|..+.-.-+..-..+.|||.||+.+++||..+++..+..+++....--..++|+|.|+|+...+.|. T Consensus 209 ~~K~~GyRlnPgK~~~lF~~~GLq~gD~AvalNgLdLrd~e~a~~~l~~l~~~~~~~ltv~RdG~~~dIy~~ 280 (281) T TIGR01713 209 DDKLEGYRLNPGKDPSLFYKSGLQDGDIAVALNGLDLRDPEQAKQALQLLRELTELTLTVERDGQREDIYVE 280 (281) T ss_pred CCEEEEEEECCCCCHHHHHHHCCCCCCEEEEECCCCCCCHHHHHHHHHHHCCCCCEEEEEEECCCCCEEEEE T ss_conf 884788882478985453411685673246536888779899999999730486608999977942046443 No 64 >COG3031 PulC Type II secretory pathway, component PulC [Intracellular trafficking and secretion] Probab=97.59 E-value=9.2e-06 Score=55.36 Aligned_cols=68 Identities=15% Similarity=0.270 Sum_probs=53.7 Q ss_pred CCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCC Q ss_conf 41132011111211346711678887524314787431012220356675201012047816651255 Q gi|254780700|r 306 KGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVV 373 (489) Q Consensus 306 ~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~ 373 (489) .|..+.-..++|-.++.|||.||+-+++|+..++++.++.+++....--+.+.++++|+|+...+.|. 
T Consensus 207 ~Gyr~~pgkd~slF~~sglq~GDIavaiNnldltdp~~m~~llq~l~~m~s~qlTv~R~G~rhdInV~ 274 (275) T COG3031 207 EGYRFEPGKDGSLFYKSGLQRGDIAVAINNLDLTDPEDMFRLLQMLRNMPSLQLTVIRRGKRHDINVR 274 (275) T ss_pred EEEEECCCCCCCHHHHHCCCCCCEEEEECCCCCCCHHHHHHHHHHHHCCCCEEEEEEECCCCCEEEEC T ss_conf 88983689983244550688765689965866689899999999611386507999945853112531 No 65 >TIGR01713 typeII_sec_gspC general secretion pathway protein C; InterPro: IPR001639 The general (type II) secretion pathway (GSP) within Gram-negative bacteria is a signal sequence-dependent process responsible for protein export , , . The process has two stages: exoproteins are first translocated across the inner membrane by the general signal-dependent export pathway (GEP), and then across the outer membrane by a species-specific accessory mechanism. A number of molecules are involved in the GSP; one of these is known as the 'C' protein, the most probable location of which is the inner membrane . This suggests that protein C is part of the GEP apparatus, aiding trans-location of exoproteins from the cytoplasm to the periplasm, prior to transport across the outer membrane. The size of the 'C' protein is around 270 to 300 amino acids. It apparently contains a single transmembrane domain located in the N-terminal section. The short N-terminal domain is predicted to be cytoplasmic and the large C-terminal domain periplasmic. The gene encoding the 'C' protein has been sequenced in a variety of bacteria such as Aeromonas (exeC); Erwinia (outC); Escherichia coli (yheE or gspC); Klebsiella pneumoniae (pulC); or Vibrio cholerae (epsC).; GO: 0008565 protein transporter activity, 0015628 protein secretion by the type II secretion system, 0015627 type II protein secretion system complex. 
Probab=97.53 E-value=0.00048 Score=44.41 Aligned_cols=63 Identities=17% Similarity=0.277 Sum_probs=51.8 Q ss_pred EEEEEEEECCCCHHH-HCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCC Q ss_conf 007999606889789-82999888999889999389999999999886259956999997177643 Q gi|254780700|r 409 VRIVALNPNREREVE-AKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQ 473 (489) Q Consensus 409 ~gi~vv~v~~~s~Aa-~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~ 473 (489) T Consensus 212 ~~GyRlnPgK~~~lF~~~GLq~gD~AvalNgLdLrd~e~a~~~l~~l~~~--~~~~ltv~RdG~~~ 275 (281) T TIGR01713 212 LEGYRLNPGKDPSLFYKSGLQDGDIAVALNGLDLRDPEQAKQALQLLREL--TELTLTVERDGQRE 275 (281) T ss_pred EEEEEECCCCCHHHHHHHCCCCCCEEEEECCCCCCCHHHHHHHHHHHCCC--CCEEEEEEECCCCC T ss_conf 78888247898545341168567324653688877989999999973048--66089999779420 No 66 >TIGR03279 cyano_FeS_chp putative FeS-containing Cyanobacterial-specific oxidoreductase. Members of this protein family are predicted FeS-containing oxidoreductases of unknown function, apparently restricted to and universal across the Cyanobacteria. The high trusted cutoff score for this model, 700 bits, excludes homologs from other lineages. This exclusion seems justified because a significant number of sequence positions are simultaneously unique to and invariant across the Cyanobacteria, suggesting a specialized, conserved function, perhaps related to photosynthesis. A distantly related protein family, TIGR03278, is universal in and restricted to archaeal methanogens, and may be linked to methanogenesis.
Probab=97.47 E-value=0.00017 Score=47.31 Aligned_cols=16 Identities=6% Similarity=0.194 Sum_probs=7.9 Q ss_pred EEEEECCCCEEEEECC Q ss_conf 4379628980674011 Q gi|254780700|r 134 SFSVILSDDTELPAKL 149 (489) Q Consensus 134 ~i~V~~~dg~~~~a~v 149 (489) ++.|.-.||..+..++ T Consensus 46 ~L~v~~~~Ge~~~iei 61 (433) T TIGR03279 46 ELEVLDANGESHQIEI 61 (433) T ss_pred EEEEECCCCCEEEEEE T ss_conf 9999958997999998 No 67 >pfam05579 Peptidase_S32 Equine arteritis virus serine endopeptidase S32. Serine peptidases involved in processing nidovirus polyprotein. Probab=97.39 E-value=0.0017 Score=40.92 Aligned_cols=113 Identities=19% Similarity=0.222 Sum_probs=66.8 Q ss_pred EEEEEECCCCEEEECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEEC-CCCCCCCCCCCCCCCCCCCEEEEE Q ss_conf 0278975996298510104787143796289806740111233444328999606-766765565567311124146752 Q gi|254780700|r 111 GSGFFITDDGYILTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQS-DRKFIPVEFEDANNIRVGEAVFTI 189 (489) Q Consensus 111 GsG~ii~~~G~ilTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~-~~~~~~~~lg~s~~~~~G~~v~ai 189 (489) |+=|.|+.+-.++|+.||+.+... +|. .+|-.+- .-.+..-|.|.-.+.. +...|.++|.+..---.--|..+. T Consensus 1 GgVfti~g~~vvvTAsHvl~~~~a-rv~-~~g~~~~---ltFk~~GDyA~A~~~~w~G~aP~~~fa~~~ytGrAyw~tst 75 (426) T pfam05579 1 GGVFTINGNVVVVTASHVLGGNKA-RVS-GVGFNQM---LTFKTNGDYAFAVVPEWPGAAPKLKFAQRGYTGRAYWCTST 75 (426) T ss_pred CCEEEECCEEEEEEEEEEECCCCE-EEE-ECCCEEE---EEEECCCCEEEEECCCCCCCCCCCEECCCCCCCCEEEECCC T ss_conf 974887892899986777359825-886-3143058---99610573445655778888874243477776643663278 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCC Q ss_conf 36655311112587443112233443420233233201347703540343035551234 Q gi|254780700|r 190 GNPFRLRGTVSAGIVSALDRDIPDRPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMI 248 (489) Q Consensus 190 G~P~g~~~tvt~GiiSa~~R~~~~~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i 248 (489) | +..|+|+ ...-| +--++|.||.|+++.+|++|||.|-. 
T Consensus 76 G--------vE~glvg--------~~~a~----cfT~cGDSGSpVi~e~g~lvGVHTGS 114 (426) T pfam05579 76 G--------VEPGLVG--------LGFAF----CFTKCGDSGSPVITEDGNLVGVHTGS 114 (426) T ss_pred C--------CCCCCCC--------CCEEE----EECCCCCCCCCCCCCCCCEEEEECCC T ss_conf 8--------8755015--------74389----98467888995377899789886268 No 68 >PRK09681 putative type II secretion protein GspC; Provisional Probab=97.29 E-value=2.6e-05 Score=52.48 Aligned_cols=64 Identities=16% Similarity=0.349 Sum_probs=48.1 Q ss_pred ECCCCCCCC---CCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCC Q ss_conf 011111211---3467116788875243147874310122203566752010120478166512556 Q gi|254780700|r 311 TAVVKESPA---DKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVL 374 (489) Q Consensus 311 ~~V~~~sPA---~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l 374 (489) -+|.||..+ ..+|||.||++++|||.+..++.........++--++++|+|.|+|+..++.+.| T Consensus 252 YRl~PGkd~~lF~~~Glq~gDlavsiNG~dLtDp~~a~~~~~~l~~ate~~ltVeRdGq~~~I~isL 318 (319) T PRK09681 252 YAVKPGADRSLFDASGFKEGDIAIALNQQDFTDPRAMIALMRQLPSMDSIQLTVLRKGARYDISIAL 318 (319) T ss_pred EEECCCCCHHHHHHCCCCCCCEEEEECCCCCCCHHHHHHHHHHHHHCCEEEEEEEECCEEEEEEEEC T ss_conf 8727998889999729998888898269667898999999996000715589999799689999971 No 69 >pfam04495 GRASP55_65 GRASP55/65 family. GRASP55 (Golgi reassembly stacking protein of 55 kDa) and GRASP65 (a 65 kDa) protein are highly homologous. GRASP55 is a component of the Golgi stacking machinery. GRASP65, an N-ethylmaleimide- sensitive membrane protein required for the stacking of Golgi cisternae in a cell-free system. 
Probab=97.24 E-value=0.0085 Score=36.43 Aligned_cols=84 Identities=18% Similarity=0.241 Sum_probs=55.5 Q ss_pred CCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHH-HHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEE Q ss_conf 43320003421667644176444411320111112113467116-78887524314787431012220356675201012 Q gi|254780700|r 283 GWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKV-GDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISL 361 (489) Q Consensus 283 g~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~-GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v 361 (489) |-||++++--.-+ -.......|-.|.++|||++|||++ .|-|+..+..-+....+|..+|.. ..+..+.|-| T Consensus 26 ~lLG~si~~~~~~------~a~~~~whvl~v~~~SPA~~AgL~~~~DYIiG~~~~~l~~~~~l~~~v~~-~~~~~l~lyV 98 (280) T pfam04495 26 GLLGLSLRWCSFS------GANENVWHVLDVHPNSPAALAGLQPYSDYIIGTDSGLLRGEDDLFELVES-HEGRPLKLYV 98 (280) T ss_pred CCEEEEEEECCCC------CCCCEEEEEEECCCCCHHHHCCCCCCCCEEEECCCCCCCCHHHHHHHHHH-HCCCCEEEEE T ss_conf 6305799841565------65330689984489997997488877786873684231456789999997-3699769999 Q ss_pred CCCCCE--EEECCC Q ss_conf 047816--651255 Q gi|254780700|r 362 CKEGSK--HSVAVV 373 (489) Q Consensus 362 ~R~g~~--~~~~V~ 373 (489) +-.... +++.|+ T Consensus 99 YN~~~d~~R~V~i~ 112 (280) T pfam04495 99 YNSETDVVREVTIT 112 (280) T ss_pred ECCCCCCEEEEEEE T ss_conf 65788836789985 No 70 >PRK09681 putative type II secretion protein GspC; Provisional Probab=97.23 E-value=0.0025 Score=39.89 Aligned_cols=59 Identities=19% Similarity=0.343 Sum_probs=45.8 Q ss_pred EEEECCCCHH-HHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCC Q ss_conf 9960688978-982999888999889999389999999999886259956999997177643 Q gi|254780700|r 413 ALNPNREREV-EAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQ 473 (489) Q Consensus 413 vv~v~~~s~A-a~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~ 473 (489) -+.+.++... ...||+.||++++|||.+.+++..-.+++.++++. ..+-|.|+|+|... 
T Consensus 253 Rl~PGkd~~lF~~~Glq~gDlavsiNG~dLtDp~~a~~~~~~l~~a--te~~ltVeRdGq~~ 312 (319) T PRK09681 253 AVKPGADRSLFDASGFKEGDIAIALNQQDFTDPRAMIALMRQLPSM--DSIQLTVLRKGARY 312 (319) T ss_pred EECCCCCHHHHHHCCCCCCCEEEEECCCCCCCHHHHHHHHHHHHHC--CEEEEEEEECCEEE T ss_conf 7279988899997299988888982696678989999999960007--15589999799689 No 71 >KOG3553 consensus Probab=97.16 E-value=5.7e-05 Score=50.29 Aligned_cols=35 Identities=34% Similarity=0.530 Sum_probs=21.2 Q ss_pred CCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCC Q ss_conf 44113201111121134671167888752431478 Q gi|254780700|r 305 TKGSLITAVVKESPADKAGMKVGDVICMLDGRIIK 339 (489) Q Consensus 305 ~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~ 339 (489) ..|++|++|.++|||+.|||+.+|.|+.+||-... T Consensus 58 D~GiYvT~V~eGsPA~~AGLrihDKIlQvNG~DfT 92 (124) T KOG3553 58 DKGIYVTRVSEGSPAEIAGLRIHDKILQVNGWDFT 92 (124) T ss_pred CCCEEEEEECCCCHHHHHCCEECCEEEEECCCEEE T ss_conf 76479997046983664002203568886474058 No 72 >pfam10459 Peptidase_S46 Peptidase S46. Dipeptidyl-peptidase 7 (DPP-7) is the best characterized member of this family. It is a serine peptidase that is located on the cell surface and is predicted to have two N-terminal transmembrane domains. Probab=97.06 E-value=0.00028 Score=45.95 Aligned_cols=40 Identities=18% Similarity=0.301 Sum_probs=27.5 Q ss_pred CCCCEEEEEEECC------------CCCC---CCCCCCCCCCCCCCEEEEECCCCC Q ss_conf 4443289996067------------6676---556556731112414675236655 Q gi|254780700|r 154 ALFDLAVLKVQSD------------RKFI---PVEFEDANNIRVGEAVFTIGNPFR 194 (489) Q Consensus 154 ~~~DlAvlki~~~------------~~~~---~~~lg~s~~~~~G~~v~aiG~P~g 194 (489) +..|++++|+=.. .++. .+++ ..+.++.||.|+.+|+|-. 
T Consensus 197 htgDfs~fR~Y~~~dg~PA~ys~dnvP~~p~~~l~v-s~~GvkeGDfvmV~GyPG~ 251 (696) T pfam10459 197 HTGDFSFFRAYAGKDGKPADYSKDNVPYKPKHFLKV-SAQGVKEGDFVMVAGYPGR 251 (696) T ss_pred CCCCEEEEEEEECCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCCEEEEECCCCC T ss_conf 557658999876788982424646876788544762-5568899986999358998 No 73 >COG3480 SdrC Predicted secreted protein containing a PDZ domain [Signal transduction mechanisms] Probab=96.95 E-value=0.0049 Score=37.97 Aligned_cols=17 Identities=24% Similarity=0.864 Sum_probs=10.0 Q ss_pred CCCCCCCEEEEE-CCCCC Q ss_conf 311124146752-36655 Q gi|254780700|r 178 NNIRVGEAVFTI-GNPFR 194 (489) Q Consensus 178 ~~~~~G~~v~ai-G~P~g 194 (489) ..++.||.++|+ |.||. T Consensus 145 gkl~~gD~i~avdg~~f~ 162 (342) T COG3480 145 GKLEAGDTIIAVDGEPFT 162 (342) T ss_pred CEECCCCEEEEECCEECC T ss_conf 223268768855894457 No 74 >KOG3542 consensus Probab=96.91 E-value=0.00026 Score=46.05 Aligned_cols=39 Identities=26% Similarity=0.386 Sum_probs=29.4 Q ss_pred CCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCC Q ss_conf 444411320111112113467116788875243147874 Q gi|254780700|r 303 RGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSH 341 (489) Q Consensus 303 ~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~ 341 (489) ++.-|++|.+|.|||-|+++|||.||.|++|||+...+. T Consensus 559 EkGfgifV~~V~pgskAa~~GlKRgDqilEVNgQnfeni 597 (1283) T KOG3542 559 EKGFGIFVAEVFPGSKAAREGLKRGDQILEVNGQNFENI 597 (1283) T ss_pred CCCCEEEEEEECCCCHHHHHHHHHHHHHHHCCCCCHHHH T ss_conf 556406886306884677765420114321045232220 No 75 >KOG3542 consensus Probab=96.53 E-value=0.003 Score=39.31 Aligned_cols=64 Identities=17% Similarity=0.267 Sum_probs=50.0 Q ss_pred HHHCCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEEC Q ss_conf 6571520079996068897898299988899988999938999999999988625995699999717 Q gi|254780700|r 403 DGNKKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYD 469 (489) Q Consensus 403 ~~~~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~ 469 (489) .+..++++|.+.++.+++.|++.||+.||-|++|||+..+++. 
+.++.+-+.++ ..+.|.+..+ T Consensus 556 GGsEkGfgifV~~V~pgskAa~~GlKRgDqilEVNgQnfenis-~~KA~eiLrnn--thLtltvKtN 619 (1283) T KOG3542 556 GGSEKGFGIFVAEVFPGSKAAREGLKRGDQILEVNGQNFENIS-AKKAEEILRNN--THLTLTVKTN 619 (1283) T ss_pred CCCCCCCEEEEEEECCCCHHHHHHHHHHHHHHHCCCCCHHHHH-HHHHHHHHCCC--CEEEEEEECC T ss_conf 6765564068863068846777654201143210452322202-77899986378--4489998524 No 76 >KOG3553 consensus Probab=96.50 E-value=0.0043 Score=38.32 Aligned_cols=49 Identities=12% Similarity=0.169 Sum_probs=37.9 Q ss_pred CEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHH Q ss_conf 52007999606889789829998889998899993899999999998862 Q gi|254780700|r 407 KLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKE 456 (489) Q Consensus 407 ~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~ 456 (489) ...+++++++.++|||+.+|||.+|.|+.+||...+-+. ..++++.++. T Consensus 57 tD~GiYvT~V~eGsPA~~AGLrihDKIlQvNG~DfTMvT-Hd~Avk~i~k 105 (124) T KOG3553 57 TDKGIYVTRVSEGSPAEIAGLRIHDKILQVNGWDFTMVT-HDQAVKRITK 105 (124) T ss_pred CCCCEEEEEECCCCHHHHHCCEECCEEEEECCCEEEEEE-HHHHHHHHHH T ss_conf 776479997046983664002203568886474058887-6888878637 No 77 >KOG3532 consensus Probab=96.46 E-value=0.00027 Score=46.03 Aligned_cols=45 Identities=9% Similarity=0.228 Sum_probs=30.7 Q ss_pred EEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHH Q ss_conf 799960688978982999888999889999389999999999886 Q gi|254780700|r 411 IVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAK 455 (489) Q Consensus 411 i~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k 455 (489) +.+..+.++++|.++.+.+||++++|||.||++.++..+.++... 
T Consensus 400 v~v~tv~~ns~a~k~~~~~gdvlvai~~~pi~s~~q~~~~~~s~~ 444 (1051) T KOG3532 400 VKVCTVEDNSLADKAAFKPGDVLVAINNVPIRSERQATRFLQSTT 444 (1051) T ss_pred EEEEEECCCCHHHHHCCCCCCEEEEECCCCCHHHHHHHHHHHHCC T ss_conf 899970689754675268655699855852315999999998614 No 78 >KOG3129 consensus Probab=96.38 E-value=0.015 Score=34.91 Aligned_cols=66 Identities=8% Similarity=0.030 Sum_probs=48.4 Q ss_pred EEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCCCC Q ss_conf 079996068897898299988899988999938999999999988625995699999717764334 Q gi|254780700|r 410 RIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQSG 475 (489) Q Consensus 410 gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~~~ 475 (489) -.++-++.++|||+++||+.||.|+++....-.+...|.++-...+......+-+.|.|.+..... T Consensus 140 Fa~V~sV~~~SPA~~aGl~~gD~il~fGnV~sgn~~~lq~i~~~v~~~e~~~v~v~v~R~g~~v~L 205 (231) T KOG3129 140 FAVVDSVVPGSPADEAGLCVGDEILKFGNVHSGNFLPLQNIAAVVQSNEDQIVSVTVIREGQKVVL 205 (231) T ss_pred EEEEEECCCCCHHHHHCCCCCCEEEEECCCCCCCCHHHHHHHHHHHHCCCCCEEEEEECCCCEEEE T ss_conf 488752278983454075437657885332465522588989998744376235799617977889 No 79 >KOG3532 consensus Probab=96.24 E-value=0.018 Score=34.33 Aligned_cols=48 Identities=27% Similarity=0.360 Sum_probs=40.4 Q ss_pred CCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCC Q ss_conf 441132011111211346711678887524314787431012220356 Q gi|254780700|r 305 TKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRS 352 (489) Q Consensus 305 ~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~ 352 (489) .+-+-|..|.+++||.||-+++||++++|||.||++.++..+.+.... 
T Consensus 397 ~~~v~v~tv~~ns~a~k~~~~~gdvlvai~~~pi~s~~q~~~~~~s~~ 444 (1051) T KOG3532 397 NRAVKVCTVEDNSLADKAAFKPGDVLVAINNVPIRSERQATRFLQSTT 444 (1051) T ss_pred CEEEEEEEECCCCHHHHHCCCCCCEEEEECCCCCHHHHHHHHHHHHCC T ss_conf 637899970689754675268655699855852315999999998614 No 80 >COG3031 PulC Type II secretory pathway, component PulC [Intracellular trafficking and secretion] Probab=96.15 E-value=0.024 Score=33.56 Aligned_cols=59 Identities=19% Similarity=0.292 Sum_probs=46.4 Q ss_pred EEEEECCC-CHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCC Q ss_conf 99960688-97898299988899988999938999999999988625995699999717764 Q gi|254780700|r 412 VALNPNRE-REVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDM 472 (489) Q Consensus 412 ~vv~v~~~-s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~ 472 (489) +..++.++ +.-+..||+.||+-+++|+..++++++..+++..+.+. .++-|+|+|+|.. T Consensus 209 yr~~pgkd~slF~~sglq~GDIavaiNnldltdp~~m~~llq~l~~m--~s~qlTv~R~G~r 268 (275) T COG3031 209 YRFEPGKDGSLFYKSGLQRGDIAVAINNLDLTDPEDMFRLLQMLRNM--PSLQLTVIRRGKR 268 (275) T ss_pred EEECCCCCCCHHHHHCCCCCCEEEEECCCCCCCHHHHHHHHHHHHCC--CCEEEEEEECCCC T ss_conf 98368998324455068876568996586668989999999961138--6507999945853 No 81 >KOG1892 consensus Probab=96.15 E-value=0.00086 Score=42.78 Aligned_cols=76 Identities=16% Similarity=0.274 Sum_probs=52.1 Q ss_pred CCEEEEECCHHHCCEEEEEEEEECCCCHHHHCC-CCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCC Q ss_conf 546987289657152007999606889789829-998889998899993899999999998862599569999971776 Q gi|254780700|r 394 LGMVLQDINDGNKKLVRIVALNPNREREVEAKG-IQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPD 471 (489) Q Consensus 394 lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~G-L~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~ 471 (489) +|+.+-.-....+...||++-.|+++++|+.-| |..||-+++|||+..-.+.+ +.+-... ...+..|-|.|...+. 
T Consensus 945 mGLSIVAAkGaGq~klGIYvKsVV~GgaAd~DGRL~aGDQLLsVdG~SLiGisQ-ErAA~lm-trtg~vV~leVaKqgA 1021 (1629) T KOG1892 945 MGLSIVAAKGAGQRKLGIYVKSVVEGGAADHDGRLEAGDQLLSVDGHSLIGISQ-ERAARLM-TRTGNVVHLEVAKQGA 1021 (1629) T ss_pred CCEEEEEECCCCCCCCCEEEEEECCCCCCCCCCCCCCCCEEEEECCCCCCCCCH-HHHHHHH-HCCCCEEEEEHHHHHH T ss_conf 324787604677541114798731587545556401576366455820114258-8899987-4248757875132356 No 82 >TIGR00225 prc C-terminal processing peptidase; InterPro: IPR004447 Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes . They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence . Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases . Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base . The geometric orientations of the catalytic residues are similar between families, despite different protein folds . The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC) , . Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. 
Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This group of serine peptidases belongs to MEROPS peptidase family S41 (clan SM), subfamily S41A (C-terminal processing peptidase). It is a family of C-terminal peptidases with different substrates in different species, including processing of D1 protein of the photosystem II reaction centre in higher plants, and cleavage of a peptide of 11 residues from the precursor form of penicillin-binding protein in Escherichia coli.; GO: 0008236 serine-type peptidase activity, 0006508 proteolysis.
Probab=96.09 E-value=0.0092 Score=36.24 Aligned_cols=70 Identities=24% Similarity=0.368 Sum_probs=53.2 Q ss_pred CEEEECCCCCCCCCCCCCHHHHHHHHHCCC-----CCCCCC--CEEEEECCCCCCCCEEEEECCCC---CEEEECCCCCC Q ss_conf 113201111121134671167888752431-----478743--10122203566752010120478---16651255655 Q gi|254780700|r 307 GSLITAVVKESPADKAGMKVGDVICMLDGR-----IIKSHQ--DFVWQIASRSPKEQVKISLCKEG---SKHSVAVVLGS 376 (489) Q Consensus 307 GvlV~~V~~~sPA~~AGLk~GDvI~~ing~-----~I~~~~--~l~~~i~~~~~G~~v~l~v~R~g---~~~~~~V~l~~ 376 (489) -+.+....+++||.++|+++||.|+++|++ .+..+. +.... ..-++|..+.+++.|.| +...+...+.. T Consensus 67 ~~~~~~~~~g~p~~~~g~~~~d~~~~~~~~~~~~~~~~~~~~~~~~~~-~~g~~g~~~~~~~~~~g~g~~~~~~~~~~~~ 145 (361) T TIGR00225 67 ELVIVSPLEGSPAEKAGLKPGDKILKVNGKGGPLESVLGLSLDDAVAL-IRGKKGTKVSLEILRAGKGGKSGPLDFTLKR 145 (361) T ss_pred EEEEEECCCCCCHHHCCCCCCCEEEEECCCCCCCHHHHHCCHHHHHHH-HCCCCCCEEEEEEECCCCCCCCEEEEEEEEH T ss_conf 378862146773112046666406861676664102220125788997-5077786168998427778753026787511 Q ss_pred C Q ss_conf 8 Q gi|254780700|r 377 S 377 (489) Q Consensus 377 ~ 377 (489) . T Consensus 146 ~ 146 (361) T TIGR00225 146 D 146 (361) T ss_pred H T ss_conf 0 No 83 >KOG2921 consensus Probab=96.05 E-value=0.011 Score=35.66 Aligned_cols=58 Identities=31% Similarity=0.365 Sum_probs=43.5 Q ss_pred CCCCCEEEECCCCCCCCCC-CCCHHHHHHHHHCCCCCCCCCCEEEEECC---CCCCCCEEEE Q ss_conf 4444113201111121134-67116788875243147874310122203---5667520101 Q gi|254780700|r 303 RGTKGSLITAVVKESPADK-AGMKVGDVICMLDGRIIKSHQDFVWQIAS---RSPKEQVKIS 360 (489) Q Consensus 303 ~~~~GvlV~~V~~~sPA~~-AGLk~GDvI~~ing~~I~~~~~l~~~i~~---~~~G~~v~l~ 360 (489) ....|+.|++|...||+-- -||.+||+|+++||-+|.+.+|..+-+.. +++|..+.-. 
T Consensus 217 a~g~gV~Vtev~~~Spl~gprGL~vgdvitsldgcpV~~v~dW~ecl~tsl~~~ngycvsas 278 (484) T KOG2921 217 AHGEGVTVTEVPSVSPLFGPRGLSVGDVITSLDGCPVHKVSDWLECLATSLDKENGYCVSAS 278 (484) T ss_pred HCCCEEEEEECCCCCCCCCCCCCCCCCEEEECCCCCCCCHHHHHHHHHHHCCCCCCEEECHH T ss_conf 63850799944555777576567766557753785458888999999864466787432688 No 84 >COG3975 Predicted protease with the C-terminal PDZ domain [General function prediction only] Probab=95.91 E-value=0.027 Score=33.26 Aligned_cols=71 Identities=11% Similarity=0.083 Sum_probs=44.3 Q ss_pred CCCEEEEECCHHH--------CCEEEEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEE Q ss_conf 2546987289657--------15200799960688978982999888999889999389999999999886259956999 Q gi|254780700|r 393 LLGMVLQDINDGN--------KKLVRIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLL 464 (489) Q Consensus 393 ~lGl~v~~l~~~~--------~~~~gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL 464 (489) ..|+++.+...+. ...-...+..|.++|||+.+||.+||.|++|||.. ..+++ .+- +..+-+ T Consensus 438 ~~gL~~~~~~~~~~~LGl~v~~~~g~~~i~~V~~~gPA~~AGl~~Gd~ivai~G~s-~~l~~-------~~~--~d~i~v 507 (558) T COG3975 438 RFGLTFTPKPREAYYLGLKVKSEGGHEKITFVFPGGPAYKAGLSPGDKIVAINGIS-DQLDR-------YKV--NDKIQV 507 (558) T ss_pred HCCEEEEECCCCCCCCCEEECCCCCEEEEEECCCCCHHHHCCCCCCCEEEEECCCC-CCCCC-------CCC--CCCEEE T ss_conf 23348874688876543586056880699844789816751588756799976735-55221-------442--662489 Q ss_pred EEEECCCCC Q ss_conf 997177643 Q gi|254780700|r 465 QIKYDPDMQ 473 (489) Q Consensus 465 ~V~r~~~~~ 473 (489) .+.|.+... T Consensus 508 ~~~~~~~L~ 516 (558) T COG3975 508 HVFREGRLR 516 (558) T ss_pred EECCCCCEE T ss_conf 982578238 No 85 >pfam09342 DUF1986 Domain of unknown function (DUF1986). This domain is found in serine proteases and is predicted to contain disulphide bonds. 
Probab=95.88 E-value=0.052 Score=31.42 Aligned_cols=91 Identities=19% Similarity=0.262 Sum_probs=60.0 Q ss_pred CCCEEEEEECCCCEEEECHHCCCC----CCEEEEECCCCEEEEE------CCCCCC-----CCCCEEEEEEECCCCCC-- Q ss_conf 234027897599629851010478----7143796289806740------111233-----44432899960676676-- Q gi|254780700|r 108 LMFGSGFFITDDGYILTSNHIVED----GASFSVILSDDTELPA------KLVGTD-----ALFDLAVLKVQSDRKFI-- 170 (489) Q Consensus 108 ~~~GsG~ii~~~G~ilTn~hvv~~----a~~i~V~~~dg~~~~a------~vvg~D-----~~~DlAvlki~~~~~~~-- 170 (489) .-.-+|++||.. ||||..|...+ ...+.|.|..++.+.. .+.-.| |.+|++||+++.+..+. T Consensus 27 ~~~C~Gvlid~~-WiLva~~Cl~~i~l~~~Yisv~LGg~kt~~~i~sp~EQI~rVD~~~~vp~s~i~LLhL~~p~~ft~y 105 (267) T pfam09342 27 NYRCTGVLIDLS-WVLVSHSCLWDTSLEHSYISVVLGGHKTLKSVKGPYEQIYRVDCRKDLPRSKISLLHLKSPATFSNH 105 (267) T ss_pred EEEEEEEEECCC-EEEEEHHHHCCCCCCCCEEEEEECCCEEEECCCCCCEEEEEEEEEECCCCCCEEEEEECCCCCCCCC T ss_conf 389888997272-8998056557787776338999347403521469801799951042168511689997485532232 Q ss_pred --CCCCCC-CCCCCCCCEEEEECCCCCCCCCCC Q ss_conf --556556-731112414675236655311112 Q gi|254780700|r 171 --PVEFED-ANNIRVGEAVFTIGNPFRLRGTVS 200 (489) Q Consensus 171 --~~~lg~-s~~~~~G~~v~aiG~P~g~~~tvt 200 (489) |+-+-+ |+...-+...+|+|+- ..+..-| T Consensus 106 VlP~flp~tsn~~~~~~~CisVg~d-d~gr~kT 137 (267) T pfam09342 106 VLPTFVPSTRNHNEKNNKCVTVGQD-DTGRNKT 137 (267) T ss_pred EEEEECCCCCCCCCCCCEEEEEEEC-CCCCEEE T ss_conf 2102426757887789845898820-4686036 No 86 >KOG3605 consensus Probab=95.59 E-value=0.0022 Score=40.15 Aligned_cols=49 Identities=20% Similarity=0.290 Sum_probs=35.4 Q ss_pred CCCCHHHHCC-CCCCCEEEEECCEECCC--HHHHHHHHHHHHHCCCCEEEEEEE Q ss_conf 6889789829-99888999889999389--999999999886259956999997 Q gi|254780700|r 417 NREREVEAKG-IQKGMTIVSVNTHEVSC--IKDVERLIGKAKEKKRDSVLLQIK 467 (489) Q Consensus 417 ~~~s~Aa~~G-L~~GDiIl~VNg~~V~s--~~dl~~iL~~~k~~~~~~VLL~V~ 467 (489) ..+++|++.| |--||-|++|||...-. +..-..+++..|+. ..|.|.|. 
T Consensus 681 m~~GpAarsgkLnIGDQiiaING~SLVGLPLstcQs~Ik~~KnQ--T~VkltiV 732 (829) T KOG3605 681 MHGGPAARSGKLNIGDQIMSINGTSLVGLPLSTCQSIIKGLKNQ--TAVKLNIV 732 (829) T ss_pred CCCCHHHHCCCCCCCCEEEEECCCEECCCCHHHHHHHHHCCCCC--CEEEEEEE T ss_conf 36771654387663222576447211066079999998615554--05888776 No 87 >KOG3571 consensus Probab=95.07 E-value=0.094 Score=29.79 Aligned_cols=71 Identities=18% Similarity=0.408 Sum_probs=46.5 Q ss_pred CCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCC-CHHHHHHHHHCCCCCCCCCC------EEEEECCCCCCC Q ss_conf 43320003421667644176444411320111112113467-11678887524314787431------012220356675 Q gi|254780700|r 283 GWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAG-MKVGDVICMLDGRIIKSHQD------FVWQIASRSPKE 355 (489) Q Consensus 283 g~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AG-Lk~GDvI~~ing~~I~~~~~------l~~~i~~~~~G~ 355 (489) +|||+.+..-+.+ ....|++|..+.+++.-+.-| |.+||.|+.||.....++.. |++++....| T Consensus 261 nfLGisivgqsn~-------rgDggIYVgsImkgGAVA~DGRIe~GDMiLQVNevsFENmSNd~AVrvLREaV~~~gP-- 331 (626) T KOG3571 261 NFLGISIVGQSNA-------RGDGGIYVGSIMKGGAVALDGRIEPGDMILQVNEVSFENMSNDQAVRVLREAVSRPGP-- 331 (626) T ss_pred CCCEEEEECCCCC-------CCCCCEEEEEECCCCEEECCCCCCCCCEEEEEEECCHHHCCCHHHHHHHHHHHCCCCC-- T ss_conf 3220576234466-------7777458864136860311476575533787400123104764999999998636787-- Q ss_pred CEEEEECC Q ss_conf 20101204 Q gi|254780700|r 356 QVKISLCK 363 (489) Q Consensus 356 ~v~l~v~R 363 (489) ++|++-. 
T Consensus 332 -i~ltvAk 338 (626) T KOG3571 332 -IKLTVAK 338 (626) T ss_pred -EEEEEEE T ss_conf -3788860 No 88 >KOG3571 consensus Probab=94.99 E-value=0.0071 Score=36.94 Aligned_cols=76 Identities=17% Similarity=0.222 Sum_probs=46.7 Q ss_pred CCCCCEEEEECCHHHCCEEEEEEEEECCCCHHHHCC-CCCCCEEEEECCEECCCH--HHHHHHHHHHHHCCCCEEEEEEE Q ss_conf 452546987289657152007999606889789829-998889998899993899--99999999886259956999997 Q gi|254780700|r 391 KELLGMVLQDINDGNKKLVRIVALNPNREREVEAKG-IQKGMTIVSVNTHEVSCI--KDVERLIGKAKEKKRDSVLLQIK 467 (489) Q Consensus 391 ~~~lGl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~G-L~~GDiIl~VNg~~V~s~--~dl~~iL~~~k~~~~~~VLL~V~ 467 (489) .++||+++.-- .+.+..-+|++-++.+++..+.-| +.+||.|+.||....++. +|..++|.++.... .++.|.|- T Consensus 260 vnfLGisivgq-sn~rgDggIYVgsImkgGAVA~DGRIe~GDMiLQVNevsFENmSNd~AVrvLREaV~~~-gPi~ltvA 337 (626) T KOG3571 260 VNFLGISIVGQ-SNARGDGGIYVGSIMKGGAVALDGRIEPGDMILQVNEVSFENMSNDQAVRVLREAVSRP-GPIKLTVA 337 (626) T ss_pred CCCCEEEEECC-CCCCCCCCEEEEEECCCCEEECCCCCCCCCEEEEEEECCHHHCCCHHHHHHHHHHHCCC-CCEEEEEE T ss_conf 43220576234-46677774588641368603114765755337874001231047649999999986367-87378886 Q ss_pred E Q ss_conf 1 Q gi|254780700|r 468 Y 468 (489) Q Consensus 468 r 468 (489) . T Consensus 338 k 338 (626) T KOG3571 338 K 338 (626) T ss_pred E T ss_conf 0 No 89 >KOG0606 consensus Probab=94.92 E-value=0.051 Score=31.48 Aligned_cols=33 Identities=30% Similarity=0.503 Sum_probs=23.3 Q ss_pred EEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCC Q ss_conf 320111112113467116788875243147874 Q gi|254780700|r 309 LITAVVKESPADKAGMKVGDVICMLDGRIIKSH 341 (489) Q Consensus 309 lV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~ 341 (489) .|..|.++|||..||+++||.|+.+||+++... 
T Consensus 661 ~v~sv~egsPA~~agls~~DlIthvnge~v~gl 693 (1205) T KOG0606 661 SVGSVEEGSPAFEAGLSAGDLITHVNGEPVHGL 693 (1205) T ss_pred EEEEECCCCCCCCCCCCCCCEEEECCCCCCCHH T ss_conf 445423788733467772334674168543001 No 90 >KOG4371 consensus Probab=94.88 E-value=0.032 Score=32.80 Aligned_cols=140 Identities=16% Similarity=0.203 Sum_probs=67.2 Q ss_pred CCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCEEEECCCCCCCCC-CCCCCCCC--CCCCCCCCEEE Q ss_conf 6711678887524314787431012220356675201012047816651255655876-31000012--46545254698 Q gi|254780700|r 322 AGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSKHSVAVVLGSSPT-AKNDMHLE--VGDKELLGMVL 398 (489) Q Consensus 322 AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~~~~~V~l~~~p~-~~~~~~~~--~~~~~~lGl~v 398 (489) -.|+.||+++.+||..+..-.+....--...-|+.+.|.++|..-... ..-+..... ........ ......+|+.+ T Consensus 1185 pd~~~g~~l~~~n~i~~~~~~~~~~~~~~~~~~~~~~~~~~r~~~~~~-d~~~~s~~~~~~~l~~~~~~~~p~~~~~~~~ 1263 (1332) T KOG4371 1185 PDIRVGDVLLYVNGIAVEGKVHQEVVAMLRGGGDRVVLGVQRPPPAYS-DQHHASSTSASAPLISVMLLKKPMATLGLSL 1263 (1332) T ss_pred CCCCHHHHHHHCCCEEEECHHHHHHHHHHHCCCCEEEEEEECCCCCCC-CCHHHHHHCCCCHHHHHEEEECCCCCCCCCC T ss_conf 873200222113443442024699999875467537887305884224-4222221024430222024504533466653 Q ss_pred EECCHHHCCEEEEEEEEECCCCHHHHCC-CCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEE Q ss_conf 7289657152007999606889789829-998889998899993899999999998862599569999971 Q gi|254780700|r 399 QDINDGNKKLVRIVALNPNREREVEAKG-IQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKY 468 (489) Q Consensus 399 ~~l~~~~~~~~gi~vv~v~~~s~Aa~~G-L~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r 468 (489) ..-++. -++.+-.....+.|...| +|+||.+.+.+++++..+-- .++|++++--. 
.++.+.++| T Consensus 1264 ~~~~~s----~~~~~~~~~~~~~a~~~~~~r~g~~~~~~~~~~~~~~~p-~~~l~~~~~v~-~p~~~~~~~ 1328 (1332) T KOG4371 1264 AKRTMS----DGIFIRNIAQDSAASSEGTLRVGDRLVSLDGEPVDGFTP-ATILEKLKLVQ-GPVQITVTR 1328 (1332) T ss_pred CCCCCC----CCEEEECCCCCCCCCCCCCCCCCCEEECCCCCCCCCCCH-HHHHHHHHHCC-CCHHHEEHH T ss_conf 335767----863540111444445544210365533258866789874-89998742114-756210015 No 91 >COG0750 Predicted membrane-associated Zn-dependent proteases 1 [Cell envelope biogenesis, outer membrane] Probab=94.86 E-value=0.0042 Score=38.39 Aligned_cols=58 Identities=26% Similarity=0.436 Sum_probs=46.4 Q ss_pred ECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCC---EEEEECC-CCCEEE Q ss_conf 0111112113467116788875243147874310122203566752---0101204-781665 Q gi|254780700|r 311 TAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQ---VKISLCK-EGSKHS 369 (489) Q Consensus 311 ~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~---v~l~v~R-~g~~~~ 369 (489) ..+..+|||+.+|+++||.|+++|++++.++.+....+.. ..+.. +.+.+.| +++... T Consensus 134 ~~v~~~s~a~~a~l~~Gd~iv~~~~~~i~~~~~~~~~~~~-~~~~~~~~~~i~~~~~~~~~~~ 195 (375) T COG0750 134 GEVAPKSAAALAGLRPGDRIVAVDGEKVASWDDVRRLLVA-AAGDVFNLLTILVIRLDGEAHA 195 (375) T ss_pred CCCCCCCHHHHCCCCCCCEEEECCCEECCCHHHHHHHHHH-CCCCCCCEEEEEEEECCCEEEC T ss_conf 3445476788757888978995085204566677799875-3356555079999832654420 No 92 >pfam03761 DUF316 Domain of unknown function (DUF316). This family of proteins with unknown function are from Caenorhabditis elegans. The protein has GO references indicating the protein is a positive regulator of growth rate and is also involved in nematode larval development. 
Probab=94.66 E-value=0.31 Score=26.49 Aligned_cols=128 Identities=21% Similarity=0.308 Sum_probs=65.2 Q ss_pred CCCEEEEEECCCCEEEECHHCCCCCCE----------------------------EEEEC---CCCE------EEEECCC Q ss_conf 234027897599629851010478714----------------------------37962---8980------6740111 Q gi|254780700|r 108 LMFGSGFFITDDGYILTSNHIVEDGAS----------------------------FSVIL---SDDT------ELPAKLV 150 (489) Q Consensus 108 ~~~GsG~ii~~~G~ilTn~hvv~~a~~----------------------------i~V~~---~dg~------~~~a~vv 150 (489) ....+|++||+. ||||.+|++-.... +.+.. .+++ ..+|.++ T Consensus 68 ~~~~~gT~IS~R-HiLTss~~~l~~~~~~~~~~~~~~~C~g~~~~l~vP~~~l~~~~v~~~~~~~~~~~~~~~v~ka~il 146 (280) T pfam03761 68 NYKPPATFISTR-HILTSSRLFLNGKMLNWKNTGDNDTCSGGLGHLEVPPEVLDKFDIMDLSKKKGKNSFRDNITRAYVL 146 (280) T ss_pred CEEECEEEEECC-EEEEEEEEEEECCEECCCCCCCCCCCCCCCCCEECCHHHHHCEEEEECCCCCCCCCCCCCEEEEEEE T ss_conf 146220783020-0556623687344101245676755679972424888886166786025556776442551699999 Q ss_pred CCC----CCCCEE----EEEEECCC-CCCCCCCCC-CCCCCCCCEEEEECCCCCCCCCCCCCCCCCCCCCC--CCC--CC Q ss_conf 233----444328----99960676-676556556-73111241467523665531111258744311223--344--34 Q gi|254780700|r 151 GTD----ALFDLA----VLKVQSDR-KFIPVEFED-ANNIRVGEAVFTIGNPFRLRGTVSAGIVSALDRDI--PDR--PG 216 (489) Q Consensus 151 g~D----~~~DlA----vlki~~~~-~~~~~~lg~-s~~~~~G~~v~aiG~P~g~~~tvt~GiiSa~~R~~--~~~--~~ 216 (489) ..- ...+++ +++++.+- .-.++=|++ +...+.++.+..-|.+- ...+ . +|.+ .+. .. 
T Consensus 147 n~C~~~~~~~~~~~~pMIiEL~~~~~n~s~~Clad~~~~~~~~~~~~~yG~~~--~~~l----~---~~~~~i~~~~~~~ 217 (280) T pfam03761 147 NICANTKSKFDLSAKPMLVELEGPEPNISYPCLADESTSLEKGDAVDVYGIDS--SGEL----K---HRKLNIVNCYSND 217 (280) T ss_pred ECCCCCCCCCCCCCCCEEEEEECCCCCCCCCEECCCHHHCCCCCCEEEECCCC--CCCE----E---ECCCEEEECCCCH T ss_conf 41357766553346646999823534687632056412101476147742588--8733----6---5132044035634 Q ss_pred CEEEEEEEEECCCCCCEEE---ECCCEEEEEE Q ss_conf 2023323320134770354---0343035551 Q gi|254780700|r 217 TFTQIDAPINQGNSGGPCF---NALGHVIGVN 245 (489) Q Consensus 217 ~~iqtDa~InpGnSGGpl~---n~~G~viGin 245 (489) ..+.|+-..-.|.+||||+ |-+.-+||+- T Consensus 218 ~~v~~~~~~~~gd~GG~li~~~~gk~tviGi~ 249 (280) T pfam03761 218 LSIGTDQYLCKGDDGGPLIKNVSGKNTVIGFG 249 (280) T ss_pred HHEECCCCCCCCCCCCCEEEEECCCEEEEEEE T ss_conf 42005577445775761079888948999996 No 93 >KOG0606 consensus Probab=94.62 E-value=0.0057 Score=37.55 Aligned_cols=72 Identities=18% Similarity=0.195 Sum_probs=46.0 Q ss_pred CCCEEEEECCH--HHCCEE--EEEEEEECCCCHHHHCCCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEE Q ss_conf 25469872896--571520--079996068897898299988899988999938999999999988625995699999 Q gi|254780700|r 393 LLGMVLQDIND--GNKKLV--RIVALNPNREREVEAKGIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQI 466 (489) Q Consensus 393 ~lGl~v~~l~~--~~~~~~--gi~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V 466 (489) -+|++++.+.- ++..-+ .-.+..+..+|+|..+||+++|.|+.+||+++.... 
..++++-+-+++.+ +.+.+ T Consensus 638 ~yGft~~airVy~Gd~d~ytvhh~v~sv~egsPA~~agls~~DlIthvnge~v~gl~-H~ev~~Lll~~gn~-v~~~t 713 (1205) T KOG0606 638 KYGFTLRAIRVYMGDKDVYTVHHSVGSVEEGSPAFEAGLSAGDLITHVNGEPVHGLV-HTEVMELLLKSGNK-VTLRT 713 (1205) T ss_pred CCCCEEEEEEEECCCCCCCEEEEEEEEECCCCCCCCCCCCCCCEEEECCCCCCCHHH-HHHHHHHHHHCCCE-EEEEE T ss_conf 368303567884177542112224454237887334677723346741685430010-89999999714775-58985 No 94 >KOG3606 consensus Probab=94.58 E-value=0.0058 Score=37.53 Aligned_cols=65 Identities=23% Similarity=0.345 Sum_probs=49.8 Q ss_pred CCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCC-CHHHHHHHHHCCCCCCC Q ss_conf 00233333343320003421667644176444411320111112113467-11678887524314787 Q gi|254780700|r 274 LISKGRVDHGWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAG-MKVGDVICMLDGRIIKS 340 (489) Q Consensus 274 l~~~g~v~rg~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AG-Lk~GDvI~~ing~~I~~ 340 (489) |.++|.- + -||+++.+-+.--...-||.+..|++|++..||+-|+..| |-+.|.+++|||.+|.. T Consensus 164 L~khG~e-k-PLGFYIRDG~SVRVtp~GlekvpGIFISRlVpGGLAeSTGLLaVnDEVlEVNGIEVaG 229 (358) T KOG3606 164 LHKHGSE-K-PLGFYIRDGTSVRVTPHGLEKVPGIFISRLVPGGLAESTGLLAVNDEVLEVNGIEVAG 229 (358) T ss_pred HHHCCCC-C-CCEEEEECCCEEEECCCCCCCCCCEEEEEECCCCCCCCCCEEEECCEEEEECCEEECC T ss_conf 5333787-8-7457871684687545553226734788503775201344055324168875778415 No 95 >KOG3550 consensus Probab=94.41 E-value=0.0068 Score=37.05 Aligned_cols=60 Identities=22% Similarity=0.217 Sum_probs=38.8 Q ss_pred EEEEEEEEECCCCHHHHC-CCCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEEC Q ss_conf 200799960688978982-99988899988999938999999999988625995699999717 Q gi|254780700|r 408 LVRIVALNPNREREVEAK-GIQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYD 469 (489) Q Consensus 408 ~~gi~vv~v~~~s~Aa~~-GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~ 469 (489) ..-|++..+.+++-|.+- ||+.||-+++|||..|..- ..+++++-+|... 
.++-|.|++- T Consensus 114 nspiyisriipggvadrhgglkrgdqllsvngvsvege-~hekavellkaa~-gsvklvvryt 174 (207) T KOG3550 114 NSPIYISRIIPGGVADRHGGLKRGDQLLSVNGVSVEGE-HHEKAVELLKAAV-GSVKLVVRYT 174 (207) T ss_pred CCCEEEEEECCCCCCCCCCCCCCCCEEEEECCEEECCH-HHHHHHHHHHHHC-CCEEEEEECC T ss_conf 89647886247752001376445564676546420313-1699999999735-7678987607 No 96 >KOG3551 consensus Probab=94.34 E-value=0.032 Score=32.76 Aligned_cols=60 Identities=22% Similarity=0.346 Sum_probs=0.0 Q ss_pred EEEEEECCCCHHHHCC-CCCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCC Q ss_conf 7999606889789829-9988899988999938999999999988625995699999717764 Q gi|254780700|r 411 IVALNPNREREVEAKG-IQKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDM 472 (489) Q Consensus 411 i~vv~v~~~s~Aa~~G-L~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~ 472 (489) |++..+-++-.|...+ |..||.|++|||....+... .+++..+|..+++ |+|+|+.-++. T Consensus 112 IlISKIFkGlAADQt~aL~~gDaIlSVNG~dL~~AtH-deAVqaLKraGke-V~levKy~REv 172 (506) T KOG3551 112 ILISKIFKGLAADQTGALFLGDAILSVNGEDLRDATH-DEAVQALKRAGKE-VLLEVKYMREV 172 (506) T ss_pred EEHHHHHCCCCCCCCCCEEECCEEEEECCHHHHHCCH-HHHHHHHHHHCCE-EEEEEEHHHHC T ss_conf 5667751522203236623144799735523332026-9999999861755-32132101214 No 97 >KOG2921 consensus Probab=94.32 E-value=0.0065 Score=37.20 Aligned_cols=51 Identities=12% Similarity=0.087 Sum_probs=28.0 Q ss_pred HCCEEEEEEEEECCCCHHHH-CCCCCCCEEEEECCEECCCHHHHHHHHHHHH Q ss_conf 71520079996068897898-2999888999889999389999999999886 Q gi|254780700|r 405 NKKLVRIVALNPNREREVEA-KGIQKGMTIVSVNTHEVSCIKDVERLIGKAK 455 (489) Q Consensus 405 ~~~~~gi~vv~v~~~s~Aa~-~GL~~GDiIl~VNg~~V~s~~dl~~iL~~~k 455 (489) ...+.++.++++...||+.. 
.||.+||+|.++||-||++.+|+.+-+..++ T Consensus 216 ya~g~gV~Vtev~~~Spl~gprGL~vgdvitsldgcpV~~v~dW~ecl~tsl 267 (484) T KOG2921 216 YAHGEGVTVTEVPSVSPLFGPRGLSVGDVITSLDGCPVHKVSDWLECLATSL 267 (484) T ss_pred HHCCCEEEEEECCCCCCCCCCCCCCCCCEEEECCCCCCCCHHHHHHHHHHHC T ss_conf 6638507999445557775765677665577537854588889999998644 No 98 >TIGR00225 prc C-terminal processing peptidase; InterPro: IPR004447 Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes. They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence. Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases. Notwithstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry.
Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This group of serine peptidases belongs to MEROPS peptidase family S41 (clan SM), subfamily S41A (C-terminal processing peptidase). It is a family of C-terminal peptidases with different substrates in different species, including processing of D1 protein of the photosystem II reaction centre in higher plants, and cleavage of a peptide of 11 residues from the precursor form of penicillin-binding protein in Escherichia coli.; GO: 0008236 serine-type peptidase activity, 0006508 proteolysis. Probab=94.10 E-value=0.15 Score=28.52 Aligned_cols=12 Identities=8% Similarity=0.603 Sum_probs=5.3 Q ss_pred CCCCCEEEEECC Q ss_conf 112414675236 Q gi|254780700|r 180 IRVGEAVFTIGN 191 (489) Q Consensus 180 ~~~G~~v~aiG~ 191 (489) ++.||.++.++.
T Consensus 84 ~~~~d~~~~~~~ 95 (361) T TIGR00225 84 LKPGDKILKVNG 95 (361) T ss_pred CCCCCEEEEECC T ss_conf 666640686167 No 99 >KOG3651 consensus Probab=94.07 E-value=0.017 Score=34.60 Aligned_cols=20 Identities=20% Similarity=0.313 Sum_probs=10.4 Q ss_pred CCEEEEECCEECCCHHHHHH Q ss_conf 88999889999389999999 Q gi|254780700|r 430 GMTIVSVNTHEVSCIKDVER 449 (489) Q Consensus 430 GDiIl~VNg~~V~s~~dl~~ 449 (489) ||==.+.|++|..++.|... T Consensus 367 ~ddeie~n~~p~~~~~dv~~ 386 (429) T KOG3651 367 GDDEIELNDNPLEDLIDVND 386 (429) T ss_pred CCCHHHHCCCCCCCHHHHCC T ss_conf 77334324798677545225 No 100 >KOG1892 consensus Probab=94.07 E-value=0.13 Score=28.89 Aligned_cols=58 Identities=21% Similarity=0.313 Sum_probs=24.7 Q ss_pred CEEEECCCCCCCCCCCC-CHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCC Q ss_conf 11320111112113467-11678887524314787431012220356675201012047 Q gi|254780700|r 307 GSLITAVVKESPADKAG-MKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKE 364 (489) Q Consensus 307 GvlV~~V~~~sPA~~AG-Lk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~ 364 (489) |++|..|.+|+||+.-| |+.||.+++|||..+-...+-+.+-...+.|..|.|+|-.. T Consensus 961 GIYvKsVV~GgaAd~DGRL~aGDQLLsVdG~SLiGisQErAA~lmtrtg~vV~leVaKq 1019 (1629) T KOG1892 961 GIYVKSVVEGGAADHDGRLEAGDQLLSVDGHSLIGISQERAARLMTRTGNVVHLEVAKQ 1019 (1629) T ss_pred CEEEEEECCCCCCCCCCCCCCCCEEEEECCCCCCCCCHHHHHHHHHCCCCEEEEEHHHH T ss_conf 14798731587545556401576366455820114258889998742487578751323 No 101 >KOG3552 consensus Probab=94.05 E-value=0.14 Score=28.70 Aligned_cols=54 Identities=20% Similarity=0.178 Sum_probs=0.0 Q ss_pred EEEEEECCCCHHHHCCCCCCCEEEEECCEECC--CHHHHHHHHHHHHHCCCCEEEEEEEEC Q ss_conf 79996068897898299988899988999938--999999999988625995699999717 Q gi|254780700|r 411 IVALNPNREREVEAKGIQKGMTIVSVNTHEVS--CIKDVERLIGKAKEKKRDSVLLQIKYD 469 (489) Q Consensus 411 i~vv~v~~~s~Aa~~GL~~GDiIl~VNg~~V~--s~~dl~~iL~~~k~~~~~~VLL~V~r~ 469 (489) ++|..|.+++++.-+ |.+||-|+.|||++|+ .++.+.+++...+.. 
|+|+|-++ T Consensus 77 viVr~VT~GGps~GK-L~PGDQIl~vN~Epv~daprervIdlvRace~s----v~ltV~qP 132 (1298) T KOG3552 77 VIVRFVTEGGPSIGK-LQPGDQILAVNGEPVKDAPRERVIDLVRACESS----VNLTVCQP 132 (1298) T ss_pred EEEEEECCCCCCCCC-CCCCCEEEEECCCCCCCCCHHHHHHHHHHHHHH----CCEEEECC T ss_conf 699984689876563-367774787468632114388999999987641----03488604 No 102 >KOG3550 consensus Probab=93.96 E-value=0.22 Score=27.42 Aligned_cols=58 Identities=17% Similarity=0.240 Sum_probs=40.8 Q ss_pred CCCEEEECCCCCCCCCCCC-CHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEEC Q ss_conf 4411320111112113467-116788875243147874310122203566752010120 Q gi|254780700|r 305 TKGSLITAVVKESPADKAG-MKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLC 362 (489) Q Consensus 305 ~~GvlV~~V~~~sPA~~AG-Lk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~ 362 (489) ..-++|+++.|++-|++-| ||.||.++++||..+..-.+-...-..+..-..|+|.+. T Consensus 114 nspiyisriipggvadrhgglkrgdqllsvngvsvege~hekavellkaa~gsvklvvr 172 (207) T KOG3550 114 NSPIYISRIIPGGVADRHGGLKRGDQLLSVNGVSVEGEHHEKAVELLKAAVGSVKLVVR 172 (207) T ss_pred CCCEEEEEECCCCCCCCCCCCCCCCEEEEECCEEECCHHHHHHHHHHHHHCCCEEEEEE T ss_conf 89647886247752001376445564676546420313169999999973576789876 No 103 >KOG0609 consensus Probab=93.84 E-value=0.28 Score=26.77 Aligned_cols=12 Identities=17% Similarity=0.412 Sum_probs=6.0 Q ss_pred HHHHHHHHHHHH Q ss_conf 999999999886 Q gi|254780700|r 444 IKDVERLIGKAK 455 (489) Q Consensus 444 ~~dl~~iL~~~k 455 (489) -.+|+++++++. 
T Consensus 485 d~~Lq~i~~eS~ 496 (542) T KOG0609 485 DEDLQEIIDESA 496 (542) T ss_pred HHHHHHHHHHHH T ss_conf 799999999999 No 104 >KOG3606 consensus Probab=93.53 E-value=0.3 Score=26.60 Aligned_cols=70 Identities=17% Similarity=0.243 Sum_probs=44.8 Q ss_pred CEEEEECCHHHCCEEEEEEEEECCCCHHHHCC-CCCCCEEEEECCEECC--CHHHHHHHHHHHHHCCCCEEEEEEEE Q ss_conf 46987289657152007999606889789829-9988899988999938--99999999998862599569999971 Q gi|254780700|r 395 GMVLQDINDGNKKLVRIVALNPNREREVEAKG-IQKGMTIVSVNTHEVS--CIKDVERLIGKAKEKKRDSVLLQIKY 468 (489) Q Consensus 395 Gl~v~~l~~~~~~~~gi~vv~v~~~s~Aa~~G-L~~GDiIl~VNg~~V~--s~~dl~~iL~~~k~~~~~~VLL~V~r 468 (489) |..++-......+.-||.+....+++-|+..| |.++|.+++|||.+|. +++++...+-. +. ..+.+.|+. T Consensus 180 G~SVRVtp~GlekvpGIFISRlVpGGLAeSTGLLaVnDEVlEVNGIEVaGKTLDQVTDMMvA---Ns-hNLIiTVkP 252 (358) T KOG3606 180 GTSVRVTPHGLEKVPGIFISRLVPGGLAESTGLLAVNDEVLEVNGIEVAGKTLDQVTDMMVA---NS-HNLIITVKP 252 (358) T ss_pred CCEEEECCCCCCCCCCEEEEEECCCCCCCCCCEEEECCEEEEECCEEECCCCHHHHHHHHHH---CC-CCEEEEECC T ss_conf 84687545553226734788503775201344055324168875778415238887888763---44-643899614 No 105 >pfam02122 Peptidase_S39 Peptidase S39. This family contains polyprotein processing endopeptidases from RNA viruses. Probab=92.81 E-value=0.41 Score=25.74 Aligned_cols=116 Identities=26% Similarity=0.397 Sum_probs=71.5 Q ss_pred EEEECHHCCCCCCEEEEECCCCEEEE---ECCCCCCCCCCEEEEEEEC----CCCCCCCCCCCCCCCCCCCEEE-EECCC Q ss_conf 29851010478714379628980674---0111233444328999606----7667655655673111241467-52366 Q gi|254780700|r 121 YILTSNHIVEDGASFSVILSDDTELP---AKLVGTDALFDLAVLKVQS----DRKFIPVEFEDANNIRVGEAVF-TIGNP 192 (489) Q Consensus 121 ~ilTn~hvv~~a~~i~V~~~dg~~~~---a~vvg~D~~~DlAvlki~~----~~~~~~~~lg~s~~~~~G~~v~-aiG~P 192 (489) -++|++||+.++..+... .+|...+ -+.+..+...|+.+|+... ......+.|..++.+..|..-+ .... 
T Consensus 43 ~lvt~~h~~~~~~~~~s~-~tg~kipl~eF~~l~~~~~~D~~il~gppnWes~lgck~~~~~t~~~l~~g~a~~~~~~~- 120 (203) T pfam02122 43 ALVTAEHVLSDPSLVLSL-RTGEKIPLAEFKVLLESNLADILILVGPPNWESILGCKAVHFTTADQLAKGPASFYTLRK- 120 (203) T ss_pred EEEEEEEECCCCCEEEEE-CCCCEEEHHHCEEEEECCCCCEEEEECCCCCHHHCCCCCCCEEEHHHHCCCCEEEEEECC- T ss_conf 478878862445168970-258610257745552226765788716996032114242130757886068626999739- Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCC Q ss_conf 55311112587443112233443420233233201347703540343035551234 Q gi|254780700|r 193 FRLRGTVSAGIVSALDRDIPDRPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMI 248 (489) Q Consensus 193 ~g~~~tvt~GiiSa~~R~~~~~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i 248 (489) + +...+ || .+...++.|.+.=.--.||-||-|.||.+ .++|+.+.- T Consensus 121 -~-~W~~~----~A---ki~g~~~~~~~VlSnT~~G~SG~pyf~gk-~~vGvH~G~ 166 (203) T pfam02122 121 -D-EWPSS----SA---KIPGSEGKFASVLSNTSPGHSGTPYFSGK-NVVGVHKGS 166 (203) T ss_pred -C-EEEEC----CC---EEECCCCCEEEEECCCCCCCCCCCCCCCC-EEEEEEECC T ss_conf -8-48841----33---88056785688873899988888411583-689997066 No 106 >pfam11874 DUF3394 Domain of unknown function (DUF3394). This domain is functionally uncharacterized. This domain is found in bacteria. This presumed domain is about 190 amino acids in length. This domain is found associated with pfam06808. Probab=92.59 E-value=0.27 Score=26.82 Aligned_cols=82 Identities=20% Similarity=0.227 Sum_probs=50.9 Q ss_pred CEEEEECCCCCCCCEEEEECCC---CCEEEECC--CCCCCCCCCCCCCCCCCCCCCCCEEEEECCHHHCCEEEEEEEEEC Q ss_conf 1012220356675201012047---81665125--565587631000012465452546987289657152007999606 Q gi|254780700|r 343 DFVWQIASRSPKEQVKISLCKE---GSKHSVAV--VLGSSPTAKNDMHLEVGDKELLGMVLQDINDGNKKLVRIVALNPN 417 (489) Q Consensus 343 ~l~~~i~~~~~G~~v~l~v~R~---g~~~~~~V--~l~~~p~~~~~~~~~~~~~~~lGl~v~~l~~~~~~~~gi~vv~v~ 417 (489) ++...+....+|+.+.++|.|. |+..+..+ .+++..+ .....+..|+.+.+- .-.+.+-.+. 
T Consensus 64 ~~~~~~~~~~~~~~lri~v~g~~~~G~~~~~~~~~~i~~~~~-------~~~rl~~~Gl~l~~e------~~~~~vd~~~ 130 (183) T pfam11874 64 ELVQAAEALPAGEELRLRVEGEDLEGDPVEKTVLLPLGEGAD-------GEERLEDAGLTLREE------GGKVIVDEVE 130 (183) T ss_pred HHHHHHHCCCCCCEEEEEEEEECCCCCEEEEEEEEEECCCCC-------HHHHHHHHCCEEEEC------CCCEEEEECC T ss_conf 999999718999869999982178897378999999678984-------664399809779805------9928999548 Q ss_pred CCCHHHHCCCCCCCEEEEEC Q ss_conf 88978982999888999889 Q gi|254780700|r 418 REREVEAKGIQKGMTIVSVN 437 (489) Q Consensus 418 ~~s~Aa~~GL~~GDiIl~VN 437 (489) .+|+|+++|+.-||.|.++- T Consensus 131 f~s~Aek~G~d~d~~I~~v~ 150 (183) T pfam11874 131 FGSPAEKAGIDFDWEIVEVE 150 (183) T ss_pred CCCHHHHHCCCCCCEEEEEE T ss_conf 89868882687786899998 No 107 >KOG3549 consensus Probab=92.51 E-value=0.016 Score=34.62 Aligned_cols=13 Identities=31% Similarity=0.309 Sum_probs=6.7 Q ss_pred CCCEEEECCCCCC Q ss_conf 7816651255655 Q gi|254780700|r 364 EGSKHSVAVVLGS 376 (489) Q Consensus 364 ~g~~~~~~V~l~~ 376 (489) .|+..-++|.++. T Consensus 361 ~ge~~yfsVEl~s 373 (505) T KOG3549 361 GGEPRYFSVELRS 373 (505) T ss_pred CCCCEEEEEEHHH T ss_conf 9984379873555 No 108 >COG5233 GRH1 Peripheral Golgi membrane protein [Intracellular trafficking and secretion] Probab=92.36 E-value=0.15 Score=28.51 Aligned_cols=13 Identities=15% Similarity=0.309 Sum_probs=7.3 Q ss_pred CCCCCCCCCCCHH Q ss_conf 1112113467116 Q gi|254780700|r 314 VKESPADKAGMKV 326 (489) Q Consensus 314 ~~~sPA~~AGLk~ 326 (489) .+++|++.|+|-+ T Consensus 195 I~d~p~a~a~l~P 207 (417) T COG5233 195 IQDKPPAYALLSP 207 (417) T ss_pred CCCCCHHHCCCCC T ss_conf 3788504304687 No 109 >KOG0609 consensus Probab=91.06 E-value=0.063 Score=30.92 Aligned_cols=21 Identities=19% Similarity=0.300 Sum_probs=10.8 Q ss_pred ECCEECCCHHHHHHHHHHHHH Q ss_conf 899993899999999998862 Q gi|254780700|r 436 VNTHEVSCIKDVERLIGKAKE 456 (489) Q Consensus 436 VNg~~V~s~~dl~~iL~~~k~ 456 (489) ||..-=.+.++|..++.++.. 
T Consensus 511 vN~dld~t~~eL~~~iekl~t 531 (542) T KOG0609 511 VNSDLDKTFRELKTAIEKLRT 531 (542) T ss_pred ECCCHHHHHHHHHHHHHHHCC T ss_conf 747478899999999997404 No 110 >pfam00949 Peptidase_S7 Peptidase S7, Flavivirus NS3 serine protease. The viral genome is a positive strand RNA that encodes a single polyprotein precursor. Processing of the polyprotein precursor into mature proteins is carried out by the host signal peptidase and by NS3 serine protease, which requires NS2B (pfam01002) as a cofactor. Probab=89.71 E-value=0.18 Score=28.00 Aligned_cols=21 Identities=38% Similarity=0.819 Sum_probs=18.9 Q ss_pred EEECCCCCCEEEECCCEEEEE Q ss_conf 320134770354034303555 Q gi|254780700|r 224 PINQGNSGGPCFNALGHVIGV 244 (489) Q Consensus 224 ~InpGnSGGpl~n~~G~viGi 244 (489) -.-+|.||.|+||.+|+++|| T Consensus 109 d~p~GSSGSpI~N~~g~ivGl 129 (150) T pfam00949 109 DFPGGSSGSPIFNQNGQIVGL 129 (150) T ss_pred CCCCCCCCCCEECCCCCEEEE T ss_conf 679999998658689979999 No 111 >KOG3834 consensus Probab=88.10 E-value=0.059 Score=31.09 Aligned_cols=17 Identities=24% Similarity=0.243 Sum_probs=9.3 Q ss_pred HHHHHHHHCCCEEEEEEE Q ss_conf 889999848950899999 Q gi|254780700|r 40 LPPVIARVSPSIVSVMVE 57 (489) Q Consensus 40 ~~~~~~~~~paVV~i~~~ 57 (489) +..+++..... |.+.+. T Consensus 55 Lk~llk~~sek-Vkltv~ 71 (462) T KOG3834 55 LKALLKANSEK-VKLTVY 71 (462) T ss_pred HHHHHHHCCCC-EEEEEE T ss_conf 99988742412-179988 No 112 >pfam02907 Peptidase_S29 Hepatitis C virus NS3 protease. Hepatitis C virus NS3 protein is a serine protease which has a trypsin-like fold. The non-structural (NS) protein NS3 is one of the NS proteins involved in replication of the HCV genome. NS2-3 proteinase, a zinc-dependent enzyme, performs a single proteolytic cut to release the N-terminus of NS3. The action of NS3 proteinase (NS3P), which resides in the N-terminal one-third of the NS3 protein, then yields all remaining non-structural proteins. The C-terminal two-thirds of the NS3 protein contain a helicase. 
The functional relationship between the proteinase and helicase domains is unknown. NS3 has a structural zinc-binding site and requires cofactor NS4A. Probab=87.67 E-value=0.32 Score=26.37 Aligned_cols=124 Identities=21% Similarity=0.281 Sum_probs=64.9 Q ss_pred CCEEEECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE-CCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCC Q ss_conf 9629851010478714379628980674011123344432899960-676676556556731112414675236655311 Q gi|254780700|r 119 DGYILTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQ-SDRKFIPVEFEDANNIRVGEAVFTIGNPFRLRG 197 (489) Q Consensus 119 ~G~ilTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~-~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g~~~ 197 (489) +|..-|-||=... =+ ++ |..=+..-...+..-|+..-... +...|.+-+-|.++-- + |-+- T Consensus 20 nGVlwT~yHGags---rt--lA-gp~Gpv~~my~~~~~Dl~g~p~P~Ga~sL~pCtCg~sdly-----l--Vtr~----- 81 (149) T pfam02907 20 NGVLWTVYHGAGS---RT--LA-GPKGPVNQMYTSADDDLVGYPLPPGAGSLTPCTCGSTDLY-----L--VTRD----- 81 (149) T ss_pred CCEEEEEEECCCC---CC--CC-CCCCCCCEEEECCCCCEECCCCCCCCCCCCCCCCCCCCEE-----E--EECC----- T ss_conf 6689998726787---14--13-7888601436625567711228998874362023685179-----9--9545----- Q ss_pred CCCCCCCCCCCCCCCCCCCCEEE-EEEEEECCCCCCEEEECCCEEEEEECCCCCCCCCCCCCCCCCCCCC Q ss_conf 11258744311223344342023-3233201347703540343035551234455322222232112332 Q gi|254780700|r 198 TVSAGIVSALDRDIPDRPGTFTQ-IDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQFHMGVGLIIPLSI 266 (489) Q Consensus 198 tvt~GiiSa~~R~~~~~~~~~iq-tDa~InpGnSGGpl~n~~G~viGint~i~~~~g~~~GigfaIP~~~ 266 (489) .-+|-+ |..++.+..++. .-.+--.|.||||++=-.|++|||-.+..-..|--..+-| +|.+. 
T Consensus 82 ---~dvip~--rr~gd~~~~L~~p~pis~~kGSSGgPiLC~~GH~VGmf~aavct~gvakai~f-~P~e~ 145 (149) T pfam02907 82 ---GDLIPG--RRRGDPRVSLLSPRPLSDLKGSSGGPILCPSGHVVGMFRAAVCSGGVVKAVRF-VPVET 145 (149) T ss_pred ---CCEEEE--EECCCCEEEEECCCCEEECCCCCCCCEECCCCCEEEEEEEEEECCCCEEEEEE-EECCC T ss_conf ---747667--63489407871465200013788995626898557779798871772667877-77012 No 113 >KOG3938 consensus Probab=86.27 E-value=0.41 Score=25.71 Aligned_cols=56 Identities=20% Similarity=0.310 Sum_probs=42.7 Q ss_pred EEEECCCCCCCCCCCC-CHHHHHHHHHCCCCCCCCCCEE--EEECCCCCCCCEEEEECC Q ss_conf 1320111112113467-1167888752431478743101--222035667520101204 Q gi|254780700|r 308 SLITAVVKESPADKAG-MKVGDVICMLDGRIIKSHQDFV--WQIASRSPKEQVKISLCK 363 (489) Q Consensus 308 vlV~~V~~~sPA~~AG-Lk~GDvI~~ing~~I~~~~~l~--~~i~~~~~G~~v~l~v~R 363 (489) ++|..+.++|--++-- +.+||.|-+|||+.|-.++++. +.|...+-|++.+|.+.- T Consensus 151 AFIKrIkegsvidri~~i~VGd~IEaiNge~ivG~RHYeVArmLKel~rge~ftlrLie 209 (334) T KOG3938 151 AFIKRIKEGSVIDRIEAICVGDHIEAINGESIVGKRHYEVARMLKELPRGETFTLRLIE 209 (334) T ss_pred EEEEEECCCCHHHHHHHEEHHHHHHHHCCCCCCCHHHHHHHHHHHHCCCCCEEEEEEEC T ss_conf 44576158742103120007767876168610134389999999855468805899614 No 114 >pfam03510 Peptidase_C24 2C endopeptidase (C24) cysteine protease family. Probab=85.66 E-value=1.4 Score=22.27 Aligned_cols=58 Identities=9% Similarity=0.157 Sum_probs=41.5 Q ss_pred EEEECCCCEEEECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCC Q ss_conf 789759962985101047871437962898067401112334443289996067667655655673111 Q gi|254780700|r 113 GFFITDDGYILTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIR 181 (489) Q Consensus 113 G~ii~~~G~ilTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~ 181 (489) ++-| .+|..+|+.||...++.+ +|.++ +-.+...|+|++|... 
..++++++|++..++ T Consensus 3 avHI-GnG~~vt~tHva~~~~~v-----~g~~f----~~~~~~ge~~~v~~~~-~~~~~~~vg~g~Pv~ 60 (105) T pfam03510 3 AVHI-GNGVYISVTHVASGSDRV-----LGSEF----KDCKTNGETCLVRGPA-ILLPAVQIGSGKPVC 60 (105) T ss_pred EEEE-CCCEEEEEEEEEECCCEE-----CCCCC----EEEECCCCEEEEECCC-CCCCCCEECCCCCEE T ss_conf 2896-796899999884027437-----48586----8885178779997778-899730805799777 No 115 >PRK08927 fliI flagellum-specific ATP synthase; Validated Probab=84.29 E-value=2.7 Score=20.47 Aligned_cols=69 Identities=22% Similarity=0.294 Sum_probs=41.2 Q ss_pred EEEEECCCCEEEECHHCCC---CCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEE Q ss_conf 2789759962985101047---8714379628980674011123344432899960676676556556731112414675 Q gi|254780700|r 112 SGFFITDDGYILTSNHIVE---DGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFT 188 (489) Q Consensus 112 sG~ii~~~G~ilTn~hvv~---~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~a 188 (489) +|-|..-.|.+++...... -.+...|...||+...|+|++.+ -|.+++.. |+..+-+..|+.|.. T Consensus 18 ~GrV~~V~G~~vev~g~~~~~~iG~~c~I~~~~g~~i~aEVvgf~--~~~~~l~~----------~~~t~Gi~~G~~V~~ 85 (441) T PRK08927 18 YGRVVGVRGLLVEVAGPIHAMSVGARIVVETGDGREIPCEVIGFR--GDRALLMP----------FGPLEGVRRGCRAVI 85 (441) T ss_pred EEEEEEEECEEEEEEECCCCCCCCCEEEEECCCCCEEEEEEEEEC--CCEEEEEE----------CCCCCCCCCCCEEEE T ss_conf 789999977089998057777758889999089988999999885--98799998----------888778899999998 Q ss_pred ECCC Q ss_conf 2366 Q gi|254780700|r 189 IGNP 192 (489) Q Consensus 189 iG~P 192 (489) .|.| T Consensus 86 tg~~ 89 (441) T PRK08927 86 ANAA 89 (441) T ss_pred CCCC T ss_conf 9999 No 116 >pfam11874 DUF3394 Domain of unknown function (DUF3394). This domain is functionally uncharacterized. This domain is found in bacteria. This presumed domain is about 190 amino acids in length. This domain is found associated with pfam06808. 
Probab=83.59 E-value=0.21 Score=27.55 Aligned_cols=30 Identities=27% Similarity=0.320 Sum_probs=25.1 Q ss_pred CCCEEEECCCCCCCCCCCCCHHHHHHHHHC Q ss_conf 441132011111211346711678887524 Q gi|254780700|r 305 TKGSLITAVVKESPADKAGMKVGDVICMLD 334 (489) Q Consensus 305 ~~GvlV~~V~~~sPA~~AGLk~GDvI~~in 334 (489) ...++|..+..+|||+++|+.-||+|+++- T Consensus 121 ~~~~~vd~~~f~s~Aek~G~d~d~~I~~v~ 150 (183) T pfam11874 121 GGKVIVDEVEFGSPAEKAGIDFDWEIVEVE 150 (183) T ss_pred CCCEEEEECCCCCHHHHHCCCCCCEEEEEE T ss_conf 992899954889868882687786899998 No 117 >TIGR03496 FliI_clade1 flagellar protein export ATPase FliI. Members of this protein family are the FliI protein of bacterial flagellum systems. This protein acts to drive protein export for flagellar biosynthesis. The most closely related family is the YscN family of bacterial type III secretion systems. This model represents one (of three) segment of the FliI family tree. These have been modeled separately in order to exclude the type III secretion ATPases more effectively. Probab=81.96 E-value=3.3 Score=19.91 Aligned_cols=68 Identities=24% Similarity=0.462 Sum_probs=40.2 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCCC--CCCCCCCCCCCCCC Q ss_conf 7143796289806740111233444328999606766765565567311124146752366553--11112587443112 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFRL--RGTVSAGIVSALDR 209 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g~--~~tvt~GiiSa~~R 209 (489) .+...|.-.||+...|+|++.+... + ....|++.+.++.|+.|...|.|+.. 
+..+---++.+++| T Consensus 21 Ge~c~I~~~~g~~i~aEVVgf~~~~--v----------~l~~~~~~~Gi~~G~~V~~tg~~~~v~vg~~lLGRVid~lG~ 88 (411) T TIGR03496 21 GSRCEIESADGDPIEAEVVGFSGDR--V----------LLMPLEDVEGLRPGARVFPLEGPLRLPVGDSLLGRVIDGLGR 88 (411) T ss_pred CCEEEEEECCCCEEEEEEEEECCCE--E----------EEEECCCCCCCCCCCEEEECCCCCEEECCHHHCCCEECCCCC T ss_conf 8889999389978999999972997--9----------999866887888899999789966676387653788578887 Q ss_pred CC Q ss_conf 23 Q gi|254780700|r 210 DI 211 (489) Q Consensus 210 ~~ 211 (489) .+ T Consensus 89 Pl 90 (411) T TIGR03496 89 PL 90 (411) T ss_pred CC T ss_conf 65 No 118 >cd01727 LSm8 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. LSm8 is one of at least seven subunits that assemble onto U6 snRNA to form a seven-membered ring structure. Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=81.65 E-value=3.1 Score=20.08 Aligned_cols=56 Identities=23% Similarity=0.259 Sum_probs=37.3 Q ss_pred CEEEEECCCCEEEEECCCCCCCCCCEEEEEEEC-----CCCCCCCCCCCCCCCCCCCEEEEEC Q ss_conf 143796289806740111233444328999606-----7667655655673111241467523 Q gi|254780700|r 133 ASFSVILSDDTELPAKLVGTDALFDLAVLKVQS-----DRKFIPVEFEDANNIRVGEAVFTIG 190 (489) Q Consensus 133 ~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~-----~~~~~~~~lg~s~~~~~G~~v~aiG 190 (489) .++.|.+.|||.+-+.+.|+|+.+.+-+-.... 
+......++| --+--||.|..+| T Consensus 10 k~V~Vi~~DGR~~vG~L~gfDq~~NlvL~~~~Er~~~~~~~~e~~~lG--l~iiRGdnvvlig 70 (74) T cd01727 10 KTVSVITVDGRVIVGTLKGFDQATNLILDDSHERVYSSDEGVEQVVLG--LYIIRGDNIAVVG 70 (74) T ss_pred CEEEEEECCCCEEEEEEEECCCCCEEEEEEEEEEEEECCCCCEEEEEE--EEEEECCCEEEEE T ss_conf 789999858959999998426732498642599998089984079877--9999668199996 No 119 >pfam00548 Peptidase_C3 3C cysteine protease (picornain 3C). Picornaviral proteins are expressed as a single polyprotein which is cleaved by the viral 3C cysteine protease. Probab=81.19 E-value=3.6 Score=19.69 Aligned_cols=122 Identities=20% Similarity=0.328 Sum_probs=62.2 Q ss_pred EEEECHHCCCCCCEEEEECCCCEEEEE----CCC-CCCCCCCEEEEEEECCCCCCCCCCCCCCCC-CCCCEEEEECCCCC Q ss_conf 298510104787143796289806740----111-233444328999606766765565567311-12414675236655 Q gi|254780700|r 121 YILTSNHIVEDGASFSVILSDDTELPA----KLV-GTDALFDLAVLKVQSDRKFIPVEFEDANNI-RVGEAVFTIGNPFR 194 (489) Q Consensus 121 ~ilTn~hvv~~a~~i~V~~~dg~~~~a----~vv-g~D~~~DlAvlki~~~~~~~~~~lg~s~~~-~~G~~v~aiG~P~g 194 (489) +.|-+.| ....++|.+ ||+.++. +++ ..+..+|++++|++..+.+.=+.-==.+.+ +..+.+++|-+.- T Consensus 36 ~~V~p~H-a~~~~~i~~---~g~~~~v~d~~~lv~~~g~~lelt~v~l~rnekFRDIr~~i~~~~~~~~~~~l~i~~~~- 110 (170) T pfam00548 36 VLVLPRH-ANPGDTIVL---DGKLVKVLDSYELVDRFGVNLELTLVKLKRNEKFRDIRKYLPEDIKKGNEAVLLINNSE- 110 (170) T ss_pred EEEEECC-CCCCCEEEE---CCEEEEEEEEEEEECCCCCEEEEEEEECCCCCCHHHHHHHCCCCCCCCCCCEEEEECCC- T ss_conf 9998668-899988999---99997740359977689987878999938986310045542546777885199997389- Q ss_pred CCCC-CCCCCCCCCCCCCCCC--CCCEEEEEEEEECCCCCCEEEEC-CCEEEEEECC Q ss_conf 3111-1258744311223344--34202332332013477035403-4303555123 Q gi|254780700|r 195 LRGT-VSAGIVSALDRDIPDR--PGTFTQIDAPINQGNSGGPCFNA-LGHVIGVNAM 247 (489) Q Consensus 195 ~~~t-vt~GiiSa~~R~~~~~--~~~~iqtDa~InpGnSGGpl~n~-~G~viGint~ 247 (489) +... +..|-++..+.-..++ ....+-=+++--.|.-||+|+.. 
.|+++||.++ T Consensus 111 ~~~~i~~vg~v~~~g~i~lsG~~t~r~l~Y~~pTk~G~CGgvl~~~~~gkIlGiHvg 167 (170) T pfam00548 111 FGRLIIPVGFVTYYGFITLSGTPTHRTLSYNAPTKAGQCGGVVIANGTGKILGIHVA 167 (170) T ss_pred CCCEEEECCCCEECCEEECCCCEECCEEEECCCCCCCCCCCEEEECCCCEEEEEEEC T ss_conf 876799742203245670799640335886578989712678997799719999958 No 120 >PRK00737 small nuclear ribonucleoprotein; Provisional Probab=79.83 E-value=1.8 Score=21.66 Aligned_cols=33 Identities=21% Similarity=0.322 Sum_probs=28.7 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 714379628980674011123344432899960 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ...++|+|.+|++|.+++.++|...++.+=..+ T Consensus 14 ~k~V~V~Lk~gr~~~G~L~~~D~~mNlVL~da~ 46 (72) T PRK00737 14 NSPVLVRLKGGREFRGELQGYDIHMNLVLANAE 46 (72) T ss_pred CCEEEEEECCCCEEEEEEEEECCCCCEEECCEE T ss_conf 984999998998999999998531117982559 No 121 >pfam08605 Rad9_Rad53_bind Fungal Rad9-like Rad53-binding. In Saccharomyces cerevisiae the Rad9 a key adaptor protein in DNA damage checkpoint pathways. DNA damage induces Rad9 phosphorylation, and Rad53 specifically associates with this region of Rad9, when phosphorylated, via Rad53 pfam00498 domains. This region is structurally composed of a pair of TUDOR domains. Probab=78.79 E-value=3.1 Score=20.15 Aligned_cols=65 Identities=18% Similarity=0.175 Sum_probs=47.0 Q ss_pred EEECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCC--CCCCCCCEEEEEC Q ss_conf 98510104787143796289806740111233444328999606766765565567--3111241467523 Q gi|254780700|r 122 ILTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDA--NNIRVGEAVFTIG 190 (489) Q Consensus 122 ilTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s--~~~~~G~~v~aiG 190 (489) .||..+++. -+.+|.. ++-+.|+|++++.++..+-.+++.+... 
..++.++- =++|+||.|-.=+ T Consensus 4 ~Lt~~dI~~-~~aVW~~-y~~kyYPg~~v~~~~~~~~~~V~Fedg~--~ev~~~dv~~LdLRIGD~Vkvd~ 70 (131) T pfam08605 4 TLTKKDIIF-MDAVWYY-YNLKFYPGKILSKGTSQDGSIVEFEEGT--YEVKNGDLYYLDLRIGDAVKCDM 70 (131) T ss_pred CCCHHHCCC-CCCEEEE-ECCEECCEEEEEECCCCCEEEEEECCCC--EEECCCCEEEEEEECCCEEEECC T ss_conf 055756136-6536889-7353611899987478870799984476--17652513478651189998799 No 122 >PRK13528 outer membrane receptor FepA; Provisional Probab=78.43 E-value=1.8 Score=21.57 Aligned_cols=26 Identities=15% Similarity=0.183 Sum_probs=21.1 Q ss_pred CCHHHHHHHHHHHHHHHHHHHHHHHH Q ss_conf 93027899999999999999975321 Q gi|254780700|r 1 MFKRQILSVKSICTVALTCVIFSSTY 26 (489) Q Consensus 1 m~~r~~~~~~~~~~~~l~~~~~~~~~ 26 (489) ||+++.+.++++|+++|+..+.++.. T Consensus 3 ~~~~~~~~~~~~~~~~l~~~~~a~~~ 28 (727) T PRK13528 3 MRANKILWLLTVVLAGLNSQLSAAES 28 (727) T ss_pred CHHHHHHHHHHHHHHHHHHHHHHHHC T ss_conf 30466999999999987505556533 No 123 >cd01731 archaeal_Sm1 The archaeal sm1 proteins: The Sm proteins are conserved in all three domains of life and are always associated with U-rich RNA sequences. They function to mediate RNA-RNA interactions and RNA biogenesis. All Sm proteins contain a common sequence motif in two segments, Sm1 and Sm2, separated by a short variable linker. Eukaryotic Sm proteins form part of specific small nuclear ribonucleoproteins (snRNPs) that are involved in the processing of pre-mRNAs to mature mRNAs, and are a major component of the eukaryotic spliceosome. Most snRNPs consist of seven Sm proteins (B/B', D1, D2, D3, E, F and G) arranged in a ring on a uridine-rich sequence (Sm site), plus a small nuclear RNA (snRNA) (either U1, U2, U5 or U4/6). Since archaebacteria do not have any splicing apparatus, Sm proteins of archaebacteria may play a more general role. Archaeal Lsm proteins are likely to represent the ancestral Sm domain. 
Probab=77.35 E-value=2.4 Score=20.81 Aligned_cols=32 Identities=19% Similarity=0.238 Sum_probs=28.2 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEE Q ss_conf 71437962898067401112334443289996 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKV 163 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki 163 (489) ...+.|.|.+|++|.+++.++|....+.+=.. T Consensus 10 ~k~V~V~Lk~g~~~~G~L~~~D~~mNlvL~da 41 (68) T cd01731 10 NKPVLVKLKGGKEVRGRLKSYDQHMNLVLEDA 41 (68) T ss_pred CCEEEEEECCCCEEEEEEEEECCCCCEEECCE T ss_conf 98599999899899999999947531898246 No 124 >pfam00944 Peptidase_S3 Alphavirus core protein. Also known as coat protein C and capsid protein C. This makes the literature very confusing. Alphaviruses consist of a nucleoprotein core, a lipid membrane which envelopes the core, and glycoprotein spikes protruding from the lipid membrane. Probab=77.27 E-value=1.2 Score=22.77 Aligned_cols=99 Identities=24% Similarity=0.284 Sum_probs=46.2 Q ss_pred EECCCCCCCCCCEEEEEEECCC--CCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEE Q ss_conf 4011123344432899960676--67655655673111241467523665531111258744311223344342023323 Q gi|254780700|r 146 PAKLVGTDALFDLAVLKVQSDR--KFIPVEFEDANNIRVGEAVFTIGNPFRLRGTVSAGIVSALDRDIPDRPGTFTQIDA 223 (489) Q Consensus 146 ~a~vvg~D~~~DlAvlki~~~~--~~~~~~lg~s~~~~~G~~v~aiG~P~g~~~tvt~GiiSa~~R~~~~~~~~~iqtDa 223 (489) +++|.|.=-.-+||-||..-.+ +|.++.+- ..++--..-++--.|.|+.+= ..|-| .-....|.--.. 
T Consensus 32 P~HVkG~iD~p~LA~lkfkkss~yDlE~a~~P--~~Mksda~~yt~e~p~g~YNw-hhGav-------q~~~grftip~g 101 (157) T pfam00944 32 PLHVKGTIDNPVLAKLKFKKSSKYDLEFAQVP--QNMRSDAFKYTHEKPEGFYNW-HHGAV-------QYSNGRFTVPKG 101 (157) T ss_pred CCCCCCCCCCHHHHHHEECCCCCCCHHHHHCC--HHHHHCCCCCCCCCCCCEECC-CCCEE-------EEECCEEEECCC T ss_conf 31035415887885311011201152132354--434403122432288722123-01117-------975875983346 Q ss_pred EEECCCCCCEEEECCCEEEEEECCCCCCCCCCCCCC Q ss_conf 320134770354034303555123445532222223 Q gi|254780700|r 224 PINQGNSGGPCFNALGHVIGVNAMIVTSGQFHMGVG 259 (489) Q Consensus 224 ~InpGnSGGpl~n~~G~viGint~i~~~~g~~~Gig 259 (489) .=-||.||-|++|-.|+||+|- -.|.+.|-. T Consensus 102 ~g~~GDSGRpi~DN~GrVVaIV-----LGG~neG~r 132 (157) T pfam00944 102 VGGKGDSGRPILDNTGRVVAIV-----LGGANEGSR 132 (157) T ss_pred CCCCCCCCCCCCCCCCCEEEEE-----ECCCCCCCC T ss_conf 7788888981165888789999-----558887872 No 125 >TIGR03497 FliI_clade2 flagellar protein export ATPase FliI. Members of this protein family are the FliI protein of bacterial flagellum systems. This protein acts to drive protein export for flagellar biosynthesis. The most closely related family is the YscN family of bacterial type III secretion systems. This model represents one (of three) segment of the FliI family tree. These have been modeled separately in order to exclude the type III secretion ATPases more effectively. Probab=76.25 E-value=5.1 Score=18.74 Aligned_cols=65 Identities=20% Similarity=0.300 Sum_probs=34.9 Q ss_pred CCCEEEECHHC-CCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCC Q ss_conf 99629851010-478714379628980674011123344432899960676676556556731112414675236655 Q gi|254780700|r 118 DDGYILTSNHI-VEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFR 194 (489) Q Consensus 118 ~~G~ilTn~hv-v~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g 194 (489) -.|.+++.... +.=.+..+|...|+....|+|++.+. |.+++. .|++...+..|++|...|.|+. 
T Consensus 6 i~G~~iev~g~~~~iGe~c~I~~~~g~~i~aEVv~~~~--~~~~l~----------~~~~t~Gi~~G~~V~~tg~~~~ 71 (413) T TIGR03497 6 VIGLTIESKGPKAKIGELCSILTKGGKPVLAEVVGFKE--ENVLLM----------PLGEVEGIGPGSLVIATGRPLA 71 (413) T ss_pred EEEEEEEEEECCCCCCCEEEEEECCCCEEEEEEEEECC--CEEEEE----------EECCCCCCCCCCEEEECCCCCE T ss_conf 98279999807998567599994899889999999829--979999----------9369878899999998999747 No 126 >cd01717 Sm_B The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. Sm subunit B heterodimerizes with subunit D3 and three such heterodimers form a hexameric ring structure with alternating B and D3 subunits. The D3 - B heterodimer also assembles into a heptameric ring containing D1, D2, E, F, and G subunits. Sm-like proteins exist in archaea as well as prokaryotes which form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=75.26 E-value=2.7 Score=20.47 Aligned_cols=31 Identities=16% Similarity=0.297 Sum_probs=27.4 Q ss_pred CEEEEECCCCEEEEECCCCCCCCCCEEEEEE Q ss_conf 1437962898067401112334443289996 Q gi|254780700|r 133 ASFSVILSDDTELPAKLVGTDALFDLAVLKV 163 (489) Q Consensus 133 ~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki 163 (489) ..++|++.|||.|.+++.++|....+.+=.. T Consensus 11 ~~vrv~~~DGR~~vG~l~~~D~~~NlVL~~~ 41 (79) T cd01717 11 YRLRVTLQDGRQFVGQFLAFDKHMNLVLSDC 41 (79) T ss_pred CEEEEEEECCCEEEEEEEEECCCCCEEEECC T ss_conf 8799999689599999999747663898374 No 127 >cd01729 LSm7 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. 
Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. LSm7 is one of at least seven subunits that assemble onto U6 snRNA to form a seven-membered ring structure. Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=74.95 E-value=3.1 Score=20.13 Aligned_cols=32 Identities=22% Similarity=0.298 Sum_probs=28.0 Q ss_pred CEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 14379628980674011123344432899960 Q gi|254780700|r 133 ASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 133 ~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) .++.|++.|||++.+.+.|+|+...+-+=... T Consensus 13 k~V~Vkl~~gR~v~G~L~gfD~~mNLVL~d~~ 44 (81) T cd01729 13 KKIRVKFQGGREVTGILKGYDQLLNLVLDDTV 44 (81) T ss_pred CEEEEEECCCCEEEEEEECCCCCCEEEEEEEE T ss_conf 68999987993999999704662017766359 No 128 >cd01728 LSm1 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. LSm1 is one of at least seven subunits that assemble onto U6 snRNA to form a seven-membered ring structure. Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. 
Probab=73.36 E-value=6 Score=18.28 Aligned_cols=64 Identities=23% Similarity=0.285 Sum_probs=39.1 Q ss_pred CHHCCCC-CCEEEEECCCCEEEEECCCCCCCCCCEEEEEE----ECCCCCCCCCCCCCCCCCCCCEEEEEC Q ss_conf 1010478-71437962898067401112334443289996----067667655655673111241467523 Q gi|254780700|r 125 SNHIVED-GASFSVILSDDTELPAKLVGTDALFDLAVLKV----QSDRKFIPVEFEDANNIRVGEAVFTIG 190 (489) Q Consensus 125 n~hvv~~-a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki----~~~~~~~~~~lg~s~~~~~G~~v~aiG 190 (489) +++..+. -.++.|.+.|||.+-+.+.++|+...+-+=.. -..+....+++| --+--||-|+.+| T Consensus 4 ~asL~~~ldkkv~V~l~dgR~~~G~Lr~fDq~~NlvL~~~~Eri~~~~~~~~i~~G--~~vIRGdnVvliG 72 (74) T cd01728 4 TASLVDDLDKKVVVLLRDGRKLIGILRSFDQFANLVLQDTVERIYVGDKYGDIPRG--IFIIRGENVVLLG 72 (74) T ss_pred HHHHHHHHCCEEEEEECCCCEEEEEEEEECCCCEEEEEEEEEEEECCCCCCEEEEE--EEEEECCCEEEEE T ss_conf 56878862989999988998999999987465419932058999748853269867--8999779399997 No 129 >TIGR03498 FliI_clade3 flagellar protein export ATPase FliI. Members of this protein family are the FliI protein of bacterial flagellum systems. This protein acts to drive protein export for flagellar biosynthesis. The most closely related family is the YscN family of bacterial type III secretion systems. This model represents one (of three) segment of the FliI family tree. These have been modeled separately in order to exclude the type III secretion ATPases more effectively. Probab=72.40 E-value=6.3 Score=18.14 Aligned_cols=50 Identities=28% Similarity=0.455 Sum_probs=30.5 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCC Q ss_conf 71437962898067401112334443289996067667655655673111241467523665 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPF 193 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~ 193 (489) .+..+|...+|+...|+|++.+. 
|.+++.+ |++.+.+..|+.|...|.|+ T Consensus 23 Ge~c~I~~~~g~~~~aEVvg~~~--~~v~l~~----------~~~t~Gi~~G~~V~~tg~~~ 72 (418) T TIGR03498 23 GDRCAIRARDGRPVLAEVVGFNG--DRVLLMP----------FEPLEGVGLGCAVFAREGPL 72 (418) T ss_pred CCEEEEECCCCCEEEEEEEEECC--CEEEEEE----------ECCCCCCCCCCEEEECCCCC T ss_conf 88899991999889999999819--9899999----------07988989999999689974 No 130 >cd01719 Sm_G The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. Sm subunit G binds subunits E and F to form a trimer which then assembles onto snRNA along with the D1/D2 and D3/B heterodimers forming a seven-membered ring structure. Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=72.17 E-value=3.8 Score=19.56 Aligned_cols=31 Identities=19% Similarity=0.347 Sum_probs=27.7 Q ss_pred CEEEEECCCCEEEEECCCCCCCCCCEEEEEE Q ss_conf 1437962898067401112334443289996 Q gi|254780700|r 133 ASFSVILSDDTELPAKLVGTDALFDLAVLKV 163 (489) Q Consensus 133 ~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki 163 (489) .++.|+|.+||++.+.+.|+|+...|.+=.. T Consensus 11 k~v~Vkl~ggR~i~G~L~GfD~~mNLVLdda 41 (72) T cd01719 11 KKLSLKLNGNRKVSGILRGFDPFMNLVLDDA 41 (72) T ss_pred CEEEEEECCCCEEEEEEEEECCCCEEEEEEE T ss_conf 8899998899699999997074202772305 No 131 >cd00600 Sm_like The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. 
Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=71.61 E-value=4.1 Score=19.33 Aligned_cols=33 Identities=30% Similarity=0.392 Sum_probs=29.0 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 714379628980674011123344432899960 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ...+.|.+.||+.|.+++.+.|....+.+=.+. T Consensus 6 g~~V~V~l~~g~~~~G~L~~~D~~mNlvL~~~~ 38 (63) T cd00600 6 GKTVRVELKDGRVLEGVLVAFDKYMNLVLDDVE 38 (63) T ss_pred CCEEEEEECCCCEEEEEEEEECCCCCEEECCEE T ss_conf 985999998995999999998886540986799 No 132 >pfam05416 Peptidase_C37 Southampton virus-type processing peptidase. Corresponds to Merops family C37. Norwalk-like viruses (NLVs), including the Southampton virus, cause acute non-bacterial gastroenteritis in humans. The NLV genome encodes three open reading frames (ORFs). ORF1 encodes a polyprotein, which is processed by the viral protease into six proteins. Probab=71.45 E-value=6.6 Score=18.00 Aligned_cols=25 Identities=12% Similarity=0.169 Sum_probs=12.3 Q ss_pred CCCCCCEEEEECCCC-CEEEECCCCC Q ss_conf 566752010120478-1665125565 Q gi|254780700|r 351 RSPKEQVKISLCKEG-SKHSVAVVLG 375 (489) Q Consensus 351 ~~~G~~v~l~v~R~g-~~~~~~V~l~ 375 (489) .+.|+.+++-+.|.. 
+.+.+.|..+ T Consensus 443 ~PEGtV~silIKR~sGEllPLAvRMg 468 (535) T pfam05416 443 APEGTVASVLIKRASGELLPLAVRMG 468 (535) T ss_pred CCCCCEEEEEECCCCCCEEEEEEEEC T ss_conf 88872589986168887300356623 No 133 >PRK04192 V-type ATP synthase subunit A; Provisional Probab=71.42 E-value=6.6 Score=18.00 Aligned_cols=68 Identities=21% Similarity=0.327 Sum_probs=36.6 Q ss_pred EEEEECCCCEEEECHHCC--CCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEE Q ss_conf 278975996298510104--787143796289806740111233444328999606766765565567311124146752 Q gi|254780700|r 112 SGFFITDDGYILTSNHIV--EDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTI 189 (489) Q Consensus 112 sG~ii~~~G~ilTn~hvv--~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~ai 189 (489) .|.|..-+|-+++...+- .-.+-++| ++....++|+..+ -|.|++++ +.+...+++|+.|... T Consensus 4 ~G~I~~I~GPlV~~e~~~~~~~~EvV~V---G~~~L~GEVI~i~--gd~a~iQV----------yE~T~Gl~~G~~V~~T 68 (585) T PRK04192 4 KGKIVRVSGPLVVAEGMGGARMYEVVKV---GEEGLIGEIIRVR--GDEASIQV----------YEETSGIKPGEPVEFT 68 (585) T ss_pred CCEEEEEECCEEEEEECCCCCCCCEEEE---CCCCEEEEEEEEE--CCEEEEEE----------CCCCCCCCCCCEEEEC T ss_conf 7369999888899952788864667998---8985579999994--99899996----------6688899998988847 Q ss_pred CCCCC Q ss_conf 36655 Q gi|254780700|r 190 GNPFR 194 (489) Q Consensus 190 G~P~g 194 (489) |.|+- T Consensus 69 G~pLs 73 (585) T PRK04192 69 GEPLS 73 (585) T ss_pred CCCEE T ss_conf 99449 No 134 >cd01730 LSm3 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. LSm3 is one of at least seven subunits that assemble onto U6 snRNA to form a seven-membered ring structure. 
Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=71.13 E-value=3.9 Score=19.44 Aligned_cols=31 Identities=26% Similarity=0.306 Sum_probs=27.5 Q ss_pred CEEEEECCCCEEEEECCCCCCCCCCEEEEEE Q ss_conf 1437962898067401112334443289996 Q gi|254780700|r 133 ASFSVILSDDTELPAKLVGTDALFDLAVLKV 163 (489) Q Consensus 133 ~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki 163 (489) .+|.|++.+||++.+++.|+|....+-+=.+ T Consensus 12 ~~V~VklrggRel~G~L~afD~h~NlVL~d~ 42 (82) T cd01730 12 ERVYVKLRGDRELRGRLHAYDQHLNMILGDV 42 (82) T ss_pred CEEEEEECCCCEEEEEEEEECCEEEEEEECC T ss_conf 8699998799799999997340226885163 No 135 >pfam01423 LSM LSM domain. The LSM domain contains Sm proteins as well as other related LSM (Like Sm) proteins. The U1, U2, U4/U6, and U5 small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing contain seven Sm proteins (B/B', D1, D2, D3, E, F and G) in common, which assemble around the Sm site present in four of the major spliceosomal small nuclear RNAs. The U6 snRNP binds to the LSM (Like Sm) proteins. Sm proteins are also found in archaebacteria, which do not have any splicing apparatus suggesting a more general role for Sm proteins. All Sm proteins contain a common sequence motif in two segments, Sm1 and Sm2, separated by a short variable linker. This family also includes the bacterial Hfq (host factor Q) proteins. Hfq are also RNA-binding proteins, that form hexameric rings. Probab=69.93 E-value=4.9 Score=18.86 Aligned_cols=33 Identities=30% Similarity=0.469 Sum_probs=29.2 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 714379628980674011123344432899960 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ...++|.+.||+.|.+++.+.|....+.+=.+. 
T Consensus 8 ~~~V~V~l~~g~~~~G~L~~~D~~mNlvL~~~~ 40 (66) T pfam01423 8 GKRVTVELKNGRELRGTLKGFDQFMNLVLDDVE 40 (66) T ss_pred CCEEEEEECCCCEEEEEEEEECCCCCEEEEEEE T ss_conf 987999998992999999998899950991699 No 136 >COG1958 LSM1 Small nuclear ribonucleoprotein (snRNP) homolog [Transcription] Probab=69.84 E-value=4.8 Score=18.88 Aligned_cols=32 Identities=28% Similarity=0.374 Sum_probs=28.5 Q ss_pred CEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 14379628980674011123344432899960 Q gi|254780700|r 133 ASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 133 ~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ..+.|.+.||++|.+++.++|....+.+--+. T Consensus 18 ~~V~V~lk~g~~~~G~L~~~D~~mNlvL~d~~ 49 (79) T COG1958 18 KRVLVKLKNGREYRGTLVGFDQYMNLVLDDVE 49 (79) T ss_pred CEEEEEECCCCEEEEEEEEECCCCCEEEECEE T ss_conf 88999987995999999998475418991429 No 137 >pfam05580 Peptidase_S55 SpoIVB peptidase S55. The protein SpoIVB plays a key role in signalling in the final sigma-K checkpoint of Bacillus subtilis. Probab=69.23 E-value=3.2 Score=19.99 Aligned_cols=43 Identities=23% Similarity=0.363 Sum_probs=29.0 Q ss_pred EEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCCCCCCCCCCCCCC Q ss_conf 02332332013477035403430355512344553222222321123 Q gi|254780700|r 218 FTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQFHMGVGLIIPL 264 (489) Q Consensus 218 ~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~g~~~GigfaIP~ 264 (489) +++..--|-.|.||.|.+ .+|++||--|..|-+.- . -||.|-+ T Consensus 171 LL~~TGGIVQGMSGSPII-QngKlIGAVTHVfvndP-t--~GYGIfi 213 (219) T pfam05580 171 LLEKTGGIVQGMSGSPII-QNGKIIGAVTHVFVNDP-T--KGYGVYI 213 (219) T ss_pred HHHHCCCEEECCCCCCEE-ECCEEEEEEEEEEECCC-C--CCEEHHH T ss_conf 996519886436789676-58908989989996189-9--6115126 No 138 >TIGR01230 agmatinase agmatinase, putative; InterPro: IPR005925 Members of this family include known and predicted examples of agmatinase (agmatine ureohydrolase, 3.5.3.11 from EC) and members of archaea, for which no definitive agmatinase sequence has yet been made available. 
However, archaeal sequences are phylogenetically close to the experimentally verified B. subtilis sequence. One species of Halobacterium has been demonstrated in vitro to produce agmatine from arginine, but no putrescine from ornithine, suggesting that arginine decarboxylase and agmatinase, rather than arginase and ornithine decarboxylase, lead from arginine to polyamine biosynthesis.; GO: 0008783 agmatinase activity, 0006596 polyamine biosynthetic process. Probab=68.69 E-value=2.8 Score=20.42 Aligned_cols=68 Identities=26% Similarity=0.398 Sum_probs=47.1 Q ss_pred CCCCCCCCCCEEE---EEEECCCCCCCCCCCCC----CC--------CCCCC-EEEEECCCCCCCCCCCCCCCCCCCCCC Q ss_conf 1112334443289---99606766765565567----31--------11241-467523665531111258744311223 Q gi|254780700|r 148 KLVGTDALFDLAV---LKVQSDRKFIPVEFEDA----NN--------IRVGE-AVFTIGNPFRLRGTVSAGIVSALDRDI 211 (489) Q Consensus 148 ~vvg~D~~~DlAv---lki~~~~~~~~~~lg~s----~~--------~~~G~-~v~aiG~P~g~~~tvt~GiiSa~~R~~ 211 (489) ..+...|..||+. |++-+..+++ +.+|+. +. |+-|. .++|+| -+|+||.+||-|..+ T Consensus 53 ~~ldtsPC~dL~~~~~l~~~D~gd~~-~~~G~~~~~~~~i~~~~~~~L~~gKGf~~~~G----GEH~it~Pv~rA~~~-- 125 (296) T TIGR01230 53 NLLDTSPCRDLALRERLKVVDAGDLP-LAFGDAREMFEKIEEVIEEILEEGKGFPVAIG----GEHSITLPVIRAMKK-- 125 (296) T ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCHHHHHHHHHHHHHHHHHCCCCEEEEEC----CCCHHHHHHHHHHHC-- T ss_conf 36678765212211410300357745-78888889999999999999870496589865----863235678998733-- Q ss_pred CCCCCC--EEEEEE Q ss_conf 344342--023323 Q gi|254780700|r 212 PDRPGT--FTQIDA 223 (489) Q Consensus 212 ~~~~~~--~iqtDa 223 (489) +.+.+ +||-|| T Consensus 126 -G~~~~~~~v~fDA 138 (296) T TIGR01230 126 -GKFEKFAVVQFDA 138 (296) T ss_pred -CCCCCCEEEEECC T ss_conf -8999607998778 No 139 >smart00651 Sm snRNP Sm proteins. 
small nuclear ribonucleoprotein particles (snRNPs) involved in pre-mRNA splicing Probab=67.66 E-value=5.8 Score=18.37 Aligned_cols=33 Identities=27% Similarity=0.404 Sum_probs=28.6 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 714379628980674011123344432899960 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ...++|.+.||+.+.+++.+.|....+-+=.+. T Consensus 8 ~~~V~V~l~~g~~~~G~L~~~D~~mNlvL~~~~ 40 (67) T smart00651 8 GKRVLVELKNGREYRGTLKGFDQFMNLVLEDVE 40 (67) T ss_pred CCEEEEEECCCCEEEEEEEEECCCCCEEECEEE T ss_conf 987999998996999999998899972987299 No 140 >PRK05922 type III secretion system ATPase; Validated Probab=67.17 E-value=8.2 Score=17.43 Aligned_cols=69 Identities=16% Similarity=0.313 Sum_probs=35.7 Q ss_pred EEEECCCCEEEECHHCCCC-CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECC Q ss_conf 7897599629851010478-714379628980674011123344432899960676676556556731112414675236 Q gi|254780700|r 113 GFFITDDGYILTSNHIVED-GASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGN 191 (489) Q Consensus 113 G~ii~~~G~ilTn~hvv~~-a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~ 191 (489) |-|.+-.|.++...-.-.. .+-.+|...++....|+|+|.+ -|.++|.. |++.+.+..|+.|...|. 
T Consensus 21 GrV~~V~G~~ie~~g~~~~iGelc~I~~~~~~~i~aeVvgf~--~~~~~l~p----------~~~~~Gi~~G~~V~~~g~ 88 (434) T PRK05922 21 GLLSRVSGNLLEAQGLSACLGELCQISLPKSPPILAEVIGFH--NQTTLLMS----------LSPIHYVALGAEVLPLRR 88 (434) T ss_pred EEEEEEECEEEEEEECCCCCCCEEEEEECCCCEEEEEEEEEE--CCEEEEEE----------CCCCCCCCCCCEEEECCC T ss_conf 799999664999980687879859998189987899999872--99799997----------777667899999998999 Q ss_pred CC Q ss_conf 65 Q gi|254780700|r 192 PF 193 (489) Q Consensus 192 P~ 193 (489) |+ T Consensus 89 ~~ 90 (434) T PRK05922 89 PP 90 (434) T ss_pred CC T ss_conf 87 No 141 >cd01722 Sm_F The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. Sm subunit F is capable of forming both homo- and hetero-heptamer ring structures. To form the hetero-heptamer, Sm subunit F initially binds subunits E and G to form a trimer which then assembles onto snRNA along with the D3/B and D1/D2 heterodimers. 
Probab=67.05 E-value=5.4 Score=18.57 Aligned_cols=33 Identities=24% Similarity=0.330 Sum_probs=28.4 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 714379628980674011123344432899960 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ...+.|+|.+|++|.+++++.|....+++=..+ T Consensus 11 gk~V~V~LK~G~~y~G~L~s~D~~MNl~L~~a~ 43 (68) T cd01722 11 GKPVIVKLKWGMEYKGTLVSVDSYMNLQLANTE 43 (68) T ss_pred CCEEEEEECCCCEEEEEEEEECCCEEEEECCCE T ss_conf 982999988998999999997242655980319 No 142 >COG0821 gcpE 1-hydroxy-2-methyl-2-(e)-butenyl 4-diphosphate synthase [Lipid metabolism] Probab=66.63 E-value=0.38 Score=25.91 Aligned_cols=64 Identities=14% Similarity=0.225 Sum_probs=27.5 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHC Q ss_conf 11001000023333334332000342166764417644441132011111211346711678887524 Q gi|254780700|r 267 IKKAIPSLISKGRVDHGWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLD 334 (489) Q Consensus 267 ~~~i~~~l~~~g~v~rg~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~in 334 (489) ++.+++..+++|...| +|+....+..++.+.++-+.+++.+-+....-.-+++-|+. 
|+++|+- T Consensus 112 v~~vVe~Ak~~g~piR--IGVN~GSLek~~~~ky~~pt~ealveSAl~~a~~~e~l~f~--~i~iS~K 175 (361) T COG0821 112 VREVVEAAKDKGIPIR--IGVNAGSLEKRLLEKYGGPTPEALVESALEHAELLEELGFD--DIKVSVK 175 (361) T ss_pred HHHHHHHHHHCCCCEE--EECCCCCHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHCCCC--CEEEEEE T ss_conf 9999999997599879--95266861699999854798789999999999999977998--6799987 No 143 >PRK08972 fliI flagellum-specific ATP synthase; Validated Probab=66.52 E-value=8.4 Score=17.35 Aligned_cols=71 Identities=25% Similarity=0.318 Sum_probs=38.7 Q ss_pred CEEEEEECCCCEEEECHHCCCC-CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEE Q ss_conf 4027897599629851010478-714379628980674011123344432899960676676556556731112414675 Q gi|254780700|r 110 FGSGFFITDDGYILTSNHIVED-GASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFT 188 (489) Q Consensus 110 ~GsG~ii~~~G~ilTn~hvv~~-a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~a 188 (489) .-||-+..-.|.++.....-.. .+-..|...||+ ..|+|+|.+. |-++|. +|++...+..|+.|.. T Consensus 22 ~~sGrv~~v~G~~ie~~g~~~~iG~~c~i~~~~g~-~~aEVvgf~~--~~~~l~----------p~~~~~Gi~~G~~V~~ 88 (440) T PRK08972 22 VASGQLVRVVGLTLEATGCRAPVGSLCSIETMAGE-LEAEVVGFDG--DLLYLM----------PIEELRGVLPGARVTP 88 (440) T ss_pred CCEEEEEEEEEEEEEEEECCCCCCCEEEEECCCCC-EEEEEEEECC--CEEEEE----------ECCCCCCCCCCCEEEE T ss_conf 66048999982589998168987887899849982-8999999829--979999----------8888888899999997 Q ss_pred ECCCC Q ss_conf 23665 Q gi|254780700|r 189 IGNPF 193 (489) Q Consensus 189 iG~P~ 193 (489) .|.|+ T Consensus 89 tg~~~ 93 (440) T PRK08972 89 LGEQS 93 (440) T ss_pred CCCCC T ss_conf 89986 No 144 >PRK00366 ispG 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase; Reviewed Probab=66.07 E-value=0.41 Score=25.74 Aligned_cols=30 Identities=23% Similarity=0.281 Sum_probs=17.2 Q ss_pred EECCCCHHHHC--CC--CCCCEEEEECCEECCCH Q ss_conf 60688978982--99--98889998899993899 Q gi|254780700|r 415 NPNREREVEAK--GI--QKGMTIVSVNTHEVSCI 444 (489) Q Consensus 415 ~v~~~s~Aa~~--GL--~~GDiIl~VNg~~V~s~ 444 (489) .|.-.+.|..+ |+ 
-+|-.++-.+|+.++.+ T Consensus 306 vVNGPGEak~ADiGiagg~~~~~lf~~G~~~~~v 339 (367) T PRK00366 306 VVNGPGEAKHADIGIAGGNGKGPVFVDGEKIKTL 339 (367) T ss_pred EECCCCHHHHCCEEEECCCCCEEEEECCEEEEEC T ss_conf 3117650221777265698835799899981344 No 145 >TIGR01171 rplB_bact ribosomal protein L2; InterPro: IPR005880 Ribosomes are the particles that catalyse mRNA-directed protein synthesis in all organisms. The codons of the mRNA are exposed on the ribosome to allow tRNA binding. This leads to the incorporation of amino acids into the growing polypeptide chain in accordance with the genetic information. Incoming amino acid monomers enter the ribosomal A site in the form of aminoacyl-tRNAs complexed with elongation factor Tu (EF-Tu) and GTP. The growing polypeptide chain, situated in the P site as peptidyl-tRNA, is then transferred to aminoacyl-tRNA and the new peptidyl-tRNA, extended by one residue, is translocated to the P site with the aid of elongation factor G (EF-G) and GTP as the deacylated tRNA is released from the ribosome through one or more exit sites. About 2/3 of the mass of the ribosome consists of RNA and 1/3 of protein. The proteins are named in accordance with the subunit of the ribosome which they belong to - the small (S1 to S31) and the large (L1 to L44). Usually they decorate the rRNA cores of the subunits. Many ribosomal proteins, particularly those of the large subunit, are composed of a globular, surface-exposed domain with long finger-like projections that extend into the rRNA core to stabilise its structure. Most of the proteins interact with multiple RNA elements, often from different domains. In the large subunit, about 1/3 of the 23S rRNA nucleotides are at least in van der Waals contact with protein, and L22 interacts with all six domains of the 23S rRNA. Proteins S4 and S7, which initiate assembly of the 16S rRNA, are located at junctions of five and four RNA helices, respectively.
In this way proteins serve to organise and stabilise the rRNA tertiary structure. While the crucial activities of decoding and peptide transfer are RNA based, proteins play an active role in functions that may have evolved to streamline the process of protein synthesis. In addition to their function in the ribosome, many ribosomal proteins have some function 'outside' the ribosome. The protein L2 is found in all ribosomes and is one of the best conserved proteins of this mega-dalton complex. L2 is elongated, exposing one end of the protein to the surface of the intersubunit interface of the 50S subunit and is essential for the association of the ribosomal subunits and might participate in the binding and translocation of the tRNAs. This entry represents bacterial, chloroplast and mitochondrial forms.; GO: 0003723 RNA binding, 0003735 structural constituent of ribosome, 0016740 transferase activity, 0006412 translation, 0015934 large ribosomal subunit. Probab=65.18 E-value=8.9 Score=17.19 Aligned_cols=136 Identities=17% Similarity=0.246 Sum_probs=90.2 Q ss_pred EEECHHCCCCCCEEE--EECC--CCEEEEECCC--CCCCC--CCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEE-C-C Q ss_conf 985101047871437--9628--9806740111--23344--4328999606766765565567311124146752-3-6 Q gi|254780700|r 122 ILTSNHIVEDGASFS--VILS--DDTELPAKLV--GTDAL--FDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTI-G-N 191 (489) Q Consensus 122 ilTn~hvv~~a~~i~--V~~~--dg~~~~a~vv--g~D~~--~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~ai-G-~ 191 (489) .+|.-|-=.+..+.+ |=|. |-...+|+|. =+||. .-||||.=++.+ --|+ |+ ..-|++||.|++- - .
T Consensus 48 rIT~RHrGGGHKr~YR~IDFKR~~K~~I~AkV~~IEYDPNRsA~IALl~Y~DGE-KRYI-La-P~Gl~vGd~v~SG~~~a 124 (279) T TIGR01171 48 RITSRHRGGGHKRLYRIIDFKRRDKDGIPAKVAAIEYDPNRSARIALLHYADGE-KRYI-LA-PKGLKVGDTVISGEPEA 124 (279) T ss_pred EEEEEEECCCCCCCCCEEEEEECCCCCCCEEEEEEEECCCCCEEEEEEECCCCC-EEEE-EC-CCCCCCCCEEEECCCCC T ss_conf 188887158844124314321036578735999972278766022244327876-7664-32-78777078898678788 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCCEEEEEEEEECCCCCCEEEECCCEEEEEECCCCCCCC-CCCCCCCCCCCCCCCCC Q ss_conf 65531111258744311223344342023323320134770354034303555123445532-22222321123321100 Q gi|254780700|r 192 PFRLRGTVSAGIVSALDRDIPDRPGTFTQIDAPINQGNSGGPCFNALGHVIGVNAMIVTSGQ-FHMGVGLIIPLSIIKKA 270 (489) Q Consensus 192 P~g~~~tvt~GiiSa~~R~~~~~~~~~iqtDa~InpGnSGGpl~n~~G~viGint~i~~~~g-~~~GigfaIP~~~~~~i 270 (489) |+-.++++..- .++-+ .+=+..-++|| .||=|+=..|-.+ +|++.++ .|+-| =+|+-.++.| T Consensus 125 ~IK~GNaLPL~-------~IP~G---t~VHNiEl~PG-kGGQlaRSAG~~a----qi~aKe~~~Yv~l--rLpSGE~R~v 187 (279) T TIGR01171 125 PIKPGNALPLK-------NIPVG---TTVHNIELKPG-KGGQLARSAGTSA----QILAKEGTKYVTL--RLPSGEVRMV 187 (279) T ss_pred CCCCCCCCCCC-------CCCCC---EEEEEEEEEEC-CCCHHHEECCCEE----EEEEECCCCEEEE--EECCCCEEEE T ss_conf 80332227756-------47624---06888988307-9603221114078----9897638754999--8268724542 Q ss_pred CCCCCCC Q ss_conf 1000023 Q gi|254780700|r 271 IPSLISK 277 (489) Q Consensus 271 ~~~l~~~ 277 (489) ..+.++. T Consensus 188 ~~~C~AT 194 (279) T TIGR01171 188 LKECRAT 194 (279) T ss_pred CCCCEEE T ss_conf 2332274 No 146 >cd01726 LSm6 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. 
LSm6 is one of at least seven subunits that assemble onto U6 snRNA to form a seven-membered ring structure. Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=64.57 E-value=6.3 Score=18.15 Aligned_cols=33 Identities=18% Similarity=0.302 Sum_probs=28.3 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 714379628980674011123344432899960 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ...+.|+|.+|.+|.+++...|...++++=..+ T Consensus 10 gk~V~VkLk~G~ey~G~L~s~D~~MNl~L~~ae 42 (67) T cd01726 10 GRPVVVKLNSGVDYRGILACLDGYMNIALEQTE 42 (67) T ss_pred CCEEEEEECCCCEEEEEEEEECCEEEEEECCEE T ss_conf 990999988998989999988560846871349 No 147 >PRK07196 fliI flagellum-specific ATP synthase; Validated Probab=63.27 E-value=9.7 Score=16.97 Aligned_cols=70 Identities=24% Similarity=0.298 Sum_probs=41.4 Q ss_pred EEEEECCCCEEEECHHCCC-CCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEEC Q ss_conf 2789759962985101047-871437962898067401112334443289996067667655655673111241467523 Q gi|254780700|r 112 SGFFITDDGYILTSNHIVE-DGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIG 190 (489) Q Consensus 112 sG~ii~~~G~ilTn~hvv~-~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG 190 (489) +|-+..-.|.+++....-. -.+...|...|+....|+|+|.+. |.++|. +|+..+.+..|++|...| T Consensus 18 ~GrV~~i~G~~ie~~g~~~~iG~~c~I~~~~g~~v~aEVVgf~~--~~~~L~----------p~~~~~Gi~~G~~V~~~g 85 (434) T PRK07196 18 AGRLVRVTGLLLESVGCRLAIGQRCRIESVDETFIEAQVVGFDR--DITYLM----------PFKHPGGVLGGARVFPSE 85 (434) T ss_pred EEEEEEEECCEEEEECCCCCCCCEEEEEECCCCEEEEEEEEECC--CEEEEE----------ECCCCCCCCCCCEEEECC T ss_conf 88999997108999806989899899980899888999999819--969999----------888877889999999889 Q ss_pred CCC Q ss_conf 665 Q gi|254780700|r 191 NPF 193 (489) Q Consensus 191 ~P~ 193 (489) .|. 
T Consensus 86 ~~~ 88 (434) T PRK07196 86 QDG 88 (434) T ss_pred CCC T ss_conf 987 No 148 >COG0260 PepB Leucyl aminopeptidase [Amino acid transport and metabolism] Probab=61.83 E-value=1.6 Score=21.89 Aligned_cols=41 Identities=24% Similarity=0.387 Sum_probs=28.2 Q ss_pred HHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCC Q ss_conf 6441764444113201111121134671167888752431478 Q gi|254780700|r 297 AIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIK 339 (489) Q Consensus 297 a~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~ 339 (489) ...++++..- +.|....++.|...| .||||||++.||+.|. T Consensus 290 ~a~l~l~vnv-~~vl~~~ENm~~g~A-~rPGDVits~~GkTVE 330 (485) T COG0260 290 LAELKLPVNV-VGVLPAVENMPSGNA-YRPGDVITSMNGKTVE 330 (485) T ss_pred HHHCCCCCEE-EEEEEEECCCCCCCC-CCCCCEEEECCCEEEE T ss_conf 9971999349-999761315878789-9998767805970899 No 149 >pfam02601 Exonuc_VII_L Exonuclease VII, large subunit. This family consist of exonuclease VII, large subunit EC:3.1.11.6 This enzyme catalyses exonucleolytic cleavage in either 5'-3' or 3'-5' direction to yield 5'-phosphomononucleotides. This exonuclease VII enzyme is composed of one large subunit and 4 small ones. Probab=61.47 E-value=10 Score=16.76 Aligned_cols=21 Identities=19% Similarity=0.455 Sum_probs=14.8 Q ss_pred CCCCCEEEE-ECCEECCCHHHH Q ss_conf 998889998-899993899999 Q gi|254780700|r 427 IQKGMTIVS-VNTHEVSCIKDV 447 (489) Q Consensus 427 L~~GDiIl~-VNg~~V~s~~dl 447 (489) |..|--|+. =||+.|++.+++ T Consensus 255 L~RGYaiv~~~~gkiI~s~~~l 276 (295) T pfam02601 255 LKRGFAIVRRKDGKIVTSAAEL 276 (295) T ss_pred HHCCEEEEECCCCCEECCHHHC T ss_conf 8486089996999997488997 No 150 >PRK00286 xseA exodeoxyribonuclease VII large subunit; Reviewed Probab=60.81 E-value=11 Score=16.69 Aligned_cols=21 Identities=19% Similarity=0.406 Sum_probs=16.6 Q ss_pred CCCCCEEEE-ECCEECCCHHHH Q ss_conf 998889998-899993899999 Q gi|254780700|r 427 IQKGMTIVS-VNTHEVSCIKDV 447 (489) Q Consensus 427 L~~GDiIl~-VNg~~V~s~~dl 447 (489) |+.|--|+. 
-||+.|++.+++ T Consensus 396 L~RGYaiv~~~~gkiI~s~~~l 417 (443) T PRK00286 396 LARGYAIVRDEDGKVIRSAKQL 417 (443) T ss_pred HCCCEEEEECCCCCEECCHHHC T ss_conf 7495599992999997488997 No 151 >PRK05015 aminopeptidase B; Provisional Probab=60.74 E-value=4.7 Score=18.95 Aligned_cols=41 Identities=27% Similarity=0.442 Sum_probs=28.3 Q ss_pred HHHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCC Q ss_conf 6441764444113201111121134671167888752431478 Q gi|254780700|r 297 AIPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIK 339 (489) Q Consensus 297 a~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~ 339 (489) +..++++.. -..+....++.|...| .|+||||.+.||+.|. T Consensus 228 a~~~~l~~~-V~~il~~aENm~sg~A-~rPGDVit~~nGkTVE 268 (424) T PRK05015 228 AITRGLNKR-VKLFLCCAENLISGNA-FKLGDIITYKNGKTVE 268 (424) T ss_pred HHHCCCCCE-EEEEEEEECCCCCCCC-CCCHHHHHHCCCCEEE T ss_conf 997299960-8999885505778778-8838889763997799 No 152 >pfam01727 consensus Probab=60.47 E-value=8.1 Score=17.44 Aligned_cols=32 Identities=22% Similarity=0.318 Sum_probs=26.5 Q ss_pred CEEEEEEEEECCCCCCEEEECCCEEEEEECCC Q ss_conf 20233233201347703540343035551234 Q gi|254780700|r 217 TFTQIDAPINQGNSGGPCFNALGHVIGVNAMI 248 (489) Q Consensus 217 ~~iqtDa~InpGnSGGpl~n~~G~viGint~i 248 (489) .++-+|.-+-.|.||.++||.+=++.||=.+. T Consensus 22 Gl~l~dtnl~gGSSGSlv~Nq~kQI~gIYFgv 53 (81) T pfam01727 22 GLALNDTNLPGGSSGSLVFNQDKQISGIYFGV 53 (81) T ss_pred EEEEECCCCCCCCCCCEEECCCCEEEEEEEEE T ss_conf 05760366699876665774777487899998 No 153 >KOG3627 consensus Probab=59.51 E-value=6.2 Score=18.21 Aligned_cols=136 Identities=20% Similarity=0.268 Sum_probs=68.1 Q ss_pred CEEEEEECCCCEEEECHHCCCCCC--EEEEECC---------CC---EEE-EECCC---CCC---CC-CCEEEEEEECCC Q ss_conf 402789759962985101047871--4379628---------98---067-40111---233---44-432899960676 Q gi|254780700|r 110 FGSGFFITDDGYILTSNHIVEDGA--SFSVILS---------DD---TEL-PAKLV---GTD---AL-FDLAVLKVQSDR 167 (489) Q Consensus 110 ~GsG~ii~~~G~ilTn~hvv~~a~--~i~V~~~---------dg---~~~-~a~vv---g~D---~~-~DlAvlki~~~~ 167 (489) ...|.+|++. ||+|++|.+.+.. .+.|.+. 
++ ... ..+++ .++ .. .|+|+|+++.+- T Consensus 39 ~Cggsli~~~-~vltaaHC~~~~~~~~~~V~~G~~~~~~~~~~~~~~~~~~v~~~i~H~~y~~~~~~~nDiall~l~~~v 117 (256) T KOG3627 39 LCGGSLISPR-WVLTAAHCVKGASASLYTVRLGEHDINLSVSEGEEQLVGDVEKIIVHPNYNPRTLENNDIALLRLSEPV 117 (256) T ss_pred EEEEEEECCC-EEEECHHHCCCCCCCCEEEEECCCCCCCCCCCCCCEEEEEEEEEEECCCCCCCCCCCCCEEEEEECCCC T ss_conf 8888993279-899757748898877579992763023654357514775474799899977577878887999847877 Q ss_pred C----CCCCCCCCCCC---CCCCCEEEEECCCCCCCC-----------CCCCCCCCCCC--CCCCC---CCCCEEEEE-- Q ss_conf 6----76556556731---112414675236655311-----------11258744311--22334---434202332-- Q gi|254780700|r 168 K----FIPVEFEDANN---IRVGEAVFTIGNPFRLRG-----------TVSAGIVSALD--RDIPD---RPGTFTQID-- 222 (489) Q Consensus 168 ~----~~~~~lg~s~~---~~~G~~v~aiG~P~g~~~-----------tvt~GiiSa~~--R~~~~---~~~~~iqtD-- 222 (489) . ..++.|-.+.. ...+..+.+.|- |... .+..-+++... +.... -....+.+. T Consensus 118 ~~~~~v~piclp~~~~~~~~~~~~~~~v~GW--G~~~~~~~~~~~~L~~~~v~i~~~~~C~~~~~~~~~~~~~~~Ca~~~ 195 (256) T KOG3627 118 TFSSHIQPICLPSSADPYFPPGGTTCLVSGW--GRTESGGGPLPDTLQEVDVPIISNSECRRAYGGLGTITDTMLCAGGP 195 (256) T ss_pred CCCCCCCCEECCCCCCCCCCCCCCEEEEECC--CCCCCCCCCCCCEEEEEEEEEECHHHHCCCCCCCCCCCCCCEEECCC T ss_conf 6765773325577445455688845999786--75457877788420256788619889403026767567777842688 Q ss_pred ---EEEECCCCCCEEEECC---CEEEEEECCC Q ss_conf ---3320134770354034---3035551234 Q gi|254780700|r 223 ---APINQGNSGGPCFNAL---GHVIGVNAMI 248 (489) Q Consensus 223 ---a~InpGnSGGpl~n~~---G~viGint~i 248 (489) .....|.|||||+-.. ..++||.+.- T Consensus 196 ~~~~~~C~GDSGGPLv~~~~~~~~~~GivS~G 227 (256) T KOG3627 196 EGGKDACQGDSGGPLVCEDNGRWVLVGIVSWG 227 (256) T ss_pred CCCCCCCCCCCCCEEEEECCCCEEEEEEEEEE T ss_conf 89997488997384477448856999999981 No 154 >cd06168 LSm9 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. 
Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. LSm9 proteins have a single Sm-like domain structure. Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=58.00 E-value=9.8 Score=16.91 Aligned_cols=32 Identities=16% Similarity=0.222 Sum_probs=27.2 Q ss_pred CEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 14379628980674011123344432899960 Q gi|254780700|r 133 ASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 133 ~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ..++|++.|||.|.+...++|+...+-+=..+ T Consensus 11 ~~lrV~l~DGR~~vG~f~c~Dk~~NiIL~~~~ 42 (75) T cd06168 11 RTMRIHMTDGRTLVGVFLCTDRDCNIILGSAQ 42 (75) T ss_pred CEEEEEEECCCEEEEEEEEECCCCCEEEECCE T ss_conf 87999996799999999997376759980877 No 155 >cd01732 LSm5 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. LSm4 is one of at least seven subunits that assemble onto U6 snRNA to form a seven-membered ring structure. Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. 
Probab=56.22 E-value=12 Score=16.37 Aligned_cols=32 Identities=22% Similarity=0.487 Sum_probs=27.1 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEE Q ss_conf 71437962898067401112334443289996 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKV 163 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki 163 (489) .++|||.+.+.++|.++++|+|....+-+=.+ T Consensus 13 gs~Iwi~mk~drE~~GtL~GFDdyvNmVLeDv 44 (76) T cd01732 13 GSRIWIVMKSDKEFVGTLLGFDDYVNMVLEDV 44 (76) T ss_pred CCEEEEEECCCCEEEEEEECCCCEEEEEEEEE T ss_conf 98799999899199999971000068898306 No 156 >TIGR00074 hypC_hupF hydrogenase assembly chaperone HypC/HupF; InterPro: IPR001109 The large subunit of [NiFe]-hydrogenase, as well as other nickel metalloenzymes, is synthesised as a precursor devoid of the metalloenzyme active site. This precursor then undergoes a complex post-translational maturation process that requires a number of accessory proteins. The hydrogenase expression/formation proteins (HUPF/HYPC) form a family of small proteins that are hydrogenase precursor-specific chaperones required for this maturation process. They are believed to keep the hydrogenase precursor in a conformation accessible for metal incorporation. Probab=56.08 E-value=8.2 Score=17.42 Aligned_cols=44 Identities=20% Similarity=0.507 Sum_probs=30.4 Q ss_pred EEECCCCCCCCCCEEEEEEECCCCCCCCCCCC-----CCCCCCCCEEEE Q ss_conf 74011123344432899960676676556556-----731112414675 Q gi|254780700|r 145 LPAKLVGTDALFDLAVLKVQSDRKFIPVEFED-----ANNIRVGEAVFT 188 (489) Q Consensus 145 ~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~-----s~~~~~G~~v~a 188 (489) .|++|+..++..|+|.+...+-..---+.|=+ ....++||||+.
T Consensus 5 iP~qV~~i~~~~~~A~v~~~G~~~~v~l~Lv~ksC~~N~~~~~GdyvLv 53 (88) T TIGR00074 5 IPGQVVEIDENIDLALVEFKGVKREVSLDLVGKSCDENEEVKVGDYVLV 53 (88) T ss_pred CCCEEEEECCCCCEEEECCCCEEEEEEEEECCCCCCCCCCCCCCCEEEE T ss_conf 7716888549998788601522466764123554467859999877632 No 157 >COG3127 Predicted ABC-type transport system involved in lysophospholipase L1 biosynthesis, permease component [Secondary metabolites biosynthesis, transport, and catabolism] Probab=56.06 E-value=6.8 Score=17.94 Aligned_cols=25 Identities=24% Similarity=0.373 Sum_probs=19.3 Q ss_pred CCCHHHHCCCCCCCEE-EEECCEECC Q ss_conf 8897898299988899-988999938 Q gi|254780700|r 418 REREVEAKGIQKGMTI-VSVNTHEVS 442 (489) Q Consensus 418 ~~s~Aa~~GL~~GDiI-l~VNg~~V~ 442 (489) +...|.+.||+-||.+ ..|+|+.|+ T Consensus 602 e~~~A~~LglKLGDtvTf~v~gq~i~ 627 (829) T COG3127 602 EEGEAKRLGLKLGDTVTFMVLGQNIT 627 (829) T ss_pred HHHHHHHHCCCCCCEEEEEECCCEEE T ss_conf 08679870976277799984262677 No 158 >PRK02118 V-type ATP synthase subunit B; Provisional Probab=55.89 E-value=13 Score=16.16 Aligned_cols=49 Identities=24% Similarity=0.344 Sum_probs=28.2 Q ss_pred CEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCC Q ss_conf 14379628980674011123344432899960676676556556731112414675236655 Q gi|254780700|r 133 ASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFR 194 (489) Q Consensus 133 ~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g 194 (489) +-.+|...++. ..|+|++.+ -|-+++++ |++.+.+.+|+.|...|.|+. T Consensus 27 Elv~I~~~~~~-~~gEVI~~~--~d~v~iqv----------fe~T~Gi~~G~~V~~tG~~l~ 75 (432) T PRK02118 27 ELATVERKGRS-SLASVLKLD--GDKVTLQV----------FGGTSGISTGDEVVFLGRPMQ 75 (432) T ss_pred CEEEEECCCCE-EEEEEEEEE--CCEEEEEE----------CCCCCCCCCCCEEEECCCCCE T ss_conf 78999849977-999999981--99899998----------469878999999996899767 No 159 >pfam01732 DUF31 Domain of unknown function DUF31. This domain has no known function. It is found in various hypothetical proteins and putative lipoproteins from mycoplasmas. 
Probab=55.86 E-value=6.3 Score=18.15 Aligned_cols=23 Identities=26% Similarity=0.596 Sum_probs=18.1 Q ss_pred CCEEEEEECCC----------CEEEECHHCCCC Q ss_conf 34027897599----------629851010478 Q gi|254780700|r 109 MFGSGFFITDD----------GYILTSNHIVED 131 (489) Q Consensus 109 ~~GsG~ii~~~----------G~ilTn~hvv~~ 131 (489) ..|||.+||=. .|+-||-||++. T Consensus 3 ~~GTGWLiDwk~~~~~~~~f~~ylATNLHVa~~ 35 (68) T pfam01732 3 TYGTGWLIDWKKDENNNNKFTLYLATNLHVADA 35 (68) T ss_pred CCCEEEEEEECCCCCCCCCEEEEEEECHHHHHH T ss_conf 432078998416778887389998722677876 No 160 >TIGR02068 cya_phycin_syn cyanophycin synthetase; InterPro: IPR011810 Cyanophycin is an insoluble storage polymer for carbon, nitrogen, and energy, found in most Cyanobacteria. The polymer has a backbone of L-aspartic acid, with most Asp side chain carboxyl groups attached to L-arginine. The polymer is made by this enzyme, cyanophycin synthetase, and degraded by cyanophycinase. Heterologously expressed cyanophycin synthetase in Escherichia coli produces a closely related, water-soluble polymer with some Arg replaced by Lys. It is unclear whether enzymes that produce soluble cyanophycin-like polymers in vivo in non-Cyanobacterial species should be designated as cyanophycin synthetase itself or as a related enzyme. Cyanophycin synthesis is analogous to polyhydroxyalkanoic acid (PHA) biosynthesis, except that PHA polymers lack nitrogen and may be made under nitrogen-limiting conditions.; GO: 0005524 ATP binding, 0016874 ligase activity, 0009059 macromolecule biosynthetic process.
Probab=55.79 E-value=12 Score=16.31 Aligned_cols=93 Identities=23% Similarity=0.255 Sum_probs=61.1 Q ss_pred CEEEECCCEEEEEECC-----CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCCCCC---CH-HHHH---- Q ss_conf 0354034303555123-----44553222222321123321100100002333333-433200034---21-6676---- Q gi|254780700|r 232 GPCFNALGHVIGVNAM-----IVTSGQFHMGVGLIIPLSIIKKAIPSLISKGRVDH-GWFGIMTQN---LT-QELA---- 297 (489) Q Consensus 232 Gpl~n~~G~viGint~-----i~~~~g~~~GigfaIP~~~~~~i~~~l~~~g~v~r-g~lGv~~~~---v~-~~la---- 297 (489) =||=+.+|-+|=||.+ -..||.| .|=|..+.|++.|.-...-.| |-++|++.+ .+ .-.| T Consensus 439 ~PL~~~~G~iVEVNAaPGlrMH~~PS~G-------~pR~Va~Ai~d~LFP~~~~grIPiV~vTGTNGKTt~~RL~Ahil~ 511 (876) T TIGR02068 439 KPLRDTDGAIVEVNAAPGLRMHLAPSQG-------KPRNVAKAIVDMLFPEEDDGRIPIVAVTGTNGKTTTTRLVAHILK 511 (876) T ss_pred CCHHHCCCEEEEEECCCCHHHCCCCCCC-------CCCCCCHHHHHHCCCCCCCCCEEEEEEECCCCCHHHHHHHHHHHH T ss_conf 3745459729998567663444677774-------698710268752288878983448887268983557889999998 Q ss_pred ---HHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHH Q ss_conf ---44176444411320111112113467116788875 Q gi|254780700|r 298 ---IPLGLRGTKGSLITAVVKESPADKAGMKVGDVICM 332 (489) Q Consensus 298 ---~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ 332 (489) +.-|+..++|+||.+=.=++ -+.+|=+.+-.|+. 
T Consensus 512 ~~G~~vG~T~tDG~Yi~~~~v~~-GDntGP~SAr~~L~ 548 (876) T TIGR02068 512 QTGKVVGMTTTDGVYIGKKLVEK-GDNTGPKSARRILA 548 (876) T ss_pred HCCCEEEEEECCCEEECCEEEEC-CCCCCHHHHHHCCC T ss_conf 56982764203767755766624-78987157301227 No 161 >PRK05688 fliI flagellum-specific ATP synthase; Validated Probab=54.07 E-value=14 Score=15.98 Aligned_cols=70 Identities=19% Similarity=0.248 Sum_probs=40.1 Q ss_pred EEEEECCCCEEEECHHCCCC-CCEEEEECCCC---EEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEE Q ss_conf 27897599629851010478-71437962898---067401112334443289996067667655655673111241467 Q gi|254780700|r 112 SGFFITDDGYILTSNHIVED-GASFSVILSDD---TELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVF 187 (489) Q Consensus 112 sG~ii~~~G~ilTn~hvv~~-a~~i~V~~~dg---~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~ 187 (489) +|-+..-.|.++.....-.. .+...+.-.++ +...|+|+|.+... ++| .+|++...+..|+.|. T Consensus 28 ~GrV~~v~G~~ie~~G~~~~iG~~c~i~~~~~~~~~~v~aEVVgf~~~~--v~l----------~p~g~~~Gi~~G~~V~ 95 (451) T PRK05688 28 EGRLLRMVGLTLEAEGLRAAVGSRCLVINDDSYHPVQVEAEVMGFSGDK--VYL----------MPVGSVAGIAPGARVV 95 (451) T ss_pred EEEEEEEECEEEEEEECCCCCCCEEEEEECCCCCCCEEEEEEEEECCCE--EEE----------EECCCCCCCCCCCEEE T ss_conf 3699999674999982588747847998689887754689992254997--999----------9887877889999999 Q ss_pred EECCCC Q ss_conf 523665 Q gi|254780700|r 188 TIGNPF 193 (489) Q Consensus 188 aiG~P~ 193 (489) ..|.|+ T Consensus 96 ~~g~~~ 101 (451) T PRK05688 96 PLADTG 101 (451) T ss_pred ECCCCC T ss_conf 689987 No 162 >PRK07721 fliI flagellum-specific ATP synthase; Validated Probab=53.62 E-value=14 Score=15.93 Aligned_cols=87 Identities=20% Similarity=0.255 Sum_probs=45.1 Q ss_pred EEEECCCCEEEECHHC---CCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEE Q ss_conf 7897599629851010---4787143796289806740111233444328999606766765565567311124146752 Q gi|254780700|r 113 GFFITDDGYILTSNHI---VEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTI 189 (489) Q Consensus 113 
G~ii~~~G~ilTn~hv---v~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~ai 189 (489) |-|.+-.|.++..... +.+--.|...-.+|+...|+|++.+.. -++ ..+|++.+.+..|+.|.+. T Consensus 18 GrV~~I~G~lIea~g~~~~iGelc~I~~~~~~g~~i~aEVVgf~~~--~v~----------l~p~~~~~GI~~G~~V~~~ 85 (435) T PRK07721 18 GKVKRVIGLMIESKGPESSIGDVCYIHTKGKGGKKIKAEVVGFKDE--NIL----------LMPYLEAANIAPGSLVEAT 85 (435) T ss_pred EEEEEEECEEEEEEECCCCCCCEEEEEECCCCCCEEEEEEEEECCC--EEE----------EEECCCCCCCCCCCEEEEC T ss_conf 2899998638999957888434179996479997899999987698--899----------9987688899999999958 Q ss_pred CCCCCC--CCCCCCCCCCCCCCCC Q ss_conf 366553--1111258744311223 Q gi|254780700|r 190 GNPFRL--RGTVSAGIVSALDRDI 211 (489) Q Consensus 190 G~P~g~--~~tvt~GiiSa~~R~~ 211 (489) |.|+.. +..+---|+.+++|.+ T Consensus 86 g~~~~vpvg~~lLGRV~d~lG~Pi 109 (435) T PRK07721 86 GEPLRVKVGSGLIGQVVDAFGEPL 109 (435) T ss_pred CCCCEEECCHHHCCCCCCCCCCCC T ss_conf 997667727532245104565435 No 163 >KOG4407 consensus Probab=53.20 E-value=1.3 Score=22.51 Aligned_cols=12 Identities=33% Similarity=0.501 Sum_probs=6.4 Q ss_pred CCCCEEEEECCH Q ss_conf 525469872896 Q gi|254780700|r 392 ELLGMVLQDIND 403 (489) Q Consensus 392 ~~lGl~v~~l~~ 403 (489) ..+|..+.+... T Consensus 1156 ~~~GVrL~dCP~ 1167 (1973) T KOG4407 1156 PVLGVRLADCPT 1167 (1973) T ss_pred CCCCCCCCCCCC T ss_conf 633215335886 No 164 >PRK06315 type III secretion system ATPase; Provisional Probab=52.62 E-value=15 Score=15.83 Aligned_cols=70 Identities=16% Similarity=0.187 Sum_probs=37.2 Q ss_pred EEEEECCCCEEEECHHCCCC-CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEEC Q ss_conf 27897599629851010478-71437962898067401112334443289996067667655655673111241467523 Q gi|254780700|r 112 SGFFITDDGYILTSNHIVED-GASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIG 190 (489) Q Consensus 112 sG~ii~~~G~ilTn~hvv~~-a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG 190 (489) .|-|..-.|.+++..-.-.. .+-..|...+++...|+|+|.+. |-++|.. 
|++...+..|+.|...| T Consensus 24 ~GrV~~v~G~~ie~~g~~~~iGelc~I~~~~~~~~~aEVVgf~~--~~~~l~p----------~~~~~Gi~~G~~V~~~g 91 (442) T PRK06315 24 VGRITEVVGMLIKAVVPDVRVGEVCLVKRHGMEPLVTEVVGFTQ--NFVFLSP----------LGELTGVSPSSEVIPTG 91 (442) T ss_pred EEEEEEEEEEEEEEEECCCCCCCEEEEEECCCCEEEEEEEEECC--CEEEEEE----------CCCCCCCCCCCEEEECC T ss_conf 57999999789999867898678689991899778899999849--9799998----------77876789999999689 Q ss_pred CCC Q ss_conf 665 Q gi|254780700|r 191 NPF 193 (489) Q Consensus 191 ~P~ 193 (489) .|+ T Consensus 92 ~~~ 94 (442) T PRK06315 92 LPL 94 (442) T ss_pred CCC T ss_conf 987 No 165 >pfam00883 Peptidase_M17 Cytosol aminopeptidase family, catalytic domain. The two associated zinc ions and the active site are entirely enclosed within the C-terminal catalytic domain in leucine aminopeptidase. Probab=52.56 E-value=2.8 Score=20.38 Aligned_cols=29 Identities=24% Similarity=0.555 Sum_probs=21.4 Q ss_pred EECCCCCCCCCCCCCHHHHHHHHHCCCCCC Q ss_conf 201111121134671167888752431478 Q gi|254780700|r 310 ITAVVKESPADKAGMKVGDVICMLDGRIIK 339 (489) Q Consensus 310 V~~V~~~sPA~~AGLk~GDvI~~ing~~I~ 339 (489) +.-..++.|..+| .|+||||.+.||+.|+ T Consensus 135 i~~l~EN~is~~A-~rPgDVi~s~~GkTVE 163 (312) T pfam00883 135 VLALTENMISGTA-MRPGDIITAMNGKTVE 163 (312) T ss_pred EEEEECCCCCCCC-CCCCCEEEECCCCEEE T ss_conf 9870103789988-9999778917997898 No 166 >pfam06003 SMN Survival motor neuron protein (SMN). This family consists of several eukaryotic survival motor neuron (SMN) proteins. The Survival of Motor Neurons (SMN) protein, the product of the spinal muscular atrophy-determining gene, is part of a large macromolecular complex (SMN complex) that functions in the assembly of spliceosomal small nuclear ribonucleoproteins (snRNPs). The SMN complex functions as a specificity factor essential for the efficient assembly of Sm proteins on U snRNAs and likely protects cells from illicit, and potentially deleterious, non-specific binding of Sm proteins to RNAs. 
Probab=52.13 E-value=15 Score=15.79 Aligned_cols=44 Identities=16% Similarity=0.093 Sum_probs=33.1 Q ss_pred CCEEEEECC-CCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCC Q ss_conf 714379628-98067401112334443289996067667655655 Q gi|254780700|r 132 GASFSVILS-DDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFE 175 (489) Q Consensus 132 a~~i~V~~~-dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg 175 (489) .+.-+.++. ||.-|+|+|+..|.....+|++-.+-.+-..+.|. T Consensus 72 GD~C~A~yseDG~~YeAtI~SId~k~gtCvV~Y~GYgNeEev~L~ 116 (264) T pfam06003 72 GDSCNAVWSEDGNLYTATITSIDQKRGTCVVFYTGYGNEEEQNLA 116 (264) T ss_pred CCEEEEEECCCCCCCEEEEEEECCCCCCEEEEEECCCCHHHHHHH T ss_conf 885676652578520136777526787068998236865231287 No 167 >COG5640 Secreted trypsin-like serine protease [Posttranslational modification, protein turnover, chaperones] Probab=51.76 E-value=15 Score=15.74 Aligned_cols=60 Identities=20% Similarity=0.263 Sum_probs=32.7 Q ss_pred EEECCCCEEEECHHCCCCCC-----EEEE--ECCCCEEEE---ECC------CCC-CCCCCEEEEEEECCCCCCCCCC Q ss_conf 89759962985101047871-----4379--628980674---011------123-3444328999606766765565 Q gi|254780700|r 114 FFITDDGYILTSNHIVEDGA-----SFSV--ILSDDTELP---AKL------VGT-DALFDLAVLKVQSDRKFIPVEF 174 (489) Q Consensus 114 ~ii~~~G~ilTn~hvv~~a~-----~i~V--~~~dg~~~~---a~v------vg~-D~~~DlAvlki~~~~~~~~~~l 174 (489) -.+... ||||++|.+.+.+ .+.| .+.|..+.+ ++. .+. 
.-..|+|+++......+|-+++ T Consensus 66 s~l~~R-YvLTAAHC~~~~s~is~d~~~vv~~l~d~Sq~~rg~vr~i~~~efY~~~n~~ND~Av~~l~~~a~~pr~ki 142 (413) T COG5640 66 SKLGGR-YVLTAAHCADASSPISSDVNRVVVDLNDSSQAERGHVRTIYVHEFYSPGNLGNDIAVLELARAASLPRVKI 142 (413) T ss_pred CEECCE-EEEEEHHHCCCCCCCCCCCEEEEECCCCCCCCCCCCEEEEEEECCCCCCCCCCCCEEECCCCCCCCCHHHE T ss_conf 142453-77641133267887554535887125654424676449983003325544567410240354556530001 No 168 >COG0298 HypC Hydrogenase maturation factor [Posttranslational modification, protein turnover, chaperones] Probab=50.93 E-value=11 Score=16.54 Aligned_cols=43 Identities=21% Similarity=0.462 Sum_probs=26.7 Q ss_pred EEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEE Q ss_conf 74011123344432899960676676556556731112414675 Q gi|254780700|r 145 LPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFT 188 (489) Q Consensus 145 ~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~a 188 (489) .|++++..|...++|++.+-+-..---+.|-+ ..+++||||+. T Consensus 5 iPgqI~~I~~~~~~A~Vd~gGvkreV~l~Lv~-~~v~~GdyVLV 47 (82) T COG0298 5 IPGQIVEIDDNNHLAIVDVGGVKREVNLDLVG-EEVKVGDYVLV 47 (82) T ss_pred CCCEEEEEECCCCEEEEEECCEEEEEEEEEEC-CCCCCCCEEEE T ss_conf 67278998078855899865676898753304-75334778999 No 169 >PRK13579 gcvT glycine cleavage system aminomethyltransferase T; Provisional Probab=50.89 E-value=9.4 Score=17.05 Aligned_cols=24 Identities=21% Similarity=0.412 Sum_probs=16.0 Q ss_pred CCCCCCCCCCCCCCCCCCCCCCCC Q ss_conf 223211233211001000023333 Q gi|254780700|r 257 GVGLIIPLSIIKKAIPSLISKGRV 280 (489) Q Consensus 257 GigfaIP~~~~~~i~~~l~~~g~v 280 (489) |.-+-+|.+.+..+++.|.+.+.+ T Consensus 199 G~Ei~~~~~~a~~l~~~l~~~~~~ 222 (371) T PRK13579 199 GFEISVPADAAEALAEALLADPRV 222 (371) T ss_pred EEEEEECHHHHHHHHHHHHHCCCC T ss_conf 599996599999999999974898 No 170 >PRK00913 leucyl aminopeptidase; Provisional Probab=47.32 E-value=4.1 Score=19.31 Aligned_cols=29 Identities=28% Similarity=0.596 Sum_probs=23.5 Q ss_pred EECCCCCCCCCCCCCHHHHHHHHHCCCCCC Q ss_conf 201111121134671167888752431478 Q gi|254780700|r 310 
ITAVVKESPADKAGMKVGDVICMLDGRIIK 339 (489) Q Consensus 310 V~~V~~~sPA~~AGLk~GDvI~~ing~~I~ 339 (489) |.-..++.|...| .||||||++.||+.|. T Consensus 306 ~~~~~ENm~~g~a-~~pgDvi~~~~GktvE 334 (491) T PRK00913 306 VVAACENMPSGNA-YRPGDVLTSMSGKTIE 334 (491) T ss_pred EEEHHHCCCCCCC-CCCCCEEEECCCCEEE T ss_conf 9861213888889-9985557806994798 No 171 >KOG2597 consensus Probab=46.60 E-value=6.6 Score=18.01 Aligned_cols=40 Identities=25% Similarity=0.446 Sum_probs=28.9 Q ss_pred HHCCCCCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCC Q ss_conf 441764444113201111121134671167888752431478 Q gi|254780700|r 298 IPLGLRGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIK 339 (489) Q Consensus 298 ~~lgl~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~ 339 (489) ..++++ ..=..|.-..+++|...| -|+||||+..||+.|. T Consensus 313 ~~l~~~-in~~~v~plcENm~sg~A-~kpgDVit~~nGKtve 352 (513) T KOG2597 313 AQLSLP-INVHAVLPLCENMPSGNA-TKPGDVITLRNGKTVE 352 (513) T ss_pred HHCCCC-CCEEEEEEEECCCCCCCC-CCCCCEEEECCCCEEE T ss_conf 861899-752799750005887557-8987478812786787 No 172 >cd00433 Peptidase_M17 Cytosol aminopeptidase family, N-terminal and catalytic domains. Family M17 contains zinc- and manganese-dependent exopeptidases ( EC 3.4.11.1), including leucine aminopeptidase. They catalyze removal of amino acids from the N-terminus of a protein and play a key role in protein degradation and in the metabolism of biologically active peptides. They do not contain HEXXH motif (which is used as one of the signature patterns to group the peptidase families) in the metal-binding site. The two associated zinc ions and the active site are entirely enclosed within the C-terminal catalytic domain in leucine aminopeptidase. The enzyme is a hexamer, with the catalytic domains clustered around the three-fold axis, and the two trimers related to one another by a two-fold rotation. The N-terminal domain is structurally similar to the ADP-ribose binding Macro domain. This family includes proteins from bacteria, archaea, animals and plants. 
Probab=45.65 E-value=4.4 Score=19.12 Aligned_cols=29 Identities=21% Similarity=0.407 Sum_probs=23.5 Q ss_pred EECCCCCCCCCCCCCHHHHHHHHHCCCCCC Q ss_conf 201111121134671167888752431478 Q gi|254780700|r 310 ITAVVKESPADKAGMKVGDVICMLDGRIIK 339 (489) Q Consensus 310 V~~V~~~sPA~~AGLk~GDvI~~ing~~I~ 339 (489) +.-..++.|..+| .||||||++.||+.|. T Consensus 289 ~~~~~EN~~~~~a-~rpgDvi~~~~GktvE 317 (468) T cd00433 289 VLPLAENMISGNA-YRPGDVITSRSGKTVE 317 (468) T ss_pred EEEHHHCCCCCCC-CCCCCEEECCCCCEEE T ss_conf 9862314878889-8984658827996899 No 173 >pfam04551 GcpE GcpE protein. In a variety of organisms, including plants and several eubacteria, isoprenoids are synthesized by the mevalonate-independent 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway. Although different enzymes of this pathway have been described, the terminal biosynthetic steps of the MEP pathway have not been fully elucidated. GcpE gene of Escherichia coli is involved in this pathway. Probab=44.99 E-value=0.44 Score=25.54 Aligned_cols=39 Identities=28% Similarity=0.349 Sum_probs=20.8 Q ss_pred ECCCCHHHHC--CCC--CCCEEEEECCEECCCHHHHHHHHHHHH Q ss_conf 0688978982--999--888999889999389999999999886 Q gi|254780700|r 416 PNREREVEAK--GIQ--KGMTIVSVNTHEVSCIKDVERLIGKAK 455 (489) Q Consensus 416 v~~~s~Aa~~--GL~--~GDiIl~VNg~~V~s~~dl~~iL~~~k 455 (489) |.-.+.|..+ |+. +|-.++-.+|+.++.+.+ .++++++. T Consensus 296 VNGPGEak~ADiGiagg~g~~~lf~~G~~v~~v~~-~~iv~~l~ 338 (345) T pfam04551 296 VNGPGEAKEADLGIAGGKGKGILFKKGEIVKKVPE-EELVDELK 338 (345) T ss_pred ECCCCCCCCCCEEEECCCCCEEEEECCEEEEECCH-HHHHHHHH T ss_conf 31876444476757258896679999999676388-89999999 No 174 >cd04643 CBS_pair_30 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. 
The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener Probab=42.67 E-value=16 Score=15.55 Aligned_cols=20 Identities=25% Similarity=0.456 Sum_probs=13.9 Q ss_pred CCCCCCEEEECCCEEEEEEC Q ss_conf 13477035403430355512 Q gi|254780700|r 227 QGNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 227 pGnSGGpl~n~~G~viGint 246 (489) .|-|+=|++|.+|+++||-| T Consensus 22 ~~i~~lPVvd~~gklvGiit 41 (116) T cd04643 22 HGYSAIPVLDKEGKYVGTIS 41 (116) T ss_pred CCCCEEEEECCCCEEEEEEE T ss_conf 49987989869994999988 No 175 >cd04641 CBS_pair_28 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. 
Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener Probab=41.31 E-value=16 Score=15.61 Aligned_cols=19 Identities=26% Similarity=0.314 Sum_probs=11.9 Q ss_pred CCCCCEEEECCCEEEEEEC Q ss_conf 3477035403430355512 Q gi|254780700|r 228 GNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 228 GnSGGpl~n~~G~viGint 246 (489) +-||=|++|.+|+++||-| T Consensus 23 ~is~lPVVD~~g~lvGiis 41 (120) T cd04641 23 RVSALPIVDENGKVVDVYS 41 (120) T ss_pred CCCEEEEECCCCCEEEEEE T ss_conf 9866999878996989975 No 176 >pfam00947 Pico_P2A Picornavirus core protein 2A. This protein is a protease, involved in cleavage of the polyprotein. Probab=40.26 E-value=22 Score=14.67 Aligned_cols=37 Identities=24% Similarity=0.239 Sum_probs=26.0 Q ss_pred EEEEEEE-----ECCCCCCEEEECCCEEEEEECCCCCCCCCCCCCCCC Q ss_conf 2332332-----013477035403430355512344553222222321 Q gi|254780700|r 219 TQIDAPI-----NQGNSGGPCFNALGHVIGVNAMIVTSGQFHMGVGLI 261 (489) Q Consensus 219 iqtDa~I-----npGnSGGpl~n~~G~viGint~i~~~~g~~~Gigfa 261 (489) +|+...| .||.-||-|.=.+| ||||-|| ||.-=++|| T Consensus 76 yQ~~vllg~G~~ePGDCGGiLrC~HG-viGivTa-----GG~g~VaFa 117 (127) T pfam00947 76 YQSHLLLGVGPAEPGDCGGILRCEHG-VIGIVTA-----GGEGHVAFA 117 (127) T ss_pred HHCCEEEEECCCCCCCCCEEEEECCC-CEEEEEC-----CCCCEEEEE T ss_conf 61063576437888877507884478-5789963-----898779898 No 177 >smart00116 CBS Domain in cystathionine beta-synthase and other proteins. Domain present in all 3 forms of cellular life. Present in two copies in inosine monophosphate dehydrogenase, of which one is disordered in the crystal structure [3]. A number of disease states are associated with CBS-containing proteins including homocystinuria, Becker's and Thomsen disease. 
Probab=39.66 E-value=19 Score=15.13 Aligned_cols=20 Identities=20% Similarity=0.446 Sum_probs=16.4 Q ss_pred CCCCCCEEEECCCEEEEEEC Q ss_conf 13477035403430355512 Q gi|254780700|r 227 QGNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 227 pGnSGGpl~n~~G~viGint 246 (489) .+-++-|++|.+|+++||-| T Consensus 21 ~~i~~lPVVd~~~~lvGiit 40 (49) T smart00116 21 HGIRRLPVVDEEGRLVGIVT 40 (49) T ss_pred HCCCEEEEECCCCCEEEEEE T ss_conf 09985769989991999988 No 178 >PRK04196 V-type ATP synthase subunit B; Provisional Probab=38.21 E-value=24 Score=14.42 Aligned_cols=78 Identities=15% Similarity=0.150 Sum_probs=40.3 Q ss_pred EECCCCEEEECHHCC--CCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECC-CCCC----CC-CCCCCCCCCCCCE- Q ss_conf 975996298510104--7871437962898067401112334443289996067-6676----55-6556731112414- Q gi|254780700|r 115 FITDDGYILTSNHIV--EDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSD-RKFI----PV-EFEDANNIRVGEA- 185 (489) Q Consensus 115 ii~~~G~ilTn~hvv--~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~-~~~~----~~-~lg~s~~~~~G~~- 185 (489) |.+=.|-++....+- .-.+-++|...||+...++|++.+ -|.|++++-.+ ..+. .+ ..|..-.+.+|+- T Consensus 7 V~~I~Gplv~~~g~~~~~~gElv~I~~~~g~~~~GeVi~~~--~d~~~iqv~e~t~Gl~~~g~~V~~tG~plsV~vG~~l 84 (460) T PRK04196 7 VSEIVGPLMFVEGVEGVAYGELVEIELPNGEKRRGQVLEVS--GDKAVVQVFEGTTGLNLKGTKVRFTGETLELPVSEDM 84 (460) T ss_pred EEEEECCEEEEECCCCCCCCCEEEEECCCCCEEEEEEEEEE--CCEEEEEECCCCCCCCCCCCEEEECCCCEEEEECHHH T ss_conf 99998868999258889878789998399988889999986--9979999915988878599789947995288718777 Q ss_pred ----EEEECCCCC Q ss_conf ----675236655 Q gi|254780700|r 186 ----VFTIGNPFR 194 (489) Q Consensus 186 ----v~aiG~P~g 194 (489) +=++|.|.. 
T Consensus 85 LGRV~DglGrPlD 97 (460) T PRK04196 85 LGRIFDGLGRPID 97 (460) T ss_pred HCCEECCCCCCCC T ss_conf 2798477886368 No 179 >PRK04972 hypothetical protein; Provisional Probab=38.13 E-value=4.8 Score=18.92 Aligned_cols=51 Identities=12% Similarity=0.248 Sum_probs=25.6 Q ss_pred CCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCC Q ss_conf 8980674011123344432899960676676556556731112414675236655 Q gi|254780700|r 140 SDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFR 194 (489) Q Consensus 140 ~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g 194 (489) .+|+.... +......++.+-++.-+.... .-.....++.||.+..+|++.. T Consensus 228 ~~G~tl~e--~~~~~~~~~~i~ri~r~g~~~--~~~~~~~L~~GD~v~vvG~~~~ 278 (558) T PRK04972 228 TDGKNLRE--LGIYRQTGCYIERIRRNGILA--NPDGDAVLQMGDEIALVGYPDA 278 (558) T ss_pred CCCCCHHH--HHHHCCCCEEEEEEEECCEEE--CCCCCCEECCCCEEEEEECHHH T ss_conf 46874999--874237875999998899576--7998675089999999966788 No 180 >TIGR00758 UDG_fam4 uracil-DNA glycosylase, family 4; InterPro: IPR005273 This well-conserved family of proteins is about 200 residues in length and homologous to the N-terminus of the DNA polymerase of phage SPO1 of Bacillus subtilis. The function of these proteins is unknown. . 
Probab=37.07 E-value=19 Score=15.04 Aligned_cols=39 Identities=28% Similarity=0.361 Sum_probs=31.2 Q ss_pred CCCCCCCCCCCCHHHHHHHCCCCCCCCEEEECCCCCCCC Q ss_conf 334332000342166764417644441132011111211 Q gi|254780700|r 281 DHGWFGIMTQNLTQELAIPLGLRGTKGSLITAVVKESPA 319 (489) Q Consensus 281 ~rg~lGv~~~~v~~~la~~lgl~~~~GvlV~~V~~~sPA 319 (489) -+||.|-.++=|+.=|+++.||...+-+|||.|..==|= T Consensus 39 G~PFVG~aGkLLd~lL~e~iGl~R~q~vYITNvvKCRPP 77 (185) T TIGR00758 39 GRPFVGRAGKLLDELLEEAIGLSREQNVYITNVVKCRPP 77 (185) T ss_pred CCCCCCHHHHHHHHHHHHHHCCCCCCCEEEEEEEEECCC T ss_conf 898354005689999999837443786235225654685 No 181 >cd01724 Sm_D1 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. Sm subunit D1 heterodimerizes with subunit D2 and three such heterodimers form a hexameric ring structure with alternating D1 and D2 subunits. The D1 - D2 heterodimer also assembles into a heptameric ring containing DB, D3, E, F, and G subunits. Sm-like proteins exist in archaea as well as prokaryotes which form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=37.04 E-value=25 Score=14.30 Aligned_cols=34 Identities=15% Similarity=0.304 Sum_probs=28.8 Q ss_pred CCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 8714379628980674011123344432899960 Q gi|254780700|r 131 DGASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 131 ~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ....++|.|.+|..|.++++..|....+.+-.+. 
T Consensus 10 ~g~~VtVELKng~~~~G~L~~vd~~MN~~L~~v~ 43 (90) T cd01724 10 TNETVTIELKNGTIVHGTITGVDPSMNTHLKNVK 43 (90) T ss_pred CCCEEEEEECCCCEEEEEEEEECCCCEEEEEEEE T ss_conf 8987999987997999999881378201898899 No 182 >KOG1781 consensus Probab=35.58 E-value=8.2 Score=17.42 Aligned_cols=29 Identities=24% Similarity=0.371 Sum_probs=25.6 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEE Q ss_conf 71437962898067401112334443289 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAV 160 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAv 160 (489) -.+|+|+|..||+..+.+.|+|+...+.+ T Consensus 27 Dk~Irvkf~GGr~~sGiLkGyDqLlNlVL 55 (108) T KOG1781 27 DKKIRVKFTGGREASGILKGYDQLLNLVL 55 (108) T ss_pred CCCEEEEEECCCEEEEEHHHHHHHHHHHH T ss_conf 00158996067264100222899999998 No 183 >PRK08262 hypothetical protein; Provisional Probab=35.22 E-value=27 Score=14.12 Aligned_cols=51 Identities=22% Similarity=0.261 Sum_probs=27.9 Q ss_pred CCHHHHHHHHHHHHHHHHHHHHHHHHC----CCCCCCCCCCCCHHHHHHHHCCCE Q ss_conf 930278999999999999999753210----111134755589889999848950 Q gi|254780700|r 1 MFKRQILSVKSICTVALTCVIFSSTYL----VLEAKLPPSSVDLPPVIARVSPSI 51 (489) Q Consensus 1 m~~r~~~~~~~~~~~~l~~~~~~~~~~----~~~~~~~~~~~~~~~~~~~~~paV 51 (489) ||||-++.+..+.++.++++++-.... ......++...|....+++.+-|+ T Consensus 3 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~a~~~l~~~i 57 (489) T PRK08262 3 WIRRIILGLLLLLLVLAAVLAVRTFRFKSRQLQVAAVAPVAVDEDRAAQRLSEAI 57 (489) T ss_pred HHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCCCCHHHHHHHHHHCC T ss_conf 8999999999999999999999985267865667888876588699999997340 No 184 >cd04602 CBS_pair_IMPDH_2 This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains in the inosine 5' monophosphate dehydrogenase (IMPDH) protein. IMPDH is an essential enzyme that catalyzes the first step unique to GTP synthesis, playing a key role in the regulation of cell proliferation and differentiation. 
CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain in IMPDH have been associated with retinitis pigmentos Probab=35.19 E-value=23 Score=14.58 Aligned_cols=15 Identities=20% Similarity=0.319 Sum_probs=11.1 Q ss_pred CEEEECCCEEEEEEC Q ss_conf 035403430355512 Q gi|254780700|r 232 GPCFNALGHVIGVNA 246 (489) Q Consensus 232 Gpl~n~~G~viGint 246 (489) =|++|.+|+++||-| T Consensus 93 LPVVd~~g~LvGiIT 107 (114) T cd04602 93 LPIVNDDGELVALVT 107 (114) T ss_pred EEEECCCCEEEEEEE T ss_conf 819978996999999 No 185 >TIGR02124 hypE hydrogenase expression/formation protein HypE; InterPro: IPR011854 This family contains HypE (or HupE), a protein required for expression of catalytically active hydrogenase in many systems. It appears to be an accessory protein involved in maturation rather than a regulatory protein involved in expression. HypE shows considerable homology to the thiamine-monophosphate kinase ThiL (IPR006283 from INTERPRO) and other enzymes.. Probab=35.15 E-value=4.7 Score=18.95 Aligned_cols=55 Identities=9% Similarity=0.129 Sum_probs=28.9 Q ss_pred CCCEEEEECCEECCCHHHHHHHHHHHHHCCCCE-EEEEEEECC---CCCCCCCCCEEEEEEE Q ss_conf 888999889999389999999999886259956-999997177---6433468843688875 Q gi|254780700|r 429 KGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDS-VLLQIKYDP---DMQSGNDNMSRFVSLK 486 (489) Q Consensus 429 ~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~-VLL~V~r~~---~~~~~~~~~~rFVal~ 486 (489) .|-.++.|+.+. .+++.++|++.+..+... |.=.|.-.. 
-......+..|++..+ T Consensus 278 EG~~v~~V~~E~---A~~vLe~lk~hp~G~~A~YiIG~V~e~~~~~V~l~t~~G~~R~ld~p 336 (345) T TIGR02124 278 EGKLVLAVPPEA---AEKVLEILKSHPLGKDAAYIIGEVVEKKEGLVVLKTAYGGKRILDMP 336 (345) T ss_pred CCEEEEEECHHH---HHHHHHHHHHCCCCCCCCEEEEEEEECCCCEEEEEECCCCEEEEECC T ss_conf 762899828377---99999998607764332156301473798779997068840565535 No 186 >cd04614 CBS_pair_1 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic genera Probab=35.01 E-value=24 Score=14.47 Aligned_cols=12 Identities=33% Similarity=0.659 Sum_probs=6.0 Q ss_pred EEEECCCEEEEE Q ss_conf 354034303555 Q gi|254780700|r 233 PCFNALGHVIGV 244 (489) Q Consensus 233 pl~n~~G~viGi 244 (489) |++|.+|+++|| T Consensus 76 PVvd~~~~lvGi 87 (96) T cd04614 76 PIINGNDKLIGL 87 (96) T ss_pred EEECCCCCEEEE T ss_conf 289899919999 No 187 >pfam01079 Hint Hint module. This is an alignment of the Hint module in the Hedgehog proteins. It does not include any Inteins which also possess the Hint module. 
Probab=34.62 E-value=24 Score=14.43 Aligned_cols=12 Identities=25% Similarity=0.548 Sum_probs=8.3 Q ss_pred CCCCCCCEEEEE Q ss_conf 311124146752 Q gi|254780700|r 178 NNIRVGEAVFTI 189 (489) Q Consensus 178 ~~~~~G~~v~ai 189 (489) +++++||.|+.. T Consensus 104 s~V~~Gd~v~v~ 115 (214) T pfam01079 104 SDVRPGDYVLVQ 115 (214) T ss_pred CCCCCCCEEEEE T ss_conf 227789889999 No 188 >COG4956 Integral membrane protein (PIN domain superfamily) [General function prediction only] Probab=34.25 E-value=14 Score=15.87 Aligned_cols=43 Identities=16% Similarity=0.306 Sum_probs=33.0 Q ss_pred HHHHHCCCCCCCCCCEEEEEC-CCCCCCCEEEEECCCCCEEEEC Q ss_conf 887524314787431012220-3566752010120478166512 Q gi|254780700|r 329 VICMLDGRIIKSHQDFVWQIA-SRSPKEQVKISLCKEGSKHSVA 371 (489) Q Consensus 329 vI~~ing~~I~~~~~l~~~i~-~~~~G~~v~l~v~R~g~~~~~~ 371 (489) .+-++.|.++-+..||.+++. ..-||+++++++.++||+..-- T Consensus 268 KVae~qgV~vLNINDLAnAVkP~vlpGe~l~v~iiK~GkE~~QG 311 (356) T COG4956 268 KVAELQGVQVLNINDLANAVKPVVLPGEELTVQIIKDGKEPGQG 311 (356) T ss_pred HHHHHCCCCEECHHHHHHHHCCCCCCCCEEEEEEEECCCCCCCC T ss_conf 87764488463088888873773157871689985067656886 No 189 >pfam06893 consensus Probab=34.08 E-value=23 Score=14.53 Aligned_cols=32 Identities=9% Similarity=0.025 Sum_probs=17.7 Q ss_pred CCCCCCCCCCCCCCCCEEEEECCCCCCCCCCC Q ss_conf 76556556731112414675236655311112 Q gi|254780700|r 169 FIPVEFEDANNIRVGEAVFTIGNPFRLRGTVS 200 (489) Q Consensus 169 ~~~~~lg~s~~~~~G~~v~aiG~P~g~~~tvt 200 (489) .++++=|++-.+++|+..+--|+-.-...+++ T Consensus 46 ~~~i~~G~~~~V~Ig~d~VltG~VD~~~~~~~ 77 (341) T pfam06893 46 QSRIKQGQAVEVLIGGELVITGYVDSTPPRYD 77 (341) T ss_pred CCCCCCCCEEEEEECCEEEEEEEECCCCCCCC T ss_conf 77448999899999999999999872214357 No 190 >cd01720 Sm_D2 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. 
Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. Sm subunit D2 heterodimerizes with subunit D1 and three such heterodimers form a hexameric ring structure with alternating D1 and D2 subunits. The D1 - D2 heterodimer also assembles into a heptameric ring containing D2, D3, E, F, and G subunits. Sm-like proteins exist in archaea as well as prokaryotes which form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=33.59 E-value=24 Score=14.50 Aligned_cols=37 Identities=11% Similarity=0.218 Sum_probs=30.8 Q ss_pred CCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 0478714379628980674011123344432899960 Q gi|254780700|r 128 IVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 128 vv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) .+.+-+.+.|.+-++++..+++.++|....+.+-.++ T Consensus 10 av~~~~~V~v~lr~~r~l~G~l~AfD~H~NmVL~dv~ 46 (87) T cd01720 10 AVKNNTQVLINCRNNKKLLGRVKAFDRHCNMVLENVK 46 (87) T ss_pred HHHCCCEEEEEECCCCEEEEEEEEEHHHHHHHHHHCE T ss_conf 8747967999908998887999861013554565201 No 191 >PRK06300 enoyl-(acyl carrier protein) reductase; Provisional Probab=33.06 E-value=11 Score=16.65 Aligned_cols=24 Identities=21% Similarity=0.170 Sum_probs=15.0 Q ss_pred CCHHHHHHHCCCCCCCCEEEECCCCC Q ss_conf 42166764417644441132011111 Q gi|254780700|r 291 NLTQELAIPLGLRGTKGSLITAVVKE 316 (489) Q Consensus 291 ~v~~~la~~lgl~~~~GvlV~~V~~~ 316 (489) .+|..+|..++- ..|+-|+.|.|+ T Consensus 201 ~lTr~lA~E~g~--~ygIRVNaI~PG 224 (298) T PRK06300 201 SDTKTLAWEAGR--RWGIRVNTISAG 224 (298) T ss_pred HHHHHHHHHHCC--CCCEEEEEEECC T ss_conf 659999998570--118089998548 No 192 >TIGR00337 PyrG CTP synthase; InterPro: IPR004468 CTP synthase is involved in pyrimidine ribonucleotide/ribonucleoside metabolism. The enzyme catalyzes the reaction L-glutamine + H2O + UTP + ATP = CTP + phosphate + ADP + L-glutamate. 
The enzyme exists as a dimer of identical chains that aggregates as a tetramer. This gene has been found circa 500 bp 5' upstream of enolase in both the beta (Nitrosomonas europaea) and gamma (Escherichia coli) subdivisions of Proteobacteria; GO: 0003883 CTP synthase activity, 0006221 pyrimidine nucleotide biosynthetic process. 
Probab=32.71 E-value=21 Score=14.83 Aligned_cols=13 Identities=38% Similarity=0.846 Sum_probs=5.4 Q ss_pred CCCCCCCEEEEEC Q ss_conf 3111241467523 Q gi|254780700|r 178 NNIRVGEAVFTIG 190 (489) Q Consensus 178 ~~~~~G~~v~aiG 190 (489) ++|+.||.|++.| T Consensus 36 ~~L~KGd~V~T~g 48 (86) T TIGR00739 36 ESLKKGDKVLTIG 48 (86) T ss_pred HCCCCCCEEEECC T ss_conf 5279977899838 No 194 >cd05701 S1_Rrp5_repeat_hs10 S1_Rrp5_repeat_hs10: Rrp5 is a trans-acting factor important for biogenesis of both the 40S and 60S eukaryotic ribosomal subunits. Rrp5 has two distinct regions, an N-terminal region containing tandemly repeated S1 RNA-binding domains (12 S1 repeats in Saccharomyces cerevisiae Rrp5 and 14 S1 repeats in Homo sapiens Rrp5) and a C-terminal region containing tetratricopeptide repeat (TPR) motifs thought to be involved in protein-protein interactions. Mutational studies have shown that each region represents a specific functional domain. Deletions within the S1-containing region inhibit pre-rRNA processing at either site A3 or A2, whereas deletions within the TPR region confer an inability to support cleavage of A0-A2. This CD includes H. sapiens S1 repeat 10 (hs10). Rrp5 is found in eukaryotes but not in prokaryotes or archaea. Probab=32.58 E-value=30 Score=13.85 Aligned_cols=44 Identities=16% Similarity=0.233 Sum_probs=21.7 Q ss_pred EEEECCCCCCCCCCEEEEEEECCCCCCCCCC---------CCCCCCCCCCEEEEE Q ss_conf 6740111233444328999606766765565---------567311124146752 Q gi|254780700|r 144 ELPAKLVGTDALFDLAVLKVQSDRKFIPVEF---------EDANNIRVGEAVFTI 189 (489) Q Consensus 144 ~~~a~vvg~D~~~DlAvlki~~~~~~~~~~l---------g~s~~~~~G~~v~ai 189 (489) ++.|.|-..+ -|+|+..+.....|.+.+. -||.++++|+.+.+. 
T Consensus 3 ~h~a~VQH~~--k~FAi~SL~~Tg~L~afp~~sHlNdtFrfdSeKL~vGq~~~v~ 55 (69) T cd05701 3 RHTAIVQHAD--KDFAIVSLATTGDLAAFPTRSHLNDTFRFDSEKLSVGQCLDVT 55 (69) T ss_pred CCHHHHHHHH--HCEEEEEEECCCCEEEEECHHHCCCCCCCCCCEEECCCEEEEE T ss_conf 1025332121--1068999604464799971464366434573115336549999 No 195 >COG4784 Putative Zn-dependent protease [General function prediction only] Probab=32.54 E-value=25 Score=14.35 Aligned_cols=63 Identities=6% Similarity=0.101 Sum_probs=30.0 Q ss_pred CCCEEEEECCCCCEEEECCCCCCCCCCCCCCCCCCCCCCCCCEEEEECCHHHCC---EEEEEEEEECCCCH Q ss_conf 752010120478166512556558763100001246545254698728965715---20079996068897 Q gi|254780700|r 354 KEQVKISLCKEGSKHSVAVVLGSSPTAKNDMHLEVGDKELLGMVLQDINDGNKK---LVRIVALNPNRERE 421 (489) Q Consensus 354 G~~v~l~v~R~g~~~~~~V~l~~~p~~~~~~~~~~~~~~~lGl~v~~l~~~~~~---~~gi~vv~v~~~s~ 421 (489) +-...+.|+|.|...-.-++..+.-...... .....--+|+.+++.++. ...+.++.+.++-. T Consensus 373 ~w~fdvaVI~~g~rvyrfltavp~gs~~l~~-----~a~sv~~SFR~lt~~E~a~lkPlrirvvtVk~GqT 438 (479) T COG4784 373 RWQFDVAVIRAGDRVYRFLTAVPKGSTALEP-----RANSVRRSFRPLTPAERAALKPLRIRVVTVKPGQT 438 (479) T ss_pred CCCCEEEEEEECCEEEEEEEECCCCCCHHHH-----HHHHHHHHCCCCCHHHHHCCCCEEEEEEEECCCCC T ss_conf 4441289997288889998731467400127-----89988863524887677405761799998258761 No 196 >cd04615 CBS_pair_2 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. 
It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic genera Probab=32.45 E-value=29 Score=13.88 Aligned_cols=19 Identities=16% Similarity=0.401 Sum_probs=10.6 Q ss_pred CCCCCEEEECCCEEEEEEC Q ss_conf 3477035403430355512 Q gi|254780700|r 228 GNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 228 GnSGGpl~n~~G~viGint 246 (489) +-++-|++|.+|+++||-| T Consensus 23 ~~~~~pVvd~~~~lvGivT 41 (113) T cd04615 23 GSRALPVVDDKKRLVGIIT 41 (113) T ss_pred CCCEEEEECCCCEEEEEEE T ss_conf 9978999948997999999 No 197 >COG3338 Cah Carbonic anhydrase [Inorganic ion transport and metabolism] Probab=32.31 E-value=30 Score=13.82 Aligned_cols=24 Identities=42% Similarity=0.408 Sum_probs=18.8 Q ss_pred CCCEEE--EECCCCCCCCCCEEEEEE Q ss_conf 898067--401112334443289996 Q gi|254780700|r 140 SDDTEL--PAKLVGTDALFDLAVLKV 163 (489) Q Consensus 140 ~dg~~~--~a~vvg~D~~~DlAvlki 163 (489) -||+.| +|..|..|+..+||||-+ T Consensus 123 v~Gk~~pmEaHFVHkd~~g~L~Vl~v 148 (250) T COG3338 123 VDGKSFPMEAHFVHKDAKGTLAVLAV 148 (250) T ss_pred HCCCCCCCEEEEEECCCCCCEEEEEE T ss_conf 14664664014662178986899887 No 198 >PRK13605 endoribonuclease SymE; Provisional Probab=32.00 E-value=18 Score=15.28 Aligned_cols=57 Identities=11% Similarity=0.095 Sum_probs=31.6 Q ss_pred EECHHCCCCCCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCCCC Q ss_conf 85101047871437962898067401112334443289996067667655655673111241467523665531 Q gi|254780700|r 123 LTSNHIVEDGASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFRLR 196 (489) Q Consensus 123 lTn~hvv~~a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g~~ 196 (489) +|+-|.+.+...-.|...+.|.+.......-+. -..+|.+.| -|+|.-+.|.+.|.. 
T Consensus 1 mt~~hsia~~~~~evsp~nnr~~tV~Yasr~~d----------y~~iPAI~L-------kGqWLeeAGF~tG~~ 57 (113) T PRK13605 1 MTDTHSIAQPFEAEVSPANNRQLTVSYASRYPD----------YSRIPAITL-------KGQWLEAAGFATGTA 57 (113) T ss_pred CCCCCCCCCCCCCCCCCCCCCEEEEEEECCCCC----------CCCCCCEEE-------CCHHHHHCCCCCCCE T ss_conf 987533334578865877774488875303787----------562754544-------638899729877981 No 199 >TIGR01379 thiL thiamine-monophosphate kinase; InterPro: IPR006283 This family represents thiamine-monophosphate kinase, an enzyme that converts thiamine monophosphate into thiamine pyrophosphate (TPP, coenzyme B1), an enzyme cofactor. Thiamine monophosphate may be derived from de novo synthesis or from unphosphorylated thiamine, known as vitamin B1. Eukaryotes lack this enzyme, and add pyrophosphate from ATP to unphosphorylated thiamine in a single step. ; GO: 0009030 thiamin phosphate kinase activity, 0009228 thiamin biosynthetic process. Probab=31.84 E-value=31 Score=13.77 Aligned_cols=48 Identities=19% Similarity=0.200 Sum_probs=19.6 Q ss_pred EEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCC Q ss_conf 6740111233444328999606766765565567311124146752366 Q gi|254780700|r 144 ELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNP 192 (489) Q Consensus 144 ~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P 192 (489) .|.+++||-|-..- .++-+..-...+.=..=.-+..++||+|++-|.+ T Consensus 113 ~Y~~~LiGGDT~~~-~~~~~T~iG~~~~~~~~~RsgAk~GD~v~VTG~l 160 (336) T TIGR01379 113 QYGVDLIGGDTVKS-LVVTVTAIGEAPKGRALLRSGAKPGDLVFVTGTL 160 (336) T ss_pred HCCCEEEECCCCCE-EEEEEEEEEEECCCCEEECCCCCCCCEEEEECCC T ss_conf 53987872440021-3143457897568973323678777678995883 No 200 >cd01721 Sm_D3 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. 
Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. Sm subunit D3 heterodimerizes with subunit B and three such heterodimers form a hexameric ring structure with alternating B and D3 subunits. The D3 - B heterodimer also assembles into a heptameric ring containing D1, D2, E, F, and G subunits. Sm-like proteins exist in archaea as well as prokaryotes which form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=31.25 E-value=31 Score=13.71 Aligned_cols=33 Identities=18% Similarity=0.237 Sum_probs=28.2 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 714379628980674011123344432899960 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ...+.|.+.||..|.++++..|...++-+-.+. T Consensus 10 g~~VtVELKnG~~y~G~L~~~d~~MN~~L~~v~ 42 (70) T cd01721 10 GHIVTVELKTGEVYRGKLIEAEDNMNCQLKDVT 42 (70) T ss_pred CCEEEEEECCCEEEEEEEEEEECCCCCEEEEEE T ss_conf 988999988994999999887023674998999 No 201 >cd04610 CBS_pair_ParBc_assoc This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains associated with a ParBc (ParB-like nuclease) domain downstream. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. 
Probab=29.97 E-value=32 Score=13.64 Aligned_cols=15 Identities=13% Similarity=0.368 Sum_probs=9.9 Q ss_pred CEEEECCCEEEEEEC Q ss_conf 035403430355512 Q gi|254780700|r 232 GPCFNALGHVIGVNA 246 (489) Q Consensus 232 Gpl~n~~G~viGint 246 (489) =|++|.+|+++||-| T Consensus 86 lpVvde~g~lvGiiT 100 (107) T cd04610 86 LPVVDENNNLVGIIT 100 (107) T ss_pred EEEECCCCEEEEEEE T ss_conf 969923998999999 No 202 >pfam01455 HupF_HypC HupF/HypC family. Probab=29.54 E-value=33 Score=13.52 Aligned_cols=41 Identities=24% Similarity=0.411 Sum_probs=21.2 Q ss_pred EEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEE Q ss_conf 74011123344432899960676676556556731112414675 Q gi|254780700|r 145 LPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFT 188 (489) Q Consensus 145 ~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~a 188 (489) .+++++-.|. -+.|+....+-. ..+.+.--.++++||||+. T Consensus 5 iP~~Vv~i~~-~~~A~vd~~G~~--r~v~l~lv~~~~~GD~VLV 45 (67) T pfam01455 5 IPGKVVEIDD-GNMALVDFGGVR--REVSLALVPEVKVGDYVLV 45 (67) T ss_pred CCEEEEEECC-CCEEEEEECCEE--EEEEEEECCCCCCCCEEEE T ss_conf 4509999979-988999809979--9999757188998989999 No 203 >KOG3460 consensus Probab=28.61 E-value=23 Score=14.54 Aligned_cols=28 Identities=25% Similarity=0.328 Sum_probs=21.3 Q ss_pred CEEEEECCCCEEEEECCCCCCCCCCEEE Q ss_conf 1437962898067401112334443289 Q gi|254780700|r 133 ASFSVILSDDTELPAKLVGTDALFDLAV 160 (489) Q Consensus 133 ~~i~V~~~dg~~~~a~vvg~D~~~DlAv 160 (489) +.++|++-+++++.+++-++|....+-+ T Consensus 16 ErVyVKlr~drel~G~L~afD~HlNmvL 43 (91) T KOG3460 16 ERVYVKLRSDRELRGTLHAFDEHLNMVL 43 (91) T ss_pred CEEEEEECCCHHHCEEHHHHHHHHHHHH T ss_conf 2699996177414002356677666645 No 204 >TIGR03431 PhnD phosphonate ABC transporter, periplasmic phosphonate binding protein. Note that this model does not identify all phnD-subfamily genes with evident phosphonate context, but all sequences above the trusted context may be inferred to bind phosphonate compounds even in the absence of such context. 
Furthermore, there is ample evidence to suggest that many other members of the TIGR01098 subfamily have a different primary function. Probab=28.58 E-value=35 Score=13.42 Aligned_cols=14 Identities=36% Similarity=0.508 Sum_probs=8.5 Q ss_pred CCHHHHHHHHHHHH Q ss_conf 93027899999999 Q gi|254780700|r 1 MFKRQILSVKSICT 14 (489) Q Consensus 1 m~~r~~~~~~~~~~ 14 (489) ||||.++.+.++++ T Consensus 1 m~~r~l~~~~~~~~ 14 (288) T TIGR03431 1 MLRRLILSLVAAFM 14 (288) T ss_pred CCHHHHHHHHHHHH T ss_conf 90889999999999 No 205 >pfam04083 Abhydro_lipase ab-hydrolase associated lipase region. Probab=28.34 E-value=35 Score=13.39 Aligned_cols=19 Identities=42% Similarity=0.536 Sum_probs=14.6 Q ss_pred EEEECCCCEEEECHHCCCC Q ss_conf 7897599629851010478 Q gi|254780700|r 113 GFFITDDGYILTSNHIVED 131 (489) Q Consensus 113 G~ii~~~G~ilTn~hvv~~ 131 (489) =.+..+||||||-+++-.+ T Consensus 15 h~V~T~DGYiL~l~RIp~~ 33 (62) T pfam04083 15 HEVTTEDGYILTLHRIPPG 33 (62) T ss_pred EEEECCCCCEEEEEECCCC T ss_conf 9998288819999975888 No 206 >pfam02743 Cache_1 Cache domain. 
Probab=28.18 E-value=27 Score=14.15 Aligned_cols=12 Identities=33% Similarity=0.886 Sum_probs=7.9 Q ss_pred EEEECCCEEEEE Q ss_conf 354034303555 Q gi|254780700|r 233 PCFNALGHVIGV 244 (489) Q Consensus 233 pl~n~~G~viGi 244 (489) |+.|.+|+++|+ T Consensus 20 pi~d~~g~~~GV 31 (81) T pfam02743 20 PVYDRDGDLLGV 31 (81) T ss_pred EEECCCCCEEEE T ss_conf 999999989999 No 207 >PRK13484 putative iron-regulated outer membrane virulence protein; Provisional Probab=28.12 E-value=35 Score=13.37 Aligned_cols=17 Identities=12% Similarity=0.235 Sum_probs=7.3 Q ss_pred CHHHHHHHHHHHHHHHH Q ss_conf 30278999999999999 Q gi|254780700|r 2 FKRQILSVKSICTVALT 18 (489) Q Consensus 2 ~~r~~~~~~~~~~~~l~ 18 (489) ||++++...+++++|.+ T Consensus 1 ~~~~~~~~~~~~~~~~~ 17 (682) T PRK13484 1 MKNKYIIAPGIAVMCSA 17 (682) T ss_pred CCCEEHHHHHHHHHHHH T ss_conf 97212489999999988 No 208 >cd04621 CBS_pair_8 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. 
Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic genera Probab=28.11 E-value=35 Score=13.37 Aligned_cols=19 Identities=21% Similarity=0.233 Sum_probs=13.3 Q ss_pred CCCCCEEEECCCEEEEEEC Q ss_conf 3477035403430355512 Q gi|254780700|r 228 GNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 228 GnSGGpl~n~~G~viGint 246 (489) +-|+-|++|.+|+++||-| T Consensus 23 ~i~~lpVvd~~g~lvGivT 41 (135) T cd04621 23 GVGRVIVVDDNGKPVGVIT 41 (135) T ss_pred CCCEEEEECCCCEEEEEEE T ss_conf 9977999959993999998 No 209 >cd04592 CBS_pair_EriC_assoc_euk This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains in the EriC CIC-type chloride channels in eukaryotes. These ion channels are proteins with a seemingly simple task of allowing the passive flow of chloride ions across biological membranes. CIC-type chloride channels come from all kingdoms of life, have several gene families, and can be gated by voltage. The members of the CIC-type chloride channel are double-barreled: two proteins forming homodimers at a broad interface formed by four helices from each protein. The two pores are not found at this interface, but are completely contained within each subunit, as deduced from the mutational analyses, unlike many other channels, in which four or five identical or structurally related subunits jointly form one pore. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. 
CBS domains usually Probab=27.92 E-value=36 Score=13.34 Aligned_cols=21 Identities=10% Similarity=0.064 Sum_probs=17.3 Q ss_pred ECCCCCCEEEECCCEEEEEEC Q ss_conf 013477035403430355512 Q gi|254780700|r 226 NQGNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 226 npGnSGGpl~n~~G~viGint 246 (489) ....|+.|++|.+|.++||-| T Consensus 21 ~~~~~~~~VVD~~~~L~GIvt 41 (133) T cd04592 21 DEKQSCVLVVDSDDFLEGILT 41 (133) T ss_pred HHCCCEEEEECCCCCEEEEEE T ss_conf 818865799838997899978 No 210 >cd04801 CBS_pair_M50_like This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains in association with the metalloprotease peptidase M50. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Probab=27.60 E-value=35 Score=13.43 Aligned_cols=19 Identities=11% Similarity=0.165 Sum_probs=10.7 Q ss_pred CCCCCEEEECCCEEEEEEC Q ss_conf 3477035403430355512 Q gi|254780700|r 228 GNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 228 GnSGGpl~n~~G~viGint 246 (489) +.++-|++|.+|+++||-| T Consensus 24 ~~~~~pVvd~~g~l~Givt 42 (114) T cd04801 24 NQRRFVVVDNEGRYVGIIS 42 (114) T ss_pred CCEEEEEECCCCEEEEEEE T ss_conf 9668999878997999999 No 211 >pfam04225 OapA Opacity-associated protein A LysM-like domain. This family includes the Haemophilus influenzae opacity-associated protein. This protein is required for efficient nasopharyngeal mucosal colonisation, and its expression is associated with a distinctive transparent colony phenotype. 
OapA is thought to be a secreted protein, and its expression exhibits high-frequency phase variation. This is a LysM-like domain. Probab=26.53 E-value=20 Score=14.94 Aligned_cols=28 Identities=14% Similarity=0.246 Sum_probs=20.6 Q ss_pred EECCCCCCCCEEEEECCCCCEEEECCCC Q ss_conf 2203566752010120478166512556 Q gi|254780700|r 347 QIASRSPKEQVKISLCKEGSKHSVAVVL 374 (489) Q Consensus 347 ~i~~~~~G~~v~l~v~R~g~~~~~~V~l 374 (489) .+...+||+++.+.+--+|+...+++.. T Consensus 39 ~Ls~Lk~Gq~v~~~~n~~G~l~~L~i~~ 66 (85) T pfam04225 39 PLSNIKSGQLVRIKLNAQGRVDELQIEN 66 (85) T ss_pred CHHHCCCCCEEEEEECCCCCEEEEEEEC T ss_conf 0544589999999999999889999814 No 212 >KOG1387 consensus Probab=26.52 E-value=23 Score=14.58 Aligned_cols=28 Identities=7% Similarity=-0.095 Sum_probs=16.3 Q ss_pred CCCEEEEECCEECC----CHHHHHHHHHHHHH Q ss_conf 88899988999938----99999999998862 Q gi|254780700|r 429 KGMTIVSVNTHEVS----CIKDVERLIGKAKE 456 (489) Q Consensus 429 ~GDiIl~VNg~~V~----s~~dl~~iL~~~k~ 456 (489) .=|++..-+|+++. +..|..+++-++.. T Consensus 392 ~lDIV~~~~G~~tGFla~t~~EYaE~iLkIv~ 423 (465) T KOG1387 392 LLDIVTPWDGETTGFLAPTDEEYAEAILKIVK 423 (465) T ss_pred CEEEEECCCCCCCEEECCCHHHHHHHHHHHHH T ss_conf 32364045786010115872899999999997 No 213 >PRK07807 inositol-5-monophosphate dehydrogenase; Validated Probab=26.47 E-value=38 Score=13.18 Aligned_cols=20 Identities=15% Similarity=0.391 Sum_probs=10.5 Q ss_pred CCEEEEEECCCCE---EEECHHC Q ss_conf 3402789759962---9851010 Q gi|254780700|r 109 MFGSGFFITDDGY---ILTSNHI 128 (489) Q Consensus 109 ~~GsG~ii~~~G~---ilTn~hv 128 (489) +.++..+++++|. 
|+||..+ T Consensus 119 ~~sg~pVv~~~gkLvGIvT~RDi 141 (479) T PRK07807 119 AHGAVVVVDEEGRPVGLVTEADC 141 (479) T ss_pred CCCCCCEECCCCCEEEEEECHHH T ss_conf 78887414679947889821341 No 214 >COG4810 EutS Ethanolamine utilization protein [Amino acid transport and metabolism] Probab=26.33 E-value=25 Score=14.31 Aligned_cols=11 Identities=18% Similarity=0.365 Sum_probs=4.3 Q ss_pred EEEECHHCCCC Q ss_conf 29851010478 Q gi|254780700|r 121 YILTSNHIVED 131 (489) Q Consensus 121 ~ilTn~hvv~~ 131 (489) .-+|-+|.|.+ T Consensus 23 KQVTLAHLIAn 33 (121) T COG4810 23 KQVTLAHLIAN 33 (121) T ss_pred CEEEHHHHHCC T ss_conf 21257788709 No 215 >cd04606 CBS_pair_Mg_transporter This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domain in the magnesium transporter, MgtE. MgtE and its homologs are found in eubacteria, archaebacteria, and eukaryota. Members of this family transport Mg2+ or other divalent cations into the cell via two highly conserved aspartates. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. 
Probab=26.31 E-value=38 Score=13.16 Aligned_cols=15 Identities=20% Similarity=0.496 Sum_probs=9.3 Q ss_pred CEEEECCCEEEEEEC Q ss_conf 035403430355512 Q gi|254780700|r 232 GPCFNALGHVIGVNA 246 (489) Q Consensus 232 Gpl~n~~G~viGint 246 (489) =|++|.+|+++||-| T Consensus 87 lPVVd~~~~lvGiIt 101 (109) T cd04606 87 LPVVDEEGRLVGIIT 101 (109) T ss_pred EEEECCCCEEEEEEE T ss_conf 468988997999999 No 216 >TIGR01975 isoAsp_dipep beta-aspartyl peptidase; InterPro: IPR010229 Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys and at least one other residue is required for catalysis, which may play an electrophillic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site . The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases . Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). 
Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This group of proteins include metallopeptidases belonging to the MEROPS peptidase family M38 (clan MJ, beta-aspartyl dipeptidase family). This entry includes the beta-aspartyl dipeptidase from Escherichia coli (EC 3.4.19.5, IadA), which degrades isoaspartyl dipeptides and may unblock degradation of proteins that cannot be repaired. This entry also describes closely related proteins from other species (e.g. Clostridium perfringens, Thermoanaerobacter tengcongensis) that may have an equivalent in function. This family shows homology to dihydroorotases. The L-isoaspartyl derivative of Asp arises non-enzymatically over time as a form of protein damage. In this isomerisation, the connectivity of the polypeptide changes to pass through the beta-carboxyl of the side chain. Much but not all of this damage can be repaired by protein-L-isoaspartate (D-aspartate) O-methyltransferase.
Probab=26.27 E-value=29 Score=13.90 Aligned_cols=86 Identities=14% Similarity=0.118 Sum_probs=36.5 Q ss_pred CCCHHHHHHHC--CC-CCCCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCC--CCEEEEECCCCCCCCEEEEECCC Q ss_conf 34216676441--76-444411320111112113467116788875243147874--31012220356675201012047 Q gi|254780700|r 290 QNLTQELAIPL--GL-RGTKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSH--QDFVWQIASRSPKEQVKISLCKE 364 (489) Q Consensus 290 ~~v~~~la~~l--gl-~~~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~--~~l~~~i~~~~~G~~v~l~v~R~ 364 (489) ++++.-.||+- || ....|+..-+|=. |+ + .|++=-.|++=-..||+.+ .|+.|....+. --|++.|. T Consensus 171 ~~L~~~aAeARVGGLLgGK~Giv~~H~Gd-s~--~-~L~~i~~~v~~~dvPi~q~lPTH~nR~~~LFE----~g~~fa~~ 242 (391) T TIGR01975 171 EELTNLAAEARVGGLLGGKPGIVNLHVGD-SK--R-KLEPIEELVEETDVPITQFLPTHINRNRELFE----AGLEFAKK 242 (391) T ss_pred HHHHHHHHHHCCCCCCCCCCCEEEEEECC-CH--H-HHHHHHHHHHHCCCCCCCCCCCCCCCCHHHHH----HHHHHHHC T ss_conf 99999977511241116887568996369-86--7-77799999850588700255776476756899----99999973 Q ss_pred CCEEEECCCCCCCCCCCCC Q ss_conf 8166512556558763100 Q gi|254780700|r 365 GSKHSVAVVLGSSPTAKND 383 (489) Q Consensus 365 g~~~~~~V~l~~~p~~~~~ 383 (489) |-...++-...+.+..... T Consensus 243 GG~iDlTss~~p~~~~ege 261 (391) T TIGR01975 243 GGTIDLTSSIDPQFRKEGE 261 (391) T ss_pred CCEEEEECCCCCCCCCCCC T ss_conf 9808760278887553554 No 217 >cd04627 CBS_pair_14 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. 
The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener Probab=26.07 E-value=38 Score=13.13 Aligned_cols=18 Identities=22% Similarity=0.252 Sum_probs=13.5 Q ss_pred CCCCEEEECCCEEEEEEC Q ss_conf 477035403430355512 Q gi|254780700|r 229 NSGGPCFNALGHVIGVNA 246 (489) Q Consensus 229 nSGGpl~n~~G~viGint 246 (489) =||=|++|.+|++||+-| T Consensus 99 i~~lpVVD~~g~lvGiiS 116 (123) T cd04627 99 ISSVAVVDNQGNLIGNIS 116 (123) T ss_pred CCEEEEECCCCCEEEEEE T ss_conf 887869859996999989 No 218 >cd04583 CBS_pair_ABC_OpuCA_assoc2 This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains in association with the ABC transporter OpuCA. OpuCA is the ATP binding component of a bacterial solute transporter that serves a protective role to cells growing in a hyperosmolar environment but the function of the CBS domains in OpuCA remains unknown. In the related ABC transporter, OpuA, the tandem CBS domains have been shown to function as sensors for ionic strength, whereby they control the transport activity through an electronic switching mechanism. ABC transporters are a large family of proteins involved in the transport of a wide variety of different compounds, like sugars, ions, peptides, and more complex organic molecules. 
They are a subset of nucleotide hydrolases that contain a signature motif, Q-loop, and H-loop/switch region, in addition to the Walker A motif/P-loop and Walker B motif commonly found in a number of ATP- and GTP-binding and hydrolyz Probab=26.00 E-value=39 Score=13.13 Aligned_cols=14 Identities=21% Similarity=0.577 Sum_probs=10.4 Q ss_pred EEEECCCEEEEEEC Q ss_conf 35403430355512 Q gi|254780700|r 233 PCFNALGHVIGVNA 246 (489) Q Consensus 233 pl~n~~G~viGint 246 (489) |++|.+|+++||-| T Consensus 89 PVVd~~~~lvGiiT 102 (109) T cd04583 89 PVVDEDGKLVGLIT 102 (109) T ss_pred EEECCCCEEEEEEE T ss_conf 89964999999999 No 219 >pfam00789 UBX UBX domain. This domain is present in ubiquitin-regulatory proteins and is a general Cdc48-interacting module. Probab=25.28 E-value=40 Score=13.04 Aligned_cols=29 Identities=28% Similarity=0.440 Sum_probs=19.4 Q ss_pred CCCCEEEEECCCCEEEEECCCCCCCCCCE Q ss_conf 78714379628980674011123344432 Q gi|254780700|r 130 EDGASFSVILSDDTELPAKLVGTDALFDL 158 (489) Q Consensus 130 ~~a~~i~V~~~dg~~~~a~vvg~D~~~Dl 158 (489) .+...|+|+|+||+.+..+--..|+..|+ T Consensus 4 ~~~t~I~iRlpdG~r~~r~F~~~~tl~~v 32 (81) T pfam00789 4 EDVCRLQIRLPDGSRLVRRFNSSDPLQDV 32 (81) T ss_pred CCEEEEEEECCCCCEEEEEECCCCCHHHH T ss_conf 98289999989998899990899839999 No 220 >pfam06838 Alum_res Aluminium resistance protein. This family represents the aluminium resistance protein, which confers resistance to aluminium in bacteria. Probab=24.98 E-value=40 Score=13.01 Aligned_cols=12 Identities=25% Similarity=0.332 Sum_probs=3.9 Q ss_pred CEEEECCCEEEE Q ss_conf 035403430355 Q gi|254780700|r 232 GPCFNALGHVIG 243 (489) Q Consensus 232 Gpl~n~~G~viG 243 (489) |-|.-.-|-++| T Consensus 227 GgiaptGGYIaG 238 (405) T pfam06838 227 GGIAKTGGYIAG 238 (405) T ss_pred CCCCCCCCEEEC T ss_conf 773675777853 No 221 >cd04642 CBS_pair_29 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. 
CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener Probab=24.96 E-value=40 Score=13.00 Aligned_cols=18 Identities=33% Similarity=0.466 Sum_probs=9.6 Q ss_pred CCCCEEEECCCEEEEEEC Q ss_conf 477035403430355512 Q gi|254780700|r 229 NSGGPCFNALGHVIGVNA 246 (489) Q Consensus 229 nSGGpl~n~~G~viGint 246 (489) =||=|++|.+|+++|+-| T Consensus 24 i~~lPVvd~~g~lvGiis 41 (126) T cd04642 24 ISGLPVVDEKGKLIGNIS 41 (126) T ss_pred CCEEEEEECCCEEEEEEE T ss_conf 878999928990999999 No 222 >pfam08669 GCV_T_C Glycine cleavage T-protein C-terminal barrel domain. This is a family of glycine cleavage T-proteins, part of the glycine cleavage multienzyme complex (GCV) found in bacteria and the mitochondria of eukaryotes. GCV catalyses the catabolism of glycine in eukaryotes. The T-protein is an aminomethyl transferase. 
Probab=24.92 E-value=40 Score=13.00 Aligned_cols=31 Identities=19% Similarity=0.297 Sum_probs=18.6 Q ss_pred CCCEEEECCCEEEEEEC-CCCCCCC-CCCCCCC Q ss_conf 77035403430355512-3445532-2222232 Q gi|254780700|r 230 SGGPCFNALGHVIGVNA-MIVTSGQ-FHMGVGL 260 (489) Q Consensus 230 SGGpl~n~~G~viGint-~i~~~~g-~~~Gigf 260 (489) .|-|++..+|+.||.-| ..+++.- .+.|++| T Consensus 35 ~g~~i~~~~g~~vG~vTS~~~s~~~~~~iala~ 67 (95) T pfam08669 35 EGEPVLAADGEVVGEVTSGTYSPTLGKNIALAY 67 (95) T ss_pred CCCEEECCCCEEEEEEEEEEECCCCCCEEEEEE T ss_conf 999888189979979866669875797389999 No 223 >cd04582 CBS_pair_ABC_OpuCA_assoc This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains in association with the ABC transporter OpuCA. OpuCA is the ATP binding component of a bacterial solute transporter that serves a protective role to cells growing in a hyperosmolar environment but the function of the CBS domains in OpuCA remains unknown. In the related ABC transporter, OpuA, the tandem CBS domains have been shown to function as sensors for ionic strength, whereby they control the transport activity through an electronic switching mechanism. ABC transporters are a large family of proteins involved in the transport of a wide variety of different compounds, like sugars, ions, peptides, and more complex organic molecules. 
They are a subset of nucleotide hydrolases that contain a signature motif, Q-loop, and H-loop/switch region, in addition to the Walker A motif/P-loop and Walker B motif commonly found in a number of ATP- and GTP-binding and hydrolyzi Probab=24.35 E-value=41 Score=12.93 Aligned_cols=16 Identities=25% Similarity=0.445 Sum_probs=12.7 Q ss_pred CCEEEECCCEEEEEEC Q ss_conf 7035403430355512 Q gi|254780700|r 231 GGPCFNALGHVIGVNA 246 (489) Q Consensus 231 GGpl~n~~G~viGint 246 (489) .=|++|.+|+++||-| T Consensus 84 ~lPVVD~~grlvGivT 99 (106) T cd04582 84 WLPCVDEDGRYVGEVT 99 (106) T ss_pred EEEEECCCCEEEEEEE T ss_conf 6258989990999998 No 224 >cd04635 CBS_pair_22 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. 
Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener Probab=24.17 E-value=42 Score=12.91 Aligned_cols=17 Identities=29% Similarity=0.604 Sum_probs=9.2 Q ss_pred CCCEEEECCCEEEEEEC Q ss_conf 77035403430355512 Q gi|254780700|r 230 SGGPCFNALGHVIGVNA 246 (489) Q Consensus 230 SGGpl~n~~G~viGint 246 (489) ||=|++|.+|+++||-| T Consensus 25 ~~lPVVd~~g~lvGiit 41 (122) T cd04635 25 TGLPVVQKAGELIGIIT 41 (122) T ss_pred CEEEEEECCCCEEEEEE T ss_conf 48999918982999999 No 225 >cd04596 CBS_pair_DRTGG_assoc This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains associated with a DRTGG domain upstream. The function of the DRTGG domain, named after its conserved residues, is unknown. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. 
Probab=23.37 E-value=43 Score=12.81 Aligned_cols=19 Identities=26% Similarity=0.478 Sum_probs=12.7 Q ss_pred CCCCCEEEECCCEEEEEEC Q ss_conf 3477035403430355512 Q gi|254780700|r 228 GNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 228 GnSGGpl~n~~G~viGint 246 (489) |-|+-|++|.+|+++||-| T Consensus 24 ~~~~~PVvd~~~~lvGivt 42 (108) T cd04596 24 GHSRFPVVDEKNKVVGIVT 42 (108) T ss_pred CCCEEEEECCCCEEEEEEE T ss_conf 9988999968990999999 No 226 >KOG0340 consensus Probab=23.28 E-value=24 Score=14.48 Aligned_cols=187 Identities=12% Similarity=0.104 Sum_probs=90.3 Q ss_pred CCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEECCCCCCCCCCCCCCCCCCC--CCCCCCCCCEEEEEEEEEC Q ss_conf 12334443289996067667655655673111241467523665531111258744311--2233443420233233201 Q gi|254780700|r 150 VGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIGNPFRLRGTVSAGIVSALD--RDIPDRPGTFTQIDAPINQ 227 (489) Q Consensus 150 vg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG~P~g~~~tvt~GiiSa~~--R~~~~~~~~~iqtDa~Inp 227 (489) +..||..-+|++=-.. .+ -.++.-|.-.|.|.|.++..++-.|=-+-+. ..+.+.+.. .-.+| T Consensus 69 LsedP~giFalvlTPT-rE---------LA~QiaEQF~alGk~l~lK~~vivGG~d~i~qa~~L~~rPHv-----VvatP 133 (442) T KOG0340 69 LSEDPYGIFALVLTPT-RE---------LALQIAEQFIALGKLLNLKVSVIVGGTDMIMQAAILSDRPHV-----VVATP 133 (442) T ss_pred HCCCCCCCEEEEECCH-HH---------HHHHHHHHHHHHCCCCCCEEEEEECCHHHHHHHHHCCCCCCE-----EECCC T ss_conf 1338876069995452-88---------888888999984564563279997568876454442669875-----75176 Q ss_pred CCCCCEEEECC-CEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCC--------CCCCCCCCCCCCCCCCCCCHHHHHH Q ss_conf 34770354034-30355512344553222222321123321100100--------0023333334332000342166764 Q gi|254780700|r 228 GNSGGPCFNAL-GHVIGVNAMIVTSGQFHMGVGLIIPLSIIKKAIPS--------LISKGRVDHGWFGIMTQNLTQELAI 298 (489) Q Consensus 228 GnSGGpl~n~~-G~viGint~i~~~~g~~~GigfaIP~~~~~~i~~~--------l~~~g~v~rg~lGv~~~~v~~~la~ 298 (489) |--- +++-.+ |....++.. +-| .-.+.+-++++. 
+.+--...|--|=++ -.+|..+.+ T Consensus 134 GRla-d~l~sn~~~~~~~~~r----------lkf-lVlDEADrvL~~~f~d~L~~i~e~lP~~RQtLlfS-ATitd~i~q 200 (442) T KOG0340 134 GRLA-DHLSSNLGVCSWIFQR----------LKF-LVLDEADRVLAGCFPDILEGIEECLPKPRQTLLFS-ATITDTIKQ 200 (442) T ss_pred CCCC-CCCCCCCCCCHHHHHC----------EEE-EEECCHHHHHCCCHHHHHHHHHCCCCCCCCEEEEE-EEHHHHHHH T ss_conf 3335-4112687655255530----------046-77413026541560567766650488764337998-663657998 Q ss_pred HCCCCCCCC-EEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEECCCCCE Q ss_conf 417644441-132011111211346711678887524314787431012220356675201012047816 Q gi|254780700|r 299 PLGLRGTKG-SLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISLCKEGSK 367 (489) Q Consensus 299 ~lgl~~~~G-vlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v~R~g~~ 367 (489) .++.+...+ ++.-++.+++|-..+ |..+++.++++++.+....-|...- .. +...+.+.+.|.-.. T Consensus 201 l~~~~i~k~~a~~~e~~~~vstvet-L~q~yI~~~~~vkdaYLv~~Lr~~~-~~-~~~simIFvnttr~c 267 (442) T KOG0340 201 LFGCPITKSIAFELEVIDGVSTVET-LYQGYILVSIDVKDAYLVHLLRDFE-NK-ENGSIMIFVNTTREC 267 (442) T ss_pred HHCCCCCCCCCEEEECCCCCCCHHH-HHHHEEECCHHHHHHHHHHHHHHHH-HC-CCCEEEEEEEHHHHH T ss_conf 6368745550268852699872545-5202220544456788998775222-13-576089996046899 No 227 >PRK13861 type IV secretion system protein VirB9; Provisional Probab=23.20 E-value=44 Score=12.79 Aligned_cols=12 Identities=25% Similarity=0.567 Sum_probs=8.2 Q ss_pred CCHHHHHHHHHH Q ss_conf 930278999999 Q gi|254780700|r 1 MFKRQILSVKSI 12 (489) Q Consensus 1 m~~r~~~~~~~~ 12 (489) |||+.++.++++ T Consensus 1 mmk~l~~~~~~~ 12 (293) T PRK13861 1 MIKKLFLTLACL 12 (293) T ss_pred CCHHHHHHHHHH T ss_conf 908999999999 No 228 >cd02558 PSRA_1 PSRA_1: Pseudouridine synthase, a subgroup of the RluA family. This group is comprised of bacterial proteins assigned to the RluA family of pseudouridine synthases. Pseudouridine synthases catalyze the isomerization of specific uridines in an RNA molecule to pseudouridines (5-ribosyluracil, psi). No cofactors are required. 
The RluA family is comprised of proteins related to Escherichia coli RluA. Probab=23.20 E-value=44 Score=12.79 Aligned_cols=28 Identities=18% Similarity=0.208 Sum_probs=23.9 Q ss_pred EEEECCCCEEEECHHCCCCCCEEEEECC Q ss_conf 7897599629851010478714379628 Q gi|254780700|r 113 GFFITDDGYILTSNHIVEDGASFSVILS 140 (489) Q Consensus 113 G~ii~~~G~ilTn~hvv~~a~~i~V~~~ 140 (489) |-+|+.||..++..+.+...+.|++... T Consensus 1 ~~~v~~~G~~v~~~~~l~~Gd~v~~~~~ 28 (246) T cd02558 1 GLVVDADGEPLDPDSPYRPGTFVWYYRE 28 (246) T ss_pred CCEECCCCEECCCCCEECCCCEEEEEEC T ss_conf 9488809979799987279999999805 No 229 >TIGR02545 ATP_syn_fliI flagellar protein export ATPase FliI; InterPro: IPR013379 FliI proteins are involved in bacterial flagellum systems, acting to drive protein export for flagellar biosynthesis. The most closely related proteins are the YscN proteins of bacterial type III secretion systems.; GO: 0016887 ATPase activity, 0001539 ciliary or flagellar motility, 0009296 flagellum biogenesis, 0009288 flagellin-based flagellum. Probab=23.18 E-value=44 Score=12.79 Aligned_cols=228 Identities=19% Similarity=0.320 Sum_probs=102.4 Q ss_pred EEEECCCCEEEECHH----CCCCCCEEEEECCCC---EEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCE Q ss_conf 789759962985101----047871437962898---0674011123344432899960676676556556731112414 Q gi|254780700|r 113 GFFITDDGYILTSNH----IVEDGASFSVILSDD---TELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEA 185 (489) Q Consensus 113 G~ii~~~G~ilTn~h----vv~~a~~i~V~~~dg---~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~ 185 (489) |-|.+-.|..++..= ++.-.+...|.-.++ +.+.|+|||..... ....++++..-+++|+. 
T Consensus 5 G~V~~v~G~~~e~~Gp~~~~~~~G~~c~i~~~~~g~~~~~~aEVvGF~~~~------------~~lmP~~~~~Gi~~G~~ 72 (439) T TIGR02545 5 GRVTKVAGLTVEVAGPKAAVARLGDLCEIEPQEGGEEKHVLAEVVGFEGDR------------VILMPYEPLEGIRPGDR 72 (439) T ss_pred EEEEEEECCEEEEECCCCCEEECCCEEEEECCCCCCCCEEEEEEEEECCCE------------EEEEECCCCCCCCCCCE T ss_conf 899998843889854421302136278996588852101568999873883------------69864465556245765 Q ss_pred EE---------------------------EECCCCC----CCCCCCCCCCCCCCCCCCCC-----CCCEEEEEE-EEECC Q ss_conf 67---------------------------5236655----31111258744311223344-----342023323-32013 Q gi|254780700|r 186 VF---------------------------TIGNPFR----LRGTVSAGIVSALDRDIPDR-----PGTFTQIDA-PINQG 228 (489) Q Consensus 186 v~---------------------------aiG~P~g----~~~tvt~GiiSa~~R~~~~~-----~~~~iqtDa-~InpG 228 (489) |+ +.|.|.. .........-..+.|.-++- -.+-++|=. +||- T Consensus 73 V~~~~~~ad~~~~~~~~~~G~~LLGRv~D~lG~PlDg~G~~~~~~~~~~~~~l~~~~pnpl~R~rI~~~l~tGVRaId~- 151 (439) T TIGR02545 73 VFLLGDIADAGGRSLSIPVGDELLGRVIDALGRPLDGKGAGGGLIDATVYRPLRREPPNPLDRRRIEEVLDTGVRAIDA- 151 (439) T ss_pred EEECCCCCCCCCCCCEECCCCCCCCCEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCHHHHHHC- T ss_conf 7863431465456642027744403238678786578778887665655556567830746787578611212310111- Q ss_pred CCCCEEEE-CCCEEEEEECCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC-CCCCCC---CCCCCCCCCCHHHHHHHCCCC Q ss_conf 47703540-3430355512344553222222321123321100100002-333333---433200034216676441764 Q gi|254780700|r 229 NSGGPCFN-ALGHVIGVNAMIVTSGQFHMGVGLIIPLSIIKKAIPSLIS-KGRVDH---GWFGIMTQNLTQELAIPLGLR 303 (489) Q Consensus 229 nSGGpl~n-~~G~viGint~i~~~~g~~~GigfaIP~~~~~~i~~~l~~-~g~v~r---g~lGv~~~~v~~~la~~lgl~ 303 (489) |+- -+||=||| |+.|| +| |..+=-++. +-+... 
+.+|=.+.+|.+=+-+.||-+ T Consensus 152 -----lLT~GrGQR~GI----FAGSG----VG--------KSTLLGMiAr~t~ADV~VIALIGERGREV~EFiE~~LG~e 210 (439) T TIGR02545 152 -----LLTIGRGQRLGI----FAGSG----VG--------KSTLLGMIARYTEADVNVIALIGERGREVKEFIEDDLGEE 210 (439) T ss_pred -----CCCCCCCCEEEE----ECCCC----HH--------HHHHHHHHHCCCCCCEEEEEEECCCCCCCHHHHHHHCCCC T ss_conf -----036556410266----33774----44--------7889888750665887899844465664313554303511 Q ss_pred C-CCCEEEECCCCCCCCCCCCCHHHHHHHHHCCCCCCCCCCEEEEECCCCCCCCEEEEE---CC-CCCEEEECCCCCCCC Q ss_conf 4-441132011111211346711678887524314787431012220356675201012---04-781665125565587 Q gi|254780700|r 304 G-TKGSLITAVVKESPADKAGMKVGDVICMLDGRIIKSHQDFVWQIASRSPKEQVKISL---CK-EGSKHSVAVVLGSSP 378 (489) Q Consensus 304 ~-~~GvlV~~V~~~sPA~~AGLk~GDvI~~ing~~I~~~~~l~~~i~~~~~G~~v~l~v---~R-~g~~~~~~V~l~~~p 378 (489) . .+=|+|..-.+.||..+ ++.-=.=++| .|+ ...-|..|-|-+ -| --..+++-+.+++.| T Consensus 211 Gl~kSVVVVATSD~spl~R--~~aA~~A~~i--------AEY-----FRDqGk~VLL~~DSlTRFAmAqREigLa~GEPP 275 (439) T TIGR02545 211 GLKKSVVVVATSDESPLMR--IRAAYAATAI--------AEY-----FRDQGKDVLLLMDSLTRFAMAQREIGLAAGEPP 275 (439) T ss_pred CCCCEEEEEECCCCCHHHH--HHHHHHHHHH--------HHH-----HHHCCCCEEEEECHHHHHHHHHHHHHHHCCCCC T ss_conf 0254079982799868999--8888999999--------999-----986498347762117889989889998717876 Q ss_pred CCCCCCCCCCC Q ss_conf 63100001246 Q gi|254780700|r 379 TAKNDMHLEVG 389 (489) Q Consensus 379 ~~~~~~~~~~~ 389 (489) ..+...+.-.. 
T Consensus 276 ~tkGYpPSVF~ 286 (439) T TIGR02545 276 TTKGYPPSVFS 286 (439) T ss_pred CCCCCCCHHHH T ss_conf 66789704899 No 230 >PRK12696 flgH flagellar basal body L-ring protein; Reviewed Probab=22.89 E-value=38 Score=13.15 Aligned_cols=11 Identities=9% Similarity=0.277 Sum_probs=7.0 Q ss_pred CCHHHHHHHHH Q ss_conf 93027899999 Q gi|254780700|r 1 MFKRQILSVKS 11 (489) Q Consensus 1 m~~r~~~~~~~ 11 (489) ||||.++.+++ T Consensus 1 mm~~~l~~~~~ 11 (238) T PRK12696 1 MIRKLLAASCA 11 (238) T ss_pred CHHHHHHHHHH T ss_conf 95899999999 No 231 >cd01725 LSm2 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. LSm2 is one of at least seven subunits that assemble onto U6 snRNA to form a seven-membered ring structure. Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. Probab=22.88 E-value=44 Score=12.75 Aligned_cols=33 Identities=18% Similarity=0.397 Sum_probs=29.1 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEE Q ss_conf 714379628980674011123344432899960 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQ 164 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~ 164 (489) ...+.|.|.|+..+.+++...|+...+.+-.+. T Consensus 11 g~~vtVELKN~~~i~G~L~svD~~mNi~L~nv~ 43 (81) T cd01725 11 GKEVTVELKNDLSIRGTLHSVDQYLNIKLTNIS 43 (81) T ss_pred CCEEEEEECCCCEEEEEEEECCCCCCCEEEEEE T ss_conf 987999976994999999643722181887679 No 232 >TIGR03219 salicylate_mono salicylate 1-monooxygenase. Members of this protein family are salicylate 1-monooxygenase, also called salicylate hydroxylase. 
This enzyme converts salicylate to catechol, which is a common intermediate in the degradation of a number of aromatic compounds (phenol, toluene, benzoate, etc.). The gene for this protein may occur in catechol degradation genes, such as those of the meta-cleavage pathway. Probab=22.75 E-value=44 Score=12.74 Aligned_cols=24 Identities=33% Similarity=0.628 Sum_probs=17.5 Q ss_pred CCCCEEEEECCCCEEEEECC-CCCC Q ss_conf 78714379628980674011-1233 Q gi|254780700|r 130 EDGASFSVILSDDTELPAKL-VGTD 153 (489) Q Consensus 130 ~~a~~i~V~~~dg~~~~a~v-vg~D 153 (489) ++.+.++|+|.||+++.|.+ ||.| T Consensus 131 ~~~~~v~v~f~dG~~~~aDlVVGAD 155 (414) T TIGR03219 131 EQAEEVQVLFTDGTEYRCDLLIGAD 155 (414) T ss_pred EECCEEEEEECCCCEEECCEEEECC T ss_conf 9589279998799887226899747 No 233 >COG2985 Predicted permease [General function prediction only] Probab=22.63 E-value=45 Score=12.72 Aligned_cols=17 Identities=35% Similarity=0.600 Sum_probs=8.1 Q ss_pred CCCCCCEEEEECCCCCC Q ss_conf 11124146752366553 Q gi|254780700|r 179 NIRVGEAVFTIGNPFRL 195 (489) Q Consensus 179 ~~~~G~~v~aiG~P~g~ 195 (489) .+++||..--+|.|+.+ T Consensus 249 ~i~~Gd~l~lVG~~~~l 265 (544) T COG2985 249 IIQVGDELHLVGYPDAL 265 (544) T ss_pred CCCCCCEEEECCCHHHH T ss_conf 22337577652781788 No 234 >cd04607 CBS_pair_NTP_transferase_assoc This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domain associated with the NTP (Nucleotidyl transferase) domain downstream. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. 
Probab=22.61 E-value=45 Score=12.72 Aligned_cols=14 Identities=29% Similarity=0.722 Sum_probs=8.4 Q ss_pred EEEECCCEEEEEEC Q ss_conf 35403430355512 Q gi|254780700|r 233 PCFNALGHVIGVNA 246 (489) Q Consensus 233 pl~n~~G~viGint 246 (489) |++|.+|+++||-| T Consensus 93 PVvd~~~~lvGiit 106 (113) T cd04607 93 PILDEEGRVVGLAT 106 (113) T ss_pred EEECCCCEEEEEEE T ss_conf 99978994999999 No 235 >cd04587 CBS_pair_CAP-ED_DUF294_PBI_assoc This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains associated with either the CAP_ED (cAMP receptor protein effector domain) family of transcription factors and the DUF294 domain or the PB1 (Phox and Bem1p) domain. Members of CAP_ED, include CAP which binds cAMP, FNR (fumarate and nitrate reductase) which uses an iron-sulfur cluster to sense oxygen, and CooA a heme containing CO sensor. In all cases binding of the effector leads to conformational changes and the ability to activate transcription. DUF294 is a putative nucleotidyltransferase with a conserved DxD motif. The PB1 domain adopts a beta-grasp fold, similar to that found in ubiquitin and Ras-binding domains. A motif, variously termed OPR, PC and AID, represents the most conserved region of the majority of PB1 domains, and is necessary for PB1 domain function. This function is the formation of PB1 domain heterodimers, although not all PB1 domain pai Probab=22.36 E-value=45 Score=12.69 Aligned_cols=15 Identities=27% Similarity=0.499 Sum_probs=11.1 Q ss_pred CEEEECCCEEEEEEC Q ss_conf 035403430355512 Q gi|254780700|r 232 GPCFNALGHVIGVNA 246 (489) Q Consensus 232 Gpl~n~~G~viGint 246 (489) =|++|.+|+++||-| T Consensus 92 lpVvd~~~~~vGiit 106 (113) T cd04587 92 LPVVDKSGQVVGLLD 106 (113) T ss_pred EEEEECCCEEEEEEE T ss_conf 999926998999998 No 236 >TIGR01687 moaD_arch MoaD family protein; InterPro: IPR010038 Members of this family appear to be archaeal and bacterial (proteobacteria and Thermus) versions of MoaD, subunit 1 of molybdopterin converting factor.. 
Probab=22.22 E-value=20 Score=15.00 Aligned_cols=22 Identities=23% Similarity=0.327 Sum_probs=10.9 Q ss_pred CEEEEECCE-ECCCHHHHHHHHH Q ss_conf 899988999-9389999999999 Q gi|254780700|r 431 MTIVSVNTH-EVSCIKDVERLIG 452 (489) Q Consensus 431 DiIl~VNg~-~V~s~~dl~~iL~ 452 (489) ++++.+||+ .|..+++|...|+ T Consensus 57 ~v~ilvNGran~~~l~GL~~~Lk 79 (93) T TIGR01687 57 NVIILVNGRANVDWLEGLETELK 79 (93) T ss_pred EEEEEECCCCCCCCCCCCCCCCC T ss_conf 57898516414322036575232 No 237 >KOG1775 consensus Probab=22.20 E-value=46 Score=12.67 Aligned_cols=28 Identities=29% Similarity=0.579 Sum_probs=19.9 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEE Q ss_conf 7143796289806740111233444328 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLA 159 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlA 159 (489) .++|+|.+.+.+++.++++|+|-...+. T Consensus 17 gski~iimksdkE~~GtL~GFDd~VNmv 44 (84) T KOG1775 17 GSKIWIIMKSDKEFVGTLVGFDDFVNMV 44 (84) T ss_pred CCEEEEEECCCCEEEEEEECHHHHHHHH T ss_conf 7548999806814055773468889999 No 238 >cd04632 CBS_pair_19 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. 
Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener Probab=22.04 E-value=46 Score=12.65 Aligned_cols=20 Identities=25% Similarity=0.368 Sum_probs=14.3 Q ss_pred CCCCCEEEECCCEEEEEECC Q ss_conf 34770354034303555123 Q gi|254780700|r 228 GNSGGPCFNALGHVIGVNAM 247 (489) Q Consensus 228 GnSGGpl~n~~G~viGint~ 247 (489) +-|+=|++|.+|+++||-|. T Consensus 23 ~i~~lPVvd~~g~lvGiiT~ 42 (128) T cd04632 23 GISRLPVVDDNGKLTGIVTR 42 (128) T ss_pred CCCEEEEECCCCCEEEEEEH T ss_conf 99779999689978999988 No 239 >PRK08472 fliI flagellum-specific ATP synthase; Validated Probab=21.82 E-value=46 Score=12.62 Aligned_cols=70 Identities=13% Similarity=0.191 Sum_probs=37.4 Q ss_pred EEEECCCCEEEECHHCCCCC-CEEEEE-CCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEEC Q ss_conf 78975996298510104787-143796-2898067401112334443289996067667655655673111241467523 Q gi|254780700|r 113 GFFITDDGYILTSNHIVEDG-ASFSVI-LSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTIG 190 (489) Q Consensus 113 G~ii~~~G~ilTn~hvv~~a-~~i~V~-~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~aiG 190 (489) |-+.+-.|.++....+-... +-.++. ..+++...|+|++.+. 
|-++| ..|+++..++.|+.|...| T Consensus 21 G~V~~V~G~li~v~G~~~~iGe~~~I~~~~~g~~~~geVvg~~~--~~v~l----------~~~~~~~Gi~~G~~V~~tg 88 (435) T PRK08472 21 GSITKISANIIEARGLKPSVGDIVKIVEENDGKECLGMVVVIEK--EQFGI----------SPFSFIEGFKIGDKVFISD 88 (435) T ss_pred CEEEEEECEEEEEEECCCCCCCEEEEEECCCCCEEEEEEEEEEC--CEEEE----------EECCCCCCCCCCCEEEECC T ss_conf 68999957399999458876787999976999677899998859--98999----------9836887899999999899 Q ss_pred CCCC Q ss_conf 6655 Q gi|254780700|r 191 NPFR 194 (489) Q Consensus 191 ~P~g 194 (489) .|+- T Consensus 89 ~~~~ 92 (435) T PRK08472 89 EGLN 92 (435) T ss_pred CCCE T ss_conf 9737 No 240 >cd01723 LSm4 The eukaryotic Sm and Sm-like (LSm) proteins associate with RNA to form the core domain of the ribonucleoprotein particles involved in a variety of RNA processing events including pre-mRNA splicing, telomere replication, and mRNA degradation. Members of this family share a highly conserved Sm fold containing an N-terminal helix followed by a strongly bent five-stranded antiparallel beta-sheet. LSm4 is one of at least seven subunits that assemble onto U6 snRNA to form a seven-membered ring structure. Sm-like proteins exist in archaea as well as prokaryotes that form heptameric and hexameric ring structures similar to those found in eukaryotes. 
Probab=21.67 E-value=47 Score=12.60 Aligned_cols=32 Identities=19% Similarity=0.289 Sum_probs=28.3 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEE Q ss_conf 71437962898067401112334443289996 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKV 163 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki 163 (489) ...+.|.|.+|..|.+++...|....+-+-.+ T Consensus 11 g~~V~VELKng~~~~G~L~~~D~~MN~~L~~v 42 (76) T cd01723 11 NHPMLVELKNGETYNGHLVNCDNWMNIHLREV 42 (76) T ss_pred CCEEEEEECCCCEEEEEEEEEECCCCCEEEEE T ss_conf 98999998899799999999734358199899 No 241 >PRK05713 hypothetical protein; Provisional Probab=21.64 E-value=47 Score=12.60 Aligned_cols=61 Identities=25% Similarity=0.398 Sum_probs=38.7 Q ss_pred CCCEEEECHHCCCCCCEEEEECCC--CEEEEECCCCCCC-CCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEE Q ss_conf 996298510104787143796289--8067401112334-44328999606766765565567311124146752 Q gi|254780700|r 118 DDGYILTSNHIVEDGASFSVILSD--DTELPAKLVGTDA-LFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTI 189 (489) Q Consensus 118 ~~G~ilTn~hvv~~a~~i~V~~~d--g~~~~a~vvg~D~-~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~ai 189 (489) ++||+|+.-..+.. .+.+...+ .+.++|+|...+. ..|+.-|+++.++++ +| +.||.+... T Consensus 65 ~~g~~L~Cq~~~~s--D~~ie~~~~~~~~~~a~v~~i~~lt~dv~~l~l~~~~~~---~f------~aGQY~~l~ 128 (312) T PRK05713 65 EQGWRLACQCRVVG--DLRVEVFDPQRDGLPARVVALDWLGGDVLRLRLEPERPL---RY------RAGQHLVLW 128 (312) T ss_pred HCCEEEEECCEECC--CEEEEECCCCCCCCCEEEEEEECCCCCEEEEEECCCCCC---CC------CCCCCEEEE T ss_conf 48858840589897--659985166546632599998437898799997589978---75------899818998 No 242 >cd04629 CBS_pair_16 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. 
The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener Probab=21.43 E-value=47 Score=12.57 Aligned_cols=19 Identities=32% Similarity=0.625 Sum_probs=13.7 Q ss_pred CCCCCEEEECCCEEEEEEC Q ss_conf 3477035403430355512 Q gi|254780700|r 228 GNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 228 GnSGGpl~n~~G~viGint 246 (489) +-|+-|++|.+|+++||-| T Consensus 23 ~~~~~pVvd~~~~lvGiit 41 (114) T cd04629 23 KISGGPVVDDNGNLVGFLS 41 (114) T ss_pred CCCEEEEECCCCEEEEEEE T ss_conf 9978999948992999996 No 243 >cd04597 CBS_pair_DRTGG_assoc2 This cd contains two tandem repeats of the cystathionine beta-synthase (CBS pair) domains associated with a DRTGG domain upstream. The function of the DRTGG domain, named after its conserved residues, is unknown. CBS is a small domain originally identified in cystathionine beta-synthase and subsequently found in a wide range of different proteins. CBS domains usually come in tandem repeats, which associate to form a so-called Bateman domain or a CBS pair which is reflected in this model. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. 
Probab=21.39 E-value=47 Score=12.56 Aligned_cols=16 Identities=19% Similarity=0.237 Sum_probs=10.8 Q ss_pred CCEEEECCCEEEEEEC Q ss_conf 7035403430355512 Q gi|254780700|r 231 GGPCFNALGHVIGVNA 246 (489) Q Consensus 231 GGpl~n~~G~viGint 246 (489) .=|++|.+|+++||-| T Consensus 91 ~LPVVD~~g~l~GiIT 106 (113) T cd04597 91 TLPVVDDDGTPAGIIT 106 (113) T ss_pred EEEEECCCCEEEEEEE T ss_conf 7869889993999987 No 244 >TIGR00115 tig trigger factor; InterPro: IPR005215 The trigger factor is found in several prokaryotes, and is involved in protein export. Trigger factor is a ribosome-associated molecular chaperone and is the first chaperone to interact with nascent polypeptide. It acts as a chaperone by maintaining the newly synthesised protein in an open conformation. Trigger factor can bind at the same time as the signal recognition particle (SRP), but is excluded by the SRP receptor (FtsY). The central domain of trigger factor has peptidyl-prolyl cis/trans isomerase activity .; GO: 0015031 protein transport. Probab=21.35 E-value=41 Score=12.97 Aligned_cols=59 Identities=27% Similarity=0.526 Sum_probs=27.6 Q ss_pred CCCEEEEEEC--CCCEEEECHHCCCCCCEEEEECCCCEEEEEC-CCCCCCCCCEEEEEEECCCCCCCC Q ss_conf 2340278975--9962985101047871437962898067401-112334443289996067667655 Q gi|254780700|r 108 LMFGSGFFIT--DDGYILTSNHIVEDGASFSVILSDDTELPAK-LVGTDALFDLAVLKVQSDRKFIPV 172 (489) Q Consensus 108 ~~~GsG~ii~--~~G~ilTn~hvv~~a~~i~V~~~dg~~~~a~-vvg~D~~~DlAvlki~~~~~~~~~ 172 (489) ..+|||=+|. ++|.| .|-|+....|.|||+.. |+|+ +-|.+...++-|=+|+. 
..||++ T Consensus 213 L~~G~~~~i~GFe~gl~---Gm~~ge~k~i~~tFP~d--YhaE~LaGk~~~F~i~LK~Ik~-relpel 274 (475) T TIGR00115 213 LTLGSGRFIPGFEDGLV---GMKAGEEKDIEVTFPED--YHAEELAGKPAKFKIKLKEIKK-RELPEL 274 (475) T ss_pred EEEECCCCCCCHHHHHH---HEECCCEEEEECCCCCC--CCHHHHCCCCEEEEEEEHHHHH-CCCCCC T ss_conf 89506771245542113---20047654430268752--6825425973245432001211-157987 No 245 >cd04639 CBS_pair_26 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic gener Probab=20.73 E-value=49 Score=12.47 Aligned_cols=19 Identities=21% Similarity=0.410 Sum_probs=11.8 Q ss_pred CCCCCEEEECCCEEEEEEC Q ss_conf 3477035403430355512 Q gi|254780700|r 228 GNSGGPCFNALGHVIGVNA 246 (489) Q Consensus 228 GnSGGpl~n~~G~viGint 246 (489) +-++=|++|.+|+++||-| T Consensus 23 ~~~~~PVvd~~g~lvGivt 41 (111) T cd04639 23 TQHEFPVVDGDGHLVGLLT 41 (111) T ss_pred CCCEEEEEECCCCEEEEEE T ss_conf 9978999938998899998 No 246 >TIGR00441 gmhA phosphoheptose isomerase; InterPro: IPR004515 Phosphoheptose isomerase is involved in lipopolysaccharide biosynthesis, and more specifically in the synthesis of glyceromannoheptose 7-phosphate. 
It may also have a role in virulence in Haemophilus ducreyi.; GO: 0008968 phosphoheptose isomerase activity, 0009244 lipopolysaccharide core region biosynthetic process, 0005737 cytoplasm. Probab=20.68 E-value=49 Score=12.47 Aligned_cols=44 Identities=25% Similarity=0.305 Sum_probs=17.4 Q ss_pred CCCCEEEEECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCCCC Q ss_conf 9888999889999389999999999886259956999997177643 Q gi|254780700|r 428 QKGMTIVSVNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPDMQ 473 (489) Q Consensus 428 ~~GDiIl~VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~~~ 473 (489) ++|||+..|--. .+-+++.++++++|+.+=++|-|.=+-+|.++ T Consensus 106 ~~GDVL~GiSTS--GNS~NvlkA~~~Ak~~gm~~i~L~G~dGGk~~ 149 (186) T TIGR00441 106 QEGDVLLGISTS--GNSKNVLKAIEAAKDKGMKTIALTGKDGGKLA 149 (186) T ss_pred CCCCEEEEEECC--CCCHHHHHHHHHHHHCCCEEEEEECCCCCCCC T ss_conf 898688874247--67088999999884579669997217863113 No 247 >PRK06936 type III secretion system ATPase; Provisional Probab=20.63 E-value=49 Score=12.46 Aligned_cols=70 Identities=20% Similarity=0.199 Sum_probs=34.9 Q ss_pred EEEEECCCCEEEECHHCCCC-CCEEEEECCCCE-EEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCCCCCEEEEE Q ss_conf 27897599629851010478-714379628980-6740111233444328999606766765565567311124146752 Q gi|254780700|r 112 SGFFITDDGYILTSNHIVED-GASFSVILSDDT-ELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIRVGEAVFTI 189 (489) Q Consensus 112 sG~ii~~~G~ilTn~hvv~~-a~~i~V~~~dg~-~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~~G~~v~ai 189 (489) +|-|..-.|.++.....-.. .+-..|...++. ...|+|+|.+. |-++|. +|++.+.+..|+.|... 
T Consensus 24 ~Grv~~v~G~~iea~~~~~~iG~~c~i~~~~~~~~~~aEVVgf~~--~~~~l~----------p~~~~~Gi~~G~~V~~~ 91 (439) T PRK06936 24 RGRVTQVTGTILKAVVPGVRIGELCYLRNPDNSLSLQAEVIGFAQ--HQALLT----------PLGEMYGISSNTEVSPT 91 (439) T ss_pred EEEEEEEEEEEEEEEECCCCCCCEEEEECCCCCCEEEEEEEEEEC--CEEEEE----------ECCCCCCCCCCCEEEEC T ss_conf 879999996589998479997886899828998348999998838--989999----------67786678999999978 Q ss_pred CCCC Q ss_conf 3665 Q gi|254780700|r 190 GNPF 193 (489) Q Consensus 190 G~P~ 193 (489) |.|+ T Consensus 92 g~~~ 95 (439) T PRK06936 92 GTMH 95 (439) T ss_pred CCCC T ss_conf 9986 No 248 >TIGR02624 rhamnu_1P_ald rhamnulose-1-phosphate aldolase; InterPro: IPR013447 Proteins in this entry match the enzyme RhaD, rhamnulose-1-phosphate aldolase (4.1.2.19 from EC).; GO: 0008994 rhamnulose-1-phosphate aldolase activity. Probab=20.55 E-value=49 Score=12.45 Aligned_cols=48 Identities=15% Similarity=0.186 Sum_probs=30.9 Q ss_pred CCEEEEECCCCEEEEECCCCCCCCCCEEEEEEECCCCCCCCCCCCCCCCC Q ss_conf 71437962898067401112334443289996067667655655673111 Q gi|254780700|r 132 GASFSVILSDDTELPAKLVGTDALFDLAVLKVQSDRKFIPVEFEDANNIR 181 (489) Q Consensus 132 a~~i~V~~~dg~~~~a~vvg~D~~~DlAvlki~~~~~~~~~~lg~s~~~~ 181 (489) |.+..+..-.||-| |=+--+|..+|+||||+.+..--.+-||=+|.-. T Consensus 66 Ag~yFlVTGSGKyF--knv~~~P~~nLg~~rVs~dG~~~~llWGl~dgg~ 113 (273) T TIGR02624 66 AGKYFLVTGSGKYF--KNVEENPAENLGVLRVSEDGESVHLLWGLTDGGL 113 (273) T ss_pred CCCCEEEECCCHHH--HHHHCCCCCCEEEEEECCCCCEEEEEECCCCCCC T ss_conf 68706993565322--2111170146467898158874565301158896 No 249 >PRK09511 nirD nitrite reductase small subunit; Provisional Probab=20.52 E-value=49 Score=12.45 Aligned_cols=25 Identities=32% Similarity=0.608 Sum_probs=17.7 Q ss_pred CCEEEEECC--CCCCCCCCCCCCCCCC Q ss_conf 414675236--6553111125874431 Q gi|254780700|r 183 GEAVFTIGN--PFRLRGTVSAGIVSAL 207 (489) Q Consensus 183 G~~v~aiG~--P~g~~~tvt~GiiSa~ 207 (489) ++.++||.| |++-.+-++.|||+.. 
T Consensus 36 ~~~vyAi~n~dP~~~a~VLsrGivg~~ 62 (108) T PRK09511 36 SDQVFAISNIDPFFEASVLSRGLIAEH 62 (108) T ss_pred CCCEEEEECCCCCCCCCCCCCCCCCCC T ss_conf 996999837698889732126038278 No 250 >cd04620 CBS_pair_7 The CBS domain, named after human CBS, is a small domain originally identified in cystathionine beta-synthase and is subsequently found in a wide range of different proteins. CBS domains usually occur in tandem repeats. They associate to form a so-called Bateman domain or a CBS pair based on crystallographic studies in bacteria. The CBS pair was used as a basis for this cd hierarchy since the human CBS proteins can adopt the typical core structure and form an intramolecular CBS pair. The interface between the two CBS domains forms a cleft that is a potential ligand binding site. The CBS pair coexists with a variety of other functional domains and this has been used to help in its classification here. It has been proposed that the CBS domain may play a regulatory role, although its exact function is unknown. 
Mutations of conserved residues within this domain are associated with a variety of human hereditary diseases, including congenital myotonia, idiopathic genera Probab=20.44 E-value=50 Score=12.44 Aligned_cols=13 Identities=31% Similarity=0.761 Sum_probs=6.4 Q ss_pred EEEECCCEEEEEE Q ss_conf 3540343035551 Q gi|254780700|r 233 PCFNALGHVIGVN 245 (489) Q Consensus 233 pl~n~~G~viGin 245 (489) |++|.+|+++||- T Consensus 95 pVvd~~g~lvGii 107 (115) T cd04620 95 PVLDDQGQLIGLV 107 (115) T ss_pred EEECCCCEEEEEE T ss_conf 9995799799999 No 251 >PRK08594 enoyl-(acyl carrier protein) reductase; Provisional Probab=20.35 E-value=24 Score=14.47 Aligned_cols=22 Identities=27% Similarity=0.309 Sum_probs=13.7 Q ss_pred CHHHHHHHCCCCCCCCEEEECCCCC Q ss_conf 2166764417644441132011111 Q gi|254780700|r 292 LTQELAIPLGLRGTKGSLITAVVKE 316 (489) Q Consensus 292 v~~~la~~lgl~~~~GvlV~~V~~~ 316 (489) ++..+|..++ +.|+.|..|.|+ T Consensus 170 ltr~lA~ela---~~gIRVN~V~PG 191 (256) T PRK08594 170 SVKYLANDLG---KDGIRVNAISAG 191 (256) T ss_pred HHHHHHHHHC---CCCEEEEEEEEC T ss_conf 9999999853---888399998637 No 252 >smart00166 UBX Domain present in ubiquitin-regulatory proteins. Present in FAF1 and Shp1p. 
Probab=20.22 E-value=50 Score=12.41 Aligned_cols=28 Identities=21% Similarity=0.386 Sum_probs=18.9 Q ss_pred CCCEEEEECCCCEEEEECCCCCCCCCCE Q ss_conf 8714379628980674011123344432 Q gi|254780700|r 131 DGASFSVILSDDTELPAKLVGTDALFDL 158 (489) Q Consensus 131 ~a~~i~V~~~dg~~~~a~vvg~D~~~Dl 158 (489) +...|+|+|.||+....+--..|...|+ T Consensus 3 ~~~~iqiRlpdG~~l~~~F~~~dtl~~v 30 (80) T smart00166 3 DQCRLQIRLPDGSRLVRRFPSSDTLRTV 30 (80) T ss_pred CCEEEEEEECCCCEEEEECCCCCHHHHH T ss_conf 7479999919999899983897839999 No 253 >PRK13254 cytochrome c-type biogenesis protein CcmE; Reviewed Probab=20.16 E-value=50 Score=12.40 Aligned_cols=15 Identities=27% Similarity=0.222 Sum_probs=7.6 Q ss_pred CCHHHHHHHHHHHHH Q ss_conf 930278999999999 Q gi|254780700|r 1 MFKRQILSVKSICTV 15 (489) Q Consensus 1 m~~r~~~~~~~~~~~ 15 (489) ||+|+--+++.++++ T Consensus 1 mm~~rkkRl~~v~~~ 15 (149) T PRK13254 1 MMKRKRRRLLIILGA 15 (149) T ss_pred CCCCHHHHHHHHHHH T ss_conf 995112247899999 No 254 >cd02005 TPP_PDC_IPDC Thiamine pyrophosphate (TPP) family, PDC_IPDC subfamily, TPP-binding module; composed of proteins similar to pyruvate decarboxylase (PDC) and indolepyruvate decarboxylase (IPDC). PDC, a key enzyme in alcoholic fermentation, catalyzes the conversion of pyruvate to acetaldehyde and CO2. It is able to utilize other 2-oxo acids as substrates. In plants and various plant-associated bacteria, IPDC plays a role in the indole-3-pyruvic acid (IPA) pathway, a tryptophan-dependent biosynthetic route to indole-3-acetaldehyde (IAA). IPDC catalyzes the decarboxylation of IPA to IAA. Both PDC and IPDC depend on TPP and Mg2+ as cofactors. 
Probab=20.16 E-value=50 Score=12.40 Aligned_cols=36 Identities=11% Similarity=0.124 Sum_probs=26.5 Q ss_pred ECCEECCCHHHHHHHHHHHHHCCCCEEEEEEEECCC Q ss_conf 899993899999999998862599569999971776 Q gi|254780700|r 436 VNTHEVSCIKDVERLIGKAKEKKRDSVLLQIKYDPD 471 (489) Q Consensus 436 VNg~~V~s~~dl~~iL~~~k~~~~~~VLL~V~r~~~ 471 (489) +.+..|++.+||+++++++..+...++|+.|.-+.+ T Consensus 141 ~~g~rV~~~~el~~al~~Al~~~~~P~liev~vdp~ 176 (183) T cd02005 141 GLSFRVKTEGELDEALKDALFNRDKLSLIEVILPKD 176 (183) T ss_pred CEEEEECCHHHHHHHHHHHHHCCCCEEEEEEECCCC T ss_conf 428997899999999999997289829999974877 No 255 >PRK10002 outer membrane protein F; Provisional Probab=20.14 E-value=50 Score=12.40 Aligned_cols=10 Identities=60% Similarity=0.830 Sum_probs=8.1 Q ss_pred CCHHHHHHHH Q ss_conf 9302789999 Q gi|254780700|r 1 MFKRQILSVK 10 (489) Q Consensus 1 m~~r~~~~~~ 10 (489) ||||.+|++. T Consensus 1 mMKK~~LA~a 10 (362) T PRK10002 1 MMKRNILAVI 10 (362) T ss_pred CCHHHHHHHH T ss_conf 9308799999 Done!
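Each hit in the listing above opens with a statistics line of the form "Probab=... E-value=... Score=... Aligned_cols=... Identities=... Similarity=... Sum_probs=...". The following is a minimal sketch of how those fields could be pulled out of the flattened text. It is not part of HHsearch; the names FIELD_RE and parse_hit_stats are hypothetical, and the sketch assumes every hit starts with a "Probab=" field and that values contain no spaces.

```python
import re

# Hypothetical helper, not an HHsearch API: matches key=value pairs such as
# "Probab=24.17" or "Identities=29%" in the per-hit statistics line.
FIELD_RE = re.compile(
    r"(Probab|E-value|Score|Aligned_cols|Identities|Similarity|Sum_probs)=(\S+)"
)


def parse_hit_stats(text):
    """Return one dict of numeric statistics per hit found in `text`.

    Assumption: a new hit begins whenever a "Probab" field is seen.
    Percent signs (as in "Identities=29%") are stripped before conversion.
    """
    hits = []
    current = {}
    for key, raw in FIELD_RE.findall(text):
        if key == "Probab" and current:
            hits.append(current)
            current = {}
        current[key] = float(raw.rstrip("%"))
    if current:
        hits.append(current)
    return hits


# Two statistics lines copied from the listing (hits 224 and 225).
sample = (
    "Probab=24.17 E-value=42 Score=12.91 Aligned_cols=17 Identities=29% "
    "Similarity=0.604 Sum_probs=9.2 "
    "Probab=23.37 E-value=43 Score=12.81 Aligned_cols=19 Identities=26% "
    "Similarity=0.478 Sum_probs=12.7"
)
hits = parse_hit_stats(sample)
for hit in hits:
    print(hit["Probab"], hit["E-value"], hit["Aligned_cols"])
```

All hits in this part of the listing have probabilities below 25 and E-values above 20, so a filter on the parsed "Probab" or "E-value" fields would be a natural next step for separating them from the high-confidence hits at the top of the summary table.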