Query         gi|254781211|ref|YP_003065624.1| hypothetical protein CLIBASIA_05590 [Candidatus Liberibacter asiaticus str. psy62]
Match_columns 234
No_of_seqs    1 out of 3
Neff          1.0
Searched_HMMs 39220
Date          Mon May 30 06:51:33 2011
Command       /home/congqian_1/programs/hhpred/hhsearch -i 254781211.hhm -d /home/congqian_1/database/cdd/Cdd.hhm

 No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
  1 TIGR01887 dipeptidaselike dipe  65.2     2.5 6.3E-05   23.0   0.8   27  125-151   142-170 (492)
  2 COG2081 Predicted flavoprotein  52.3     3.4 8.5E-05   22.2  -0.4   80  102-193    31-112 (408)
  3 TIGR00019 prfA peptide chain r  46.9     7.3 0.00019   20.0   0.6   14  163-176   134-147 (373)
  4 TIGR00706 SppA_dom signal pept  45.6      17 0.00043   17.6   2.3   50  109-159   143-192 (224)
  5 PHA00666 putative protease      42.1      28 0.00073   16.1  11.7  128   61-200    83-226 (233)
  6 KOG4385 consensus               39.0      15 0.00037   18.0   1.2   59  157-216   394-462 (581)
  7 pfam09715 Plasmod_dom_1 Plasmo  38.9      18 0.00045   17.5   1.6   30  143-172    32-70  (70)
  8 TIGR01748 rhaA L-rhamnose isom  37.7      21 0.00053   17.0   1.8   44   43-118   128-174 (415)
  9 TIGR01286 nifK nitrogenase mol  36.9      30 0.00075   16.0   2.5   54  112-169    29-88  (526)
 10 pfam09077 Phage-MuB_C Mu B tra  36.6      19 0.00047   17.3   1.4   25  139-163     3-27  (78)
 11 COG1154 Dxs Deoxyxylulose-5-ph  36.0      32 0.00082   15.8   2.6   46   60-116   142-187 (627)
 12 KOG4439 consensus               34.2      30 0.00078   15.9   2.2  109   20-129    18-126 (901)
 13 PRK08609 hypothetical protein;  32.9     5.9 0.00015   20.5  -1.6   95  139-233   180-278 (570)
 14 pfam09105 SelB-wing_1 Elongati  30.1      30 0.00076   16.0   1.6   18  113-130    27-44  (61)
 15 TIGR00674 dapA dihydrodipicoli  29.2      27  0.0007   16.2   1.3  142   78-224     3-195 (288)
 16 COG5245 DYN1 Dynein, heavy cha  28.6      47  0.0012   14.7   4.8  102   72-184  1644-1754(3164)
 17 KOG0364 consensus               28.6      28 0.00072   16.1   1.3   31  125-156   178-208 (527)
 18 TIGR02187 GlrX_arch Glutaredox  27.4      41   0.001   15.1   1.9   41   47-88    133-177 (237)
 19 COG2344 AT-rich DNA-binding pr  25.2      41   0.001   15.1   1.6   20  113-132    64-83  (211)
 20 KOG0104 consensus               24.3      55  0.0014   14.2   2.1   32   35-66    592-623 (902)
 21 pfam06640 P_C P protein C-term  23.4      58  0.0015   14.1   3.0   54   12-65     39-110 (227)
 22 TIGR00956 3a01205 Pleiotropic   23.3      40   0.001   15.2   1.2   63   88-150   996-1075(1466)
 23 pfam06798 PrkA PrkA serine pro  23.0      59  0.0015   14.1   4.9   96   88-186    35-147 (254)
 24 pfam04624 Dec-1 Dec-1 repeat.   21.8      28 0.00072   16.1   0.2   12  222-233     8-19  (27)
 25 TIGR00380 cobD cobalamin biosy  20.7      66  0.0017   13.8   3.6   88   99-187   100-194 (322)
 26 COG3110 Uncharacterized protei  20.6      61  0.0015   14.0   1.7   19  212-230   198-216 (216)

No 1
>TIGR01887 dipeptidaselike dipeptidase, putative; InterPro: IPR010964 Metalloproteases are the most diverse of the four main types of protease, with more than 50 families identified to date. In these enzymes, a divalent cation, usually zinc, activates the water molecule. The metal ion is held in place by amino acid ligands, usually three in number. The known metal ligands are His, Glu, Asp or Lys, and at least one other residue is required for catalysis, which may play an electrophilic role. Of the known metalloproteases, around half contain an HEXXH motif, which has been shown in crystallographic studies to form part of the metal-binding site. The HEXXH motif is relatively common, but can be more stringently defined for metalloproteases as 'abXHEbbHbc', where 'a' is most often valine or threonine and forms part of the S1' subsite in thermolysin and neprilysin, 'b' is an uncharged residue, and 'c' a hydrophobic residue. Proline is never found in this site, possibly because it would break the helical structure adopted by this motif in metalloproteases. Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine).
Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This entry represents bacterial zinc dipeptidases or probable dipeptidases belonging to MEROPS peptidase family M20 (clan MH), subfamily M20A. Many of the members are incorrectly annotated as 'Xaa-His' and 'carnosinase' due to the early mischaracterisation of the Lactobacillus delbrueckii PepV enzyme. The entry includes unassigned peptidases and non-peptidase homologues. ; GO: 0008270 zinc ion binding, 0016805 dipeptidase activity.
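The stringent zinc-binding pattern 'abXHEbbHbc' described in this entry can be turned into a regular expression for scanning a protein sequence. The sketch below is a minimal illustration, not an InterPro or PROSITE definition. It makes several simplifying assumptions. The residue sets used for 'b' (uncharged) and 'c' (hydrophobic) are assumptions. 'a' is narrowed to valine or threonine, although the text only says it is most often one of them. Proline is excluded at every position, because it is never found in this site. The helper `find_hexxh_sites` is hypothetical.

```python
import re

# Assumed residue classes (illustrative simplifications, not an official definition).
UNCHARGED = "ACFGHILMNQSTVWY"           # 'b': assumed uncharged set; omits D, E, K, R and P
HYDROPHOBIC = "AFILMVWY"                # 'c': assumed hydrophobic set (no P)
ANY_EXCEPT_PRO = "ACDEFGHIKLMNQRSTVWY"  # 'X': any standard residue except proline

# Loose motif: only the HEXXH core. The lookahead reports overlapping hits too.
LOOSE = re.compile(r"(?=(HE..H))")

# Stringent motif 'abXHEbbHbc', with 'a' simplified to V or T.
STRICT = re.compile(
    rf"(?=([VT][{UNCHARGED}][{ANY_EXCEPT_PRO}]HE[{UNCHARGED}]{{2}}H[{UNCHARGED}][{HYDROPHOBIC}]))"
)


def find_hexxh_sites(seq: str, strict: bool = True) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each motif hit in a protein sequence."""
    pattern = STRICT if strict else LOOSE
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequences for illustration only; they are not real proteins.
    print(find_hexxh_sites("GGVVAHELTHAVGG"))                # [(3, 'VVAHELTHAV')]
    print(find_hexxh_sites("GGVVAHEPTHAVGG"))                # [] (proline in a 'b' slot)
    print(find_hexxh_sites("GGVVAHEPTHAVGG", strict=False))  # [(6, 'HEPTH')]
```

The stringent pattern rejects the second toy sequence because of its proline, while `strict=False` still reports the bare HEXXH core. This mirrors the point that HEXXH alone is common and the longer pattern is the more specific test.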
Probab=65.24 E-value=2.5 Score=23.00 Aligned_cols=27 Identities=37% Similarity=0.496 Sum_probs=23.5 Q ss_pred HHHHCCCHHHHHHHHHHHHHH--HCCCHH Q ss_conf 865077302347799999753--038814 Q gi|254781211|r 125 QTKLGSDYETREKDIARYFRK--EKIPDN 151 (234) Q Consensus 125 qtklgsdyetrekdiaryfrk--ekipdn 151 (234) .--+|+|=||--++|.|||++ |..|+- T Consensus 142 R~I~GTDEEsgw~c~~yYf~~~~E~~P~~ 170 (492) T TIGR01887 142 RFIFGTDEESGWKCIDYYFEHLKEEAPDL 170 (492) T ss_pred EEEEECCCCCCCCCHHHHHHHCCCCCCCE T ss_conf 99983465658700787776505888856 No 2 >COG2081 Predicted flavoproteins [General function prediction only] Probab=52.32 E-value=3.4 Score=22.15 Aligned_cols=80 Identities=23% Similarity=0.346 Sum_probs=37.3 Q ss_pred HHHHHHHHHHHHCCCHHHHHHHHHHHHCCCHHHHHHHHHHHHHHHCCCHH--HHHHHHHHHHHHHHHHHHHHHHHHCCCC Q ss_conf 99836887788311369999999865077302347799999753038814--7999998632256777999987511211 Q gi|254781211|r 102 LVDHGRKIGEQFGASLEEERKLLQTKLGSDYETREKDIARYFRKEKIPDN--DVQSLISAWGFEKTFNFFDRYAQQNKES 179 (234) Q Consensus 102 lvdhgrkigeqfgasleeerkllqtklgsdyetrekdiaryfrkekipdn--dvqslisawgfektfnffdryaqqnkes 179 (234) |.||++|+|+ |+|-+--|----|.....+.|.- .+|.| -+.|.++.+.-+....||.++...-+|- T Consensus 31 lid~~~k~Gr----------Kil~sGgGrCN~Tn~~~~~~~ls--~~p~~~~fl~sal~~ft~~d~i~~~e~~Gi~~~e~ 98 (408) T COG2081 31 LIDKGPKLGR----------KILMSGGGRCNFTNSEAPDEFLS--RNPGNGHFLKSALARFTPEDFIDWVEGLGIALKEE 98 (408) T ss_pred EEECCCCCCC----------EEEECCCCCCCCCCCCCHHHHHH--HCCCCCHHHHHHHHHCCHHHHHHHHHHCCCEEEEC T ss_conf 9805864221----------36853788743326505899997--58982067788987279899999998659715774 Q ss_pred CCCCHHHHCCCCHH Q ss_conf 23310221245214 Q gi|254781211|r 180 STGDTFVRSEGSQE 193 (234) Q Consensus 180 stgdtfvrsegsqe 193 (234) +.|--|-.+..++. 
T Consensus 99 ~~Gr~Fp~sdkA~~ 112 (408) T COG2081 99 DLGRMFPDSDKASP 112 (408) T ss_pred CCCEECCCCCCHHH T ss_conf 68525578666689 No 3 >TIGR00019 prfA peptide chain release factor 1; InterPro: IPR004373 This family describes peptide chain release factor 1 (PrfA, RF-1), and excludes the related peptide chain release factor 2 (PrfB, RF-2). RF-1 helps recognise and terminate translation at UAA and UAG stop codons. The mitochondrial release factors are prfA-like, although not included above the trusted cut-off for this model. RF-1 does not have a translational frameshift.; GO: 0016149 translation release factor activity codon specific, 0006415 translational termination, 0005737 cytoplasm. Probab=46.88 E-value=7.3 Score=19.96 Aligned_cols=14 Identities=29% Similarity=0.726 Sum_probs=9.4 Q ss_pred HHHHHHHHHHHHHC Q ss_conf 56777999987511 Q gi|254781211|r 163 EKTFNFFDRYAQQN 176 (234) Q Consensus 163 ektfnffdryaqqn 176 (234) ---|+.+-|||... T Consensus 134 gDLfrMY~rYAE~k 147 (373) T TIGR00019 134 GDLFRMYSRYAESK 147 (373) T ss_pred HHHHHHHHHHHHCC T ss_conf 87999988887437 No 4 >TIGR00706 SppA_dom signal peptide peptidase SppA, 36K type; InterPro: IPR004635 Proteolytic enzymes that exploit serine in their catalytic activity are ubiquitous, being found in viruses, bacteria and eukaryotes . They include a wide range of peptidase activity, including exopeptidase, endopeptidase, oligopeptidase and omega-peptidase activity. Over 20 families (denoted S1 - S66) of serine protease have been identified, these being grouped into clans on the basis of structural similarity and other functional evidence . Structures are known for members of the clans and the structures indicate that some appear to be totally unrelated, suggesting different evolutionary origins for the serine peptidases . Not withstanding their different evolutionary origins, there are similarities in the reaction mechanisms of several peptidases. 
Chymotrypsin, subtilisin and carboxypeptidase C have a catalytic triad of serine, aspartate and histidine in common: serine acts as a nucleophile, aspartate as an electrophile, and histidine as a base. The geometric orientations of the catalytic residues are similar between families, despite different protein folds. The linear arrangements of the catalytic residues commonly reflect clan relationships. For example, the catalytic triad in the chymotrypsin clan (PA) is ordered HDS, but is ordered DHS in the subtilisin clan (SB) and SDH in the carboxypeptidase clan (SC). Peptidases are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry. Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins. Families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate; these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule. This group of serine peptidases belongs to MEROPS peptidase family S49 (protease IV family, clan S-). The predicted active-site serine for members of this family occurs in a transmembrane domain.
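The MEROPS nomenclature summarised in this entry can be decoded mechanically. Under that scheme, the first character of a family or clan code gives the catalytic type, and 'P' marks clans that mix serine, threonine and cysteine families. This is a hypothetical sketch based only on the rules quoted here. The function names are illustrative, and nothing is looked up in the MEROPS database.

```python
# Catalytic-type letters as listed in the description above.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
}

# Serine, threonine and cysteine peptidases use the active-site residue as the
# nucleophile (acyl intermediate); aspartic, glutamic and metallo peptidases
# use an activated water molecule.
RESIDUE_NUCLEOPHILE = {"C", "S", "T"}
WATER_NUCLEOPHILE = {"A", "G", "M"}


def catalytic_type(code: str) -> str:
    """Decode the catalytic type from a family (e.g. 'S49') or clan (e.g. 'PA', 'S-') code."""
    first = code.strip().upper()[:1]
    if first == "P":
        # Clan-only letter: families of more than one of serine, threonine, cysteine.
        return "mixed (serine/threonine/cysteine)"
    if first not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised MEROPS code: {code!r}")
    return CATALYTIC_TYPES[first]


def nucleophile(code: str) -> str:
    """Report which nucleophile the catalytic type implies, per the description above."""
    first = code.strip().upper()[:1]
    if first == "P" or first in RESIDUE_NUCLEOPHILE:
        return "active-site residue (acyl intermediate)"
    if first in WATER_NUCLEOPHILE:
        return "activated water"
    return "unknown"


if __name__ == "__main__":
    # Codes quoted in the entries of this result file.
    for code in ("M20", "MH", "S49", "S-", "PA", "SB"):
        print(code, catalytic_type(code), nucleophile(code), sep=" | ")
```

Two pairs of codes from this file illustrate the decoding. M20 and MH (hit 1) decode to metallo peptidases with a water nucleophile. S49 and S- (hit 4) decode to serine peptidases that form an acyl intermediate.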
This group of sequences represent both long and short forms of the bacterial SppA and homologs found in the archaea and plants. Signal peptides of secretory proteins seem to serve at least two important biological functions. First, they are required for protein targeting to and translocation across membranes, such as the eubacterial plasma membrane and the endoplasmic reticular membrane of eukaryotes. Second, in addition to their role as determinants for protein targeting and translocation, certain signal peptides have a signaling function. During or shortly after pre-protein translocation, the signal peptide is removed by signal peptidases. The integral membrane protein, SppA (protease IV), of Escherichia coli was shown experimentally to degrade signal peptides. The member of this family from Bacillus subtilis has only been shown to be required for efficient processing of pre-proteins under conditions of hyper-secretion . ; GO: 0008233 peptidase activity, 0006508 proteolysis. Probab=45.62 E-value=17 Score=17.61 Aligned_cols=50 Identities=30% Similarity=0.416 Sum_probs=41.0 Q ss_pred HHHHHCCCHHHHHHHHHHHHCCCHHHHHHHHHHHHHHHCCCHHHHHHHHHH Q ss_conf 778831136999999986507730234779999975303881479999986 Q gi|254781211|r 109 IGEQFGASLEEERKLLQTKLGSDYETREKDIARYFRKEKIPDNDVQSLISA 159 (234) Q Consensus 109 igeqfgasleeerkllqtklgsdyetrekdiaryfrkekipdndvqslisa 159 (234) |+--+-.-.+|||..||+-.-..|+.=-+.|+.+ |..|+|-.+|+.+-.- T Consensus 143 ~~~~~R~lt~eE~~~lQ~~v~~~Y~~F~~~V~~~-R~nkl~~~~vK~~AdG 192 (224) T TIGR00706 143 IGSPTRELTPEERKILQSLVNESYEQFVQVVAKG-RNNKLSVEDVKKFADG 192 (224) T ss_pred CCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH-CCCCCCHHHHHHHHCC T ss_conf 8987577629999999998888875789999984-1677897887652068 No 5 >PHA00666 putative protease Probab=42.05 E-value=28 Score=16.12 Aligned_cols=128 Identities=20% Similarity=0.312 Sum_probs=73.7 Q ss_pred CCCCCEECCCHHH--HHHHHHHHHHHHHHCCHHHHHHHHHHH---HHHHH--------HHHHHHHHCCCHHHHHHHHHHH Q ss_conf 
8321200380020--145678989998862815789999999---99983--------6887788311369999999865 Q gi|254781211|r 61 PIEDYTLSCPDYV--SEAEVTAHIEAFKEAGVDARVAQKVVD---KLVDH--------GRKIGEQFGASLEEERKLLQTK 127 (234) Q Consensus 61 piedytlscpdyv--seaevtahieafkeagvdarvaqkvvd---klvdh--------grkigeqfgasleeerkllqtk 127 (234) --|.|.+..|.-+ ...-+.+.-+.++|-|..-.-|||||| +++.- -.++-+++++++.-..+..--+ T Consensus 83 APEkYeF~apEG~elD~e~L~~F~~vA~ELgL~~eQAQkilD~y~k~~~~~q~qQaea~q~~~e~Wa~~~k~D~E~gG~~ 162 (233) T PHA00666 83 APEKYEFQAAEGVELDTGALGAFEPVARELNLTNEQAQKVVDLYTKILPVVQQRQAEAWQKTTEQWAADSKADKEIGGDK 162 (233) T ss_pred CCCCCCCCCCCCCCCCHHHHHHHHHHHHHHCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH T ss_conf 97213456886554699999999999998199999999999999735389999999999999999999976538765688 Q ss_pred HCCCHHHHHHHHHHHHHHHCCCHHHHHHHHHHHHHHH---HHHHHHHHHHHCCCCCCCCHHHHCCCCHHHHHHHHH Q ss_conf 0773023477999997530388147999998632256---777999987511211233102212452144322888 Q gi|254781211|r 128 LGSDYETREKDIARYFRKEKIPDNDVQSLISAWGFEK---TFNFFDRYAQQNKESSTGDTFVRSEGSQEADRDFDK 200 (234) Q Consensus 128 lgsdyetrekdiaryfrkekipdndvqslisawgfek---tfnffdryaqqnkesstgdtfvrsegsqeadrdfdk 200 (234) |......-.+-+.+++-. .+.+|...-|+-. ..++|-|....-.| |.+|.. | .+..|++-. 
T Consensus 163 ~~aN~~~Aq~Ald~Fgtp------eL~~lln~tGLGNHPelVr~f~kiGkamsE----D~~V~g-g-~~~~~~~a~ 226 (233) T PHA00666 163 LQENLSAAQRALDQFGTP------ELKEYLNKTGLGNHPELVKVFYKAGKAMSE----DRLVTG-G-NEPKQSDAR 226 (233) T ss_pred HHHHHHHHHHHHHHHCCH------HHHHHHHHCCCCCCHHHHHHHHHHHHHHCC----CCEECC-C-CCCCCCHHH T ss_conf 876799999999985899------999998612777788999999999987522----664107-8-864301789 No 6 >KOG4385 consensus Probab=38.98 E-value=15 Score=17.99 Aligned_cols=59 Identities=37% Similarity=0.539 Sum_probs=40.5 Q ss_pred HHHHHHHHHHHHHHHHHHHCCC-----CCCCCHHHHCCCCHHH-----HHHHHHHHCCCCCCHHHHCCCH Q ss_conf 9863225677799998751121-----1233102212452144-----3228886278430033312646 Q gi|254781211|r 157 ISAWGFEKTFNFFDRYAQQNKE-----SSTGDTFVRSEGSQEA-----DRDFDKVFNTPDFGSRVLSGDK 216 (234) Q Consensus 157 isawgfektfnffdryaqqnke-----sstgdtfvrsegsqea-----drdfdkvfntpdfgsrvlsgdk 216 (234) |-.| |..||-||.|-|-.-|- -|++..|||-|.-.-+ .|.|.|--.---||+-.|.|+. T Consensus 394 IY~W-FTrtFAYFRRNaATWKnAVRHNLSLHKCF~RVEnvkgavwtvDe~e~~krr~~k~~g~~sl~~n~ 462 (581) T KOG4385 394 IYNW-FTRTFAYFRRNAATWKNAVRHNLSLHKCFVRVENVKGAVWTVDEREFQKRRPQKITGSPSLIGNM 462 (581) T ss_pred HHHH-HHHHHHHHHCCCHHHHHHHHHHHHHHHHHHHHHHHHCCEEEEEHHHHHHHCCCCCCCCHHHHCCC T ss_conf 9999-99999999636056767776556788999999987255146324655520675446862131211 No 7 >pfam09715 Plasmod_dom_1 Plasmodium protein of unknown function (Plasmod_dom_1). These sequences represent an uncharacterized family consisting of a small number of hypothetical proteins of the malaria parasite Plasmodium falciparum (isolate 3D7). 
Probab=38.87 E-value=18 Score=17.46 Aligned_cols=30 Identities=37% Similarity=0.558 Sum_probs=22.3 Q ss_pred HHHHCCCHHHHHHHHHHH---------HHHHHHHHHHHH Q ss_conf 753038814799999863---------225677799998 Q gi|254781211|r 143 FRKEKIPDNDVQSLISAW---------GFEKTFNFFDRY 172 (234) Q Consensus 143 frkekipdndvqslisaw---------gfektfnffdry 172 (234) |-+|-+|+|-+-.+-+|- -+-|++|||.|| T Consensus 32 ~~~E~V~~n~l~l~~aa~p~~aipi~~Yi~krinF~~~y 70 (70) T pfam09715 32 FIKEDVIDNSLSLCTAAIPLTAIPIFSYIAKRINFFTKY 70 (70) T ss_pred CCHHHHHHHHHHHHHHHCCCHHHHHHHHHHHHHHHHHCC T ss_conf 876650453999987414320026699999887777509 No 8 >TIGR01748 rhaA L-rhamnose isomerase; InterPro: IPR009308 This family consists of several bacterial L-rhamnose isomerase proteins (5.3.1.14 from EC). This enzyme interconverts L-rhamnose and L-rhamnulose. In some species, including Escherichia coli, this is the first step in rhamnose catabolism. Sequential steps are catalyzed by rhamnulose kinase (rhaB), then rhamnulose-1-phosphate aldolase (rhaD) to yield glycerone phosphate and (S)-lactaldehyde. ; GO: 0008740 L-rhamnose isomerase activity, 0030145 manganese ion binding, 0019299 rhamnose metabolic process. Probab=37.68 E-value=21 Score=17.02 Aligned_cols=44 Identities=36% Similarity=0.561 Sum_probs=27.9 Q ss_pred CCCCCCCCCHHHCCCCCCCCCCCEECCCHHHHHHHHHHHHHHHHHCCHHHHHHHHHHHHHHHHH---HHHHHHHCCCHH Q ss_conf 8877554430211899998321200380020145678989998862815789999999999836---887788311369 Q gi|254781211|r 43 RNPSSSSSSTEEAGEPKPPIEDYTLSCPDYVSEAEVTAHIEAFKEAGVDARVAQKVVDKLVDHG---RKIGEQFGASLE 118 (234) Q Consensus 43 rnpssssssteeagepkppiedytlscpdyvseaevtahieafkeagvdarvaqkvvdklvdhg---rkigeqfgasle 118 (234) =||+--|... 
--..||||-||- .|-+=-++|| |||+|-||-.|- T Consensus 128 FNPt~FSHp~--------~aDg~TLshpD~------------------------~iR~FWI~HckasRriseYFGkeLG 174 (415) T TIGR01748 128 FNPTLFSHPL--------AADGYTLSHPDD------------------------EIREFWIEHCKASRRISEYFGKELG 174 (415) T ss_pred CCCCCCCCCC--------CCCCCCCCCCCH------------------------HHHHHHHHHHHHHCHHHHHHHHHCC T ss_conf 3666335420--------005773447882------------------------4679999832330102345313106 No 9 >TIGR01286 nifK nitrogenase molybdenum-iron protein beta chain; InterPro: IPR005976 The enzyme responsible for nitrogen fixation, the nitrogenase, shows a high degree of conservation of structure, function, and amino acid sequence across wide phylogenetic ranges. All known Mo-nitrogenases consist of two components, component I (also called dinitrogenase, or Fe-Mo protein), an alpha2beta2 tetramer encoded by the nifD and nifK genes, and component II (dinitrogenase reductase, or Fe protein) a homodimer encoded by the nifH gene , . Two operons, nifDK and nifEN, encode a tetrameric (alpha2beta2 and N2E2) enzymatic complex. Nitrogenase contains two unusual rare metal clusters; one of them is the iron molybdenum cofactor (FeMo-co), which is considered to be the site of dinitrogen reduction and whose biosynthesis requires the products of nifNE and of some other nif genes . It has been proposed that NifNE might serve as a scaffold upon which FeMo-co is built and then inserted into component I .; GO: 0016163 nitrogenase activity, 0009399 nitrogen fixation, 0016612 molybdenum-iron nitrogenase complex. 
Probab=36.88 E-value=30 Score=16.02 Aligned_cols=54 Identities=30% Similarity=0.470 Sum_probs=32.9 Q ss_pred HHCCCHHHHHHHHHHHHCCCHHHHHHHHHHHHHHHCCC---HHHHH---HHHHHHHHHHHHHHH Q ss_conf 83113699999998650773023477999997530388---14799---999863225677799 Q gi|254781211|r 112 QFGASLEEERKLLQTKLGSDYETREKDIARYFRKEKIP---DNDVQ---SLISAWGFEKTFNFF 169 (234) Q Consensus 112 qfgasleeerkllqtklgsdyetrekdiaryfrkekip---dndvq---slisawgfektfnff 169 (234) +|-...-.|.---+..+-.-.|-|||..|| |-+- -.-.| .++.|-|||+|.-|. T Consensus 29 ~FE~~~p~~~v~~v~~wT~~wEYrEkNfaR----EALtvNPAKACQPLGAvLAA~GFE~TmpfV 88 (526) T TIGR01286 29 EFEEKAPKEKVQEVLEWTKTWEYREKNFAR----EALTVNPAKACQPLGAVLAALGFEGTMPFV 88 (526) T ss_pred HHCCCCCHHHHHHHHHHHCCHHHHHHHHHH----HHHCCCCHHCCCHHHHHHHHHCCCCCCCCC T ss_conf 416787345799998741561345454430----221037522034179999971213335752 No 10 >pfam09077 Phage-MuB_C Mu B transposition protein, C terminal. The C terminal domain of the B transposition protein from Bacteriophage Mu comprises four alpha-helices arranged in a loosely packed bundle, where helix alpha1 runs parallel to alpha3, and anti-parallel to helices alpha2 and alpha4. The domain allows for non-specific binding of Mu to double-stranded DNA, allowing for integration into the bacterial genome, and mediates dimerization of the protein. 
Probab=36.60 E-value=19 Score=17.32 Aligned_cols=25 Identities=32% Similarity=0.587 Sum_probs=19.1 Q ss_pred HHHHHHHHCCCHHHHHHHHHHHHHH Q ss_conf 9999753038814799999863225 Q gi|254781211|r 139 IARYFRKEKIPDNDVQSLISAWGFE 163 (234) Q Consensus 139 iaryfrkekipdndvqslisawgfe 163 (234) |++-++-.|--.+||++++.|||.+ T Consensus 3 igk~~~i~k~kk~Di~Aia~AWgv~ 27 (78) T pfam09077 3 IAKRTGIKKAKKADVKAVAQAWGLQ 27 (78) T ss_pred CCHHCCCCCCCHHHHHHHHHHHCCC T ss_conf 0111002589799999999996899 No 11 >COG1154 Dxs Deoxyxylulose-5-phosphate synthase [Coenzyme metabolism / Lipid metabolism] Probab=35.99 E-value=32 Score=15.77 Aligned_cols=46 Identities=26% Similarity=0.304 Sum_probs=21.1 Q ss_pred CCCCCCEECCCHHHHHHHHHHHHHHHHHCCHHHHHHHHHHHHHHHHHHHHHHHHCCC Q ss_conf 983212003800201456789899988628157899999999998368877883113 Q gi|254781211|r 60 PPIEDYTLSCPDYVSEAEVTAHIEAFKEAGVDARVAQKVVDKLVDHGRKIGEQFGAS 116 (234) Q Consensus 60 ppiedytlscpdyvseaevtahieafkeagvdarvaqkvvdklvdhgrkigeqfgas 116 (234) +-|-|-.|||- --.||...+| +. ...+++--|-|.+.-|.+..|+- T Consensus 142 aVIGDGAlt~G---------mA~EALN~ag-~~-~~~~~iVILNDNeMSIs~nvGal 187 (627) T COG1154 142 AVIGDGALTGG---------MAFEALNNAG-AD-LKSNLIVILNDNEMSISPNVGAL 187 (627) T ss_pred EEECCCCCCCH---------HHHHHHHHHH-HC-CCCCEEEEEECCCCCCCCCCCHH T ss_conf 99777633001---------7999985332-30-48998999807986458775479 No 12 >KOG4439 consensus Probab=34.22 E-value=30 Score=15.93 Aligned_cols=109 Identities=21% Similarity=0.060 Sum_probs=49.2 Q ss_pred ECCCCCCCCCCCHHHHHHCCCCCCCCCCCCCCHHHCCCCCCCCCCCEECCCHHHHHHHHHHHHHHHHHCCHHHHHHHHHH Q ss_conf 00000278688756654308678887755443021189999832120038002014567898999886281578999999 Q gi|254781211|r 20 CERSQRSNPPPSKEEAVQSDPQGRNPSSSSSSTEEAGEPKPPIEDYTLSCPDYVSEAEVTAHIEAFKEAGVDARVAQKVV 99 (234) Q Consensus 20 cersqrsnpppskeeavqsdpqgrnpssssssteeagepkppiedytlscpdyvseaevtahieafkeagvdarvaqkvv 99 (234) |...+-+--|+||.++++.--+- --.|+-.|++..-...++|..|+.--|--.+..|-+...-+-.--+--+.+-|-|- T Consensus 18 
~~~d~sssvp~sk~~i~~~~~~~-~~~s~~~s~~~h~ra~s~i~~~~~~~P~~~s~~eS~~~~s~~q~~e~r~~i~q~~~ 96 (901) T KOG4439 18 MIQDQSSSVPKSKQNISRTMLGE-PYDSSECSGENHERARSFILTNKPLRPIEKSDNESAIFRSDSQLEERRKSIKQLVP 96 (901) T ss_pred HHHCCCCCCCCCHHHHHCCCCCC-CCCCHHCCCCCHHHCCCHHHCCCCCCCCHHCCCCCCHHHHHHHHHHHHHHHHHHCC T ss_conf 53023456885200121256788-77611015641432025454168888510002311004578887525564887421 Q ss_pred HHHHHHHHHHHHHHCCCHHHHHHHHHHHHC Q ss_conf 999983688778831136999999986507 Q gi|254781211|r 100 DKLVDHGRKIGEQFGASLEEERKLLQTKLG 129 (234) Q Consensus 100 dklvdhgrkigeqfgasleeerkllqtklg 129 (234) +-+.-|+--++-+-++|-.++-+++.++++ T Consensus 97 ~~~~~~~~~~q~~~~~s~~~~~~~~~~~~~ 126 (901) T KOG4439 97 DILTTEASSRQGWGNASETEELDDLKLHLS 126 (901) T ss_pred CCCHHHHHCCCCCCCCCCHHHHCCHHCCCC T ss_conf 000011101467566763122011011210 No 13 >PRK08609 hypothetical protein; Provisional Probab=32.94 E-value=5.9 Score=20.54 Aligned_cols=95 Identities=17% Similarity=0.217 Sum_probs=47.6 Q ss_pred HHHHHHHHCCCHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCHHHHCCCCHHHHHHHHH-HHCCCCCCHHH--HCCC Q ss_conf 99997530388147999998632256777999987511211233102212452144322888-62784300333--1264 Q gi|254781211|r 139 IARYFRKEKIPDNDVQSLISAWGFEKTFNFFDRYAQQNKESSTGDTFVRSEGSQEADRDFDK-VFNTPDFGSRV--LSGD 215 (234) Q Consensus 139 iaryfrkekipdndvqslisawgfektfnffdryaqqnkesstgdtfvrsegsqeadrdfdk-vfntpdfgsrv--lsgd 215 (234) ||--||..+---.|+.-||++-..+....+|-.+..-..--+.|+|.+.-.-........|- +...-.||... ..|. T Consensus 180 iaGS~RR~ketvGDIDiLv~s~~p~~v~~~l~~~~~v~evl~~G~tK~s~~~~~~~~i~vDlrvv~~~~fg~aLlyFTGS 259 (570) T PRK08609 180 RAGSLRRARETVKDLDFIIATDNPKAVREQLLQFPNIVEVIAAGDTKVSLELAYDYTISVDFRLVEPEAFATTLHHFTGS 259 (570) T ss_pred EECHHHHCCCCCCCEEEEEECCCHHHHHHHHHCCCCHHHHHHCCCCCEEEEEECCCCCEEEEEEECHHHHHHHHHHHHCC T ss_conf 60202314665467348995598089999996486256787268860589971588836899980889989999997666 Q ss_pred HHHHHHHHHHHHHHH-HCC Q ss_conf 678899999986544-205 Q gi|254781211|r 216 KEATKTLRQWAEKQA-TLN 233 (234) Q Consensus 216 keatktlrqwaekqa-tln 233 (234) |+-...||++|-++. 
+|| T Consensus 260 k~hNi~lR~~A~~kG~~Ln 278 (570) T PRK08609 260 KDHNVRMRQLAKARGEKIS 278 (570) T ss_pred HHHHHHHHHHHHHCCCCCC T ss_conf 9999999999997488503 No 14 >pfam09105 SelB-wing_1 Elongation factor SelB, winged helix. Members of this family adopt a winged-helix fold, with an alpha/beta structure consisting of three alpha-helices and a twisted three-stranded antiparallel beta-sheet, with an alpha-beta-alpha-alpha-beta-beta connectivity. They are involved in both DNA and RNA binding. Probab=30.13 E-value=30 Score=15.97 Aligned_cols=18 Identities=50% Similarity=0.543 Sum_probs=13.5 Q ss_pred HCCCHHHHHHHHHHHHCC Q ss_conf 311369999999865077 Q gi|254781211|r 113 FGASLEEERKLLQTKLGS 130 (234) Q Consensus 113 fgasleeerkllqtklgs 130 (234) -..||||.|||||.-... T Consensus 27 aslsleetrkllqsmaaa 44 (61) T pfam09105 27 ASLSLEETRKLLQSMAAA 44 (61) T ss_pred HHCCHHHHHHHHHHHHHC T ss_conf 623489999999999855 No 15 >TIGR00674 dapA dihydrodipicolinate synthase; InterPro: IPR005263 Dihydropicolinate synthase (DHDPS) is the key enzyme in lysine biosynthesis via the diaminopimelate pathway of prokaryotes, some phycomycetes and higher plants. The enzyme catalyses the condensation of L-aspartate-beta-semialdehyde and pyruvate to dihydropicolinic acid via a ping-pong mechanism in which pyruvate binds to the enzyme by forming a Schiff-base with a lysine residue . Three other proteins are structurally related to DHDPS and probably also act via a similar catalytic mechanism. These are Escherichia coli N-acetylneuraminate lyase (4.1.3.3 from EC, IPR005264 from INTERPRO) (gene nanA), which catalyzes the condensation of N-acetyl-D-mannosamine and pyruvate to form N-acetylneuraminate; Sinorhizobium meliloti protein mosA , which is involved in the biosynthesis of the rhizopine 3-O-methyl-scyllo-inosamine; and E. coli hypothetical protein yjhH. The sequences of DHDPS from different sources are well-conserved. 
The structure takes the form of a homotetramer, in which 2 monomers are related by an approximate 2-fold symmetry . Each monomer comprises 2 domains: an 8-fold alpha-/beta-barrel, and a C-terminal alpha-helical domain. The fold resembles that of N-acetylneuraminate lyase. The active site lysine is located in the barrel domain, and has access via 2 channels on the C-terminal side of the barrel. This family represents a subclass of dihydrodipicolinate synthase. ; GO: 0008840 dihydrodipicolinate synthase activity, 0019877 diaminopimelate biosynthetic process. Probab=29.20 E-value=27 Score=16.22 Aligned_cols=142 Identities=30% Similarity=0.434 Sum_probs=86.2 Q ss_pred HHHHHHHHHHCC-HHHHHHHHHHHHHHHHHH-------HHHHHHCCCHHHHHHHHHHHH-------------CCCHHHHH Q ss_conf 789899988628-157899999999998368-------877883113699999998650-------------77302347 Q gi|254781211|r 78 VTAHIEAFKEAG-VDARVAQKVVDKLVDHGR-------KIGEQFGASLEEERKLLQTKL-------------GSDYETRE 136 (234) Q Consensus 78 vtahieafkeag-vdarvaqkvvdklvdhgr-------kigeqfgasleeerkllqtkl-------------gsdyetre 136 (234) .||-|--||+-| ||-....++++.++.||- --||----|.||..+++..-. ||. -|+| T Consensus 3 ~~A~iTPFk~~~~VDf~~Le~li~~~~~~G~da~V~~GTTGEs~TLs~EE~~~~i~~~~~~~~~R~pvIaG~GsN-~T~E 81 (288) T TIGR00674 3 ITALITPFKEDGSVDFAALEKLIDFQIENGTDAIVVVGTTGESATLSHEEHKKVIEFVVDLVKGRVPVIAGTGSN-ATEE 81 (288) T ss_pred CCEEECCCCCCCCCCHHHHHHHHHHHHHHCCCEEEECCCCCCCCCCCHHHHHHHHHHHHHHHCCCEEEEECCCCC-HHHH T ss_conf 840533304988553889999999899707985897135588644688888999999987762877898537732-5899 Q ss_pred H-HHHHHHHHHCCCHHHHHHHH---HHHHHHHHHHHHHHHHHH--------CCCCCCCCHH-----H------H-CCCCH Q ss_conf 7-99999753038814799999---863225677799998751--------1211233102-----2------1-24521 Q gi|254781211|r 137 K-DIARYFRKEKIPDNDVQSLI---SAWGFEKTFNFFDRYAQQ--------NKESSTGDTF-----V------R-SEGSQ 192 (234) Q Consensus 137 k-diaryfrkekipdndvqsli---sawgfektfnffdryaqq--------nkesstgdtf-----v------r-segsq 192 (234) - ..+..++| ++-.-+-..- .-=.-|--+..|++-|+. |--|-||--+ . + --|.. 
T Consensus 82 ai~l~~~a~~--~G~dg~L~vtPyYNKP~q~Gl~~HFkaia~~~~lPiiLYNvPsRTg~~l~peTv~rLA~~~~NI~aiK 159 (288) T TIGR00674 82 AIELTKFAEK--LGVDGFLVVTPYYNKPTQEGLYQHFKAIAEEVDLPIILYNVPSRTGVSLEPETVKRLAEEPNNIVAIK 159 (288) T ss_pred HHHHHHHHHH--CCCCEEECCCCCCCCCCCCHHHHHHHHHHHHCCCCEEEECCCCCCCCCCCHHHHHHHHCCCCCEEEEE T ss_conf 9999999986--89568845887551888213899999999871698898428764101786289999730167706887 Q ss_pred HHHHHHHHHH----CCC--CCCHHHHCCCHHHHHHHHH Q ss_conf 4432288862----784--3003331264678899999 Q gi|254781211|r 193 EADRDFDKVF----NTP--DFGSRVLSGDKEATKTLRQ 224 (234) Q Consensus 193 eadrdfdkvf----ntp--dfgsrvlsgdkeatktlrq 224 (234) ||-.|...+- -+| || +|||||-+-+--+.. T Consensus 160 Ea~g~l~~~~~i~~~~p~~dF--~vlsGDD~l~l~~~~ 195 (288) T TIGR00674 160 EATGNLERISEIKAITPDDDF--VVLSGDDALTLPILA 195 (288) T ss_pred ECCCCHHHHHHHHHHCCCCCE--EEEECCCCHHHHHHH T ss_conf 268888999999986689853--888478611369998 No 16 >COG5245 DYN1 Dynein, heavy chain [Cytoskeleton] Probab=28.64 E-value=47 Score=14.74 Aligned_cols=102 Identities=28% Similarity=0.277 Sum_probs=63.7 Q ss_pred HHHHHHHHHHHHHHHHCCHHHHHHHHHHHHHHHHHHHHHHHH-CCCHHH---HHHHHHHHHCCCHHHHHHHHHHHHHH-- Q ss_conf 201456789899988628157899999999998368877883-113699---99999865077302347799999753-- Q gi|254781211|r 72 YVSEAEVTAHIEAFKEAGVDARVAQKVVDKLVDHGRKIGEQF-GASLEE---ERKLLQTKLGSDYETREKDIARYFRK-- 145 (234) Q Consensus 72 yvseaevtahieafkeagvdarvaqkvvdklvdhgrkigeqf-gaslee---erkllqtklgsdyetrekdiaryfrk-- 145 (234) |-+-+-...-+||.-++ .-+++|.=+.+-|.| -||.+- -+.-..+-|-+.|--.-.+..|..|- T Consensus 1644 ype~~SL~~Iyea~l~~----------s~l~~~ef~~~se~~~~aSv~ly~~~k~~~k~~lq~~y~y~pReLtR~lr~i~ 1713 (3164) T COG5245 1644 YPELASLRNIYEAVLMG----------SYLCFDEFNRLSEETMSASVELYLSSKDKTKFFLQMNYGYKPRELTRSLRAIF 1713 (3164) T ss_pred CCCHHHHHHHHHHHHHH----------HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCHHHHHHHHHHHH T ss_conf 86334499999999998----------88845989887799988889999999886523200000337378889999997 Q ss_pred ---HCCCHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCH Q ss_conf 
---038814799999863225677799998751121123310 Q gi|254781211|r 146 ---EKIPDNDVQSLISAWGFEKTFNFFDRYAQQNKESSTGDT 184 (234) Q Consensus 146 ---ekipdndvqslisawgfektfnffdryaqqnkesstgdt 184 (234) |.-|+..--|||-.|-.|.---|.||..|| ||+|++.+ T Consensus 1714 ~yaeT~~~t~~~slI~~wy~ea~r~~~dRLV~q-kE~st~~q 1754 (3164) T COG5245 1714 GYAETRIDTPDVSLIIDWYCEAIREKIDRLVQQ-KESSTSRQ 1754 (3164) T ss_pred HHHHCCCCCCCHHHHHHHHHHHHHHHHHHHHHH-HHCCHHHH T ss_conf 677527888748999999988999999999888-74041789 No 17 >KOG0364 consensus Probab=28.63 E-value=28 Score=16.15 Aligned_cols=31 Identities=32% Similarity=0.541 Sum_probs=24.4 Q ss_pred HHHHCCCHHHHHHHHHHHHHHHCCCHHHHHHH Q ss_conf 86507730234779999975303881479999 Q gi|254781211|r 125 QTKLGSDYETREKDIARYFRKEKIPDNDVQSL 156 (234) Q Consensus 125 qtklgsdyetrekdiaryfrkekipdndvqsl 156 (234) -.+.|.+ +.||-||-||.+-||||...++.- T Consensus 178 vk~V~~~-~g~e~dik~y~kveKvpgg~l~~s 208 (527) T KOG0364 178 VKTVGVE-NGREIDIKRYAKVEKVPGGLLEDS 208 (527) T ss_pred HHHHHHC-CCCEECHHHHCCCCCCCCCCCCCC T ss_conf 8776404-684420566434565676301344 No 18 >TIGR02187 GlrX_arch Glutaredoxin-like domain protein; InterPro: IPR011903 Glutaredoxins , , , also known as thioltransferases (disulphide reductases, are small proteins of approximately one hundred amino-acid residues which utilise glutathione and NADPH as cofactors. Oxidized glutathione is regenerated by glutathione reductase. Together these components compose the glutathione system . Glutaredoxin functions as an electron carrier in the glutathione-dependent synthesis of deoxyribonucleotides by the enzyme ribonucleotide reductase. Like thioredoxin, which functions in a similar way, glutaredoxin possesses an active centre disulphide bond . It exists in either a reduced or an oxidized form where the two cysteine residues are linked in an intramolecular disulphide bond. Glutaredoxin has been sequenced in a variety of species. 
On the basis of extensive sequence similarity, it has been proposed that Vaccinia virus protein O2L is most probably a glutaredoxin. Finally, it must be noted that Bacteriophage T4 thioredoxin seems also to be evolutionary related. In position 5 of the pattern T4 thioredoxin has Val instead of Pro. This entry of archaeal proteins contains a C-terminal domain with homology to bacterial and eukaryotic glutaredoxins, including a CPYC motif. There is an N-terminal domain which has even more distant homology to glutaredoxins. The name "glutaredoxin" may be inappropriate in the sense of working in tandem with glutathione and glutathione reductase which may not be present in the archaea. The overall domain structure appears to be related to bacterial alkylhydroperoxide reductases, but the homology may be distant enough that the function of this family is wholly different.. Probab=27.36 E-value=41 Score=15.09 Aligned_cols=41 Identities=29% Similarity=0.400 Sum_probs=24.6 Q ss_pred CCCCCHHHCCC-CCCCCCC---CEECCCHHHHHHHHHHHHHHHHHC Q ss_conf 55443021189-9998321---200380020145678989998862 Q gi|254781211|r 47 SSSSSTEEAGE-PKPPIED---YTLSCPDYVSEAEVTAHIEAFKEA 88 (234) Q Consensus 47 sssssteeage-pkppied---ytlscpdyvseaevtahieafkea 88 (234) -|..+.|+--. -.-||.= -|-||| |--.|-++||-=|+--. 
T Consensus 133 L~~~~~e~l~~kl~~~v~I~vfVTPtCP-YCP~AV~mAH~fA~~~~ 177 (237) T TIGR02187 133 LSEETVEELKSKLDEPVRIEVFVTPTCP-YCPRAVLMAHKFALAND 177 (237) T ss_pred CCHHHHHHHHHHCCCCEEEEEEEECCCC-CHHHHHHHHHHHHHHCC T ss_conf 5089999997337983599999856899-72579999999998354 No 19 >COG2344 AT-rich DNA-binding protein [General function prediction only] Probab=25.25 E-value=41 Score=15.10 Aligned_cols=20 Identities=25% Similarity=0.489 Sum_probs=7.1 Q ss_pred HCCCHHHHHHHHHHHHCCCH Q ss_conf 31136999999986507730 Q gi|254781211|r 113 FGASLEEERKLLQTKLGSDY 132 (234) Q Consensus 113 fgasleeerkllqtklgsdy 132 (234) ||-..+.-+..+..-||-|- T Consensus 64 ~GYnV~~L~~ff~~~Lg~~~ 83 (211) T COG2344 64 YGYNVKYLRDFFDDLLGQDK 83 (211) T ss_pred CCCCHHHHHHHHHHHHCCCC T ss_conf 78439999999999838774 No 20 >KOG0104 consensus Probab=24.32 E-value=55 Score=14.25 Aligned_cols=32 Identities=25% Similarity=0.364 Sum_probs=11.5 Q ss_pred HHHCCCCCCCCCCCCCCHHHCCCCCCCCCCCE Q ss_conf 54308678887755443021189999832120 Q gi|254781211|r 35 AVQSDPQGRNPSSSSSSTEEAGEPKPPIEDYT 66 (234) Q Consensus 35 avqsdpqgrnpssssssteeagepkppiedyt 66 (234) |-|-+++-...+.+...|+-.-.|-|-+..|. T Consensus 592 ~s~e~k~e~~t~e~~~~~~~~~~~~p~~~~~~ 623 (902) T KOG0104 592 ASQEDKTEKETSEAQKPTEKKETPAPMVVRLQ 623 (902) T ss_pred CCCCCCCCCCCHHCCCCCHHHCCCCCCEEEEE T ss_conf 21013332000100474111035676226765 No 21 >pfam06640 P_C P protein C-terminus. This family represents the C-terminus of plant P proteins. The maize P gene is a transcriptional regulator of genes encoding enzymes for flavonoid biosynthesis in the pathway leading to the production of a red phlobaphene pigment, and P proteins are homologous to the DNA-binding domain of myb-like transcription factors. All members of this family contain the pfam00249 domain. 
Probab=23.41 E-value=58 Score=14.12 Aligned_cols=54 Identities=33% Similarity=0.411 Sum_probs=25.0 Q ss_pred CCCCCCEEECCCCCCCCCCCHHHH----------HHC--------CCCCCCCCCCCCCHHHCCCCCCCCCCC Q ss_conf 899980230000027868875665----------430--------867888775544302118999983212 Q gi|254781211|r 12 PSTPPVVECERSQRSNPPPSKEEA----------VQS--------DPQGRNPSSSSSSTEEAGEPKPPIEDY 65 (234) Q Consensus 12 pstppvvecersqrsnpppskeea----------vqs--------dpqgrnpssssssteeagepkppiedy 65 (234) |..+|--...|+.+..+++-+.++ .+| ||.-..|.|||.||-.+-+-.+--||- T Consensus 39 pg~spkss~s~sKq~d~d~p~~ea~~~~~~assprhsD~aRSaVVdp~pnQPNsSSGstg~~~~~~~s~EdA 110 (227) T pfam06640 39 PGRSPKSSASRTKQADADQPGGEAAGDAAAASSPRHSDGARSAVVDPGPNQPNSSSGSTGTAEEGPCSSEDA 110 (227) T ss_pred CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCEEECCCCCCCCCCCCCCCCCCCCCCCCCCC T ss_conf 998986533334678889987544577776788665676521242799999887778777778998766668 No 22 >TIGR00956 3a01205 Pleiotropic Drug Resistance (PDR) Family protein; InterPro: IPR005285 ABC transporters belong to the ATP-Binding Cassette (ABC) superfamily, which uses the hydrolysis of ATP to energize diverse biological systems. ABC transporters are minimally constituted of two conserved regions: a highly conserved ATP binding cassette (ABC) and a less conserved transmembrane domain (TMD). These regions can be found on the same protein or on two different ones. Most ABC transporters function as a dimer and therefore are constituted of four domains, two ABC modules and two TMDs. ABC transporters are involved in the export or import of a wide variety of substrates ranging from small ions to macromolecules. The major function of ABC import systems is to provide essential nutrients to bacteria. They are found only in prokaryotes and their four constitutive domains are usually encoded by independent polypeptides (two ABC proteins and two TMD proteins). Prokaryotic importers require additional extracytoplasmic binding proteins (one or more per systems) for function. 
In contrast, export systems are involved in the extrusion of noxious substances, the export of extracellular toxins and the targeting of membrane components. They are found in all living organisms, and in general the TMD is fused to the ABC module in a variety of combinations. Some eukaryotic exporters encode all four domains on the same polypeptide chain. The ABC module (approximately two hundred amino acid residues) is known to bind and hydrolyze ATP, thereby coupling transport to ATP hydrolysis in a large number of biological processes. The cassette is duplicated in several subfamilies. Its primary sequence is highly conserved, displaying a typical phosphate-binding loop (Walker A) and a magnesium-binding site (Walker B). Besides these two regions, three other conserved motifs are present in the ABC cassette: the switch region, which contains a histidine loop postulated to polarize the attacking water molecule for hydrolysis; the conserved signature motif (LSGGQ), specific to ABC transporters; and the Q-motif (between Walker A and the signature), which interacts with the gamma phosphate through a water bond. The Walker A, Walker B, Q-loop and switch region together form the nucleotide-binding site. The 3D structure of a monomeric ABC module adopts a stubby L-shape with two distinct arms. ArmI (mainly beta-strand) contains Walker A and Walker B. The residues important for ATP hydrolysis and/or binding are located in the P-loop. The ATP-binding pocket is located at the extremity of armI. The perpendicular armII contains mostly the alpha-helical subdomain with the signature motif. It appears to be required only for the structural integrity of the ABC module. ArmII is in direct contact with the TMD. The hinge between armI and armII contains both the histidine loop and the Q-loop, which make contact with the gamma phosphate of the ATP molecule. ATP hydrolysis leads to a conformational change that could facilitate ADP release.
In the dimer, the two ABC cassettes contact each other through hydrophobic interactions at the antiparallel beta-sheet of armI, arranged about a two-fold axis. Proteins known to belong to this family are classified into several functional subfamilies depending on the substrate used (for further information see http://www.tcdb.org/tcdb/index.php?tc=3.A.1). This family includes transporters whose physiological function is not yet established. These proteins are thought to confer resistance to cycloheximide, sulphometuron methyl, brefeldin A (BFA), azole antifungal agents and other antifungal agents such as amorolfine and terbinafine. Some of them may serve as efflux pumps for various antibiotics. Probab=23.30 E-value=40 Score=15.16 Aligned_cols=63 Identities=43% Similarity=0.592 Sum_probs=40.1 Q ss_pred CCHHHHHHH---HHHHHHHHHHHHHH---HHHCCCHHHH--HHHHHHHHC-----CCHHHHHHHHHHHHH-HH---CCCH Q ss_conf 281578999---99999998368877---8831136999--999986507-----730234779999975-30---3881 Q gi|254781211|r 88 AGVDARVAQ---KVVDKLVDHGRKIG---EQFGASLEEE--RKLLQTKLG-----SDYETREKDIARYFR-KE---KIPD 150 (234) Q Consensus 88 agvdarvaq---kvvdklvdhgrkig---eqfgasleee--rkllqtklg-----sdyetrekdiaryfr-ke---kipd 150 (234) -|.|++-|- |.+-||+|||+-|= .|=-|-|=|| |-||--|=| .|.-..-+.|-.||. +. |+|+ T Consensus 996 SGLDSQtAWsi~~l~RKLad~GQaILCTIHQPSA~L~~eFDrLLlLqkGG~TvYFGdlG~n~~T~inYFEa~hGA~kCp~ 1075 (1466) T TIGR00956 996 SGLDSQTAWSICKLLRKLADHGQAILCTIHQPSAILFEEFDRLLLLQKGGQTVYFGDLGENSKTLINYFEAKHGAPKCPE 1075 (1466) T ss_pred CCHHHHHHHHHHHHHHHHHHCCCEEEECCCCHHHHHHHHHHHHHHHHCCCEEEEECCCCHHHHHHHHHHHHHCCCCCCCC T ss_conf 70558999999999998875598388604302489999862897754288068727513135899988866537889858 No 23 >pfam06798 PrkA PrkA serine protein kinase C-terminal domain. This is a family of PrkA bacterial and archaeal serine kinases approximately 630 residues long. This family corresponds to the C-terminal domain.
Probab=23.02 E-value=59 Score=14.07 Aligned_cols=96 Identities=22% Similarity=0.344 Sum_probs=42.9 Q ss_pred CCHHHHHHHHHHHHHHH--HHHHHHHHHCCCHHHHHHHHH-HHHCCCHHHHHH--HHHHHHHH--HCCCHHHHHHH-HHH Q ss_conf 28157899999999998--368877883113699999998-650773023477--99999753--03881479999-986 Q gi|254781211|r 88 AGVDARVAQKVVDKLVD--HGRKIGEQFGASLEEERKLLQ-TKLGSDYETREK--DIARYFRK--EKIPDNDVQSL-ISA 159 (234) Q Consensus 88 agvdarvaqkvvdklvd--hgrkigeqfgasleeerkllq-tklgsdyetrek--diaryfrk--ekipdndvqsl-isa 159 (234) .|+..|.+++.+....- ++...-.-+..--+-+.-+.+ .-+. -|++++ +.-.+.|+ ..|-.++||.- +.+ T Consensus 35 ~GiS~Rfi~~~ls~a~~~~~~~~~inp~~vl~~Le~~i~~~~~i~--~e~~~~Y~~~i~~vr~eY~e~i~~Evq~A~~~s 112 (254) T pfam06798 35 DGISPRFIGKALSNALVSDSEERCINPLDVLEELEQGIKDHESIP--EEDRDKYLEFLKVVRKEYNERIKKEVQKAYIES 112 (254) T ss_pred CCCCHHHHHHHHHHHHHCCCCCCCCCHHHHHHHHHHHHHCCCCCC--HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHC T ss_conf 786789999999999843888885789999999999986024378--789999999999999999999999999998644 Q ss_pred HHHHHHHHHHHHHHHH---------CCCCCCCCHHH Q ss_conf 3225677799998751---------12112331022 Q gi|254781211|r 160 WGFEKTFNFFDRYAQQ---------NKESSTGDTFV 186 (234) Q Consensus 160 wgfektfnffdryaqq---------nkesstgdtfv 186 (234) .. +---+.|+||-.. -+...||.-+- T Consensus 113 ~e-e~~q~lf~~Yid~~~a~~~~~k~kDp~TGe~~~ 147 (254) T pfam06798 113 YE-EAAQNLFDNYLDNVEAYINDEKVKDPLTGEELE 147 (254) T ss_pred CH-HHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCCC T ss_conf 39-999999999999999998236215897555216 No 24 >pfam04624 Dec-1 Dec-1 repeat. The defective chorion-1 gene (dec-1) in Drosophila encodes follicle cell proteins necessary for proper eggshell assembly. Multiple products of the dec-1 gene are formed by alternative RNA splicing and proteolytic processing. Cleavage products include S80 (80 kDa) which is incorporated into the eggshell, and further proteolysis of S80 gives S60 (60 kDa). This repeat is usually found in 12 copies in the central region of the protein. Its function is unknown. 
Length polymorphisms of Dec-1 have been observed in wild-type strains and are caused by changes in the copy number of the first five repeats. Probab=21.81 E-value=28 Score=16.14 Aligned_cols=12 Identities=50% Similarity=0.961 Sum_probs=9.1 Q ss_pred HHHHHHHHHHCC Q ss_conf 999986544205 Q gi|254781211|r 222 LRQWAEKQATLN 233 (234) Q Consensus 222 lrqwaekqatln 233 (234) -|||.|.||... T Consensus 8 qRQwsEeQak~q 19 (27) T pfam04624 8 QRQWSEEQAKIQ 19 (27) T ss_pred HHHHHHHHHHHH T ss_conf 988339999999 No 25 >TIGR00380 cobD cobalamin biosynthesis protein CobD; InterPro: IPR004485 Cobalamin (vitamin B12) is a structurally complex cofactor, consisting of a modified tetrapyrrole with a centrally chelated cobalt. Cobalamin is usually found in one of two biologically active forms: methylcobalamin and adocobalamin. Most prokaryotes, as well as animals, have cobalamin-dependent enzymes, whereas plants and fungi do not appear to use it. In bacteria and archaea, these include methionine synthase, ribonucleotide reductase, glutamate and methylmalonyl-CoA mutases, ethanolamine ammonia lyase, and diol dehydratase. In mammals, cobalamin is obtained through the diet, and is required for methionine synthase and methylmalonyl-CoA mutase. There are at least two distinct cobalamin biosynthetic pathways in bacteria: an aerobic pathway that requires oxygen and in which cobalt is inserted late in the pathway, found in Pseudomonas denitrificans and Rhodobacter capsulatus; and an anaerobic pathway in which cobalt insertion is the first committed step towards cobalamin synthesis, found in Salmonella typhimurium, Bacillus megaterium, and Propionibacterium freudenreichii shermanii. Either pathway can be divided into two parts: (1) corrin ring synthesis (differs in aerobic and anaerobic pathways) and (2) adenosylation of the corrin ring, attachment of the aminopropanol arm, and assembly of the nucleotide loop (common to both pathways).
There are about 30 enzymes involved in either pathway, where those involved in the aerobic pathway are prefixed Cob and those of the anaerobic pathway Cbi. Several of these enzymes are pathway-specific: CbiD, CbiG, and CbiK are specific to the anaerobic route of S. typhimurium, whereas CobE, CobF, CobG, CobN, CobS, CobT, and CobW are unique to the aerobic pathway of P. denitrificans. This entry represents the CbiB protein, which is involved in cobalamin biosynthesis and porphyrin biosynthesis. It converts cobyric acid to cobinamide by the addition of aminopropanol on the F carboxylic group. It is part of the cob operon .; GO: 0009236 cobalamin biosynthetic process, 0016021 integral to membrane. Probab=20.68 E-value=66 Score=13.77 Aligned_cols=88 Identities=25% Similarity=0.283 Sum_probs=59.0 Q ss_pred HHHHHHHHHHHHHHHCC-CHHHHHHHHHHHHCCCHHHHH-HHHHHHHHHHCCCHHHHHHHHHHHHHH---HHHHHHH--H Q ss_conf 99999836887788311-369999999865077302347-799999753038814799999863225---6777999--9 Q gi|254781211|r 99 VDKLVDHGRKIGEQFGA-SLEEERKLLQTKLGSDYETRE-KDIARYFRKEKIPDNDVQSLISAWGFE---KTFNFFD--R 171 (234) Q Consensus 99 vdklvdhgrkigeqfga-sleeerkllqtklgsdyetre-kdiaryfrkekipdndvqslisawgfe---ktfnffd--r 171 (234) +.-|+.|.++.+|..-+ .||.-||-+|.-.+-|-+.-. 
..|.+ ---|-+-.|-|.|.+++-=+- --|.|-- - T Consensus 100 ~~sL~~~A~~~~E~~k~GDle~AR~~l~~~VSRdt~~Ls~e~i~s-A~vESlaEN~vDgv~apLFY~llGilfGl~GPWP 178 (322) T TIGR00380 100 VKSLVEAAKKVIESLKEGDLEDARKKLQMIVSRDTEELSEEQILS-AAVESLAENIVDGVTAPLFYALLGILFGLPGPWP 178 (322) T ss_pred HHHHHHHHHHCCCCCCCCCCHHHHHHHHHHHCCCCCCCCHHHHHH-HHHHHHHHCHHHHHHHHHHHHHHHHHCCCCCCHH T ss_conf 999999722026665579826789998764224755345356777-8865533011333378999999998647877147 Q ss_pred HHHHCCCCCCCCHHHH Q ss_conf 8751121123310221 Q gi|254781211|r 172 YAQQNKESSTGDTFVR 187 (234) Q Consensus 172 yaqqnkesstgdtfvr 187 (234) .|---|--+|=|..|- T Consensus 179 lA~~YravnTLDAMvG 194 (322) T TIGR00380 179 LAFVYRAVNTLDAMVG 194 (322) T ss_pred HHHHHHHHHHHHHHHC T ss_conf 9999999977532102 No 26 >COG3110 Uncharacterized protein conserved in bacteria [Function unknown] Probab=20.64 E-value=61 Score=13.98 Aligned_cols=19 Identities=47% Similarity=0.711 Sum_probs=15.9 Q ss_pred HCCCHHHHHHHHHHHHHHH Q ss_conf 1264678899999986544 Q gi|254781211|r 212 LSGDKEATKTLRQWAEKQA 230 (234) Q Consensus 212 lsgdkeatktlrqwaekqa 230 (234) .-.|||.-+...+||++|- T Consensus 198 ~qAd~ETr~rFl~Wa~~Q~ 216 (216) T COG3110 198 QQADKETRKRFLQWAKKQP 216 (216) T ss_pred HHCCHHHHHHHHHHHHCCC T ss_conf 8659999999999864198 Done!