Citrus sinensis ID: 034242

Local Sequence Feature Prediction

Prediction (Method)	Result

Residue Number Marker

Protein Sequence

Secondary Structure (PSIPRED)

Secondary Structure Prediction (SSPRO)

Coil and Loop (DISEMBL)

Flexible Loop (DISEMBL)

Low Complexity Region (SEG)

Disordered Region (IsUnstruct)

Disordered Region (DISOPRED)

Disordered Region (DISEMBL)

Disordered Region (DISPRO)

Transmembrane Helix (TMHMM)

Transmembrane Helix (HMMTOP)

Transmembrane Helix (MEMSAT)

TM Helix, Signal Peptide (MEMSAT_SVM)

TM Helix, Signal Peptide (Phobius)

Signal Peptide (SignalP HMM Mode)

Signal Peptide (SignalP NN Mode)

Coiled Coils (COILS)

Positional Conservation

--------10--------20--------30--------40--------50--------60--------70--------80--------90-------100

MQQLKICATRRRRMAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITNVCELELQGDVGSIQWYEVRLE

cccHHHHHHHHHHHHHcccccEEEEccccccccccHHHHHHHHHHHHHHHHcccccccccHHHHccccHHHHHHHHHHccEEEEEEccccEEEEEEEEEc

ccEEEEEHHHHHHHHHcccccEEEEccccccHcccHHHHHHHHHHHHHHcccccccccccHHHHccccHHHHHHHHHHHccEEEccccccEEEEEEEEcc

MQQLKICATRRRRMAYSRTETYvllepgveekfVTEEELKARLKYWLENWagqvgkgglppdlakFATIDEAVAFLITNVCELelqgdvgsiQWYEVRLE

mqqlkicatrrrrmaysrtetyvllepgveekfvTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITNVCELelqgdvgsiqwYEVRLE

MQQLKICATRRRRMAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITNVCELELQGDVGSIQWYEVRLE

*****ICATRRRRMAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITNVCELELQGDVGSIQWYEV***

********TRRRRMAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITNVCELELQGDVGSIQWYEVRLE

MQQLKICATRRRRMAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITNVCELELQGDVGSIQWYEVRLE

iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii

oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

iiiiiiiiiiiiiiiiiiiihhhhhhhhhhhhhhhhhhhhhooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiihhhhhhhhhhhhhhhhoooooooo

oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

MQQLKICATRRRRMAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITNVCELELQGDVGSIQWYEVRLE

no confident homologs detected
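The per-residue prediction strings above are easier to compare once they are collapsed into runs. The following is a minimal sketch, assuming each string is indexed from residue 1; the helper name `segments` is hypothetical, and the example string is the PSIPRED row copied from above (H = helix, E = strand, c = coil).

```python
from itertools import groupby

def segments(states):
    """Collapse a per-residue state string into (state, start, end) runs.

    Positions are 1-based and inclusive, matching the residue number marker.
    """
    runs = []
    pos = 1
    for state, group in groupby(states):
        length = len(list(group))
        runs.append((state, pos, pos + length - 1))
        pos += length
    return runs

# PSIPRED row copied from the table above.
psipred = ("cccHHHHHHHHHHHHHcccccEEEEccccccccccHHHHHHHHHHHHHHHH"
           "cccccccccHHHHccccHHHHHHHHHHccEEEEEEccccEEEEEEEEEc")

# Print only the structured runs (helix or strand), skipping coil.
for state, start, end in segments(psipred):
    if state != "c":
        print(f"{state} {start}-{end}")
```

The same helper applies unchanged to the transmembrane topology rows, where the states are `i`, `h`, and `o`.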

Close Homologs for Annotation Transfer

Close Homologs in SWISS-PROT Database Detected by BLAST

Original result of BLAST against SWISS-PROT Database

No hits with e-value below 0.001 by BLAST

Close Homologs in the Non-Redundant Database Detected by BLAST

Original result of BLAST against Nonredundant Database

GI	Alignment Graph	Length	Definition	Q cover	H cover	Identity	E-value
Query		100
359488522		151	PREDICTED: uncharacterized protein LOC10	0.92	0.609	0.690	2e-30
297805830		157	hypothetical protein ARALYDRAFT_916402 [	0.94	0.598	0.602	2e-29
270342082		152	hypothetical protein [Phaseolus vulgaris	0.92	0.605	0.666	1e-28
388494090		151	unknown [Lotus japonicus]	0.91	0.602	0.687	5e-28
21553859		156	unknown [Arabidopsis thaliana]	0.93	0.596	0.602	1e-27
18421842		156	chlororespiratory reduction 7 [Arabidops	0.93	0.596	0.602	2e-27
356573538		152	PREDICTED: uncharacterized protein LOC10	0.92	0.605	0.677	2e-26
118489214		162	unknown [Populus trichocarpa x Populus d	0.94	0.580	0.606	5e-26
224080113		99	predicted protein [Populus trichocarpa]	0.94	0.949	0.606	1e-25
326494344		151	predicted protein [Hordeum vulgare subsp	0.89	0.589	0.595	2e-25
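The report does not define how Q cover and H cover are computed. For the first hit, the values are consistent with dividing the number of aligned residue pairs (alignment columns minus gap columns) by the query or hit length. The sketch below checks that reading against the first row only; the formula is an inference from the numbers, not a documented definition, and the function name `cover` is hypothetical.

```python
def cover(aligned_pairs, sequence_length):
    """Fraction of a sequence that takes part in aligned residue pairs."""
    return aligned_pairs / sequence_length

# First hit (GI 359488522): the alignment reports 97 columns, 5 of them gaps.
aligned_pairs = 97 - 5

q_cover = cover(aligned_pairs, 100)   # query length from the table
h_cover = cover(aligned_pairs, 151)   # hit length from the table

print(f"Q cover = {q_cover:.2f}, H cover = {h_cover:.3f}")
```

Other rows may be rounded or truncated differently, so small discrepancies in the last digit are possible.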

>gi|359488522|ref|XP_002280011.2| PREDICTED: uncharacterized protein LOC100249838 [Vitis vinifera] gi|296082204|emb|CBI21209.3| unnamed protein product [Vitis vinifera]

 Score =  136 bits (342), Expect = 2e-30,   Method: Compositional matrix adjust.
 Identities = 67/97 (69%), Positives = 80/97 (82%), Gaps = 5/97 (5%)

Query: 4   LKICATRRRRMAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDL 63
           +K  A RRRR AYS+TETYVL+EPG +E+FV++EELKARLK WLENW G+     LPPDL
Sbjct: 60  VKSYAMRRRR-AYSQTETYVLMEPGKDEEFVSQEELKARLKGWLENWPGK----ALPPDL 114

Query: 64  AKFATIDEAVAFLITNVCELELQGDVGSIQWYEVRLE 100
           AKF TID+AV +L+  VCELE+ GDVGSIQWYE+RLE
Sbjct: 115 AKFQTIDDAVMYLVKAVCELEIDGDVGSIQWYEIRLE 151
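The Identities and Gaps figures in the alignment header can be recomputed directly from the two aligned rows. This is a minimal sketch: the two strings are the Query and Sbjct rows above joined across both blocks, and `alignment_stats` is a hypothetical helper name.

```python
def alignment_stats(query, subject):
    """Return (identities, gap_columns, total_columns) for two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query, subject))
    identities = sum(q == s and q != "-" for q, s in pairs)
    gap_columns = sum(q == "-" or s == "-" for q, s in pairs)
    return identities, gap_columns, len(pairs)

# Query 4-100 and Sbjct 60-151, copied from the alignment above.
query = ("LKICATRRRRMAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDL"
         "AKFATIDEAVAFLITNVCELELQGDVGSIQWYEVRLE")
subject = ("VKSYAMRRRR-AYSQTETYVLMEPGKDEEFVSQEELKARLKGWLENWPGK----ALPPDL"
           "AKFQTIDDAVMYLVKAVCELEIDGDVGSIQWYEIRLE")

identities, gap_columns, columns = alignment_stats(query, subject)
print(f"Identities = {identities}/{columns} ({round(100 * identities / columns)}%)")
print(f"Gaps = {gap_columns}/{columns} ({round(100 * gap_columns / columns)}%)")
```

Counting Positives would additionally require the BLOSUM62 matrix, which is omitted here to keep the sketch self-contained.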

Source: Vitis vinifera

Species: Vitis vinifera

Genus: Vitis

Family: Vitaceae

Order: Vitales

Class:

Phylum: Streptophyta

Superkingdom: Eukaryota

>gi|297805830|ref|XP_002870799.1| hypothetical protein ARALYDRAFT_916402 [Arabidopsis lyrata subsp. lyrata] gi|297316635|gb|EFH47058.1| hypothetical protein ARALYDRAFT_916402 [Arabidopsis lyrata subsp. lyrata]

>gi|270342082|gb|ACZ74666.1| hypothetical protein [Phaseolus vulgaris]

>gi|388494090|gb|AFK35111.1| unknown [Lotus japonicus]

>gi|21553859|gb|AAM62952.1| unknown [Arabidopsis thaliana]

>gi|18421842|ref|NP_568563.1| chlororespiratory reduction 7 [Arabidopsis thaliana] gi|9758849|dbj|BAB09375.1| unnamed protein product [Arabidopsis thaliana] gi|17529164|gb|AAL38808.1| unknown protein [Arabidopsis thaliana] gi|21281014|gb|AAM45076.1| unknown protein [Arabidopsis thaliana] gi|332007025|gb|AED94408.1| chlororespiratory reduction 7 [Arabidopsis thaliana]

>gi|356573538|ref|XP_003554915.1| PREDICTED: uncharacterized protein LOC100801516 [Glycine max]

>gi|118489214|gb|ABK96413.1| unknown [Populus trichocarpa x Populus deltoides]

>gi|224080113|ref|XP_002306021.1| predicted protein [Populus trichocarpa] gi|222848985|gb|EEE86532.1| predicted protein [Populus trichocarpa]

>gi|326494344|dbj|BAJ90441.1| predicted protein [Hordeum vulgare subsp. vulgare] gi|326520968|dbj|BAJ92847.1| predicted protein [Hordeum vulgare subsp. vulgare]

Prediction of Gene Ontology (GO) Terms

Close Homologs with Gene Ontology terms Detected by BLAST

Original result of BLAST against Gene Ontology (AMIGO)

ID	Alignment graph	Length	Definition	Q cover	H cover	Identity	E-value
Query		100
TAIR|locus:2157250		156	CRR7 "AT5G39210" [Arabidopsis	0.9	0.576	0.621	1.3e-27

TAIR|locus:2157250 CRR7 "AT5G39210" [Arabidopsis thaliana (taxid:3702)]

 Score = 309 (113.8 bits), Expect = 1.3e-27, P = 1.3e-27
 Identities = 59/95 (62%), Positives = 76/95 (80%)

Query:     6 ICATRRRRMAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAK 65
             +CATRRRR+ +S ++TYVLLE G +E+FVTE+ELKA+L+ WLENW        LPPDLA+
Sbjct:    67 VCATRRRRV-HSNSDTYVLLEAGQDEQFVTEDELKAKLRGWLENWP----VNSLPPDLAR 121

Query:    66 FATIDEAVAFLITNVCELELQGDVGSIQWYEVRLE 100
             F  +DEAV FL+  VCELE+ G+VGS+QWY+VRLE
Sbjct:   122 FDDLDEAVDFLVKAVCELEIDGEVGSVQWYQVRLE 156
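The header of this hit reports both an Expect value and a P value, and the two are numerically identical. Under the usual Poisson assumption for chance hits, P = 1 - exp(-E), so the two only diverge once E approaches 1. A short sketch of that relation follows; the function name `p_from_e` is hypothetical.

```python
import math

def p_from_e(e_value):
    """Probability of at least one chance hit scoring this well: 1 - exp(-E).

    math.expm1 keeps precision for very small E, where 1 - exp(-E) would
    otherwise round to zero.
    """
    return -math.expm1(-e_value)

print(p_from_e(1.3e-27))  # practically equal to E for the CRR7 hit
print(p_from_e(1.0))      # visibly smaller than E once E is near 1
```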


Parameters:
  V=100
  filter=SEG
  E=0.001

  ctxfactor=1.00

  Query                        -----  As Used  -----    -----  Computed  ----
  Frame  MatID Matrix name     Lambda    K       H      Lambda    K       H
   +0      0   BLOSUM62        0.319   0.136   0.410    same    same    same
               Q=9,R=2         0.244   0.0300  0.180     n/a     n/a     n/a

  Query
  Frame  MatID  Length  Eff.Length     E     S W   T  X   E2     S2
   +0      0      100       100   0.00091  102 3  11 22  0.40    30
                                                     29  0.47    31


Statistics:

  Database:  /share/blast/go-seqdb.fasta
   Title:  go_20130330-seqdb.fasta
   Posted:  5:47:42 AM PDT Apr 1, 2013
   Created:  5:47:42 AM PDT Apr 1, 2013
   Format:  XDF-1
   # of letters in database:  169,044,731
   # of sequences in database:  368,745
   # of database sequences satisfying E:  1
  No. of states in DFA:  588 (63 KB)
  Total size of DFA:  126 KB (2079 KB)
  Time to generate neighborhood:  0.00u 0.00s 0.00t   Elapsed:  00:00:00
  No. of threads or processors used:  24
  Search cpu time:  10.56u 0.10s 10.66t   Elapsed:  00:00:00
  Total cpu time:  10.56u 0.10s 10.66t   Elapsed:  00:00:00
  Start:  Mon May 20 15:46:19 2013   End:  Mon May 20 15:46:19 2013
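The raw score and bit score reported for the CRR7 hit (309 and 113.8 bits) can be related through the Karlin-Altschul parameters listed under Parameters. The check below is a sketch that assumes the gapped values from the `Q=9,R=2` row (Lambda = 0.244, K = 0.0300) are the ones applied to this hit; the function name `bit_score` is hypothetical.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalized bit score: (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)

# Raw score from the hit header; Lambda and K from the Q=9,R=2 row (assumed).
bits = bit_score(309, 0.244, 0.0300)
print(f"{bits:.1f} bits")
```

With these inputs the result agrees with the 113.8 bits printed in the hit header, which supports the assumption about which parameter row was used.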

GO:0003674 "molecular_function" evidence=ND

GO:0005634 "nucleus" evidence=ISM

GO:0009507 "chloroplast" evidence=IDA

GO:0010275 "NAD(P)H dehydrogenase complex assembly" evidence=IGI;IMP

GO:0010598 "NAD(P)H dehydrogenase complex (plastoquinone)" evidence=IGI

GO:0016020 "membrane" evidence=IDA

GO:0009570 "chloroplast stroma" evidence=IDA

GO:0000023 "maltose metabolic process" evidence=RCA

GO:0010207 "photosystem II assembly" evidence=RCA

GO:0016556 "mRNA modification" evidence=RCA

GO:0019252 "starch biosynthetic process" evidence=RCA
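The GO terms above mix experimentally supported annotations with computational ones, so it can help to separate them before transferring anything to the query protein. The following is a minimal sketch: the grouping of evidence codes follows the standard GO categories (IDA, IGI, IMP are experimental; ISM and RCA are computational; ND means no biological data available), and the function names are hypothetical.

```python
import re

EXPERIMENTAL = {"IDA", "IGI", "IMP"}
COMPUTATIONAL = {"ISM", "RCA"}

# Matches lines such as: GO:0009507 "chloroplast" evidence=IDA
GO_LINE = re.compile(r'^(GO:\d{7}) "(.+)" evidence=(\S+)$')

def parse_go_line(line):
    """Split one annotation line into (GO id, term name, set of evidence codes)."""
    match = GO_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a GO annotation line: {line!r}")
    go_id, name, codes = match.groups()
    return go_id, name, set(codes.split(";"))

def evidence_class(codes):
    """Classify a set of evidence codes; any experimental code wins."""
    if codes & EXPERIMENTAL:
        return "experimental"
    if codes & COMPUTATIONAL:
        return "computational"
    if codes == {"ND"}:
        return "no data"
    return "other"

lines = [
    'GO:0009507 "chloroplast" evidence=IDA',
    'GO:0010275 "NAD(P)H dehydrogenase complex assembly" evidence=IGI;IMP',
    'GO:0019252 "starch biosynthetic process" evidence=RCA',
]
for line in lines:
    go_id, name, codes = parse_go_line(line)
    print(go_id, evidence_class(codes), name)
```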

Prediction of Enzyme Commission (EC) Number

EC Number Prediction by Annotation Transfer from SWISS-PROT Entries

Original result of BLAST against SWISS-PROT

No confident hit for EC number transfer detected in SWISS-PROT by BLAST

EC Number Prediction by EzyPred Server

Original result from EzyPred Server

Failed to connect to EzyPred Server

EC Number Prediction by EFICAz Software

No EC number assignment, probably not an enzyme!

Prediction of Functionally Associated Proteins

Functionally Associated Proteins Detected by STRING

Original result from the STRING server

Your Input:
	GSVIVG00038032001	SubName- Full=Chromosome chr14 scaffold_9, whole genome shotgun sequence; (158 aa)
		(Vitis vinifera)
Predicted Functional Partners:
	GSVIVG00023910001	SubName- Full=Chromosome chr6 scaffold_3, whole genome shotgun sequence; (216 aa)	•	•	•	0.702
	ndhM	SubName- Full=Chromosome chr18 scaffold_1, whole genome shotgun sequence;; NDH-1 shuttles elect [...] (208 aa)	•		•	0.616
	PsbP2	SubName- Full=Chromosome chr13 scaffold_17, whole genome shotgun sequence; (238 aa)	•	•	•	0.604
	GSVIVG00028499001	RecName- Full=Photosystem II reaction center psb28 protein; (179 aa)	•	•		0.600
	GSVIVG00028771001	SubName- Full=Chromosome chr13 scaffold_45, whole genome shotgun sequence; (378 aa)	•	•		0.581
	GSVIVG00018549001	SubName- Full=Chromosome chr13 scaffold_17, whole genome shotgun sequence; (235 aa)	•	•	•	0.575
	GSVIVG00005655001	SubName- Full=Putative uncharacterized protein (Chromosome undetermined scaffold_155, whole gen [...] (254 aa)		•	•	0.568
	GSVIVG00023239001	SubName- Full=Chromosome chr8 scaffold_29, whole genome shotgun sequence; (219 aa)	•	•		0.550
	GSVIVG00038203001	SubName- Full=Chromosome chr14 scaffold_9, whole genome shotgun sequence; (175 aa)	•	•		0.537
	GSVIVG00030316001	SubName- Full=Chromosome chr1 scaffold_5, whole genome shotgun sequence; (102 aa)	•	•		0.537
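The last column of the STRING table is the combined association score. A small sketch for binning those scores follows, assuming the cutoffs commonly used with STRING (0.15 low, 0.4 medium, 0.7 high, 0.9 highest); the function name `confidence_band` is hypothetical, and only the first three partners are listed.

```python
def confidence_band(score):
    """Map a STRING combined score (0 to 1) to a confidence band."""
    if score >= 0.9:
        return "highest"
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "medium"
    if score >= 0.15:
        return "low"
    return "below cutoff"

# Scores copied from the first three rows of the table above.
partners = {
    "GSVIVG00023910001": 0.702,
    "ndhM": 0.616,
    "PsbP2": 0.604,
}
for name, score in partners.items():
    print(name, score, confidence_band(score))
```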

Conserved Domains and Related Protein Families

Conserved Domains Detected by RPS-BLAST

Original result of RPS-BLAST against CDD database part I

ID	Alignment Graph	Length	Definition	E-value
Query		100
pfam12095		83	pfam12095, DUF3571, Protein of unknown function (D	2e-22

>gnl|CDD|192936 pfam12095, DUF3571, Protein of unknown function (DUF3571)

 Score = 82.7 bits (205), Expect = 2e-22
 Identities = 36/81 (44%), Positives = 46/81 (56%), Gaps = 7/81 (8%)

Query: 20  ETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITN 79
           + YV+LEPG  E+F+T  EL A LK WL           LP DLAK  ++D     L+  
Sbjct: 10  DHYVVLEPGQPEQFLTPAELLAWLKEWLTR------LDSLPADLAKLPSLDAQAQRLLDT 63

Query: 80  VCELELQGDVGSIQWYEVRLE 100
            CELE+ G   ++QWY VRLE
Sbjct: 64  ACELEI-GPGLTLQWYAVRLE 83

This family of proteins is functionally uncharacterized. This protein is found in bacteria and eukaryotes. Proteins in this family are typically between 85 and 97 amino acids in length. Length = 83

Conserved Domains Detected by HHsearch

Original result of HHsearch against CDD database

ID	Alignment Graph	Length	Definition	Probability
Query		100
PF12095		83	DUF3571: Protein of unknown function (DUF3571); In	100.0
PHA03373		247	tegument protein; Provisional	84.06

>PF12095 DUF3571: Protein of unknown function (DUF3571); InterPro: IPR021954 This family of proteins is functionally uncharacterised

Probab=100.00  E-value=4.9e-45  Score=249.35  Aligned_cols=79  Identities=51%  Similarity=0.993  Sum_probs=58.1

Q ss_pred             hhhccCCcEEEecCCCCccccCHHHHHHHHHHHHHhhcccCCCCCCChhhhccCCHHHHHHHHHHccccccccCCceeEE
Q 034242           14 MAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITNVCELELQGDVGSIQ   93 (100)
Q Consensus        14 ~my~q~D~YVvLEp~~~EqfLT~~Ell~~Lk~~L~~~~~~~~~~~LP~dL~k~~s~~~qaq~Lldt~CELEi~pg~~~lQ   93 (100)
                      -|| ++|||||||||+||||||++||++|||+||++.      ++||+||+||+|+++||+||++|+|||||||| +|||
T Consensus         5 lm~-~~d~yVvLEp~~~Eqflt~~Ell~~Lk~~L~~~------~~LP~dL~~~~s~~~qa~~Lldt~CeLeigpg-~~lQ   76 (83)
T PF12095_consen    5 LMY-QEDHYVVLEPGQPEQFLTPEELLEKLKEWLQNQ------DDLPPDLAKFSSVEEQAQYLLDTACELEIGPG-GYLQ   76 (83)
T ss_dssp             -S------EEEEESSS-SEEE-HHHHHHHHHHHHHHT------TTS-HHHHH---HHHHHHHHHHH---EEEETT-EEEE
T ss_pred             hhh-ccCCEEEecCCCCcccCCHHHHHHHHHHHHHcC------CCCCHHHHhCCCHHHHHHHHHHhceeeecCCC-CEEE
Confidence            488 999999999999999999999999999999994      48999999999999999999999999999999 6999


Q ss_pred             EEEEeeC
Q 034242           94 WYEVRLE  100 (100)
Q Consensus        94 WYaVRLE  100 (100)
                      |||||||
T Consensus        77 WyaVRLE   83 (83)
T PF12095_consen   77 WYAVRLE   83 (83)
T ss_dssp             EEE----
T ss_pred             EEEEecC
Confidence            9999998

This protein is found in bacteria and eukaryotes. Proteins in this family are typically between 85 and 97 amino acids in length; PDB: 2KRX_A.
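The HHsearch hit header packs its statistics into one line of `key=value` tokens. A minimal parsing sketch follows, assuming the seven fields shown in this report are always present; `parse_hh_header` is a hypothetical helper, not part of HHsearch itself.

```python
def parse_hh_header(line):
    """Parse an HHsearch statistics line into a dict of typed values."""
    fields = dict(token.split("=", 1) for token in line.split())
    return {
        "Probab": float(fields["Probab"]),
        "E-value": float(fields["E-value"]),
        "Score": float(fields["Score"]),
        "Aligned_cols": int(fields["Aligned_cols"]),
        "Identities": int(fields["Identities"].rstrip("%")),  # percent
        "Similarity": float(fields["Similarity"]),
        "Sum_probs": float(fields["Sum_probs"]),
    }

# Header of the PF12095 hit, copied from the report.
header = ("Probab=100.00  E-value=4.9e-45  Score=249.35  Aligned_cols=79  "
          "Identities=51%  Similarity=0.993  Sum_probs=58.1")

stats = parse_hh_header(header)
print(stats["Probab"], stats["Aligned_cols"], stats["Identities"])
```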

>PHA03373 tegument protein; Provisional

Homologous Structure Templates

Structure Templates Detected by BLAST

Original result of BLAST against Protein Data Bank

ID	Alignment Graph	Length	Definition	E-value
Query		100
2krx_A		94	Solution Nmr Structure Of Asl3597 From Nostoc Sp. P	5e-05

>pdb|2KRX|A Chain A, Solution Nmr Structure Of Asl3597 From Nostoc Sp. Pcc7120. Northeast Structural Genomics Consortium Target Id Nsr244 Length = 94

Iteration: 1
 Score = 42.7 bits (99), Expect = 5e-05,   Method: Compositional matrix adjust.
 Identities = 31/86 (36%), Positives = 45/86 (52%), Gaps = 11/86 (12%)
Query: 18  RTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLI 77
           + + +V+LE    E+F+T  EL  +LK  LE    ++    LP +L K  ++      LI
Sbjct: 8   QQDNFVVLETNQPEQFLTTIELLEKLKGELE----KISFSDLPLELQKLDSLPAQAQHLI 63
Query: 78  TNVCELELQGDVGS---IQWYEVRLE 100
              CEL    DVG+   +QWY VRLE
Sbjct: 64  DTSCEL----DVGAGKYLQWYAVRLE 85

Structure Templates Detected by RPS-BLAST

Original result of RPS-BLAST against PDB70 database

ID	Alignment Graph	Length	Definition	E-value
Query		100
2krx_A		94	ASL3597 protein; structural genomics, PSI-2, prote	1e-28

>2krx_A ASL3597 protein; structural genomics, PSI-2, protein structure initiative, no structural genomics consortium, NESG, unknown function; NMR {Nostoc SP} Length = 94

 Score = 98.3 bits (245), Expect = 1e-28
 Identities = 29/81 (35%), Positives = 42/81 (51%), Gaps = 5/81 (6%)

Query: 20  ETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITN 79
           + +V+LE    E+F+T  EL  +LK  LE  +       LP +L K  ++      LI  
Sbjct: 10  DNFVVLETNQPEQFLTTIELLEKLKGELEKISFSD----LPLELQKLDSLPAQAQHLIDT 65

Query: 80  VCELELQGDVGSIQWYEVRLE 100
            CEL++ G    +QWY VRLE
Sbjct: 66  SCELDV-GAGKYLQWYAVRLE 85

Structure Templates Detected by HHsearch

Original result of HHsearch against PDB70 database

ID	Alignment Graph	Length	Definition	Probability
Query		100
2krx_A		94	ASL3597 protein; structural genomics, PSI-2, prote	100.0

>2krx_A ASL3597 protein; structural genomics, PSI-2, protein structure initiative, no structural genomics consortium, NESG, unknown function; NMR {Nostoc SP}

Probab=100.00  E-value=1.1e-45  Score=256.48  Aligned_cols=81  Identities=36%  Similarity=0.613  Sum_probs=78.2

Q ss_pred             hhhccCCcEEEecCCCCccccCHHHHHHHHHHHHHhhcccCCCCCCChhhhccCCHHHHHHHHHHccccccccCCceeEE
Q 034242           14 MAYSRTETYVLLEPGVEEKFVTEEELKARLKYWLENWAGQVGKGGLPPDLAKFATIDEAVAFLITNVCELELQGDVGSIQ   93 (100)
Q Consensus        14 ~my~q~D~YVvLEp~~~EqfLT~~Ell~~Lk~~L~~~~~~~~~~~LP~dL~k~~s~~~qaq~Lldt~CELEi~pg~~~lQ   93 (100)
                      -|| |+|||||||||+||||||++||++|||+||+++|+    ++||+||+||+|+++||+||+||+|||||+||+ |||
T Consensus         5 lmy-q~D~yVvLEp~~~EqfLT~~Ell~~Lk~~L~~~~~----~~LP~dL~~~~s~~~qaq~Lldt~CELei~pG~-~lQ   78 (94)
T 2krx_A            5 LMY-QQDNFVVLETNQPEQFLTTIELLEKLKGELEKISF----SDLPLELQKLDSLPAQAQHLIDTSCELDVGAGK-YLQ   78 (94)
T ss_dssp             CSC-CCCCEEEEESSSCSEEECHHHHHHHHHHHHHHSCT----TTSCHHHHHCCCHHHHHHHHHHHCCCEEEETTE-EEE
T ss_pred             hhc-ccCCEEEecCCCCcccCCHHHHHHHHHHHHHhCcc----ccCCHHHHhCCCHHHHHHHHHHheeeeeeCCCC-EEE
Confidence            389 99999999999999999999999999999999995    499999999999999999999999999999995 999


Q ss_pred             EEEEeeC
Q 034242           94 WYEVRLE  100 (100)
Q Consensus        94 WYaVRLE  100 (100)
                      |||||||
T Consensus        79 WYaVRLE   85 (94)
T 2krx_A           79 WYAVRLE   85 (94)
T ss_dssp             EEECCCC
T ss_pred             EEEEEEe
Confidence            9999998

Homologous Structure Domains

Structure Domains Detected by RPS-BLAST

Original result of RPS-BLAST against SCOP70(version1.75) database

No hit with e-value below 0.005

Homologous Domains Detected by HHsearch

Original result of HHsearch against SCOP70(version1.75) database

No hit with probability above 80.00