Predicting Microbial Secretory Proteins with SignalP and TMHMM

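Secretory proteins are proteins that are synthesized inside the cell and then secreted to act outside it. The N-terminus of a secretory protein generally carries a signal peptide of 15 to 30 amino acids. A signal peptide is a short peptide chain (5 to 30 amino acids long) that directs a newly synthesized protein into the secretory pathway. The term usually refers to the N-terminal amino acid sequence of a nascent polypeptide that guides its transmembrane transfer (localization), although the sequence is occasionally not at the N-terminus. In this workflow, SignalP annotates whether a protein sequence contains a signal peptide, TMHMM annotates whether it contains a transmembrane structure, and the proteins that contain a signal peptide but no transmembrane structure are finally selected as secretory proteins.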
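The selection rule above reduces to a set operation. The following is a minimal Python sketch of that rule, assuming the IDs reported by each tool have already been collected into sets; the function name and the example IDs are hypothetical and only serve as illustration.

```python
def select_secretory(signal_peptide_ids, transmembrane_ids):
    """Return IDs that have a signal peptide and no transmembrane structure."""
    return set(signal_peptide_ids) - set(transmembrane_ids)


# Hypothetical example IDs, for illustration only
with_signal_peptide = {"prot_001", "prot_002", "prot_003"}
with_tm_helix = {"prot_002", "prot_999"}

secretory = select_secretory(with_signal_peptide, with_tm_helix)
print(sorted(secretory))
```

In the workflow below, the first set comes from the SignalP output and the second from the TMHMM result for the same proteins.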

Software

  • SignalP V6.0
  • SignalP 6.0 predicts signal peptides and the location of their cleavage sites in proteins from Archaea, Gram-positive Bacteria, Gram-negative Bacteria and Eukarya. In Bacteria and Archaea, SignalP 6.0 can discriminate between five types of signal peptides:
    • Sec/SPI: "standard" secretory signal peptides transported by the Sec translocon and cleaved by Signal Peptidase I (Lep).
    • Sec/SPII: lipoprotein signal peptides transported by the Sec translocon and cleaved by Signal Peptidase II (Lsp).
    • Tat/SPI: Tat signal peptides transported by the Tat translocon and cleaved by Signal Peptidase I (Lep).
    • Tat/SPII: Tat lipoprotein signal peptides transported by the Tat translocon and cleaved by Signal Peptidase II (Lsp).
    • Sec/SPIII: pilin and pilin-like signal peptides transported by the Sec translocon and cleaved by Signal Peptidase III (PilD/PibD).
    • Additionally, SignalP 6.0 predicts the regions of signal peptides. Depending on the type, the positions of the n-, h- and c-regions as well as of other distinctive features are predicted.

  • Python

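SignalP and TMHMM are free for academic users, but you have to fill in a form with your details and email address to receive a download link (valid for 4 hours).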

Software installation

Installing SignalP 6.0

  • Download: visit the SignalP V6.0 website, find "Download", fill in the requested information to obtain a download link, and download "signalp-6.0.fast.tar.gz". Two modes are offered: "slow_sequential" and "fast". The former runs the full model sequentially, taking the same amount of RAM as fast but being 6 times slower; the latter uses a smaller model that approximates the performance of the full model, requiring a fraction of the resources and being significantly faster. This tutorial downloads the fast mode.
  • Installation
    • Dependencies
      • Python
      • matplotlib>3.3.2
      • numpy>1.19.2
      • torch>1.7.0 (install with: pip install torch)
      • tqdm>4.46.1

    • Install SignalP 6.0
      # Unpack the downloaded archive
      tar zxvf signalp-6.0.fast.tar.gz
      # Enter the unpacked software directory and run in a terminal
      python setup.py install
      # Test the installation
      signalp6 --help
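Before running the installer it can help to confirm that the dependencies listed above are present in the active Python environment. The sketch below is one possible check, not part of SignalP itself; it assumes Python 3.8 or later (for `importlib.metadata`), and the helper names `version_tuple` and `check_requirements` are made up for this example. The comparison is strictly "greater than", as in the list above.

```python
import re
from importlib import metadata

# Minimum versions as listed above (installed version must be strictly greater)
REQUIREMENTS = {
    "matplotlib": "3.3.2",
    "numpy": "1.19.2",
    "torch": "1.7.0",
    "tqdm": "4.46.1",
}


def version_tuple(version):
    """Turn '1.19.2' or '2.1.0+cu118' into a tuple of its leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def check_requirements(requirements):
    """Map each package name to 'ok', 'too old' or 'missing'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        newer = version_tuple(installed) > version_tuple(minimum)
        report[name] = "ok" if newer else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_requirements(REQUIREMENTS).items():
        print(f"{name}: {status}")
```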

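Installing TMHMM V2.0c

  • Download: visit the TMHMM V2.0c website, find "Download", fill in the requested information to obtain a download link, and download "tmhmm-2.0c.Linux.tar.gz".
  • Installation
      # Unpack the archive
      tar zxvf tmhmm-2.0c.Linux.tar.gz
      # Enter the bin directory of the unpacked software
      cd tmhmm-2.0c/bin
      # Get the current path; mine is "/home/liu/tools/tmhmm-2.0c/bin"
      pwd
      # Add this path to the system environment variables by editing ~/.bashrc (see my earlier post on liaochenlanruo.github.io)
      # Change the first line of tmhmm and tmhmmformat.pl in the bin directory to "#!/usr/bin/perl"
  • Runtime error: running the software always fails with "Segmentation fault (core dumped)", and I have no fix for it yet. You can use the online version instead.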

Usage

SignalP 6.0

Prediction

A command takes the following form

signalp6 --fastafile /path/to/input.fasta --organism other --output_dir path/to/be/saved --format txt --mode fast

  • fastafile specifies the FASTA file with the protein sequences to be predicted.
  • organism is either other or Eukarya. Specifying Eukarya triggers post-processing of the SP predictions to prevent spurious results (only predicts type Sec/SPI).
  • format can take the values txt, png, eps, all. It defines what output files are created for individual sequences. txt produces a tabular .gff file with the per-position predictions for each sequence. png, eps, all additionally produce probability plots in the requested format. For larger prediction jobs, plotting will slow down the processing speed significantly.
  • mode is either fast, slow or slow-sequential. Default is fast, which uses a smaller model that approximates the performance of the full model, requiring a fraction of the resources and being significantly faster. slow runs the full model in parallel, which requires more than 14GB of RAM to be available. slow-sequential runs the full model sequentially, taking the same amount of RAM as fast but being 6 times slower. If the specified model is not installed, SignalP will abort with an error.
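When the prediction is launched from a script, the option values described above can be validated before SignalP is started. The following Python sketch only uses the flags and values named in this section; the function name `build_signalp6_command` and the parameter name `fmt` are hypothetical, and the call that actually runs SignalP is left commented out because it requires a working installation.

```python
import shlex

# Allowed values, taken from the option descriptions above
VALID_ORGANISMS = {"other", "Eukarya"}
VALID_FORMATS = {"txt", "png", "eps", "all"}
VALID_MODES = {"fast", "slow", "slow-sequential"}


def build_signalp6_command(fasta, output_dir, organism="other",
                           fmt="txt", mode="fast"):
    """Return the signalp6 argument list, rejecting unknown option values."""
    if organism not in VALID_ORGANISMS:
        raise ValueError(f"unknown organism: {organism}")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    return [
        "signalp6",
        "--fastafile", fasta,
        "--organism", organism,
        "--output_dir", output_dir,
        "--format", fmt,
        "--mode", mode,
    ]


command = build_signalp6_command("/path/to/input.fasta", "path/to/be/saved")
print(shlex.join(command))
# To run it (requires SignalP 6.0 to be installed):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.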

Outputs

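  • output_dir/output.gff3: contains only the sequences that have a signal peptide;
  • output_dir/prediction_results.txt: contains all sequences of the input file (not important here);
  • output_dir/region_output.gff3: contains the region information of all signal peptides.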
    • n-region: The n-terminal region of the signal peptide. Reported for Sec/SPI, Sec/SPII, Tat/SPI and Tat/SPII. Labeled as N
    • h-region: The center hydrophobic region of the signal peptide. Reported for Sec/SPI, Sec/SPII, Tat/SPI and Tat/SPII. Labeled as H
    • c-region: The c-terminal region of the signal peptide, reported for Sec/SPI and Tat/SPI.
    • Cysteine: The conserved cysteine in +1 of the cleavage site of Lipoproteins that is used for Lipidation. Labeled as c.
    • Twin-arginine motif: The twin-arginine motif at the end of the n-region that is characteristic for Tat signal peptides. Labeled as R.
    • Sec/SPIII: These signal peptides have no known region structure.
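Because output.gff3 lists only the sequences with a predicted signal peptide, collecting its first column is enough to obtain their IDs. The Python sketch below shows one way to do that. It assumes the usual GFF3 layout (tab-separated columns, header lines starting with "#"); the two example records are made up for illustration and are not real SignalP output.

```python
def signal_peptide_ids(gff3_text):
    """Collect the unique sequence IDs (column 1) from GFF3 text, in file order."""
    ids = []
    seen = set()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF3 header/comment lines
        seqid = line.split("\t")[0]
        if seqid not in seen:
            seen.add(seqid)
            ids.append(seqid)
    return ids


# Made-up records for illustration; real data comes from output_dir/output.gff3
example = (
    "## gff-version 3\n"
    "prot_001\tSignalP-6.0\tsignal_peptide\t1\t24\t0.99\t.\t.\t.\n"
    "prot_007\tSignalP-6.0\tlipoprotein_signal_peptide\t1\t19\t0.98\t.\t.\t.\n"
)
print(signal_peptide_ids(example))
```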

Batch processing and result refinement

Script name: run_SignalP.pl

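#!/usr/bin/perl
use strict;
use warnings;

# Author: Liu Hualin
# Date: Oct 14, 2021

# IDs reported by SignalP that cannot be matched back to a sequence
open my $idnoseq, '>', 'IDNOSEQ.txt' or die "Cannot write IDNOSEQ.txt: $!";

my @faa = glob("*.faa");
foreach my $faa (@faa) {
    my ($str) = $faa =~ /(.+)\.faa$/ or next;
    my $out    = "$str.nodesc";       # FASTA with descriptions removed from the headers
    my $sigseq = "$str.sigseq";       # sequences predicted to carry a signal peptide
    my $outdir = "${str}_signalp";    # SignalP output directory

    # Keep only the sequence ID in each header line
    open my $in, '<', $faa or die "Cannot read $faa: $!";
    open my $nodesc, '>', $out or die "Cannot write $out: $!";
    while (<$in>) {
        chomp;
        if (/^(>\S+)/) {
            print $nodesc "$1\n";
        } else {
            print $nodesc "$_\n";
        }
    }
    close $in;
    close $nodesc;

    my %hash = idseq($out);

    system("signalp6 --fastafile $out --organism other --output_dir $outdir --format txt --mode fast");

    my $gff = "$outdir/output.gff3";
    if (-e $gff && !-z $gff) {
        open my $gff_in, '<', $gff or die "Cannot read $gff: $!";
        <$gff_in>;    # skip the header line
        open my $sig_out, '>', $sigseq or die "Cannot write $sigseq: $!";
        while (<$gff_in>) {
            chomp;
            my @lines = split /\t/;
            if (exists $hash{$lines[0]}) {
                print $sig_out ">$lines[0]\n$hash{$lines[0]}\n";
            } else {
                print $idnoseq "$str\t$lines[0]\n";
            }
        }
        close $gff_in;
        close $sig_out;
        system("mv", $sigseq, $outdir);
    }
    unlink $out;    # remove the temporary description-free FASTA file
}
close $idnoseq;

# Read a FASTA file and return a hash of sequence ID => sequence
sub idseq {
    my ($fasta) = @_;
    my %hash;
    local $/ = ">";
    open my $fh, '<', $fasta or die "Cannot read $fasta: $!";
    <$fh>;    # discard the empty chunk before the first ">"
    while (<$fh>) {
        chomp;
        my ($header, $seq) = split(/\n/, $_, 2);
        next unless defined $seq && $header =~ /(\S+)/;
        my $id = $1;
        $seq =~ s/\s+$//;    # drop the trailing newline so no blank line is written later
        $hash{$id} = $seq;
    }
    close $fh;
    return (%hash);
}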

Put run_SignalP.pl in the same directory as the FASTA files with the ".faa" extension, then run the following in a terminal:

perl run_SignalP.pl

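Output interpretation

* stands for the name of the input file.

  • *_signalp/output.gff3: contains only the sequences that have a signal peptide;
  • *_signalp/prediction_results.txt: contains all sequences of the input file (not important here);
  • *_signalp/region_output.gff3: contains the region information of all signal peptides;
  • *_signalp/*.sigseq: stores the amino acid sequences of all proteins with a signal peptide, and can be used as the input file for TMHMM.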
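The .sigseq file is simply the subset of the input FASTA whose IDs appear in output.gff3. As an illustration of that step outside Perl, here is a small Python sketch that reads FASTA text into a dictionary keyed by the first word of each header and writes back only selected IDs. The function names and the example sequences are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into {id: sequence}; the ID is the first word of the header."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = ""
        elif current is not None:
            records[current] += line  # join wrapped sequence lines
    return records


def to_fasta(records, ids):
    """Format the selected IDs as FASTA, skipping IDs that have no sequence."""
    return "".join(f">{i}\n{records[i]}\n" for i in ids if i in records)


# Hypothetical sequences, for illustration only
fasta = ">prot_001 hypothetical protein\nMKKLLA\nVAGT\n>prot_002 another protein\nMSTN\n"
records = read_fasta(fasta)
print(to_fasta(records, ["prot_001", "prot_404"]), end="")
```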

TMHMM

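Prediction

The offline version kept failing and I could not find the cause, so I used the web server instead. The input is the "*_signalp/*.sigseq" file generated above: upload it to the web version of TMHMM, submit the job, and wait for the results.

Results

TMHMM can produce results in several formats; see its official documentation for details.

Submitting a job on the TMHMM website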

  • Long output format
    • Length: The length of the protein sequence.
    • Number of predicted TMHs: The number of predicted transmembrane helices.
    • Exp number of AAs in TMHs: The expected number of amino acids in transmembrane helices. If this number is larger than 18, it is very likely to be a transmembrane protein (or to have a signal peptide).
    • Exp number, first 60 AAs: The expected number of amino acids in transmembrane helices in the first 60 amino acids of the protein. If it is more than a few, you are warned that a predicted transmembrane helix in the N-term could be a signal peptide.
    • Total prob of N-in: The total probability that the N-term is on the cytoplasmic side of the membrane.
    • POSSIBLE N-term signal sequence: A warning that is produced when "Exp number, first 60 AAs" is larger than 10.

  • Protein F01_bin.1_00110 has 436 amino acids in total and 5 transmembrane helices.

  • Protein F01_bin.1_00142 has 557 amino acids in total and lies entirely outside the membrane, i.e. this sequence encodes a secretory protein.

  • Short output format
    • "len=": The length of the protein sequence.
    • "ExpAA=": The expected number of amino acids in transmembrane helices. If this number is larger than 18, it is very likely to be a transmembrane protein (or to have a signal peptide).
    • "First60=": The expected number of amino acids in transmembrane helices in the first 60 amino acids of the protein. If it is more than a few, you are warned that a predicted transmembrane helix in the N-term could be a signal peptide.
    • "PredHel=": The number of predicted transmembrane helices by N-best.
    • "Topology=": The topology predicted by N-best. The topology is given as the positions of the transmembrane helices, separated by "i" if the loop is on the inside or by "o" if it is on the outside. For example, 'i7-29o44-66i87-109o' means that the protein starts on the inside, has a predicted TMH at positions 7 to 29, positions 30-43 are outside, and then there is a TMH at positions 44-66.
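The short output format is easy to take apart programmatically. The Python sketch below splits one tab-separated line into its fields, decodes the topology string, and applies the two thresholds mentioned above (ExpAA larger than 18, First60 larger than 10). The function names are hypothetical, and the example line is made up except for the topology string, which is the one used in the explanation above.

```python
import re


def parse_short_line(line):
    """Split one short-format line into the ID and a dict of key=value fields."""
    seq_id, *fields = line.strip().split("\t")
    return seq_id, dict(field.split("=", 1) for field in fields)


def parse_topology(topology):
    """Return (side of the N-terminus, [(start, end), ...]) for a topology string."""
    side = {"i": "inside", "o": "outside"}[topology[0]]
    helices = [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", topology)]
    return side, helices


def flags(fields):
    """Apply the ExpAA > 18 and First60 > 10 rules of thumb."""
    return {
        "likely_transmembrane": float(fields["ExpAA"]) > 18,
        "possible_signal_peptide": float(fields["First60"]) > 10,
    }


# Made-up line for illustration; only the topology string comes from the text above
line = "prot_001\tlen=120\tExpAA=66.50\tFirst60=40.20\tPredHel=3\tTopology=i7-29o44-66i87-109o"
seq_id, fields = parse_short_line(line)
print(seq_id, parse_topology(fields["Topology"]), flags(fields))
```

A protein whose topology is just "o" has no predicted helix, so `parse_topology("o")` yields an empty helix list.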

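Summary of results

The web version only gave us a list file (short output format), whose content has to be copied from the web page and pasted into a new file by hand. I named this file *_TMHMM_SHORT.txt and stored it in the *_signalp directory generated by run_SignalP.pl. Next, I count for each genome the total number of signal-peptide proteins, the number of secretory proteins and the number of membrane proteins and write them to Statistics.txt. I also extract the secretory protein sequences of each genome to *_signalp/*.secretory.faa and the membrane protein sequences to *_signalp/*.membrane.faa. All of this is done by tmhmm_parser.pl.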

#!/usr/bin/perl
use strict;
use warnings;

# Author: Liu Hualin
# Date: Oct 15, 2021

open my $stat, '>', 'Statistics.txt' or die "Cannot write Statistics.txt: $!";
print $stat "Strain name\tSignal peptide numbers\tSecretory protein numbers\tMembrane protein numbers\n";

my @sig = glob("*_signalp");
foreach my $sig (@sig) {
    my ($str) = $sig =~ /(.+)_signalp$/ or next;
    my $tmhmm     = "$sig/${str}_TMHMM_SHORT.txt";
    my $fasta     = "$sig/$str.sigseq";
    my $secretory = "$str.secretory.faa";
    my $membrane  = "$str.membrane.faa";
    open my $sec, '>', $secretory or die "Cannot write $secretory: $!";
    open my $mem, '>', $membrane or die "Cannot write $membrane: $!";
    my $out = 0;    # proteins without a predicted transmembrane helix (secretory)
    my $on  = 0;    # proteins with a predicted transmembrane helix (membrane)
    my %hash = idseq($fasta);
    open my $in, '<', $tmhmm or die "Cannot read $tmhmm: $!";
    while (<$in>) {
        s/[\r\n]+//g;
        my @lines = split /\t/;
        next unless @lines >= 6;    # skip blank or incomplete lines
        if ($lines[5] eq "Topology=o") {
            $out++;
            print $sec ">$lines[0]\n$hash{$lines[0]}\n";
        } else {
            $on++;
            print $mem ">$lines[0]\n$hash{$lines[0]}\n";
        }
    }
    close $in;
    close $sec;
    close $mem;
    system("mv", $secretory, $membrane, $sig);
    my $total = $out + $on;
    print $stat "$str\t$total\t$out\t$on\n";
}
close $stat;

# Read a FASTA file and return a hash of sequence ID => sequence
sub idseq {
    my ($fasta) = @_;
    my %hash;
    local $/ = ">";
    open my $fh, '<', $fasta or die "Cannot read $fasta: $!";
    <$fh>;    # discard the empty chunk before the first ">"
    while (<$fh>) {
        chomp;
        my ($header, $seq) = split(/\n/, $_, 2);
        next unless defined $seq && $header =~ /(\S+)/;
        my $id = $1;
        $seq =~ s/\s+$//;    # drop the trailing newline
        $hash{$id} = $seq;
    }
    close $fh;
    return (%hash);
}

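How to run: put tmhmm_parser.pl in the parent directory of the *_signalp directories. Each *_signalp directory must contain a *_TMHMM_SHORT.txt file and a *.sigseq file. Then run the following in a terminal: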

perl tmhmm_parser.pl

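Getting the scripts

The scripts in this article are available on GitHub.

Notice: When you use the scripts in this article, please cite the link of this webpage. Please respect the author's work. Thank you!

Reference

Original post: "SignalP+TMHMM预测微生物分泌蛋白" | liaochenlanruo

Please credit the source when reposting!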

Edited on 2021-12-28 09:33