Building Bowtie index failure (tophat2, bowtie2)

2025-07-14 05:30:05


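(Note: the tags should be tophat2 and bowtie2, but I do not have the points to create new tags.)

Greetings: I am using TopHat2 (command line) to analyze RNA-seq data and I am encountering some errors.

Here is the call: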

tophat2 -o tophat2_results/ -G ref_data/BA000007.2.gtf --transcriptome-index=transcriptome_data/RA_LBG01b_241_filteredQ indices/BA000007.2 data_files/RA_LBG01b_241_filteredQ.fastq

Here is the error:

[2015-12-29 12:58:] Checking for Bowtie
    Bowtie version: 2.2.4.0
[2015-12-29 12:58:] Checking for Bowtie index files (genome)..
[2015-12-29 12:58:] Checking for reference FASTA file
[2015-12-29 12:58:] Generating SAM header for indices/BA000007.2
[2015-12-29 12:58:] Reading known junctions from GTF file
Warning: TopHat did not find any junctions in GTF file
[2015-12-29 12:58:] Preparing reads
    left reads: min. length=12, max. length=42, 20272 kept reads (115 discarded)
Warning: short reads (<20bp) will make TopHat quite slow and take large amount of memory because they are likely to be mapped in too many places
[2015-12-29 12:58:9] Building transcriptome data files transcriptome_data/RA_LBG01b_241_filteredQ
[2015-12-29 12:58:40] Building Bowtie index from RA_LBG01b_241_filteredQ.fa
    [FAILED]
Error: Couldn't build bowtie index with err = 1

Version information: TopHat v2.1.0, Bowtie2 version 2.2.4, Python 2.7.10 :: Anaconda 2.4.0 (64-bit)

System information: CentOS release 6.7

How I got here and what I have tried:

I am using E. coli (accession: BA000007.2) as my reference genome, which can be found here: http://gov/nuccore/BA000007.2

I obtained my GTF file from Ensembl (ftp:///pub/release-29/bacteria//gtf/bacteria_9_collection/escherichia_coli_o157_h7_str_sakai/)

I built my index with bowtie2-build (before the tophat2 call):

bowtie2-build -f ref_data/BA000007.2.fasta indices/BA000007.2

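I am aware that the error I am receiving is associated with a mismatch between the names in the first column of the *.gtf file and the name of the reference fasta file. If I understand this correctly, every entry in the 1st column should be BA000007.2, whereas most of the names in the 1st column were "Chromosome". To fix this, I did the following:

awk '{FS=OFS="\t"}{print "BA000007.2", $2, $3, $4, $5, $6, $7, $8, $9}' pathToGTF/BA000007.2_ensemble.gtf > pathToGTF/BA000007.2.gtf

# Please note that the commented build information (e.g., #!genome-build ASM80120v1) at the beginning of the Ensembl GTF file would create undesirable output from the awk command; this has been addressed.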
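For comparison, here is a minimal Python sketch of the same renaming step that passes comment lines through untouched instead of requiring them to be handled separately. The function name `rename_seqname` is hypothetical (it is not part of TopHat or any library), and the sketch assumes a tab-delimited GTF in which every record belongs to the single sequence named by `new_name`, as the awk command above also assumes.

```python
def rename_seqname(lines, new_name):
    """Yield GTF lines with column 1 replaced by new_name.

    Comment lines (starting with '#') and blank lines are yielded unchanged.
    Assumption: every record belongs to the same reference sequence.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fields[0] = new_name
        yield "\t".join(fields) + "\n"
```

It could be applied by opening the input and output files and calling `out.writelines(rename_seqname(src, "BA000007.2"))`. If the GTF also contains records for other sequences (for example plasmids), renaming everything to one name would mislabel them, so a per-name mapping would be needed instead.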

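I also changed the extension of the fasta file from *.fasta to *.fa.

Questions:

Did I properly put the kibosh on any problems arising from differences in naming between the 1st column of the gtf file and the name of the fasta file (BA000007.2, BA000007.2.fa)?

When I peruse the output in the logs directory, there are several errors (and similar errors in gtf_juncs.log) with lines beginning with:

Warning: invalid start coordinate at line: BA000007.2 ena gene -194 2502 . + . gene_id "BAA1757"; gene_version "1"; gene_name "tagA"; gene_source "ena"; gene_biotype "protein_coding";

There are indeed negative numbers in the gtf file, but not in the genbank file (quick search in vim). Could this be the source of the error? I tried both commenting out the specific lines and deleting them from the file; both approaches still result in the error.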
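To locate such records without searching by hand, a small diagnostic along these lines might help. It is only a sketch: the name `split_by_coordinates` is hypothetical, and it assumes the usual GTF convention that columns 4 and 5 hold 1-based start and end positions with start <= end. As noted above, removing these lines did not make the index error go away, so this addresses the warnings rather than the failure itself.

```python
def split_by_coordinates(lines):
    """Split GTF lines into (valid, invalid) lists by their coordinates.

    A record is treated as invalid if columns 4 and 5 are missing,
    are not integers, or violate 1 <= start <= end.
    Comment and blank lines are kept in the valid list.
    """
    valid, invalid = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            valid.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        try:
            start, end = int(fields[3]), int(fields[4])
        except (IndexError, ValueError):
            invalid.append(line)
            continue
        if start >= 1 and end >= start:
            valid.append(line)
        else:
            invalid.append(line)
    return valid, invalid
```

Printing the `invalid` list shows exactly which features carry negative or reversed coordinates, which can then be compared against the GenBank record.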

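Is there anything readily apparent that could be causing the "Couldn't build bowtie index with err = 1" error?

I have been stuck on this for a couple of days, so any help is greatly appreciated.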


Best answer

I found the source of the problem. It was the header in the reference fasta file. The initial header was:

>gi|4711801|dbj|BA000007.2| Escherichia coli O157:H7 str. Sakai DNA, complete genome

Whereas it should have been

>BA000007

So... if the fasta file is called abc12.fa, then the header in the fasta file must be >abc12. The first column in the gtf file must also be abc12.
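The header change described here can be scripted rather than done in an editor. The following is a minimal sketch under the assumption that the reference is a single-sequence FASTA file; `set_single_fasta_header` is a hypothetical helper name, not a TopHat or Bowtie function.

```python
def set_single_fasta_header(lines, seq_id):
    """Yield FASTA lines with the one header replaced by '>' + seq_id.

    Assumption: the file holds exactly one sequence. A second header
    raises ValueError rather than silently giving two records one name.
    """
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > 1:
                raise ValueError("expected a single-sequence FASTA file")
            yield ">" + seq_id + "\n"
        else:
            yield line
```

For example, writing `set_single_fasta_header(src, "BA000007")` to a new file would turn the long `>gi|...` header into `>BA000007`. The Bowtie index would then need to be rebuilt from the edited file so that the index and the GTF use the same sequence name.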

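Please note that I changed the base name from BA000007.2 to BA000007 in all of my calls, and I renamed all files to drop the .2 from their names. It may still work with the .2, but I did not test it ("The basename is the name of any of the index files up to but not including the first period." [tophat manual]) (thank you, AM). Lastly, I renamed the fasta file from *.fasta to *.fa.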
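Before rerunning TopHat, a quick consistency check between the two files can catch this kind of mismatch early. The sketch below uses hypothetical helper names and assumes that the sequence name is the first whitespace-delimited token of each FASTA header (which is how Bowtie 2 names reference sequences) and that GTF column 1 must match it exactly.

```python
def fasta_ids(lines):
    """Return the set of sequence names: first token after '>' in each header."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def gtf_seqnames(lines):
    """Return the set of names found in column 1 of non-comment GTF lines."""
    return {
        line.rstrip("\n").split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def unmatched_seqnames(fasta_lines, gtf_lines):
    """Return sorted GTF sequence names that have no matching FASTA header."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_ids(fasta_lines))
```

An empty list from `unmatched_seqnames` means every name in the GTF has a matching FASTA record; with the original `>gi|...` header and a GTF renamed to BA000007.2, the check would report BA000007.2 as unmatched.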

