Questions tagged "dna-sequence"

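0 votes

1 answer

1254 views

loops - Counting nucleotides with a for loop in Python

I am trying to read DNA sequences from an input file and use a loop to count the number of individual A's, T's, C's and G's. If a sequence contains any letter other than "ATCG", I need to print "error". For example, my input file is: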

Seq1 AAAGCGT Seq2 aa tGcGt t Seq3 af GtgA cCTg

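The code I came up with is:

So I need the totals for each line, but my code gives me the totals of A's, T's, etc. for the whole file. I would like my output to look something like

Seq1: Total A's: 3 Total C's: and so on for each sequence.

Any ideas on what I can do to fix my code to achieve this?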
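
The per-sequence counting described above could be sketched roughly as follows. This is not the asker's code (which did not survive on this page). It assumes that each record is a name plus one sequence string, that case and embedded whitespace are ignored, and that a sequence containing any other letter is reported as an error. The names `count_bases` and `report` are hypothetical.

```python
def count_bases(sequence):
    """Count A, T, C and G in one sequence.

    Returns a dict of counts, or None if a non-ATCG letter is found.
    Assumption: case is ignored and whitespace inside a sequence is skipped.
    """
    counts = {"A": 0, "T": 0, "C": 0, "G": 0}
    for base in sequence.upper():
        if base.isspace():
            continue
        if base not in counts:
            return None
        counts[base] += 1
    return counts


def report(records):
    """Build one output line per (name, sequence) record."""
    lines = []
    for name, sequence in records:
        counts = count_bases(sequence)  # fresh counters for every record
        if counts is None:
            lines.append(f"{name}: error")
        else:
            totals = " ".join(f"Total {b}'s: {counts[b]}" for b in "ATCG")
            lines.append(f"{name}: {totals}")
    return lines


# Assumed layout of the example input: one (name, sequence) pair per record.
records = [("Seq1", "AAAGCGT"), ("Seq2", "aa tGcGt t"), ("Seq3", "af GtgA cCTg")]
for line in report(records):
    print(line)
```

The key point is that the counters are created inside the per-record call, so the totals reset for every sequence instead of accumulating over the whole file.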

2013-04-01T04:14:01.447

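0 votes

3 answers

1489 views

python - Finding mismatched DNA barcodes in sequences

I have 36-nt reads like this: atcttgttcaatggccgatcXXXXgtcgacaatcaa in a fastq file, where XXXX are the different barcodes. I want to search for a barcode at the exact position (21 to 24) in the file and print the sequences that have up to 3 mismatches in the sequence but none in the barcode.

For example: I have the barcode aacg. Search the fastq file for the barcode between positions 21 and 24, allowing 3 mismatches in the sequence, e.g.:

I first tried using awk to find the unique lines and look for the mismatches, but looking through them and finding the mismatches is very tedious for me.

Is there any quick way to find them?

Thank you.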
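
One possible reading of the request above, sketched in Python: the barcode must match exactly at positions 21 to 24 (1-based), while the flanking 32 bases may differ from the template in at most 3 positions. The function names, the case-insensitive comparison, and the assumption that the fastq file has strict 4-line records are illustrative choices, not something stated in the question.

```python
# Flanking template taken from the read shown in the question.
LEFT = "atcttgttcaatggccgatc"   # positions 1-20
RIGHT = "gtcgacaatcaa"          # positions 25-36


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def read_matches(read, barcode, max_mismatches=3):
    """True if the barcode is exact at positions 21-24 and the flanks
    differ from the template in at most max_mismatches positions."""
    read = read.strip().lower()
    if len(read) != len(LEFT) + len(barcode) + len(RIGHT):
        return False
    if read[20:24] != barcode.lower():
        return False
    mismatches = hamming(read[:20], LEFT) + hamming(read[24:], RIGHT)
    return mismatches <= max_mismatches


def matching_reads(fastq_lines, barcode, max_mismatches=3):
    """Yield matching sequences from fastq text.

    Assumption: strict 4-line records, so the sequence is every
    second line of a record (index 1, 5, 9, ...).
    """
    for index, line in enumerate(fastq_lines):
        if index % 4 == 1 and read_matches(line, barcode, max_mismatches):
            yield line.strip()
```

With a real file this could be called as `matching_reads(open("reads.fastq"), "aacg")`, where the file name is a placeholder.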

python awk sed dna-sequence fastq

2013-04-17T19:09:23.133

0 votes

1 answer

1960 views

bioinformatics - ORF and amino identification using BioPython's translate() method-- incorrect translations?

I am trying to teach myself bioinformatics, arriving to the party by way of computer science and high performance computing. (Essentially, I'm trying to learn the biology.) I've recently discovered BioPython and so far think it's great, but I am curious if anybody could help me identify why the translate() method used in BioPython to convert sequence data to ORF candidates and amino protein chains is behaving differently than expected.

The following is from this year's DNA60 challenge, and it's to find all of the ORF's in a sequence and sort them, convert them to amino chains, and then take the 25th amino acid from the longest top 15 chains to spell out a phrase.

Here's the challenge: http://genomebiology.com/about/update/DNA60_TAGCGAC

So after doing some research, I settled on using the code directly out of the tutorial for finding and identifying ORF's using the translate method, found here:

http://www.bio-cloud.info/Biopython/en/ch16.html

Modifying it to print out the 25th amino acid for each chain, and sorting the output by chain length (using the Linux command-line tool "sort"), the output is entirely wrong.

Knowing what the answer was supposed to be, I could not figure out why this wasn't working. So I wrote my own script to do the ORF identification and translation, sorted the output, and it worked! (Using NCBI table 1, min length of 25.)

So somehow, the ORF identification in the translate method is not working the way I think it should, and I was hoping somebody could tell me why. Below is my code for ORF identification in Python (and you pass in the reverse_complement for the second set of three frames)

Pretty straightforward. Here's the rest of it:

Do this once for each strand (initial and reverse_complement) then if you sort the output using the following command to give you the 15 longest:

the output is:

This is the correct phrase. The output using the straight up translate is:

You can see that the lengths don't even match up. What gives here? Am I missing something?
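
The asker's script is not preserved on this page. As a rough illustration of the kind of ORF identification described (start codon to stop codon, three frames per strand, minimum length of 25), here is a minimal plain-Python sketch. The function names, the half-open coordinate convention, and the choice to measure length in codons before the stop are assumptions, and the code does not reproduce Biopython's behaviour.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of NCBI translation table 1


def reverse_complement(seq):
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=25):
    """Return (frame, start, end) for each ATG...stop stretch in the
    three forward frames. `end` is exclusive and includes the stop codon.
    Assumption: length is counted in codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos                      # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((frame, start, pos + 3))
                start = None                     # look for the next ORF
    return orfs


# Usage: scan both strands, as described in the question.
# forward = find_orfs(sequence)
# reverse = find_orfs(reverse_complement(sequence))
```

One thing that may be worth checking when comparing results: code that translates a whole frame and splits the protein at stop codons yields stop-to-stop segments, which need not begin at a start codon, so their lengths can differ from start-to-stop ORFs like the ones found here.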

bioinformatics biopython dna-sequence ncbi

2013-04-29T14:53:38.237

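0 votes

3 answers

5224 views

algorithm - How to match DNA sequence patterns

I am having a hard time finding a way to solve this problem.

The input and output sequences are as follows:

The input sequence can be 10^6 characters long, and the largest consecutive pattern will be considered.

For example, for input2, "agctaagcta", the output will not be "agcta2gcta" but "agcta2".

Any help is appreciated.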

algorithm sequence matching dna-sequence

2013-06-01T09:16:15.153

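0 votes

1 answer

160 views

python - Repeatedly accessing a LARGE fasta file. Most efficient way?

I am using Biopython to open a large single-entry fasta file (514 megabases) so that I can extract DNA sequences from specific coordinates. Returning the sequences is quite slow, and I am just wondering whether there is a faster way to perform this task that I haven't thought of. Speed would not be an issue for one or two lookups, but I am iterating over a list of 145,000 coordinates and it is taking days :/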
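
The question does not show how the file is being read, so the cause of the slowdown is unknown. If the file is being re-opened or re-parsed for every coordinate, one common remedy is to read the single record into memory once and then slice it for each lookup. The sketch below uses plain Python. It assumes that the whole sequence fits in memory and that coordinates are 1-based, inclusive (start, end) pairs. The function names are hypothetical.

```python
def sequence_from_fasta_lines(lines):
    """Join the sequence lines of a single-record FASTA, skipping the header."""
    parts = []
    for line in lines:
        if line.startswith(">"):
            continue  # header line of the single record
        parts.append(line.strip())
    return "".join(parts)


def load_single_fasta(path):
    """Read the file once and return the whole sequence as one string."""
    with open(path) as handle:
        return sequence_from_fasta_lines(handle)


def extract(genome, coordinates):
    """Slice the in-memory sequence for each coordinate pair.

    Assumption: coordinates are 1-based and inclusive.
    """
    return [genome[start - 1:end] for start, end in coordinates]


# Usage (the path is a placeholder):
# genome = load_single_fasta("genome.fasta")   # parsed only once
# fragments = extract(genome, coordinate_list)
```

String slicing costs time proportional to the length of the slice, so 145,000 lookups against an in-memory string should take far less than days, provided the file is parsed only once.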

python performance biopython fasta dna-sequence

2013-06-10T04:39:37.077

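0 votes

0 answers

1652 views

r - readFASTA VS read.fasta

I am new to R and am having trouble with the Biostrings function readFASTA. It is different in the latest version of Biostrings for R 3.0.1; previous versions seemed to do the job. So I started using the read.fasta function from seqinr in R. I am reading DNA sequences from a fasta file.

readFASTA read the sequences as follows

Now read.fasta reads the sequences as

Using the $ operator in the previous version, I could split the sequences and the genes, but with the current version I don't know how to do that. Is there any way I can get the same output as from readFASTA?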

r bioinformatics dna-sequence

2013-06-18T20:24:54.547

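0 votes

1 answer

44 views

string - Extracting genes from sequences

I have an object in R that holds a list of genes and sequences.

I only want to store the ids in an object. For example, how would I achieve this in R?

I am very new to R.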

string r dna-sequence

2013-06-18T22:12:34.087

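0 votes

1 answer

301 views

c - Representing a DNA alphabet in C with typedef and enum

I am writing a program that processes genetic sequences, and I want to store each nucleotide in a single byte, where each bit represents one letter of the genetic alphabet A, C, G, T (obviously only half of the bits will be used).

My encoding is as follows:

Here, R is a purine and can represent A or G, Y is a pyrimidine (C or T), and N can represent any of the letters.

What is the best way to define this format in C using typedef and enum? I want to define a type that lets me assign letters to variables by name, e.g.

EDIT: Thanks for the input. I definitely have reasons for not wanting strings, but thanks for the suggestion. It's true that, logically, N should be 0b1111, but for my application it makes more sense to represent it as none of the above.

Note that I do know how to get this done, but I don't break out the ol' C very often and I'd rather find the most elegant solution. I suppose that if I want to preserve the NUL byte, I could add 0b10000 to everything in my code.

I think the two possible approaches are an enum or some #define macros. However, enums are ints and I need a char, so are macros the better solution?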

c dna-sequence genetics

2013-07-28T13:21:28.360

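0 votes

3 answers

70 views

javascript - How to split a variable into an array by 3s?

So what this is supposed to do, minus the mouseenter/mouseleave functions, is take the user input and split it into an array with 3 letters in each array position (e.g. user input abcdef... would become abc, def, ...). I read a different post on Stack Overflow (How do I split a string into a certain number of characters in JavaScript?). However, I can't quite get it to work in my code below.

Here is my script.js: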

javascript jquery arrays dna-sequence

2013-08-11T21:30:50.047

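0 votes

1 answer

69 views

php - How to style a character in a string at a specific index?

Could you give me some guidance on how to style a single character in a string at a specific index? The indexes for this string come from an array, and in some cases the array is empty, so I only need to style characters in the string if the array is not empty.

So how can I add a span wrapping the characters at indexes 74 and 266 so that I can give them a different style?

My data comes from a database, so I need to make it dynamic.

Thanks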

php string highlighting dna-sequence

2013-08-15T16:34:58.753
