r - Convert selected lines into columns with a Perl script

Question

I need a Perl script to join these lines.

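I have more than 1000 gene names (e.g. >pmpI), each with its function (e.g. polymorphic outer membrane protein) on a separate line. I want to join each gene's function onto the same line as its name, so the list is easy to view and can be saved for later reference.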

For example, the file contents look like this:

>pmpG
 polymorphic outer membrane protein
>pmpH
 polymorphic outer membrane protein
>CTA_0953
 hypothetical protein
>pmpI
 polymorphic outer membrane protein

I tried doing it by hand in Excel, but that isn't feasible for this many files, so I'd like some help from programmers.

I need a Perl script that joins these lines.

The program output should look like this:

>pmpG      polymorphic outer membrane protein
>pmpH      polymorphic outer membrane protein
>CTA_0953  hypothetical protein
>pmpI      polymorphic outer membrane protein

score 3 · Accepted Answer

As a one-liner, this would be:

perl -n -e 's/^\s+//; s/\s+$//; next if $_ eq ""; if (/^>/) { $n = $_; } else { printf "%-11s%s\n", $n, $_; }' < data.txt

To make it clearer, here is the same logic written out as a Perl program:

#!/usr/bin/perl
use strict;
use warnings;

my $n;                                  # most recent header line (">name")
while (<>) {                            # iterate over all input lines
    s/^\s+//;                           # remove whitespace at the beginning...
    s/\s+$//;                           # ...and the end of the line
    next if $_ eq "";                   # ignore empty lines
    if (/^>/) {                         # if the line starts with >, remember it
        $n = $_;
    } else {                            # otherwise print the remembered header,
        printf "%-11s%s\n", $n, $_;     # padded to 11 columns, then this line
    }
}

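The script reads your content from standard input, so you would run it as perl program.pl < data.txt (perl program.pl data.txt also works, because <> reads the files named on the command line).

This assumes your content is in data.txt; change that to your actual file name.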
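The fixed width in %-11s only lines up headers of up to 10 characters. As a minimal sketch (not part of the original answer), the same pairing logic can pad every header to the width of the longest one plus two spaces. The subroutine name format_records and the inline demo data are illustrative assumptions, and unlike the script above this version keeps all records in memory so it can measure the width first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pair each ">header" line with the description
# line(s) after it, then pad all headers to a common width.
sub format_records {
    my @lines = @_;
    my (@pairs, $header);
    for my $line (@lines) {
        (my $text = $line) =~ s/^\s+|\s+$//g;   # trim both ends (and the newline)
        next if $text eq '';                    # skip blank lines
        if ($text =~ /^>/) {
            $header = $text;                    # remember the current header
        } elsif (defined $header) {             # ignore text before any header
            push @pairs, [$header, $text];
        }
    }
    # Column width = longest header + 2 spaces of separation
    my $width = 0;
    for my $pair (@pairs) {
        my $len = length($pair->[0]) + 2;
        $width = $len if $len > $width;
    }
    # '%-*s' takes the field width from the argument list
    return map { sprintf '%-*s%s', $width, @$_ } @pairs;
}

# Demo on part of the sample data from the question
my @demo = (">pmpG\n", " polymorphic outer membrane protein\n",
            ">CTA_0953\n", " hypothetical protein\n");
print "$_\n" for format_records(@demo);
```

To process a real file instead, replace the demo lines with print "$_\n" for format_records(<>); and run it as perl script.pl data.txt (the script name is just an example).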

score 0 · Accepted Answer

With some explanatory comments:

#!/usr/bin/perl

use strict;
use warnings;
use 5.010;

# Store the current line
my $line;
while (<DATA>) {
  # Remove the newline
  chomp;
  # If the line starts with '>'
  if (/^>/) {
    # Output the current $line
    # (if we have one)
    say $line if $line;
    # Set $line to this line
    $line = $_;
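  } else {
    # Strip the leading whitespace from the description,
    # then append it to $line, separated by a tab
    s/^\s+//;
    $line .= "\t$_";
  }
}

# Output the last record (if there was any input)
say $line if defined $line;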

__DATA__
>pmpG
 polymorphic outer membrane protein
>pmpH
 polymorphic outer membrane protein
>CTA_0953
 hypothetical protein
>pmpI
 polymorphic outer membrane protein
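Since the question mentions many files, here is a minimal sketch that applies the tab-joining approach of the script above to every file named on the command line. The subroutine name join_stream and the .joined output suffix are assumptions chosen for illustration, not part of either answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join each ">header" line with the lines that follow it (tab-separated),
# reading from $in and writing one record per line to $out.
sub join_stream {
    my ($in, $out) = @_;
    my $line;                                        # record being built
    while (my $text = <$in>) {
        chomp $text;
        next unless $text =~ /\S/;                   # skip blank lines
        if ($text =~ /^>/) {
            print {$out} "$line\n" if defined $line; # flush the previous record
            $line = $text;
        } elsif (defined $line) {
            $text =~ s/^\s+//;                       # drop the leading space
            $line .= "\t$text";
        }
    }
    print {$out} "$line\n" if defined $line;         # flush the last record
}

# Hypothetical driver: for every file named on the command line,
# write the joined records to "<file>.joined" next to it.
for my $file (@ARGV) {
    open my $in,  '<', $file          or die "Cannot read $file: $!";
    open my $out, '>', "$file.joined" or die "Cannot write $file.joined: $!";
    join_stream($in, $out);
    close $out or die "Cannot close $file.joined: $!";
}
```

For example, perl join_many.pl genes1.txt genes2.txt would produce genes1.txt.joined and genes2.txt.joined (all file names here are hypothetical).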
