r - Join two matrices by column and extract a submatrix

Question

I have two matrices (say, A and B). I want to extract the columns of B in the order given by the first column of A.

For example:

Matrix A

name score
a 0.1
b 0.2
c 0.1
d 0.6

Matrix B

a    d   b   c   g   h
0.1 0.2 0.3 0.4 0.6 0.2
0.2 0.1 0.4 0.7 0.1 0.1
...

I would like matrix B to end up looking like this:

Matrix B_modified

a    b   c   d
0.1 0.3 0.4 0.2
0.2 0.4 0.7 0.1

Can this be done in Perl or R? Thanks a lot in advance.

score 2 · Accepted Answer

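If your data originates in an R data structure, then exporting it and solving this problem in Perl would be inappropriate. However, if you have text files that look like the data you have shown, here is a Perl solution for you.

I have separated the output fields with spaces. That can be changed very easily if necessary.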

use strict;
use warnings;
use autodie;

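# Read a whitespace-delimited file into an array of rows (array of arrays).
sub read_file {
  my ($name) = @_;
  open my $fh, '<', $name;    # autodie throws if the file cannot be opened
  my @data = map [ split ], <$fh>;
  \@data;
}

# Row 0 of MatrixA.txt is the header, so take the names from rows 1 onward.
my $matrix_a = read_file('MatrixA.txt');
my @fields = map $matrix_a->[$_][0], 1 .. $#$matrix_a;

# For each name, find the index of the matching column header in MatrixB.txt.
my $matrix_b = read_file('MatrixB.txt');
my @headers = @{$matrix_b->[0]};
my @indices = map {
  my $label = $_;
  grep $headers[$_] eq $label, 0..$#headers
} @fields;

# Print every row (header included) with the columns in the new order.
for my $row (0 .. $#$matrix_b) {
  print join('  ', map $matrix_b->[$row][$_], @indices), "\n";
}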

Output

a  b  c  d
0.1  0.3  0.4  0.2
0.2  0.4  0.7  0.1

score 2 · Accepted Answer

I'm not sure what problem you are running into. Here is how I would do it.

## get data as matrix
a <- read.table(header=TRUE, text="name score
a 0.1
b 0.2
c 0.1
d 0.6", stringsAsFactors=FALSE) # load directly as characters

b <- read.table(header=TRUE, text="a    d   b   c   g   h
0.1 0.2 0.3 0.4 0.6 0.2
0.2 0.1 0.4 0.7 0.1 0.1", stringsAsFactors=FALSE)

a <- as.matrix(a)
b <- as.matrix(b)

Now subset to get your final result:

b[, a[, "name"]]
#        a   b   c   d
# [1,] 0.1 0.3 0.4 0.2
# [2,] 0.2 0.4 0.7 0.1
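
As a side note, the as.matrix() conversion is probably not needed for the subsetting itself: a data frame can be indexed by a character vector of column names in the same way. Here is a minimal sketch under that assumption, reusing the toy data above. The names b_modified, idx and b_by_pos are illustrative only and do not come from the question. drop = FALSE keeps the result two-dimensional even when a single name is selected, and match() gives the equivalent column positions.

```r
# Same toy data as in the answer above
a <- read.table(header = TRUE, text = "name score
a 0.1
b 0.2
c 0.1
d 0.6", stringsAsFactors = FALSE)

b <- read.table(header = TRUE, text = "a d b c g h
0.1 0.2 0.3 0.4 0.6 0.2
0.2 0.1 0.4 0.7 0.1 0.1", stringsAsFactors = FALSE)

# Index the data frame directly by the names in a$name;
# drop = FALSE avoids collapsing to a vector if only one column is kept
b_modified <- b[, a$name, drop = FALSE]

# Equivalent positional form: position of each name among b's column names
idx <- match(a$name, colnames(b))   # 1 3 4 2
b_by_pos <- b[, idx, drop = FALSE]
```

Note that match() returns NA for a name that has no column in b, so the positional form only works as-is when every name in a is present in b.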

score 2 · Accepted Answer

The error:

[.data.frame(b, , a[, "name"]) : undefined columns selected

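means that you are trying to select columns that are not defined in b but are present in a$name. One solution is to use intersect with colnames(b). This also converts the factor to character strings, and you get the columns in the right order.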

b[, intersect(a[, "name"],colnames(b))] ## the order is important here

For example, I tested this with the following data:

b <- read.table(text='
a    d   b   c
0.1 0.2 0.3 0.4
0.2 0.1 0.4 0.7',header=TRUE)

a <- read.table(text='name score
a 0.1
z 0.5
c 0.1
d 0.6',header=TRUE)

b[, intersect(a[, "name"],colnames(b))]


    a   c   d
1 0.1 0.4 0.2
2 0.2 0.7 0.1
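
If you also want to know which names were dropped, setdiff() can report them before subsetting. Below is a small sketch using the same test data. The names wanted, absent, keep and b_sub are hypothetical and used for illustration only. It filters with %in% rather than intersect(), which keeps the names explicitly in the order of a$name. On this data both approaches select the same columns.

```r
b <- read.table(header = TRUE, text = "a d b c
0.1 0.2 0.3 0.4
0.2 0.1 0.4 0.7")

a <- read.table(header = TRUE, text = "name score
a 0.1
z 0.5
c 0.1
d 0.6", stringsAsFactors = FALSE)

# as.character() guards against a$name being a factor
wanted <- as.character(a$name)

# Names listed in a that have no matching column in b
absent <- setdiff(wanted, colnames(b))
if (length(absent) > 0) {
  message("No matching column in b for: ", paste(absent, collapse = ", "))
}

# Keep only the names that exist in b, in the order they appear in a
keep <- wanted[wanted %in% colnames(b)]
b_sub <- b[, keep, drop = FALSE]
```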
