python - R's expand.grid() function in Python

Question

Is there a Python function similar to the expand.grid() function in R? Thanks in advance.

(EDIT) Below is the description of this R function and an example.

Create a Data Frame from All Combinations of Factors

Description:

     Create a data frame from all combinations of the supplied vectors
     or factors.  

> x <- 1:3
> y <- 1:3
> expand.grid(x,y)
  Var1 Var2
1    1    1
2    2    1
3    3    1
4    1    2
5    2    2
6    3    2
7    1    3
8    2    3
9    3    3

(EDIT2) Below is an example using the rpy package. I would like to get the same output object, but without using R:

>>> from rpy import *
>>> a = [1,2,3]
>>> b = [5,7,9]
>>> r.assign("a",a)
[1, 2, 3]
>>> r.assign("b",b)
[5, 7, 9]
>>> r("expand.grid(a,b)")
{'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}

EDIT 02/09/2012: I'm really lost with Python. The code given by Lev Levitsky in his answer does not work for me:

>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in expandgrid
NameError: global name 'itertools' is not defined

However, the itertools module seems to be installed (typing from itertools import * does not return any error message).

score 45 · Accepted Answer

Just use a list comprehension:

>>> [(x, y) for x in range(5) for y in range(5)]

[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4)]

Convert to a numpy array if desired:

>>> import numpy as np
>>> x = np.array([(x, y) for x in range(5) for y in range(5)])
>>> x.shape
(25, 2)

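I have tested this up to 10000 x 10000, and Python's performance is comparable to that of expand.grid in R. Using a tuple (x, y) is about 40% faster than using a list [x, y] in the comprehension.

Or...

Using np.meshgrid is around 3x faster and much less memory intensive.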

%timeit np.array(np.meshgrid(range(10000), range(10000))).reshape(2, 100000000).T
1 loops, best of 3: 736 ms per loop

In R:

> system.time(expand.grid(1:10000, 1:10000))
   user  system elapsed 
  1.991   0.416   2.424

Keep in mind that R has 1-based arrays whereas Python has 0-based arrays.
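
As a small illustration of the meshgrid approach (a sketch on toy inputs; the names a, b, and grid are made up for this example), the same reshape can be checked by eye. With meshgrid's default indexing the first input varies fastest, which matches the row order of R's expand.grid:

```python
import numpy as np

a = [1, 2, 3]
b = [5, 7, 9]

# meshgrid returns one 2-D array per input; stacking them gives shape (2, 3, 3).
# reshape(2, -1) flattens each grid, and .T turns the two rows into two columns.
grid = np.array(np.meshgrid(a, b)).reshape(2, -1).T

print(grid.shape)  # (9, 2)
print(grid[:4])    # rows (1, 5), (2, 5), (3, 5), (1, 7)
```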

score 27 · Accepted Answer

product from itertools is the key to your solution. It produces the Cartesian product of its inputs.

from itertools import product

import pandas as pd

def expand_grid(dictionary):
   return pd.DataFrame([row for row in product(*dictionary.values())], 
                       columns=dictionary.keys())

dictionary = {'color': ['red', 'green', 'blue'], 
              'vehicle': ['car', 'van', 'truck'], 
              'cylinders': [6, 8]}

>>> expand_grid(dictionary)
    color  cylinders vehicle
0     red          6     car
1     red          6     van
2     red          6   truck
3     red          8     car
4     red          8     van
5     red          8   truck
6   green          6     car
7   green          6     van
8   green          6   truck
9   green          8     car
10  green          8     van
11  green          8   truck
12   blue          6     car
13   blue          6     van
14   blue          6   truck
15   blue          8     car
16   blue          8     van
17   blue          8   truck
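
Note that the column order shown above depends on how the dictionary orders its keys. The sketch below uses only the standard library to show what Python 3.7 or later would give for the same input, where dicts preserve insertion order (the names columns and rows are introduced here for illustration):

```python
from itertools import product

dictionary = {'color': ['red', 'green', 'blue'],
              'vehicle': ['car', 'van', 'truck'],
              'cylinders': [6, 8]}

# On Python 3.7+ keys() and values() follow insertion order,
# so the last key ('cylinders') is the one that varies fastest.
columns = list(dictionary.keys())
rows = list(product(*dictionary.values()))

print(columns)   # ['color', 'vehicle', 'cylinders']
print(rows[:3])  # [('red', 'car', 6), ('red', 'car', 8), ('red', 'van', 6)]
print(len(rows)) # 18
```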

score 19 · Accepted Answer

Here is an example that gives output similar to what you need:

import itertools
def expandgrid(*itrs):
   product = list(itertools.product(*itrs))
   return {'Var{}'.format(i+1):[x[i] for x in product] for i in range(len(itrs))}

>>> a = [1,2,3]
>>> b = [5,7,9]
>>> expandgrid(a, b)
{'Var1': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'Var2': [5, 7, 9, 5, 7, 9, 5, 7, 9]}

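The difference comes from the fact that in itertools.product the rightmost element advances on every iteration, whereas in expand.grid the leftmost one does. If the order matters, you can tweak the function by sorting the product list appropriately.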
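
One possible way to get R's ordering, as a sketch rather than the answerer's own code: take the product of the inputs in reverse order and then reverse each tuple, so that the first input varies fastest. The name expandgrid_r_order is hypothetical.

```python
import itertools

def expandgrid_r_order(*itrs):
    # product() advances the rightmost input fastest, so feed the inputs
    # in reverse and flip each tuple back: the first input now varies fastest.
    rows = [tup[::-1] for tup in itertools.product(*reversed(itrs))]
    return {'Var{}'.format(i + 1): [row[i] for row in rows]
            for i in range(len(itrs))}

a = [1, 2, 3]
b = [5, 7, 9]
print(expandgrid_r_order(a, b))
# {'Var1': [1, 2, 3, 1, 2, 3, 1, 2, 3], 'Var2': [5, 5, 5, 7, 7, 7, 9, 9, 9]}
```

Note that the import is written as import itertools: from itertools import * binds names such as product but not the name itertools itself, which is what produces the NameError quoted in the question.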

score 19 · Accepted Answer

The pandas documentation defines an expand_grid function:

def expand_grid(data_dict):
    """Create a dataframe from every combination of given values."""
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())

For this code to work, you will need the following two imports:

import itertools
import pandas as pd

The output is a pandas.DataFrame, which is the closest Python equivalent of an R data.frame.
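
Putting the pieces together, a minimal usage sketch might look like this (the input values are illustrative only, and the column order assumes Python 3.7+ dict ordering):

```python
import itertools
import pandas as pd

def expand_grid(data_dict):
    """Create a dataframe from every combination of given values."""
    rows = itertools.product(*data_dict.values())
    return pd.DataFrame.from_records(rows, columns=data_dict.keys())

df = expand_grid({'height': [60, 70],
                  'weight': [100, 140, 180],
                  'sex': ['Male', 'Female']})

print(df.shape)           # (12, 3): 2 * 3 * 2 combinations
print(list(df.columns))   # ['height', 'weight', 'sex']
```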

score 18 · Accepted Answer

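I have wondered about this for a while and was not satisfied with the solutions put forward so far, so I came up with my own, which is considerably simpler (but probably slower). The function uses numpy.meshgrid to make the grid, then flattens the grids to 1-D arrays and puts them together: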

def expand_grid(x, y):
    xG, yG = np.meshgrid(x, y) # create the actual grid
    xG = xG.flatten() # make the grid 1d
    yG = yG.flatten() # same
    return pd.DataFrame({'x':xG, 'y':yG}) # return a dataframe

For example:

import numpy as np
import pandas as pd

p, q = np.linspace(1, 10, 10), np.linspace(1, 10, 10)

def expand_grid(x, y):
    xG, yG = np.meshgrid(x, y) # create the actual grid
    xG = xG.flatten() # make the grid 1d
    yG = yG.flatten() # same
    return pd.DataFrame({'x':xG, 'y':yG})

print(expand_grid(p, q).head(n=20))

I know this is an old post, but I thought I would share my simple version!

score 9 · Accepted Answer

Based on the solutions above, I did this:

import itertools
import pandas as pd

a = [1,2,3]
b = [4,5,6]
ab = list(itertools.product(a,b))
abdf = pd.DataFrame(ab,columns=("a","b"))

The result, abdf, is a DataFrame with columns a and b containing all 9 combinations.

score 5 · Accepted Answer

The ParameterGrid class from scikit-learn does the same as expand.grid (from R). Example:

from sklearn.model_selection import ParameterGrid
param_grid = {'a': [1,2,3], 'b': [5,7,9]}
expanded_grid = ParameterGrid(param_grid)

You can access the contents by converting it to a list:

list(expanded_grid)

Output:

[{'a': 1, 'b': 5},
 {'a': 1, 'b': 7},
 {'a': 1, 'b': 9},
 {'a': 2, 'b': 5},
 {'a': 2, 'b': 7},
 {'a': 2, 'b': 9},
 {'a': 3, 'b': 5},
 {'a': 3, 'b': 7},
 {'a': 3, 'b': 9}]

Accessing an element by index:

list(expanded_grid)[1]

You get something like this:

{'a': 1, 'b': 7}

Just to add some usage... you can pass each dictionary from the list printed above to a function via ** unpacking. Example:

def f(a,b): return((a+b, a-b))
list(map(lambda x: f(**x), list(expanded_grid)))

Output:

[(6, -4),
 (8, -6),
 (10, -8),
 (7, -3),
 (9, -5),
 (11, -7),
 (8, -2),
 (10, -4),
 (12, -6)]
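
If scikit-learn is not available, a similar list of dictionaries can be built with the standard library alone. This is a minimal sketch, not ParameterGrid's implementation; param_dicts is a hypothetical helper name.

```python
from itertools import product

def param_dicts(param_grid):
    """Hypothetical helper: one dict per combination of the values in param_grid."""
    keys = sorted(param_grid)                    # fix the key order
    value_lists = [param_grid[k] for k in keys]
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]

combos = param_dicts({'a': [1, 2, 3], 'b': [5, 7, 9]})
print(len(combos))  # 9
print(combos[1])    # {'a': 1, 'b': 7}
```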

score 4 · Accepted Answer

Here is another version, which returns a pandas.DataFrame:

import itertools as it
import pandas as pd

def expand_grid(*args, **kwargs):
    columns = []
    lst = []
    if args:
        columns += range(len(args))  # positional inputs get integer column names
        lst += args
    if kwargs:
        columns += kwargs.keys()     # keyword inputs keep their names
        lst += kwargs.values()
    return pd.DataFrame(list(it.product(*lst)), columns=columns)

print(expand_grid([0,1], [1,2,3]))
print(expand_grid(a=[0,1], b=[1,2,3]))
print(expand_grid([0,1], b=[1,2,3]))

score 4 · Accepted Answer

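pyjanitor's expand_grid() is arguably the most natural solution, especially if you come from an R background.

To use it, set the others argument to a dictionary. The items in the dictionary can have different lengths and types. The return value is a pandas DataFrame.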

import janitor as jn

jn.expand_grid(others = {
    'x': range(0, 4),
    'y': ['a', 'b', 'c'],
    'z': [False, True]
})

score 0 · Accepted Answer

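Have you tried product from itertools? In my opinion it is quite a bit easier to use than some of these methods (with the exception of pandas and meshgrid). Keep in mind that this setup pulls all the items from the iterator into a list and then converts it to an ndarray, so be careful with higher dimensions. For higher-dimensional grids, drop the np.asarray(list(points)) step unless you want to run out of memory; you can then pull specific combinations from the iterator instead. I highly recommend meshgrid for this, though: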

# Generate a square grid from an axis
from itertools import product
import numpy as np
a = np.array(list(range(3))) + 1  # axis shifted from 0-based to 1-based values
points = product(a, repeat=2)     # all ordered pairs (i, j), including i == j
np.asarray(list(points))          # convert to ndarray

I get the following output from this:

array([[1, 1],
       [1, 2],
       [1, 3],
       [2, 1],
       [2, 2],
       [2, 3],
       [3, 1],
       [3, 2],
       [3, 3]])
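
To illustrate the memory point, here is a sketch of consuming the iterator lazily instead of materializing every combination; axis and first_four are names made up for this example.

```python
from itertools import islice, product

axis = range(1, 4)                    # the values 1, 2, 3
points = product(axis, repeat=2)      # lazy iterator: no list of pairs is built

first_four = list(islice(points, 4))  # consume only the first four pairs
print(first_four)                     # [(1, 1), (1, 2), (1, 3), (2, 1)]
print(next(points))                   # (2, 2): iteration resumes where it stopped
```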

score 0 · Accepted Answer

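Here is a solution for an arbitrary number of heterogeneous column types. It is based on numpy.meshgrid. Thomas Browne's answer works for homogeneous column types. Nate's answer works for two columns.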

import pandas as pd
import numpy as np

def expand_grid(*xi, columns=None):
    """Expand 1-D arrays xi into a pd.DataFrame
    where each row is a unique combination of the xi.
    
    Args:
        x1, ..., xn (array_like): 1D-arrays to expand.
        columns (list, optional): Column names for the output
            DataFrame.
    
    Returns:
        Given vectors `x1, ..., xn` with lengths `Ni = len(xi)`
        a pd.DataFrame of shape (prod(Ni), n) where rows are:
        x1[0], x2[0], ..., xn-1[0], xn[0]
        x1[1], x2[0], ..., xn-1[0], xn[0]
        ...
        x1[N1 -1], x2[0], ..., xn-1[0], xn[0]
        x1[0], x2[1], ..., xn-1[0], xn[0]
        x1[1], x2[1], ..., xn-1[0], xn[0]
        ...
        x1[N1 - 1], x2[N2 - 1], ..., xn-1[Nn-1 - 1], xn[Nn - 1]
    """
    if columns is None:
        columns = pd.RangeIndex(0, len(xi))
    elif columns is not None and len(columns) != len(xi):
        raise ValueError(
            " ".join(["Expecting", str(len(xi)), "columns but", 
                str(len(columns)), "provided instead."])
        )
    return pd.DataFrame({
        coln: arr.flatten() for coln, arr in zip(columns, np.meshgrid(*xi))
    })
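
A short usage sketch with mixed column types follows. The function is restated in compact form (input validation omitted) so the snippet runs on its own, and the column names num and letter are illustrative.

```python
import numpy as np
import pandas as pd

def expand_grid(*xi, columns=None):
    # Compact restatement of the function above, without the length check.
    if columns is None:
        columns = pd.RangeIndex(0, len(xi))
    return pd.DataFrame({
        coln: arr.flatten() for coln, arr in zip(columns, np.meshgrid(*xi))
    })

df = expand_grid([1, 2], ["a", "b", "c"], columns=["num", "letter"])
print(df.shape)             # (6, 2)
print(df["num"].tolist())   # [1, 2, 1, 2, 1, 2]: the first input varies fastest
```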
