c - Does Ruby's Enumerable#zip create arrays internally?

Question

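It has been said that "the problem with zip is that it creates arrays internally, no matter what Enumerable you pass. There's another problem with the length of input params."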

I looked at the implementation of Enumerable#zip in YARV and saw:

static VALUE
enum_zip(int argc, VALUE *argv, VALUE obj)
{
    int i;
    ID conv;
    NODE *memo;
    VALUE result = Qnil;
    VALUE args = rb_ary_new4(argc, argv);
    int allary = TRUE;

    argv = RARRAY_PTR(args);
    for (i=0; i<argc; i++) {
        VALUE ary = rb_check_array_type(argv[i]);
        if (NIL_P(ary)) {
            allary = FALSE;
            break;
        }
        argv[i] = ary;
    }
    if (!allary) {
        CONST_ID(conv, "to_enum");
        for (i=0; i<argc; i++) {
            argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
        }
    }
    if (!rb_block_given_p()) {
        result = rb_ary_new();
    }
    /* use NODE_DOT2 as memo(v, v, -) */
    memo = rb_node_newnode(NODE_DOT2, result, args, 0);
    rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);

    return result;
}

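Am I understanding the following correctly?

Check whether all of the arguments are arrays and, if so, replace some indirect references to the arrays with direct references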

    for (i=0; i<argc; i++) {
        VALUE ary = rb_check_array_type(argv[i]);
        if (NIL_P(ary)) {
            allary = FALSE;
            break;
        }
        argv[i] = ary;
    }

If they aren't all arrays, create an enumerator for each argument instead

    if (!allary) {
        CONST_ID(conv, "to_enum");
        for (i=0; i<argc; i++) {
            argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
        }
    }

Create an array of arrays only if no block is given

    if (!rb_block_given_p()) {
        result = rb_ary_new();
    }

If everything is an array, use zip_ary, otherwise use zip_i, and call a block on each set of values

    /* use NODE_DOT2 as memo(v, v, -) */
    memo = rb_node_newnode(NODE_DOT2, result, args, 0);
    rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);

Return an array of arrays if no block is given, otherwise return nil (Qnil)?

    return result;
}

score 7 · Accepted Answer

I'll be using 1.9.2-p0 as that's what I have handy.

The rb_check_array_type function looks like this:

VALUE
rb_check_array_type(VALUE ary)
{
    return rb_check_convert_type(ary, T_ARRAY, "Array", "to_ary");  
}

And rb_check_convert_type looks like this:

VALUE
rb_check_convert_type(VALUE val, int type, const char *tname, const char *method)
{
    VALUE v;

    /* always convert T_DATA */
    if (TYPE(val) == type && type != T_DATA) return val;
    v = convert_type(val, tname, method, FALSE);
    if (NIL_P(v)) return Qnil;
    if (TYPE(v) != type) {
        const char *cname = rb_obj_classname(val);
        rb_raise(rb_eTypeError, "can't convert %s to %s (%s#%s gives %s)",
                 cname, tname, cname, method, rb_obj_classname(v));
    }
    return v;
}

Note the convert_type call. This looks a lot like a C version of Array.try_convert, and try_convert just happens to look like this:

/*   
 *  call-seq:
 *     Array.try_convert(obj) -> array or nil
 *
 *  Try to convert <i>obj</i> into an array, using +to_ary+ method. 
 *  Returns converted array or +nil+ if <i>obj</i> cannot be converted
 *  for any reason. This method can be used to check if an argument is an
 *  array.
 *   
 *     Array.try_convert([1])   #=> [1]
 *     Array.try_convert("1")   #=> nil
 *
 *     if tmp = Array.try_convert(arg)
 *       # the argument is an array
 *     elsif tmp = String.try_convert(arg)
 *       # the argument is a string
 *     end
 *
 */
static VALUE
rb_ary_s_try_convert(VALUE dummy, VALUE ary)
{
    return rb_check_array_type(ary);
}

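So, yes, the first loop is looking for anything in argv that is not an array (and cannot be converted to one with to_ary), and it sets the allary flag to FALSE if it finds such a thing.

In enum.c, we see this: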
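The same check can be watched from the Ruby side through Array.try_convert, since the source above shows it is a thin wrapper around rb_check_array_type. This is only a small sketch; the ArrayLike class is hypothetical and exists purely to exercise the to_ary path:

```ruby
# Hypothetical class used only for illustration: it is not an Array,
# but it can convert itself to one via to_ary.
class ArrayLike
  def to_ary
    [1, 2, 3]
  end
end

already_array = Array.try_convert([1])            # an Array is returned as-is
converted     = Array.try_convert(ArrayLike.new)  # to_ary is called
not_array     = Array.try_convert(1..3)           # Range has no to_ary, so nil

p already_array, converted, not_array
```

Anything that comes back as nil here is what would make enum_zip flip allary to FALSE.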

id_each = rb_intern("each");

So id_each is an internal reference to the Ruby each iterator method. And in vm_eval.c, we have this:

/*!  
 * Calls a method 
 * \param recv   receiver of the method
 * \param mid    an ID that represents the name of the method
 * \param n      the number of arguments
 * \param ...    arbitrary number of method arguments  
 *
 * \pre each of arguments after \a n must be a VALUE.
 */
VALUE
rb_funcall(VALUE recv, ID mid, int n, ...)

So this:

argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));

is calling to_enum (with, essentially, the default argument) on whatever is in argv[i].
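In Ruby terms that C call is roughly the following. A minimal sketch, with variable names of my own choosing:

```ruby
numbers = [10, 20, 30]

# to_enum with :each passed explicitly, which is what the C code does...
explicit = numbers.to_enum(:each)
# ...and with no argument, where :each is the default.
implicit = numbers.to_enum

first          = explicit.next  # pulls one element from the underlying each
second         = explicit.next
implicit_first = implicit.next  # an independent enumerator, starts from the top

p explicit.class, first, second, implicit_first
```

Wrapping does not copy the array; the enumerator just holds a reference to the receiver and the method name.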

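So, the end result of the first for and if blocks is that argv is either full of arrays or full of enumerators, rather than possibly a mix of the two. But note how the logic works: if something is found that isn't an array, then everything becomes an enumerator. The first part of the enum_zip function will wrap arrays in enumerators (which is essentially free, or at least cheap enough not to worry about) but won't expand enumerators into arrays (which could be quite expensive). Earlier versions might have gone the other way (preferring arrays over enumerators); I'll leave that as an exercise for the reader or historians.

Next part: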
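One way to convince yourself that a non-array argument is not expanded is to hand zip an endless source and count how much of it gets consumed. This is only a sketch; CountingSource is a made-up class, and the receiver is a Range so that the call goes through Enumerable#zip rather than Array#zip:

```ruby
# Hypothetical endless source that counts how many elements it has produced.
class CountingSource
  include Enumerable
  attr_reader :produced

  def initialize
    @produced = 0
  end

  def each
    return to_enum(:each) unless block_given?
    loop do
      @produced += 1
      yield @produced
    end
  end
end

source = CountingSource.new
# Range does not define its own zip, so this is Enumerable#zip.
pairs = (1..3).zip(source)

p pairs            # each receiver element paired with one element from source
p source.produced  # stays small: the endless source was never expanded
```

If the argument were converted to an array first, this call would never return.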

if (!rb_block_given_p()) {
    result = rb_ary_new();
}

creates a new empty array and leaves it in result if zip is being called without a block. And here we should note what zip returns:

enum.zip(arg, ...) → an_array_of_array
enum.zip(arg, ...) {|arr| block } → nil

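If there is a block, then there is nothing to return and result can stay as Qnil; if there isn't a block, then we need an array in result so that an array can be returned.

From parse.c, we see that NODE_DOT2 is a double-dot range, but it looks like they're just using the new node as a simple three-element struct; rb_node_newnode just allocates an object, sets some bits, and assigns three values in a struct: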
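The two return shapes are easy to check from Ruby. A quick sketch using ranges as the enumerables:

```ruby
collected = []

# No block: zip builds and returns the array of arrays.
without_block = (1..3).zip(4..6)

# Block form: each tuple is handed to the block and zip itself returns nil.
with_block = (1..3).zip(4..6) { |pair| collected << pair }

p without_block
p with_block
p collected
```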

NODE*
rb_node_newnode(enum node_type type, VALUE a0, VALUE a1, VALUE a2)
{
    NODE *n = (NODE*)rb_newobj();

    n->flags |= T_NODE;
    nd_set_type(n, type);

    n->u1.value = a0;
    n->u2.value = a1;
    n->u3.value = a2;

    return n;
}

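nd_set_type is just a bit-fiddling macro. Now we have memo as just a three-element struct. This use of NODE_DOT2 appears to be a convenient kludge.

The rb_block_call function appears to be the core internal iterator. And we see our friend id_each again, so we'll be doing an each iteration. Then we see a choice between zip_i and zip_ary; this is where the inner arrays are created and pushed onto result. The only difference between zip_i and zip_ary appears to be the StopIteration exception handling in zip_i.

At this point we've done the zipping, and we either have the array of arrays in result (if there was no block) or we have Qnil in result (if there was a block).

Executive summary: The first loop explicitly avoids expanding enumerators into arrays. The zip_i and zip_ary calls only work with non-temporary arrays if they have to build an array of arrays as a return value. So, if you call zip with at least one non-array enumerator and use the block form, then it is enumerators all the way down and the "problem with zip is that it creates arrays internally" does not happen. Reviewing 1.8 or other Ruby implementations is left as an exercise for the reader.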
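To make the enumerator path concrete, here is a rough Ruby model of what the zip_i callback does on each iteration. It is illustrative only: zip_sketch is a made-up name, and the real work happens in C, not like this:

```ruby
# Illustrative model only: zip_sketch is not part of Ruby.
def zip_sketch(receiver, *others)
  enums  = others.map { |other| other.to_enum(:each) }
  result = block_given? ? nil : []

  receiver.each do |value|
    tuple = [value]  # a small per-iteration array is still created
    enums.each_with_index do |enum, i|
      if enum.nil?
        tuple << nil  # this argument ran out earlier
        next
      end
      begin
        tuple << enum.next
      rescue StopIteration
        enums[i] = nil  # exhausted: pad with nil from now on
        tuple << nil
      end
    end

    if block_given?
      yield tuple
    else
      result << tuple
    end
  end

  result
end

padded = zip_sketch(1..4, [10, 20])
p padded
```

The short argument is padded with nil once it raises StopIteration, which matches what the built-in zip returns for the same inputs.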
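A sketch of the block form with inputs that could never be turned into arrays: both the receiver and the argument are endless here, and the names are mine. Breaking out of the block is the only thing that stops the iteration:

```ruby
naturals = 1..Float::INFINITY

# An endless generator of squares; it only produces values on demand.
squares = Enumerator.new do |yielder|
  n = 1
  loop do
    yielder << n * n
    n += 1
  end
end

seen = []
outcome = naturals.zip(squares) do |n, square|
  seen << [n, square]
  break :stopped_early if n == 5
end

p seen
p outcome
```

Without the block this call would try to build an endless result array, so the block form is what keeps it streaming.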
