tensorflow - Why is no PTX generated for a TensorFlow matrix multiplication op when I dump PTX and LLVM IR using XLA_FLAGS?

Question

I am trying to dump the HLO, LLVM IR, and PTX for the following TensorFlow 1.13.1 code:

import tensorflow as tf
import numpy as np

a = tf.placeholder(shape=(10,20), dtype=tf.float32)
b = tf.placeholder(shape=(20,10), dtype=tf.float32)
c = tf.placeholder(shape=(10,10), dtype=tf.float32)

jit_scope = tf.contrib.compiler.jit.experimental_jit_scope #using JIT compilation
with jit_scope():
    d = tf.matmul(a, b) + c

with tf.Session() as sess:
    print(sess.run(d, feed_dict={a:np.random.random((10,20)), b:np.random.random((20,10)), c:np.random.random((10,10))}))

I run the program with the following XLA_FLAGS:

XLA_FLAGS="--xla_generate_hlo_text_to=./path1 --xla_dump_ir_to=./path2" python source.py
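For reference, the same invocation can be sketched from Python. This is only a minimal sketch: the helper name build_xla_env is hypothetical, it uses only the two flags from the command above, and it assumes the script is still named source.py.

```python
import os

def build_xla_env(hlo_dir, ir_dir, base_env=None):
    """Return a copy of the environment with XLA_FLAGS set to the two dump flags.

    build_xla_env is a hypothetical helper name; the flags are the ones
    from the shell command above.
    """
    env = dict(os.environ if base_env is None else base_env)
    flags = [
        "--xla_generate_hlo_text_to=" + hlo_dir,  # HLO pass output
        "--xla_dump_ir_to=" + ir_dir,             # .ll and .ptx files
    ]
    env["XLA_FLAGS"] = " ".join(flags)
    return env

# Usage (assumes source.py is in the current directory):
#   import subprocess, sys
#   subprocess.run([sys.executable, "source.py"],
#                  env=build_xla_env("./path1", "./path2"), check=True)
```

Passing the environment to a child process keeps the flags out of the parent shell, so other runs are not affected.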

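The ./path1 directory contains the output of all HLO passes. However, the .ll and .ptx files in ./path2 do not contain any LLVM IR or PTX code. The .ptx file is empty, and the .ll file contains only the following lines: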

; ModuleID = 'cluster_0__XlaCompiledKernel_true__XlaNumConstantArgs_0__XlaNumResourceArgs_0_.12'
source_filename = "cluster_0__XlaCompiledKernel_true__XlaNumConstantArgs_0__XlaNumResourceArgs_0_.12"
target datalayout = "e-i64:64-i128:128-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

My question is: why do I not see any PTX code for the TensorFlow code above?

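I should mention that if I add the line d = d + d after the line that computes a * b + c (d = tf.matmul(a, b) + c) in the Python code above, XLA does generate some PTX code, but it covers only the addition and does not include the matrix multiplication.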
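To make the observation above easy to reproduce, here is a rough sketch of how the dump directory could be checked for files that carry no real code. The function name summarize_ir_dump is hypothetical, and the check is a heuristic that assumes LLVM IR function definitions begin with "define " and PTX kernels are declared with the ".entry" directive.

```python
import os

def summarize_ir_dump(dump_dir):
    """Map each .ll/.ptx file name in dump_dir to (size_in_bytes, has_code).

    Heuristic only: a .ll file counts as having code if it contains
    "define ", a .ptx file if it contains ".entry".
    """
    summary = {}
    for name in sorted(os.listdir(dump_dir)):
        path = os.path.join(dump_dir, name)
        if not os.path.isfile(path):
            continue
        if name.endswith(".ll"):
            marker = "define "
        elif name.endswith(".ptx"):
            marker = ".entry"
        else:
            continue  # ignore anything that is not an IR or PTX dump
        with open(path, "r", errors="replace") as f:
            text = f.read()
        summary[name] = (os.path.getsize(path), marker in text)
    return summary

# Usage: print one line per dumped file in ./path2
#   for name, (size, has_code) in summarize_ir_dump("./path2").items():
#       print(name, size, "has code" if has_code else "header only / empty")
```

With the dump described in this question, the .ll file holding only the module header and the empty .ptx file would both be reported as having no code.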

