I am trying to use tensorboardX to debug a PyTorch NN running on an AWS p2.xlarge instance.

I followed this tutorial to open port 6006.

The model is running, and tensorboardX is producing its writer files. While it runs I get the following warning. I am not sure how relevant it is.

WARNING:root:tuple appears in op that does not forward tuples (VisitNode at /pytorch/torch/csrc/jit/passes/lower_tuples.cpp:117)
frame #0: std::function::operator()() const + 0x11 (0x7fbe3dd04441 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fbe3dd03d7a in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0xaf61f5 (0x7fbe3cdc41f5 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #3: + 0xaf6464 (0x7fbe3cdc4464 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #4: torch::jit::LowerAllTuples(std::shared_ptr&) + 0x13 (0x7fbe3cdc44a3 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: + 0x3f84b4 (0x7fbe7d2cb4b4 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x130cfc (0x7fbe7d003cfc in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #40: __libc_start_main + 0xf0 (0x7fbe8d69c830 in /lib/x86_64-linux-gnu/libc.so.6)

The problem is that I cannot access the TensorBoard browser UI. I take the following steps:

$ cd PATH_TO_FOLDER_CONTAINING_runs
$ source activate pytorch_p36
$ tensorboard --logdir=runs

at which point I get the error message:

Segmentation fault (core dumped)

When I check the system log at /var/log/syslog, I see the following:

Jun 26 09:06:40 ip-172-xx-xx-xxx kernel: [515315.598917] tensorboard[1446]: segfault at 0 ip (null) sp 00007ffd64c5f178 error 14 in python2.7 [55d8673d1000+1000]

My Google skills have fallen short here. How can I access TensorBoard through the browser when it runs on an AWS instance?

Please let me know if anything is unclear or if some information is missing.

1 Answer

Although the training code has to run in the pytorch_p36 environment, tensorboard itself has to be run from a different environment.
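If you want to check which interpreter a given environment's tensorboard command would use, a minimal sketch follows. It assumes the tensorboard entry point is a script that starts with a "#!" interpreter line, which may not hold for every install; script_interpreter is a hypothetical helper name, not part of TensorBoard.

```shell
#!/bin/sh
# Print the first line (the "#!" interpreter line) of a script file.
# script_interpreter is a hypothetical helper name, not part of TensorBoard.
script_interpreter() {
    head -n 1 "$1"
}

# Only inspect tensorboard if the active environment actually provides it.
if command -v tensorboard >/dev/null 2>&1; then
    echo "tensorboard resolves to: $(command -v tensorboard)"
    echo "interpreter line: $(script_interpreter "$(command -v tensorboard)")"
else
    echo "tensorboard is not on PATH in this environment"
fi
```

Running this once after each `source activate ...` shows whether the command comes from the environment you expect. The syslog line in the question reports the crash "in python2.7", so comparing the two outputs may help explain the mismatch.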

The sequence of commands in the terminal should be:

$ cd PATH_TO_FOLDER_CONTAINING_runs
$ source activate tensorflow_p27
$ tensorboard --logdir=runs

Then open TensorBoard in the browser at the specified port.
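As a rough sketch of what "open at the specified port" can look like: the snippet below only builds the address to paste into the browser. EC2_PUBLIC_DNS is a placeholder for your instance's public address, tensorboard_url is a hypothetical helper, and the ubuntu login name in the comment is an assumption based on the /home/ubuntu paths in the question. 6006 is TensorBoard's default port.

```shell
#!/bin/sh
# Build the address to open in the browser.
# tensorboard_url is a hypothetical helper; the port defaults to 6006.
tensorboard_url() {
    printf 'http://%s:%s\n' "$1" "${2:-6006}"
}

# EC2_PUBLIC_DNS is a placeholder for the instance's public address.
# This form needs port 6006 to be open in the instance's security group.
tensorboard_url "EC2_PUBLIC_DNS"

# Alternative that avoids exposing the port: forward it over SSH from the
# local machine, then browse to the localhost address instead.
#   ssh -N -L 6006:localhost:6006 ubuntu@EC2_PUBLIC_DNS
tensorboard_url "localhost"
```

If you start tensorboard on a different port with `--port`, pass that port as the second argument.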

Answered 2019-06-27T14:29:35.913