I followed this guide to launch my PyTorch Lightning project on Google Colab TPUs. So I installed

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl

Then

!pip install pytorch-lightning

Then I ran

!pip install torch torchvision torchaudio 
!pip install -r requirements.txt

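After installing the project requirements, I restarted the runtime as prompted and re-ran the cloud-tpu-client installation, the pytorch-lightning installation, and both of these commands from above. Everything ran smoothly.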

But right after the TPU started up with PyTorch version 1.9, I got the following import error:

WARNING:root:TPU has started up successfully with version pytorch-1.9
Traceback (most recent call last):
  File "synthesizer_train.py", line 2, in <module>
    from synthesizer.train import train
  File "/content/Real-Time-Voice-Cloning/synthesizer/train.py", line 6, in <module>
    from synthesizer.models.tacotron import Tacotron
  File "/content/Real-Time-Voice-Cloning/synthesizer/models/tacotron.py", line 7, in <module>
    import pytorch_lightning as pl
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
    from pytorch_lightning.callbacks.base import Callback
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/base.py", line 26, in <module>
    from pytorch_lightning.utilities.types import STEP_OUTPUT
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 26, in <module>
    from pytorch_lightning.utilities.imports import _compare_version, _TORCHTEXT_AVAILABLE
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/imports.py", line 101, in <module>
    from pytorch_lightning.utilities.xla_device import XLADeviceUtils  # noqa: E402
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/xla_device.py", line 24, in <module>
    import torch_xla.core.xla_model as xm
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/__init__.py", line 142, in <module>
    import _XLAC
ImportError: /usr/local/lib/python3.7/dist-packages/_XLAC.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at13_foreach_erf_EN3c108ArrayRefINS_6TensorEEE

The Trainer is launched with the flag tpu_cores=8.

The model had already run on CPU and GPU beforehand (in a different session).

I tried downgrading PyTorch to 1.9 (the same version reported when the TPU starts up), since Colab was using torch 1.10.0+cu111, and got a different error:

WARNING:root:TPU has started up successfully with version pytorch-1.9
Traceback (most recent call last):
  File "synthesizer_train.py", line 2, in <module>
    from synthesizer.train import train
  File "/content/Real-Time-Voice-Cloning/synthesizer/train.py", line 6, in <module>
    from synthesizer.models.tacotron import Tacotron
  File "/content/Real-Time-Voice-Cloning/synthesizer/models/tacotron.py", line 7, in <module>
    import pytorch_lightning as pl
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning.callbacks import Callback  # noqa: E402
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/__init__.py", line 14, in <module>
    from pytorch_lightning.callbacks.base import Callback
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/callbacks/base.py", line 26, in <module>
    from pytorch_lightning.utilities.types import STEP_OUTPUT
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 18, in <module>
    from pytorch_lightning.utilities.apply_func import move_data_to_device  # noqa: F401
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 29, in <module>
    if _compare_version("torchtext", operator.ge, "0.9.0"):
  File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/imports.py", line 54, in _compare_version
    pkg = importlib.import_module(package)
  File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.7/dist-packages/torchtext/__init__.py", line 5, in <module>
    from . import vocab
  File "/usr/local/lib/python3.7/dist-packages/torchtext/vocab/__init__.py", line 11, in <module>
    from .vocab_factory import (
  File "/usr/local/lib/python3.7/dist-packages/torchtext/vocab/vocab_factory.py", line 4, in <module>
    from torchtext._torchtext import (
ImportError: /usr/local/lib/python3.7/dist-packages/torchtext/_torchtext.so: undefined symbol: _ZTVN5torch3jit6MethodE

Is there anything I can do to train the model on TPUs?

Thanks a lot

1 Answer

Actually, the same problem has already been described elsewhere, and the suggested solution did work for me.

In detail, they suggest downgrading PyTorch to 1.9.0+cu111 (note the +cu111) after installing torch_xla.
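Why the exact version matters: the torch_xla 1.9 wheel is built against PyTorch 1.9, so pairing it with Colab's torch 1.10.0 leaves _XLAC referencing a symbol the installed libtorch does not export, which is the "undefined symbol" error in the question. A minimal sketch of that compatibility rule, assuming that comparing only the major.minor part of the two version strings (ignoring local labels such as +cu111) is enough; major_minor and xla_matches_torch are hypothetical helpers, not part of torch or torch_xla:

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings like '1.10.0+cu111' or '1.9'."""
    # Drop the local version label (e.g. '+cu111') before parsing.
    public = version.split("+", 1)[0]
    match = re.match(r"(\d+)\.(\d+)", public)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def xla_matches_torch(torch_version: str, xla_version: str) -> bool:
    """Assumed rule: torch_xla X.Y needs torch X.Y (local labels ignored)."""
    return major_minor(torch_version) == major_minor(xla_version)


if __name__ == "__main__":
    # Colab's default torch vs. the torch_xla 1.9 wheel from the question.
    print(xla_matches_torch("1.10.0+cu111", "1.9"))
    # The pinned torch from this answer.
    print(xla_matches_torch("1.9.0+cu111", "1.9"))
```

Run as a script, it prints False for the question's original pairing and True for the pinned one.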

So here are the steps I followed to launch my Lightning project on Google Colab with TPUs:

!pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.9-cp37-cp37m-linux_x86_64.whl
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchtext==0.10.0 -f https://download.pytorch.org/whl/cu111/torch_stable.html

Then the project's pip installs:

!pip install torch torchvision torchaudio pytorch-lightning
!pip install -r requirements.txt

It still worked even after this last step, although I had to restart the runtime.
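Because the later unpinned "pip install torch torchvision torchaudio pytorch-lightning" and requirements.txt steps could in principle pull a newer torch back in, it may be worth confirming the installed versions after the restart. A minimal sketch using the standard library's importlib.metadata, assuming the importlib_metadata backport is available on Python 3.7; installed_versions, mismatches and the EXPECTED table are hypothetical names, with the expected values taken from the pinned pip command above:

```python
from typing import Dict, Iterable, List, Optional

try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: assumes the importlib_metadata backport is installed
    import importlib_metadata as metadata

# Distributions whose versions have to line up with the torch_xla 1.9 wheel.
PACKAGES = ("torch", "torch_xla", "torchvision", "torchtext", "torchaudio")

# Public versions pinned in the answer; torchaudio is left out because the
# answer installs it unpinned.
EXPECTED = {
    "torch": "1.9.0",
    "torchvision": "0.10.0",
    "torchtext": "0.10.0",
}


def installed_versions(names: Iterable[str] = PACKAGES) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def mismatches(found: Dict[str, Optional[str]],
               expected: Dict[str, str] = EXPECTED) -> List[str]:
    """Return names whose installed public version differs from the expected one."""
    bad = []
    for name, want in expected.items():
        have = found.get(name)
        # Compare without the local label, e.g. '1.9.0+cu111' -> '1.9.0'.
        if have is None or have.split("+", 1)[0] != want:
            bad.append(name)
    return bad


if __name__ == "__main__":
    found = installed_versions()
    for name, version in found.items():
        print(f"{name:12} {version or 'not installed'}")
    print("mismatched:", mismatches(found) or "none")
```

Reading package metadata instead of importing torch_xla avoids triggering the very ImportError being diagnosed.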

answered 2021-11-29T13:01:48.863