
I built a Docker image for SciBERT for CPU following the instructions here: https://www.semi.technology/developers/weaviate/current/modules/text2vec-transformers.html#option-3-custom-build-with-a-private-or-local-model

Here is the Dockerfile:

FROM semitechnologies/transformers-inference:custom
RUN MODEL_NAME=allenai/scibert_scivocab_uncased ./download.py
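
The compose files below reference this image as scibert-inference, so the build step presumably tags it with that name. A minimal sketch of the build command, assuming the Dockerfile above sits in the current directory (the tag itself is inferred from the compose files):

docker build -t scibert-inference .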

Here is the docker-compose.yaml for CPU:

version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.7.2
    restart: on-failure:0
    ports:
     - "8080:8080"
    environment:
      QUERY_DEFAULTS_LIMIT: 20
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: "./data"
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      ENABLE_MODULES: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
  t2v-transformers:
    image: scibert-inference
    environment:
      ENABLE_CUDA: 0
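
For context, a stack like this would typically be brought up and sanity-checked with standard compose commands; a hedged sketch (the readiness endpoint is Weaviate's standard /v1/.well-known/ready on the mapped port):

docker-compose up -d
curl localhost:8080/v1/.well-known/ready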

I can run this on the CPU without errors. Now I need to run it on the GPU with this docker-compose.yaml:

version: '3.4'
services:
  weaviate:
    image: semitechnologies/weaviate:1.10.1
    restart: on-failure:0
    ports:
     - "8080:8080"
    environment:
      QUERY_DEFAULTS_LIMIT: 20
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: "./data"
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      ENABLE_MODULES: text2vec-transformers
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
  t2v-transformers:
    image: scibert-inference
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: 'all'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities: 
            - 'gpu'

When I run it on the GPU, I get the following error:

weaviate_1          | {"action":"transformer_remote_wait_for_startup","error":"send check ready request: Get \"http://t2v-transformers:8080/.well-known/ready\": dial tcp: lookup t2v-transformers on 127.0.0.11:53: no such host","level":"warning","msg":"transformer remote inference service not ready","time":"2022-02-08T03:47:54Z"}

My system specs are as follows:

Cuda compilation tools, release 9.0, V9.0.176
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:56:47 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:58 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
docker-compose version 1.29.2, build 5becea4c

1 Answer


Did you set the CUDA_CORE environment variable? Weaviate uses this environment variable to pinpoint the CUDA device; if CUDA_CORE is not set, it defaults to cuda:0.

If you run this with docker-compose with gpu capabilities, you probably need to target your GPU device explicitly. You therefore have to override the CUDA_CORE environment variable to something like CUDA_CORE=GPU:0, as in the example on the Docker page.
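
For example, a hedged sketch of the t2v-transformers service from the question with CUDA_CORE overridden (assuming GPU:0 is indeed the right device address on your machine):

  t2v-transformers:
    image: scibert-inference
    environment:
      ENABLE_CUDA: '1'
      NVIDIA_VISIBLE_DEVICES: 'all'
      CUDA_CORE: 'GPU:0'
    deploy:
      resources:
        reservations:
          devices:
          - capabilities:
            - 'gpu'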

I don't know whether your GPU device address is really GPU:0 (it is not cuda:0, since the container did not start with that); you can check it in your docker-compose logs.
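
Those logs can be inspected with a standard compose command, for example:

docker-compose logs t2v-transformers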

Answered 2022-02-11T09:07:46.150