I am currently trying to deploy a pipeline on Kubeflow, but every time I start it, it returns:

This step is in Failed state with this message: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"python /usr/src/app/FeatureExtractor.py\": stat python /usr/src/app/FeatureExtractor.py: no such file or directory": unknown

This is my pipeline. It currently fails on all of the fe-* components, which are required to run the other components.

[Image: pipeline]

The Dockerfile for the image used by these components is:

FROM python:2
COPY FeatureExtractor.py /usr/src/app/
COPY FE_freeze.pb /usr/src/app/
COPY DB /usr/src/app/
RUN pip install opencv-python==4.2.0.32
RUN pip install imutils
RUN pip install image
RUN pip install tensorflow==1.15

And the pipeline is created by this Python script:

import kfp
from kfp import dsl

def feature_extractor(name, images, result):
    images = "--path_imgs={}".format(images)
    result = "--res_name={}".format(result)

    return dsl.ContainerOp(
        name=name,
        image='texdade/feature-extractor',
        command=['python /usr/src/app/FeatureExtractor.py'],
        arguments=['--pretrained_model="/usr/src/app/FE_freeze.pb"', images, result],
        file_outputs={
            'feature_vector':result,
        }
    )

def train_mv(name, primary, secondary, non_members, result):
    primary = "--members={}".format(primary)
    secondary = "--other_members={}".format(secondary)
    non_members = "--non_members={}".format(non_members)

    return dsl.ContainerOp(
        name=name,
        image='texdade/train-mv',
        command=['python /usr/src/app/Train-MV.py'],
        arguments=[primary, secondary, non_members, result],
        file_outputs={
            'model':result,
        }
    )

def test_mv(model_a, model_b, test_imgs):
    model_a = "--MV_A={}".format(model_a)
    model_b = "--MV_B={}".format(model_b)
    test_imgs = "--test_imgs={}".format(test_imgs)

    return dsl.ContainerOp(
        name="Test models",
        image="texdade/test-mv",
        command=['python /usr/src/app/Test-MV.py'],
        arguments=[model_a, model_b, test_imgs]
    )

@dsl.pipeline(
    name='First pipeline',
    description='FP'
)
def first_pipeline():
    FE_A = feature_extractor('FE members A', "/usr/src/app/DB/A/", "/usr/src/app/A.npz")
    FE_B = feature_extractor('FE members B', "/usr/src/app/DB/B/", "/usr/src/app/B.npz")
    FE_N = feature_extractor('FE Non members', "/usr/src/app/DB/N/", "/usr/src/app/N.npz")
    FE_Test = feature_extractor('FE Test dataset', "/usr/src/app/DB/Test", "/usr/src/app/Test.npz")
    train_a = train_mv("Train members A", FE_A.output, FE_B.output, FE_N.output, "/usr/src/app/A.pb")
    train_b = train_mv("Train members B", FE_B.output, FE_A.output, FE_N.output, "/usr/src/app/B.pb")
    test = test_mv(train_a.output, train_b.output, FE_Test.output)

if __name__ == '__main__':
    kfp.compiler.Compiler().compile(first_pipeline, __file__+ '.yaml')

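The problem seems to be that FeatureExtractor.py cannot be found in the container, which is strange, because the script does run when I start the container manually (without Kubeflow).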
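One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the message quotes the whole string "python /usr/src/app/FeatureExtractor.py" as the thing being exec'd. A Kubernetes container `command` is a list in which the first element is the executable and no shell splits it, so a single-element list like the one passed to `dsl.ContainerOp` would be looked up as one file name containing a space. A manual launch from a shell would split the tokens first, which might explain why it works there. The helper `to_exec_form` below is hypothetical and not part of kfp; it only illustrates what the split list would look like.

```python
import shlex


def to_exec_form(command):
    """Split each shell-style string into separate argv tokens.

    Hypothetical helper for illustration only; it is not part of kfp.
    """
    argv = []
    for part in command:
        argv.extend(shlex.split(part))
    return argv


# The command as written in the pipeline: one list element, so the runtime
# treats the entire string (space included) as the executable's name.
original = ['python /usr/src/app/FeatureExtractor.py']

# Exec form: the interpreter and the script path are separate elements.
fixed = to_exec_form(original)
print(fixed)  # ['python', '/usr/src/app/FeatureExtractor.py']
```

Under this assumption, the three `command=` lines would become `command=['python', '/usr/src/app/FeatureExtractor.py']` and the equivalents for Train-MV.py and Test-MV.py. The inner double quotes in `--pretrained_model="/usr/src/app/FE_freeze.pb"` may also need to go, since without a shell they would reach the script as literal characters.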

Do you have any ideas on how to solve this? Thanks in advance! :)
