I'm currently trying to deploy a pipeline on Kubeflow, but every time I start it, it returns:
This step is in Failed state with this message: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"python /usr/src/app/FeatureExtractor.py\": stat python /usr/src/app/FeatureExtractor.py: no such file or directory": unknown
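This is my pipeline: it currently fails on all of the fe-* components, which the other components depend on.
The Dockerfile for the image used by these components is: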
FROM python:2
COPY FeatureExtractor.py /usr/src/app/
COPY FE_freeze.pb /usr/src/app/
COPY DB /usr/src/app/
RUN pip install opencv-python==4.2.0.32
RUN pip install imutils
RUN pip install image
RUN pip install tensorflow==1.15
And the pipeline is created by this Python script:
import kfp
from kfp import dsl
def feature_extractor(name, images, result):
    images = "--path_imgs={}".format(images)
    result = "--res_name={}".format(result)
    return dsl.ContainerOp(
        name=name,
        image='texdade/feature-extractor',
        command=['python /usr/src/app/FeatureExtractor.py'],
        arguments=['--pretrained_model="/usr/src/app/FE_freeze.pb"', images, result],
        file_outputs={
            'feature_vector':result,
        }
    )
def train_mv(name, primary, secondary, non_members, result):
    primary = "--members={}".format(primary)
    secondary = "--other_members={}".format(secondary)
    non_members = "--non_members={}".format(non_members)
    return dsl.ContainerOp(
        name=name,
        image='texdade/train-mv',
        command=['python /usr/src/app/Train-MV.py'],
        arguments=[primary, secondary, non_members, result],
        file_outputs={
            'model':result,
        }
    )
def test_mv(model_a, model_b, test_imgs):
    model_a = "--MV_A={}".format(model_a)
    model_b = "--MV_B={}".format(model_b)
    test_imgs = "--test_imgs={}".format(test_imgs)
    return dsl.ContainerOp(
        name="Test models",
        image="texdade/test-mv",
        command=['python /usr/src/app/Test-MV.py'],
        arguments=[model_a, model_b, test_imgs]
    )
@dsl.pipeline(
    name='First pipeline',
    description='FP'
)
def first_pipeline():
    FE_A = feature_extractor('FE members A', "/usr/src/app/DB/A/", "/usr/src/app/A.npz")
    FE_B = feature_extractor('FE members B', "/usr/src/app/DB/B/", "/usr/src/app/B.npz")
    FE_N = feature_extractor('FE Non members', "/usr/src/app/DB/N/", "/usr/src/app/N.npz")
    FE_Test = feature_extractor('FE Test dataset', "/usr/src/app/DB/Test", "/usr/src/app/Test.npz")
    train_a = train_mv("Train members A", FE_A.output, FE_B.output, FE_N.output, "/usr/src/app/A.pb")
    train_b = train_mv("Train members B", FE_B.output, FE_A.output, FE_N.output, "/usr/src/app/B.pb")
    test = test_mv(train_a.output, train_b.output, FE_Test.output)
if __name__ == '__main__':
    kfp.compiler.Compiler().compile(first_pipeline, __file__ + '.yaml')
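The problem seems to be that FeatureExtractor.py cannot be found in the container, which is strange, because when I start the container manually (without Kubeflow) the script runs.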
Do you have any ideas on how to solve this? Thanks in advance! :)
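One detail in the error message may be worth checking. It quotes the whole string "python /usr/src/app/FeatureExtractor.py" as the thing being exec'd and stat'ed, which suggests the runtime is looking for a single executable with that literal name (space included) rather than running python with the script as its argument. The sketch below is only an illustration of that reading, not a confirmed diagnosis: it uses the standard-library shlex module to show how the one-element command list from the pipeline differs from a list with one token per element. split_command is a hypothetical helper made up for this example and is not part of kfp.

```python
import shlex

# The command exactly as written in the pipeline above: ONE list element,
# so an exec-style launcher would see it as a single executable name.
command_as_written = ['python /usr/src/app/FeatureExtractor.py']


def split_command(command):
    """Hypothetical helper: split each element on shell-style whitespace
    so that every token becomes its own argv entry."""
    argv = []
    for element in command:
        argv.extend(shlex.split(element))
    return argv


command_split = split_command(command_as_written)

# One entry: the "executable" would be the whole string, space included.
print(len(command_as_written), command_as_written)
# Two entries: the interpreter, then the script path as its first argument.
print(len(command_split), command_split)
```

If this reading is right, passing the interpreter and the script path as two separate list elements in command would be the thing to try first. It would also be consistent with the container working when started by hand, since a shell splits the command line into separate words before executing it.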