
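I have a Python 3.6 application that uses scikit-learn and is deployed to IBM Cloud (Cloud Foundry). It works fine. My local development environment is Mac OS High Sierra.

Recently, I added IBM Cloud Object Storage functionality (ibm_boto3) to the application. The COS functionality itself works well: I can upload, download, list, and delete objects with the ibm_boto3 library without any problems.

Strangely, the parts of the application that use scikit-learn now freeze.

If I comment out the ibm_boto3 import statements (and the corresponding code), the scikit-learn code works fine.

Even more confusing, the problem only occurs on the local development machine running OS X. When the application is deployed to IBM Cloud, it runs fine, with scikit-learn and ibm_boto3 working side by side.

At this point, our only hypothesis is that the ibm_boto3 library somehow surfaces a known issue in scikit-learn (see this: the parallel version of the K-means algorithm is broken on OS X when numpy uses the Accelerate framework). Note that we only ran into this problem after adding ibm_boto3 to the project.

However, we need to be able to test on localhost before deploying to IBM Cloud. Are there any known compatibility issues between ibm_boto3 and scikit-learn on Mac OS?

Any suggestions on how we can avoid this on the development machine?

Cheers.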


1 Answer


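So far, there aren't any known compatibility issues. :)

At some point there were problems with the vanilla SSL libraries that ship with OS X, but if you are able to read and write data, that isn't the issue here.

Are you using HMAC credentials? If so, I'm curious whether the behavior persists if you use the original boto3 library instead of the IBM fork.

Here's a simple example that shows how to use pandas with the original boto3: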
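One way to answer that question without freezing your own session is to run each combination in a fresh interpreter with a timeout. The following is only a sketch: the K-means snippet, the snippet names, and the 60-second timeout are my own placeholders, not your actual code, so swap in whatever call hangs for you.

```python
import subprocess
import sys

# Hypothetical reproduction body: replace it with the scikit-learn call
# that actually freezes in your application.
KMEANS_BODY = (
    "import numpy as np\n"
    "from sklearn.cluster import KMeans\n"
    "KMeans(n_clusters=3).fit(np.random.rand(200, 4))\n"
)

# The three cases worth comparing (names are illustrative only).
SNIPPETS = {
    "sklearn only": KMEANS_BODY,
    "boto3 + sklearn": "import boto3\n" + KMEANS_BODY,
    "ibm_boto3 + sklearn": "import ibm_boto3\n" + KMEANS_BODY,
}


def check_snippet(source, timeout=60.0):
    """Run `source` in a new interpreter; return 'ok', 'error', or 'timeout'."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed if it runs past this limit
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


def compare(timeout=60.0):
    """Run every snippet and collect the outcomes side by side."""
    return {name: check_snippet(src, timeout) for name, src in SNIPPETS.items()}
```

Calling `compare()` on the Mac returns one status per case. A "timeout" for the ibm_boto3 case alone would point at the fork; a "timeout" for both boto3 and ibm_boto3 would suggest something they share.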

import boto3  # package used to connect to IBM COS using the S3 API
import io  # python package used to stream data
import pandas as pd  # lightweight data analysis package

access_key = '<access key>'
secret_key = '<secret key>'
pub_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
pvt_endpoint = 'https://s3-api.us-geo.objectstorage.service.networklayer.com'
bucket = 'demo'  # the bucket holding the objects being worked on.
object_key = 'demo-data'  # the name of the data object being analyzed.
result_key = 'demo-data-results'  # the name of the output data object.


# First, we need to open a session and create a client that can connect to IBM COS.
# This client needs to know where to connect, the credentials to use,
# and what signature protocol to use for authentication. The endpoint
# can be specified to be public or private.
cos = boto3.client('s3', endpoint_url=pub_endpoint,
                   aws_access_key_id=access_key,
                   aws_secret_access_key=secret_key,
                   region_name='us',
                   config=boto3.session.Config(signature_version='s3v4'))

# Since we've already uploaded the dataset to be worked on into cloud storage,
# now we just need to identify which object we want to use. get_object returns
# a dict holding the response metadata plus a streaming body.
obj = cos.get_object(Bucket=bucket, Key=object_key)

# Now, because this is all REST API based, the actual contents of the file are
# transported in the request body, so we need to identify where to find the
# data stream containing the actual CSV file we want to analyze.
data = obj['Body'].read()

# Now we can read that data stream into a pandas dataframe.
df = pd.read_csv(io.BytesIO(data))

# This is just a trivial example, but we'll take that dataframe and just
# create a JSON document that contains the mean values for each column.
output = df.mean(axis=0, numeric_only=True).to_json()

# Now we can write that JSON file to COS as a new object in the same bucket.
cos.put_object(Bucket=bucket, Key=result_key, Body=output)
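If the hang does turn out to be the fork-related issue mentioned in the question, the workaround I know of from the scikit-learn FAQ is to stop multiprocessing from using the "fork" start method. Below is a minimal sketch using only the standard library. The helper name `choose_start_method` is made up for illustration, and whether this helps with the ibm_boto3 interaction is an assumption I have not verified. It also depends on your joblib version actually honoring the multiprocessing start method.

```python
import multiprocessing


def choose_start_method(preferred=("forkserver", "spawn")):
    """Switch multiprocessing away from 'fork' and return the method in use."""
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            # force=True allows changing a start method that was already set.
            multiprocessing.set_start_method(method, force=True)
            return method
    # None of the preferred methods exist on this platform; leave it unchanged.
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    # Call this once, before any scikit-learn code that runs in parallel.
    print(choose_start_method())
```

The call belongs under the `if __name__ == "__main__":` guard because "forkserver" and "spawn" re-import the main module in worker processes.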
answered 2018-05-07T13:19:08.333