python - ibm_boto3 compatibility issue with scikit-learn on Mac OS

Question

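I have a Python 3.6 application that uses scikit-learn and is deployed to IBM Cloud (Cloud Foundry). It works fine. My local development environment is Mac OS High Sierra.

Recently, I added IBM Cloud Object Storage functionality (ibm_boto3) to the application. The COS functionality itself works well: I can upload, download, list, and delete objects with the ibm_boto3 library.

Strangely, the part of the application that uses scikit-learn now freezes.

If I comment out the ibm_boto3 import statement (and the corresponding code), the scikit-learn code works fine.

More confusingly, the problem only occurs on the local development machine running OS X. When the application is deployed to IBM Cloud, it runs fine, with scikit-learn and ibm_boto3 working side by side.

At this point, our only hypothesis is that the ibm_boto3 library somehow surfaces a known issue in scikit-learn (see this: the parallel version of the K-means algorithm is broken when numpy uses Accelerate on OS X). Note that we only ran into this problem after adding ibm_boto3 to the project.

However, we need to be able to test on localhost before deploying to IBM Cloud. Are there any known compatibility issues between ibm_boto3 and scikit-learn on Mac OS?

Any suggestions on how we can avoid this on the dev machine?

Cheers.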

score 2 · Accepted Answer

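So far, there aren't any known compatibility issues. :)

At some point there were issues with the vanilla SSL libraries that ship with OSX, but if you are able to read and write data, that isn't the problem.

Are you using HMAC credentials? If so, I'm curious whether the behavior continues if you use the original boto3 library instead of the IBM fork.

Here's a simple example that shows how you might use pandas with the original boto3: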

import boto3  # package used to connect to IBM COS using the S3 API
import io  # python package used to stream data
import pandas as pd  # lightweight data analysis package

access_key = '<access key>'
secret_key = '<secret key>'
pub_endpoint = 'https://s3-api.us-geo.objectstorage.softlayer.net'
pvt_endpoint = 'https://s3-api.us-geo.objectstorage.service.networklayer.com'
bucket = 'demo'  # the bucket holding the objects being worked on.
object_key = 'demo-data'  # the name of the data object being analyzed.
result_key = 'demo-data-results'  # the name of the output data object.


# First, we need to open a session and create a client that can connect to IBM COS.
# This client needs to know where to connect, the credentials to use,
# and what signature protocol to use for authentication. The endpoint
# can be specified to be public or private.
cos = boto3.client('s3', endpoint_url=pub_endpoint,
                   aws_access_key_id=access_key,
                   aws_secret_access_key=secret_key,
                   region_name='us',
                   config=boto3.session.Config(signature_version='s3v4'))

# Since we've already uploaded the dataset to be worked on into cloud storage,
# now we just need to identify which object we want to use. get_object returns
# a dict describing the response (metadata plus a streaming 'Body').
obj = cos.get_object(Bucket=bucket, Key=object_key)

# Now, because this is all REST API based, the actual contents of the file are
# transported in the response body, so we need to read the data stream that
# contains the actual CSV file we want to analyze.
data = obj['Body'].read()

# Now we can read that data stream into a pandas dataframe.
df = pd.read_csv(io.BytesIO(data))

# This is just a trivial example, but we'll take that dataframe and just
# create a JSON document that contains the mean values for each column.
output = df.mean(axis=0, numeric_only=True).to_json()

# Now we can write that JSON file to COS as a new object in the same bucket.
cos.put_object(Bucket=bucket, Key=result_key, Body=output)
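
To check whether the freeze on the Mac really depends on which library gets imported, one option is to run the scikit-learn step in a fresh interpreter with a timeout, once per import variant. The following is only a minimal sketch using the standard library; the helper name `run_with_timeout` and the snippet strings in the comments are hypothetical placeholders, not part of the original application.

```python
import subprocess
import sys


def run_with_timeout(snippet, timeout_s=30.0):
    """Run a Python snippet in a fresh interpreter.

    Returns "ok" if it exits with status 0, "error" if it exits with a
    non-zero status, and "timeout" if it is still running after timeout_s
    seconds (the child process is killed in that case).
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else "error"


# Hypothetical usage: put the scikit-learn call that freezes into the snippet
# and compare the variants, for example:
#   run_with_timeout("import sklearn\n<code that freezes>")
#   run_with_timeout("import ibm_boto3\nimport sklearn\n<code that freezes>")
#   run_with_timeout("import boto3\nimport sklearn\n<code that freezes>")
```

If only the `ibm_boto3` variant reports "timeout" while the plain `boto3` variant reports "ok", that would point at the fork rather than at scikit-learn alone.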
