5

I'm using the Python wrapper for the CDK to create a Glue job. The command property requires an object of type IResolvable | JobCommandProperty. I tried to pass a JobCommandProperty object here, but I get an exception.

I created a JobCommandProperty object. I looked for a .builder() function (similar to the Java API) but couldn't find one.

from aws_cdk import (
    aws_glue as glue,
    aws_iam as iam,
    core
)

class ScheduledGlueJob (core.Stack):

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        policy_statement = iam.PolicyStatement(
                actions=['logs:*','s3:*','ec2:*','iam:*','cloudwatch:*','dynamodb:*','glue:*']
            )

        policy_statement.add_all_resources()

        glue_job_role = iam.Role(
            self,
            'Glue-Job-Role',
            assumed_by=iam.ServicePrincipal('glue.amazonaws.com')
        ).add_to_policy(
            policy_statement
        )

        job = glue.CfnJob(
            self,
            'glue-test-job',
            role=glue_job_role,
            allocated_capacity=10,
            command=glue.CfnJob.JobCommandProperty(
                name='glueetl',
                script_location='s3://my-bucket/glue-scripts/job.scala'
            ))

The error message is:

$cdk synth
Traceback (most recent call last):
  File "app.py", line 30, in <module>
    glue_job = ScheduledGlueJob(app, 'Cronned-Glue-Job')
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/jsii/_runtime.py", line 66, in __call__
    inst = super().__call__(*args, **kwargs)
  File "/Users/d439087/IdeaProjects/ds/test_cdk/glue/scheduled_job.py", line 33, in __init__
    script_location='s3://my-bucket/glue-scripts/job.scala'
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/jsii/_runtime.py", line 66, in __call__
    inst = super().__call__(*args, **kwargs)
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/aws_cdk/aws_glue/__init__.py", line 2040, in __init__
    jsii.create(CfnJob, self, [scope, id, props])
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/jsii/_kernel/__init__.py", line 208, in create
    overrides=overrides,
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/jsii/_kernel/providers/process.py", line 331, in create
    return self._process.send(request, CreateResponse)
  File "/Users/d439087/IdeaProjects/ds/test_cdk/.env/lib/python3.7/site-packages/jsii/_kernel/providers/process.py", line 316, in send
    raise JSIIError(resp.error) from JavaScriptError(resp.stack)
jsii.errors.JSIIError: Expected 'string', got true (boolean)

Does anyone have a working CDK (Python) example that creates a CfnJob object?

4

3 Answers

3

Never mind: the role property has to be of type string. I was confused by the JSII error message.
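To make the "got true (boolean)" part of the error less mysterious, here is a minimal sketch in plain Python. FakeRole is a hypothetical stand-in, not the CDK API; it only assumes, as the error message suggests, that add_to_policy returns a boolean rather than the role. Chaining the call onto the constructor therefore stores that boolean in the variable.

```python
# Stand-in for illustration only; FakeRole is hypothetical and is NOT the CDK API.
class FakeRole:
    def __init__(self, arn: str) -> None:
        self.role_arn = arn
        self.statements = []

    def add_to_policy(self, statement: str) -> bool:
        # Assumption: like the call in the question, this returns a boolean
        # rather than the role itself.
        self.statements.append(statement)
        return True


# Chained: the variable ends up holding the return value of add_to_policy().
chained = FakeRole("arn:aws:iam::123456789012:role/example").add_to_policy("glue:*")

# Separate statements: the variable keeps the role, and role_arn is a string.
role = FakeRole("arn:aws:iam::123456789012:role/example")
role.add_to_policy("glue:*")

print(type(chained).__name__)        # bool
print(type(role.role_arn).__name__)  # str
```

In the question's code the chained value is what gets passed as role, which would explain why a boolean arrives where a string is expected.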

Answered 2019-08-22T14:42:55.443
1

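Note that a crawler is different from a job, but I think the permissions are similar. As of 16 August 2020, this works for a crawler (unfortunately, none of the previous answers did):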

from aws_cdk import (
    aws_iam as iam,
    aws_glue as glue,
    core
)

class MyDataScienceStack(core.Stack):

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        statement = iam.PolicyStatement(actions=["s3:GetObject","s3:PutObject"],
                                        resources=["arn:aws:s3:::mybucketname",
                                                    "arn:aws:s3:::mybucketname/data_warehouse/units/*"])
        write_to_s3_policy = iam.PolicyDocument(statements=[statement])
        glue_role = iam.Role(
            self, 'GlueCrawlerFormyDataScienceRole',
            role_name = 'GlueCrawlerFormyDataScienceRole',
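            # inline_policies is documented as a mapping of policy name to document;
            # 'WriteToS3' is a placeholder name
            inline_policies={'WriteToS3': write_to_s3_policy},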
            assumed_by=iam.ServicePrincipal('glue.amazonaws.com'),
            managed_policies=[iam.ManagedPolicy.from_aws_managed_policy_name('service-role/AWSGlueServiceRole')]
        )

        glue_crawler = glue.CfnCrawler(
            self, 'glue-crawler-id',
            description="Glue Crawler for my-data-science-s3",
            name='any name',
            database_name='units',
            schedule={"scheduleExpression": "cron(5 * * * ? *)"},
            role=glue_role.role_arn,
            targets={"s3Targets": [{"path": "s3://mybucketname/data_warehouse/units"}]}
        )
Answered 2020-08-16T08:21:19.103
1

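The glue_job_role variable is no longer a Role, because you chained .add_to_policy onto the constructor call and the variable now holds that method's return value. The code below should work.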

glue_job_role = iam.Role(
    self,
    'Glue-Job-Role',
    assumed_by=iam.ServicePrincipal('glue.amazonaws.com')
)
# Call add_to_policy as a separate statement so glue_job_role stays a Role
glue_job_role.add_to_policy(policy_statement)

job = glue.CfnJob(
    self,
    'glue-test-job',
    role=glue_job_role.role_arn,  # CfnJob expects the role as a string
    allocated_capacity=10,
    command=glue.CfnJob.JobCommandProperty(
        name='glueetl',
        script_location='s3://my-bucket/glue-scripts/job.scala'
    )
)
Answered 2019-10-11T05:45:11.530