
AWS Orchestrator

Enterprise

Image: $DATAOPS_AWS_RUNNER_IMAGE

The AWS orchestrator includes the following features (a quick version-check job is sketched after this list):

  • The ability to interact with AWS services using the built-in AWS CLI tools, providing a wide range of features focused on infrastructure automation
  • DataOps Vault functionality that allows scripts to retrieve variables from the vault
  • DataOps native tools that allow the development of custom scripts that interact with AWS
  • The following additional tools:
    • git
    • curl
    • ssh-client
    • perl
    • sshpass
    • unzip
    • terraform
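
A quick way to confirm what the image provides is to print the tool versions from a job. The job below is only an illustrative sketch; the job name is hypothetical, and the extends/stage/icon values simply follow the other examples in this topic:

"My Tool Check":
  extends:
    - .agent_tag
  stage: "Batch Ingestion"
  image: $DATAOPS_AWS_RUNNER_IMAGE
  script:
    # Print the versions of some bundled tools (illustrative only)
    - aws --version
    - git --version
    - curl --version
    - terraform version
  icon: ${AWS_ICON}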

Usage

The first use case described here is typical for this orchestrator; that is, starting an EC2 instance to perform a task in the pipeline (a concrete EC2 start command is sketched after this job):

pipelines/includes/local_includes/aws_jobs/my_aws_job.yml
"My AWS Job":
  extends:
    - .should_run_ingestion
    - .agent_tag
  stage: "Batch Ingestion"
  image: $DATAOPS_AWS_RUNNER_IMAGE
  variables:
    # use one of the following connection methods
    # EC2 instance role inheritance - no variables to set / default
    #
    # or default vault expansion from the vault keys `AWS.DEFAULT.S3_KEY` and `AWS.DEFAULT.S3_SECRET`
    SET_AWS_KEYS_TO_ENV: 1
    # or custom vault expansion for access key / secret
    AWS_ACCESS_KEY_ID: DATAOPS_VAULT(PATH.TO.ACCESS_KEY_ID.IN.VAULT)
    AWS_SECRET_ACCESS_KEY: DATAOPS_VAULT(PATH.TO.SECRET_ACCESS_KEY.IN.VAULT)
  script:
    - /dataops
    - aws ... # your AWS CLI command
  icon: ${AWS_ICON}
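
For the EC2 use case mentioned above, the placeholder aws ... command could, for example, start a stopped instance and wait until it is running. This is only a sketch; the instance ID below is a placeholder:

  script:
    - /dataops
    # Start a stopped EC2 instance and wait for it to reach the running state
    # (i-0123456789abcdef0 is a placeholder instance ID)
    - aws ec2 start-instances --instance-ids i-0123456789abcdef0
    - aws ec2 wait instance-running --instance-ids i-0123456789abcdef0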

Additionally, the following use cases demonstrate how to connect to AWS from a DataOps pipeline:

Connecting to AWS

To connect to AWS, the AWS orchestrator supports the following four methods:

1. Inheriting the IAM role from the DataOps runner's EC2 instance

No additional configuration is necessary for the AWS orchestrator to use the IAM role of the AWS EC2 instance on which the DataOps runner is deployed. This is the default behavior and the one we recommend.
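
As a minimal sketch (the job name is hypothetical and the other keys follow the examples in this topic), a job relying on role inheritance sets no AWS variables at all; aws sts get-caller-identity simply reports which role the job inherited:

"My AWS Role Check":
  extends:
    - .agent_tag
  stage: "Batch Ingestion"
  image: $DATAOPS_AWS_RUNNER_IMAGE
  script:
    # No AWS credentials are set; the CLI uses the runner's EC2 instance role
    - aws sts get-caller-identity
  icon: ${AWS_ICON}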

2. AWS access key ID and secret access key

If you are instead required to use an AWS access key ID and secret access key, provide them to the DataOps pipeline by setting the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

note

Additional environment variables may be required for your use case. For more information, refer to the AWS CLI documentation on environment variables.
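
For example, depending on your account setup you may also need the standard AWS CLI variables AWS_DEFAULT_REGION or, when using temporary credentials, AWS_SESSION_TOKEN. The snippet below is a sketch only; the region value is a placeholder and the vault paths are the same placeholders used elsewhere in this topic:

variables:
  AWS_ACCESS_KEY_ID: DATAOPS_VAULT(PATH.TO.ACCESS_KEY_ID.IN.VAULT)
  AWS_SECRET_ACCESS_KEY: DATAOPS_VAULT(PATH.TO.SECRET_ACCESS_KEY.IN.VAULT)
  # Additional AWS CLI setting that some setups need (placeholder value)
  AWS_DEFAULT_REGION: eu-west-1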

3. Using the standard DataOps vault credentials

When you want to use the standard DataOps Vault credentials, the following details apply:

  • The vault path details are stored in AWS.DEFAULT.S3_KEY and AWS.DEFAULT.S3_SECRET
  • Use the pipeline variable SET_AWS_KEYS_TO_ENV to retrieve the values of these two keys from the vault
  • The scope of these exposed environment variables is limited to the /dataops entry point
  • All scripts launched from /dataops also fall within this scope
note

If you need access to AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from outside /dataops, you must reimport them by retrieving them from the DATAOPS_SOURCE_FILE. For example:

variables:
  SET_AWS_KEYS_TO_ENV: 1
  DATAOPS_SOURCE_FILE: $CI_PROJECT_DIR/env.sh
script:
  - /dataops
  - source $DATAOPS_SOURCE_FILE
  ...

4. Using custom DataOps vault credentials

When you want to use your custom DataOps Vault credentials, use the DATAOPS_VAULT() function to retrieve credentials stored in a different vault path. For example:

variables:
  AWS_ACCESS_KEY_ID: DATAOPS_VAULT(PATH.TO.ACCESS_KEY_ID.IN.VAULT)
  AWS_SECRET_ACCESS_KEY: DATAOPS_VAULT(PATH.TO.SECRET_ACCESS_KEY.IN.VAULT)
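
As a sketch of how such a path resolves, assuming the vault content is a nested structure in which each dot in the path selects a child key, credentials stored under a hypothetical path AWS.MY_ACCOUNT would look like this in the vault and be referenced as follows:

# Hypothetical vault content (nested keys)
AWS:
  MY_ACCOUNT:
    ACCESS_KEY_ID: <your access key ID>
    SECRET_ACCESS_KEY: <your secret access key>

# Corresponding job variables
variables:
  AWS_ACCESS_KEY_ID: DATAOPS_VAULT(AWS.MY_ACCOUNT.ACCESS_KEY_ID)
  AWS_SECRET_ACCESS_KEY: DATAOPS_VAULT(AWS.MY_ACCOUNT.SECRET_ACCESS_KEY)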

Troubleshooting

When using the secrets manager with a DataOps runner deployed on AWS EC2, review the necessary Instance Metadata Service Version 2 (IMDSv2) configuration changes.
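
A common change in this situation is requiring IMDSv2 and raising the metadata response hop limit so that containers started by the runner can still reach the instance role credentials. The command below is a sketch only; the instance ID is a placeholder, and you should confirm the hop limit against your own security requirements:

# Require IMDSv2 and allow one extra network hop for containers
# (i-0123456789abcdef0 is a placeholder instance ID)
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 2 \
  --http-endpoint enabled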

Supported parameters

| Parameter | Required/Default | Description |
| --- | --- | --- |
| SET_AWS_KEYS_TO_ENV | Optional | If set, exports the AWS access key ID (AWS_ACCESS_KEY_ID) and secret access key (AWS_SECRET_ACCESS_KEY) by retrieving them from the DataOps Vault keys AWS.DEFAULT.S3_KEY and AWS.DEFAULT.S3_SECRET, respectively. |

Example jobs

You can create scripts that wrap your AWS usage in your project repository, such as /scripts/my_aws_script.sh, and then run them from inside your job (a sketch of the script itself follows the job). For example:

pipelines/includes/local_includes/aws_jobs/my_aws_job.yml
"My AWS Job":
  extends:
    - .should_run_ingestion
    - .agent_tag
  stage: "Batch Ingestion"
  image: $DATAOPS_AWS_RUNNER_IMAGE
  variables:
  script:
    - /scripts/my_aws_script.sh
  icon: ${AWS_ICON}
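
As a sketch of what such a wrapper script might contain (the bucket name and file paths are placeholders, not part of the orchestrator):

#!/bin/bash
# scripts/my_aws_script.sh - hypothetical wrapper around the AWS CLI
set -euo pipefail

# Example: copy a generated extract into S3 (placeholder bucket and key)
aws s3 cp "$CI_PROJECT_DIR/outputs/extract.csv" \
  "s3://my-example-bucket/ingestion/extract.csv"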

For single AWS interactions, you can call an AWS CLI command directly:

pipelines/includes/local_includes/aws_jobs/my_aws_cli_job.yml
"My AWS CLI job":
  extends:
    - .should_run_ingestion
    - .agent_tag
  stage: "Batch Ingestion"
  image: $DATAOPS_AWS_RUNNER_IMAGE
  variables:
  script:
    - aws s3 ls s3://mybucket/
  icon: ${AWS_ICON}

If you need access to a vault and other DataOps features, include /dataops in your script tag. For example:

pipelines/includes/local_includes/aws_jobs/my_aws_job.yml
"My AWS Job":
  extends:
    - .should_run_ingestion
    - .agent_tag
  stage: "Batch Ingestion"
  image: $DATAOPS_AWS_RUNNER_IMAGE
  variables:
  script:
    - /dataops
    - scripts/myawsscript.sh
  icon: ${AWS_ICON}

Lastly, use the following YAML file when you need to use the AWS access key ID and secret access key to connect:

pipelines/includes/local_includes/aws_jobs/my_aws_job.yml
"My AWS Job":
  extends:
    - .should_run_ingestion
    - .agent_tag
  stage: "Batch Ingestion"
  image: $DATAOPS_AWS_RUNNER_IMAGE
  variables:
    SET_AWS_KEYS_TO_ENV: 1
    DATAOPS_SOURCE_FILE: $CI_PROJECT_DIR/env.sh
  script:
    - /dataops
    - source $DATAOPS_SOURCE_FILE
    - scripts/myawsscript.sh
  icon: ${AWS_ICON}