# AWS Orchestrator

Enterprise

| Image | `$DATAOPS_AWS_RUNNER_IMAGE` |
| --- | --- |
The AWS orchestrator includes the following features:

- The ability to interact with AWS services using the built-in AWS CLI tools, providing a wide range of features focused on infrastructure automation
- DataOps Vault functionality that allows scripts to retrieve variables from the vault
- DataOps native tools that allow the development of custom scripts that interact with AWS
- The following additional tools:
  - git
  - curl
  - ssh-client
  - perl
  - sshpass
  - unzip
  - terraform
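Any of these bundled tools can be called directly from a job's `script` section. As a minimal sketch (the job name is hypothetical and the `extends` target is illustrative), a smoke-test job might verify the tooling like this:

```yaml
# Hypothetical smoke-test job confirming the bundled tools are available.
"Check Bundled Tools":
  extends:
    - .agent_tag
  image: $DATAOPS_AWS_RUNNER_IMAGE
  script:
    - aws --version
    - terraform --version
    - git --version
  icon: ${AWS_ICON}
```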
## Usage
The first use case described here is typical for this orchestrator: starting an EC2 instance to perform a task in the pipeline:
"My AWS Job":
extends:
- .should_run_ingestion
- .agent_tag
stage: "Batch Ingestion"
image: $DATAOPS_AWS_RUNNER_IMAGE
variables:
# use one of the following connection methods
# EC2 instance role inheritance - no variables to set / default
#
# or default vault expansion for from vault keys `AWS.DEFAULT.S3_KEY` and `AWS.DEFAULT.S3_SECRET`
SET_AWS_KEYS_TO_ENV: 1
# or custom vault expansion for access key / secret
AWS_ACCESS_KEY_ID: DATAOPS_VAULT(PATH.TO.ACCESS_KEY_ID.IN.VAULT)
AWS_SECRET_ACCESS_KEY: DATAOPS_VAULT(PATH.TO.SECRET_ACCESS_KEY.IN.VAULT)
script:
- /dataops
- aws ... # your AWS CLI command
icon: ${AWS_ICON}
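For the EC2 use case specifically, the `aws ...` placeholder could be replaced with start-and-wait commands such as the following sketch (the instance ID is illustrative):

```yaml
  script:
    - /dataops
    # illustrative instance ID: start the instance, then block until it is running
    - aws ec2 start-instances --instance-ids i-0123456789abcdef0
    - aws ec2 wait instance-running --instance-ids i-0123456789abcdef0
```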
Additionally, the use cases below demonstrate how to connect to AWS from a DataOps pipeline.

## Connecting to AWS
To connect to AWS, the AWS orchestrator supports the following four methods:
### 1. Inheriting the IAM role from the DataOps runner's EC2 instance

No additional configuration is necessary for the AWS orchestrator to use the IAM role of the EC2 instance on which the DataOps runner is deployed. This is the default behavior and the method we recommend.
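As a quick sketch (the job name is hypothetical), such a job needs no credential variables at all; the AWS CLI resolves credentials from the instance role automatically:

```yaml
# Hypothetical job: no credential variables are set, so the AWS CLI
# falls back to the IAM role of the runner's EC2 instance.
"Verify AWS Identity":
  extends:
    - .agent_tag
  image: $DATAOPS_AWS_RUNNER_IMAGE
  script:
    - aws sts get-caller-identity # prints the account and assumed role
  icon: ${AWS_ICON}
```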
### 2. AWS access key ID and secret access key

If you are required to use an AWS access key ID and secret access key instead, provide them to the DataOps pipeline by setting the environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. Additional environment variables may be required for your use case. For more information, refer to the AWS CLI documentation on environment variables.
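As a sketch, assuming the two keys are supplied as masked project-level CI/CD variables rather than written into the YAML, a job then only needs optional extras such as a region override:

```yaml
# Sketch: AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are assumed to be
# set as masked CI/CD variables, not hardcoded here.
"My Keyed AWS Job":
  extends:
    - .agent_tag
  image: $DATAOPS_AWS_RUNNER_IMAGE
  variables:
    AWS_DEFAULT_REGION: eu-west-1 # example of an additional AWS CLI variable
  script:
    - aws s3 ls
  icon: ${AWS_ICON}
```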
### 3. Using the standard DataOps vault credentials

When you want to use the standard DataOps Vault credentials, the following details apply:

- The vault path details are stored in `AWS.DEFAULT.S3_KEY` and `AWS.DEFAULT.S3_SECRET`.
- Use the pipeline variable `SET_AWS_KEYS_TO_ENV` to retrieve the values of these two keys from the vault.
- The scope of these exposed environment variables is limited to the entry point `/dataops`.
- All the scripts that run from the `/dataops` directory also fall within this scope.

If you need access to `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` from outside `/dataops`, you must reimport them by sourcing the file referenced by `DATAOPS_SOURCE_FILE`. For example:
```yaml
variables:
  SET_AWS_KEYS_TO_ENV: 1
  DATAOPS_SOURCE_FILE: $CI_PROJECT_DIR/env.sh
script:
  - /dataops
  - source $DATAOPS_SOURCE_FILE
  ...
```
### 4. Using custom DataOps vault credentials

When you want to use your custom DataOps Vault credentials, use the `DATAOPS_VAULT()` function to retrieve credentials stored in a different vault path. For example:
```yaml
variables:
  AWS_ACCESS_KEY_ID: DATAOPS_VAULT(PATH.TO.ACCESS_KEY_ID.IN.VAULT)
  AWS_SECRET_ACCESS_KEY: DATAOPS_VAULT(PATH.TO.SECRET_ACCESS_KEY.IN.VAULT)
```
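Embedded in a complete job, the same variables look like the following sketch (the job name and vault paths are placeholders):

```yaml
"My Custom Vault Job":
  extends:
    - .agent_tag
  image: $DATAOPS_AWS_RUNNER_IMAGE
  variables:
    AWS_ACCESS_KEY_ID: DATAOPS_VAULT(PATH.TO.ACCESS_KEY_ID.IN.VAULT)
    AWS_SECRET_ACCESS_KEY: DATAOPS_VAULT(PATH.TO.SECRET_ACCESS_KEY.IN.VAULT)
  script:
    - /dataops
    - aws sts get-caller-identity # any AWS CLI command works here
  icon: ${AWS_ICON}
```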
## Troubleshooting

When using the secrets manager with a DataOps runner deployed on an AWS EC2 instance, review the necessary Instance Metadata Service Version 2 (IMDSv2) configuration changes.
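A common adjustment, shown here as a sketch with a placeholder instance ID, is raising the IMDSv2 response hop limit so that credentials remain reachable from inside the runner's containers:

```bash
# Sketch: require IMDSv2 tokens and raise the hop limit to 2 so that
# containers running on the instance can still reach the metadata service.
# The instance ID is a placeholder.
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required \
  --http-put-response-hop-limit 2
```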
## Supported parameters

| Parameter | Required/Default | Description |
| --- | --- | --- |
| SET_AWS_KEYS_TO_ENV | Optional | If set, exports the AWS access key ID (`AWS_ACCESS_KEY_ID`) and secret access key (`AWS_SECRET_ACCESS_KEY`) by retrieving them from the DataOps Vault keys `AWS.DEFAULT.S3_KEY` and `AWS.DEFAULT.S3_SECRET`, respectively. |
## Example jobs

You can create scripts that wrap your AWS usage in your project repository, such as `/scripts/my_aws_script.sh`, and then run the script from inside your job. For example:
"My AWS Job":
extends:
- .should_run_ingestion
- .agent_tag
stage: "Batch Ingestion"
image: $DATAOPS_AWS_RUNNER_IMAGE
variables:
script:
- /scripts/my_aws_script.sh
icon: ${AWS_ICON}
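The wrapper script itself might look like this bash sketch (the bucket name and paths are placeholders):

```bash
#!/usr/bin/env bash
# /scripts/my_aws_script.sh - hypothetical wrapper around the AWS CLI.
set -euo pipefail # fail fast on errors, unset variables, and pipe failures

BUCKET="my-example-bucket" # placeholder bucket name

# Upload today's extract and list the landing area for the job log.
aws s3 cp "extracts/$(date +%F).csv" "s3://${BUCKET}/landing/"
aws s3 ls "s3://${BUCKET}/landing/"
```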
For single AWS interactions, you can call an AWS CLI command directly:
"My AWS CLI job":
extends:
- .should_run_ingestion
- .agent_tag
stage: "Batch Ingestion"
image: $DATAOPS_AWS_RUNNER_IMAGE
variables:
script:
- aws s3 ls s3://mybucket/
icon: ${AWS_ICON}
If you need access to the vault and other DataOps features, include `/dataops` in your script tag. For example:
```yaml
"My AWS Job":
  extends:
    - .should_run_ingestion
    - .agent_tag
  stage: "Batch Ingestion"
  image: $DATAOPS_AWS_RUNNER_IMAGE
  script:
    - /dataops
    - /scripts/my_aws_script.sh
  icon: ${AWS_ICON}
```
Lastly, use the following YAML when you need to use the AWS access key ID and secret access key to connect:
"My AWS Job":
extends:
- .should_run_ingestion
- .agent_tag
stage: "Batch Ingestion"
image: $DATAOPS_AWS_RUNNER_IMAGE
variables:
- SET_AWS_KEYS_TO_ENV: 1
- DATAOPS_SOURCE_FILE: $CI_PROJECT_DIR/env.sh
script:
- /dataops
- source $DATAOPS_SOURCE_FILE
- scripts/myawsscript.sh
icon: ${AWS_ICON}