Skip to main content

R Orchestrator

Enterprise

Image$DATAOPS_R_RUNNER_IMAGE

The R orchestrator incorporates r-base version 4.2 to execute R scripts in a DataOps pipeline and includes several helpful R libraries, such as:

  • DBI
  • devtools
  • dplyr
  • odbc
  • pracma
  • plyr
  • RJDBC
  • syuzhet
  • tidyvers

Usage

You can create your own scripts in your repo e.g. /scripts/myscript.r and run this from a job. For example:

pipelines/includes/local_includes/r_jobs/my_r_job.yml
"My R Job":
extends:
- .agent_tag
stage: "My Stage"
image: $DATAOPS_R_RUNNER_IMAGE
variables:
script:
- R my-script.r
icon: ${R_ICON}

Supported parameters

None

Example jobs

Running a basic script

pipelines/includes/local_includes/r_jobs/my_r_job.yml
"My R Job":
extends:
- .agent_tag
stage: "Additional Configuration"
image: $DATAOPS_R_RUNNER_IMAGE
variables:
SCRIPT_PATH: $CI_PROJECT_DIR/scripts/my-script.r
script:
- R $SCRIPT_PATH
icon: ${R_ICON}

If you need access to the DataOps vault, include /dataops in your script tag. For example:

script:
- /dataops
- R /scripts/myscript.r

And your script will be able to read from and write to the vault.

Using the Snowflake JDBC connector

In this example, you'll see how to use the Snowflake JDBC connector within the R orchestration framework, specifically with the driver named snowflake-jdbc-3.jar following the new naming convention.

JDBC_example.r
# Load the required libraries
library("RJDBC")
library(pracma)
library(plyr)

# Set Java options for memory allocation
options(java.parameters = "-Xmx4g")
options(scipen = 999)

# Fetch environment variables for Snowflake connection
user_id <- Sys.getenv('USERID')
pass_word <- Sys.getenv('USER_PWD')
account <- Sys.getenv('ACCOUNT')
database <- Sys.getenv('DATABASE')

# Define the Snowflake JDBC driver and classpath
jdbcDriver <- JDBC(driverClass = "net.snowflake.client.jdbc.SnowflakeDriver", classPath = "/JDBC/snowflake-jdbc-3.jar")

# Construct the connection string
connect_string <- sprintf("jdbc:snowflake://%s.snowflakecomputing.com/?db=%s", account, database)

# Establish the JDBC connection to Snowflake
jdbcConnection <- dbConnect(jdbcDriver, connect_string, user_id, pass_word)

# Define the SQL query to retrieve data
sql_command <- "SELECT TOP 50 * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER"

# Execute the query and store the results
results.var <- dbGetQuery(jdbcConnection, sql_command)

# Print the retrieved results
print(results.var)

# Close the JDBC connection
dbDisconnect(jdbcConnection)

Here's the job configuration to use R script using DATAOPS_RUN_R_SCRIPT:

pipelines/includes/local_includes/r_jobs/my_r_job.yml
"r-orchestrator Script Execution":
extends:
- .agent_tag
stage: "Version and Setup Check"
image: $DATAOPS_R_RUNNER_IMAGE
variables:
DATAOPS_RUN_R_SCRIPT: $CI_PROJECT_DIR/scripts/JDBC_example.r
USERID: <SNOWFLAKE_USER>
USER_PWD: <SNOWFLAKE_PASSWORD>
ACCOUNT: <SNOWFLAKE_ACCOUNT>
ROLE: <SNOWFLAKE_ROLE>
WAREHOUSE: <SNOWFLAKE_WAREHOUSE>
script:
- /dataops