R Orchestrator

Enterprise

Image: $DATAOPS_R_RUNNER_IMAGE

The DataOps R orchestrator incorporates r-base version 4.2 to execute R scripts in a DataOps pipeline and includes several helpful R libraries, such as:

  • DBI
  • devtools
  • dplyr
  • odbc
  • pracma
  • plyr
  • RJDBC
  • syuzhet
  • tidyverse
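As a quick illustration of the bundled libraries, the sketch below uses dplyr to summarize a small data frame. This is only an example pattern, not part of the orchestrator; the data frame and column names are made up:

```r
# Load one of the preinstalled libraries
library(dplyr)

# Example data frame standing in for query results (illustrative data)
customers <- data.frame(
  region  = c("EMEA", "EMEA", "AMER"),
  balance = c(100, 250, 75)
)

# Summarize balances per region
customers %>%
  group_by(region) %>%
  summarise(total_balance = sum(balance))
```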

Usage

You can create your own scripts in your repository, e.g. /scripts/myscript.r, and run them from a job. For example:

pipelines/includes/local_includes/r_jobs/my_r_job.yml

```yaml
"My R Job":
  extends:
    - .agent_tag
  stage: "My Stage"
  image: $DATAOPS_R_RUNNER_IMAGE
  script:
    - R scripts/my-script.r
  icon: ${R_ICON}
```

Supported parameters

DataOps does not provide standard parameters. However, you can use variables to provide your own customizations and as inputs to the R programs you execute.
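For instance, a job variable can be read inside an R program with the standard Sys.getenv() call. The variable name MY_PARAM below is purely illustrative, not a built-in parameter:

```r
# Read a pipeline variable passed to the job (name is illustrative)
my_param <- Sys.getenv("MY_PARAM", unset = "default-value")
cat("Running with MY_PARAM =", my_param, "\n")
```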

Example jobs

Running a basic script

pipelines/includes/local_includes/r_jobs/my_r_job.yml

```yaml
"My R Job":
  extends:
    - .agent_tag
  stage: "Additional Configuration"
  image: $DATAOPS_R_RUNNER_IMAGE
  variables:
    SCRIPT_PATH: $CI_PROJECT_DIR/scripts/my-script.r
  script:
    - R $SCRIPT_PATH
  icon: ${R_ICON}
```

If you need access to the DataOps vault, include /dataops in your script tag. For example:

```yaml
script:
  - /dataops
  - R /scripts/myscript.r
```

Your script can then read from and write to the vault.

Using the Snowflake JDBC connector

This example shows how to use the Snowflake JDBC connector within the R orchestrator, using the driver snowflake-jdbc-3.jar (the new naming convention).

info

The R orchestrator ships with Java 21 as part of the underlying Ubuntu 24 OS. This introduces changes to the JDBC connector that may be backward incompatible with previous scripts; one such change is the format in which the Snowflake JDBC connector returns query results.

To ensure compatibility with the new JDBC connector, update your scripts to handle the new response format by adding the session parameter JDBC_QUERY_RESULT_FORMAT=JSON to the connection parameters:

```r
# Set up the JDBC driver and connection string
jdbcDriver <- JDBC(driverClass = "net.snowflake.client.jdbc.SnowflakeDriver",
                   classPath = "/JDBC/snowflake-jdbc-3.jar")
connect_string <- sprintf("jdbc:snowflake://%s.snowflakecomputing.com/?db=%s&JDBC_QUERY_RESULT_FORMAT=JSON",
                          account, database)

# Establish the connection to Snowflake
jdbcConnection <- dbConnect(jdbcDriver, connect_string, user_id, pass_word)
```
JDBC_example.r

```r
# Load the required libraries
library("RJDBC")
library(pracma)
library(plyr)

# Set Java options for memory allocation
options(java.parameters = "-Xmx4g")
options(scipen = 999)

# Fetch environment variables for Snowflake connection
user_id <- Sys.getenv('USERID')
pass_word <- Sys.getenv('USER_PWD')
account <- Sys.getenv('ACCOUNT')
database <- Sys.getenv('DATABASE')

# Define the Snowflake JDBC driver and classpath
jdbcDriver <- JDBC(driverClass = "net.snowflake.client.jdbc.SnowflakeDriver",
                   classPath = "/JDBC/snowflake-jdbc-3.jar")

# Construct the connection string
connect_string <- sprintf("jdbc:snowflake://%s.snowflakecomputing.com/?db=%s&JDBC_QUERY_RESULT_FORMAT=JSON",
                          account, database)

# Establish the JDBC connection to Snowflake
jdbcConnection <- dbConnect(jdbcDriver, connect_string, user_id, pass_word)

# Define the SQL query to retrieve data
sql_command <- "SELECT TOP 50 * FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER"

# Execute the query and store the results
results.var <- dbGetQuery(jdbcConnection, sql_command)

# Print the retrieved results
print(results.var)

# Close the JDBC connection
dbDisconnect(jdbcConnection)
```
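As a defensive variant (a stylistic suggestion, not something the orchestrator requires), you can guarantee the connection is closed even if the query errors by wrapping the work in tryCatch() with a finally clause:

```r
# Ensure the JDBC connection is always closed, even on error
jdbcConnection <- dbConnect(jdbcDriver, connect_string, user_id, pass_word)
tryCatch({
  results.var <- dbGetQuery(jdbcConnection, sql_command)
  print(results.var)
}, finally = {
  # Runs whether or not dbGetQuery() succeeded
  dbDisconnect(jdbcConnection)
})
```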

Here's the job configuration that runs the script via DATAOPS_RUN_R_SCRIPT:

pipelines/includes/local_includes/r_jobs/my_r_job.yml

```yaml
"R Orchestrator Script Execution":
  extends:
    - .agent_tag
  stage: "Version and Setup Check"
  image: $DATAOPS_R_RUNNER_IMAGE
  variables:
    DATAOPS_RUN_R_SCRIPT: $CI_PROJECT_DIR/scripts/JDBC_example.r
    USERID: <SNOWFLAKE_USER>
    USER_PWD: <SNOWFLAKE_PASSWORD>
    ACCOUNT: <SNOWFLAKE_ACCOUNT>
    ROLE: <SNOWFLAKE_ROLE>
    WAREHOUSE: <SNOWFLAKE_WAREHOUSE>
  script:
    - /dataops
```