Security and Compliance
Overall DataOps architecture
Broadly speaking, there are three components to the overall DataOps solution. These components are discussed in detail in the following sections:
DataOps runner and orchestrators
The DataOps Runner is a long-running process (usually in a long-running container) installed on customer infrastructure (on-premises or private cloud). When the DataOps Runner is installed, a registration token from the DataOps.live data product platform is provided; this token is used to create and exchange authentication keys between the DataOps Runner and the DataOps Platform.
These keys authenticate the DataOps Runner when connecting to DataOps. The registration token determines the scope of DataOps projects whose jobs the DataOps Runner can handle. For example:
- If a DataOps project registration token is used, the DataOps Runner is only allowed to run jobs for that specific project.
- If a DataOps group registration token is used, the DataOps Runner can run jobs for all projects in that group and all its subgroups.
You can install any number of DataOps Runners. The number of runners usually reflects the different data centers or locations where data needs to be orchestrated.
Each DataOps Runner can then execute any number of DataOps Orchestrators, each providing specific functionality such as extracting, loading, and transforming (ELT) data.
Data product platform
DataOps.live is a distributed, secure data product platform. It is only available via secure transport protocols. The HTTP port is open but only redirects to HTTPS. It serves:
- UI Access via HTTPS with a 72-hour session length
- RESTful API/Web Service via HTTPS
- Git access over HTTPS or SSH
DataOps.live also handles core functions such as:
- Git repository storage
- DataOps pipeline running and scheduling
- DataOps project and group administration
- Historical storage of pipeline and job history plus created artifacts
Snowflake and other data consumers
Snowflake is a well-understood and well-documented technology; see Snowflake.com for more information.
Snowflake is where all the organization's data is encrypted and stored. Data consumers cover any tools that access data from Snowflake, including BI tools such as Tableau, Power BI, Looker, ThoughtSpot, and libraries for more direct access through Python, Go, and Java.
Network connectivity
- Runner Connection
- Registry Connection
- Orchestrator Connection
- Local Connection
- Indirect Snowflake Connection
- Direct Snowflake Connection
- SaaS Connection
- Consumer Connection
- User Connection
- TLS Protocol
- Static IP Addresses
- Authentication Credentials and Secrets Management
Runner connection
Direction of connection - The DataOps Runner makes the connection to the data product SaaS platform.
Type of connection - HTTPS/TLS 1.2. See TLS Protocol for further information.
Direction of data flow - Bidirectional, initiated from the DataOps Runner.
Description

- Although this value is configurable, the DataOps Runner connects to DataOps every second (this poll cycle is sketched after the list).
- The DataOps Runner provides its agent name and pre-exchanged authentication key (to validate its identity) to the data product platform and asks if there are any pending jobs it is authorized to orchestrate.
- If there are none, the connection is closed.
- If there are any jobs, the required configuration for the jobs to be executed is passed down to the DataOps Runner. See Authentication Credentials and Secrets Management for further information.
- The DataOps Runner then spawns additional DataOps Orchestrators (Docker containers) using an image specific to the type of job it must orchestrate. For instance, the Python3 Orchestrator container image is used to create the relevant orchestrator if a Python 3 job needs to be run.
- The STDOUT and STDERR of these containers are streamed back to DataOps in close to real time for monitoring and debugging purposes.
- Lastly, if the job is configured to produce artifacts (typically rendered configuration files, reports, etc.), these are sent back to DataOps.
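The following is a minimal, illustrative sketch of this poll cycle in Python. It is not the actual Runner implementation: the endpoint path, payload fields, and helper function are hypothetical placeholders.

```python
import time

import requests  # pip install requests

PLATFORM_URL = "https://app.dataops.live"  # placeholder for the platform URL
AGENT_NAME = "my-runner"                   # agent name set at registration time
AUTH_KEY = "pre-exchanged-authentication-key"
POLL_INTERVAL = 1.0                        # seconds; configurable

def handle_job(job: dict) -> None:
    """Placeholder: spawn the matching orchestrator container, stream its
    STDOUT/STDERR back to the platform, and upload any artifacts."""
    print("would orchestrate:", job)

def poll_forever() -> None:
    while True:
        # The Runner always initiates the connection and proves its identity.
        resp = requests.post(
            f"{PLATFORM_URL}/api/jobs/request",  # hypothetical path
            json={"agent": AGENT_NAME, "key": AUTH_KEY},
            timeout=30,
        )
        if resp.ok and resp.json():
            handle_job(resp.json())  # job configuration passed down
        time.sleep(POLL_INTERVAL)    # otherwise the connection is closed
```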
Registry connection
Direction of connection - The DataOps Runner connects to the public container registry https://hub.docker.com.
Type of connection - HTTPS/TLS 1.2. See TLS Protocol for further information.
Direction of data flow - Bidirectional, initiated from the DataOps orchestrator host.
Description
- The DataOps Runner downloads one or more orchestrator images to execute the job logic as defined by the DataOps pipeline (see the sketch below).
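As a minimal sketch of such a pull using the Docker SDK for Python (the image name below is an illustrative placeholder, not an official DataOps.live image):

```python
import docker  # pip install docker

client = docker.from_env()  # talk to the local Docker engine

# Pull the orchestrator image referenced by the pipeline job definition.
image = client.images.pull("dataopslive/example-orchestrator", tag="latest")
print(image.id)
```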
Orchestrator connection
Direction of connection - The DataOps Runner connects to the local container infrastructure (local Docker engine) from its host machine. The orchestrator container spawns on the same host as the DataOps Runner. For security reasons, the network connection used will be a local network in most scenarios.
Type of connection - Localhost and intranet only
Direction of data flow - Bidirectional, initiated from the DataOps Runner container to the orchestrator container.
Description
- The DataOps Runner executes the orchestrator container images and monitors their status until they exit (see the sketch below).
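A minimal run-and-wait sketch with the Docker SDK for Python; the image and command are illustrative placeholders:

```python
import docker  # pip install docker

client = docker.from_env()  # local Docker engine on the Runner host

# Start the orchestrator container and monitor it until it exits.
container = client.containers.run(
    "dataopslive/example-orchestrator",  # placeholder image
    command="run-job",                   # placeholder command
    detach=True,
)
result = container.wait()                # block until the container exits
print("exit code:", result["StatusCode"])
print(container.logs().decode())         # STDOUT/STDERR to stream back
```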
Local connection
Direction of connection - The DataOps Orchestrator connects to an on-premises or private cloud system, such as Matillion, to extract, load, and transform (ELT) data from that system.
Type of connection - Dictated by the capabilities of the system to be orchestrated. Usually, this is HTTPS/TLS 1.2. See TLS Protocol for further information.
Direction of data flow - Bidirectional, initiated from the DataOps orchestrator host.
Description

- The specific DataOps Orchestrator, such as the Matillion Orchestrator, uses configuration information from the job or the DataOps Vault. The DataOps Vault securely provides connection credentials using the Secrets Manager Orchestrator.
- The actual work, such as starting an ETL/ELT job, is requested.
- If this is an asynchronous request, it returns before the actual task is completed. The DataOps Orchestrator then enters a polling loop, checking the status of the asynchronous task until it finishes or a configurable timeout is hit (see the sketch after this list).
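That polling pattern looks like the following self-contained Python sketch; the interval, timeout, and status values are illustrative:

```python
import time

POLL_INTERVAL = 10   # seconds between status checks; illustrative
TIMEOUT = 3600       # configurable overall timeout, in seconds

def wait_for_task(get_status) -> str:
    """Poll an asynchronous task until it finishes or the timeout is hit.
    `get_status` is any callable returning 'RUNNING', 'SUCCESS', or 'FAILED'."""
    deadline = time.monotonic() + TIMEOUT
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("SUCCESS", "FAILED"):
            return status
        time.sleep(POLL_INTERVAL)
    raise TimeoutError("asynchronous task did not finish before the timeout")
```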
Indirect Snowflake connection
Direction of connection - The orchestrated system connects to Snowflake.
Type of connection - Dictated by the capabilities of the orchestrated system. However, this can only be within the secure methods supported by Snowflake.
Direction of data flow - Bidirectional
Description - See the Snowflake data consumer and other documentation for the orchestrated system.
Direct Snowflake connection
Direction of connection - A DataOps Orchestrator connects to Snowflake directly.
Type of connection - HTTPS/TLS 1.2. See TLS Protocol for further information.
Direction of data flow - Bidirectional, initiated via the DataOps Orchestrator from the DataOps Runner host.
Description

- Various functions within the DataOps Orchestrators, namely the Modelling and Transformation Engine (MATE) Orchestrator and the Snowflake Object Lifecycle Engine (SOLE) Orchestrator, orchestrate Snowflake directly: the DataOps Orchestrator sends a SQL DML/DDL query to Snowflake, and Snowflake returns a response (as sketched below).
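A minimal sketch of such a direct connection using the official Snowflake Connector for Python. The account, credentials, warehouse, and statement are placeholders; in DataOps these come from the job configuration and the DataOps Vault, never from source code:

```python
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder connection parameters; supplied via the DataOps Vault in practice.
conn = snowflake.connector.connect(
    account="myaccount",
    user="DATAOPS_USER",
    password="********",
    warehouse="DATAOPS_WH",
)
try:
    cur = conn.cursor()
    # A DDL statement of the kind SOLE issues; the object names are illustrative.
    cur.execute("CREATE SCHEMA IF NOT EXISTS ANALYTICS.STAGING")
    print(cur.fetchone())  # Snowflake returns a status row over HTTPS/TLS
finally:
    conn.close()
```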
SaaS connection
Identical to the Local Connection except that traffic routes via the public internet.
Consumer connection
Direction of connection - Data consumers connect to Snowflake.
Type of connection - HTTPS/TLS 1.2. See TLS Protocol for further information.
Direction of data flow - Bidirectional
Description
- See the section on Snowflake and other data consumers as well as the documentation for the individual tools or connection libraries being used.
User connection
Direction of connection - DataOps end users connect to the DataOps app.
Type of connection - HTTPS/TLS 1.2. See TLS Protocol for further information.
Direction of data flow - Bidirectional
Description
The main use cases are:
- Standard Web UI access
- RESTful API/Web Service calls
- Git access over HTTPS
TLS protocol
DataOps only supports TLS protocol version 1.2. Older, insecure versions are disabled.
Only the following TLS ciphers are enabled (a client-side verification sketch follows the list):
- ECDHE-ECDSA-AES128-GCM-SHA256
- ECDHE-RSA-AES128-GCM-SHA256
- ECDHE-ECDSA-AES128-SHA256
- ECDHE-RSA-AES128-SHA256
- ECDHE-ECDSA-AES256-GCM-SHA384
- ECDHE-RSA-AES256-GCM-SHA384
- ECDHE-ECDSA-AES256-SHA384
- ECDHE-RSA-AES256-SHA384
- AES128-GCM-SHA256
- AES128-SHA256
- AES256-GCM-SHA384
- AES256-SHA256
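You can confirm the negotiated protocol and cipher from a client machine with Python's standard ssl module; this is a minimal sketch, and the hostname is a placeholder for your DataOps.live instance:

```python
import socket
import ssl

HOST = "app.dataops.live"  # placeholder; use your instance's hostname

ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse anything older

with socket.create_connection((HOST, 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        print(tls.version())  # expected: TLSv1.2
        print(tls.cipher())   # the negotiated suite from the list above
```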
Static IP addresses
DataOps users, Git clients, and the DataOps Runners must open their firewall for outbound HTTPS connections to the following IPv4 addresses to connect to DataOps (via the Runner Connection and the User Connection).
- 3.9.0.146
- 3.11.211.102
- 13.41.41.2
- 34.252.20.84
- 54.155.227.211
- 54.73.50.91

In addition, if using the DataOps development environment, development workspaces will initiate their traffic via IPv4 from:

- 18.135.200.239
- 18.169.227.255
- 52.56.96.175
IPv6 is currently not supported.
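To sanity-check firewall rules, you can confirm which published address a hostname currently resolves to. A minimal Python sketch; the hostname is a placeholder for your DataOps.live instance:

```python
import socket

# The published IPv4 addresses from the lists above.
ALLOWED = {
    "3.9.0.146", "3.11.211.102", "13.41.41.2",
    "34.252.20.84", "54.155.227.211", "54.73.50.91",
    "18.135.200.239", "18.169.227.255", "52.56.96.175",
}

HOST = "app.dataops.live"  # placeholder; use your instance's hostname
resolved = {info[4][0] for info in socket.getaddrinfo(HOST, 443, socket.AF_INET)}
unexpected = resolved - ALLOWED
print("all resolved addresses are in the allowlist" if not unexpected
      else f"not in the allowlist: {sorted(unexpected)}")
```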
Authentication credentials and secrets management
The DataOps Runner fetches authentication details via the Secrets Manager Orchestrator. Credentials are either stored in the encrypted local DataOps Vault or (recommended) in a third-party secrets manager such as AWS Secrets Manager.
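As an illustration of the third-party pattern, here is a minimal sketch of fetching a credential from AWS Secrets Manager with boto3; the secret name and region are placeholders, and in DataOps this lookup is performed by the Secrets Manager Orchestrator rather than by user pipeline code:

```python
import json

import boto3  # pip install boto3

client = boto3.client("secretsmanager", region_name="eu-west-1")  # placeholder region
response = client.get_secret_value(SecretId="dataops/snowflake/credentials")  # placeholder name
secret = json.loads(response["SecretString"])
print(sorted(secret))  # print only the key names, never the values
```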
User access control
DataOps supports the following user access control methods:
- Username and Password Authentication
- Access Token Authentication
Two-factor authentication (MFA/2FA)
For Username and Password Authentication, DataOps supports time-based one-time passwords (TOTP), compatible with applications such as the following (a short TOTP illustration follows the list):
- Authy
- Duo Mobile
- LastPass Authenticator
- Authenticator
- andOTP
- Google Authenticator
- Microsoft Authenticator
- SailOTP
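Time-based one-time passwords follow RFC 6238. As a short illustration using the pyotp library (the secret below is randomly generated; the real secret is exchanged when 2FA is enabled on an account):

```python
import pyotp  # pip install pyotp

secret = pyotp.random_base32()  # placeholder; normally shown once as a QR code
totp = pyotp.TOTP(secret)

code = totp.now()          # the 6-digit code an authenticator app displays
print(code)
print(totp.verify(code))   # True within the current 30-second window
```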
Single sign-on (SSO)
DataOps.live supports Single Sign-On (SSO) via a third-party provider. Supported external identity providers include Azure Active Directory, Microsoft's Active Directory Federation Services (ADFS), Ping Federate, and other SSO providers that support the standard SAML 2.0 and OpenID Connect (OIDC) protocols. For more information, see the Single Sign-On documentation page.
Role-based access control (RBAC)
DataOps supports user restrictions in two dimensions:
Scope
The following points apply to the DataOps Scope:
- By default, a user has no access to any DataOps projects or groups until assigned access.
- If a user is given access to a specific project, this allows access to only that project.
- If users are given access to a particular group, they will have access to all projects and sub-groups underneath that group.
Function
When a user is assigned a scope, they are also assigned a functional role from the following available roles:
- Reporter: Read-only access
- Developer: Read-write access to all but protected branches (configurable)
- Maintainer: Read-write access to all branches, including protected branches (configurable)
- Owner: Read-write access to all branches plus the ability to register DataOps runners, change permissions for other users in their group, and perform specific project and group admin functions (such as rename, transfer, etc.)
Penetration testing
The DataOps.live data product platform is subject to regular penetration testing. The latest report can be obtained by contacting our support team after signing an NDA.
API rate limiting
API rate limiting is a common technique for improving the security and durability of a web application, whether excessive requests are malicious or merely the result of a bug. Rate limiting also mitigates brute-force attacks. DataOps.live uses rate limiting to protect many aspects of the platform, including but not limited to:
- General API access
- Failed Authentication attempts into the platform
- Import and export rates of files and projects
This throttles requests from IP addresses that originate large request volumes, and IP block-and-allow lists are in place to ban abusive clients. See API rate limits for additional details on the limits implemented in DataOps.live.
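Well-behaved clients should expect HTTP 429 responses when a limit is hit and back off before retrying. A minimal client-side sketch using requests; the token header name is a placeholder convention, not a documented DataOps.live API detail:

```python
import time

import requests  # pip install requests

def get_with_backoff(url: str, token: str, max_retries: int = 5) -> requests.Response:
    """GET a URL, retrying on HTTP 429 and honouring Retry-After when present."""
    for attempt in range(max_retries):
        resp = requests.get(url, headers={"PRIVATE-TOKEN": token}, timeout=30)
        if resp.status_code != 429:
            return resp
        wait = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)  # exponential fallback when no Retry-After header
    raise RuntimeError("rate limit still in effect after retries")
```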
Business continuity management
The data product platform ensures high availability (>99.9%) and protection against unplanned interruptions.
Snowflake
Snowflake is a highly secure platform used in many highly sensitive areas, and it has excellent security documentation.
Physical and environmental security
All aspects of the data product platform are hosted on Amazon Web Services. Therefore, both physical and environmental security are provided by AWS. This is summarized on the AWS Data Center Controls web page.
Operations and processes
Staff onboarding and offboarding
Human resource security:

- Does your organization perform background screening of applicants, including prior employment, criminal records, credit checks, professional and academic references, and drug screening (unless prohibited by law)?
  - N/A - the internal control response is sufficient.
- Are all employees and non-employees who manage your and/or your clients' data on behalf of your company required, upon hire, to sign a Code of Ethics or any agreement(s) that require non-disclosure, preservation of confidentiality, and/or acceptable use, and to undergo information security awareness training?
  - N/A - the internal control response is sufficient.
- Does your organization have a formal process to remove or change user access and retrieve assets (as applicable) within 24 hours of HR's notification of termination, or when an employee or non-employee changes position?
Internal company IT/security operations
DataOps.live has policies that cover:
- Information Security
- Information Security Training
- Password Management
- Information Security Incident Management
- Asset Management
- Acceptable Use
- Access Control
- Business Continuity Management
- Management Review
- Social Media
- Data Encryption
- Anti-Virus
- Data Protection
- Internal Audit
- Document Control and Record Management
- Supplier Security
- BYOD (Bring Your Own Device)
- Secure Development
All data stored in the data product platform is encrypted both in transit and at rest:
| Data encryption | Description |
| --- | --- |
| Data in transit | Data in transit between the data product platform and our servers is protected by supporting the latest recommended secure cipher suites to encrypt all traffic in transit, including the use of TLS 1.2 protocols, AES256 encryption, and SHA2 signatures, whenever supported by the clients. |
| Data at rest | Data at rest in the DataOps.live production network is encrypted using industry-standard 256-bit Advanced Encryption Standard (AES256). This applies to all types of data at rest within DataOps.live systems: relational databases, file stores, database backups, etc. |
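For illustration only, 256-bit AES in an authenticated mode such as GCM (the mode used by several of the cipher suites listed earlier) looks like this with the Python cryptography library; the plaintext is a placeholder:

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # pip install cryptography

key = AESGCM.generate_key(bit_length=256)  # a 256-bit AES key
nonce = os.urandom(12)                     # must be unique per encryption
aesgcm = AESGCM(key)

ciphertext = aesgcm.encrypt(nonce, b"sensitive record", None)
assert aesgcm.decrypt(nonce, ciphertext, None) == b"sensitive record"
```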
All access to the DataOps.live infrastructure for maintenance and administration is allowlist-controlled and available only via secure protocols (i.e., TLS 1.2, VPN, etc.). Unsecured protocols (e.g., Telnet, FTP) are prohibited and prevented.
All firewall rules and access groups controlling administrative access to DataOps.live are set to deny by default.
Development and maintenance
DataOps.live development practices include:
- Requirements tracking using a standard requirements tracking tool
- Source Code Management (SCM) based on Git
- Code review processes
- Automated build and test CI/CD Infrastructure
- Automated security testing scans (see Penetration Testing). Critical vulnerabilities are corrected within 30 days. High-risk vulnerabilities are fixed within 90 days.
Incident management
DataOps.live has an Incident Management and Escalation Process as well as Cybersecurity Insurance Coverage.