File Format
You can provide configuration to Snowflake Object Lifecycle Engine for the following operations with file format:
- Manage the lifecycle of new and existing file formats
- Manage grants of file format
Usage
- Default Configuration
- Data Products Configuration
databases:
<database-name>:
schemas:
<schema-name>:
file_formats:
<format-name>:
<configuration-key>: <value>
grants:
<privilege>:
- <role-name>
- <role-name>
- file_format:
name: <format-name>
<configuration-key>: <value>
database: rel(database.<database-name>)
schema: rel(schema.<schema-name>)
grants:
<privilege>:
- rel(role.<role-name>)
- rel(role.<role-name>)
Supported parameters
The engine supports the parameters listed below.
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
format_type | Required | String: For info on format-specific options, see CSV, JSON, AVRO, ORC, PARQUET, and XML | Specifies the format of the data files to load |
comment | Optional | String | Specifies a comment for the file format |
deleted | Optional | Boolean: True enables deletion prevention, False does nothing | Specifies what objects are allowed to be deleted |
grants | Optional | Map: See Supported file_format Grants to Roles | List of Privileges and Roles to which privileges are granted on the current file format |
manage_mode | Optional | String: all (default), none , grants | Configures what properties to manage for the file format. See Changing Manage Mode before changing the value. |
Additional parameters
format_type
set to CSV
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
binary_format | Optional | String: HEX (default), BASE64 , UTF8 | Defines the encoding format for binary input or output |
compression | Optional | String: AUTO (default), GZIP , BZ2 , BROTLI , ZSTD , DEFLATE , RAW_DEFLATE , NONE | Specifies the current compression algorithm for the data file |
date_format | Optional | String: AUTO (default) | Defines the format of date values in the data files |
encoding | Optional | String: UTF8 (default) | Specifies the character set of the source data when loading it into a table |
empty_field_as_null | Optional | Boolean: TRUE (default) | Specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters |
error_on_column_count_mismatch | Optional | Boolean | Specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table |
escape | Optional | String: NONE (default) | Specifies the character set of the source data when loading it into a table |
escape_unenclosed_field | Optional | String: (\\) (default) | Single character string used as the escape character for unenclosed field values only. Note: Due to a parsing issue, double backslashes (\\ ) are converted to single backslashes (\ ). |
field_delimiter | Optional | String: , (default) | Specifies one or more single-byte or multibyte characters that separate fields in the data file |
field_optionally_enclosed_by | Optional | String | Character used to enclose strings |
file_extension | Optional | String | Specifies the extension for files unloaded to a stage |
null_if | Optional | List/array of string: \\N (default) | Converts to and from SQL NULL. Note: Due to a parsing issue, double backslashes ( \\ ) are converted to single backslashes (\ ). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format. |
file_extension | Optional | String | Specifies the extension for files unloaded to a stage |
record_delimiter | Optional | String: \n (default) | Specifies one or more single-byte or multibyte characters that separate records in the data file |
replace_invalid_characters | Optional | Boolean | Specifies whether to replace invalid UTF-8 characters with the Unicode replacement character |
skip_byte_order_mark | Optional | Boolean | Specifies whether to skip the BOM (byte order mark) if present in a data file |
skip_header | Optional | Integer: 0 (default) | Number of lines at the start of the file to skip |
skip_blank_lines | Optional | Boolean: FALSE (default) | Specifies to skip any blank lines encountered in the data files |
time_format | Optional | String: AUTO (default) | Format of time values in the data files |
timestamp_format | Optional | String: AUTO (default) | Format of timestamp values in the data files |
trim_space | Optional | Boolean: FALSE (default) | Specifies whether to remove white space from fields |
For more information, see the Snowflake Docs.
format_type
set to JSON
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
allow_duplicate | Optional | Boolean: FALSE (default) | Allows duplicate object field names |
binary_format | Optional | String: HEX (default), BASE64 , UTF8 | Defines the encoding format for binary input or output |
compression | Optional | String: AUTO (default), GZIP , BZ2 , BROTLI , ZSTD , DEFLATE , RAW_DEFLATE , NONE | Specifies the current compression algorithm for the data file |
date_format | Optional | String: AUTO (default) | Defines the format of date values in the data files |
deleted | Optional | Boolean: True enables deletion prevention, False does nothing | Specifies what objects are allowed to be deleted |
enable_octal | Optional | Boolean: FALSE (default) | Enables parsing of octal numbers |
file_extension | Optional | String | Specifies the extension for files unloaded to a stage |
ignore_utf8_errors | Optional | Boolean: FALSE (default) | Specifies whether UTF-8 encoding errors produce error conditions |
null_if | Optional | List/array of string: \\N (default) | Converts to and from SQL NULL. Note: Due to a parsing issue, double backslashes ( \\ ) are converted to single backslashes (\ ). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format. |
replace_invalid_characters | Optional | Boolean: FALSE (default) | Specifies whether to replace invalid UTF-8 characters with the Unicode replacement character |
strip_outer_array | Optional | Boolean: FALSE (default) | Tells the JSON parser to remove outer brackets |
skip_byte_order_mark | Optional | Boolean: FALSE (default) | specifies whether to skip the BOM (byte order mark) if present in a data file |
strip_null_values | Optional | Boolean: FALSE (default) | Tells the JSON parser to remove object fields or array elements containing null values |
time_format | Optional | String: AUTO (default) | Format of time values in the data files |
timestamp_format | Optional | String: AUTO (default) | Defines the format of timestamp values in the data files |
trim_space | Optional | Boolean: FALSE (default) | specifies whether to remove white space from fields |
For more information, see the Snowflake Docs.
format_type
set to AVRO
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
compression | Optional | String: AUTO , GZIP , BZ2 , BROTLI , ZSTD , DEFLATE , RAW_DEFLATE , NONE | Specifies the current compression algorithm for the data file |
null_if | Optional | List/array of string | Converts to and from SQL NULL |
trim_space | Optional | Boolean | specifies whether to remove white space from fields |
For more information, see the Snowflake Docs.
format_type
set to ORC
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
null_if | Optional | List/array of string | Converts to and from SQL NULL |
trim_space | Optional | Boolean | specifies whether to remove white space from fields |
For more information, see the Snowflake Docs.
format_type
set to PARQUET
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
binary_as_text | Optional | Boolean: TRUE (default) | Specifies whether to interpret columns with no defined logical data type as UTF-8 text |
compression | Optional | String: AUTO (default), LZO , SNAPPY , NONE | Specifies the current compression algorithm for the data file |
null_if | Optional | List/array of string: [] (default) | Converts to and from SQL NULL |
trim_space | Optional | Boolean: FALSE (default) | specifies whether to remove white space from fields |
For more information, see the Snowflake Docs.
format_type
set to XML
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
compression | Optional | String: AUTO , GZIP , BZ2 , BROTLI , ZSTD , DEFLATE , RAW_DEFLATE , NONE | Specifies the current compression algorithm for the data file |
disable_snowflake_data | Optional | Boolean | Specifies whether the XML parser disables recognition of Snowflake semi-structured data tags |
disable_auto_convert | Optional | Boolean | Specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation |
ignore_utf8_errors | Optional | Boolean | Specifies whether UTF-8 encoding errors produce error conditions |
preserve_space | Optional | Boolean | Specifies whether the XML parser preserves leading and trailing spaces in element content |
strip_outer_element | Optional | Boolean | Specifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents |
skip_byte_order_mark | Optional | Boolean | specifies whether to skip the BOM (byte order mark) if present in a data file |
For more information, see the Snowflake Docs.
Supported file format grants to roles
Following are the privileges you can grant to roles in the file format definition:
- ALL PRIVILEGES
- USAGE
- OWNERSHIP
When you define ALL PRIVILEGES in the SOLE configuration file, you grant all the privileges listed above to roles on this object except OWNERSHIP. However, the management of ALL PRIVILEGES in SOLE differs from its handling in Snowflake. See Handling ALL PRIVILEGES in SOLE for more information.
Examples
file_format
with required params
- Default Configuration
- Data Products Configuration
databases:
SALES_RECORD:
comment: "product sales record"
schemas:
SALES_RECORD_DAILY:
comment: "SALES_RECORD_DAILY"
file_formats:
FILE_FORMAT_1:
format_type: CSV
grants:
USAGE:
- ROLE_1
- file_format:
name: FILE_FORMAT_1
database: rel(database.SALES_RECORD)
schema: rel(schema.SALES_RECORD_DAILY)
format_type: CSV
grants:
USAGE:
- rel(role.ROLE_1)
file_format
with format-specific options
- Default Configuration
- Data Products Configuration
databases:
SALES_RECORD_ANNUAL:
comment: "Database For SALES_RECORD_ANNUAL"
schemas:
SALES_RECORD:
comment: "Schema one"
file_formats:
ANNUAL_REPORT_1:
format_type: CSV
compression: AUTO
binary_format: HEX
record_delimiter: "\n"
- file_format:
name: ANNUAL_REPORT_1
database: rel(database.SALES_RECORD_ANNUAL)
schema: rel(schema.SALES_RECORD)
format_type: CSV
compression: AUTO
binary_format: HEX
record_delimiter: "\n"
file_format
with all format-specific options
- Default Configuration
- Data Products Configuration
databases:
SALES_RECORD_ANNUAL:
comment: "Database For SALES_RECORD_ANNUAL"
schemas:
SALES_RECORD:
comment: "Schema one"
file_formats:
ANNUAL_REPORT_2:
format_type: JSON
compression: "AUTO"
binary_format: "HEX"
comment: "File format for type JSON"
trim_space: false
enable_octal: false
strip_outer_array: false
strip_null_values: false
replace_invalid_characters: false
ignore_utf8_errors: true
skip_byte_order_mark: false
allow_duplicate: true
date_format: "AUTO"
time_format: "AUTO"
timestamp_format: "AUTO"
file_extension: ".json"
null_if:
- "NULL"
- "null"
grants:
USAGE:
- ANALYST
- MARKETING
- file_format:
name: ANNUAL_REPORT_2
database: rel(database.SALES_RECORD_ANNUAL)
schema: rel(schema.SALES_RECORD)
format_type: JSON
compression: "AUTO"
binary_format: "HEX"
comment: "File format for type JSON"
trim_space: false
enable_octal: false
strip_outer_array: false
strip_null_values: false
replace_invalid_characters: false
ignore_utf8_errors: true
skip_byte_order_mark: false
allow_duplicate: true
date_format: "AUTO"
time_format: "AUTO"
timestamp_format: "AUTO"
file_extension: ".json"
null_if:
- "NULL"
- "null"
grants:
USAGE:
- rel(role.ANALYST)
- rel(role.MARKETING)