File Format
You can provide configuration to Snowflake Object Lifecycle Engine for the following operations with file format:
- Manage the lifecycle of new and existing file formats
- Manage grants of file format
Usage
note
We have introduced SOLE for Data Products as a new framework for SOLE to help you easily build an ecosystem of data products. Learn more about SOLE for Data Products which is currently available as a private preview.
- File Format in Current Configuration
- File Format in SOLE for Data Products
databases:
<database-name>:
schemas:
<schema-name>:
file_formats:
<format-name>:
<configuration-key>: <value>
grants:
<privilege>:
- <role-name>
- <role-name>
- file_format:
name: <format-name>
<configuration-key>: <value>
database: rel(database.<database-name>)
schema: rel(schema.<schema-name>)
grants:
<privilege>:
- rel(role.<role-name>)
- rel(role.<role-name>)
Supported parameters
The engine supports the parameters listed below.
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
format_type | Required | String: For info on format-specific options, see CSV, JSON, AVRO, ORC, PARQUET, and XML | Specifies the format of the data files to load |
comment | Optional | String | Specifies a comment for the file format |
deleted | Optional | Boolean: True enables deletion prevention, False does nothing | Specifies what objects are allowed to be deleted |
grants | Optional | Map: See Supported file_format Grants to Roles | List of Privileges and Roles to which privileges are granted on the current file format |
manage_mode | Optional | String: all (default), none , grants | Configures what properties to manage for the file format. See Changing Manage Mode before changing the value. |
Additional parameters
format_type
set to CSV
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
binary_format | Optional | String: HEX (default), BASE64 , UTF8 | Defines the encoding format for binary input or output |
compression | Optional | String: AUTO (default), GZIP , BZ2 , BROTLI , ZSTD , DEFLATE , RAW_DEFLATE , NONE | Specifies the current compression algorithm for the data file |
date_format | Optional | String: AUTO (default) | Defines the format of date values in the data files |
encoding | Optional | String: UTF8 (default) | Specifies the character set of the source data when loading it into a table |
empty_field_as_null | Optional | Boolean: TRUE (default) | Specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters |
error_on_column_count_mismatch | Optional | Boolean | Specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table |
escape | Optional | String: NONE (default) | Specifies the character set of the source data when loading it into a table |
escape_unenclosed_field | Optional | String: (\\) (default) | Single character string used as the escape character for unenclosed field values only. Note: Due to a parsing issue, double backslashes (\\ ) are converted to single backslashes (\ ). |
field_delimiter | Optional | String: , (default) | Specifies one or more single-byte or multibyte characters that separate fields in the data file |
field_optionally_enclosed_by | Optional | String | Character used to enclose strings |
file_extension | Optional | String | Specifies the extension for files unloaded to a stage |
null_if | Optional | List/array of string: \\N (default) | Converts to and from SQL NULL. Note: Due to a parsing issue, double backslashes ( \\ ) are converted to single backslashes (\ ). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format. |
file_extension | Optional | String | Specifies the extension for files unloaded to a stage |
record_delimiter | Optional | String: \n (default) | Specifies one or more single-byte or multibyte characters that separate records in the data file |
replace_invalid_characters | Optional | Boolean | Specifies whether to replace invalid UTF-8 characters with the Unicode replacement character |
skip_byte_order_mark | Optional | Boolean | Specifies whether to skip the BOM (byte order mark) if present in a data file |
skip_header | Optional | Integer: 0 (default) | Number of lines at the start of the file to skip |
skip_blank_lines | Optional | Boolean: FALSE (default) | Specifies to skip any blank lines encountered in the data files |
time_format | Optional | String: AUTO (default) | Format of time values in the data files |
timestamp_format | Optional | String: AUTO (default) | Format of timestamp values in the data files |
trim_space | Optional | Boolean: FALSE (default) | Specifies whether to remove white space from fields |
info
For more information, see the Snowflake Docs.
format_type
set to JSON
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
allow_duplicate | Optional | Boolean: FALSE (default) | Allows duplicate object field names |
binary_format | Optional | String: HEX (default), BASE64 , UTF8 | Defines the encoding format for binary input or output |
compression | Optional | String: AUTO (default), GZIP , BZ2 , BROTLI , ZSTD , DEFLATE , RAW_DEFLATE , NONE | Specifies the current compression algorithm for the data file |
date_format | Optional | String: AUTO (default) | Defines the format of date values in the data files |
deleted | Optional | Boolean: True enables deletion prevention, False does nothing | Specifies what objects are allowed to be deleted |
enable_octal | Optional | Boolean: FALSE (default) | Enables parsing of octal numbers |
file_extension | Optional | String | Specifies the extension for files unloaded to a stage |
ignore_utf8_errors | Optional | Boolean: FALSE (default) | Specifies whether UTF-8 encoding errors produce error conditions |
null_if | Optional | List/array of string: \\N (default) | Converts to and from SQL NULL. Note: Due to a parsing issue, double backslashes ( \\ ) are converted to single backslashes (\ ). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format. |
replace_invalid_characters | Optional | Boolean: FALSE (default) | Specifies whether to replace invalid UTF-8 characters with the Unicode replacement character |
strip_outer_array | Optional | Boolean: FALSE (default) | Tells the JSON parser to remove outer brackets |
skip_byte_order_mark | Optional | Boolean: FALSE (default) | specifies whether to skip the BOM (byte order mark) if present in a data file |
strip_null_values | Optional | Boolean: FALSE (default) | Tells the JSON parser to remove object fields or array elements containing null values |
time_format | Optional | String: AUTO (default) | Format of time values in the data files |
timestamp_format | Optional | String: AUTO (default) | Defines the format of timestamp values in the data files |
trim_space | Optional | Boolean: FALSE (default) | specifies whether to remove white space from fields |
info
For more information, see the Snowflake Docs.
format_type
set to AVRO
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
compression | Optional | String: AUTO , GZIP , BZ2 , BROTLI , ZSTD , DEFLATE , RAW_DEFLATE , NONE | Specifies the current compression algorithm for the data file |
null_if | Optional | List/array of string | Converts to and from SQL NULL |
trim_space | Optional | Boolean | specifies whether to remove white space from fields |
info
For more information, see the Snowflake Docs.
format_type
set to ORC
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
null_if | Optional | List/array of string | Converts to and from SQL NULL |
trim_space | Optional | Boolean | specifies whether to remove white space from fields |
info
For more information, see the Snowflake Docs.
format_type
set to PARQUET
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
binary_as_text | Optional | Boolean: TRUE (default) | Specifies whether to interpret columns with no defined logical data type as UTF-8 text |
compression | Optional | String: AUTO (default), LZO , SNAPPY , NONE | Specifies the current compression algorithm for the data file |
null_if | Optional | List/array of string: [] (default) | Converts to and from SQL NULL |
trim_space | Optional | Boolean: FALSE (default) | specifies whether to remove white space from fields |
info
For more information, see the Snowflake Docs.
format_type
set to XML
Configuration Key | Required/Optional | Data Types and Values | Description |
---|---|---|---|
compression | Optional | String: AUTO , GZIP , BZ2 , BROTLI , ZSTD , DEFLATE , RAW_DEFLATE , NONE | Specifies the current compression algorithm for the data file |
disable_snowflake_data | Optional | Boolean | Specifies whether the XML parser disables recognition of Snowflake semi-structured data tags |
disable_auto_convert | Optional | Boolean | Specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation |
ignore_utf8_errors | Optional | Boolean | Specifies whether UTF-8 encoding errors produce error conditions |
preserve_space | Optional | Boolean | Specifies whether the XML parser preserves leading and trailing spaces in element content |
strip_outer_element | Optional | Boolean | Specifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents |
skip_byte_order_mark | Optional | Boolean | specifies whether to skip the BOM (byte order mark) if present in a data file |
info
For more information, see the Snowflake Docs.
Supported file format grants to roles
Following are the privileges you can grant to roles in the file format definition:
- ALL PRIVILEGES
- USAGE
- OWNERSHIP
Examples
file_format
with required params
- File Format in Current Configuration
- File Format in SOLE for Data Products
databases:
SALES_RECORD:
comment: "product sales record"
schemas:
SALES_RECORD_DAILY:
comment: "SALES_RECORD_DAILY"
file_formats:
FILE_FORMAT_1:
format_type: CSV
grants:
USAGE:
- ROLE_1
- file_format:
name: FILE_FORMAT_1
database: rel(database.SALES_RECORD)
schema: rel(schema.SALES_RECORD_DAILY)
format_type: CSV
grants:
USAGE:
- rel(role.ROLE_1)
file_format
with format-specific options
- File Format in Current Configuration
- File Format in SOLE for Data Products
databases:
SALES_RECORD_ANNUAL:
comment: "Database For SALES_RECORD_ANNUAL"
schemas:
SALES_RECORD:
comment: "Schema one"
file_formats:
ANNUAL_REPORT_1:
format_type: CSV
compression: AUTO
binary_format: HEX
record_delimiter: "\n"
- file_format:
name: ANNUAL_REPORT_1
database: rel(database.SALES_RECORD_ANNUAL)
schema: rel(schema.SALES_RECORD)
format_type: CSV
compression: AUTO
binary_format: HEX
record_delimiter: "\n"
file_format
with all format-specific options
- File Format in Current Configuration
- File Format in SOLE for Data Products
databases:
SALES_RECORD_ANNUAL:
comment: "Database For SALES_RECORD_ANNUAL"
schemas:
SALES_RECORD:
comment: "Schema one"
file_formats:
ANNUAL_REPORT_2:
format_type: JSON
compression: "AUTO"
binary_format: "HEX"
comment: "File format for type JSON"
trim_space: false
enable_octal: false
strip_outer_array: false
strip_null_values: false
replace_invalid_characters: false
ignore_utf8_errors: true
skip_byte_order_mark: false
allow_duplicate: true
date_format: "AUTO"
time_format: "AUTO"
timestamp_format: "AUTO"
file_extension: ".json"
null_if:
- "NULL"
- "null"
grants:
USAGE:
- ANALYST
- MARKETING
- file_format:
name: ANNUAL_REPORT_2
database: rel(database.SALES_RECORD_ANNUAL)
schema: rel(schema.SALES_RECORD)
format_type: JSON
compression: "AUTO"
binary_format: "HEX"
comment: "File format for type JSON"
trim_space: false
enable_octal: false
strip_outer_array: false
strip_null_values: false
replace_invalid_characters: false
ignore_utf8_errors: true
skip_byte_order_mark: false
allow_duplicate: true
date_format: "AUTO"
time_format: "AUTO"
timestamp_format: "AUTO"
file_extension: ".json"
null_if:
- "NULL"
- "null"
grants:
USAGE:
- rel(role.ANALYST)
- rel(role.MARKETING)