Skip to main content

File Format

You can provide configuration to Snowflake Object Lifecycle Engine for the following operations with file format:

  • Manage the lifecycle of new and existing file formats
  • Manage grants of file format

Usage

note

We have introduced SOLE for Data Products as a new framework for SOLE to help you easily build an ecosystem of data products. Learn more about SOLE for Data Products which is currently available as a private preview.

databases:
<database-name>:
schemas:
<schema-name>:
file_formats:
<format-name>:
<configuration-key>: <value>
grants:
<privilege>:
- <role-name>
- <role-name>

Supported parameters

The engine supports the parameters listed below.

Configuration KeyRequired/OptionalData Types and ValuesDescription
format_typeRequiredString: For info on format-specific options, see CSV, JSON, AVRO, ORC, PARQUET, and XMLSpecifies the format of the data files to load
commentOptionalStringSpecifies a comment for the file format
deletedOptionalBoolean: True enables deletion prevention, False does nothingSpecifies what objects are allowed to be deleted
grantsOptionalMap: See Supported file_format Grants to RolesList of Privileges and Roles to which privileges are granted on the current file format
manage_modeOptionalString: all (default), none, grantsConfigures what properties to manage for the file format.
See Changing Manage Mode before changing the value.

Additional parameters

format_type set to CSV

Configuration KeyRequired/OptionalData Types and ValuesDescription
binary_formatOptionalString: HEX (default), BASE64, UTF8Defines the encoding format for binary input or output
compressionOptionalString: AUTO (default), GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, NONESpecifies the current compression algorithm for the data file
date_formatOptionalString: AUTO (default)Defines the format of date values in the data files
encodingOptionalString: UTF8 (default)Specifies the character set of the source data when loading it into a table
empty_field_as_nullOptionalBoolean: TRUE (default)Specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters
error_on_column_count_mismatchOptionalBooleanSpecifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table
escapeOptionalString: NONE (default)Specifies the character set of the source data when loading it into a table
escape_unenclosed_fieldOptionalString: (\\) (default)Single character string used as the escape character for unenclosed field values only. Note: Due to a parsing issue, double backslashes (\\) are converted to single backslashes (\).
field_delimiterOptionalString: , (default)Specifies one or more single-byte or multibyte characters that separate fields in the data file
field_optionally_enclosed_byOptionalStringCharacter used to enclose strings
file_extensionOptionalStringSpecifies the extension for files unloaded to a stage
null_ifOptionalList/array of string: \\N (default)Converts to and from SQL NULL.
Note: Due to a parsing issue, double backslashes (\\) are converted to single backslashes (\). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format.
file_extensionOptionalStringSpecifies the extension for files unloaded to a stage
record_delimiterOptionalString: \n (default)Specifies one or more single-byte or multibyte characters that separate records in the data file
replace_invalid_charactersOptionalBooleanSpecifies whether to replace invalid UTF-8 characters with the Unicode replacement character
skip_byte_order_markOptionalBooleanSpecifies whether to skip the BOM (byte order mark) if present in a data file
skip_headerOptionalInteger: 0 (default)Number of lines at the start of the file to skip
skip_blank_linesOptionalBoolean: FALSE (default)Specifies to skip any blank lines encountered in the data files
time_formatOptionalString: AUTO (default)Format of time values in the data files
timestamp_formatOptionalString: AUTO (default)Format of timestamp values in the data files
trim_spaceOptionalBoolean: FALSE (default)Specifies whether to remove white space from fields
info

For more information, see the Snowflake Docs.

format_type set to JSON

Configuration KeyRequired/OptionalData Types and ValuesDescription
allow_duplicateOptionalBoolean: FALSE (default)Allows duplicate object field names
binary_formatOptionalString: HEX (default), BASE64, UTF8Defines the encoding format for binary input or output
compressionOptionalString: AUTO (default), GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, NONESpecifies the current compression algorithm for the data file
date_formatOptionalString: AUTO (default)Defines the format of date values in the data files
deletedOptionalBoolean: True enables deletion prevention, False does nothingSpecifies what objects are allowed to be deleted
enable_octalOptionalBoolean: FALSE (default)Enables parsing of octal numbers
file_extensionOptionalStringSpecifies the extension for files unloaded to a stage
ignore_utf8_errorsOptionalBoolean: FALSE (default)Specifies whether UTF-8 encoding errors produce error conditions
null_ifOptionalList/array of string: \\N (default)Converts to and from SQL NULL.
Note: Due to a parsing issue, double backslashes (\\) are converted to single backslashes (\). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format.
replace_invalid_charactersOptionalBoolean: FALSE (default)Specifies whether to replace invalid UTF-8 characters with the Unicode replacement character
strip_outer_arrayOptionalBoolean: FALSE (default)Tells the JSON parser to remove outer brackets
skip_byte_order_markOptionalBoolean: FALSE (default)specifies whether to skip the BOM (byte order mark) if present in a data file
strip_null_valuesOptionalBoolean: FALSE (default)Tells the JSON parser to remove object fields or array elements containing null values
time_formatOptionalString: AUTO (default)Format of time values in the data files
timestamp_formatOptionalString: AUTO (default)Defines the format of timestamp values in the data files
trim_spaceOptionalBoolean: FALSE (default)specifies whether to remove white space from fields
info

For more information, see the Snowflake Docs.

format_type set to AVRO

Configuration KeyRequired/OptionalData Types and ValuesDescription
compressionOptionalString: AUTO, GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, NONESpecifies the current compression algorithm for the data file
null_ifOptionalList/array of stringConverts to and from SQL NULL
trim_spaceOptionalBooleanspecifies whether to remove white space from fields
info

For more information, see the Snowflake Docs.

format_type set to ORC

Configuration KeyRequired/OptionalData Types and ValuesDescription
null_ifOptionalList/array of stringConverts to and from SQL NULL
trim_spaceOptionalBooleanspecifies whether to remove white space from fields
info

For more information, see the Snowflake Docs.

format_type set to PARQUET

Configuration KeyRequired/OptionalData Types and ValuesDescription
binary_as_textOptionalBoolean: TRUE (default)Specifies whether to interpret columns with no defined logical data type as UTF-8 text
compressionOptionalString: AUTO (default), LZO, SNAPPY, NONESpecifies the current compression algorithm for the data file
null_ifOptionalList/array of string: [] (default)Converts to and from SQL NULL
trim_spaceOptionalBoolean: FALSE (default)specifies whether to remove white space from fields
info

For more information, see the Snowflake Docs.

format_type set to XML

Configuration KeyRequired/OptionalData Types and ValuesDescription
compressionOptionalString: AUTO, GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, NONESpecifies the current compression algorithm for the data file
disable_snowflake_dataOptionalBooleanSpecifies whether the XML parser disables recognition of Snowflake semi-structured data tags
disable_auto_convertOptionalBooleanSpecifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation
ignore_utf8_errorsOptionalBooleanSpecifies whether UTF-8 encoding errors produce error conditions
preserve_spaceOptionalBooleanSpecifies whether the XML parser preserves leading and trailing spaces in element content
strip_outer_elementOptionalBooleanSpecifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents
skip_byte_order_markOptionalBooleanspecifies whether to skip the BOM (byte order mark) if present in a data file
info

For more information, see the Snowflake Docs.

Supported file format grants to roles

Following are the privileges you can grant to roles in the file format definition:

  • ALL PRIVILEGES
  • USAGE
  • OWNERSHIP

Examples

file_format with required params

databases:
SALES_RECORD:
comment: "product sales record"
schemas:
SALES_RECORD_DAILY:
comment: "SALES_RECORD_DAILY"
file_formats:
FILE_FORMAT_1:
format_type: CSV
grants:
USAGE:
- ROLE_1

file_format with format-specific options

databases:
SALES_RECORD_ANNUAL:
comment: "Database For SALES_RECORD_ANNUAL"
schemas:
SALES_RECORD:
comment: "Schema one"
file_formats:
ANNUAL_REPORT_1:
format_type: CSV
compression: AUTO
binary_format: HEX
record_delimiter: "\n"

file_format with all format-specific options

databases:
SALES_RECORD_ANNUAL:
comment: "Database For SALES_RECORD_ANNUAL"
schemas:
SALES_RECORD:
comment: "Schema one"
file_formats:
ANNUAL_REPORT_2:
format_type: JSON
compression: "AUTO"
binary_format: "HEX"
comment: "File format for type JSON"
trim_space: false
enable_octal: false
strip_outer_array: false
strip_null_values: false
replace_invalid_characters: false
ignore_utf8_errors: true
skip_byte_order_mark: false
allow_duplicate: true
date_format: "AUTO"
time_format: "AUTO"
timestamp_format: "AUTO"
file_extension: ".json"
null_if:
- "NULL"
- "null"
grants:
USAGE:
- ANALYST
- MARKETING