Skip to main content

File Format

You can provide configuration to Snowflake Object Lifecycle Engine for the following operations with file format:

  • Manage the lifecycle of new and existing file formats
  • Manage grants of file format

Usage

note

We have introduced SOLE for Data Products as a new framework for SOLE to help you easily build an ecosystem of data products. The major difference is in how you define Snowflake objects in the configuration file. Rather than having a grouped collection of objects, SOLE for Data Products goes for modular, self-describing, and explicit object definition.
Learn more about SOLE for Data Products, currently available as a public preview.

We have also introduced Data products as an extra layer on top of the data product platform capabilities making managing data products easier than ever. Learn more about Data Products, currently available as a private preview.

databases:
<database-name>:
schemas:
<schema-name>:
file_formats:
<format-name>:
<configuration-key>: <value>
grants:
<privilege>:
- <role-name>
- <role-name>

Supported parameters

The engine supports the parameters listed below.

Configuration KeyRequired/OptionalData Types and ValuesDescription
format_typeRequiredString: For info on format-specific options, see CSV, JSON, AVRO, ORC, PARQUET, and XMLSpecifies the format of the data files to load
commentOptionalStringSpecifies a comment for the file format
deletedOptionalBoolean: True enables deletion prevention, False does nothingSpecifies what objects are allowed to be deleted
grantsOptionalMap: See Supported file_format Grants to RolesList of Privileges and Roles to which privileges are granted on the current file format
manage_modeOptionalString: all (default), none, grantsConfigures what properties to manage for the file format.
See Changing Manage Mode before changing the value.

Additional parameters

format_type set to CSV

Configuration KeyRequired/OptionalData Types and ValuesDescription
binary_formatOptionalString: HEX (default), BASE64, UTF8Defines the encoding format for binary input or output
compressionOptionalString: AUTO (default), GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, NONESpecifies the current compression algorithm for the data file
date_formatOptionalString: AUTO (default)Defines the format of date values in the data files
encodingOptionalString: UTF8 (default)Specifies the character set of the source data when loading it into a table
empty_field_as_nullOptionalBoolean: TRUE (default)Specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters
error_on_column_count_mismatchOptionalBooleanSpecifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table
escapeOptionalString: NONE (default)Specifies the character set of the source data when loading it into a table
escape_unenclosed_fieldOptionalString: (\\) (default)Single character string used as the escape character for unenclosed field values only. Note: Due to a parsing issue, double backslashes (\\) are converted to single backslashes (\).
field_delimiterOptionalString: , (default)Specifies one or more single-byte or multibyte characters that separate fields in the data file
field_optionally_enclosed_byOptionalStringCharacter used to enclose strings
file_extensionOptionalStringSpecifies the extension for files unloaded to a stage
null_ifOptionalList/array of string: \\N (default)Converts to and from SQL NULL.
Note: Due to a parsing issue, double backslashes (\\) are converted to single backslashes (\). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format.
file_extensionOptionalStringSpecifies the extension for files unloaded to a stage
record_delimiterOptionalString: \n (default)Specifies one or more single-byte or multibyte characters that separate records in the data file
replace_invalid_charactersOptionalBooleanSpecifies whether to replace invalid UTF-8 characters with the Unicode replacement character
skip_byte_order_markOptionalBooleanSpecifies whether to skip the BOM (byte order mark) if present in a data file
skip_headerOptionalInteger: 0 (default)Number of lines at the start of the file to skip
skip_blank_linesOptionalBoolean: FALSE (default)Specifies to skip any blank lines encountered in the data files
time_formatOptionalString: AUTO (default)Format of time values in the data files
timestamp_formatOptionalString: AUTO (default)Format of timestamp values in the data files
trim_spaceOptionalBoolean: FALSE (default)Specifies whether to remove white space from fields
info

For more information, see the Snowflake Docs.

format_type set to JSON

Configuration KeyRequired/OptionalData Types and ValuesDescription
allow_duplicateOptionalBoolean: FALSE (default)Allows duplicate object field names
binary_formatOptionalString: HEX (default), BASE64, UTF8Defines the encoding format for binary input or output
compressionOptionalString: AUTO (default), GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, NONESpecifies the current compression algorithm for the data file
date_formatOptionalString: AUTO (default)Defines the format of date values in the data files
deletedOptionalBoolean: True enables deletion prevention, False does nothingSpecifies what objects are allowed to be deleted
enable_octalOptionalBoolean: FALSE (default)Enables parsing of octal numbers
file_extensionOptionalStringSpecifies the extension for files unloaded to a stage
ignore_utf8_errorsOptionalBoolean: FALSE (default)Specifies whether UTF-8 encoding errors produce error conditions
null_ifOptionalList/array of string: \\N (default)Converts to and from SQL NULL.
Note: Due to a parsing issue, double backslashes (\\) are converted to single backslashes (\). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format.
replace_invalid_charactersOptionalBoolean: FALSE (default)Specifies whether to replace invalid UTF-8 characters with the Unicode replacement character
strip_outer_arrayOptionalBoolean: FALSE (default)Tells the JSON parser to remove outer brackets
skip_byte_order_markOptionalBoolean: FALSE (default)specifies whether to skip the BOM (byte order mark) if present in a data file
strip_null_valuesOptionalBoolean: FALSE (default)Tells the JSON parser to remove object fields or array elements containing null values
time_formatOptionalString: AUTO (default)Format of time values in the data files
timestamp_formatOptionalString: AUTO (default)Defines the format of timestamp values in the data files
trim_spaceOptionalBoolean: FALSE (default)specifies whether to remove white space from fields
info

For more information, see the Snowflake Docs.

format_type set to AVRO

Configuration KeyRequired/OptionalData Types and ValuesDescription
compressionOptionalString: AUTO, GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, NONESpecifies the current compression algorithm for the data file
null_ifOptionalList/array of stringConverts to and from SQL NULL
trim_spaceOptionalBooleanspecifies whether to remove white space from fields
info

For more information, see the Snowflake Docs.

format_type set to ORC

Configuration KeyRequired/OptionalData Types and ValuesDescription
null_ifOptionalList/array of stringConverts to and from SQL NULL
trim_spaceOptionalBooleanspecifies whether to remove white space from fields
info

For more information, see the Snowflake Docs.

format_type set to PARQUET

Configuration KeyRequired/OptionalData Types and ValuesDescription
binary_as_textOptionalBoolean: TRUE (default)Specifies whether to interpret columns with no defined logical data type as UTF-8 text
compressionOptionalString: AUTO (default), LZO, SNAPPY, NONESpecifies the current compression algorithm for the data file
null_ifOptionalList/array of string: [] (default)Converts to and from SQL NULL
trim_spaceOptionalBoolean: FALSE (default)specifies whether to remove white space from fields
info

For more information, see the Snowflake Docs.

format_type set to XML

Configuration KeyRequired/OptionalData Types and ValuesDescription
compressionOptionalString: AUTO, GZIP, BZ2, BROTLI, ZSTD, DEFLATE, RAW_DEFLATE, NONESpecifies the current compression algorithm for the data file
disable_snowflake_dataOptionalBooleanSpecifies whether the XML parser disables recognition of Snowflake semi-structured data tags
disable_auto_convertOptionalBooleanSpecifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation
ignore_utf8_errorsOptionalBooleanSpecifies whether UTF-8 encoding errors produce error conditions
preserve_spaceOptionalBooleanSpecifies whether the XML parser preserves leading and trailing spaces in element content
strip_outer_elementOptionalBooleanSpecifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents
skip_byte_order_markOptionalBooleanspecifies whether to skip the BOM (byte order mark) if present in a data file
info

For more information, see the Snowflake Docs.

Supported file format grants to roles

Following are the privileges you can grant to roles in the file format definition:

  • ALL PRIVILEGES
  • USAGE
  • OWNERSHIP
ALL PRIVILEGES handling

When you define ALL PRIVILEGES in the SOLE configuration file, you grant all the privileges listed above to roles on this object except OWNERSHIP. However, the management of ALL PRIVILEGES in SOLE differs from its handling in Snowflake. See Handling ALL PRIVILEGES in SOLE for more information.

Examples

file_format with required params

databases:
SALES_RECORD:
comment: "product sales record"
schemas:
SALES_RECORD_DAILY:
comment: "SALES_RECORD_DAILY"
file_formats:
FILE_FORMAT_1:
format_type: CSV
grants:
USAGE:
- ROLE_1

file_format with format-specific options

databases:
SALES_RECORD_ANNUAL:
comment: "Database For SALES_RECORD_ANNUAL"
schemas:
SALES_RECORD:
comment: "Schema one"
file_formats:
ANNUAL_REPORT_1:
format_type: CSV
compression: AUTO
binary_format: HEX
record_delimiter: "\n"

file_format with all format-specific options

databases:
SALES_RECORD_ANNUAL:
comment: "Database For SALES_RECORD_ANNUAL"
schemas:
SALES_RECORD:
comment: "Schema one"
file_formats:
ANNUAL_REPORT_2:
format_type: JSON
compression: "AUTO"
binary_format: "HEX"
comment: "File format for type JSON"
trim_space: false
enable_octal: false
strip_outer_array: false
strip_null_values: false
replace_invalid_characters: false
ignore_utf8_errors: true
skip_byte_order_mark: false
allow_duplicate: true
date_format: "AUTO"
time_format: "AUTO"
timestamp_format: "AUTO"
file_extension: ".json"
null_if:
- "NULL"
- "null"
grants:
USAGE:
- ANALYST
- MARKETING