Skip to main content

File Format

Configuration can be provided to Snowflake Object Lifecycle Engine for the following operation with File Format:

  • Manage the Lifecycle of new and existing File Format
  • Manage Grants of File Format

Supported Parameters

The engine supports the parameters listed below.

  • COMMENT: Specifies a comment for the file format.
    • Configuration Key: comment
    • Data Type: String
  • FORMAT_TYPE: Specifies the format of the data files to load.
  • MANAGE_MODE: Configures what properties to manage for the file format.
    • Configuration key: manage_mode
    • Data Type: String
    • Possible Values:
      • none
      • grants
      • all(Default)
  • GRANTS: List of Privileges and Roles to which privileges are granted to on the current File Format.
    • Configuration key: grants
    • Data Type: Map

Additional Parameters

FORMAT_TYPE set to CSV

  • BINARY_FORMAT: Defines the encoding format for binary input or output.

    • Configuration Key: binary_format
    • Data Type: String
    • Possible Values:
      • HEX (Default)
      • BASE64
      • UTF8
  • COMPRESSION: Specifies the current compression algorithm for the data file.

    • Configuration Key: compression
    • Data Type: String
    • Possible Values:
      • AUTO (Default)
      • GZIP
      • BZ2
      • BROTLI
      • ZSTD
      • DEFLATE
      • RAW_DEFLATE
      • NONE
  • DATE_FORMAT: Defines the format of date values in the data files.

    • Configuration Key: date_format
    • Data Type: String
    • Default: AUTO
  • ENCODING: String (constant) that specifies the character set of the source data when loading data into a table.

    • Configuration Key: encoding
    • Data Type: String
    • Default: UTF8
  • EMPTY_FIELD_AS_NULL: Specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters.

    • Configuration Key: empty_field_as_null
    • Data Type: Boolean
    • Default: TRUE
  • ERROR_ON_COLUMN_COUNT_MISMATCH: Boolean that specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table.

    • Configuration Key: error_on_column_count_mismatch
    • Data Type: Boolean
  • ESCAPE: Single character string used as the escape character for field values.

    • Configuration Key: escape
    • Data Type: String
    • Default: NONE
  • ESCAPE_UNENCLOSED_FIELD: Single character string used as the escape character for unenclosed field values only.

    • Configuration Key: escape_unenclosed_field

    • Data Type: String

    • Default: (\\)

      note

      Due to a parsing issue, double backslashes (\\) are converted to single backslashes (\). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format.

  • FIELD_DELIMITER: Specifies one or more single-byte or multibyte characters that separate fields in the data file.

    • Configuration Key: field_delimiter
    • Data Type: String
    • Default: ,
  • FIELD_OPTIONALLY_ENCLOSED_BY: Character used to enclose strings.

    • Configuration Key: field_optionally_enclosed_by
    • Data Type: String
  • FILE_EXTENSION: Specifies the extension for files unloaded to a stage.

    • Configuration Key: file_extension
    • Data Type: String
  • NULL_IF: String used to convert to and from SQL NULL.

    • Configuration Key: null_if

    • Data Type: List/array of string

    • Default: \\N

      note

      Due to a parsing issue, double backslashes (\\) are converted to single backslashes (\). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format.

  • RECORD_DELIMITER: Specifies one or more single-byte or multibyte characters that separate records in the data file.

    • Configuration Key: record_delimiter
    • Data Type: String
    • Default: \n
  • REPLACE_INVALID_CHARACTERS: Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.

    • Configuration Key: replace_invalid_characters
    • Data Type: Boolean
  • SKIP_BYTE_ORDER_MARK: Boolean that specifies whether to skip the BOM (byte order mark) if present in a data file.

    • Configuration Key: skip_byte_order_mark
    • Data Type: Boolean
  • SKIP_HEADER: Number of lines at the start of the file to skip.

    • Configuration Key: skip_header
    • Data Type: Integer
    • Default: 0
  • SKIP_BLANK_LINES: Boolean that specifies to skip any blank lines encountered in the data files.

    • Configuration Key: skip_blank_lines
    • Data Type: Boolean
    • Default: FALSE
  • TIME_FORMAT: The format of time values in the data files.

    • Configuration Key: time_format
    • Data Type: String
    • Default: AUTO
  • TIMESTAMP_FORMAT: Defines the format of timestamp values in the data files.

    • Configuration Key: timestamp_format
    • Data Type: String
    • Default: AUTO
  • TRIM_SPACE: Boolean that specifies whether to remove white space from fields.

    • Configuration Key: trim_space
    • Data Type: Boolean
    • Default: FALSE
  • VALIDATE_UTF8: Boolean that specifies whether to validate UTF-8 character encoding in string column data.

    • Configuration Key: validate_utf8
    • Data Type: Boolean
    • Default: TRUE
    VALIDATE_UTF8 only accepts TRUE

    This parameter only supports TRUE as value. Using FALSE would result in compiler failure. This is due to the fact FALSE value has been made obsolete by Snowflake as of release 2022_03.

info

For more information, See Snowflake Docs.

FORMAT_TYPE set to JSON

  • ALLOW_DUPLICATE: Boolean that specifies to allow duplicate object field names

    • Configuration Key: allow_duplicate
    • Data Type: Boolean
    • Default: FALSE
  • BINARY_FORMAT: Defines the encoding format for binary input or output.

    • Configuration Key: binary_format
    • Data Type: String
    • Possible Values:
      • HEX (Default)
      • BASE64
      • UTF8
  • COMPRESSION: Specifies the current compression algorithm for the data file.

    • Configuration Key: compression
    • Data Type: String
    • Possible Values:
      • AUTO (Default)
      • GZIP
      • BZ2
      • BROTLI
      • ZSTD
      • DEFLATE
      • RAW_DEFLATE
      • NONE
  • DATE_FORMAT: Defines the format of date values in the data files.

    • Configuration Key: date_format
    • Data Type: String
    • Default: AUTO
  • ENABLE_OCTAL: Boolean that enables parsing of octal numbers.

    • Configuration Key: enable_octal
    • Data Type: Boolean
    • Default: FALSE
  • FILE_EXTENSION: Specifies the extension for files unloaded to a stage.

    • Configuration Key: file_extension
    • Data Type: String
  • IGNORE_UTF8_ERRORS: Boolean that specifies whether UTF-8 encoding errors produce error conditions.

    • Configuration Key: ignore_utf8_errors
    • Data Type: Boolean
    • Default: FALSE
  • NULL_IF: String used to convert to and from SQL NULL.

    • Configuration Key: null_if

    • Data Type: List/array of string

    • Default: \\N

      note

      Due to a parsing issue, double backslashes (\\) are converted to single backslashes (\). This results in DataOps always detecting a change between the target configuration and Snowflake and applying an update to the file format.

  • REPLACE_INVALID_CHARACTERS: Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.

    • Configuration Key: replace_invalid_characters
    • Data Type: Boolean
    • Default: FALSE
  • STRIP_OUTER_ARRAY: Boolean that instructs the JSON parser to remove outer brackets.

    • Configuration Key: strip_outer_array
    • Data Type: Boolean
    • Default: FALSE
  • SKIP_BYTE_ORDER_MARK: Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file.

    • Configuration Key: skip_byte_order_mark
    • Data Type: Boolean
    • Default: FALSE
  • STRIP_NULL_VALUES: Boolean that instructs the JSON parser to remove object fields or array elements containing null values.

    • Configuration Key: strip_null_values
    • Data Type: Boolean
    • Default: FALSE
  • TIME_FORMAT: The format of time values in the data files.

    • Configuration Key: time_format
    • Data Type: String
    • Default: AUTO
  • TIMESTAMP_FORMAT: Defines the format of timestamp values in the data files.

    • Configuration Key: timestamp_format
    • Data Type: String
    • Default: AUTO
  • TRIM_SPACE: Boolean that specifies whether to remove white space from fields.

    • Configuration Key: trim_space
    • Data Type: Boolean
    • Default: FALSE
info

For more information, See Snowflake Docs.

FORMAT_TYPE set to AVRO

  • COMPRESSION: Specifies the current compression algorithm for the data file.
    • Configuration Key: compression
    • Data Type: String
    • Possible Values:
      • AUTO
      • GZIP
      • BZ2
      • BROTLI
      • ZSTD
      • DEFLATE
      • RAW_DEFLATE
      • NONE
  • NULL_IF: String used to convert to and from SQL NULL.
    • Configuration Key: null_if
    • Data Type: List/array of string
  • TRIM_SPACE: Boolean that specifies whether to remove white space from fields.
    • Configuration Key: trim_space
    • Data Type: Boolean
info

For more information, See Snowflake Docs.

FORMAT_TYPE set to ORC

  • NULL_IF: String used to convert to and from SQL NULL.
    • Configuration Key: null_if
    • Data Type: List/array of string
  • TRIM_SPACE: Boolean that specifies whether to remove white space from fields.
    • Configuration Key: trim_space
    • Data Type: Boolean
info

For more information, See Snowflake Docs.

FORMAT_TYPE set to PARQUET

  • COMPRESSION: Specifies the current compression algorithm for the data file.
    • Configuration Key: compression
    • Data Type: String
    • Possible Values:
      • AUTO (Default)
      • LZO
      • SNAPPY
      • NONE
  • BINARY_AS_TEXT: Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text.
    • Configuration Key: binary_as_text
    • Data Type: Boolean
    • Default: True
  • NULL_IF: String used to convert to and from SQL NULL.
    • Configuration Key: null_if
    • Data Type: List/array of string
    • Default: []
  • TRIM_SPACE: Boolean that specifies whether to remove white space from fields.
    • Configuration Key: trim_space
    • Data Type: Boolean
    • Default: False
info

For more information, See Snowflake Docs.

FORMAT_TYPE set to XML

  • COMPRESSION: Specifies the current compression algorithm for the data file.
    • Configuration Key: compression
    • Data Type: String
    • Possible Values:
      • AUTO
      • GZIP
      • BZ2
      • BROTLI
      • ZSTD
      • DEFLATE
      • RAW_DEFLATE
      • NONE
  • DISABLE_SNOWFLAKE_DATA: Boolean that specifies whether the XML parser disables recognition of Snowflake semi-structured data tags.
    • Configuration Key: disable_snowflake_data
    • Data Type: Boolean
  • DISABLE_AUTO_CONVERT: Boolean that specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation.
    • Configuration Key: disable_auto_convert
    • Data Type: Boolean
  • IGNORE_UTF8_ERRORS: Boolean that specifies whether UTF-8 encoding errors produce error conditions.
    • Configuration Key: ignore_utf8_errors
    • Data Type: Boolean
  • PRESERVE_SPACE: Boolean that specifies whether the XML parser preserves leading and trailing spaces in element content.
    • Configuration Key: preserve_space
    • Data Type: Boolean
  • STRIP_OUTER_ELEMENT: Boolean that specifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents.
    • Configuration Key: strip_outer_element
    • Data Type: Boolean
  • SKIP_BYTE_ORDER_MARK: Boolean that specifies whether to skip the BOM (byte order mark) if present in a data file.
    • Configuration Key: skip_byte_order_mark
    • Data Type: Boolean
info

For more information, See Snowflake Docs.

Basic syntax

databases:
<database-name>:
schemas:
<schema-name>:
file_formats:
<format-name>:
<configuration-key>: <value>
grants:
<privilege>:
- <role-name>
- <role-name>

Supported file_format Grants to Roles

Following is the list of Privileges Grant to Roles that can be specified in the file_format definition

  • ALL PRIVILEGES
  • USAGE
  • OWNERSHIP

Examples

databases:
SALES_RECORD:
comment: "product sales record"
schemas:
SALES_RECORD_DAILY:
comment: "SALES_RECORD_DAILY"
file_formats:
SALES_RECORD_:
format_type: CSV
grants:
USAGE:
- ROLE_1