How to Change the Column Datatype in a Table
If the table in Snowflake contains data, changing the datatype of a column requires additional consideration. You must ensure that you can successfully convert the data in the column to the new type without errors or loss of information.
Here's a general approach for changing the column datatype in a table that already has data in Snowflake (a minimal SQL sketch follows the list):
- Create a new table with the desired schema, including the updated datatype for the column.
- Copy the data from the original table to the new table, performing any necessary datatype conversions or transformations during the process.
- Drop the original table.
- Rename the new table to the original table name.
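Outside of SOLE, the same pattern can be sketched directly in SQL. The table, columns, and target type below (MY_TABLE, AMOUNT, CREATED_AT, NUMBER(10, 2)) are hypothetical placeholders; adapt them to your own schema:

-- Minimal sketch of the general approach; all object names are placeholders.

-- 1. Create a new table with the desired schema, converting the column as the
--    data is copied across.
CREATE TABLE MY_TABLE_NEW AS
SELECT
    CAST(AMOUNT AS NUMBER(10, 2)) AS AMOUNT,  -- column with the new datatype
    CREATED_AT                                -- remaining columns copied unchanged
FROM MY_TABLE;

-- 2. Verify row counts (and spot-check values) before removing the original.
SELECT
    (SELECT COUNT(*) FROM MY_TABLE)     AS original_rows,
    (SELECT COUNT(*) FROM MY_TABLE_NEW) AS new_rows;

-- 3. Drop the original table and rename the new one to the original name.
DROP TABLE MY_TABLE;
ALTER TABLE MY_TABLE_NEW RENAME TO MY_TABLE;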
Let's illustrate this with an example using SOLE (Snowflake Object Lifecycle Engine). Although SOLE doesn't support copying data from one table into another directly, you can achieve this by adding hooks to the SOLE configuration. The solution uses internal stages, backup tables, and hooks in SOLE, as described in the following steps:
Step 1 - Create a backup table and internal stage, and load data into the backup table
a. Add configuration for a backup table
To create a backup table, copy the configuration of the existing table and add _BACKUP as a suffix to the table name.
tables:
  TABLE_DATA_TYPE_CHANGE:
    columns:
      START_DATE:
        type: DATE
      END_DATE:
        type: DATE
  TABLE_DATA_TYPE_CHANGE_BACKUP:
    columns:
      START_DATE:
        type: DATE
      END_DATE:
        type: DATE
b. Add configuration for an internal stage
You can use an internal stage to load data from one table into another. To create an internal stage, add the following configuration in the SANDBOX schema:
SANDBOX:
  tables:
    TABLE_DATA_TYPE_CHANGE:
      columns:
        START_DATE:
          type: DATE
        END_DATE:
          type: DATE
    TABLE_DATA_TYPE_CHANGE_BACKUP:
      columns:
        START_DATE:
          type: DATE
        END_DATE:
          type: DATE
  stages:
    DATA_LOAD:
      comment: This stage is used to move the data from backup to the main table.
c. Add hooks for data load and unload
You can load data into a stage and then into the backup table using database-level post_hooks, which run once all database-level objects are created. Add the post_hooks by creating a new file, hooks.template.yml, in the dataops/snowflake directory with the following content:
database_level_hooks:
  post_hooks:
    - command: "COPY INTO @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD/result/data_ FROM (SELECT * FROM {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE) file_format=(compression='gzip') OVERWRITE=TRUE;"
      environment: snowflake
    - command: "COPY INTO {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP FROM @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD;"
      environment: snowflake
d. Commit and run a pipeline
Push the changes and run a pipeline once you have added all objects and hooks. The SOLE job performs the following:
- Create the new table TABLE_DATA_TYPE_CHANGE_BACKUP
- Create the new internal stage DATA_LOAD
- Load data from the original table TABLE_DATA_TYPE_CHANGE into the stage DATA_LOAD
- Load data from the internal stage into the backup table
Once the job is complete, check in Snowflake that the new table has been created and that the data has been loaded into it.
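For example, you can run a quick check in a Snowflake worksheet. MY_DATABASE below is a placeholder for the database built by your pipeline (the value of DATAOPS_DATABASE):

-- Compare row counts between the original table and the backup.
SELECT
    (SELECT COUNT(*) FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE)        AS original_rows,
    (SELECT COUNT(*) FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP) AS backup_rows;

-- Optionally confirm that the two tables contain identical rows.
SELECT * FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE
MINUS
SELECT * FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP;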
Step 2 - Create a new table with an updated datatype and drop the old table
After ensuring the data is safe in the backup table, you can create a new table with the updated type and drop the old one.
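In this example the conversion (DATE to VARCHAR) always succeeds, but if your conversion can fail (for example VARCHAR to DATE or NUMBER), it is worth confirming that every value converts cleanly before the old table is dropped. A sketch, assuming a hypothetical column SOME_COLUMN being converted to DATE and MY_DATABASE as a placeholder database name:

-- Count values in the backup that would not convert; 0 means the cast is safe.
SELECT COUNT(*) AS unconvertible_rows
FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP
WHERE SOME_COLUMN IS NOT NULL
  AND TRY_CAST(SOME_COLUMN AS DATE) IS NULL;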
a. Create a new table with an updated datatype
You can now replace the original table configuration with one that uses the updated datatype, adding the suffix _NEW to the table name. SOLE creates a new table in Snowflake named TABLE_DATA_TYPE_CHANGE_NEW and drops the old table.
tables:
  TABLE_DATA_TYPE_CHANGE_NEW:
    columns:
      START_DATE:
        type: VARCHAR(16777216)
      END_DATE:
        type: VARCHAR(16777216)
  TABLE_DATA_TYPE_CHANGE_BACKUP:
    columns:
      START_DATE:
        type: DATE
      END_DATE:
        type: DATE
b. Update the hooks
To load the data from the backup table into the new table, update the table names in the post_hooks queries.
database_level_hooks:
  post_hooks:
    - command: "COPY INTO @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD/result/data_ FROM (SELECT * FROM {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP) file_format=(compression='gzip') OVERWRITE=TRUE;"
      environment: snowflake
    - command: "COPY INTO {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE_NEW FROM @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD;"
      environment: snowflake
c. Commit and run a pipeline
Once the configuration is updated, push the changes and run a pipeline. The SOLE job performs the following:
- Drop the table TABLE_DATA_TYPE_CHANGE
- Create a new table TABLE_DATA_TYPE_CHANGE_NEW
- Load data from the backup table into the stage DATA_LOAD
- Load data from the internal stage into the new table TABLE_DATA_TYPE_CHANGE_NEW
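As before, it is worth confirming in Snowflake that the new table has the expected column datatypes and that every row arrived, for example (MY_DATABASE again stands in for your DataOps database):

-- Confirm the column datatypes on the new table.
DESCRIBE TABLE MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_NEW;

-- Confirm that all rows made it across from the backup.
SELECT
    (SELECT COUNT(*) FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP) AS backup_rows,
    (SELECT COUNT(*) FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_NEW)    AS new_rows;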
Step 3 - Rename the table (optional)
This step is optional. Use it to restore the original table name by removing the _NEW suffix.
a. Rename table
Remove the _NEW suffix from the table name in the configuration. SOLE drops the table TABLE_DATA_TYPE_CHANGE_NEW and creates a new one with the original table name.
b. Update hooks
Update the hooks to load the data into the renamed table.
database_level_hooks:
  post_hooks:
    - command: "COPY INTO @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD/result/data_ FROM (SELECT * FROM {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP) file_format=(compression='gzip') OVERWRITE=TRUE;"
      environment: snowflake
    - command: "COPY INTO {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE FROM @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD;"
      environment: snowflake
c. Commit and run a pipeline
Once the configuration is updated, push the changes and run a pipeline. The SOLE job performs the following:
- Drop the table TABLE_DATA_TYPE_CHANGE_NEW
- Create a new table TABLE_DATA_TYPE_CHANGE
- Load data from the backup table into the stage DATA_LOAD
- Load data from the internal stage into the new table TABLE_DATA_TYPE_CHANGE
Step 4 - Drop the backup table and the internal stage
Once the datatype for the column in the table is updated and the data is safely loaded back into the table, drop the temporary resources and configuration.
- Remove the hooks.template.yml file from the dataops/snowflake directory.
- Remove the configuration for the backup table TABLE_DATA_TYPE_CHANGE_BACKUP.
- Remove the configuration for the internal stage DATA_LOAD.
- Commit and run a pipeline.
Once the configuration is updated, you can push the changes and run a pipeline. The SOLE job performs the following:
- Drop the table TABLE_DATA_TYPE_CHANGE_BACKUP
- Drop the internal stage DATA_LOAD
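If you want to confirm the cleanup in Snowflake, you can list the remaining objects in the schema (MY_DATABASE again stands in for your DataOps database):

-- Only TABLE_DATA_TYPE_CHANGE should remain; the _BACKUP table should be gone.
SHOW TABLES LIKE 'TABLE_DATA_TYPE_CHANGE%' IN SCHEMA MY_DATABASE.SANDBOX;

-- The DATA_LOAD stage should no longer be listed.
SHOW STAGES LIKE 'DATA_LOAD' IN SCHEMA MY_DATABASE.SANDBOX;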