How to Change the Column Datatype in a Table
If the table in Snowflake contains data, changing the datatype of a column requires additional consideration. You must ensure that you can successfully convert the data in the column to the new type without errors or loss of information.
Here's a general approach for changing the column datatype in a table that already has data in Snowflake (a minimal SQL sketch follows the list):
- Create a new table with the desired schema, including the updated datatype for the column.
- Copy the data from the original table to the new table, performing any necessary datatype conversions or transformations during the process.
- Drop the original table.
- Rename the new table to the original table name.
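Outside of SOLE, the same pattern can be sketched directly in SQL. The table, columns, and target type below (MY_TABLE, AMOUNT, CREATED_AT, NUMBER(10, 2)) are hypothetical placeholders; adapt them to your own schema:

-- Minimal sketch of the general approach; all object names are placeholders.

-- 1. Create a new table with the desired schema, converting the column as the
--    data is copied across.
CREATE TABLE MY_TABLE_NEW AS
SELECT
    CAST(AMOUNT AS NUMBER(10, 2)) AS AMOUNT,  -- column with the new datatype
    CREATED_AT                                -- remaining columns copied unchanged
FROM MY_TABLE;

-- 2. Verify row counts (and spot-check values) before removing the original.
SELECT
    (SELECT COUNT(*) FROM MY_TABLE)     AS original_rows,
    (SELECT COUNT(*) FROM MY_TABLE_NEW) AS new_rows;

-- 3. Drop the original table and rename the new one to the original name.
DROP TABLE MY_TABLE;
ALTER TABLE MY_TABLE_NEW RENAME TO MY_TABLE;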
Let's illustrate this with an example using SOLE (Snowflake Object Lifecycle Engine). Although SOLE doesn't support copying data from one table into another directly, you can achieve this by adding hooks to the SOLE configuration. The solution uses internal stages, backup tables, and hooks in SOLE, as described in the following steps:
Step 1 - Create a backup table and internal stage, and load data into the backup table
a. Add configuration for a backup table
To create a backup table, copy the configuration of the existing table and add _BACKUP as a suffix to the table name.
tables:
  TABLE_DATA_TYPE_CHANGE:
    columns:
      START_DATE:
        type: DATE
      END_DATE:
        type: DATE
  TABLE_DATA_TYPE_CHANGE_BACKUP:
    columns:
      START_DATE:
        type: DATE
      END_DATE:
        type: DATE
b. Add configuration for an internal stage
You can use an internal stage to load data from one table into another. To create an internal stage, add the following configuration in the SANDBOX schema:
SANDBOX:
  tables:
    TABLE_DATA_TYPE_CHANGE:
      columns:
        START_DATE:
          type: DATE
        END_DATE:
          type: DATE
    TABLE_DATA_TYPE_CHANGE_BACKUP:
      columns:
        START_DATE:
          type: DATE
        END_DATE:
          type: DATE
  stages:
    DATA_LOAD:
      comment: This stage is used to move the data from backup to the main table.
c. Add hooks for data load and unload
You can load data into a stage and then into the backup table using database-level post_hooks, which run once all database-level objects are created. Add the post_hooks by creating a new file, hooks.template.yml, in the dataops/snowflake directory with the following content:
database_level_hooks:
  post_hooks:
    - command: "COPY INTO @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD/result/data_ FROM (SELECT * FROM {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE) file_format=(compression='gzip') OVERWRITE=TRUE;"
      environment: snowflake
    - command: "COPY INTO {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP FROM @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD;"
      environment: snowflake
d. Commit and run a pipeline
Push the changes and run a pipeline once you have added all objects and hooks. The SOLE job performs the following:
- Create the new table TABLE_DATA_TYPE_CHANGE_BACKUP
- Create the new internal stage DATA_LOAD
- Load data from the original table TABLE_DATA_TYPE_CHANGE into the stage DATA_LOAD
- Load data from the internal stage into the backup table
Once the job is complete, check in Snowflake that the new table has been created and that the data has been loaded into it.
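For example, you can run a quick check in a Snowflake worksheet. MY_DATABASE below is a placeholder for the database built by your pipeline (the value of DATAOPS_DATABASE):

-- Compare row counts between the original table and the backup.
SELECT
    (SELECT COUNT(*) FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE)        AS original_rows,
    (SELECT COUNT(*) FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP) AS backup_rows;

-- Optionally confirm that the two tables contain identical rows.
SELECT * FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE
MINUS
SELECT * FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP;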
Step 2 - Create a new table with an updated datatype and drop the old table
After ensuring the data is safe in the backup table, you can create a new table with the updated type and drop the old one.
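In this example the conversion (DATE to VARCHAR) always succeeds, but if your conversion can fail (for example VARCHAR to DATE or NUMBER), it is worth confirming that every value converts cleanly before the old table is dropped. A sketch, assuming a hypothetical column SOME_COLUMN being converted to DATE and MY_DATABASE as a placeholder database name:

-- Count values in the backup that would not convert; 0 means the cast is safe.
SELECT COUNT(*) AS unconvertible_rows
FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP
WHERE SOME_COLUMN IS NOT NULL
  AND TRY_CAST(SOME_COLUMN AS DATE) IS NULL;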
a. Create a new table with an updated datatype
You can now replace the original table configuration with one that uses the updated datatype, adding the suffix _NEW to the table name. SOLE creates a new table in Snowflake named TABLE_DATA_TYPE_CHANGE_NEW and drops the old table.
tables:
  TABLE_DATA_TYPE_CHANGE_NEW:
    columns:
      START_DATE:
        type: VARCHAR(16777216)
      END_DATE:
        type: VARCHAR(16777216)
  TABLE_DATA_TYPE_CHANGE_BACKUP:
    columns:
      START_DATE:
        type: DATE
      END_DATE:
        type: DATE
b. Update the hooks
To load the data from the backup table into the new table, update the table names in the post_hooks queries.
database_level_hooks:
  post_hooks:
    - command: "COPY INTO @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD/result/data_ FROM (SELECT * FROM {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP) file_format=(compression='gzip') OVERWRITE=TRUE;"
      environment: snowflake
    - command: "COPY INTO {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE_NEW FROM @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD;"
      environment: snowflake
c. Commit and run a pipeline
Once the configuration is updated, push the changes and run a pipeline. The SOLE job performs the following:
- Drop the table TABLE_DATA_TYPE_CHANGE
- Create a new table TABLE_DATA_TYPE_CHANGE_NEW
- Load data from the backup table into the stage DATA_LOAD
- Load data from the internal stage into the new table TABLE_DATA_TYPE_CHANGE_NEW
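As before, it is worth confirming in Snowflake that the new table has the expected column datatypes and that every row arrived, for example (MY_DATABASE again stands in for your DataOps database):

-- Confirm the column datatypes on the new table.
DESCRIBE TABLE MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_NEW;

-- Confirm that all rows made it across from the backup.
SELECT
    (SELECT COUNT(*) FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP) AS backup_rows,
    (SELECT COUNT(*) FROM MY_DATABASE.SANDBOX.TABLE_DATA_TYPE_CHANGE_NEW)    AS new_rows;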
Step 3 - Rename the table (optional)
This step is optional. Use it to restore the original table name by removing the _NEW suffix.
a. Rename table
Remove the _NEW suffix from the table name in the configuration. SOLE drops the table TABLE_DATA_TYPE_CHANGE_NEW and creates a new one with the original table name.
b. Update hooks
Update the hooks to load the data into the renamed table.
database_level_hooks:
  post_hooks:
    - command: "COPY INTO @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD/result/data_ FROM (SELECT * FROM {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE_BACKUP) file_format=(compression='gzip') OVERWRITE=TRUE;"
      environment: snowflake
    - command: "COPY INTO {{ env.DATAOPS_DATABASE }}.SANDBOX.TABLE_DATA_TYPE_CHANGE FROM @{{ env.DATAOPS_DATABASE }}.SANDBOX.DATA_LOAD;"
      environment: snowflake
c. Commit and run a pipeline
Once the configuration is updated, push the changes and run a pipeline. The SOLE job performs the following:
- Drop the table TABLE_DATA_TYPE_CHANGE_NEW
- Create a new table TABLE_DATA_TYPE_CHANGE
- Load data from the backup table into the stage DATA_LOAD
- Load data from the internal stage into the new table TABLE_DATA_TYPE_CHANGE
Step 4 - Drop the backup table and the internal stage
Once the datatype for the column in the table is updated and the data is safely loaded back into the table, drop the temporary resources and configuration.
- Remove the hooks.template.yml file from the dataops/snowflake directory.
- Remove the configuration for the backup table TABLE_DATA_TYPE_CHANGE_BACKUP.
- Remove the configuration for the internal stage DATA_LOAD.
- Commit and run a pipeline.
Once the configuration is updated, you can push the changes and run a pipeline. The SOLE job performs the following:
- Drop the table TABLE_DATA_TYPE_CHANGE_BACKUP
- Drop the internal stage DATA_LOAD
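If you want to confirm the cleanup in Snowflake, you can list the remaining objects in the schema (MY_DATABASE again stands in for your DataOps database):

-- Only TABLE_DATA_TYPE_CHANGE should remain; the _BACKUP table should be gone.
SHOW TABLES LIKE 'TABLE_DATA_TYPE_CHANGE%' IN SCHEMA MY_DATABASE.SANDBOX;

-- The DATA_LOAD stage should no longer be listed.
SHOW STAGES LIKE 'DATA_LOAD' IN SCHEMA MY_DATABASE.SANDBOX;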