- May 21, 2024
- Posted by: Reshma M
- Category: Snowflake
Introduction:
In this blog, we will explore how to integrate Git with Snowflake. Git is crucial for version control and collaborative development, while Snowflake offers powerful cloud-based data warehousing solutions. We will provide an overview of the key elements, processes, steps, and use cases involved in synchronizing data between Git and Snowflake, highlighting the importance of seamless integration in modern data management workflows.
Prerequisites:
- Basic understanding of Git version control.
- Access to snowflake account with appropriate privileges.
Snowflake working with git :
With an integrated Git repository, you can fetch all the files from your remote Git repository to a Snowflake repository stage for use in Snowflake applications. The files in the repository stage represent a full clone of the repository that you can refresh as the repository changes.
With Snowflake git integration, you can connect securely to a git repo hosted outside of Snowflake (GitHub, GitLab, Azure DevOps, or BitBucket – currenly supported) and have the contents of that repo synced to your Snowflake account. This allows you to have the SQL scripts, Snowpark functions and procedures, Native Apps, and Streamlit Apps to pull from git.
Steps involved in Git Integration:
Step 1: Develop and maintain code in the git.
Develop and maintain the code that you want to execute in the snowflake.
Step 2: Create Git API integration and repository stage.
When you integrate a Git repository and clone the repository, Snowflake creates a Git repository stage that specifies the location of the repository, credentials (if needed), and details about how Snowflake should interact with the Git repository API.
To use a Git repository with Snowflake, follow these steps:
- Create a secret, if needed, to contain credentials for authenticating with the repository.
- Create an API integration to specify details about Snowflake interaction with the Git repository API.
- Create a Snowflake Git repository stage to which you can synchronize files from the repository.
When you want to refresh your repository stage, you can fetch contents of the integrated Git repository to the repository stage by using the ALTER GIT REPOSITORY command.
Now in the database/schema created, I have a repository stage, which has all files from the git repository. It has a logical naming structure so I can navigate across files in different branches (@repo_name/branches/branch_name), tags (@repo_name/tags/tag_name) or commits (@repo_name/commits/commit_hash).
The below command helps to view the local repository in snowflake which is same as git repository. You can view a list of files in the repository by using the LIST command in the following forms :
ls @git_sample/branches/main;
You can view branches and tags by using the SHOW GIT BRANCHES and SHOW GIT TAGS commands.
You can view repository stage properties by using the SQL commands SHOW GIT REPOSITORIES and DESCRIBE GIT REPOSITORY.
Use Case: 1) Execute SQL code from a repository.
You can use EXECUTE IMMEDIATE FROM to execute code in a repository stage.
In my repo I have a file that creates a table, creates a procedure (or replaces), and creates a task I also have a base.sql file which calls the other files in the right order. I can execute all these files in Snowflake which will execute the latest version of them in my main branch with the following:
This base.sql file has execute immediate statements to call & execute the other files.
Use Case: 2) Execute Snowpark code from a repository.
I can write a simple procedure in the snowflake to import the python file from the repo. Here the IMPORT statement fetches the file from the repo.
Execute the procedure.
The following code executes the procedure.
Conclusion :
Hence, by following the steps outlined in this guide, you can seamlessly synchronize code and data between Snowflake and your Git repository, ensuring the collaboration and data integrity and make use it for snowflake applications.