- June 5, 2025
- Posted by: Azar Mohammed
- Category: Snowflake

Overview:
Streaming data ingestion has traditionally been a complex, bolted-on affair when integrating with modern cloud platforms. Organizations often had to rely on standalone tools separate from their core data pipelines, which added layers of complexity: extra infrastructure, maintenance overhead, and operational silos.
📌 Real World Use Case
Modern banks must process millions of financial transactions daily—across mobile apps, online platforms, ATMs, and in-branch systems—while staying ahead of sophisticated cyber threats and financial fraud.
Failing to detect suspicious activity in real time can result in:
- 🛑 Financial loss
- 🔐 Compromised customer trust
- ⚖️ Regulatory penalties (e.g., AML, PCI DSS, GDPR violations)

🔗 Kafka + Streaming: The Traditional Approach
Apache Kafka is the go-to backbone for real-time ingestion and distribution. However, building a full fraud detection and cybersecurity pipeline with Kafka often requires:
- Apache Flink/Spark for processing
- External fraud engines for ML
- Custom microservices for alerts
- Manual integrations for audit storage and dashboards
This makes it complex, fragile, and expensive to scale or govern.
🚀 Enter Snowflake ❄️ OpenFlow: Streamlined Banking Intelligence
Snowflake OpenFlow changes the game by offering a cloud-native, no-code/low-code framework to build and operate streaming data pipelines within the Snowflake ecosystem, natively and securely.
You can now ingest Kafka topics directly into Snowflake and build real-time data flows for fraud detection and cyber defense without external orchestration or infrastructure. Combining this with Snowflake Cortex anomaly detection functions can significantly reduce the operational overhead on your cybersecurity team.
As a result, teams can simplify architecture, reduce operational costs, and accelerate time to insight, all while working within the same secure and governed data platform.
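To make the Cortex piece concrete, here is a minimal sketch of pairing the transactions landed by OpenFlow with Snowflake's ML-powered anomaly detection. All object names (FRAUD_DB, transactions_history, transactions_recent, the model name) are hypothetical, and the SQL follows the documented SNOWFLAKE.ML.ANOMALY_DETECTION interface; treat it as a starting point rather than a finished fraud engine.

```python
# Rough sketch: train and score a Cortex ML anomaly-detection model on the
# transaction data that OpenFlow lands in Snowflake. Object names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    database="FRAUD_DB", schema="RAW", warehouse="ANALYTICS_WH",
)
cur = conn.cursor()

# Train on a historical view of transaction amounts (unsupervised, so
# LABEL_COLNAME is left empty).
cur.execute("""
    CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION txn_anomaly_model(
        INPUT_DATA        => TABLE(transactions_history),
        TIMESTAMP_COLNAME => 'EVENT_TS',
        TARGET_COLNAME    => 'AMOUNT',
        LABEL_COLNAME     => ''
    )
""")

# Score the most recently ingested records; each output row indicates whether
# the observed value is anomalous.
cur.execute("""
    CALL txn_anomaly_model!DETECT_ANOMALIES(
        INPUT_DATA        => TABLE(transactions_recent),
        TIMESTAMP_COLNAME => 'EVENT_TS',
        TARGET_COLNAME    => 'AMOUNT'
    )
""")
for row in cur.fetchall():
    print(row)

conn.close()
```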
Let’s take a deep dive into setting up Kafka ingestion in OpenFlow.
Prerequisites:
OpenFlow is now generally available (GA) across all AWS commercial regions and can be easily configured through the Snowflake Snowsight UI. This capability enables seamless integration of streaming and batch workloads within the Snowflake platform itself.
For step-by-step guidance on setting up OpenFlow, refer to this detailed blog post:
Snowflake Open Flow in Action – Part 1 – Snowflake OpenFlow – Native Data Integration Just Got Real – cittabase
Streaming Data Source – In this example, we’re using a custom Kafka instance running inside a Docker container to simulate streaming data ingestion into Snowflake. However, Snowflake OpenFlow is highly flexible and supports integration with other streaming platforms as well, including AWS Kinesis, Amazon Firehose, Google Pub/Sub, Redpanda and Confluent Cloud, depending on your infrastructure and requirements.
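For reference, the sketch below shows roughly how the Docker-hosted Kafka instance is fed with simulated transactions. The topic name, credentials, and payload fields are assumptions chosen for this walkthrough, and the kafka-python package stands in for whatever producer you already have.

```python
# Rough sketch: a kafka-python producer that emits simulated banking
# transactions to the Dockerized broker. All names and credentials are examples.
import json
import random
import time
import uuid
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",      # broker port exposed by the Docker container
    security_protocol="SASL_PLAINTEXT",
    sasl_mechanism="PLAIN",
    sasl_plain_username="openflow",          # example SASL credentials
    sasl_plain_password="openflow-secret",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Emit a small batch of events, one per second, to mimic a live feed.
for _ in range(10):
    txn = {
        "txn_id": str(uuid.uuid4()),
        "account_id": f"ACC{random.randint(1000, 9999)}",
        "channel": random.choice(["mobile", "online", "atm", "branch"]),
        "amount": round(random.uniform(5, 5000), 2),
        "event_ts": datetime.now(timezone.utc).isoformat(),
    }
    producer.send("bank_transactions", value=txn)   # example topic name
    time.sleep(1)

producer.flush()
```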
OpenFlow UI Overview:
OpenFlow can be launched directly from the Snowflake Snowsight UI.

Once initiated, you’ll be redirected to a dedicated interface where you can manage Deployments and Runtimes (the underlying compute engines that power your data integration workflows).
Within the Overview tab, you’ll find a wide range of connectors and data sources to choose from. For example, you can select the Kafka connector with SASL authentication, configure the necessary parameters, and seamlessly attach it to your runtime environment.

Runtime and Parameters:
After adding the connector to the runtime, it appears as a process group, and we must set up three different sets of parameters for that process group:

Parameter Definitions:
1. Kafka SASL Source Parameters:

If your streaming data is formatted in AVRO, you can configure the corresponding AVRO schema within the parameters section. If you’re using another format, this step can be skipped.
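If you do go the Avro route, the parameter expects the schema as JSON. Below is a hypothetical schema matching the simulated transaction payload used in this walkthrough; adjust the field names and types to your own events.

```python
# Rough sketch: build the Avro schema JSON that would be pasted into the
# AVRO schema parameter. Field names mirror the simulated producer above.
import json

transaction_schema = {
    "type": "record",
    "name": "Transaction",
    "namespace": "bank.streaming",
    "fields": [
        {"name": "txn_id", "type": "string"},
        {"name": "account_id", "type": "string"},
        {"name": "channel", "type": "string"},
        {"name": "amount", "type": "double"},
        {"name": "event_ts", "type": "string"},
    ],
}

print(json.dumps(transaction_schema, indent=2))
```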
Next, fill out the required Kafka source parameters. These include critical details such as:
- Kafka Bootstrap Servers – the address of your Kafka broker(s)
- Topic name
- Authentication credentials, including:
  - SASL mechanism (e.g., PLAIN)
  - Username and password
  - Security protocol (e.g., SASL_PLAINTEXT)
These configurations ensure that OpenFlow can securely connect to your Kafka instance and consume streaming data in real time.
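Before handing these values to OpenFlow, it can be worth confirming they actually reach the broker. The sketch below reuses the same example settings with a throwaway kafka-python consumer; all values are assumptions and should mirror what you enter in the source parameters.

```python
# Rough sketch: verify the SASL source parameters by consuming one message
# with kafka-python, using the same example values entered in OpenFlow.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "bank_transactions",                    # Topic name
    bootstrap_servers="localhost:9092",     # Kafka Bootstrap Servers
    security_protocol="SASL_PLAINTEXT",     # Security protocol
    sasl_mechanism="PLAIN",                 # SASL mechanism
    sasl_plain_username="openflow",         # Username
    sasl_plain_password="openflow-secret",  # Password
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,               # give up after 5 seconds if nothing arrives
)

for msg in consumer:
    print("connected, sample record:", msg.value[:120])
    break

consumer.close()
```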
2. Kafka SASL Destination Parameters:

Next, fill out the Kafka destination parameters. These include critical details such as:
- Destination Database, Schema, and Account Identifier – basic details of the destination Snowflake account.
- Authentication credentials, including:
  - Authentication Strategy – SNOWFLAKE_SESSION_TOKEN when the flow is running on SPCS (Snowpark Container Services), or KEY_PAIR when you want to set up access using a private key.
These configurations ensure that OpenFlow can securely connect to your Snowflake account and load the streaming data into it.
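When using the KEY_PAIR strategy, a quick way to validate the account identifier, user, and private key before wiring them into the destination parameters is to open a connection with the same values. The snippet below is a sketch using snowflake-connector-python and the cryptography package; the user, database, schema, and file names are assumptions.

```python
# Rough sketch: confirm the KEY_PAIR credentials work by connecting with
# snowflake-connector-python. Names and paths are examples only.
import snowflake.connector
from cryptography.hazmat.primitives import serialization

# Load the same private key registered on the Snowflake service user.
with open("rsa_key.p8", "rb") as key_file:
    private_key = serialization.load_pem_private_key(key_file.read(), password=None)

private_key_der = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

conn = snowflake.connector.connect(
    account="<account_identifier>",   # same value as the destination parameter
    user="OPENFLOW_SVC_USER",         # hypothetical service user
    private_key=private_key_der,
    database="FRAUD_DB",              # hypothetical destination database
    schema="RAW",                     # hypothetical destination schema
)

print(conn.cursor().execute("SELECT CURRENT_USER(), CURRENT_ACCOUNT()").fetchone())
conn.close()
```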
3. Kafka SASL Ingestion Parameters:

It’s not mandatory to fill out these parameters separately, as most of them are automatically referenced from the source and destination parameter configurations. However, there may be a few additional parameters that you can configure, such as:
- DLQ (Dead Letter Queue) Topic
- Consumer Group ID
These fields are optional and not required for a basic setup.
For a complete list of mandatory and optional parameters for each section, refer to the official General Kafka ingestion parameters documentation for accurate and up-to-date guidance.
Process Groups (DAGs):
A single process group in Snowflake OpenFlow can have multiple linked operations, all of which are preconfigured and orchestrated to work together. When you navigate into the Apache Kafka SASL process group, you’ll see several distinct operations, each representing an individual step in the workflow.
These operations can be visualized as independent DAGs (Directed Acyclic Graphs), connected to form a cohesive data pipeline. Each operation can be referenced, monitored, and modified individually, allowing for fine-grained control over your streaming ingestion and transformation logic.

As shown in the image above:
- Consume From Kafka – This is the entry point for ingesting streaming data from Kafka. It connects to the specified Kafka topic(s) and begins consuming data in real time.
- Map Topic to Table – This operation maps incoming Kafka topics to their corresponding Snowflake tables, ensuring that data from each topic is correctly transformed and loaded into the appropriate target table within Snowflake.
Additionally, you have the flexibility to configure actions based on the outcome of each processor—for example, defining different paths or behaviors for success and failure events. This enables more robust and controlled data flow management within your pipeline.

Let’s dive into action:
Now that all the configuration and setup is complete, let's see OpenFlow streaming in action.
All we need to do now is enable the controller services of the process group.

Step 1:
Enable all the controller services, which are automatically referenced from your source and destination parameters. These services manage the necessary connections and configurations for data flow.
Step 2:
Enable the Process Group that contains your Kafka ingestion logic. This will activate all the linked operations within the group.
Step 3:
Finally, click Start to kick off the streaming data ingestion process and begin real-time data flow into Snowflake.
Once the process is started, you’ll notice that a new table is automatically created in the destination Snowflake database and schema.

Initially, 10 records are inserted into this table—confirming that the data pipeline is functioning as expected.
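If you prefer to verify this outside Snowsight, a quick query against the destination schema confirms the landing table and its first batch of rows. The table name below is an assumption derived from the example topic, following the topic-to-table mapping.

```python
# Rough sketch: check the row count of the auto-created landing table.
# BANK_TRANSACTIONS is an assumed name based on the example topic.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    database="FRAUD_DB", schema="RAW",
)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM BANK_TRANSACTIONS")
print("rows landed:", cur.fetchone()[0])
conn.close()
```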

Let’s now update the schema of the streaming data by adding additional fields to the same Kafka topic. Remarkably, the table in Snowflake is automatically updated to reflect the new schema, and the newly added columns are populated with incoming data seamlessly.
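To reproduce that behaviour, the producer simply starts emitting extra fields on the same topic; nothing changes on the Snowflake side. The two new fields below are illustrative assumptions.

```python
# Rough sketch: publish a record with two additional fields to the same
# topic and let the connector evolve the destination table's schema.
import json
import uuid
from datetime import datetime, timezone

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    security_protocol="SASL_PLAINTEXT",
    sasl_mechanism="PLAIN",
    sasl_plain_username="openflow",
    sasl_plain_password="openflow-secret",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

txn = {
    "txn_id": str(uuid.uuid4()),
    "account_id": "ACC1042",
    "channel": "mobile",
    "amount": 249.99,
    "event_ts": datetime.now(timezone.utc).isoformat(),
    # Fields added after the initial load (illustrative):
    "merchant_id": "MER-0042",
    "geo_country": "US",
}

producer.send("bank_transactions", value=txn)
producer.flush()
```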
Before Schema Change:

After Schema Change:

Conclusion:
In summary, Snowflake OpenFlow redefines how organizations approach real-time data ingestion by bringing the entire pipeline natively into the Snowflake environment. Unlike traditional architectures that rely heavily on third-party tools for streaming integration, often adding cost, complexity, and vendor lock-in, OpenFlow allows you to build, run, and monitor streaming pipelines without ever leaving Snowflake.
This means no more managing external connectors, waiting for sync intervals, or dealing with the limitations of black-box ETL tools. Instead, teams gain full control, transparency, and scalability across both batch and streaming workloads—all within a single, governed platform. With OpenFlow, real-time data integration is no longer an afterthought or an external dependency— it’s a first-class, in-platform capability.
Please feel free to reach out to us for your Snowflake solution needs. Cittabase is a Premier partner with Snowflake.
Reference Links:
- Openflow Connector for Kafka | Snowflake Documentation
- Snowflake Open Flow in Action – Part 1 – Snowflake Openflow – Native Data Integration Just Got Real – cittabase