- May 2, 2024
- Posted by: Sriramvel M
Introduction:
Snowflake Cortex offers an ML function designed specifically for classification tasks. The function is in public preview (as of March 2024) and lets you sort your data into predefined classes. It supports both binary (two classes) and multi-class (more than two classes) classification problems. Under the hood it is powered by a gradient boosting model. For binary classification, the model is trained using an area-under-the-curve loss function; for multi-class classification, it is trained using a logistic loss function.
Here’s a quick rundown of its capabilities:
- Automated predictions: The function leverages machine learning to uncover patterns in your data and generate predictions based on those patterns.
- No need for ML expertise: Snowflake handles the underlying model complexity. You just provide your data, and the function takes care of the rest.
- Flexibility: It works with various data types, including numeric, Boolean, and even string data.
- Feature importance: The function can highlight the most influential features that drive the model’s predictions.
This functionality allows data analysts and business users to leverage machine learning for classification tasks directly within Snowflake, without needing to move their data or become ML experts.
Let’s build a model in Snowflake using this native Classifier object.
The step-by-step explanation is given below.
About the Data:
This use case is implemented with the Customer Churn Dataset, which is available on Kaggle.
Dataset Link: Churn Dataset
Step – 1: Import of Data:
- A database named Customer and a schema named Bank are created, and the dataset is loaded into a table called BANK_DATA under that schema.
Step – 2: Train-Test Split of Data:
- The data is split into train and test sets for model training and evaluation purposes. Here we split it in the ratio 80:20.
- The TRAIN_DATA and TEST_DATA tables contain the training and test datasets respectively.
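To make the 80:20 split concrete, here is a minimal Python sketch of the idea. It is not the method used to build TRAIN_DATA and TEST_DATA in Snowflake; the `train_test_split` helper, the seed, and the list of 100 customer ids standing in for BANK_DATA are all assumptions made for illustration.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for BANK_DATA: 100 customer ids.
bank_data = list(range(100))
train_data, test_data = train_test_split(bank_data)
print(len(train_data), len(test_data))
```

Fixing the seed keeps the split reproducible, so the same rows land in the test set every time the model is retrained.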
Step – 3: Create ML Classifier:
- An ML classifier named Churn Classifier is created from the training dataset by supplying the required inputs (the training table and the target variable), as shown in the image below:
The classifier has now been successfully created.
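As a rough sketch of what the creation statement can look like, the Python helper below assembles a `CREATE SNOWFLAKE.ML.CLASSIFICATION` statement as a string. The helper itself is hypothetical, and so is the target column name `EXITED`, since the article does not name the target column; check the statement shape against the current Snowflake documentation before running it.

```python
def create_classifier_sql(model_name, train_table, target_column):
    """Return a CREATE SNOWFLAKE.ML.CLASSIFICATION statement as a string.

    The arguments are interpolated as-is, so pass only trusted identifiers.
    """
    return (
        f"CREATE OR REPLACE SNOWFLAKE.ML.CLASSIFICATION {model_name}(\n"
        f"    INPUT_DATA => SYSTEM$REFERENCE('TABLE', '{train_table}'),\n"
        f"    TARGET_COLNAME => '{target_column}'\n"
        f");"
    )


# EXITED is a hypothetical target column; use the churn flag from your table.
statement = create_classifier_sql("CHURN_CLASSIFIER", "TRAIN_DATA", "EXITED")
print(statement)
```

The printed statement can be pasted into a Snowflake worksheet or sent through whichever client you already use.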
Step – 4: Prediction:
- Using the Churn Classifier created above, predictions are generated with native Snowflake SQL commands, as shown below.
- TEST_DATA, which contains the rows set aside for testing, is used here.
- In the dictionary output, “class” takes one of two values: 0 means the customer is not likely to churn, and 1 means the customer is likely to churn.
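The sketch below shows one way to read such a prediction dictionary once it has been fetched as a JSON string. The `class` key comes from the output described above; the `probability` key and the sample values are assumptions made for illustration, and `describe_prediction` is a hypothetical helper.

```python
import json

# Meaning of each class, as described in this article.
CLASS_LABELS = {0: "not likely to churn", 1: "likely to churn"}


def describe_prediction(prediction_json):
    """Return (label, probability) for one prediction dictionary."""
    prediction = json.loads(prediction_json)
    predicted_class = int(prediction["class"])  # accepts 1 or "1"
    # Assumed layout: per-class probabilities keyed by the class as a string.
    probability = prediction.get("probability", {}).get(str(predicted_class))
    return CLASS_LABELS[predicted_class], probability


# Made-up sample value, not real model output.
sample = '{"class": 1, "probability": {"0": 0.18, "1": 0.82}}'
print(describe_prediction(sample))
```

Mapping the numeric class to a readable label at this point keeps downstream reports understandable for business users.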
Step – 5: Model Evaluation:
- The model is evaluated by calling the classifier's SHOW_EVALUATION_METRICS() method.
- The evaluation shows that the model performs very well for both classes.
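Per-class evaluation reports are usually built from precision, recall and F1. As a reminder of how those numbers relate to raw prediction counts, here is a small Python sketch; the counts are invented for illustration and are not results from this model.

```python
def evaluation_metrics(tp, fp, fn):
    """Compute precision, recall and F1 for one class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Invented counts for the "likely to churn" class.
metrics = evaluation_metrics(tp=40, fp=10, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Checking these values for each class separately matters with churn data, where the churn class is often much smaller than the non-churn class.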
Step – 6: Confusion Matrix:
- A confusion matrix reports the number of correctly classified and misclassified predictions per class. It helps identify which classes the model confuses with one another.
- A confusion matrix summarizing the model's predictive ability is shown below.
- Indicators such as True Positive, True Negative, False Positive and False Negative are missing from the above output, yet they are necessary for interpreting the model. So, we store the above output in a table called Confusion_Matrix.
- A column named OUTCOME is added to the table to store the indicators.
- Now, the column is updated with the appropriate indicator based on the values of the actual and predicted class.
- The updated Confusion_Matrix table is shown below.
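The rule used to fill the OUTCOME column can be sketched in a few lines of Python. This assumes class 1 (likely to churn) is treated as the positive class; the dictionary keys and the counts are made up for illustration, and in the article the update itself is done in SQL on the Confusion_Matrix table.

```python
def outcome(actual, predicted, positive=1):
    """Label one confusion-matrix cell, treating `positive` as the churn class."""
    if predicted == positive:
        return "True Positive" if actual == positive else "False Positive"
    return "True Negative" if actual != positive else "False Negative"


# Made-up confusion-matrix rows, not results from this model.
confusion_matrix = [
    {"actual_class": 0, "predicted_class": 0, "count": 150},
    {"actual_class": 0, "predicted_class": 1, "count": 10},
    {"actual_class": 1, "predicted_class": 0, "count": 12},
    {"actual_class": 1, "predicted_class": 1, "count": 28},
]
for row in confusion_matrix:
    row["outcome"] = outcome(row["actual_class"], row["predicted_class"])
    print(row)
```

The labels depend on which class is chosen as positive; if 0 were the positive class, every label would flip.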
Step – 7: Feature Importance Chart:
- Feature importance is a method that assigns a score to each independent variable used in the model. It helps identify the key variables that affect the dependent variable the most.
- The feature importance chart is displayed using the native SHOW_FEATURE_IMPORTANCE function.
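If the scores are pulled out of Snowflake, a ranked chart can also be drawn with a few lines of Python. The sketch below renders a simple text bar chart; the feature names and scores are hypothetical and do not come from this model, and `importance_chart` is an illustrative helper.

```python
def importance_chart(scores, width=40):
    """Return text bar-chart lines, with the most important feature first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top = ranked[0][1] if ranked else 0
    lines = []
    for feature, score in ranked:
        # Scale each bar relative to the highest score.
        bar = "#" * (round(width * score / top) if top else 0)
        lines.append(f"{feature:<10} {bar} {score:.2f}")
    return lines


# Hypothetical feature scores for illustration only.
scores = {"TENURE": 0.1, "AGE": 0.4, "BALANCE": 0.3}
for line in importance_chart(scores):
    print(line)
```

Scaling each bar against the top score makes the relative influence of the features easy to compare at a glance.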
Conclusion:
Snowflake’s Cortex ML classification function brings machine learning power to everyday data analysis. You can now leverage automated classification to sort your data into meaningful groups, uncover hidden patterns, and gain deeper insights. This democratizes machine learning, making it accessible to a wider range of users within your organization. So, if you’re looking to streamline data analysis and unlock the potential of your data, be sure to check out Snowflake Cortex’s new ML classification function.
Cittabase is a select partner with Snowflake. Please feel free to contact us regarding your Snowflake solution needs. Our Snowflake solutions encompass a suite of services for your data integration and migration needs. We are committed to providing personalized assistance and support tailored to your requirements.