Getting Started with Snowflake

~ John Thuma

Welcome to the exciting world of Snowflake. This OLAP database has taken the data world by storm and shows no sign of losing steam. This article focuses on five things I wish I had known before I started developing my first application in Snowflake: cost control, development governance, observability planning, starting your first application, and getting rid of technology debt. Even if you are a Snowflake pro, you might find this article interesting. Let's start with cost controls.

COST CONTROLS: Cost optimization is top of mind for almost everyone. Snowflake works on a consumption model, which means you pay for what you use, so you control the value you receive. The two biggest drivers of spend in Snowflake are compute and data storage. Compute can be controlled using resource monitors, query timeouts, and role-based access control (RBAC) over who can create and change warehouses. I have seen a single out-of-control warehouse cost up to $5,000 over a weekend. Managing data is also important: get rid of large stale tables, use Time Travel retention appropriately, use zero-copy clones instead of physical copies, avoid unnecessary data egress (transfer) costs, and have an archival strategy. Another component of overall cost is the cloud services layer (formerly called global services), which is driven by activity such as metadata queries, including those issued by third-party tools.
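To make the compute controls concrete, here is a minimal Python sketch that prints Snowflake SQL for a weekly resource monitor, auto-suspend, and a statement timeout on one warehouse, plus a back-of-the-envelope cost estimate for a warehouse left running. The warehouse and monitor names, the credit quota, the trigger thresholds, and the $3.00-per-credit price are assumptions for illustration; your actual credit price depends on your edition, region, and contract. Resource monitors are typically created with the ACCOUNTADMIN role.

```python
"""Sketch: generate Snowflake cost-guardrail SQL for one warehouse.

Assumptions (hypothetical, adjust for your account): warehouse and monitor
names, the weekly credit quota, trigger thresholds, timeouts, and credit price.
"""

# Credits consumed per hour by standard warehouse sizes (each size doubles).
CREDITS_PER_HOUR = {
    "XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8,
    "XLARGE": 16, "XXLARGE": 32, "XXXLARGE": 64, "X4LARGE": 128,
}


def estimate_cost(size: str, hours: float, price_per_credit: float) -> float:
    """Rough compute cost of a warehouse left running (price is an assumption)."""
    return CREDITS_PER_HOUR[size] * hours * price_per_credit


def guardrail_sql(warehouse: str, monitor: str, weekly_credits: int,
                  auto_suspend_s: int = 60, timeout_s: int = 3600) -> list[str]:
    """Return statements that cap weekly spend and stop idle or runaway work."""
    return [
        # Notify at 80%, suspend once running queries finish at 100%,
        # and cancel running queries at 110% of the weekly quota.
        f"CREATE OR REPLACE RESOURCE MONITOR {monitor} "
        f"WITH CREDIT_QUOTA = {weekly_credits} FREQUENCY = WEEKLY "
        "START_TIMESTAMP = IMMEDIATELY "
        "TRIGGERS ON 80 PERCENT DO NOTIFY "
        "ON 100 PERCENT DO SUSPEND "
        "ON 110 PERCENT DO SUSPEND_IMMEDIATE;",
        f"ALTER WAREHOUSE {warehouse} SET RESOURCE_MONITOR = {monitor};",
        # Suspend the warehouse after this many idle seconds.
        f"ALTER WAREHOUSE {warehouse} SET AUTO_SUSPEND = {auto_suspend_s};",
        # Cancel any single statement that runs longer than this.
        f"ALTER WAREHOUSE {warehouse} SET STATEMENT_TIMEOUT_IN_SECONDS = {timeout_s};",
    ]


if __name__ == "__main__":
    # A 2X-Large warehouse left on from Friday evening to Monday morning (~60 h).
    print(estimate_cost("XXLARGE", 60, 3.0))  # 5760.0
    for stmt in guardrail_sql("ANALYTICS_WH", "ANALYTICS_WEEKLY_MONITOR", 200):
        print(stmt)
```

At the assumed $3.00 per credit, a 2X-Large warehouse (32 credits per hour) running for about 60 hours costs roughly $5,760, which shows how easily a forgotten warehouse can reach the weekend figure mentioned above.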

DEVELOPMENT GOVERNANCE: One thing I constantly hear from customers is the need for development governance. Development governance is about designing patterns and practices for Snowflake data ingestion, data transformation, and data consumption and egress. Of course, we need CI/CD, but it is just as important for an organization to define predictable processes for managing solutions as they are promoted into production. A good Snowflake development governance program is essential for managing how work gets done in Snowflake across a variety of teams.

OBSERVABILITY: Nearly everything you do in Snowflake is recorded in metadata you can query: the INFORMATION_SCHEMA in each database and, account-wide, the ACCOUNT_USAGE schema in the shared SNOWFLAKE database. A good observability plan uses this metadata for cost optimization, data privacy oversight, and other security and compliance areas. Snowflake provides information to monitor highly privileged account access, metadata changes, every query that runs, login events, and much more. Third-party and home-grown solutions are available to raise alerts and support regulatory compliance audits.
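As a starting point for an observability plan, the sketch below pairs two ACCOUNT_USAGE queries (credits per warehouse over the last week and failed logins over the last day) with a small Python check that flags warehouses over a credit limit. It assumes the query rows are fetched elsewhere, for example with the Snowflake Connector for Python; the limit and the sample rows are hypothetical. ACCOUNT_USAGE views are refreshed with some latency, so this suits daily reporting rather than real-time alerting.

```python
"""Sketch: daily checks built on SNOWFLAKE.ACCOUNT_USAGE views.

Assumes rows are fetched elsewhere (e.g. with the Snowflake Connector for
Python); the credit limit and sample rows are hypothetical.
"""
from collections import defaultdict

# Credits consumed per warehouse over the last 7 days.
WAREHOUSE_CREDITS_SQL = """
SELECT warehouse_name, SUM(credits_used) AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits DESC
"""

# Failed login attempts per user over the last day.
FAILED_LOGINS_SQL = """
SELECT user_name, COUNT(*) AS failures
FROM snowflake.account_usage.login_history
WHERE is_success = 'NO'
  AND event_timestamp >= DATEADD(day, -1, CURRENT_TIMESTAMP())
GROUP BY user_name
"""


def over_budget(rows: list[tuple[str, float]], limit: float) -> list[str]:
    """Return warehouse names whose summed credits exceed `limit`, highest first."""
    totals: dict[str, float] = defaultdict(float)
    for name, credits in rows:
        totals[name] += credits
    flagged = [name for name, total in totals.items() if total > limit]
    return sorted(flagged, key=lambda name: totals[name], reverse=True)


if __name__ == "__main__":
    # Hypothetical rows shaped like the result of WAREHOUSE_CREDITS_SQL.
    sample = [("ETL_WH", 120.0), ("BI_WH", 35.5), ("ETL_WH", 40.0)]
    print(over_budget(sample, limit=100.0))  # ['ETL_WH']
```

Keeping the alert logic separate from the queries lets you test the thresholds without a live connection and swap in a third-party alerting tool later.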

GETTING STARTED: This can be a very challenging step for any organization. Taking the ‘Big Bang’ approach can create delays and overall project risk, and often results in migrating legacy technical debt. My approach is to start with a small, high-value business use case that can be delivered in 4-8 weeks. Get a quick win, earn trust with the business, and have fun doing it. High-value business use cases involve the business directly and should produce measurable revenue gains or cost avoidance.

TECHNOLOGY DEBT: With Snowflake you are starting over! My recommendation is to avoid the patterns and practices of the past. This is your chance to define new best practices for data ingestion, transformation, and consumption, for data governance, and for naming conventions, and to enforce RBAC properly. Avoid a ‘lift and shift’ of legacy applications. Set up your environment to change and adapt rapidly as business needs shift.
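One way to make new naming conventions and RBAC stick is to encode them. The Python sketch below validates database names against a hypothetical <ENV>_<DOMAIN>_<LAYER> convention and generates Snowflake grants that give a functional role read-only access, including future grants so new schemas and tables are covered automatically. The convention, layer names, and role names are illustrative assumptions, not a Snowflake standard.

```python
"""Sketch: enforce a naming convention and generate read-only RBAC grants.

The <ENV>_<DOMAIN>_<LAYER> convention, layer names, and role names are
hypothetical examples, not a Snowflake standard.
"""
import re

# Hypothetical convention: environment, business domain, data layer.
DB_NAME = re.compile(r"(DEV|TEST|PROD)_[A-Z][A-Z0-9]*_(RAW|STAGE|MART)")


def check_names(names: list[str]) -> list[str]:
    """Return the names that violate the convention."""
    return [name for name in names if DB_NAME.fullmatch(name) is None]


def read_only_grants(database: str, role: str) -> list[str]:
    """Grant a functional role read access to current and future objects."""
    if DB_NAME.fullmatch(database) is None:
        raise ValueError(f"{database!r} does not follow the naming convention")
    return [
        f"CREATE ROLE IF NOT EXISTS {role};",
        f"GRANT USAGE ON DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON ALL SCHEMAS IN DATABASE {database} TO ROLE {role};",
        f"GRANT USAGE ON FUTURE SCHEMAS IN DATABASE {database} TO ROLE {role};",
        f"GRANT SELECT ON ALL TABLES IN DATABASE {database} TO ROLE {role};",
        # Future grants keep new tables readable without manual follow-up.
        f"GRANT SELECT ON FUTURE TABLES IN DATABASE {database} TO ROLE {role};",
    ]


if __name__ == "__main__":
    print(check_names(["PROD_SALES_MART", "sales_db_v2", "DEV_HR_RAW"]))  # ['sales_db_v2']
    for stmt in read_only_grants("PROD_SALES_MART", "SALES_READER"):
        print(stmt)
```

Generating grants from one function, rather than issuing them by hand, keeps access consistent across environments and gives reviewers something concrete to check in CI/CD.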

Snowflake is the easiest, safest, and most affordable way to get data to the cloud. It is exciting, but the potential scope can also be intimidating. You don’t have to be afraid! The good news is that your organization likely already has all the skills needed to successfully manage a Snowflake environment. BlueCloud can help with all the topics covered in this article and much more. We have skilled Snowflake architects and developers ready to help you on your journey to the data cloud.