Snowflake and dbt

Here is a collection of interesting articles that I read as I looked into getting started with Snowflake and dbt.

  • https://blog.getdbt.com/how-we-configure-snowflake/

This is a good article for a high-level overview of how you should structure the different layers in Snowflake.

  • https://quickstarts.snowflake.com/guide/data_engineering_with_dbt/index.html?index=..%2F..index#1

A good hands-on course for getting started with Snowflake and dbt.

  • https://about.gitlab.com/handbook/business-technology/data-team/platform/#tdf

A good look at GitLab's enterprise data platform: they use Snowflake as the data warehouse, dbt for modeling, and Airflow for orchestration.

And here, at a high level, are the steps to set up an environment to run dbt on Windows 10:

  • create a conda environment -> C:\work\dbt>conda create -n dbtpy38 python=3.8
  • note that I used Python 3.8; I was running into some cryptography library issues with 3.9
  • activate the conda environment -> C:\work\dbt>conda activate dbtpy38
  • clone the lab environment -> git clone https://github.com/dbt-labs/dbt.git
  • cd into dbt and run pip install against the requirements.txt -> (dbtpy38) C:\work\dbt\dbt>pip install -r requirements.txt
  • start Visual Studio Code from this directory by typing code . and you should be in Visual Studio Code
  • create a new dbt project with the init command -> dbt init dbt_hol
  • this creates a new project folder and also a default profiles file, which lives in your home directory
  • open the folder that has the profiles.yml file by typing start C:\Users\vargh.dbt
  • update the profile with your account name, user name, and password
  • the account name should be the part of the URL after https:// and before snowflakecomputing.com; e.g. in my case it was "xxxxxx.east-us-2.azure". dbt automatically appends snowflakecomputing.com
  • update the dbt_project.yml file with the project name in the name, profile, and models sections, as shown here - https://quickstarts.snowflake.com/guide/data_engineering_with_dbt/index.html?index=..%2F..index#2
  • once everything is set, ensure you can successfully run dbt debug; it should report the connection is OK if all credentials are correct
  • if you run into issues accessing data from the Data Marketplace, make sure to use the ACCOUNTADMIN role in Snowflake as opposed to the SYSADMIN role
  • for the dbt user, we will need to grant the appropriate permissions to the dbt role
  • explore packages at https://hub.getdbt.com/
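After these steps, the profiles.yml ends up looking roughly like this. This is only a sketch: the user, password, database, warehouse, and schema values are placeholders to replace with your own (the role names come from the GRANT statements further down, and the thread count matches the run log below):

```yaml
# ~/.dbt/profiles.yml (sketch; replace the placeholder values)
dbt_hol:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: xxxxxx.east-us-2.azure   # the part of the URL before snowflakecomputing.com
      user: my_user                     # placeholder
      password: my_password             # placeholder
      role: dbt_dev_role
      database: dbt_hol_dev             # placeholder database
      warehouse: dbt_dev_wh             # placeholder warehouse
      schema: public                    # placeholder schema
      threads: 200
```

With this in place, dbt debug should be able to validate the connection.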

Steps to build a pipeline

Create the source.yml file under the corresponding model directory. It should include the name of the database, the schema, and the tables we will be using as sources.
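A sketch of such a source.yml, pointing at the ECONOMY_DATA_ATLAS database from the Data Marketplace share (the source name, schema name, and table names here are assumptions/placeholders):

```yaml
# models/l10_staging/source.yml (sketch; schema and table names are placeholders)
version: 2

sources:
  - name: knoema_economy_data_atlas
    database: economy_data_atlas
    schema: economy            # placeholder schema name
    tables:
      - name: exratescc2018    # placeholder table names
      - name: usindssp2020
```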

The next step is to define a base view, as described in the best practices:

https://docs.getdbt.com/docs/guides/best-practices

https://discourse.getdbt.com/t/how-we-structure-our-dbt-projects/355
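As a sketch, a base view model such as models/l10_staging/base_knoema_fx_rates.sql then just selects from the declared source and renames columns (the source name and column names below are placeholders, not the actual share's columns):

```sql
-- models/l10_staging/base_knoema_fx_rates.sql
-- sketch: source and column names are placeholders
select
    "Currency" as currency,        -- rename the quoted source columns
    "Date"     as exchange_date,
    "Value"    as exchange_value
from {{ source('knoema_economy_data_atlas', 'exratescc2018') }}
```

dbt materializes this as a view in the l10_staging schema when you run dbt run.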

I explicitly had to grant privileges to the dbt roles.

It was failing with this error before:

12:17:07 | 1 of 2 START view model l10_staging.base_knoema_fx_rates…………. [RUN]
12:17:07 | 2 of 2 START view model l10_staging.base_knoema_stock_history…….. [RUN]
12:17:09 | 1 of 2 ERROR creating view model l10_staging.base_knoema_fx_rates…. [ERROR in 1.55s]
12:17:09 | 2 of 2 ERROR creating view model l10_staging.base_knoema_stock_history [ERROR in 1.56s]
12:17:10 |
12:17:10 | Finished running 2 view models in 6.59s.
Completed with 2 errors and 0 warnings:
Database Error in model base_knoema_fx_rates (models\l10_staging\base_knoema_fx_rates.sql)
002003 (02000): SQL compilation error:
Database 'ECONOMY_DATA_ATLAS' does not exist or not authorized.
compiled SQL at target\run\dbt_hol\models\l10_staging\base_knoema_fx_rates.sql
Database Error in model base_knoema_stock_history (models\l10_staging\base_knoema_stock_history.sql)
002003 (02000): SQL compilation error:
Database 'ECONOMY_DATA_ATLAS' does not exist or not authorized.
compiled SQL at target\run\dbt_hol\models\l10_staging\base_knoema_stock_history.sql

I used these statements to grant access:

GRANT IMPORTED PRIVILEGES ON DATABASE "ECONOMY_DATA_ATLAS" TO ROLE dbt_dev_role;

GRANT IMPORTED PRIVILEGES ON DATABASE "ECONOMY_DATA_ATLAS" TO ROLE dbt_prod_role;

Then I was able to query the tables using the dbt role, and the dbt run also completed successfully:

Found 2 models, 0 tests, 0 snapshots, 0 analyses, 324 macros, 0 operations, 0 seed files, 2 sources, 0 exposures

12:27:42 | Concurrency: 200 threads (target='dev')
12:27:42 |
12:27:42 | 1 of 2 START view model l10_staging.base_knoema_fx_rates…………. [RUN]
12:27:42 | 2 of 2 START view model l10_staging.base_knoema_stock_history…….. [RUN]
12:27:44 | 2 of 2 OK created view model l10_staging.base_knoema_stock_history… [SUCCESS 1 in 2.13s]
12:27:45 | 1 of 2 OK created view model l10_staging.base_knoema_fx_rates…….. [SUCCESS 1 in 2.25s]
12:27:46 |
12:27:46 | Finished running 2 view models in 7.98s.
Completed successfully

Cheat sheet

https://datacaffee.com/dbt-data-built-tool-commands-cheat-sheet/

Here is a write-up on how to use dbt tests: https://docs.getdbt.com/docs/building-a-dbt-project/tests
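For example, generic tests such as not_null and unique are declared in a .yml file next to the models; a minimal sketch (the model and column names are placeholders):

```yaml
# models/l10_staging/schema.yml (sketch; model and column names are placeholders)
version: 2

models:
  - name: base_knoema_fx_rates
    columns:
      - name: currency
        tests:
          - not_null
          - unique
```

These then run with dbt test.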