Here is a collection of interesting articles that I read as I looked into getting started with Snowflake and dbt.
- https://blog.getdbt.com/how-we-configure-snowflake/
A good article for a high-level overview of how you should structure the different layers in Snowflake.
- https://quickstarts.snowflake.com/guide/data_engineering_with_dbt/index.html?index=..%2F..index#1
A good hands-on course to get started with Snowflake and dbt.
- https://about.gitlab.com/handbook/business-technology/data-team/platform/#tdf
A good look at GitLab's enterprise data platform: they use Snowflake as the data warehouse, dbt for modeling, and Airflow for orchestration.
And here are the high-level steps for setting up an environment to run dbt on Windows 10:
- Create a conda environment -> C:\work\dbt>conda create -n dbtpy38 python=3.8
- Note that I used Python 3.8; I was running into some cryptography library issues with 3.9
- Activate the conda environment -> C:\work\dbt>conda activate dbtpy38
- Clone the dbt repository -> git clone https://github.com/dbt-labs/dbt.git
- cd into dbt and install the requirements -> (dbtpy38) C:\work\dbt\dbt>pip install -r requirements.txt
- Start Visual Studio Code from this directory by typing code .
- Create a new dbt project with the init command -> dbt init dbt_hol
- This creates a new project folder and also a default profiles.yml file, which lives in your home directory
- Open the folder that has the profiles.yml file by typing start C:\Users\vargh\.dbt
- Update the profile with your account name, user name, and password (a sketch of the profile follows this list)
- The account name is the part of the URL after https:// and before snowflakecomputing.com; e.g. in my case it was "xxxxxx.east-us-2.azure". dbt automatically appends snowflakecomputing.com
- Update the dbt_project.yml file with the project name in the name, profile, and models sections, as shown here: https://quickstarts.snowflake.com/guide/data_engineering_with_dbt/index.html?index=..%2F..index#2 (also sketched after this list)
- Once everything is set, make sure dbt debug runs successfully; it should report the connection as OK if all credentials are correct
- If you run into issues accessing data from the Snowflake Data Marketplace, make sure to use the ACCOUNTADMIN role in Snowflake as opposed to the SYSADMIN role
- For the dbt user, we will need to grant appropriate permissions to the dbt roles (typical grants are sketched after this list)
- Explore reusable packages at https://hub.getdbt.com/ (a sample packages.yml is sketched after this list)
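A minimal sketch of what the profiles.yml entry ends up looking like for Snowflake; the role name matches the grants later in this post, while the user, password, database, warehouse, and schema values are placeholders to swap for your own:

```yaml
# ~/.dbt/profiles.yml -- single-target sketch; values marked placeholder are assumptions
dbt_hol:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: xxxxxx.east-us-2.azure   # URL fragment before snowflakecomputing.com
      user: my_dbt_user                 # placeholder
      password: my_password             # placeholder; an env var is safer
      role: dbt_dev_role
      database: dbt_hol_dev             # placeholder
      warehouse: dbt_dev_wh             # placeholder
      schema: public                    # placeholder
      threads: 200                      # matches the concurrency seen in the logs below
```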
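And a sketch of the dbt_project.yml fields that step touches; the models subtree key has to match the project name, and the materialization default is an assumption (the quickstart configures this per layer):

```yaml
# dbt_project.yml -- only the fields edited in this step
name: dbt_hol
version: '1.0.0'
config-version: 2
profile: dbt_hol          # must match the profile name in profiles.yml
models:
  dbt_hol:                # must match the project name above
    +materialized: view   # assumed default
```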
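For the role permissions, a rough sketch of the kind of statements involved, run as ACCOUNTADMIN; the warehouse, database, and user names here are hypothetical placeholders (the marketplace-specific grant that actually unblocked me is shown further down):

```sql
-- hypothetical object names; adapt to your environment
GRANT USAGE ON WAREHOUSE dbt_dev_wh TO ROLE dbt_dev_role;
GRANT ALL ON DATABASE dbt_hol_dev TO ROLE dbt_dev_role;
GRANT ROLE dbt_dev_role TO USER my_dbt_user;
```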
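Packages from the hub are declared in a packages.yml at the project root and installed with dbt deps; for example, with dbt_utils (the version pin is illustrative, check the hub page for one compatible with your dbt release):

```yaml
# packages.yml -- then run: dbt deps
packages:
  - package: dbt-labs/dbt_utils
    version: 0.8.0   # illustrative pin
```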
Steps to build a pipeline:
Create a source.yml file under the corresponding model directory. This should include the name of the database, the schema, and the tables we will be using as sources; a sketch is below.
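A minimal sketch for the Knoema marketplace share used in the quickstart; the schema and table names are assumptions about what that share exposes, so substitute your own:

```yaml
# models/l10_staging/source.yml
version: 2
sources:
  - name: knoema_economy_data_atlas
    database: economy_data_atlas
    schema: economy            # assumed schema inside the share
    tables:
      - name: exratescc2018    # assumed table names
      - name: usindssp2020
```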
The next step is to define a base view, following the best practices described in these write-ups (a sample model follows the links):
https://docs.getdbt.com/docs/guides/best-practices
https://discourse.getdbt.com/t/how-we-structure-our-dbt-projects/355
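A base view at this layer does little more than expose a source table one-to-one; a minimal sketch against the hypothetical source declared above:

```sql
-- models/l10_staging/base_knoema_fx_rates.sql
-- minimal base view: select from the declared source;
-- column renaming and typing would normally happen here too
select *
from {{ source('knoema_economy_data_atlas', 'exratescc2018') }}
```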
I explicitly had to grant privileges to the dbt roles.
It was failing with this error before:
12:17:07 | 1 of 2 START view model l10_staging.base_knoema_fx_rates............. [RUN]
12:17:07 | 2 of 2 START view model l10_staging.base_knoema_stock_history........ [RUN]
12:17:09 | 1 of 2 ERROR creating view model l10_staging.base_knoema_fx_rates.... [ERROR in 1.55s]
12:17:09 | 2 of 2 ERROR creating view model l10_staging.base_knoema_stock_history [ERROR in 1.56s]
12:17:10 |
12:17:10 | Finished running 2 view models in 6.59s.
Completed with 2 errors and 0 warnings:
Database Error in model base_knoema_fx_rates (models\l10_staging\base_knoema_fx_rates.sql)
002003 (02000): SQL compilation error:
Database 'ECONOMY_DATA_ATLAS' does not exist or not authorized.
compiled SQL at target\run\dbt_hol\models\l10_staging\base_knoema_fx_rates.sql
Database Error in model base_knoema_stock_history (models\l10_staging\base_knoema_stock_history.sql)
002003 (02000): SQL compilation error:
Database 'ECONOMY_DATA_ATLAS' does not exist or not authorized.
compiled SQL at target\run\dbt_hol\models\l10_staging\base_knoema_stock_history.sql
I used these statements to grant access:
GRANT IMPORTED PRIVILEGES ON DATABASE "ECONOMY_DATA_ATLAS" TO ROLE dbt_dev_role;
GRANT IMPORTED PRIVILEGES ON DATABASE "ECONOMY_DATA_ATLAS" TO ROLE dbt_prod_role;
Then I was able to query the tables using the dbt role, and dbt run completed successfully:
Found 2 models, 0 tests, 0 snapshots, 0 analyses, 324 macros, 0 operations, 0 seed files, 2 sources, 0 exposures
12:27:42 | Concurrency: 200 threads (target='dev')
12:27:42 |
12:27:42 | 1 of 2 START view model l10_staging.base_knoema_fx_rates............. [RUN]
12:27:42 | 2 of 2 START view model l10_staging.base_knoema_stock_history........ [RUN]
12:27:44 | 2 of 2 OK created view model l10_staging.base_knoema_stock_history... [SUCCESS 1 in 2.13s]
12:27:45 | 1 of 2 OK created view model l10_staging.base_knoema_fx_rates........ [SUCCESS 1 in 2.25s]
12:27:46 |
12:27:46 | Finished running 2 view models in 7.98s.
Completed successfully
Cheat sheet:
https://datacaffee.com/dbt-data-built-tool-commands-cheat-sheet/
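For quick reference, the standard dbt CLI commands this walkthrough leans on:

```
dbt debug          # verify profile and connection
dbt deps           # install packages from packages.yml
dbt run            # build the models
dbt test           # run the tests declared in your YAML files
dbt docs generate  # build the documentation site
```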
Here is a write-up on how to use dbt tests: https://docs.getdbt.com/docs/building-a-dbt-project/tests (a minimal example is sketched below).
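Generic tests are declared next to the model in a schema.yml file; a minimal sketch with a hypothetical column name:

```yaml
# models/l10_staging/schema.yml
version: 2
models:
  - name: base_knoema_fx_rates
    columns:
      - name: currency_code   # hypothetical column
        tests:
          - not_null          # built-in generic test
```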