Overview and examples

dlt retrieves configuration and secrets from several locations, such as environment variables, dedicated files, or secure vaults. It understands both simple and verbose layouts of configuration sections. You can use one of the built-in credential types for popular external systems. Functions decorated with @dlt.source, @dlt.resource, or @dlt.destination can be configured without writing additional code - dlt automatically injects missing arguments (like passwords or API keys) when you call them.
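
To make the injection mechanism concrete, here is a minimal sketch (the source name and secret value are illustrative only): the decorated function declares api_key as a secret, and dlt fills it in from a configuration provider when the function is called.

import os
import dlt

@dlt.source
def my_source(api_key: str = dlt.secrets.value):
    # return a trivial resource so the sketch runs end to end
    return dlt.resource([{"key_present": bool(api_key)}], name="items")

# The secret can come from any provider; here an environment variable is used
os.environ["SOURCES__API_KEY"] = "some_value"

# api_key is not passed explicitly - dlt injects it when the source is called
source = my_source()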

Choose where to store configuration

dlt+

To define your configuration (including sources, destinations, pipelines, and parameters) in a declarative way using YAML files, check out dlt+.

dlt looks for configuration and secrets in various locations (environment variables, TOML files, or secure vaults) through config providers that are queried when your pipeline runs. You can pick a single location or combine them - for example, define the secret api_key in environment variables and api_url in a TOML file. Providers are queried in the following order:

  1. Environment Variables: If a value is found in an environment variable, dlt uses it and doesn't check lower-priority providers.

  2. secrets.toml and config.toml files: These files store configuration values and secrets. secrets.toml contains sensitive information, while config.toml holds non-sensitive configuration.

  3. Vaults: Credentials stored in secure vaults like Google Secrets Manager, Azure Key Vault, or AWS Secrets Manager.

  4. Custom Providers added with register_provider: These are custom implementations you can create to use your own configuration formats or perform specialized preprocessing.

  5. Default Argument Values: The values specified in the function signature.
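
As a quick illustration of this precedence (the key name and values are made up), a value set in an environment variable wins over the same key in config.toml because the environment provider is queried first:

import os
import dlt

# Suppose .dlt/config.toml also contained:  api_url = "https://example.com/from-toml"
os.environ["API_URL"] = "https://example.com/from-env"

# The environment variable is found first, so the TOML value is never consulted
print(dlt.config["api_url"])  # -> https://example.com/from-env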

tip

Make sure your pipeline name contains only alphanumeric characters, hyphens (-), and underscores (_). Avoid whitespace and other punctuation to ensure compatibility with all configuration providers.

Select a configuration layout

You can define configuration in different ways depending on your project's complexity. For a simple pipeline with a single source and destination, your configuration can be straightforward:

Simplest source configuration:

api_key="some_value"

For a destination, you typically need to configure credentials, which group multiple related keys together. dlt places these under a credentials section.

[credentials]
user="dlthub"
password="some_value"

When using multiple sources with potentially conflicting argument names, or multiple destinations where you want separate credentials, you can organize your config keys with sections. Here's the recommended section layout that is most often used in this documentation and is also generated by the dlt init command.

  • Use sources and destination top-level sections to separate their configurations
  • Use the Python module name where the source function is defined to separate configuration of sources defined in different modules
  • Use the destination type to separate destinations

Source:

# source defined in notion.py
[sources.notion]
api_key="some_value"

Destination:

# use postgres destination
[destination.postgres.credentials]
user="dlthub"
password="some_value"

Refer to the Add credentials guide for more examples and tips on how to configure a particular source and destination.

How dlt looks for values

dlt starts looking for a particular value with all possible sections present; if the value is not found, it eliminates the rightmost section and tries again.

For example, if the source function is in module notion.py:

# module: notion.py

@dlt.source
def notion_databases(api_key: str = dlt.secrets.value):
    pass

dlt will search for the following keys in this order:

  1. sources.notion.notion_databases.api_key
  2. sources.notion.api_key
  3. sources.api_key
  4. api_key
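
One way to see the same key layout from code is to set values on dlt.secrets, which accepts the same dotted paths; a minimal sketch (the value is a placeholder):

import dlt

# Setting the most specific key from the list above; any of the shorter
# variants (e.g. "sources.notion.api_key") would also be found, just later
dlt.secrets["sources.notion.notion_databases.api_key"] = "some_value"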

The same applies to destination credentials. In that case, the credentials section is considered a required grouping and won't be eliminated:

  1. destination.postgres.credentials.password
  2. destination.credentials.password
  3. credentials.password
tip

For more detailed information about configuration organization, see configuration and secrets structure.

tip

You can use the pipeline name to create separate configurations for each pipeline in your project. Configuration values are searched first with the pipeline name prefix, then without it:

[pipeline_name_1.sources.google_sheets.credentials]
client_email = "<client_email_1>"
private_key = "<private_key_1>"
project_id = "<project_id_1>"

[pipeline_name_2.sources.google_sheets.credentials]
client_email = "<client_email_2>"
private_key = "<private_key_2>"
project_id = "<project_id_2>"

Use built-in credential types

Credentials are groups of configs and secrets that are defined together in order to access external systems. dlt implements several built-in credential types to access AWS, Azure, Google Cloud, and other common systems.

Some credential types give you options for how to specify them. For example, to connect to the sql_database source, you can either use a connection string:

[sources.sql_database]
credentials="snowflake://user:password@service-account/database?warehouse=warehouse_name&role=role"

Or set up the connection parameters separately:

[sources.sql_database.credentials]
drivername="snowflake"
username="user"
password="password"
database="database"
host="service-account"
warehouse="warehouse_name"
role="role"
tip

dlt can discover default credentials of all major cloud providers: it uses what is already present in the runtime environment. For example, when running in Colab or on a Google Cloud VM, it has access to cloud credentials, and if nothing is specified in the configuration, it will use them instead.

Environment variables

Environment variables provide a convenient way to specify configuration and secrets, especially in deployment environments. When using environment variables, names are capitalized and sections are separated with double underscores (__).

For example, to set the Facebook Ads access token:

export SOURCES__FACEBOOK_ADS__ACCESS_TOKEN="<access_token>"

See the examples section for more details on setting up credentials with environment variables.

tip

For local development, you can use python-dotenv to automatically load variables from a .env file, making credential management easier and more secure.
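
A minimal sketch, assuming python-dotenv is installed and a .env file with dlt-style variable names sits in the working directory:

# .env might contain, e.g.:
# SOURCES__NOTION__API_KEY="your-notion-api-key"
from dotenv import load_dotenv

import dlt

load_dotenv()  # copies the .env entries into os.environ

# anything created afterwards sees them through the environment variables provider
pipeline = dlt.pipeline(pipeline_name="notion", destination="duckdb")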

tip

The environment variables provider can also retrieve secret values from /run/secrets/<secret-name>, so it works seamlessly with Kubernetes/Docker secrets.

For these secrets, dlt uses an alternative name format with lowercase letters, dashes (-) as separators, and underscores converted to dashes. For example, sources--facebook-ads--access-token would be checked for the above environment variable.

Only values marked as secrets (with dlt.secrets.value or using types like TSecretStrValue) are checked this way. Remember to name your secrets appropriately in Kubernetes resources or Docker Compose files.

Vaults

dlt may read configuration from secure vaults - specialized services for storing credentials.

secrets.toml and config.toml

The TOML configuration provider uses two separate files:

config.toml:

  • Contains non-sensitive configuration data that defines pipeline behavior
  • Includes settings like file paths, database hosts, timeouts, API URLs, and performance options
  • Values are accessible in code through the dlt.config dictionary
  • Can be safely committed to version control

secrets.toml:

  • Contains sensitive information that must be kept confidential
  • Includes credentials like passwords, API keys, and private keys
  • Values are accessible in code through the dlt.secrets dictionary
  • Should never be committed to version control

By default, the .gitignore file in your project prevents secrets.toml from being added to version control, while config.toml can be freely included.
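
Reading these values in code is a dictionary lookup with dotted keys that mirror the TOML sections; a small sketch, assuming the keys below exist in your files:

import dlt

log_level = dlt.config["runtime.log_level"]       # from config.toml
api_key = dlt.secrets["sources.notion.api_key"]   # from secrets.toml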

File locations

The TOML provider loads files from the .dlt folder relative to your current working directory.

For example, if your working directory is my_dlt_project with this structure:

my_dlt_project:
  |
  pipelines/
      |---- .dlt/secrets.toml
      |---- google_sheets.py

When you run:

python pipelines/google_sheets.py

dlt will look for secrets in my_dlt_project/.dlt/secrets.toml and ignore my_dlt_project/pipelines/.dlt/secrets.toml.

If you change your working directory to pipelines and run:

python google_sheets.py

dlt will look for my_dlt_project/pipelines/.dlt/secrets.toml instead.

Special locations

The TOML provider also reads configuration from special locations depending on your runtime environment:

  1. Home directory: If available, dlt checks ~/.dlt/ for config.toml and secrets.toml. These values are merged with project-specific configurations, with project values taking precedence. This is useful for sharing global settings (like telemetry preferences) across all pipelines on a machine.

  2. Google Colab: When running in Colab, you can use Colab Secrets named secrets.toml and config.toml. The provider reads these as if they were TOML files. This functionality is disabled if files exist in the .dlt folder.

  3. Streamlit: When running in Streamlit without local .dlt/secrets.toml, the provider uses Streamlit secrets. You can add dlt secrets directly to your Streamlit secrets.

Custom providers

You can create and register your own configuration providers to customize how dlt accesses configuration values. The simplest approach is to write a function that returns a nested dictionary where keys correspond to sections and argument names.

This example demonstrates how to create a custom provider that loads configuration from a JSON file:

import dlt
from dlt.common import json
from dlt.common.configuration.providers import CustomLoaderDocProvider

# Create a function that loads a dictionary
def load_config():
    with open("config.json", "rb") as f:
        return json.load(f)

# Create the custom provider
provider = CustomLoaderDocProvider(
    "my_json_provider",
    load_config,
    supports_secrets=False,
)

# Register the provider with dlt
dlt.config.register_provider(provider)
tip

Check out our example YAML provider that supports switchable configuration profiles.

Examples

Configure both config and secrets

This example uses the Notion source and filesystem destination to demonstrate how to organize configuration in TOML files using the recommended section layout.

The Notion source is defined in a file named notion.py, so we use that module name in the configuration. The api_key is set in the TOML files, while the list of database IDs is passed explicitly in code. For the filesystem destination, we split the configuration between config.toml (for bucket_url) and secrets.toml (for AWS credentials).

import dlt

@dlt.source
def notion_databases(
    database_ids=None,
    api_key: str = dlt.secrets.value,  # mark argument to be injected as secret
):
    ...

# Pass database_ids in code, let `dlt` inject api_key
sales_database = notion_databases(  # type: ignore
    database_ids=[
        {
            "id": "a94223535c674d33a24e313e7921ce15",
            "use_name": "sales_alias",
        }
    ]
)

config.toml

[runtime]
log_level="INFO"

# Do not compress files sent to the filesystem bucket
[normalize.data_writer]
disable_compression=true

# Recommended sections for the destination (destination.module)
[destination.filesystem]
bucket_url = "s3://[your_bucket_name]"

secrets.toml

# Recommended sections for sources (sources.module)
[sources.notion]
api_key = "your-notion-api-key" # Will be injected to api_key argument

# Recommended sections for destination credentials
[destination.filesystem.credentials]
aws_access_key_id = "ABCDEFGHIJKLMNOPQRST"
aws_secret_access_key = "1234567890_access_key"
caution

While you can put all configuration and credentials in secrets.toml for convenience, sensitive information should never be placed in config.toml or other non-secure locations. dlt will raise an exception if it detects secrets in inappropriate locations.

Use different Google credentials for source and destination

This example shows how to configure different credentials for Google-based sources and destinations:

Option 1: Share credentials between source and destination

If you want both the BigQuery destination and Google Sheets source to use the same credentials:

[credentials]
client_email = "<client_email_for_both>"
private_key = "<private_key_for_both>"
project_id = "<project_id_for_both>"

Option 2: Use separate credentials for sources and destinations

To keep source and destination credentials separate:

# Google Sheets credentials
[sources.credentials]
client_email = "<sheets_client_email>"
private_key = "<sheets_private_key>"
project_id = "<sheets_project_id>"

# BigQuery credentials
[destination.credentials]
client_email = "<bigquery_client_email>"
private_key = "<bigquery_private_key>"
project_id = "<bigquery_project_id>"

With this setup, dlt looks for destination credentials in this order:

destination.bigquery.credentials --> Not found
destination.credentials --> Found

And for source credentials:

sources.google_sheets_module.google_sheets_function.credentials --> Not found
sources.google_sheets_function.credentials --> Not found
sources.credentials --> Found

Configure credentials for multiple sources and destinations

When working with multiple Google-based sources and destinations, you can use the recommended section layout:

# Google Sheets credentials
[sources.google_sheets.credentials]
client_email = "<sheets_client_email>"
private_key = "<sheets_private_key>"
project_id = "<sheets_project_id>"

# Google Analytics credentials
[sources.google_analytics.credentials]
client_email = "<analytics_client_email>"
private_key = "<analytics_private_key>"
project_id = "<analytics_project_id>"

# BigQuery credentials
[destination.bigquery.credentials]
client_email = "<bigquery_client_email>"
private_key = "<bigquery_private_key>"
project_id = "<bigquery_project_id>"

Configure multiple instances of the same source

If you need to extract data from the same source type with different configurations, you can run them under different pipeline names:

[pipeline_name_1.sources.sql_database]
credentials="snowflake://user1:password1@service-account/database1?warehouse=warehouse_name&role=role1"

[pipeline_name_2.sources.sql_database]
credentials="snowflake://user2:password2@service-account/database2?warehouse=warehouse_name&role=role2"
tip

You have additional options for using multiple instances of the same source:

  1. Use the clone() method as explained in the sql_database documentation.

  2. Create named destinations to use the same destination type with different configurations.

Troubleshoot configuration errors

If dlt can't find a required configuration value or secret, it raises a ConfigFieldMissingException that provides detailed information about what was searched for and where.

For example, running the chess.py example without providing the password:

$ CREDENTIALS="postgres://loader@localhost:5432/dlt_data" python chess.py
...
dlt.common.configuration.exceptions.ConfigFieldMissingException: Following fields are missing: ['password'] in configuration with spec PostgresCredentials
for field "password" config providers and keys were tried in the following order:
In Environment Variables key CHESS_GAMES__DESTINATION__POSTGRES__CREDENTIALS__PASSWORD was not found.
In Environment Variables key CHESS_GAMES__DESTINATION__CREDENTIALS__PASSWORD was not found.
In Environment Variables key CHESS_GAMES__CREDENTIALS__PASSWORD was not found.
In secrets.toml key chess_games.destination.postgres.credentials.password was not found.
In secrets.toml key chess_games.destination.credentials.password was not found.
In secrets.toml key chess_games.credentials.password was not found.
In Environment Variables key DESTINATION__POSTGRES__CREDENTIALS__PASSWORD was not found.
In Environment Variables key DESTINATION__CREDENTIALS__PASSWORD was not found.
In Environment Variables key CREDENTIALS__PASSWORD was not found.
In secrets.toml key destination.postgres.credentials.password was not found.
In secrets.toml key destination.credentials.password was not found.
In secrets.toml key credentials.password was not found.
Please refer to https://dlthub.com/docs/general-usage/credentials/ for more information

This error message shows exactly:

  1. Which field is missing (password in this case)
  2. All the keys and locations dlt checked, in order of priority
  3. That it first looked with the pipeline name (chess_games) prefix, then without it
  4. That it searched environment variables first, then secrets.toml

Note that config.toml wasn't checked since it's not appropriate for storing secrets.
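
The quickest fix is to supply the missing field through any of the keys listed in the message; a minimal sketch using the least specific environment variable (the password value is a placeholder):

import os

# any TOML key from the message works as well, e.g. adding `password = "..."`
# under [destination.postgres.credentials] in secrets.toml
os.environ["CREDENTIALS__PASSWORD"] = "<postgres password>"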
