Workspace Configuration

This guide explains how to configure workspace paths when using the latent package in your project.

Overview

When latent is installed as a package (via uv pip install or similar), it needs to know where to find your project's directories.

Required Directories

Flows directory: Where flow configurations (parameters.yaml, catalog.yaml) are stored
Data directory: Where input datasets and output symlinks are stored
Logs directory: Where flow logs are written
MLruns directory: Where MLflow experiment metadata (SQLite DB) is stored
MLartifacts directory: Where MLflow artifacts (outputs) are stored
Cache directory: Where Prefect task results are cached
Stable directory: Where curated, version-controlled deployment artifacts are stored

Configuration Priority

Latent supports multiple configuration methods with the following priority order:

1. Environment variables (highest priority)
2. TOML configuration file (config/latent.toml)
3. Default values (lowest priority)

This allows you to define a base configuration in a TOML file while still being able to override specific values via environment variables.
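
The priority order can be pictured as a three-step lookup. The sketch below is illustrative only and is not Latent's actual implementation; the function name `resolve_setting` and its arguments are hypothetical, and it assumes that an environment variable counts as "set" whenever it is present.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    env_var: str,
    toml_value: Optional[str],
    default: str,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the effective value: environment variable, then TOML, then default."""
    # 1. An environment variable wins whenever it is present (assumption).
    if env_var in environ:
        return environ[env_var]
    # 2. Otherwise use the value from the TOML file, if one was defined.
    if toml_value is not None:
        return toml_value
    # 3. Fall back to the built-in default.
    return default


# Hypothetical inputs: the TOML file sets a logs directory, the environment may override it.
overridden = resolve_setting(
    "LATENT_LOGS_DIR", "storage/logs", "logs", {"LATENT_LOGS_DIR": "/var/log/latent"}
)
from_toml = resolve_setting("LATENT_LOGS_DIR", "storage/logs", "logs", {})
from_default = resolve_setting("LATENT_LOGS_DIR", None, "logs", {})
print(overridden, from_toml, from_default)
```

Passing the environment as an argument keeps the lookup easy to test without touching the real process environment.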

Configuration via TOML File

The recommended way to configure your workspace is with a TOML configuration file. This provides a centralized, version-controllable configuration.

Quick Start

Generate a sample configuration file:

latent init

This creates config/latent.toml with documented options.

Configuration File Locations

Latent searches for configuration files in this order:

1. config/latent.toml (recommended)
2. latent.toml (project root)
3. .latent.toml (hidden file in root)
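
A first-match search like this one can be sketched in a few lines. This is a sketch under assumptions rather than Latent's own code: the helper `find_config` is hypothetical, and it assumes the three locations are resolved relative to the project root.

```python
from pathlib import Path
from typing import Optional

# Candidate locations, in the search order listed above.
CANDIDATES = ("config/latent.toml", "latent.toml", ".latent.toml")


def find_config(root: Path) -> Optional[Path]:
    """Return the first candidate file that exists under root, or None."""
    for relative in CANDIDATES:
        candidate = root / relative
        if candidate.is_file():
            return candidate
    return None


# Example: look for a configuration file starting from the current directory.
print(find_config(Path.cwd()))
```

Because the loop returns on the first hit, a config/latent.toml file would shadow a latent.toml in the project root under this assumption.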

Example Configuration

config/latent.toml

# Environment mode: "production" (default) or "dev"/"development"
# Development mode automatically disables task caching for faster iteration
environment = "production"

[workspace]
# Workspace root directory (default: current working directory)
# root = "/path/to/workspace"

# Directory paths (relative to root, or absolute)
flows_dir = "flows"
data_dir = "data"
logs_dir = "logs"
mlruns_dir = "mlruns"
mlartifacts_dir = "mlartifacts"
cache_dir = ".cache"

# Stable artifacts directory (version-controlled, not auto-created)
# stable_dir = "data/stable"

[mlflow]
# Enable MLflow experiment tracking (default: true)
enabled = true

# Enable LiteLLM auto-tracing for LLM calls (default: true)
litellm_autolog = true

# Enable LangChain/LangGraph auto-tracing (default: true)
langchain_autolog = true

[logging]
# Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL
level = "INFO"

Configuration via Environment Variables

You can also configure workspace paths via environment variables, which override the corresponding TOML settings.

Primary Environment Variables

Variable	Purpose	Default (fallback)	Example
`LATENT_ENVIRONMENT`	Environment mode (controls caching)	`production`	`dev`, `development`, `production`
`LATENT_WORKSPACE_ROOT`	Root directory of your project	`cwd` (current directory)	`/path/to/my-project`
`LATENT_FLOWS_DIR`	Directory containing flow configs	`{workspace}/flows`	`/path/to/my-project/flows`
`LATENT_DATA_DIR`	Directory for storing input data	`{workspace}/data`	`/path/to/my-project/data`
`LATENT_LOGS_DIR`	Directory for storing logs	`{workspace}/logs`	`/path/to/my-project/logs`
`LATENT_MLRUNS_DIR`	Directory for MLflow metadata (SQLite)	`{workspace}/mlruns`	`/path/to/my-project/mlruns`
`LATENT_MLARTIFACTS_DIR`	Directory for MLflow artifacts (outputs)	`{workspace}/mlartifacts`	`/path/to/my-project/mlartifacts`
`LATENT_CACHE_DIR`	Directory for Prefect cache	`{workspace}/.cache`	`/path/to/my-project/.cache`
`LATENT_STABLE_DIR`	Directory for stable artifacts	`{workspace}/data/stable`	`/path/to/my-project/data/stable`

Setting Environment Variables

devbox.json (recommended for development environments):

{
  "env": {
    "LATENT_WORKSPACE_ROOT": "$PWD",
    "LATENT_FLOWS_DIR": "$PWD/flows"
  }
}

.env file (good for local development):

# .env
LATENT_WORKSPACE_ROOT=/path/to/my-project
LATENT_FLOWS_DIR=/path/to/my-project/flows
LATENT_DATA_DIR=/path/to/my-project/data
LATENT_LOGS_DIR=/path/to/my-project/logs
LATENT_MLRUNS_DIR=/path/to/my-project/mlruns

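Then load the file with python-dotenv before running any flow:

from dotenv import load_dotenv

# Reads the .env file into os.environ; by default, variables that are
# already set in the environment are not overwritten
load_dotenv()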

Shell export (quick testing):

export LATENT_WORKSPACE_ROOT=/path/to/my-project
export LATENT_FLOWS_DIR=/path/to/my-project/flows

Development Mode

Latent supports a development mode that automatically disables task caching for faster iteration during development.

Enabling Development Mode

Set the environment to dev or development:

TOML configuration (config/latent.toml):

environment = "dev"

Environment variable:

export LATENT_ENVIRONMENT=dev

devbox.json:

{
  "env": {
    "LATENT_ENVIRONMENT": "dev"
  }
}

What Development Mode Does

When environment is set to dev or development:

✅ Disables task caching automatically - All tasks run fresh every time
✅ No need to manually set cache=False - Works for all tasks
✅ Faster iteration - No stale cached results from previous runs
✅ Prevents timestamped output issues - Tasks are never skipped on a cache hit, so files are always written to the new output directory

Best Practice

Use development mode during:

Flow development and debugging
Iterating on task logic
Testing with timestamped output directories

Switch to production mode for:

Final evaluation runs
Production deployments
Runs where the benefits of caching outweigh iteration speed

Task Caching Control

You can still explicitly control caching per task:

from latent.prefect import task

# Explicit cache control (overrides environment setting)
@task("my_task", cache=False)
def my_task():
    # Never cached, even in production mode
    pass
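
The interaction between the environment mode and an explicit per-task setting can be summarized as a small decision function. This is a minimal sketch, not Latent's implementation: `caching_enabled` and `DEV_MODES` are hypothetical names, and the handling of an explicit `cache=True` in development mode is an assumption based on the statement that explicit control overrides the environment setting.

```python
from typing import Optional

# Environment values that select development mode.
DEV_MODES = {"dev", "development"}


def caching_enabled(environment: str, cache: Optional[bool] = None) -> bool:
    """Decide whether a task's results should be cached."""
    # An explicit per-task setting overrides the environment mode.
    if cache is not None:
        return cache
    # Otherwise caching is on unless the environment is a development mode.
    return environment not in DEV_MODES


print(caching_enabled("production"))               # default behaviour in production
print(caching_enabled("dev"))                      # development mode disables caching
print(caching_enabled("production", cache=False))  # explicit opt-out wins
```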

Avoid Using cache_policy Directly

Using Prefect's cache_policy parameter directly will trigger a warning:

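from latent.prefect import task

# ❌ Not recommended: passing Prefect's cache_policy directly triggers a warning
@task("my_task", cache_policy=None)
def my_task():
    pass

# ✅ Recommended: Latent's own cache parameter
@task("my_task", cache=False)
def my_task():
    pass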

Use Latent's cache=False parameter instead, which properly integrates with the framework.

Fallback Behavior

Configuration values are resolved in this order:

1. Environment Variable (if set)
2. TOML Config File (if exists and value is defined)
3. Default Value

Default values are:

Setting	Default
Workspace Root	Current working directory (`cwd`)
Flows Directory	`{workspace_root}/flows`
Data Directory	`{workspace_root}/data`
Logs Directory	`{workspace_root}/logs`
MLruns Directory	`{workspace_root}/mlruns`
MLartifacts Directory	`{workspace_root}/mlartifacts`
Cache Directory	`{workspace_root}/.cache`
Stable Directory	`{workspace_root}/data/stable`

Important

When running flows, make sure you're in the workspace root directory if you're relying on default fallback behavior.
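
The default table and the note about the working directory can be illustrated together: every default is derived from the workspace root, which itself falls back to the current working directory. The helper `default_dirs` below is a hypothetical sketch under that reading, not part of Latent's API.

```python
from pathlib import Path
from typing import Optional

# Default sub-paths, taken from the table above.
DEFAULT_SUBDIRS = {
    "flows": "flows",
    "data": "data",
    "logs": "logs",
    "mlruns": "mlruns",
    "mlartifacts": "mlartifacts",
    "cache": ".cache",
    "stable": "data/stable",
}


def default_dirs(workspace_root: Optional[Path] = None) -> dict[str, Path]:
    """Build the default directory layout for a workspace root."""
    # With no explicit root, everything hangs off the current working
    # directory, which is why the directory you launch a flow from matters.
    root = workspace_root if workspace_root is not None else Path.cwd()
    return {name: root / sub for name, sub in DEFAULT_SUBDIRS.items()}


for name, path in default_dirs(Path("/path/to/my-project")).items():
    print(f"{name}: {path}")
```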

Example: Custom Project Structure

If your project uses a different layout than the default, point each directory setting at the matching location. For example, given this layout:

my-project/
├── config/
│   ├── latent.toml          # <-- Latent configuration
│   ├── flow1/
│   │   ├── parameters.yaml
│   │   └── catalog.yaml
│   └── flow2/
│       ├── parameters.yaml
│       └── catalog.yaml
├── storage/
│   ├── datasets/
│   ├── logs/
│   └── experiments/
└── .env

TOML configuration (recommended), in config/latent.toml:

[workspace]
flows_dir = "config"          # Relative to workspace root
data_dir = "storage/datasets"
logs_dir = "storage/logs"
mlruns_dir = "storage/experiments"
mlartifacts_dir = "storage/artifacts"

Environment variables, in .env:

LATENT_WORKSPACE_ROOT=/path/to/my-project
LATENT_FLOWS_DIR=/path/to/my-project/config
LATENT_DATA_DIR=/path/to/my-project/storage/datasets
LATENT_LOGS_DIR=/path/to/my-project/storage/logs
LATENT_MLRUNS_DIR=/path/to/my-project/storage/experiments
LATENT_MLARTIFACTS_DIR=/path/to/my-project/storage/artifacts

Verification

To verify that your configuration is correct, run a flow and check its logs:

python -m my_flows.my_flow

Look for these log lines:

Using workspace root from LATENT_WORKSPACE_ROOT: /path/to/my-project
Using flows directory from LATENT_FLOWS_DIR: /path/to/my-project/config
Using data directory from LATENT_DATA_DIR: /path/to/my-project/storage/datasets
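
If you capture the log output, lines of this shape can be collected programmatically. The sketch below assumes that every relevant line follows the "Using ... from ...: ..." pattern shown above; the helper `resolved_paths` is hypothetical, and lines emitted for TOML or default values may look different.

```python
import re

# Matches lines such as:
#   Using workspace root from LATENT_WORKSPACE_ROOT: /path/to/my-project
LINE = re.compile(r"Using (?P<what>.+?) from (?P<source>\S+): (?P<path>.+)$")


def resolved_paths(log_text: str) -> dict[str, tuple[str, str]]:
    """Map each setting name to (source, path) for every matching log line."""
    found = {}
    for line in log_text.splitlines():
        match = LINE.search(line)
        if match:
            found[match["what"]] = (match["source"], match["path"].strip())
    return found


sample_log = """\
Using workspace root from LATENT_WORKSPACE_ROOT: /path/to/my-project
Using flows directory from LATENT_FLOWS_DIR: /path/to/my-project/config
"""
for what, (source, path) in resolved_paths(sample_log).items():
    print(f"{what}: {path} (from {source})")
```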

Best Practices

Development Recommendations

Use config/latent.toml for shareable, version-controlled configuration
Use relative paths in TOML config for portability across machines
Keep flow configs separate from source code (e.g., in a config/ or flows/ directory)
Keep flow configs under version control, but add data directories to .gitignore

Production Recommendations

Use environment variables to override TOML settings for production-specific paths
Use absolute paths in environment variables for clarity
Always set LATENT_WORKSPACE_ROOT explicitly in production environments

Troubleshooting

Error: 'No parameters.yaml found'

Check that LATENT_FLOWS_DIR points to the correct directory
Verify that your flow directory exists: {LATENT_FLOWS_DIR}/{flow_name}/
Ensure parameters.yaml exists in that directory
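
These checks can be scripted. The helper below is a hypothetical sketch, not a Latent utility; it assumes the {LATENT_FLOWS_DIR}/{flow_name}/ layout described above, and treating catalog.yaml as expected alongside parameters.yaml is an assumption.

```python
from pathlib import Path


def missing_flow_files(flows_dir: Path, flow_name: str) -> list[str]:
    """Return the expected config files that are absent for one flow."""
    flow_dir = flows_dir / flow_name
    expected = ("parameters.yaml", "catalog.yaml")
    return [name for name in expected if not (flow_dir / name).is_file()]


# Example with hypothetical paths: an empty list means both files were found.
print(missing_flow_files(Path("/path/to/my-project/flows"), "my_flow"))
```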

Error: 'Input dataset not found in catalog'

Check that LATENT_DATA_DIR points to the correct directory
Verify that the dataset file exists: {LATENT_DATA_DIR}/{flow_name}/{dataset_name}.{ext}
For cross-flow references, ensure the source flow has run and created the dataset

Logs/Data going to unexpected locations

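Raise the logging level to DEBUG (for example, level = "DEBUG" under [logging] in config/latent.toml) and look for the resolved paths in the logs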
Verify environment variables are actually set: echo $LATENT_WORKSPACE_ROOT
Check for typos in environment variable names (they're case-sensitive)
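
The last two checks can be combined into a short script. This is a sketch under assumptions: `check_env` is a hypothetical helper, the list of known names is limited to the variables documented in this guide, and a flagged name is only worth a second look, since Latent may read variables that are not listed here.

```python
import os
from typing import Mapping

# Variable names documented in this guide.
KNOWN = {
    "LATENT_ENVIRONMENT",
    "LATENT_WORKSPACE_ROOT",
    "LATENT_FLOWS_DIR",
    "LATENT_DATA_DIR",
    "LATENT_LOGS_DIR",
    "LATENT_MLRUNS_DIR",
    "LATENT_MLARTIFACTS_DIR",
    "LATENT_CACHE_DIR",
    "LATENT_STABLE_DIR",
}


def check_env(
    environ: Mapping[str, str] = os.environ,
) -> tuple[dict[str, str], list[str]]:
    """Return (documented variables that are set, names that look like typos)."""
    set_vars = {k: v for k, v in environ.items() if k in KNOWN}
    # Anything that starts with "latent_" in any casing but is not a
    # documented name may be a typo or a wrong-case variable.
    suspicious = sorted(
        k for k in environ if k.upper().startswith("LATENT_") and k not in KNOWN
    )
    return set_vars, suspicious


set_vars, suspicious = check_env()
print("Set:", set_vars)
print("Check spelling or case:", suspicious)
```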