Workspace Configuration

This guide explains how to configure workspace paths when using the latent package in your project.

Overview

When latent is installed as a package (via uv pip install or similar), it needs to know where to find your project's directories.

Required Directories

  • Flows directory: Where flow configurations (parameters.yaml, catalog.yaml) are stored
  • Data directory: Where input datasets and output symlinks are stored
  • Logs directory: Where flow logs are written
  • MLruns directory: Where MLflow experiment metadata (SQLite DB) is stored
  • MLartifacts directory: Where MLflow artifacts (outputs) are stored
  • Cache directory: Where Prefect task results are cached
  • Stable directory: Where curated, version-controlled deployment artifacts are stored

Configuration Priority

Latent supports multiple configuration methods with the following priority order:

  1. Environment variables (highest priority)
  2. TOML configuration file (config/latent.toml)
  3. Default values (lowest priority)

This allows you to define a base configuration in a TOML file while still being able to override specific values via environment variables.
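
The priority order above can be sketched as a small lookup function. This is an illustrative model rather than Latent's actual implementation: `resolve_setting` is a hypothetical helper, and treating an empty environment variable as unset is an assumption.

```python
import os


def resolve_setting(env_var, toml_value, default, environ=None):
    """Return the first defined value: environment variable, TOML value, default."""
    environ = os.environ if environ is None else environ
    env_value = environ.get(env_var)
    if env_value:  # assumption: an empty string counts as "not set"
        return env_value
    if toml_value is not None:
        return toml_value
    return default


# Stand-ins for a parsed [workspace] table and the process environment.
toml_workspace = {"data_dir": "data"}
fake_env = {"LATENT_DATA_DIR": "/mnt/shared/data"}

# The environment variable wins over the TOML value.
print(resolve_setting("LATENT_DATA_DIR", toml_workspace.get("data_dir"), "data", fake_env))
# Neither source defines logs_dir, so the default is used.
print(resolve_setting("LATENT_LOGS_DIR", toml_workspace.get("logs_dir"), "logs", fake_env))
```

Passing the environment in as a plain dictionary keeps the function easy to test without modifying `os.environ`.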

Configuration via TOML File

The recommended way to configure your workspace is with a TOML configuration file. This provides a centralized, version-controllable configuration.

Quick Start

Generate a sample configuration file:

latent init

This creates config/latent.toml with documented options.

Configuration File Locations

Latent searches for configuration files in this order:

  1. config/latent.toml (recommended)
  2. latent.toml (project root)
  3. .latent.toml (hidden file in root)
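
The search order can be pictured as a first-match loop over candidate paths. The sketch below is a hypothetical `find_config` helper, not part of the latent package, and it assumes the candidates are resolved relative to the project root.

```python
from pathlib import Path

# Candidate locations, in the order listed above.
CANDIDATES = ("config/latent.toml", "latent.toml", ".latent.toml")


def find_config(root):
    """Return the first existing candidate under root, or None if there is none."""
    root = Path(root)
    for relative in CANDIDATES:
        candidate = root / relative
        if candidate.is_file():
            return candidate
    return None


# Prints the config path found under the current directory, or None.
print(find_config(Path.cwd()))
```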

Example Configuration

config/latent.toml
# Environment mode: "production" (default) or "dev"/"development"
# Development mode automatically disables task caching for faster iteration
environment = "production"

[workspace]
# Workspace root directory (default: current working directory)
# root = "/path/to/workspace"

# Directory paths (relative to root, or absolute)
flows_dir = "flows"
data_dir = "data"
logs_dir = "logs"
mlruns_dir = "mlruns"
mlartifacts_dir = "mlartifacts"
cache_dir = ".cache"

# Stable artifacts directory (version-controlled, not auto-created)
# stable_dir = "data/stable"

[mlflow]
# Enable MLflow experiment tracking (default: true)
enabled = true

# Enable LiteLLM auto-tracing for LLM calls (default: true)
litellm_autolog = true

# Enable LangChain/LangGraph auto-tracing (default: true)
langchain_autolog = true

[logging]
# Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL
level = "INFO"
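
The comment "relative to root, or absolute" in the [workspace] section can be illustrated with a short path-joining sketch. `resolve_dir` is a hypothetical helper and the `/var/log/latent` value is made up for the example. `PurePosixPath` is used so the example behaves the same on every platform.

```python
from pathlib import PurePosixPath


def resolve_dir(root, value):
    """Join a relative directory onto root; return an absolute directory unchanged."""
    path = PurePosixPath(value)
    if path.is_absolute():
        return path
    return PurePosixPath(root) / path


# Stand-in for a parsed [workspace] table (values are illustrative).
workspace = {
    "root": "/path/to/workspace",
    "flows_dir": "flows",            # relative: resolved under root
    "logs_dir": "/var/log/latent",   # absolute: used as-is
}

print(resolve_dir(workspace["root"], workspace["flows_dir"]))  # /path/to/workspace/flows
print(resolve_dir(workspace["root"], workspace["logs_dir"]))   # /var/log/latent
```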

Configuration via Environment Variables

You can also configure workspace paths via environment variables, which will override any TOML settings.

Primary Environment Variables

Variable | Purpose | Default (fallback) | Example
LATENT_ENVIRONMENT | Environment mode (controls caching) | production | dev, development, production
LATENT_WORKSPACE_ROOT | Root directory of your project | cwd (current directory) | /path/to/my-project
LATENT_FLOWS_DIR | Directory containing flow configs | {workspace}/flows | /path/to/my-project/flows
LATENT_DATA_DIR | Directory for storing input data | {workspace}/data | /path/to/my-project/data
LATENT_LOGS_DIR | Directory for storing logs | {workspace}/logs | /path/to/my-project/logs
LATENT_MLRUNS_DIR | Directory for MLflow metadata (SQLite) | {workspace}/mlruns | /path/to/my-project/mlruns
LATENT_MLARTIFACTS_DIR | Directory for MLflow artifacts (outputs) | {workspace}/mlartifacts | /path/to/my-project/mlartifacts
LATENT_CACHE_DIR | Directory for Prefect cache | {workspace}/.cache | /path/to/my-project/.cache
LATENT_STABLE_DIR | Directory for stable artifacts | {workspace}/data/stable | /path/to/my-project/data/stable

Setting Environment Variables

Recommended for development environments

{
  "env": {
    "LATENT_WORKSPACE_ROOT": "$PWD",
    "LATENT_FLOWS_DIR": "$PWD/flows"
  }
}

Good for local development

# .env
LATENT_WORKSPACE_ROOT=/path/to/my-project
LATENT_FLOWS_DIR=/path/to/my-project/flows
LATENT_DATA_DIR=/path/to/my-project/data
LATENT_LOGS_DIR=/path/to/my-project/logs
LATENT_MLRUNS_DIR=/path/to/my-project/mlruns

Then load with python-dotenv:

from dotenv import load_dotenv
load_dotenv()

Quick testing

export LATENT_WORKSPACE_ROOT=/path/to/my-project
export LATENT_FLOWS_DIR=/path/to/my-project/flows

Development Mode

Latent supports a development mode that disables task caching automatically, so every task re-executes on each run while you iterate.

Enabling Development Mode

Set the environment to dev or development:

config/latent.toml
environment = "dev"

export LATENT_ENVIRONMENT=dev

{
  "env": {
    "LATENT_ENVIRONMENT": "dev"
  }
}

What Development Mode Does

When environment is set to dev or development:

  • Disables task caching automatically - all tasks run fresh every time
  • Removes the need to set cache=False manually - the setting applies to all tasks
  • Speeds up iteration - no stale cached results from previous runs
  • Prevents timestamped output issues - tasks are never skipped by the cache, so files are always written to the new output directory

Best Practice

Use development mode during:

  • Flow development and debugging
  • Iterating on task logic
  • Testing with timestamped output directories

Switch to production mode for:

  • Final evaluation runs
  • Production deployments
  • Runs where the benefits of caching outweigh iteration speed

Task Caching Control

You can still explicitly control caching per task:

from latent.prefect import task

# Explicit cache control (overrides environment setting)
@task("my_task", cache=False)
def my_task():
    # Never cached, even in production mode
    pass
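
The interaction between the environment mode and an explicit per-task cache argument can be modeled as a small pure function. This is a sketch of the behavior described in this guide, not Latent's implementation: `caching_enabled` and `DEV_ENVIRONMENTS` are hypothetical names, and exact string matching on the environment value is an assumption.

```python
DEV_ENVIRONMENTS = {"dev", "development"}


def caching_enabled(environment, cache=None):
    """Model whether a task would use the cache.

    cache=None means the task did not pass an explicit cache argument.
    """
    if cache is not None:
        # An explicit per-task setting overrides the environment mode.
        return cache
    # Without an explicit setting, development mode turns caching off.
    return environment not in DEV_ENVIRONMENTS


print(caching_enabled("production"))               # True
print(caching_enabled("dev"))                      # False
print(caching_enabled("production", cache=False))  # False
```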

Avoid Using cache_policy Directly

Using Prefect's cache_policy parameter directly will trigger a warning:

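from latent.prefect import task

# ❌ Not recommended: passes Prefect's cache_policy directly and triggers a warning
@task("my_task", cache_policy=None)
def my_task():
    pass

# ✅ Recommended: uses Latent's own cache parameter
@task("my_task", cache=False)
def my_task():
    pass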

Use Latent's cache=False parameter instead, which properly integrates with the framework.

Fallback Behavior

Configuration values are resolved in this order:

  1. Environment Variable (if set)
  2. TOML Config File (if exists and value is defined)
  3. Default Value

Default values are:

Setting | Default
Workspace Root | Current working directory (cwd)
Flows Directory | {workspace_root}/flows
Data Directory | {workspace_root}/data
Logs Directory | {workspace_root}/logs
MLruns Directory | {workspace_root}/mlruns
MLartifacts Directory | {workspace_root}/mlartifacts
Cache Directory | {workspace_root}/.cache
Stable Directory | {workspace_root}/data/stable

Important

When running flows, make sure you're in the workspace root directory if you're relying on default fallback behavior.

Example: Custom Project Structure

If your project has a different structure than the default:

my-project/
├── config/
│   ├── latent.toml          # <-- Latent configuration
│   ├── flow1/
│   │   ├── parameters.yaml
│   │   └── catalog.yaml
│   └── flow2/
│       ├── parameters.yaml
│       └── catalog.yaml
├── storage/
│   ├── datasets/
│   ├── logs/
│   └── experiments/
└── .env

config/latent.toml
[workspace]
flows_dir = "config"          # Relative to workspace root
data_dir = "storage/datasets"
logs_dir = "storage/logs"
mlruns_dir = "storage/experiments"
mlartifacts_dir = "storage/artifacts"

.env
LATENT_WORKSPACE_ROOT=/path/to/my-project
LATENT_FLOWS_DIR=/path/to/my-project/config
LATENT_DATA_DIR=/path/to/my-project/storage/datasets
LATENT_LOGS_DIR=/path/to/my-project/storage/logs
LATENT_MLRUNS_DIR=/path/to/my-project/storage/experiments
LATENT_MLARTIFACTS_DIR=/path/to/my-project/storage/artifacts

Verification

To verify your configuration is correct, check the logs when running a flow:

python -m my_flows.my_flow

Look for these log lines:

Using workspace root from LATENT_WORKSPACE_ROOT: /path/to/my-project
Using flows directory from LATENT_FLOWS_DIR: /path/to/my-project/config
Using data directory from LATENT_DATA_DIR: /path/to/my-project/storage/datasets
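
As a complement to reading the logs, the environment variables themselves can be inspected before a flow runs. The sketch below only reads the process environment; it does not call into latent, and `describe_env` is a hypothetical helper. The variable names are taken from the table earlier in this guide.

```python
import os

# Variable names from the "Primary Environment Variables" table.
LATENT_VARS = (
    "LATENT_ENVIRONMENT",
    "LATENT_WORKSPACE_ROOT",
    "LATENT_FLOWS_DIR",
    "LATENT_DATA_DIR",
    "LATENT_LOGS_DIR",
    "LATENT_MLRUNS_DIR",
    "LATENT_MLARTIFACTS_DIR",
    "LATENT_CACHE_DIR",
    "LATENT_STABLE_DIR",
)


def describe_env(environ=None):
    """Map each known variable to its value, or a placeholder if it is not set."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name, "<not set>") for name in LATENT_VARS}


for name, value in describe_env().items():
    print(f"{name}={value}")
```

A variable shown as `<not set>` falls back to the TOML file or the default value.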

Best Practices

Development Recommendations

  1. Use config/latent.toml for shareable, version-controlled configuration
  2. Use relative paths in TOML config for portability across machines
  3. Keep flow configs separate from source code (e.g., in a config/ or flows/ directory)
  4. Keep flow configs under version control, but add data directories to .gitignore

Production Recommendations

  1. Use environment variables to override TOML settings for production-specific paths
  2. Use absolute paths in environment variables for clarity
  3. Always set LATENT_WORKSPACE_ROOT explicitly in production environments

Troubleshooting

Error: 'No parameters.yaml found'

  • Check that LATENT_FLOWS_DIR points to the correct directory
  • Verify that your flow directory exists: {LATENT_FLOWS_DIR}/{flow_name}/
  • Ensure parameters.yaml exists in that directory

Error: 'Input dataset not found in catalog'

  • Check that LATENT_DATA_DIR points to the correct directory
  • Verify that the dataset file exists: {LATENT_DATA_DIR}/{flow_name}/{dataset_name}.{ext}
  • For cross-flow references, ensure the source flow has run and created the dataset

Logs/Data going to unexpected locations

  • Add debug logging to see resolved paths
  • Verify environment variables are actually set: echo $LATENT_WORKSPACE_ROOT
  • Check for typos in environment variable names (they're case-sensitive)
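
Some of the checks above can be scripted. The following is a rough sketch with two hypothetical helpers that are not part of latent: `missing_flow_files` reports which expected config files are absent from a flow directory, and `unrecognized_latent_vars` flags LATENT_-prefixed variables that do not appear in the table in this guide. A flagged name may still be valid if latent supports variables this guide does not list.

```python
import os
from pathlib import Path

# Variable names documented in this guide.
KNOWN_VARS = {
    "LATENT_ENVIRONMENT", "LATENT_WORKSPACE_ROOT", "LATENT_FLOWS_DIR",
    "LATENT_DATA_DIR", "LATENT_LOGS_DIR", "LATENT_MLRUNS_DIR",
    "LATENT_MLARTIFACTS_DIR", "LATENT_CACHE_DIR", "LATENT_STABLE_DIR",
}


def missing_flow_files(flows_dir, flow_name, required=("parameters.yaml", "catalog.yaml")):
    """Return the required file names that are absent from {flows_dir}/{flow_name}/."""
    flow_dir = Path(flows_dir) / flow_name
    return [name for name in required if not (flow_dir / name).is_file()]


def unrecognized_latent_vars(environ=None):
    """Return LATENT_* variable names that are not documented in this guide."""
    environ = os.environ if environ is None else environ
    return sorted(
        name for name in environ
        if name.startswith("LATENT_") and name not in KNOWN_VARS
    )


# A misspelled name (FLOW instead of FLOWS) is reported.
print(unrecognized_latent_vars({"LATENT_FLOW_DIR": "/tmp/flows"}))  # ['LATENT_FLOW_DIR']
```

Because the comparison is case-sensitive, a lowercase name such as latent_flows_dir is not matched by the LATENT_ prefix check and has to be spotted separately.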