Workspace Configuration
This guide explains how to configure workspace paths when using the latent package in your project.
Overview
When latent is installed as a package (via uv pip install or similar), it needs to know where to find your project's directories.
Required Directories
- Flows directory: Where flow configurations (parameters.yaml, catalog.yaml) are stored
- Data directory: Where input datasets and output symlinks are stored
- Logs directory: Where flow logs are written
- MLruns directory: Where MLflow experiment metadata (SQLite DB) is stored
- MLartifacts directory: Where MLflow artifacts (outputs) are stored
- Cache directory: Where Prefect task results are cached
- Stable directory: Where curated, version-controlled deployment artifacts are stored
Configuration Priority
Latent supports multiple configuration methods with the following priority order:
- Environment variables (highest priority)
- TOML configuration file (config/latent.toml)
- Default values (lowest priority)
This allows you to define a base configuration in a TOML file while still being able to override specific values via environment variables.
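The priority order above can be sketched in a few lines of Python. This is an illustration only, not Latent's actual implementation: the function name resolve_setting is hypothetical, and treating an empty environment variable as unset is an assumption.

```python
import os


def resolve_setting(env_name, toml_value, default, environ=None):
    """Return the first value that is defined: env var, then TOML, then default.

    Hypothetical helper for illustration; not part of the latent package.
    """
    environ = os.environ if environ is None else environ
    env_value = environ.get(env_name)
    if env_value:  # assumption: an empty string counts as "not set"
        return env_value
    if toml_value is not None:
        return toml_value
    return default
```

With LATENT_DATA_DIR exported, the sketch returns the exported value regardless of what the TOML file says; without it, the TOML value wins over the default.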
Configuration via TOML File
The recommended way to configure your workspace is with a TOML configuration file. This provides a centralized, version-controllable configuration.
Quick Start
Generate a sample configuration file:
This creates config/latent.toml with documented options.
Configuration File Locations
Latent searches for configuration files in this order:
- config/latent.toml (recommended)
- latent.toml (project root)
- .latent.toml (hidden file in root)
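A first-match search over these locations might look like the following sketch. The names CANDIDATES and find_config are hypothetical, and the assumption is that the search stops at the first file that exists.

```python
from pathlib import Path

# Search order taken from the list above.
CANDIDATES = ("config/latent.toml", "latent.toml", ".latent.toml")


def find_config(root):
    """Return the first existing candidate under root, or None if there is none."""
    root = Path(root)
    for relative in CANDIDATES:
        candidate = root / relative
        if candidate.is_file():
            return candidate
    return None
```

Calling find_config(".") from the project root is a quick way to see which file would take effect when more than one is present.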
Example Configuration
# Environment mode: "production" (default) or "dev"/"development"
# Development mode automatically disables task caching for faster iteration
environment = "production"
[workspace]
# Workspace root directory (default: current working directory)
# root = "/path/to/workspace"
# Directory paths (relative to root, or absolute)
flows_dir = "flows"
data_dir = "data"
logs_dir = "logs"
mlruns_dir = "mlruns"
mlartifacts_dir = "mlartifacts"
cache_dir = ".cache"
# Stable artifacts directory (version-controlled, not auto-created)
# stable_dir = "data/stable"
[mlflow]
# Enable MLflow experiment tracking (default: true)
enabled = true
# Enable LiteLLM auto-tracing for LLM calls (default: true)
litellm_autolog = true
# Enable LangChain/LangGraph auto-tracing (default: true)
langchain_autolog = true
[logging]
# Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL
level = "INFO"
Configuration via Environment Variables
You can also configure workspace paths via environment variables, which will override any TOML settings.
Primary Environment Variables
| Variable | Purpose | Default (fallback) | Example |
|---|---|---|---|
| LATENT_ENVIRONMENT | Environment mode (controls caching) | production | dev, development, production |
| LATENT_WORKSPACE_ROOT | Root directory of your project | cwd (current directory) | /path/to/my-project |
| LATENT_FLOWS_DIR | Directory containing flow configs | {workspace}/flows | /path/to/my-project/flows |
| LATENT_DATA_DIR | Directory for storing input data | {workspace}/data | /path/to/my-project/data |
| LATENT_LOGS_DIR | Directory for storing logs | {workspace}/logs | /path/to/my-project/logs |
| LATENT_MLRUNS_DIR | Directory for MLflow metadata (SQLite) | {workspace}/mlruns | /path/to/my-project/mlruns |
| LATENT_MLARTIFACTS_DIR | Directory for MLflow artifacts (outputs) | {workspace}/mlartifacts | /path/to/my-project/mlartifacts |
| LATENT_CACHE_DIR | Directory for Prefect cache | {workspace}/.cache | /path/to/my-project/.cache |
| LATENT_STABLE_DIR | Directory for stable artifacts | {workspace}/data/stable | /path/to/my-project/data/stable |
Setting Environment Variables
Recommended for development environments
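One way to scope these variables to a single run, rather than to your whole shell session, is to pass them to a child process from Python. The sketch below uses only the standard library; run_with_overrides is a hypothetical helper, and the child command is a stand-in that echoes the variable, not a Latent command.

```python
import os
import subprocess
import sys


def run_with_overrides(command, overrides):
    """Run command with the current environment plus the given overrides."""
    env = {**os.environ, **overrides}
    return subprocess.run(command, env=env, capture_output=True, text=True, check=True)


# Stand-in child process: prints the variable it received.
result = run_with_overrides(
    [sys.executable, "-c", "import os; print(os.environ['LATENT_ENVIRONMENT'])"],
    {"LATENT_ENVIRONMENT": "dev"},
)
print(result.stdout.strip())  # prints "dev"
```

Replace the stand-in command with whatever launches your flow; the parent process environment is left untouched.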
Development Mode
Latent supports a development mode that automatically disables task caching for faster iteration during development.
Enabling Development Mode
Set the environment to dev or development:
What Development Mode Does
When environment is set to dev or development:
- ✅ Disables task caching automatically - All tasks run fresh every time
- ✅ No need to manually set cache=False - Works for all tasks
- ✅ Faster iteration - No stale cached results from previous runs
- ✅ Prevents timestamped output issues - Cached tasks won't skip file writes to new directories
Best Practice
Use development mode during:
- Flow development and debugging
- Iterating on task logic
- Testing with timestamped output directories
Switch to production mode for:
- Final evaluation runs
- Production deployments
- When caching benefits outweigh iteration speed
Task Caching Control
You can still explicitly control caching per task:
from latent.prefect import task

# Explicit cache control (overrides environment setting)
@task("my_task", cache=False)
def my_task():
    # Never cached, even in production mode
    pass
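The interaction between the environment mode and the per-task cache argument can be summarized as a small decision function. This is a sketch of the behavior described above, not Latent's code: caching_enabled is a hypothetical name, case-insensitive matching is an assumption, and so is the idea that an explicit cache=True would win in development mode.

```python
DEV_MODES = {"dev", "development"}


def caching_enabled(environment="production", cache=None):
    """Decide whether a task result should be cached.

    An explicit cache argument takes precedence; otherwise caching is on
    unless the environment is a development mode.
    """
    if cache is not None:
        return cache
    return environment.strip().lower() not in DEV_MODES
```

For example, caching_enabled("dev") is False, while caching_enabled("production", cache=False) is also False because the explicit argument overrides the mode.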
Avoid Using cache_policy Directly
Using Prefect's cache_policy parameter directly will trigger a warning:
# ❌ Not recommended
@task("my_task", cache_policy=None)
def my_task():
    pass

# ✅ Recommended
@task("my_task", cache=False)
def my_task():
    pass
Use Latent's cache=False parameter instead, which properly integrates with the framework.
Fallback Behavior
Configuration values are resolved in this order:
- Environment Variable (if set)
- TOML Config File (if exists and value is defined)
- Default Value
Default values are:
| Setting | Default |
|---|---|
| Workspace Root | Current working directory (cwd) |
| Flows Directory | {workspace_root}/flows |
| Data Directory | {workspace_root}/data |
| Logs Directory | {workspace_root}/logs |
| MLruns Directory | {workspace_root}/mlruns |
| MLartifacts Directory | {workspace_root}/mlartifacts |
| Cache Directory | {workspace_root}/.cache |
| Stable Directory | {workspace_root}/data/stable |
Important
When running flows, make sure you're in the workspace root directory if you're relying on default fallback behavior.
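The defaults table can be expressed as a mapping from the workspace root, which also shows why the current directory matters when nothing else is configured. The names DEFAULT_SUBDIRS and default_paths are hypothetical and exist only for this sketch.

```python
from pathlib import Path

# Sub-paths copied from the defaults table above.
DEFAULT_SUBDIRS = {
    "flows": "flows",
    "data": "data",
    "logs": "logs",
    "mlruns": "mlruns",
    "mlartifacts": "mlartifacts",
    "cache": ".cache",
    "stable": "data/stable",
}


def default_paths(workspace_root=None):
    """Build default directories; fall back to the current working directory."""
    root = Path(workspace_root) if workspace_root is not None else Path.cwd()
    return {name: root / sub for name, sub in DEFAULT_SUBDIRS.items()}
```

Calling default_paths() from a subdirectory of the project would place every directory under that subdirectory, which is the situation the note above warns about.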
Example: Custom Project Structure
If your project has a different structure than the default:
my-project/
├── config/
│   ├── latent.toml          # <-- Latent configuration
│   ├── flow1/
│   │   ├── parameters.yaml
│   │   └── catalog.yaml
│   └── flow2/
│       ├── parameters.yaml
│       └── catalog.yaml
├── storage/
│   ├── datasets/
│   ├── logs/
│   └── experiments/
└── .env
LATENT_WORKSPACE_ROOT=/path/to/my-project
LATENT_FLOWS_DIR=/path/to/my-project/config
LATENT_DATA_DIR=/path/to/my-project/storage/datasets
LATENT_LOGS_DIR=/path/to/my-project/storage/logs
LATENT_MLRUNS_DIR=/path/to/my-project/storage/experiments
LATENT_MLARTIFACTS_DIR=/path/to/my-project/storage/artifacts
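If you want to inspect such KEY=VALUE lines from Python, for example to confirm what a .env file would set, a minimal parser is enough. This sketch is an assumption-laden illustration: parse_env_lines is a hypothetical helper, it handles only plain assignments, comments, and simple quoting, and it says nothing about how Latent itself loads .env files.

```python
def parse_env_lines(text):
    """Parse simple KEY=VALUE lines into a dict, skipping blanks and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


sample = """
# Workspace layout
LATENT_WORKSPACE_ROOT=/path/to/my-project
LATENT_FLOWS_DIR="/path/to/my-project/config"
"""
settings = parse_env_lines(sample)
```

Dedicated .env loaders support more syntax (exports, interpolation, multi-line values), so treat this as a quick inspection aid only.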
Verification
To verify your configuration is correct, check the logs when running a flow:
Look for these log lines:
Using workspace root from LATENT_WORKSPACE_ROOT: /path/to/my-project
Using flows directory from LATENT_FLOWS_DIR: /path/to/my-project/config
Using data directory from LATENT_DATA_DIR: /path/to/my-project/storage/datasets
Best Practices
Development Recommendations
- Use config/latent.toml for shareable, version-controlled configuration
- Use relative paths in TOML config for portability across machines
- Keep flow configs separate from source code (e.g., in a config/ or flows/ directory)
- Use version control for flow configs but .gitignore data directories
Production Recommendations
- Use environment variables to override TOML settings for production-specific paths
- Use absolute paths in environment variables for clarity
- Always set LATENT_WORKSPACE_ROOT explicitly in production environments
Troubleshooting
Error: 'No parameters.yaml found'
- Check that LATENT_FLOWS_DIR points to the correct directory
- Verify that your flow directory exists: {LATENT_FLOWS_DIR}/{flow_name}/
- Ensure parameters.yaml exists in that directory
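The three checks above can be automated with a short script. The helper missing_flow_files is hypothetical; it only inspects the filesystem layout described in this guide and does not call into Latent.

```python
from pathlib import Path


def missing_flow_files(flows_dir, flow_name, required=("parameters.yaml",)):
    """Return the paths that are missing for a flow; an empty list means all good."""
    flow_dir = Path(flows_dir) / flow_name
    if not flow_dir.is_dir():
        return [flow_dir]
    return [flow_dir / name for name in required if not (flow_dir / name).is_file()]
```

Pass required=("parameters.yaml", "catalog.yaml") if you also want to check for the catalog file; whether Latent treats it as mandatory is not stated here.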
Error: 'Input dataset not found in catalog'
- Check that LATENT_DATA_DIR points to the correct directory
- Verify that the dataset file exists: {LATENT_DATA_DIR}/{flow_name}/{dataset_name}.{ext}
- For cross-flow references, ensure the source flow has run and created the dataset
Logs/Data going to unexpected locations
- Add debug logging to see resolved paths
- Verify environment variables are actually set: echo $LATENT_WORKSPACE_ROOT
- Check for typos in environment variable names (they're case-sensitive)
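To check for unset or misspelled variables in one pass, you could compare the process environment against the names listed in this guide. This is a sketch: audit_latent_env is a hypothetical helper, and a flagged name only means it does not appear in the table above, which may not be exhaustive.

```python
import os

# Variable names from the Primary Environment Variables table.
KNOWN = {
    "LATENT_ENVIRONMENT", "LATENT_WORKSPACE_ROOT", "LATENT_FLOWS_DIR",
    "LATENT_DATA_DIR", "LATENT_LOGS_DIR", "LATENT_MLRUNS_DIR",
    "LATENT_MLARTIFACTS_DIR", "LATENT_CACHE_DIR", "LATENT_STABLE_DIR",
}


def audit_latent_env(environ=None):
    """Return (recognized variables that are set, LATENT_-like names not in KNOWN)."""
    environ = os.environ if environ is None else environ
    recognized = {name: value for name, value in environ.items() if name in KNOWN}
    suspicious = sorted(
        name for name in environ
        if name.upper().startswith("LATENT_") and name not in KNOWN
    )
    return recognized, suspicious
```

A name such as LATENT_FLOW_DIR (missing the S) or a lowercase latent_data_dir would land in the second list, since the names are case-sensitive.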