Migrations
Serialized files are long-lived. A config written today may be loaded by code six months from now, after fields have been renamed, added, or removed. Without a migration story, you’d be forced to either keep old field names forever, or break all existing files on every schema change.
versionable solves this with versioned migrations. When you change a schema, you:
1. Bump version
2. Recompute hash (via MyClass.hash())
3. Add a Migrate inner class that tells versionable how to transform old data into the new shape
When load() reads a file, it checks __VERSION__ in the metadata. If it’s behind the current version, migrations are
applied in order — v1 → v2 → v3 — before the object is constructed.
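The ordering rule can be sketched in plain Python. This is not versionable's implementation. The names apply_migrations and steps are hypothetical. Each step is assumed to upgrade raw data from version n to n + 1, which mirrors how the Migrate entries are keyed in the examples below (v1 lives on the version-2 class). A version with no step is assumed to pass through unchanged.

```python
from typing import Callable, Dict

# Hypothetical: steps[n] upgrades raw data from version n to version n + 1, in place.
Step = Callable[[dict], object]


def apply_migrations(data: dict, file_version: int, current_version: int,
                     steps: Dict[int, Step]) -> dict:
    """Apply every step from file_version up to current_version, in ascending order."""
    for version in range(file_version, current_version):
        step = steps.get(version)
        if step is not None:  # assumed: a version with no step passes through unchanged
            step(data)
    return data


steps = {
    1: lambda d: d.update(name=d.pop("title")),  # v1 -> v2: rename title to name
    2: lambda d: d.pop("debug", None),           # v2 -> v3: drop debug
}
v1_file = {"title": "batch-processor", "debug": False, "retries": 5}
print(apply_migrations(v1_file, 1, 3, steps))  # {'retries': 5, 'name': 'batch-processor'}
```

Because the loop runs over range(file_version, current_version), a file that is already current gets no steps at all.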
Declarative Operations
The examples below all evolve the same WorkerConfig class through a series of changes. Each section adds one migration
step on top of the previous version.
v1 — starting schema:
```python
from dataclasses import dataclass

from versionable import Versionable, Migration

@dataclass
class WorkerConfig(Versionable, version=1, hash="5556c8"):
    title: str
    debug: bool
    retries: int = 3
```
A v1 file on disk:
```yaml
title: batch-processor
debug: false
retries: 5
__versionable__:
  __OBJECT__: WorkerConfig
  __VERSION__: 1
  __HASH__: 5556c8
```
Rename a Field
In v2, title is renamed to name:
```python
@dataclass
class WorkerConfig(Versionable, version=2, hash="ed3a90"):
    name: str  # renamed from "title"
    debug: bool
    retries: int = 3

    class Migrate:
        v1 = Migration().rename("title", "name")
```
versionable.load(WorkerConfig, "config.yaml") reads the v1 file, applies the rename, and returns
WorkerConfig(name="batch-processor", debug=False, retries=5).
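Conceptually, the rename edits the raw key/value data before the class is constructed. The sketch below models that with a plain dataclass. WorkerConfigV2 and rename are hypothetical stand-ins, not versionable API. Treating a missing source key as a no-op is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WorkerConfigV2:  # hypothetical plain-dataclass stand-in for the v2 WorkerConfig
    name: str
    debug: bool
    retries: int = 3


def rename(data: dict, old: str, new: str) -> None:
    """Hypothetical rename(): move data[old] to data[new]; no-op if old is absent."""
    if old in data:
        data[new] = data.pop(old)


raw = {"title": "batch-processor", "debug": False, "retries": 5}  # v1 fields, metadata stripped
rename(raw, "title", "name")
config = WorkerConfigV2(**raw)
print(config)  # WorkerConfigV2(name='batch-processor', debug=False, retries=5)
```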
Drop a Field
In v3, debug is removed. Old files still carry it, so we drop it explicitly:
```python
@dataclass
class WorkerConfig(Versionable, version=3, hash="beb912"):
    name: str
    retries: int = 3

    class Migrate:
        v1 = Migration().rename("title", "name")
        v2 = Migration().drop("debug")
```
A v1 file now goes through both migrations: title → name, then debug is discarded.
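One reason a leftover key matters is that a plain dataclass constructor rejects keyword arguments it does not declare. The sketch below shows the failure and the fix using a hypothetical WorkerConfigV3 stand-in. Whether versionable itself fails in exactly this way is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WorkerConfigV3:  # hypothetical plain-dataclass stand-in for the v3 WorkerConfig
    name: str
    retries: int = 3


raw = {"name": "batch-processor", "debug": False, "retries": 5}  # v1 data after the rename

try:
    WorkerConfigV3(**raw)
    constructed_with_debug = True
except TypeError as exc:  # the message names the unexpected keyword argument 'debug'
    constructed_with_debug = False
    print("without drop:", exc)

raw.pop("debug", None)  # the effect drop("debug") has on the raw data in this sketch
config = WorkerConfigV3(**raw)
print(config)  # WorkerConfigV3(name='batch-processor', retries=5)
```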
Add a Field
When a new field has a dataclass default, no migration and no version bump are needed — versionable fills in the default automatically for any file that doesn’t have the field. You only need to update the hash:
```python
# still version=3; only the hash changes because the fields changed
@dataclass
class WorkerConfig(Versionable, version=3, hash="ea7fc2"):
    name: str
    retries: int = 3
    timeout_s: float = 30.0  # not in older files; the default fills in automatically

    class Migrate:
        v1 = Migration().rename("title", "name")
        v2 = Migration().drop("debug")
```
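The automatic fill-in is ordinary dataclass behaviour: a field missing from the keyword arguments takes its declared default. Below is a minimal sketch with a hypothetical plain-dataclass stand-in. It does not show versionable's own loading code.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class WorkerConfigV3:  # hypothetical plain-dataclass stand-in for the v3 WorkerConfig
    name: str
    retries: int = 3
    timeout_s: float = 30.0  # not in older files


old_file = {"name": "batch-processor", "retries": 5}  # written before timeout_s existed
config = WorkerConfigV3(**old_file)
print(config.timeout_s)  # 30.0

# Which declared fields fell back to their defaults for this file?
filled = [f.name for f in fields(WorkerConfigV3)
          if f.name not in old_file and f.default is not MISSING]
print(filled)  # ['timeout_s']
```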
Bump the version (and add a migration) only when the value you want old files to receive differs from the dataclass
default. For example, timeout_s = 30.0 is a sensible default for new configs, but old worker files predate the
concept of a timeout entirely. You want them to load as 0.0 (meaning “no timeout limit”) rather than silently
imposing a 30-second limit on them:
```python
@dataclass
class WorkerConfig(Versionable, version=4, hash="ea7fc2"):
    name: str
    retries: int = 3
    timeout_s: float = 30.0  # default for new configs

    class Migrate:
        v1 = Migration().rename("title", "name")
        v2 = Migration().drop("debug")
        v3 = Migration().add("timeout_s", default=0.0)  # old files get 0.0, not 30.0
```
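The difference between the migration default and the dataclass default can be sketched with plain dicts. Here add is a hypothetical helper, and WorkerConfigV4 is a plain-dataclass stand-in. The sketch assumes that add only fills a missing field and keeps any value a file already has.

```python
from dataclasses import dataclass


@dataclass
class WorkerConfigV4:  # hypothetical plain-dataclass stand-in for the v4 WorkerConfig
    name: str
    retries: int = 3
    timeout_s: float = 30.0  # default for brand-new configs


def add(data: dict, field: str, default: object) -> None:
    """Hypothetical add(): insert field only if the file lacks it (assumed semantics)."""
    data.setdefault(field, default)


pre_timeout_file = {"name": "batch-processor", "retries": 5}         # predates timeout_s
later_v3_file = {"name": "ingest", "retries": 2, "timeout_s": 12.0}  # already has it

for raw in (pre_timeout_file, later_v3_file):
    add(raw, "timeout_s", 0.0)  # the v3 migration step

print(WorkerConfigV4(**pre_timeout_file).timeout_s)  # 0.0  (migration default)
print(WorkerConfigV4(**later_v3_file).timeout_s)     # 12.0 (existing value kept)
print(WorkerConfigV4(name="fresh").timeout_s)        # 30.0 (dataclass default)
```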
Convert a Field’s Value
Use convert when a field is kept but its unit or representation changes. Continuing with WorkerConfig, in v5 we rename
timeout_s to timeout_ms and store it as an integer number of milliseconds:
```python
@dataclass
class WorkerConfig(Versionable, version=5, hash="aac8a2"):
    name: str
    retries: int = 3
    timeout_ms: int = 30000  # was timeout_s: float in v4

    class Migrate:
        v1 = Migration().rename("title", "name")
        v2 = Migration().drop("debug")
        v3 = Migration().add("timeout_s", default=0.0)
        # round() rather than int(): int() truncates, so 1.001 s would become 1000 ms
        v4 = (
            Migration()
            .rename("timeout_s", "timeout_ms")
            .convert("timeout_ms", via=lambda s: round(s * 1000))
        )
```
- A v3 file with timeout_s = 5.0 loads as timeout_ms = 5000.
- A v4 file with timeout_s = 1.5 loads as timeout_ms = 1500.
- A v2 file (which has no timeout_s) gets timeout_s = 0.0 injected by the v3 migration, then converted to timeout_ms = 0.
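The three cases above can be checked with a small plain-Python model of the v3 and v4 steps. The helper names are hypothetical. The v1 and v2 steps are left out because they do not touch the timeout. The sketch uses round() rather than int() because int() truncates toward zero: 1.001 * 1000 evaluates to slightly less than 1001 in floating point, so int() would turn 1.001 s into 1000 ms.

```python
from typing import Callable, Dict


def v3_add_timeout(d: dict) -> None:
    d.setdefault("timeout_s", 0.0)  # add("timeout_s", default=0.0), assumed not to overwrite


def v4_to_ms(d: dict) -> None:
    d["timeout_ms"] = d.pop("timeout_s")             # rename("timeout_s", "timeout_ms")
    d["timeout_ms"] = round(d["timeout_ms"] * 1000)  # convert: seconds -> int milliseconds


STEPS: Dict[int, Callable[[dict], None]] = {3: v3_add_timeout, 4: v4_to_ms}


def upgrade(d: dict, file_version: int, current_version: int = 5) -> dict:
    for n in range(file_version, current_version):
        if n in STEPS:  # the v1/v2 steps are left out of this sketch
            STEPS[n](d)
    return d


print(upgrade({"timeout_s": 5.0}, 3))  # {'timeout_ms': 5000}
print(upgrade({"timeout_s": 1.5}, 4))  # {'timeout_ms': 1500}
print(upgrade({}, 2))                  # {'timeout_ms': 0}
print(int(1.001 * 1000), round(1.001 * 1000))  # 1000 1001
```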
Chaining Operations
Multiple operations can be chained on a single Migration object:
```python
class Migrate:
    v1 = Migration().rename("title", "name").drop("debug")
```
For more complex histories, use .then() to link two separate migration objects:
```python
class Migrate:
    v2 = Migration().drop("debug")
    v1 = Migration().rename("title", "name").then(v2)
```
This is equivalent to declaring v1 and v2 separately — choose whichever reads more clearly for your use case.
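Method chaining works because each operation returns the migration object itself. The sketch below is a hypothetical MiniMigration builder, not versionable's Migration class. It records operations in a list. Its then() appends the other object's operations after its own, which is one plausible reading of "link two separate migration objects".

```python
from typing import Callable, List


class MiniMigration:
    """Hypothetical fluent builder; each method records an op and returns self."""

    def __init__(self) -> None:
        self.ops: List[Callable[[dict], object]] = []

    def rename(self, old: str, new: str) -> "MiniMigration":
        def op(d: dict) -> None:
            if old in d:
                d[new] = d.pop(old)
        self.ops.append(op)
        return self

    def drop(self, field: str) -> "MiniMigration":
        self.ops.append(lambda d: d.pop(field, None))
        return self

    def then(self, other: "MiniMigration") -> "MiniMigration":
        self.ops.extend(other.ops)  # run other's ops after this object's own ops
        return self

    def apply(self, d: dict) -> dict:
        for op in self.ops:
            op(d)
        return d


sample = {"title": "batch-processor", "debug": False, "retries": 5}
chained = MiniMigration().rename("title", "name").drop("debug")
linked = MiniMigration().rename("title", "name").then(MiniMigration().drop("debug"))
print(chained.apply(dict(sample)))  # {'retries': 5, 'name': 'batch-processor'}
print(linked.apply(dict(sample)) == chained.apply(dict(sample)))  # True
```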
Multi-Version Chains
The full WorkerConfig history in one place — versionable applies every migration from the file’s version up to
the class’s current version, in ascending order:
```python
@dataclass
class WorkerConfig(Versionable, version=5, hash="aac8a2"):
    name: str
    retries: int = 3
    timeout_ms: int = 30000

    class Migrate:
        v1 = Migration().rename("title", "name")
        v2 = Migration().drop("debug")
        v3 = Migration().add("timeout_s", default=0.0)
        v4 = (
            Migration()
            .rename("timeout_s", "timeout_ms")
            .convert("timeout_ms", via=lambda s: round(s * 1000))
        )
        # no v5 entry: each vN upgrades vN files to vN+1, so v4 is the last step to reach version 5
```
| File version | Migrations applied |
|---|---|
| v1 | v1 → v2 → v3 → v4 |
| v2 | v2 → v3 → v4 |
| v3 | v3 → v4 |
| v4 | v4 |
| v5 | none (already current) |
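The table follows from a single loop over the declared steps. Below, each function models one Migrate entry on a plain dict. The names are hypothetical, and round() stands in for the seconds-to-milliseconds conversion.

```python
from typing import Callable, Dict, List, Tuple


def v1(d: dict) -> None:  # rename("title", "name")
    d["name"] = d.pop("title")


def v2(d: dict) -> None:  # drop("debug")
    d.pop("debug", None)


def v3(d: dict) -> None:  # add("timeout_s", default=0.0); existing values kept (assumed)
    d.setdefault("timeout_s", 0.0)


def v4(d: dict) -> None:  # rename("timeout_s", "timeout_ms") + convert to int milliseconds
    d["timeout_ms"] = round(d.pop("timeout_s") * 1000)


MIGRATE: Dict[int, Callable[[dict], None]] = {1: v1, 2: v2, 3: v3, 4: v4}
CURRENT_VERSION = 5


def migrate_to_current(d: dict, file_version: int) -> Tuple[dict, List[str]]:
    """Apply v{file_version} .. v4 in ascending order; return the data and the steps run."""
    applied = []
    for n in range(file_version, CURRENT_VERSION):
        MIGRATE[n](d)
        applied.append(f"v{n}")
    return d, applied


data, applied = migrate_to_current({"title": "batch-processor", "debug": False, "retries": 5}, 1)
print(applied)  # ['v1', 'v2', 'v3', 'v4']
print(data)     # {'retries': 5, 'name': 'batch-processor', 'timeout_ms': 0}
```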
Derive from Another Field
Use derive when a schema refactor splits or restructures an existing field into a new one. Rather than requiring users
to re-export their data, you compute the new field’s value directly from what is already in the file.
A common case: v1 stored all sensor data in a single raw_data matrix with timestamps packed into the first column. In
v2, timestamps are promoted to their own field for easier access. The derive migration extracts them from the old
matrix so that existing v1 files load correctly into the new schema without any manual intervention:
```python
from dataclasses import dataclass

import numpy as np
import numpy.typing as npt

from versionable import Versionable, Migration

# v1 (historical, shown for comparison): timestamps were the first column of raw_data
@dataclass
class Recording(Versionable, version=1, hash="c3a812"):
    name: str
    raw_data: npt.NDArray[np.float64]  # shape (N, M); first column is timestamps

# v2: timestamps promoted to their own field
@dataclass
class Recording(Versionable, version=2, hash="d0155b"):
    name: str
    timestamps: npt.NDArray[np.float64]  # new in v2; extracted from the first column of raw_data
    raw_data: npt.NDArray[np.float64]  # still present; derive keeps the source field by default

    class Migrate:
        v1 = Migration().derive("timestamps", from_="raw_data", via=lambda d: d[:, 0])
```
If the source field was removed in the new schema, chain a drop to clean it up:
```python
v1 = Migration().derive("timestamps", from_="raw_data", via=lambda d: d[:, 0]).drop("raw_data")
```
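What the via callable receives and returns can be checked directly with NumPy. The derive helper and the sample matrix below are hypothetical, but the slicing is standard NumPy. Note that d[:, 0] is a view into raw_data. If the source field is dropped afterwards, the extracted column still keeps the full matrix's buffer alive. Calling .copy() inside via produces an independent array instead.

```python
import numpy as np


def derive(data: dict, target: str, from_: str, via) -> None:
    """Hypothetical derive(): compute a new field from an existing one; source is kept."""
    data[target] = via(data[from_])


raw = {
    "name": "run-1",
    # shape (N, M) = (3, 3); column 0 holds timestamps, the rest are sensor channels
    "raw_data": np.array([[0.0, 1.5, 2.5],
                          [0.1, 1.6, 2.4],
                          [0.2, 1.7, 2.3]]),
}

derive(raw, "timestamps", "raw_data", lambda d: d[:, 0])
print(raw["timestamps"].tolist())                 # [0.0, 0.1, 0.2]
print(raw["timestamps"].base is raw["raw_data"])  # True: the slice is a view

raw.pop("raw_data")  # the optional .drop("raw_data") step
```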
Renaming a Class
When you rename a Versionable class, existing files on disk still contain the old name in their __OBJECT__ metadata.
Use old_names to register the old name(s) so those files can still be loaded:
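```python
from dataclasses import dataclass
from datetime import datetime

from versionable import Versionable

# Was previously called "SensorReading"
@dataclass
class Measurement(
    Versionable, version=1, hash="...", name="Measurement", old_names=["SensorReading"]
):
    timestamp: datetime
    value: float
```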
With this declaration:
- New files are saved with __OBJECT__: "Measurement".
- Files saved with __OBJECT__: "SensorReading" can still be loaded via loadDynamic().

Multiple old names are supported:
```python
old_names=["SensorReading", "DataPoint"]
```
If another class already owns one of the old names, class definition raises VersionableError instead of silently
reusing or overwriting that registry entry.
Imperative Migrations
When the transformation involves branching logic or multiple fields at once, use the @migration decorator. Continuing
the WorkerConfig story — suppose v2 had a mode field that controlled how retries was interpreted, and v3 folds
that logic in and drops mode:
```python
from dataclasses import dataclass

from versionable import Versionable, Migration, migration, MigrationContext

@dataclass
class WorkerConfig(Versionable, version=3, hash="beb912"):
    name: str
    retries: int = 3

    class Migrate:
        v1 = Migration().rename("title", "name")

        @migration(fromVersion=2)
        def from_v2(ctx: MigrationContext) -> None:
            # "aggressive" mode stored retries as a multiplier; convert back to an absolute count
            if ctx["mode"] == "aggressive":
                ctx["retries"] = ctx["retries"] * 10
            ctx.drop("mode")  # mode no longer exists in v3
```
MigrationContext behaves like a mutable dict over the raw deserialized data. Read and write fields by key; call
ctx.drop(key) to remove a field that was deleted in the new version.
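A dict-like context is easy to model with collections.abc.MutableMapping. DictContext below is a hypothetical stand-in, not versionable's MigrationContext. The assumption that drop() ignores missing keys is the sketch's own. The from_v2 body is the one from the example above, run against plain data.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator


class DictContext(MutableMapping):
    """Hypothetical stand-in for MigrationContext: a mutable view over raw data."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data  # edits go straight to the underlying dict

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def __setitem__(self, key: str, value: Any) -> None:
        self._data[key] = value

    def __delitem__(self, key: str) -> None:
        del self._data[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)

    def drop(self, key: str) -> None:
        self._data.pop(key, None)  # assumed: dropping an absent key is not an error


def from_v2(ctx: DictContext) -> None:
    if ctx["mode"] == "aggressive":
        ctx["retries"] = ctx["retries"] * 10
    ctx.drop("mode")


raw = {"name": "batch-processor", "retries": 2, "mode": "aggressive"}
from_v2(DictContext(raw))
print(raw)  # {'name': 'batch-processor', 'retries': 20}
```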