versionable: User Skills Reference
Serialization framework for Python 3.12+ dataclasses with schema versioning, hash validation, declarative migrations, type converters, and pluggable storage backends.
Installation
pip install versionable # Core (JSON backend, numpy)
pip install pyyaml # Add YAML backend
pip install toml # Add TOML backend
pip install h5py hdf5plugin # Add HDF5 backend
Quick Start
from __future__ import annotations
from dataclasses import dataclass
import versionable
from versionable import Versionable
@dataclass
class SensorConfig(Versionable, version=1, hash="<TBD>"):
    sampleRate_Hz: float
    label: str = "default"
# First run: compute the hash
print(SensorConfig.hash()) # e.g. "a3f1c9"
# Paste it into hash="a3f1c9", then:
versionable.save(SensorConfig(sampleRate_Hz=1000.0), "config.json")
loaded = versionable.load(SensorConfig, "config.json")
During development, call ignoreHashErrors(True) to get warnings instead of errors while you iterate on fields. Compute
and set the final hash before shipping.
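To make the idea of a schema hash concrete, here is a minimal sketch of how a 6-character fingerprint could be derived from a dataclass's field names and types. The helper `schema_fingerprint` and the choice of SHA-256 are assumptions for illustration only. The actual algorithm behind `.hash()` is not described here and may differ.

```python
import hashlib
from dataclasses import dataclass, fields


def _type_name(tp: object) -> str:
    # Annotations may be real types or strings (with postponed evaluation).
    return tp if isinstance(tp, str) else getattr(tp, "__name__", repr(tp))


def schema_fingerprint(cls: type) -> str:
    """Hash the ordered (name, type) pairs of a dataclass into 6 hex characters."""
    signature = ";".join(f"{f.name}:{_type_name(f.type)}" for f in fields(cls))
    return hashlib.sha256(signature.encode("utf-8")).hexdigest()[:6]


@dataclass
class SensorConfigV1:
    sampleRate_Hz: float
    label: str = "default"


@dataclass
class SensorConfigV2:
    sampleRate_Hz: float
    label: str = "default"
    gain: float = 1.0  # a new field changes the fingerprint


print(schema_fingerprint(SensorConfigV1))
print(schema_fingerprint(SensorConfigV2))
```

Because the fingerprint depends only on the schema, any edit to the fields changes it, which is what lets a stale `hash=` value be detected.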
Defining Versionable Classes
@dataclass
class MyClass(
    Versionable,
    version=1,                    # Required; increment when the schema changes
    hash="a1b2c3",                # 6-char fingerprint (run .hash() to compute)
    name="MyClass",               # Serialization name (default: class name)
    old_names=["PreviousName"],   # Previous names for backward compatibility
    skip_defaults=False,          # Omit default-valued fields on save
    unknown="ignore",             # "ignore" | "error" | "preserve"
):
    requiredField: float
    optionalField: str = "hello"
What gets serialized: Fields with a type annotation and no leading underscore. ClassVar fields, private fields
(_name), and unannotated attributes are excluded.
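The selection rule above can be sketched with the standard library alone. The helper `serializable_fields` is hypothetical and only mirrors the stated rule (annotated, no leading underscore, not a ClassVar). It is not the library's own field discovery.

```python
from dataclasses import dataclass
from typing import ClassVar, get_origin, get_type_hints


def serializable_fields(cls: type) -> list[str]:
    """Return annotated, public, non-ClassVar attribute names in definition order."""
    hints = get_type_hints(cls)
    return [
        name
        for name, tp in hints.items()
        if not name.startswith("_")
        and tp is not ClassVar
        and get_origin(tp) is not ClassVar
    ]


@dataclass
class Example:
    instances: ClassVar[int] = 0   # excluded: ClassVar
    rate_Hz: float = 1.0           # included
    label: str = "default"         # included
    _cache: int = 0                # excluded: leading underscore


Example.scratch = "no annotation"  # excluded: unannotated attribute

print(serializable_fields(Example))
```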
Nested composition: Versionable objects can contain other Versionable objects. Each nested class versions independently.
@dataclass
class Inner(Versionable, version=1, hash="..."):
    x: float
    y: float

@dataclass
class Outer(Versionable, version=1, hash="..."):
    name: str
    point: Inner
Saving and Loading
import versionable
# Backend auto-selected by extension
versionable.save(obj, "config.json")
versionable.save(obj, "config.yaml") # requires pyyaml
versionable.save(obj, "config.toml") # requires toml
versionable.save(obj, "data.h5") # requires h5py + hdf5plugin
loaded = versionable.load(MyClass, "config.json")
Load without knowing the type (class must be registered and imported):
obj = versionable.loadDynamic("config.yaml")
Save Options
| Option | Backends | Description |
|---|---|---|
| commentDefaults | YAML, TOML | Comment out fields matching class defaults |
| compression | HDF5 | Compression config (see HDF5 section) |
Load Options
| Option | Backends | Description |
|---|---|---|
| preload | HDF5 | |
| metadataOnly | HDF5 | Skip arrays entirely (fastest for metadata scanning) |
| upgradeInPlace | All | Allow migrations that rewrite the file |
| | All | Override the version read from file metadata |
Backends
| Backend | Extensions | None | Large Arrays | Lazy Load | Best For |
|---|---|---|---|---|---|
| YAML | | Yes | Slow | No | Config files, data science |
| JSON | | Yes | Slow | No | Interoperability |
| TOML | | No | Slow | No | Hand-editable configs |
| HDF5 | | Yes | Fast/Native | Yes | Large numpy arrays |
TOML caveat: TOML has no null type. Fields holding None are omitted on save and restored from the class default
on load. Every TOML field should have a default value.
HDF5 Details
Every field maps to a native HDF5 construct — no JSON in the file. Scalars become attributes, arrays become datasets,
nested Versionables become subgroups with a __versionable__ metadata group, and list[np.ndarray] /
dict[str, np.ndarray] become groups of datasets.
Arrays and array collections are lazy-loaded by default — load() returns instantly even for multi-gigabyte files.
Accessing an array field or indexing into a list[np.ndarray] triggers the disk read.
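The lazy-loading pattern described above can be illustrated without HDF5 at all. The `LazyValue` class below is a hypothetical stand-in that defers a read until first access. It is only a sketch of the general technique, not how the HDF5 backend is implemented.

```python
from typing import Callable


class LazyValue:
    """Defers an expensive read until the value is first requested."""

    def __init__(self, loader: Callable[[], list[float]]) -> None:
        self._loader = loader
        self._value: list[float] | None = None
        self.loaded = False

    def get(self) -> list[float]:
        if not self.loaded:
            self._value = self._loader()  # the only place a "disk read" happens
            self.loaded = True
        return self._value


reads: list[str] = []


def read_from_disk() -> list[float]:
    reads.append("read")  # record each simulated disk access
    return [1.0, 2.0, 3.0]


lazy = LazyValue(read_from_disk)  # constructing the wrapper performs no read
```

Calling `lazy.get()` twice triggers `read_from_disk` only once. The first call loads and caches the data, and the second returns the cached value.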
import versionable
from versionable.hdf5 import GZIP_DEFAULT, ZSTD_DEFAULT
# Save with compression (gzip is the default)
versionable.save(obj, "data.h5", compression=GZIP_DEFAULT)
# Load with selective preloading
loaded = versionable.load(MyClass, "data.h5", preload=["largeArray"])
# Metadata-only (arrays raise ArrayNotLoadedError on access)
loaded = versionable.load(MyClass, "data.h5", metadataOnly=True)
Compression presets (from versionable.hdf5):
| Preset | Notes |
|---|---|
| ZSTD_DEFAULT | zstd level 3: fast, good ratio |
| | zstd level 1: fastest |
| | zstd level 9: best ratio, slow |
| | Blosc + zstd: fast for large arrays |
| GZIP_DEFAULT | gzip level 4: default, universal compatibility |
| | LZF: fastest, no extra deps |
| | No compression |
gzip (default) and lzf work everywhere. zstd and blosc require hdf5plugin — use them if compatibility with other tools
is not a major concern.
HDF5 Sessions: Incremental Writes and Random Access
For large or long-running data, versionable.hdf5.open() provides incremental writes to chunked, resizable datasets and
random access reads without loading the whole file into memory.
from dataclasses import dataclass, field
import numpy as np
from numpy.typing import NDArray
import versionable
import versionable.hdf5
from versionable import Versionable
@dataclass
class Experiment(Versionable, version=1, hash="536849"):
    name: str
    traces: NDArray[np.float64] = field(default_factory=lambda: np.empty((0, 1024)))

# Write incrementally: each append extends the dataset on disk
session = versionable.hdf5.open(Experiment, "run.h5")
with session as obj:
    obj.name = "acquisition-001"
    for batch in data_source:
        obj.traces.append(batch)
        session.flush()  # flush HDF5 buffers to OS

# Resume an existing file
session = versionable.hdf5.open(Experiment, "run.h5", mode="resume")
with session as obj:
    obj.traces.append(more_data)

# Random access: read slices directly from disk
with versionable.hdf5.open(Experiment, "run.h5", mode="read") as obj:
    print(obj.traces[1000])    # reads only row 1000
    print(obj.traces[50:100])  # reads only this slice
Session modes:
| Mode | Description |
|---|---|
| | New file (default). Fails if file exists |
| "resume" | Append to existing file. Version/hash must match |
| "read" | Read-only access. No writes allowed |
Field types in sessions:
| Type | Behavior |
|---|---|
| Scalars | Assignment writes through to disk |
| | |
| | |
| | |
Sessions do not support migrations. The file’s version and hash must exactly match the class. DatasetArray fields
raise BackendError after the session is closed — copy data before closing if needed.
Compression on resume: Appending to an existing dataset uses the original dataset’s compression filter, not the
session’s compression parameter. The session compression only applies to newly created datasets.
Supported Types
Built-in (no registration needed)
Primitives: int, float, str, bool, None
Collections: list[T], dict[K, V], set[T], frozenset[T], tuple[T, ...], Optional[T], Union[A, B],
Literal[...]
Stdlib types (auto-converted):
| Type | Serialized As |
|---|---|
| | ISO 8601 string |
| | ISO 8601 string |
| | ISO 8601 string |
| | Float (total seconds) |
| | String |
| | String |
| | String |
| | Base64 string |
| | |
| | Pattern string |
numpy arrays: stored as native HDF5 datasets (compressed, lazy-loaded) in HDF5 files, and as Base64-encoded, compressed npz blobs in JSON/TOML/YAML.
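The stdlib conversions listed in the table can be sketched as a pair of helpers. The type names in the table are not reproduced above, so the choice of `datetime`, `timedelta`, and `bytes` here is an assumption based on the "ISO 8601 string", "Float (total seconds)", and "Base64 string" rows. `encode` and `decode` are hypothetical names, not part of the library.

```python
import base64
from datetime import datetime, timedelta


def encode(value: object) -> object:
    """Convert a stdlib value into a JSON-friendly primitive."""
    if isinstance(value, datetime):
        return value.isoformat()                         # ISO 8601 string
    if isinstance(value, timedelta):
        return value.total_seconds()                     # float seconds
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")   # Base64 string
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(raw: object, target: type) -> object:
    """Invert encode() given the annotated target type."""
    if target is datetime:
        return datetime.fromisoformat(raw)
    if target is timedelta:
        return timedelta(seconds=raw)
    if target is bytes:
        return base64.b64decode(raw)
    raise TypeError(f"unsupported type: {target.__name__}")


print(encode(datetime(2024, 1, 2, 3, 4, 5)))      # 2024-01-02T03:04:05
print(encode(timedelta(minutes=1, seconds=30)))   # 90.0
print(encode(b"hi"))                              # aGk=
```

Decoding needs the target type because the serialized form alone is ambiguous: a float could be a plain number or a duration.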
Enums
Serialized by .value. Set a fallback for graceful handling of removed enum members:
from enum import Enum
class Status(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"
    UNKNOWN = "unknown"
Status.VERSIONABLE_FALLBACK = Status.UNKNOWN # Old values deserialize to UNKNOWN
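A minimal sketch of what such a fallback lookup might look like on the loading side. `decode_enum` is a hypothetical helper written for illustration. Only the `VERSIONABLE_FALLBACK` attribute name comes from the text above.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"
    UNKNOWN = "unknown"


Status.VERSIONABLE_FALLBACK = Status.UNKNOWN


def decode_enum(enum_cls: type[Enum], raw: object) -> Enum:
    """Look up a member by value, using the fallback for values no longer defined."""
    try:
        return enum_cls(raw)
    except ValueError:
        fallback = getattr(enum_cls, "VERSIONABLE_FALLBACK", None)
        if fallback is None:
            raise  # no fallback configured: surface the original error
        return fallback


print(decode_enum(Status, "active"))   # Status.ACTIVE
print(decode_enum(Status, "deleted"))  # Status.UNKNOWN
```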
Literal Fields
Use literalFallback for graceful handling of invalid literal values from old files:
from dataclasses import dataclass
from typing import Literal
from versionable import Versionable, literalFallback
@dataclass
class Config(Versionable, version=1, hash="..."):
    mode: Literal["fast", "balanced", "slow"] = literalFallback("balanced")
Custom Types
Option 1 — registerConverter (for third-party types or complex serialization):
from versionable import registerConverter
registerConverter(
    Coord,
    serialize=lambda v: {"lat": v.lat, "lon": v.lon},
    deserialize=lambda v, _cls: Coord(v["lat"], v["lon"]),
)
Option 2 — VersionableValue protocol (for your own types mapping to a single primitive):
from versionable import VersionableValue
class UserId(VersionableValue):
    def __init__(self, value: str) -> None:
        self.value = value

    def toValue(self) -> str:
        return self.value

    @classmethod
    def fromValue(cls, value: str) -> "UserId":  # quoted: UserId is not yet bound here
        return cls(value)
Migrations
When you change a class’s fields, increment version, update hash, and add a migration so old files load correctly.
Declarative Migrations
@dataclass
class Config(Versionable, version=3, hash="x1y2z3"):
    name: str
    timeout_s: float = 30.0
    retries: int = 3

    class Migrate:
        # v1 → v2: renamed "title" to "name"
        v1 = Migration().rename("title", "name")
        # v2 → v3: added "retries" with default for old files
        v2 = Migration().add("retries", default=1)
Available operations (chainable):
| Operation | Description |
|---|---|
| rename | Rename a field |
| drop | Remove a field from old data |
| add | Add field with default for old files |
| | Transform a field’s value |
| | Create new field from existing |
| | Split one field into multiple |
| | Merge multiple fields into one |
| | Mark as needing in-place rewrite |
| | Chain another migration |
Chain multiple operations: Migration().rename("a", "b").drop("c").add("d", default=0)
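To show how a chainable builder of this kind can work, here is a small stand-in operating on plain dicts. `MiniMigration` is hypothetical and covers only rename, drop, and add. It illustrates the fluent pattern (each method records a step and returns `self`) and is not the library's `Migration` class.

```python
from typing import Any, Callable

Step = Callable[[dict[str, Any]], None]


class MiniMigration:
    """Records field operations and replays them, in order, on a data dict."""

    def __init__(self) -> None:
        self._steps: list[Step] = []

    def rename(self, old: str, new: str) -> "MiniMigration":
        def step(data: dict[str, Any]) -> None:
            if old in data:
                data[new] = data.pop(old)
        self._steps.append(step)
        return self

    def drop(self, name: str) -> "MiniMigration":
        self._steps.append(lambda data: data.pop(name, None))
        return self

    def add(self, name: str, default: Any) -> "MiniMigration":
        # setdefault keeps a value that is already present
        self._steps.append(lambda data: data.setdefault(name, default))
        return self

    def apply(self, data: dict[str, Any]) -> dict[str, Any]:
        result = dict(data)  # leave the caller's dict untouched
        for step in self._steps:
            step(result)
        return result


migration = MiniMigration().rename("a", "b").drop("c").add("d", default=0)
print(migration.apply({"a": 1, "c": 2}))  # {'b': 1, 'd': 0}
```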
Imperative Migrations
For branching logic or complex transformations:
from versionable import MigrationContext, migration
class Migrate:
    @migration(fromVersion=2)
    def from_v2(ctx: MigrationContext) -> None:
        raw = ctx.pop("rawData")
        ctx["timestamps"] = [row[0] for row in raw]
        ctx["values"] = [row[1] for row in raw]
Migrations apply sequentially: a v1 file on a v5 class runs v1 → v2 → v3 → v4 → v5.
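The sequential behavior can be sketched as a loop over per-version steps. The `upgrade` function and the `steps` mapping (keyed by the version each step upgrades from) are hypothetical and use plain dicts. They illustrate the chaining logic only.

```python
from typing import Any, Callable

Data = dict[str, Any]


def upgrade(data: Data, file_version: int, target_version: int,
            steps: dict[int, Callable[[Data], Data]]) -> Data:
    """Apply one step per version until the data reaches target_version."""
    version = file_version
    while version < target_version:
        if version not in steps:
            raise LookupError(f"no migration from v{version} to v{version + 1}")
        data = steps[version](data)
        version += 1
    return data


def v1_to_v2(data: Data) -> Data:
    # renamed "title" to "name"
    migrated = dict(data)
    migrated["name"] = migrated.pop("title")
    return migrated


def v2_to_v3(data: Data) -> Data:
    # added "retries" with a default for old files
    return {**data, "retries": data.get("retries", 1)}


steps = {1: v1_to_v2, 2: v2_to_v3}
old_file = {"title": "probe", "timeout_s": 5.0}
print(upgrade(old_file, file_version=1, target_version=3, steps=steps))
```

A gap in the chain raises an error in this sketch, mirroring the idea that every intermediate version needs its own migration.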
Renaming a Class
Use old_names to load files saved under a previous class name:
@dataclass
class SensorConfig(Versionable, version=2, hash="...", old_names=["SensorSettings"]):
    ...
Introspection
import versionable
from versionable import metadata, getVersionableFields, registeredClasses
# Schema metadata
meta = metadata(SensorConfig)
meta.version # int
meta.hash # str (6 chars)
meta.name # str
meta.fields # list[str]
# Field types
fields = getVersionableFields(SensorConfig) # dict[str, type]
# Compute hash (paste into hash= parameter)
SensorConfig.hash() # str
# All registered classes
registeredClasses() # dict[name, type]
Error Handling
VersionableError (base — catch-all)
├── HashMismatchError — hash= doesn't match fields (raised at import time)
├── VersionError — file is newer than class, or missing migrations
├── MigrationError — migration failed to apply
├── ArrayNotLoadedError — accessing array loaded with metadataOnly=True
├── UpgradeRequiredError — migration needs upgradeInPlace=True
├── UnknownFieldError — file has field not in class (only with unknown="error")
├── ConverterError — type conversion failed
└── BackendError — file I/O or backend operation failed
All exceptions are importable from versionable:
from versionable import VersionableError, HashMismatchError, BackendError
Common Patterns
Configuration file with commented defaults
versionable.save(config, "defaults.yaml", commentDefaults=True)
Produces YAML/TOML where fields at their default value are commented out, making it easy to see what was customized.
Scanning HDF5 metadata without loading arrays
from pathlib import Path
for path in Path("data/").glob("*.h5"):
    obj = versionable.load(Experiment, path, metadataOnly=True)
    print(f"{path}: {obj.name}, {obj.timestamp}")
Dynamic loading with type dispatch
obj = versionable.loadDynamic("unknown_file.yaml")
match type(obj).__name__:
    case "SensorConfig":
        processSensor(obj)
    case "ExperimentResult":
        processResult(obj)
Registering existing backends for custom extensions
Use registerBackend to map new file extensions to a built-in backend class:
from versionable import JsonBackend, registerBackend
registerBackend([".jsonc", ".json5"], JsonBackend)
All four backend classes are importable from versionable: JsonBackend, TomlBackend, YamlBackend, Hdf5Backend.
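Extension-based backend selection amounts to a lookup table keyed by file suffix. The sketch below uses hypothetical names (`register`, `backend_for`, `FakeJsonBackend`, `FakeYamlBackend`) to show the idea. It does not reproduce how `registerBackend` stores or resolves its mappings.

```python
from pathlib import Path

_registry: dict[str, type] = {}


def register(extensions: list[str], backend: type) -> None:
    """Map each extension (case-insensitive) to a backend class."""
    for extension in extensions:
        _registry[extension.lower()] = backend


def backend_for(path: str) -> type:
    """Pick the backend class from the file's final suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return _registry[suffix]
    except KeyError:
        raise ValueError(f"no backend registered for {suffix!r}") from None


class FakeJsonBackend:
    """Placeholder for a JSON-style backend."""


class FakeYamlBackend:
    """Placeholder for a YAML-style backend."""


register([".json", ".jsonc", ".json5"], FakeJsonBackend)
register([".yaml"], FakeYamlBackend)

print(backend_for("settings.jsonc").__name__)  # FakeJsonBackend
```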
Writing a custom backend
from versionable import Backend, registerBackend
class MsgPackBackend(Backend):
    nativeTypes: set[type] = set()

    def save(self, fields: dict, meta: dict, path, *, cls: type, **kwargs) -> None: ...
    def load(self, path) -> tuple[dict, dict]: ...

registerBackend([".msgpack"], MsgPackBackend)
The save() method receives raw (unserialized) field values and the Versionable class. Call serialize() internally
for dict-based formats, or handle type dispatch directly for binary formats.