versionable — User Skills Reference

Serialization framework for Python 3.12+ dataclasses with schema versioning, hash validation, declarative migrations, type converters, and pluggable storage backends.

Installation

pip install versionable            # Core (JSON backend, numpy)
pip install pyyaml                 # Add YAML backend
pip install toml                   # Add TOML backend
pip install h5py hdf5plugin        # Add HDF5 backend

Quick Start

from __future__ import annotations

from dataclasses import dataclass

import versionable
from versionable import Versionable


@dataclass
class SensorConfig(Versionable, version=1, hash="<TBD>"):
    sampleRate_Hz: float
    label: str = "default"


# First run: compute the hash
print(SensorConfig.hash())  # e.g. "a3f1c9"
# Paste it into hash="a3f1c9", then:

versionable.save(SensorConfig(sampleRate_Hz=1000.0), "config.json")
loaded = versionable.load(SensorConfig, "config.json")

During development, call ignoreHashErrors(True) to get warnings instead of errors while you iterate on fields. Compute and set the final hash before shipping.

Defining Versionable Classes

@dataclass
class MyClass(
    Versionable,
    version=1,                    # Required — increment when schema changes
    hash="a1b2c3",               # 6-char fingerprint (run .hash() to compute)
    name="MyClass",              # Serialization name (default: class name)
    old_names=["PreviousName"],  # Previous names for backward compat
    skip_defaults=False,         # Omit default-valued fields on save
    unknown="ignore",            # "ignore" | "error" | "preserve"
):
    requiredField: float
    optionalField: str = "hello"

What gets serialized: Fields with a type annotation and no leading underscore. ClassVar fields, private fields (_name), and unannotated attributes are excluded.

Nested composition: Versionable objects can contain other Versionable objects. Each nested class versions independently.

@dataclass
class Inner(Versionable, version=1, hash="..."):
    x: float
    y: float


@dataclass
class Outer(Versionable, version=1, hash="..."):
    name: str
    point: Inner

Saving and Loading

import versionable

# Backend auto-selected by extension
versionable.save(obj, "config.json")
versionable.save(obj, "config.yaml")   # requires pyyaml
versionable.save(obj, "config.toml")   # requires toml
versionable.save(obj, "data.h5")       # requires h5py + hdf5plugin

loaded = versionable.load(MyClass, "config.json")

Load without knowing the type (class must be registered and imported):

obj = versionable.loadDynamic("config.yaml")

Save Options

Option

Backends

Description

commentDefaults

YAML, TOML

Comment out fields matching class defaults

compression

HDF5

Compression config (see HDF5 section)

Load Options

Option

Backends

Description

preload

HDF5

["field"] or "*" — eager-load arrays instead of lazy

metadataOnly

HDF5

Skip arrays entirely (fastest for metadata scanning)

upgradeInPlace

All

Allow migrations that rewrite the file

assumeVersion

All

Override the version read from file metadata

Backends

Backend

Extensions

None

Large Arrays

Lazy Load

Best For

YAML

.yaml, .yml

Yes

Slow

No

Config files, data science

JSON

.json

Yes

Slow

No

Interoperability

TOML

.toml

No

Slow

No

Hand-editable configs

HDF5

.h5, .hdf5

Yes

Fast/Native

Yes

Large numpy arrays

TOML caveat: TOML has no null type. Fields holding None are omitted on save and restored from the class default on load. Every TOML field should have a default value.

HDF5 Details

Every field maps to a native HDF5 construct — no JSON in the file. Scalars become attributes, arrays become datasets, nested Versionables become subgroups with a __versionable__ metadata group, and list[np.ndarray] / dict[str, np.ndarray] become groups of datasets.

Arrays and array collections are lazy-loaded by default — load() returns instantly even for multi-gigabyte files. Accessing an array field or indexing into a list[np.ndarray] triggers the disk read.

import versionable
from versionable.hdf5 import GZIP_DEFAULT, ZSTD_DEFAULT

# Save with compression (gzip is the default)
versionable.save(obj, "data.h5", compression=GZIP_DEFAULT)

# Load with selective preloading
loaded = versionable.load(MyClass, "data.h5", preload=["largeArray"])

# Metadata-only (arrays raise ArrayNotLoadedError on access)
loaded = versionable.load(MyClass, "data.h5", metadataOnly=True)

Compression presets (from versionable.hdf5):

Preset

Notes

ZSTD_DEFAULT

zstd level 3 — fast, good ratio

ZSTD_FAST

zstd level 1 — fastest

ZSTD_BEST

zstd level 9 — best ratio, slow

BLOSC_DEFAULT

Blosc + zstd — fast for large arrays

GZIP_DEFAULT

gzip level 4 — default, universal compat

LZF

LZF — fastest, no extra deps

UNCOMPRESSED

No compression

gzip (default) and lzf work everywhere. zstd and blosc require hdf5plugin — use them if compatibility with other tools is not a major concern.

HDF5 Sessions — Incremental Writes and Random Access

For large or long-running data, versionable.hdf5.open() provides incremental writes to chunked, resizable datasets and random access reads without loading the whole file into memory.

from dataclasses import dataclass, field

import numpy as np
from numpy.typing import NDArray

import versionable
import versionable.hdf5
from versionable import Versionable


@dataclass
class Experiment(Versionable, version=1, hash="536849"):
    name: str
    traces: NDArray[np.float64] = field(default_factory=lambda: np.empty((0, 1024)))

# Write incrementally — each append extends the dataset on disk
session = versionable.hdf5.open(Experiment, "run.h5")
with session as obj:
    obj.name = "acquisition-001"
    for batch in data_source:
        obj.traces.append(batch)
        session.flush()             # flush HDF5 buffers to OS

# Resume an existing file
session = versionable.hdf5.open(Experiment, "run.h5", mode="resume")
with session as obj:
    obj.traces.append(more_data)

# Random access — read slices directly from disk
with versionable.hdf5.open(Experiment, "run.h5", mode="read") as obj:
    print(obj.traces[1000])         # reads only row 1000
    print(obj.traces[50:100])       # reads only this slice

Session modes:

Mode

Description

"create"

New file (default). Fails if file exists

"resume"

Append to existing file. Version/hash must match

"read"

Read-only access. No writes allowed

Field types in sessions:

Type

Behavior

Scalars

Assignment writes through to disk

NDArray / ndarray

DatasetArray with append(), resize(), slice access

list[np.ndarray]

TrackedListappend()/extend() write through

dict[str, ndarray]

TrackedDict__setitem__/update() write through

Sessions do not support migrations. The file’s version and hash must exactly match the class. DatasetArray fields raise BackendError after the session is closed — copy data before closing if needed.

Compression on resume: Appending to an existing dataset uses the original dataset’s compression filter, not the session’s compression parameter. The session compression only applies to newly created datasets.

Supported Types

Built-in (no registration needed)

Primitives: int, float, str, bool, None

Collections: list[T], dict[K, V], set[T], frozenset[T], tuple[T, ...], Optional[T], Union[A, B], Literal[...]

Stdlib types (auto-converted):

Type

Serialized As

datetime.datetime

ISO 8601 string

datetime.date

ISO 8601 string

datetime.time

ISO 8601 string

datetime.timedelta

Float (total seconds)

pathlib.Path

String

uuid.UUID

String

decimal.Decimal

String

bytes

Base64 string

complex

[real, imag]

re.Pattern

Pattern string

numpy arrays: Native HDF5 datasets (compressed, lazy-loaded). Base64-compressed npz blobs in JSON/TOML/YAML.

Enums

Serialized by .value. Set a fallback for graceful handling of removed enum members:

from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"
    UNKNOWN = "unknown"


Status.VERSIONABLE_FALLBACK = Status.UNKNOWN  # Old values deserialize to UNKNOWN

Literal Fields

Use literalFallback for graceful handling of invalid literal values from old files:

from versionable import literalFallback


@dataclass
class Config(Versionable, version=1, hash="..."):
    mode: Literal["fast", "balanced", "slow"] = literalFallback("balanced")

Custom Types

Option 1 — registerConverter (for third-party types or complex serialization):

from versionable import registerConverter


registerConverter(
    Coord,
    serialize=lambda v: {"lat": v.lat, "lon": v.lon},
    deserialize=lambda v, _cls: Coord(v["lat"], v["lon"]),
)

Option 2 — VersionableValue protocol (for your own types mapping to a single primitive):

from versionable import VersionableValue


class UserId(VersionableValue):
    def __init__(self, value: str) -> None:
        self.value = value

    def toValue(self) -> str:
        return self.value

    @classmethod
    def fromValue(cls, value: str) -> UserId:
        return cls(value)

Migrations

When you change a class’s fields, increment version, update hash, and add a migration so old files load correctly.

Declarative Migrations

@dataclass
class Config(Versionable, version=3, hash="x1y2z3"):
    name: str
    timeout_s: float = 30.0
    retries: int = 3

    class Migrate:
        # v1 → v2: renamed "title" to "name"
        v1 = Migration().rename("title", "name")

        # v2 → v3: added "retries" with default for old files
        v2 = Migration().add("retries", default=1)

Available operations (chainable):

Operation

Description

.rename(old, new)

Rename a field

.drop(field)

Remove a field from old data

.add(field, default=value)

Add field with default for old files

.convert(field, via=fn)

Transform a field’s value

.derive(target, from_=source, via=fn)

Create new field from existing

.split(field, into={...})

Split one field into multiple

.merge(fields=[...], into=name, via=fn)

Merge multiple fields into one

.requiresUpgrade()

Mark as needing in-place rewrite

.then(other_migration)

Chain another migration

Chain multiple operations: Migration().rename("a", "b").drop("c").add("d", default=0)

Imperative Migrations

For branching logic or complex transformations:

from versionable import MigrationContext, migration


class Migrate:
    @migration(fromVersion=2)
    def from_v2(ctx: MigrationContext) -> None:
        raw = ctx.pop("rawData")
        ctx["timestamps"] = [row[0] for row in raw]
        ctx["values"] = [row[1] for row in raw]

Migrations apply sequentially: a v1 file on a v5 class runs v1 → v2 → v3 → v4 → v5.

Renaming a Class

Use old_names to load files saved under a previous class name:

@dataclass
class SensorConfig(Versionable, version=2, hash="...", old_names=["SensorSettings"]):
    ...

Introspection

import versionable
from versionable import metadata, getVersionableFields, registeredClasses

# Schema metadata
meta = metadata(SensorConfig)
meta.version       # int
meta.hash          # str (6 chars)
meta.name          # str
meta.fields        # list[str]

# Field types
fields = getVersionableFields(SensorConfig)  # dict[str, type]

# Compute hash (paste into hash= parameter)
SensorConfig.hash()  # str

# All registered classes
registeredClasses()  # dict[name, type]

Error Handling

VersionableError (base — catch-all)
├── HashMismatchError      — hash= doesn't match fields (raised at import time)
├── VersionError           — file is newer than class, or missing migrations
├── MigrationError         — migration failed to apply
├── ArrayNotLoadedError    — accessing array loaded with metadataOnly=True
├── UpgradeRequiredError   — migration needs upgradeInPlace=True
├── UnknownFieldError      — file has field not in class (only with unknown="error")
├── ConverterError         — type conversion failed
└── BackendError           — file I/O or backend operation failed

All exceptions are importable from versionable:

from versionable import VersionableError, HashMismatchError, BackendError

Common Patterns

Configuration file with commented defaults

versionable.save(config, "defaults.yaml", commentDefaults=True)

Produces YAML/TOML where fields at their default value are commented out, making it easy to see what was customized.

Scanning HDF5 metadata without loading arrays

for path in Path("data/").glob("*.h5"):
    obj = versionable.load(Experiment, path, metadataOnly=True)
    print(f"{path}: {obj.name}, {obj.timestamp}")

Dynamic loading with type dispatch

obj = versionable.loadDynamic("unknown_file.yaml")
match type(obj).__name__:
    case "SensorConfig":
        processSensor(obj)
    case "ExperimentResult":
        processResult(obj)

Registering existing backends for custom extensions

Use registerBackend to map new file extensions to a built-in backend class:

from versionable import JsonBackend, registerBackend

registerBackend([".jsonc", ".json5"], JsonBackend)

All four backend classes are importable from versionable: JsonBackend, TomlBackend, YamlBackend, Hdf5Backend.

Writing a custom backend

from versionable import Backend, registerBackend


class MsgPackBackend(Backend):
    nativeTypes: set[type] = set()

    def save(self, fields: dict, meta: dict, path, *, cls: type, **kwargs) -> None: ...
    def load(self, path) -> tuple[dict, dict]: ...


registerBackend([".msgpack"], MsgPackBackend)

The save() method receives raw (unserialized) field values and the Versionable class. Call serialize() internally for dict-based formats, or handle type dispatch directly for binary formats.