QVF v1 grows up: basis sets, full reactions, and a reference viewer

Three weeks ago I argued that quantum chemistry needs a single, modern container format and introduced QVF as the candidate. The post laid out the goals: one zip per calculation, typed sections, explicit units, schema validated, sha256 per binary, vendor extensions under an x_<vendor> namespace, a controlled vocabulary of kinds that lets readers ignore what they do not understand.
The v0.4 spec at the time was deliberately a starting point. Three things were obviously missing.
The first was a way to ship the basis set and the MO coefficients alongside the rest of the calculation, so a viewer could render any orbital at any resolution without the writer baking everything to voxels first. The second was a way to package a full reaction in one section, with reactant, transition state, intermediates, and product called out as such instead of buried inside an opaque trajectory. The third was a way to express a difference of two scalar fields, the kind of “where did the electrons go” picture every reaction paper wants but no widely shared format names.
All three landed. v1 of QVF is now a richer container, the writer in vibe-qc emits all of them, and there is a working reference viewer called vibe-view that opens the archive in a browser tab. This post walks through what changed and how the implementation hangs together.
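Four new section kinds
The three gaps map onto four implemented kinds in the QVF kind registry, because reaction support comes as a self-contained kind plus a lightweight annotation kind. The manifest also gained a new field. They are small additions in line count and large in what they enable.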
wavefunction.gto carries the atom centered Gaussian basis (shells with centers, angular momentum, exponents, contraction coefficients) plus the MO coefficient matrix of shape [n_mo, n_ao]. A renderer that knows how to evaluate Gaussians can compute any orbital on a grid of its choosing. A renderer that does not know about it lists the section as “skipped, unsupported” and keeps using whatever pre-baked volume.orbital sections are also present. That fallback is the whole point of the controlled vocabulary.
reaction.path is a self contained reaction path. It carries frames as float64 [n_frames, n_atoms, 3] in Angstrom, byte identical to the existing trajectory binary layout, so every consumer that decodes a trajectory already decodes this. On top of the frames it adds a waypoints list, each entry naming a frame_index, a label, and a kind from a small enum (reactant, transition_state, intermediate, product, point). Energies are in Hartree, with the _eh suffix on the JSON key making the unit explicit.
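A minimal sketch of how a consumer might decode the frames member and look up waypoint geometries. The little-endian byte order, the function name decode_frames, and the sample payload are assumptions of this sketch; only the float64 [n_frames, n_atoms, 3] layout and the waypoint keys come from the description above.

```python
import struct

def decode_frames(raw: bytes, n_frames: int, n_atoms: int):
    """Decode float64 [n_frames, n_atoms, 3] coordinates in Angstrom.

    Assumes little-endian, C-ordered data (an assumption of this sketch).
    """
    count = n_frames * n_atoms * 3
    if len(raw) != count * 8:
        raise ValueError(f"expected {count * 8} bytes, got {len(raw)}")
    flat = struct.unpack(f"<{count}d", raw)
    frames = []
    for f in range(n_frames):
        base = f * n_atoms * 3
        frames.append([flat[base + 3 * a: base + 3 * a + 3] for a in range(n_atoms)])
    return frames

# Hypothetical two-frame, two-atom path (a diatomic being stretched).
coords = [0.0, 0.0, 0.0, 0.0, 0.0, 0.74,
          0.0, 0.0, 0.0, 0.0, 0.0, 1.50]
raw = struct.pack("<12d", *coords)
frames = decode_frames(raw, n_frames=2, n_atoms=2)

waypoints = [
    {"frame_index": 0, "label": "R", "kind": "reactant"},
    {"frame_index": 1, "label": "P", "kind": "product"},
]
for wp in waypoints:
    # Print the position of the second atom at each named waypoint.
    print(wp["kind"], frames[wp["frame_index"]][1])
```

Because the layout matches the trajectory binary, the same decoder would serve both kinds.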
reaction.waypoints is the lightweight counterpart. It carries only the waypoint annotations and points at an existing trajectory section via a trajectory_ref. Use it when the frames are already in the archive and you simply want to flag which frames are the TS or the intermediate. The validator confirms at archive time that trajectory_ref resolves to a real trajectory section and that every frame_index falls inside its frame range.
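The cross-reference check can be sketched as follows. This is not the validate_qvf implementation; the section dict shape (the "kind" and "n_frames" keys, and a kind literally named "trajectory") is a hypothetical stand-in chosen for illustration.

```python
WAYPOINT_KINDS = {"reactant", "transition_state", "intermediate", "product", "point"}

def check_waypoints(section: dict, sections_by_id: dict) -> list:
    """Return a list of human-readable problems; an empty list means the section passes."""
    ref = section["trajectory_ref"]
    target = sections_by_id.get(ref)
    if target is None:
        return [f"trajectory_ref {ref!r} does not resolve"]
    if target["kind"] != "trajectory":
        return [f"trajectory_ref {ref!r} points at kind {target['kind']!r}"]
    errors = []
    n_frames = target["n_frames"]
    for wp in section["waypoints"]:
        if not 0 <= wp["frame_index"] < n_frames:
            errors.append(f"frame_index {wp['frame_index']} outside 0..{n_frames - 1}")
        if wp["kind"] not in WAYPOINT_KINDS:
            errors.append(f"unknown waypoint kind {wp['kind']!r}")
    return errors

sections = {"opt_traj": {"kind": "trajectory", "n_frames": 12}}
annotation = {
    "trajectory_ref": "opt_traj",
    "waypoints": [
        {"frame_index": 0, "label": "R", "kind": "reactant"},
        {"frame_index": 12, "label": "P", "kind": "product"},  # one past the end
    ],
}
problems = check_waypoints(annotation, sections)
print(problems)
```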
volume.difference is structurally identical to volume.density, a grid descriptor plus a 3D float32 or float64 binary, but it carries optional operand_a and operand_b fields. If one is given, the other is required, and the validator confirms that both ids resolve to sections in the same archive. Viewers render the field with a diverging colormap centered at zero. Viewers that do not know the kind do not pretend it is a density; they fall through to “skipped, unsupported” per the partial support contract.
The manifest also picked up viewer_defaults.bookmarks, an ordered list of camera positions in the VTK convention (position, focal point, view up, plus exactly one of view angle for perspective or parallel scale for orthographic). Bookmarks give two things at once: consistent framing across viewers, and deterministic playback for movie export workflows. The schema enforces the perspective-XOR-orthographic constraint with a oneOf.
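The perspective-XOR-orthographic rule is what a oneOf expresses in the schema. A hand-rolled equivalent in plain Python looks roughly like this; the key names (position, focal_point, view_up, view_angle, parallel_scale) are guesses based on the VTK camera terms above, not names taken from the schema.

```python
def check_bookmark(bookmark: dict) -> None:
    """Raise ValueError unless the bookmark has three 3-vectors and exactly one projection key."""
    for key in ("position", "focal_point", "view_up"):
        vec = bookmark.get(key)
        if not (isinstance(vec, (list, tuple)) and len(vec) == 3):
            raise ValueError(f"{key} must be a 3-vector")
    has_perspective = "view_angle" in bookmark
    has_orthographic = "parallel_scale" in bookmark
    # Exactly one of the two must be present: both or neither is invalid.
    if has_perspective == has_orthographic:
        raise ValueError("exactly one of view_angle or parallel_scale is required")

perspective = {
    "position": [0.0, 0.0, 10.0],
    "focal_point": [0.0, 0.0, 0.0],
    "view_up": [0.0, 1.0, 0.0],
    "view_angle": 30.0,
}
check_bookmark(perspective)  # passes silently
```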
Why the basis set belongs in the archive

The first post argued against shipping volumetric data as ASCII, on the grounds that a 200³ grid is around 8 million values and the modern era can afford binary. There is a related point that the first post did not make explicit. A pre-sampled orbital is a snapshot. Once written, the resolution is frozen, the bounding box is frozen, and the interesting region (which is rarely the entire bounding box) is also frozen. To zoom in or to render a different MO, the producer has to be re-invoked.
The Molden format figured this out in the 1990s. Ship the basis and the coefficients, evaluate at view time. wavefunction.gto follows that lineage, ported into the QVF container.
The contract is precise because vague Gaussian conventions are a famous source of silent bugs. Exponents are in bohr⁻². Contraction coefficients apply to normalized primitives, the same convention used by Molden, libint, NWChem, the .g94 files vibe-qc ships, and BSE downloads. Spherical (pure) shells use the m = −l, …, +l ordering. Cartesian shells use the libint lexicographic ordering over (i, j, k) with i + j + k = l, i descending then j descending. A pre-baked volume section can carry a wavefunction_ref back-reference so a re-sampling renderer can discard the blob and recompute at its own resolution if it prefers.
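The two AO orderings are easy to state in code. The helper names below are hypothetical; the orderings themselves are the ones named in the paragraph above.

```python
def cartesian_order(l: int):
    """Exponent triples (i, j, k) with i + j + k = l, i descending then j descending."""
    return [(i, j, l - i - j)
            for i in range(l, -1, -1)
            for j in range(l - i, -1, -1)]

def spherical_order(l: int):
    """Magnetic quantum numbers in the m = -l, ..., +l ordering."""
    return list(range(-l, l + 1))

print(cartesian_order(1))   # [(1, 0, 0), (0, 1, 0), (0, 0, 1)], i.e. x, y, z
print(cartesian_order(2))   # xx, xy, xz, yy, yz, zz
print(spherical_order(2))   # [-2, -1, 0, 1, 2]
```

A Cartesian shell of angular momentum l has (l + 1)(l + 2) / 2 components and a spherical shell has 2l + 1, so a reader can cross-check n_ao against the shell list before touching the coefficient matrix.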
Periodic Bloch wavefunctions are out of scope for v1 of this kind. Crystal MO coefficients are k-resolved and supercell-sampled, which is a separate design pass. Producers must not emit wavefunction.gto for structures with any true entry in pbc. The schema does not enforce that yet (a periodic structure is a different section), but the validator and the renderers do.
Visualizing a full reaction in one section

Reaction visualization has lived in an awkward middle ground for a long time. The frames are usually written as an XYZ trajectory, which is fine for playback but tells a viewer nothing about which frame is the TS or the product. The energies live in a separate log. The reaction coordinate, if anyone computed it, lives somewhere else entirely. To produce the standard “energy along the path with waypoints called out” figure, a script glues the three sources back together.
reaction.path collapses that into one section. The frames are there. The energies are there, in Hartree, one per frame. An optional reaction_coordinate array gives one scalar per frame for the x-axis when “frame index” is not the right unit. The waypoints list calls out which frames matter and what they are. A consumer that opens the archive has everything it needs to render the figure with no glue code.
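As an illustration of the "no glue code" claim, here is a sketch that turns per-frame energies and waypoints into the relative energies a reaction profile needs. The variable name energies_eh and the sample numbers are made up for the example; the exact JSON key is not quoted in this post.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # standard conversion factor, rounded

def relative_energies(energies_eh, waypoints):
    """Return (label, kind, energy in kJ/mol relative to the reactant waypoint)."""
    reactant = next(wp for wp in waypoints if wp["kind"] == "reactant")
    e_ref = energies_eh[reactant["frame_index"]]
    return [
        (wp["label"], wp["kind"],
         (energies_eh[wp["frame_index"]] - e_ref) * HARTREE_TO_KJ_PER_MOL)
        for wp in waypoints
    ]

# Hypothetical three-frame path: energies in Hartree, one per frame.
energies_eh = [-76.400, -76.370, -76.420]
waypoints = [
    {"frame_index": 0, "label": "R", "kind": "reactant"},
    {"frame_index": 1, "label": "TS", "kind": "transition_state"},
    {"frame_index": 2, "label": "P", "kind": "product"},
]
profile = relative_energies(energies_eh, waypoints)
for label, kind, de in profile:
    print(f"{label:>2} {kind:<16} {de:8.1f} kJ/mol")
```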
The split between reaction.path (self contained) and reaction.waypoints (annotation over an existing trajectory) is a deliberate data flow choice, not a versioning hedge. A pre-computed IRC or single-step NEB run naturally produces frames plus waypoints in one go and emits reaction.path. A geometry optimisation workflow that discovers it was actually a reaction in post-processing already has a trajectory section and only needs to overlay annotations. Duplicating the coords in that second case would be wasteful and the validator can confirm the cross-reference resolves, so a lightweight annotation kind is the right shape.
Where did the electrons go

Difference density plots are everywhere in catalysis, photochemistry, and reactivity work. They are also represented inconsistently across tools. Some viewers do the subtraction at view time (which means the user has to load two density files and configure the difference). Others write the difference back out as a fresh density and lose the operand information. Neither is satisfying.
QVF takes the second path with one critical addition: the difference section can name the operands it was computed from, by section id, in the same archive. Sign convention is fixed (data = operand_a − operand_b) so there is no convention drift between producer and consumer. The validator resolves the names at archive time, so a producer that mis-spells rho_reactant fails the gate at write time, not later in someone else’s renderer. Viewers render the field with a diverging colormap centered at zero, which is the right visual encoding for a signed scalar field. A viewer that does not implement the kind reports it as “skipped, unsupported” and moves on, never trying to render it as a density with a single-sided colormap.
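A small sketch of the sign convention and the zero-centered color range, using flattened Python lists in place of real 3D grids. The function names are hypothetical and the sample values are invented.

```python
def difference_field(operand_a, operand_b):
    """Elementwise data = operand_a - operand_b on two grids of equal size."""
    if len(operand_a) != len(operand_b):
        raise ValueError("operands must be sampled on the same grid")
    return [a - b for a, b in zip(operand_a, operand_b)]

def symmetric_limits(values):
    """Color limits centered at zero, so gain and loss use the same scale."""
    vmax = max(abs(v) for v in values)
    return (-vmax, vmax)

rho_product = [0.50, 0.25, 0.125]
rho_reactant = [0.25, 0.75, 0.125]
delta = difference_field(rho_product, rho_reactant)
print(delta)                    # [0.25, -0.5, 0.0]
print(symmetric_limits(delta))  # (-0.5, 0.5)
```

Positive values mark regions where operand_a has more density than operand_b, which is why the fixed sign convention matters for reading the picture.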
One schema, three consumers, zero drift

The most consequential change in this round was not a new kind. It was making the JSON Schema the single source of truth for the format and wiring it through the entire producer-validator-consumer triangle.
The canonical schema lives in the vibe-qc repository at python/vibeqc/output/formats/qvf_manifest.schema.json. The producer (write_qvf()) loads it and runs the manifest it just wrote through the schema before returning the archive path. A regression in the writer cannot ship a malformed manifest because the writer refuses to return one. The validator (validate_qvf()) loads the same file and uses it to gate everything: kind shape, required members, dtype names, binary shape rank, vendor namespacing, and the v1 const on the version field. The viewer ships a copy of the schema as a symlink to the canonical file, and a sha256-identity test in both test suites fails loudly if the symlink ever degrades into a divergent copy. Adding a new kind takes three coordinated edits (writer, schema, kind registry), and a drift-guard test asserts that all three stay in lock step.
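The sha256-identity test is simple enough to sketch in a few lines. This is an illustration under assumptions, not the test from either suite: the helper names, the temporary file names, and the toy schema contents are all invented.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path) -> str:
    """Hex digest of a file's bytes (follows symlinks, like any normal read)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def schemas_identical(canonical, shipped) -> bool:
    return sha256_of(canonical) == sha256_of(shipped)

with tempfile.TemporaryDirectory() as tmp:
    canonical = Path(tmp) / "canonical.schema.json"
    drifted = Path(tmp) / "viewer_copy.schema.json"
    canonical.write_text('{"properties": {"version": {"const": 1}}}')
    drifted.write_text('{"properties": {"version": {"const": 2}}}')
    same_file_ok = schemas_identical(canonical, canonical)
    drift_detected = not schemas_identical(canonical, drifted)

print(same_file_ok, drift_detected)
```

Comparing digests instead of checking "is this a symlink" means the test keeps passing on platforms where the symlink was materialised as a copy, as long as the copy is byte-identical.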
A few things the schema cannot express end up as code in the validator. Cross-references are checked there: volume.difference.operand_a/_b must resolve, reaction.waypoints.trajectory_ref must point at a real trajectory, every frame_index must be in range. Binary member integrity is checked there too: every format: binary member’s len(bytes) must equal itemsize(dtype) * product(shape). A silent under-sized buffer or a dtype mismatch fails the gate at validation time. The sha256 in the manifest is verified for every member before use.
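The binary size rule is a one-line formula, sketched here with only the two dtype names this post mentions. The function names are hypothetical.

```python
import math

ITEMSIZE = {"float32": 4, "float64": 8}  # bytes per element

def expected_nbytes(dtype: str, shape) -> int:
    """itemsize(dtype) * product(shape)."""
    return ITEMSIZE[dtype] * math.prod(shape)

def check_member_size(raw: bytes, dtype: str, shape) -> None:
    want = expected_nbytes(dtype, shape)
    if len(raw) != want:
        raise ValueError(f"{dtype}{list(shape)} needs {want} bytes, member has {len(raw)}")

# A 5-frame, 3-atom trajectory in float64 must be exactly 5 * 3 * 3 * 8 bytes.
print(expected_nbytes("float64", (5, 3, 3)))  # 360
check_member_size(bytes(360), "float64", (5, 3, 3))
```

The same check catches a dtype mismatch: a float32 buffer declared as float64 is exactly half the expected length.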
This is the part of the work I am most pleased with. The first post talked about “schema validation” as a virtue in the abstract. v1 turns that into infrastructure. Anyone who writes a QVF and runs it through validate_qvf learns immediately whether they have shipped a valid archive. Anyone who reads a QVF can rely on the schema as a contract, not a wish.
The writer in vibe-qc
The producer side, in python/vibeqc/output/formats/qvf.py, is a single file at this point with one entry point per section kind. The public API is small.
    from vibeqc.output.formats.qvf import (
        write_qvf, validate_qvf,
        qvf_density_data, qvf_mo_data, qvf_wf_data,
    )
The convenience helpers live next to write_qvf because evaluating a density or an MO on a grid is something the writer needs to do anyway, and exposing the same code as a one-shot helper means a user can package those outputs without having to learn the internal section dict shape. qvf_wf_data in particular is the helper that turns a converged SCF result into the wavefunction.gto payload: it walks the basis shells, normalises the AO ordering, attaches the MO energies and occupations, and detects restricted vs unrestricted automatically.
The writer has two small but useful properties. First, every binary member is written with a sha256 attached in the manifest entry, computed from the bytes on disk. The hashes are recorded at write time, so nothing has to generate them later; the validator only has to compare against them. Second, the writer enforces a 1 Gvoxel ceiling per binary member (about 8 GiB float64 worst case), and the validator enforces the same limit. The two ceilings are pinned to the same constant in the source so a payload the writer accepts cannot later be rejected by the validator as “too large”.
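Both properties fit in a short sketch built on the standard library. Everything about the layout here is hypothetical (the manifest.json name, the "members" list, the "path" and "sha256" keys, and reading "1 Gvoxel" as 2**30 values), and the hash is taken over in-memory bytes for brevity.

```python
import hashlib
import io
import json
import zipfile

MAX_VOXELS = 2**30  # assumed reading of the 1 Gvoxel ceiling

def add_binary_member(zf, manifest, name, raw, n_values):
    """Write one binary member and record its sha256 in the manifest."""
    if n_values > MAX_VOXELS:
        raise ValueError(f"{name}: {n_values} values exceeds the ceiling")
    zf.writestr(name, raw)
    manifest["members"].append(
        {"path": name, "sha256": hashlib.sha256(raw).hexdigest()}
    )

# Producer side: build the archive in memory.
buf = io.BytesIO()
manifest = {"members": []}
with zipfile.ZipFile(buf, "w") as zf:
    add_binary_member(zf, manifest, "density.bin", bytes(32), n_values=8)
    zf.writestr("manifest.json", json.dumps(manifest))

# Consumer side: re-open and compare each member against its recorded hash.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    loaded = json.loads(zf.read("manifest.json"))
    verified = all(
        hashlib.sha256(zf.read(m["path"])).hexdigest() == m["sha256"]
        for m in loaded["members"]
    )
print(verified)
```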
Test discipline is the boring part that matters. Round-trip tests run write_qvf followed by validate_qvf on every implemented kind, then re-open the archive and decode the binary members. Drift-guard tests assert that the writer’s _IMPLEMENTED_KINDS registry, the schema’s oneOf branches, and the viewer’s SUPPORTED_KINDS stay aligned. Failure-path tests exercise the cross-reference checks (operand_a without operand_b, trajectory_ref pointing at a non-trajectory section, frame_index out of range) and confirm the validator catches them.
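A drift-guard test boils down to a set comparison. The sketch below is a hypothetical helper, not the test in the repository; it reports which kinds each registry is missing so a failure message says what to fix.

```python
def kind_drift(writer_kinds, schema_kinds, viewer_kinds):
    """Map each registry name to the kinds it lacks; an empty dict means aligned."""
    registries = {
        "writer": set(writer_kinds),
        "schema": set(schema_kinds),
        "viewer": set(viewer_kinds),
    }
    union = set().union(*registries.values())
    return {
        name: sorted(union - kinds)
        for name, kinds in registries.items()
        if union - kinds
    }

aligned = {"volume.density", "volume.difference", "reaction.path"}
print(kind_drift(aligned, aligned, aligned))                     # {}
print(kind_drift(aligned, aligned, aligned - {"reaction.path"})) # {'viewer': ['reaction.path']}
```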
vibe-view: the reference viewer

The first post promised a viewer. vibe-view is now in the repository as a peer sub-project of vibe-qc, with its own pyproject.toml and CLI entry point. Install it, then open a QVF with one command:
    pip install -e 'vibe-view[gpu]'
    vibe-view open h2o_pbe0.qvf
The stack is PyVista + Trame, both pure Python on top of VTK. There is no JavaScript build step, no npm lockfile, no two-language project. The tradeoff is that the UI is server-rendered and ships pixels over a websocket, which is a fine choice for a viewer that should “just work” after pip install and a poor choice for a viewer that needs to run inside a web page someone else owns. The Trame backend lets you embed if needed, but the default mode is “open in a browser tab against a local server” and that covers the common case.
The viewer is small and the architecture is deliberately shallow. The QVFReader in qvf.py opens the zip, validates the manifest against the schema (the same schema as the writer), parses sections into dataclasses, and exposes typed accessors. Each kind has one renderer file under renderers/ and a one-line entry in the dispatch table. The application layer in app.py wires the sidebar, the viewport, the per-kind hint panels (isovalue slider for volumes, MO selector for wavefunctions, frame slider for trajectories and reactions), and the camera bookmark controls.
Four rules govern the partial-support contract, and they show up at the file boundaries.
Rule 1: supported kinds are declared in a single explicit set. No clever inference, no inheritance, no implicit “everything that looks like a volume should render”. A kind is in the set or it is not.
Rule 2: unknown kinds are classified as “skipped, unsupported” and listed in the banner. They are not errors. An archive with a future kind the viewer was not built against still opens, still renders everything else, and tells the user what was passed over.
Rule 3: vendor-namespace kinds (x_<vendor>.*) have their vendor name extracted and shown next to the skipped-reason. A reader can quickly see whether an archive was produced by a specific vendor toolchain by glancing at the banner.
Rule 4: every member’s sha256 is verified before its bytes are used. JSON members and binary members both. A mismatch is a hard error for that section only, and the viewer continues rendering the others. The failing section is marked “error, sha256 mismatch” in the banner.
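Rules 1 to 3 amount to a small classification function. The sketch below is illustrative only: the contents of SUPPORTED_KINDS (including a kind named "trajectory") and the exact banner strings are assumptions, not copied from vibe-view.

```python
SUPPORTED_KINDS = frozenset({
    "trajectory", "volume.density", "volume.orbital", "volume.difference",
    "wavefunction.gto", "reaction.path", "reaction.waypoints",
})

def classify(kind: str) -> str:
    """Return the banner status for a section kind."""
    if kind in SUPPORTED_KINDS:          # Rule 1: explicit membership only
        return "supported"
    if kind.startswith("x_"):            # Rule 3: surface the vendor name
        vendor = kind[2:].split(".", 1)[0]
        return f"skipped, unsupported (vendor: {vendor})"
    return "skipped, unsupported"        # Rule 2: unknown is not an error

for kind in ("volume.density", "volume.potential", "x_acme.spin_texture"):
    print(f"{kind}: {classify(kind)}")
```

Rule 4 sits outside this function because it concerns member bytes, not kind names: the hash comparison happens when a section's members are loaded.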
The wavefunction renderer is the most interesting one to look at. It walks the basis shells, evaluates the primitive Gaussian normalisation (spherical for the radial part, plus a Cartesian per-component correction for l ≥ 2), generates the angular factors in the file’s AO order (libint lexicographic for Cartesian, m = −l, …, +l for spherical), and sums Σ c_µ χ_µ(r) on a Cartesian grid auto-sized from the atomic bounding box. The output is a dense float32 volume that flows into the same VolumeRenderer that handles a pre-baked volume.orbital, so isosurface rendering is shared. Shells with l > 3 are silently skipped, which loses accuracy on basis sets with g and h functions and keeps the viewer usable for the overwhelming majority of files. The full evaluator is on the roadmap.
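For readers who want to see the arithmetic, here is a pure-Python sketch of evaluating Cartesian Gaussians and summing an MO at one point. It uses the textbook primitive normalisation and is not the vibe-view renderer: the function names and the dict layout for an AO are hypothetical, spherical shells are not handled, and all coordinates are taken to be in bohr (geometry stored in Angstrom would need converting first).

```python
import math

def double_factorial(n: int) -> int:
    """n!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

def primitive_norm(alpha: float, i: int, j: int, k: int) -> float:
    """Normalisation of x^i y^j z^k exp(-alpha r^2), including the per-component factor."""
    l = i + j + k
    denom = (double_factorial(2 * i - 1)
             * double_factorial(2 * j - 1)
             * double_factorial(2 * k - 1))
    return (2 * alpha / math.pi) ** 0.75 * (4 * alpha) ** (l / 2) / math.sqrt(denom)

def contracted_gto(point, center, powers, exponents, coefficients) -> float:
    """One contracted Cartesian AO; coefficients apply to normalised primitives."""
    x, y, z = (p - c for p, c in zip(point, center))
    i, j, k = powers
    r2 = x * x + y * y + z * z
    radial = sum(c * primitive_norm(a, i, j, k) * math.exp(-a * r2)
                 for a, c in zip(exponents, coefficients))
    return x ** i * y ** j * z ** k * radial

def mo_value(point, aos, mo_coeffs) -> float:
    """Sum over AOs of c_mu * chi_mu(r) for one row of the MO coefficient matrix."""
    return sum(c * contracted_gto(point, **ao) for c, ao in zip(mo_coeffs, aos))

# Hypothetical two-AO example: an s and a p_z function on one center.
aos = [
    {"center": (0.0, 0.0, 0.0), "powers": (0, 0, 0), "exponents": [1.0], "coefficients": [1.0]},
    {"center": (0.0, 0.0, 0.0), "powers": (0, 0, 1), "exponents": [0.5], "coefficients": [1.0]},
]
print(mo_value((0.0, 0.0, 0.5), aos, mo_coeffs=[0.8, 0.6]))
```

A grid renderer would call mo_value for every voxel, so a real implementation vectorises the loop; the per-point formula stays the same.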
The reaction renderer reuses the trajectory animation machinery (because the binary layout is identical) and layers the waypoint markers onto the energy curve. Click a waypoint and the viewer jumps to that frame. The energy plot is rendered server-side via matplotlib and shipped as a PNG, which is good enough for a 2D plot and avoids pulling in a second plotting stack.
What comes next
Several kinds are in the registry as “reserved”: volume.potential (electrostatic potential maps), volume.orbital_projection (NTO and IBO projections), topology.qtaim and topology.elf_basins (atoms-in-molecules basin meshes), and projections.lcao (LCAO projections of band wavefunctions). The validator accepts them today so a vendor producer can ship them ahead of the canonical writer, but the writer does not emit them yet. They will land as the upstream computations land.
The periodic wavefunction.gto question is the next real design pass. Bloch wavefunctions are k-resolved and the right way to package them is not obvious. A list of k-points each with its own MO coefficient block is the naive answer and is probably wrong because the natural consumer is a re-sampling renderer that wants to evaluate a single Bloch state on a supercell grid. I would rather get this one right than ship it fast.
On the viewer side, the obvious gaps are recordable movies (the bookmarks infrastructure is in place, the encoder is not), a proper file dialog so users do not have to type a path, and the higher-l shells in the wavefunction renderer. None of these are format changes. They are viewer work.
For other codes and visualization tools
The invitation from the first post stands and is now easier to act on. The format is specified, the schema is enforced, the writer is open source under MPL 2.0, the validator runs against any archive, and a reference viewer demonstrates the full v1 surface end to end. Adding a writer for QVF to another QC code is the kind of task a small focused effort can finish in a few days. Adding a reader for the kinds you care about is the kind of task a coding agent can finish in an afternoon, given the schema and a few example archives.
If you produce orbital and density data in a QC code, please consider emitting wavefunction.gto rather than (or alongside) pre-baked cubes. If you produce reaction-path data, please consider emitting reaction.path instead of yet another bespoke trajectory variant. If you build a viewer, please consider adopting the four-rule partial-support contract so unknown kinds are listed, not errored. Each of those choices costs the producer or the consumer very little and makes the ecosystem dramatically more useful for the next person who picks up your file.
The specification, the writer, the validator, and the vibe-view source are all in the vibe-qc repository. The design document lives at docs/design_qcv_format.md. Feedback, adoption notes, and proposals for new kinds are welcome at mpei@vibe-qc.com.















