I vibe-coded a quantum-chemical program in one day
During my PhD at the Mulliken Center for Theoretical Chemistry in Bonn, I wrote a quantum-chemical code called AICCM, an object-oriented educational implementation of the Cyclic Cluster Model (CCM) at the Hartree-Fock level. If you want the formal reference, it is described in:
M. F. Peintinger, T. Bredow, The cyclic cluster model at Hartree-Fock level, Journal of Computational Chemistry 35 (11), 839-846 (2014). DOI: 10.1002/jcc.23550
That work took a good chunk of my PhD. Years of implementation, debugging, validating against CRYSTAL and other periodic codes, reading obscure papers on Wigner-Seitz integration and boundary handling of three- and four-center integrals. Real work. The kind that leaves scars.
Alongside AICCM I also developed the pob-TZVP basis sets, a consistent triple-zeta valence family tuned specifically for periodic solid-state calculations. They were built for CRYSTAL, but from the start I had AICCM in mind as a second home for them:
M. F. Peintinger, D. V. Oliveira, T. Bredow, Consistent Gaussian basis sets of triple-zeta valence with polarization quality for solid-state calculations, Journal of Computational Chemistry 34 (6), 451-459 (2013). DOI: 10.1002/jcc.23153
These basis sets matter for the story later in this post, because vibe-qc already has a slot in its basis library waiting for them.
This week I listened to a podcast on vibe-coding, and the thought stuck in my head: what happens if I hand a modern coding agent a task I genuinely understand, something deeply non-trivial, and then resist the urge to help code? Not a toy CRUD app. Not a weekend script. A quantum chemistry program. The kind of thing where a single wrong sign in a density matrix silently gives you garbage for three hours of debugging before you notice.
So I created a blank git repo on my GitLab server, called it vibe-qc, and gave Claude the task. Just the task. No starter code, no reference implementation to crib from, no hand-holding on the math. I steered at the architecture level and made scoping decisions, but I did not write code. Not one line.
What I expected vs. what happened
I honestly expected it to crash out somewhere around the two-electron integrals, or to produce a plausible-looking SCF loop that silently disagreed with the reference by a few millihartree. That is the usual failure mode when someone writes this code for the first time: it runs, it looks right, and the total energy is wrong in the fourth digit.
Instead, by the end of the day we had a working program that agreed with PySCF to machine precision. It did fall into one trap I found amusing: at one point it tried to run restricted Hartree-Fock on lithium. Lithium has three electrons. You cannot do closed-shell RHF on an odd number of electrons. You need an open-shell method. We added UHF a few commits later and the problem went away, but it is telling that an AI coding agent can reproduce the exact kind of mistake a first-year grad student makes.
What got built in one day
16 commits. Here is the inventory.
Architecture
Python frontend with a C++17 numerical core, bridged by pybind11. CMake plus scikit-build-core for the build system, so pip install -e . just works. About 3500 lines of C++ for integrals, SCF drivers, gradients, and DFT machinery. libint2 v2.13.1 built from source under third_party/libint/ with max_am=5 and first-order derivatives enabled. Homebrew libxc 7.0 for the 500+ exchange-correlation functionals.
The GIL is released during native work, so the Python frontend is not a bottleneck. I had asked about this up front and was reassured that Python would not become a restriction for multi-core scaling later.
Hartree-Fock, closed and open shell
- RHF with Hcore or SAD initial guess, symmetric orthogonalization via S to the minus one-half, optional density damping.
- UHF with separate alpha and beta densities, reporting <S²> as a spin-contamination diagnostic. Closed-shell UHF collapses to RHF at 1e-14.
- Pulay DIIS convergence accelerator. On H₂O we measured 52 iterations (with damping 0.5) collapsing to 9 iterations (with DIIS). Per-spin DIIS for UHF.
- SAD initial guess (Superposition of Atomic Densities) that runs a fractional-occupation atomic SCF per unique element. This was added specifically after UHF hit a wrong local minimum on OH in 6-31G*. The fix was immediate.
- Analytic nuclear gradients for both RHF and UHF via the Pople-Binkley formula with an energy-weighted density matrix. Verified against PySCF at 1e-8 Ha/bohr.
Density Functional Theory
- Numerical integration grid: Treutler-Ahlrichs M4 radial (Chebyshev-2nd-kind nodes) combined with a Gauss-Legendre-in-cos(θ) and uniform-φ angular product, partitioned across atoms by Becke fuzzy cells with the iterated switch function. The default medium grid integrates the H-1s density to 1e-10.
- AO evaluation on the grid: both χ and ∇χ at every grid point, supporting Cartesian or pure spherical shells. Analytic AO gradients verified against finite differences at 1e-10.
- libxc wrapper accepting alias names (“LDA”, “PBE”, “B3LYP”) or explicit comma-separated XC_… IDs, with hybrid fractions detected automatically.
- RKS SCF driver with full energy decomposition (E_core, E_J, α·E_K, E_xc, E_nuc). Matches PySCF to 5e-11 Ha on LDA and to grid accuracy (~1e-6 Ha) on B3LYP and PBE.
ASE integration
This was the decision I am most happy about in retrospect. Rather than write a custom input parser, we wired the code into ASE (the Atomic Simulation Environment) as a proper Calculator subclass. That gave us, for free:
- Input readers for XYZ, CIF, PDB, POSCAR, Gaussian inputs, and basically every format chemists actually use.
- Geometry optimization via
ase.optimize.BFGS. H₂O in HF/STO-3G with a distorted starting geometry converges in 7 BFGS steps to r(OH) = 0.989 Å and angle HOH = 100.0°, exactly the known literature value. - Implicit method routing: HF with multiplicity 1 goes to RHF with forces, HF with multiplicity > 1 goes to UHF with forces, DFT with multiplicity 1 goes to RKS (energy-only at day’s end).
- Clean unit handling via
ase.units.Bohrandase.units.Hartree.
Basis set library
90 standard Gaussian .g94 basis sets shipped by libint are exposed automatically (STO-3G through aug-cc-pVQZ, the def2 family, plus the JKFIT, JFIT, and CABS auxiliary sets). On top of that there is a basis_library/custom/ directory where a user can drop their own .g94 files. A setup script assembles the union, with custom files overriding standard ones by name. This path is ready for dropping in my own pob-TZVP and pob-TZVP-D basis sets (the ones I developed for solid-state calculations) once I dig up the files.
Tests
131 tests across 13 files, full suite runs in about 10 seconds. Every numerical feature is cross-checked against PySCF as the reference. A representative slice of what actually passes:
- H₂ / STO-3G at R = 1.4 bohr: E_HF = −1.116714325063 Ha, difference from PySCF = −1.87e-14.
- H₂O / STO-3G at experimental geometry: E_HF = −74.964453863067 Ha, difference from PySCF = −2.84e-14.
- OH doublet / STO-3G UHF: <S²> = 0.7533 (ideal 0.75, about 0.3% spin contamination).
- H₂O / STO-3G / LDA: 5e-11 Ha vs PySCF.
- H₂O / STO-3G / B3LYP: 1.3e-7 Ha (grid-accuracy limited).
- CH₄ / 6-31G* / PBE: under 3e-6 Ha.
Documentation
A README with quick-start examples for both macOS and Linux, a detailed installation doc covering Homebrew, Debian/Ubuntu, Fedora/RHEL, and Conda paths, a quickstart tour, and a basis-library README. All written by the same agent, in one day, alongside the code.
What I deliberately did not add yet
UKS (unrestricted Kohn-Sham), analytic DFT gradients, ROHF, OpenMP inside the Fock and ERI-gradient loops, MPI for multi-node, post-HF correlation methods like MP2 and CCSD, and of course the actual cyclic cluster model. Every foundation for CCM is now in place: libint with periodicity hooks available, ASE’s CIF reader already wired in, and a validated HF/DFT SCF stack on finite molecules that I trust.
How I steered it
If you are thinking about trying this yourself, the interesting part was not the code, it was the scoping. A few patterns I noticed in my own behavior over the day:
Anticipatory infrastructure. Early in the morning, when the code was still just producing one-electron integrals, I made the decision to build libint from source with first-order derivatives enabled. Homebrew’s libint would have been faster to set up, but it did not include the derivative integrals. I knew we would need gradients for geometry optimization later, and I did not want to tear down and rebuild the install halfway through the day. That single 10-minute decision paid off twice: once for HF gradients, once for UHF gradients.
Validation before features. After RHF produced a plausible number, I pushed for formal pytest coverage against PySCF before we added DIIS. That caught two real bugs immediately: a Cartesian-vs-spherical d-orbital basis-purity issue, and a last-iteration MO self-consistency bug. Both were silent errors. Neither would have shown up if I had just eyeballed the total energy.
End-user surface before internals. We wired in the ASE Calculator and the logging integration before adding forces, before DFT, before SAD. The moment I caught the calculator running silently without emitting any trace, I paused the next phase and inserted a logging phase. It is much easier to debug a 50-iteration SCF when you can see the DIIS error norm falling.
Realistic descoping. My opening plan was “UHF and ROHF, then DFT.” By mid-afternoon it was “UHF only, ROHF can wait.” On the DFT grid, I accepted a Gauss-Legendre product angular grid instead of a proper Lebedev grid for the MVP. These were not compromises on correctness, they were compromises on sophistication, and they let us ship a working DFT implementation the same day.
Commit cadence. Every phase got its own focused commit, with a descriptive message, passing the full test suite. No force-pushes, no amends. 16 clean revertable checkpoints.
Standing on shoulders
None of this would have been possible in a day if the hard parts had not already been solved by other people. The integral engine is libint (Ed Valeev), the 500+ DFT functionals come from libxc (Susi Lehtola and co.), linear algebra is Eigen, the Python bindings are pybind11, the user-facing surface is ASE (Larsen et al.), and every single number in the test suite is cross-checked against PySCF (Qiming Sun et al.) as the reference oracle. On the methods side we are implementing formulas from Pulay (DIIS), Pople, Krishnan, Schlegel and Binkley (analytic HF gradients), Treutler and Ahlrichs (radial grid), and Becke (atomic partitioning). Full citations and a guided tour of the stack are in the companion post.
Watching it work was the best part
I need to say this plainly: I had a blast. I spent the day with Anthropic’s Claude Opus 4.7 as a pair-programmer, and watching the model work through this problem was genuinely thrilling. Every time I came back from a coffee or a meeting, there were new commits, clean ones, with descriptive messages, and a test suite that had grown and still passed. The SCF converged on H₂O. Then DIIS landed and the iteration count dropped from 52 to 9. Then gradients showed up and BFGS walked the water molecule to its equilibrium geometry. Then DFT arrived and B3LYP matched PySCF to grid accuracy. Each of those moments felt like a little gift.
What took me multiple years during my PhD, Opus 4.7 produced in a day. Not because vibe-qc is AICCM, it is not. AICCM implements real periodic CCM with four-center integrals at Wigner-Seitz boundaries, which is a genuinely hard problem and is not in vibe-qc yet. But the molecular HF + DFT + gradients + ASE integration layer, which in a traditional PhD timeline is itself a one- or two-year project, collapsed into a single afternoon. That is a kind of leverage I have never experienced before, and I am grinning about it.
The PhD was not wasted, quite the opposite. The reason I could steer this build effectively, and the reason I knew to demand a SAD guess the moment UHF stalled, and the reason I caught the lithium-on-RHF mistake instantly, is because I spent those years learning quantum chemistry from the inside. The expertise did not become obsolete. The expertise became the steering wheel, and it turns out steering is a deeply satisfying way to use what you know.
The really exciting part is what this unlocks. Ideas that used to be “interesting but I cannot justify six months to try it” suddenly become “let us see what happens this weekend.” The research questions I filed away after my PhD, the ones I always meant to come back to, are now within reach in a way they were not last year. That is a good day.
The next step, when I find another day, is the cyclic cluster model. That is the part where I find out whether the model can actually extend beyond well-trodden textbook ground into the corner of the literature that I personally carved out a decade ago. I am genuinely curious. I will report back.
The code lives at vibe-qc.com. If you want to reproduce the experiment yourself, the recipe is: a blank repo, a clear scoping brief, and the discipline to let the model cook.















