Version 3.1 Refactor #483

Open
pprcht wants to merge 416 commits into crest-lab:master from pprcht:experimental

@pprcht

pprcht commented Jun 8, 2026

Contributor

Major code refactor into Version 3.1.0

Changes collected over the past year.
Some important additions:

1. g-xTB via tblite (see #430)

The g-xTB method (a GFN-family method, https://chemrxiv.org/doi/10.26434/chemrxiv-2025-bjxvt) can now be invoked through the provided xtb binary or the tblite library in addition to the existing xtb system-call path.

Key changes in tblite_api.F90:

  • New calculator level xtblvl%gxtb mapped to new_gxtb_calculator.
  • Fermi temperature forced to 0 K for g-xTB (the method requires it; the previous default of 300 K gave wrong energies).
  • Numerical gradient internally delegated to g-xTB's own implementation to avoid repeated calculator reinitialization overhead.

Warning

The default build only supports g-xTB through the static xtb binary (retrievable here: https://github.com/grimme-lab/g-xtb). This is SLOW due to the system-call infrastructure. The tblite build with g-xTB is already prepared and will speed this up, but it will only become the default once g-xTB is released in tblite.
Build flag for the latter: -DWITH_GXTB=ON (requires WITH_TBLITE). A compile-time constant have_gxtb lets the parser emit a helpful error when the method is requested but not compiled in.

A STATIC CREST BINARY WITH TBLITE/G-XTB IS AVAILABLE UPON REQUEST.


2. Permutation-Invariant RMSD (iRMSD) (https://doi.org/10.1021/acs.jcim.4c02143)

A new irmsd_module (src/sorting/irmsd_module.f90) replaces the old fixed-order RMSD with an algorithm that finds the optimal atom permutation before computing the distance. Two assignment solvers are provided: a classic Hungarian algorithm and a Linear Sum Assignment Problem (LSAP) alternative. Atom ranks are derived from element identity and local connectivity so that the search only permutes chemically equivalent atoms, keeping the cost manageable.

A cache type (rmsd_cache, rmsd_core_cache) avoids repeated heap allocations in tight OpenMP loops. Optional topology-proxy checks can enforce stereochemical constraints during the permutation search.
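The idea of restricting the permutation search to equivalent atoms can be illustrated with a small, standard-library-only Python sketch. It is not the irmsd_module algorithm: brute-force enumeration stands in for the Hungarian/LSAP solvers, no rotational alignment is performed, and the function names and the integer ranks are hypothetical placeholders for the element/connectivity ranks described above.

```python
import itertools
import math


def plain_rmsd(ref, mol):
    """Fixed-order RMSD between two equally long lists of (x, y, z) tuples."""
    sq = sum((p - q) ** 2 for a, b in zip(ref, mol) for p, q in zip(a, b))
    return math.sqrt(sq / len(ref))


def permutation_invariant_rmsd(ref, mol, ranks):
    """Minimise the RMSD over permutations that only swap atoms of equal rank."""
    # Collect the atom indices that share a rank (assumed chemically equivalent).
    groups = {}
    for idx, rank in enumerate(ranks):
        groups.setdefault(rank, []).append(idx)
    blocks = list(groups.values())

    best_val, best_perm = math.inf, None
    per_block = (itertools.permutations(block) for block in blocks)
    # Brute force: every combination of within-block permutations.
    for choice in itertools.product(*per_block):
        perm = list(range(len(ranks)))
        for block, shuffled in zip(blocks, choice):
            for slot, atom in zip(block, shuffled):
                perm[slot] = atom
        val = plain_rmsd(ref, [mol[j] for j in perm])
        if val < best_val:
            best_val, best_perm = val, perm
    return best_val, best_perm


# Same geometry, but the two rank-1 atoms are listed in swapped order.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
mol = [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ranks = [0, 1, 1]
print(plain_rmsd(ref, mol))
print(permutation_invariant_rmsd(ref, mol, ranks))
```

The fixed-order RMSD is non-zero for this pair, while the rank-restricted search recovers a distance of zero. The brute-force loop scales factorially with the block size, which is why an assignment solver is the practical choice for real molecules.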

Practical impact:

  • CREGEN now exports iRMSD-based sorting (cregen_irmsd_sort) as an additional conformer-uniqueness criterion.
  • A standalone --sort irmsd run-type is exposed through the CLI and writes aligned structures to irmsd.xyz.
  • The --inversion flag controls whether mirror images are treated as identical.
  • Consistent with the standalone Python tool https://github.com/pprcht/irmsd, but exposed to OpenMP parallelism for processing ensembles.

3. Machine Learning Potential (MLIP) Interface via fmlip-relay

A new calculator back-end (src/calculator/mlip_sc.F90, module mlip_sc) calls Python-based MLIPs through fmlip-relay, a persistent relay server that keeps the Python interpreter alive between single-point calls, avoiding repeated interpreter start-up overhead.

Supported backends are whatever fmlip-relay exposes (currently MACE-OFF, UMA, and compatible ASE-wrapped models). Multiple relay server instances can run on different ports for parallel execution, with a configurable thread limit per instance.

Build flag: -DWITH_FMLIP_RELAY=ON (CMake) / -Dfmlip_relay=enabled (Meson). The code compiles cleanly without the dependency; all MLIP paths are guarded by #ifdef WITH_FMLIP_RELAY.

TOML keywords under [[calculation.level]]:

method    = "mace-off"
modelsize = "medium"

or with the CLI

--mlip mace-off/uma

Important

This utility requires a local install of the fmlip-relay-server, which is shipped in CREST's subprojects and can be installed with pip install -e ./subprojects/fmlip-relay.
The persistent Python socket adds some overhead and may oversubscribe threads in parallel MD/OPT/etc. loops. The recommended use case is therefore the --refine option, re-ranking force-field-optimized geometries with MLIP single-point energies.


5. Spin-Polarized Calculations (see #480)

GFN2-xTB and related methods can now be run in open-shell (spin-polarized) mode via tblite.

  • New spin_polarized field on calculation_settings; set via TOML keyword spin_polarized = true or through uhf > 0.
  • The tblite API allocates a spin_polarization object and calls get_spin_constants from the tblite library when this flag is active.
  • nspin is set to 2 instead of 1, which activates the spin-unrestricted solver path in tblite.
  • A unit test (test/test_tblite.F90) validates energy and gradient for an open-shell test molecule.

6. External Electric Field for tblite (see #449)

A static external electric field can be applied to tblite single-points and optimizations.

TOML:

[[calculation.level]]
method = "gfn2"
efield = [0.0, 0.0, 0.05]   # in V/Å, x/y/z components

or via the CLI:

-efield <x> <y> <z>

Handled by the new public routine tblite_add_efield in tblite_api.F90. The field is converted to atomic units before being passed to the tblite context.
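For orientation, the unit conversion can be sketched in a few lines of Python. This is not the tblite_add_efield routine. The helper name is hypothetical, and the only physics assumed is the CODATA 2018 value of the atomic unit of electric field (5.14220674763e11 V/m, i.e. about 51.42 V/Å).

```python
# One atomic unit of electric field in V/Angstrom (CODATA 2018: 5.14220674763e11 V/m).
AU_FIELD_IN_V_PER_ANGSTROM = 51.4220674763


def efield_to_atomic_units(field_v_per_angstrom):
    """Convert an (x, y, z) field vector from V/Angstrom to atomic units (Eh / (e * a0))."""
    return [comp / AU_FIELD_IN_V_PER_ANGSTROM for comp in field_v_per_angstrom]


# The vector from the TOML example above.
field_au = efield_to_atomic_units([0.0, 0.0, 0.05])
print(field_au)
```

The 0.05 V/Å field from the TOML example corresponds to a little under 1e-3 atomic units along z.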


7. Extended XYZ (extxyz) Format Support

A complete reader/writer for the extended XYZ (ASE-compatible) format is added to the molecule I/O layer (src/molecule/io.f90):

  • read_extxyz_frame parses per-atom properties and key-value pairs from the comment line (energy, forces, lattice, …).
  • The mol%write() polymorphic dispatch now routes .extxyz file extensions to the extended writer.
  • Energy and force units are tracked and converted to CREST internal units (Hartree/Bohr) on read, and written back in the original units on write, indicated by a dedicated energy_units key in the extxyz file.
  • Ensemble readers (type_ensemble) now accept extxyz files for batch input to conformer workflows.
  • Trajectory output from refinement steps and crestopt.log.xyz can be written in extxyz format, enabling direct round-trip with ASE or MACE pipelines.
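As an illustration of what the comment-line parsing involves, here is a minimal Python sketch of an extxyz header parser. It is not a port of read_extxyz_frame: the function names are hypothetical, the sample line is made up, and only the common `key=value` / `key="quoted value"` forms and the `name:type:ncols` Properties triples are handled.

```python
import shlex


def parse_extxyz_comment(line):
    """Split an extxyz comment line into a {key: value} dict of strings.

    Bare keys without '=' are stored as True.
    """
    info = {}
    # shlex keeps quoted values such as Lattice="..." together as one token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        info[key] = value if sep else True
    return info


def parse_properties(spec):
    """Turn 'species:S:1:pos:R:3' into [(name, type, ncols), ...]."""
    fields = spec.split(":")
    return [(fields[i], fields[i + 1], int(fields[i + 2]))
            for i in range(0, len(fields), 3)]


# Made-up example header; energy_units is the key mentioned in the list above.
comment = ('Lattice="10.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 10.0" '
           'Properties=species:S:1:pos:R:3:forces:R:3 '
           'energy=-5.07 energy_units=eV')
info = parse_extxyz_comment(comment)
print(info["energy"], info["energy_units"])
print(parse_properties(info["Properties"]))
```

Values are kept as strings here. A real reader would convert them to numbers and apply the unit conversion to Hartree/Bohr at this point.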

8. CREGEN Refactor and Improved Output

The conformer ranking/filtering code underwent a substantial internal reorganization:

  • All CREGEN routines are now collected under src/sorting/ as a proper module directory (cregen.f90, cregen_utils.f90, cregen_interfaces.f90, ensemblecomp.f90, etc.).
  • The newcregen driver logic is simplified; the old monolithic cregen_old.f90 is retained only as a legacy shim.
  • Reconstruction of incomplete ensemble queues is parallelized.
  • Pre-sorting of the queue before reconstruction avoids redundant comparisons.
  • A running 50% cumulative Boltzmann population line is printed during the conformer list output, giving a quick overview of how many structures make up the dominant half of the ensemble.
  • Printout reformatted throughout: column widths, energy/RMSD fields, and group labels are cleaner and consistent.
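The 50% cumulative population marker can be reproduced outside of CREST with a short Python sketch. It assumes relative energies in kcal/mol, no degeneracy factors, and a temperature of 298.15 K. The function names are hypothetical, and the CREGEN printout may weight structures differently.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872041  # kcal / (mol K)


def boltzmann_populations(rel_energies_kcal, temperature=298.15):
    """Normalised Boltzmann populations from relative energies in kcal/mol."""
    kt = GAS_CONSTANT_KCAL * temperature
    e_min = min(rel_energies_kcal)
    weights = [math.exp(-(e - e_min) / kt) for e in rel_energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]


def structures_for_half_population(rel_energies_kcal, temperature=298.15):
    """Number of most-populated structures needed to reach a cumulative 50%."""
    pops = sorted(boltzmann_populations(rel_energies_kcal, temperature), reverse=True)
    running = 0.0
    for count, pop in enumerate(pops, start=1):
        running += pop
        if running >= 0.5:
            return count
    return len(pops)


energies = [0.0, 1.0, 2.0]  # kcal/mol, made-up example ensemble
print(boltzmann_populations(energies))
print(structures_for_half_population(energies))
```

For the made-up three-conformer ensemble the lowest structure alone already carries more than half of the population, so the marker would sit directly after the first entry.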

9. Restart / Checkpoint System (Redesigned) (see, e.g. #440)

The previous restart system stored full ensemble snapshots in memory, which was fragile and memory-intensive. The new implementation (src/restartlog.f90) is a lightweight file-based checkpoint:

  • A small restart_data type records only the stage name (a short string like 'mtd_loop', 'post_collect', 'entropy_smtd') and the last written file path.
  • Checkpoints are written to crest.restart (a plain text file) at each stage boundary.
  • On startup with --restart, CREST reads the checkpoint, prints what was completed, and skips the already-finished stages, re-using the last dumped ensemble file.
  • Integrated into the iMTD-GC conformer search (search_conformers.f90) and the entropy search (search_entropy.f90).
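A stage-based, file-backed checkpoint of this kind can be sketched in Python as follows. The on-disk layout of crest.restart is not specified in this PR, so the two-line `key = value` format, the function names, the stage ordering, and the ensemble file name are all assumptions made for illustration. Only the stage names themselves are taken from the list above.

```python
import os
import tempfile

# Stage names as quoted in the PR text; treating them as one ordered workflow is an assumption.
STAGES = ["mtd_loop", "post_collect", "entropy_smtd"]


def write_checkpoint(path, stage, last_file):
    """Record the finished stage and the last dumped ensemble file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write(f"stage = {stage}\nlast_file = {last_file}\n")
    os.replace(tmp, path)  # atomic rename, so a crash never leaves a half-written file


def read_checkpoint(path):
    """Return the checkpoint as a dict, or None if no checkpoint exists."""
    if not os.path.exists(path):
        return None
    data = {}
    with open(path) as fh:
        for line in fh:
            key, sep, value = line.partition("=")
            if sep:
                data[key.strip()] = value.strip()
    return data


def remaining_stages(checkpoint, stages=STAGES):
    """Stages still to run after the one recorded in the checkpoint."""
    if checkpoint is None:
        return list(stages)
    done = stages.index(checkpoint["stage"])
    return list(stages[done + 1:])


with tempfile.TemporaryDirectory() as workdir:
    restart_file = os.path.join(workdir, "crest.restart")
    write_checkpoint(restart_file, "mtd_loop", "ensemble.xyz")
    print(remaining_stages(read_checkpoint(restart_file)))
```

Because only a stage label and a file path are stored, the checkpoint stays tiny regardless of ensemble size, which is the main contrast to keeping full snapshots in memory.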

10. Symmetry Detection: C → Fortran Port

The symmetry detection back-end (symmetry_i), originally Patchkovskii's C code from 1996/2003, has been ported to a native Fortran module (src/symmetry_i.f90, ~1800 lines). The C source wrote to shared memory, making OpenMP parallel execution of the symmetry detection impossible.

The Fortran module exposes schoenflies and getsym with an identical interface to the C version but benefits from Fortran's dynamic allocation and avoids the need for a C-to-Fortran bridge in the calling code.

A dedicated test (test/test_getsym.F90) verifies the full set of common point groups (C₁, Cₛ, Cᵢ, Cₙ, Cₙᵥ, Cₙₕ, Dₙ, Dₙₕ, Dₙd, Sₙ, T, Tₕ, Td, O, Oₕ, I, Iₕ).


11. Thermochemistry Enhancements

Truhlar quasi-RRHO treatment

The Truhlar (2011) frequency cutoff model is now available as an alternative to the Grimme quasi-RRHO approach. Vibrational modes with frequencies below sthr (cm⁻¹) are raised to the cutoff value, so that their entropy contribution is replaced by that of a harmonic oscillator at the cutoff frequency. This avoids the divergence of the harmonic entropy at zero frequency. Selectable via the TOML keyword emodel.
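A minimal Python sketch of a frequency-cutoff entropy is given here for illustration. It assumes the commonly used form of the cutoff model, in which every frequency below the threshold is raised to the threshold before the standard harmonic-oscillator entropy is evaluated. The function names and the default cutoff of 100 cm⁻¹ are assumptions, not values read from CREST.

```python
import math

H = 6.62607015e-34      # Planck constant, J s
C_CM = 2.99792458e10    # speed of light, cm/s
KB = 1.380649e-23       # Boltzmann constant, J/K
R = 8.314462618         # gas constant, J/(mol K)


def harmonic_entropy(freq_cm, temperature=298.15):
    """Harmonic-oscillator vibrational entropy in J/(mol K) for one mode."""
    x = H * C_CM * freq_cm / (KB * temperature)
    # S/R = x / (e^x - 1) - ln(1 - e^-x); diverges as freq_cm -> 0
    return R * (x / math.expm1(x) - math.log(-math.expm1(-x)))


def cutoff_entropy(freqs_cm, cutoff_cm=100.0, temperature=298.15):
    """Total vibrational entropy with low frequencies raised to the cutoff."""
    return sum(harmonic_entropy(max(f, cutoff_cm), temperature) for f in freqs_cm)


freqs = [12.0, 45.0, 310.0, 1650.0]  # made-up frequencies in cm^-1
print(sum(harmonic_entropy(f) for f in freqs))  # plain harmonic treatment
print(cutoff_entropy(freqs))                    # with the frequency cutoff
```

The two low modes dominate the plain harmonic sum. With the cutoff they contribute no more than a mode at the threshold would, which keeps the total bounded for very soft modes.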

ORCA Hessian reader

A new reader (rdfreq_orca_hess) parses ORCA .hess files directly, extracting frequencies and optionally mass-weighted Hessian data for use in the thermochemistry workflow. This supplements the existing ORCA IR-spectrum reader.

Parallel ΔG calculation

The per-conformer free energy correction (needed in refinement workflows) is now computed in parallel OpenMP sections rather than serially (made possible by the Fortran port of the symmetry detection, see section 10).


12. Hybrid Method CLI Syntax

A new parser module (src/parsing/parse_hybrid.f90) recognizes composite method strings on the command line:

| Syntax | Meaning |
| --- | --- |
| A@B | Workhorse B for sampling; quality method A for final single-points |
| A//B | Inline single-point refinement at level A during B-driven search |
| A/sp/B | Explicit single-point inline variant |
| A/opt/B | Inline geometry re-optimization at level A |

Example: crest mol.xyz --gfn2@gfnff --imtdgc runs the MD/MTD at GFN-FF speed and refines the final structures with GFN2-xTB. Likewise, crest mol.xyz --gxtb//gfnff --imtdgc runs the sampling at GFN-FF and re-ranks all structures with g-xTB single-points during the sampling.

Method tokens are validated against the known method list; an unknown token triggers an early, informative error rather than a silent mis-configuration.
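The splitting and validation logic can be sketched in Python as below. This is only an illustration of the syntax table, not a translation of parse_hybrid.f90: the function name, the returned dictionary keys, the mode labels, and the small KNOWN_METHODS set are all hypothetical.

```python
# Illustrative subset only; the real method list lives in the CREST parser.
KNOWN_METHODS = {"gfn1", "gfn2", "gfnff", "gxtb"}

# Longer separators first, so "/opt/" and "/sp/" are recognised before "//".
SEPARATORS = [
    ("/opt/", "inline_opt"),
    ("/sp/", "inline_sp"),
    ("//", "inline_sp"),
    ("@", "final_sp"),
]


def parse_hybrid(token):
    """Split 'A<sep>B' into quality method A, workhorse B and a mode label."""
    for sep, mode in SEPARATORS:
        if sep in token:
            quality, workhorse = token.split(sep, 1)
            break
    else:
        raise ValueError(f"no hybrid separator found in '{token}'")

    # Fail early on unknown method names instead of mis-configuring silently.
    for name in (quality, workhorse):
        if name not in KNOWN_METHODS:
            raise ValueError(f"unknown method '{name}' in '{token}'")
    return {"quality": quality, "workhorse": workhorse, "mode": mode}


print(parse_hybrid("gfn2@gfnff"))
print(parse_hybrid("gxtb//gfnff"))
```

Validating both tokens before returning mirrors the early-error behaviour described above: a misspelled method never reaches the calculator setup.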


13. CLI and TOML Keyword Additions

| Keyword / Flag | Effect |
| --- | --- |
| --freeze / freeze | Freeze selected atoms (ported from TOML-only to CLI) |
| --inversion | Allow/disallow mirror-image matching in iRMSD comparisons |
| --itmdgc | Alternative alias for --v3 / iMTD-GC run mode |
| --allowrestart | Enable checkpoint-based restart for conformer searches |
| --cregen / --sort | Rewired: --sort irmsd invokes the iRMSD-aware sorter |
| spin_polarized (TOML) | Open-shell spin-polarized tblite calculation |
| efield (TOML) | External electric field vector for tblite |
| ceh_guess (TOML/CLI) | Use CEH charges as initial guess for GFN-FF / tblite |
| gxtb / g-xtb (TOML) | Select g-xTB via tblite or via xtb system call |

The CLI argument parser was refactored to use a processedarg boolean array, preventing double-processing of arguments and giving cleaner error messages for unrecognized flags.


14. QCG (Quantum Cluster Growth) Refactor

The QCG solvation tool was refactored to use the internal calculator layer:

  • The legacy zmolecule type is replaced by a new polymorphic coord_qcg type that extends the standard coord.
  • xtb_sp_qcg and xtb_opt_qcg now dispatch through the standard engrad calculator interface rather than writing xtb input files by hand.
  • The deprecated xtbiff QCG implementation is removed.
  • Internal module structure reorganized into qcg_main.f90, qcg_utils.f90, qcg_misc.f90, qcg_printouts.f90, and qcg_coord_type.f90.
  • QCG frequency calculations (qcg_freq) are now wired through the same thermochem module used by the rest of CREST.

Warning

QCG still requires xtb (>= version 6.7.1) for access to the aISS method.


15. MD / Thermostat Extensions

Two new thermostat algorithms are added to the MD module (src/dynamics/dynamics_module.f90) alongside the existing Berendsen thermostat:

  • Langevin thermostat: stochastic collision model, adds a friction force and a random noise force to each atom at each step. Appropriate for systems where coupling to an implicit solvent bath is desired.
  • Bussi–Donadio–Parrinello (BDP/CSVR) thermostat: stochastic velocity rescaling that preserves the correct canonical ensemble distribution (unlike Berendsen). The kinetic energy is drawn from a chi-squared distribution at each step.

The thermostat type is selected by integer index (thermotype_i): 1 = Berendsen, 2 = Langevin, 3 = BDP/CSVR. The velocity Verlet integrator was also cleaned up to ensure correct ordering of force, velocity, and thermostat steps.
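To make the difference between a deterministic and a stochastic thermostat concrete, here is a small Python sketch in reduced units. It shows generic textbook forms, not the code in dynamics_module.f90: the Berendsen velocity scaling factor and the friction-plus-noise (Ornstein-Uhlenbeck) part of a Langevin step. Function names and parameter values are assumptions, and the BDP/CSVR scheme is not shown.

```python
import math
import random


def berendsen_scale(t_inst, t_target, dt, tau):
    """Berendsen velocity scaling factor: lambda^2 = 1 + (dt/tau) * (T0/T - 1)."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_inst - 1.0))


def langevin_velocity_update(v, mass, kt, gamma, dt, rng):
    """Friction + random-noise update of one velocity component."""
    c1 = math.exp(-gamma * dt)                      # friction damping over dt
    sigma = math.sqrt((1.0 - c1 * c1) * kt / mass)  # noise amplitude matching kT
    return c1 * v + sigma * rng.gauss(0.0, 1.0)


# Thermostat-only dynamics of a single degree of freedom (no forces).
rng = random.Random(0)
v, v2_sum, nsteps = 0.0, 0.0, 20000
for _ in range(nsteps):
    v = langevin_velocity_update(v, mass=1.0, kt=1.0, gamma=0.5, dt=1.0, rng=rng)
    v2_sum += v * v
print("Berendsen factor when too cold:", berendsen_scale(150.0, 300.0, 1.0, 10.0))
print("mean m*v^2 under Langevin noise:", v2_sum / nsteps)
```

The Langevin update samples velocities around the target kT by construction, whereas the Berendsen factor only relaxes the instantaneous temperature toward the target and does not reproduce canonical fluctuations.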


16. Peak Memory Reporting

A new C helper (src/chelpers/mempeak.c) provides a cross-platform get_peak_rss_kb() function that returns the process peak resident set size in KB. Implemented via getrusage on Linux/macOS (with the macOS bytes→KB conversion). Called at program end to include peak RAM usage in the final timing/resource printout.
NOTE: The printout may be slightly flawed for fully static CREST builds.
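The same getrusage-based query, including the platform-dependent unit handling, can be tried from Python's standard library. This is only an analogue of the C helper for illustration: the function name is reused from the text, and the sketch works on Unix-like systems only, since the resource module is not available on Windows.

```python
import resource
import sys


def get_peak_rss_kb():
    """Peak resident set size of the current process in KB (Linux/macOS)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        peak //= 1024
    return peak


print(f"peak RSS: {get_peak_rss_kb()} KB")
```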


17. Output File Naming

Two output files are renamed to carry explicit file extensions:

| Old name | New name |
| --- | --- |
| crestopt.log | crestopt.log.xyz |
| crest_dynamics.trj | crest_dynamics.trj.xyz |

This makes the files immediately recognizable to external viewers and conforms with the convention established for all other trajectory/ensemble outputs.


18. Build System Updates

Meson build system

  • Full Meson build support is added/restored, including static builds with GNU and Intel LLVM (ifx/icx) compilers.
  • meson_options.txt is restructured with documented options for OpenMP, LAPACK/BLAS provider (auto, openblas, mkl, netlib, custom), and optional feature flags.
  • An intel-llvm.ini native file is provided for reproducible Intel oneAPI builds.

CMake

  • New Find*.cmake modules for all tblite-stack dependencies: mctc-lib, mstore, multicharge, s-dftd3, test-drive.
  • New build options: WITH_GXTB (requires WITH_TBLITE), WITH_FMLIP_RELAY.

New git submodules (tblite dependency chain)

mctc-lib, mstore, multicharge, s-dftd3, test-drive are added as submodules so tblite can be built from source.

Updated submodule versions

| Submodule | Change |
| --- | --- |
| tblite | → 0.5.0. NOTE: This version does NOT implement g-xTB! A tblite-gxtb build is available ON REQUEST |
| multicharge | → 0.5.0 |
| gfnff | → 0.1.1 |
| dftd4 | updated |
| toml-f | updated |

Important

If you update and build the CREST source locally, do not forget to update the submodules accordingly! The recommended way is not to let Meson/CMake handle that on their own but to use the git submodule machinery instead. However, all should point to the same specific commits.


19. Post-Search Ensemble Re-ranking and Re-optimization

Two new standalone post-processing flags allow applying a higher-level method to a finished conformer ensemble without re-running the full search:

  • --rerank <method>: recomputes single-point energies at the specified level and re-sorts the ensemble by the new energies (p_prop_multilevel+2). The level string follows the same syntax as --method.
  • --reopt <method>: re-optimizes every conformer at the specified level and then re-sorts (p_prop_multilevel+3).

Both flags add a job to the property queue (env%addjob) so they compose cleanly with other post-search steps (e.g. --finalhess).


20. Ensemble Hessian and Final Hessian Flags

Two new run-mode / property flags provide Hessian-based thermochemistry over conformer ensembles:

  • --ensemblehess <file> (alias --mdhess): sets the run type to crest_ensemblehess, reads the specified ensemble file, and computes a numerical Hessian plus thermochemistry for each conformer. The lowest-energy structure is extracted as the reference geometry.
  • --finalhess: adds a Hessian + free-energy re-ranking step (p_prop_finalhess) to the property queue, intended as a post-conformer-search refinement. Each conformer's Boltzmann weight is updated using proper free energies rather than just electronic energies.

21. Protonate / Deprotonate / Tautomerize: New Protocol Active by Default

The redesigned protonation, deprotonation, and tautomerization protocols (previously gated behind an experimental env%legacy = .true. flag) are now the default. The legacy code path is no longer invoked for --protonate, --deprotonate, and --tautomerize.


22. Revision of the shipped examples

Examples in examples/expl-<#>/ now include:

| # | Topic | Molecule |
| --- | --- | --- |
| 0 | Dry run: print settings without computing | 1-propanol |
| 1 | Single-point energy | 1-propanol |
| 2 | Geometry optimization | 1-propanol |
| 3 | Optimization + Hessian (vibrational frequencies) | 1-propanol |
| 4 | Standalone MD simulation | 1-propanol |
| 5 | Default iMTD-GC conformer search | 1-propanol |
| 6 | Two-level conformer search (GFN2//GFN-FF, A//B) | 1-propanol |
| 7 | iMTD-GC with ALPB implicit solvation (GFN2) | 1-propanol |
| 8 | Quick iMTD-GC conformer search (with -finalhess) | 1-propanol |
| 9 | Standalone CREGEN ensemble sorting | 1-propanol |
| 10 | Constrained conformer search | 1-propanol |
| 11 | Ensemble optimization (mdopt) | 1-propanol |
| 12 | NCI sampling mode (iMTD-NCI) | water trimer |
| 13 | Protonation site sampling | uracil |
| 14 | Metal/ion adducts (Cs+) | alpha-D-glucose |
| 15 | Tautomer screening | guanine |
| 16 | fmlip-relay: geometry optimisation with LJ potential (needs an fmlip-relay install) | Ar4 cluster |
| 17 | fmlip-relay: geometry optimisation with FairChem UMA model (needs an fmlip-relay and uma/fairchem install) | caffeine |
| 18 | fmlip-relay: geometry optimisation with MACE-OFF23 model (needs an fmlip-relay and mace install) | caffeine |

@Growl1234


Really excited to see the refactor and the upcoming version update!

That said, there's a small issue: with the CMake flag WITH_TESTS=OFF, CMake will still try to download test-drive even if it exists in the subprojects/ directory:

-- Could NOT find test-drive (missing: test-drive_DIR)
-- Retrieving test-drive from https://github.com/fortran-lang/test-drive

@pprcht

pprcht commented Jun 12, 2026

Contributor Author

> Really excited to see the refactor and the upcoming version update!
>
> That said, there's a small issue: with the CMake flag WITH_TESTS=OFF, CMake will still try to download test-drive even if it exists in the subprojects/ directory:
>
> -- Could NOT find test-drive (missing: test-drive_DIR)
> -- Retrieving test-drive from https://github.com/fortran-lang/test-drive

yeah... that is on tblite's side unfortunately; it removed the option to turn off tests, so tests will always be built there. I don't want to build it from a modified fork though.

@pprcht

pprcht commented Jun 12, 2026

Contributor Author

There are also one or two other features I'm scoping out at the moment which I'd like to be in 3.1.0; holding off on the merge until (hopefully) the end of next week.

@Growl1234


> yeah... that is on tblite's side unfortunately; it removed the option to turn off tests, so tests will always be built there. I don't want to build it from a modified fork though.

I see tblite has a WITH_TESTS option: https://github.com/tblite/tblite/blob/5dc1c97858eb585111affc6627783800bafb87cc/config/CMakeLists.txt#L24

So I doubt that it is really tblite's issue.

Further testing shows that toml-f uses a more general option name, BUILD_TESTING, rather than WITH_TESTS, so maybe this is the reason... Maybe this can be fixed in Findtoml-f.cmake?

@Growl1234

Growl1234 commented Jun 12, 2026


> Further testing shows that toml-f uses a more general option name, BUILD_TESTING, rather than WITH_TESTS, so maybe this is the reason... Maybe this can be fixed in Findtoml-f.cmake?

I've found a simple way to fix this condition and will soon send a PR to the corresponding branch.

Edit: pprcht#2

@Growl1234


For tblite version 0.6.0 there won't be such an issue because a symlinking behavior is added and performed in the release tarball (tblite/tblite#316). So, tblite-0.6.0 is still not able to switch off BUILD_TESTING for toml-f, but an online download with existing subprojects won't be triggered.

@pprcht

pprcht commented Jun 13, 2026

Contributor Author

So the issue seems to be slightly different actually: the CMake import of the test-drive subproject was also coupled to the WITH_TESTS arg. I changed it accordingly so that test-drive is required even if no tests are built. In fact, some other subprojects do not guard their test suite with any WITH_TESTS or BUILD_TESTING option, so the issue would likely persist even after fixing it for tblite and toml-f.
The Meson build seems a bit more robust w.r.t. all this because all subprojects share the top-level subproject dir.

One more thing, while updating tblite to 0.6.0 I noticed that it now imports the ddX project. I'm promoting that to a CREST subproject as well and am currently cleaning up before pushing...
