Version 3.1 Refactor #483
Really excited to see the refactor and the upcoming version update! That said, there's a small issue: with CMake flag …
yeah... that is on …
There are also one or two other features I'm scoping out at the moment which I'd like to be in 3.1.0; holding off with the merge until (hopefully) end of next week.
I see tblite has … So I doubt that it is really tblite's issue. Further testing shows that …
I've found a simple way to fix this condition and will soon send a PR to the corresponding branch. Edit: pprcht#2
For tblite version 0.6.0 there won't be such an issue because a symlinking behavior is added and performed in the release tarball (tblite/tblite#316). So, tblite-0.6.0 is still not able to switch off …
So the issue seems to be slightly different actually; the CMake import of the test-drive subproject was also coupled to the … One more thing, while updating tblite to 0.6.0 I noticed that it now imports the ddX project. I'm promoting that to a CREST subproject as well and am currently cleaning up before pushing...
… hbond, all selectable via [[calculation.level]] as proxy solvation bolt-on to arbitrary calculators
Major code refactor into Version 3.1.0
Changes collected over the past year.
Some important additions:
1. g-xTB via tblite (see #430)
The g-xTB method (a GFN-family method, https://chemrxiv.org/doi/10.26434/chemrxiv-2025-bjxvt) can now be invoked through the provided xtb binary or the tblite library, in addition to the existing xtb system-call path.
Key changes in tblite_api.F90: xtblvl%gxtb mapped to new_gxtb_calculator.
Warning
The default build only supports the static xtb binary with g-xTB (retrievable here: https://github.com/grimme-lab/g-xtb). This is SLOW due to the system-call infrastructure. The tblite build with g-xTB is already prepared and will speed this up, but will only be made the default once g-xTB is released there.
Build flag for the latter: -DWITH_GXTB=ON (requires WITH_TBLITE). A compile-time constant have_gxtb lets the parser emit a helpful error when the method is requested but not compiled in.
A STATIC CREST BINARY WITH TBLITE/G-XTB IS AVAILABLE UPON REQUEST.
2. Permutation-Invariant RMSD (iRMSD) (https://doi.org/10.1021/acs.jcim.4c02143)
A new irmsd_module (src/sorting/irmsd_module.f90) replaces the old fixed-order RMSD with an algorithm that finds the optimal atom permutation before computing the distance. Two assignment solvers are provided: a classic Hungarian algorithm and a Linear Sum Assignment Problem (LSAP) alternative. Atom ranks are derived from element identity and local connectivity so that the search only permutes chemically equivalent atoms, keeping the cost manageable.
A cache type (rmsd_cache, rmsd_core_cache) avoids repeated heap allocations in tight OpenMP loops. Optional topology-proxy checks can enforce stereochemical constraints during the permutation search.
Practical impact:
- … cregen_irmsd_sort) as an additional conformer-uniqueness criterion
- --sort irmsd run-type is exposed through the CLI and writes aligned structures to irmsd.xyz.
- --inversion flag controls whether mirror images are treated as identical.
3. Machine Learning Potential (MLIP) Interface via fmlip-relay
A new calculator back-end (src/calculator/mlip_sc.F90, module mlip_sc) calls Python-based MLIPs through fmlip-relay, a persistent relay server that keeps the Python interpreter alive between single-point calls, avoiding repeated interpreter start-up overhead.
Supported backends are whatever fmlip-relay exposes (currently MACE-OFF, UMA, and compatible ASE-wrapped models). Multiple relay server instances can run on different ports for parallel execution, with a configurable thread limit per instance.
Build flag: -DWITH_FMLIP_RELAY=ON (CMake) / -Dfmlip_relay=enabled (Meson). The code compiles cleanly without the dependency; all MLIP paths are guarded by #ifdef WITH_FMLIP_RELAY.
TOML keywords under [[calculation.level]]:
or with the CLI
Important
This utility requires a local install of the fmlip-relay-server, which is available in CREST's subprojects and can be installed with pip install -e ./subprojects/fmlip-relay
The persistent Python socket produces some overhead and may oversubscribe in parallel MD/OPT/etc. loops. Therefore, the recommended use case is as a --refine option to re-rank force-field optimized geometries with MLIP single-point energies.
5. Spin-Polarized Calculations (see #480)
GFN2-xTB and related methods can now be run in open-shell (spin-polarized) mode via tblite.
- spin_polarized field on calculation_settings; set via TOML keyword spin_polarized = true or through uhf > 0.
- … spin_polarization object and calls get_spin_constants from the tblite library when this flag is active.
- nspin is set to 2 instead of 1, which activates the spin-unrestricted solver path in tblite.
- … test/test_tblite.F90) validates energy and gradient for an open-shell test molecule.
6. External Electric Field for tblite (see #449)
A static external electric field can be applied to tblite single-points and optimizations.
TOML:
or via the CLI:
Handled by the new public routine tblite_add_efield in tblite_api.F90. The field is converted to atomic units before being passed to the tblite context.
7. Extended XYZ (extxyz) Format Support
A complete reader/writer for the extended XYZ (ASE-compatible) format is added to the molecule I/O layer (src/molecule/io.f90):
- read_extxyz_frame parses per-atom properties and key-value pairs from the comment line (energy, forces, lattice, …).
- mol%write() polymorphic dispatch now routes .extxyz file extensions to the extended writer.
- energy_units key in the extxyz …
- … type_ensemble) now accept extxyz files for batch input to conformer workflows.
- crestopt.log.xyz can be written in extxyz format, enabling direct round-trip with ASE or MACE pipelines.
8. CREGEN Refactor and Improved Output
The conformer ranking/filtering code underwent a substantial internal reorganization:
- … src/sorting/ as a proper module directory (cregen.f90, cregen_utils.f90, cregen_interfaces.f90, ensemblecomp.f90, etc.).
- newcregen driver logic is simplified; the old monolithic cregen_old.f90 is retained only as a legacy shim.
9. Restart / Checkpoint System (Redesigned) (see, e.g. #440)
The previous restart system stored full ensemble snapshots in memory, which was fragile and memory-intensive. The new implementation (src/restartlog.f90) is a lightweight file-based checkpoint:
- restart_data type records only the stage name (a short string like 'mtd_loop', 'post_collect', 'entropy_smtd') and the last written file path.
- … crest.restart (a plain text file) at each stage boundary.
- … --restart, CREST reads the checkpoint, prints what was completed, and skips the already-finished stages, re-using the last dumped ensemble file.
- … search_conformers.f90) and the entropy search (search_entropy.f90).
10. Symmetry Detection: C → Fortran Port
The symmetry detection back-end (symmetry_i), originally Patchkovskii's C code from 1996/2003, has been ported to a native Fortran module (src/symmetry_i.f90, ~1800 lines). The C source wrote to shared memory, making OpenMP parallel execution of the symmetry detection impossible.
The Fortran module exposes schoenflies and getsym with an identical interface to the C version but benefits from Fortran's dynamic allocation and avoids the need for a C-to-Fortran bridge in the calling code.
A dedicated test (test/test_getsym.F90) verifies the full set of common point groups (C₁, Cₛ, Cᵢ, Cₙ, Cₙᵥ, Cₙₕ, Dₙ, Dₙₕ, Dₙd, Sₙ, T, Tₕ, Td, O, Oₕ, I, Iₕ).
11. Thermochemistry Enhancements
Truhlar quasi-RRHO treatment
The Truhlar (2011) frequency cutoff model is now available as an alternative to the Grimme quasi-RRHO approach. Low-frequency modes below sthr (cm⁻¹) are treated as free rotors whose entropy contribution is replaced by that of a rotor at the cutoff frequency, avoiding the divergence of the harmonic entropy at zero frequency. Selectable via TOML keyword emodel.
ORCA Hessian reader
A new reader (rdfreq_orca_hess) parses ORCA .hess files directly, extracting frequencies and optionally mass-weighted Hessian data for use in the thermochemistry workflow. This supplements the existing ORCA IR-spectrum reader.
Parallel ΔG calculation
The per-conformer free energy correction (needed in refinement workflows) is now computed in parallel OpenMP sections rather than serially (see the point above about symmetry detection).
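As a rough illustration of the frequency cutoff idea from the Truhlar paragraph in this section, the Python sketch below raises every mode below a cutoff to the cutoff value and then evaluates the standard harmonic-oscillator entropy. This is the commonly cited form of the Truhlar treatment; whether CREST implements exactly this is an assumption. The function names and the 100 cm⁻¹ default are illustrative and not taken from the CREST source.

```python
import math

# Physical constants in SI units
H = 6.62607015e-34      # Planck constant, J s
KB = 1.380649e-23       # Boltzmann constant, J/K
C_CM = 2.99792458e10    # speed of light, cm/s
R = 8.314462618         # gas constant, J/(mol K)


def harmonic_entropy(freq_cm, temp=298.15):
    """Vibrational entropy of one harmonic mode in J/(mol K)."""
    x = H * C_CM * freq_cm / (KB * temp)
    # S = R * [ x / (e^x - 1) - ln(1 - e^-x) ]
    return R * (x / math.expm1(x) - math.log(-math.expm1(-x)))


def truhlar_entropy(freqs_cm, cutoff_cm=100.0, temp=298.15):
    """Sum mode entropies, raising modes below the cutoff to the cutoff.

    cutoff_cm plays the role of 'sthr' in the text (assumed default).
    """
    return sum(harmonic_entropy(max(f, cutoff_cm), temp) for f in freqs_cm)
```

With this sketch, truhlar_entropy([10.0, 50.0, 500.0]) counts the first two modes as if they sat at 100 cm⁻¹, so a near-zero frequency can no longer inflate the entropy without bound.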
12. Hybrid Method CLI Syntax
A new parser module (src/parsing/parse_hybrid.f90) recognizes composite method strings on the command line:
- A@B
- A//B
- A/sp/B
- A/opt/B
Example: crest mol.xyz --gfn2@gfnff --imtdgc runs the MD/MTD at GFN-FF speed and refines final structures with GFN2-xTB; likewise, crest mol.xyz --gxtb//gfnff --imtdgc runs the sampling at GFN-FF and re-ranks all structures with g-xTB single points during the sampling.
Method tokens are validated against the known method list; an unknown token triggers an early, informative error rather than a silent mis-configuration.
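The splitting and validation described in this section can be sketched in a few lines of Python. This is not the Fortran parser from parse_hybrid.f90; the function name, the return shape, and the short method list are hypothetical and only serve to show the early-error behaviour.

```python
# Hypothetical subset of method names; the real list lives in the CREST parser.
KNOWN_METHODS = {"gfn2", "gfnff", "gxtb"}
# Separators named in the section above.
SEPARATORS = ("/opt/", "/sp/", "//", "@")


def parse_hybrid(token):
    """Split 'A<sep>B' into (A, sep, B); a plain method gives (A, None, None)."""
    name = token.lstrip("-")  # accept both '--gfn2@gfnff' and 'gfn2@gfnff'
    for sep in SEPARATORS:
        if sep in name:
            left, right = name.split(sep, 1)
            break
    else:
        # No separator found: treat the token as a single method.
        left, sep, right = name, None, None
    # Fail early on any token that is not a known method.
    for method in (left, right):
        if method is not None and method not in KNOWN_METHODS:
            raise ValueError(f"unknown method '{method}' in '{token}'")
    return left, sep, right
```

Calling parse_hybrid("--gfn2@gfnff") yields the two method tokens and the separator, while a misspelled token such as "--gfn2@gfnf" raises immediately instead of being passed on silently.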
13. CLI and TOML Keyword Additions
- --freeze / freeze
- --inversion
- --itmdgc
- --v3 / iMTD-GC run mode
- --allowrestart
- --cregen / --sort (--sort irmsd invokes the iRMSD-aware sorter)
- spin_polarized (TOML)
- efield (TOML)
- ceh_guess (TOML/CLI)
- gxtb / g-xtb (TOML)
The CLI argument parser was refactored to use a processedarg boolean array, preventing double-processing of arguments and giving cleaner error messages for unrecognized flags.
14. QCG (Quantum Cluster Growth) Refactor
The QCG solvation tool was refactored to use the internal calculator layer:
- zmolecule type is replaced by a new polymorphic coord_qcg type that extends the standard coord.
- xtb_sp_qcg and xtb_opt_qcg now dispatch through the standard engrad calculator interface rather than writing xtb input files by hand.
- xtbiff QCG implementation is removed.
- … qcg_main.f90, qcg_utils.f90, qcg_misc.f90, qcg_printouts.f90, and qcg_coord_type.f90.
- … qcg_freq) are now wired through the same thermochem module used by the rest of CREST.
Warning
QCG still requires xtb (>= version 6.7.1) for access to the aISS method.
15. MD / Thermostat Extensions
Two new thermostat algorithms are added to the MD module (src/dynamics/dynamics_module.f90) alongside the existing Berendsen thermostat:
The thermostat type is selected by integer index (thermotype_i): 1 = Berendsen, 2 = Langevin, 3 = BDP/CSVR. The velocity Verlet integrator was also cleaned up to ensure correct ordering of force, velocity, and thermostat steps.
16. Peak Memory Reporting
A new C helper (src/chelpers/mempeak.c) provides a cross-platform get_peak_rss_kb() function that returns the process peak resident set size in KB. It is implemented via getrusage on Linux/macOS (with the macOS bytes→KB conversion) and is called at program end to include peak RAM usage in the final timing/resource printout.
NOTE: The printout may be slightly flawed for fully static CREST builds.
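For readers who want to see the unit handling in isolation, here is a minimal Python analogue of the same idea using the standard-library resource module, which wraps getrusage. It is not the C helper from mempeak.c; it only mirrors the behaviour described above, and it reuses the name get_peak_rss_kb purely for readability.

```python
import resource
import sys


def get_peak_rss_kb():
    """Return the peak resident set size of this process in KB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        peak //= 1024
    return peak


if __name__ == "__main__":
    print(f"peak RSS: {get_peak_rss_kb()} KB")
```

The resource module is only available on Unix-like systems, so this sketch does not cover Windows.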
17. Output File Naming
Two output files are renamed to carry explicit file extensions:
- crestopt.log → crestopt.log.xyz
- crest_dynamics.trj → crest_dynamics.trj.xyz
This makes the files immediately recognizable to external viewers and conforms with the convention established for all other trajectory/ensemble outputs.
18. Build System Updates
Meson build system
- meson_options.txt is restructured with documented options for OpenMP, LAPACK/BLAS provider (auto, openblas, mkl, netlib, custom), and optional feature flags.
- … intel-llvm.ini native file is provided for reproducible Intel oneAPI builds.
CMake
- … Find*.cmake modules for all tblite-stack dependencies: mctc-lib, mstore, multicharge, s-dftd3, test-drive.
- … WITH_GXTB (requires WITH_TBLITE), WITH_FMLIP_RELAY.
New git submodules (tblite dependency chain)
- mctc-lib, mstore, multicharge, s-dftd3, test-drive are added as submodules so tblite can be built from source.
Updated submodule versions
- tblite
- multicharge
- gfnff
- dftd4
- toml-f
Important
If you update and build the CREST source locally, do not forget to update the submodules accordingly! The recommended way is not to let Meson/CMake handle that on their own but to use the git submodule machinery instead. However, all should point to the same specific commits.
19. Post-Search Ensemble Re-ranking and Re-optimization
Two new standalone post-processing flags allow applying a higher-level method to a finished conformer ensemble without re-running the full search:
- --rerank <method>: recomputes single-point energies at the specified level and re-sorts the ensemble by the new energies (p_prop_multilevel+2). The level string follows the same syntax as --method.
- --reopt <method>: re-optimizes every conformer at the specified level and then re-sorts (p_prop_multilevel+3).
Both flags add a job to the property queue (env%addjob) so they compose cleanly with other post-search steps (e.g. --finalhess).
20. Ensemble Hessian and Final Hessian Flags
Two new run-mode / property flags provide Hessian-based thermochemistry over conformer ensembles:
- --ensemblehess <file> (alias --mdhess): sets the run type to crest_ensemblehess, reads the specified ensemble file, and computes a numerical Hessian plus thermochemistry for each conformer. The lowest-energy structure is extracted as the reference geometry.
- --finalhess: adds a Hessian + free-energy re-ranking step (p_prop_finalhess) to the property queue, intended as a post-conformer-search refinement: each conformer's Boltzmann weight is updated using proper free energies rather than just electronic energies.
21. Protonate / Deprotonate / Tautomerize: New Protocol Active by Default
The redesigned protonation, deprotonation, and tautomerization protocols (previously gated behind an experimental env%legacy = .true. flag) are now the default. The legacy code path is no longer invoked for --protonate, --deprotonate, and --tautomerize.
22. Revision of the shipped examples
Examples in examples/expl-<#>/ now include:
- … fmlip-relay install)
- … fmlip-relay and uma/fairchem install)
- … fmlip-relay and mace install)