lehmannfa/HEMEW3D

This repository contains the code to plot materials and velocity fields downloaded from the HEMEW-3D repository on Recherche Data Gouv. Additional data can also be created by following the notebook Create_materials.ipynb.

This repository also provides code to train four neural operators on the HEMEW-3D dataset: Fourier Neural Operator (FNO), U-shaped Neural Operator (U-NO), Group-equivariant Fourier Neural Operator (G-FNO), and Factorized Fourier Neural Operator (F-FNO). Pre-processing, training, and post-processing are described below.

Two versions of the HEMEW-3D dataset exist. They are based on the same ideas and the same simulation code, but the second version provides a more general framework. The main difference is that the first version (HEMEW-3D) contains velocity fields generated by a fixed point-wise source, while the second version (HEMEW^S-3D) was generated with a point-wise source with random position and orientation. Both versions are described in more detail below.

The HEMEW-3D and HEMEW^S-3D datasets each contain 30,000 simulation results of the 3D elastic wave equation, which governs the propagation of waves in a 3D propagation medium (called material in the following). Results were obtained with the earthquake simulator SEM3D. Each dataset provides two types of data.

| Version | Version 1 (HEMEW-3D) | Version 2 (HEMEW^S-3D) |
| --- | --- | --- |
| Link to data | https://entrepot.recherche.data.gouv.fr/dataset.xhtml?persistentId=doi:10.57745/LAI6YU&version=1.0 | https://entrepot.recherche.data.gouv.fr/dataset.xhtml?persistentId=doi:10.57745/LAI6YU&version=2.1 |
| Documentation | HEMEW-3D.pdf | HEMEW^S-3D.pdf |

1. Materials dataset

The first type of data is the collection of 30,000 materials. The materials are 3D domains built from non-stationary random fields. Their size is 32 x 32 x 32 points. Physically, they correspond to a domain of length 9600m with 300m spacing between two points. Materials contain the values of shear-wave velocity. The minimum value is 1071m/s and the maximum is 4500m/s. All materials contain an 1800m-thick bottom layer with a constant velocity of 4500m/s.

Practical use

Materials are provided as .npy arrays, readable with NumPy in Python: a = np.load('materials0-1999.npy'). Each file contains 2000 materials, so a has shape (2000, 32, 32, 32). The indices correspond to the material index, the x coordinate (from West to East), the y coordinate (from South to North), and the z coordinate (from bottom to top). The 15 materials files amount to 3.9GB. They can be downloaded individually from Recherche Data Gouv. After download, they should be placed in the data folder.
Note: Although materials are designed following similar principles in both versions of the dataset, a new set of 30,000 materials was created for the second version. Therefore, materials of the first version must not be used to predict the velocity fields of the second version.

For version 1: Metadata are given in the data folder. They contain the minimum, mean, maximum and standard deviation of each material. For version 2: Metadata are given in the metadata_materials.tab file. They contain the mean velocity and thickness of each layer, as well as the properties of the random fields (coefficient of variation and correlation lengths).
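
As a minimal sketch of how a materials array might be inspected, the function below computes per-material summary statistics of the kind listed in the version 1 metadata. The name material_stats is hypothetical and the code is not taken from the repository; it only assumes the (material, x, y, z) array layout described above.

```python
import numpy as np


def material_stats(materials):
    """Per-material min, mean, max and std of shear-wave velocity (m/s).

    `materials` is expected to have shape (n_materials, nx, ny, nz),
    for instance (2000, 32, 32, 32) for one downloaded file.
    """
    if materials.ndim != 4:
        raise ValueError("expected an array of shape (n, nx, ny, nz)")
    # Flatten the three spatial axes so each row holds one material.
    flat = materials.reshape(materials.shape[0], -1)
    return {
        "min": flat.min(axis=1),
        "mean": flat.mean(axis=1),
        "max": flat.max(axis=1),
        "std": flat.std(axis=1),
    }
```

For example, material_stats(np.load('data/materials0-1999.npy')) would return four arrays of length 2000, assuming the file has been placed in the data folder.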

2. Velocity fields dataset

The second type of data is the collection of surface velocity fields. They have been generated by solving the 3D elastic wave equation with the high-performance computing code SEM3D based on the Spectral Element Method (https://github.com/sem3d/SEM). To each material described above corresponds one velocity field, obtained by the propagation of waves through this material.

Computational details: The computational mesh was designed with elements of size 300m and 7 Gauss-Lobatto-Legendre quadrature points. It can accurately represent the propagation of waves up to a frequency of 5Hz.

| Version | Version 1 (HEMEW-3D) | Version 2 (HEMEW^S-3D) |
| --- | --- | --- |
| Virtual sensors | Velocity fields were recorded by a grid of 16 x 16 virtual sensors located at the surface of the propagation domain between 150m and 9450m (620m between consecutive sensors). | Velocity fields were recorded by a grid of 32 x 32 virtual sensors located at the surface of the propagation domain between 150m and 9450m (300m between consecutive sensors). |
| Temporal sampling | Each sensor records the 3-component velocity with a 100Hz sampling. | Each sensor records the 3-component velocity with a 100Hz sampling. |
| Duration | 20 seconds | 8 seconds (enough to contain significant ground motion) |
| Source position | Waves were generated by a point-wise source placed at the bottom of the domain, inside the constant layer (the position of the source is 4800, 4800, -8400m). | Waves were generated by a point-wise source located randomly inside the domain, between 1200m and 8400m on the $x$ and $y$ axes, and between -9000m and -600m on the $z$ axis. Described in source_properties.tab. |
| Source orientation | The seismic source is described by a moment tensor with fixed orientation (strike = 48°, dip = 45°, and rake = 88°) and amplitude (seismic moment M0 = 2.47 · 10^16 N.m). | The seismic source is described by a moment tensor with a random orientation: strike between 0° and 360°, dip between 0° and 90°, and rake between 0° and 360°. Described in source_properties.tab. |
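
The sensor spacings and record lengths given above follow from simple arithmetic, which the sketch below reproduces. The helper names sensor_coordinates and n_time_steps are hypothetical, and the assumption that sensors are evenly spaced with both end points included is inferred from the stated spacings rather than taken from the dataset documentation.

```python
import numpy as np


def sensor_coordinates(n_sensors, first=150.0, last=9450.0):
    """Evenly spaced sensor positions (in meters) along one horizontal axis."""
    return np.linspace(first, last, n_sensors)


def n_time_steps(duration_s, sampling_hz=100.0):
    """Number of samples recorded over `duration_s` seconds."""
    return int(round(duration_s * sampling_hz))


x_v1 = sensor_coordinates(16)  # version 1: 16 sensors per axis
x_v2 = sensor_coordinates(32)  # version 2: 32 sensors per axis

print(x_v1[1] - x_v1[0])  # spacing for version 1, 620 m
print(x_v2[1] - x_v2[0])  # spacing for version 2, 300 m
print(n_time_steps(20.0), n_time_steps(8.0))  # 2000 and 800 samples
```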

Practical use for version 1

Results are given in .feather dataframes, readable with the pandas library in Python: v = pd.read_feather('velocity0-99.feather'). Each dataframe contains 100 simulation results. Each row of the dataframe has the following format:

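| run | field | x | y | z | 0.0 | 0.01 | 0.02 | 19.98 | 19.99 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 12 | Veloc E | 150.0 | 770.0 | -1.0 | 0 | 0 | … | 1.1e-5 | 1.0e-5 |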

where run indicates the index of the material used in this simulation, field indicates the component of the velocity field (Veloc E for East-West, Veloc N for North-South, Veloc Z for Vertical). x, y, z are the coordinates of the sensor (in meters). The next 2000 columns contain the velocity field for times 0, 0.01, …, 19.99.
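
Since each row holds the time series of one sensor, a common first step is to gather the rows of one run and one component into a single array. The sketch below shows one possible way to do this; the function names are hypothetical, the (y, x, time) output ordering is a choice made here, and the code assumes that every column other than run, field, x, y, z is a time column stored in chronological order.

```python
import numpy as np
import pandas as pd

META_COLUMNS = ["run", "field", "x", "y", "z"]


def load_velocity_file(path):
    """Read one downloaded .feather file into a DataFrame."""
    return pd.read_feather(path)


def velocity_array(df, run, component):
    """Return one velocity component of one run with shape (ny, nx, nt)."""
    rows = df[(df["run"] == run) & (df["field"] == component)]
    if rows.empty:
        raise KeyError(f"no rows for run={run}, field={component!r}")
    # Order sensors by y first, then x, so that reshaping yields [y, x, t].
    rows = rows.sort_values(["y", "x"])
    ny = rows["y"].nunique()
    nx = rows["x"].nunique()
    time_columns = [c for c in rows.columns if c not in META_COLUMNS]
    values = rows[time_columns].to_numpy(dtype=float)
    # reshape raises if the sensor grid is incomplete (fewer than ny * nx rows).
    return values.reshape(ny, nx, len(time_columns))
```

With a version 1 file, velocity_array(load_velocity_file('velocity0-99.feather'), 12, 'Veloc E') would be expected to have shape (16, 16, 2000).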

The 300 velocity field files amount to 369.9 GB. They can be downloaded individually (1.2 GB per file) from Recherche Data Gouv. A batch of velocity fields corresponding to 10 materials is given in the data folder for illustration purposes.

Metadata are given in the data folder. They contain the first wave arrival time at the surface, the minimum, mean, and maximum Peak Ground Velocity.

Practical use for version 2

Results are given in individual .h5 files. Each file contains three keys: uE, uN, uZ corresponding to the three components of ground motion (East-West, North-South, Vertical). Each velocity field is of shape $32 \times 32 \times 800$ where the first index corresponds to the $y$ axis, the second index to the $x$ axis, and the third index to the temporal axis. Files are gathered in .zip archives containing 100 simulation results. The 300 .zip files amount to 263.4GB. They are downloadable individually (0.87GB per file) on Recherche Data Gouv.
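
A minimal sketch of reading one extracted version 2 file with h5py is given below. Only the keys uE, uN, uZ come from the description above; the function name read_velocity_fields and the choice to stack the components along a new leading axis are assumptions made for illustration.

```python
import h5py
import numpy as np


def read_velocity_fields(path):
    """Read the three velocity components from one .h5 file.

    Returns an array of shape (3, ny, nx, nt), ordered as
    (East-West, North-South, Vertical).
    """
    with h5py.File(path, "r") as f:
        # [()] reads the whole dataset into memory as a NumPy array.
        components = [f[key][()] for key in ("uE", "uN", "uZ")]
    return np.stack(components, axis=0)
```

For a file from the dataset, the result would be expected to have shape (3, 32, 32, 800), with u[0, j, i, :] holding the East-West time series at the j-th position along $y$ and the i-th position along $x$.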

3. Preprocessing data for machine learning applications

Due to the large size of the database, it cannot be entirely loaded into CPU or GPU memory. Therefore, the preprocessing step consists in writing individual files sample_i.h5 that contain the material a and the three components of the velocity fields uE, uN, and uZ.
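
The idea of one file per sample can be sketched as follows. This is not the repository's preprocessing code: the function names are hypothetical, the dataset names are assumed to match the variable names a, uE, uN, uZ used above, and storing values as float32 is an assumption made here to reduce file size.

```python
import h5py
import numpy as np


def write_sample(path, a, uE, uN, uZ):
    """Write one material and its three velocity components to one file."""
    arrays = {"a": a, "uE": uE, "uN": uN, "uZ": uZ}
    with h5py.File(path, "w") as f:
        for name, values in arrays.items():
            # float32 storage is an assumption, not a documented choice.
            f.create_dataset(name, data=np.asarray(values, dtype=np.float32))


def read_sample(path):
    """Read one sample back as a dict of NumPy arrays."""
    with h5py.File(path, "r") as f:
        return {key: f[key][()] for key in ("a", "uE", "uN", "uZ")}
```

With this layout, a training loop only needs to open the files of the current batch, so memory use is independent of the total number of samples.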

To reduce the computational time of machine learning applications, velocity fields are downsampled from 100 Hz to 50 Hz. For version 1: velocity fields are restricted to the time interval [1; 7.4s] (leading to 320 time steps). They are also spatially interpolated from 16 x 16 sensors to 32 x 32 to match the input dimensions. To create the inputs, run python3 create_data_materials.py @Ntrain 27000 @Nval 3000 and then python3 create_data_velocityfields.py @Ntrain 27000 @Nval 3000 @interpolate.

For version 2: velocity fields are restricted to the time interval [0; 6.4s] (leading to 320 time steps). To create the inputs, run python3 create_data_materials.py @Ntrain 27000 @Nval 3000 and then python3 create_data_velocityfields_v2.py @Ntrain 27000 @Nval 3000.
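
The time windows and the resulting 320 time steps can be checked with the sketch below. It keeps every second sample of a half-open window, which reproduces the stated number of steps; whether the repository scripts apply an anti-aliasing filter before downsampling or treat the window end points differently is not specified here, so both choices are assumptions, as is the function name crop_and_downsample. The spatial interpolation used for version 1 is not covered.

```python
import numpy as np


def crop_and_downsample(u, t_start, t_end, fs_in=100.0, fs_out=50.0):
    """Keep samples with t_start <= t < t_end and decimate the last axis.

    `u` holds velocity time series sampled at `fs_in` Hz along its last axis.
    """
    ratio = fs_in / fs_out
    step = int(round(ratio))
    if step < 1 or abs(ratio - step) > 1e-9:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    i0 = int(round(t_start * fs_in))
    i1 = int(round(t_end * fs_in))
    # Plain decimation: no low-pass filtering is applied in this sketch.
    return u[..., i0:i1:step]


u_v1 = np.zeros((16, 16, 2000))  # version 1: 20 s at 100 Hz
u_v2 = np.zeros((32, 32, 800))   # version 2: 8 s at 100 Hz
print(crop_and_downsample(u_v1, 1.0, 7.4).shape)  # last axis has 320 steps
print(crop_and_downsample(u_v2, 0.0, 6.4).shape)  # last axis has 320 steps
```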

4. Training the models

Models FNO, U-NO, G-FNO, and F-FNO can be trained with the default options by running models.train_fno3d.py, models.train_uno3d.py, models.train_gfno3d.py, and models.train_ffno3d.py. The provided code supports CPU and single-GPU training.

5. Post-processing

To display the loss history and the model predictions as time series of snapshots, use the notebook Neural_Operators_Predictions.ipynb. For detailed metrics of the neural operators' performance, the notebook Intensity_Measures.ipynb computes Root Mean Squared Error (RMSE), Peak Ground Velocity (PGV), Cumulative Absolute Velocity (CAV), Relative Significant Duration (RSD), and Fourier coefficients in three frequency ranges.
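
Two of these metrics can be illustrated with a short sketch. The definitions below are common textbook forms (RMSE over all entries, PGV as the maximum absolute velocity over time for a single component) and are assumptions: the notebook may normalize or combine components differently. CAV, RSD, and the Fourier coefficients are left out because their exact definitions are not given here.

```python
import numpy as np


def rmse(prediction, target):
    """Root mean squared error over all entries of two same-shaped arrays."""
    prediction = np.asarray(prediction, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((prediction - target) ** 2)))


def peak_ground_velocity(u, axis=-1):
    """Maximum absolute velocity along the time axis (last axis by default)."""
    return np.max(np.abs(u), axis=axis)
```

Applied to a predicted and a simulated field of shape (32, 32, 320), rmse returns a single number, while peak_ground_velocity returns one value per sensor, i.e. an array of shape (32, 32).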

How to cite?

If you use the HEMEW-3D or HEMEW^S-3D database, please cite

@data{LAI6YU_2023,
author = {Lehmann, Fanny},
publisher = {Recherche Data Gouv},
title = {{Physics-based Simulations of 3D Wave Propagation with Source Variability: HEMEW^S-3D}},
UNF = {UNF:6:8PCZ8VjJPJx1izlFPoWW2g==},
year = {2023},
version = {V2},
doi = {10.57745/LAI6YU},
url = {https://doi.org/10.57745/LAI6YU}
}

and

@article{essd-2023-470,
author = {Lehmann, F. and Gatti, F. and Bertin, M. and Clouteau, D.},
title = {Synthetic ground motions in heterogeneous geologies: the HEMEW-3D dataset for scientific machine learning},
journal = {Earth System Science Data Discussions},
volume = {2024},
year = {2024},
pages = {1--26},
url = {https://essd.copernicus.org/preprints/essd-2023-470/},
doi = {10.5194/essd-2023-470}
}

If you use neural operators from this work, please cite

@article{LEHMANN2024116718,
title = {3D elastic wave propagation with a Factorized Fourier Neural Operator (F-FNO)},
journal = {Computer Methods in Applied Mechanics and Engineering},
volume = {420},
pages = {116718},
year = {2024},
issn = {0045-7825},
doi = {https://doi.org/10.1016/j.cma.2023.116718},
url = {https://www.sciencedirect.com/science/article/pii/S0045782523008411},
author = {Fanny Lehmann and Filippo Gatti and Michaël Bertin and Didier Clouteau},
}
