Update SMP scripts #84
Merged
After a successful download, the list of files is returned as strings. When checking whether those files already exist locally, pathlib returns Path objects. This casts them back to strings so the return type is identical in both cases.
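A minimal sketch of the string cast described above. The helper name `existing_files_as_str` and its arguments are hypothetical and are not taken from the PR; the sketch only shows how a local existence check via pathlib can still return plain strings.

```python
from pathlib import Path


def existing_files_as_str(directory, names):
    """Return the paths of files already present locally, as strings.

    Hypothetical helper: pathlib is used for the existence check, and
    each hit is cast back to ``str`` so callers get the same type as
    they would after a fresh download.
    """
    base = Path(directory)
    return [str(base / name) for name in names if (base / name).is_file()]
```

Callers can then treat downloaded and already-present files the same way, without checking whether an entry is a `Path` or a `str`.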
Add the measurement ID as a key to the site name so that one location per measurement is created in the sites table.
Add an in-memory lookup cache attribute to the base class to prevent repeated DB selects for every inserted data row. This has a big impact when inserting layer data of the same measurement type, such as the SMP with more than 100K records of the same type.
…call
Add all layer row entries in one transaction and commit once at the end, which speeds up the import. Also expunge the DB session after a profile has been uploaded to reduce the memory footprint.
Improve the lookup cache by storing a bare-bones object with only the key and the primary ID. This also changes the strategy to a bulk add and commit with a configurable batch size. No batch size is set for points; layers use 100K.
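The lookup cache from the commits above could look roughly like the sketch below. The names `CachedRef`, `LookupCache`, and `get_id` are hypothetical, and `fake_select` stands in for a real DB select; none of them come from the PR itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable


@dataclass(frozen=True)
class CachedRef:
    """Bare-bones stand-in for a DB object: only the key and primary ID."""
    key: Hashable
    id: int


class LookupCache:
    """In-memory cache that runs the (expensive) select once per key."""

    def __init__(self):
        self._refs: Dict[Hashable, CachedRef] = {}

    def get_id(self, key: Hashable, fetch_id: Callable[[Hashable], int]) -> int:
        ref = self._refs.get(key)
        if ref is None:
            # Cache miss: query once and keep only the key and ID.
            ref = CachedRef(key=key, id=fetch_id(key))
            self._refs[key] = ref
        return ref.id


# Stand-in for a DB select; records every call so the saving is visible.
select_calls = []


def fake_select(key):
    select_calls.append(key)
    return len(select_calls)


cache = LookupCache()
ids = [
    cache.get_id(("measurement_type", "force"), fake_select)
    for _ in range(100_000)
]
```

In this sketch, 100K rows of the same measurement type trigger a single select. Storing only the key and ID, rather than a full mapped object, keeps the cache independent of the DB session.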
aaarendt
approved these changes
Oct 21, 2025
aaarendt
left a comment
Contributor
Confirming I tested this locally and it works as expected.
Closes #42
This imports all the SMP data and does not subset as it used to.
To speed up the imports I added/changed two things:
- […] `BaseUpload` class that holds DB objects of metadata locally instead of fetching them over and over again for each upload. This was a bottleneck when adding one SMP file, which holds more than 100K records.
- […] `session.commit()` for each. This was another performance boost.

With the two changes, one SMP file now uploads in little more than a minute, where it took around 5 minutes before.
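As a rough illustration of committing per batch instead of per record, the sketch below uses a hypothetical `upload_in_batches` function and a `FakeSession` stand-in so it runs without a database. It assumes the session exposes `add_all`, `commit`, and `expunge_all`, as a SQLAlchemy `Session` does, and that a batch size of `None` means a single commit at the end; both are assumptions, not details confirmed by the PR.

```python
from itertools import islice


class FakeSession:
    """Stand-in for a DB session; only counts what a real one would do."""

    def __init__(self):
        self.pending = []
        self.stored = []
        self.commit_count = 0

    def add_all(self, rows):
        self.pending.extend(rows)

    def commit(self):
        self.stored.extend(self.pending)
        self.commit_count += 1

    def expunge_all(self):
        # Drop references to uploaded rows to keep memory use low.
        self.pending.clear()


def upload_in_batches(session, rows, batch_size=None):
    """Add rows in bulk and commit once per batch.

    ``batch_size=None`` is assumed to mean one batch with everything.
    Returns the number of commits issued.
    """
    iterator = iter(rows)
    commits = 0
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        session.add_all(batch)
        session.commit()
        session.expunge_all()
        commits += 1
    return commits
```

With 250 rows and a batch size of 100, this issues three commits rather than 250, which is the kind of saving the description attributes to dropping the per-record `session.commit()`.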
Dependencies
Needs PR M3Works/insitupy#32