From 61b7e066f149e6b3c6ac8d6d61304f0f74c4243e Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Tue, 18 Mar 2025 20:35:27 +0100 Subject: [PATCH 01/15] docs: initial introduction of the store package and possible next sections --- store/README.md | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) create mode 100644 store/README.md diff --git a/store/README.md b/store/README.md new file mode 100644 index 0000000000..24fb4d4520 --- /dev/null +++ b/store/README.md @@ -0,0 +1,41 @@ +# Package store + +[![Go Reference](https://pkg.go.dev/badge/github.com/canopy/canopy-network.svg)](https://pkg.go.dev/github.com/canopy-network/canopy/store) +[![License](https://img.shields.io/github/license/canopy-network/canopy)](https://github.com/canopy/canopy-network/blob/main/LICENSE) + +The `store` package implements the storage layer for the Canopy blockchain, leveraging **[BadgerDB](https://github.com/hypermodeinc/badger)** as its underlying key-value store. It provides structured abstractions for nested transactions, state management, and indexed blockchain data storage. Below is a high-level breakdown of its components: + +### **Core Components** + +1. **`Txn`**: + The foundational transactional layer. This implements **nested transactions** on top of BadgerDB, enabling atomic operations and rollbacks for complex storage workflows. + +2. **`TxnWrapper` & `SMT`**: + - **`TxnWrapper`**: Wraps BadgerDB to conform to the `RWStoreI` interface, providing a simple read/write abstraction with transaction support. + - **`SMT`**: An optimized Sparse Merkle Tree implementation backed by BadgerDB. It adheres to the `RWStoreI` interface and enables efficient cryptographic commitment to state data and proof of membership/non-membership of the keys. + +3. **`Indexer`**: + Built on `TxnWrapper`, this component organizes blockchain data (blocks, transactions, addresses, etc.) using **prefix-based keys**. 
This design allows efficient iteration and querying of domain-specific data (e.g., "all transactions in block X"). + +4. **`Store`**: + The top-level struct coordinating the storage layer: + - **`Indexer`**: Manages indexed blockchain operations. + - **`StateStore`**: Stores raw blockchain state data (blobs) using `TxnWrapper`. + - **`StateCommitStore`**: Uses the `SMT` implementation to cryptographically commit hashes of `StateStore` data into the Sparse Merkle Tree, ensuring tamper-evident state verification. + +### **Key Interactions** +- **Transactions**: `Txn` provides atomicity for operations across BadgerDB. +- **State Management**: `StateStore` (raw data) and `StateCommitStore` (hashes in SMT) work in tandem to balance performance with cryptographic integrity. +- **Querying**: The `Indexer`’s prefix-based structure enables fast, type-specific data retrieval. + +This layered design decouples storage concerns while ensuring compatibility with BadgerDB’s performance characteristics and the blockchain’s integrity requirements. 
+ +## BadgerDB simple introduction and its usage on TxnWrapper + +## Txn ad hoc nested transactions implementation + +## Canopy's Sparse Merkle Tree Implementation + +## Indexer operations and prefix usage to optimize iterations + +## Store struct and how it adds up all together From d02f3a1ae8d4a8dae96fafcbf2bc06148c2e48d0 Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Thu, 20 Mar 2025 19:22:33 +0100 Subject: [PATCH 02/15] docs: add a CONTRIBUTING.md --- CONTRIBUTING.md | 191 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 191 insertions(+) create mode 100644 CONTRIBUTING.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000000..11031de24c --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,191 @@ +# Contributing Guidelines + +*Pull requests, bug reports, and all other forms of contribution are welcomed and highly encouraged!* :octocat: + +*Remember you can always find us on our [Discord server](https://discord.com/invite/pNcSJj7Wdh).* + +## :inbox_tray: Opening an Issue + +Before +[creating an issue](https://help.github.com/en/github/managing-your-work-on-github/creating-an-issue), +check if you are using the latest version of the project. If you are not up-to-date, see if updating +fixes your issue first. + +### :beetle: Bug Reports and Other Issues + +A great way to contribute to the project is to send a detailed issue when you encounter a problem. +We always appreciate a well-written, thorough bug report. :v: + +In short, since you are most likely a developer, **provide a ticket that you would like to +receive**. + +- **Review the [documentation](TODO) before opening a new issue.** + +- **Do not open a duplicate issue.** Search through existing issues to see if your issue has + previously been reported. If your issue exists, comment with any additional information you have. + You may simply note "I have this problem too", which helps prioritize the most common problems and + requests. 
+ +- **Prefer using + [reactions](https://github.blog/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/)**, + not comments, if you simply want to "+1" an existing issue. + +- **Fully complete the provided issue template.** The bug report template requests all the + information we need to quickly and efficiently address your issue. Be clear, concise, and + descriptive. Provide as much information as you can, including steps to reproduce, stack traces, + compiler errors, library versions, OS versions, and screenshots (if applicable). + +- **Use + [GitHub-flavored Markdown](https://help.github.com/en/github/writing-on-github/basic-writing-and-formatting-syntax).** + In particular, put code blocks and console output in backticks (```). This improves readability. + +### :lock: Reporting Security Issues + +**Do not** file a public issue for security vulnerabilities; message the maintainers privately +first. + +## :repeat: Submitting Pull Requests + +We **love** pull requests! Before +[forking the repo](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) and +[creating a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/proposing-changes-to-your-work-with-pull-requests) +for non-trivial changes, it is usually best to first open an issue to discuss the changes, or +discuss your intended approach for solving the problem in the comments for an existing issue. + +*Note: All contributions will be licensed under the project's license.* + +- **Smaller is better.** Submit **one** pull request per bug fix or feature. A pull request should + contain isolated changes pertaining to a single bug fix or feature implementation. **Do not** + refactor or reformat code that is unrelated to your change. It is better to **submit many small + pull requests** rather than a single large one. Enormous pull requests will take enormous amounts + of time to review, or may be rejected altogether. 
+ +- **Coordinate bigger changes.** For large and non-trivial changes, open an issue to discuss a + strategy with the maintainers. Or better yet, contact us directly on our + [Discord](https://discord.gg/your-discord-link). Otherwise, you risk doing a lot of work for + nothing! + +- **Prioritize understanding over cleverness.** Write code clearly and concisely. Remember that + source code usually gets written once and read often. Ensure the code is clear to the reader. The + purpose and logic should be obvious to a reasonably skilled developer, otherwise you should add a + comment that explains it. + +- **Follow existing coding style and conventions.** Keep your code consistent with the style, + formatting, and conventions in the rest of the code base. When possible, these will be enforced + with a linter. Consistency makes it easier to review and modify in the future. + +- **Include test coverage.** Add unit tests or UI tests when possible. Follow existing patterns for + implementing tests. + +- **Update the example project** if one exists to exercise any new functionality you have added. + +- **Add documentation.** Document your changes with code doc comments or in existing guides. + +- **Update the CHANGELOG** for all enhancements and bug fixes. Include the corresponding date, issue + number if one exists and current version if any. Check the format of the [CHANGELOG](.docs/CHANGELOG.md). + +- **Use the repo's default main branch.** Branch from and + [submit your pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) + to the repo's default branch `main`. + +- + **[Resolve any merge conflicts](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/resolving-a-merge-conflict-on-github)** + that occur. + +- **Promptly address any CI failures**. If your pull request fails to build or pass tests, please + push another commit to fix it. 
+ +- When writing comments, use properly constructed sentences, including punctuation, and **always** aim + to add at least one comment per line of code, no matter how simple the statement is, as we're focused + on allowing the code to be easily understood by as many developers as possible. + +### :nail_care: Coding Style + +Consistency is the most important consideration. Follow the existing style, formatting, and naming conventions +of the file you are modifying and of the overall project. Failure to do so will result in a +prolonged review process that has to focus on updating the superficial aspects of your code, rather +than improving its functionality and performance. + +Key requirements: + +- **Use gofmt**: All Go code must be formatted using the standard `gofmt` tool. This ensures + consistent formatting across the entire codebase. Run `gofmt -w .` before submitting your code. + +- **Style Guide**: Follow the [Google Go Style Guide](https://google.github.io/styleguide/go/) for + official recommendations on code structure and formatting. Additionally, review the project's + existing code patterns to ensure your contributions maintain consistency. + +- **Documentation**: Follow the official Go commentary guidelines: + - Every exported type, function, and variable must have a doc comment + - Comments should be complete sentences, starting with the item's name + - Example: `// Transaction represents a single blockchain transaction.` + +- **Branch Strategy**: All pull requests should be: + - Based on the `development` branch + - Opened against the `development` branch + - Merged into `development` first before being promoted to `main` + +- **EditorConfig Support**: We recommend using EditorConfig to maintain consistent coding styles. + The project includes an `.editorconfig` file that defines common formatting rules. + +- **Frontend Formatting**: Our Wallet and Explorer projects use `prettier` for consistent code + formatting. 
 Either run `npm run prettier` before submitting your PR or configure your editor to + format on save using the project's `.prettierrc` settings. + +If in doubt about any styling decisions, feel free to ask in the project's [Discord server](https://discord.com/invite/pNcSJj7Wdh). + +### ✍️ Writing Package README.md Files + +Each high-level package in the project must include a `README.md` file that explains its purpose, +functionality, and usage. This ensures that contributors and users can easily understand the role of +each package in the project. Follow these markdown guidelines to write effective and consistent +`README.md` files. The guidelines are loosely based on +[Microsoft's markdown best practices](https://learn.microsoft.com/en-us/powershell/scripting/community/contributing/general-markdown): + +#### Blank Lines and Spacing + +- Insert a single blank line between different Markdown blocks (e.g., between a paragraph and a list or header). +- Avoid multiple consecutive blank lines; they render as a single blank line in HTML. +- Within code blocks, consecutive blank lines can break the block. +- Remove trailing spaces at the end of lines, as they can affect rendering. +- Use spaces instead of tabs for indentation. + +#### Titles and Headings + +- Utilize ATX-style headings (`#` for H1, `##` for H2, etc.). +- Apply sentence case: capitalize only the first letter and proper nouns. +- Ensure a single space exists between the `#` and the heading text. +- Surround headings with a single blank line. +- Limit documents to one H1 heading. +- Increment heading levels sequentially without skipping (e.g., H2 should follow H1). +- Restrict heading depth to H3 or H4. +- Avoid using bold or other markup within headings. + +#### Line Length + +- Limit lines to 100 characters for conceptual articles and cmdlet references. +- For `about_` topics, restrict line length to 79 characters. +- This practice enhances readability and simplifies version control diffs. 
+ +#### Emphasis + +- Use `**` for bold text. +- Use `_` for italicized text. +- Consistent use clarifies intent, especially when mixing bold and italics. + +#### Fenced Code Blocks + +- Use triple backticks (```) to denote code blocks. +- Specify the language immediately after the opening backticks for syntax highlighting. + +Example: +```go +func main() { + fmt.Println("hello world") +} +``` + +#### Image Guidelines + +- Provide descriptive alt text for accessibility. +- Ensure images are relevant and enhance the content. From a6f1e60d1ec35425f89a0f11c729b25922417477 Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Wed, 26 Mar 2025 15:07:58 +0100 Subject: [PATCH 03/15] docs: document formatting --- CONTRIBUTING.md | 9 +++++---- store/README.md | 33 +++++++++++++++++++++++---------- 2 files changed, 28 insertions(+), 14 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 11031de24c..85de74592d 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -175,10 +175,11 @@ each package in the project. Follow these markdown guidelines to write effective #### Fenced Code Blocks -- Use triple backticks (```) to denote code blocks. -- Specify the language immediately after the opening backticks for syntax highlighting. +- Use triple backticks (```) to denote code blocks. +- Specify the language immediately after the opening backticks for syntax highlighting. Example: + ```go func main() { fmt.Println("hello world") @@ -187,5 +188,5 @@ func main() { #### Image Guidelines -- Provide descriptive alt text for accessibility. -- Ensure images are relevant and enhance the content. +- Provide descriptive alt text for accessibility. +- Ensure images are relevant and enhance the content. 
diff --git a/store/README.md b/store/README.md index 24fb4d4520..715ef86cc6 100644 --- a/store/README.md +++ b/store/README.md @@ -3,32 +3,45 @@ [![Go Reference](https://pkg.go.dev/badge/github.com/canopy/canopy-network.svg)](https://pkg.go.dev/github.com/canopy-network/canopy/store) [![License](https://img.shields.io/github/license/canopy-network/canopy)](https://github.com/canopy/canopy-network/blob/main/LICENSE) -The `store` package implements the storage layer for the Canopy blockchain, leveraging **[BadgerDB](https://github.com/hypermodeinc/badger)** as its underlying key-value store. It provides structured abstractions for nested transactions, state management, and indexed blockchain data storage. Below is a high-level breakdown of its components: +The `store` package implements the storage layer for the Canopy blockchain, leveraging +**[BadgerDB](https://github.com/hypermodeinc/badger)** as its underlying key-value store. It +provides structured abstractions for nested transactions, state management, and indexed blockchain +data storage. Below is a high-level breakdown of its components: -### **Core Components** +## **Core Components** 1. **`Txn`**: - The foundational transactional layer. This implements **nested transactions** on top of BadgerDB, enabling atomic operations and rollbacks for complex storage workflows. + The foundational transactional layer. This implements **nested transactions** on top of BadgerDB, + enabling atomic operations and rollbacks for complex storage workflows. 2. **`TxnWrapper` & `SMT`**: - - **`TxnWrapper`**: Wraps BadgerDB to conform to the `RWStoreI` interface, providing a simple read/write abstraction with transaction support. - - **`SMT`**: An optimized Sparse Merkle Tree implementation backed by BadgerDB. It adheres to the `RWStoreI` interface and enables efficient cryptographic commitment to state data and proof of membership/non-membership of the keys. 
+ - **`TxnWrapper`**: Wraps BadgerDB to conform to the `RWStoreI` interface, providing a simple + read/write abstraction with transaction support. + - **`SMT`**: An optimized Sparse Merkle Tree implementation backed by BadgerDB. It adheres to the + `RWStoreI` interface and enables efficient cryptographic commitment to state data and proof of + membership/non-membership of the keys. 3. **`Indexer`**: - Built on `TxnWrapper`, this component organizes blockchain data (blocks, transactions, addresses, etc.) using **prefix-based keys**. This design allows efficient iteration and querying of domain-specific data (e.g., "all transactions in block X"). + Built on `TxnWrapper`, this component organizes blockchain data (blocks, transactions, addresses, + etc.) using **prefix-based keys**. This design allows efficient iteration and querying of + domain-specific data (e.g., "all transactions in block X"). 4. **`Store`**: The top-level struct coordinating the storage layer: - **`Indexer`**: Manages indexed blockchain operations. - **`StateStore`**: Stores raw blockchain state data (blobs) using `TxnWrapper`. - - **`StateCommitStore`**: Uses the `SMT` implementation to cryptographically commit hashes of `StateStore` data into the Sparse Merkle Tree, ensuring tamper-evident state verification. + - **`StateCommitStore`**: Uses the `SMT` implementation to cryptographically commit hashes of + `StateStore` data into the Sparse Merkle Tree, ensuring tamper-evident state verification. + +## **Key Interactions** -### **Key Interactions** - **Transactions**: `Txn` provides atomicity for operations across BadgerDB. -- **State Management**: `StateStore` (raw data) and `StateCommitStore` (hashes in SMT) work in tandem to balance performance with cryptographic integrity. +- **State Management**: `StateStore` (raw data) and `StateCommitStore` (hashes in SMT) work in + tandem to balance performance with cryptographic integrity. 
- **Querying**: The `Indexer`’s prefix-based structure enables fast, type-specific data retrieval. -This layered design decouples storage concerns while ensuring compatibility with BadgerDB’s performance characteristics and the blockchain’s integrity requirements. +This layered design decouples storage concerns while ensuring compatibility with BadgerDB’s +performance characteristics and the blockchain’s integrity requirements. ## BadgerDB simple introduction and its usage on TxnWrapper From 9cf2371210cfd0bb500f59338e9b4fe167626b21 Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Wed, 26 Mar 2025 17:30:27 +0100 Subject: [PATCH 04/15] docs: BadgerDB and Txn --- store/README.md | 100 ++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 79 insertions(+), 21 deletions(-) diff --git a/store/README.md b/store/README.md index 715ef86cc6..417fc747fd 100644 --- a/store/README.md +++ b/store/README.md @@ -8,42 +8,100 @@ The `store` package implements the storage layer for the Canopy blockchain, leve provides structured abstractions for nested transactions, state management, and indexed blockchain data storage. Below is a high-level breakdown of its components: -## **Core Components** +## **Core Components** -1. **`Txn`**: - The foundational transactional layer. This implements **nested transactions** on top of BadgerDB, - enabling atomic operations and rollbacks for complex storage workflows. +1. **`Txn`**: The foundational transactional layer. This implements **nested transactions** on top + of BadgerDB, enabling atomic operations and rollbacks for complex storage workflows. -2. **`TxnWrapper` & `SMT`**: +2. **`TxnWrapper` & `SMT`**: - **`TxnWrapper`**: Wraps BadgerDB to conform to the `RWStoreI` interface, providing a simple - read/write abstraction with transaction support. + read/write abstraction with transaction support. - **`SMT`**: An optimized Sparse Merkle Tree implementation backed by BadgerDB. 
It adheres to the `RWStoreI` interface and enables efficient cryptographic commitment to state data and proof of membership/non-membership of the keys. -3. **`Indexer`**: - Built on `TxnWrapper`, this component organizes blockchain data (blocks, transactions, addresses, - etc.) using **prefix-based keys**. This design allows efficient iteration and querying of - domain-specific data (e.g., "all transactions in block X"). +3. **`Indexer`**: Built on `TxnWrapper`, this component organizes blockchain data (blocks, + transactions, addresses, etc.) using **prefix-based keys**. This design allows efficient + iteration and querying of domain-specific data (e.g., "all transactions in block X"). -4. **`Store`**: - The top-level struct coordinating the storage layer: - - **`Indexer`**: Manages indexed blockchain operations. - - **`StateStore`**: Stores raw blockchain state data (blobs) using `TxnWrapper`. +4. **`Store`**: The top-level struct coordinating the storage layer: + - **`Indexer`**: Manages indexed blockchain operations. + - **`StateStore`**: Stores raw blockchain state data (blobs) using `TxnWrapper`. - **`StateCommitStore`**: Uses the `SMT` implementation to cryptographically commit hashes of - `StateStore` data into the Sparse Merkle Tree, ensuring tamper-evident state verification. + `StateStore` data into the Sparse Merkle Tree, ensuring tamper-evident state verification. -## **Key Interactions** +## **Key Interactions** -- **Transactions**: `Txn` provides atomicity for operations across BadgerDB. +- **Transactions**: `Txn` provides atomicity for operations across BadgerDB. - **State Management**: `StateStore` (raw data) and `StateCommitStore` (hashes in SMT) work in - tandem to balance performance with cryptographic integrity. -- **Querying**: The `Indexer`’s prefix-based structure enables fast, type-specific data retrieval. + tandem to balance performance with cryptographic integrity. 
+- **Querying**: The `Indexer`’s prefix-based structure enables fast, type-specific data retrieval. This layered design decouples storage concerns while ensuring compatibility with BadgerDB’s -performance characteristics and the blockchain’s integrity requirements. +performance characteristics and the blockchain’s integrity requirements. -## BadgerDB simple introduction and its usage on TxnWrapper +## Understanding BadgerDB and Transaction Management in Canopy + +**[BadgerDB](https://github.com/hypermodeinc/badger)** is a fast, embeddable, persistent key-value +(KV) database written in pure Go. It's designed to be highly performant for both read and write +operations. + +### Key Features + +1. **LSM Tree-based**: Uses a Log-Structured Merge-Tree architecture, optimized for SSDs +2. **ACID Compliant**: Ensures data consistency through Atomicity, Consistency, Isolation, and + Durability +3. **Concurrent Access**: Supports multiple readers and a single writer simultaneously +4. **Key-Value Separation**: Stores keys and values separately to improve performance +5. **Transactions**: Native support for both read-only and read-write transactions +6. **Iteration**: Provides efficient iteration over key-value pairs that are byte-wise + lexicographically ordered + +### TxnWrapper + +BadgerDB's native transaction system provides atomic operations through its `Txn` type, supporting +both read-only and read-write transactions. Each transaction works with a consistent snapshot of the +database, ensuring data integrity during concurrent operations. + +The `TxnWrapper` in Canopy builds upon this foundation by providing a clean abstraction layer over +BadgerDB's transaction system. It serves two main purposes: + +1. **Interface Compliance**: Implements the `RWStoreI` interface, establishing a consistent contract + for all storage operations within Canopy. This standardization ensures that different components + can interact with the storage layer uniformly. + +2. 
**Transaction Management**: Encapsulates BadgerDB's transaction handling, supporting both + read-only and read-write operations. This wrapper simplifies transaction management by: + - Providing a cleaner API for common database operations + - Extending support of BadgerDB's iterator functionality + - Returning errors in a consistent manner based on the project's error handling conventions + +The main operations `TxnWrapper` is set to support according to the `RWStoreI` interface are: + +- `Get(key []byte) ([]byte, ErrorI)`: Retrieves the value associated with the given key. +- `Set(key, value []byte) ErrorI`: Sets the value for the given key. +- `Delete(key []byte) ErrorI`: Deletes the value associated with the given key. +- `Iterator(prefix []byte) (IteratorI, ErrorI)`: Creates an iterator over byte-wise + lexicographically sorted key-value pairs that start with the given prefix. +- `RevIterator(prefix []byte) (IteratorI, ErrorI)`: Creates a reverse iterator over byte-wise + lexicographically sorted key-value pairs that start with the given prefix. + +Note that all operations within `TxnWrapper` are neither committed nor rolled back directly, as +`TxnWrapper` operates within the broader transaction scope managed by the `Store` struct, which +handles the final commit or rollback decisions. These prefixes are only used internally and never +exposed to the user. + +#### Key prefixing + +All keys in `TxnWrapper` are automatically prefixed with a unique identifier (e.g., "s/" for state +store, "c/" for commitment store) to achieve two main purposes: + +1. **Data Isolation**: Each component (`StateStore`, `StateCommitStore`, `Indexer`) maintains its own + prefix-based namespace, preventing key collisions in the shared BadgerDB instance. + +2. **Efficient Iteration**: Since BadgerDB stores keys in lexicographical order, prefixes enable + efficient range queries within specific components. 
For example, iterating through all state + store entries ("s/...") without touching commitment store data ("c/..."). ## Txn ad hoc nested transactions implementation From 00f4819012030141583ed75efebde5ca3da777f7 Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Thu, 27 Mar 2025 16:48:51 +0100 Subject: [PATCH 05/15] docs: [in progress] smt docs --- store/README.md | 78 +++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 73 insertions(+), 5 deletions(-) diff --git a/store/README.md b/store/README.md index 417fc747fd..37636a8084 100644 --- a/store/README.md +++ b/store/README.md @@ -6,11 +6,11 @@ The `store` package implements the storage layer for the Canopy blockchain, leveraging **[BadgerDB](https://github.com/hypermodeinc/badger)** as its underlying key-value store. It provides structured abstractions for nested transactions, state management, and indexed blockchain -data storage. Below is a high-level breakdown of its components: +data storage. ## **Core Components** -1. **`Txn`**: The foundational transactional layer. This implements **nested transactions** on top +1. **`Txn`**: The foundational transactional layer. This implements ad-hoc **nested transactions** on top of BadgerDB, enabling atomic operations and rollbacks for complex storage workflows. 2. **`TxnWrapper` & `SMT`**: @@ -88,8 +88,7 @@ The main operations `TxnWrapper` is set to support according to the `RWStoreI` i Note that all operations within `TxnWrapper` are neither committed nor rolled back directly, as `TxnWrapper` operates within the broader transaction scope managed by the `Store` struct, which -handles the final commit or rollback decisions. These prefixes are only used internally and never -exposed to the user. +handles the final commit or rollback decisions. #### Key prefixing @@ -103,9 +102,78 @@ store, "c/" for commitment store) to achieve two main purposes: efficient range queries within specific components. 
 For example, iterating through all state store entries ("s/...") without touching commitment store data ("c/..."). +These prefixes are only used internally and never exposed to the user. + ## Txn ad hoc nested transactions implementation -## Canopy's Sparse Merkle Tree Implementation +## Canopy's Optimized Sparse Merkle Tree + +A [Merkle Tree](https://en.wikipedia.org/wiki/Merkle_tree) is a data structure that enables +efficient and secure verification of large data sets. It works by hashing pairs of nodes and +progressively combining them upwards to create a single "root" hash that cryptographically commits +to all the data below it. A Sparse Merkle Tree (SMT) is a specialized variant that efficiently +handles sparse key-value mappings, making it particularly useful for storing state data and proving +the existence or non-existence of keys. + +### Traditional Sparse Merkle Tree + +```mermaid +graph TD + Root --> H0 + Root --> H1 + H0 --> H00 + H0 --> H01 + H1 --> H10 + H1 --> H11 + H00 --> L0[Value 1] + H00 --> L1[Empty] + H01 --> L2[Empty] + H01 --> L3[Empty] + H10 --> L4[Empty] + H10 --> L5[Empty] + H11 --> L6[Empty] + H11 --> L7[Empty] + + classDef highlight fill:#0000ff,stroke:#333,stroke-width:2px; + class Root,H0,H00,L0 highlight; + classDef highlightSiblings fill:#cc2900,stroke:#333,stroke-width:2px; + class H01,H1,L1 highlightSiblings; +``` + +In a traditional Sparse Merkle Tree, as illustrated above, each level splits into two child nodes, +creating a binary tree structure. The blue nodes show the path to `Value 1`, while the red nodes +represent the sibling hashes needed for proof verification. For example, to prove the existence of +`Value 1`, we need: + +1. The hash of `Value 1` itself +2. Its immediate sibling hash (L1) +3. The sibling hash at the next level (H01) +4. 
And finally, H1 to reconstruct the root + +This approach has significant limitations: + +- **Storage overhead**: Even for sparse data, the tree maintains placeholder nodes for empty + branches +- **Computational cost**: Proof generation requires multiple hash computations along the path +- **Scalability challenges**: As the tree grows, both storage and computational requirements + increase exponentially + +These limitations become particularly problematic in blockchain environments where efficiency and +scalability are crucial. Canopy's SMT implementation introduces several optimizations to address +these challenges while maintaining the security properties of traditional Sparse Merkle Trees. + +### Canopy's Sparse Merkle Tree + +Some of the key optimizations of Canopy's SMT are: + +- **Sparse Structure**: Keys are organized by their binary representation, with internal nodes storing + common prefixes to reduce redundant paths + +- **Optimized Traversals**: Operations like insertion, deletion, and lookup focus only on the relevant + parts of the tree, minimizing unnecessary traversal of empty nodes + +- **Key-Value Operations**: Supports upserts and deletions by dynamically creating or removing nodes + while maintaining the Merkle tree structure ## Indexer operations and prefix usage to optimize iterations From 53eea22545516b782c80e89ae7d0937879f19ea4 Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Thu, 27 Mar 2025 17:36:39 +0100 Subject: [PATCH 06/15] feat: [in-progress] smt steps and mermaid inline test --- store/README.md | 90 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 82 insertions(+), 8 deletions(-) diff --git a/store/README.md b/store/README.md index 37636a8084..ece1d6c825 100644 --- a/store/README.md +++ b/store/README.md @@ -136,7 +136,7 @@ graph TD classDef highlight fill:#0000ff,stroke:#333,stroke-width:2px; class Root,H0,H00,L0 highlight; - classDef highlightSiblings fill:#cc2900,stroke:#333,stroke-width:2px; + classDef 
highlightSiblings fill:#ff0000,stroke:#333,stroke-width:2px; class H01,H1,L1 highlightSiblings; ``` @@ -164,16 +164,90 @@ these challenges while maintaining the security properties of traditional Sparse ### Canopy's Sparse Merkle Tree -Some of the key optimizations of Canopy's SMT are: +Canopy's SMT implementation introduces key optimizations to address traditional SMT limitations: -- **Sparse Structure**: Keys are organized by their binary representation, with internal nodes storing - common prefixes to reduce redundant paths +1. **Optimized Node Structure**: + - Nil leaf nodes for empty values + - Parent nodes with single non-nil child are replaced by that child + - A tree starts and always maintains two root children for consistent operations -- **Optimized Traversals**: Operations like insertion, deletion, and lookup focus only on the relevant - parts of the tree, minimizing unnecessary traversal of empty nodes +2. **Efficient Tree Operations**: + - Key organization via binary representation and common prefix storage for internal nodes + - Targeted traversal that only visits relevant tree paths + - Dynamic node creation/deletion with automatic tree restructuring -- **Key-Value Operations**: Supports upserts and deletions by dynamically creating or removing nodes - while maintaining the Merkle tree structure +3. **Space and Performance**: + - Eliminates storage of empty branches + - Reduces hash computation overhead + - Maintains compact tree structure without compromising security + +### Core Algorithm Operations + +1. **Tree Traversal** + - Navigates downward to locate the closest existing node matching the target key's binary path + +2. **Modification Operations** + + a. 
**Upsert (Insert/Update)** + - Updates node directly if the target node matches current position + - Otherwise: + - Creates new parent node using the greatest common prefix between target and current node keys + - Updates old parent's pointer to reference new parent + - Sets current and target as children of new parent + + b. **Delete** + - When target matches current position: + - Removes current node + - Updates grandparent to point to current's sibling + - Removes current's parent node + +3. **ReHash** + - Progressively updates hash values from modified node to root after each operation + - Ensures cryptographic integrity of tree structure + +#### Example operations + +##### Insert 1101 + + +
+ +
+Before + +```mermaid +graph TD + root((root)) --> n0000[0000] + root --> n1[1] + n1 --> n1000[1000] + n1 --> n111[111] + n111 --> n1110[1110] + n111 --> n1111[1111] +``` +
+ +
+After + +```mermaid +graph TD + root((root)) --> n0000[0000] + root --> n1[1] + n1 --> n1000[1000] + n1 --> n11[11] + n11 --> n1101[1101] + n11 --> n111[111] + n111 --> n1110[1110] + n111 --> n1111[1111] + %% n1000 --> n1101[1101] + + classDef highlightParent fill:#0000ff,stroke:#333; + class n11 highlightParent; + classDef highlightNewNode fill:#ff0000,stroke:#333; + class n1101 highlightNewNode; +``` +
+
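The new-parent key in the insert example above ('11', for the keys '1101' and '111') is the greatest common prefix of the two bit strings. Below is a minimal sketch of that computation, assuming keys are written as strings of '0'/'1' characters for readability. `commonPrefix` is a hypothetical helper for illustration and is not part of the `store` package, which keeps keys in its own byte/bit representation.

```go
package main

import "fmt"

// commonPrefix returns the longest leading run of bits shared by a and b.
// Hypothetical helper: keys are modeled as '0'/'1' strings for illustration only.
func commonPrefix(a, b string) string {
	// Only compare up to the length of the shorter key.
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	i := 0
	for i < n && a[i] == b[i] {
		i++
	}
	return a[:i]
}

func main() {
	// Inserting '1101' next to the existing node '111' yields the new parent key.
	fmt.Println(commonPrefix("1101", "111")) // prints "11"
}
```

The same helper reproduces the existing internal node from the "Before" diagram: the keys '1000' and '111' share only the prefix '1'.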
## Indexer operations and prefix usage to optimize iterations From bb059d7b5d8925332b7aaab564181988bf733458 Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Thu, 27 Mar 2025 21:25:24 +0100 Subject: [PATCH 07/15] docs: smt implementation section --- store/README.md | 86 +++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 76 insertions(+), 10 deletions(-) diff --git a/store/README.md b/store/README.md index ece1d6c825..97af91105f 100644 --- a/store/README.md +++ b/store/README.md @@ -203,15 +203,13 @@ Canopy's SMT implementation introduces key optimizations to address traditional 3. **ReHash** - Progressively updates hash values from modified node to root after each operation - - Ensures cryptographic integrity of tree structure + - Ensures cryptographic integrity of tree structure. #### Example operations -##### Insert 1101 - +#### Insert 1101
-
Before @@ -224,8 +222,8 @@ graph TD n111 --> n1110[1110] n111 --> n1111[1111] ``` -
+
After @@ -234,21 +232,89 @@ graph TD root((root)) --> n0000[0000] root --> n1[1] n1 --> n1000[1000] - n1 --> n11[11] - n11 --> n1101[1101] + n1 --> n11[11 *new parent*] + n11 --> n1101[1101 inserted node] n11 --> n111[111] n111 --> n1110[1110] n111 --> n1111[1111] %% n1000 --> n1101[1101] - classDef highlightParent fill:#0000ff,stroke:#333; - class n11 highlightParent; - classDef highlightNewNode fill:#ff0000,stroke:#333; + classDef highlightNewNode fill:#0000ff,stroke:#333; class n1101 highlightNewNode; + classDef highlightRehashedNodes fill:#006622,stroke:#333; + class n1,n11,root highlightRehashedNodes; + +``` + +
+ + +Steps: + +1. **Path Finding**: Navigate down the tree following the binary path of '1101' until reaching the closest existing node ('111') + +2. **Position Check**: Determine that the target node ('1101') doesn't exist at the current position + +3. **Parent Creation**: Form new parent node with key '11' (greatest common prefix between '1101' and '111') + +4. **Restructure**: Update old parent to reference new parent node, making previous node ('111') a child of new parent + +5. **Insert**: Add new node ('1101') as the second child of the new parent, maintaining binary tree properties + +6. **ReHash**: Progressively update hash values from the modified node's parent to the root after each + operation, ensuring cryptographic integrity of the tree structure, as shown in green in the diagram. + +#### Delete 010 + +
+
+Before + +```mermaid +graph TD; + root((root)) --> n0["0"] + root --> n1["1"] + n0 --> n000["000"] + n0 --> n010["010"] + n1 --> n101["101"] + n1 --> n111["111"] + + classDef highlightDeleteNode fill:#ff0000,stroke:#333; + class n010 highlightDeleteNode; +``` + +
+
+After + +```mermaid +graph TD; + root((root)) --> n000["000 new parent"] + root --> n1["1"] + n1 --> n101["101"] + n1 --> n111["111"] + + classDef highlightNewNode fill:#0000ff,stroke:#333; + class n000 highlightNewNode; + classDef highlightRehashedNodes fill:#006622,stroke:#333; + class root highlightRehashedNodes; ``` +
+Steps: + +1. **Path Finding**: Navigate down the tree following the binary path of '010' until reaching the + target node + +2. **Node Removal**: Remove target node ('010') from tree structure + +3. **Parent Update**: Replace parent node '0' in grandparent with target's sibling '000' + +4. **Tree Rehash**: Recalculate hash values upward from the parent of '000' to the root (in this case, that + parent is the root itself) to maintain integrity, as shown in green in the diagram. + ## Indexer operations and prefix usage to optimize iterations ## Store struct and how it adds up all together From c4d7ea3655a896c65793bf05a3fd4186a1602a85 Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Tue, 1 Apr 2025 17:46:01 +0200 Subject: [PATCH 08/15] docs: add section for the sparse merkle tree's proof generation/verification proces --- store/README.md | 50 +++++++++++++++++++++++++++++++++++++++++++++++++ store/smt.go | 17 +++++++++-------- 2 files changed, 59 insertions(+), 8 deletions(-) diff --git a/store/README.md b/store/README.md index 97af91105f..6183714284 100644 --- a/store/README.md +++ b/store/README.md @@ -315,6 +315,56 @@ Steps: 4. **Tree Rehash**: Recalculate hash values upward from the parent of '000' to the root (in this case, that parent is the root itself) to maintain integrity, as shown in green in the diagram. +### Proof Generation and verification + +Canopy's SMT implementation supports both proof-of-membership and proof-of-non-membership through +the `GetMerkleProof` and `VerifyProof` methods. These proofs enable verification of whether a +specific key-value pair exists in the tree without requiring access to the complete tree data. + +#### Proof Generation (`GetMerkleProof`) + +The proof generation process constructs an ordered path from the target leaf node (or the closest +existing node where the key would reside if present) back to the root, by including every sibling +node encountered along each branch of this traversal. + +1.
**Initial Node**: The first element in the proof is always the target node (for membership + proofs) or the node at the potential location (for non-membership proofs) + +2. **Sibling Collection**: For each level from the leaf to the root: + 1. Record the sibling node's key and value + 2. Store the sibling's position (left=0 or right=1) in the bitmask + - These siblings are essential for reconstructing parent hashes + +#### Proof Verification (`VerifyProof`) + +The verification process reconstructs the root hash using the provided proof and compares it with +the known root, if the hashes match, the resulting tree is then used in order to verify the +existence of the requested key-value pair. As a side note, unlike traditional sparse merkle trees, +both the key and value are required in order to generate the parent's hash: + +1. **Initial Setup**: + - Create a temporary in-memory tree + - Add the first proof node (target/potential location) + - The first proof node's key and value hash are then used to build the parent of the upcoming node + +2. **Hash Reconstruction**: + - For each sibling in the proof: + - Use the bitmask to determine sibling position + - Combine the current hash with the sibling's hash + - Compute the parent hash using the same function as the main tree + - Compute the parent's key by calculating the Greatest Common Prefix (GCP) of the current + node's key and the sibling's key + - Save the resulting parent node in the temporary tree + +3. **Root Hash Validation**: + - Compare reconstructed root hash with provided root + - If the hashes do not match, the proof is invalid and the verification fails + +4. 
**Final Verification**: + - Traverse the temporary tree to verify the existence of the requested key-value pair + - For membership proofs: Verifies target exists and value matches + - For non-membership proofs: Confirms target doesn't exist at expected position + ## Indexer operations and prefix usage to optimize iterations ## Store struct and how it adds up all together diff --git a/store/smt.go b/store/smt.go index 860f413e73..3a320ebabe 100644 --- a/store/smt.go +++ b/store/smt.go @@ -443,9 +443,10 @@ func (s *SMT) GetMerkleProof(k []byte) ([]*lib.Node, lib.ErrorI) { Key: s.current.Key.bytes(), Value: s.current.Value, }) - // Add current to the list of traversed nodes until the actual node - // in case of proof of membership. for proof of non membembershio, the - // possible location of the node is added instead + // Add current to the list of traversed nodes. For membership proofs, traversed nodes include the + // path to the target node. For non-membership proofs, the potential insertion location is + // included instead, this is used for proof verification as the binary key (required for parent + // hash calculation) is not externally known. 
s.traversed.Nodes = append(s.traversed.Nodes, s.current.copy()) // traverse the nodes back up to the root to generate the proof for i := len(s.traversed.Nodes) - 1; i > 0; i-- { @@ -857,13 +858,13 @@ func (x *node) replaceChild(oldKey, newKey []byte) { func (x *node) copy() *node { return &node{ Key: &key{ - mostSigBytes: append([]byte(nil), x.Key.mostSigBytes...), - leastSigBits: append([]int(nil), x.Key.leastSigBits...), + mostSigBytes: slices.Clone(x.Key.mostSigBytes), + leastSigBits: slices.Clone(x.Key.leastSigBits), }, Node: lib.Node{ - Value: append([]byte(nil), x.Value...), - LeftChildKey: append([]byte(nil), x.LeftChildKey...), - RightChildKey: append([]byte(nil), x.RightChildKey...), + Value: slices.Clone(x.Value), + LeftChildKey: slices.Clone(x.LeftChildKey), + RightChildKey: slices.Clone(x.RightChildKey), }, } } From 6ec8c69236ac0ea2e67b06ba1c7458e6e64a7da8 Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Wed, 2 Apr 2025 19:59:35 +0200 Subject: [PATCH 09/15] docs: expand storage documentation with core concepts --- store/README.md | 123 +++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 105 insertions(+), 18 deletions(-) diff --git a/store/README.md b/store/README.md index 6183714284..61ee19f446 100644 --- a/store/README.md +++ b/store/README.md @@ -6,7 +6,111 @@ The `store` package implements the storage layer for the Canopy blockchain, leveraging **[BadgerDB](https://github.com/hypermodeinc/badger)** as its underlying key-value store. It provides structured abstractions for nested transactions, state management, and indexed blockchain -data storage. +data storage. This document aims to provide a comprehensive overview of the package's core +components and their functionalities while also introducing most of the concepts used in the package +and builds up each component step by step to form a complete storage solution. 
+ +## Basic Concepts + +### Persistent Storage + +Persistent storage refers to data storage mechanisms where information persists after power loss or +process termination, unlike volatile memory (RAM) where data is temporary. + +Persistent storage systems must balance performance, durability, and consistency requirements. While +RAM provides fast access but loses data on power loss, storage devices like SSDs and HDDs offer +permanence at the cost of slower access speeds. This tradeoff is particularly important in +blockchain systems where transaction history and state changes must be both quickly accessible and +permanently stored while maintaining data integrity across system restarts. + +### Key-Value Storage + +A key-value storage is like a dictionary where unique keys can be associated with values. It +provides a fast and efficient way to store and retrieve data. + +Examples: + +- Key: "user_123" → Value: `{name: "John", balance: 100}` +- Key: "settings" → Value: `{theme: "dark", notifications: "on"}` + +Benefits: + +- Fast lookups ([O(1)](https://en.wikipedia.org/wiki/Time_complexity) in ideal cases) +- Simple to understand and use +- Flexible value storage (can store any type of data) + +### Transactions + +A transaction is a group of operations that must either all succeed or all fail together. A good +analogy is like transferring money between two bank accounts. + +```mermaid +graph LR + A[Account A: $100
Account B: $0] --> B[A Transfers $50 to B] --> C[Account B: $50
Account A: $50] +``` + +If anything fails during the transfer, both accounts should return to their original state. As +clients would be pretty upset if they lost their money. + +#### Transaction Properties (ACID) + +1. **Atomicity**: All operations in a transaction either succeed or fail together + - Example: In a money transfer, both the withdrawal and deposit must succeed together + +2. **Consistency**: Data remains valid before and after the transaction + - Example: Total money in all accounts must remain the same after transfers + +3. **Isolation**: Multiple transactions don't interfere with each other + - Example: Two people withdrawing money simultaneously shouldn't cause conflicts + +4. **Durability**: Once a transaction is committed, changes are permanent + - Example: After confirming a transfer, the new balances persist even if the system crashes + +#### Nested Transactions + +Nested transactions behave like a set of Russian dolls, where each transaction sits inside another: + +```mermaid +graph TD + A[Main Transaction] --> B[Sub-Transaction 1] + A --> C[Sub-Transaction 2] + B --> D[Sub-Sub-Transaction] +``` + +Benefits: + +- More granular control over operations +- Ability to rollback partial operations +- Better organization of complex operations + +### Why BadgerDB? + +**[BadgerDB](https://github.com/hypermodeinc/badger)** is a fast, embeddable, persistent key-value +(KV) database written in pure Go. It's designed to be highly performant for both read and write +operations. + +#### Key Features + +1. **LSM Tree-based**: Uses a [Log-Structured Merge-Tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree) architecture, optimized for SSDs +2. **ACID Compliant**: Ensures data consistency through Atomicity, Consistency, Isolation, and + Durability +3. **Concurrent Access**: Supports multiple readers and a single writer simultaneously +4. **Key-Value Separation**: Stores keys and values separately to improve performance +5. 
**Transactions**: Native support for both read-only and read-write transactions +6. **Iteration**: Provides efficient iteration over key-value pairs that are byte-wise + lexicographically ordered + +### Putting It All Together + +In Canopy's storage system: + +1. BadgerDB provides the persistent key-value store +2. Transactions ensure data consistency +3. Nested transactions enable complex operations +4. Key-value pairs store blockchain state and data + +This foundation helps understand the more complex features that will be discussed in the following +sections. ## **Core Components** @@ -40,23 +144,6 @@ data storage. This layered design decouples storage concerns while ensuring compatibility with BadgerDB’s performance characteristics and the blockchain’s integrity requirements. -## Understanding BadgerDB and Transaction Management in Canopy - -**[BadgerDB](https://github.com/hypermodeinc/badger)** is a fast, embeddable, persistent key-value -(KV) database written in pure Go. It's designed to be highly performant for both read and write -operations. - -### Key Features - -1. **LSM Tree-based**: Uses a Log-Structured Merge-Tree architecture, optimized for SSDs -2. **ACID Compliant**: Ensures data consistency through Atomicity, Consistency, Isolation, and - Durability -3. **Concurrent Access**: Supports multiple readers and a single writer simultaneously -4. **Key-Value Separation**: Stores keys and values separately to improve performance -5. **Transactions**: Native support for both read-only and read-write transactions -6. 
**Iteration**: Provides efficient iteration over key-value pairs that are byte-wise - lexicographically ordered - ### TxnWrapper BadgerDB's native transaction system provides atomic operations through its `Txn` type, supporting From 2da51eb146bf8ddb6450688308e878b7835ee648 Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Thu, 3 Apr 2025 21:27:19 +0200 Subject: [PATCH 10/15] docs: rewrite into, add hashing section, rewrite core components section --- store/README.md | 133 ++++++++++++++++++++++++++++++++++++------------ 1 file changed, 100 insertions(+), 33 deletions(-) diff --git a/store/README.md b/store/README.md index 61ee19f446..2a3b60acf1 100644 --- a/store/README.md +++ b/store/README.md @@ -3,15 +3,35 @@ [![Go Reference](https://pkg.go.dev/badge/github.com/canopy/canopy-network.svg)](https://pkg.go.dev/github.com/canopy-network/canopy/store) [![License](https://img.shields.io/github/license/canopy-network/canopy)](https://github.com/canopy/canopy-network/blob/main/LICENSE) -The `store` package implements the storage layer for the Canopy blockchain, leveraging -**[BadgerDB](https://github.com/hypermodeinc/badger)** as its underlying key-value store. It -provides structured abstractions for nested transactions, state management, and indexed blockchain -data storage. This document aims to provide a comprehensive overview of the package's core -components and their functionalities while also introducing most of the concepts used in the package -and builds up each component step by step to form a complete storage solution. +The store package serves as the storage layer for the Canopy blockchain, utilizing +**[BadgerDB](https://github.com/hypermodeinc/badger)** as its underlying key-value store. It offers +structured abstractions for nested transactions, state management, and indexed blockchain data +storage. 
This document provides a comprehensive overview of the package’s core components and their +functionalities, introducing key concepts while progressively building each component into a +complete storage solution. ## Basic Concepts +### Hashing + +Hashing is the process of converting data into an unique "fingerprint". A hash function converts any +data into a fixed-size string of characters, making it useful for data verification and storage. + +```mermaid +graph LR + A["Hello World"] -->|Hash Function| B["68e109f..."] + C["hello world"] -->|Hash Function| D["5eb63b..."] +``` + +Key Properties: + +- **Same Input, Same Hash**: "Hello" always produces the same hash +- **Small Changes, Different Hash**: "Hello" and "hello" produce completely different hashes +- **One-way**: You can't recreate the original data from its hash +- **Fixed Size**: All hashes have the same length, regardless of input size + +In blockchain, hashing is used to create unique IDs for transactions, verify data hasn't changed, and link blocks together. + ### Persistent Storage Persistent storage refers to data storage mechanisms where information persists after power loss or @@ -46,7 +66,7 @@ analogy is like transferring money between two bank accounts. ```mermaid graph LR - A[Account A: $100
Account B: $0] --> B[A Transfers $50 to B] --> C[Account B: $50
Account A: $50] + A[Account A: $100
Account B: $0] --> B[A Transfers $50 to B] --> C[Account A: $50
Account B: $50] ``` If anything fails during the transfer, both accounts should return to their original state. As @@ -91,12 +111,15 @@ operations. #### Key Features -1. **LSM Tree-based**: Uses a [Log-Structured Merge-Tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree) architecture, optimized for SSDs +1. **LSM Tree-based**: Uses a + [Log-Structured Merge-Tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree) + architecture, optimized for SSDs 2. **ACID Compliant**: Ensures data consistency through Atomicity, Consistency, Isolation, and Durability 3. **Concurrent Access**: Supports multiple readers and a single writer simultaneously 4. **Key-Value Separation**: Stores keys and values separately to improve performance -5. **Transactions**: Native support for both read-only and read-write transactions +5. **Transactions**: Native support for both read-only and read-write transactions. Every action in + BadgerDB happens within a transaction 6. **Iteration**: Provides efficient iteration over key-value pairs that are byte-wise lexicographically ordered @@ -108,41 +131,85 @@ In Canopy's storage system: 2. Transactions ensure data consistency 3. Nested transactions enable complex operations 4. Key-value pairs store blockchain state and data +5. Blockchain data is frequently stored hashed to improve security This foundation helps understand the more complex features that will be discussed in the following sections. -## **Core Components** +## **Store Package Components** + +The store package is built from several key components that work together, like building blocks, to +create a complete storage system. It could be represented like a well-organized filing cabinet, +where each component has a specific job in managing and storing data. Each component, from the +simplest to the most complex, is described as follows: + +1.
**TxnWrapper**: It is a Wrapper around BadgerDB operations: + - Makes sure all database operations follow the [`RWStoreI`](../lib/store.go) interface + - Handles basic operations like storing and retrieving data + - Allows iteration between ranges of data + - Works like one of the translators between BadgerDB and the rest of Canopy + +2. **`Txn`**: The transaction manager + - Ensures that a group of operations happens together or not at all (transactions) + - Enhances BadgerDB by implementing in-memory nested transactions, which allow groups of + operations (like multiple reads/writes) to be written or discarded within a single BadgerDB transaction + - Follows the [`RWStoreI`](../lib/store.go) interface to interact with the database + +3. **`SMT`** (Sparse Merkle Tree): A special data structure for proving data existence + - Organizes data in a tree-like structure + - Makes it easy to prove whether data exists or doesn't exist + - Uses smart optimizations to save space and work faster + - Just like `TxnWrapper`, it also follows the [`RWStoreI`](../lib/store.go) interface and + leverages `TxnWrapper` itself in order to interact with the database + +4. **`Indexer`**: The filing system + - Organizes blockchain data (like blocks and transactions) in an easy-to-find way + - Uses `TxnWrapper` under the hood to save the data in BadgerDB + - Uses prefixes (like labels on filing cabinets) to group related data + - Makes searching through data fast and efficient + +5. **`Store`**: The main coordinator + - Brings all other components together + - Contains three main parts: + - `Indexer`: Organizes and indexes blockchain data + - `StateStore`: Stores the actual blockchain data using `TxnWrapper` under the hood + - `StateCommitStore`: Stores hashes of the data using `SMT` under the hood + - Is the only component that commits to the database directly + +## **Key Interactions** + +```mermaid +graph TD + Store[Store] --> Indexer[Indexer
Indexed data] + Store --> StateStore[StateStore
Raw Data] + Store --> StateCommitStore[StateCommitStore
Hash Data] -1. **`Txn`**: The foundational transactional layer. This implements ad-hoc **nested transactions** on top - of BadgerDB, enabling atomic operations and rollbacks for complex storage workflows. + StateStore --> TxnWrapper[TxnWrapper] + StateCommitStore --> SMT[SMT
Sparse Merkle Tree] + Indexer --> TxnWrapper -2. **`TxnWrapper` & `SMT`**: - - **`TxnWrapper`**: Wraps BadgerDB to conform to the `RWStoreI` interface, providing a simple - read/write abstraction with transaction support. - - **`SMT`**: An optimized Sparse Merkle Tree implementation backed by BadgerDB. It adheres to the - `RWStoreI` interface and enables efficient cryptographic commitment to state data and proof of - membership/non-membership of the keys. + SMT --> TxnWrapper -3. **`Indexer`**: Built on `TxnWrapper`, this component organizes blockchain data (blocks, - transactions, addresses, etc.) using **prefix-based keys**. This design allows efficient - iteration and querying of domain-specific data (e.g., "all transactions in block X"). + TxnWrapper --> BadgerDB[BadgerDB] + Store --> BadgerDB +``` -4. **`Store`**: The top-level struct coordinating the storage layer: - - **`Indexer`**: Manages indexed blockchain operations. - - **`StateStore`**: Stores raw blockchain state data (blobs) using `TxnWrapper`. - - **`StateCommitStore`**: Uses the `SMT` implementation to cryptographically commit hashes of - `StateStore` data into the Sparse Merkle Tree, ensuring tamper-evident state verification. +1. The `Store` manages three main components: + - `Indexer` (for indexed blockchain data) + - `StateStore` (for raw data) + - `StateCommitStore` (for hash data) + - Commits directly to the database the data on all the components -## **Key Interactions** +2. `TxnWrapper` acts as a bridge to BadgerDB: + - Both `StateStore` and `Indexer` use it directly + - `SMT` also uses it for data storage + - Provides consistent way to interact with the database -- **Transactions**: `Txn` provides atomicity for operations across BadgerDB. -- **State Management**: `StateStore` (raw data) and `StateCommitStore` (hashes in SMT) work in - tandem to balance performance with cryptographic integrity. 
-- **Querying**: The `Indexer`’s prefix-based structure enables fast, type-specific data retrieval. +3. `BadgerDB` serves as the foundation: + - All data ultimately gets stored here + - Everything flows through `TxnWrapper` to reach BadgerDB -This layered design decouples storage concerns while ensuring compatibility with BadgerDB’s -performance characteristics and the blockchain’s integrity requirements. +This architecture ensures each component handles its specific tasks while maintaining consistent data storage and access patterns. ### TxnWrapper From 92e457a48d59a71015f7802809b2f1f1adddd3bc Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Tue, 8 Apr 2025 19:33:27 +0200 Subject: [PATCH 11/15] temp --- store/README.md | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/store/README.md b/store/README.md index 2a3b60acf1..f8bf23cc44 100644 --- a/store/README.md +++ b/store/README.md @@ -143,7 +143,7 @@ create a complete storage system. It could be represented like a well-organized where each component has a specific job in managing and storing data. Each component, from the simplest to the most complex, is described as follows: -1. **TxnWrapper**: It is a Wrapper around BadgerDB operations: +1. **`TxnWrapper`**: A wrapper around BadgerDB operations: - Makes sure all database operations follow the [`RWStoreI`](../lib/store.go) interface - Handles basic operations like storing and retrieving data - Allows iteration between ranges of data @@ -265,9 +265,16 @@ These prefixes are only used internally and never exposed to the user. A [Merkle Tree](https://en.wikipedia.org/wiki/Merkle_tree) is a data structure that enables efficient and secure verification of large data sets. It works by hashing pairs of nodes and progressively combining them upwards to create a single "root" hash that cryptographically commits -to all the data below it.
A Sparse Merkle Tree (SMT) is a specialized variant that efficiently -handles sparse key-value mappings, making it particularly useful for storing state data and proof -the existence or non-existence of keys. +to all the data below it. + +A Sparse Merkle Tree (SMT) extends this concept by efficiently handling sparse key-value mappings - +meaning it's optimized for situations where only a small subset of possible keys are actually used, +such as in blockchain state storage like this module. This optimization is crucial because a +traditional Merkle tree would require storing all possible keys, even empty ones, making it +impractical for large key spaces. + +Instead, a SMT allows us to both store and prove the existence (or non-existence) of key-value pairs +while maintaining a minimal memory footprint and providing efficient verification capabilities. ### Traditional Sparse Merkle Tree @@ -279,14 +286,14 @@ graph TD H0 --> H01 H1 --> H10 H1 --> H11 - H00 --> L0[Value 1] - H00 --> L1[Empty] - H01 --> L2[Empty] - H01 --> L3[Empty] - H10 --> L4[Empty] - H10 --> L5[Empty] - H11 --> L6[Empty] - H11 --> L7[Empty] + H00 --> L0[L0 - Value 1] + H00 --> L1[L1 - Empty] + H01 --> L2[L2 - Empty] + H01 --> L3[L3 - Empty] + H10 --> L4[L4 - Empty] + H10 --> L5[L5 - Empty] + H11 --> L6[L6 - Empty] + H11 --> L7[L7 - Empty] classDef highlight fill:#0000ff,stroke:#333,stroke-width:2px; class Root,H0,H00,L0 highlight; From 34fd13333d036107dd4295a046778c41d7ef2ffb Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Wed, 9 Apr 2025 21:05:30 +0200 Subject: [PATCH 12/15] docs: remove basic concepts section, add section for indexer --- store/README.md | 204 ++++++++++++++++++++++++------------------------ 1 file changed, 102 insertions(+), 102 deletions(-) diff --git a/store/README.md b/store/README.md index f8bf23cc44..11ecb2ab7b 100644 --- a/store/README.md +++ b/store/README.md @@ -10,111 +10,19 @@ storage. 
This document provides a comprehensive overview of the package’s core functionalities, introducing key concepts while progressively building each component into a complete storage solution. -## Basic Concepts - -### Hashing - -Hashing is the process of converting data into an unique "fingerprint". A hash function converts any -data into a fixed-size string of characters, making it useful for data verification and storage. - -```mermaid -graph LR - A["Hello World"] -->|Hash Function| B["68e109f..."] - C["hello world"] -->|Hash Function| D["5eb63b..."] -``` - -Key Properties: - -- **Same Input, Same Hash**: "Hello" always produces the same hash -- **Small Changes, Different Hash**: "Hello" and "hello" produce completely different hashes -- **One-way**: You can't recreate the original data from its hash -- **Fixed Size**: All hashes have the same length, regardless of input size - -In blockchain, hashing is used to create unique IDs for transactions, verify data hasn't changed, and link blocks together. - -### Persistent Storage - -Persistent storage refers to data storage mechanisms where information persists after power loss or -process termination, unlike volatile memory (RAM) where data is temporary. - -Persistent storage systems must balance performance, durability, and consistency requirements. While -RAM provides fast access but loses data on power loss, storage devices like SSDs and HDDs offer -permanence at the cost of slower access speeds. This tradeoff is particularly important in -blockchain systems where transaction history and state changes must be both quickly accessible and -permanently stored while maintaining data integrity across system restarts. - -### Key-Value Storage - -A key-value storage is like a dictionary where unique keys can be associated with values. It -provides a fast and efficient way to store and retrieve data. 
- -Examples: - -- Key: "user_123" → Value: `{name: "John", balance: 100}` -- Key: "settings" → Value: `{theme: "dark", notifications: "on"}` - -Benefits: - -- Fast lookups ([O(1)](https://en.wikipedia.org/wiki/Time_complexity) in ideal cases) -- Simple to understand and use -- Flexible value storage (can store any type of data) - -### Transactions - -A transaction is a group of operations that must either all succeed or all fail together. A good -analogy is like transferring money between two bank accounts. - -```mermaid -graph LR - A[Account A: $100
Account B: $0] --> B[A Transfers $50 to B] --> C[Account A: $50
Account B: $50] -``` - -If anything fails during the transfer, both accounts should return to their original state. As -clients would be pretty upset if they lost their money. - -#### Transaction Properties (ACID) - -1. **Atomicity**: All operations in a transaction either succeed or fail together - - Example: In a money transfer, both the withdrawal and deposit must succeed together - -2. **Consistency**: Data remains valid before and after the transaction - - Example: Total money in all accounts must remain the same after transfers - -3. **Isolation**: Multiple transactions don't interfere with each other - - Example: Two people withdrawing money simultaneously shouldn't cause conflicts - -4. **Durability**: Once a transaction is committed, changes are permanent - - Example: After confirming a transfer, the new balances persist even if the system crashes - -#### Nested Transactions - -Nested transactions behave like a set of Russian dolls, where each transaction sits inside another: - -```mermaid -graph TD - A[Main Transaction] --> B[Sub-Transaction 1] - A --> C[Sub-Transaction 2] - B --> D[Sub-Sub-Transaction] -``` - -Benefits: - -- More granular control over operations -- Ability to rollback partial operations -- Better organization of complex operations - -### Why BadgerDB? +## Why BadgerDB? **[BadgerDB](https://github.com/hypermodeinc/badger)** is a fast, embeddable, persistent key-value (KV) database written in pure Go. It's designed to be highly performant for both read and write +operations and is the underlying database used by the Canopy Blockchain for all of its persistence operations. -#### Key Features +### Key Features 1. **LSM Tree-based**: Uses a [Log-Structured Merge-Tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree) architecture, optimized for SSDs -2. **ACID Compliant**: Ensures data consistency through Atomicity, Consistency, Isolation, and +2.
**[ACID](https://en.wikipedia.org/wiki/ACID) Compliant**: Ensures data consistency through Atomicity, Consistency, Isolation, and Durability 3. **Concurrent Access**: Supports multiple readers and a single writer simultaneously 4. **Key-Value Separation**: Stores keys and values separately to improve performance @@ -123,18 +31,15 @@ operations. 6. **Iteration**: Provides efficient iteration over key-value pairs that are byte-wise lexicographically ordered -### Putting It All Together +### Overview of Canopy's store module -In Canopy's storage system: +In Canopy's store module: 1. BadgerDB provides the persistent key-value store 2. Transactions ensure data consistency 3. Nested transactions enable complex operations 4. Key-value pairs store blockchain state and data -5. Blockchain data is frequently stored hashed to improve security - -This foundation helps understand the more complex features that will be discussed in the following -sections. +5. Blockchain data is frequently stored [hashed](https://en.wikipedia.org/wiki/Hash_function) to improve security ## **Store Package Components** @@ -528,4 +433,99 @@ both the key and value are required in order to generate the parent's hash: ## Indexer operations and prefix usage to optimize iterations +The Indexer component serves as an organized filing system for blockchain data, using prefix-based +storage patterns to enable efficient data retrieval and iteration. It manages four primary types of +data: transactions, blocks, quorum certificates, and checkpoints. + +### Prefix-Based Storage Structure + +The Indexer uses unique prefix bytes to segregate different types of data: + +```go +var ( + txHashPrefix = []byte{1} // Transaction by hash + txHeightPrefix = []byte{2} // Transactions by height + txSenderPrefix = []byte{3} // Transactions from sender + // and more... +) +``` + +This prefix system creates distinct "namespaces" in the database, allowing for: + +1. Data isolation between different types +2. 
Efficient range queries within specific data types +3. Prevention of key collisions + +### Key Operations + +#### 1. Transaction Indexing + +```mermaid +graph TD + TX[Transaction] --> TxHash[By Hash
prefix: 1] + TX --> TxHeight[By Height
prefix: 2] + TX --> TxSender[By Sender
prefix: 3] + TX --> TxRecipient[By Recipient
prefix: 4] +``` + +- **Multiple Access Patterns**: Each transaction is indexed in four ways: + - By hash: Direct lookup + - By height: Group transactions in same block + - By sender: Find all transactions from an address + - By recipient: Find all transactions to an address + +#### 2. Block Indexing + +```mermaid +graph TD + Block[Block] --> BlockHash[By Hash
prefix: 5] + Block --> BlockHeight[By Height
prefix: 6] + BlockHeight --> Txs[Associated Transactions
prefix: 2] +``` + +- **Dual Indexing**: Blocks are indexed by both: + - Hash: For direct lookups + - Height: For chronological access +- **Associated Data**: Links to related transactions using height references + +#### 3. Special Purpose Indices + +- **Quorum Certificates**: Indexed by height for consensus validation +- **Double Signers**: Track validator misbehavior +- **Checkpoints**: Store chain security checkpoints + +### Optimized Iteration Patterns + +The Indexer leverages BadgerDB's lexicographical ordering to implement iterations: + +1. **Forward/Reverse Iteration**: + +```go +// Example: Get transactions newest to oldest +it, err := indexer.db.RevIterator(txHeightPrefix) +// Example: Get transactions oldest to newest +it, err := indexer.db.Iterator(txHeightPrefix) +``` + +While the Indexer supports iteration capabilities, this approach should be used cautiously due to +its performance overhead. When dealing with large datasets, it is strongly recommended to retrieve +multiple elements in a single bulk operation and unmarshal them collectively, rather than performing +individual iterations and unmarshalling operations, as this pattern has proven to be significantly +more performant in practice. + +### Key Encoding Strategy + +The Indexer uses the following key encoding strategy: + +1. **Big-Endian Height Encoding**: + - Ensures proper lexicographical ordering + - Enables range queries by height + +2. **Length-Prefixed Keys**: + - Prevents key collision + - Maintains clear separation between key components + - Enables prefix scanning + +This structured approach to data indexing and storage enables efficient querying and iteration over blockchain data while maintaining data integrity and accessibility. 
+ ## Store struct and how it adds up all together From ad62fb0b2036475daee10d03b5ce26725a8e154f Mon Sep 17 00:00:00 2001 From: Roniel Valdez Date: Tue, 29 Apr 2025 18:55:52 +0200 Subject: [PATCH 13/15] docs: store wrap up section --- store/README.md | 64 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 63 insertions(+), 1 deletion(-) diff --git a/store/README.md b/store/README.md index 11ecb2ab7b..0fcd704325 100644 --- a/store/README.md +++ b/store/README.md @@ -528,4 +528,66 @@ The Indexer uses the following key encoding strategy: This structured approach to data indexing and storage enables efficient querying and iteration over blockchain data while maintaining data integrity and accessibility. -## Store struct and how it adds up all together +## `Store` struct: Putting it all together + +The `Store` struct serves as the central coordinator for Canopy's storage system, integrating all previously described components into a cohesive whole. It provides a clean, high-level API that abstracts away the complexity of the underlying storage components. + +## Core Integration + +The `Store` struct brings together three main components: + +1. **State Store**: Manages the actual data blobs representing blockchain state +2. **State Commit Store**: Maintains a Sparse Merkle Tree of state hashes +3. **Indexer**: Organizes blockchain elements for efficient retrieval + +## Atomic Operations + +The `Store` ensures atomicity through BadgerDB's transaction system: + +1. All operations across components occur within a single BadgerDB transaction +2. The `Commit()` method finalizes changes with a single atomic write +3. Failed operations are automatically rolled back to maintain data integrity + +### Latest State and Historical State Stores + +Canopy separates the data of the state into two types of stores: the Latest State Store (LSS) and the Historical State Store (HSS). 
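The subsections that follow give each store its own key prefix: `s/` for the LSS and `h/{partition_height}/` for the HSS. A rough sketch of how such keys could be formed is shown here. Everything in it is an assumption made for illustration: the helper names, the decimal rendering of the partition height, the example key `account/abc`, and the rule that a height belongs to the partition obtained by rounding down to a multiple of `partitionFrequency`. The real encoding in the package may differ.

```go
package main

import "fmt"

// partitionFrequency uses the example value from the text (10,000 blocks).
const partitionFrequency uint64 = 10000

// partitionHeight rounds a height down to the start of its partition.
// The rounding rule is an assumption made for this sketch.
func partitionHeight(height uint64) uint64 {
	return height - height%partitionFrequency
}

// lssKey places a key in the Latest State Store namespace.
func lssKey(key string) string {
	return "s/" + key
}

// hssKey places a key in the Historical State Store partition covering height.
func hssKey(height uint64, key string) string {
	return fmt.Sprintf("h/%d/%s", partitionHeight(height), key)
}

func main() {
	fmt.Println(lssKey("account/abc"))        // prints s/account/abc
	fmt.Println(hssKey(12345, "account/abc")) // prints h/10000/account/abc
}
```

Keeping every partition under its own prefix is what makes pruning safe: dropping an old partition amounts to deleting one key range without touching the latest state.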
+ +#### Latest State Store (LSS) + +The Latest State Store maintains the most current version of all state data, using the prefix `s/` for all keys. It: + +1. **Provides Fast Access**: Optimized for frequent read/write operations on current state (i.e., latest heights) +2. **Supports Iteration**: Enables efficient traversal of current state data +3. **Maintains Versioning**: Uses BadgerDB's version capabilities to track state changes + +#### Historical State Store (HSS) + +The Historical State Store maintains historical state data in partition-based snapshots, using the prefix `h/{partition_height}/` (e.g., `h/10000/`). It: + +1. **Preserves History**: Maintains complete state snapshots at regular intervals +2. **Enables Time Travel**: Allows querying state at any historical block height +3. **Supports Efficient Queries**: Optimized for historical data retrieval +4. **Supports Safe Pruning**: Partitioning enables safe deletion of older state data + +#### Partitioning Strategy + +The partitioning approach works as follows: + +1. **Partition Creation**: At regular intervals (every `partitionFrequency` blocks, e.g., 10,000), a complete state snapshot is created in a new HSS partition +2. **Complete Snapshots**: Each partition contains the full state as it existed at that height +3. **Automatic Switching**: The `Store` automatically determines whether to use LSS or HSS based on the query height + +## Versioning and State Roots + +The Store maintains two critical pieces of information: + +1. **Version**: The current blockchain height +2. **Root**: The Merkle root hash of the current state + +These are tracked in the `CommitIDStore`, enabling: + +1. **State Verification**: Other nodes can verify state consistency using just the root hash +2. **Historical Access**: Previous states can be accessed by version number +3. 
**Synchronization**: New nodes can verify they've correctly synchronized state + +This comprehensive approach to storage management provides Canopy with a robust, efficient, and secure foundation for blockchain state handling, enabling both high-performance current operations and flexible historical data access. From 03d874006dd8998c00f960ea441bd13ca53982d3 Mon Sep 17 00:00:00 2001 From: Pablo Ocampo Date: Thu, 1 May 2025 02:56:01 -0400 Subject: [PATCH 14/15] feat: add ## Txn: Ad Hoc Nested Transactions Implementation --- store/README.md | 103 +++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 101 insertions(+), 2 deletions(-) diff --git a/store/README.md b/store/README.md index 0fcd704325..6f68d26da3 100644 --- a/store/README.md +++ b/store/README.md @@ -6,7 +6,7 @@ The store package serves as the storage layer for the Canopy blockchain, utilizing **[BadgerDB](https://github.com/hypermodeinc/badger)** as its underlying key-value store. It offers structured abstractions for nested transactions, state management, and indexed blockchain data -storage. This document provides a comprehensive overview of the package’s core components and their +storage. This document provides a comprehensive overview of the package's core components and their functionalities, introducing key concepts while progressively building each component into a complete storage solution. @@ -163,7 +163,106 @@ store, "c/" for commitment store) to achieve two main purposes: These prefixes are only used internally and never exposed to the user. -## Txn ad hoc nested transactions implementation +## Txn: Ad Hoc Nested Transactions Implementation + +The `Txn` type provides a transaction-like interface for the store, allowing for atomic operations and rollbacks. This is particularly useful for testing block proposals and managing ephemeral states. + +### Core Components + +#### Txn Structure +The `Txn` type consists of three main structures: + +1. 
**Txn**: The main transaction type that embeds both the parent store interface and internal transaction state +2. **txn**: Internal state structure containing: + - A map of pending operations (set/delete) + - A sorted list of keys for efficient iteration + - A length counter for the sorted list +3. **op**: Operation type that represents either a set or delete operation with its associated value + +### Key Features + +- **In-Memory Operations**: + - All write operations (Set/Delete) are stored in memory until explicitly written to the parent store + - Provides fast access to pending changes without disk I/O + - Enables efficient rollback of operations before they're committed + +- **Atomic Operations**: + - Write() method ensures all operations are applied atomically to the parent store + - Either all operations succeed or none are applied + - Maintains data consistency even during concurrent access + +- **Rollback Support**: + - Discard() method allows rolling back all pending operations + - Useful for testing block proposals or handling failed operations + - Cleans up memory resources associated with pending operations + +- **Efficient Iteration**: + - Maintains sorted keys for efficient iteration and merging with parent store + - Supports both forward and reverse iteration patterns + - Enables efficient range queries and prefix-based filtering + +- **Prefix-Based Iteration**: + - Supports both forward and reverse iteration with prefix filtering + - Enables efficient scanning of related data + - Useful for batch operations on related keys + +### Performance Considerations + +- **Memory Usage**: + - All operations are kept in memory until Write() or Discard() + - Memory footprint grows linearly with the number of pending operations + - Efficient memory management through buffer pooling and automatic cleanup + +- **Iteration Efficiency**: + - Uses binary search for O(log n) key lookups in sorted list + - Maintains sorted order for efficient range queries + - 
Optimized merging of in-memory and parent store data during iteration + +- **Write Performance**: + - Write() operation is O(n) where n is the number of pending operations + - Batch operations are more efficient than individual writes + - Memory operations are significantly faster than disk operations + +- **Read Performance**: + - Get() is O(1) for in-memory operations, falls back to parent store + - In-memory operations take precedence over parent store data + - Efficient key lookup through hash map and sorted list combination + +### Implementation Details + +#### Write Operations +- Set() and Delete() operations are stored in an in-memory map +- Keys are automatically added to a sorted list for efficient iteration +- Operations are not applied to the parent store until Write() is called +- The sorted list enables efficient binary search for key lookups + +#### Read Operations +- Get() first checks the in-memory operations, then falls back to the parent store +- Iterator() and RevIterator() merge in-memory operations with parent store data +- Deleted keys are properly handled during iteration +- Prefix-based filtering is supported for both forward and reverse iteration + +#### Memory Management +- Uses a map for O(1) operation lookups +- Maintains a sorted slice for efficient iteration +- Implements buffer pooling for memory efficiency +- Automatically manages memory for pending operations + +### Usage Example + +The Txn type is typically used in scenarios requiring atomic operations or rollback capabilities: + +1. Create a new transaction with a parent store +2. Perform multiple Set() and Delete() operations +3. Either commit the changes using Write() or rollback using Discard() +4. 
The transaction maintains all operations in memory until explicitly committed + +### Limitations + +- Not thread-safe (should not be used across multiple goroutines) +- Write() is not atomic when writing to another memory store +- Keys must be smaller than 128 bytes +- Nested transactions are supported but iteration becomes increasingly inefficient with depth ## Canopy's Optimized Sparse Merkle Tree From ad7bbe3106baa7017b09e573de2c670c38faa6ea Mon Sep 17 00:00:00 2001 From: Pablo Ocampo Date: Thu, 1 May 2025 16:42:28 -0400 Subject: [PATCH 15/15] fix: make it less bullet point focused --- store/README.md | 134 +++++++++++++++--------------------------------- 1 file changed, 42 insertions(+), 92 deletions(-) diff --git a/store/README.md b/store/README.md index 6f68d26da3..55be9b6d9a 100644 --- a/store/README.md +++ b/store/README.md @@ -165,104 +165,54 @@ These prefixes are only used internally and never exposed to the user. ## Txn: Ad Hoc Nested Transactions Implementation -The `Txn` type provides a transaction-like interface for the store, allowing for atomic operations and rollbacks. This is particularly useful for testing block proposals and managing ephemeral states. +The `Txn` type provides a transaction-like interface for the store, allowing for atomic operations and rollbacks. This is particularly useful for testing block proposals and managing ephemeral states. +One of its features is the ability to create nested transactions, enabling complex multi-level state modifications while maintaining atomicity. -### Core Components +### Nested Transactions -#### Txn Structure -The `Txn` type consists of three main structures: +Nested transactions in Canopy's `Txn` implementation provide a code-level abstraction for managing hierarchical state modifications. +Unlike database-level nested transactions, these are implemented entirely in memory and provide a way to group operations that can be independently committed or rolled back. 
+This creates an abstraction for managing complex state changes that need to be atomic at different levels, while maintaining the simplicity of BadgerDB's single-level transaction model. -1. **Txn**: The main transaction type that embeds both the parent store interface and internal transaction state -2. **txn**: Internal state structure containing: - - A map of pending operations (set/delete) - - A sorted list of keys for efficient iteration - - A length counter for the sorted list -3. **op**: Operation type that represents either a set or delete operation with its associated value +#### Key Features of Nested Transactions + +1. **Hierarchical Structure** + Transactions form a hierarchy where child transactions inherit parent state but keep their own changes isolated until committed. Parents can roll back child changes atomically. + +2. **Independent Commit/Rollback** + Each transaction level can be committed or rolled back independently. Child changes affect the parent only when committed, and parent rollbacks undo all nested changes. + +3. **State Isolation** + Changes are confined to the transaction level where they occur. Parents don't see uncommitted child changes; children can access committed parent state. + +4. **Atomic Operations** + All changes within a transaction level are atomic. Commits and rollbacks affect only that level and its children, preserving integrity and consistency. + +### What `Txn` Does + +1. **Wraps a Parent Store**: Acts as a temporary overlay on top of an existing store, enabling nested transaction hierarchies +2. **Stores Changes in Memory**: All operations (set/delete) are kept in memory until explicitly written, allowing child transactions to maintain isolated state +3. **Supports Atomic Writes**: All in-memory changes can be committed at once or discarded, maintaining atomicity at each transaction level +4. **Enables Rollbacks**: Discarding reverts all uncommitted operations, including those from child transactions +5. 
**Supports Iteration**: Provides forward and reverse iteration with in-memory and base-store merging, respecting transaction hierarchy + +### How It Works + +The `Txn` system has three key internal parts that work together to support nested transactions: + +- **`Txn`**: The main transaction object exposed to users, capable of creating child transactions +- **`txn`**: A struct that holds: + - A map of pending operations, maintaining isolation between transaction levels + - A sorted list of keys for fast iteration across the transaction hierarchy +- **`op`**: Represents an individual operation (Set or Delete) that can be committed or rolled back at any transaction level ### Key Features -- **In-Memory Operations**: - - All write operations (Set/Delete) are stored in memory until explicitly written to the parent store - - Provides fast access to pending changes without disk I/O - - Enables efficient rollback of operations before they're committed - -- **Atomic Operations**: - - Write() method ensures all operations are applied atomically to the parent store - - Either all operations succeed or none are applied - - Maintains data consistency even during concurrent access - -- **Rollback Support**: - - Discard() method allows rolling back all pending operations - - Useful for testing block proposals or handling failed operations - - Cleans up memory resources associated with pending operations - -- **Efficient Iteration**: - - Maintains sorted keys for efficient iteration and merging with parent store - - Supports both forward and reverse iteration patterns - - Enables efficient range queries and prefix-based filtering - -- **Prefix-Based Iteration**: - - Supports both forward and reverse iteration with prefix filtering - - Enables efficient scanning of related data - - Useful for batch operations on related keys - -### Performance Considerations - -- **Memory Usage**: - - All operations are kept in memory until Write() or Discard() - - Memory footprint grows linearly 
with the number of pending operations - - Efficient memory management through buffer pooling and automatic cleanup - -- **Iteration Efficiency**: - - Uses binary search for O(log n) key lookups in sorted list - - Maintains sorted order for efficient range queries - - Optimized merging of in-memory and parent store data during iteration - -- **Write Performance**: - - Write() operation is O(n) where n is the number of pending operations - - Batch operations are more efficient than individual writes - - Memory operations are significantly faster than disk operations - -- **Read Performance**: - - Get() is O(1) for in-memory operations, falls back to parent store - - In-memory operations take precedence over parent store data - - Efficient key lookup through hash map and sorted list combination - -### Implementation Details - -#### Write Operations -- Set() and Delete() operations are stored in an in-memory map -- Keys are automatically added to a sorted list for efficient iteration -- Operations are not applied to the parent store until Write() is called -- The sorted list enables efficient binary search for key lookups - -#### Read Operations -- Get() first checks the in-memory operations, then falls back to the parent store -- Iterator() and RevIterator() merge in-memory operations with parent store data -- Deleted keys are properly handled during iteration -- Prefix-based filtering is supported for both forward and reverse iteration - -#### Memory Management -- Uses a map for O(1) operation lookups -- Maintains a sorted slice for efficient iteration -- Implements buffer pooling for memory efficiency -- Automatically manages memory for pending operations - -### Usage Example - -The Txn type is typically used in scenarios requiring atomic operations or rollback capabilities: - -1. Create a new transaction with a parent store -2. Perform multiple Set() and Delete() operations -3. Either commit the changes using Write() or rollback using Discard() -4. 
The transaction maintains all operations in memory until explicitly committed - -### Limitations - -- Not thread-safe (should not be used across multiple goroutines) -- Write() is not atomic when writing to another memory store -- Keys must be smaller than 128 bytes -- Nested transactions are supported but iteration becomes increasingly inefficient with depth +- **In-Memory Speed**: Fast reads/writes with no disk access, enabling efficient nested transaction operations +- **Efficient Iteration**: Maintains sorted keys for quick scanning and merging across transaction levels +- **Read Precedence**: Reads prioritize in-memory updates over the underlying store, respecting transaction hierarchy +- **Atomic Commit**: `Write()` commits all changes at once to the parent store, maintaining atomicity at each level +- **Safe Rollback**: `Discard()` wipes all uncommitted changes, including those from child transactions ## Canopy's Optimized Sparse Merkle Tree