Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -89,7 +89,7 @@ The close time is a UNIX timestamp indicating when the ledger closes. Its accura

### Upgrades

How the network adjusts overall values (like the base fee) and agrees to network-wide changes (like switching to a new protocol version). This field is usually empty. When there is a network-wide upgrade, the SDF will inform and help coordinate participants using the #validators channel on the Dev Discord and the Stellar Validators Google Group.
How the network adjusts overall values (like the base fee) and agrees to network-wide changes (like switching to a new protocol version). This field is usually empty. When there is a network-wide upgrade, the SDF will inform and help coordinate participants using the #validator channel on the Dev Discord and the Stellar Validators Google Group.

### Transaction set result hash

Expand Down
6 changes: 6 additions & 0 deletions docs/validators/README.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,12 @@ This section of the docs explains how to run a validator node, which participate

If you are interested in running a validator node — because you issue an asset that you would like to help secure through transaction validation, because you want to help increase network health and decentralization, or because you want to participate in network governance — then this section of the docs is for you. It explains the technical and operational aspects of installing, configuring, and maintaining a Stellar Core validator node, and should help you figure out the best way to set up your Stellar integration.

:::tip Interested in Tier 1?

If your organization is evaluating or planning to join the Tier 1 quorum — the core group of organizations whose validators bear the safety and liveness of the network — see [Tier 1 Organizations](/docs/validators/tier-1-orgs) for requirements, estimated costs, and a step-by-step onboarding path. The Admin Guide below covers the technical setup for a single validator; the Tier 1 page covers what it takes to run three and join the quorum.

:::

## Node Setup Process

The basic flow, which you can navigate through using the "Admin Guide" on the left, goes (roughly) like this:
Expand Down
8 changes: 7 additions & 1 deletion docs/validators/admin-guide/configuring.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ stellar-core --conf betterfile.cfg <COMMAND>

When installing using official Debian packages systemd unit file is configured to use `/etc/stellar/stellar-core.cfg` file. The examples in these docs don't specify `--conf betterfile.cfg` for the sake of brevity.

This page will walk you through the key fields you'll need to include in your config file to get your node up and runninig.
This page will walk you through the key fields you'll need to include in your config file to get your node up and running.

:::info

Expand Down Expand Up @@ -94,6 +94,12 @@ HOME_DOMAIN=<your domain name here, same as NODE_HOME_DOMAIN>
QUALITY="MEDIUM"
```

:::note Tier 1 quality rating

The example above uses `QUALITY="MEDIUM"`, which is appropriate for a new validator building a track record. If your organization is a [Tier 1 participant](/docs/validators/tier-1-orgs) (or aspiring to become one), you should declare your own organization as `QUALITY="HIGH"`. Note that `HIGH` quality requires [publishing a history archive](/docs/validators/admin-guide/publishing-history-archives) — the requirement is programmatically enforced. Declaring a lower quality level significantly reduces your weight in [leader election](#impact-of-validator-quality-on-nomination) and may limit your participation in consensus.

:::

If you don't include a `NODE_SEED` or set `NODE_IS_VALIDATOR=true`, your node will still watch SCP and see all the data in the network, but it will not send validation messages.

If you run multiple validators, make sure to set a common `HOME_DOMAIN` for them by setting the `NODE_HOME_DOMAIN` property to the same value. This will ensure your nodes get grouped correctly during [quorum set generation](#home-domains-array). You also need to include your other nodes in in your config file's [`VALIDATORS` array](#validators-array).
Expand Down
2 changes: 1 addition & 1 deletion docs/validators/admin-guide/maintenance.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ We recommend performing the following steps in order (repeat sequentially as nee

## Special Considerations During Quorum Set Updates

When you join the ranks of node operators, it's also important to join the conversation. The best way to do that: follow the`#validators` channel on the [Stellar Developer Discord](https://discord.gg/stellardev). If you can't do that for some reason, sign up for the [Stellar Validators Google Group](https://groups.google.com/forum/#!forum/stellar-validators).
When you join the ranks of node operators, it's also important to join the conversation. The best way to do that: follow the `#validator` channel on the [Stellar Developer Discord](https://discord.gg/stellardev). If you can't do that for some reason, sign up for the [Stellar Validators Google Group](https://groups.google.com/forum/#!forum/stellar-validators).

Sometimes an organization needs to make changes that will impact the quorum sets of others:

Expand Down
24 changes: 21 additions & 3 deletions docs/validators/admin-guide/monitoring.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,20 @@ Some notable fields from this `info` endpoint are:
- `state`: the node's synchronization status relative to the network
- `quorum`: summary of the state of the SCP protocol participants, which is the same information returned by the `quorum` command ([see below](#quorum-health)).

### Quick-Reference Health Indicators

The table below summarizes the most important fields to check at a glance. For full details on each, see the explanations above and the [quorum health](#quorum-health) section below.

| Field | Where | Healthy | Investigate |
| --- | --- | --- | --- |
| `state` | `info` | `Synced!` | Any other value — node is not participating in consensus |
| `ledger.age` | `info` | < 10 seconds | > 10 seconds — node may be falling behind |
| `peers.authenticated_count` | `info` | ≥ 8 | < 3 — limited connectivity, may miss messages |
| `quorum.qset.phase` | `info` | `EXTERNALIZE` | Other phases for extended periods — consensus stalled |
| `quorum.qset.fail_at` | `info` / `quorum` | ≥ 2 | ≤ 1 — one more failure will halt your node |
| `quorum.qset.missing` | `info` / `quorum` | None or few | Multiple nodes — check if quorum peers are down |
| `quorum.transitive.intersection` | `info` / `quorum` | `true` | `false` — **critical**, network at risk of splitting |

## Peer Information

The `peers` command returns information on the peers your node is connected to.
Expand Down Expand Up @@ -663,7 +677,11 @@ You may find the below exporters useful for monitoring your infrastructure as th

Once you've configured Prometheus to scrape and store your stellar-core metrics, you will want a nice way to render this data for human consumption. Grafana offers the simplest and most effective way to achieve this. Installing Grafana is out of scope of this document but is a very simple process, especially when using the [prebuilt apt packages](https://grafana.com/docs/installation/debian/#apt-repository)

We recommend that administrators import the following two dashboards into their grafana deployments:
We recommend that administrators import the following two dashboards into their Grafana deployments:

| Dashboard | Grafana ID | Purpose |
| --- | --- | --- |
| [Stellar Core Monitoring](https://grafana.com/grafana/dashboards/10603) | `10603` | Key metrics, node status, and common problems. Start here for troubleshooting. |
| [Stellar Core Full](https://grafana.com/grafana/dashboards/10334) | `10334` | All metrics from the exporter. For in-depth analysis. |

- [**Stellar Core Monitoring**](https://grafana.com/grafana/dashboards/10603) - shows the most important metrics, node status and tries to surface common problems. It's a good troubleshooting starting point
- [**Stellar Core Full**](https://grafana.com/grafana/dashboards/10334) - shows a simple health summary as well as all metrics exposed by the `stellar-core-prometheus-exporter`. It's much more detailed than the _Stellar Core Monitoring_ and might be useful during in-depth troubleshooting
To import: in Grafana, go to **Dashboards → Import**, enter the dashboard ID, and select your Prometheus data source.
76 changes: 67 additions & 9 deletions docs/validators/admin-guide/prerequisites.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -13,15 +13,25 @@ CPU, RAM, Disk and network depends on network activity. If you decide to colloca

:::

At the time of writing (April 2024), Stellar Core with PostgreSQL running on the same machine worked well on a [c5d.2xlarge](https://aws.amazon.com/ec2/instance-types/c5/) instance in AWS (8x Intel Xeon vCPUs at 3.4 GHz; 16 GB RAM; 100 GB NVMe SSD (10,000 iops)).
:::tip For Tier 1 organizations

[Tier 1 organizations](/docs/validators/tier-1-orgs) run three geographically dispersed Full Validators, each meeting these requirements independently. Each node needs its own hardware, its own unique validator key, and its own history archive. Plan for three times the resources listed below, spread across different data centers or cloud regions. See [Tier 1 Organizations](/docs/validators/tier-1-orgs) for the full requirements and onboarding path.

:::

Stellar Core is designed to run on relatively modest hardware so that a whole range of individuals and organizations can participate in the network, and basic nodes should be able to function pretty well without tremendous overhead. That said, the more you ask of your node, the greater the requirements.

The following recommendations were verified against production nodes in April 2024. Hardware requirements grow with network activity; check the [stellar-core releases](https://github.com/stellar/stellar-core/releases) for any notes on updated requirements.

| Node Type | CPU | RAM | Disk | AWS SKU | Google Cloud SKU |
| --- | --- | --- | --- | --- | --- |
| Core Validator Node | 8x Intel Xeon @ 3.4 GHz | 16 GB | 100 GB NVMe SSD | [c5d.2xlarge] | [n4-highcpu-8] |
| Core Validator Node | 8 vCPUs @ 3.4 GHz | 16 GB | 100 GB NVMe SSD\* (10,000 IOPS) | [c5d.2xlarge] | [n4-highcpu-8] |

PostgreSQL co-located on the same machine performs well at this spec — a separate database host is not required for a single validator.

_\* Assuming a 30-day retention window for data storage._
_\* Disk sizing assumes a 30-day retention window (`AUTOMATIC_MAINTENANCE_COUNT` at default). See [Storage](#storage) below for details._

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AUTOMATIC_MAINTENANCE_COUNT has been deprecated, and no longer does anything. We should remove mentions of it.


<!-- Last verified: April 2024. If you're an SDF maintainer updating this, bump the date in the paragraph above. -->

## Stellar Network Access

Expand Down Expand Up @@ -60,13 +70,37 @@ If you need to expose this endpoint to other hosts in your local network, we str

## Storage

Most storage needs come from stellar-core's database backend, which needs to store the entire ledger state. For the most part, the contents of both the database and related directories (such as `buckets`) can be ignored, since they are completely managed by Stellar Core. In terms of storage space, 100 GB is enough (as of April 2024).
Stellar Core's local storage needs come from two sources: the **buckets directory** (which serves as the primary database backend under BucketListDB, the default since stellar-core 21.0) and a much smaller **SQL database** for metadata. Both are managed entirely by Stellar Core. Local disk usage stays bounded over time — see [Why local disk stays bounded](#why-local-disk-stays-bounded) below.

### How storage breaks down

Approximate sizes for a current default-config validator on Mainnet. These figures should be treated as planning estimates rather than precise measurements.

| Component | Approximate size | Notes |
| --- | --- | --- |
| Buckets directory (BucketListDB) | 20–40 GB | Primary store for live ledger state since stellar-core 21.0. |
| SQL database | A few GB | Post-BucketListDB, used only for non-ledger metadata, transaction history within the retention window, and some DEX queries. Most ledger state tables are dropped at migration. |
| WAL logs, temp files | 5–15 GB | PostgreSQL write-ahead logs and temporary space during maintenance operations. SQLite users will see lower numbers. |

A working set of roughly 30–60 GB is typical. The 100 GB local NVMe included with the recommended `c5d.2xlarge` (and comparable on Hetzner, OVH, Contabo, and others) leaves comfortable operational headroom on top of that — room for debug captures, re-syncs, and unforeseen operational needs.

### Why local disk stays bounded

A common misconception is that validators need to provision storage proportional to network history. They do not.

**Soroban (smart contract) state is bounded by [state archival](https://developers.stellar.org/docs/learn/encyclopedia/storage/state-archival).** Contract data and contract code entries carry a rent balance; when a Temporary entry's balance reaches zero, it is deleted from live state. Temporary entries are cheaper to create than Persistent entries and dominate total contract data volume, so as they expire and are removed, a significant amount of state is freed up — keeping the live state on disk from growing unboundedly.

**Classic ledger entries — accounts, trustlines, offers, claimable balances, liquidity pool shares, and data entries — do not expire.** They persist on the live state indefinitely. They grow slowly, though, because [reserve requirements](/docs/learn/fundamentals/stellar-data-structures/accounts#base-reserves-and-subentries) act as anti-spam friction on creation. In practice, the resulting working set still lands in the 30–60 GB range cited above, so local validator state stays compact even as cumulative network history grows.

**History archives live on object storage, not on the validator.** Full validators publish history archives to a separate object store (S3, R2, Backblaze B2, etc.) — that's where the multi-TB archive data lives. The validator process itself doesn't hold the archive on its local disk. See [Publishing History Archives](/docs/validators/admin-guide/publishing-history-archives) for the recommended setup.

**`CATCHUP_COMPLETE=true` is almost never the right choice.** This setting makes the node sync the entire ledger from genesis on startup and is rarely appropriate for a validator. The standard pattern for new validators — including new Tier 1 candidates — is to sync against current network state, publish a history archive forward from that point, and use `stellar-archivist mirror` to backfill historical data into the published archive as a separate operation. The validator's local disk requirements are determined by the live state model above, not by historical depth.

### Database

The database is consulted during consensus, and modified atomically when a transaction set is applied to the ledger. It's random access, fine-grained, and fast.
Even with BucketListDB as the primary store, Stellar Core still requires a SQL database — either SQLite or PostgreSQL (recommended for production) — for metadata and transaction history.

As of August 2024, Stellar Core officially switched to BucketListDB as its primary database backend. Note that BucketListDB still requires SQL database, either SQLite or Postgres (recommended).
The SQL database is consulted during consensus and modified atomically when a transaction set is applied to the ledger. Access patterns are random, fine-grained, and fast.

If you're using PostgreSQL, we recommend you configure your local database to be accessed over a Unix domain socket, as well as updating the below PostgreSQL configuration parameters:

Expand All @@ -80,13 +114,37 @@ If you're using PostgreSQL, we recommend you configure your local database to be

### Buckets

Stellar-core stores ledger state in the form of flat XDR files called "buckets." The bucket files are used for hashing and transmission of ledger differences to history archives. If BucketListDB is used, the `buckets` directory is also used as a primary database backend. If SQL is used, the `buckets` directory is only used for hashing and history archives, and simply represents a copy of the ledger state stored in SQL.
Stellar Core stores ledger state in the form of flat XDR files called "buckets." These files are used for hashing and transmission of ledger differences to history archives. Under BucketListDB (the default since stellar-core 21.0), the `buckets` directory also serves as the primary database backend — making it the largest single component of validator storage.

Buckets should be stored on a fast, local disk with sufficient space for several times the current ledger size. NVMe SSDs with 10,000+ IOPS are recommended for production validators. Network-attached or remote storage is not recommended; latency on the buckets path directly affects consensus performance.

<!--
Maintenance note for SDF docs maintainers: the size figures in the
breakdown table above are estimates and should be re-verified
periodically. The most reliable way to get current numbers is to ask
one or more current Tier 1 operators (#validator on the Stellar Dev
Discord) to share du -sh output of the buckets directory, the SQL
database (PostgreSQL: pg_database_size, SQLite: file size), and the
full stellar-core data directory.

Buckets should be stored on a fast, local disk with sufficient space to store several times the size of the current ledger. Current ledger size is approximately 10 GB (as of April 2024), so please plan accordingly.
Last estimated: May 2026. Reflects post-BucketListDB defaults
(stellar-core 21.0+) and the state archival design described in
CAP-0046-12 / CAP-0057.

NOTE: As of the original state archival announcement, the user-facing
rent interface was live on mainnet but archived entries were not yet
being deleted from validators. If full archival eviction has not yet
shipped at the time this page is updated, the "Why local disk stays
bounded" section may need to soften "are deleted from the validator"
to "will be deleted from the validator under the state archival
protocol."
-->

## Kubernetes considerations

We currently do not recommend running validator nodes in Kubernetes. If you choose to do so, consider the following:
We currently do not recommend running validator nodes in Kubernetes. Standard VM-based deployments (bare metal or cloud instances) are the well-tested path for production validators.

If you choose to use Kubernetes regardless, consider the following:

- Sensitive data such as node seeds will be stored in Kubernetes etcd. Consider consuming credentials using tools like [vault agent](https://developer.hashicorp.com/vault/docs/platform/k8s/injector) or the [AWS Secrets Store CSI driver](https://github.com/aws/secrets-store-csi-driver-provider-aws) to improve security
- Consider how external traffic will reach the pods. Tier 1 nodes need public DNS names and necessary ports must be accessible from the internet
Expand Down
6 changes: 6 additions & 0 deletions docs/validators/admin-guide/publishing-history-archives.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,12 @@ sidebar_position: 50

If you want to run a [Full Validator](../README.mdx#full-validator), you need to set up your node to publish a history archive. You can host an archive using a blob store such as Amazon's S3 or Digital Ocean's spaces, or you can simply serve a local archive directly via an HTTP server such as Nginx or Apache. If you're setting up a [Basic Validator](../README.mdx#basic-validator), you can skip this section. No matter what kind of node you're planning to run, make sure to set it up to `get` history, which is covered in [Environment Preparation](./environment-preparation.mdx).

:::caution One archive per node

Each node must publish to its own dedicated archive. Writing to the same archive from multiple nodes is not supported and will result in undefined behavior, potentially including data loss. If you're running multiple Full Validators (as [Tier 1 organizations](/docs/validators/tier-1-orgs) do), configure a separate archive for each — for example, `history-a.example.com`, `history-b.example.com`, and `history-c.example.com`.

:::

## Caching and History Archives

The _primary_ cost of running a validator will very likely be egress bandwidth. A crucial part of your strategy to manage those costs should be caching. You can significantly reduce these data transfer costs by using common caching techniques or a CDN. Three simple rules apply to caching the History archives:
Expand Down
Loading
Loading