Skip to content

Add AWS PrivateLink endpoint support for exposing ingress NLB#11

Open
Stan Borzhemsky (stanbsky) wants to merge 17 commits into
mainfrom
stanbsky/private-networking
Open

Add AWS PrivateLink endpoint support for exposing ingress NLB#11
Stan Borzhemsky (stanbsky) wants to merge 17 commits into
mainfrom
stanbsky/private-networking

Conversation

@stanbsky
Copy link
Copy Markdown

This PR adds an ability to manually deploy PrivateLink endpoints and endpoint services linked to a private hosted zone DNS, and lock the NLB against changes via IAM deny rules. While the work of properly productionlsing support for private networking within Valet is ongoing, this change gives us a fallback option in case deadlines slip aand we need to unblock our customers.

The process works broadly as follows:

  1. The account is bootstrapped with this tool as before.
  2. We deploy a Big Peer in the account - we need NLBs up and ready before anything.
  3. We use dittocloud private-networking endpoint-service to deploy the endpoint service in the BYOC account. Upon completion, we get a TXT record name and value that will get passed by the customer to Ditto.
  4. Ditto creates the TXT record under ${txtRecordName}.${valetOrgName}.dittolive.app
  5. After the domain ownership is verified, the customer is able to run dittocloud private-networking endpoint in their own account to provision the endpoint for access to the NLB. As part of that process, a private hosted DNS zone is created so that we resolve ${bpName}.${valetOrgName}.dittolive.app to the ugly AWS DNS name of the endpoint, e.g. vpce-049c1c41acc25e931-zudxmys9.vpce-svc-0ca36e905e6fe650d.us-east-2.vpce.amazonaws.com. Because this private DNS name name matches the common name on the cert, we're able to establish a TLS connection.
  6. The customer can now (or also could after step 3) lock their NLB from further changes by Valet by running dittocloud private-networking lock-nlb. Similarly, if maintenance needs to be performed, they would first need to unlock it in a similar way, then assert whether we need to recreate or modify the endpoint service.

Partial validation was performed:

  1. Deployed a Valet cluster into the SMBP account.
  2. Created an endpoint service in that account, and an endpoint in my scratchpad. Given I have no access to the DNS for dittolive.app, I used my own domain stanbsky.xyz
  3. Validated that I can access the ingress and see the expected CN - even if different from the DNS name.
ubuntu@ip-172-31-26-87:~$ curl -v https://private-nlb.stanbsky.xyz
* Host private-nlb.stanbsky.xyz:443 was resolved.
* IPv6: (none)
* IPv4: 172.31.23.68, 172.31.35.20, 172.31.5.65
*   Trying 172.31.23.68:443...
* Connected to private-nlb.stanbsky.xyz (172.31.23.68) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/certs/ca-certificates.crt
*  CApath: /etc/ssl/certs
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384 / X25519 / RSASSA-PSS
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=dlr-private-network.ditto-ps.dittolive.app
*  start date: Apr 21 01:53:52 2026 GMT
*  expire date: Jul 20 01:53:51 2026 GMT
*  subjectAltName does not match private-nlb.stanbsky.xyz
* SSL: no alternative certificate subject name matches target host name 'private-nlb.stanbsky.xyz'
* Closing connection
* TLSv1.3 (OUT), TLS alert, close notify (256):
curl: (60) SSL: no alternative certificate subject name matches target host name 'private-nlb.stanbsky.xyz'
More details here: https://curl.se/docs/sslcerts.html

curl failed to verify the legitimacy of the server and therefore could not
establish a secure connection to it. To learn more about this situation and
how to fix it, please visit the web page mentioned above.

Final validation would entail working with platform/SRE to redo this process properly by validating the dittolive.app domain via TXT. Then, we'll want to ensure we can indeed connect by a small peer running within that AWS account w/o access to the public web. We should also have Valet poke and prod the cluster's NLB - we want to ensure the proctection works as expected.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a “private-networking” feature set to the dittocloud CLI to support a temporary AWS PrivateLink-based ingress exposure path for Big Peer deployments, including endpoint service creation, customer endpoint creation, and an IAM-based NLB “lock” mechanism.

Changes:

  • Added new Terraform modules for VPC Endpoint Service, customer VPC Endpoint (+ SG), and NLB protection via IAM deny policies.
  • Added new CLI commands: private-networking endpoint-service, endpoint, lock-nlb, unlock-nlb.
  • Exported Terraform installer helper (GetTerraform) and wired the new command group into the root CLI; updated docs.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 17 comments.

Show a summary per file
File Description
terraform/aws/private_networking/vpc_endpoint_service/variables.tf Inputs for endpoint service module (NLB discovery + allowed principals + DNS).
terraform/aws/private_networking/vpc_endpoint_service/providers.tf AWS provider config for endpoint service module.
terraform/aws/private_networking/vpc_endpoint_service/outputs.tf Outputs service + NLB + DNS verification details.
terraform/aws/private_networking/vpc_endpoint_service/main.tf Creates the VPC Endpoint Service bound to the ingress NLB.
terraform/aws/private_networking/vpc_endpoint/variables.tf Inputs for customer endpoint module (service name/VPC/subnets/DNS).
terraform/aws/private_networking/vpc_endpoint/security_group.tf Security group for the customer interface endpoint ENIs.
terraform/aws/private_networking/vpc_endpoint/providers.tf AWS provider config for endpoint module.
terraform/aws/private_networking/vpc_endpoint/outputs.tf Outputs endpoint/SG/DNS details.
terraform/aws/private_networking/vpc_endpoint/main.tf Creates the interface VPC endpoint to the service.
terraform/aws/private_networking/nlb_protection/variables.tf Inputs for NLB protection (Big Peer + CAPA role names).
terraform/aws/private_networking/nlb_protection/providers.tf AWS provider config for NLB protection module.
terraform/aws/private_networking/nlb_protection/outputs.tf Outputs NLB and protection policy summary.
terraform/aws/private_networking/nlb_protection/main.tf Attaches deny inline policies to CAPA roles to “lock” the NLB.
cmd/internal/privatenetworking/private_networking.go Implements private-networking parent + endpoint-service + endpoint commands and helper funcs.
cmd/internal/privatenetworking/nlb_lock.go Implements lock-nlb / unlock-nlb commands.
cmd/internal/bootstrap/install_test.go Updates tests to use exported GetTerraform.
cmd/internal/bootstrap/install.go Exports GetTerraform for reuse outside bootstrap package.
cmd/internal/bootstrap/bootstrap.go Switches bootstrap to call GetTerraform.
cmd/dittocloud/main.go Registers private-networking in the root command.
README.md Documents PrivateLink workflow and state files.
AGENTS.md Documents new command/module structure and private networking feature.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread terraform/aws/private_networking/vpc_endpoint_service/variables.tf Outdated
Comment thread cmd/internal/privatenetworking/private_networking.go
Comment thread cmd/internal/privatenetworking/nlb_lock.go Outdated
Comment thread cmd/internal/privatenetworking/nlb_lock.go Outdated
Comment thread cmd/internal/privatenetworking/private_networking.go Outdated
Comment thread cmd/internal/privatenetworking/nlb_lock.go Outdated
Comment thread cmd/internal/privatenetworking/nlb_lock.go Outdated
Comment thread cmd/internal/privatenetworking/private_networking.go Outdated
Comment thread cmd/internal/privatenetworking/private_networking.go Outdated
Comment thread cmd/internal/privatenetworking/private_networking.go Outdated
@stanbsky Stan Borzhemsky (stanbsky) marked this pull request as draft April 29, 2026 17:15
@stanbsky Stan Borzhemsky (stanbsky) marked this pull request as ready for review May 10, 2026 19:35
…lation

- Prevent state file wipe on ReadFile failure: defer blocks in
  EndpointServiceCmd and EndpointCmd now return early if the state
  file cannot be read, avoiding a WriteFile call with nil/empty data
  that would silently destroy the caller's terraform state

- Make terraform path resolution injectable (terraformPathFinder) so
  endpoint tests no longer invoke real terraform discovery; previously
  all TestEndpointCmd subtests failed because GetTerraform hit the
  network before the mock executor was ever reached

- Move --no-color to PersistentFlags on PrivateNetworkingCmd so it is
  automatically inherited by all subcommands, removing the per-command
  redeclaration that would silently break color suppression on any
  future subcommand that forgot to declare the flag
… runTerraformLifecycle

The ~150 lines of identical tmpdir setup, state file management, and
terraform init/plan/apply/destroy logic that existed in both
EndpointServiceCmd and EndpointCmd are now consolidated into a single
runTerraformLifecycle function parameterised by lifecycleConfig.

Command-specific behaviour (module path, destroy messages, output
display) is supplied via the config struct. Each RunE is now ~40 lines
covering only flag validation and var building before delegating to the
shared function. Also collapsed the duplicated showDetailedPlan branch
within each command into a single plan call preceded by conditional
stdout routing.
…boilerplate helpers

toPlanOptions, toApplyOptions, and toDestroyOptions were trivial
one-trick functions whose only tests verified slice length rather than
value identity. Inlined the conversions at the three call sites in
runTerraformLifecycle and deleted the functions and their tests.
…hten security group

- allowed_principal (string) → allowed_principals (list) in Terraform and CLI
- --allowed-principals flag accepts a comma-separated list of AWS account IDs or ARNs
- Security group ingress restricted to TCP 443 (was all protocols)
- acceptance_required = false comment clarifies the principal-scoped security model
- formatCommaSeparated shared helper replaces duplicated logic in formatSubnetIDs and formatPrincipals
- Add test covering two-account principal scenario
… and test destroy paths

- --yes bypasses interactive confirmation for both destroy and apply, enabling automation/CI use
- Destroy tests added for endpoint-service and endpoint commands using --yes to avoid blocking on os.Stdin
- Removes the "omitted" comments that acknowledged the coverage gap
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants