Commit de026c0

dbshah12 and claude committed
Drop laddr tag from tcp_stats; keep raddr for per-host breakdown
laddr is always the engine's own IP: it is constant and adds no diagnostic value as a tag. Dropping it reduces row size (~15 bytes/row) and removes a tag that was arbitrary in LocalTCPStatsCollector's default mode anyway.

raddr is kept so callers can split tcp_stats by remote host (e.g. NFS throughput per VDB host, as Craig confirmed is needed for PerfDB).

Aggregation key changes from (laddr, raddr, service) to (raddr, service). telegraf.base updated to remove laddr from csv_column_names and csv_tag_columns.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 9a5175a commit de026c0

2 files changed

Lines changed: 19 additions & 18 deletions

telegraf/connstat-stats.sh

Lines changed: 12 additions & 11 deletions
@@ -1,17 +1,18 @@
 #!/usr/bin/env python3
 #
-# Collect per-connection TCP stats from connstat and aggregate by remote
-# endpoint (laddr:raddr:service) to bound cardinality on engines with many
-# connections — e.g. Oracle dNFS (hundreds of connections per VDB host) or
-# Elastic Data (many connections per object storage endpoint IP).
-# Mirrors the aggregation done by LocalTCPStatsCollector in the mgmt stack.
+# Collect per-connection TCP stats from connstat and aggregate by
+# (raddr, service) to mirror the behaviour of LocalTCPStatsCollector.
+#
+# laddr (local address) is intentionally excluded: it is always the engine's
+# own IP and adds no diagnostic value as a tag. raddr is kept so that callers
+# can split stats by remote host — e.g. to see NFS throughput per VDB host.
 #
 # Service name lookup reads from /etc/services, matching LocalTCPStatsCollector
 # exactly. lport is checked before rport so that listening services (where the
 # engine is the server) are identified correctly. Falls back to "unknown".
 #
-# Output fields per aggregated endpoint:
-# laddr, raddr, service
+# Output fields per aggregated (raddr, service) group:
+# raddr, service
 # inbytes, outbytes, retranssegs, suna, unsent (summed across connections)
 # swnd, cwnd, rwnd, rtt (averaged across connections)
 # connections (count of aggregated conns)
@@ -69,9 +70,9 @@ for raw in proc.stdout:
     line = raw.rstrip('\n')
     if line.startswith('='):
         for key, n in cnt.items():
-            la, ra, sv = key
+            ra, sv = key
             sys.stdout.write(
-                f"{la},{ra},{sv},"
+                f"{ra},{sv},"
                 f"{inb[key]},{outb[key]},{ret_[key]},{sun[key]},{uns[key]},"
                 f"{sw[key]//n},{cw[key]//n},{rw[key]//n},{rt[key]//n},{n}\n"
             )
@@ -83,7 +84,7 @@ for raw in proc.stdout:
     fields = line.split(',')
     if len(fields) != 13:
         continue
-    la, lp, ra, rp = fields[0], fields[1], fields[2], fields[3]
+    lp, ra, rp = fields[1], fields[2], fields[3]
     lp_i = int(lp) if lp.isdigit() else 0
     rp_i = int(rp) if rp.isdigit() else 0
     if lp_i in svc:
@@ -93,7 +94,7 @@ for raw in proc.stdout:
     else:
         sv = 'unknown'
 
-    key = (la, ra, sv)
+    key = (ra, sv)
     cnt[key] = cnt.get(key, 0) + 1
     inb[key] = inb.get(key, 0) + int(fields[4])
     outb[key] = outb.get(key, 0) + int(fields[5])
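
The hunks above show only fragments of the script, so the following is a minimal, self-contained sketch of the aggregation after this change, not the script itself. The diff confirms only the first six positions of the 13-field connstat row (laddr, lport, raddr, rport, inbytes, outbytes); the order of the remaining seven is an assumption taken from the output field list. The names `service_for`, `aggregate`, `SERVICES`, and `ROWS` are hypothetical, and the port-to-service table is hard-coded here rather than read from /etc/services.

```python
from collections import defaultdict

# Fields that are summed vs. averaged, per the script's header comment.
SUMMED = ("inbytes", "outbytes", "retranssegs", "suna", "unsent")
AVERAGED = ("swnd", "cwnd", "rwnd", "rtt")
# ASSUMPTION: column order beyond the first six fields is inferred, not confirmed.
COLUMNS = ("laddr", "lport", "raddr", "rport") + SUMMED + AVERAGED


def service_for(lport, rport, services):
    """Check the local port first, then the remote port, else 'unknown'."""
    if lport in services:
        return services[lport]
    if rport in services:
        return services[rport]
    return "unknown"


def aggregate(rows, services):
    """Group connection rows by (raddr, service); laddr never enters the key."""
    totals = defaultdict(lambda: defaultdict(int))
    counts = defaultdict(int)
    for row in rows:
        fields = row.split(",")
        if len(fields) != len(COLUMNS):
            continue  # skip rows with an unexpected width
        rec = dict(zip(COLUMNS, fields))
        lport = int(rec["lport"]) if rec["lport"].isdigit() else 0
        rport = int(rec["rport"]) if rec["rport"].isdigit() else 0
        key = (rec["raddr"], service_for(lport, rport, services))
        counts[key] += 1
        for name in SUMMED + AVERAGED:
            totals[key][name] += int(rec[name])
    result = {}
    for key, n in counts.items():
        out = {name: totals[key][name] for name in SUMMED}
        # Integer division, matching the `//n` used in the script's output.
        out.update({name: totals[key][name] // n for name in AVERAGED})
        out["connections"] = n
        result[key] = out
    return result


# Hypothetical sample: two NFS connections from one remote host, one unrelated.
SERVICES = {2049: "nfs"}
ROWS = [
    "10.0.0.1,2049,10.0.0.9,40001,100,200,1,0,0,1000,2000,3000,10",
    "10.0.0.1,2049,10.0.0.9,40002,300,400,0,0,0,3000,4000,5000,30",
    "10.0.0.1,50000,10.0.0.7,9999,5,6,0,0,0,10,20,30,4",
]
RESULT = aggregate(ROWS, SERVICES)
```

With this sample, the two connections from 10.0.0.9 collapse into a single ("10.0.0.9", "nfs") group whose byte counters are summed and whose window and RTT values are averaged, while the third connection forms its own ("10.0.0.7", "unknown") group.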

telegraf/telegraf.base

Lines changed: 7 additions & 7 deletions
@@ -59,10 +59,10 @@
 fieldpass = ["tcp*","bytes*","packets*","err*","drop*"]
 
 # Per-endpoint TCP stats (bytes, RTT, window sizes) via connstat.
-# Aggregated by remote endpoint (laddr:raddr:rport) to mirror the aggregation
-# in LocalTCPStatsCollector — avoids cardinality explosion on Oracle dNFS
-# engines (hundreds of connections per VDB host) and Elastic Data engines
-# (many connections per object storage endpoint IP).
+# Aggregated by (raddr, service): laddr (always the engine's own IP) is dropped
+# as it adds no diagnostic value. raddr is kept so stats can be split by remote
+# host — e.g. NFS throughput per VDB host. Matches LocalTCPStatsCollector's
+# default aggregation behaviour.
 # Cumulative fields (inbytes, outbytes, etc.) are summed; window/RTT fields
 # are averaged; connections = number of TCP connections aggregated.
 [[inputs.execd]]
@@ -73,9 +73,9 @@
 data_format = "csv"
 csv_delimiter = ","
 csv_trim_space = true
-csv_column_names = ["laddr", "raddr", "service", "inbytes", "outbytes", "retranssegs", "suna", "unsent", "swnd", "cwnd", "rwnd", "rtt", "connections"]
-csv_column_types = ["string", "string", "string", "int", "int", "int", "int", "int", "int", "int", "int", "int", "int"]
-csv_tag_columns = ["laddr", "raddr", "service"]
+csv_column_names = ["raddr", "service", "inbytes", "outbytes", "retranssegs", "suna", "unsent", "swnd", "cwnd", "rwnd", "rtt", "connections"]
+csv_column_types = ["string", "string", "int", "int", "int", "int", "int", "int", "int", "int", "int", "int"]
+csv_tag_columns = ["raddr", "service"]
 
 # Track CPU and Memory for the "delphix-mgmt" service (and children).
 [[inputs.procstat]]
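
The script's output and the three csv_* settings have to change together, since the columns are matched by position. The sketch below is an illustration of that positional mapping in plain Python; it is not Telegraf's CSV parser, and how Telegraf itself reacts to a row of the wrong width is not shown in this commit. The function `split_row` and the sample rows are hypothetical; the column lists are copied from the new telegraf.base lines.

```python
# Column layout copied from the updated telegraf.base.
COLUMN_NAMES = ["raddr", "service", "inbytes", "outbytes", "retranssegs", "suna",
                "unsent", "swnd", "cwnd", "rwnd", "rtt", "connections"]
COLUMN_TYPES = ["string", "string"] + ["int"] * 10
TAG_COLUMNS = ["raddr", "service"]


def split_row(line):
    """Return (tags, fields) for one CSV line, or None if the width is wrong."""
    # Stripping each value stands in for csv_trim_space = true.
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(COLUMN_NAMES):
        return None  # this sketch rejects mismatched rows outright
    tags, fields = {}, {}
    for name, kind, value in zip(COLUMN_NAMES, COLUMN_TYPES, values):
        typed = int(value) if kind == "int" else value
        if name in TAG_COLUMNS:
            tags[name] = typed
        else:
            fields[name] = typed
    return tags, fields


# Hypothetical rows: the new 12-column format and the old 13-column format.
NEW_ROW = "10.0.0.9, nfs, 400, 600, 1, 0, 0, 2000, 3000, 4000, 20, 2"
OLD_ROW = "10.0.0.1," + NEW_ROW
TAGS, FIELDS = split_row(NEW_ROW)
```

Here a row in the new format yields the two tags raddr and service plus ten integer fields, whereas a row that still carries a leading laddr no longer lines up with the 12 configured columns.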
