You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently logging is expensive in production and hence we need to be careful and 'minimal' in the logs which we do generate.
The idea here is that we want to simplify the logging story using more 'standard-practice' techniques rather than just logging on stdout, and hence also free up logging budget for observability.
The eventual aim is to enable in production tracing of: 'occasional' state machines (node startup, snapshotting, reconfiguration), per-request profiling, active trace validation of ccfraft.tla; however that is all quite far away still.
For now I propose the following rough logging structure to make it easy to enable telemetry in production and simplify and improve the current state, primarily by adding log export to fluentd.
The default should be to preserve compatibility with what we have currently.
Note: log level should still be controlled by the CLI but that should be the most verbose level that we allow and per-sink levels are clamped silently to that.
That allows us to have just FAIL to STDOUT but INFO or DEBUG to fluentd.
Constraints
Performance impact should be minimal
Backpressure from fluentd cannot block the cchost process
Likely only option is to drop packets, but log to stderr that it happened (w/ debounce and aggregate count)
Log level captured at CLI
How to log before fluentd connected? Just buffer?
Rough implementation plan
Micro-msgpack library for emitting the trace messages we need
Export to fluentd
Performance overhead test with trace logging of requests, for landmark style performance metrics
Non-goals
Logging semantics
Message ordering vs timestamp
Just sort the flow by timestamp
Clock sync in a distributed deployment
We're considering adding vector clocks or lamport clocks to the raft messages to allow a total order over these that respects causality.
Other cases MUST NOT rely on cross instance timestamps
Dynamic export sinks
Private data in loglines controlled on a per-log-level basis, CLI arg takes care of this
Migration of existing json logging to msgpack
I think we want json as a midpoint which is machine and human readable
TLS between cchost and fluentd
There is support for TLS for fluentd in_forward but as its all on the same vnet (or localhost) this is unnecessary for v0
Currently logging is expensive in production and hence we need to be careful and 'minimal' in the logs which we do generate.
The idea here is that we want to simplify the logging story using more 'standard-practice' techniques rather than just logging on stdout, and hence also free up logging budget for observability.
The eventual aim is to enable in production tracing of: 'occasional' state machines (node startup, snapshotting, reconfiguration), per-request profiling, active trace validation of ccfraft.tla; however that is all quite far away still.
For now I propose the following rough logging structure to make it easy to enable telemetry in production and simplify and improve the current state, primarily by adding log export to fluentd.
flowchart LR subgraph Node["CCF node"] PROD["Producer threads<br/>LOG_*_FMT macros"] TRACE["Producer threads<br/>TRACE(<tag>, data)"] CONS["Console sink<br/>(stdout / journald)"] FLU["Fluentd sink<br/>per-thread queues<br/>+ host reader thread"] PROD --> CONS PROD --> FLU TRACE --> CONS TRACE --> FLU end FLUENTD["fluentd<br/>in_forward :24224"] MDSD["mdsd"] GENEVA["Geneva Logs"] FLU -->|msgpack Forward<br/>over TCP| FLUENTD FLUENTD --> MDSD MDSD --> GENEVAConfiguration UX
This is what the config could roughly look like:
The default should be to preserve compatibility with what we have currently.
Note: log level should still be controlled by the CLI but that should be the most verbose level that we allow and per-sink levels are clamped silently to that.
That allows us to have just FAIL to STDOUT but INFO or DEBUG to fluentd.
Constraints
Rough implementation plan
Non-goals
in_forwardbut as its all on the same vnet (or localhost) this is unnecessary for v0