I managed to get some of the execution metrics to show up in Grafana, but more than half are missing. I'm using Spark 3.5.2 on Kubernetes, and these are the relevant parts of the config:
# spark-dashboard
spark.metrics.conf.*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
spark.metrics.conf.*.sink.graphite.host=spark-dashboard.spark-dashboard.svc.cluster.local
spark.metrics.conf.*.sink.graphite.port=2003
spark.metrics.conf.*.sink.graphite.period=10
spark.metrics.conf.*.sink.graphite.unit=seconds
spark.metrics.conf.*.sink.graphite.prefix=lucatest
# Enable JVM metrics collection
spark.metrics.conf.*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
spark.metrics.staticSources.enabled true
spark.metrics.appStatusSource.enabled true
spark.executor.processTreeMetrics.enabled true
spark.jars.packages=ch.cern.sparkmeasure:spark-measure_2.12:0.27,ch.cern.sparkmeasure:spark-plugins_2.12:0.4
spark.plugins=ch.cern.HDFSMetrics,ch.cern.CgroupMetrics,ch.cern.CloudFSMetrics
spark.cernSparkPlugin.cloudFsName s3a
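The settings above mix the `key=value` and `key value` forms, both of which a Java-style properties file accepts. In case it helps anyone reproduce this, here is a minimal sketch that normalizes such lines into `--conf` arguments for spark-submit. The helper names `parse_spark_conf` and `to_submit_args` are made up for this illustration and are not part of Spark or sparkMeasure.

```python
import re
import shlex

# Key, then either an '=' (with optional spaces) or plain whitespace, then the value.
_LINE = re.compile(r"^([^=\s]+)(?:\s*=\s*|\s+)(.*)$")

def parse_spark_conf(text):
    """Parse spark-defaults style lines into a dict (hypothetical helper).

    Blank lines and '#' comments are skipped.
    """
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE.match(line)
        if match:
            conf[match.group(1)] = match.group(2).strip()
        else:
            conf[line] = ""  # a key with no value
    return conf

def to_submit_args(conf):
    """Render the dict as shell-quoted `--conf key=value` arguments."""
    return " ".join("--conf " + shlex.quote(f"{k}={v}") for k, v in conf.items())
```

Keys containing `*`, such as the `spark.metrics.conf.*` entries, come out single-quoted so the shell does not try to expand them.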
The driver is running on the machine I'm executing spark-submit from.
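Since the Graphite host in the config is a cluster-internal service name, one thing worth checking is whether the machine running the driver can reach it at all. This is only a rough sketch of such a check, not a confirmed diagnosis; `can_reach` is a hypothetical helper name.

```python
import socket

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections, and timeouts.
        return False
```

Running `can_reach("spark-dashboard.spark-dashboard.svc.cluster.local", 2003)` on the driver machine and again from inside an executor pod would show whether both sides can open a connection to the sink.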
As you can see in the screenshot, some data is being reported incorrectly, while other data is simply missing. The "extended" dashboard is almost completely empty.
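To narrow down which metrics are missing, it may help to group the raw Graphite plaintext lines (`<name> <value> <timestamp>`) by who reported them. The sketch below assumes the default Spark metric naming of `<prefix>.<app id>.<driver or executor id>.<source>.<metric>`; if `spark.metrics.namespace` has been changed, the layout will differ. `summarize_graphite_lines` is a hypothetical helper name.

```python
from collections import defaultdict

def summarize_graphite_lines(lines, prefix):
    """Map each reporting instance (driver or executor ID) to its sorted sources."""
    sources = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # not a "<name> <value> <timestamp>" line
        name = parts[0]
        if not name.startswith(prefix + "."):
            continue
        fields = name[len(prefix) + 1:].split(".")
        if len(fields) < 3:
            continue
        # Assumed layout after the prefix: <app id>.<instance>.<source>.<metric...>
        instance, source = fields[1], fields[2]
        sources[instance].add(source)
    return {instance: sorted(names) for instance, names in sources.items()}
```

Feeding it lines captured on port 2003 with `prefix="lucatest"` would show, for example, whether only the driver or only the executors are reporting, and which sources (such as `jvm` or `executor`) arrive for each.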
The workload I'm running reads the TPC-DS store_sales table at scale factor 1000 and saves it using Iceberg. In the Spark dashboard I can see the data being read and written, including the shuffle stage.
As far as I remember, sparkMeasure reports correct numbers at the end of the job run.