* Add support for jmx prometheus exporter
- Write unit test for jmxexport function
- Add documentation
- Document how to run unit test
- Add integration tests over k8s
* Avoid importing implicits
* Add support for e2e run in PR
* Improve Dockerfile and provide one-liner to run e2e tests
---------
Co-authored-by: Luca Canali <luca.canali@cern.ch>
## Exporting sparkMeasure Metrics to Prometheus via JMX

`sparkMeasure` collects execution metrics from Spark jobs at the driver or executor level. While it does not expose its metrics directly via JMX, it can be used alongside Spark's JMX metrics system to enable Prometheus-based monitoring.

In a Kubernetes environment using the **Spark Operator**, you can configure the Spark driver and executors to expose their sparkMeasure metrics through the JMX Prometheus exporter and scrape them with Prometheus.

> ✅ This setup has been validated **only** in **Kubernetes environments using the [Spark Operator](https://www.kubeflow.org/docs/components/spark-operator)**.

### Enable the JMX Prometheus exporter in Spark

To configure JMX and Prometheus exporter monitoring with Spark on Kubernetes, follow the official Kubeflow Spark Operator documentation:

📖 [Monitoring with JMX and Prometheus: Kubeflow Spark Operator Guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/monitoring-with-jmx-and-prometheus/)

### Exporting `sparkMeasure` Metrics via JMX in Python

To programmatically export `sparkMeasure` metrics in Python alongside standard JMX metrics, use the `jmxexport` function from the `sparkmeasure.jmx` module. This lets custom metrics collected during job execution be exposed through the same Prometheus exporter as native Spark metrics.

The `jmxexport()` call updates the current Spark application’s JMX metrics with the `sparkMeasure` results, making them available to any configured Prometheus instance.
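
As a minimal sketch of the flow described above, the helper below runs a workload between `begin()` and `end()`, aggregates the stage metrics, and hands them to an export callable. The helper name `measure_and_export` is hypothetical, and the `jmxexport(spark, metrics)` call shape in the usage comment is an assumption; the linked example remains the reference for exact usage.

```python
def measure_and_export(spark, workload, stagemetrics, export):
    """Measure `workload` and pass the aggregated metrics to `export`.

    `stagemetrics` is expected to behave like sparkmeasure.StageMetrics.
    `export` is a callable taking (spark, metrics); it is assumed here
    that sparkmeasure.jmx.jmxexport can be used in that role.
    """
    stagemetrics.begin()
    try:
        workload()
    finally:
        # Always stop collection, even if the workload raises
        stagemetrics.end()
    metrics = stagemetrics.aggregate_stagemetrics()
    export(spark, metrics)
    return metrics


# Usage inside a PySpark application (assumed call shape, not verified here):
#   from sparkmeasure import StageMetrics
#   from sparkmeasure.jmx import jmxexport
#   measure_and_export(
#       spark,
#       lambda: spark.sql("select count(*) from range(1000)").show(),
#       StageMetrics(spark),
#       jmxexport,
#   )
```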

See a full implementation example here:
📄 [How to use the JMX exporter in Python code](../e2e/rootfs/opt/spark/examples/spark-sql.py)

---

### Prometheus Exporter Configuration

In addition to exposing metrics via JMX, you must configure the Prometheus JMX exporter in the Spark driver and executor pods to make the custom `sparkMeasure` metrics queryable by Prometheus. This configuration should be added *on top of* the existing JMX metrics exporter configuration.

Ensure your Spark pod manifest or Helm chart includes a properly configured `ConfigMap` for the JMX exporter. Specifically, add mappings for the custom `sparkMeasure` metrics under the `rules` section of the YAML used by the Prometheus JMX exporter.

A production-ready configuration example is available here:
📄 [How to configure the Prometheus exporter to expose sparkMeasure metrics](../e2e/charts/spark-demo/templates/jmx-configmap.yaml)

---

By combining Python-based metric collection with a Prometheus-compatible JMX exporter, you can ensure comprehensive observability for Spark applications, including custom performance instrumentation through `sparkMeasure`.

> **Security Tip:** In production environments, ensure that JMX ports are protected using appropriate Kubernetes NetworkPolicies or service mesh configurations. Avoid exposing unauthenticated JMX endpoints externally to mitigate the risk of unauthorized access.
### Collecting metrics at finer granularity: use Task metrics

Collecting Spark task metrics at the granularity of each task completion has additional overhead
compared to collecting at the stage completion level. Therefore, this option should only be used if you need data at
this finer granularity, for example because you want to study skew effects; otherwise, consider using
stagemetrics aggregation as the preferred choice.
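
To illustrate why per-task data matters when studying skew, here is a small sketch that computes a max-to-median ratio over task durations. The function name `skew_ratio` and the sample durations are hypothetical; in practice the values would come from the task-level metrics collected by sparkMeasure.

```python
from statistics import median


def skew_ratio(task_durations_ms):
    """Return the longest task duration divided by the median duration.

    A value close to 1 suggests evenly sized tasks; a much larger value
    suggests that a few straggler tasks dominate the stage.
    """
    if not task_durations_ms:
        raise ValueError("need at least one task duration")
    mid = median(task_durations_ms)
    longest = max(task_durations_ms)
    if mid == 0:
        # Avoid division by zero when at least half of the tasks report 0 ms
        return 1.0 if longest == 0 else float("inf")
    return longest / mid


# Hypothetical per-task run times (ms) for one stage: one straggler task.
# A stage-level sum would hide the fact that a single task dominates.
durations_ms = [100, 110, 95, 105, 900]
print(f"skew ratio: {skew_ratio(durations_ms):.2f}")
```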
@@ -98,7 +98,7 @@ stagemetrics aggregation as preferred choice.
taskmetrics.end()
taskmetrics.print_report()
```

```python
from sparkmeasure import TaskMetrics
taskmetrics = TaskMetrics(spark)
@@ -108,18 +108,18 @@ stagemetrics aggregation as preferred choice.
### Exporting metrics data for archiving and/or further analysis

One simple use case is to use the data collected and reported by the stagemetrics and taskmetrics
printReport methods for immediate troubleshooting and workload analysis.
You also have options to save the metrics aggregated as in the printReport output.

Another option is to export the metrics to an external system; see details at [Prometheus Pushgateway](Prometheus.md) or [Prometheus exporter through JMX](Prometheus_through_JMX.md).

- Example on how to export raw Stage metrics data in JSON format
```python
from sparkmeasure import StageMetrics
stagemetrics = StageMetrics(spark)
stagemetrics.runandmeasure(globals(), ...your workload here ... )