Use Spark Plugins to extend Apache Spark with custom metrics and executors' startup actions.
-
+**Spark Plugins** are an Apache Spark feature for extending Spark with custom metrics and actions.
+This repository provides ready-to-use examples for deploying Spark plugins across various use cases.
 ### Key Features

-- **Spark Plugins** are a mechanism to extend Apache Spark with custom code for metrics and actions.
-- This repository provides examples of plugins that you can deploy to extend Spark with custom metrics and actions.
-- **Extending Spark instrumentation** with custom metrics
-- **Running custom actions** when the executors start up, typically useful for integrating with
-external systems, such as monitoring systems.
-- This repo provides code and examples of plugins applied to measuring Spark on cluster resources (YARN, K8S, Standalone),
-including measuring Spark I/O from cloud Filesystems, OS metrics, custom application metrics, and integrations with external systems like Pyroscope.
-- The code in this repo is for Spark 3.x. For Spark 2.x, see instead [Executor Plugins for Spark 2.4](https://github.com/cerndb/SparkExecutorPlugins2.4)
+* **Custom Metrics:** Extend Spark's instrumentation with user-defined metrics.
+* **Executor Actions:** Trigger custom actions upon executor startup, useful for integrations (e.g., monitoring systems).
 - [Demo and basic plugins](#demo-and-basic-plugins)
+- [Implementation notes](#implementation-notes)
 - [Plugin for integrating Pyroscope with Spark](#plugin-for-integrating-with-pyroscope)
 - [Plugin for OS metrics instrumentation with Cgroups for Spark on Kubernetes](#os-metrics-instrumentation-with-cgroups-for-spark-on-kubernetes)
 - [Plugin to collect I/O storage statistics for HDFS and Hadoop-compatible filesystems](#plugins-to-collect-io-storage-statistics-for-hdfs-and-hadoop-compatible-filesystems)
@@ -33,19 +31,6 @@ Use Spark Plugins to extend Apache Spark with custom metrics and executors' star

 Author and contact: Luca.Canali@cern.ch

-### Implementation Notes:
-- Spark plugins implement the `org.apache.spark.api.Plugin` interface, they can be written in Scala or Java
-and can be used to run custom code at the startup of Spark executors and driver.
-- Plugins basic configuration: `--conf spark.plugins=<list of plugin classes>`
-- Plugin JARs need to be made available to Spark executors
-- you can distribute the plugin code to the executors using `--jars` and `--packages`
-- for K8S you can also consider making the jars available directly in the container image
-- Most of the Plugins described in this repo are intended to extend the Spark Metrics System
-- See the details on the Spark metrics system at [Spark Monitoring documentation](https://spark.apache.org/docs/latest/monitoring.html#metrics).
-- You can find the metrics generated by the plugins in the Spark metrics system stream under the
-namespace `namespace=plugin.<Plugin Class Name>`
-- See also: [SPARK-29397](https://issues.apache.org/jira/browse/SPARK-29397), [SPARK-28091](https://issues.apache.org/jira/browse/SPARK-28091), [SPARK-32119](https://issues.apache.org/jira/browse/SPARK-32119).
-
 ---
 ## Getting Started - Your First Spark Plugins
 - Deploy the Spark plugins described here from Maven Central
@@ -76,6 +61,20 @@ Author and contact: Luca.Canali@cern.ch
 ```
 - You can see if the plugin has run by checking that the file `/tmp/plugin.txt` has been
 created on the executor machines.
+---
+### Implementation Notes:
+- Spark plugins implement the `org.apache.spark.api.Plugin` interface, they can be written in Scala or Java
+and can be used to run custom code at the startup of Spark executors and driver.
+- Plugins basic configuration: `--conf spark.plugins=<list of plugin classes>`
+- Plugin JARs need to be made available to Spark executors
+- you can distribute the plugin code to the executors using `--jars` and `--packages`
+- for K8S you can also consider making the jars available directly in the container image
+- Most of the Plugins described in this repo are intended to extend the Spark Metrics System
+- See the details on the Spark metrics system at [Spark Monitoring documentation](https://spark.apache.org/docs/latest/monitoring.html#metrics).
+- You can find the metrics generated by the plugins in the Spark metrics system stream under the
+namespace `namespace=plugin.<Plugin Class Name>`
+- See also: [SPARK-29397](https://issues.apache.org/jira/browse/SPARK-29397), [SPARK-28091](https://issues.apache.org/jira/browse/SPARK-28091), [SPARK-32119](https://issues.apache.org/jira/browse/SPARK-32119).
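
The implementation notes above can be illustrated with a minimal sketch. It is not code from this repository. In Spark 3.x the plugin entry point is the `org.apache.spark.api.plugin.SparkPlugin` interface, which returns an optional driver-side and executor-side component. The class name `ExampleStartupPlugin` and the metric name `executorUptimeMillis` are hypothetical, chosen only for illustration. The sketch assumes Spark 3.x and the Dropwizard metrics library that ships with it are on the classpath.

```scala
import java.util.{Map => JMap}

import com.codahale.metrics.Gauge
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// Hypothetical example plugin: the class and metric names are illustrative only.
class ExampleStartupPlugin extends SparkPlugin {

  // No driver-side component in this sketch; returning null tells Spark none is needed.
  override def driverPlugin(): DriverPlugin = null

  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    // Called once during the initialization of each executor.
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit = {
      val startedAt = System.currentTimeMillis()
      // Register a gauge that reports the time elapsed since this plugin was initialized.
      ctx.metricRegistry().register("executorUptimeMillis", new Gauge[Long] {
        override def getValue: Long = System.currentTimeMillis() - startedAt
      })
    }
  }
}
```

Packaged in a JAR and shipped with `--jars`, a plugin like this would be enabled with `--conf spark.plugins=ExampleStartupPlugin`, and its gauge should then appear in the Spark metrics system under the namespace `plugin.ExampleStartupPlugin`.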