
Commit 514b911

Prepare release 0.27
1 parent 5fd8192 commit 514b911

20 files changed

Lines changed: 78 additions & 78 deletions

.github/workflows/build_with_scala_and_python_tests.yml

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ jobs:
 runs-on: ubuntu-24.04
 strategy:
 matrix:
-python-version: [ '3.9', '3.12' ]
+python-version: [ '3.10', '3.12' ]
 java-version: [ 11, 17 ]
 
 steps:
@@ -25,7 +25,7 @@ jobs:
 cache: sbt
 - uses: sbt/setup-sbt@v1
 with:
-sbt-runner-version: 1.11.5
+sbt-runner-version: 1.11.6
 
 - name: Build using sbt for all Scala versions supported
 run: sbt +package
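This commit moves the CI matrix from Python 3.9 to 3.10 (keeping 3.12) and still tests on Java 11 and 17. GitHub Actions creates one job per combination of matrix values, so the workflow keeps fanning out into four jobs. A minimal Python sketch of that expansion, for illustration only: `expand_matrix` is a hypothetical helper that models just the Cartesian product and ignores `include`/`exclude` entries.

```python
from __future__ import annotations

from itertools import product


def expand_matrix(matrix: dict[str, list]) -> list[dict]:
    """Return one dict per job: every combination of the matrix axes.

    Hypothetical helper; `include`/`exclude` entries are not modelled.
    """
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


# Values copied from the updated workflow
jobs = expand_matrix({"python-version": ["3.10", "3.12"], "java-version": [11, 17]})
for job in jobs:
    print(job)
```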

README.md

Lines changed: 20 additions & 20 deletions
@@ -32,14 +32,14 @@ and spark-shell/pyspark environments.
 - [✨ Highlights](#highlights)
 - [📚 Table of Contents](#tableofcontents)
 - [Links to related work on Spark Performance](#links-to-related-work-on-spark-performance)
-- [🚀 Quick start](#quickstart)
+- [🚀 Quick start](#quickstart)
 - [Examples of sparkMeasure on notebooks](#examples-of-sparkmeasure-on-notebooks)
 - [Examples of sparkMeasure on the CLI](#examples-of-sparkmeasure-on-the-cli)
 - [Python CLI](#python-cli)
 - [Scala CLI](#scala-cli)
 - [Memory report](#memory-report)
 - [CLI example for Task Metrics:](#cli-example-for-task-metrics)
-- [Setting Up SparkMeasure with Spark](#setting-up-sparkmeasure-with-spark)
+- [Setting Up SparkMeasure with Spark](#setting-up-sparkmeasure-with-spark)
 - [Version Compatibility for SparkMeasure](#version-compatibility-for-sparkmeasure)
 - [📥 Downloading SparkMeasure](#-downloading-sparkmeasure)
 - [Setup Examples](#setup-examples)
@@ -48,7 +48,7 @@ and spark-shell/pyspark environments.
 - [Including sparkMeasure in your Spark environment](#including-sparkmeasure-in-your-spark-environment)
 - [Running unit tests](#running-unit-tests)
 - [Notes on Spark Metrics](#notes-on-spark-metrics)
-- [Documentation, API, and examples](#documentation-api-and-examples)
+- [Documentation, API, and examples](#documentation-api-and-examples)
 - [Architecture diagram](#architecture-diagram)
 - [Main concepts underlying sparkMeasure implementation](#main-concepts-underlying-sparkmeasure-implementation)
 - [FAQ:](#faq)
@@ -95,7 +95,7 @@ Main author and contact: Luca.Canali@cern.ch
 # Python CLI
 # pip install pyspark
 pip install sparkmeasure
-pyspark --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+pyspark --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 
 # Import sparkMeasure
 from sparkmeasure import StageMetrics
@@ -118,12 +118,12 @@ Main author and contact: Luca.Canali@cern.ch
 # get metrics as a dictionary
 metrics = stagemetrics.aggregate_stage_metrics()
 ```
-Note: for Spark 3.x with Scala 2.12, use `--packages ch.cern.sparkmeasure:spark-measure_2.12:0.26`
-instead of `--packages ch.cern.sparkmeasure:spark-measure_2.13:0.26`
+Note: for Spark 3.x with Scala 2.12, use `--packages ch.cern.sparkmeasure:spark-measure_2.12:0.27`
+instead of `--packages ch.cern.sparkmeasure:spark-measure_2.13:0.27`
 
 #### Scala CLI
 ```
-spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 
 val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
 stageMetrics.runAndMeasure(spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show())
@@ -206,7 +206,7 @@ Notes:
 This is similar but slightly different from the example above as it collects metrics at the Task-level rather than Stage-level
 ```
 # Scala CLI
-spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 
 val taskMetrics = ch.cern.sparkmeasure.TaskMetrics(spark)
 taskMetrics.runAndMeasure(spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show())
@@ -215,7 +215,7 @@ This is similar but slightly different from the example above as it collects met
 # Python CLI
 # pip install pyspark
 pip install sparkmeasure
-pyspark --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+pyspark --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 
 from sparkmeasure import TaskMetrics
 taskmetrics = TaskMetrics(spark)
@@ -229,8 +229,8 @@ This is similar but slightly different from the example above as it collects met
 
 | Spark Version | Recommended SparkMeasure Version | Scala Version |
 | -------------- |----------------------------------|---------------------|
-| Spark 4.x | 0.26 (latest) | Scala 2.13 |
-| Spark 3.x | 0.26 (latest) | Scala 2.12 and 2.13 |
+| Spark 4.x | 0.27 (latest) | Scala 2.13 |
+| Spark 3.x | 0.27 (latest) | Scala 2.12 and 2.13 |
 | Spark 2.4, 2.3 | 0.19 | Scala 2.11 |
 | Spark 2.2, 2.1 | 0.16 | Scala 2.11 |
 
@@ -244,7 +244,7 @@ To get SparkMeasure, choose one of the following options:
 
 2. **Specific Versions:**
 
-* Download JAR files from the [sparkMeasure release notes](https://github.com/LucaCanali/sparkMeasure/releases/tag/v0.26).
+* Download JAR files from the [sparkMeasure release notes](https://github.com/LucaCanali/sparkMeasure/releases/tag/v0.27).
 
 3. **Latest Development Builds:**
 
@@ -258,21 +258,21 @@ To get SparkMeasure, choose one of the following options:
 
 #### Spark 4 with Scala 2.13
 
-* **Scala:** `spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26`
+* **Scala:** `spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27`
 * **Python:**
 
 ```bash
-pyspark --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+pyspark --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 pip install sparkmeasure
 ```
 
 #### Spark 3 with Scala 2.12
 
-* **Scala:** `spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.12:0.26`
+* **Scala:** `spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.12:0.27`
 * **Python:**
 
 ```bash
-pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:0.26
+pyspark --packages ch.cern.sparkmeasure:spark-measure_2.12:0.27
 pip install sparkmeasure
 ```
 ### Including sparkMeasure in your Spark environment
### Including sparkMeasure in your Spark environment
@@ -282,14 +282,14 @@ Choose your preferred method:
 * Use the `--packages` option:
 
 ```bash
---packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+--packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 ```
 * Directly reference the JAR file:
 
 ```bash
---jars /path/to/spark-measure_2.13-0.26.jar
---jars https://github.com/LucaCanali/sparkMeasure/releases/download/v0.26/spark-measure_2.13-0.26.jar
---conf spark.driver.extraClassPath=/path/to/spark-measure_2.13-0.26.jar
+--jars /path/to/spark-measure_2.13-0.27.jar
+--jars https://github.com/LucaCanali/sparkMeasure/releases/download/v0.27/spark-measure_2.13-0.27.jar
+--conf spark.driver.extraClassPath=/path/to/spark-measure_2.13-0.27.jar
 ```
 
 
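Every coordinate bumped in the README follows one pattern: `ch.cern.sparkmeasure:spark-measure_<Scala binary version>:<sparkMeasure version>` for `--packages`, and `spark-measure_<Scala>-<version>.jar` under the `v<version>` GitHub release for `--jars`. The sketch below encodes the updated compatibility table and derives both strings. It is a hypothetical helper, not part of sparkMeasure; it assumes a `major.minor[.patch]` Spark version string, defaults to the newest Scala version the table lists, and assumes (without having checked) that older releases use the same jar naming as 0.27.

```python
from __future__ import annotations

RELEASES_URL = "https://github.com/LucaCanali/sparkMeasure/releases/download"


def recommended(spark_version: str) -> tuple[str, tuple[str, ...]]:
    """Map a Spark version to (sparkMeasure version, Scala versions), per the README table."""
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if major == 4:
        return "0.27", ("2.13",)
    if major == 3:
        return "0.27", ("2.12", "2.13")
    if (major, minor) in {(2, 4), (2, 3)}:
        return "0.19", ("2.11",)
    if (major, minor) in {(2, 2), (2, 1)}:
        return "0.16", ("2.11",)
    raise ValueError(f"Spark {spark_version} is not covered by the compatibility table")


def _resolve(spark_version: str, scala_version: str | None) -> tuple[str, str]:
    version, scala_versions = recommended(spark_version)
    scala = scala_version or scala_versions[-1]  # default: newest Scala version listed
    if scala not in scala_versions:
        raise ValueError(f"Scala {scala} is not listed for Spark {spark_version}")
    return version, scala


def packages_coordinate(spark_version: str, scala_version: str | None = None) -> str:
    """Maven coordinate for --packages (spark-shell, pyspark, spark-submit)."""
    version, scala = _resolve(spark_version, scala_version)
    return f"ch.cern.sparkmeasure:spark-measure_{scala}:{version}"


def release_jar_url(spark_version: str, scala_version: str | None = None) -> str:
    """GitHub release download URL for --jars (naming pattern taken from the 0.27 URL)."""
    version, scala = _resolve(spark_version, scala_version)
    return f"{RELEASES_URL}/v{version}/spark-measure_{scala}-{version}.jar"


print(packages_coordinate("4.0.1"))
print(packages_coordinate("3.5.6", "2.12"))
print(release_jar_url("4.0.1"))
```

Raising on unknown versions, rather than guessing, keeps the helper from silently suggesting a build for a Spark/Scala pair the table does not list.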
build.sbt

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 
 name := "spark-measure"
 
-version := "0.27-SNAPSHOT"
+version := "0.27"
 
 scalaVersion := "2.12.18"
 crossScalaVersions := Seq("2.12.18", "2.13.16")
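With `crossScalaVersions` set to 2.12.18 and 2.13.16, `sbt +package` (the CI build step) builds the project once per Scala version. Under sbt's default layout each build lands in `target/scala-<binary version>/`, with the Scala binary version appended to the artifact name; this is why the docs point at `target/scala-2.13/spark-measure*.jar` and at jars named like `spark-measure_2.13-0.27.jar`. A small sketch deriving the expected paths from the `build.sbt` values; the helpers are hypothetical and assume no custom `target` directory or artifact naming.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def scala_binary_version(full_version: str) -> str:
    """'2.13.16' -> '2.13' (for Scala 2.x the binary version is major.minor)."""
    return ".".join(full_version.split(".")[:2])


def expected_jars(name: str, version: str, cross_scala_versions: list[str]) -> list[PurePosixPath]:
    """Jar paths `sbt +package` is expected to produce under the default layout."""
    paths = []
    for full in cross_scala_versions:
        binary = scala_binary_version(full)
        paths.append(PurePosixPath("target", f"scala-{binary}", f"{name}_{binary}-{version}.jar"))
    return paths


# Values taken from build.sbt after this commit
for jar in expected_jars("spark-measure", "0.27", ["2.12.18", "2.13.16"]):
    print(jar)
```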

docs/Flight_recorder_mode_FileSink.md

Lines changed: 8 additions & 8 deletions
@@ -12,7 +12,7 @@ Metrics can also be printed to stdout.
 ## Recording metrics using the Flight Recorder mode with Stage-level granularity
 To record metrics at the stage execution level granularity add these configurations to spark-submit:
 ```
---packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+--packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics
 ```
 
@@ -25,7 +25,7 @@ The usage is almost the same as for the stage metrics mode described above, just
 The configuration parameters applicable to Flight recorder mode for Task granularity are:
 
 ```
---packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+--packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderTaskMetrics
 ```
 
@@ -51,7 +51,7 @@ A Python example
 - This runs the pi.py example script
 - collects and saves the metrics to `/tmp/stageMetrics_flightRecorder` in json format:
 ```
-bin/spark-submit --master local[*] --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26 \
+bin/spark-submit --master local[*] --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27 \
 --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics \
 examples/src/main/python/pi.py
 
@@ -63,7 +63,7 @@ A Scala example
 - same example as above, in addition use a custom output filename
 - print metrics also to stdout
 ```
-bin/spark-submit --master local[*] --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26 \
+bin/spark-submit --master local[*] --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27 \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderStageMetrics \
 --conf spark.sparkmeasure.printToStdout=true \
@@ -80,7 +80,7 @@ This example collected metrics with Task granularity.
 (note: source the Hadoop environment before running this)
 ```
 bin/spark-submit --master yarn --deploy-mode cluster \
---packages ch.cern.sparkmeasure:spark-measure_2.13:0.26 \
+--packages ch.cern.sparkmeasure:spark-measure_2.13:0.27 \
 --conf spark.extraListeners=ch.cern.sparkmeasure.FlightRecorderTaskMetrics \
 --conf spark.sparkmeasure.outputFormat=json_to_hadoop \
 --conf spark.sparkmeasure.outputFilename="hdfs://myclustername/user/luca/test/myoutput_$(date +%s).json" \
@@ -96,7 +96,7 @@ Example, use Spark 4, Kubernetes, Scala 2.13 and write output to S3:
 bin/spark-submit --master k8s://https://XXX.XXX.XXX.XXX --deploy-mode client --conf spark.executor.instances=3 \
 --conf spark.executor.cores=2 --executor-memory 6g --driver-memory 8g \
 --conf spark.kubernetes.container.image=apache/spark \
---packages org.apache.hadoop:hadoop-aws:3.4.1,ch.cern.sparkmeasure:spark-measure_2.13:0.26 \
+--packages org.apache.hadoop:hadoop-aws:3.4.2,ch.cern.sparkmeasure:spark-measure_2.13:0.27 \
 --conf spark.hadoop.fs.s3a.secret.key="YYY..." \
 --conf spark.hadoop.fs.s3a.access.key="ZZZ..." \
 --conf spark.hadoop.fs.s3a.endpoint="https://s3.cern.ch" \
@@ -105,7 +105,7 @@ bin/spark-submit --master k8s://https://XXX.XXX.XXX.XXX --deploy-mode client --c
 --conf spark.sparkmeasure.outputFormat=json_to_hadoop \
 --conf spark.sparkmeasure.outputFilename="s3a://test/myoutput_$(date +%s).json" \
 --class org.apache.spark.examples.SparkPi \
-examples/jars/spark-examples_2.13-4.0.0.jar 10
+examples/jars/spark-examples_2.13-4.0.1.jar 10
 ```
 
 
@@ -115,7 +115,7 @@ To post-process the saved metrics you will need to deserialize objects saved by
 This is an example of how to do that using the supplied helper object sparkmeasure.Utils
 
 ```
-bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 
 val myMetrics = ch.cern.sparkmeasure.IOUtils.readSerializedStageMetricsJSON("/tmp/stageMetrics_flightRecorder")
 // use ch.cern.sparkmeasure.IOUtils.readSerializedStageMetrics("/tmp/stageMetrics.serialized") for java serialization
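The Flight Recorder examples differ mainly in which listener is registered (`FlightRecorderStageMetrics` or `FlightRecorderTaskMetrics`) and in the optional `spark.sparkmeasure.*` settings. As a sketch, these options can be assembled as an argument list, for instance to hand to `subprocess.run` from a job launcher. The helper name and its parameters are hypothetical; only the listener classes, configuration keys, and values come from the examples.

```python
from __future__ import annotations

import time

# Listener classes named in the Flight Recorder examples
LISTENERS = {
    "stage": "ch.cern.sparkmeasure.FlightRecorderStageMetrics",
    "task": "ch.cern.sparkmeasure.FlightRecorderTaskMetrics",
}


def flight_recorder_args(
    granularity: str = "stage",
    output_format: str | None = None,
    output_filename: str | None = None,
    print_to_stdout: bool = False,
    package: str = "ch.cern.sparkmeasure:spark-measure_2.13:0.27",
) -> list[str]:
    """Hypothetical helper: spark-submit options for Flight Recorder mode."""
    args = ["--packages", package,
            "--conf", f"spark.extraListeners={LISTENERS[granularity]}"]
    if output_format is not None:
        args += ["--conf", f"spark.sparkmeasure.outputFormat={output_format}"]
    if output_filename is not None:
        args += ["--conf", f"spark.sparkmeasure.outputFilename={output_filename}"]
    if print_to_stdout:
        args += ["--conf", "spark.sparkmeasure.printToStdout=true"]
    return args


# Mirrors the YARN example: task granularity, json_to_hadoop output to an hdfs:// path.
# int(time.time()) stands in for the shell's $(date +%s); no quoting is needed because
# a list like this is passed to subprocess.run without going through a shell.
output = f"hdfs://myclustername/user/luca/test/myoutput_{int(time.time())}.json"
cmd = ["bin/spark-submit", "--master", "yarn", "--deploy-mode", "cluster",
       *flight_recorder_args("task", "json_to_hadoop", output)]
print(cmd)
```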

docs/Flight_recorder_mode_InfluxDBSink.md

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ bin/spark-shell \
 --conf spark.sparkmeasure.influxdbURL="http://localhost:8086" \
 --conf spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSink,ch.cern.sparkmeasure.InfluxDBSinkExtended \
 --conf spark.sparkmeasure.influxdbStagemetrics=true
---packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+--packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 
 // run a Spark job, this will produce metrics
 spark.sql("select count(*) from range(1000) cross join range(1000) cross join range(1000)").show
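In the InfluxDB excerpt, the line `--conf spark.sparkmeasure.influxdbStagemetrics=true` has no trailing backslash, so a shell would end the `spark-shell` command there and then try to run `--packages ...` as a separate command (the KafkaSink snippet has the same pattern before its `--packages` line). One way to avoid such slips is to generate multi-line commands rather than edit them by hand. A sketch using only `shlex.quote` from the standard library; `multiline_command` is a hypothetical helper, and the argument groups are the ones visible in the excerpt.

```python
from __future__ import annotations

import shlex


def multiline_command(lines: list[list[str]], indent: str = "  ") -> str:
    """Render a shell command with one group of arguments per line.

    Every line except the last ends with ' \\' so the shell reads the text as
    a single command; each argument is quoted with shlex.quote when needed.
    """
    rendered = [" ".join(shlex.quote(arg) for arg in line) for line in lines]
    return (" \\\n" + indent).join(rendered)


command = multiline_command([
    ["bin/spark-shell"],
    ["--conf", "spark.sparkmeasure.influxdbURL=http://localhost:8086"],
    ["--conf", "spark.extraListeners=ch.cern.sparkmeasure.InfluxDBSink,"
               "ch.cern.sparkmeasure.InfluxDBSinkExtended"],
    ["--conf", "spark.sparkmeasure.influxdbStagemetrics=true"],
    ["--packages", "ch.cern.sparkmeasure:spark-measure_2.13:0.27"],
])
print(command)
```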

docs/Flight_recorder_mode_KafkaSink.md

Lines changed: 2 additions & 2 deletions
@@ -44,7 +44,7 @@ This code depends on "kafka-clients". If you deploy sparkMeasure from maven cent
 the dependency is being taken care of.
 If you run sparkMeasure from a jar instead, you may need to add the dependency manually
 in spark-submit as in:
-- `--packages org.apache.kafka:kafka-clients:3.9.0`
+- `--packages org.apache.kafka:kafka-clients:4.1.0`
 
 ## Use cases
 
@@ -69,7 +69,7 @@ bin/spark-shell \
 --conf spark.extraListeners=ch.cern.sparkmeasure.KafkaSink \
 --conf spark.sparkmeasure.kafkaBroker=localhost:9092 \
 --conf spark.sparkmeasure.kafkaTopic=metrics
---packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+--packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 ```
 
 - Look at the metrics being written into Kafka:
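KafkaSink depends on `kafka-clients`, which this release bumps from 3.9.0 to 4.1.0 in the docs. Per the KafkaSink doc, deploying sparkMeasure from Maven Central with `--packages` takes care of that dependency, while running from a jar may require adding `kafka-clients` yourself. A hypothetical helper capturing that choice; the function name and parameters are illustrative, while the listener, configuration keys, and coordinates come from the snippets.

```python
from __future__ import annotations

SPARKMEASURE = "ch.cern.sparkmeasure:spark-measure_2.13:0.27"
KAFKA_CLIENTS = "org.apache.kafka:kafka-clients:4.1.0"


def kafka_sink_args(broker: str, topic: str, jar_path: str | None = None) -> list[str]:
    """Hypothetical helper: spark-shell/spark-submit options for KafkaSink."""
    if jar_path is None:
        # Maven Central: kafka-clients is resolved together with sparkMeasure
        deps = ["--packages", SPARKMEASURE]
    else:
        # Local jar: add kafka-clients explicitly, as the KafkaSink doc suggests
        deps = ["--jars", jar_path, "--packages", KAFKA_CLIENTS]
    return deps + [
        "--conf", "spark.extraListeners=ch.cern.sparkmeasure.KafkaSink",
        "--conf", f"spark.sparkmeasure.kafkaBroker={broker}",
        "--conf", f"spark.sparkmeasure.kafkaTopic={topic}",
    ]


print(kafka_sink_args("localhost:9092", "metrics"))
print(kafka_sink_args("localhost:9092", "metrics", jar_path="/path/to/spark-measure_2.13-0.27.jar"))
```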

docs/Flight_recorder_mode_PrometheusPushgatewaySink.md

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ Examples:
 bin/spark-shell \
 --conf spark.extraListeners=ch.cern.sparkmeasure.PushGatewaySink \
 --conf spark.sparkmeasure.pushgateway=localhost:9091 \
---packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+--packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 ```
 
 - Look at the metrics being written to the Pushgateway

docs/Instrument_Python_code.md

Lines changed: 5 additions & 5 deletions
@@ -11,7 +11,7 @@ You can find an example of how to instrument a Scala application running Apache
 
 How to run the example:
 ```
-bin/spark-submit --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26 <path_to_examples>/test_sparkmeasure_python.py
+bin/spark-submit --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27 <path_to_examples>/test_sparkmeasure_python.py
 ```
 
 Some relevant snippet of code are:
@@ -54,10 +54,10 @@ The details are discussed in the [examples for Python shell and notebook](https:
 
 - This is how to run sparkMeasure using a packaged version in Maven Central
 ```
-bin/spark-submit --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26 your_python_code.py
+bin/spark-submit --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27 your_python_code.py
 
 // alternative: just download and use the jar (it is only needed in the driver) as in:
-bin/spark-submit --conf spark.driver.extraClassPath=<path>/spark-measure_2.13-0.26.jar ...
+bin/spark-submit --conf spark.driver.extraClassPath=<path>/spark-measure_2.13-0.27.jar ...
 ```
 
 ### Download and build sparkMeasure (optional)
@@ -73,8 +73,8 @@ The details are discussed in the [examples for Python shell and notebook](https:
 pip install .
 
 # Run as in one of these examples:
-bin/spark-submit --jars path>/spark-measure_2.13-0.27-SNAPSHOT.jar ...
+bin/spark-submit --jars path>/spark-measure_2.13-0.28-SNAPSHOT.jar ...
 
 # alternative, set classpath for the driver (sparkmeasure code runs only in the driver)
-bin/spark-submit --conf spark.driver.extraClassPath=<path>/spark-measure_2.13-0.27-SNAPSHOT.jar ...
+bin/spark-submit --conf spark.driver.extraClassPath=<path>/spark-measure_2.13-0.28-SNAPSHOT.jar ...
 ```

docs/Instrument_Scala_code.md

Lines changed: 6 additions & 6 deletions
@@ -13,7 +13,7 @@ How to run the example:
 # build the example jar
 sbt package
 
-bin/spark-submit --master local[*] --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26 --class ch.cern.testSparkMeasure.testSparkMeasure <path_to_the_example_jar>/testsparkmeasurescala_2.13-0.1.jar
+bin/spark-submit --master local[*] --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27 --class ch.cern.testSparkMeasure.testSparkMeasure <path_to_the_example_jar>/testsparkmeasurescala_2.13-0.1.jar
 ```
 
 ### Collect and save Stage Metrics
@@ -72,10 +72,10 @@ You have the option to export aggregated stage metrics and/or task metrics to:
 
 - This is how to run sparkMeasure using a packaged version in Maven Central
 ```
-bin/spark-submit --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+bin/spark-submit --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 
 // or just download and use the jar (it is only needed in the driver) as in:
-bin/spark-submit --conf spark.driver.extraClassPath=<path>/spark-measure_2.13-0.26.jar ...
+bin/spark-submit --conf spark.driver.extraClassPath=<path>/spark-measure_2.13-0.27.jar ...
 ```
 - The alternative, see paragraph above, is to build a jar from master (See below).
 
@@ -86,11 +86,11 @@ You have the option to export aggregated stage metrics and/or task metrics to:
 git clone https://github.com/lucacanali/sparkmeasure
 cd sparkmeasure
 sbt +package
-ls -l target/scala-2.12/spark-measure*.jar # location of the compiled jar
+ls -l target/scala-2.13/spark-measure*.jar # location of the compiled jar
 
 # Run as in one of these examples:
-bin/spark-submit --jars path>/spark-measure_2.13-0.27-SNAPSHOT.jar
+bin/spark-submit --jars path>/spark-measure_2.13-0.28-SNAPSHOT.jar
 
 # alternative, set classpath for the driver (it is only needed in the driver)
-bin/spark-submit --conf spark.driver.extraClassPath=<path>/spark-measure_2.13-0.27-SNAPSHOT.jar ...
+bin/spark-submit --conf spark.driver.extraClassPath=<path>/spark-measure_2.13-0.28-SNAPSHOT.jar ...
 ```

docs/Prometheus.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ https://prometheus.io/docs/instrumenting/exposition_formats/
 
 1. Measure metrics at the Stage level (example in Scala):
 ```
-bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.13:0.26
+bin/spark-shell --packages ch.cern.sparkmeasure:spark-measure_2.13:0.27
 
 val stageMetrics = ch.cern.sparkmeasure.StageMetrics(spark)
 stageMetrics.begin()
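The Prometheus doc refers to the Prometheus text exposition format. For orientation, a hedged sketch of what that format looks like when rendering a flat dictionary of aggregated metrics, such as the one returned by `aggregate_stage_metrics()` in the README's Python example. The `sparkmeasure_` prefix, the labels, and the sample values are illustrative assumptions, not necessarily what sparkMeasure itself emits.

```python
from __future__ import annotations

import re


def to_exposition(metrics: dict[str, float], prefix: str = "sparkmeasure_",
                  labels: dict[str, str] | None = None) -> str:
    """Render a flat dict of metrics as Prometheus text exposition lines."""
    label_text = ""
    if labels:
        # In label values, backslash, double quote and newline must be escaped
        escaped = {
            key: value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            for key, value in labels.items()
        }
        label_text = "{" + ",".join(f'{k}="{v}"' for k, v in sorted(escaped.items())) + "}"
    lines = []
    for name, value in metrics.items():
        # Metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*; replace anything else
        metric = re.sub(r"[^a-zA-Z0-9_:]", "_", prefix + name)
        if not re.match(r"[a-zA-Z_:]", metric):
            metric = "_" + metric
        lines.append(f"{metric}{label_text} {value}")
    return "".join(line + "\n" for line in lines)


# Made-up values, shaped like an aggregated stage-metrics dictionary
print(to_exposition({"numStages": 2, "elapsedTime": 1234}, labels={"app": "demo"}), end="")
```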
