BiJuTy

An Interactive HPC-Aware Big Data Cluster Lifecycle Manager and Performance Assessment Utility for JupyterHub

About

BiJuTy (pronounced BYOO-tee) is an interactive Jupyter Notebook-based framework that simplifies cluster lifecycle management and performance assessment on HPC systems for users of all experience levels. It enables seamless multi-cluster management, automates performance metric collection, and allows users to iteratively optimize big data applications in just a few clicks.

Getting Started

Install the package directly from GitHub inside a Jupyter notebook cell:

!pip install git+https://github.com/apurvkulkarni7/bijuty.git

Or install from a local clone:

git clone https://github.com/apurvkulkarni7/bijuty.git
cd bijuty
pip install -e .

To get started, simply import the package in a notebook cell:

import bijuty

Requirements

BiJuTy supports the following package versions:

Package	Version
Python	3.12.3
Apache Spark and PySpark	3.5.1
Apache Flink and PyFlink	2.1.2
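
To compare an existing environment against the table above, a check along the following lines may help. This is a minimal sketch and not part of BiJuTy: the helper `check_versions` is hypothetical, and the PyPI distribution names `pyspark` and `apache-flink` are assumptions about how the frameworks were installed.

```python
import platform
from importlib import metadata

# Versions from the requirements table. The distribution names are
# assumptions: "pyspark" for PySpark and "apache-flink" for PyFlink.
EXPECTED = {"pyspark": "3.5.1", "apache-flink": "2.1.2"}


def check_versions(expected=EXPECTED):
    """Return {distribution: (installed_version_or_None, expected_version)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed in this environment
        report[name] = (installed, wanted)
    return report


print("Python:", platform.python_version())
for name, (installed, wanted) in check_versions().items():
    print(f"{name}: installed={installed}, listed={wanted}")
```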

Additional requirements:

A JupyterHub / Jupyter Notebook environment
ipywidgets enabled in Jupyter
An active SLURM job allocation (the tool auto-detects SLURM resources)
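
Whether a notebook is running inside a SLURM allocation can be checked by hand before launching the tool. The sketch below is an illustration only, not BiJuTy's detection logic: `detect_slurm` is a hypothetical helper that reads the standard SLURM environment variables and, if `scontrol` is on the PATH, expands the node list into host names.

```python
import os
import shutil
import subprocess


def detect_slurm(env=None):
    """Return basic allocation info, or None when no SLURM job is active."""
    env = os.environ if env is None else env
    job_id = env.get("SLURM_JOB_ID")
    if job_id is None:
        return None  # not inside a SLURM job

    nodelist = env.get("SLURM_JOB_NODELIST", "")
    hosts = []
    if nodelist and shutil.which("scontrol"):
        # "scontrol show hostnames" expands e.g. node[01-03] to one name per line.
        result = subprocess.run(
            ["scontrol", "show", "hostnames", nodelist],
            capture_output=True, text=True, check=False,
        )
        hosts = result.stdout.split()

    return {
        "job_id": job_id,
        "nodelist": nodelist,
        "hosts": hosts,
        "cpus_on_node": env.get("SLURM_CPUS_ON_NODE"),
    }


info = detect_slurm()
print("No active SLURM allocation found" if info is None else info)
```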

Enabling Jupyter Widgets

If ipywidgets is not already enabled, run once in a terminal:

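# For the classic Jupyter Notebook (versions before 7):
jupyter nbextension enable --py widgetsnbextension
# For JupyterLab 1 or 2 (JupyterLab 3 and later pick up the extension from the pip-installed ipywidgets package):
jupyter labextension install @jupyter-widgets/jupyterlab-manager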

Interface Sections

BiJuTy provides an interactive interface with the following sections:

Configuration Panel

Control	Purpose
Framework	Select Spark or Flink
Custom FRAMEWORK_HOME	Optionally override the framework installation path
Template	Use the default config template or specify a custom one
Destination	Directory where the generated configuration will be written
Master Host	The node to use as the cluster master
Worker Hosts	Nodes to use as workers (checkboxes auto-populated from SLURM)
Driver / Worker / Executor CPU	CPU allocation sliders
Driver / Worker / Executor Memory	Memory allocation sliders (MB)
Randomize Master Port	Avoid port conflicts when many users share nodes
Load to Environment	Generate configuration and update environment variables
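
To illustrate the idea behind Randomize Master Port, one common way to obtain a currently unused port is to let the operating system choose it. This is a sketch under that assumption; how BiJuTy actually selects the port is not described here, and `pick_free_port` is a hypothetical name.

```python
import socket


def pick_free_port(host=""):
    """Ask the OS for a TCP port that is unused at the time of the call."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the kernel choose a free port
        return sock.getsockname()[1]


port = pick_free_port()
print("Candidate master port:", port)
```

The port is released before the function returns, so another process could claim it before the cluster master binds to it. On busy shared nodes a retry on bind failure would be a sensible addition.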

Resource Allocation Overview

A live visualization shows how CPU and memory resources are distributed across master, worker, and executor roles based on the SLURM allocation.
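
The arithmetic behind such a view can be sketched as follows. This is a simplified illustration, not BiJuTy's allocation logic: the hypothetical helper `executors_per_worker` ignores framework memory overhead and assumes that every executor receives the same CPU and memory share.

```python
def executors_per_worker(worker_cpus, worker_mem_mb, executor_cpus, executor_mem_mb):
    """Number of equally sized executors that fit into one worker."""
    if min(worker_cpus, worker_mem_mb, executor_cpus, executor_mem_mb) <= 0:
        raise ValueError("all resource values must be positive")
    fit_by_cpu = worker_cpus // executor_cpus
    fit_by_mem = worker_mem_mb // executor_mem_mb
    # The scarcer resource limits how many executors the worker can host.
    return min(fit_by_cpu, fit_by_mem)


# Illustrative numbers: a worker with 16 CPUs and 64000 MB,
# executors with 4 CPUs and 20000 MB each.
print(executors_per_worker(16, 64000, 4, 20000))  # 3 (memory is the limit)
```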

Cluster Controls

After loading the environment:

Start Cluster - starts the selected framework cluster
Stop Cluster - gracefully stops the cluster
Web UI Links - buttons to open framework web UIs (Spark Master, Worker, Application UI; Flink JobManager UI)

SSH Port Forwarding: When running on an HPC cluster, the GUI displays an SSH command that forwards the web UI ports to your local machine, so you can open the cluster and application web UIs in a local browser.
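
A command of this kind can be assembled as in the sketch below. The helper `ssh_forward_command`, the user name, and the host names are hypothetical. Ports 8080 and 4040 are the Spark defaults for the master and application web UIs; the ports BiJuTy displays may differ.

```python
import shlex


def ssh_forward_command(user, login_host, target_host, ports):
    """Build an ssh command that forwards each port to the same local port."""
    forwards = []
    for port in ports:
        # -L local_port:target_host:target_port
        forwards += ["-L", f"{port}:{target_host}:{port}"]
    # -N: do not run a remote command, only forward the ports.
    return shlex.join(["ssh", "-N", *forwards, f"{user}@{login_host}"])


print(ssh_forward_command("alice", "login.example.org", "node001", [8080, 4040]))
# ssh -N -L 8080:node001:8080 -L 4040:node001:4040 alice@login.example.org
```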

Performance Metrics

A real-time monitoring interface aggregates metrics at three levels:

Level	Description
Process Level	Per-process CPU utilization, memory consumption, and I/O statistics for running cluster components
Framework Level	Framework-specific metrics such as Spark executor metrics, task throughput, or Flink job statistics
External Metrics	Integration with external monitoring sources (e.g., Pika metrics server) for extended cluster-wide observability
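
As an illustration of what a process-level sample can look like, the sketch below reads the resident memory of a process from the Linux `/proc` filesystem. It is not BiJuTy's collector: `read_rss_kb` is a hypothetical helper, and it returns None on systems without `/proc` or when the process cannot be read.

```python
import os


def read_rss_kb(pid):
    """Return the resident set size of a process in kB, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    # Line format: "VmRSS:    123456 kB"
                    return int(line.split()[1])
    except OSError:
        return None  # no /proc, no such process, or no permission
    return None  # e.g. kernel threads expose no VmRSS field


print("RSS of this process (kB):", read_rss_kb(os.getpid()))
```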

Multi-Cluster Management

Add new cluster tabs with + and remove them with x to manage multiple independent framework clusters via a tabbed interface.

License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0-or-later) - see the LICENSE file for details.

Acknowledgements

This work was developed at ScaDS.AI (Center for Scalable Data Analytics and Artificial Intelligence).
