fabric-generic-cluster

A comprehensive, type-safe Python framework for managing FABRIC testbed slices with support for complex network topologies, DPU interfaces, multi-OS configurations, and various hardware components.

PyPI version Python 3.9+ Pydantic V2 License: MIT

🌟 Features

Core Capabilities

  • ✅ Type-Safe Data Models - Pydantic-based topology definitions with automatic validation
  • ✅ DPU Interface Support - Full support for DPU network interfaces alongside traditional NICs
  • ✅ Multi-OS Support - Automatic detection and configuration for Rocky Linux, Ubuntu, and Debian
  • ✅ Hardware Components - Full support for GPUs, FPGAs, DPUs, NVMe, and custom NICs
  • ✅ Network Management - L2/L3 network configuration with IPv4/IPv6 support
  • ✅ SSH Automation - Passwordless SSH setup across all nodes
  • ✅ Visualization - Multiple output formats (text, ASCII, graphs, tables)
  • ✅ Easy Installation - Available on PyPI via pip install
  • ✅ Modular Design - Separated concerns for better maintainability
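
The multi-OS support listed above depends on telling the supported distributions apart. The following is a minimal sketch of one common way to do that, by parsing the contents of `/etc/os-release`. It is not taken from this package: the function name `detect_distro` and the `SUPPORTED` mapping are hypothetical and only illustrate the idea.

```python
from typing import Optional

# Hypothetical mapping from os-release ID values to the distro names in the feature list.
SUPPORTED = {"rocky": "Rocky Linux", "ubuntu": "Ubuntu", "debian": "Debian"}


def detect_distro(os_release_text: str) -> Optional[str]:
    """Return the supported distro name for an os-release file body, or None."""
    for line in os_release_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "ID":
            # Values may be quoted, e.g. ID="rocky"
            return SUPPORTED.get(value.strip().strip('"\'').lower())
    return None


sample = 'NAME="Rocky Linux"\nVERSION_ID="9.3"\nID="rocky"\nID_LIKE="rhel centos fedora"\n'
print(detect_distro(sample))
```

On a real node the text would come from reading `/etc/os-release`; a `None` result would mean the distribution is outside the three listed above.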

Hardware Support

  • GPUs - NVIDIA RTX series, Tesla T4, A30, A40
  • FPGAs - Xilinx Alveo U280, U50, U250
  • DPUs - ConnectX-7 100G/400G Data Processing Units with network interfaces
  • NVMe - Intel P4510, P4610 NVMe storage
  • NICs - Basic, ConnectX-5, ConnectX-6, SharedNICs, SmartNICs
  • Persistent Storage - Volume management

🚀 Installation

From PyPI (Recommended)

pip install fabric-generic-cluster

From Source

git clone https://github.com/mcevik0/fabric-generic-cluster.git
cd fabric-generic-cluster
pip install -e .

Prerequisites

  • Python 3.9 or higher
  • Access to FABRIC testbed
  • fabrictestbed-extensions>=1.4.0 (installed automatically)

Verify Installation

import fabric_generic_cluster
print(fabric_generic_cluster.__version__)
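
If importing the package fails, it can help to check whether the distribution is installed at all. The sketch below uses the standard library's `importlib.metadata` and assumes the distribution name matches the `pip install` name shown above; `installed_version` is a hypothetical helper, not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("fabric-generic-cluster")
print(found if found else "fabric-generic-cluster is not installed")
```

This reads package metadata only, so it works even when the import itself would raise an error from a missing dependency.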

🎯 Quick Start

Option 1: Python Script

from fabric_generic_cluster import (
    load_topology_from_yaml_file,
    deploy_topology_to_fabric,
    configure_l3_networks,
    configure_node_interfaces,
    setup_passwordless_ssh,
)

# Load topology
topology = load_topology_from_yaml_file("topology.yaml")

# Deploy to FABRIC
slice = deploy_topology_to_fabric(topology, "my-cluster")

# Configure networks (if using L3 networks)
configure_l3_networks(slice, topology)

# Configure interfaces
configure_node_interfaces(slice, topology)

# Setup SSH
setup_passwordless_ssh(slice)

print("✅ Cluster deployed and configured!")

Option 2: Using the Example Script

# Clone the repository for examples
git clone https://github.com/mcevik0/fabric-generic-cluster.git
cd fabric-generic-cluster

# Run the complete deployment example
python examples/complete-deployment-example.py \
    --yaml path/to/topology.yaml \
    --slice-name my-test-slice

Option 3: Jupyter Notebooks

For interactive workflows, check out the fabric-generic-cluster-notebooks repository:

git clone https://github.com/mcevik0/fabric-generic-cluster-notebooks.git
cd fabric-generic-cluster-notebooks
jupyter notebook

📦 Package Structure

fabric-generic-cluster/
├── fabric_generic_cluster/          # Main package
│   ├── __init__.py                  # Package exports
│   ├── models.py                    # Pydantic models for topology
│   ├── deployment.py                # Slice deployment functions
│   ├── network_config.py            # Network configuration
│   ├── ssh_setup.py                 # SSH management
│   ├── topology_viewer.py           # Visualization tools
│   ├── builder_compat.py            # Backward compatibility
│   └── tools/                       # Command-line tools
│       ├── __init__.py
│       └── topology_summary.py      # Topology summary generator
│
├── examples/                        # Usage examples
│   └── complete-deployment-example.py
│
├── tests/                           # Test suite
│   ├── test-dpu-support.py
│   └── test-fpga-support.py
│
├── pyproject.toml                   # Package metadata
├── setup.py                         # Setup configuration
├── MANIFEST.in                      # Package data
├── LICENSE                          # MIT License
└── README.md                        # This file

📚 Usage Examples

Example 1: Load and Explore Topology

from fabric_generic_cluster import (
    load_topology_from_yaml_file,
    print_topology_summary,
    draw_topology_graph,
)

# Load topology
topology = load_topology_from_yaml_file("topology.yaml")

# Print summary
print_topology_summary(topology)

# Create visualization
draw_topology_graph(topology, show_ip=True, save_path="topology.png")

Example 2: Deploy Multi-Site Cluster

from fabric_generic_cluster import (
    load_topology_from_yaml_file,
    deploy_topology_to_fabric,
    configure_node_interfaces,
    verify_node_interfaces,
)

# Load topology with nodes at multiple sites
topology = load_topology_from_yaml_file("multi-site-topology.yaml")

# Deploy
slice = deploy_topology_to_fabric(topology, "multi-site-cluster")

# Configure all nodes
configure_node_interfaces(slice, topology)

# Verify configuration
verify_node_interfaces(slice, topology)

Example 3: Access Type-Safe Data

from fabric_generic_cluster import load_topology_from_yaml_file

topology = load_topology_from_yaml_file("topology.yaml")

# Get specific node
node = topology.get_node_by_hostname("node-1")

print(f"Node: {node.hostname}")
print(f"Site: {node.site}")
print(f"CPU: {node.capacity.cpu} cores")
print(f"RAM: {node.capacity.ram} GB")

# Check hardware components
if node.pci.dpu:
    print(f"DPUs: {len(node.pci.dpu)}")
    for dpu_name, dpu in node.pci.dpu.items():
        print(f"  - {dpu_name}: {dpu.model}")
        print(f"    Interfaces: {len(dpu.interfaces)}")

if node.pci.fpga:
    print(f"FPGAs: {len(node.pci.fpga)}")
    for fpga_name, fpga in node.pci.fpga.items():
        print(f"  - {fpga_name}: {fpga.model}")

# Get all interfaces (NIC + DPU)
all_interfaces = node.get_all_interfaces()
print(f"\nTotal interfaces: {len(all_interfaces)}")

for device_name, iface_name, iface in all_interfaces:
    device_type = "DPU" if device_name.startswith("dpu") else "NIC"
    print(f"{device_type} {device_name}.{iface_name}: {iface.binding}")
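
To make the shape of the `(device_name, iface_name, iface)` tuples concrete, here is a standard-library sketch of how NIC and DPU interfaces could be flattened into one list. It is an illustration only: the classes `Interface`, `Device`, and `SketchNode` are hypothetical dataclasses, not the package's Pydantic models, and the real `get_all_interfaces()` may be implemented differently.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Interface:
    binding: str  # name of the network this interface attaches to


@dataclass
class Device:
    interfaces: Dict[str, Interface] = field(default_factory=dict)


@dataclass
class SketchNode:
    nics: Dict[str, Device] = field(default_factory=dict)
    dpus: Dict[str, Device] = field(default_factory=dict)

    def get_all_interfaces(self) -> List[Tuple[str, str, Interface]]:
        """Flatten NIC and DPU interfaces into (device, iface, Interface) tuples."""
        result = []
        for devices in (self.nics, self.dpus):
            for device_name, device in devices.items():
                for iface_name, iface in device.interfaces.items():
                    result.append((device_name, iface_name, iface))
        return result


node = SketchNode(
    nics={"nic1": Device({"iface1": Interface("network1")})},
    dpus={"dpu1": Device({"iface1": Interface("network1")})},
)
for device_name, iface_name, iface in node.get_all_interfaces():
    print(f"{device_name}.{iface_name}: {iface.binding}")
```

Note that the example above tells DPUs from NICs by the `dpu` name prefix, so that classification only holds when DPU devices are named accordingly in the topology.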

Example 4: Test Network Connectivity

from fabric_generic_cluster import (
    get_slice,
    load_topology_from_yaml_file,
    ping_network_from_node,
    verify_ssh_access,
)

# Get existing slice
slice = get_slice("my-cluster")
topology = load_topology_from_yaml_file("topology.yaml")

# Test ping connectivity
ping_results = ping_network_from_node(
    slice, 
    topology, 
    source_hostname="node-1", 
    network_name="network1",
    count=3
)

if all(ping_results.values()):
    print("✅ All ping tests passed!")

# Test SSH access
ssh_results = verify_ssh_access(
    slice,
    topology,
    source_hostname="node-1",
    network_name="network1"
)

if all(ssh_results.values()):
    print("✅ All SSH connections successful!")
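
The `all(...)` checks above only report whether every test passed. When one fails, it is usually more useful to know which one. The sketch below assumes the result objects are mappings from some target identifier to a truthy or falsy value, which is consistent with the `.values()` calls above; the key format is not documented here, so the sample data and the helper name `failed_targets` are hypothetical.

```python
from typing import Dict, Hashable, List


def failed_targets(results: Dict[Hashable, bool]) -> List[Hashable]:
    """Return the keys whose check did not succeed, in insertion order."""
    return [target for target, ok in results.items() if not ok]


# Hypothetical result data; the real key format is not documented here.
example_results = {"10.0.1.1": True, "10.0.1.2": False, "10.0.1.10": True}
failures = failed_targets(example_results)
print("all passed" if not failures else f"failed: {failures}")
```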

Example 5: Using Module-Style Imports

For compatibility with existing code:

from fabric_generic_cluster import deployment as sd
from fabric_generic_cluster import network_config as snc
from fabric_generic_cluster import ssh_setup as ssh
from fabric_generic_cluster import load_topology_from_yaml_file

# Load topology
topology = load_topology_from_yaml_file("topology.yaml")

# Deploy
slice = sd.deploy_topology_to_fabric(topology, "my-slice")

# Configure
snc.configure_node_interfaces(slice, topology)
ssh.setup_passwordless_ssh(slice)

🔧 API Reference

Models and Loaders

from fabric_generic_cluster import (
    SiteTopology,              # Main topology model
    Node,                      # Node model
    Network,                   # Network model
    load_topology_from_yaml_file,   # Load from YAML file
    load_topology_from_dict,        # Load from dictionary
)

Deployment Functions

from fabric_generic_cluster import (
    deploy_topology_to_fabric,   # Deploy slice to FABRIC
    configure_l3_networks,        # Configure L3 networks
    get_slice,                    # Get existing slice
    delete_slice,                 # Delete slice
    check_slices,                 # List all slices
)

# Usage
slice = deploy_topology_to_fabric(topology, "slice-name")
configure_l3_networks(slice, topology)

Network Configuration

from fabric_generic_cluster import (
    configure_node_interfaces,    # Configure all interfaces
    verify_node_interfaces,       # Verify configuration
    ping_network_from_node,       # Test connectivity
    update_hosts_file_on_nodes,   # Update /etc/hosts
)

# Usage
configure_node_interfaces(slice, topology)
verify_node_interfaces(slice, topology)

SSH Setup

from fabric_generic_cluster import (
    setup_passwordless_ssh,       # Complete SSH setup
    verify_ssh_access,            # Verify SSH connectivity
)

# Usage
setup_passwordless_ssh(slice)
results = verify_ssh_access(slice, topology, "node-1", "network1")

Visualization

from fabric_generic_cluster import (
    print_topology_summary,       # Detailed summary
    print_compact_summary,        # Brief summary
    draw_topology_graph,          # Visual graph
)

# Usage
print_topology_summary(topology)
draw_topology_graph(topology, show_ip=True, save_path="topology.png")

🛠️ Command-Line Tools

Topology Summary Generator

The package includes a command-line tool for generating topology summaries:

# Generate summary for a YAML file
fabric-topology-summary input.yaml --output output.yaml

# Just print summary without modifying file
fabric-topology-summary input.yaml --dry-run

# Include ASCII diagram
fabric-topology-summary input.yaml --ascii --output output.yaml

This tool is automatically installed when you install the package.

💻 Development

Setting Up Development Environment

# Clone repository
git clone https://github.com/mcevik0/fabric-generic-cluster.git
cd fabric-generic-cluster

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in editable mode with dev dependencies
pip install -e ".[dev]"

Running Tests

# Run test suite
pytest tests/

# Run specific test
python tests/test-dpu-support.py
python tests/test-fpga-support.py

Building the Package

# Install build tools
pip install build twine

# Build distribution
python -m build

# Check package
twine check dist/*

# Test upload to TestPyPI
twine upload --repository testpypi dist/*

# Upload to PyPI
twine upload dist/*

Code Style

# Format code
black fabric_generic_cluster/

# Check style
flake8 fabric_generic_cluster/

📖 Documentation

Example Topologies

Example YAML topology files are available in the notebooks repository:

  • Basic 2-node cluster
  • Multi-site deployment
  • Storage cluster with NVMe
  • DPU/SmartNIC configurations
  • FPGA-enabled topologies
  • OpenStack deployment variants

YAML Topology Format

site_topology:
  nodes:
    node-1:
      hostname: node-1
      site: SITE1
      capacity:
        cpu: 8
        ram: 32
        disk: 100
        os: default_rocky_9
      nics:
        nic1:
          interfaces:
            iface1:
              binding: network1
              ipv4_address: 10.0.1.1
              ipv4_netmask: 255.255.255.0
      pci:
        dpu:
          dpu1:
            model: NIC_ConnectX_7_100
            interfaces:
              iface1:
                binding: network1
                ipv4_address: 10.0.1.10

  networks:
    network1:
      name: network1
      type: L2Bridge
      subnet: 10.0.1.0/24

🀝 Contributing

Contributions are welcome! Here's how you can help:

  1. Report bugs: Open an issue on GitHub
  2. Suggest features: Open an issue with your idea
  3. Submit PRs: Fork, make changes, and submit a pull request

Contribution Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Development Workflow

  1. Update code in fabric_generic_cluster/
  2. Add tests in tests/
  3. Update documentation
  4. Run tests: pytest tests/
  5. Build package: python -m build
  6. Test locally: pip install dist/*.whl

📊 Performance

  • Validation Speed: ~10ms for typical topology (3-10 nodes)
  • Deployment Time: Depends on FABRIC (typically 5-10 minutes)
  • Network Config: ~30 seconds per node
  • SSH Setup: ~1-2 minutes for 3-node cluster
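
As a rough worked example, the figures above can be added up for a 3-node cluster. The sketch assumes the phases run one after another and that network configuration is applied node by node; neither is stated above. Validation time (about 10 ms) is ignored, and the SSH figure is the one quoted for 3 nodes, so the estimate is only meaningful near that size.

```python
def estimate_seconds(node_count: int = 3) -> tuple:
    """Return a (low, high) estimate in seconds from the figures quoted above."""
    deploy = (5 * 60, 10 * 60)   # deployment: 5-10 minutes
    network = 30 * node_count    # network config: ~30 seconds per node
    ssh = (1 * 60, 2 * 60)       # SSH setup: 1-2 minutes (quoted for 3 nodes)
    return (deploy[0] + network + ssh[0], deploy[1] + network + ssh[1])


low, high = estimate_seconds(3)
print(f"{low / 60:.1f} to {high / 60:.1f} minutes")
```

Under these assumptions a 3-node cluster takes roughly 7.5 to 13.5 minutes end to end, with FABRIC deployment dominating the total.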

🗺️ Roadmap

  • [x] Type-safe Pydantic models
  • [x] DPU interface support
  • [x] Multi-distro support (Rocky/Ubuntu/Debian)
  • [x] L2/L3 network configuration
  • [x] Automated SSH setup
  • [x] PyPI package distribution
  • [ ] Web-based topology editor
  • [ ] Ansible playbook integration
  • [ ] Monitoring and metrics collection
  • [ ] REST API endpoint

🐛 Troubleshooting

Import Issues

Problem: ModuleNotFoundError: No module named 'fabric_generic_cluster'

Solution:

pip install fabric-generic-cluster

YAML File Not Found

Problem: FileNotFoundError when loading topology

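Solution: Use an absolute path, or make sure the YAML file is in the current working directory:

from pathlib import Path

from fabric_generic_cluster import load_topology_from_yaml_file

# Resolve to an absolute path so loading does not depend on the working directory
yaml_file = Path("path/to/topology.yaml").resolve()
topology = load_topology_from_yaml_file(str(yaml_file))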

DPU Interfaces Not Detected

Problem: DPU interfaces not showing up

Solution: Verify DPU configuration in YAML:

node = topology.get_node_by_hostname("node-1")
print(f"DPUs: {node.pci.dpu}")

# Check all interfaces
all_ifaces = node.get_all_interfaces()
print(f"Total interfaces: {len(all_ifaces)}")

Network Configuration Fails

Problem: Interface configuration errors

Solution:

  1. Configure L3 networks before configuring interfaces: configure_l3_networks(slice, topology)
  2. Ensure nodes are active: slice.wait()
  3. Verify OS detection: check the logs to confirm each node runs a supported distro (Rocky Linux, Ubuntu, or Debian)

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ for the FABRIC Community

Author: Mert Cevik (@mcevik0)
