Tobi-Adesoye/README.md

Hi, I'm Tobi Adesoye 👋

Systems Architect & AI Infrastructure Engineer

I build hardware-aware optimization layers, deterministic execution frameworks, and low-level tensor routing middleware for distributed deep learning loops.


⚡ Current Main Focus: renorm-native

I am the author of renorm-native, a completely decoupled, self-bootstrapping middleware layer that eliminates cumulative memory fragmentation and delayed CUDA/ROCm OOM cliffs in long-context transformer architectures.

  • Zero-Dependency Hook: Acts as a passive layout referee via an externalized profile matrix (gateway_profiles.json).
  • Register-Fused Optimization: Avoids off-chip HBM round-trips by keeping dynamic tensor structures in on-chip registers and SRAM through fused Triton kernels.
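To illustrate what an externalized profile file can look like in practice, here is a minimal sketch of a JSON-driven lookup. The schema, the device keys, the `pad_multiple` field, and the `load_profile` helper are all hypothetical; the actual layout of gateway_profiles.json is not shown in this README, so treat this only as an example of keeping tuning values outside the code.

```python
import json

# Hypothetical contents: the real gateway_profiles.json schema is not
# documented here, and these keys and values are placeholders.
EXAMPLE_PROFILES = """
{
  "default":     {"pad_multiple": 64},
  "nvidia-h100": {"pad_multiple": 128},
  "amd-mi300x":  {"pad_multiple": 256}
}
"""


def load_profile(raw_json: str, device_key: str) -> dict:
    """Return the profile for device_key, falling back to "default"."""
    profiles = json.loads(raw_json)
    return profiles.get(device_key, profiles["default"])


profile = load_profile(EXAMPLE_PROFILES, "nvidia-h100")
fallback = load_profile(EXAMPLE_PROFILES, "unknown-device")
```

Because the values live in data rather than code, supporting a new accelerator under this assumed design would mean adding an entry to the file instead of changing the hook itself.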

🔬 Core Technical Analysis & Research

If you are managing multi-node accelerator clusters or fighting unhandled allocation crashes mid-run, check out my definitive engineering guide on memory physics:

📑 [Read: Why Your PyTorch Models Crash at Step 200: The Physics of Cumulative Memory Fragmentation](https://medium.com/@adesoyetobe/why-your-pytorch-models-crash-at-step-200-the-physics-of-cumulative-memory-fragmentation-0b2fc37cd92c?postPublishedType=repub)

A micro-architectural deep dive into how uncoalesced memory accesses leave jagged micro-gaps across physical cache lines, why standard caching allocators fail once those gaps accumulate, and how block-level tensor padding keeps the VRAM floor stable.
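The padding idea can be sketched in a few lines. This is not the implementation from the article or from renorm-native; it assumes "block-level padding" means rounding a variable dimension (here a sequence length) up to a fixed multiple, and the block size of 128 and the helper name `pad_to_block` are illustrative choices.

```python
def pad_to_block(length: int, block: int = 128) -> int:
    """Round length up to the next multiple of block (assumed block size)."""
    if length <= 0 or block <= 0:
        raise ValueError("length and block must be positive")
    # Ceiling division without floats, then scale back up.
    return -(-length // block) * block


# Sequence lengths that vary slightly from step to step.
raw_lengths = [1000, 1013, 1021, 1100, 1153]
padded = [pad_to_block(n) for n in raw_lengths]

# Fewer distinct shapes means fewer distinct allocation sizes requested.
distinct_raw = len(set(raw_lengths))
distinct_padded = len(set(padded))
```

With these inputs, five distinct lengths collapse to three padded sizes. The intuition is that when requests repeat the same few sizes, a caching allocator is more likely to reuse a freed block exactly instead of splitting a larger one and leaving an unusable remainder.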


๐Ÿ› ๏ธ Expertise & Domain Focus

  • Hardware Architecture: NVIDIA HBM3e layout mapping, AMD ROCm/wave64 sector alignment.
  • Framework Runtimes: Deep PyTorch internals, custom Triton kernels, vLLM worker clustering, FSDP tensor sharding.
  • System Design: Property-driven declarative metadata engines, zero-maintenance middleware.

📫 Let's stabilize your compute: If your infrastructure team is scaling high-throughput serving architectures, enterprise SFT runs, or complex agent loops and hitting memory ceilings, open an issue on renorm-native or reach out for custom cluster profiling.

Popular repositories

  1. gordon-docs (Public)

    HTML

  2. renorm-native (Public)

    Custom CUDA & Triton fused layers for self-stabilizing transformer architectures. Accelerate forward/backward passes and prevent gradient explosions in large-scale LLM training.

    Python

  3. torchtune (Public)

    Forked from meta-pytorch/torchtune

    PyTorch native post-training library

    Python

  4. FastVideo (Public)

    Forked from hao-ai-lab/FastVideo

    A unified inference and post-training framework for accelerated video generation.

    Python

  5. demucs (Public)

    Forked from facebookresearch/demucs

    Code for the paper Hybrid Spectrogram and Waveform Source Separation

    Python

  6. aria (Public)

    Forked from aria-aiops/aria

    Python