💭
MLE at Apple
  • Apple

hoyathalis/README.md

Hi, I'm Hoyath Ali

I'm a Machine Learning Engineer working on scalable ML systems — training infrastructure, performance optimization, and on-device / edge inference across diverse hardware accelerators.


Interests

  • Compiler-level perf work for ML (MLIR, Triton, kernel authoring)
  • Distributed and large-scale training infrastructure
  • On-device and edge inference across heterogeneous accelerators

Education

  • M.S. Computer Science — University of California, Riverside

Writing

  • Medium — deep dives on ML systems, distributed training, and performance engineering.

Contact

Pinned

  1. distributed_playground Public

    Minimal PyTorch playground for benchmarking tensor parallelism. It compares row-wise and column-wise splits, with NCCL profiling and TensorBoard analysis.

    Python

  2. packet-normalization-cuda Public

    Optimized CUDA kernel for network packet normalization: 5× faster than PyTorch for real-time ML preprocessing in intrusion detection systems.

    Python

  3. MultiGPUMatMul Public

    Forked from hoyathali/MultiGPUMatMul

    This project leverages multiple Graphics Processing Units (GPUs) to accelerate matrix multiplication across distributed networks. By dividing large matrices into smaller sections (Bands) and distri…

    Cuda

  4. bedtime.ai Public

    This project creates a personalized bedtime story system that leverages AI advancements to enhance children's bedtime experiences. By using voice cloning technology, stories are narrated in a paren…

    Jupyter Notebook 2

  5. stablediff_anime Public

    Stable Diffusion model implemented from scratch and trained on 64x64 anime face dataset. Features complete training pipeline, custom noise scheduling, and inference scripts for generating anime-sty…

    Python

  6. GameOfLife3D Public

    Forked from hoyathali/GameOfLife3D

    This implementation provides an interactive visualization of Conway's Game of Life in 3D space using Python, Numba, and Mayavi, running on the GPU.

    Python
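
For readers unfamiliar with the row-wise vs column-wise splits mentioned under distributed_playground, here is a minimal single-process sketch of the idea. It is not taken from that repository: the function names (`matmul`, `column_parallel`, `row_parallel`) are hypothetical, plain Python lists stand in for tensors, and a loop over shards stands in for separate GPUs. It assumes the split dimension divides evenly by the shard count.

```python
def matmul(a, b):
    """Plain-Python product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_parallel(x, w, shards):
    """Split W by output columns; each shard produces a slice of the output."""
    n_out = len(w[0])
    if n_out % shards:
        raise ValueError("output dimension must divide evenly by shards")
    step = n_out // shards
    parts = []
    for s in range(shards):
        w_s = [row[s * step:(s + 1) * step] for row in w]  # this shard's columns of W
        parts.append(matmul(x, w_s))
    # Concatenating column slices stands in for an all-gather.
    return [sum((p[i] for p in parts), []) for i in range(len(x))]


def row_parallel(x, w, shards):
    """Split W by input rows (and X by matching columns); partial outputs are summed."""
    n_in = len(w)
    if n_in % shards:
        raise ValueError("input dimension must divide evenly by shards")
    step = n_in // shards
    out = [[0] * len(w[0]) for _ in x]
    for s in range(shards):
        x_s = [row[s * step:(s + 1) * step] for row in x]  # matching columns of X
        w_s = w[s * step:(s + 1) * step]                   # this shard's rows of W
        partial = matmul(x_s, w_s)
        # Element-wise accumulation stands in for an all-reduce (sum).
        out = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(out, partial)]
    return out


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8]]
    w = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 2, 0, 1], [1, 3, 1, 0]]
    full = matmul(x, w)
    print(column_parallel(x, w, 2) == full)
    print(row_parallel(x, w, 2) == full)
```

Both splits reproduce the unsplit product. The practical difference is the communication step: the column-wise split ends with a gather of output slices, while the row-wise split ends with a sum of full-size partial outputs.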