manjitpokhrel/README.md

Manjit Pokhrel

Adversarial ML Researcher · AI Security · GPU Systems

Kathmandu, Nepal
manjitpokhrel.com.np · Email · LinkedIn


Research Focus

I study multilingual alignment failures in instruction-tuned large language models and design training-free inference optimization systems for consumer GPUs.

My work operates at the intersection of:

  • Adversarial robustness
  • Tokenization asymmetries
  • Multilingual safety alignment
  • GPU kernel-level performance engineering

Selected Contributions

NASB — Nepali Adversarial Safety Benchmark

The first structured adversarial benchmark for Nepali LLMs.

  • 1,200+ multilingual and code-switched adversarial probes
  • 73.7% safety bypass rate in Nepali
  • 0% bypass rate in English across the same models
  • Evaluated across Qwen, Gemma, Llama, and Gemini

Revealed systemic tokenizer and alignment asymmetries in low-resource languages.
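As a rough illustration of how a per-language bypass rate like the figures above can be tallied, here is a minimal sketch. It assumes each probe result has already been labeled as bypassed or refused. The function name, the language codes, and the toy labels are hypothetical and are not NASB data or NASB's actual scoring code.

```python
from collections import defaultdict


def bypass_rate_by_language(results):
    """Return {language: fraction of probes where the safety refusal was bypassed}.

    `results` is an iterable of (language, was_bypassed) pairs.
    """
    totals = defaultdict(int)
    bypassed = defaultdict(int)
    for language, was_bypassed in results:
        totals[language] += 1
        if was_bypassed:
            bypassed[language] += 1
    return {lang: bypassed[lang] / totals[lang] for lang in totals}


# Made-up labels purely to exercise the function (NOT benchmark results).
toy_results = [
    ("ne", True), ("ne", True), ("ne", False), ("ne", True),
    ("en", False), ("en", False), ("en", False), ("en", False),
]
rates = bypass_rate_by_language(toy_results)
for lang, rate in sorted(rates.items()):
    print(f"{lang}: {rate:.1%}")
```

Reporting the rate per language on the same probe set, rather than one pooled number, is what makes an asymmetry between languages visible.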


Vajra Morphing

A sub-tokenization attack exploiting Devanagari–Latin code-switching.

Demonstrates how tokenizer fragmentation can bypass safety-aligned boundaries without semantic mutation.
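One source of script-level asymmetry can be seen without any tokenizer at all: byte-level tokenizers start from UTF-8 bytes, and Devanagari code points take three bytes each where ASCII Latin letters take one. The sketch below only measures that starting point. The helper name is hypothetical, and how many tokens a given model finally produces depends on its learned merges, which this does not model.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per non-whitespace code point."""
    chars = [c for c in text if not c.isspace()]
    return sum(len(c.encode("utf-8")) for c in chars) / len(chars)


latin = "namaste"
# The same greeting in Devanagari, written with escapes so the
# code points are unambiguous: na, ma, sa, virama, ta, vowel sign e.
devanagari = "\u0928\u092e\u0938\u094d\u0924\u0947"

print(utf8_bytes_per_char(latin))       # 1.0
print(utf8_bytes_per_char(devanagari))  # 3.0
```

A byte-level vocabulary trained mostly on Latin-script text has fewer merges available to compress those extra bytes, so the same word can end up split into more pieces.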


GhostWeight

A training-free activation sparsity framework for LLM inference.

  • Sparse row-packing CUDA kernels
  • No retraining
  • No fine-tuning
  • Up to 110.5% inference speedup on RTX 5060

Designed for consumer-grade hardware deployment.
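The general idea of training-free sparsity through static dead-neuron masking can be sketched on a toy two-layer ReLU MLP: find hidden units that never fire on calibration inputs, then pack the weight matrices so those rows and columns are simply gone. This is an illustrative pure-Python sketch under those assumptions, not GhostWeight's implementation. All function names are hypothetical, and the real project uses CUDA kernels rather than Python lists.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def hidden(W1, b1, x):
    return relu([z + b for z, b in zip(matvec(W1, x), b1)])


def forward(W1, b1, W2, x):
    return matvec(W2, hidden(W1, b1, x))


def live_neurons(W1, b1, calibration, threshold=0.0):
    """Indices of hidden units that exceed `threshold` on any calibration input."""
    live = set()
    for x in calibration:
        for i, h in enumerate(hidden(W1, b1, x)):
            if h > threshold:
                live.add(i)
    return sorted(live)


def pack(W1, b1, W2, keep):
    """Drop dead hidden units: rows of W1, entries of b1, columns of W2."""
    return (
        [W1[i] for i in keep],
        [b1[i] for i in keep],
        [[row[i] for i in keep] for row in W2],
    )


# Toy weights: hidden unit 1 has a large negative bias, so it never fires here.
W1 = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
b1 = [0.0, -10.0, 0.0]
W2 = [[1.0, 2.0, 3.0]]
calibration = [[1.0, 2.0], [0.5, 0.0], [2.0, 1.0]]

keep = live_neurons(W1, b1, calibration)
W1p, b1p, W2p = pack(W1, b1, W2, keep)
print(keep)  # [0, 2]
```

No weights are updated, so nothing is retrained. The packed model matches the original exactly on the calibration inputs. On inputs where a masked unit would have fired, the output can differ, so the mask is only as good as the calibration set.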


LLM Red Teaming & Alignment Analysis

Systematic evaluation across open-weight and proprietary instruction-tuned models, covering:

  • Prompt injection attacks
  • Jailbreak vulnerabilities
  • Multilingual safety failures
  • Alignment boundary inconsistencies


Publications & Disclosures

2026
Lost in Translation: Safety Alignment Failures in Nepali and Code-Switched Variants of Instruction-Tuned Large Language Models
Zenodo

2026
GhostWeight: Training-Free Activation Sparsity for LLM Inference on Consumer Hardware
PyPI / GitHub

2026
Google AI VRP — Multilingual Alignment Bypass Disclosure (Triaged)

2026
Meta Whitehat — Multilingual Safety Asymmetry Disclosure (Submitted)


Technical Stack

Adversarial ML & Security
Prompt injection · Jailbreak evaluation · PGD · Membership inference · Responsible disclosure

ML Frameworks
PyTorch · Transformers · PEFT · HuggingFace · NumPy · scikit-learn · llama.cpp

GPU & Systems
CUDA 12.6 · CUDA C · Sparse matrix ops · CuPy JIT · Nsight Compute · Roofline analysis

Languages
Python · C++ · CUDA C · SQL · JavaScript

Infrastructure
Linux · Git · Docker · HuggingFace Spaces


Education

Kathmandu University
BSc Computer Science — Expected 2029

Focus: Adversarial ML · AI Security · GPU Systems Optimization · Transformer Architectures

Pinned

  1. mini-GPT (Public)

    GPT built from scratch in pure NumPy. No PyTorch. No TensorFlow. 211K parameters. Trained on Shakespeare. Full transformer with multi-head attention, backpropagation, and Adam optimizer.

    Python

  2. nepali-finetune (Public)

    Fine-tuning Qwen2.5-1.5B on Nepali text using an RTX 5060 (Blackwell, sm_120)

    Python

  3. GhostWeight (Public)

    Training-free activation sparsity for LLMs. 74% hardware speedup on RTX 5060 (Blackwell) with 5.91% perplexity cost. Zero retraining. Static dead neuron masking + GhostGate threshold activation sur…

    Python 4

  4. bridgedoc (Public)

    Trilingual document translation tool for PDF, DOCX, CSV and TSV files across English, Nepali and Tamang — Google TMT Hackathon 2026

    Python