nvjullin

vllm Public

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python

flashinfer Public

Forked from flashinfer-ai/flashinfer

FlashInfer: Kernel Library for LLM Serving

C++

sglang Public

Forked from sgl-project/sglang

SGLang is a fast serving framework for large language models and vision language models.

Python

bench_serving Public

Forked from kedarpotdar-nv/bench_serving

Python

srt-slurm Public

Forked from NVIDIA/srt-slurm

NVIDIA Inference Benchmarks provide recipes in ready-to-use templates for evaluating platform speed. Validate your platform across specific AI use cases across hardware and software combinations.

Python

sgl-DeepGEMM Public

Forked from sgl-project/DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda