I'm an AI Engineering student at Inha University with a strong interest in Computer Vision and Multimodal AI. I've spent the last year digging into Vision-Language Models (VLMs) as an undergraduate research assistant — specifically on the problem of object hallucination, where models confidently describe things that aren't in the image.
Beyond research, I like building things that actually run. Most of my recent projects sit at the intersection of LLMs, RAG pipelines, and practical deployment.
- Finishing up RoboGuard, an RLAIF-based RAG agent for industrial robot manuals that uses a self-correction loop (LangGraph + Gemini) to catch and fix hallucinated responses before they reach the user
- Exploring how inference-time defense strategies (like B-VCD) can make VLMs more reliable without any retraining
- kubeflow/mcp-server #44: added Hugging Face model-ID suggestions to the MCP server's pre_flight tool, with unit tests (under review)
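Below is a minimal sketch of how model-ID suggestions for a mistyped ID might work. It assumes plain fuzzy string matching with Python's standard difflib module. The candidate list and the suggest_model_ids helper are hypothetical. This is not the code from the PR, which may source and rank IDs differently.

```python
from difflib import get_close_matches
from typing import List, Sequence

# Hypothetical candidate list; a real tool would load known model IDs from elsewhere.
KNOWN_MODEL_IDS = (
    "bert-base-uncased",
    "distilbert-base-uncased",
    "gpt2",
    "t5-small",
)


def suggest_model_ids(requested: str, known: Sequence[str] = KNOWN_MODEL_IDS,
                      limit: int = 3, cutoff: float = 0.6) -> List[str]:
    """Return up to `limit` known IDs most similar to `requested`, best match first.

    An exact match returns just that ID; an empty list means nothing was close enough.
    """
    if requested in known:
        return [requested]
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    return get_close_matches(requested, list(known), n=limit, cutoff=cutoff)
```

For example, suggest_model_ids("bert-base-uncasd") ranks "bert-base-uncased" first. The cutoff keeps unrelated IDs such as "gpt2" out of the suggestions.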
ML / DL
PyTorch Hugging Face Transformers scikit-learn TensorFlow / Keras
LLM & RAG
LangChain LangGraph ChromaDB Gemini API LangSmith
Vision
OpenCV PIL
General
Python Git Jupyter Google Colab macOS (Apple Silicon)
RoboGuard-RLAIF
An enterprise RAG agent for UR10e robot technical support. The core idea: instead of just retrieving and generating, the system runs a judge model on every response and loops back to revise it if the judge detects hallucinated content. It integrates InstructGPT-style reward modeling, Reflexion-style episodic memory, and Self-RAG critique tokens, all wired together in a cyclic LangGraph pipeline with a Streamlit UI and LangSmith tracing.
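The generate, judge and revise cycle can be sketched in plain Python without LangGraph. This minimal sketch rests on stated assumptions. The generate and judge callables stand in for the LLM and judge-model calls. The Verdict and LoopResult types and the toy stubs are hypothetical. Episodic memory is reduced to a list of judge critiques that are fed back into the next attempt.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Verdict:
    grounded: bool
    feedback: str = ""


@dataclass
class LoopResult:
    answer: str
    revisions: int
    grounded: bool
    memory: List[str] = field(default_factory=list)


Generator = Callable[[str, str, List[str]], str]
Judge = Callable[[str, str], Verdict]


def self_correct(question: str, context: str, generate: Generator, judge: Judge,
                 max_revisions: int = 2) -> LoopResult:
    """Generate an answer, judge it against the retrieved context, revise until grounded."""
    memory: List[str] = []  # critiques from earlier attempts (simplified episodic memory)
    answer = generate(question, context, memory)
    for revisions in range(max_revisions + 1):
        verdict = judge(answer, context)
        if verdict.grounded:
            return LoopResult(answer, revisions, True, memory)
        memory.append(verdict.feedback)
        if revisions < max_revisions:
            answer = generate(question, context, memory)  # retry with critiques in view
    # Revision budget exhausted: return the last answer, flagged as ungrounded.
    return LoopResult(answer, max_revisions, False, memory)


# --- Hypothetical toy stubs standing in for real LLM calls ---
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def toy_judge(answer: str, context: str) -> Verdict:
    """Flag any number in the answer that does not appear in the context."""
    supported = set(NUMBER.findall(context))
    unsupported = [n for n in NUMBER.findall(answer) if n not in supported]
    return Verdict(not unsupported, f"unsupported values: {unsupported}")


def toy_generate(question: str, context: str, memory: List[str]) -> str:
    """First attempt hallucinates; once a critique exists, copy the value from the context."""
    if not memory:
        return "The payload is 20 kg."
    return f"The payload is {NUMBER.findall(context)[0]} kg."
```

Here is an example with a toy context string. Calling self_correct("What is the payload?", "Rated payload: 12.5 kg.", toy_generate, toy_judge) rejects the first draft and accepts the revision after one round. The max_revisions cap prevents a judge that never accepts from looping forever. The caller can then refuse or escalate when the result is ungrounded.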
B-VCD: Mitigating Object Hallucination in VLMs
A training-free, inference-time hallucination defense for VLMs. It perturbs the visual input with a physically modeled degradation (motion blur + illumination attenuation + Poisson–Gaussian noise). It then selects the most grounded answer on VizWiz-VQA using a degraded-image-grounded LLM-as-a-Judge (Gemini 2.5 Flash).
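Below is a minimal NumPy sketch of this kind of physically motivated degradation. It makes four assumptions. Images are float arrays in [0, 1]. Linear motion blur uses a horizontal box kernel. Illumination attenuation is a single global gain. Noise follows the common Poisson–Gaussian model: scaled Poisson shot noise plus Gaussian read noise. The function names and default parameters are hypothetical and are not the project's actual settings.

```python
import numpy as np


def motion_blur(img: np.ndarray, length: int) -> np.ndarray:
    """Horizontal linear motion blur: average of `length` horizontally shifted copies.

    Works for H x W or H x W x C arrays; borders are handled by replicating edge pixels.
    """
    if length <= 1:
        return img.astype(np.float64)
    left, right = (length - 1) // 2, length // 2
    pad = [(0, 0)] * img.ndim
    pad[1] = (left, right)  # pad only the width axis
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    w = img.shape[1]
    return np.mean([padded[:, i:i + w] for i in range(length)], axis=0)


def degrade(img: np.ndarray, blur_len: int = 9, gain: float = 0.5,
            photons: float = 30.0, read_sigma: float = 0.02, seed=None) -> np.ndarray:
    """Motion blur -> illumination attenuation -> Poisson-Gaussian noise, clipped to [0, 1]."""
    rng = np.random.default_rng(seed)
    x = motion_blur(np.clip(img, 0.0, 1.0), blur_len)
    x = gain * x  # illumination attenuation: scale down overall brightness
    # Shot noise: Poisson counts at `photons` counts per unit intensity, rescaled back.
    # Fewer photons means relatively stronger shot noise.
    y = rng.poisson(photons * x) / photons
    # Read noise: additive zero-mean Gaussian, independent of the signal.
    y = y + rng.normal(0.0, read_sigma, size=x.shape)
    return np.clip(y, 0.0, 1.0)
```

Because Poisson noise has mean equal to its rate, the degraded image keeps the attenuated brightness on average (gain * intensity). Only the per-pixel variability grows as photons shrinks.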
Emotion-Aware Multimodal Chatbot
Combines BiLSTM + Attention for text emotion recognition with EfficientNet for image emotion recognition, then fuses both modalities to generate context-aware responses.
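The description does not say how the two modalities are combined. One common option is late fusion: convert each model's logits into probabilities, then take a weighted average. The sketch below assumes that scheme. The label set, the text_weight value and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label set shared by both classifiers (scores must follow this order).
EMOTIONS = ("joy", "sadness", "anger", "neutral")


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fuse(text_logits: Sequence[float], image_logits: Sequence[float],
              text_weight: float = 0.6) -> Dict[str, float]:
    """Weighted average of per-modality class probabilities."""
    if not (len(text_logits) == len(image_logits) == len(EMOTIONS)):
        raise ValueError("both models must score every label in EMOTIONS")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be in [0, 1]")
    p_text, p_image = softmax(text_logits), softmax(image_logits)
    return {label: text_weight * pt + (1.0 - text_weight) * pi
            for label, pt, pi in zip(EMOTIONS, p_text, p_image)}


def fused_emotion(text_logits: Sequence[float], image_logits: Sequence[float],
                  text_weight: float = 0.6) -> str:
    """Return the label with the highest fused probability."""
    fused = late_fuse(text_logits, image_logits, text_weight)
    return max(fused, key=fused.get)
```

The fused label, or the full distribution, can then condition the response generator. A learned fusion layer over concatenated features is a common alternative when paired text-image training data is available.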
Pneumonia Classification
A CNN-based chest X-ray classifier built with TensorFlow/Keras.
Forest Cover Classification
Multi-class tabular classification using tree-based and ensemble methods.
IBM SkillsBuild
DeepLearning.ai
Naver Boostcourse
Codecademy