TaeYang Hong taeyang0505

Hi, I'm Taeyang

I'm an AI Engineering student at Inha University with a strong interest in Computer Vision and Multimodal AI. I've spent the last year digging into Vision-Language Models (VLMs) as an undergraduate research assistant — specifically on the problem of object hallucination, where models confidently describe things that aren't in the image.

Beyond research, I like building things that actually run. Most of my recent projects sit at the intersection of LLMs, RAG pipelines, and practical deployment.

What I'm working on

Finishing up RoboGuard, an RLAIF-based RAG agent for industrial robot manuals that uses a self-correction loop (LangGraph + Gemini) to catch and fix hallucinated responses before they reach the user
Exploring how inference-time defense strategies (like B-VCD) can make VLMs more reliable without any retraining

Open-source contributions

kubeflow/mcp-server #44 — added HuggingFace model-ID suggestions to the MCP server's pre_flight tool, with unit tests (under review)

Tech I use regularly

ML / DL: PyTorch, Hugging Face Transformers, scikit-learn, TensorFlow / Keras

LLM & RAG: LangChain, LangGraph, ChromaDB, Gemini API, LangSmith

Vision: OpenCV, PIL

General: Python, Git, Jupyter, Google Colab, macOS (Apple Silicon)

Projects

RoboGuard-RLAIF
An enterprise RAG agent for UR10e robot technical support. The core idea: instead of just retrieving and generating, the system runs a judge model on every response and loops back to revise if it detects hallucinated content. Integrates InstructGPT-style reward modeling, Reflexion-style episodic memory, and Self-RAG critique tokens, all wired together in a cyclic LangGraph pipeline with a Streamlit UI and LangSmith tracing.
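The judge-and-revise loop described above can be sketched in plain Python. This is a minimal illustration, not RoboGuard's actual code: `self_correct`, `Verdict`, `LoopResult`, and the toy `generate`/`judge` stand-ins are hypothetical names, and the real project wires the same idea through LangGraph and Gemini rather than a bare `for` loop.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Verdict:
    grounded: bool      # True if the judge found no unsupported claims
    feedback: str = ""  # critique handed to the next revision


@dataclass
class LoopResult:
    answer: str
    attempts: int
    accepted: bool
    history: List[str] = field(default_factory=list)


def self_correct(
    question: str,
    context: List[str],
    generate: Callable[[str, List[str], str], str],
    judge: Callable[[str, List[str]], Verdict],
    max_attempts: int = 3,
) -> LoopResult:
    """Generate, judge, and revise until the judge accepts or attempts run out."""
    feedback = ""
    history: List[str] = []
    answer = ""
    for attempt in range(1, max_attempts + 1):
        answer = generate(question, context, feedback)
        history.append(answer)
        verdict = judge(answer, context)
        if verdict.grounded:
            return LoopResult(answer, attempt, True, history)
        feedback = verdict.feedback  # loop back with the critique
    # Attempt budget exhausted: return the last draft, flagged as not accepted.
    return LoopResult(answer, max_attempts, False, history)


# Toy stand-ins (assumptions for illustration; real ones would call an LLM).
def toy_generate(question: str, context: List[str], feedback: str) -> str:
    # The first draft makes an unsupported claim; a revision quotes the context.
    return context[0] if feedback else "A confident claim that appears nowhere in the manual."


def toy_judge(answer: str, context: List[str]) -> Verdict:
    ok = any(answer in passage for passage in context)
    return Verdict(ok, "" if ok else "Answer is not supported by the retrieved passages.")


passages = ["Placeholder manual passage about payload limits."]
result = self_correct("What is the payload limit?", passages, toy_generate, toy_judge)
print(result.accepted, result.attempts)
```

Capping the loop with `max_attempts` and returning an explicit `accepted` flag is one design choice among several; it lets the caller decide what to show the user when no draft passes the judge.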

B-VCD: Mitigating Object Hallucination in VLMs
A training-free, inference-time hallucination defense for VLMs. It perturbs the visual input with a physically modeled degradation (motion blur + illumination attenuation + Poisson–Gaussian noise), then selects the most grounded answer with a degraded-image-grounded LLM-as-a-Judge (Gemini 2.5 Flash). Evaluated on VizWiz-VQA.
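The three-part degradation named above can be sketched with NumPy. This is only a rough illustration under stated assumptions, not the B-VCD implementation: the function name `degrade`, the horizontal box kernel, the order of the steps, and every default parameter value are placeholders chosen for the example.

```python
import numpy as np


def degrade(image, blur_len=9, gain=0.5, photons=50.0, read_sigma=0.02, seed=0):
    """Degrade a float image in [0, 1] with shape (H, W) or (H, W, C).

    All parameter values are illustrative assumptions.
    """
    if blur_len % 2 == 0:
        raise ValueError("blur_len must be odd so the output keeps the input width")
    rng = np.random.default_rng(seed)
    img = np.asarray(image, dtype=np.float64)

    # 1) Motion blur: average blur_len neighbouring pixels along the width axis.
    #    Edge padding avoids darkening the left and right borders.
    pad = blur_len // 2
    kernel = np.ones(blur_len) / blur_len
    blurred = np.apply_along_axis(
        lambda row: np.convolve(np.pad(row, pad, mode="edge"), kernel, mode="valid"),
        1,
        img,
    )

    # 2) Illumination attenuation: scale the whole image by a gain below 1.
    dimmed = gain * blurred

    # 3) Poisson-Gaussian noise: signal-dependent shot noise plus additive read noise.
    shot = rng.poisson(dimmed * photons) / photons
    noisy = shot + rng.normal(0.0, read_sigma, size=img.shape)

    return np.clip(noisy, 0.0, 1.0)


image = np.full((8, 8), 0.8)  # flat grey test image
degraded = degrade(image)
print(degraded.shape, float(degraded.mean()))
```

A lower `photons` value makes the shot noise stronger relative to the signal, which is the usual way this noise model mimics low-light capture.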

Emotion-Aware Multimodal Chatbot
Combines a BiLSTM with attention for text emotion recognition and EfficientNet for image emotion recognition, then fuses both modalities to generate context-aware responses.

Pneumonia Classification
CNN-based chest X-ray classifier built with TensorFlow/Keras.

Forest Cover Classification
Multi-class tabular classification using tree-based and ensemble methods.

Certifications

IBM SkillsBuild

DeepLearning.ai

Naver Boostcourse

Codecademy
