I'm a data enthusiast with a passion for building end-to-end production pipelines and developing impactful machine learning solutions. Currently, I’m working as a Research Assistant with the Virtuosos team, where I focus on leveraging AI to solve complex challenges and research cutting-edge applications. Previously, I held roles at Amazon, Quincy Credit Union, and Virtusa (Johnson & Johnson), where I gained extensive experience in building scalable data solutions, implementing machine learning models, and optimizing data processing pipelines. With nearly 3 years of experience in the data space, I have a proven track record of delivering successful products and projects across the Pharma, Finance, and E-commerce industries.
My technical expertise spans Python, PyTorch, TensorFlow, Node.js, Java, and Terraform, alongside extensive experience with cloud platforms such as AWS, Azure, and GCP. I am particularly interested in developing scalable data pipelines, implementing robust MLOps strategies, and fine-tuning large language models (LLMs) for domain-specific applications.
Driven by a passion for innovation and problem-solving, I thrive in collaborative environments where I can design, optimize, and deploy machine learning solutions to production. Feel free to explore my projects below, and reach out if you’d like to discuss opportunities for collaboration, explore new ideas, or connect for potential roles!
- Feel free to check my LinkedIn
- Check out my work on Hugging Face 🤗
📧 Email: ganugula.h@northeastern.edu
- TagMyComplaint: Our project aims to streamline complaint resolution by automatically categorizing consumer complaints using a hybrid approach that combines a fine-tuned DistilBERT for product and issue classification with traditional machine learning models for sub-products and sub-issues. This reduces manual errors and improves routing efficiency, ensuring faster complaint handling.
- GeneSQL: GeneSQL is a project focused on leveraging text-to-SQL models to enable natural language querying of structured databases. By integrating advanced NLP techniques, the system translates user queries into SQL commands, simplifying database interactions and making data retrieval more intuitive for non-technical users.
- RAG Enhancements: We are currently focused on optimizing the existing RAG framework by integrating advanced techniques inspired by Anthropic research and leveraging state-of-the-art re-ranking methodologies to enhance the relevance and precision of the system’s responses.
- FiNER: FiNER is a financial entity recognition system designed to identify 139 distinct entity tags within company filings submitted to the SEC during quarterly reports. The solution leverages advanced NLP techniques, including RNNs, LSTMs, and a custom transformer architecture, to accurately classify and extract financial entities, thereby streamlining the analysis of complex regulatory documents.
- LLMs on domain specifics: In this project, I implemented Low-Rank Adaptation (LoRA) by leveraging various adapters to fine-tune large language models (LLMs) for answering domain-specific queries. This approach enabled efficient parameter updates, reducing training overhead while preserving model performance. The fine-tuned model demonstrated improved accuracy in generating targeted responses, making it a robust solution for handling specialized domain questions.
- Activity Identification: The real-time activity recognition project utilizes Amazon Go data to accurately identify and classify customer actions within the store environment. By leveraging state-of-the-art computer vision models, including YOLO for object detection and Vision Transformers for action recognition, the system effectively captures and interprets customer behaviors, enabling enhanced insights into shopping patterns and improving automated checkout experiences.
- Real-time events processing: Engineered a real-time data pipeline using Azure Event Hubs to capture 5 GB+ of daily logs, improving data processing efficiency and data retrieval speeds by leveraging Azure Storage and Azure Analytics for enhanced scalability.
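
For readers curious about the "efficient parameter updates" mentioned in the LoRA project above, here is a rough, dependency-free sketch of the general idea: the pretrained weight `W` stays frozen and only two small matrices `B` and `A` are trained, so the effective weight is `W + (alpha / r) * B @ A`. This is not the project's code; the helper names, matrix sizes, rank, and scaling value are all assumptions chosen for illustration.

```python
# Illustrative sketch of the LoRA idea only; every name and size here is hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, alpha, r):
    """Return W + (alpha / r) * B @ A, the effective weight after adaptation."""
    delta = matmul(b, a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, r):
    """Trainable parameters for one layer: B is d_out x r and A is r x d_in."""
    return r * (d_out + d_in)

# Frozen 2x3 weight with a rank-1 adapter.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]   # r x d_in
B = [[0.0], [0.0]]      # d_out x r, zero-initialised so training starts from W

# With B at zero, the adapted layer behaves exactly like the frozen one.
assert lora_weight(W, A, B, alpha=2, r=1) == W

# Compare trainable parameter counts for a hypothetical 4096 x 4096 layer.
full = 4096 * 4096
adapter = trainable_params(4096, 4096, r=8)
print(f"full: {full:,}  LoRA (r=8): {adapter:,}  ratio: {adapter / full:.4%}")
```

In practice a library such as Hugging Face PEFT handles this wiring; the sketch only shows why the number of trained parameters drops so sharply when the rank `r` is small.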


