Harshan1823/README.md

🚀 About Me

I'm a data enthusiast with a passion for building end-to-end production pipelines and developing impactful machine learning solutions. Currently, I'm working as a Research Assistant with the Virtuosos team, where I focus on applying AI to complex challenges and researching cutting-edge applications. Previously, I held roles at Amazon, Quincy Credit Union, and Virtusa (Johnson & Johnson), where I built scalable data solutions, implemented machine learning models, and optimized data processing pipelines. With nearly 3 years of experience in the data space, I have a track record of delivering products and projects across the Pharma, Finance, and E-commerce industries.

My technical expertise spans Python, PyTorch, TensorFlow, Node.js, Java, and Terraform, alongside extensive experience with cloud platforms such as AWS, Azure, and GCP. I am particularly interested in developing scalable data pipelines, implementing robust MLOps strategies, and fine-tuning large language models (LLMs) for domain-specific applications.

Driven by a passion for innovation and problem-solving, I thrive in collaborative environments where I can design, optimize, and deploy machine learning solutions to production. Feel free to explore my projects below, and reach out if you'd like to collaborate, explore new ideas, or discuss potential roles!

📧 Email: ganugula.h@northeastern.edu

Currently Working On

  1. TagMyComplaint: This project streamlines complaint resolution by automatically categorizing consumer complaints with a hybrid approach: a fine-tuned DistilBERT model classifies the product and issue, while traditional machine learning models classify the sub-product and sub-issue. This reduces manual errors and improves routing efficiency, so complaints are handled faster.

  2. GeneSQL: GeneSQL uses text-to-SQL models to enable natural language querying of structured databases. The system translates user questions into SQL commands, simplifying database interactions and making data retrieval more intuitive for non-technical users.

  3. RAG Enhancements: We are optimizing an existing RAG framework by integrating techniques inspired by Anthropic research and state-of-the-art re-ranking methods to improve the relevance and precision of the system's responses.
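To illustrate the re-ranking idea mentioned in the RAG item above, here is a minimal two-stage sketch in plain Python. It is not the project's implementation: the function names (`first_stage_retrieve`, `rerank`) are hypothetical, and both scorers are simple token-overlap stand-ins for what would normally be an embedding search followed by a learned re-ranker.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real tokenizer)."""
    return text.lower().split()


def first_stage_retrieve(query, docs, k):
    """Cheap, recall-oriented stage: rank documents by shared-token count."""
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(d))), i) for i, d in enumerate(docs)]
    # Highest overlap first; ties broken by original document order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for score, i in scored[:k] if score > 0]


def rerank(query, docs, candidate_ids, top_n):
    """Precision-oriented stage: rescore only the candidates.

    Assumption: Jaccard similarity stands in for a learned re-ranker.
    """
    q = set(tokenize(query))

    def score(i):
        d = set(tokenize(docs[i]))
        return len(q & d) / len(q | d)

    return sorted(candidate_ids, key=lambda i: (-score(i), i))[:top_n]


docs = [
    "how to fit a random forest classifier",
    "random forest",
    "installing the package with pip",
    "fit a classifier on random data then plot a forest of trees and more words",
]
query = "random forest classifier"
candidates = first_stage_retrieve(query, docs, k=3)
best = rerank(query, docs, candidates, top_n=2)
```

The first stage keeps any document that shares tokens with the query, and the second stage reorders that short list with a stricter score, so long documents that match only incidentally drop down the ranking.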

Key Projects

  • FiNER - FiNER is a financial entity recognition system that identifies 139 distinct entity tags in the quarterly filings companies submit to the SEC. The solution uses NLP techniques, including RNNs, LSTMs, and a custom transformer architecture, to classify and extract financial entities, streamlining the analysis of complex regulatory documents.

  • LLMs on domain specifics - In this project, I used Low-Rank Adaptation (LoRA) adapters to fine-tune large language models (LLMs) for answering domain-specific queries. This approach updates only a small number of parameters, reducing training overhead while preserving model performance. The fine-tuned model gave more accurate, targeted responses to specialized domain questions.

  • Activity Identification - This real-time activity recognition project uses Amazon Go data to identify and classify customer actions within the store environment. It combines computer vision models, including YOLO for object detection and Vision Transformers for action recognition, to capture and interpret customer behavior, providing insight into shopping patterns and improving automated checkout experiences.

  • Real-time events processing - Engineered a real-time data pipeline using Azure EventHub to capture 5 GB+ of daily logs, improving data processing efficiency and data retrieval speeds by leveraging Azure Storage and Azure Analytics for enhanced scalability.
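As a rough illustration of why the LoRA approach in the project list above reduces training overhead, the sketch below computes the standard LoRA update W' = W + (alpha / r) · B · A and compares trainable parameter counts. The helper names and the example dimensions are assumptions chosen for illustration, not values taken from the project.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    full = d_out * d_in             # every entry of W is trainable
    lora = rank * (d_out + d_in)    # only B (d_out x r) and A (r x d_in)
    return full, lora


def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def lora_effective_weight(w, b, a, alpha):
    """Merge a frozen weight W with its low-rank update (alpha / r) * B @ A."""
    rank = len(a)                   # A has r rows
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Hypothetical sizes: one 768 x 768 projection adapted with rank 8.
full, lora = lora_param_counts(768, 768, 8)

# Tiny worked example: a 2 x 2 identity weight with a rank-1 update.
merged = lora_effective_weight(
    w=[[1, 0], [0, 1]], b=[[1], [2]], a=[[3, 4]], alpha=2
)
```

With these example sizes, the adapter trains 12,288 parameters instead of 589,824 for that single matrix, which is where the reduced overhead comes from.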

Languages and Tools:

Arduino, AWS, Azure, Bash, C, Cassandra, C++, Django, Docker, Elasticsearch, Express, Flask, GCP, Git, Hadoop, Hive, Java, JavaScript, Jenkins, Kafka, Kubernetes, Linux, MariaDB, MATLAB, MongoDB, MS SQL, MySQL, Node.js, OpenCV, Oracle, pandas, PostgreSQL, Postman, Python, PyTorch, Redis, scikit-learn, Seaborn, Selenium, Spring, SQLite, TensorFlow

Pinned

  1. complaintTag (Public) - Jupyter Notebook, 1 star, 1 fork

  2. FinancialNumericEntityRecognition (Public) - Python

  3. RealTime_Activity_Recognition (Public) - Jupyter Notebook

  4. LLM-with-LoRA (Public) - Jupyter Notebook

  5. Advanced-Text-Generation-with-Transformer-Architectures (Public) - Jupyter Notebook

  6. mahesh973/Docs-RAG (Public) - Python. RAG System on Scikit-Learn Docs