Labmate MVP

A student-facing research matchmaking platform that connects students with faculty research opportunities, starting with six New Jersey institutions. This MVP includes a FastAPI backend for resume parsing, web scraping, matching, and cold email generation, plus a Next.js frontend with authentication.

Project Structure

labmate/
├── backend/              # FastAPI backend
│   ├── app/
│   │   ├── main.py      # FastAPI app with /match, /generate_email, and /health endpoints
│   │   └── services/
│   │       ├── resume_parser.py    # PDF parsing with PyMuPDF + spaCy
│   │       ├── scraper.py          # Web scraping with BeautifulSoup/Selenium
│   │       ├── matching.py         # BERTScore-based semantic matching
│   │       └── email_generator.py  # OpenAI cold email generation
│   └── requirements.txt
├── web/                  # Next.js frontend
│   ├── app/
│   │   ├── page.tsx              # Main matching interface
│   │   ├── auth/signin/          # Sign-in page
│   │   └── api/auth/[...nextauth]/ # Auth.js API routes
│   ├── lib/
│   │   ├── auth.ts      # NextAuth configuration
│   │   └── prisma.ts    # Prisma client
│   └── prisma/
│       └── schema.prisma # Prisma schema (you'll configure this)
└── README.md

Features

Resume Parsing: Extracts skills, interests, and experiences from PDF resumes using PyMuPDF and spaCy
Web Scraping: Scrapes faculty pages from 6 NJ institutions (Rutgers, NJIT, Princeton, Stevens, TCNJ, Seton Hall) using BeautifulSoup and Selenium
Semantic Matching: Uses BERTScore to compute similarity between resume and professor profiles, returns top 3 matches
Cold Email Generation: Generates personalized outreach emails using OpenAI's GPT-4o-mini
Authentication: NextAuth.js with Prisma adapter, supports GitHub OAuth and email/password
Session Management: Tracks contacted professors so they can be kept out of future suggestions (the exclusion logic itself is listed under Next Steps)
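The resume parser uses PyMuPDF to pull text out of the PDF and spaCy to extract skills, interests, and experiences. The sketch below shows the general shape of that flow under a simplifying assumption: skills are found by matching against a hand-picked vocabulary instead of spaCy's models. `SKILL_VOCAB`, `pdf_to_text`, and `extract_skills` are hypothetical names, not the contents of `resume_parser.py`.

```python
import re
from typing import List, Set

# Hypothetical skill vocabulary; the real parser relies on spaCy, not a fixed list.
SKILL_VOCAB = {"python", "pytorch", "machine learning", "matlab", "cell culture", "sql"}


def pdf_to_text(path: str) -> str:
    """Extract plain text from every page of a PDF with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the rest of the sketch runs without it

    doc = fitz.open(path)
    try:
        return "\n".join(page.get_text() for page in doc)
    finally:
        doc.close()


def extract_skills(text: str, vocab: Set[str] = SKILL_VOCAB) -> List[str]:
    """Return the vocabulary terms that occur as whole words or phrases in the text."""
    lowered = text.lower()
    found = [
        term for term in vocab
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]
    return sorted(found)
```

For example, `extract_skills(pdf_to_text("resume.pdf"))` would return the matched terms in alphabetical order. A fixed vocabulary misses synonyms and abbreviations (such as "ML"), which is one reason to use an NLP library for this step.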

Backend Setup

Prerequisites

Python 3.9+
Chrome/Chromium (for Selenium scraping)

Installation

cd backend

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Download spaCy model
python -m spacy download en_core_web_sm
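To confirm the Python prerequisites are in place before starting the server, a small standard-library check like the following can help. It is a sketch that only looks for the packages by import name (PyMuPDF imports as `fitz`, and spaCy models install as ordinary packages named after the model); it does not verify versions.

```python
import importlib.util
import sys
from typing import Dict


def check_backend_prereqs() -> Dict[str, bool]:
    """Report whether the Python-side prerequisites appear to be installed."""
    return {
        "python>=3.9": sys.version_info >= (3, 9),
        "PyMuPDF": importlib.util.find_spec("fitz") is not None,
        "spaCy": importlib.util.find_spec("spacy") is not None,
        # Installed by `python -m spacy download en_core_web_sm`
        "en_core_web_sm": importlib.util.find_spec("en_core_web_sm") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_backend_prereqs().items():
        print(f"{name:15} {'ok' if ok else 'MISSING'}")
```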

Environment Variables

Create a .env file in backend/:

OPENAI_API_KEY=your-openai-api-key
CORS_ORIGINS=http://localhost:3000
DATABASE_URL=postgresql://user:password@localhost:5432/labmate  # Optional for future Prisma integration
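How the backend turns these variables into settings is not described here. A minimal sketch, assuming the values are already in the process environment (for example via a .env loader), that `CORS_ORIGINS` may hold a comma-separated list, and that `DATABASE_URL` stays optional; `load_settings` is a hypothetical helper:

```python
import os
from typing import Dict, Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> Dict:
    """Read backend settings from environment variables (hypothetical helper)."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required for /generate_email")
    # Assumption: several origins may be given as "http://a,http://b".
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]
    return {
        "openai_api_key": api_key,
        # Assumption: fall back to the Next.js dev server when CORS_ORIGINS is unset.
        "cors_origins": origins or ["http://localhost:3000"],
        "database_url": env.get("DATABASE_URL"),  # optional for now
    }
```

A list like `cors_origins` is the form FastAPI's `CORSMiddleware` expects for its `allow_origins` argument.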

Run Backend

uvicorn app.main:app --reload --port 8000

The API will be available at http://localhost:8000. Check http://localhost:8000/health to verify.
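The health check can also be scripted, for instance before running frontend tests. This standard-library sketch treats the backend as healthy only if `GET /health` returns HTTP 200 with the documented body `{"status": "healthy"}`; the function names are illustrative.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:8000/health"


def is_healthy(body: bytes) -> bool:
    """True if a /health response body matches {"status": "healthy"}."""
    try:
        return json.loads(body).get("status") == "healthy"
    except (ValueError, AttributeError):  # invalid JSON, or JSON that is not an object
        return False


def check_backend(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Call GET /health on a running backend; False if unreachable or unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:  # URLError, HTTPError, and timeouts are all OSError subclasses
        return False
```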

Frontend Setup

Prerequisites

Node.js 18+
PostgreSQL database (for Prisma)

Installation

cd web

# Install dependencies
npm install

# Configure Prisma: the datasource block in prisma/schema.prisma should read
# url = env("DATABASE_URL"); the connection string itself goes in the environment file below

Environment Variables

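Create a .env.local file in web/ (Next.js loads it automatically). The Prisma CLI does not read .env.local, so also put DATABASE_URL in web/.env, or export it in your shell, before running the Prisma commands under Database Setup: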

# Database
DATABASE_URL="postgresql://user:password@localhost:5432/labmate"

# NextAuth
NEXTAUTH_SECRET="generate-a-random-secret-here"
NEXTAUTH_URL="http://localhost:3000"

# OAuth Providers (optional)
GOOGLE_CLIENT_ID="your-google-client-id"
GOOGLE_CLIENT_SECRET="your-google-client-secret"
GITHUB_CLIENT_ID="your-github-client-id"
GITHUB_CLIENT_SECRET="your-github-client-secret"

# Backend API
NEXT_PUBLIC_API_BASE_URL="http://localhost:8000"
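NEXTAUTH_SECRET should be a long random value rather than the placeholder above. One way to generate it with Python's standard library (any cryptographically secure generator works; `openssl rand -base64 32` is a common alternative):

```python
import base64
import secrets


def make_nextauth_secret(num_bytes: int = 32) -> str:
    """Return a base64-encoded string of cryptographically secure random bytes."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(f'NEXTAUTH_SECRET="{make_nextauth_secret()}"')
```

Paste the printed line into .env.local in place of the placeholder.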

Database Setup

# Generate Prisma Client
npx prisma generate

# Run migrations
npx prisma migrate dev

# (Optional) Open Prisma Studio to view data
npx prisma studio

Run Frontend

npm run dev

The app will be available at http://localhost:3000.

Usage

Sign In: Use GitHub OAuth or create an account with email/password
Upload Resume: Upload a PDF resume
Select Institutions: Choose one or more NJ institutions
Get Matches: Click "Find my top 3 professors" to get ranked matches
Select Professor: Click on a professor card to select them
Generate Email: Enter your name and click "Generate cold email"

Web Scraping Details

The scraper (backend/app/services/scraper.py) supports both BeautifulSoup (for static HTML) and Selenium (for JavaScript-rendered content). Each institution has a configuration with:

Base URL for faculty pages
CSS selectors for professor containers, names, departments, research focus, and profile links
Fallback patterns if primary selectors fail
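One possible shape for such a per-institution configuration, with primary selectors tried before fallbacks, is sketched below. The field names, the example URL, and the selectors are placeholders rather than the actual values in `scraper.py`, and `select` stands in for whatever returns the text of elements matching a CSS selector.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class InstitutionConfig:
    """Hypothetical shape of one institution's scraping configuration."""
    name: str
    base_url: str                 # faculty listing page
    container: str                # CSS selector matching one professor entry
    selectors: Dict[str, str]     # primary selector per field ("name", "department", ...)
    fallbacks: Dict[str, Sequence[str]] = field(default_factory=dict)
    needs_js: bool = False        # True -> render with Selenium before parsing


def first_match(field_name: str, config: InstitutionConfig,
                select: Callable[[str], List[str]]) -> Optional[str]:
    """Try the primary selector, then each fallback; return the first non-empty text."""
    primary = config.selectors.get(field_name)
    candidates = ([primary] if primary else []) + list(config.fallbacks.get(field_name, ()))
    for css in candidates:
        texts = [t.strip() for t in select(css) if t.strip()]
        if texts:
            return texts[0]
    return None  # no selector produced text for this field


example = InstitutionConfig(
    name="Example University",
    base_url="https://example.edu/faculty",  # placeholder URL
    container=".faculty-card",
    selectors={"name": "h3.name", "research": ".research"},
    fallbacks={"research": [".bio p", "p"]},
)
```

With BeautifulSoup, `select` for one professor entry could be `lambda css: [el.get_text() for el in card.select(css)]`, where `card` is one element of `soup.select(example.container)`. Keeping selectors in data rather than code means a site redesign only requires a config change.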

Note: Each institution's website is structured differently, so real-world scraping will require per-institution adjustments to the selectors. The current implementation is a starting point that can later be driven by LangGraph-based orchestration (see Next Steps).

Matching Algorithm

The matching pipeline:

Converts resume profile (skills, interests, experiences) into a text representation
Converts each professor profile into a text representation
Uses BERTScore to compute semantic similarity (F1 score)
Ranks professors by similarity and returns top 3
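The four steps above can be sketched in Python. BERTScore compares two texts by greedily matching each token to its most similar token in the other text (cosine similarity of contextual embeddings), averaging those maxima into precision and recall, and combining them into F1. To stay self-contained, this sketch swaps the embeddings for an exact-match similarity (1 for identical tokens, 0 otherwise), so it is a stand-in for the real scorer, not the code in `matching.py`. The profile keys (`skills`, `interests`, `experiences`, `department`, `research_focus`) are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def to_text(profile: Dict, keys: Sequence[str]) -> str:
    """Join the chosen fields (a string or a list of strings each) into one text."""
    parts: List[str] = []
    for key in keys:
        value = profile.get(key)
        if not value:
            continue
        parts.extend(value if isinstance(value, list) else [value])
    return " ".join(parts)


def exact_sim(a: str, b: str) -> float:
    """Stand-in for cosine similarity between contextual token embeddings."""
    return 1.0 if a == b else 0.0


def greedy_f1(candidate: str, reference: str,
              sim: Callable[[str, str], float] = exact_sim) -> float:
    """BERTScore-style greedy-matching F1, without IDF weighting or baseline rescaling."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Precision: each candidate token matched to its best reference token.
    precision = sum(max(sim(c, r) for r in ref) for c in cand) / len(cand)
    # Recall: each reference token matched to its best candidate token.
    recall = sum(max(sim(r, c) for c in cand) for r in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def top_matches(resume: Dict, professors: List[Dict], k: int = 3) -> List[Tuple[float, Dict]]:
    """Score every professor against the resume and return the k best (score, professor) pairs."""
    resume_text = to_text(resume, ["skills", "interests", "experiences"])
    scored = [
        (greedy_f1(to_text(p, ["department", "research_focus"]), resume_text), p)
        for p in professors
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return scored[:k]
```

Because F1 is the harmonic mean of precision and recall, swapping which text is the candidate and which is the reference leaves the score unchanged. With the `bert-score` package, the documented entry point is `bert_score.score(cands, refs, lang="en")`, which returns precision, recall, and F1 tensors with one entry per candidate/reference pair.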

BERTScore will download models on first run (~400MB).

API Endpoints

`POST /match`

Body: multipart form data with the PDF in the resume field; the selected institutions are passed as query parameters (institutions)
Response: { resume_profile: {...}, top_professors: [...] }
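A minimal standard-library client for `/match`, assuming the PDF goes in a multipart field named `resume` and that institutions are sent as repeated `institutions` query parameters (FastAPI's convention for list-valued query parameters). The institution spellings the backend accepts are not documented here, so the values in the test are placeholders, and `build_match_request` and `send` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request
import uuid
from typing import Sequence

API_BASE = "http://localhost:8000"


def build_match_request(pdf_bytes: bytes, institutions: Sequence[str],
                        filename: str = "resume.pdf",
                        base_url: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a multipart POST /match request."""
    query = urllib.parse.urlencode([("institutions", inst) for inst in institutions])
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="resume"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/match?{query}",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send a request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

A generous timeout is used because the first call may wait on the BERTScore model download.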

`POST /generate_email`

Body: { resume_profile: {...}, professor: {...}, user_name: string }
Response: { email_text: string }
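The JSON body can be assembled from a `/match` response. This sketch assumes `resume_profile` and a chosen entry from `top_professors` are passed back unchanged; the inner keys used in the test are hypothetical.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"


def build_email_request(resume_profile: dict, professor: dict, user_name: str,
                        base_url: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST /generate_email request with a JSON body."""
    payload = {"resume_profile": resume_profile, "professor": professor, "user_name": user_name}
    return urllib.request.Request(
        f"{base_url}/generate_email",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_email(resume_profile: dict, professor: dict, user_name: str) -> str:
    """Send the request and return the email_text field of the response."""
    req = build_email_request(resume_profile, professor, user_name)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["email_text"]
```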

`GET /health`

Response: { status: "healthy" }

Next Steps

Connect Prisma to backend for professor caching
Implement LangGraph-based scraping orchestration
Add professor exclusion logic (prevent showing already-contacted professors)
Enhance resume parsing with more sophisticated NLP
Add email tracking and analytics
Deploy backend and frontend
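The planned exclusion logic could run before ranking, so the top 3 is always filled with professors the student has not yet contacted. A sketch, assuming each professor has a stable identifier (here a hypothetical `profile_url` key) and that the frontend can supply the identifiers already contacted:

```python
from typing import Dict, Iterable, List


def exclude_contacted(professors: List[Dict], contacted: Iterable[str],
                      key: str = "profile_url") -> List[Dict]:
    """Drop professors whose identifier is in the contacted set, preserving order.

    Professors missing the identifier are kept, since they cannot be matched.
    """
    seen = set(contacted)
    return [p for p in professors if p.get(key) not in seen]
```

Filtering before scoring also avoids spending BERTScore computation on professors who would be discarded anyway.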

License

MIT
