A student-facing research matchmaking platform that connects students with research opportunities at top institutions. This MVP includes a FastAPI backend for resume parsing, web scraping, matching, and email generation, plus a Next.js frontend with authentication.
labmate/
├── backend/                          # FastAPI backend
│   ├── app/
│   │   ├── main.py                   # FastAPI app with /match and /generate_email endpoints
│   │   └── services/
│   │       ├── resume_parser.py      # PDF parsing with PyMuPDF + spaCy
│   │       ├── scraper.py            # Web scraping with BeautifulSoup/Selenium
│   │       ├── matching.py           # BERTScore-based semantic matching
│   │       └── email_generator.py    # OpenAI cold email generation
│   └── requirements.txt
├── web/                              # Next.js frontend
│   ├── app/
│   │   ├── page.tsx                  # Main matching interface
│   │   ├── auth/signin/              # Sign-in page
│   │   └── api/auth/[...nextauth]/   # Auth.js API routes
│   ├── lib/
│   │   ├── auth.ts                   # NextAuth configuration
│   │   └── prisma.ts                 # Prisma client
│   └── prisma/
│       └── schema.prisma             # Prisma schema (you'll configure this)
└── README.md
- Resume Parsing: Extracts skills, interests, and experiences from PDF resumes using PyMuPDF and spaCy
- Web Scraping: Scrapes faculty pages from 6 NJ institutions (Rutgers, NJIT, Princeton, Stevens, TCNJ, Seton Hall) using BeautifulSoup and Selenium
- Semantic Matching: Uses BERTScore to compute similarity between the resume and each professor profile, then returns the top 3 matches
- Cold Email Generation: Generates personalized outreach emails using OpenAI's GPT-4o-mini
- Authentication: NextAuth.js with Prisma adapter, supports GitHub OAuth and email/password
- Session Management: Tracks contacted professors to avoid repeat suggestions
- Python 3.9+
- Chrome/Chromium (for Selenium scraping)
cd backend
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download spaCy model
python -m spacy download en_core_web_sm
Create a .env file in backend/:
OPENAI_API_KEY=your-openai-api-key
CORS_ORIGINS=http://localhost:3000
DATABASE_URL=postgresql://user:password@localhost:5432/labmate # Optional for future Prisma integration
Run the backend:
uvicorn app.main:app --reload --port 8000
The API will be available at http://localhost:8000. Check http://localhost:8000/health to verify.
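As a rough illustration of how a backend could consume the `.env` values listed above, the sketch below reads them from the environment. `Settings` and `load_settings` are hypothetical names, and treating `CORS_ORIGINS` as a comma-separated list is an assumption; the project's actual handling may differ.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the backend's environment settings."""
    openai_api_key: str
    cors_origins: List[str]
    database_url: Optional[str] = None  # optional, per the .env comment


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping (defaults to the process environment)."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Assumption: multiple allowed origins are separated by commas.
    raw_origins = env.get("CORS_ORIGINS", "http://localhost:3000")
    origins = [o.strip() for o in raw_origins.split(",") if o.strip()]
    return Settings(api_key, origins, env.get("DATABASE_URL") or None)
```

Failing fast on a missing `OPENAI_API_KEY` surfaces a misconfigured `.env` at startup instead of at the first email-generation request.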
- Node.js 18+
- PostgreSQL database (for Prisma)
cd web
# Install dependencies
npm install
# Configure Prisma (you'll set up the database connection)
# Edit prisma/schema.prisma and set your DATABASE_URL
Create a .env.local file in web/:
# Database
DATABASE_URL="postgresql://user:password@localhost:5432/labmate"
# NextAuth
NEXTAUTH_SECRET="generate-a-random-secret-here"
NEXTAUTH_URL="http://localhost:3000"
# OAuth Providers (optional)
GOOGLE_CLIENT_ID="your-google-client-id"
GOOGLE_CLIENT_SECRET="your-google-client-secret"
GITHUB_CLIENT_ID="your-github-client-id"
GITHUB_CLIENT_SECRET="your-github-client-secret"
# Backend API
NEXT_PUBLIC_API_BASE_URL="http://localhost:8000"
# Generate Prisma Client
npx prisma generate
# Run migrations
npx prisma migrate dev
# (Optional) Open Prisma Studio to view data
npx prisma studio
Run the frontend:
npm run dev
The app will be available at http://localhost:3000.
- Sign In: Use GitHub OAuth or create an account with email/password
- Upload Resume: Upload a PDF resume
- Select Institutions: Choose one or more NJ institutions
- Get Matches: Click "Find my top 3 professors" to get ranked matches
- Select Professor: Click on a professor card to select them
- Generate Email: Enter your name and click "Generate cold email"
The scraper (backend/app/services/scraper.py) supports both BeautifulSoup (for static HTML) and Selenium (for JavaScript-rendered content). Each institution has a configuration with:
- Base URL for faculty pages
- CSS selectors for professor containers, names, departments, research focus, and profile links
- Fallback patterns if primary selectors fail
Note: Website structures vary, so real-world scraping will require institution-specific selector adjustments. The current implementation is a starting point that can be extended with LangGraph-based orchestration.
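The per-institution configuration described above could be modeled roughly as follows. This is a sketch, not the contents of `scraper.py`: `InstitutionConfig`, its field names, and the example URL and selectors are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InstitutionConfig:
    """Hypothetical per-institution scraping configuration."""
    name: str
    base_url: str                  # faculty listing page
    selectors: Dict[str, str]      # primary CSS selector per field
    fallbacks: Dict[str, List[str]] = field(default_factory=dict)
    use_selenium: bool = False     # True for JavaScript-rendered pages

    def candidates(self, field_name: str) -> List[str]:
        """Return selectors to try for a field: primary first, then fallbacks."""
        primary = self.selectors.get(field_name)
        ordered = [primary] if primary else []
        return ordered + self.fallbacks.get(field_name, [])


# Placeholder values only; real URLs and selectors must come from each site.
EXAMPLE = InstitutionConfig(
    name="Example University",
    base_url="https://example.edu/faculty",
    selectors={
        "container": "div.faculty-card",
        "name": "h3.name",
        "department": "span.department",
        "research_focus": "p.research",
        "profile_link": "a.profile",
    },
    fallbacks={"name": ["h2", "a.profile"]},
)
```

A scraper would walk `EXAMPLE.candidates("name")` in order and stop at the first selector that yields a match, which keeps the fallback logic in data instead of code.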
The matching pipeline:
- Converts resume profile (skills, interests, experiences) into a text representation
- Converts each professor profile into a text representation
- Uses BERTScore to compute semantic similarity (F1 score)
- Ranks professors by similarity and returns top 3
BERTScore will download models on first run (~400MB).
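The pipeline steps above can be sketched as follows. This is a minimal illustration, not the project's `matching.py`: `profile_to_text`, `token_f1`, and `rank_professors` are hypothetical names, and `token_f1` is a plain token-overlap F1 used here as a dependency-free stand-in for BERTScore so the example runs without downloading a model.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, object]
Scorer = Callable[[str, str], float]


def profile_to_text(profile: Profile) -> str:
    """Flatten a profile dict into a single string, one 'key: value' part per field."""
    parts = []
    for key, value in profile.items():
        if isinstance(value, (list, tuple)):
            value = ", ".join(str(v) for v in value)
        parts.append(f"{key}: {value}")
    return ". ".join(parts)


def token_f1(candidate: str, reference: str) -> float:
    """Stand-in scorer: token-overlap F1. This is NOT BERTScore."""
    cand = Counter(re.findall(r"[a-z0-9]+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9]+", reference.lower()))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rank_professors(
    resume: Profile,
    professors: List[Profile],
    scorer: Scorer = token_f1,
    top_k: int = 3,
) -> List[Tuple[float, Profile]]:
    """Score every professor against the resume and keep the top_k by score."""
    resume_text = profile_to_text(resume)
    scored = [(scorer(profile_to_text(p), resume_text), p) for p in professors]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # sort on score only
    return scored[:top_k]
```

Because the scorer is a parameter, a function wrapping the `bert_score` package could be passed in place of `token_f1` without changing the ranking code.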
/match
- Body: Form data with resume (PDF file) and institutions (query params)
- Response: { resume_profile: {...}, top_professors: [...] }
/generate_email
- Body: { resume_profile: {...}, professor: {...}, user_name: string }
- Response: { email_text: string }
/health
- Response: { status: "healthy" }
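As an illustration of the `/generate_email` request shape listed above, the sketch below builds the call with the standard library only. It assumes the endpoint accepts a JSON body via POST, which the README does not state explicitly; `build_generate_email_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE_URL = "http://localhost:8000"  # local backend address from the setup steps


def build_generate_email_request(
    resume_profile: Dict[str, Any],
    professor: Dict[str, Any],
    user_name: str,
    base_url: str = API_BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a request for /generate_email."""
    payload = {
        "resume_profile": resume_profile,
        "professor": professor,
        "user_name": user_name,
    }
    return urllib.request.Request(
        url=f"{base_url}/generate_email",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the endpoint is a JSON POST
    )


def send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, `send(build_generate_email_request(profile, professor, "Sam"))` would be expected to return a dict containing `email_text`, where `profile` and `professor` come from an earlier `/match` response.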
- Connect Prisma to backend for professor caching
- Implement LangGraph-based scraping orchestration
- Add professor exclusion logic (prevent showing already-contacted professors)
- Enhance resume parsing with more sophisticated NLP
- Add email tracking and analytics
- Deploy backend and frontend
MIT