Vincentwei1021/awesome-vibe-research

Awesome Vibe Research

Humans steer. Agents execute. Research scales.

English | 简体中文 | 日本語 | Español

The English version is the source of truth. Translations may lag slightly behind.

A curated list of tools, frameworks, and resources for vibe research — the discipline of designing research environments, agent workflows, compute loops, and review systems so humans and AI agents can collaborate on literature, experiments, and reporting at scale.

What is Vibe Research?

Vibe Research = Vibe Coding applied to research. Humans define the question. AI agents do the throughput. Humans approve before anything moves forward.

Vibe Coding showed that developers can express intent while AI handles more of the implementation.

Vibe Research applies the same idea to research work:

  • Humans define the problem, scope, standards, constraints, and decision criteria.
  • AI agents search literature, map prior work, propose questions, run experiments, summarize results, and draft reports.
  • Humans remain responsible for steering, approval, interpretation, and final judgment.

The hard part is rarely the model alone. The real bottlenecks are coordination, fragmented context, manual handoffs, idle compute, broken continuity between literature review and experimentation, and the lack of a shared system where humans and agents can see the same research state.

The Five Layers of Vibe Research

Every reliable human-AI research system needs all five layers working together:

┌─────────────────────────────────────────────────────┐
│  5. OBSERVABILITY                                   │
│     Can you see what each agent is doing right now? │
├─────────────────────────────────────────────────────┤
│  4. TRUST                                           │
│     Audit trails for every decision, not just the   │
│     final output. Verifiable reasoning chains.      │
├─────────────────────────────────────────────────────┤
│  3. APPROVAL GATES                                  │
│     Humans approve before agents act on anything    │
│     that changes the research direction.            │
├─────────────────────────────────────────────────────┤
│  2. CONTEXT CONTINUITY                              │
│     Research state persists across sessions,        │
│     agents, and weeks. Nothing is lost on restart.  │
├─────────────────────────────────────────────────────┤
│  1. ARCHITECTURE                                    │
│     Agent roles, tool access, memory systems,       │
│     compute orchestration. The environment design.  │
└─────────────────────────────────────────────────────┘

        ↑ Build from the bottom up.
        ↑ Most failures happen in layers 2–4.
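
As a rough illustration of layers 3 and 4, the sketch below routes every agent proposal through one function that applies an approval gate and appends to an audit log. The `Proposal` and `Harness` names, their fields, and the example actions are all hypothetical; none of them come from a tool in this list, and the `approve` callback merely stands in for a human reviewer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Proposal:
    """A change an agent wants to make (hypothetical structure)."""
    agent: str
    action: str
    changes_direction: bool  # True if it would alter the research direction


@dataclass
class Harness:
    """Toy harness: an approval gate plus an append-only audit trail."""
    approve: Callable[[Proposal], bool]  # stands in for a human reviewer
    audit_log: List[dict] = field(default_factory=list)

    def submit(self, proposal: Proposal) -> bool:
        # Layer 3: only direction-changing actions need human sign-off.
        needs_gate = proposal.changes_direction
        allowed = self.approve(proposal) if needs_gate else True
        # Layer 4: record every decision, not only the final output.
        self.audit_log.append({
            "agent": proposal.agent,
            "action": proposal.action,
            "gated": needs_gate,
            "allowed": allowed,
        })
        return allowed


# A reviewer that rejects everything, to show the gate taking effect.
harness = Harness(approve=lambda p: False)
ran = harness.submit(Proposal("scout", "rerun baseline with new seed", False))
pivoted = harness.submit(Proposal("scout", "switch to a different dataset", True))
```

Here the routine rerun proceeds without review while the dataset switch is blocked, and both decisions land in `harness.audit_log`. Reading that log is a crude form of layer 5.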

Stages of Agent Autonomy

As trust and tooling mature, vibe research naturally progresses through three stages:

| Stage | Role | What the Agent Does | What Humans Own |
|---|---|---|---|
| Stage 1: Agent as Intern | Execution only | Runs experiments designed by humans. No idea proposals, no research questions: just reliable throughput on well-scoped tasks. | Problem definition, experiment design, interpretation, all decisions. |
| Stage 2: Agent as Researcher | Independent research within a question | Owns a single research question end-to-end. Self-loops at the experiment level: proposes experiments, runs them, interprets results, and iterates, all within the boundary of one research question. | Research question formulation, cross-question prioritization, final judgment. |
| Stage 3: Agent as Research Lead | Full project autonomy | Drives an entire research project. Formulates research questions, builds and tests hypotheses, decides which to accept or reject, and charts the research direction. | Strategic oversight, ethical review, publication decisions. |

Most teams today operate at Stage 1. The goal of vibe research tooling is to make Stage 2 reliable and Stage 3 feasible.
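
One way to make the stages operational is a permission table keyed by stage. The sketch below is only an assumption about how that might look: the `Stage` enum, the action names, and `agent_may` are illustrative and not taken from any tool listed here.

```python
from enum import IntEnum


class Stage(IntEnum):
    INTERN = 1         # execution only
    RESEARCHER = 2     # owns one research question end-to-end
    RESEARCH_LEAD = 3  # drives the whole project


# Minimum stage at which an agent may perform each action on its own.
# The action names are illustrative, not a fixed vocabulary.
MIN_STAGE = {
    "run_experiment": Stage.INTERN,
    "propose_experiment": Stage.RESEARCHER,
    "interpret_results": Stage.RESEARCHER,
    "formulate_question": Stage.RESEARCH_LEAD,
    "accept_hypothesis": Stage.RESEARCH_LEAD,
}


def agent_may(stage: Stage, action: str) -> bool:
    """Return True if an agent at `stage` may take `action` without a human."""
    required = MIN_STAGE.get(action)
    # Unknown actions default to human-owned.
    return required is not None and stage >= required
```

Actions missing from the table, such as a publication decision, stay with humans at every stage, which matches the "What Humans Own" column.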

Core Principles

Borrowing from agent harness engineering and adapting it to research:

  1. Humans steer, agents execute - researchers own direction, agents own throughput.
  2. Research context is the system of record - papers, notes, experiment status, reports, and code should be inspectable in one stack.
  3. Research is iterative, not linear - strong systems support branching questions, loops, retries, and parallel work.
  4. Approval gates matter - autonomous proposal is useful; autonomous acceptance is dangerous.
  5. Observability matters - if you cannot inspect search traces, experiment progress, and outputs, you cannot trust the workflow.
  6. Compute is part of the workflow - GPU queues, clusters, budgets, and execution status should not live outside the research system.
  7. Agent legibility is a feature - workflows, prompts, tools, and artifacts should be easy for the next human or agent to pick up.
  8. Context continuity beats prompt cleverness - the best systems keep state across literature review, planning, execution, and reporting.
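
Principles 2 and 8 come down to keeping research state somewhere that survives a restart. A minimal sketch, assuming a single JSON file is enough for the purpose: the `ResearchState` fields, the file name, and the example question are hypothetical placeholders rather than a schema used by any listed platform.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict, List


@dataclass
class ResearchState:
    question: str
    papers: List[str] = field(default_factory=list)
    experiments: Dict[str, str] = field(default_factory=dict)  # name -> status

    def save(self, path: Path) -> None:
        # Plain JSON keeps the state inspectable by humans and agents alike.
        path.write_text(json.dumps(asdict(self), indent=2), encoding="utf-8")

    @classmethod
    def load(cls, path: Path) -> "ResearchState":
        return cls(**json.loads(path.read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "research_state.json"
    state = ResearchState(question="Does warmup length affect final loss?")
    state.papers.append("seed-paper-placeholder")
    state.experiments["warmup-sweep"] = "queued"
    state.save(state_file)
    # A later session, or a different agent, picks up where this one stopped.
    resumed = ResearchState.load(state_file)
```

A real system would likely add versioning and concurrent-write handling; the point is only that the next human or agent reads the same state instead of reconstructing it from a prompt.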

Contents

Full Lifecycle Platforms

Platforms that span a meaningful portion of the research lifecycle: literature discovery, question formation, experiment orchestration, reporting, and human review.

  • Synapse - Recommended starting point. A research orchestration platform built specifically for human researchers and AI agents. It covers literature review, research question formulation, experiment execution, and report generation, with built-in agent management, compute orchestration, real-time observability, an autonomous loop, and 60+ MCP tools. If you only evaluate one platform for vibe research, start here.
  • Denario - A modular multi-agent system for scientific research assistance. More paper-oriented than Synapse, with a stronger emphasis on pushing from idea to method to results to research artifact.
  • OpenAGS - Formerly Auto-Research. An ambitious open-ended effort around autonomous generalist scientist systems across literature, experiment, writing, and robotics. More exploratory than production-ready, but important for the direction of the category.

Auto-Research & AI Scientists

Systems explicitly aimed at autonomous or semi-autonomous research loops: idea discovery, hypothesis generation, experiment iteration, and, in some cases, paper writing. These sit closer to auto-research than to generic agent frameworks.

  • AutoResearchClaw - Fully autonomous and self-evolving research system that pushes from idea to paper. One of the clearest "chat an idea, get a paper" projects in this category.
  • ARIS (Auto-Research-In-Sleep) - Lightweight Markdown-only skills for autonomous ML research, with cross-model review loops, idea discovery, and experiment automation. Canonical upstream for the ARIS line of repos and forks.
  • AgentLaboratory - End-to-end autonomous research workflow designed to help human researchers turn ideas into executable research loops.
  • EvoScientist - Self-evolving AI scientist system focused on autonomous discovery and iterative research workflows.
  • The AI Scientist - Sakana AI's influential system for fully automated open-ended scientific discovery. A foundational reference point for the AI scientist category.
  • The AI Scientist-v2 - Follow-up work focused on workshop-level automated scientific discovery via agentic tree search, pushing deeper into autonomous research branching.
  • autonomous-researcher - Lightweight autonomous researcher project oriented toward independent research execution rather than generic chat-style synthesis.
  • Auto-Deep-Research - Fully automated personal AI assistant with a strong deep-research flavor, useful as a bridge between autonomous research and assistant-style workflows.
  • AutoResearch-SibylSystem - Fully autonomous AI research system with self-evolution, built natively on Claude Code.
  • freephdlabor - Personalized multi-agent system that researches continuously on a user-specified scientific problem. Strongly aligned with the "research while you sleep" pattern.
  • InnoClaw - AI research agent for scientific innovation, closer to autonomous ideation and discovery workflows.
  • BioAgents - Domain-specific AI scientist framework for autonomous deep research in biological sciences, combining literature-analysis agents with data-science agents.
  • biomedical-aiq-research-agent - NVIDIA blueprint for biomedical research-agent workflows. A good example of domain-specific automated researcher design.
  • Marco-DeepResearch - Search-heavy deep research agent focused on realistic and challenging agentic search. Useful when autonomous research quality is bottlenecked by search and retrieval.
  • AI-Co-Scientist - Multi-agent system for scientific research and hypothesis generation. Early-stage, but directionally aligned with the AI scientist category.

Agent Orchestrators

These tools help structure complex agent workflows, split responsibilities across roles, and keep research tasks legible as they move from search to synthesis to execution.

  • LangGraph - Graph-based framework for resilient language agents. Useful when research workflows have explicit stages, checkpoints, retries, and human approval steps.
  • Deep Agents - Agent harness built on LangChain and LangGraph, with planning, filesystem access, and subagent spawning. A good fit for research stacks that need progressive disclosure and recursive task decomposition.
  • AutoGen - Microsoft's framework for agentic AI. Helpful when you want role-specialized research agents such as scout, critic, experimenter, and writer.
  • CrewAI - Framework for orchestrating role-playing autonomous agents. Well-suited to research flows where different agents own literature review, benchmark design, implementation, and write-up.
  • Semantic Kernel - Agent and workflow framework with strong connector support. Useful if your research stack needs to integrate enterprise systems, memory, and multiple model backends.
  • cmbagent - Multi-agent research system powered by AG2. Especially relevant when you want domain-oriented scientific agents rather than only generic planning abstractions.
  • GPT Researcher - Autonomous agent for deep research across web and local data. A good reference point for report-oriented research agents that plan, gather, filter, and synthesize.
  • DeerFlow - Long-horizon super-agent harness with sandboxes, memories, tools, skills, and subagents. Useful when research tasks span browsing, coding, and artifact generation over long sessions.
  • OpenScholar - Retrieval-augmented system for scientific literature synthesis. Especially relevant for grounded, citation-heavy responses over academic corpora.
  • PaperQA2 - High-accuracy literature agent for answering questions from scientific documents with citations. One of the strongest building blocks for paper-grounded research workflows.
  • RD-Agent - Microsoft's R&D automation framework for hypothesis iteration, code writing, experiment execution, and refinement. A strong bridge between software agents and research agents.
  • AutoRA - Automated Research Assistant for closed-loop scientific discovery. Particularly relevant for empirical workflows that iterate between modeling, experiment design, and data collection.

Experiment & Workflow Runners

Once a research workflow leaves the chat window and starts touching jobs, clusters, datasets, or scheduled pipelines, these tools become part of the harness.

  • Metaflow - Build, manage, and deploy AI/ML systems. Strong choice when experiments need lineage, resumability, and a clean path from notebook prototype to recurring workflow.
  • Dagster - Asset-oriented orchestration platform with strong observability. Useful when research outputs become reusable data assets, evaluation sets, or recurring pipelines.
  • Prefect - Python-native workflow orchestration framework for resilient pipelines and scheduled jobs. A practical middle ground for agent-triggered research jobs.
  • SkyPilot - Unified interface for running AI workloads across Kubernetes, Slurm, public cloud, and on-prem infrastructure. Very relevant for GPU-heavy vibe research.
  • Modal - Managed compute platform for batch jobs, agents, and data workloads without owning the full infrastructure stack. Great for small teams that want fast iteration over research code.
  • Ray - AI compute engine and distributed runtime. Useful for parallel search, training, simulation, evaluation sweeps, and high-throughput experiment execution.
  • ClearML - Open-source platform covering experiment management, pipelines, scheduling, and remote execution. Helpful when vibe research needs a more integrated ops layer.
  • TruLens - Evaluation and tracking for LLM experiments and AI agents. Especially useful when you need to measure the quality of research-agent outputs and RAG pipelines.

Agent Runtimes

Runtimes keep agents alive across sessions, channels, and long-running tasks. In research settings, that matters because useful work spans days, not just one prompt.

  • OpenClaw - Persistent agent runtime with skills, sub-agent spawning, and multi-channel session management. A strong fit when research agents need to stay available across messaging, planning, and execution surfaces.

Coding & Lab Agents

Even research-heavy teams still spend a large amount of time writing experiment code, glue code, evaluation code, and analysis scripts. These are the execution layer for that work.

  • Codex - Lightweight coding agent that runs in the terminal. Useful for experiment code, data scripts, automation, and repo-grounded lab tasks.
  • Claude Code - Strong repo-grounded coding agent with skill files, terminal tooling, and increasingly mature multi-agent patterns. Pairs well with research repos that need persistent conventions.
  • OpenHands - AI-driven development platform that can operate inside sandboxed environments. Helpful when research plans need to become concrete code changes or runnable prototypes.
  • Aider - Lightweight AI pair programming in the terminal. Great for smaller research repos, one-off analyses, and iterative experiment hacking.

Literature & Knowledge Systems

Research agents are only as good as the corpora and context systems behind them. These tools help build that layer.

  • PapersGPT for Zotero - Zotero AI and MCP plugin for chatting with PDFs, generating summaries, and automating literature review directly inside a paper library.
  • Zotero - Still the most practical system of record for personal and team paper collections. Many vibe research stacks should treat Zotero as the canonical bibliography layer.
  • Semantic Scholar API - Useful API for paper search, metadata retrieval, citation neighborhood lookup, and related-work discovery.
  • OpenAlex - Open scholarly graph and API for works, authors, institutions, venues, and citations. Very strong default source for reproducible research metadata pipelines.
  • Connected Papers - Visual graph for exploring adjacent work and citation neighborhoods. Good for quickly expanding a seed paper into a landscape map.
  • Elicit - Research assistant focused on literature review and evidence extraction. Helpful for early-phase synthesis and question refinement.
  • Research Rabbit - Literature discovery system that builds collections, explores citation neighborhoods, and recommends adjacent work over time.
  • Litmaps - Dynamic literature mapping for tracking citation networks and newly emerging papers around a seed set.
  • Inciteful - Literature graph exploration tools for discovery from seed papers and for finding connections across bodies of work.
  • Paperpile - Reference management system with good collaborative ergonomics and strong Google Docs or Word workflows for research teams.
  • AgentSLR - Agentic system for systematic literature reviews. A good fit when the workflow needs search, screening, extraction, and synthesis rather than only ad hoc browsing.
  • arXiv - Core preprint corpus for fast-moving fields. Essential if agents need to track the frontier rather than only published papers.
  • OpenReview - Important source of peer review context, workshop papers, and discussion threads in ML-adjacent fields.
  • Papers with Code - Connects papers to code, tasks, datasets, and benchmarks. Very useful when turning literature review into reproducible experimentation.

Reproducibility & Experiment Ops

Research quality depends on being able to rerun, compare, inspect, and hand off work cleanly.

  • MLflow - Open-source AI engineering platform for experiments, evaluation, monitoring, and model or app lifecycle management.
  • Weights & Biases - Popular platform for experiment tracking, dashboards, artifacts, and sweeps. Strong choice when a team needs shared visibility across many runs.
  • DVC - Data versioning and ML experiments for Git-based workflows. Useful when datasets, artifacts, and metrics need to travel with the repo.
  • JupyterLab - The standard computational environment for exploratory analysis, quick experiment iteration, and interactive reporting.
  • Marimo - Reactive notebook stored as pure Python rather than JSON. Particularly friendly to agent workflows because notebooks diff cleanly and can execute like scripts or apps.
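
The tools above handle run tracking in much richer ways, but the core idea of a comparable, rerunnable record can be sketched with the standard library alone. The function names, config keys, and metric value below are illustrative placeholders, not output from a real experiment and not the API of any listed tool.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Stable short hash of a config, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def run_record(config: dict, metrics: dict) -> dict:
    """Bundle what is needed to compare or rerun an experiment."""
    return {
        "config_id": config_fingerprint(config),
        "config": config,
        "metrics": metrics,
    }


# Same settings in a different key order map to the same config_id.
a = run_record({"lr": 0.001, "seed": 0}, {"val_loss": 0.42})
b = run_record({"seed": 0, "lr": 0.001}, {"val_loss": 0.42})
c = run_record({"lr": 0.01, "seed": 0}, {"val_loss": 0.42})
```

Keying runs by a config fingerprint makes it easy to spot duplicate runs and to group repeated seeds of the same setup.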

Research Writing & Report Generation

Research does not end at a successful run. These tools help turn notes, citations, and results into readable outputs that humans can review and publish.

  • Typst - Modern typesetting system with cleaner syntax and faster iteration than traditional LaTeX. A strong option for agent-assisted research writing.
  • Overleaf - Collaborative LaTeX editor that remains the default writing surface for many academic teams.
  • STORM - Stanford's citation-grounded report generation system for researching a topic and producing a long-form article. Useful as a reference for structured synthesis pipelines.

RAG & Knowledge Infrastructure

These tools help turn paper libraries, notes, and web corpora into queryable knowledge systems that research agents can actually use.

  • LlamaIndex - Widely used framework for ingestion, indexing, retrieval, and document agents. Useful for building paper-grounded research assistants.
  • Firecrawl - Web data API that converts sites into clean, LLM-friendly content. Helpful for large-scale research collection beyond PDFs.
  • Jina Reader - Lightweight way to turn arbitrary URLs into clean markdown for downstream agent consumption.

Standards & Protocols

Research harnesses become much more reusable when tools, agents, and repositories can speak a shared language.

  • MCP (Model Context Protocol) - Open standard for connecting models to external tools and data. Increasingly central for research agents that need live literature, file, and compute access.
  • ACP (Agent Communication Protocol) - Open protocol for communication between agents and harnesses. Relevant when research systems need multiple specialized agents to coordinate cleanly.
  • agents.md - Open standard for project-level agent instructions. Helpful for making research repo conventions explicit and portable.
  • AGENTS.md - OpenAI's repository-level convention for agent instructions. Good for codifying research repo structure, evaluation steps, and tool expectations.
  • Semantic Scholar API - Programmatic access to academic paper search, metadata, and citation relationships. One of the most practical APIs for research-agent backends.
  • arXiv API - Access to arXiv preprints for discovery and retrieval workflows in fast-moving fields.
  • CrossRef API - DOI-centric metadata infrastructure for bibliographic enrichment, reference completion, and citation workflows.
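
For the three HTTP APIs above, a first query is just a URL with a few parameters. The sketch below only builds the URLs and sends nothing over the network. The endpoints and parameter names reflect the public documentation as best understood here, so check each API's current docs for rate limits, API keys, and available fields before relying on them; the helper function names are made up for this example.

```python
from urllib.parse import urlencode


def arxiv_url(query: str, max_results: int = 5) -> str:
    """Search all arXiv fields; the response is an Atom feed."""
    params = {"search_query": f"all:{query}", "start": 0, "max_results": max_results}
    return "http://export.arxiv.org/api/query?" + urlencode(params)


def crossref_url(query: str, rows: int = 5) -> str:
    """Free-text search over Crossref works; the response is JSON."""
    return "https://api.crossref.org/works?" + urlencode({"query": query, "rows": rows})


def semantic_scholar_url(query: str, limit: int = 5) -> str:
    """Paper search on the Semantic Scholar Graph API; the response is JSON."""
    params = {"query": query, "limit": limit, "fields": "title,year"}
    return "https://api.semanticscholar.org/graph/v1/paper/search?" + urlencode(params)


urls = [
    arxiv_url("agentic search"),
    crossref_url("vibe research", rows=2),
    semantic_scholar_url("literature review agents"),
]
```

Keeping URL construction separate from fetching makes the queries easy to log, which helps with the search-trace observability discussed earlier.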

Research-Focused MCP Servers

  • Zotero MCP - MCP bridge for letting AI assistants interact directly with a Zotero library.
  • ScholarMCP - Academic research MCP server spanning literature search, PDF ingestion, and citation management.
  • OpenAlex Research MCP - MCP server for scholarly search, citation analysis, trend discovery, and collaboration mapping via OpenAlex.
  • arXiv MCP Server - MCP server for searching, analyzing, and exporting arXiv papers.
  • Unpaywall MCP - MCP server for DOI metadata and open-access PDF retrieval via Unpaywall.
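
Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages, and invoking a tool uses the `tools/call` method. The sketch below builds such a request by hand to show its shape. The tool name `search_papers` and its arguments are hypothetical and do not describe any server listed above; in practice an MCP SDK would build and send these messages for you.

```python
import json


def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# `search_papers` and its arguments are hypothetical; a real server
# advertises its own tool names and input schemas via `tools/list`.
payload = mcp_tool_call(1, "search_papers", {"query": "agentic tree search", "limit": 3})
```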

Methodologies & Workflows

These references are useful because vibe research is not just "let an LLM browse the web." It is a workflow design problem.

  • AI-DLC Workflows - AWS's workflow implementation of AI-DLC. Even though it comes from software delivery, the structure maps cleanly to research loops with explicit phases, artifacts, and review points.
  • AI-Driven Development Lifecycle - A methodology reference for human-in-the-loop agent systems with clear boundaries between understanding, planning, and execution.
  • Harness Engineering: Leveraging Codex in an Agent-First World - Important framing for why environment design, repository structure, feedback loops, and mechanical enforcement matter more than prompt cleverness.
  • Building Effective Agents - A strong reference for keeping agent systems simple, composable, and grounded in real tasks.

Reference & Knowledge

General Harness Reading

Curated Lists

Contributing

Contributions are welcome. Please read CONTRIBUTING.md before opening a PR.

Know a tool that belongs here? Open a PR. Building something in the Vibe Research space? We want to list it.

If this list is useful to you, give it a ⭐ — it helps others find it.

License

CC0 1.0
