A proof-of-concept showing how an LLM orchestrator can intelligently route requests between legacy systems and modern microservices.
When migrating from a monolithic backend to a set of microservices, you typically want to do this piece by piece following something like a Strangler Fig Pattern. This helps mitigate the risk by allowing you to move the old software piece by piece instead of all at once.
This PoC shows how an LLM can sit in the middle and figure out the best way to handle any request, automatically preferring modern services over legacy ones, and using advanced reasoning like conditional intents and multiple intents in a single message.
Simple terminal interface. Type your requests like you're chatting with support.
The brain. Figures out what you want, decides which services to call, and when to call them.
Remembers your conversations. Stored as JSON files, one per user.
The legacy monolith. This will be called if the intent handler exists only here.
The system will prefer to call the intent handlers in these microservices.
- The orchestrator boots up and asks all services (old and new) what they can do.
- Each service returns a list of "intents" it can handle, along with parameters and descriptions.
- The orchestrator prefers microservice versions when there's overlap.
- The bot service asks you to identify yourself by email.
The bot passes your message to the orchestrator along with your email
- The orchestrator grabs your current conversation and recent past conversations from memory.
- Current context is crucial, past context is just for reference.
The orchestrator makes an LLM call with your full conversation and all available intents. The LLM responds with:
- A list of intents to execute (in order)
- All needed parameters for each intent
- Optional conditions for conditional execution
- If it's too soon to execute intents (either because we don't know what the intents are yet or because we want to gather more information), the LLM returns a response for the user designed to gather more info. Then we return to step 2.
- When we have a list of intents to execute, we iterate through that list.
- For each intent, if there's a condition attached (like "only if the order is recent"), the orchestrator makes another LLM call to check if it's true before executing, and to attach any necessary context from the previous intent handler responses to the execution of the next.
- Intent handlers are executed sequentially, and responses are stored in memory.
After all intents finish, the orchestrator makes one final LLM call to turn all the results into a natural, helpful response for the user.
├── 📂 conversations/ # Stored conversation JSON files
├── 📂 microservices/
│ ├── 📃 user.py # Modern user service
│ ├── 📃 order.py # Modern order service
│ └── 📃 shipping.py # Modern shipping service
├── 📃 bot.py # Terminal interface
├── 📃 orchestrator.py # LLM intelligence layer
├── 📃 memory.py # Conversation storage
├── 📃 old_backend.py # Legacy monolith
# Install Poetry if you haven't already
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Add your OpenAI API key
echo "OPENAI_API_KEY=your_key_here" > .env
# Run the bot
poetry run python bot.pyThe orchestrator automatically routes between legacy and modern services, evaluates conditions, and generates natural responses.
This PoC shows that LLMs can intelligently route between old and new systems without hardcoded logic. No routing tables, no manual mapping. The orchestrator discovers capabilities and makes smart decisions automatically.
By tracking full conversation history, the bot understands follow-up questions and references to previous topics. "What about my orders?" works because the LLM has context about who you are and what you've been discussing. This makes the conversation more engaging and saves the user from having to repeat themselves.
Not every intent should always fire. The LLM can evaluate natural language conditions like "only if the order hasn't shipped yet" and build execution chains where later steps depend on earlier results.
Services self-describe their capabilities. Add a new microservice and the orchestrator automatically prefers it over legacy equivalents. No central registry to maintain, no deployment coordination needed.
- 🔒 This PoC has no guard rails, and will make available any feature it can access to any person. For example, you could ask for the phone number of any other user. In a production environment, this would be a huge security breach and we would need careful security guardrails and permission limitations.
- 🚀 Making this solution performant in a production environment in a scalable way may be challenging. We need to carefully profile execution time as we build this, and minimize the number of API calls. These must be made asynchronously, and the memory service would need to behave in an idempotent manner and allow a large number of concurrent instances to access it without running into nasty race conditions.
- 🍼 This PoC simplifies many things, like using function calls instead of API calls, file-based storage instead of a real database, it does not prove that this would work equally well with real-life production components.
- 💸 Working with real APIs may introduce concerns like rate limiting, out of control costs, this PoC does not prove that these would not be serious concerns.

