Skip to content

Parsing failure: Tuple[List[str], List[str]] with nested lists & accented characters #258

@claire83

Description

@claire83

Problem

emulate() raises a ValueError when the return type is Tuple[List[str], List[str]] and the LLM response contains accented Unicode characters. All parsing strategies (Native, Heuristic, Semantic, Knowledge) fail, despite the LLM returning syntactically valid Python.

Minimal reproduction

from OpenHosta import emulate
from typing import Tuple, List

def obtenir_routine_sommeil(description: str) -> Tuple[List[str], List[str]]:
    """
    Given a brief description of what the user wants, return a tuple of two lists:
    - First list: steps of a routine (list[str])
    - Second list: associated schedule times (list[str])
    """
    return emulate()

etapes, horaire = obtenir_routine_sommeil("une routine du soir pour bien dormir")
print("Etapes:", etapes)
print("Horaire:", horaire)
Expected behaviour
emulate() parses the LLM response and returns a valid Tuple[List[str], List[str]].
Actual behaviour
ValueError: Impossible de convertir la réponse du LLM vers le type typing.Tuple[typing.List[str], typing.List[str]].

=== Réponse du LLM ===
(
    [
        "Préparer la chambre et le lit (couverture, oreiller)",
        "Couper les écrans et réduire les stimulations lumineuses",
        "Réaliser 10 minutes de respiration calme ou relaxation",
        "Se mettre au lit et fermer les yeux",
        "Dormir en continu jusqu'au réveil"
    ],
    [
        "20:00 - Début de la routine du soir",
        "20:30 - Arrêt des activités intenses et écrans",
        "21:00 - Extinction des lumières",
        "21:15 - Mise au lit",
        "07:00 - Réveil programmé"
    ]
)

=== Détail de l'erreur ===
Type global invalide ou non parsable.
Raison: Native parsing failed: None
Heuristic parsing failed: Tuple length mismatch: expected 2, got 5
Semantic parsing failed: Could not parse tuple from string to <class 'tuple'>:
  unexpected character after line continuation character
Knowledge parsing failed: None
Analysis
- Native parsing fails (returns None)
- Heuristic parsing splits on commas/newlines inside nested lists, counting 5 items instead of 2
- Semantic parsing fails on multi-line formatting
- Knowledge parsing also fails
The LLM output is valid Python that ast.literal_eval can parse correctly. The issue lies in OpenHosta's parsing strategies not handling:
1. Multi-line tuples with nested lists
2. Accented UTF-8 characters in strings
3. ast.literal_eval could be added as a robust fallback
Environment
- OpenHosta: 4.3.1
- Python: 3.13
- OS: Windows 11

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions