FEATURE: T2B tool calling accuracy benchmark #256

@lilijap

Is your feature request related to a problem? Please describe.
We would like to assess the precision of T2B's tool calling against state-of-the-art simulators.

Describe the solution you'd like
We would like to simulate models using the open-source Python package basico and compare the resulting time-course outputs with those obtained from T2B via tool calling, where T2B would use automatically generated user prompts to execute the simulations. Exact values at specific simulation time points should be compared to assess the precision of T2B's answers regarding the concentration of species X at time T.
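As a rough illustration of the comparison described above, here is a minimal sketch of how a prompt could be generated and a T2B answer scored against the reference value. The prompt wording, the function names `build_prompt` and `is_hit`, and the tolerance defaults are all assumptions for illustration; the issue itself asks for exact values, so the tolerances would need to be agreed on (floating-point output rarely matches bit for bit).

```python
import math

# Hypothetical prompt wording; the real benchmark may phrase this differently.
PROMPT_TEMPLATE = (
    "Simulate model {model_id} for {duration} time units and report the "
    "concentration of {species} at time {timepoint}."
)


def build_prompt(model_id, species, timepoint, duration):
    """Fill the template for one (model, species, time point) test case."""
    return PROMPT_TEMPLATE.format(
        model_id=model_id,
        species=species,
        timepoint=timepoint,
        duration=duration,
    )


def is_hit(reference, reported, rel_tol=1e-3, abs_tol=1e-9):
    """Return True if the value reported by T2B matches the reference.

    `reported` may be None when no number could be parsed from the answer;
    that case counts as a miss. The tolerances are assumed defaults.
    """
    if reported is None:
        return False
    return math.isclose(reference, reported, rel_tol=rel_tol, abs_tol=abs_tol)
```

A test case would then pair `build_prompt(...)` with the reference value from basico, and `is_hit` would classify the parsed T2B answer as a hit or a miss.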

The following operations would be tested:

  • time course simulation
  • parameter scanning
  • readout of values at a specified time point
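The time-course and readout operations could be sketched on the reference side as follows. This assumes basico's `load_biomodel` and `run_time_course` (with the `duration` and `intervals` keywords) and that the result is a pandas DataFrame indexed by time; `reference_time_course` and `value_at` are hypothetical helper names, and linear interpolation between output points is an assumed choice, not something the issue specifies. Parameter scanning is not covered by this sketch.

```python
from bisect import bisect_left


def reference_time_course(biomodel_id, duration, intervals=100):
    """Run a reference time course with basico (third-party, imported lazily).

    Assumes the model comes from BioModels; other sources would need a
    different loader.
    """
    import basico

    basico.load_biomodel(biomodel_id)
    return basico.run_time_course(duration=duration, intervals=intervals)


def value_at(times, values, t):
    """Read out a value at time t, interpolating linearly between samples.

    `times` must be sorted in ascending order and t must lie within range.
    """
    if not times or len(times) != len(values):
        raise ValueError("times and values must be non-empty and equal length")
    if t < times[0] or t > times[-1]:
        raise ValueError("t is outside the simulated time range")
    i = bisect_left(times, t)
    if times[i] == t:
        return values[i]  # exact output point, no interpolation needed
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Example usage (the species column name "X" is a placeholder):
# tc = reference_time_course(10, duration=100)
# ref = value_at(list(tc.index), list(tc["X"]), 42.0)
```

Choosing `intervals` so that every tested time point falls exactly on an output point would avoid interpolation altogether and keep the reference value closer to what the simulator reports.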

Approximately 100 models from open-source repositories will be employed to evaluate T2B performance in a high-throughput manner.

Results
The results should be presented as code in Jupyter notebook(s), including statistical analysis of T2B hit and miss rates.
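One possible shape for the hit/miss analysis is sketched below: it summarizes a list of per-test outcomes as a hit rate with a Wilson score confidence interval. The function name `hit_rate_summary` and the choice of the Wilson interval (with z = 1.96 for roughly 95% coverage) are assumptions; the issue does not prescribe a particular statistic.

```python
import math


def hit_rate_summary(outcomes, z=1.96):
    """Summarize boolean hit/miss outcomes with a Wilson score interval.

    `outcomes` is an iterable of truthy (hit) / falsy (miss) values.
    """
    outcomes = list(outcomes)
    n = len(outcomes)
    if n == 0:
        raise ValueError("no outcomes to summarize")
    hits = sum(1 for o in outcomes if o)
    p = hits / n
    # Wilson score interval for a binomial proportion.
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return {
        "n": n,
        "hits": hits,
        "misses": n - hits,
        "hit_rate": p,
        "ci_low": max(0.0, centre - half),
        "ci_high": min(1.0, centre + half),
    }
```

With around 100 models the interval stays fairly wide, so reporting it alongside the raw hit rate helps when comparing operations or prompt variants.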
