Details
- New Feature
- Resolution: Done
- Major
- 0.2.1
- None
- Unknown
Description
We need an evaluation framework to measure how well the system performs on different types of tasks.
We should build or adopt a framework that supports the following (a rough sketch follows the list):
- List(s) of tasks/prompts for different types of tasks
- Example content to index as context for these tasks
- Storing the LLM's answers
- Automated evaluation of the answers by another LLM
- Manual evaluation by a human
- Storing evaluation results
- Generating visualizations of how well the LLM performed on different tasks
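
As a starting point, here is a minimal sketch of what the core data model and run loop could look like. It assumes Python; the callables `ask_llm` and `judge_with_llm` are placeholders for whatever LLM client and judge model we end up using, and the JSON-on-disk storage is just one possible backend -- none of these names come from this issue.

```python
"""Sketch of the evaluation harness: tasks in, answers and scores stored per task."""
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Task:
    """A single prompt belonging to a task category (e.g. summarization, Q&A)."""
    task_id: str
    category: str
    prompt: str
    context_docs: list[str] = field(default_factory=list)  # example content to index as context


@dataclass
class EvaluationResult:
    """The LLM's answer plus automated and manual scores."""
    task_id: str
    answer: str
    auto_score: Optional[float] = None   # filled in by the judge LLM
    human_score: Optional[float] = None  # filled in during manual review
    notes: str = ""


def run_evaluation(
    tasks: list[Task],
    ask_llm: Callable[[str, list[str]], str],     # placeholder: (prompt, context) -> answer
    judge_with_llm: Callable[[str, str], float],  # placeholder: (prompt, answer) -> score in [0, 1]
    out_dir: Path,
) -> list[EvaluationResult]:
    """Run every task, score the answers automatically, and persist the results."""
    out_dir.mkdir(parents=True, exist_ok=True)
    results: list[EvaluationResult] = []
    for task in tasks:
        answer = ask_llm(task.prompt, task.context_docs)
        score = judge_with_llm(task.prompt, answer)
        result = EvaluationResult(task_id=task.task_id, answer=answer, auto_score=score)
        # Store each result so a human reviewer can later add human_score and notes,
        # and so results can be aggregated per category for visualization.
        (out_dir / f"{task.task_id}.json").write_text(json.dumps(asdict(result), indent=2))
        results.append(result)
    return results
```

Manual evaluation would then amount to filling in `human_score` on the stored records, and the per-category averages of `auto_score` and `human_score` can be plotted to compare how the LLM performs across task types.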