Details

Type: Task
Resolution: Fixed
Priority: Major
Fix Version/s: 0.6
Affects Version/s: 0.3.1
Labels: None
Difficulty: Unknown
Documentation: https://design.xwiki.org/xwiki/bin/view/Proposal/X-AI/WAISE/Evaluation%20Methodology/
Description

The benchmark we created needs to be executed and its results compiled. This requires the following steps:

- Select a list of LLMs to evaluate
- Execute the benchmark
- Check the results, making sure that the automated evaluation is good
- Document the results
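
One possible way to approach the "check the results" step is to compare the automated verdicts against a small manually reviewed sample and compute an agreement rate per model. The following is only a minimal sketch under stated assumptions: the record layout, the function name `agreement_by_model`, and the model names are hypothetical and are not taken from the benchmark itself, and the sample data is a placeholder rather than real results.

```python
from collections import defaultdict


def agreement_by_model(records):
    """Return the fraction of records per model where the automated
    verdict matches the manual verdict.

    `records` is an iterable of (model, automated_pass, manual_pass)
    tuples. This layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for model, automated_pass, manual_pass in records:
        totals[model] += 1
        if automated_pass == manual_pass:
            matches[model] += 1
    return {model: matches[model] / totals[model] for model in totals}


# Placeholder data, not actual benchmark output.
sample = [
    ("model-a", True, True),
    ("model-a", False, True),
    ("model-b", True, True),
    ("model-b", False, False),
]

rates = agreement_by_model(sample)
for model, rate in sorted(rates.items()):
    print(f"{model}: {rate:.0%} agreement with manual review")
```

A low agreement rate for a model would suggest that the automated evaluation needs a closer look before its results are documented.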

People

Assignee: Paul Pantiru

Reporter: Michael Hamann

Dates

Created: 20/Jun/24 11:25

Updated: 13/Aug/24 21:00

Resolved: 13/Aug/24 21:00

Date of First Response: 13/Aug/24 9:00 PM