Details
Task
Resolution: Fixed
Major
0.3.1
None
Description
The benchmark we created needs to be executed and its results compiled. This involves the following steps:
- Select a list of LLMs to evaluate
- Execute the benchmark
- Check the results to verify that the automated evaluation scores outputs correctly
- Document the results
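
The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual harness: the function names (`run_benchmark`, `summarize`, `sample_for_review`, `exact_match`), the task record layout, and the stub models are all hypothetical. It assumes each model can be wrapped as a callable that takes a prompt string and returns a string, and that the automated evaluation is a scoring function returning a number.

```python
import random
from typing import Callable, Dict, List


def exact_match(output: str, expected: str) -> float:
    """Hypothetical automated evaluator: 1.0 on exact match, else 0.0."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_benchmark(
    models: Dict[str, Callable[[str], str]],
    tasks: List[dict],
    score: Callable[[str, str], float],
) -> List[dict]:
    """Run every model on every task and return one record per pair."""
    records = []
    for name, generate in models.items():
        for task in tasks:
            output = generate(task["prompt"])
            records.append({
                "model": name,
                "task_id": task["id"],
                "output": output,
                "score": score(output, task["expected"]),
            })
    return records


def summarize(records: List[dict]) -> Dict[str, float]:
    """Compile results: mean score per model."""
    scores: Dict[str, List[float]] = {}
    for record in records:
        scores.setdefault(record["model"], []).append(record["score"])
    return {model: sum(vals) / len(vals) for model, vals in scores.items()}


def sample_for_review(records: List[dict], k: int, seed: int = 0) -> List[dict]:
    """Pick up to k records at random for a manual check of the automated scores."""
    rng = random.Random(seed)
    return rng.sample(records, min(k, len(records)))


if __name__ == "__main__":
    # Stub tasks and models, used only to show the flow end to end.
    tasks = [
        {"id": 1, "prompt": "2+2", "expected": "4"},
        {"id": 2, "prompt": "3+3", "expected": "6"},
    ]
    models = {
        "echo-stub": lambda prompt: prompt,
        "always-4-stub": lambda prompt: "4",
    }
    records = run_benchmark(models, tasks, exact_match)
    print(summarize(records))
    for record in sample_for_review(records, k=2):
        print(record)
```

Keeping the raw per-task records, rather than only the averages, makes the manual check possible: a reviewer can read the sampled outputs next to their automated scores and flag cases where the scorer is wrong before the results are documented.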