LLM AI Integration / LLMAI-61

Implement an evaluation framework

Details

    • New Feature
    • Resolution: Done
    • Major
    • 0.3
    • 0.2.1
    • None
    • Unknown

    Description

      We need an evaluation framework to test how well the system works for different types of tasks.

      We need to develop or use a framework that supports:

      • List(s) of tasks/prompts for different types of tasks
      • Example content to index as context for these tasks
      • Storing answers of the LLM
      • Automated evaluation of the answers with another LLM
      • Manual evaluation by a human
      • Storing evaluation results
      • Generating visualizations of how well the LLM performed on different tasks
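
      The requirements above could be sketched roughly as follows. This is only a minimal illustration, not the framework that was implemented for this issue: every name in it (Task, Result, run_evaluation, mean_score_by_type, results_to_json, fake_llm, fake_judge) is hypothetical, the LLM under test and the judging LLM are replaced by stub functions, and scores are assumed to lie between 0 and 1.

```python
from __future__ import annotations

import json
from collections import defaultdict
from dataclasses import asdict, dataclass, field
from statistics import mean
from typing import Callable, Optional


@dataclass
class Task:
    task_type: str                                     # e.g. "qa" or "summary"
    prompt: str
    context: list[str] = field(default_factory=list)   # example content to index


@dataclass
class Result:
    task_type: str
    prompt: str
    answer: str                            # stored answer of the LLM
    auto_score: Optional[float] = None     # score assigned by the judging LLM
    manual_score: Optional[float] = None   # score assigned by a human reviewer


def run_evaluation(
    tasks: list[Task],
    answer_fn: Callable[[str, list[str]], str],
    judge_fn: Callable[[str, str], float],
) -> list[Result]:
    """Ask the system under test for each answer, then let the judge score it."""
    results = []
    for task in tasks:
        answer = answer_fn(task.prompt, task.context)
        score = judge_fn(task.prompt, answer)
        results.append(Result(task.task_type, task.prompt, answer, auto_score=score))
    return results


def mean_score_by_type(results: list[Result]) -> dict[str, float]:
    """Average score per task type; a manual score overrides the automated one."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for r in results:
        score = r.manual_score if r.manual_score is not None else r.auto_score
        if score is not None:
            grouped[r.task_type].append(score)
    return {task_type: mean(scores) for task_type, scores in grouped.items()}


def results_to_json(results: list[Result]) -> str:
    """Serialize answers and scores so they can be stored and compared later."""
    return json.dumps([asdict(r) for r in results], indent=2)


# Stubs standing in for the real LLM and the judging LLM (assumptions).
def fake_llm(prompt: str, context: list[str]) -> str:
    return context[0] if context else "I don't know"


def fake_judge(prompt: str, answer: str) -> float:
    return 0.0 if answer == "I don't know" else 1.0


tasks = [
    Task("qa", "What is X?", ["X is a wiki."]),
    Task("qa", "What is Y?"),
    Task("summary", "Summarize the page.", ["Short text."]),
]
results = run_evaluation(tasks, fake_llm, fake_judge)
results[1].manual_score = 0.5   # a human reviewer overrides the automated score
summary = mean_score_by_type(results)
stored = results_to_json(results)
```

      The per-type averages in summary are the kind of data a chart of LLM performance per task type could be built from; plotting is left out here to keep the sketch free of dependencies.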

          People

            ppantiru Paul Pantiru
            MichaelHamann Michael Hamann