Details
- New Feature
- Resolution: Done
- Major
- 0.3.1
- None
- Unknown
Description
In the LLM benchmark, we need to measure the energy consumption of the different tasks. For this, we should measure energy consumption on the inference server and attribute it to the tasks we execute based on their running time. Measuring this exactly seems hard, so we should probably work with average values and try to come up with estimates of the energy consumed per input and output token.
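A minimal sketch of that attribution logic, under the assumption that we sample server power at a fixed interval and split total energy across tasks proportionally to running time (all function and variable names here are hypothetical, not part of the benchmark yet):

# Sketch only: attribute measured server energy to tasks by running time,
# then derive a rough average energy-per-token estimate.

def total_energy_joules(power_samples_watts, sample_interval_s):
    """Integrate periodic power readings into total energy (joules)."""
    return sum(power_samples_watts) * sample_interval_s

def energy_per_task(total_energy_j, task_runtimes_s):
    """Split total energy proportionally to each task's running time."""
    total_runtime = sum(task_runtimes_s.values())
    return {task: total_energy_j * runtime / total_runtime
            for task, runtime in task_runtimes_s.items()}

def average_energy_per_token(task_energy_j, task_token_counts):
    """Average joules per token (input and output tokens combined)."""
    return {task: task_energy_j[task] / task_token_counts[task]
            for task in task_energy_j}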
We should also compare our measurements to publicly reported performance, in particular for parallel requests. When a publication reports a certain number of tokens per second on a certain GPU, we can derive an upper bound on the energy consumed per token from that GPU's maximum power consumption and the reported throughput.
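The upper-bound calculation itself is simple; the sketch below shows it with placeholder numbers (the 700 W and 1000 tokens/s values are illustrative, not real measurements or a specific GPU):

def max_energy_per_token_joules(gpu_max_power_watts, tokens_per_second):
    # Upper bound: assume the GPU draws its maximum power for the
    # entire time it takes to produce one token.
    return gpu_max_power_watts / tokens_per_second

# Placeholder example: a GPU with a 700 W power limit serving
# 1000 tokens/s consumes at most 0.7 J per token.
print(max_energy_per_token_joules(700, 1000))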