Add a field in the image dialog to define/update the image caption.
The main difficulty is not handling raw text but supporting wiki syntax. At a minimum, the caption text should be converted to wiki syntax before it is displayed in the image dialog.
As a bonus, the field could be a rich text editor that allows inline content only.
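A minimal sketch of the minimum conversion, written in Python for illustration only. It assumes the caption is held as inline HTML and that the target is an XWiki-style syntax (`**bold**`, `//italic//`, `[[label>>target]]`). Neither assumption comes from this description, and `CaptionToWiki` and `caption_to_wiki` are hypothetical names. A real implementation would more likely reuse the editor's existing HTML-to-wiki converter.

```python
from html.parser import HTMLParser

# Assumed mapping from inline HTML tags to wiki markers (illustrative only).
INLINE_MARKERS = {"strong": "**", "b": "**", "em": "//", "i": "//"}


class CaptionToWiki(HTMLParser):
    """Hypothetical converter: inline HTML caption -> wiki syntax string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.hrefs = []  # stack of link targets for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_MARKERS:
            self.parts.append(INLINE_MARKERS[tag])
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.parts.append("[[")
        # Any other tag (including block-level ones) is dropped; its text is kept.

    def handle_endtag(self, tag):
        if tag in INLINE_MARKERS:
            self.parts.append(INLINE_MARKERS[tag])
        elif tag == "a" and self.hrefs:
            self.parts.append(">>" + self.hrefs.pop() + "]]")

    def handle_data(self, data):
        self.parts.append(data)


def caption_to_wiki(html_caption):
    """Return the wiki-syntax text to show in the caption field."""
    parser = CaptionToWiki()
    parser.feed(html_caption)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

For example, `caption_to_wiki('A <strong>nice</strong> view')` yields `A **nice** view`. The sketch does not escape wiki special characters that already appear in the caption text, which a real converter would need to handle.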