Description of the bug:
The GA versions of both models completely hallucinate timestamps when transcribing audio.
Actual vs expected behavior:
The timestamps should accurately reflect when each word or phrase was spoken. The preview versions of both models were excellent at this. After going GA, the same models completely hallucinate timestamps.
Any other information you'd like to share?
Interestingly, the problem is limited to audio; it does not occur when extracting timestamps from video.
You can reproduce this by transcribing any audio file and comparing the timestamp accuracy against the same test on a video.
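
To put a number on the comparison, something like the sketch below might help. It does not call any model API (the report doesn't say which SDK or endpoint was used); it only assumes you have pasted in the timestamps a model returned alongside timestamps you checked by hand, both as "MM:SS" or "HH:MM:SS" strings in the same order. The function names `to_seconds`, `timestamp_errors`, and `mean_abs_error` are made up for illustration.

```python
def to_seconds(stamp: str) -> float:
    """Convert "MM:SS" or "HH:MM:SS" (fractional seconds allowed) to seconds."""
    seconds = 0.0
    for part in stamp.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def timestamp_errors(predicted, reference):
    """Absolute error in seconds for each pair, matched by position."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return [abs(to_seconds(p) - to_seconds(r)) for p, r in zip(predicted, reference)]


def mean_abs_error(predicted, reference):
    """Mean absolute timestamp error in seconds (0.0 for empty input)."""
    errors = timestamp_errors(predicted, reference)
    return sum(errors) / len(errors) if errors else 0.0
```

Running `mean_abs_error(model_stamps, hand_checked_stamps)` once for an audio file and once for a video with the same speech should make the gap between the two cases easy to see.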