> AI also really has trouble with transcribing my speech. I noticed that as early as the '90s with early speech recognition software. It was completely unusable.
I don't know what your transcription use cases are, but you may be able to get an improvement by fine-tuning Whisper. This would require about $4 in training costs[1], and a dataset with 5-10 hours of your labeled (transcribed) speech, which may be the bigger hurdle[2].
1. 2000 steps took me 6 hours on an A100 on Colab, fine-tuning openai/whisper-large-v3 on 12 hours of data. I can share my notebook/script with you if you'd like.
2. I am working on a PWA that makes it simple for humans to correct the mistakes in initial, automated transcriptions and feed the corrected dataset back into the pipeline for fine-tuning, but it's not ready yet.
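
Since the dataset is likely the bigger hurdle, here is a rough sketch for checking how many hours of labeled speech you have so far. It assumes a purely hypothetical layout where every `clip.wav` has its transcript in a sibling `clip.txt`; the function names and the 5-hour default are illustrative, not part of Whisper or any fine-tuning library.

```python
import wave
from pathlib import Path


def labeled_hours(dataset_dir):
    """Sum the duration, in hours, of WAV clips that have a transcript next to them."""
    total_seconds = 0.0
    for wav_path in sorted(Path(dataset_dir).rglob("*.wav")):
        # Assumed convention: clip.wav is labeled by clip.txt in the same folder.
        if not wav_path.with_suffix(".txt").is_file():
            continue  # unlabeled audio does not count toward the fine-tuning set
        with wave.open(str(wav_path), "rb") as clip:
            total_seconds += clip.getnframes() / clip.getframerate()
    return total_seconds / 3600.0


def enough_for_finetuning(hours, minimum_hours=5.0):
    """True once the labeled audio reaches the low end of the 5-10 hour range."""
    return hours >= minimum_hours
```

Calling `labeled_hours("my_recordings")` on your own folder (the path is a placeholder) gives a number you can compare against the 5-10 hour range before paying for any GPU time.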