
> AI also really has trouble with transcribing my speech. I noticed that as early as the '90s with early speech recognition software. It was completely unusable.

I don't know what your transcription use cases are, but you may be able to get an improvement by fine-tuning Whisper. This would require about $4 in training costs[1], and a dataset with 5-10 hours of your labeled (transcribed) speech, which may be the bigger hurdle[2].

1. 2000 steps took me 6 hours on an A100 on Colab, fine-tuning openai/whisper-large-v3 on 12 hours of data. I can share my notebook/script with you if you'd like.

2. I am working on a PWA that makes it simple for humans to correct the mistakes in initial, automated transcriptions and feed the corrected dataset back into the pipeline for fine-tuning, but it's not ready yet.
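For anyone wondering what the labeled dataset amounts to in practice, here is a minimal sketch of a manifest of (audio, transcript, duration) records with a check on total hours. The `Clip` type, the field names, and the 30-second cap are illustrative assumptions, not the format used by the notebook mentioned above (Whisper does process audio in 30-second windows, which is why the cap is a common choice).

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Clip:
    audio_path: str    # hypothetical path to a recorded clip
    transcript: str    # human-corrected transcription of that clip
    duration_s: float  # clip length in seconds


def usable(clips, max_clip_s=30.0):
    """Keep clips that have a non-empty transcript and fit the assumed length cap."""
    return [
        c for c in clips
        if c.transcript.strip() and 0 < c.duration_s <= max_clip_s
    ]


def total_hours(clips):
    """Sum clip durations and convert seconds to hours."""
    return sum(c.duration_s for c in clips) / 3600.0


def to_jsonl(clips):
    """Serialize clips as one JSON object per line (a common manifest layout)."""
    return "\n".join(json.dumps(asdict(c)) for c in clips)
```

Running `total_hours(usable(clips))` over your recordings tells you how far you are from the 5-10 hour range before spending anything on training.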



Any chance you could github your script for public use anyway?

It's an interesting self-contained example


We have a PWA for this at:

https://www.psyome.com/annotator


It is desktop-only - do you have plans to support mobile browsers? My PWA is mobile-first.


FWIW, it might be usable on mobile, but I haven't tested it.


Oh nice! No plans at the moment


> with 5-10 hours of your labeled (transcribed) speech, which may be the bigger hurdle[2].

Can’t you just read from a known script?
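A rough sketch of what that could look like: split a known script into sentence-sized prompts to read aloud, then estimate how many hours of audio they would yield. The function names and the 150 words-per-minute reading rate are assumptions for illustration; your actual pace will differ.

```python
import re


def split_into_prompts(script_text):
    """Split a script into sentence-sized prompts on ., ! or ? followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", script_text.strip())
    return [s for s in sentences if s]


def estimated_hours(prompts, words_per_minute=150.0):
    """Estimate recording time from word count at an assumed reading rate."""
    words = sum(len(p.split()) for p in prompts)
    return words / words_per_minute / 60.0
```

Since each prompt is already the ground-truth transcript, the recorded clips come pre-labeled. At the assumed rate, 5 hours of speech is about 45,000 words of script.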



