Tom Larkworthy 2024-10-09 06:31:01 Just got the Realtime API working at observablehq.com/@tomlarkworthy/openai-realtime-api in a pure-browser (and forkable) implementation. It was quite complicated getting the audio working in a browser; there is a vid inside. I was building this for my daughter so she could have a decent cyber pet/tutor, but the running costs are actually insane: I spent $90 in 2 days building this 😕 So I think I will actually switch to push-to-talk interaction for the actual thing.
Jason Morris 2024-10-09 17:17:46 I was a little disappointed in the whole thing, TBH. The fact you can interrupt it is nice, but getting it to realize where it was interrupted is a minor nightmare, and it still doesn't have the ability to interrupt you, and its responses are designed to be read, not spoken. By comparison, the podcast feature of Google NotebookLM is much more natural, so realtime ends up feeling clunky. Good if you want an automated service bot to answer phone calls, I suppose, but it didn't make me want to talk to my computer at all.
Tom Larkworthy 2024-10-09 17:45:41 It can use tools though, so it's maybe the control plane for something else that is realtime and high dimensional (e.g. Star Trek ship computer). The Google thing is also very cool and thought provoking, but it's also a 1-way channel of information. Not entirely sure people will want to actually provide cultural space for AI bots to chat to each other for any amount of time; I can imagine the novelty will wear off quite fast. I can see an AI butler being useful basically forever (when the price is sane).
Jason Morris 2024-10-09 18:03:57 Yeah, I just mean the conversation is more realistic. I did try it with tools, and like anywhere else that made it more compelling for specific uses. But tools are not new, only realtime voice is new. I see uses in tutorials and software help systems, and for audio-centric platforms like phone and conferencing. And I could see it evolving so that canvas is the only thing you actually look at, and the chat window disappears, but I don't want to. I like being able to do things in public, without wearing headphones or talking to myself, and to be able to edit text and copy and paste things. I'm already using a computer, why deprive me of the ability to see what I'm doing? Why make the communication synchronous, again? I don't "get" it.