feat(realtime): Add audio conversations #6245
base: master
Conversation
It's not clear to me if we have audio support in llama.cpp: ggml-org/llama.cpp#15194
My initial thought on this was to use the whisper backend to transcribe the VAD-detected speech and feed the text to a text-to-text backend, so we can always fall back on this approach. There is also an interface created exactly for this, so the pipeline can be seen as "drag and drop" until omni models are really capable. However, audio input actually is supported by llama.cpp and our backends: try qwen2-omni and you will be able to give it audio as input, though it isn't super accurate (better at transcribing, for now).
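For reference, passing audio to a chat model like qwen2-omni would presumably use the OpenAI-style `input_audio` content part; the request shape below follows the OpenAI chat completions convention, and whether LocalAI accepts exactly these fields is an assumption, not something confirmed in this thread:

```json
{
  "model": "qwen2-omni",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is being said in this recording?" },
        {
          "type": "input_audio",
          "input_audio": { "data": "<base64-encoded wav>", "format": "wav" }
        }
      ]
    }
  ]
}
```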
OK, I tried Qwen 2 omni and hit issues with accuracy and context length, which aren't a problem for a pipeline.
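The pipeline approach discussed above can be sketched roughly as two swappable stages behind small interfaces. The type and method names here are illustrative only, not LocalAI's actual API; each stage could be backed by whisper and any text-to-text backend until omni models are good enough to replace the whole chain:

```go
package main

import "fmt"

// Transcriber turns an audio segment (already cut out by VAD) into text,
// e.g. backed by the whisper backend. Hypothetical interface.
type Transcriber interface {
	Transcribe(segment []byte) (string, error)
}

// ChatModel produces a text reply, e.g. any text-to-text backend.
type ChatModel interface {
	Reply(prompt string) (string, error)
}

// Pipeline chains the two stages; either can be swapped independently.
type Pipeline struct {
	STT Transcriber
	LLM ChatModel
}

func (p *Pipeline) Process(segment []byte) (string, error) {
	text, err := p.STT.Transcribe(segment)
	if err != nil {
		return "", fmt.Errorf("transcribe: %w", err)
	}
	return p.LLM.Reply(text)
}

// Trivial stand-ins so the sketch runs without any real backend.
type echoSTT struct{}

func (echoSTT) Transcribe(b []byte) (string, error) { return string(b), nil }

type echoLLM struct{}

func (echoLLM) Reply(s string) (string, error) { return "you said: " + s, nil }

func main() {
	p := &Pipeline{STT: echoSTT{}, LLM: echoLLM{}}
	out, _ := p.Process([]byte("hello"))
	fmt.Println(out)
}
```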
OpenAI has made quite a few changes to the API which it might have been better to handle before this, but there are also changes in flight to the Go realtime API library AFAICT. I really want to get something working, so I am ignoring these changes for now and will address them afterwards.
And it works. There is a long list of issues, but I have the full pipeline working.
To be clear, probably nobody will want to use this in its current state, but we could merge it for my own experimentation and so I don't have to keep rebasing on master. Next I need to update the API to the current OpenAI GA. @mudler
Build error: "E: Failed to fetch http://security.ubuntu.com/ubuntu/pool/main/libc/libcaca/libcaca0_0.99.beta20-4ubuntu0.1_amd64.deb 404 Not Found [IP: 91.189.91.83 80]". Strangely, I can download this file myself.
Signed-off-by: Richard Palethorpe <[email protected]>
Description
Add enough realtime API features to allow talking with an LLM using only audio.
Presently the realtime API only supports transcription, which is a minor use-case for it. This PR should allow it to be used with a basic voice assistant.
This PR ignores many of the options and edge cases; for example, it relies solely on server-side VAD to commit conversation items.
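For context, with server-side VAD the client enables turn detection through a `session.update` event. The shape below follows the OpenAI realtime (beta) API; which of these fields this PR actually honours is an assumption, not a documented guarantee:

```json
{
  "type": "session.update",
  "session": {
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    }
  }
}
```

With `server_vad` enabled, the server detects the end of speech and commits the audio buffer itself, so the client only needs to stream audio rather than manage commit events.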
Notes for Reviewers
Fixes: #3714 (but we'll need follow-up issues)
Signed commits