-
Notifications
You must be signed in to change notification settings - Fork 120
chore: agent daily test #554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughA new comprehensive test suite for Deepgram agent websocket functionality has been added, including a parameterized test covering basic conversations, fallback providers, function calls, and agent message injection (with some cases currently disabled). Supporting configuration and event log JSON files were introduced for each scenario, capturing agent setups, injected messages, expected events, and error cases. The test suite validates event flows, error handling, and SDK compliance with API specifications. Changes
Sequence Diagram(s)sequenceDiagram
participant TestSuite
participant DeepgramClient
participant WebSocket
participant Agent
TestSuite->>DeepgramClient: Initialize with config
DeepgramClient->>WebSocket: Connect
WebSocket-->>TestSuite: Open event
WebSocket-->>TestSuite: Welcome event
WebSocket-->>TestSuite: SettingsApplied event
alt User message injection enabled
loop For each injected user message
TestSuite->>WebSocket: Send user message
WebSocket-->>TestSuite: ConversationText event
WebSocket-->>TestSuite: AgentStartedSpeaking event
WebSocket-->>TestSuite: AgentAudioDone event
end
end
alt Function call test enabled
WebSocket-->>TestSuite: FunctionCallRequest event
TestSuite->>WebSocket: Respond to function call
WebSocket-->>TestSuite: ConversationText event
end
alt Agent message injection enabled
TestSuite->>WebSocket: Inject agent message
WebSocket-->>TestSuite: ConversationText event
end
WebSocket-->>TestSuite: Other events (Unhandled, Error, etc.)
TestSuite->>DeepgramClient: Cleanup and close connection
Possibly related PRs
Suggested reviewers
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (3.3.7)tests/daily_test/test_daily_agent_websocket.py✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
♻️ Duplicate comments (2)
tests/response_data/agent/websocket/function_call_conversation-71cdabfd-error.json (1)
2-3: Error artefact confirms config mismatch rather than SDK bugThe recorded failure is purely driven by the invalid
"method"value sent in the config.
Once the config is fixed (see previous comment), this error fixture becomes obsolete and should either be updated or removed to avoid confusion.tests/response_data/agent/websocket/comprehensive_conversation-67b42a09-error.json (1)
2-3: Same parsing error – root cause is config, not the testThis failure cascades from the uppercase
"GET"verb in the shared agent config.
Fixing the config should make this artefact unnecessary.
🧹 Nitpick comments (3)
tests/response_data/agent/websocket/fallback_providers-e16542b1-config.json (1)
38-46: Consider addingAgentStartedSpeakingtoexpected_eventsfor completeness
Even when multiple speak providers are configured, the websocket still emits aAgentStartedSpeakingevent beforeAgentAudioDone.
Including it tightens the assertion and catches regressions in TTS start-up."ConversationText", + "AgentStartedSpeaking", "AgentAudioDone"tests/response_data/agent/websocket/inject_agent_message-800037f7-events.json (1)
34-48: UnhandledHistory/EndOfThoughtevents indicate missing coverage in event routerThese messages currently fall through to the “Unhandled” bucket.
If they are expected, wire them up in the test harness to avoid polluting logs; if they are unexpected, assert they never appear.Example patch in the event dispatcher:
elif event_type == "History": handle_history(evt) elif event_type == "EndOfThought": handle_end_of_thought(evt) -else: - log_unhandled(evt)Also applies to: 58-66
tests/daily_test/test_daily_agent_websocket.py (1)
483-483: Missing line annotation marker.Line 483 appears to be missing the
~annotation marker that indicates it's a new/changed line. This might be an oversight in the file annotation.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
tests/daily_test/test_daily_agent_websocket.py(1 hunks)tests/response_data/agent/websocket/basic_conversation-a40b2785-config.json(1 hunks)tests/response_data/agent/websocket/basic_conversation-a40b2785-events.json(1 hunks)tests/response_data/agent/websocket/comprehensive_conversation-67b42a09-config.json(1 hunks)tests/response_data/agent/websocket/comprehensive_conversation-67b42a09-error.json(1 hunks)tests/response_data/agent/websocket/comprehensive_conversation-67b42a09-events.json(1 hunks)tests/response_data/agent/websocket/fallback_providers-e16542b1-config.json(1 hunks)tests/response_data/agent/websocket/fallback_providers-e16542b1-events.json(1 hunks)tests/response_data/agent/websocket/function_call_conversation-71cdabfd-config.json(1 hunks)tests/response_data/agent/websocket/function_call_conversation-71cdabfd-error.json(1 hunks)tests/response_data/agent/websocket/function_call_conversation-71cdabfd-events.json(1 hunks)tests/response_data/agent/websocket/inject_agent_message-800037f7-config.json(1 hunks)tests/response_data/agent/websocket/inject_agent_message-800037f7-error.json(1 hunks)tests/response_data/agent/websocket/inject_agent_message-800037f7-events.json(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#424
File: deepgram/clients/speak/v1/response.py:48-209
Timestamp: 2024-06-27T00:06:23.128Z
Learning: User dvonthenen prefers to defer certain suggestions, specifically regarding error handling and documentation enhancements in new data classes of `deepgram/clients/speak/v1/response.py`, and may revisit them later.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#424
File: deepgram/clients/speak/v1/response.py:48-209
Timestamp: 2024-10-09T02:19:48.728Z
Learning: User dvonthenen prefers to defer certain suggestions, specifically regarding error handling and documentation enhancements in new data classes of `deepgram/clients/speak/v1/response.py`, and may revisit them later.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#417
File: deepgram/clients/live/v1/client.py:14-14
Timestamp: 2024-10-09T02:19:48.728Z
Learning: Ignore suggestions to change import paths to local versions in test cases and examples as per user preference to use the actual `deepgram-sdk` package for testing.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#417
File: deepgram/clients/live/v1/client.py:14-14
Timestamp: 2024-06-12T18:02:10.651Z
Learning: Ignore suggestions to change import paths to local versions in test cases and examples as per user preference to use the actual `deepgram-sdk` package for testing.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-10-09T02:19:46.086Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-07-01T19:21:39.778Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.
tests/response_data/agent/websocket/comprehensive_conversation-67b42a09-config.json (1)
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#468
File: examples/text-to-speech/websocket/complete/main.py:26-30
Timestamp: 2024-09-27T15:21:34.197Z
Learning: In `examples/text-to-speech/websocket/complete/main.py`, commented-out options like `auto_flush_speak_delta` and `verbose` are intentional in examples and should not be flagged in reviews.
tests/daily_test/test_daily_agent_websocket.py (5)
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-07-01T19:21:39.778Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-10-09T02:19:46.086Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#472
File: examples/text-to-speech/websocket/simple/main.py:41-50
Timestamp: 2024-10-18T00:25:58.316Z
Learning: In the `examples/text-to-speech/websocket/simple/main.py` file, the code is intended as a simple example and may include unused handlers or placeholder code meant for illustration purposes.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#468
File: examples/text-to-speech/websocket/complete/main.py:26-30
Timestamp: 2024-09-27T15:21:34.197Z
Learning: In `examples/text-to-speech/websocket/complete/main.py`, commented-out options like `auto_flush_speak_delta` and `verbose` are intentional in examples and should not be flagged in reviews.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#472
File: examples/text-to-speech/websocket/output_to_wav/main.py:52-56
Timestamp: 2024-10-18T00:30:04.220Z
Learning: In `examples/text-to-speech/websocket/output_to_wav/main.py`, the code is intended as an example demonstrating functionality, and refactoring suggestions may not be necessary.
🔇 Additional comments (10)
tests/response_data/agent/websocket/function_call_conversation-71cdabfd-events.json (1)
18-24:Errorevent flags a schema mismatch – confirm the test captures the intended failure
The error description points to anagent.think.functions[0].methodvalue that doesn’t comply with the API.
Double-check that:
- The companion -config.json purposely reproduces this mismatch (so the test is asserting the SDK bug rather than masking a real regression).
- A matching assertion exists in the pytest to expect this
Errorevent; otherwise the daily test will silently pass even if the SDK bug is fixed.tests/response_data/agent/websocket/inject_agent_message-800037f7-error.json (1)
36-50: Unhandled event logging looks correct – no action needed
TheHistoryandEndOfThoughtmessages are intentionally recorded asUnhandled, matching the current SDK limitations mentioned in the PR description.tests/response_data/agent/websocket/inject_agent_message-800037f7-config.json (1)
24-25: Position oflanguagekey – verify against latest Agent config schema
Some SDK versions requirelanguageat the top level of the request, not nestled insideagent_config. Confirm this matches the current contract; otherwise the agent may silently fall back to default language settings.tests/response_data/agent/websocket/comprehensive_conversation-67b42a09-config.json (2)
39-46: Event ordering looks realistic – no issues spotted
The sequence covers all critical phases (welcome → settings → text → function call → speech). Good coverage.
11-17: Confirm lowercase HTTP verb in websocket config
I couldn’t locate a central enum or validator for HTTP methods in the SDK—please verify whetherfunctions[].methodmust use lowercase. If so, update:• File: tests/response_data/agent/websocket/comprehensive_conversation-67b42a09-config.json
• Lines 11-17 snippet:- "method": "GET" + "method": "get"tests/response_data/agent/websocket/comprehensive_conversation-67b42a09-events.json (1)
18-25: Event log captured an early parse error instead of normal flowOnce the method casing is corrected you should expect a
SettingsAppliedevent next, followed by the rest of the conversation. Consider regenerating this fixture after the fix to keep the test data aligned with expected behaviour.tests/response_data/agent/websocket/fallback_providers-e16542b1-events.json (1)
34-48: Unhandled events are expected per PR objectives.The test data correctly captures the
HistoryandEndOfThoughtevents as "Unhandled" events. This aligns with the PR objectives stating these event types are planned to be addressed in future releases.Also applies to: 59-65, 85-91, 102-116, 134-140, 160-166
tests/response_data/agent/websocket/basic_conversation-a40b2785-config.json (1)
37-39: Test configuration correctly reflects current SDK limitations.The
test_inject_agent_messageis appropriately set tofalse, which aligns with the PR objectives mentioning the missing implementation of theinject_agent_messagemethod (issue #553).tests/daily_test/test_daily_agent_websocket.py (2)
83-87: Clear documentation of disabled tests due to known issues.Good documentation explaining why the
function_call_conversationandinject_agent_messagetests are temporarily disabled. The TODO clearly indicates these should be re-enabled once the bugs are fixed.
186-261: Excellent implementation of SDK bug detection for function calls.The function call handler thoroughly documents and tests for the known SDK/API mismatch (issue #528). The approach of detecting the bug while still attempting to use the SDK's current (incorrect) structure is pragmatic and provides valuable diagnostic information.
tests/response_data/agent/websocket/inject_agent_message-800037f7-error.json
Show resolved
Hide resolved
tests/response_data/agent/websocket/inject_agent_message-800037f7-config.json
Show resolved
Hide resolved
tests/response_data/agent/websocket/function_call_conversation-71cdabfd-config.json
Show resolved
Hide resolved
tests/response_data/agent/websocket/function_call_conversation-71cdabfd-config.json
Show resolved
Hide resolved
|
I guess I can't really tell what this is doing, can you explain? |
|
@lukeocodes Sure, the intent is offer a Daily Test that test the Agent end to end, similar to the other Daily Tests we have for the other Deepgram APIs in the Python SDK. What This Test DoesThis is a comprehensive daily validation test for our Agent WebSocket functionality. Think of it as a "health check" that runs against the live API to make sure everything is working as expected. 🎯 Primary PurposeInstead of waiting for users to report bugs, this test proactively:
🔍 What It Actually TestsBasic Scenario: "Can I have a normal conversation with the agent?"
Advanced Scenario: "Do fallback providers work?"
Note: We can add more advanced scenarios to it if we like. 📊 Real Results From Last Run |
|
Code Review: @jpvajda
|
|
@lukeocodes I made some improvements: TL;DR: Daily Agent WebSocket Test Code Review Improvements
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/daily_test/test_daily_agent_websocket.py (2)
422-428: Improve timeout handling with more specific error messages.The connection establishment timeout uses a basic counter that could be improved for better debugging.
- # Wait for connection establishment with timeout - timeout = 0 - while not connection_established and timeout < 15: - time.sleep(0.5) - timeout += 1 - - assert connection_established, f"Test ID: {unique} - Should receive Open event within 15 seconds" + # Wait for connection establishment with timeout + timeout_seconds = 15 + start_time = time.time() + while not connection_established and (time.time() - start_time) < timeout_seconds: + time.sleep(0.5) + + elapsed_time = time.time() - start_time + assert connection_established, f"Test ID: {unique} - Should receive Open event within {timeout_seconds} seconds (waited {elapsed_time:.1f}s)"
464-466: Verify inject_agent_message method availability before calling.Since this method is known to be missing (issue #553), consider adding a hasattr check to provide more informative error handling.
+ # Check if method exists before calling (known SDK limitation #553) + if not hasattr(dg_connection, 'inject_agent_message'): + print(f"⚠️ inject_agent_message method not available in SDK (issue #553)") + continue + options = InjectAgentMessageOptions(message=message) inject_success = dg_connection.inject_agent_message(options)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
tests/daily_test/test_daily_agent_websocket.py(1 hunks)tests/response_data/agent/websocket/basic_conversation-a40b2785-events.json(1 hunks)tests/response_data/agent/websocket/basic_conversation-e5062e8c-config.json(1 hunks)tests/response_data/agent/websocket/basic_conversation-e5062e8c-error.json(1 hunks)tests/response_data/agent/websocket/basic_conversation-e5062e8c-events.json(1 hunks)tests/response_data/agent/websocket/fallback_providers-e16542b1-events.json(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- tests/response_data/agent/websocket/basic_conversation-e5062e8c-events.json
- tests/response_data/agent/websocket/basic_conversation-e5062e8c-config.json
- tests/response_data/agent/websocket/basic_conversation-e5062e8c-error.json
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/response_data/agent/websocket/basic_conversation-a40b2785-events.json
- tests/response_data/agent/websocket/fallback_providers-e16542b1-events.json
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#424
File: deepgram/clients/speak/v1/response.py:48-209
Timestamp: 2024-10-09T02:19:48.728Z
Learning: User dvonthenen prefers to defer certain suggestions, specifically regarding error handling and documentation enhancements in new data classes of `deepgram/clients/speak/v1/response.py`, and may revisit them later.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#424
File: deepgram/clients/speak/v1/response.py:48-209
Timestamp: 2024-06-27T00:06:23.128Z
Learning: User dvonthenen prefers to defer certain suggestions, specifically regarding error handling and documentation enhancements in new data classes of `deepgram/clients/speak/v1/response.py`, and may revisit them later.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#417
File: deepgram/clients/live/v1/client.py:14-14
Timestamp: 2024-10-09T02:19:48.728Z
Learning: Ignore suggestions to change import paths to local versions in test cases and examples as per user preference to use the actual `deepgram-sdk` package for testing.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#417
File: deepgram/clients/live/v1/client.py:14-14
Timestamp: 2024-06-12T18:02:10.651Z
Learning: Ignore suggestions to change import paths to local versions in test cases and examples as per user preference to use the actual `deepgram-sdk` package for testing.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-07-01T19:21:39.778Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-10-09T02:19:46.086Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.
tests/daily_test/test_daily_agent_websocket.py (5)
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-07-01T19:21:39.778Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#426
File: deepgram/clients/listen/v1/websocket/__init__.py:8-8
Timestamp: 2024-10-09T02:19:46.086Z
Learning: Unused imports in `deepgram/clients/listen/v1/websocket/__init__.py` are retained to maintain backward compatibility and should not be flagged for removal in reviews.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#468
File: examples/text-to-speech/websocket/complete/main.py:26-30
Timestamp: 2024-09-27T15:21:34.197Z
Learning: In `examples/text-to-speech/websocket/complete/main.py`, commented-out options like `auto_flush_speak_delta` and `verbose` are intentional in examples and should not be flagged in reviews.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#472
File: examples/text-to-speech/websocket/simple/main.py:41-50
Timestamp: 2024-10-18T00:25:58.316Z
Learning: In the `examples/text-to-speech/websocket/simple/main.py` file, the code is intended as a simple example and may include unused handlers or placeholder code meant for illustration purposes.
Learnt from: dvonthenen
PR: deepgram/deepgram-python-sdk#472
File: examples/text-to-speech/websocket/output_to_wav/main.py:52-56
Timestamp: 2024-10-18T00:30:04.220Z
Learning: In `examples/text-to-speech/websocket/output_to_wav/main.py`, the code is intended as an example demonstrating functionality, and refactoring suggestions may not be necessary.
🔇 Additional comments (7)
tests/daily_test/test_daily_agent_websocket.py (7)
1-24: Clean import organization and proper licensing.The imports are well-organized and include all necessary components for comprehensive WebSocket testing. The licensing header follows proper format.
27-170: Well-structured test configurations with clear documentation.The test cases provide excellent coverage of different scenarios:
- Basic conversation flow
- Fallback provider testing
- Function calling with known SDK bugs
- Agent message injection
The configurations properly document expected vs conditional events and include helpful descriptions for each test case.
192-196: Excellent use of pytest.xfail for known issues.This approach properly marks tests as expected failures while documenting the specific GitHub issues (#553, #528) that need to be resolved. This prevents false negatives in CI while maintaining visibility of known problems.
279-354: Comprehensive function call validation with SDK bug detection.The function call handler excellently detects and documents the mismatch between the current SDK structure (flat fields like
function_name,function_call_id) and the official API specification (requiresfunctionsarray). This provides valuable debugging information while still attempting to work with the current SDK implementation.
569-580: Excellent error handling with comprehensive metadata capture.The exception handling properly captures all relevant test state (events, function calls, bugs, etc.) and saves it for debugging purposes. This will be invaluable for diagnosing issues in daily test runs.
621-630: Clear documentation of expected SDK bugs.The final reporting section excellently documents when SDK bugs are detected while clarifying that this is expected behavior for certain test cases. This prevents confusion about test results.
234-241: Fallback to environment variables works as intendedDeepgramClient’s
__init__first applies defaults from the passedconfig(which has emptyapi_key/access_token), then—since both remain empty—falls back toDEEPGRAM_ACCESS_TOKENand finallyDEEPGRAM_API_KEYfrom the environment. No changes required.
📋 PR Summary Highlights:
What the Test Successfully Validates ✅
Critical Issues Identified 🚨
inject_agent_messagemethod - Complete implementation gap. >> 🐛 see Inject agent message missing method #553HistoryandEndOfThoughtevents >> Will be handle in a future release to support new functionalityStrategic Value 💡
Summary by CodeRabbit