diff --git a/agent_skill.md b/agent_skill.md
new file mode 100644
index 0000000000..52c64dc8a9
--- /dev/null
+++ b/agent_skill.md
@@ -0,0 +1,2219 @@
+# Agent Skills Specification
+
+A comprehensive guide to understanding and implementing Agent Skills - an open standard for extending AI agent capabilities.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Core Concepts](#core-concepts)
+- [Directory Structure](#directory-structure)
+- [SKILL.md Specification](#skillmd-specification)
+- [Progressive Disclosure Architecture](#progressive-disclosure-architecture)
+- [Integration Approaches](#integration-approaches)
+- [LangChain/LangGraph Skills Integration](#langchainlanggraph-skills-integration)
+- [ADK Skills Integration](#adk-skills-integration)
+- [Security Considerations](#security-considerations)
+- [Reference Library (skills-ref)](#reference-library-skills-ref)
+- [Best Practices](#best-practices)
+- [Example Skills](#example-skills)
+- [Resources](#resources)
+
+---
+
+## Overview
+
+**Agent Skills** are organized folders of instructions, scripts, and resources that agents can discover and load dynamically to perform better at specific tasks. They represent a simple, open format for giving agents new capabilities and expertise.
+
+### What Problems Do Skills Solve?
+
+1. **Limited Context**: Agents don't inherently have access to specialized or organizational knowledge
+2. **Consistency**: Ad-hoc prompting makes workflows hard to repeat and audit
+3. **Reusability**: Without a shared format, capabilities must be rebuilt for every agent product
+4. **Portability**: Capabilities built for one agent tool don't carry over to other compatible tools
+
+### Key Benefits
+
+| Stakeholder | Benefit |
+|-------------|---------|
+| **Skill Authors** | Build capabilities once, deploy across multiple agent products |
+| **Compatible Agents** | Users can give agents new capabilities out of the box |
+| **Teams & Enterprises** | Capture organizational knowledge in portable, version-controlled packages |
+
+### Governance
+
+- **Originally developed by**: Anthropic
+- **Release status**: Open standard
+- **Development model**: Open to ecosystem contributions
+- **Repository**: https://github.com/agentskills/agentskills
+- **Example Skills**: https://github.com/anthropics/skills
+
+---
+
+## Core Concepts
+
+### What Is a Skill?
+
+At its core, a skill is a **folder containing a `SKILL.md` file** that provides:
+
+- **Metadata**: `name` and `description` (minimum required)
+- **Instructions**: Markdown documentation on how to perform a task
+- **Optional resources**: scripts, templates, and reference materials
+
+### Capabilities Enabled
+
+1. **Domain Expertise** - Package specialized knowledge into reusable instructions
+2. **New Capabilities** - Enable agents to create presentations, build MCP servers, analyze datasets
+3. **Repeatable Workflows** - Turn multi-step tasks into consistent, auditable workflows
+4. **Interoperability** - Reuse the same skill across different skills-compatible agent products
+
+### Code Integration
+
+Skills can include pre-written Python scripts and other code that agents execute deterministically. Running a bundled script is more reliable and more token-efficient than having the model regenerate equivalent code on the fly for operations like the following (see the sketch after this list):
+- Sorting lists
+- Extracting PDF form fields
+- Data transformation
+- File manipulation
+
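+For instance, a skill's `scripts/` directory might bundle a small, deterministic helper such as the sketch below, which lists the form fields in a PDF using the pypdf library (the script name and the pypdf dependency are illustrative, not part of the specification):
+
+```python
+#!/usr/bin/env python3
+"""List the form fields in a PDF - an illustrative bundled skill script."""
+import sys
+
+from pypdf import PdfReader  # assumed dependency; declare it in `compatibility`
+
+def main() -> int:
+    if len(sys.argv) != 2:
+        print("usage: list_form_fields.py <input.pdf>", file=sys.stderr)
+        return 1
+    fields = PdfReader(sys.argv[1]).get_fields() or {}
+    for name in sorted(fields):
+        print(name)
+    return 0
+
+if __name__ == "__main__":
+    sys.exit(main())
+```
+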
+---
+
+## Directory Structure
+
+### Minimal Structure
+
+```
+skill-name/
+└── SKILL.md # Required
+```
+
+### Full Structure with Optional Directories
+
+```
+skill-name/
+├── SKILL.md # Required: instructions + metadata
+├── scripts/ # Optional: executable code
+│ ├── extract.py
+│ └── transform.sh
+├── references/ # Optional: additional documentation
+│ ├── REFERENCE.md
+│ ├── FORMS.md
+│ └── domain-specific.md
+└── assets/ # Optional: static resources
+ ├── templates/
+ ├── images/
+ └── data/
+```
+
+### Directory Descriptions
+
+| Directory | Purpose |
+|-----------|---------|
+| `scripts/` | Executable code agents can run. Should be self-contained, include helpful error messages, and handle edge cases gracefully. Supported languages: Python, Bash, JavaScript |
+| `references/` | Additional documentation loaded on demand. Keep individual files focused for efficient context use |
+| `assets/` | Static resources: templates, images, diagrams, data files, lookup tables, schemas |
+
+---
+
+## SKILL.md Specification
+
+Every skill starts with **YAML frontmatter** followed by **Markdown content**.
+
+### Basic Format
+
+```markdown
+---
+name: skill-name
+description: A description of what this skill does and when to use it.
+---
+
+# Skill Title
+
+## When to use this skill
+Use this skill when the user needs to...
+
+## How to perform the task
+1. Step one...
+2. Step two...
+
+## Examples
+...
+```
+
+### Frontmatter Fields
+
+| Field | Required | Constraints | Description |
+|-------|----------|-------------|-------------|
+| `name` | **Yes** | Max 64 characters. Lowercase letters, numbers, and hyphens only. Must not start/end with hyphen or contain consecutive hyphens. Must match parent directory name. | Short identifier for the skill |
+| `description` | **Yes** | Max 1024 characters. Non-empty. | Describes what the skill does and when to use it (used for discovery) |
+| `license` | No | - | License name or reference to bundled license file |
+| `compatibility` | No | Max 500 characters | Environment requirements (product, system packages, network access, etc.) |
+| `metadata` | No | Arbitrary key-value mapping | Additional metadata (author, version, etc.) |
+| `allowed-tools` | No | Space-delimited list | Pre-approved tools (Experimental) |
+
+### Name Field Validation
+
+**Valid examples:**
+```yaml
+name: pdf-processing
+name: data-analysis
+name: code-review
+name: mcp-builder
+```
+
+**Invalid examples:**
+```yaml
+name: PDF-Processing # uppercase not allowed
+name: -pdf # cannot start with hyphen
+name: pdf--processing # consecutive hyphens not allowed
+name: pdf_processing # underscores not allowed
+```
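+
+These rules can be checked with a short regular expression plus length and directory-name checks. A minimal validation sketch (illustrative only; the `skills-ref` library provides the reference validator):
+
+```python
+import re
+
+# Lowercase letters and digits, separated by single hyphens.
+NAME_PATTERN = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
+
+def is_valid_skill_name(name: str, parent_dir: str) -> bool:
+    """Check the `name` frontmatter rules described above."""
+    return len(name) <= 64 and bool(NAME_PATTERN.match(name)) and name == parent_dir
+
+assert is_valid_skill_name("pdf-processing", "pdf-processing")
+assert not is_valid_skill_name("pdf--processing", "pdf--processing")
+```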
+
+### Description Field Best Practices
+
+The description should describe both **what the skill does** and **when to use it**.
+
+**Good example:**
+```yaml
+description: Extracts text and tables from PDF files, fills PDF forms, and merges multiple PDFs. Use when working with PDF documents or when the user mentions PDFs, forms, or document extraction.
+```
+
+**Poor example:**
+```yaml
+description: PDF processing # Too vague, no usage context
+```
+
+### Complete Frontmatter Example
+
+```yaml
+---
+name: pdf-processing
+description: Extract text and tables from PDF files, fill forms, merge documents. Use when the user needs to work with PDF files.
+license: Apache-2.0
+compatibility: Requires Python 3.8+, pdfplumber, and PyPDF2 packages
+metadata:
+ author: example-org
+ version: "1.0"
+ category: documents
+allowed-tools: Bash(python:*) Read Write
+---
+```
+
+### Markdown Body Guidelines
+
+- **No structural restrictions** on content format
+- Can include text instructions, code examples, workflows, and references
+- Self-documenting format allows easy auditing and improvement
+- **Recommended maximum**: Keep main `SKILL.md` under 500 lines
+- Move detailed reference material to separate files in `references/`
+
+### Recommended Body Sections
+
+```markdown
+# Skill Name
+
+## When to use this skill
+Clear criteria for when this skill should be activated.
+
+## Prerequisites
+Any required tools, packages, or access needed.
+
+## Instructions
+Step-by-step guide for performing the task.
+
+## Examples
+Concrete examples of inputs and expected outputs.
+
+## Common Edge Cases
+Known limitations and how to handle them.
+
+## File References
+Links to additional resources in the skill directory.
+```
+
+### File References
+
+Use relative paths from skill root:
+
+```markdown
+See [the reference guide](references/REFERENCE.md) for details.
+
+Run the extraction script:
+`scripts/extract.py`
+```
+
+**Recommendation**: Keep references one level deep; avoid deeply nested chains.
+
+---
+
+## Progressive Disclosure Architecture
+
+Skills use a **context-efficient, three-stage approach** to information loading:
+
+### Stage 1: Discovery (Startup)
+
+- Agents load only the `name` and `description` of available skills
+- Minimal context overhead (~100 tokens per skill)
+- Enables agents to identify relevant skills without full loading
+
+### Stage 2: Activation (Task Matching)
+
+- When a task matches a skill's description, the agent reads the full `SKILL.md`
+- Complete instructions are loaded into context
+- Recommended: Keep under 5000 tokens for the body
+
+### Stage 3: Execution (Implementation)
+
+- Agent follows instructions
+- Optionally loads referenced files from `scripts/`, `references/`, `assets/`
+- Resources loaded only when required
+
+**Benefit**: The amount of context that can be bundled into a skill is effectively unbounded since agents with filesystem access don't require everything in their context window simultaneously.
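+
+A minimal sketch of the three stages from an integrator's point of view (the helper names are illustrative, not part of the specification):
+
+```python
+from pathlib import Path
+import yaml
+
+def load_stage1(skill_dir: Path) -> dict:
+    """Stage 1: read only the frontmatter (name and description)."""
+    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
+    return yaml.safe_load(text.split("---", 2)[1])
+
+def load_stage2(skill_dir: Path) -> str:
+    """Stage 2: read the full instructions once the skill is activated."""
+    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
+    return text.split("---", 2)[2]
+
+def load_stage3(skill_dir: Path, relative_path: str) -> str:
+    """Stage 3: read a bundled script, reference, or asset only when needed."""
+    return (skill_dir / relative_path).read_text(encoding="utf-8")
+```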
+
+---
+
+## Integration Approaches
+
+To integrate Agent Skills support into your AI agent, implement five core steps:
+
+1. **Discover** skills in configured directories
+2. **Load metadata** (name and description) at startup
+3. **Match** user tasks to relevant skills
+4. **Activate** skills by loading full instructions
+5. **Execute** scripts and access resources as needed
+
+### Filesystem-Based Agents
+
+- Operate within a computer environment (bash/unix)
+- Most capable option
+- Skills activated when models issue shell commands like `cat /path/to/my-skill/SKILL.md`
+- Bundled resources accessed through shell commands
+
+### Tool-Based Agents
+
+- Function without a dedicated computer environment
+- Implement tools allowing models to trigger skills and access bundled assets
+- Specific tool implementation is up to the developer
+
+### Implementation Steps
+
+#### 1. Skill Discovery
+
+Scan configured directories for folders containing a `SKILL.md` file:
+
+```python
+import os
+
+def discover_skills(skill_dirs):
+ skills = []
+ for dir in skill_dirs:
+ for folder in os.listdir(dir):
+ skill_path = os.path.join(dir, folder, "SKILL.md")
+ if os.path.exists(skill_path):
+ skills.append(skill_path)
+ return skills
+```
+
+#### 2. Parse Metadata
+
+At startup, parse only the frontmatter to keep initial context usage low:
+
+```python
+import yaml
+
+def parse_metadata(skill_path):
+    with open(skill_path, "r", encoding="utf-8") as f:
+        content = f.read()
+    # The frontmatter sits between the first two "---" markers.
+    frontmatter = yaml.safe_load(content.split("---", 2)[1])
+
+    return {
+        "name": frontmatter["name"],
+        "description": frontmatter["description"],
+        "path": skill_path
+    }
+```
+
+#### 3. Inject Metadata into System Prompt
+
+Use XML format for the system prompt:
+
+```xml
+<available_skills>
+  <skill>
+    <name>pdf-processing</name>
+    <description>Extracts text and tables from PDF files, fills forms, merges documents.</description>
+    <location>/path/to/skills/pdf-processing/SKILL.md</location>
+  </skill>
+  <skill>
+    <name>data-analysis</name>
+    <description>Analyzes datasets, generates charts, and creates summary reports.</description>
+    <location>/path/to/skills/data-analysis/SKILL.md</location>
+  </skill>
+</available_skills>
+```
+
+**Guidelines:**
+- For filesystem-based agents: include the `location` field with absolute path
+- For tool-based agents: omit the location field
+- Keep metadata concise (~50-100 tokens per skill)
+
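+A sketch of rendering this XML from the metadata gathered in step 2 (the element names follow the example above and are not a normative format):
+
+```python
+def render_skills_prompt(skill_metadata: list[dict], include_location: bool = True) -> str:
+    """Render discovered skill metadata as an <available_skills> block."""
+    lines = ["<available_skills>"]
+    for meta in skill_metadata:
+        lines.append("  <skill>")
+        lines.append(f"    <name>{meta['name']}</name>")
+        lines.append(f"    <description>{meta['description']}</description>")
+        if include_location:  # omit the location for tool-based agents
+            lines.append(f"    <location>{meta['path']}</location>")
+        lines.append("  </skill>")
+    lines.append("</available_skills>")
+    return "\n".join(lines)
+```
+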
+---
+
+## LangChain/LangGraph Skills Integration
+
+LangChain and LangGraph implement skills as a multi-agent pattern where specialized capabilities are packaged as invokable components that augment an agent's behavior. This section covers how skills work within the LangChain ecosystem.
+
+### Overview
+
+In LangChain/LangGraph, skills operate primarily through **prompt-driven specialization** rather than requiring full sub-agent implementations. A single agent loads specialized prompts and context on-demand while staying in control.
+
+**Key Design Principles:**
+
+1. **Prompt-Driven Specialization**: Skills are fundamentally defined by specialized prompts rather than complex implementations
+2. **Progressive Disclosure**: Skills become available contextually based on user needs or agent reasoning
+3. **Team Distribution**: Different teams can independently develop and maintain skills without tight coupling
+
+### When to Use the Skills Pattern
+
+The skills pattern is ideal for scenarios requiring:
+
+- A single agent with numerous possible specializations
+- No strict enforcement of constraints between capabilities
+- Independent team development of domain-specific features
+
+**Example Use Cases:**
+- Coding assistants with language-specific skills (Python, JavaScript, Rust)
+- Knowledge bases with domain skills (legal, medical, financial)
+- Creative tools with format-specific skills (writing, design, music)
+
+### Skills vs Other Multi-Agent Patterns
+
+LangChain identifies five core patterns for multi-agent systems. Here's how skills compare:
+
+| Pattern | Description | Performance | Best For |
+|---------|-------------|-------------|----------|
+| **Skills** | Single agent loads specialized prompts on-demand | 3 calls (one-shot), 5 calls (repeat) | Direct user interaction, moderate parallelization |
+| **Subagents** | Main agent coordinates specialized subagents as tools | 4 calls (one-shot), 8 calls (repeat) | Distributed development, parallelization |
+| **Handoffs** | Agent behavior changes dynamically based on state | 3 calls (one-shot), 5 calls (repeat) | Sequential multi-hop workflows |
+| **Router** | Routing step classifies input and directs to specialized agents | 3 calls (one-shot), 6 calls (repeat) | Parallel execution with explicit routing |
+| **Custom Workflow** | Bespoke execution flows mixing patterns | Varies | Complex hybrid requirements |
+
+**Key Insight**: The Skills, Handoffs, and Router patterns are the most efficient for one-shot tasks (3 calls each); the Subagents pattern adds an extra call because results must flow back through the main agent.
+
+### Skills Pattern Characteristics
+
+| Aspect | Rating | Notes |
+|--------|--------|-------|
+| **Parallelization** | ⭐⭐⭐ | Moderate - can load multiple skills but executes sequentially |
+| **Direct User Interaction** | ⭐⭐⭐⭐⭐ | Excellent - single agent maintains conversation context |
+| **Distributed Development** | ⭐⭐⭐⭐ | Good - teams can develop skills independently |
+| **Context Accumulation** | Higher | Accumulates context over time (~15K tokens in multi-domain scenarios) |
+
+### Basic Implementation
+
+Skills in LangChain are implemented using a tool decorator pattern:
+
+```python
+from langchain_core.tools import tool
+
+@tool
+def load_skill(skill_name: str) -> str:
+ """Load specialized skill prompt.
+
+ Available skills:
+ - write_sql: SQL query writing expertise
+ - review_legal_doc: Legal document review
+ - analyze_data: Data analysis and visualization
+ """
+ skills = {
+ "write_sql": load_skill_content("sql_expert.md"),
+ "review_legal_doc": load_skill_content("legal_review.md"),
+ "analyze_data": load_skill_content("data_analysis.md"),
+ }
+ return skills.get(skill_name, f"Unknown skill: {skill_name}")
+
+def load_skill_content(filename: str) -> str:
+ """Load skill content from storage."""
+ with open(f"skills/{filename}", "r") as f:
+ return f.read()
+```
+
+The agent receives a system prompt indicating available skills and uses `load_skill` to access them on-demand.
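+
+For example, the tool can be bound to a chat model in the usual LangChain way. A minimal sketch, assuming the `load_skill` tool defined above and treating the model id as a placeholder:
+
+```python
+from langchain.chat_models import init_chat_model
+
+llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai")  # placeholder model
+llm_with_skills = llm.bind_tools([load_skill])
+
+response = llm_with_skills.invoke([
+    ("system", "You can call load_skill to pull in specialized instructions."),
+    ("human", "Write a SQL query returning the ten most recent orders."),
+])
+# If the model decides it needs a skill, response.tool_calls will contain a
+# load_skill call; execute it and append the result before invoking again.
+```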
+
+### LangGraph Implementation
+
+In LangGraph, skills can be implemented as nodes in the graph:
+
+```python
+from langgraph.graph import StateGraph, END
+from typing import TypedDict, Annotated
+import operator
+
+class AgentState(TypedDict):
+ messages: Annotated[list, operator.add]
+ active_skill: str
+ skill_context: str
+
+def skill_loader(state: AgentState) -> AgentState:
+ """Load skill context based on detected need."""
+ skill_name = state.get("active_skill")
+ if skill_name:
+ skill_content = load_skill_content(skill_name)
+ return {"skill_context": skill_content}
+ return {}
+
+def agent_node(state: AgentState) -> AgentState:
+ """Main agent with skill context."""
+ skill_context = state.get("skill_context", "")
+ # Agent uses skill_context to enhance its response
+    response = llm.invoke(
+        [{"role": "system", "content": system_prompt + skill_context}]
+        + state["messages"]
+    )
+ return {"messages": [response]}
+
+# Build the graph
+graph = StateGraph(AgentState)
+graph.add_node("skill_loader", skill_loader)
+graph.add_node("agent", agent_node)
+graph.set_entry_point("skill_loader")
+graph.add_edge("skill_loader", "agent")
+graph.add_edge("agent", END)
+```
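+
+Compiling and invoking the graph might then look like the following sketch (the entry point and single edge to END shown above are one possible wiring, and the input values are illustrative):
+
+```python
+app = graph.compile()
+
+result = app.invoke({
+    "messages": [("human", "Summarize quarterly revenue by region.")],
+    "active_skill": "analyze_data",   # skill selected upstream, e.g. by a router
+    "skill_context": "",
+})
+print(result["messages"][-1])
+```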
+
+### Extension Patterns
+
+#### Dynamic Tool Registration
+
+Loading a skill can simultaneously register new tools and update agent state:
+
+```python
+@tool
+def load_skill_with_tools(skill_name: str) -> str:
+ """Load skill and register associated tools."""
+ skill_config = get_skill_config(skill_name)
+
+ # Register skill-specific tools
+ for tool_def in skill_config.get("tools", []):
+ register_tool(tool_def)
+
+ # Return skill instructions
+ return skill_config["instructions"]
+```
+
+This enables progressive capability expansion as skills load.
+
+#### Hierarchical Skills
+
+Skills can define sub-skills in tree structures for fine-grained discovery:
+
+```python
+SKILL_HIERARCHY = {
+ "data_science": {
+ "description": "Data science and analytics capabilities",
+ "sub_skills": {
+ "pandas_expert": "DataFrame manipulation and analysis",
+ "visualization": "Charts, plots, and data visualization",
+ "statistical_analysis": "Statistical methods and hypothesis testing"
+ }
+ },
+ "web_development": {
+ "description": "Web application development",
+ "sub_skills": {
+ "frontend": "React, Vue, HTML/CSS",
+ "backend": "APIs, databases, server logic",
+ "devops": "Deployment, CI/CD, infrastructure"
+ }
+ }
+}
+
+@tool
+def load_skill(skill_path: str) -> str:
+ """Load skill by path (e.g., 'data_science/pandas_expert')."""
+ parts = skill_path.split("/")
+ # Navigate hierarchy and load appropriate skill
+ return get_nested_skill(SKILL_HIERARCHY, parts)
+```
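+
+The `get_nested_skill` helper above is left undefined; one possible implementation is sketched below, reusing the `load_skill_content` loader from the basic implementation (the leaf-to-filename mapping is an assumption):
+
+```python
+def get_nested_skill(hierarchy: dict, parts: list[str]) -> str:
+    """Resolve a path such as ['data_science', 'pandas_expert'] against the tree."""
+    domain, *rest = parts
+    node = hierarchy.get(domain)
+    if node is None:
+        return f"Unknown skill: {'/'.join(parts)}"
+    if not rest:
+        # Domain-level request: return the description plus available sub-skills.
+        subs = "\n".join(f"- {n}: {d}" for n, d in node["sub_skills"].items())
+        return f"{node['description']}\n\nSub-skills:\n{subs}"
+    if rest[0] not in node["sub_skills"]:
+        return f"Unknown sub-skill: {'/'.join(parts)}"
+    # Leaf request: load the detailed prompt, e.g. from skills/pandas_expert.md.
+    return load_skill_content(f"{rest[0]}.md")
+```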
+
+### Integration with Agent Skills Standard
+
+LangChain's skills pattern can integrate with the Agent Skills standard (SKILL.md format):
+
+```python
+import yaml
+import os
+
+def discover_agent_skills(skills_dir: str) -> dict:
+ """Discover Agent Skills format skills."""
+ skills = {}
+ for folder in os.listdir(skills_dir):
+ skill_md_path = os.path.join(skills_dir, folder, "SKILL.md")
+ if os.path.exists(skill_md_path):
+ with open(skill_md_path, "r") as f:
+ content = f.read()
+
+ # Parse YAML frontmatter
+ if content.startswith("---"):
+ _, frontmatter, body = content.split("---", 2)
+ metadata = yaml.safe_load(frontmatter)
+ skills[metadata["name"]] = {
+ "description": metadata["description"],
+ "content": body.strip(),
+ "path": skill_md_path
+ }
+ return skills
+
+def create_langchain_skill_tool(skills: dict):
+ """Create LangChain tool from Agent Skills."""
+ skill_descriptions = "\n".join(
+ f"- {name}: {info['description']}"
+ for name, info in skills.items()
+ )
+
+    @tool
+    def load_skill(skill_name: str) -> str:
+        """Load a specialized skill by name."""
+        if skill_name in skills:
+            return skills[skill_name]["content"]
+        return f"Unknown skill: {skill_name}"
+
+    # An f-string is not stored as the function's docstring, so attach the
+    # dynamic skill list to the tool's description after the decorator runs.
+    load_skill.description = (
+        "Load a specialized skill.\n\nAvailable skills:\n" + skill_descriptions
+    )
+
+    return load_skill
+```
+
+### Context Engineering Considerations
+
+At the center of multi-agent design is **context engineering** - deciding what information each agent sees.
+
+**Skills Pattern Trade-offs:**
+
+| Scenario | Token Usage | Model Calls |
+|----------|-------------|-------------|
+| Single domain task | ~5K tokens | 3 calls |
+| Multi-domain task | ~15K tokens | 7+ calls |
+| Repeat requests | Accumulates | 5 calls per request |
+
+**Optimization Strategies:**
+
+1. **Lazy Loading**: Only load skill content when explicitly needed
+2. **Context Pruning**: Remove skill context after task completion
+3. **Skill Summarization**: Use condensed skill versions for initial matching
+4. **Caching**: Cache frequently-used skill content (see the sketch below)
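+
+For example, the caching strategy can be as simple as memoizing the loader defined earlier:
+
+```python
+from functools import lru_cache
+
+@lru_cache(maxsize=64)
+def load_skill_content(filename: str) -> str:
+    """Load skill content from storage, caching repeat reads."""
+    with open(f"skills/{filename}", "r") as f:
+        return f.read()
+```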
+
+### Comparison: LangChain Skills vs Agent Skills Standard
+
+| Aspect | LangChain Skills | Agent Skills Standard |
+|--------|------------------|----------------------|
+| **Format** | Python code/prompts | SKILL.md files |
+| **Discovery** | Docstring/config | YAML frontmatter |
+| **Portability** | LangChain ecosystem | Cross-platform |
+| **Execution** | Tool invocation | File system access |
+| **Resources** | Python modules | scripts/, references/, assets/ |
+| **Validation** | Custom | skills-ref library |
+
+### Resources
+
+**LangChain Documentation:**
+- [Multi-Agent Patterns](https://docs.langchain.com/oss/python/langchain/multi-agent)
+- [Skills Pattern](https://docs.langchain.com/oss/python/langchain/multi-agent/skills)
+- [LangGraph Workflows](https://docs.langchain.com/oss/python/langgraph/workflows-agents)
+
+**LangGraph Resources:**
+- [LangGraph Official Site](https://www.langchain.com/langgraph)
+- [Multi-Agent Workflows Blog](https://www.blog.langchain.com/langgraph-multi-agent-workflows/)
+
+---
+
+## ADK Skills Integration
+
+This section describes a design for integrating the Google Agent Development Kit (ADK) with the Agent Skills standard, so that skills authored in the SKILL.md format can be used directly as ADK skills with full support for progressive disclosure, scripts, and assets.
+
+### Design Goals
+
+1. **Full Agent Skills Standard Support**: Load and execute skills defined using SKILL.md format
+2. **Progressive Disclosure**: Three-stage loading (metadata → instructions → resources)
+3. **Script & Asset Support**: Execute bundled scripts and access assets
+4. **Bidirectional Compatibility**: ADK BaseSkill classes and SKILL.md files work interchangeably
+5. **Programmatic Tool Calling (PTC)**: Enable efficient code-based tool orchestration
+6. **Security**: Sandboxed execution with defense-in-depth
+
+### Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ ADK Skills Architecture │
+├─────────────────────────────────────────────────────────────────────────────┤
+│ │
+│ ┌─────────────────────────────────────────────────────────────────────┐ │
+│ │ Skill Sources │ │
+│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │
+│ │ │ SKILL.md Files │ │ BaseSkill │ │ Remote Skills │ │ │
+│ │ │ (Agent Skills │ │ Classes │ │ (Future) │ │ │
+│ │ │ Standard) │ │ (Python) │ │ │ │ │
+│ │ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │ │
+│ └───────────┼──────────────────────┼──────────────────────┼───────────┘ │
+│ │ │ │ │
+│ ▼ ▼ ▼ │
+│ ┌─────────────────────────────────────────────────────────────────────┐ │
+│ │ AgentSkillLoader │ │
+│ │ • Discovers skills from directories │ │
+│ │ • Parses SKILL.md frontmatter and content │ │
+│ │ • Creates unified MarkdownSkill instances │ │
+│ │ • Manages progressive disclosure stages │ │
+│ └────────────────────────────────┬────────────────────────────────────┘ │
+│ │ │
+│ ▼ │
+│ ┌─────────────────────────────────────────────────────────────────────┐ │
+│ │ SkillsManager │ │
+│ │ • Unified registry for all skill types │ │
+│ │ • Skill discovery and lookup │ │
+│ │ • Execution coordination │ │
+│ └────────────────────────────────┬────────────────────────────────────┘ │
+│ │ │
+│ ┌────────────────────┼────────────────────┐ │
+│ ▼ ▼ ▼ │
+│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │
+│ │ SkillTool │ │ ScriptExecutor │ │ ProgrammaticTool │ │
+│ │ (LLM-facing) │ │ (scripts/) │ │ Executor (PTC) │ │
+│ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ │
+│ │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### Core Components
+
+#### 1. MarkdownSkill Class
+
+A concrete `BaseSkill` implementation that loads from SKILL.md files:
+
+```python
+# src/google/adk/skills/markdown_skill.py
+
+from __future__ import annotations
+
+import os
+import re
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+from pydantic import BaseModel, ConfigDict, Field
+
+from .base_skill import BaseSkill, SkillConfig
+
+
+class MarkdownSkillMetadata(BaseModel):
+ """Metadata extracted from SKILL.md frontmatter."""
+
+ name: str
+ description: str
+ license: Optional[str] = None
+ compatibility: Optional[str] = None
+ metadata: Dict[str, Any] = Field(default_factory=dict)
+ allowed_tools: Optional[str] = None
+
+
+class MarkdownSkill(BaseSkill):
+ """Skill loaded from Agent Skills standard SKILL.md format.
+
+ Supports progressive disclosure with three loading stages:
+ - Stage 1 (Discovery): Only name and description loaded
+ - Stage 2 (Activation): Full SKILL.md content loaded
+ - Stage 3 (Execution): Scripts and references loaded on-demand
+
+ Example:
+ ```python
+ skill = MarkdownSkill.from_directory("/path/to/pdf-processing")
+
+ # Stage 1: Metadata only
+ print(skill.name) # "pdf-processing"
+ print(skill.description) # "Extract text from PDFs..."
+
+ # Stage 2: Full instructions
+ instructions = skill.get_instructions()
+
+ # Stage 3: Access scripts/references
+ script = skill.get_script("extract_text.py")
+ reference = skill.get_reference("FORMS.md")
+ ```
+ """
+
+ # Path to the skill directory
+ skill_path: Path
+
+ # Parsed frontmatter metadata
+ skill_metadata: MarkdownSkillMetadata
+
+ # Cached content (loaded on demand - Stage 2)
+ _instructions_cache: Optional[str] = None
+
+ # Scripts directory contents (loaded on demand - Stage 3)
+ _scripts_cache: Dict[str, str] = Field(default_factory=dict)
+
+ # References directory contents (loaded on demand - Stage 3)
+ _references_cache: Dict[str, str] = Field(default_factory=dict)
+
+ # Loading stage tracking
+ _current_stage: int = 1 # 1=Discovery, 2=Activation, 3=Execution
+
+ model_config = ConfigDict(
+ extra="forbid",
+ arbitrary_types_allowed=True,
+ )
+
+ @classmethod
+ def from_directory(cls, skill_dir: str | Path) -> "MarkdownSkill":
+ """Load a skill from a directory containing SKILL.md.
+
+ Args:
+ skill_dir: Path to the skill directory.
+
+ Returns:
+ MarkdownSkill instance with Stage 1 (metadata) loaded.
+
+ Raises:
+ FileNotFoundError: If SKILL.md doesn't exist.
+ ValueError: If frontmatter is invalid.
+ """
+ skill_path = Path(skill_dir)
+ skill_md_path = skill_path / "SKILL.md"
+
+ if not skill_md_path.exists():
+ raise FileNotFoundError(
+ f"SKILL.md not found in {skill_dir}"
+ )
+
+ # Parse only frontmatter for Stage 1
+ content = skill_md_path.read_text(encoding="utf-8")
+ metadata = cls._parse_frontmatter(content)
+
+ # Validate name matches directory
+ if metadata.name != skill_path.name:
+ raise ValueError(
+ f"Skill name '{metadata.name}' must match "
+ f"directory name '{skill_path.name}'"
+ )
+
+ return cls(
+ name=metadata.name,
+ description=metadata.description,
+ skill_path=skill_path,
+ skill_metadata=metadata,
+ config=cls._build_config(metadata),
+ )
+
+ @staticmethod
+ def _parse_frontmatter(content: str) -> MarkdownSkillMetadata:
+ """Parse YAML frontmatter from SKILL.md content."""
+ import yaml
+
+ if not content.startswith("---"):
+ raise ValueError("SKILL.md must start with YAML frontmatter")
+
+ # Split frontmatter from body
+ parts = content.split("---", 2)
+ if len(parts) < 3:
+ raise ValueError("Invalid frontmatter format")
+
+ frontmatter_yaml = parts[1].strip()
+ frontmatter = yaml.safe_load(frontmatter_yaml)
+
+ return MarkdownSkillMetadata(**frontmatter)
+
+ @staticmethod
+ def _build_config(metadata: MarkdownSkillMetadata) -> SkillConfig:
+ """Build SkillConfig from metadata."""
+ config = SkillConfig()
+
+ # Parse compatibility for network requirements
+ if metadata.compatibility:
+ if "network" in metadata.compatibility.lower():
+ config.allow_network = True
+
+ return config
+
+ # =========================================================================
+ # Progressive Disclosure Implementation
+ # =========================================================================
+
+ def get_instructions(self) -> str:
+ """Get full SKILL.md instructions (Stage 2).
+
+ Loads and caches the markdown body on first access.
+ """
+ if self._instructions_cache is None:
+ skill_md_path = self.skill_path / "SKILL.md"
+ content = skill_md_path.read_text(encoding="utf-8")
+
+ # Extract body after frontmatter
+ parts = content.split("---", 2)
+ self._instructions_cache = parts[2].strip() if len(parts) > 2 else ""
+ self._current_stage = max(self._current_stage, 2)
+
+ return self._instructions_cache
+
+ def get_script(self, script_name: str) -> Optional[str]:
+ """Get script content from scripts/ directory (Stage 3).
+
+ Args:
+ script_name: Name of the script file.
+
+ Returns:
+ Script content or None if not found.
+ """
+ if script_name not in self._scripts_cache:
+ script_path = self.skill_path / "scripts" / script_name
+ if script_path.exists():
+ self._scripts_cache[script_name] = script_path.read_text(
+ encoding="utf-8"
+ )
+ self._current_stage = 3
+ else:
+ return None
+
+ return self._scripts_cache.get(script_name)
+
+ def get_reference(self, ref_name: str) -> Optional[str]:
+ """Get reference content from references/ directory (Stage 3).
+
+ Args:
+ ref_name: Name of the reference file.
+
+ Returns:
+ Reference content or None if not found.
+ """
+ if ref_name not in self._references_cache:
+ ref_path = self.skill_path / "references" / ref_name
+ if ref_path.exists():
+ self._references_cache[ref_name] = ref_path.read_text(
+ encoding="utf-8"
+ )
+ self._current_stage = 3
+ else:
+ return None
+
+ return self._references_cache.get(ref_name)
+
+ def get_asset_path(self, asset_name: str) -> Optional[Path]:
+ """Get absolute path to an asset file (Stage 3).
+
+ Args:
+ asset_name: Relative path within assets/ directory.
+
+ Returns:
+ Absolute Path or None if not found.
+ """
+ asset_path = self.skill_path / "assets" / asset_name
+ if asset_path.exists():
+ self._current_stage = 3
+ return asset_path
+ return None
+
+ def list_scripts(self) -> List[str]:
+ """List available scripts in the skill."""
+ scripts_dir = self.skill_path / "scripts"
+ if scripts_dir.exists():
+ return [f.name for f in scripts_dir.iterdir() if f.is_file()]
+ return []
+
+ def list_references(self) -> List[str]:
+ """List available references in the skill."""
+ refs_dir = self.skill_path / "references"
+ if refs_dir.exists():
+ return [f.name for f in refs_dir.iterdir() if f.is_file()]
+ return []
+
+ def list_assets(self) -> List[str]:
+ """List available assets in the skill."""
+ assets_dir = self.skill_path / "assets"
+ if assets_dir.exists():
+ return [
+ str(f.relative_to(assets_dir))
+ for f in assets_dir.rglob("*")
+ if f.is_file()
+ ]
+ return []
+
+ # =========================================================================
+ # BaseSkill Abstract Method Implementations
+ # =========================================================================
+
+ def get_tool_declarations(self) -> List[dict[str, Any]]:
+ """Return tool declarations extracted from SKILL.md.
+
+ Parses the instructions to find tool references and
+ generates declarations for script-based tools.
+ """
+ declarations = []
+
+ # Add script-based tools
+ for script_name in self.list_scripts():
+ script_path = self.skill_path / "scripts" / script_name
+ docstring = self._extract_script_docstring(script_path)
+
+ declarations.append({
+ "name": f"run_{script_name.replace('.', '_')}",
+ "description": docstring or f"Execute {script_name}",
+ "parameters": {
+ "args": "Command-line arguments for the script"
+ }
+ })
+
+ # Add reference loading tools
+ for ref_name in self.list_references():
+ declarations.append({
+ "name": f"load_reference_{ref_name.replace('.', '_')}",
+ "description": f"Load reference document: {ref_name}",
+ })
+
+ return declarations
+
+ def get_orchestration_template(self) -> str:
+ """Return example orchestration code for this skill.
+
+ Generates a template based on available scripts and tools.
+ """
+ scripts = self.list_scripts()
+
+ if not scripts:
+ return f'''
+async def use_{self.name.replace("-", "_")}(tools):
+ """Example orchestration for {self.name} skill."""
+ # This skill provides instructions but no bundled scripts.
+ # Follow the instructions in the SKILL.md file.
+ return {{"status": "ready", "skill": "{self.name}"}}
+'''
+
+ script_calls = "\n ".join(
+ f'result_{i} = await tools.run_{s.replace(".", "_")}(args="")'
+ for i, s in enumerate(scripts[:3])
+ )
+
+ return f'''
+async def use_{self.name.replace("-", "_")}(tools):
+ """Example orchestration for {self.name} skill."""
+ {script_calls}
+ return {{"results": [result_0]}}
+'''
+
+ def get_skill_prompt(self) -> str:
+ """Generate LLM-friendly skill description with progressive detail."""
+ base_prompt = super().get_skill_prompt()
+
+ # Add available resources
+ scripts = self.list_scripts()
+ refs = self.list_references()
+
+ resources = []
+ if scripts:
+ resources.append(f"Scripts: {', '.join(scripts)}")
+ if refs:
+ resources.append(f"References: {', '.join(refs)}")
+
+ if resources:
+ base_prompt += f"\n\nAvailable resources:\n" + "\n".join(
+ f" - {r}" for r in resources
+ )
+
+ return base_prompt
+
+ @staticmethod
+ def _extract_script_docstring(script_path: Path) -> Optional[str]:
+ """Extract docstring from a Python script."""
+ if not script_path.suffix == ".py":
+ return None
+
+ try:
+ content = script_path.read_text(encoding="utf-8")
+ # Simple regex to extract module docstring
+ match = re.match(r'^"""(.+?)"""', content, re.DOTALL)
+ if match:
+ return match.group(1).strip().split("\n")[0]
+ except Exception:
+ pass
+
+ return None
+```
+
+#### 2. AgentSkillLoader Class
+
+Discovers and loads skills from directories:
+
+```python
+# src/google/adk/skills/agent_skill_loader.py
+
+from __future__ import annotations
+
+import logging
+from pathlib import Path
+from typing import Dict, List, Optional, Union
+
+from .base_skill import BaseSkill
+from .markdown_skill import MarkdownSkill
+from .skill_manager import SkillsManager
+
+logger = logging.getLogger("google_adk.skills")
+
+
+class AgentSkillLoader:
+ """Discovers and loads Agent Skills standard skills.
+
+ Implements progressive disclosure by loading only metadata initially,
+ with full content loaded on-demand when skills are activated.
+
+ Example:
+ ```python
+ loader = AgentSkillLoader()
+
+ # Discover skills from multiple directories
+ loader.add_skill_directory("/path/to/skills")
+ loader.add_skill_directory("/path/to/custom-skills")
+
+ # Get all discovered skills (Stage 1 - metadata only)
+ skills = loader.get_all_skills()
+
+ # Register with SkillsManager
+ manager = SkillsManager()
+ loader.register_all(manager)
+
+ # Generate discovery prompt for LLM
+ prompt = loader.generate_discovery_prompt()
+ ```
+ """
+
+ def __init__(self):
+ self._skill_directories: List[Path] = []
+ self._discovered_skills: Dict[str, MarkdownSkill] = {}
+ self._load_errors: Dict[str, str] = {}
+
+ def add_skill_directory(self, path: Union[str, Path]) -> int:
+ """Add a directory to scan for skills.
+
+ Args:
+ path: Directory containing skill folders.
+
+ Returns:
+ Number of skills discovered in this directory.
+
+ Raises:
+ FileNotFoundError: If directory doesn't exist.
+ """
+ dir_path = Path(path)
+ if not dir_path.exists():
+ raise FileNotFoundError(f"Skill directory not found: {path}")
+
+ if not dir_path.is_dir():
+ raise ValueError(f"Path is not a directory: {path}")
+
+ self._skill_directories.append(dir_path)
+ return self._discover_skills_in_directory(dir_path)
+
+ def _discover_skills_in_directory(self, dir_path: Path) -> int:
+ """Discover all skills in a directory."""
+ count = 0
+
+ for item in dir_path.iterdir():
+ if not item.is_dir():
+ continue
+
+ skill_md = item / "SKILL.md"
+ if not skill_md.exists():
+ continue
+
+ try:
+ skill = MarkdownSkill.from_directory(item)
+ self._discovered_skills[skill.name] = skill
+ count += 1
+ logger.info(f"Discovered skill: {skill.name}")
+ except Exception as e:
+ self._load_errors[str(item)] = str(e)
+ logger.warning(f"Failed to load skill from {item}: {e}")
+
+ return count
+
+ def get_skill(self, name: str) -> Optional[MarkdownSkill]:
+ """Get a discovered skill by name."""
+ return self._discovered_skills.get(name)
+
+ def get_all_skills(self) -> List[MarkdownSkill]:
+ """Get all discovered skills."""
+ return list(self._discovered_skills.values())
+
+ def get_skill_names(self) -> List[str]:
+ """Get names of all discovered skills."""
+ return list(self._discovered_skills.keys())
+
+ def get_load_errors(self) -> Dict[str, str]:
+ """Get any errors encountered during discovery."""
+ return self._load_errors.copy()
+
+ def register_all(self, manager: SkillsManager) -> int:
+ """Register all discovered skills with a SkillsManager.
+
+ Args:
+ manager: The SkillsManager to register skills with.
+
+ Returns:
+ Number of skills registered.
+ """
+ count = 0
+ for skill in self._discovered_skills.values():
+ try:
+ manager.register_skill(skill)
+ count += 1
+ except ValueError as e:
+ logger.warning(f"Failed to register skill {skill.name}: {e}")
+
+ return count
+
+ def generate_discovery_prompt(self) -> str:
+ """Generate XML prompt with skill metadata for LLM discovery.
+
+ This implements Stage 1 of progressive disclosure - only
+ name and description are included, keeping context minimal.
+
+ Returns:
+ XML-formatted string with available skills.
+ """
+ if not self._discovered_skills:
+ return ""
+
+ skills_xml = []
+ for skill in self._discovered_skills.values():
+      skill_xml = f"""  <skill>
+    <name>{skill.name}</name>
+    <description>{skill.description}</description>
+    <has_scripts>{len(skill.list_scripts()) > 0}</has_scripts>
+    <has_references>{len(skill.list_references()) > 0}</has_references>
+  </skill>"""
+      skills_xml.append(skill_xml)
+
+    return f"""<available_skills>
+{chr(10).join(skills_xml)}
+</available_skills>
+"""
+
+ def generate_activation_prompt(self, skill_name: str) -> Optional[str]:
+ """Generate full skill prompt for activation (Stage 2).
+
+ Args:
+ skill_name: Name of the skill to activate.
+
+ Returns:
+ Full skill instructions or None if skill not found.
+ """
+ skill = self._discovered_skills.get(skill_name)
+ if not skill:
+ return None
+
+ instructions = skill.get_instructions()
+ resources = []
+
+ scripts = skill.list_scripts()
+ if scripts:
+ resources.append(f"Available scripts: {', '.join(scripts)}")
+
+ refs = skill.list_references()
+ if refs:
+ resources.append(f"Available references: {', '.join(refs)}")
+
+ resource_section = ""
+ if resources:
+ resource_section = "\n\n## Available Resources\n" + "\n".join(
+ f"- {r}" for r in resources
+ )
+
+ return f"""# Skill: {skill.name}
+
+{instructions}
+{resource_section}
+"""
+```
+
+#### 3. ScriptExecutor Class
+
+Safely executes bundled scripts:
+
+```python
+# src/google/adk/skills/script_executor.py
+
+from __future__ import annotations
+
+import asyncio
+import os
+import subprocess
+import tempfile
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+
+from pydantic import BaseModel, ConfigDict, Field
+
+from ..utils.feature_decorator import experimental
+
+
+class ScriptExecutionResult(BaseModel):
+ """Result from script execution."""
+
+ success: bool
+ stdout: str = ""
+ stderr: str = ""
+ return_code: int = 0
+ execution_time_ms: float = 0.0
+
+
+@experimental
+class ScriptExecutor(BaseModel):
+ """Executes scripts from Agent Skills bundles.
+
+ Provides sandboxed execution of Python, Bash, and JavaScript
+ scripts bundled with skills.
+
+ Security features:
+ - Execution timeout
+ - Working directory isolation
+ - Environment variable filtering
+ - Optional container sandboxing
+
+ Example:
+ ```python
+ executor = ScriptExecutor(
+ timeout_seconds=30.0,
+ allow_network=False,
+ )
+
+ result = await executor.execute_script(
+ script_path=Path("/path/to/skill/scripts/extract.py"),
+ args=["--input", "file.pdf"],
+ working_dir=Path("/tmp/workspace"),
+ )
+
+ if result.success:
+ print(result.stdout)
+ else:
+ print(f"Error: {result.stderr}")
+ ```
+ """
+
+ timeout_seconds: float = Field(
+ default=60.0,
+ description="Maximum execution time in seconds.",
+ )
+ allow_network: bool = Field(
+ default=False,
+ description="Whether to allow network access.",
+ )
+ memory_limit_mb: int = Field(
+ default=256,
+ description="Memory limit in megabytes.",
+ )
+ use_container: bool = Field(
+ default=False,
+ description="Use container isolation (requires Docker).",
+ )
+ allowed_env_vars: List[str] = Field(
+ default_factory=lambda: ["PATH", "HOME", "LANG", "LC_ALL"],
+ description="Environment variables to pass through.",
+ )
+
+ model_config = ConfigDict(extra="forbid")
+
+ async def execute_script(
+ self,
+ script_path: Path,
+ args: List[str] = None,
+ working_dir: Optional[Path] = None,
+ env: Optional[Dict[str, str]] = None,
+ ) -> ScriptExecutionResult:
+ """Execute a script file.
+
+ Args:
+ script_path: Path to the script file.
+ args: Command-line arguments.
+ working_dir: Working directory for execution.
+ env: Additional environment variables.
+
+ Returns:
+ ScriptExecutionResult with stdout, stderr, and status.
+ """
+ import time
+
+ args = args or []
+ start_time = time.time()
+
+ # Determine interpreter based on file extension
+ interpreter = self._get_interpreter(script_path)
+
+ # Build command
+ cmd = [interpreter, str(script_path)] + args
+
+ # Build safe environment
+ safe_env = self._build_safe_env(env)
+
+ # Set working directory
+ cwd = working_dir or script_path.parent
+
+ try:
+ if self.use_container:
+ result = await self._execute_in_container(
+ cmd, cwd, safe_env
+ )
+ else:
+ result = await self._execute_subprocess(
+ cmd, cwd, safe_env
+ )
+
+ execution_time = (time.time() - start_time) * 1000
+ result.execution_time_ms = execution_time
+ return result
+
+ except asyncio.TimeoutError:
+ execution_time = (time.time() - start_time) * 1000
+ return ScriptExecutionResult(
+ success=False,
+ stderr=f"Execution timed out after {self.timeout_seconds}s",
+ return_code=-1,
+ execution_time_ms=execution_time,
+ )
+ except Exception as e:
+ execution_time = (time.time() - start_time) * 1000
+ return ScriptExecutionResult(
+ success=False,
+ stderr=str(e),
+ return_code=-1,
+ execution_time_ms=execution_time,
+ )
+
+ def _get_interpreter(self, script_path: Path) -> str:
+ """Determine the interpreter for a script."""
+ suffix = script_path.suffix.lower()
+
+ interpreters = {
+ ".py": "python3",
+ ".sh": "bash",
+ ".bash": "bash",
+ ".js": "node",
+ ".mjs": "node",
+ }
+
+ if suffix not in interpreters:
+ raise ValueError(f"Unsupported script type: {suffix}")
+
+ return interpreters[suffix]
+
+ def _build_safe_env(
+ self, additional_env: Optional[Dict[str, str]] = None
+ ) -> Dict[str, str]:
+ """Build a safe environment for script execution."""
+ safe_env = {}
+
+ # Only pass allowed environment variables
+ for var in self.allowed_env_vars:
+ if var in os.environ:
+ safe_env[var] = os.environ[var]
+
+ # Add additional environment variables
+ if additional_env:
+ safe_env.update(additional_env)
+
+ return safe_env
+
+ async def _execute_subprocess(
+ self,
+ cmd: List[str],
+ cwd: Path,
+ env: Dict[str, str],
+ ) -> ScriptExecutionResult:
+ """Execute script using subprocess."""
+ proc = await asyncio.create_subprocess_exec(
+ *cmd,
+ stdout=asyncio.subprocess.PIPE,
+ stderr=asyncio.subprocess.PIPE,
+ cwd=str(cwd),
+ env=env,
+ )
+
+ try:
+ stdout, stderr = await asyncio.wait_for(
+ proc.communicate(),
+ timeout=self.timeout_seconds,
+ )
+
+ return ScriptExecutionResult(
+ success=proc.returncode == 0,
+ stdout=stdout.decode("utf-8", errors="replace"),
+ stderr=stderr.decode("utf-8", errors="replace"),
+ return_code=proc.returncode or 0,
+ )
+ except asyncio.TimeoutError:
+ proc.kill()
+ raise
+
+ async def _execute_in_container(
+ self,
+ cmd: List[str],
+ cwd: Path,
+ env: Dict[str, str],
+ ) -> ScriptExecutionResult:
+ """Execute script in a Docker container."""
+ # Build Docker command
+ docker_cmd = [
+ "docker", "run", "--rm",
+ f"--memory={self.memory_limit_mb}m",
+ f"--cpus=1",
+ f"-v", f"{cwd}:/workspace:ro",
+ "-w", "/workspace",
+ ]
+
+ # Add network isolation if required
+ if not self.allow_network:
+ docker_cmd.extend(["--network", "none"])
+
+ # Add environment variables
+ for key, value in env.items():
+ docker_cmd.extend(["-e", f"{key}={value}"])
+
+ # Use appropriate base image
+ docker_cmd.extend(["python:3.11-slim"])
+ docker_cmd.extend(cmd)
+
+ return await self._execute_subprocess(
+ docker_cmd, cwd, os.environ.copy()
+ )
+```
+
+#### 4. SkillTool Wrapper
+
+Exposes skills as ADK tools for LLM invocation:
+
+```python
+# src/google/adk/skills/skill_tool.py
+
+from __future__ import annotations
+
+from typing import Any, Optional
+
+from google.genai import types
+
+from ..tools.base_tool import BaseTool
+from ..tools.tool_context import ToolContext
+from .base_skill import BaseSkill
+from .markdown_skill import MarkdownSkill
+from .script_executor import ScriptExecutor
+
+
+class SkillTool(BaseTool):
+ """Wraps a Skill as a BaseTool for LLM invocation.
+
+ Provides three action types:
+ - "activate": Load full skill instructions (Stage 2)
+ - "run_script": Execute a bundled script (Stage 3)
+ - "load_reference": Load a reference document (Stage 3)
+
+ Example:
+ ```python
+ skill = MarkdownSkill.from_directory("/path/to/pdf-processing")
+ tool = SkillTool(skill)
+
+ # LLM can invoke:
+ # - {"action": "activate"} → Returns full instructions
+ # - {"action": "run_script", "script": "extract.py", "args": [...]}
+ # - {"action": "load_reference", "reference": "FORMS.md"}
+ ```
+ """
+
+ def __init__(
+ self,
+ skill: BaseSkill,
+ script_executor: Optional[ScriptExecutor] = None,
+ ):
+ super().__init__(
+ name=f"skill_{skill.name.replace('-', '_')}",
+ description=self._build_description(skill),
+ )
+ self._skill = skill
+ self._script_executor = script_executor or ScriptExecutor()
+
+ def _build_description(self, skill: BaseSkill) -> str:
+ """Build tool description from skill metadata."""
+ desc = f"{skill.description}\n\n"
+ desc += "Actions:\n"
+ desc += "- activate: Load full skill instructions\n"
+
+ if isinstance(skill, MarkdownSkill):
+ scripts = skill.list_scripts()
+ if scripts:
+ desc += f"- run_script: Execute scripts ({', '.join(scripts)})\n"
+
+ refs = skill.list_references()
+ if refs:
+ desc += f"- load_reference: Load references ({', '.join(refs)})\n"
+
+ return desc
+
+ def _get_declaration(self) -> Optional[types.FunctionDeclaration]:
+ """Get function declaration for LLM."""
+ properties = {
+ "action": types.Schema(
+ type="STRING",
+ description="Action: 'activate', 'run_script', or 'load_reference'",
+ enum=["activate", "run_script", "load_reference"],
+ ),
+ "script": types.Schema(
+ type="STRING",
+ description="Script name (for run_script action)",
+ ),
+ "args": types.Schema(
+ type="ARRAY",
+ items=types.Schema(type="STRING"),
+ description="Arguments for script execution",
+ ),
+ "reference": types.Schema(
+ type="STRING",
+ description="Reference file name (for load_reference action)",
+ ),
+ }
+
+ return types.FunctionDeclaration(
+ name=self.name,
+ description=self.description,
+ parameters=types.Schema(
+ type="OBJECT",
+ properties=properties,
+ required=["action"],
+ ),
+ )
+
+ async def run_async(
+ self,
+ *,
+ args: dict[str, Any],
+ tool_context: ToolContext,
+ ) -> Any:
+ """Execute the skill action."""
+ action = args.get("action", "activate")
+
+ if action == "activate":
+ return self._handle_activate()
+
+ elif action == "run_script":
+ return await self._handle_run_script(args, tool_context)
+
+ elif action == "load_reference":
+ return self._handle_load_reference(args)
+
+ else:
+ return {"error": f"Unknown action: {action}"}
+
+ def _handle_activate(self) -> dict:
+ """Handle skill activation (Stage 2)."""
+ if isinstance(self._skill, MarkdownSkill):
+ instructions = self._skill.get_instructions()
+ return {
+ "status": "activated",
+ "skill": self._skill.name,
+ "instructions": instructions,
+ "available_scripts": self._skill.list_scripts(),
+ "available_references": self._skill.list_references(),
+ }
+ else:
+ return {
+ "status": "activated",
+ "skill": self._skill.name,
+ "prompt": self._skill.get_skill_prompt(),
+ }
+
+ async def _handle_run_script(
+ self, args: dict, tool_context: ToolContext
+ ) -> dict:
+ """Handle script execution (Stage 3)."""
+ if not isinstance(self._skill, MarkdownSkill):
+ return {"error": "Skill does not support scripts"}
+
+ script_name = args.get("script")
+ if not script_name:
+ return {"error": "Script name required"}
+
+ script_args = args.get("args", [])
+
+ # Get script path
+ script_path = self._skill.skill_path / "scripts" / script_name
+ if not script_path.exists():
+ available = self._skill.list_scripts()
+ return {
+ "error": f"Script not found: {script_name}",
+ "available_scripts": available,
+ }
+
+ # Execute script
+ result = await self._script_executor.execute_script(
+ script_path=script_path,
+ args=script_args,
+ working_dir=tool_context.get_working_directory(),
+ )
+
+ return {
+ "script": script_name,
+ "success": result.success,
+ "stdout": result.stdout,
+ "stderr": result.stderr,
+ "return_code": result.return_code,
+ "execution_time_ms": result.execution_time_ms,
+ }
+
+ def _handle_load_reference(self, args: dict) -> dict:
+ """Handle reference loading (Stage 3)."""
+ if not isinstance(self._skill, MarkdownSkill):
+ return {"error": "Skill does not support references"}
+
+ ref_name = args.get("reference")
+ if not ref_name:
+ return {"error": "Reference name required"}
+
+ content = self._skill.get_reference(ref_name)
+ if content is None:
+ available = self._skill.list_references()
+ return {
+ "error": f"Reference not found: {ref_name}",
+ "available_references": available,
+ }
+
+ return {
+ "reference": ref_name,
+ "content": content,
+ }
+```
+
+### Integration with LlmAgent
+
+Skills integrate with ADK agents through the `skills` field:
+
+```python
+from google.adk.agents import LlmAgent
+from google.adk.skills import SkillsManager, AgentSkillLoader, SkillTool
+
+# Load Agent Skills standard skills
+loader = AgentSkillLoader()
+loader.add_skill_directory("./skills")
+
+# Create skills manager
+skills_manager = SkillsManager()
+loader.register_all(skills_manager)
+
+# Convert skills to tools for LLM
+skill_tools = [
+ SkillTool(skill) for skill in skills_manager.get_all_skills()
+]
+
+# Create agent with skills
+agent = LlmAgent(
+ name="skilled_agent",
+ model="gemini-2.0-flash",
+ instruction=f"""You are a helpful assistant with access to skills.
+
+{loader.generate_discovery_prompt()}
+
+To use a skill:
+1. First activate it to get full instructions
+2. Then use run_script or load_reference as needed
+""",
+ tools=skill_tools,
+)
+```
+
+### Extended SKILL.md Frontmatter for ADK
+
+ADK extends the standard frontmatter with additional fields:
+
+```yaml
+---
+name: advanced-pdf-processing
+description: Advanced PDF processing with OCR and form filling capabilities.
+license: Apache-2.0
+compatibility: Requires Python 3.8+, Tesseract OCR
+
+# Standard metadata
+metadata:
+ author: google-adk
+ version: "2.0"
+ category: documents
+
+# ADK-specific extensions
+adk:
+ # Execution configuration
+ config:
+ max_parallel_calls: 5
+ timeout_seconds: 120
+ allow_network: true
+ memory_limit_mb: 512
+
+ # PTC enablement
+ allowed_callers:
+ - code_execution_20250825
+
+ # Tool declarations for scripts
+ tools:
+ - name: extract_text
+ script: scripts/extract_text.py
+ description: Extract text from PDF pages
+ parameters:
+ input_file: Path to PDF file
+ pages: Optional page range (e.g., "1-5")
+
+ - name: fill_form
+ script: scripts/fill_form.py
+ description: Fill PDF form fields
+ parameters:
+ input_file: Path to PDF form
+ field_values: JSON object with field names and values
+
+ # Result filtering rules
+ filter_rules:
+ - field: raw_text
+ action: truncate
+ max_length: 10000
+ - field: metadata.password
+ action: remove
+---
+```
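+
+A sketch of how an integration might read the `adk.config` block above and feed it into `ScriptExecutor` (the field names follow the example frontmatter; the defaults are assumptions):
+
+```python
+import yaml
+
+from google.adk.skills import ScriptExecutor
+
+def executor_from_frontmatter(skill_md_text: str) -> ScriptExecutor:
+    """Build a ScriptExecutor from the adk.config block of a SKILL.md file."""
+    frontmatter = yaml.safe_load(skill_md_text.split("---", 2)[1])
+    config = (frontmatter.get("adk") or {}).get("config", {})
+    return ScriptExecutor(
+        timeout_seconds=config.get("timeout_seconds", 60.0),
+        allow_network=config.get("allow_network", False),
+        memory_limit_mb=config.get("memory_limit_mb", 256),
+    )
+```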
+
+### Progressive Disclosure Flow
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ Progressive Disclosure in ADK │
+├─────────────────────────────────────────────────────────────────────────────┤
+│ │
+│ STAGE 1: Discovery (~100 tokens per skill) │
+│ ┌───────────────────────────────────────────────────────────────────────┐ │
+│ │ <available_skills> │ │
+│ │   <skill> │ │
+│ │     <name>pdf-processing</name> │ │
+│ │     <description>Extract text, fill forms, merge PDFs</description> │ │
+│ │   </skill> │ │
+│ │ </available_skills> │ │
+│ └───────────────────────────────────────────────────────────────────────┘ │
+│ │ │
+│ User: "I need to extract text from a PDF" │
+│ │ │
+│ ▼ │
+│ STAGE 2: Activation (~2000-5000 tokens) │
+│ ┌───────────────────────────────────────────────────────────────────────┐ │
+│ │ LLM calls: skill_pdf_processing(action="activate") │ │
+│ │ │ │
+│ │ Returns full SKILL.md instructions: │ │
+│ │ - When to use this skill │ │
+│ │ - Prerequisites │ │
+│ │ - Step-by-step instructions │ │
+│ │ - Available scripts: [extract_text.py, merge_pdfs.py] │ │
+│ │ - Available references: [FORMS.md] │ │
+│ └───────────────────────────────────────────────────────────────────────┘ │
+│ │ │
+│ LLM reads instructions, decides to run script │
+│ │ │
+│ ▼ │
+│ STAGE 3: Execution (on-demand resources) │
+│ ┌───────────────────────────────────────────────────────────────────────┐ │
+│ │ LLM calls: skill_pdf_processing( │ │
+│ │ action="run_script", │ │
+│ │ script="extract_text.py", │ │
+│ │ args=["--input", "document.pdf"] │ │
+│ │ ) │ │
+│ │ │ │
+│ │ ScriptExecutor runs extract_text.py in sandbox │ │
+│ │ Returns: {"stdout": "Extracted text...", "success": true} │ │
+│ └───────────────────────────────────────────────────────────────────────┘ │
+│ │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### Module Exports
+
+```python
+# src/google/adk/skills/__init__.py
+
+from .base_skill import BaseSkill, SkillConfig
+from .skill_manager import SkillInvocationResult, SkillsManager
+from .markdown_skill import MarkdownSkill, MarkdownSkillMetadata
+from .agent_skill_loader import AgentSkillLoader
+from .script_executor import ScriptExecutor, ScriptExecutionResult
+from .skill_tool import SkillTool
+
+__all__ = [
+ # Core abstractions
+ "BaseSkill",
+ "SkillConfig",
+ "SkillsManager",
+ "SkillInvocationResult",
+ # Agent Skills standard support
+ "MarkdownSkill",
+ "MarkdownSkillMetadata",
+ "AgentSkillLoader",
+ # Execution
+ "ScriptExecutor",
+ "ScriptExecutionResult",
+ # Tool integration
+ "SkillTool",
+]
+```
+
+### Comparison: ADK Skills vs Agent Skills Standard
+
+| Feature | Agent Skills Standard | ADK Skills (Current) | ADK Skills (Enhanced) |
+|---------|----------------------|---------------------|----------------------|
+| **Format** | SKILL.md files | Python classes | Both supported |
+| **Discovery** | YAML frontmatter | Class attributes | Unified loader |
+| **Progressive Disclosure** | 3 stages | Partial | Full 3-stage support |
+| **Scripts** | scripts/ directory | Not supported | Full support |
+| **References** | references/ directory | Not supported | Full support |
+| **Assets** | assets/ directory | Not supported | Full support |
+| **PTC Support** | Not specified | Yes | Yes |
+| **Result Filtering** | Not specified | Yes | Yes + configurable |
+| **Security** | Recommendations | 4-layer defense | 4-layer defense |
+| **Tool Declarations** | Not specified | Required method | Auto-generated |
+| **Portability** | Cross-platform | ADK only | Cross-platform compatible |
+
+### Usage Example: Using Anthropic Skills in ADK
+
+```python
+# Clone the Anthropic skills repository
+# git clone https://github.com/anthropics/skills ./anthropic-skills
+
+from google.adk.agents import LlmAgent
+from google.adk.skills import AgentSkillLoader, SkillTool, SkillsManager
+
+# Load skills from Anthropic's repository
+loader = AgentSkillLoader()
+loader.add_skill_directory("./anthropic-skills/skills")
+
+# Check what was loaded
+print(f"Discovered {len(loader.get_skill_names())} skills:")
+for name in loader.get_skill_names():
+ skill = loader.get_skill(name)
+ print(f" - {name}: {skill.description[:50]}...")
+
+# Create agent with these skills
+skills_manager = SkillsManager()
+loader.register_all(skills_manager)
+
+skill_tools = [SkillTool(s) for s in skills_manager.get_all_skills()]
+
+agent = LlmAgent(
+ name="anthropic_skills_agent",
+ model="gemini-2.0-flash",
+ instruction=f"""You have access to skills from the Agent Skills standard.
+
+{loader.generate_discovery_prompt()}
+
+Use the activate action first to learn how to use each skill.
+""",
+ tools=skill_tools,
+)
+```
+
+### File Structure for Enhanced Skills Module
+
+```
+src/google/adk/skills/
+├── __init__.py # Module exports
+├── base_skill.py # BaseSkill abstract class (existing)
+├── skill_manager.py # SkillsManager (existing)
+├── markdown_skill.py # NEW: MarkdownSkill for SKILL.md
+├── agent_skill_loader.py # NEW: Discovery and loading
+├── script_executor.py # NEW: Script execution
+├── skill_tool.py # NEW: SkillTool wrapper
+└── builtin/ # Built-in skills
+ ├── __init__.py
+ └── ...
+```
+
+---
+
+## Security Considerations
+
+Script execution introduces security risks. Implement appropriate safeguards:
+
+| Measure | Description |
+|---------|-------------|
+| **Sandboxing** | Run scripts in isolated environments |
+| **Allowlisting** | Only execute scripts from trusted skills |
+| **Confirmation** | Ask users before running potentially dangerous operations |
+| **Logging** | Record all script executions for auditing |
+| **Source Verification** | Install skills only from trusted sources |
+| **Audit** | Review bundled files, dependencies, and external network connections before deployment |
+
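+As an illustration, a host application might combine several of these measures around script execution. The sketch below is not part of the specification: the allowlist contents, the `confirm` callback, and the choice of interpreter are all assumptions.
+
+```python
+import logging
+import subprocess
+from pathlib import Path
+
+logger = logging.getLogger("skills.audit")
+
+# Hypothetical allowlist of skills your team has reviewed.
+TRUSTED_SKILLS = {"pdf-processing", "data-analysis"}
+
+
+def run_skill_script(skill_name: str, script: Path, args: list, confirm):
+    """Run a bundled script only if the skill is trusted and the user agrees."""
+    if skill_name not in TRUSTED_SKILLS:
+        raise PermissionError(f"Skill '{skill_name}' is not allowlisted")
+    if not confirm(f"Run {script.name} from skill '{skill_name}'?"):
+        raise RuntimeError("User declined script execution")
+    # Record every execution for auditing.
+    logger.info("Executing %s with args %s (skill=%s)", script, args, skill_name)
+    return subprocess.run(
+        ["python", str(script), *args],
+        capture_output=True,
+        text=True,
+        timeout=60,
+        check=False,
+    )
+```
+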
+---
+
+## Reference Library (skills-ref)
+
+The `skills-ref` library provides Python utilities and a CLI for working with Agent Skills.
+
+### Installation
+
+```bash
+# Using pip
+pip install skills-ref
+
+# Using uv (from within a clone of the agentskills repository)
+uv sync
+```
+
+### CLI Commands
+
+```bash
+# Validate a skill directory
+skills-ref validate <skill-path>
+
+# Extract skill metadata as JSON
+skills-ref read-properties <skill-path>
+
+# Generate XML for agent prompts
+skills-ref to-prompt <skill-path> ...
+```
+
+### Python API
+
+```python
+from skills_ref import validate, read_properties, to_prompt
+
+# Validate skill directories and get error reports
+errors = validate("/path/to/skill")
+
+# Read skill configuration and metadata
+metadata = read_properties("/path/to/skill")
+
+# Create XML formatted skill descriptions for system prompts
+xml_prompt = to_prompt(["/path/to/skill1", "/path/to/skill2"])
+```
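+
+For example, a CI step could fail the build when validation reports problems. A minimal sketch, assuming skills live under `./skills/` and that `validate` returns a list of error messages as described above:
+
+```python
+import sys
+from pathlib import Path
+
+from skills_ref import validate
+
+failures = {}
+for skill_dir in sorted(Path("skills").iterdir()):
+    if (skill_dir / "SKILL.md").exists():
+        errors = validate(str(skill_dir))
+        if errors:
+            failures[skill_dir.name] = errors
+
+if failures:
+    for name, errors in failures.items():
+        print(f"{name}: {errors}", file=sys.stderr)
+    sys.exit(1)
+```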
+
+**Note**: This library is intended for demonstration purposes. It is not meant to be used in production.
+
+**Repository**: https://github.com/agentskills/agentskills/tree/main/skills-ref
+
+---
+
+## Best Practices
+
+### Development Workflow
+
+1. **Start with Evaluation**: Identify capability gaps through representative task testing
+2. **Structure for Scale**: Split unwieldy documentation into separate, logically organized files
+3. **Think from the Agent's Perspective**: Monitor real usage patterns and iterate based on skill triggering behavior
+4. **Iterate with the Agent**: Collaborate with the agent to capture successful approaches into reusable skill components
+
+### Writing Effective Skills
+
+| Guideline | Description |
+|-----------|-------------|
+| **Clear Descriptions** | Write descriptions that help agents determine when to activate the skill |
+| **Concise Instructions** | Keep the main SKILL.md focused; use references for detailed content |
+| **Concrete Examples** | Include input/output examples to demonstrate expected behavior |
+| **Handle Edge Cases** | Document known limitations and workarounds |
+| **Self-Contained Scripts** | Scripts should document dependencies and include helpful error messages |
+| **Logical Organization** | Group related information and use clear section headers |
+
+### Context Efficiency
+
+- Load metadata at startup (~100 tokens per skill)
+- Keep main instructions under 5000 tokens
+- Split large reference materials into separate files
+- Use progressive disclosure - load details only when needed
+
+---
+
+## Example Skills
+
+### Minimal Skill
+
+```
+hello-world/
+└── SKILL.md
+```
+
+```markdown
+---
+name: hello-world
+description: Demonstrates basic skill structure. Use when learning about skills.
+---
+
+# Hello World Skill
+
+## Instructions
+1. Respond with "Hello, World!" when activated
+2. Explain that this is a demonstration skill
+
+## Example
+User: "Can you demonstrate a skill?"
+Agent: "Hello, World! This is a demonstration of the Agent Skills format."
+```
+
+### Document Processing Skill
+
+```
+pdf-processing/
+├── SKILL.md
+├── scripts/
+│ ├── extract_text.py
+│ └── merge_pdfs.py
+├── references/
+│ └── FORMS.md
+└── assets/
+ └── templates/
+ └── invoice_template.pdf
+```
+
+````markdown
+---
+name: pdf-processing
+description: Extract text and tables from PDF files, fill PDF forms, and merge multiple PDFs. Use when working with PDF documents.
+license: Apache-2.0
+compatibility: Requires Python 3.8+, pdfplumber, PyPDF2
+metadata:
+ author: example-org
+ version: "1.0"
+---
+
+# PDF Processing
+
+## When to use this skill
+Use this skill when the user needs to:
+- Extract text from PDF documents
+- Extract tables from PDFs
+- Fill in PDF forms
+- Merge multiple PDFs into one
+
+## Prerequisites
+- Python 3.8 or higher
+- pdfplumber package
+- PyPDF2 package
+
+## Text Extraction
+
+### Using pdfplumber
+```python
+import pdfplumber
+
+with pdfplumber.open("document.pdf") as pdf:
+ for page in pdf.pages:
+ text = page.extract_text()
+ print(text)
+```
+
+### Using the bundled script
+Run `scripts/extract_text.py <input.pdf>`
+
+## Table Extraction
+...
+
+## Form Filling
+See [FORMS.md](references/FORMS.md) for detailed form handling instructions.
+
+## Merging PDFs
+Run `scripts/merge_pdfs.py ...`
+````
+
+### Production Skills
+
+The following production-grade skills are available as reference implementations:
+
+| Skill | Description | Repository |
+|-------|-------------|------------|
+| `docx` | Word document creation/editing | anthropics/skills |
+| `pdf` | PDF manipulation | anthropics/skills |
+| `pptx` | PowerPoint creation/editing | anthropics/skills |
+| `xlsx` | Excel spreadsheet operations | anthropics/skills |
+
+---
+
+## Resources
+
+### Official Documentation
+
+- **Agent Skills Website**: https://agentskills.io
+- **Specification**: https://agentskills.io/specification
+- **What Are Skills?**: https://agentskills.io/what-are-skills
+- **Integration Guide**: https://agentskills.io/integrate-skills
+
+### GitHub Repositories
+
+- **Agent Skills Framework**: https://github.com/agentskills/agentskills
+- **Example Skills**: https://github.com/anthropics/skills
+- **Reference Library**: https://github.com/agentskills/agentskills/tree/main/skills-ref
+
+### LangChain/LangGraph Resources
+
+- **Multi-Agent Patterns**: https://docs.langchain.com/oss/python/langchain/multi-agent
+- **Skills Pattern**: https://docs.langchain.com/oss/python/langchain/multi-agent/skills
+- **LangGraph Workflows**: https://docs.langchain.com/oss/python/langgraph/workflows-agents
+- **LangGraph Official Site**: https://www.langchain.com/langgraph
+- **Multi-Agent Workflows Blog**: https://www.blog.langchain.com/langgraph-multi-agent-workflows/
+
+### Google ADK Resources
+
+- **ADK Repository**: https://github.com/google/adk-python
+- **ADK Skills Module**: `src/google/adk/skills/`
+- **PTC Design Document**: `docs/skills_programmatic_tool_calling_design.md`
+- **Key Files**:
+ - `base_skill.py` - BaseSkill abstract class
+ - `skill_manager.py` - SkillsManager registry
+ - `markdown_skill.py` - SKILL.md file loader (proposed)
+ - `agent_skill_loader.py` - Discovery and loading (proposed)
+ - `script_executor.py` - Script execution (proposed)
+ - `skill_tool.py` - Tool wrapper (proposed)
+
+### Anthropic Resources
+
+- **Engineering Blog**: https://anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
+- **Claude Support - What are Skills?**: https://support.claude.com/en/articles/12512176-what-are-skills
+- **Claude Support - Creating Custom Skills**: https://support.claude.com/en/articles/12512198-creating-custom-skills
+- **Claude Support - Using Skills**: https://support.claude.com/en/articles/12512180-using-skills-in-claude
+
+### Platform Support
+
+Skills are supported across:
+- Claude.ai
+- Claude Code
+- Claude Agent SDK
+- Claude Developer Platform
+- LangChain/LangGraph (via skills pattern integration)
+- Google ADK (via MarkdownSkill and AgentSkillLoader)
+
+---
+
+## Appendix: Quick Reference
+
+### SKILL.md Template
+
+```markdown
+---
+name: my-skill-name
+description: Clear description of what this skill does and when to use it.
+license: Apache-2.0
+compatibility: List any environment requirements
+metadata:
+ author: your-name
+ version: "1.0"
+---
+
+# Skill Title
+
+## When to use this skill
+Describe the situations where this skill should be activated.
+
+## Prerequisites
+List any required tools, packages, or access.
+
+## Instructions
+Step-by-step guide for performing the task.
+
+## Examples
+Concrete examples with inputs and outputs.
+
+## Common Edge Cases
+Known limitations and how to handle them.
+
+## Additional Resources
+- [Reference Guide](references/REFERENCE.md)
+- [Scripts](scripts/)
+```
+
+### Validation Checklist
+
+- [ ] Directory name matches `name` field in frontmatter
+- [ ] Name follows conventions (lowercase, hyphens only, 64 chars max)
+- [ ] Description is clear and includes usage context (1024 chars max)
+- [ ] SKILL.md is under 500 lines
+- [ ] Scripts are self-contained with documented dependencies
+- [ ] All file references use relative paths
+- [ ] Security review completed for any executable code
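+
+Parts of this checklist can be automated. A minimal sketch, assuming PyYAML is installed and treating everything between the opening and closing `---` lines as frontmatter:
+
+```python
+import re
+from pathlib import Path
+
+import yaml
+
+
+def check_skill(skill_dir: Path) -> list:
+    """Return a list of checklist violations for one skill directory."""
+    problems = []
+    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
+    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
+    if not match:
+        return ["SKILL.md is missing YAML frontmatter"]
+    meta = yaml.safe_load(match.group(1)) or {}
+    name = str(meta.get("name", ""))
+    description = meta.get("description", "")
+    if name != skill_dir.name:
+        problems.append(f"name '{name}' does not match directory '{skill_dir.name}'")
+    if not re.fullmatch(r"[a-z0-9-]{1,64}", name):
+        problems.append("name must use lowercase letters, numbers, hyphens (max 64 chars)")
+    if not description or len(description) > 1024:
+        problems.append("description must be non-empty and at most 1024 characters")
+    if len(text.splitlines()) > 500:
+        problems.append("SKILL.md exceeds 500 lines")
+    return problems
+```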
diff --git a/contributing/samples/agent_skills_demo/README.md b/contributing/samples/agent_skills_demo/README.md
new file mode 100644
index 0000000000..698d4825c1
--- /dev/null
+++ b/contributing/samples/agent_skills_demo/README.md
@@ -0,0 +1,127 @@
+# Agent Skills Demo
+
+This demo showcases the Agent Skills standard integration with ADK. It demonstrates how to:
+
+1. Load skills from directories using `AgentSkillLoader`
+2. Convert skills to ADK tools using `SkillTool`
+3. Generate discovery prompts for the LLM
+4. Use skills with progressive disclosure
+
+## Running the Demo
+
+```bash
+# From the adk-python root directory
+adk web contributing/samples/agent_skills_demo
+```
+
+## What's Included
+
+### Skills Loaded
+
+The demo loads the built-in BigQuery skills bundled with the ADK skills module (`src/google/adk/skills/`):
+
+- **bqml** - Machine learning in BigQuery
+- **bq-ai-operator** - AI operations (text generation, embeddings)
+
+### Skill Structure (Agent Skills Standard)
+
+Each skill follows the Agent Skills standard format:
+
+```
+skill-name/
+├── SKILL.md # Short description and instructions
+├── references/ # Detailed documentation
+│ ├── MODEL_TYPES.md
+│ └── BEST_PRACTICES.md
+├── scripts/ # Helper scripts
+│ └── validate_model.py
+└── assets/ # Templates, configs
+```
+
+### Progressive Disclosure
+
+Skills support three stages:
+
+1. **Discovery** (Stage 1): Minimal metadata shown in discovery prompt
+2. **Activation** (Stage 2): Full instructions loaded on demand
+3. **Execution** (Stage 3): Scripts and references loaded as needed
+
+## Example Interactions
+
+### Discover Available Skills
+
+```
+User: What ML skills do you have?
+Agent: [Reviews discovery prompt and lists available skills]
+```
+
+### Activate a Skill
+
+```
+User: Tell me about BQML
+Agent: [Activates bqml skill, provides detailed instructions]
+```
+
+### Load Reference Documentation
+
+```
+User: What model types are available?
+Agent: [Loads MODEL_TYPES.md reference, explains options]
+```
+
+### Run a Script
+
+```
+User: Validate my model configuration
+Agent: [Runs validate_model.py script]
+```
+
+## Code Walkthrough
+
+```python
+from google.adk.skills import AgentSkillLoader, SkillTool
+
+# Load skills from directory
+loader = AgentSkillLoader()
+loader.add_skill_directory("./skills")
+
+# Create tools for agent
+skill_tools = [SkillTool(skill) for skill in loader.get_all_skills()]
+
+# Generate system prompt
+discovery_prompt = loader.generate_discovery_prompt()
+
+# Create agent with skill tools
+agent = LlmAgent(
+ model="gemini-2.0-flash",
+ instruction=f"Available skills:\n{discovery_prompt}",
+ tools=skill_tools,
+)
+```
+
+## Customization
+
+### Adding Custom Skills
+
+1. Create a skill directory following the Agent Skills standard
+2. Add `SKILL.md` with YAML frontmatter
+3. Add references, scripts, and assets as needed
+4. Update the `SKILLS_DIR` path in `agent.py`
+
+### Skill Configuration
+
+Skills can include ADK-specific configuration in their frontmatter:
+
+```yaml
+adk:
+ config:
+ timeout_seconds: 300
+ max_parallel_calls: 5
+ allowed_callers:
+ - my_agent
+```
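+
+These values presumably map onto the `SkillConfig` model exported by `google.adk.skills`. A quick sketch of constructing one by hand (how the parsed `adk.config` mapping is surfaced by the loader is not shown here, so the values are written out explicitly):
+
+```python
+from google.adk.skills import SkillConfig
+
+# Mirrors the frontmatter example above.
+config = SkillConfig(timeout_seconds=300, max_parallel_calls=5)
+print(config.timeout_seconds, config.allow_network)  # 300.0 False
+```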
+
+## Learn More
+
+- [Agent Skills Standard](https://agentskills.io)
+- [ADK Skills Documentation](../../../docs/skills_programmatic_tool_calling_design.md)
diff --git a/contributing/samples/agent_skills_demo/__init__.py b/contributing/samples/agent_skills_demo/__init__.py
new file mode 100644
index 0000000000..c8d8d2b9f0
--- /dev/null
+++ b/contributing/samples/agent_skills_demo/__init__.py
@@ -0,0 +1,17 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Agent Skills Demo - demonstrates Agent Skills standard integration."""
+
+from . import agent
diff --git a/contributing/samples/agent_skills_demo/agent.py b/contributing/samples/agent_skills_demo/agent.py
new file mode 100644
index 0000000000..c7c3f27a9d
--- /dev/null
+++ b/contributing/samples/agent_skills_demo/agent.py
@@ -0,0 +1,163 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Agent Skills Demo - demonstrates Agent Skills standard integration.
+
+This demo shows how to:
+1. Load skills from directories using AgentSkillLoader
+2. Convert skills to ADK tools using SkillTool
+3. Generate discovery prompts for the LLM
+4. Use skills with progressive disclosure
+5. Integrate with BigQuery toolset for data operations
+
+Run with: adk web contributing/samples
+Then select 'agent_skills_demo' from the list.
+"""
+
+from __future__ import annotations
+
+import os
+from pathlib import Path
+
+import google.auth
+
+from google.adk.agents.llm_agent import LlmAgent
+from google.adk.skills import AgentSkillLoader
+from google.adk.skills import SkillTool
+from google.adk.tools.bigquery.bigquery_credentials import BigQueryCredentialsConfig
+from google.adk.tools.bigquery.bigquery_toolset import BigQueryToolset
+from google.adk.tools.bigquery.config import BigQueryToolConfig
+from google.adk.tools.bigquery.config import WriteMode
+
+# Default Vertex AI settings
+os.environ.setdefault("GOOGLE_GENAI_USE_VERTEXAI", "true")
+os.environ.setdefault("GOOGLE_CLOUD_PROJECT", os.getenv("VERTEXAI_PROJECT", ""))
+os.environ.setdefault("GOOGLE_CLOUD_LOCATION", os.getenv("VERTEXAI_LOCATION", "us-central1"))
+
+# Path to skills directory (now in main skills module)
+SKILLS_DIR = Path(__file__).parent.parent.parent.parent / "src" / "google" / "adk" / "skills"
+
+# Initialize the skill loader
+skill_loader = AgentSkillLoader(validate_names=False)
+
+# Load the built-in skills bundled with the ADK skills module
+if SKILLS_DIR.exists():
+ skill_count = skill_loader.add_skill_directory(SKILLS_DIR)
+ print(f"Loaded {skill_count} skills from {SKILLS_DIR}")
+else:
+ print(f"Warning: Skills directory not found: {SKILLS_DIR}")
+
+# Create skill tools for the agent
+skill_tools = [SkillTool(skill) for skill in skill_loader.get_all_skills()]
+
+# Generate discovery prompt for the system instruction
+discovery_prompt = skill_loader.generate_discovery_prompt(include_resources=True)
+
+# Setup BigQuery toolset with Application Default Credentials
+try:
+ application_default_credentials, _ = google.auth.default()
+ credentials_config = BigQueryCredentialsConfig(
+ credentials=application_default_credentials
+ )
+
+ # Configure BigQuery tools - allow writes for ML model creation
+ tool_config = BigQueryToolConfig(
+ write_mode=WriteMode.ALLOWED,
+ application_name="agent_skills_demo"
+ )
+
+ bigquery_toolset = BigQueryToolset(
+ credentials_config=credentials_config,
+ bigquery_tool_config=tool_config
+ )
+ print("BigQuery toolset initialized successfully")
+except Exception as e:
+ print(f"Warning: Could not initialize BigQuery toolset: {e}")
+ bigquery_toolset = None
+
+# Build the system instruction
+SYSTEM_INSTRUCTION = f"""You are a data science assistant with access to BigQuery AI and ML skills.
+
+## Available Skills
+
+{discovery_prompt}
+
+## Skill Overview
+
+- **bigquery-ai**: Generative AI operations - text generation with LLMs (Gemini, Claude), embeddings, vector search, and RAG workflows
+- **bqml**: Traditional ML - classification, regression, clustering, time series forecasting, recommendations
+
+## How to Use Skills
+
+1. **Discover**: Review the available skills above to find the right one for your task
+2. **Activate**: Use the skill tool with action="activate" to get detailed instructions
+3. **Load References**: Use action="load_reference" to load detailed documentation for specific topics
+4. **Run Scripts**: Use action="run_script" to execute helper scripts for setup and validation
+
+## BigQuery Tools
+
+You also have direct access to BigQuery tools for:
+- Executing SQL queries (including CREATE MODEL, ML functions)
+- Exploring datasets and tables
+- Getting table schemas and metadata
+
+## Guidelines
+
+- For generative AI (text generation, embeddings, semantic search, RAG): Use **bigquery-ai** skill
+- For predictive ML (classification, regression, forecasting): Use **bqml** skill
+- Always activate a skill before using its detailed features
+- Load specific reference docs when you need in-depth information
+- Use BigQuery tools to run the actual SQL queries
+
+## Example Workflows
+
+**Generative AI Example:**
+User: "How do I build a RAG system in BigQuery?"
+1. Activate bigquery-ai skill
+2. Load RAG_WORKFLOW.md reference
+3. Use BigQuery tools to create models and run queries
+
+**Traditional ML Example:**
+User: "How do I train a churn prediction model?"
+1. Activate bqml skill
+2. Load MODEL_TYPES.md reference for classifier options
+3. Use BigQuery tools to create and evaluate the model
+"""
+
+# Combine all tools
+all_tools = skill_tools.copy()
+if bigquery_toolset:
+ all_tools.append(bigquery_toolset)
+
+# Create the agent with Gemini 2.5 Pro
+root_agent = LlmAgent(
+ model="gemini-2.5-pro",
+ name="agent_skills_demo",
+ description="A demo agent showcasing Agent Skills standard integration with ADK and BigQuery tools.",
+ instruction=SYSTEM_INSTRUCTION,
+ tools=all_tools,
+)
+
+# Print a summary whenever the module is imported (e.g. by `adk web`).
+print("\n" + "=" * 60)
+print("Agent Skills Demo")
+print("=" * 60)
+print(f"\nModel: {root_agent.model}")
+print(f"Loaded skills: {skill_loader.get_skill_names()}")
+print(f"Skill tools: {[t.name for t in skill_tools]}")
+print(f"BigQuery toolset: {'enabled' if bigquery_toolset else 'disabled'}")
+print("\nRun with: adk web contributing/samples")
+print("Then select 'agent_skills_demo' from the list")
+print("=" * 60 + "\n")
diff --git a/src/google/adk/skills/__init__.py b/src/google/adk/skills/__init__.py
new file mode 100644
index 0000000000..140a298ec2
--- /dev/null
+++ b/src/google/adk/skills/__init__.py
@@ -0,0 +1,103 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Skills module for ADK.
+
+Skills bundle related tools with orchestration logic for efficient
+programmatic tool calling (PTC). This module provides:
+
+Core abstractions:
+- BaseSkill: Abstract base class for all skills
+- SkillConfig: Configuration for skill execution
+- SkillsManager: Skill registration and execution management
+
+Agent Skills standard support (https://agentskills.io):
+- MarkdownSkill: Load skills from SKILL.md files
+- MarkdownSkillMetadata: Parsed frontmatter metadata
+- AgentSkillLoader: Discover and load skills from directories
+
+Execution:
+- ScriptExecutor: Execute bundled scripts safely
+- ScriptExecutionResult: Result from script execution
+
+Tool integration:
+- SkillTool: Wrap skills as ADK tools for LLM invocation
+- create_skill_tools: Convenience function to create tool wrappers
+
+Built-in skills are bundled in this module:
+- bqml: BigQuery ML skill for training and inference
+- bq-ai-operator: BigQuery AI operations (text generation, embeddings)
+- bigquery-admin: BigQuery administration (reservations, jobs, monitoring, cost)
+
+Example usage:
+ ```python
+ from google.adk.skills import AgentSkillLoader, SkillTool, SkillsManager
+
+ # Load Agent Skills standard skills
+ loader = AgentSkillLoader()
+ loader.add_skill_directory("./skills")
+
+ # Register with manager
+ manager = SkillsManager()
+ loader.register_all(manager)
+
+ # Create tools for LLM
+ skill_tools = [SkillTool(s) for s in manager.get_all_skills()]
+
+ # Generate discovery prompt
+ prompt = loader.generate_discovery_prompt()
+ ```
+"""
+
+# Skills directory path for loading bundled skills
+from pathlib import Path
+
+from .agent_skill_loader import AgentSkillLoader
+# Core abstractions
+from .base_skill import BaseSkill
+from .base_skill import SkillConfig
+# Agent Skills standard support
+from .markdown_skill import MarkdownSkill
+from .markdown_skill import MarkdownSkillMetadata
+# Execution
+from .script_executor import ScriptExecutionError
+from .script_executor import ScriptExecutionResult
+from .script_executor import ScriptExecutor
+from .skill_manager import SkillInvocationResult
+from .skill_manager import SkillsManager
+# Tool integration
+from .skill_tool import create_skill_tools
+from .skill_tool import SkillTool
+
+SKILLS_DIR = Path(__file__).parent
+
+__all__ = [
+ # Skills directory
+ "SKILLS_DIR",
+ # Core abstractions
+ "BaseSkill",
+ "SkillConfig",
+ "SkillInvocationResult",
+ "SkillsManager",
+ # Agent Skills standard support
+ "MarkdownSkill",
+ "MarkdownSkillMetadata",
+ "AgentSkillLoader",
+ # Execution
+ "ScriptExecutor",
+ "ScriptExecutionResult",
+ "ScriptExecutionError",
+ # Tool integration
+ "SkillTool",
+ "create_skill_tools",
+]
diff --git a/src/google/adk/skills/adk_skills.md b/src/google/adk/skills/adk_skills.md
new file mode 100644
index 0000000000..6a33301b18
--- /dev/null
+++ b/src/google/adk/skills/adk_skills.md
@@ -0,0 +1,634 @@
+# ADK Skills - Agent Skills Standard Integration
+
+A comprehensive guide to the ADK Skills module, which implements support for the [Agent Skills standard](https://agentskills.io) - an open format for extending AI agent capabilities.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Agent Skills Standard](#agent-skills-standard)
+- [Directory Structure](#directory-structure)
+- [SKILL.md Specification](#skillmd-specification)
+- [Progressive Disclosure Architecture](#progressive-disclosure-architecture)
+- [ADK Implementation](#adk-implementation)
+- [Usage Examples](#usage-examples)
+- [Security Considerations](#security-considerations)
+- [Best Practices](#best-practices)
+- [Resources](#resources)
+
+---
+
+## Overview
+
+**Agent Skills** are organized folders of instructions, scripts, and resources that agents can discover and load dynamically to perform better at specific tasks. The ADK Skills module provides full support for this open standard.
+
+### Key Benefits
+
+| Stakeholder | Benefit |
+|-------------|---------|
+| **Skill Authors** | Build capabilities once, deploy across multiple agent products |
+| **ADK Users** | Give agents new capabilities using standard SKILL.md format |
+| **Teams & Enterprises** | Capture organizational knowledge in portable, version-controlled packages |
+
+### Module Components
+
+| Component | Description |
+|-----------|-------------|
+| `MarkdownSkill` | Load skills from SKILL.md files with YAML frontmatter |
+| `AgentSkillLoader` | Discover and load skills from directories |
+| `ScriptExecutor` | Safe sandboxed execution of bundled scripts |
+| `SkillTool` | Wrap skills as ADK tools for LLM invocation |
+| `SkillsManager` | Unified registry for skill management |
+
+---
+
+## Agent Skills Standard
+
+### What Is a Skill?
+
+At its core, a skill is a **folder containing a `SKILL.md` file** that provides:
+
+- **Metadata**: `name` and `description` (minimum required)
+- **Instructions**: Markdown documentation on how to perform a task
+- **Optional resources**: scripts, templates, and reference materials
+
+### Capabilities Enabled
+
+1. **Domain Expertise** - Package specialized knowledge into reusable instructions
+2. **New Capabilities** - Enable agents to create presentations, build MCP servers, analyze datasets
+3. **Repeatable Workflows** - Turn multi-step tasks into consistent, auditable workflows
+4. **Interoperability** - Reuse the same skill across different skills-compatible agent products
+
+### Code Integration
+
+Skills can include pre-written Python scripts and other code that agents execute deterministically. This approach proves more efficient than token-based generation for operations like:
+- Sorting lists
+- Extracting PDF form fields
+- Data transformation
+- File manipulation
+
+---
+
+## Directory Structure
+
+### Minimal Structure
+
+```
+skill-name/
+└── SKILL.md # Required
+```
+
+### Full Structure with Optional Directories
+
+```
+skill-name/
+├── SKILL.md # Required: instructions + metadata
+├── scripts/ # Optional: executable code
+│ ├── extract.py
+│ └── transform.sh
+├── references/ # Optional: additional documentation
+│ ├── REFERENCE.md
+│ ├── FORMS.md
+│ └── domain-specific.md
+└── assets/ # Optional: static resources
+ ├── templates/
+ ├── images/
+ └── data/
+```
+
+### Directory Descriptions
+
+| Directory | Purpose |
+|-----------|---------|
+| `scripts/` | Executable code agents can run. Should be self-contained, include helpful error messages, and handle edge cases gracefully. Supported languages: Python, Bash, JavaScript |
+| `references/` | Additional documentation loaded on demand. Keep individual files focused for efficient context use |
+| `assets/` | Static resources: templates, images, diagrams, data files, lookup tables, schemas |
+
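+In the ADK integration these directories surface through `MarkdownSkill` helper methods. A quick sketch (the exact return values depend on the skill's contents; the paths below are illustrative):
+
+```python
+from google.adk.skills import MarkdownSkill
+
+skill = MarkdownSkill.from_directory("/path/to/skill-name")
+
+print(skill.list_scripts())     # e.g. ['extract.py', 'transform.sh']
+print(skill.list_references())  # e.g. ['REFERENCE.md', 'FORMS.md', 'domain-specific.md']
+print(skill.list_assets())      # e.g. ['templates/...', 'data/...']
+```
+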
+---
+
+## SKILL.md Specification
+
+Every skill starts with **YAML frontmatter** followed by **Markdown content**.
+
+### Basic Format
+
+```markdown
+---
+name: skill-name
+description: A description of what this skill does and when to use it.
+---
+
+# Skill Title
+
+## When to use this skill
+Use this skill when the user needs to...
+
+## How to perform the task
+1. Step one...
+2. Step two...
+
+## Examples
+...
+```
+
+### Frontmatter Fields
+
+| Field | Required | Constraints | Description |
+|-------|----------|-------------|-------------|
+| `name` | **Yes** | Max 64 characters. Lowercase letters, numbers, and hyphens only. | Short identifier for the skill |
+| `description` | **Yes** | Max 1024 characters. Non-empty. | Describes what the skill does and when to use it (used for discovery) |
+| `license` | No | - | License name or reference to bundled license file |
+| `compatibility` | No | Max 500 characters | Environment requirements (product, system packages, network access, etc.) |
+| `metadata` | No | Arbitrary key-value mapping | Additional metadata (author, version, etc.) |
+
+### ADK-Specific Extensions
+
+ADK extends the standard frontmatter with additional fields:
+
+```yaml
+---
+name: advanced-pdf-processing
+description: Advanced PDF processing with OCR and form filling capabilities.
+license: Apache-2.0
+compatibility: Requires Python 3.8+, Tesseract OCR
+
+# Standard metadata
+metadata:
+ author: google-adk
+ version: "2.0"
+ category: documents
+
+# ADK-specific extensions
+adk:
+ # Execution configuration
+ config:
+ max_parallel_calls: 5
+ timeout_seconds: 120
+ allow_network: true
+ memory_limit_mb: 512
+
+ # Tool declarations for scripts
+ tools:
+ - name: extract_text
+ script: scripts/extract_text.py
+ description: Extract text from PDF pages
+ parameters:
+ input_file: Path to PDF file
+ pages: Optional page range (e.g., "1-5")
+---
+```
+
+### Description Field Best Practices
+
+The description should describe both **what the skill does** and **when to use it**.
+
+**Good example:**
+```yaml
+description: Extracts text and tables from PDF files, fills PDF forms, and merges multiple PDFs. Use when working with PDF documents or when the user mentions PDFs, forms, or document extraction.
+```
+
+**Poor example:**
+```yaml
+description: PDF processing # Too vague, no usage context
+```
+
+---
+
+## Progressive Disclosure Architecture
+
+Skills use a **context-efficient, three-stage approach** to information loading:
+
+### Stage 1: Discovery (Startup)
+
+- Agents load only the `name` and `description` of available skills
+- Minimal context overhead (~100 tokens per skill)
+- Enables agents to identify relevant skills without full loading
+
+### Stage 2: Activation (Task Matching)
+
+- When a task matches a skill's description, the agent reads the full `SKILL.md`
+- Complete instructions are loaded into context
+- Recommended: Keep under 5000 tokens for the body
+
+### Stage 3: Execution (Implementation)
+
+- Agent follows instructions
+- Optionally loads referenced files from `scripts/`, `references/`, `assets/`
+- Resources loaded only when required
+
+**Benefit**: The amount of context that can be bundled into a skill is effectively unbounded since agents with filesystem access don't require everything in their context window simultaneously.
+
+### Progressive Disclosure Flow
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ Progressive Disclosure in ADK │
+├─────────────────────────────────────────────────────────────────────────────┤
+│ │
+│ STAGE 1: Discovery (~100 tokens per skill) │
+│ ┌───────────────────────────────────────────────────────────────────────┐ │
+│ │ │ │
+│ │ │ │
+│ │ pdf-processing │ │
+│ │ Extract text, fill forms, merge PDFs │ │
+│ │ │ │
+│ │ │ │
+│ └───────────────────────────────────────────────────────────────────────┘ │
+│ │ │
+│ User: "I need to extract text from a PDF" │
+│ │ │
+│ ▼ │
+│ STAGE 2: Activation (~2000-5000 tokens) │
+│ ┌───────────────────────────────────────────────────────────────────────┐ │
+│ │ LLM calls: skill_pdf_processing(action="activate") │ │
+│ │ │ │
+│ │ Returns full SKILL.md instructions: │ │
+│ │ - When to use this skill │ │
+│ │ - Prerequisites │ │
+│ │ - Step-by-step instructions │ │
+│ │ - Available scripts: [extract_text.py, merge_pdfs.py] │ │
+│ │ - Available references: [FORMS.md] │ │
+│ └───────────────────────────────────────────────────────────────────────┘ │
+│ │ │
+│ LLM reads instructions, decides to run script │
+│ │ │
+│ ▼ │
+│ STAGE 3: Execution (on-demand resources) │
+│ ┌───────────────────────────────────────────────────────────────────────┐ │
+│ │ LLM calls: skill_pdf_processing( │ │
+│ │ action="run_script", │ │
+│ │ script="extract_text.py", │ │
+│ │ args=["--input", "document.pdf"] │ │
+│ │ ) │ │
+│ │ │ │
+│ │ ScriptExecutor runs extract_text.py in sandbox │ │
+│ │ Returns: {"stdout": "Extracted text...", "success": true} │ │
+│ └───────────────────────────────────────────────────────────────────────┘ │
+│ │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## ADK Implementation
+
+### Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│ ADK Skills Architecture │
+├─────────────────────────────────────────────────────────────────────────────┤
+│ │
+│ ┌─────────────────────────────────────────────────────────────────────┐ │
+│ │ Skill Sources │ │
+│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │
+│ │ │ SKILL.md Files │ │ BaseSkill │ │ Remote Skills │ │ │
+│ │ │ (Agent Skills │ │ Classes │ │ (Future) │ │ │
+│ │ │ Standard) │ │ (Python) │ │ │ │ │
+│ │ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │ │
+│ └───────────┼──────────────────────┼──────────────────────┼───────────┘ │
+│ │ │ │ │
+│ ▼ ▼ ▼ │
+│ ┌─────────────────────────────────────────────────────────────────────┐ │
+│ │ AgentSkillLoader │ │
+│ │ • Discovers skills from directories │ │
+│ │ • Parses SKILL.md frontmatter and content │ │
+│ │ • Creates unified MarkdownSkill instances │ │
+│ │ • Manages progressive disclosure stages │ │
+│ └────────────────────────────────┬────────────────────────────────────┘ │
+│ │ │
+│ ▼ │
+│ ┌─────────────────────────────────────────────────────────────────────┐ │
+│ │ SkillsManager │ │
+│ │ • Unified registry for all skill types │ │
+│ │ • Skill discovery and lookup │ │
+│ │ • Execution coordination │ │
+│ └────────────────────────────────┬────────────────────────────────────┘ │
+│ │ │
+│ ┌────────────────────┼────────────────────┐ │
+│ ▼ ▼ ▼ │
+│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────────────┐ │
+│ │ SkillTool │ │ ScriptExecutor │ │ ProgrammaticTool │ │
+│ │ (LLM-facing) │ │ (scripts/) │ │ Executor (PTC) │ │
+│ └─────────────────┘ └─────────────────┘ └─────────────────────────┘ │
+│ │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+
+### Core Components
+
+#### MarkdownSkill
+
+Loads skills from SKILL.md files with progressive disclosure support:
+
+```python
+from google.adk.skills import MarkdownSkill
+
+# Load a skill (Stage 1 - metadata only)
+skill = MarkdownSkill.from_directory("/path/to/pdf-processing")
+
+# Access metadata
+print(skill.name) # "pdf-processing"
+print(skill.description) # "Extract text from PDFs..."
+
+# Stage 2: Full instructions
+instructions = skill.get_instructions()
+
+# Stage 3: Access scripts/references
+script = skill.get_script("extract_text.py")
+reference = skill.get_reference("FORMS.md")
+```
+
+#### AgentSkillLoader
+
+Discovers and loads skills from directories:
+
+```python
+from google.adk.skills import AgentSkillLoader
+
+loader = AgentSkillLoader()
+
+# Discover skills from directories
+loader.add_skill_directory("/path/to/skills")
+loader.add_skill_directory("/path/to/custom-skills")
+
+# Get discovered skills
+skills = loader.get_all_skills()
+print(loader.get_skill_names()) # ['pdf-processing', 'data-analysis', ...]
+
+# Generate discovery prompt for LLM
+prompt = loader.generate_discovery_prompt()
+```
+
+#### ScriptExecutor
+
+Safely executes bundled scripts with sandboxing:
+
+```python
+from pathlib import Path
+
+from google.adk.skills import ScriptExecutor
+
+executor = ScriptExecutor(
+ timeout_seconds=30.0,
+ allow_network=False,
+)
+
+result = await executor.execute_script(
+ script_path=Path("/path/to/skill/scripts/extract.py"),
+ args=["--input", "file.pdf"],
+ working_dir=Path("/tmp/workspace"),
+)
+
+if result.success:
+ print(result.stdout)
+else:
+ print(f"Error: {result.stderr}")
+```
+
+#### SkillTool
+
+Wraps skills as ADK tools for LLM invocation:
+
+```python
+from google.adk.skills import SkillTool, MarkdownSkill
+
+skill = MarkdownSkill.from_directory("/path/to/pdf-processing")
+tool = SkillTool(skill)
+
+# LLM can invoke with these actions:
+# - {"action": "activate"} → Returns full instructions
+# - {"action": "run_script", "script": "extract.py", "args": [...]}
+# - {"action": "load_reference", "reference": "FORMS.md"}
+```
+
+### Module Exports
+
+```python
+from google.adk.skills import (
+ # Core abstractions
+ BaseSkill,
+ SkillConfig,
+ SkillsManager,
+ SkillInvocationResult,
+ # Agent Skills standard support
+ MarkdownSkill,
+ MarkdownSkillMetadata,
+ AgentSkillLoader,
+ # Execution
+ ScriptExecutor,
+ ScriptExecutionResult,
+ ScriptExecutionError,
+ # Tool integration
+ SkillTool,
+ create_skill_tools,
+ # Path to bundled skills
+ SKILLS_DIR,
+)
+```
+
+---
+
+## Usage Examples
+
+### Basic Usage with LlmAgent
+
+```python
+from google.adk.agents import LlmAgent
+from google.adk.skills import AgentSkillLoader, SkillTool, SkillsManager
+
+# Load Agent Skills standard skills
+loader = AgentSkillLoader()
+loader.add_skill_directory("./skills")
+
+# Create skills manager and register
+skills_manager = SkillsManager()
+loader.register_all(skills_manager)
+
+# Convert skills to tools for LLM
+skill_tools = [
+ SkillTool(skill) for skill in skills_manager.get_all_skills()
+]
+
+# Create agent with skills
+agent = LlmAgent(
+ name="skilled_agent",
+ model="gemini-2.0-flash",
+ instruction=f"""You are a helpful assistant with access to skills.
+
+{loader.generate_discovery_prompt()}
+
+To use a skill:
+1. First activate it to get full instructions
+2. Then use run_script or load_reference as needed
+""",
+ tools=skill_tools,
+)
+```
+
+### Using Built-in Skills
+
+```python
+from google.adk.skills import AgentSkillLoader, SkillTool, SKILLS_DIR
+
+# Load built-in skills (bqml, bq-ai-operator, bigquery-admin)
+loader = AgentSkillLoader()
+loader.add_skill_directory(SKILLS_DIR)
+
+print(f"Loaded skills: {loader.get_skill_names()}")
+# Output: ['bigquery-admin', 'bq-ai-operator', 'bqml']
+
+# Create tools
+skill_tools = [SkillTool(s) for s in loader.get_all_skills()]
+```
+
+### Using Anthropic Skills
+
+```python
+# Clone the Anthropic skills repository first:
+# git clone https://github.com/anthropics/skills ./anthropic-skills
+
+from google.adk.skills import AgentSkillLoader, SkillTool
+
+# Load skills from Anthropic's repository
+loader = AgentSkillLoader()
+loader.add_skill_directory("./anthropic-skills/skills")
+
+# Check what was loaded
+print(f"Discovered {len(loader.get_skill_names())} skills:")
+for name in loader.get_skill_names():
+ skill = loader.get_skill(name)
+ print(f" - {name}: {skill.description[:50]}...")
+```
+
+### Complete Demo Agent
+
+See `contributing/samples/agent_skills_demo/` for a complete example that:
+- Loads built-in BQML and BQ AI Operator skills
+- Integrates with BigQuery toolset
+- Demonstrates progressive disclosure in action
+
+---
+
+## Security Considerations
+
+Script execution introduces security risks. The ADK Skills module implements multiple safeguards:
+
+### ScriptExecutor Security Features
+
+| Feature | Description |
+|---------|-------------|
+| **Execution Timeout** | Configurable timeout prevents runaway scripts |
+| **Working Directory Isolation** | Scripts run in specified directories |
+| **Environment Filtering** | Only allowed environment variables passed through |
+| **Container Sandboxing** | Optional Docker isolation for untrusted scripts |
+| **Memory Limits** | Configurable memory limits for containerized execution |
+| **Network Isolation** | Optional network access restriction |
+
+### Best Practices
+
+| Measure | Description |
+|---------|-------------|
+| **Sandboxing** | Use `use_container=True` for untrusted skills |
+| **Allowlisting** | Only execute scripts from trusted skills |
+| **Confirmation** | Ask users before running potentially dangerous operations |
+| **Logging** | Record all script executions for auditing |
+| **Source Verification** | Install skills only from trusted sources |
+| **Audit** | Review bundled files, dependencies, and external network connections |
+
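+A hardened configuration might look like the sketch below. Only `timeout_seconds` and `allow_network` appear in the constructor example earlier in this document; `use_container` and `memory_limit_mb` are assumed from the feature table above and may differ in the actual API:
+
+```python
+from google.adk.skills import ScriptExecutor
+
+# Restrictive settings for skills pulled from external repositories.
+executor = ScriptExecutor(
+    timeout_seconds=30.0,   # kill runaway scripts
+    allow_network=False,    # block outbound network access
+    use_container=True,     # assumed: Docker isolation per the feature table
+    memory_limit_mb=256,    # assumed: memory cap for containerized execution
+)
+```
+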
+---
+
+## Best Practices
+
+### Writing Effective Skills
+
+| Guideline | Description |
+|-----------|-------------|
+| **Clear Descriptions** | Write descriptions that help agents determine when to activate the skill |
+| **Concise Instructions** | Keep the main SKILL.md focused; use references for detailed content |
+| **Concrete Examples** | Include input/output examples to demonstrate expected behavior |
+| **Handle Edge Cases** | Document known limitations and workarounds |
+| **Self-Contained Scripts** | Scripts should document dependencies and include helpful error messages |
+| **Logical Organization** | Group related information and use clear section headers |
+
+### Context Efficiency
+
+- Load metadata at startup (~100 tokens per skill)
+- Keep main instructions under 5000 tokens
+- Split large reference materials into separate files
+- Use progressive disclosure - load details only when needed
+
+### SKILL.md Template
+
+```markdown
+---
+name: my-skill-name
+description: Clear description of what this skill does and when to use it.
+license: Apache-2.0
+compatibility: List any environment requirements
+metadata:
+ author: your-name
+ version: "1.0"
+---
+
+# Skill Title
+
+## When to use this skill
+Describe the situations where this skill should be activated.
+
+## Prerequisites
+List any required tools, packages, or access.
+
+## Instructions
+Step-by-step guide for performing the task.
+
+## Examples
+Concrete examples with inputs and outputs.
+
+## Common Edge Cases
+Known limitations and how to handle them.
+
+## Additional Resources
+- [Reference Guide](references/REFERENCE.md)
+- [Scripts](scripts/)
+```
+
+---
+
+## Resources
+
+### Official Agent Skills Documentation
+
+- **Agent Skills Website**: https://agentskills.io
+- **Specification**: https://agentskills.io/specification
+- **Integration Guide**: https://agentskills.io/integrate-skills
+
+### GitHub Repositories
+
+- **Agent Skills Framework**: https://github.com/agentskills/agentskills
+- **Example Skills (Anthropic)**: https://github.com/anthropics/skills
+- **Google ADK**: https://github.com/google/adk-python
+
+### ADK Skills Module Files
+
+```
+src/google/adk/skills/
+├── __init__.py # Module exports
+├── adk_skills.md # This documentation
+├── base_skill.py # BaseSkill abstract class
+├── skill_manager.py # SkillsManager registry
+├── markdown_skill.py # SKILL.md file loader
+├── agent_skill_loader.py # Discovery and loading
+├── script_executor.py # Script execution
+├── skill_tool.py # Tool wrapper
+├── bqml/ # Built-in BQML skill
+│ ├── SKILL.md
+│ ├── scripts/
+│ └── references/
+└── bq-ai-operator/ # Built-in BQ AI Operator skill
+ ├── SKILL.md
+ ├── scripts/
+ └── references/
+```
+
+### Related Documentation
+
+- **Anthropic Engineering Blog**: https://anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
+- **Claude Support - Skills**: https://support.claude.com/en/articles/12512176-what-are-skills
diff --git a/src/google/adk/skills/agent_skill_loader.py b/src/google/adk/skills/agent_skill_loader.py
new file mode 100644
index 0000000000..0c0c6de603
--- /dev/null
+++ b/src/google/adk/skills/agent_skill_loader.py
@@ -0,0 +1,412 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""AgentSkillLoader for discovering and loading Agent Skills standard skills.
+
+This module provides utilities for discovering skills from directories
+following the Agent Skills standard (https://agentskills.io).
+"""
+
+from __future__ import annotations
+
+import logging
+from pathlib import Path
+from typing import Dict
+from typing import List
+from typing import Optional
+from typing import TYPE_CHECKING
+from typing import Union
+
+from ..utils.feature_decorator import experimental
+from .markdown_skill import MarkdownSkill
+
+if TYPE_CHECKING:
+ from .skill_manager import SkillsManager
+
+logger = logging.getLogger("google_adk." + __name__)
+
+
+@experimental
+class AgentSkillLoader:
+ """Discovers and loads Agent Skills standard skills.
+
+ Implements progressive disclosure by loading only metadata initially,
+ with full content loaded on-demand when skills are activated.
+
+ This loader scans directories for folders containing SKILL.md files
+ and creates MarkdownSkill instances that can be registered with
+ a SkillsManager.
+
+ Example:
+ ```python
+ loader = AgentSkillLoader()
+
+ # Discover skills from multiple directories
+ loader.add_skill_directory("/path/to/skills")
+ loader.add_skill_directory("/path/to/custom-skills")
+
+ # Get all discovered skills (Stage 1 - metadata only)
+ skills = loader.get_all_skills()
+
+ # Register with SkillsManager
+ from google.adk.skills import SkillsManager
+ manager = SkillsManager()
+ loader.register_all(manager)
+
+ # Generate discovery prompt for LLM
+ prompt = loader.generate_discovery_prompt()
+ ```
+ """
+
+ def __init__(self, validate_names: bool = True):
+ """Initialize the AgentSkillLoader.
+
+ Args:
+ validate_names: Whether to validate that skill names match
+ their directory names (recommended for consistency).
+ """
+ self._skill_directories: List[Path] = []
+ self._discovered_skills: Dict[str, MarkdownSkill] = {}
+ self._load_errors: Dict[str, str] = {}
+ self._validate_names = validate_names
+
+ def add_skill_directory(self, path: Union[str, Path]) -> int:
+ """Add a directory to scan for skills.
+
+ Scans the directory for subdirectories containing SKILL.md files
+ and loads them as MarkdownSkill instances (Stage 1 only).
+
+ Args:
+ path: Directory containing skill folders.
+
+ Returns:
+ Number of skills discovered in this directory.
+
+ Raises:
+ FileNotFoundError: If directory doesn't exist.
+ ValueError: If path is not a directory.
+ """
+ dir_path = Path(path).resolve()
+
+ if not dir_path.exists():
+ raise FileNotFoundError(f"Skill directory not found: {path}")
+
+ if not dir_path.is_dir():
+ raise ValueError(f"Path is not a directory: {path}")
+
+ self._skill_directories.append(dir_path)
+ return self._discover_skills_in_directory(dir_path)
+
+ def add_skill(self, skill_dir: Union[str, Path]) -> bool:
+ """Add a single skill from a directory.
+
+ Args:
+ skill_dir: Path to a single skill directory containing SKILL.md.
+
+ Returns:
+ True if skill was loaded successfully, False otherwise.
+ """
+ skill_path = Path(skill_dir).resolve()
+
+ if not skill_path.is_dir():
+ self._load_errors[str(skill_path)] = "Not a directory"
+ return False
+
+ skill_md = skill_path / "SKILL.md"
+ if not skill_md.exists():
+ self._load_errors[str(skill_path)] = "SKILL.md not found"
+ return False
+
+ try:
+ skill = MarkdownSkill.from_directory(
+ skill_path, validate_name=self._validate_names
+ )
+ self._discovered_skills[skill.name] = skill
+ logger.info("Loaded skill: %s from %s", skill.name, skill_path)
+ return True
+ except Exception as e:
+ self._load_errors[str(skill_path)] = str(e)
+ logger.warning("Failed to load skill from %s: %s", skill_path, e)
+ return False
+
+ def _discover_skills_in_directory(self, dir_path: Path) -> int:
+ """Discover all skills in a directory.
+
+ Args:
+ dir_path: Directory to scan.
+
+ Returns:
+ Number of skills discovered.
+ """
+ count = 0
+
+ for item in sorted(dir_path.iterdir()):
+ if not item.is_dir():
+ continue
+
+ # Skip hidden directories
+ if item.name.startswith("."):
+ continue
+
+ skill_md = item / "SKILL.md"
+ if not skill_md.exists():
+ continue
+
+ try:
+ skill = MarkdownSkill.from_directory(
+ item, validate_name=self._validate_names
+ )
+
+ # Check for duplicate names
+ if skill.name in self._discovered_skills:
+ existing_path = self._discovered_skills[skill.name].skill_path
+ logger.warning(
+ "Duplicate skill name '%s': %s shadows %s",
+ skill.name,
+ item,
+ existing_path,
+ )
+
+ self._discovered_skills[skill.name] = skill
+ count += 1
+ logger.info("Discovered skill: %s", skill.name)
+
+ except Exception as e:
+ self._load_errors[str(item)] = str(e)
+ logger.warning("Failed to load skill from %s: %s", item, e)
+
+ return count
+
+ def get_skill(self, name: str) -> Optional[MarkdownSkill]:
+ """Get a discovered skill by name.
+
+ Args:
+ name: The skill name.
+
+ Returns:
+ MarkdownSkill instance or None if not found.
+ """
+ return self._discovered_skills.get(name)
+
+ def get_all_skills(self) -> List[MarkdownSkill]:
+ """Get all discovered skills.
+
+ Returns:
+ List of all MarkdownSkill instances.
+ """
+ return list(self._discovered_skills.values())
+
+ def get_skill_names(self) -> List[str]:
+ """Get names of all discovered skills.
+
+ Returns:
+ Sorted list of skill names.
+ """
+ return sorted(self._discovered_skills.keys())
+
+ def get_load_errors(self) -> Dict[str, str]:
+ """Get any errors encountered during discovery.
+
+ Returns:
+ Dictionary mapping paths to error messages.
+ """
+ return self._load_errors.copy()
+
+ def has_skill(self, name: str) -> bool:
+ """Check if a skill is loaded.
+
+ Args:
+ name: The skill name.
+
+ Returns:
+ True if skill is loaded.
+ """
+ return name in self._discovered_skills
+
+ def clear(self) -> None:
+ """Clear all discovered skills and errors."""
+ self._discovered_skills.clear()
+ self._load_errors.clear()
+ self._skill_directories.clear()
+
+ def register_all(self, manager: "SkillsManager") -> int:
+ """Register all discovered skills with a SkillsManager.
+
+ Args:
+ manager: The SkillsManager to register skills with.
+
+ Returns:
+ Number of skills successfully registered.
+ """
+ count = 0
+ for skill in self._discovered_skills.values():
+ try:
+ manager.register_skill(skill)
+ count += 1
+ except ValueError as e:
+ logger.warning("Failed to register skill %s: %s", skill.name, e)
+
+ return count
+
+ def generate_discovery_prompt(self, include_resources: bool = True) -> str:
+ """Generate XML prompt with skill metadata for LLM discovery.
+
+ This implements Stage 1 of progressive disclosure - only
+ name and description are included, keeping context minimal
+ (~100 tokens per skill).
+
+ Args:
+ include_resources: Whether to include has_scripts/has_references hints.
+
+ Returns:
+ XML-formatted string with available skills.
+ """
+ if not self._discovered_skills:
+ return ""
+
+ skills_xml = []
+ for skill in sorted(self._discovered_skills.values(), key=lambda s: s.name):
+ lines = [
+          "  <skill>",
+          f"    <name>{self._escape_xml(skill.name)}</name>",
+          (
+              "    <description>"
+              f"{self._escape_xml(skill.description)}</description>"
+          ),
+      ]
+
+      if include_resources:
+        lines.append(f"    <has_scripts>{skill.has_scripts()}</has_scripts>")
+        lines.append(
+            f"    <has_references>{skill.has_references()}</has_references>"
+        )
+
+      lines.append("  </skill>")
+      skills_xml.append("\n".join(lines))
+
+    return (
+        "<available_skills>\n"
+        + "\n".join(skills_xml)
+        + "\n</available_skills>"
+    )
+
+ def generate_activation_prompt(self, skill_name: str) -> Optional[str]:
+ """Generate full skill prompt for activation (Stage 2).
+
+ Loads the full SKILL.md instructions and includes information
+ about available resources.
+
+ Args:
+ skill_name: Name of the skill to activate.
+
+ Returns:
+ Full skill instructions or None if skill not found.
+ """
+ skill = self._discovered_skills.get(skill_name)
+ if not skill:
+ return None
+
+ instructions = skill.get_instructions()
+ sections = [f"# Skill: {skill.name}", "", instructions]
+
+ # Add resources section
+ resources = []
+ scripts = skill.list_scripts()
+ if scripts:
+ resources.append(f"**Available scripts:** {', '.join(scripts)}")
+
+ refs = skill.list_references()
+ if refs:
+ resources.append(f"**Available references:** {', '.join(refs)}")
+
+ assets = skill.list_assets()
+ if assets:
+ shown = assets[:10]
+ if len(assets) > 10:
+ shown.append(f"... and {len(assets) - 10} more")
+ resources.append(f"**Available assets:** {', '.join(shown)}")
+
+ if resources:
+ sections.append("")
+ sections.append("## Available Resources")
+ sections.extend(f"- {r}" for r in resources)
+
+ # Add compatibility info
+ if skill.skill_metadata.compatibility:
+ sections.append("")
+ sections.append("## Requirements")
+ sections.append(skill.skill_metadata.compatibility)
+
+ return "\n".join(sections)
+
+ def generate_summary(self) -> str:
+ """Generate a human-readable summary of loaded skills.
+
+ Returns:
+ Formatted summary string.
+ """
+ lines = [
+        "Agent Skills Loader Summary",
+        "=" * 40,
+ f"Directories scanned: {len(self._skill_directories)}",
+ f"Skills discovered: {len(self._discovered_skills)}",
+ f"Load errors: {len(self._load_errors)}",
+ "",
+ ]
+
+ if self._discovered_skills:
+ lines.append("Discovered Skills:")
+ for name in sorted(self._discovered_skills.keys()):
+ skill = self._discovered_skills[name]
+ desc = (
+ skill.description[:50] + "..."
+ if len(skill.description) > 50
+ else skill.description
+ )
+ lines.append(f" - {name}: {desc}")
+
+ if self._load_errors:
+ lines.append("")
+ lines.append("Load Errors:")
+ for path, error in self._load_errors.items():
+ lines.append(f" - {path}: {error}")
+
+ return "\n".join(lines)
+
+ @staticmethod
+ def _escape_xml(text: str) -> str:
+ """Escape special XML characters.
+
+ Args:
+ text: Text to escape.
+
+ Returns:
+ XML-safe string.
+ """
+    return (
+        text.replace("&", "&amp;")
+        .replace("<", "&lt;")
+        .replace(">", "&gt;")
+        .replace('"', "&quot;")
+        .replace("'", "&apos;")
+    )
+
+ def __len__(self) -> int:
+ """Return number of discovered skills."""
+ return len(self._discovered_skills)
+
+ def __contains__(self, name: str) -> bool:
+ """Check if a skill name is loaded."""
+ return name in self._discovered_skills
+
+ def __iter__(self):
+ """Iterate over discovered skills."""
+ return iter(self._discovered_skills.values())
diff --git a/src/google/adk/skills/base_skill.py b/src/google/adk/skills/base_skill.py
new file mode 100644
index 0000000000..18deee5493
--- /dev/null
+++ b/src/google/adk/skills/base_skill.py
@@ -0,0 +1,188 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Base skill class for ADK skills.
+
+Skills bundle related tools with orchestration logic for efficient
+programmatic tool calling (PTC).
+"""
+
+from __future__ import annotations
+
+from abc import ABC
+from abc import abstractmethod
+from typing import Any
+from typing import List
+from typing import Optional
+
+from pydantic import BaseModel
+from pydantic import ConfigDict
+from pydantic import Field
+
+from ..utils.feature_decorator import experimental
+
+
+class SkillConfig(BaseModel):
+ """Configuration for skill execution."""
+
+ model_config = ConfigDict(extra="forbid")
+
+ max_parallel_calls: int = Field(
+ default=10,
+ description="Maximum concurrent tool invocations in orchestration code.",
+ )
+ timeout_seconds: float = Field(
+ default=60.0,
+ description="Execution timeout for the skill.",
+ )
+ allow_network: bool = Field(
+ default=False,
+ description="Whether to allow network access in sandbox execution.",
+ )
+ memory_limit_mb: int = Field(
+ default=256,
+ description="Memory limit for execution in megabytes.",
+ )
+
+
+@experimental
+class BaseSkill(ABC, BaseModel):
+ """Abstract base class for all skills.
+
+ Skills bundle related tools with orchestration logic for efficient
+ programmatic tool calling (PTC). A skill provides:
+
+ - A set of related tools that work together
+ - Semantic grouping with descriptions for LLM understanding
+ - Orchestration templates showing how to use the tools
+ - Result filtering to reduce context size
+ - Configuration for execution limits
+
+ Example:
+ ```python
+ class DatabaseSkill(BaseSkill):
+ name = "database"
+ description = "Query and manage database records"
+
+ def get_tool_declarations(self):
+ return [
+ {"name": "query", "description": "Execute SQL query"},
+ {"name": "insert", "description": "Insert records"},
+ ]
+
+ def get_orchestration_template(self):
+ return '''
+ async def db_operation(tools):
+ results = await tools.query(sql="SELECT * FROM users LIMIT 10")
+ return {"users": results, "count": len(results)}
+ '''
+
+ def filter_result(self, result):
+ # Remove sensitive fields
+ for user in result.get("users", []):
+ user.pop("password_hash", None)
+ return result
+ ```
+ """
+
+ model_config = ConfigDict(extra="forbid", arbitrary_types_allowed=True)
+
+ name: str = Field(
+ description="Unique identifier for the skill.",
+ )
+ description: str = Field(
+ description="LLM-readable description of what this skill does.",
+ )
+ config: SkillConfig = Field(
+ default_factory=SkillConfig,
+ description="Execution configuration for this skill.",
+ )
+ allowed_callers: List[str] = Field(
+ default_factory=lambda: ["code_execution_20250825"],
+ description=(
+ "Who can invoke this skill programmatically. "
+ "Set to ['code_execution_20250825'] to enable PTC."
+ ),
+ )
+
+ @abstractmethod
+ def get_orchestration_template(self) -> str:
+ """Return example orchestration code for this skill.
+
+ The template should demonstrate how to use the skill's tools
+ effectively. This helps the LLM understand usage patterns.
+
+ Returns:
+ A Python async function as a string that shows example usage.
+ """
+
+ @abstractmethod
+ def get_tool_declarations(self) -> List[dict[str, Any]]:
+ """Return tool declarations for LLM code generation.
+
+ Each declaration should include at minimum:
+ - name: The tool name
+ - description: What the tool does
+
+ Optionally include:
+ - parameters: Dict of parameter names to descriptions
+
+ Returns:
+ List of tool declaration dictionaries.
+ """
+
+ def filter_result(self, result: Any) -> Any:
+ """Filter results before returning to LLM context.
+
+ Override this method to implement custom filtering logic.
+ This is useful for:
+ - Removing sensitive data (passwords, tokens)
+ - Truncating large responses
+ - Summarizing verbose output
+
+ Args:
+ result: The raw result from skill execution.
+
+ Returns:
+ The filtered result to be added to LLM context.
+ """
+ return result
+
+ def get_skill_prompt(self) -> str:
+ """Generate a prompt describing this skill for the LLM.
+
+ Returns:
+ A formatted string describing the skill and its tools.
+ """
+ tool_list = "\n".join(
+ f" - {t['name']}: {t.get('description', '')}"
+ for t in self.get_tool_declarations()
+ )
+ return (
+ f"Skill: {self.name}\n"
+ f"Description: {self.description}\n\n"
+ f"Available tools:\n{tool_list}\n\n"
+ f"Example orchestration:\n{self.get_orchestration_template()}"
+ )
+
+ def is_programmatically_callable(self) -> bool:
+ """Check if this skill can be called programmatically via PTC.
+
+ Returns:
+ True if the skill is enabled for programmatic tool calling.
+ """
+ return (
+ self.allowed_callers is not None
+ and "code_execution_20250825" in self.allowed_callers
+ )
diff --git a/src/google/adk/skills/bigquery-admin/SKILL.md b/src/google/adk/skills/bigquery-admin/SKILL.md
new file mode 100644
index 0000000000..e0cdab3365
--- /dev/null
+++ b/src/google/adk/skills/bigquery-admin/SKILL.md
@@ -0,0 +1,523 @@
+---
+name: bigquery-admin
+description: Administer BigQuery resources - slot reservations, BI Engine, job management, monitoring, quotas, and cost optimization. Use when managing BigQuery capacity, monitoring performance, or optimizing costs.
+license: Apache-2.0
+compatibility: BigQuery, Cloud Monitoring
+metadata:
+ author: Google Cloud
+ version: "1.0"
+ category: administration
+adk:
+ config:
+ timeout_seconds: 300
+ max_parallel_calls: 5
+ allowed_callers:
+ - bigquery_agent
+ - admin_agent
+ - finops_agent
+---
+
+# BigQuery Admin Skill
+
+Administer BigQuery resources including slot reservations, BI Engine, job management, monitoring, quotas, and cost optimization.
+
+## When to Use This Skill
+
+Use this skill when you need to:
+- Manage slot reservations and capacity
+- Configure BI Engine for acceleration
+- Monitor and manage running jobs
+- Set up quotas and cost controls
+- Analyze query performance and costs
+- Troubleshoot performance issues
+
+## Administration Features
+
+| Feature | Description | Use Case |
+|---------|-------------|----------|
+| **Reservations** | Dedicated compute capacity | Predictable workloads |
+| **BI Engine** | In-memory acceleration | Dashboard queries |
+| **Jobs** | Query execution management | Monitoring, cancellation |
+| **Quotas** | Usage limits | Cost control |
+| **Monitoring** | Performance metrics | Optimization |
+
+## Quick Start
+
+### 1. View Running Jobs
+
+```sql
+SELECT
+ job_id,
+ user_email,
+ state,
+ total_bytes_processed,
+ TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), creation_time, SECOND) AS running_seconds
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE state = 'RUNNING'
+ORDER BY creation_time;
+```
+
+### 2. Check Slot Usage
+
+```sql
+SELECT
+ TIMESTAMP_TRUNC(period_start, HOUR) AS hour,
+ AVG(period_slot_ms) / 1000 / 60 AS avg_slot_minutes
+FROM `region-us.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT`
+WHERE period_start > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
+GROUP BY 1
+ORDER BY 1;
+```
+
+### 3. Create Reservation
+
+```sql
+-- Using BigQuery Reservation API (not SQL)
+-- See Reservations section below
+```
+
+## Slot Reservations
+
+### Pricing Models
+
+| Model | Description | Best For |
+|-------|-------------|----------|
+| **On-demand** | Pay per TB scanned | Variable workloads |
+| **Editions** | Committed slots (Standard/Enterprise/Enterprise Plus) | Predictable workloads |
+| **Autoscaling** | Automatic slot scaling | Variable with baseline |
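+
+To make the trade-off concrete, the sketch below compares recent on-demand spend against a rough slot-based estimate, using the same `INFORMATION_SCHEMA` views used throughout this skill. It is a minimal illustration, not an official calculator: the project name and the two price constants are placeholders to replace with your region's current rates.
+
+```python
+from google.cloud import bigquery
+
+ON_DEMAND_USD_PER_TB = 5.0  # assumed on-demand rate; verify current pricing
+SLOT_USD_PER_HOUR = 0.06    # assumed pay-as-you-go slot rate; verify current pricing
+
+client = bigquery.Client(project="my-project")  # hypothetical project
+sql = """
+SELECT
+  SUM(total_bytes_billed) / 1e12 AS tb_billed,
+  SUM(total_slot_ms) / 1000 / 60 / 60 AS slot_hours
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
+"""
+row = next(iter(client.query(sql).result()))
+print(f"On-demand estimate:  ${(row.tb_billed or 0) * ON_DEMAND_USD_PER_TB:,.2f}")
+print(f"Slot-based estimate: ${(row.slot_hours or 0) * SLOT_USD_PER_HOUR:,.2f}")
+```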
+
+### Create Reservation (API/gcloud)
+
+```bash
+# Create a reservation
+gcloud bq reservations create my-reservation \
+ --project=PROJECT_ID \
+ --location=US \
+ --slots=500 \
+ --edition=ENTERPRISE
+
+# Create an assignment
+gcloud bq reservations assignments create \
+ --project=PROJECT_ID \
+ --location=US \
+ --reservation=my-reservation \
+ --assignee=projects/PROJECT_ID \
+ --job-type=QUERY
+```
+
+### View Reservations
+
+```sql
+SELECT
+ reservation_name,
+ slot_capacity,
+ target_job_concurrency
+FROM `region-us.INFORMATION_SCHEMA.RESERVATIONS`;
+```
+
+### View Assignments
+
+```sql
+SELECT
+ reservation_name,
+ assignment_name,
+ assignee_id,
+ job_type
+FROM `region-us.INFORMATION_SCHEMA.ASSIGNMENTS`;
+```
+
+### Autoscaling Configuration
+
+```bash
+# Enable autoscaling
+gcloud bq reservations update my-reservation \
+ --location=US \
+ --autoscale-max-slots=1000
+```
+
+## BI Engine
+
+### Enable BI Engine
+
+```bash
+# Create BI Engine reservation
+gcloud bq reservations create bi-engine-reservation \
+ --project=PROJECT_ID \
+ --location=US \
+ --bi-reservation-size=100 # GB of RAM
+```
+
+### Preferred Tables
+
+```bash
+# Configure preferred tables for BI Engine
+gcloud bq reservations update bi-engine-reservation \
+ --location=US \
+ --preferred-tables="project.dataset.table1,project.dataset.table2"
+```
+
+### Check BI Engine Status
+
+```sql
+SELECT
+ project_id,
+ bi_engine_mode,
+ bi_engine_reasons
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
+ AND bi_engine_mode IS NOT NULL;
+```
+
+### BI Engine Statistics
+
+```sql
+SELECT
+ COUNT(*) AS total_queries,
+ COUNTIF(bi_engine_mode = 'FULL') AS full_acceleration,
+ COUNTIF(bi_engine_mode = 'PARTIAL') AS partial_acceleration,
+ COUNTIF(bi_engine_mode = 'DISABLED') AS no_acceleration
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
+ AND statement_type = 'SELECT';
+```
+
+## Job Management
+
+### List Running Jobs
+
+```sql
+SELECT
+ job_id,
+ user_email,
+ creation_time,
+ state,
+ ROUND(total_bytes_processed / 1e9, 2) AS gb_processed,
+ query
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE state = 'RUNNING'
+ORDER BY creation_time DESC;
+```
+
+### Job History
+
+```sql
+SELECT
+ job_id,
+ user_email,
+ creation_time,
+ end_time,
+ state,
+ ROUND(total_bytes_billed / 1e9, 2) AS gb_billed,
+ ROUND(total_slot_ms / 1000 / 60, 2) AS slot_minutes,
+ error_result.message AS error_message
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
+ORDER BY creation_time DESC
+LIMIT 100;
+```
+
+### Cancel Job
+
+```sql
+-- Cancel by job ID
+CALL BQ.JOBS.CANCEL('project:US.job_id_here');
+```
+
+```python
+# Using Python client
+from google.cloud import bigquery
+
+client = bigquery.Client()
+client.cancel_job("job_id", location="US")
+```
+
+### Job Performance Analysis
+
+```sql
+SELECT
+ job_id,
+ user_email,
+ ROUND(total_bytes_processed / 1e9, 2) AS gb_processed,
+ ROUND(total_slot_ms / 1000 / 60, 2) AS slot_minutes,
+ TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_seconds,
+ cache_hit,
+ ARRAY_LENGTH(referenced_tables) AS tables_referenced
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
+ AND state = 'DONE'
+ AND total_bytes_processed > 1e9 -- > 1 GB
+ORDER BY total_bytes_processed DESC
+LIMIT 50;
+```
+
+## Monitoring
+
+### Slot Utilization
+
+```sql
+WITH slot_usage AS (
+ SELECT
+ TIMESTAMP_TRUNC(period_start, MINUTE) AS minute,
+ SUM(period_slot_ms) / 60000 AS slot_minutes
+ FROM `region-us.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT`
+ WHERE period_start > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
+ GROUP BY 1
+)
+SELECT
+ minute,
+ slot_minutes,
+ AVG(slot_minutes) OVER (
+ ORDER BY minute
+ ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
+ ) AS avg_5min
+FROM slot_usage
+ORDER BY minute;
+```
+
+### Query Volume
+
+```sql
+SELECT
+ TIMESTAMP_TRUNC(creation_time, HOUR) AS hour,
+ COUNT(*) AS query_count,
+ COUNT(DISTINCT user_email) AS unique_users,
+ SUM(total_bytes_billed) / 1e12 AS tb_billed,
+ SUM(total_slot_ms) / 1000 / 60 / 60 AS slot_hours
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
+ AND statement_type = 'SELECT'
+GROUP BY 1
+ORDER BY 1;
+```
+
+### Error Analysis
+
+```sql
+SELECT
+ error_result.reason AS error_reason,
+ error_result.message AS error_message,
+ COUNT(*) AS occurrence_count,
+ COUNT(DISTINCT user_email) AS affected_users
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
+ AND error_result IS NOT NULL
+GROUP BY 1, 2
+ORDER BY occurrence_count DESC;
+```
+
+### User Activity
+
+```sql
+SELECT
+ user_email,
+ COUNT(*) AS query_count,
+ SUM(total_bytes_billed) / 1e9 AS gb_billed,
+ SUM(total_slot_ms) / 1000 / 60 AS slot_minutes,
+ ROUND(SUM(total_bytes_billed) / 1e12 * 5, 2) AS estimated_cost_usd
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
+ AND statement_type = 'SELECT'
+GROUP BY user_email
+ORDER BY gb_billed DESC
+LIMIT 20;
+```
+
+## Quotas and Limits
+
+### View Current Quotas
+
+```sql
+-- Quotas are managed via Cloud Console or gcloud
+-- Common limits:
+-- - On-demand: 2000 concurrent queries per project
+-- - Reservations: Based on slot allocation
+-- - Streaming: 1 GB/s per table
+-- - Load jobs: 1000/table/day
+```
+
+### Set Custom Quotas
+
+```bash
+# Set query bytes limit per user per day
+gcloud projects set-quota bigquery.googleapis.com/Query_Usage_per_day \
+ --project=PROJECT_ID \
+ --consumer-quota-limit=10737418240 # 10 TB
+```
+
+### Query Cost Control
+
+```sql
+-- Set maximum bytes billed for a query
+-- In query settings or job configuration:
+-- maximum_bytes_billed: 10737418240 -- 10 GB
+```
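+
+For programmatic control, a minimal Python sketch using the `google-cloud-bigquery` client is shown below (the project and query are placeholders): it dry-runs the query to estimate the scan, then executes it with a hard byte cap.
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client(project="my-project")  # hypothetical project
+sql = "SELECT * FROM `project.dataset.events` WHERE event_date = CURRENT_DATE()"
+
+# Dry run: estimate bytes scanned without executing (no cost).
+dry_cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
+estimate = client.query(sql, job_config=dry_cfg)
+print(f"Estimated scan: {estimate.total_bytes_processed / 1e9:.2f} GB")
+
+# Real run: the job fails fast if it would bill more than 10 GB.
+capped_cfg = bigquery.QueryJobConfig(maximum_bytes_billed=10 * 1024**3)
+rows = client.query(sql, job_config=capped_cfg).result()
+```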
+
+## Cost Optimization
+
+### Cost Analysis by User
+
+```sql
+SELECT
+ user_email,
+ COUNT(*) AS queries,
+ ROUND(SUM(total_bytes_billed) / 1e12, 4) AS tb_scanned,
+ ROUND(SUM(total_bytes_billed) / 1e12 * 5, 2) AS estimated_cost_usd
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
+GROUP BY user_email
+HAVING estimated_cost_usd > 10
+ORDER BY estimated_cost_usd DESC;
+```
+
+### Cost Analysis by Table
+
+```sql
+SELECT
+ CONCAT(ref.project_id, '.', ref.dataset_id, '.', ref.table_id) AS table_name,
+ COUNT(DISTINCT j.job_id) AS query_count,
+ SUM(j.total_bytes_billed) / 1e12 AS tb_scanned
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j,
+UNNEST(referenced_tables) AS ref
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
+GROUP BY 1
+ORDER BY tb_scanned DESC
+LIMIT 20;
+```
+
+### Identify Expensive Queries
+
+```sql
+SELECT
+ job_id,
+ user_email,
+ ROUND(total_bytes_billed / 1e12 * 5, 2) AS estimated_cost_usd,
+ ROUND(total_bytes_billed / 1e9, 2) AS gb_billed,
+ cache_hit,
+ query
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
+ AND statement_type = 'SELECT'
+ORDER BY total_bytes_billed DESC
+LIMIT 10;
+```
+
+### Optimization Recommendations
+
+```sql
+-- Find queries that could benefit from partitioning
+SELECT
+ job_id,
+ user_email,
+ ROUND(total_bytes_billed / 1e9, 2) AS gb_billed,
+ referenced_tables
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
+ AND total_bytes_billed > 10e9 -- > 10 GB
+ AND NOT EXISTS (
+ SELECT 1 FROM UNNEST(referenced_tables) t
+ WHERE t.table_id LIKE '%$%' -- Partition decorator
+ )
+ORDER BY total_bytes_billed DESC;
+```
+
+## Scheduled Queries
+
+### View Scheduled Queries
+
+```sql
+SELECT
+ name,
+ schedule,
+ state,
+ destination_dataset_id,
+ update_time
+FROM `region-us.INFORMATION_SCHEMA.SCHEDULED_QUERIES`
+ORDER BY update_time DESC;
+```
+
+### Create Scheduled Query
+
+```python
+from google.cloud import bigquery_datatransfer
+
+client = bigquery_datatransfer.DataTransferServiceClient()
+
+transfer_config = bigquery_datatransfer.TransferConfig(
+ destination_dataset_id="destination_dataset",
+ display_name="Daily Summary",
+ data_source_id="scheduled_query",
+ schedule="every 24 hours",
+ params={
+ "query": """
+ SELECT DATE(timestamp) AS date, COUNT(*) AS events
+ FROM `project.dataset.events`
+ WHERE DATE(timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
+ GROUP BY 1
+ """
+ }
+)
+
+client.create_transfer_config(
+ parent=f"projects/{project_id}/locations/{location}",
+ transfer_config=transfer_config
+)
+```
+
+## Performance Troubleshooting
+
+### Slow Query Analysis
+
+```sql
+SELECT
+ job_id,
+ query,
+ TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_sec,
+ total_bytes_processed / 1e9 AS gb_processed,
+ total_slot_ms / TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS avg_slots,
+ cache_hit
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
+ AND TIMESTAMP_DIFF(end_time, start_time, SECOND) > 60 -- > 1 minute
+ AND state = 'DONE'
+ORDER BY duration_sec DESC;
+```
+
+### Stage-Level Analysis
+
+```sql
+SELECT
+ job_id,
+ stage.name AS stage_name,
+ stage.status,
+ stage.records_read,
+ stage.records_written,
+ stage.shuffle_output_bytes / 1e9 AS shuffle_gb
+FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`,
+UNNEST(job_stages) AS stage
+WHERE job_id = 'your-job-id'
+ORDER BY stage.start_ms;
+```
+
+## References
+
+- `RESERVATIONS.md` - Detailed reservation management
+- `MONITORING.md` - Cloud Monitoring integration
+- `COST_OPTIMIZATION.md` - Cost reduction strategies
+
+## Scripts
+
+- `cost_report.py` - Generate cost analysis report
+- `slot_monitor.py` - Real-time slot monitoring
+- `job_killer.py` - Automated job cancellation
+
+## Limitations
+
+- INFORMATION_SCHEMA: 180-day retention
+- Reservations: Minimum 100 slots
+- BI Engine: Limited to specific regions
+- Quotas: Some can't be customized
diff --git a/src/google/adk/skills/bigquery-admin/scripts/cost_report.py b/src/google/adk/skills/bigquery-admin/scripts/cost_report.py
new file mode 100644
index 0000000000..3b7db022a4
--- /dev/null
+++ b/src/google/adk/skills/bigquery-admin/scripts/cost_report.py
@@ -0,0 +1,290 @@
+"""Generate BigQuery cost analysis report.
+
+This script analyzes query costs and provides recommendations
+for cost optimization.
+
+Usage:
+ python cost_report.py --project PROJECT
+ python cost_report.py --project PROJECT --days 30 --format json
+"""
+
+import argparse
+from datetime import datetime
+import json
+
+
+def get_cost_report(project: str, days: int = 30) -> dict:
+ """Generate cost analysis report."""
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=project)
+
+ # Query cost data from INFORMATION_SCHEMA
+ query = f"""
+ WITH job_costs AS (
+ SELECT
+ user_email,
+ DATE(creation_time) AS query_date,
+ job_id,
+ total_bytes_billed,
+ total_slot_ms,
+ cache_hit,
+ statement_type,
+ ARRAY_TO_STRING(
+ ARRAY(
+ SELECT CONCAT(t.project_id, '.', t.dataset_id, '.', t.table_id)
+ FROM UNNEST(referenced_tables) t
+ ),
+ ', '
+ ) AS tables_accessed
+ FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+ WHERE creation_time > TIMESTAMP_SUB(
+ CURRENT_TIMESTAMP(),
+ INTERVAL {days} DAY
+ )
+ AND statement_type = 'SELECT'
+ AND total_bytes_billed > 0
+ )
+ SELECT
+ user_email,
+ COUNT(*) AS query_count,
+ SUM(total_bytes_billed) AS total_bytes_billed,
+ SUM(total_slot_ms) AS total_slot_ms,
+ COUNTIF(cache_hit) AS cache_hits,
+ COUNTIF(NOT cache_hit) AS cache_misses
+ FROM job_costs
+ GROUP BY user_email
+ ORDER BY total_bytes_billed DESC
+ LIMIT 50
+ """
+
+ results = list(client.query(query).result())
+
+ # Calculate totals
+ total_bytes = sum(r.total_bytes_billed or 0 for r in results)
+ total_queries = sum(r.query_count or 0 for r in results)
+ total_cache_hits = sum(r.cache_hits or 0 for r in results)
+
+ # On-demand pricing: $5 per TB
+ estimated_cost = (total_bytes / 1e12) * 5
+
+ users = []
+ for row in results:
+ user_cost = (row.total_bytes_billed / 1e12) * 5
+ users.append({
+ "email": row.user_email,
+ "query_count": row.query_count,
+ "tb_scanned": round(row.total_bytes_billed / 1e12, 6),
+ "slot_hours": round(row.total_slot_ms / 1000 / 60 / 60, 2),
+ "cache_hit_rate": round(
+ row.cache_hits / (row.cache_hits + row.cache_misses) * 100
+ if (row.cache_hits + row.cache_misses) > 0
+ else 0,
+ 1,
+ ),
+ "estimated_cost_usd": round(user_cost, 2),
+ })
+
+ # Get expensive queries
+ expensive_query = f"""
+ SELECT
+ job_id,
+ user_email,
+ total_bytes_billed,
+ TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_sec,
+ query
+ FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+ WHERE creation_time > TIMESTAMP_SUB(
+ CURRENT_TIMESTAMP(),
+ INTERVAL {days} DAY
+ )
+ AND statement_type = 'SELECT'
+ ORDER BY total_bytes_billed DESC
+ LIMIT 10
+ """
+
+ expensive_results = list(client.query(expensive_query).result())
+ expensive_queries = []
+ for row in expensive_results:
+ expensive_queries.append({
+ "job_id": row.job_id,
+ "user": row.user_email,
+ "tb_scanned": round(row.total_bytes_billed / 1e12, 6),
+ "duration_sec": row.duration_sec,
+ "cost_usd": round((row.total_bytes_billed / 1e12) * 5, 2),
+ "query_preview": (
+ row.query[:200] + "..." if len(row.query) > 200 else row.query
+ ),
+ })
+
+ return {
+ "project": project,
+ "period_days": days,
+ "generated_at": datetime.utcnow().isoformat(),
+ "summary": {
+ "total_queries": total_queries,
+ "total_tb_scanned": round(total_bytes / 1e12, 4),
+ "estimated_cost_usd": round(estimated_cost, 2),
+ "unique_users": len(users),
+ "cache_hit_rate": round(
+ total_cache_hits / total_queries * 100
+ if total_queries > 0
+ else 0,
+ 1,
+ ),
+ },
+ "users": users,
+ "expensive_queries": expensive_queries,
+ }
+ except Exception as e:
+ return {"error": str(e)}
+
+
+def generate_recommendations(report: dict) -> list:
+ """Generate cost optimization recommendations."""
+ recommendations = []
+
+ summary = report.get("summary", {})
+ users = report.get("users", [])
+
+ # Check overall cache hit rate
+ cache_rate = summary.get("cache_hit_rate", 0)
+ if cache_rate < 50:
+ recommendations.append({
+ "severity": "HIGH",
+ "area": "Caching",
+ "recommendation": (
+ f"Cache hit rate is only {cache_rate}%. Consider using "
+ "deterministic queries and avoiding CURRENT_TIMESTAMP() "
+ "to improve cache utilization."
+ ),
+ })
+
+ # Check for high-cost users
+ total_cost = summary.get("estimated_cost_usd", 0)
+ for user in users[:5]:
+ if user["estimated_cost_usd"] > total_cost * 0.3:
+ recommendations.append({
+ "severity": "MEDIUM",
+ "area": "User Cost",
+ "recommendation": (
+ f"User {user['email']} accounts for"
+ f" ${user['estimated_cost_usd']:.2f} ({user['estimated_cost_usd'] / total_cost * 100:.0f}%"
+ " of total). Review their query patterns."
+ ),
+ })
+
+ # Check expensive queries
+ for query in report.get("expensive_queries", [])[:3]:
+ if query["cost_usd"] > 1:
+ recommendations.append({
+ "severity": "MEDIUM",
+ "area": "Expensive Query",
+ "recommendation": (
+ f"Query {query['job_id'][:20]}... scanned "
+ f"{query['tb_scanned']:.4f} TB (${query['cost_usd']:.2f}). "
+ "Consider partitioning or filtering."
+ ),
+ })
+
+ # General recommendations
+ if total_cost > 100:
+ recommendations.append({
+ "severity": "INFO",
+ "area": "Reservations",
+ "recommendation": (
+ f"With ${total_cost:.2f}/month spend, consider "
+ "slot reservations for predictable costs."
+ ),
+ })
+
+ return recommendations
+
+
+def format_report(report: dict, recommendations: list) -> str:
+ """Format report for display."""
+ output = []
+ output.append("=" * 70)
+ output.append("BIGQUERY COST ANALYSIS REPORT")
+ output.append("=" * 70)
+
+ if "error" in report:
+ output.append(f"\nError: {report['error']}")
+ return "\n".join(output)
+
+ output.append(f"\nProject: {report['project']}")
+ output.append(f"Period: Last {report['period_days']} days")
+ output.append(f"Generated: {report['generated_at']}")
+
+ summary = report["summary"]
+ output.append("\n## Summary")
+ output.append(f" Total Queries: {summary['total_queries']:,}")
+ output.append(f" Total TB Scanned: {summary['total_tb_scanned']:.4f}")
+ output.append(f" Estimated Cost: ${summary['estimated_cost_usd']:.2f}")
+ output.append(f" Unique Users: {summary['unique_users']}")
+ output.append(f" Cache Hit Rate: {summary['cache_hit_rate']:.1f}%")
+
+ output.append("\n## Top Users by Cost")
+ output.append("-" * 70)
+ output.append(f"{'User':<35} {'Queries':>10} {'TB':>10} {'Cost':>10}")
+ output.append("-" * 70)
+ for user in report["users"][:10]:
+ output.append(
+ f"{user['email'][:35]:<35} "
+ f"{user['query_count']:>10,} "
+ f"{user['tb_scanned']:>10.4f} "
+ f"${user['estimated_cost_usd']:>9.2f}"
+ )
+
+ output.append("\n## Most Expensive Queries")
+ output.append("-" * 70)
+ for i, query in enumerate(report["expensive_queries"][:5], 1):
+ output.append(f"\n{i}. Job: {query['job_id'][:40]}")
+ output.append(f" User: {query['user']}")
+ output.append(
+ f" Cost: ${query['cost_usd']:.2f} ({query['tb_scanned']:.4f} TB)"
+ )
+ output.append(f" Query: {query['query_preview'][:60]}...")
+
+ output.append("\n## Recommendations")
+ output.append("-" * 70)
+ for rec in recommendations:
+ output.append(f"\n[{rec['severity']}] {rec['area']}")
+ output.append(f" {rec['recommendation']}")
+
+ output.append("\n" + "=" * 70)
+ return "\n".join(output)
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Generate BigQuery cost analysis report"
+ )
+ parser.add_argument("--project", required=True, help="GCP project ID")
+ parser.add_argument(
+ "--days",
+ type=int,
+ default=30,
+ help="Number of days to analyze (default: 30)",
+ )
+ parser.add_argument(
+ "--format", choices=["text", "json"], default="text", help="Output format"
+ )
+
+ args = parser.parse_args()
+
+ report = get_cost_report(args.project, args.days)
+ recommendations = generate_recommendations(report)
+
+ if args.format == "json":
+ output = {"report": report, "recommendations": recommendations}
+ print(json.dumps(output, indent=2, default=str))
+ else:
+ print(format_report(report, recommendations))
+
+
+if __name__ == "__main__":
+ main()
diff --git a/src/google/adk/skills/bigquery-ai/SKILL.md b/src/google/adk/skills/bigquery-ai/SKILL.md
new file mode 100644
index 0000000000..849cf206c0
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/SKILL.md
@@ -0,0 +1,272 @@
+---
+name: bigquery-ai
+description: Execute generative AI operations in BigQuery - text generation, embeddings, vector search, and RAG workflows using Gemini, Claude, and other LLMs. Use when working with AI/ML inference, semantic search, or building RAG applications in BigQuery.
+license: Apache-2.0
+compatibility: BigQuery, Vertex AI, Gemini, Cloud AI APIs
+metadata:
+ author: Google Cloud
+ version: "2.0"
+ category: generative-ai
+adk:
+ config:
+ timeout_seconds: 180
+ max_parallel_calls: 10
+ allow_network: true
+ allowed_callers:
+ - bigquery_agent
+ - ai_agent
+ - rag_agent
+---
+
+# BigQuery AI Skill
+
+Execute generative AI operations directly in BigQuery using SQL. This skill covers text generation, embeddings, vector search, and retrieval-augmented generation (RAG) workflows.
+
+## When to Use This Skill
+
+Use this skill when you need to:
+- Generate text using LLMs (Gemini, Claude, Llama, Mistral) on BigQuery data
+- Create embeddings for semantic search and similarity matching
+- Build vector search and RAG pipelines entirely in SQL
+- Process documents, translate text, or analyze images at scale
+- Connect BigQuery to Vertex AI models for inference
+
+## Core Capabilities
+
+| Capability | Function | Description |
+|------------|----------|-------------|
+| Text Generation | `AI.GENERATE_TEXT` | Generate text using remote LLM models |
+| Embeddings | `ML.GENERATE_EMBEDDING` | Create vector embeddings from text/images |
+| Vector Search | `VECTOR_SEARCH` | Find semantically similar items |
+| Semantic Search | `AI.SEARCH` | Search with autonomous embeddings |
+| Remote Models | `CREATE MODEL` | Connect to Vertex AI endpoints |
+
+## Quick Start
+
+### 1. Create a Remote Model Connection
+
+```sql
+-- Create connection to Gemini
+CREATE OR REPLACE MODEL `project.dataset.gemini_model`
+ REMOTE WITH CONNECTION `project.region.connection_id`
+ OPTIONS (ENDPOINT = 'gemini-2.0-flash');
+```
+
+### 2. Generate Text
+
+```sql
+SELECT ml_generate_text_result
+FROM ML.GENERATE_TEXT(
+ MODEL `project.dataset.gemini_model`,
+ (SELECT 'Summarize this text: ' || content AS prompt FROM my_table),
+ STRUCT(256 AS max_output_tokens, 0.2 AS temperature)
+);
+```
+
+### 3. Create Embeddings
+
+```sql
+-- Create embedding model
+CREATE OR REPLACE MODEL `project.dataset.embedding_model`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'text-embedding-005');
+
+-- Generate embeddings
+SELECT * FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT content FROM my_table)
+);
+```
+
+### 4. Vector Search
+
+```sql
+SELECT base.id, base.content, distance
+FROM VECTOR_SEARCH(
+ TABLE `project.dataset.embeddings`, 'embedding',
+ (SELECT embedding FROM query_embeddings),
+ top_k => 10,
+ distance_type => 'COSINE'
+);
+```
+
+## AI Functions Reference
+
+### AI.GENERATE_TEXT
+
+Full control over text generation with model parameters:
+
+```sql
+SELECT * FROM AI.GENERATE_TEXT(
+ MODEL `project.dataset.model`,
+ (SELECT prompt FROM prompts_table),
+ STRUCT(
+ 512 AS max_output_tokens,
+ 0.7 AS temperature,
+ 0.95 AS top_p,
+ TRUE AS ground_with_google_search
+ )
+);
+```
+
+**Key Parameters:**
+- `max_output_tokens`: 1-8192 (default: 128)
+- `temperature`: 0.0-1.0 (default: 0, higher = more creative)
+- `top_p`: 0.0-1.0 (default: 0.95)
+- `ground_with_google_search`: Enable web grounding
+
+### ML.GENERATE_EMBEDDING
+
+Generate vector embeddings for semantic operations:
+
+```sql
+SELECT * FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT id, text_column AS content FROM source_table)
+)
+WHERE LENGTH(ml_generate_embedding_status) = 0; -- Filter errors
+```
+
+**Supported Models:**
+- `text-embedding-005` (recommended)
+- `text-embedding-004`
+- `text-multilingual-embedding-002`
+- `multimodalembedding@001` (text + images)
+
+### VECTOR_SEARCH
+
+Find nearest neighbors using embeddings:
+
+```sql
+SELECT query.id, base.id, base.content, distance
+FROM VECTOR_SEARCH(
+ TABLE `project.dataset.base_embeddings`, 'embedding',
+ TABLE `project.dataset.query_embeddings`,
+ top_k => 5,
+ distance_type => 'COSINE',
+ options => '{"fraction_lists_to_search": 0.01}'
+);
+```
+
+**Distance Types:** `COSINE`, `EUCLIDEAN`, `DOT_PRODUCT`
+
+## Supported Models
+
+| Provider | Models | Use Case |
+|----------|--------|----------|
+| Google | Gemini 2.0, 1.5 Pro/Flash | Text generation, multimodal |
+| Anthropic | Claude 3.5, 3 Opus/Sonnet | Complex reasoning |
+| Meta | Llama 3.1, 3.2 | Open-source alternative |
+| Mistral | Mistral Large, Medium | European compliance |
+
+## Prerequisites
+
+1. **BigQuery Connection**: Create a Cloud resource connection
+2. **IAM Permissions**: Grant `bigquery.connectionUser` and `aiplatform.user`
+3. **APIs Enabled**: BigQuery API, Vertex AI API, BigQuery Connection API
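+
+Once those are in place, a quick smoke test from Python confirms the wiring. This is a minimal sketch, assuming a Cloud resource connection (`project.us.connection_id` here) already exists; all names are placeholders.
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client(project="my-project")  # hypothetical project
+
+# Create (or replace) the remote model over the existing connection.
+client.query("""
+    CREATE OR REPLACE MODEL `project.dataset.gemini_model`
+      REMOTE WITH CONNECTION `project.us.connection_id`
+      OPTIONS (ENDPOINT = 'gemini-2.0-flash')
+""").result()
+
+# Run a one-row generation to verify permissions and the Vertex AI API.
+rows = client.query("""
+    SELECT ml_generate_text_result
+    FROM ML.GENERATE_TEXT(
+      MODEL `project.dataset.gemini_model`,
+      (SELECT 'Say hello in five words.' AS prompt),
+      STRUCT(32 AS max_output_tokens))
+""").result()
+for row in rows:
+    print(row.ml_generate_text_result)
+```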
+
+## References
+
+Load detailed documentation as needed:
+
+- `TEXT_GENERATION.md` - Complete AI.GENERATE_TEXT guide with all parameters
+- `EMBEDDINGS.md` - Embedding models, multimodal embeddings, best practices
+- `VECTOR_SEARCH.md` - Vector indexes, search optimization, recall tuning
+- `REMOTE_MODELS.md` - CREATE MODEL syntax for all supported providers
+- `RAG_WORKFLOW.md` - End-to-end RAG implementation patterns
+- `CLOUD_AI_SERVICES.md` - Translation, NLP, document processing, vision
+
+## Scripts
+
+Helper scripts for common operations:
+
+- `setup_remote_model.py` - Create remote model connections
+- `generate_embeddings.py` - Batch embedding generation
+- `semantic_search.py` - Build semantic search pipelines
+- `rag_pipeline.py` - Complete RAG workflow setup
+
+## Common Patterns
+
+### Batch Text Classification
+
+```sql
+SELECT id, content,
+ JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS category
+FROM ML.GENERATE_TEXT(
+ MODEL `project.dataset.gemini`,
+ (SELECT id, content,
+ CONCAT('Classify this text into one of: Tech, Sports, Politics\n\nText: ', content) AS prompt
+ FROM articles)
+);
+```
+
+### Semantic Similarity Search
+
+```sql
+-- Find similar documents to a query
+WITH query_embedding AS (
+ SELECT ml_generate_embedding_result AS embedding
+ FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT 'machine learning best practices' AS content)
+ )
+)
+SELECT d.title, d.content, distance
+FROM VECTOR_SEARCH(
+ TABLE `project.dataset.doc_embeddings`, 'embedding',
+ TABLE query_embedding,
+ top_k => 10
+)
+JOIN `project.dataset.documents` d ON d.id = base.id;
+```
+
+### RAG with Context Injection
+
+```sql
+-- Retrieve relevant context and generate answer
+WITH context AS (
+ SELECT STRING_AGG(base.content, '\n\n') AS retrieved_context
+ FROM VECTOR_SEARCH(
+ TABLE `project.dataset.knowledge_base`, 'embedding',
+ (SELECT ml_generate_embedding_result AS embedding
+ FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT @query AS content))),
+ top_k => 5
+ )
+)
+SELECT ml_generate_text_result AS answer
+FROM ML.GENERATE_TEXT(
+ MODEL `project.dataset.gemini`,
+ (SELECT CONCAT(
+ 'Answer based on context:\n\n', retrieved_context,
+ '\n\nQuestion: ', @query
+ ) AS prompt FROM context)
+);
+```
+
+## Error Handling
+
+Check status columns for errors:
+
+```sql
+-- Text generation errors
+SELECT * FROM ML.GENERATE_TEXT(...)
+WHERE ml_generate_text_status != '';
+
+-- Embedding errors
+SELECT * FROM ML.GENERATE_EMBEDDING(...)
+WHERE LENGTH(ml_generate_embedding_status) > 0;
+```
+
+## Performance Tips
+
+1. **Use Vector Indexes**: Create indexes for tables with >100K embeddings
+2. **Batch Requests**: Process multiple rows in single function calls
+3. **Filter Before AI**: Apply WHERE clauses before expensive AI operations
+4. **Cache Embeddings**: Store embeddings in tables, don't regenerate
+5. **Tune Search**: Adjust `fraction_lists_to_search` for speed vs recall
+
+## Limitations
+
+- Max 10,000 rows per AI function call
+- Embedding dimensions vary by model (768-3072)
+- Rate limits apply based on Vertex AI quotas
+- Some models require specific regions
diff --git a/src/google/adk/skills/bigquery-ai/references/CLOUD_AI_SERVICES.md b/src/google/adk/skills/bigquery-ai/references/CLOUD_AI_SERVICES.md
new file mode 100644
index 0000000000..0f13ece001
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/references/CLOUD_AI_SERVICES.md
@@ -0,0 +1,421 @@
+# Cloud AI Services in BigQuery
+
+Guide to using Google Cloud AI services directly from BigQuery SQL.
+
+## Overview
+
+BigQuery integrates with Cloud AI services for:
+- Translation
+- Natural Language Processing
+- Document Understanding
+- Speech Transcription
+- Computer Vision
+
+## Translation (Cloud Translation API)
+
+### Create Translation Model
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.translator`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (ENDPOINT = 'cloud_ai_translate_v3');
+```
+
+### ML.TRANSLATE Function
+
+```sql
+SELECT *
+FROM ML.TRANSLATE(
+ MODEL `project.dataset.translator`,
+ (SELECT text AS text_to_translate FROM documents),
+ STRUCT(
+ 'es' AS target_language_code,
+ 'en' AS source_language_code -- Optional, auto-detected if omitted
+ )
+);
+```
+
+### Output Schema
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `translated_text` | STRING | Translated content |
+| `detected_language_code` | STRING | Detected source language |
+| Original columns | Various | All input columns |
+
+### Examples
+
+```sql
+-- Translate to multiple languages
+SELECT
+ original_text,
+ es.translated_text AS spanish,
+ fr.translated_text AS french,
+ de.translated_text AS german
+FROM (SELECT content AS text_to_translate, content AS original_text FROM articles) src
+LEFT JOIN ML.TRANSLATE(MODEL `project.dataset.translator`, src,
+ STRUCT('es' AS target_language_code)) es ON TRUE
+LEFT JOIN ML.TRANSLATE(MODEL `project.dataset.translator`, src,
+ STRUCT('fr' AS target_language_code)) fr ON TRUE
+LEFT JOIN ML.TRANSLATE(MODEL `project.dataset.translator`, src,
+ STRUCT('de' AS target_language_code)) de ON TRUE;
+```
+
+```sql
+-- Detect language
+SELECT
+ text,
+ detected_language_code AS language
+FROM ML.TRANSLATE(
+ MODEL `project.dataset.translator`,
+ (SELECT text AS text_to_translate FROM unknown_language_texts),
+ STRUCT('en' AS target_language_code)
+);
+```
+
+## Natural Language (Cloud Natural Language API)
+
+### Create NLP Model
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.nlp_analyzer`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (ENDPOINT = 'cloud_ai_natural_language_v1');
+```
+
+### ML.UNDERSTAND_TEXT Function
+
+```sql
+SELECT *
+FROM ML.UNDERSTAND_TEXT(
+ MODEL `project.dataset.nlp_analyzer`,
+ (SELECT text AS text_content FROM reviews),
+ STRUCT('ANALYZE_SENTIMENT' AS nlp_task)
+);
+```
+
+### NLP Tasks
+
+| Task | Description | Output |
+|------|-------------|--------|
+| `ANALYZE_SENTIMENT` | Sentiment analysis | Score, magnitude |
+| `ANALYZE_ENTITIES` | Entity extraction | Entities with types |
+| `ANALYZE_SYNTAX` | Syntax/grammar analysis | Tokens, POS tags |
+| `CLASSIFY_TEXT` | Text classification | Categories |
+| `ANALYZE_ENTITY_SENTIMENT` | Entity-level sentiment | Entities with sentiment |
+
+### Examples
+
+#### Sentiment Analysis
+
+```sql
+SELECT
+ review_id,
+ text,
+ ml_understand_text_result.document_sentiment.score AS sentiment_score,
+ ml_understand_text_result.document_sentiment.magnitude AS sentiment_magnitude,
+ CASE
+ WHEN ml_understand_text_result.document_sentiment.score > 0.25 THEN 'positive'
+ WHEN ml_understand_text_result.document_sentiment.score < -0.25 THEN 'negative'
+ ELSE 'neutral'
+ END AS sentiment_label
+FROM ML.UNDERSTAND_TEXT(
+ MODEL `project.dataset.nlp_analyzer`,
+ (SELECT review_id, review_text AS text_content FROM product_reviews),
+ STRUCT('ANALYZE_SENTIMENT' AS nlp_task)
+);
+```
+
+#### Entity Extraction
+
+```sql
+SELECT
+ doc_id,
+ entity.name AS entity_name,
+ entity.type AS entity_type,
+ entity.salience AS importance
+FROM ML.UNDERSTAND_TEXT(
+ MODEL `project.dataset.nlp_analyzer`,
+ (SELECT doc_id, content AS text_content FROM documents),
+ STRUCT('ANALYZE_ENTITIES' AS nlp_task)
+),
+UNNEST(ml_understand_text_result.entities) AS entity
+WHERE entity.salience > 0.1;
+```
+
+#### Text Classification
+
+```sql
+SELECT
+ article_id,
+ category.name AS category,
+ category.confidence
+FROM ML.UNDERSTAND_TEXT(
+ MODEL `project.dataset.nlp_analyzer`,
+ (SELECT article_id, content AS text_content FROM articles),
+ STRUCT('CLASSIFY_TEXT' AS nlp_task)
+),
+UNNEST(ml_understand_text_result.categories) AS category
+ORDER BY category.confidence DESC;
+```
+
+## Document AI (Document Understanding)
+
+### Create Document AI Model
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.doc_processor`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'cloud_ai_document_v1',
+ DOCUMENT_PROCESSOR = 'projects/my-project/locations/us/processors/abc123'
+ );
+```
+
+### ML.PROCESS_DOCUMENT Function
+
+```sql
+SELECT *
+FROM ML.PROCESS_DOCUMENT(
+ MODEL `project.dataset.doc_processor`,
+ (SELECT gcs_uri AS uri FROM pdf_files)
+);
+```
+
+### Processor Types
+
+| Processor | Description | Use Case |
+|-----------|-------------|----------|
+| Form Parser | Extract form fields | Surveys, applications |
+| Invoice Parser | Extract invoice data | Accounting |
+| Receipt Parser | Extract receipt info | Expense tracking |
+| ID Parser | Extract ID information | KYC |
+| OCR | General text extraction | Digitization |
+
+### Examples
+
+#### Process Invoices
+
+```sql
+SELECT
+ invoice_uri,
+ JSON_VALUE(ml_process_document_result, '$.entities[?(@.type=="invoice_id")].mentionText') AS invoice_id,
+ JSON_VALUE(ml_process_document_result, '$.entities[?(@.type=="total_amount")].mentionText') AS total,
+ JSON_VALUE(ml_process_document_result, '$.entities[?(@.type=="invoice_date")].mentionText') AS date
+FROM ML.PROCESS_DOCUMENT(
+ MODEL `project.dataset.invoice_processor`,
+ (SELECT uri AS uri FROM `project.dataset.invoice_pdfs`)
+);
+```
+
+#### Extract Text from PDFs
+
+```sql
+SELECT
+ pdf_uri,
+ ml_process_document_result.text AS extracted_text,
+ ARRAY_LENGTH(ml_process_document_result.pages) AS page_count
+FROM ML.PROCESS_DOCUMENT(
+ MODEL `project.dataset.ocr_processor`,
+ (SELECT gcs_uri AS uri FROM document_archive)
+);
+```
+
+## Speech-to-Text (Cloud Speech API)
+
+### Create Speech Model
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.speech_transcriber`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (ENDPOINT = 'cloud_ai_speech_v2');
+```
+
+### ML.TRANSCRIBE Function
+
+```sql
+SELECT *
+FROM ML.TRANSCRIBE(
+ MODEL `project.dataset.speech_transcriber`,
+ (SELECT audio_uri AS uri FROM audio_files),
+ STRUCT(
+ 'en-US' AS language_code,
+ TRUE AS enable_automatic_punctuation
+ )
+);
+```
+
+### Parameters
+
+| Parameter | Type | Description |
+|-----------|------|-------------|
+| `language_code` | STRING | Language (e.g., 'en-US', 'es-ES') |
+| `enable_automatic_punctuation` | BOOL | Add punctuation |
+| `enable_word_time_offsets` | BOOL | Include timestamps |
+| `model` | STRING | 'latest_long', 'latest_short' |
+
+### Examples
+
+```sql
+-- Transcribe call recordings
+SELECT
+ call_id,
+ ml_transcribe_result.transcript AS transcription,
+ ml_transcribe_result.confidence AS confidence
+FROM ML.TRANSCRIBE(
+ MODEL `project.dataset.speech_transcriber`,
+ (SELECT call_id, recording_uri AS uri FROM call_recordings),
+ STRUCT('en-US' AS language_code, TRUE AS enable_automatic_punctuation)
+);
+
+-- With word timestamps
+SELECT
+ audio_id,
+ word.word,
+ word.start_time,
+ word.end_time
+FROM ML.TRANSCRIBE(
+ MODEL `project.dataset.speech_transcriber`,
+ (SELECT audio_id, uri FROM audio_files),
+ STRUCT('en-US' AS language_code, TRUE AS enable_word_time_offsets)
+),
+UNNEST(ml_transcribe_result.words) AS word;
+```
+
+## Computer Vision (Cloud Vision API)
+
+### Create Vision Model
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.image_analyzer`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (ENDPOINT = 'cloud_ai_vision_v1');
+```
+
+### ML.ANNOTATE_IMAGE Function
+
+```sql
+SELECT *
+FROM ML.ANNOTATE_IMAGE(
+ MODEL `project.dataset.image_analyzer`,
+ (SELECT image_uri AS uri FROM images),
+ STRUCT(['LABEL_DETECTION', 'TEXT_DETECTION'] AS vision_features)
+);
+```
+
+### Vision Features
+
+| Feature | Description |
+|---------|-------------|
+| `LABEL_DETECTION` | Identify objects and concepts |
+| `TEXT_DETECTION` | OCR - extract text |
+| `FACE_DETECTION` | Detect faces |
+| `LANDMARK_DETECTION` | Identify landmarks |
+| `LOGO_DETECTION` | Detect logos |
+| `SAFE_SEARCH_DETECTION` | Content moderation |
+| `IMAGE_PROPERTIES` | Color analysis |
+| `OBJECT_LOCALIZATION` | Locate objects with bounding boxes |
+
+### Examples
+
+#### Label Detection
+
+```sql
+SELECT
+ image_id,
+ label.description AS label,
+ label.score AS confidence
+FROM ML.ANNOTATE_IMAGE(
+ MODEL `project.dataset.image_analyzer`,
+ (SELECT image_id, gcs_uri AS uri FROM product_images),
+ STRUCT(['LABEL_DETECTION'] AS vision_features)
+),
+UNNEST(ml_annotate_image_result.label_annotations) AS label
+WHERE label.score > 0.8;
+```
+
+#### Text Extraction (OCR)
+
+```sql
+SELECT
+ image_id,
+ ml_annotate_image_result.full_text_annotation.text AS extracted_text
+FROM ML.ANNOTATE_IMAGE(
+ MODEL `project.dataset.image_analyzer`,
+ (SELECT image_id, uri FROM document_images),
+ STRUCT(['TEXT_DETECTION'] AS vision_features)
+);
+```
+
+#### Content Moderation
+
+```sql
+SELECT
+ image_id,
+ ml_annotate_image_result.safe_search_annotation.adult AS adult_rating,
+ ml_annotate_image_result.safe_search_annotation.violence AS violence_rating,
+ CASE
+ WHEN ml_annotate_image_result.safe_search_annotation.adult IN ('LIKELY', 'VERY_LIKELY')
+ OR ml_annotate_image_result.safe_search_annotation.violence IN ('LIKELY', 'VERY_LIKELY')
+ THEN 'FLAGGED'
+ ELSE 'SAFE'
+ END AS moderation_status
+FROM ML.ANNOTATE_IMAGE(
+ MODEL `project.dataset.image_analyzer`,
+ (SELECT image_id, uri FROM user_uploads),
+ STRUCT(['SAFE_SEARCH_DETECTION'] AS vision_features)
+);
+```
+
+#### Object Detection with Bounding Boxes
+
+```sql
+SELECT
+ image_id,
+ obj.name AS object_name,
+ obj.score AS confidence,
+ obj.bounding_poly.normalized_vertices AS bounding_box
+FROM ML.ANNOTATE_IMAGE(
+ MODEL `project.dataset.image_analyzer`,
+ (SELECT image_id, uri FROM images),
+ STRUCT(['OBJECT_LOCALIZATION'] AS vision_features)
+),
+UNNEST(ml_annotate_image_result.localized_object_annotations) AS obj
+WHERE obj.score > 0.7;
+```
+
+## Best Practices
+
+### 1. Batch Processing
+
+Process data in batches to optimize costs and performance:
+
+```sql
+-- Process in batches of 1000
+SELECT * FROM ML.TRANSLATE(
+ MODEL `project.dataset.translator`,
+ (SELECT * FROM documents LIMIT 1000 OFFSET @batch),
+ ...
+);
+```
+
+### 2. Error Handling
+
+Always check for processing errors:
+
+```sql
+SELECT *
+FROM ML.PROCESS_DOCUMENT(...)
+WHERE ml_process_document_status = ''; -- Empty = success
+```
+
+### 3. Cost Management
+
+- Filter data before processing
+- Use appropriate features only
+- Monitor usage in Cloud Console
+
+### 4. Regional Considerations
+
+- Use connections in same region as data
+- Some services have regional restrictions
diff --git a/src/google/adk/skills/bigquery-ai/references/EMBEDDINGS.md b/src/google/adk/skills/bigquery-ai/references/EMBEDDINGS.md
new file mode 100644
index 0000000000..839e103bdf
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/references/EMBEDDINGS.md
@@ -0,0 +1,329 @@
+# Embeddings in BigQuery
+
+Complete guide to generating and using vector embeddings with ML.GENERATE_EMBEDDING.
+
+## Overview
+
+Embeddings are dense vector representations that capture semantic meaning. Use them for:
+- Semantic search (find similar content)
+- Clustering (group related items)
+- Classification (as features for ML models)
+- Recommendations (similarity-based suggestions)
+- RAG (retrieve relevant context for LLMs)
+
+## ML.GENERATE_EMBEDDING Syntax
+
+```sql
+SELECT *
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ { TABLE source_table | (SELECT query) },
+ STRUCT(
+ flatten_json_output AS flatten_json_output,
+ task_type AS task_type,
+ output_dimensionality AS output_dimensionality
+ )
+);
+```
+
+## Supported Embedding Models
+
+| Model | Dimensions | Languages | Modalities | Use Case |
+|-------|------------|-----------|------------|----------|
+| `text-embedding-005` | 768 | 100+ | Text | General purpose (recommended) |
+| `text-embedding-004` | 768 | 100+ | Text | Previous generation |
+| `text-multilingual-embedding-002` | 768 | 100+ | Text | Multilingual focus |
+| `textembedding-gecko@003` | 768 | English | Text | Legacy |
+| `multimodalembedding@001` | 1408 | - | Text, Image, Video | Multimodal search |
+
+## Creating an Embedding Model
+
+```sql
+-- Standard text embedding model
+CREATE OR REPLACE MODEL `project.dataset.text_embeddings`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'text-embedding-005');
+
+-- Multimodal embedding model
+CREATE OR REPLACE MODEL `project.dataset.multimodal_embeddings`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'multimodalembedding@001');
+
+-- With specific connection
+CREATE OR REPLACE MODEL `project.dataset.embeddings`
+ REMOTE WITH CONNECTION `project.us.my_connection`
+ OPTIONS (ENDPOINT = 'text-embedding-005');
+```
+
+## Parameters Reference
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `flatten_json_output` | BOOL | TRUE | Return embedding as ARRAY instead of JSON |
+| `task_type` | STRING | - | Optimize for specific task (see below) |
+| `output_dimensionality` | INT64 | Model default | Reduce embedding dimensions |
+
+### Task Types
+
+| Task Type | Description | Use Case |
+|-----------|-------------|----------|
+| `RETRIEVAL_QUERY` | Query for semantic search | Search queries |
+| `RETRIEVAL_DOCUMENT` | Document to be searched | Indexing documents |
+| `SEMANTIC_SIMILARITY` | General similarity | Clustering, deduplication |
+| `CLASSIFICATION` | Text classification | ML features |
+| `CLUSTERING` | Grouping similar items | Topic modeling |
+
+## Output Schema
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `ml_generate_embedding_result` | ARRAY<FLOAT64> | Vector embedding |
+| `ml_generate_embedding_status` | STRING | Error (empty if success) |
+| `ml_generate_embedding_statistics` | JSON | Token counts, truncation info |
+| Original columns | Various | All input columns preserved |
+
+## Examples
+
+### Basic Text Embeddings
+
+```sql
+SELECT
+ id,
+ title,
+ ml_generate_embedding_result AS embedding
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT id, title, body AS content FROM articles)
+)
+WHERE ml_generate_embedding_status = '';
+```
+
+### Store Embeddings in Table
+
+```sql
+CREATE OR REPLACE TABLE `project.dataset.article_embeddings` AS
+SELECT
+ id,
+ title,
+ ml_generate_embedding_result AS embedding,
+ CURRENT_TIMESTAMP() AS embedded_at
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT id, title, body AS content FROM articles)
+)
+WHERE LENGTH(ml_generate_embedding_status) = 0;
+```
+
+### Query vs Document Embeddings
+
+For best search results, use different task types for queries and documents:
+
+```sql
+-- Index documents with RETRIEVAL_DOCUMENT
+INSERT INTO `project.dataset.doc_embeddings`
+SELECT id, content, ml_generate_embedding_result AS embedding
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT id, content FROM new_documents),
+ STRUCT('RETRIEVAL_DOCUMENT' AS task_type)
+);
+
+-- Embed queries with RETRIEVAL_QUERY
+SELECT ml_generate_embedding_result AS query_embedding
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT 'What is machine learning?' AS content),
+ STRUCT('RETRIEVAL_QUERY' AS task_type)
+);
+```
+
+### Multimodal Embeddings (Images)
+
+```sql
+-- Embed images from Cloud Storage
+SELECT
+ image_uri,
+ ml_generate_embedding_result AS embedding
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.multimodal_model`,
+ (SELECT image_uri AS uri FROM image_catalog)
+);
+
+-- Embed text and images together
+SELECT *
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.multimodal_model`,
+ (SELECT
+ product_name AS content,
+ image_gcs_uri AS uri
+ FROM products)
+);
+```
+
+### Reduced Dimensionality
+
+```sql
+-- Generate smaller embeddings for efficiency
+SELECT
+ id,
+ ml_generate_embedding_result AS embedding_256d
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT id, content FROM documents),
+ STRUCT(256 AS output_dimensionality)
+);
+```
+
+## Creating Vector Indexes
+
+For tables with many embeddings, create indexes for faster search:
+
+```sql
+-- IVF index (recommended for most cases)
+CREATE OR REPLACE VECTOR INDEX article_embedding_idx
+ON `project.dataset.article_embeddings`(embedding)
+OPTIONS (
+ index_type = 'IVF',
+ distance_type = 'COSINE',
+ ivf_options = '{"num_lists": 1000}'
+);
+
+-- Tree-AH index (for very large datasets)
+CREATE OR REPLACE VECTOR INDEX large_embedding_idx
+ON `project.dataset.large_embeddings`(embedding)
+OPTIONS (
+ index_type = 'TREE_AH',
+ distance_type = 'DOT_PRODUCT'
+);
+```
+
+### Index Parameters
+
+| Parameter | Options | Recommendation |
+|-----------|---------|----------------|
+| `index_type` | IVF, TREE_AH | IVF for <100M rows, TREE_AH for larger |
+| `distance_type` | COSINE, EUCLIDEAN, DOT_PRODUCT | COSINE for normalized embeddings |
+| `num_lists` | 100-10000 | sqrt(num_rows) as starting point |
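+
+The `num_lists` heuristic is easy to automate. The sketch below (a minimal example; the project, table, and index names are placeholders) sizes the index from the table's current row count and issues the DDL via the Python client.
+
+```python
+import math
+
+from google.cloud import bigquery
+
+client = bigquery.Client(project="my-project")  # hypothetical project
+table = "project.dataset.article_embeddings"    # placeholder table
+
+# sqrt(row count), clamped to the documented 100-10000 range.
+row_count = client.get_table(table).num_rows
+num_lists = max(100, min(10000, int(math.sqrt(row_count))))
+
+ddl = f"""
+CREATE OR REPLACE VECTOR INDEX article_embedding_idx
+ON `{table}`(embedding)
+OPTIONS (
+  index_type = 'IVF',
+  distance_type = 'COSINE',
+  ivf_options = '{{"num_lists": {num_lists}}}'
+)
+"""
+client.query(ddl).result()
+```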
+
+## Batch Processing
+
+### Incremental Embedding Generation
+
+```sql
+-- Only embed new documents
+INSERT INTO `project.dataset.embeddings`
+SELECT id, content, ml_generate_embedding_result AS embedding
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT id, content
+ FROM documents d
+ WHERE NOT EXISTS (
+ SELECT 1 FROM `project.dataset.embeddings` e WHERE e.id = d.id
+ ))
+);
+```
+
+### Chunked Processing
+
+```sql
+-- Process in batches for large tables
+DECLARE batch_size INT64 DEFAULT 10000;
+DECLARE offset_val INT64 DEFAULT 0;
+
+LOOP
+ INSERT INTO `project.dataset.embeddings`
+ SELECT id, ml_generate_embedding_result AS embedding
+ FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT id, content FROM documents LIMIT batch_size OFFSET offset_val)
+ )
+ WHERE ml_generate_embedding_status = '';
+
+ SET offset_val = offset_val + batch_size;
+ IF offset_val >= (SELECT COUNT(*) FROM documents) THEN
+ LEAVE;
+ END IF;
+END LOOP;
+```
+
+## Error Handling
+
+### Common Errors
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| `content too long` | Text exceeds model limit | Truncate or chunk text |
+| `RESOURCE_EXHAUSTED` | Rate limit | Reduce batch size |
+| `INVALID_ARGUMENT` | Missing content column | Ensure `content` column exists |
+
+### Filter Errors
+
+```sql
+-- Get only successful embeddings
+SELECT * FROM ML.GENERATE_EMBEDDING(...)
+WHERE LENGTH(ml_generate_embedding_status) = 0;
+
+-- Log errors separately
+SELECT id, ml_generate_embedding_status AS error
+FROM ML.GENERATE_EMBEDDING(...)
+WHERE ml_generate_embedding_status != '';
+```
+
+### Handle Long Text
+
+```sql
+-- Truncate to first 5000 characters
+SELECT *
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT id, LEFT(content, 5000) AS content FROM documents)
+);
+
+-- Or chunk into multiple embeddings
+SELECT
+ id,
+ chunk_id,
+ ml_generate_embedding_result AS embedding
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT
+ id,
+ chunk_id,
+ chunk_text AS content
+ FROM document_chunks)
+);
+```
+
+## Best Practices
+
+### 1. Choose the Right Model
+- Use `text-embedding-005` for general text
+- Use `multimodalembedding@001` for images
+- Match model to your language needs
+
+### 2. Use Task Types
+- `RETRIEVAL_DOCUMENT` for indexing
+- `RETRIEVAL_QUERY` for search queries
+- Improves search relevance significantly
+
+### 3. Normalize Content
+- Clean text before embedding
+- Remove excessive whitespace
+- Consider lowercasing for consistency
+
+### 4. Manage Embedding Tables
+- Add metadata columns (created_at, source)
+- Create primary keys for updates
+- Partition by date for large tables
+
+### 5. Monitor Quality
+- Sample and inspect embeddings
+- Test search relevance
+- Compare models if needed
+
+## Cost Optimization
+
+- Embeddings are billed per 1,000 input characters (see the sketch below for a quick volume estimate)
+- Cache embeddings - don't regenerate
+- Use reduced dimensionality when possible
+- Filter before embedding (not after)
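+
+A minimal sketch for that volume estimate (the table name and the per-1,000-character rate are placeholders; check current Vertex AI pricing):
+
+```python
+from google.cloud import bigquery
+
+USD_PER_1000_CHARS = 0.0001  # placeholder rate - substitute current pricing
+
+client = bigquery.Client(project="my-project")  # hypothetical project
+row = next(iter(client.query("""
+    SELECT SUM(LENGTH(content)) AS total_chars
+    FROM `project.dataset.document_chunks`
+""").result()))
+total_chars = row.total_chars or 0
+print(f"{total_chars:,} chars -> ~${total_chars / 1000 * USD_PER_1000_CHARS:,.2f}")
+```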
diff --git a/src/google/adk/skills/bigquery-ai/references/RAG_WORKFLOW.md b/src/google/adk/skills/bigquery-ai/references/RAG_WORKFLOW.md
new file mode 100644
index 0000000000..b7d93cffef
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/references/RAG_WORKFLOW.md
@@ -0,0 +1,517 @@
+# RAG Workflows in BigQuery
+
+Complete guide to building Retrieval-Augmented Generation (RAG) pipelines in BigQuery SQL.
+
+## Overview
+
+RAG combines semantic search with text generation to create responses grounded in your data:
+
+1. **Embed** - Convert documents and queries to vectors
+2. **Index** - Create vector indexes for fast retrieval
+3. **Retrieve** - Find relevant documents via semantic search
+4. **Generate** - Use retrieved context to generate accurate responses
+
+## Architecture
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│ BigQuery RAG Pipeline │
+├─────────────────────────────────────────────────────────────────────┤
+│ │
+│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
+│ │ Documents │───▶│ Embeddings │───▶│ Vector Index │ │
+│ │ (text) │ │ (vectors) │ │ (fast search) │ │
+│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
+│ │ │
+│ ┌──────────────┐ ┌──────────────┐ │ │
+│ │ User Query │───▶│ Query │────────────┘ │
+│ │ │ │ Embedding │ │
+│ └──────────────┘ └──────────────┘ │
+│ │ │
+│ ▼ │
+│ ┌──────────────┐ │
+│ │ VECTOR_SEARCH│ │
+│ │ (retrieve │ │
+│ │ context) │ │
+│ └──────────────┘ │
+│ │ │
+│ ▼ │
+│ ┌──────────────┐ ┌──────────────────────┐ │
+│ │ Context + │───▶│ AI.GENERATE_TEXT │ │
+│ │ Query │ │ (generate answer) │ │
+│ └──────────────┘ └──────────────────────┘ │
+│ │ │
+│ ▼ │
+│ ┌──────────────────────┐ │
+│ │ Grounded Response │ │
+│ └──────────────────────┘ │
+│ │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+## Step 1: Prepare Knowledge Base
+
+### Create Document Table
+
+```sql
+CREATE OR REPLACE TABLE `project.dataset.knowledge_base` (
+ doc_id STRING,
+ title STRING,
+ content STRING,
+ source STRING,
+ created_at TIMESTAMP,
+ metadata JSON
+);
+
+-- Load documents
+INSERT INTO `project.dataset.knowledge_base`
+SELECT
+ GENERATE_UUID() AS doc_id,
+ title,
+ body AS content,
+ 'internal_docs' AS source,
+ CURRENT_TIMESTAMP() AS created_at,
+ TO_JSON(STRUCT(author, category)) AS metadata
+FROM `project.dataset.source_documents`;
+```
+
+### Chunk Long Documents
+
+```sql
+-- Split documents into chunks for better retrieval
+CREATE OR REPLACE TABLE `project.dataset.document_chunks` AS
+WITH chunks AS (
+ SELECT
+ doc_id,
+ title,
+ chunk_index,
+ TRIM(chunk) AS chunk_text,
+ LENGTH(chunk) AS chunk_length
+ FROM `project.dataset.knowledge_base`,
+ UNNEST(REGEXP_EXTRACT_ALL(content, r'.{1,1000}(?:\s|$)')) AS chunk WITH OFFSET AS chunk_index
+)
+SELECT
+ CONCAT(doc_id, '_', chunk_index) AS chunk_id,
+ doc_id,
+ title,
+ chunk_index,
+ chunk_text,
+ chunk_length
+FROM chunks
+WHERE chunk_length > 50; -- Filter tiny chunks
+```
+
+## Step 2: Generate Embeddings
+
+### Create Embedding Model
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.embedding_model`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'text-embedding-005');
+```
+
+### Embed Documents
+
+```sql
+CREATE OR REPLACE TABLE `project.dataset.kb_embeddings` AS
+SELECT
+ chunk_id,
+ doc_id,
+ title,
+ chunk_text,
+ ml_generate_embedding_result AS embedding
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT chunk_id, doc_id, title, chunk_text AS content
+ FROM `project.dataset.document_chunks`),
+ STRUCT('RETRIEVAL_DOCUMENT' AS task_type)
+)
+WHERE LENGTH(ml_generate_embedding_status) = 0;
+```
+
+## Step 3: Create Vector Index
+
+```sql
+CREATE OR REPLACE VECTOR INDEX kb_embedding_idx
+ON `project.dataset.kb_embeddings`(embedding)
+OPTIONS (
+ index_type = 'IVF',
+ distance_type = 'COSINE',
+ ivf_options = '{"num_lists": 500}'
+);
+```
+
+## Step 4: Create Generation Model
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.gemini_rag`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'gemini-2.0-flash');
+```
+
+## Step 5: Build RAG Query
+
+### Basic RAG Query
+
+```sql
+DECLARE user_query STRING DEFAULT 'What is the refund policy?';
+
+WITH query_embedding AS (
+ SELECT ml_generate_embedding_result AS embedding
+ FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT user_query AS content),
+ STRUCT('RETRIEVAL_QUERY' AS task_type)
+ )
+),
+retrieved_context AS (
+ SELECT
+ base.chunk_id,
+ base.title,
+ base.chunk_text,
+ distance
+ FROM VECTOR_SEARCH(
+ TABLE `project.dataset.kb_embeddings`,
+ 'embedding',
+ TABLE query_embedding,
+ top_k => 5,
+ distance_type => 'COSINE'
+ )
+ ORDER BY distance
+),
+context_string AS (
+ SELECT STRING_AGG(
+ CONCAT('Source: ', title, '\n', chunk_text),
+ '\n\n---\n\n'
+ ) AS context
+ FROM retrieved_context
+)
+SELECT
+ user_query AS question,
+ JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS answer,
+ (SELECT ARRAY_AGG(STRUCT(title, chunk_text, distance))
+ FROM retrieved_context) AS sources
+FROM ML.GENERATE_TEXT(
+ MODEL `project.dataset.gemini_rag`,
+ (SELECT CONCAT(
+ 'Answer the question based ONLY on the following context. ',
+ 'If the answer is not in the context, say "I don\'t have information about that."\n\n',
+ 'Context:\n', context, '\n\n',
+ 'Question: ', user_query, '\n\n',
+ 'Answer:'
+ ) AS prompt FROM context_string),
+ STRUCT(512 AS max_output_tokens, 0.1 AS temperature)
+);
+```
+
+### RAG as Stored Procedure
+
+```sql
+CREATE OR REPLACE PROCEDURE `project.dataset.ask_knowledge_base`(
+ IN user_query STRING,
+ IN num_sources INT64,
+ OUT answer STRING,
+  OUT sources ARRAY<STRUCT<title STRING, excerpt STRING, relevance FLOAT64>>
+)
+BEGIN
+  DECLARE query_emb ARRAY<FLOAT64>;
+
+ -- Get query embedding
+ SET query_emb = (
+ SELECT ml_generate_embedding_result
+ FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT user_query AS content),
+ STRUCT('RETRIEVAL_QUERY' AS task_type)
+ )
+ );
+
+ -- Retrieve and generate
+ SET (answer, sources) = (
+ WITH retrieved AS (
+ SELECT base.title, base.chunk_text, distance
+ FROM VECTOR_SEARCH(
+ TABLE `project.dataset.kb_embeddings`,
+ 'embedding',
+ (SELECT query_emb AS embedding),
+ top_k => num_sources
+ )
+ ),
+ context AS (
+ SELECT STRING_AGG(chunk_text, '\n\n') AS ctx,
+ ARRAY_AGG(STRUCT(title, LEFT(chunk_text, 200) AS excerpt,
+ 1.0 - distance AS relevance)) AS src
+ FROM retrieved
+ )
+    SELECT AS STRUCT
+ JSON_VALUE(r.ml_generate_text_result, '$.predictions[0].content'),
+ c.src
+ FROM context c,
+ ML.GENERATE_TEXT(
+ MODEL `project.dataset.gemini_rag`,
+ (SELECT CONCAT('Context:\n', c.ctx, '\n\nQuestion: ', user_query) AS prompt),
+ STRUCT(512 AS max_output_tokens)
+ ) r
+ );
+END;
+
+-- Usage
+DECLARE answer STRING;
+DECLARE sources ARRAY<STRUCT<title STRING, excerpt STRING, relevance FLOAT64>>;
+CALL `project.dataset.ask_knowledge_base`('What is the return policy?', 5, answer, sources);
+SELECT answer, sources;
+```
+
+## Advanced Patterns
+
+### Hybrid Search (Vector + Keyword)
+
+```sql
+WITH semantic_results AS (
+ SELECT base.chunk_id, base.chunk_text, distance,
+ 1.0 / (1.0 + distance) AS semantic_score
+ FROM VECTOR_SEARCH(
+ TABLE `project.dataset.kb_embeddings`,
+ 'embedding',
+ TABLE query_embedding,
+ top_k => 20
+ )
+),
+keyword_results AS (
+ SELECT chunk_id, chunk_text, search_score
+ FROM `project.dataset.document_chunks`
+ WHERE SEARCH(chunk_text, @query)
+ ORDER BY search_score DESC
+ LIMIT 20
+),
+combined AS (
+ SELECT
+ COALESCE(s.chunk_id, k.chunk_id) AS chunk_id,
+ COALESCE(s.chunk_text, k.chunk_text) AS chunk_text,
+ COALESCE(s.semantic_score, 0) * 0.7 +
+ COALESCE(k.search_score, 0) * 0.3 AS combined_score
+ FROM semantic_results s
+ FULL OUTER JOIN keyword_results k ON s.chunk_id = k.chunk_id
+ ORDER BY combined_score DESC
+ LIMIT 5
+)
+SELECT STRING_AGG(chunk_text, '\n\n') AS context
+FROM combined;
+```
+
+### Multi-Query RAG
+
+```sql
+-- Generate multiple search queries for better coverage
+WITH expanded_queries AS (
+ SELECT query
+ FROM UNNEST([
+ @original_query,
+ -- LLM-generated query variations
+ (SELECT JSON_VALUE(ml_generate_text_result, '$.predictions[0].content')
+ FROM ML.GENERATE_TEXT(
+ MODEL `project.dataset.gemini_rag`,
+ (SELECT CONCAT('Rephrase this query: ', @original_query) AS prompt)
+ ))
+ ]) AS query
+),
+all_embeddings AS (
+  SELECT content AS query, ml_generate_embedding_result AS embedding
+ FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT query AS content FROM expanded_queries)
+ )
+),
+all_results AS (
+ SELECT DISTINCT base.chunk_id, base.chunk_text, MIN(distance) AS best_distance
+ FROM VECTOR_SEARCH(
+ TABLE `project.dataset.kb_embeddings`,
+ 'embedding',
+ TABLE all_embeddings,
+ top_k => 5
+ )
+ GROUP BY chunk_id, chunk_text
+)
+SELECT * FROM all_results ORDER BY best_distance LIMIT 5;
+```
+
+### Conversational RAG
+
+```sql
+CREATE OR REPLACE TABLE `project.dataset.chat_history` (
+ session_id STRING,
+ turn_id INT64,
+ role STRING, -- 'user' or 'assistant'
+ content STRING,
+ timestamp TIMESTAMP
+);
+
+-- Include chat history in context
+WITH recent_history AS (
+ SELECT STRING_AGG(
+ CONCAT(role, ': ', content),
+ '\n'
+ ORDER BY turn_id
+ ) AS history
+ FROM `project.dataset.chat_history`
+ WHERE session_id = @session_id
+ AND turn_id >= (SELECT MAX(turn_id) - 5 FROM `project.dataset.chat_history`
+ WHERE session_id = @session_id)
+),
+-- ... rest of RAG pipeline
+SELECT
+ JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS response
+FROM ML.GENERATE_TEXT(
+ MODEL `project.dataset.gemini_rag`,
+ (SELECT CONCAT(
+ 'Chat history:\n', history, '\n\n',
+ 'Context:\n', context, '\n\n',
+ 'User: ', @user_query, '\n',
+ 'Assistant:'
+ ) AS prompt
+ FROM recent_history, context_table)
+);
+```
+
+### RAG with Source Citations
+
+```sql
+SELECT
+ JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS answer
+FROM ML.GENERATE_TEXT(
+ MODEL `project.dataset.gemini_rag`,
+ (SELECT CONCAT(
+ 'Answer the question using the sources below. ',
+ 'Cite sources using [1], [2], etc.\n\n',
+ (SELECT STRING_AGG(
+      CONCAT('[', CAST(chunk_index + 1 AS STRING), '] ', chunk_text),
+ '\n\n'
+ ) FROM retrieved_context),
+ '\n\nQuestion: ', @query,
+ '\n\nAnswer with citations:'
+ ) AS prompt),
+ STRUCT(512 AS max_output_tokens, 0.2 AS temperature)
+);
+```
+
+## Incremental Updates
+
+### Add New Documents
+
+```sql
+-- 1. Insert new documents
+INSERT INTO `project.dataset.knowledge_base` ...;
+
+-- 2. Chunk new documents
+INSERT INTO `project.dataset.document_chunks`
+SELECT ... FROM new_documents;
+
+-- 3. Generate embeddings for new chunks
+INSERT INTO `project.dataset.kb_embeddings`
+SELECT chunk_id, doc_id, title, content AS chunk_text, ml_generate_embedding_result
+FROM ML.GENERATE_EMBEDDING(
+  MODEL `project.dataset.embedding_model`,
+  (SELECT chunk_id, doc_id, title, chunk_text AS content
+   FROM `project.dataset.document_chunks`
+   WHERE chunk_id NOT IN (SELECT chunk_id FROM `project.dataset.kb_embeddings`)),
+  STRUCT('RETRIEVAL_DOCUMENT' AS task_type)
+);
+
+-- Vector index updates automatically
+```
+
+### Delete Documents
+
+```sql
+-- Delete embeddings
+DELETE FROM `project.dataset.kb_embeddings`
+WHERE doc_id = @doc_to_delete;
+
+-- Delete chunks
+DELETE FROM `project.dataset.document_chunks`
+WHERE doc_id = @doc_to_delete;
+
+-- Delete source document
+DELETE FROM `project.dataset.knowledge_base`
+WHERE doc_id = @doc_to_delete;
+```
+
+## Performance Optimization
+
+### 1. Pre-compute Common Queries
+
+```sql
+CREATE OR REPLACE TABLE `project.dataset.query_cache` AS
+SELECT
+ query_text,
+ answer,
+ sources,
+ CURRENT_TIMESTAMP() AS cached_at
+FROM common_queries_with_answers;
+```
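+
+A cache table only helps if queries consult it before running retrieval and generation. A minimal lookup sketch, assuming the `query_cache` table above is populated offline:
+
+```sql
+DECLARE user_query STRING DEFAULT 'What is the refund policy?';
+
+-- Serve from the cache when an exact match exists;
+-- otherwise fall back to the full RAG query from Step 5.
+SELECT answer, sources, cached_at
+FROM `project.dataset.query_cache`
+WHERE LOWER(query_text) = LOWER(user_query)
+ORDER BY cached_at DESC
+LIMIT 1;
+```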
+
+### 2. Filter Before Search
+
+```sql
+-- Restrict search to relevant category
+FROM VECTOR_SEARCH(
+ (SELECT * FROM `project.dataset.kb_embeddings`
+ WHERE category = @user_category),
+ 'embedding',
+ ...
+)
+```
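+
+Spelled out in full, a category-filtered retrieval might look like the sketch below; it assumes `kb_embeddings` carries a `category` column copied through from the source documents.
+
+```sql
+WITH query_embedding AS (
+  SELECT ml_generate_embedding_result AS embedding
+  FROM ML.GENERATE_EMBEDDING(
+    MODEL `project.dataset.embedding_model`,
+    (SELECT @user_query AS content),
+    STRUCT('RETRIEVAL_QUERY' AS task_type)
+  )
+)
+SELECT base.chunk_id, base.title, base.chunk_text, distance
+FROM VECTOR_SEARCH(
+  (SELECT * FROM `project.dataset.kb_embeddings`
+   WHERE category = @user_category),  -- shrink the search space before searching
+  'embedding',
+  TABLE query_embedding,
+  top_k => 5,
+  distance_type => 'COSINE'
+)
+ORDER BY distance;
+```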
+
+### 3. Tune Retrieval Parameters
+
+- Start with `top_k = 5`
+- Increase if answers miss context
+- Decrease if context is too noisy
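+
+One way to calibrate `top_k` is to retrieve deeper than you intend to use and inspect the distance profile; a sketch reusing the `query_embedding` CTE from Step 5:
+
+```sql
+SELECT base.chunk_id, base.title, ROUND(distance, 3) AS distance
+FROM VECTOR_SEARCH(
+  TABLE `project.dataset.kb_embeddings`,
+  'embedding',
+  TABLE query_embedding,
+  top_k => 20,
+  distance_type => 'COSINE'
+)
+ORDER BY distance;
+-- If distance jumps sharply after the first few rows, a small top_k is enough;
+-- if it stays flat, retrieving more context may help.
+```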
+
+### 4. Optimize Chunk Size
+
+- 500-1000 characters typically optimal
+- Larger for dense technical content
+- Smaller for diverse Q&A
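+
+Changing chunk size only requires adjusting the repetition bound in the chunking regex from Step 1. A sketch for 1,500-character chunks (dense technical content):
+
+```sql
+CREATE OR REPLACE TABLE `project.dataset.document_chunks_1500` AS
+SELECT
+  CONCAT(doc_id, '_', chunk_index) AS chunk_id,
+  doc_id,
+  title,
+  chunk_index,
+  TRIM(chunk) AS chunk_text
+FROM `project.dataset.knowledge_base`,
+  UNNEST(REGEXP_EXTRACT_ALL(content, r'.{1,1500}(?:\s|$)')) AS chunk WITH OFFSET AS chunk_index
+WHERE LENGTH(TRIM(chunk)) > 50;
+```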
+
+## Monitoring
+
+### Track RAG Quality
+
+```sql
+CREATE OR REPLACE TABLE `project.dataset.rag_feedback` (
+ query_id STRING,
+ query_text STRING,
+ answer STRING,
+  sources ARRAY<STRING>,
+ user_rating INT64, -- 1-5
+ feedback_text STRING,
+ timestamp TIMESTAMP
+);
+
+-- Analyze low-rated responses
+SELECT
+ query_text,
+ answer,
+ user_rating,
+ feedback_text
+FROM `project.dataset.rag_feedback`
+WHERE user_rating <= 2
+ORDER BY timestamp DESC;
+```
+
+### Monitor Retrieval Quality
+
+```sql
+-- Check average retrieval distance
+SELECT
+ DATE(timestamp) AS date,
+ AVG(min_distance) AS avg_top_distance,
+ COUNT(*) AS query_count
+FROM (
+ SELECT timestamp, MIN(distance) AS min_distance
+ FROM query_logs
+ GROUP BY query_id, timestamp
+)
+GROUP BY date
+ORDER BY date DESC;
+```
diff --git a/src/google/adk/skills/bigquery-ai/references/REMOTE_MODELS.md b/src/google/adk/skills/bigquery-ai/references/REMOTE_MODELS.md
new file mode 100644
index 0000000000..021aeabfee
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/references/REMOTE_MODELS.md
@@ -0,0 +1,393 @@
+# Remote Models in BigQuery
+
+Complete guide to creating and managing connections to LLMs and AI services.
+
+## Overview
+
+Remote models connect BigQuery to external AI services:
+- Google Gemini models
+- Partner models (Claude, Llama, Mistral)
+- Vertex AI endpoints
+- Cloud AI services
+
+## CREATE MODEL Syntax
+
+```sql
+CREATE [ OR REPLACE ] MODEL `project.dataset.model_name`
+ REMOTE WITH CONNECTION `project.region.connection_name`
+ OPTIONS (
+ ENDPOINT = 'model_endpoint',
+ [additional_options]
+ );
+```
+
+## Prerequisites
+
+### 1. Create a Cloud Resource Connection
+
+```bash
+# Using BigQuery UI or the bq command-line tool:
+bq mk --connection \
+ --connection_type=CLOUD_RESOURCE \
+ --location=US \
+ --project_id=my_project \
+ my_connection
+```
+
+Or via API:
+```bash
+gcloud bigquery connections create my_connection \
+ --connection-type=CLOUD_RESOURCE \
+ --location=US
+```
+
+### 2. Grant IAM Permissions
+
+The connection's service account needs roles:
+- `roles/aiplatform.user` - For Vertex AI models
+- `roles/bigquery.connectionUser` - For BigQuery access
+
+```bash
+# Get service account
+bq show --connection --location=US my_connection
+
+# Grant permission
+gcloud projects add-iam-policy-binding my_project \
+ --member="serviceAccount:bqcx-xxx@gcp-sa-bigquery-condel.iam.gserviceaccount.com" \
+ --role="roles/aiplatform.user"
+```
+
+## Google Models
+
+### Gemini Models
+
+```sql
+-- Gemini 2.0 Flash (recommended for speed)
+CREATE OR REPLACE MODEL `project.dataset.gemini_flash`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (ENDPOINT = 'gemini-2.0-flash');
+
+-- Gemini 1.5 Pro (higher quality)
+CREATE OR REPLACE MODEL `project.dataset.gemini_pro`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (ENDPOINT = 'gemini-1.5-pro');
+
+-- Gemini 1.5 Flash
+CREATE OR REPLACE MODEL `project.dataset.gemini_15_flash`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (ENDPOINT = 'gemini-1.5-flash');
+```
+
+### Embedding Models
+
+```sql
+-- Text Embedding (recommended)
+CREATE OR REPLACE MODEL `project.dataset.text_embedding`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'text-embedding-005');
+
+-- Previous generation
+CREATE OR REPLACE MODEL `project.dataset.text_embedding_004`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'text-embedding-004');
+
+-- Multilingual
+CREATE OR REPLACE MODEL `project.dataset.multilingual_embedding`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'text-multilingual-embedding-002');
+
+-- Multimodal (text + images)
+CREATE OR REPLACE MODEL `project.dataset.multimodal_embedding`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'multimodalembedding@001');
+```
+
+## Partner Models
+
+### Anthropic Claude
+
+```sql
+-- Claude 3.5 Sonnet
+CREATE OR REPLACE MODEL `project.dataset.claude_sonnet`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'claude-3-5-sonnet@20241022'
+ );
+
+-- Claude 3 Opus (highest quality)
+CREATE OR REPLACE MODEL `project.dataset.claude_opus`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'claude-3-opus@20240229'
+ );
+
+-- Claude 3 Haiku (fast)
+CREATE OR REPLACE MODEL `project.dataset.claude_haiku`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'claude-3-haiku@20240307'
+ );
+```
+
+### Meta Llama
+
+```sql
+-- Llama 3.1 405B
+CREATE OR REPLACE MODEL `project.dataset.llama_405b`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'llama-3.1-405b-instruct-maas'
+ );
+
+-- Llama 3.1 70B
+CREATE OR REPLACE MODEL `project.dataset.llama_70b`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'llama-3.1-70b-instruct-maas'
+ );
+
+-- Llama 3.2 90B Vision
+CREATE OR REPLACE MODEL `project.dataset.llama_vision`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'llama-3.2-90b-vision-instruct-maas'
+ );
+```
+
+### Mistral AI
+
+```sql
+-- Mistral Large
+CREATE OR REPLACE MODEL `project.dataset.mistral_large`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'mistral-large@2411'
+ );
+
+-- Mistral Nemo
+CREATE OR REPLACE MODEL `project.dataset.mistral_nemo`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'mistral-nemo@2407'
+ );
+
+-- Codestral (code generation)
+CREATE OR REPLACE MODEL `project.dataset.codestral`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'codestral@2405'
+ );
+```
+
+## Custom Vertex AI Endpoints
+
+Connect to your own deployed models:
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.custom_model`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/endpoints/1234567890'
+ );
+```
+
+## Cloud AI Services
+
+### Translation
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.translate`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'cloud_ai_translate_v3'
+ );
+
+-- Usage
+SELECT *
+FROM ML.TRANSLATE(
+ MODEL `project.dataset.translate`,
+  (SELECT text AS text_content FROM documents),
+ STRUCT('es' AS target_language_code)
+);
+```
+
+### Natural Language
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.nlp`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'cloud_ai_natural_language_v1'
+ );
+
+-- Usage (sentiment, entities, syntax)
+SELECT *
+FROM ML.UNDERSTAND_TEXT(
+ MODEL `project.dataset.nlp`,
+ (SELECT text AS text_content FROM reviews),
+ STRUCT('ANALYZE_SENTIMENT' AS nlp_task)
+);
+```
+
+### Vision
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.vision`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'cloud_ai_vision_v1'
+ );
+
+-- Usage
+SELECT *
+FROM ML.ANNOTATE_IMAGE(
+ MODEL `project.dataset.vision`,
+ (SELECT uri FROM images),
+ STRUCT(['LABEL_DETECTION', 'TEXT_DETECTION'] AS vision_features)
+);
+```
+
+### Document AI
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.document_ai`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'cloud_ai_document_v1',
+ DOCUMENT_PROCESSOR = 'projects/my-project/locations/us/processors/abc123'
+ );
+
+-- Usage
+SELECT *
+FROM ML.PROCESS_DOCUMENT(
+ MODEL `project.dataset.document_ai`,
+ (SELECT uri FROM pdfs)
+);
+```
+
+### Speech-to-Text
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.speech`
+ REMOTE WITH CONNECTION `project.us.connection`
+ OPTIONS (
+ ENDPOINT = 'cloud_ai_speech_v2'
+ );
+
+-- Usage
+SELECT *
+FROM ML.TRANSCRIBE(
+ MODEL `project.dataset.speech`,
+ (SELECT uri FROM audio_files),
+ STRUCT('en-US' AS language_code)
+);
+```
+
+## Model Management
+
+### List Models
+
+```sql
+SELECT
+ model_name,
+ model_type,
+ creation_time,
+ training_runs[SAFE_OFFSET(0)].training_options.model_type AS model_subtype
+FROM `project.dataset.INFORMATION_SCHEMA.MODELS`;
+```
+
+### Get Model Details
+
+```sql
+SELECT *
+FROM ML.MODEL_INFO(MODEL `project.dataset.my_model`);
+```
+
+### Drop Model
+
+```sql
+DROP MODEL IF EXISTS `project.dataset.my_model`;
+```
+
+## Model Comparison
+
+| Model | Provider | Speed | Quality | Cost | Best For |
+|-------|----------|-------|---------|------|----------|
+| gemini-2.0-flash | Google | Fast | Good | Low | High volume |
+| gemini-1.5-pro | Google | Medium | High | Medium | Complex tasks |
+| claude-3.5-sonnet | Anthropic | Medium | High | Medium | Reasoning |
+| claude-3-opus | Anthropic | Slow | Highest | High | Critical tasks |
+| llama-3.1-405b | Meta | Slow | High | Medium | Open source |
+| mistral-large | Mistral | Medium | High | Medium | EU compliance |
+
+## Regional Availability
+
+| Model | US | EU | Asia |
+|-------|----|----|------|
+| Gemini | Yes | Yes | Yes |
+| Claude | Yes | Yes | Limited |
+| Llama | Yes | Yes | Limited |
+| Mistral | Yes | Yes | Limited |
+
+## Best Practices
+
+### 1. Use Connection Defaults
+
+```sql
+-- Simplest syntax with DEFAULT connection
+CREATE MODEL `project.dataset.model`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'gemini-2.0-flash');
+```
+
+### 2. Version Your Models
+
+```sql
+-- Include version in name
+CREATE MODEL `project.dataset.gemini_20_flash_v1`
+ ...
+```
+
+### 3. Create Per-Use-Case Models
+
+```sql
+-- Different models for different tasks
+CREATE MODEL `project.dataset.summarizer` ...
+CREATE MODEL `project.dataset.classifier` ...
+CREATE MODEL `project.dataset.embedder` ...
+```
+
+### 4. Test Before Production
+
+```sql
+-- Test with small sample
+SELECT * FROM ML.GENERATE_TEXT(
+ MODEL `project.dataset.new_model`,
+ (SELECT prompt FROM test_prompts LIMIT 10)
+);
+```
+
+## Troubleshooting
+
+### Connection Issues
+
+```sql
+-- Verify connection exists
+SELECT * FROM `project.INFORMATION_SCHEMA.CONNECTIONS`;
+
+-- Check service account permissions
+SELECT * FROM `project.INFORMATION_SCHEMA.OBJECT_PRIVILEGES`
+WHERE grantee LIKE '%bqcx%';
+```
+
+### Model Not Found
+
+- Verify endpoint spelling
+- Check regional availability
+- Ensure connection has proper IAM roles
+
+### Rate Limits
+
+- Use appropriate request_type
+- Implement retry logic
+- Consider dedicated capacity
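+
+As a sketch of the first point, the quota type can be set per call via the `request_type` option documented in `TEXT_GENERATION.md` (table and model names below are placeholders):
+
+```sql
+SELECT *
+FROM ML.GENERATE_TEXT(
+  MODEL `project.dataset.gemini_flash`,
+  (SELECT prompt FROM `project.dataset.prompts` LIMIT 100),  -- smaller batches reduce throttling
+  STRUCT(
+    128 AS max_output_tokens,
+    'SHARED' AS request_type  -- or 'DEDICATED' if you have provisioned throughput
+  )
+);
+```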
diff --git a/src/google/adk/skills/bigquery-ai/references/TEXT_GENERATION.md b/src/google/adk/skills/bigquery-ai/references/TEXT_GENERATION.md
new file mode 100644
index 0000000000..fc59cab079
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/references/TEXT_GENERATION.md
@@ -0,0 +1,291 @@
+# Text Generation in BigQuery
+
+Complete guide to generating text using AI.GENERATE_TEXT and ML.GENERATE_TEXT functions.
+
+## Function Overview
+
+BigQuery provides two primary functions for text generation:
+
+| Function | Description | Use Case |
+|----------|-------------|----------|
+| `AI.GENERATE_TEXT` | Table function with full parameter control | Complex generation tasks |
+| `ML.GENERATE_TEXT` | Table function; the longer-established name used in most examples | Batch generation over query results |
+
+## AI.GENERATE_TEXT Syntax
+
+```sql
+SELECT *
+FROM AI.GENERATE_TEXT(
+ MODEL `project.dataset.model_name`,
+ { TABLE source_table | (SELECT query) },
+ STRUCT(
+ max_output_tokens AS max_output_tokens,
+ temperature AS temperature,
+ top_p AS top_p,
+ top_k AS top_k,
+ stop_sequences AS stop_sequences,
+ ground_with_google_search AS ground_with_google_search,
+ safety_settings AS safety_settings,
+ request_type AS request_type
+ )
+);
+```
+
+## Parameters Reference
+
+### Generation Parameters
+
+| Parameter | Type | Range | Default | Description |
+|-----------|------|-------|---------|-------------|
+| `max_output_tokens` | INT64 | 1-8192 | 128 | Maximum tokens in response |
+| `temperature` | FLOAT64 | 0.0-2.0 | 0.0 | Randomness (0=deterministic, higher=creative) |
+| `top_p` | FLOAT64 | 0.0-1.0 | 0.95 | Nucleus sampling probability |
+| `top_k` | INT64 | 1-40 | 40 | Top-k token selection |
+| `stop_sequences` | `ARRAY<STRING>` | - | [] | Strings that stop generation |
+
+### Advanced Parameters
+
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `ground_with_google_search` | BOOL | FALSE | Enable web grounding for factual responses |
+| `request_type` | STRING | UNSPECIFIED | Quota type: DEDICATED, SHARED, UNSPECIFIED |
+| `safety_settings` | `ARRAY<STRUCT<category STRING, threshold STRING>>` | - | Content filtering configuration |
+
+### Safety Settings
+
+Configure content filtering with category and threshold pairs:
+
+```sql
+STRUCT(
+ [STRUCT('HARM_CATEGORY_HATE_SPEECH' AS category, 'BLOCK_LOW_AND_ABOVE' AS threshold),
+ STRUCT('HARM_CATEGORY_DANGEROUS_CONTENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold)]
+ AS safety_settings
+)
+```
+
+**Harm Categories:**
+- `HARM_CATEGORY_HATE_SPEECH`
+- `HARM_CATEGORY_DANGEROUS_CONTENT`
+- `HARM_CATEGORY_HARASSMENT`
+- `HARM_CATEGORY_SEXUALLY_EXPLICIT`
+
+**Thresholds:**
+- `BLOCK_NONE` (requires allowlisting)
+- `BLOCK_LOW_AND_ABOVE`
+- `BLOCK_MEDIUM_AND_ABOVE` (default)
+- `BLOCK_ONLY_HIGH`
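+
+Putting the pieces together, a call that applies safety settings might look like this sketch (the model and `user_messages` table are placeholders):
+
+```sql
+SELECT
+  id,
+  JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS reply
+FROM ML.GENERATE_TEXT(
+  MODEL `myproject.mydataset.gemini_model`,
+  (SELECT id, CONCAT('Reply politely to: ', message) AS prompt FROM user_messages),
+  STRUCT(
+    256 AS max_output_tokens,
+    0.2 AS temperature,
+    [STRUCT('HARM_CATEGORY_HATE_SPEECH' AS category, 'BLOCK_LOW_AND_ABOVE' AS threshold),
+     STRUCT('HARM_CATEGORY_HARASSMENT' AS category, 'BLOCK_MEDIUM_AND_ABOVE' AS threshold)]
+      AS safety_settings
+  )
+);
+```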
+
+## Output Schema
+
+The function returns these columns:
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `ml_generate_text_result` | JSON | Generated text and metadata |
+| `ml_generate_text_status` | STRING | Error message (empty if success) |
+| Original columns | Various | All columns from input table |
+
+### Parsing Results
+
+```sql
+SELECT
+ JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS generated_text,
+ JSON_VALUE(ml_generate_text_result, '$.predictions[0].safetyAttributes.blocked') AS was_blocked,
+ CAST(JSON_VALUE(ml_generate_text_result, '$.tokenMetadata.outputTokenCount.totalTokens') AS INT64) AS output_tokens
+FROM ML.GENERATE_TEXT(...);
+```
+
+## Examples
+
+### Basic Text Generation
+
+```sql
+SELECT
+ id,
+ JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS summary
+FROM ML.GENERATE_TEXT(
+ MODEL `myproject.mydataset.gemini_model`,
+ (SELECT id, CONCAT('Summarize in 2 sentences: ', article_text) AS prompt
+ FROM `myproject.mydataset.articles`
+ WHERE date = CURRENT_DATE()),
+ STRUCT(150 AS max_output_tokens, 0.3 AS temperature)
+);
+```
+
+### Creative Writing with Higher Temperature
+
+```sql
+SELECT *
+FROM AI.GENERATE_TEXT(
+ MODEL `myproject.mydataset.gemini_pro`,
+ (SELECT CONCAT('Write a creative tagline for: ', product_name) AS prompt
+ FROM products),
+ STRUCT(
+ 50 AS max_output_tokens,
+ 0.9 AS temperature,
+ 0.95 AS top_p
+ )
+);
+```
+
+### Factual Q&A with Web Grounding
+
+```sql
+SELECT
+ question,
+ JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS answer
+FROM ML.GENERATE_TEXT(
+ MODEL `myproject.mydataset.gemini_model`,
+ (SELECT question, CONCAT('Answer factually: ', question) AS prompt
+ FROM questions),
+ STRUCT(
+ 256 AS max_output_tokens,
+ 0.0 AS temperature,
+ TRUE AS ground_with_google_search
+ )
+);
+```
+
+### Multi-turn Conversation
+
+```sql
+SELECT *
+FROM AI.GENERATE_TEXT(
+ MODEL `myproject.mydataset.gemini_model`,
+ (SELECT
+ CONCAT(
+ 'Previous conversation:\n',
+ conversation_history,
+ '\n\nUser: ', user_message,
+ '\n\nAssistant:'
+ ) AS prompt
+ FROM conversations),
+ STRUCT(512 AS max_output_tokens, 0.7 AS temperature)
+);
+```
+
+### Structured Output (JSON)
+
+```sql
+SELECT
+ id,
+ JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS extracted_json
+FROM ML.GENERATE_TEXT(
+ MODEL `myproject.mydataset.gemini_model`,
+ (SELECT id,
+ CONCAT(
+ 'Extract entities as JSON with keys: name, date, amount\n\n',
+ 'Text: ', document_text,
+ '\n\nJSON:'
+ ) AS prompt
+ FROM documents),
+ STRUCT(
+ 200 AS max_output_tokens,
+ 0.0 AS temperature,
+ ['```'] AS stop_sequences
+ )
+);
+```
+
+### Batch Classification
+
+```sql
+SELECT
+ id,
+ content,
+ TRIM(JSON_VALUE(ml_generate_text_result, '$.predictions[0].content')) AS category
+FROM ML.GENERATE_TEXT(
+ MODEL `myproject.mydataset.gemini_flash`,
+ (SELECT id, content,
+ CONCAT(
+ 'Classify the following text into exactly one category: ',
+ 'Technology, Sports, Politics, Entertainment, Business\n\n',
+ 'Text: ', content, '\n\nCategory:'
+ ) AS prompt
+ FROM articles
+ WHERE published_date > DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)),
+ STRUCT(10 AS max_output_tokens, 0.0 AS temperature)
+);
+```
+
+## Error Handling
+
+### Check for Errors
+
+```sql
+SELECT
+ id,
+ CASE
+ WHEN ml_generate_text_status != '' THEN CONCAT('ERROR: ', ml_generate_text_status)
+ ELSE JSON_VALUE(ml_generate_text_result, '$.predictions[0].content')
+ END AS result
+FROM ML.GENERATE_TEXT(...);
+```
+
+### Filter Successful Results
+
+```sql
+SELECT *
+FROM ML.GENERATE_TEXT(...)
+WHERE ml_generate_text_status = '';
+```
+
+### Common Errors
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| `RESOURCE_EXHAUSTED` | Rate limit exceeded | Reduce batch size, add delays |
+| `INVALID_ARGUMENT` | Bad prompt or parameters | Check prompt format, parameter ranges |
+| `PERMISSION_DENIED` | Missing IAM roles | Grant `aiplatform.user` role |
+| `BLOCKED` | Safety filter triggered | Adjust safety settings or modify prompt |
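+
+One pattern that follows from this table is to materialize raw results once and route failed rows to a retry table instead of discarding them; a minimal sketch, assuming an `articles` source table:
+
+```sql
+-- 1. Materialize the raw model output once
+CREATE OR REPLACE TABLE `myproject.mydataset.generation_raw` AS
+SELECT id, ml_generate_text_result, ml_generate_text_status
+FROM ML.GENERATE_TEXT(
+  MODEL `myproject.mydataset.gemini_flash`,
+  (SELECT id, CONCAT('Summarize: ', content) AS prompt FROM `myproject.mydataset.articles`),
+  STRUCT(128 AS max_output_tokens)
+);
+
+-- 2. Keep failures for a later retry pass
+CREATE OR REPLACE TABLE `myproject.mydataset.generation_retries` AS
+SELECT id, ml_generate_text_status AS error, CURRENT_TIMESTAMP() AS failed_at
+FROM `myproject.mydataset.generation_raw`
+WHERE ml_generate_text_status != '';
+```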
+
+## Performance Optimization
+
+### Batch Processing
+
+```sql
+-- Process in batches of 1000
+SELECT * FROM ML.GENERATE_TEXT(
+ MODEL `myproject.mydataset.gemini_flash`,
+ (SELECT * FROM source_table LIMIT 1000 OFFSET @batch_offset),
+ STRUCT(100 AS max_output_tokens)
+);
+```
+
+### Use Appropriate Model
+
+| Model | Speed | Quality | Cost | Best For |
+|-------|-------|---------|------|----------|
+| Gemini Flash | Fast | Good | Low | High-volume, simple tasks |
+| Gemini Pro | Medium | High | Medium | Complex reasoning |
+| Gemini Ultra | Slow | Highest | High | Research, critical tasks |
+
+### Minimize Token Usage
+
+1. Keep prompts concise
+2. Set appropriate `max_output_tokens`
+3. Use `stop_sequences` to cut off unnecessary output
+4. Filter data before calling AI functions
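+
+As an illustration of points 2-4, the sketch below caps output length, stops at the first newline, and filters rows before they reach the model (the `reviews` table is a placeholder):
+
+```sql
+SELECT
+  id,
+  TRIM(JSON_VALUE(ml_generate_text_result, '$.predictions[0].content')) AS sentiment
+FROM ML.GENERATE_TEXT(
+  MODEL `myproject.mydataset.gemini_flash`,
+  (SELECT id, CONCAT('One-word sentiment for: ', review_text) AS prompt
+   FROM `myproject.mydataset.reviews`
+   WHERE LENGTH(review_text) BETWEEN 20 AND 2000),  -- filter before calling the model
+  STRUCT(
+    5 AS max_output_tokens,   -- tight cap for a one-word answer
+    0.0 AS temperature,
+    ['\n'] AS stop_sequences  -- stop at the first newline
+  )
+);
+```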
+
+## Cost Considerations
+
+- Charged per 1000 characters (input + output)
+- Different rates for different models
+- Web grounding adds additional cost
+- Monitor usage in Cloud Console > BigQuery > Quotas
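+
+Because billing is character-based, a rough pre-flight estimate of input volume can help before launching a large job; a sketch over the earlier `articles` example:
+
+```sql
+SELECT
+  COUNT(*) AS rows_to_process,
+  SUM(LENGTH(CONCAT('Summarize in 2 sentences: ', article_text))) AS input_characters
+FROM `myproject.mydataset.articles`
+WHERE date = CURRENT_DATE();
+-- Output characters are billed on top of this input estimate.
+```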
+
+## Supported Models
+
+### Google Models
+- `gemini-2.0-flash` - Latest, fastest
+- `gemini-1.5-pro` - High quality
+- `gemini-1.5-flash` - Balanced
+- `gemini-1.0-pro` - Legacy
+
+### Partner Models
+- `claude-3-5-sonnet` - Anthropic
+- `claude-3-opus` - Anthropic (highest quality)
+- `llama-3.1-405b` - Meta
+- `mistral-large` - Mistral AI
+
+See `REMOTE_MODELS.md` for model connection setup.
diff --git a/src/google/adk/skills/bigquery-ai/references/VECTOR_SEARCH.md b/src/google/adk/skills/bigquery-ai/references/VECTOR_SEARCH.md
new file mode 100644
index 0000000000..135192cffe
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/references/VECTOR_SEARCH.md
@@ -0,0 +1,389 @@
+# Vector Search in BigQuery
+
+Complete guide to semantic search using VECTOR_SEARCH function.
+
+## Overview
+
+VECTOR_SEARCH finds the most similar items to a query based on vector embeddings. It supports:
+- Exact search (brute force)
+- Approximate nearest neighbor (ANN) with vector indexes
+- Multiple distance metrics
+- Top-k retrieval
+
+## VECTOR_SEARCH Syntax
+
+```sql
+SELECT *
+FROM VECTOR_SEARCH(
+ TABLE `project.dataset.base_table`,
+ 'embedding_column',
+ { TABLE query_table | (SELECT query) },
+ top_k => k,
+ distance_type => 'COSINE',
+ options => '{"option": value}'
+);
+```
+
+## Parameters
+
+| Parameter | Required | Type | Description |
+|-----------|----------|------|-------------|
+| `base_table` | Yes | Table reference | Table containing the embeddings to search |
+| `embedding_column` | Yes | STRING | Column name with vector embeddings |
+| `query_table` | Yes | Table/Subquery | Query embeddings to find matches for |
+| `top_k` | No | INT64 | Number of results per query (default: 10) |
+| `distance_type` | No | STRING | Distance metric (default: COSINE) |
+| `options` | No | JSON STRING | Additional configuration |
+
+### Distance Types
+
+| Type | Formula | Best For | Range |
+|------|---------|----------|-------|
+| `COSINE` | 1 - cos(a,b) | Normalized embeddings | [0, 2] |
+| `EUCLIDEAN` | ‖a-b‖₂ | Absolute distances | [0, inf] |
+| `DOT_PRODUCT` | -a·b | Pre-normalized, high performance | (-inf, inf) |
+
+### Options
+
+```sql
+options => '{
+ "fraction_lists_to_search": 0.01,
+ "use_brute_force": false
+}'
+```
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `fraction_lists_to_search` | FLOAT | Auto | Portion of index to search (ANN) |
+| `use_brute_force` | BOOL | FALSE | Force exact search |
+
+## Output Schema
+
+The function returns a joined result with:
+
+| Column | Type | Description |
+|--------|------|-------------|
+| `query.*` | Various | All columns from query table |
+| `base.*` | Various | All columns from base table |
+| `distance` | FLOAT64 | Distance between query and match |
+
+## Examples
+
+### Basic Semantic Search
+
+```sql
+-- Find 5 most similar documents to a query
+SELECT
+ base.id,
+ base.title,
+ base.content,
+ distance
+FROM VECTOR_SEARCH(
+ TABLE `project.dataset.document_embeddings`,
+ 'embedding',
+  (SELECT ml_generate_embedding_result AS embedding
+   FROM ML.GENERATE_EMBEDDING(
+     MODEL `project.dataset.embedding_model`,
+     (SELECT 'machine learning tutorial' AS content),
+     STRUCT('RETRIEVAL_QUERY' AS task_type)
+   )),
+ top_k => 5,
+ distance_type => 'COSINE'
+);
+```
+
+### Search with Query Table
+
+```sql
+-- Find similar items for multiple queries
+WITH query_embeddings AS (
+ SELECT
+ query_id,
+    content AS query_text,
+ ml_generate_embedding_result AS embedding
+ FROM ML.GENERATE_EMBEDDING(
+ MODEL `project.dataset.embedding_model`,
+ (SELECT query_id, query_text AS content FROM user_queries)
+ )
+)
+SELECT
+ query.query_id,
+ query.query_text,
+ base.document_id,
+ base.title,
+ distance
+FROM VECTOR_SEARCH(
+ TABLE `project.dataset.document_embeddings`,
+ 'embedding',
+ TABLE query_embeddings,
+ top_k => 3,
+ distance_type => 'COSINE'
+)
+ORDER BY query.query_id, distance;
+```
+
+### Search with Filters
+
+```sql
+-- Combine vector search with WHERE clause
+SELECT
+ base.id,
+ base.title,
+ base.category,
+ distance
+FROM VECTOR_SEARCH(
+ (SELECT * FROM `project.dataset.embeddings` WHERE category = 'technology'),
+ 'embedding',
+ (SELECT embedding FROM query_embeddings),
+ top_k => 10
+);
+```
+
+### ANN Search with Index
+
+```sql
+-- Fast approximate search (requires vector index)
+SELECT *
+FROM VECTOR_SEARCH(
+ TABLE `project.dataset.large_embeddings`,
+ 'embedding',
+ TABLE query_embeddings,
+ top_k => 100,
+ distance_type => 'COSINE',
+ options => '{"fraction_lists_to_search": 0.005}'
+);
+```
+
+### Exact (Brute Force) Search
+
+```sql
+-- Force exact search for highest accuracy
+SELECT *
+FROM VECTOR_SEARCH(
+ TABLE `project.dataset.embeddings`,
+ 'embedding',
+ TABLE query_embeddings,
+ top_k => 10,
+ options => '{"use_brute_force": true}'
+);
+```
+
+## Vector Indexes
+
+### Creating Indexes
+
+```sql
+-- IVF index (most common)
+CREATE OR REPLACE VECTOR INDEX my_index
+ON `project.dataset.embeddings`(embedding)
+OPTIONS (
+ index_type = 'IVF',
+ distance_type = 'COSINE',
+ ivf_options = '{"num_lists": 1000}'
+);
+
+-- Check index status
+SELECT
+ table_name,
+ index_name,
+ index_status,
+ coverage_percentage
+FROM `project.dataset.INFORMATION_SCHEMA.VECTOR_INDEXES`;
+```
+
+### Index Types
+
+| Type | Description | Best For |
+|------|-------------|----------|
+| `IVF` | Inverted file index | General purpose, <100M rows |
+| `TREE_AH` | Tree-based with asymmetric hashing | Very large datasets |
+
+### Index Parameters
+
+```sql
+-- IVF options
+OPTIONS (
+ index_type = 'IVF',
+ distance_type = 'COSINE',
+ ivf_options = '{"num_lists": 1000}' -- sqrt(n) as starting point
+)
+
+-- TREE_AH options
+OPTIONS (
+ index_type = 'TREE_AH',
+ distance_type = 'DOT_PRODUCT',
+ tree_ah_options = '{"leaf_node_embedding_count": 1000}'
+)
+```
+
+### Tuning Search Quality
+
+The `fraction_lists_to_search` parameter trades speed for accuracy:
+
+| Value | Speed | Recall | Use Case |
+|-------|-------|--------|----------|
+| 0.001 | Fastest | ~90% | Large-scale, speed critical |
+| 0.01 | Fast | ~95% | Balanced (recommended) |
+| 0.1 | Medium | ~99% | High accuracy needed |
+| 1.0 | Slowest | 100% | Equivalent to brute force |
+
+```sql
+-- High speed, lower recall
+options => '{"fraction_lists_to_search": 0.001}'
+
+-- Balanced
+options => '{"fraction_lists_to_search": 0.01}'
+
+-- High recall
+options => '{"fraction_lists_to_search": 0.1}'
+```
+
+## Common Patterns
+
+### Similarity Threshold
+
+```sql
+-- Only return results above similarity threshold
+SELECT *
+FROM VECTOR_SEARCH(
+ TABLE `project.dataset.embeddings`,
+ 'embedding',
+ TABLE query_embeddings,
+ top_k => 100,
+ distance_type => 'COSINE'
+)
+WHERE distance < 0.5; -- COSINE distance < 0.5 means high similarity
+```
+
+### Deduplicate Results
+
+```sql
+-- Find and group near-duplicates
+WITH similarities AS (
+ SELECT
+ query.id AS id1,
+ base.id AS id2,
+ distance
+ FROM VECTOR_SEARCH(
+ TABLE `project.dataset.embeddings`,
+ 'embedding',
+ TABLE `project.dataset.embeddings`,
+ top_k => 5,
+ distance_type => 'COSINE'
+ )
+ WHERE query.id < base.id -- Avoid self-matches and duplicates
+ AND distance < 0.1 -- Very similar
+)
+SELECT * FROM similarities;
+```
+
+### Multi-Vector Search
+
+```sql
+-- Search across multiple embedding columns
+WITH text_results AS (
+ SELECT base.id, distance AS text_distance
+ FROM VECTOR_SEARCH(
+ TABLE `project.dataset.items`,
+ 'text_embedding',
+ TABLE query_embeddings,
+ top_k => 50
+ )
+),
+image_results AS (
+ SELECT base.id, distance AS image_distance
+ FROM VECTOR_SEARCH(
+ TABLE `project.dataset.items`,
+ 'image_embedding',
+ TABLE query_image_embeddings,
+ top_k => 50
+ )
+)
+SELECT
+ COALESCE(t.id, i.id) AS id,
+ t.text_distance,
+ i.image_distance,
+ COALESCE(t.text_distance, 1) * 0.7 +
+ COALESCE(i.image_distance, 1) * 0.3 AS combined_score
+FROM text_results t
+FULL OUTER JOIN image_results i ON t.id = i.id
+ORDER BY combined_score;
+```
+
+### Hybrid Search (Vector + Keyword)
+
+```sql
+-- Combine semantic and keyword search
+WITH semantic_results AS (
+ SELECT base.id, distance, 1.0 / (1.0 + distance) AS semantic_score
+ FROM VECTOR_SEARCH(
+ TABLE `project.dataset.docs`,
+ 'embedding',
+ TABLE query_embeddings,
+ top_k => 100
+ )
+),
+keyword_results AS (
+ SELECT id, search_score
+ FROM `project.dataset.docs`
+ WHERE SEARCH(content, @query)
+)
+SELECT
+ COALESCE(s.id, k.id) AS id,
+ s.semantic_score,
+ k.search_score,
+ COALESCE(s.semantic_score, 0) * 0.6 +
+ COALESCE(k.search_score, 0) * 0.4 AS hybrid_score
+FROM semantic_results s
+FULL OUTER JOIN keyword_results k ON s.id = k.id
+ORDER BY hybrid_score DESC
+LIMIT 20;
+```
+
+## Performance Tips
+
+### 1. Use Vector Indexes
+- Create indexes for tables > 10K rows
+- Significant speedup for > 100K rows
+
+### 2. Limit Base Table
+- Filter base table before search when possible
+- Reduces search space
+
+### 3. Tune Recall vs Speed
+- Start with `fraction_lists_to_search: 0.01`
+- Increase if quality is insufficient
+
+### 4. Batch Queries
+- Process multiple queries in one call
+- More efficient than individual queries
+
+### 5. Monitor Costs
+- Search costs scale with table size
+- Index maintenance has ongoing costs
+
+## Troubleshooting
+
+### Slow Queries
+
+```sql
+-- Check if index exists
+SELECT * FROM `project.dataset.INFORMATION_SCHEMA.VECTOR_INDEXES`;
+
+-- Check index coverage
+SELECT coverage_percentage FROM ...;
+
+-- Ensure index is ACTIVE status
+```
+
+### Poor Results
+
+1. Check embedding model consistency
+2. Verify distance type matches index
+3. Increase `fraction_lists_to_search`
+4. Compare with brute force results
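+
+For step 4, a quick recall check is to run the same query with and without brute force and count how many of the exact top-k results the ANN search also returned; a sketch, assuming `query_embeddings` holds a single query vector:
+
+```sql
+WITH ann AS (
+  SELECT base.id
+  FROM VECTOR_SEARCH(
+    TABLE `project.dataset.embeddings`, 'embedding',
+    TABLE query_embeddings,
+    top_k => 10,
+    options => '{"fraction_lists_to_search": 0.01}'
+  )
+),
+exact AS (
+  SELECT base.id
+  FROM VECTOR_SEARCH(
+    TABLE `project.dataset.embeddings`, 'embedding',
+    TABLE query_embeddings,
+    top_k => 10,
+    options => '{"use_brute_force": true}'
+  )
+)
+SELECT
+  COUNT(*) AS exact_results,
+  COUNTIF(a.id IS NOT NULL) AS also_found_by_ann,
+  ROUND(COUNTIF(a.id IS NOT NULL) / COUNT(*), 2) AS approx_recall
+FROM exact e
+LEFT JOIN ann a ON e.id = a.id;
+```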
+
+### Memory Errors
+
+- Reduce `top_k`
+- Filter base table
+- Process queries in batches
diff --git a/src/google/adk/skills/bigquery-ai/scripts/generate_embeddings.py b/src/google/adk/skills/bigquery-ai/scripts/generate_embeddings.py
new file mode 100644
index 0000000000..91b2f30685
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/scripts/generate_embeddings.py
@@ -0,0 +1,303 @@
+#!/usr/bin/env python3
+"""Generate embeddings for a BigQuery table.
+
+This script generates vector embeddings for text data in BigQuery using
+ML.GENERATE_EMBEDDING, with support for batching and incremental updates.
+
+Usage:
+ python generate_embeddings.py --project-id PROJECT --source-table TABLE
+
+Examples:
+ # Generate embeddings for a table
+ python generate_embeddings.py \\
+ --project-id my-project \\
+ --source-table my_dataset.documents \\
+ --content-column body \\
+ --output-table my_dataset.document_embeddings
+
+ # Incremental update (only embed new rows)
+ python generate_embeddings.py \\
+ --project-id my-project \\
+ --source-table my_dataset.documents \\
+ --output-table my_dataset.document_embeddings \\
+ --incremental --id-column doc_id
+
+ # Custom embedding model
+ python generate_embeddings.py \\
+ --project-id my-project \\
+ --source-table my_dataset.documents \\
+ --embedding-model my_dataset.custom_embedding_model
+"""
+
+import argparse
+import json
+import sys
+from typing import Optional
+
+
+def generate_embedding_sql(
+ project_id: str,
+ source_table: str,
+ output_table: str,
+ content_column: str,
+ id_column: str,
+ embedding_model: str,
+ task_type: Optional[str] = None,
+ batch_size: Optional[int] = None,
+ incremental: bool = False,
+ create_table: bool = True,
+) -> str:
+ """Generate SQL for embedding generation."""
+ # Build task type option
+ task_option = ""
+ if task_type:
+ task_option = f", STRUCT('{task_type}' AS task_type)"
+
+ # Build source query
+ if incremental:
+ source_query = f"""
+ SELECT s.{id_column}, s.{content_column} AS content
+ FROM `{source_table}` s
+ LEFT JOIN `{output_table}` e ON s.{id_column} = e.{id_column}
+ WHERE e.{id_column} IS NULL"""
+ else:
+ source_query = f"""
+ SELECT {id_column}, {content_column} AS content
+ FROM `{source_table}`"""
+
+ if batch_size:
+ source_query += f"\n LIMIT {batch_size}"
+
+ # Build the main SQL
+ if create_table and not incremental:
+ create_clause = f"CREATE OR REPLACE TABLE `{output_table}` AS"
+ else:
+ create_clause = f"INSERT INTO `{output_table}`"
+
+ sql = f"""{create_clause}
+SELECT
+ {id_column},
+ content,
+ ml_generate_embedding_result AS embedding,
+ ml_generate_embedding_status AS status,
+ CURRENT_TIMESTAMP() AS embedded_at
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `{embedding_model}`,
+ ({source_query}){task_option}
+)
+WHERE LENGTH(ml_generate_embedding_status) = 0;"""
+
+ return sql
+
+
+def generate_index_sql(
+ output_table: str,
+ index_name: str,
+ distance_type: str = "COSINE",
+ num_lists: int = 500,
+) -> str:
+ """Generate SQL for vector index creation."""
+ return f"""CREATE OR REPLACE VECTOR INDEX {index_name}
+ON `{output_table}`(embedding)
+OPTIONS (
+ index_type = 'IVF',
+ distance_type = '{distance_type}',
+ ivf_options = '{{"num_lists": {num_lists}}}'
+);"""
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Generate embeddings for BigQuery table",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ epilog=__doc__,
+ )
+
+ parser.add_argument(
+ "--project-id",
+ required=True,
+ help="Google Cloud project ID",
+ )
+ parser.add_argument(
+ "--source-table",
+ required=True,
+ help="Source table (dataset.table)",
+ )
+ parser.add_argument(
+ "--output-table",
+ help="Output table for embeddings (default: source_table_embeddings)",
+ )
+ parser.add_argument(
+ "--content-column",
+ default="content",
+ help="Column containing text to embed (default: content)",
+ )
+ parser.add_argument(
+ "--id-column",
+ default="id",
+ help="Primary key column (default: id)",
+ )
+ parser.add_argument(
+ "--embedding-model",
+ help="Embedding model to use (default: creates text-embedding-005)",
+ )
+ parser.add_argument(
+ "--task-type",
+ choices=[
+ "RETRIEVAL_DOCUMENT",
+ "RETRIEVAL_QUERY",
+ "SEMANTIC_SIMILARITY",
+ "CLASSIFICATION",
+ "CLUSTERING",
+ ],
+ default="RETRIEVAL_DOCUMENT",
+ help="Task type for embeddings (default: RETRIEVAL_DOCUMENT)",
+ )
+ parser.add_argument(
+ "--batch-size",
+ type=int,
+ help="Process in batches of this size",
+ )
+ parser.add_argument(
+ "--incremental",
+ action="store_true",
+ help="Only embed rows not already in output table",
+ )
+ parser.add_argument(
+ "--create-index",
+ action="store_true",
+ help="Create vector index after embedding",
+ )
+ parser.add_argument(
+ "--index-name",
+ help="Name for vector index (default: table_embedding_idx)",
+ )
+ parser.add_argument(
+ "--dry-run",
+ action="store_true",
+ help="Print SQL without executing",
+ )
+ parser.add_argument(
+ "--output",
+ choices=["sql", "json"],
+ default="sql",
+ help="Output format (default: sql)",
+ )
+
+ args = parser.parse_args()
+
+ # Set defaults
+ output_table = args.output_table or f"{args.source_table}_embeddings"
+ embedding_model = (
+ args.embedding_model
+ or f"{args.project_id}.{args.source_table.split('.')[0]}.text_embedding_model"
+ )
+ index_name = args.index_name or f"{output_table.split('.')[-1]}_embedding_idx"
+
+ # Generate SQL statements
+ sql_statements = []
+
+ # Create embedding model if not specified
+ if not args.embedding_model:
+ dataset = args.source_table.split(".")[0]
+ model_sql = f"""-- Create embedding model if not exists
+CREATE MODEL IF NOT EXISTS `{args.project_id}.{dataset}.text_embedding_model`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'text-embedding-005');"""
+ sql_statements.append(model_sql)
+
+ # Generate embeddings
+ embed_sql = generate_embedding_sql(
+ project_id=args.project_id,
+ source_table=f"{args.project_id}.{args.source_table}",
+ output_table=f"{args.project_id}.{output_table}",
+ content_column=args.content_column,
+ id_column=args.id_column,
+ embedding_model=embedding_model,
+ task_type=args.task_type,
+ batch_size=args.batch_size,
+ incremental=args.incremental,
+ create_table=not args.incremental,
+ )
+ sql_statements.append(embed_sql)
+
+ # Create index if requested
+ if args.create_index:
+ index_sql = generate_index_sql(
+ output_table=f"{args.project_id}.{output_table}",
+ index_name=index_name,
+ )
+ sql_statements.append(index_sql)
+
+ # Output results
+ full_sql = "\n\n".join(sql_statements)
+
+ if args.output == "json":
+ result = {
+ "project_id": args.project_id,
+ "source_table": args.source_table,
+ "output_table": output_table,
+ "embedding_model": embedding_model,
+ "task_type": args.task_type,
+ "incremental": args.incremental,
+ "create_index": args.create_index,
+ "sql": full_sql,
+ "dry_run": args.dry_run,
+ }
+ print(json.dumps(result, indent=2))
+ else:
+ print(f"-- Generate embeddings for {args.source_table}")
+ print(f"-- Output: {output_table}")
+ print(f"-- Model: {embedding_model}")
+ print(f"-- Task type: {args.task_type}")
+ if args.incremental:
+ print("-- Mode: Incremental (new rows only)")
+ print()
+ print(full_sql)
+ print()
+
+ if args.dry_run:
+ print("-- Dry run mode: SQL not executed")
+ return 0
+
+ # Execute the SQL
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=args.project_id)
+
+ for i, sql in enumerate(sql_statements):
+ print(f"\nExecuting statement {i + 1}/{len(sql_statements)}...")
+ query_job = client.query(sql)
+ result = query_job.result()
+ if hasattr(result, "total_rows"):
+ print(f" Processed {result.total_rows} rows")
+
+ print("\nEmbedding generation complete!")
+
+ # Get stats
+ stats_query = f"""
+ SELECT
+ COUNT(*) as total_embeddings,
+ MAX(embedded_at) as last_updated
+ FROM `{args.project_id}.{output_table}`
+ """
+ stats = list(client.query(stats_query).result())[0]
+ print(f"Total embeddings: {stats.total_embeddings}")
+ print(f"Last updated: {stats.last_updated}")
+
+ return 0
+
+ except ImportError:
+ print("\n-- Note: google-cloud-bigquery not installed")
+ print("-- To execute, install it with: pip install google-cloud-bigquery")
+ print("-- Or run the SQL manually in BigQuery Console")
+ return 0
+
+ except Exception as e:
+ print(f"Error: {e}", file=sys.stderr)
+ return 1
+
+
+if __name__ == "__main__":
+ sys.exit(main())
diff --git a/src/google/adk/skills/bigquery-ai/scripts/rag_pipeline.py b/src/google/adk/skills/bigquery-ai/scripts/rag_pipeline.py
new file mode 100644
index 0000000000..cb131bcdd9
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/scripts/rag_pipeline.py
@@ -0,0 +1,411 @@
+#!/usr/bin/env python3
+"""Build and run RAG (Retrieval-Augmented Generation) pipelines in BigQuery.
+
+This script helps set up and execute RAG workflows that combine semantic
+search with text generation for grounded AI responses.
+
+Usage:
+ python rag_pipeline.py --project-id PROJECT --kb-table TABLE --query "question"
+
+Examples:
+ # Ask a question using RAG
+ python rag_pipeline.py \\
+ --project-id my-project \\
+ --kb-table my_dataset.knowledge_base_embeddings \\
+ --query "What is the refund policy?"
+
+ # Setup new RAG pipeline
+ python rag_pipeline.py \\
+ --project-id my-project \\
+ --source-table my_dataset.documents \\
+ --setup
+
+ # RAG with custom models
+ python rag_pipeline.py \\
+ --project-id my-project \\
+ --kb-table my_dataset.kb_embeddings \\
+ --query "How do I configure logging?" \\
+ --generation-model my_dataset.gemini_pro \\
+ --num-sources 10
+"""
+
+import argparse
+import json
+import sys
+from typing import Optional
+
+
+def generate_setup_sql(
+ project_id: str,
+ source_table: str,
+ kb_table: str,
+ content_column: str,
+ id_column: str,
+ chunk_size: int = 1000,
+) -> str:
+ """Generate SQL to set up a RAG knowledge base."""
+ dataset = source_table.split(".")[0]
+
+ sql = f"""-- RAG Pipeline Setup
+-- Step 1: Create embedding model
+CREATE MODEL IF NOT EXISTS `{project_id}.{dataset}.rag_embedding_model`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'text-embedding-005');
+
+-- Step 2: Create generation model
+CREATE MODEL IF NOT EXISTS `{project_id}.{dataset}.rag_generation_model`
+ REMOTE WITH CONNECTION DEFAULT
+ OPTIONS (ENDPOINT = 'gemini-2.0-flash');
+
+-- Step 3: Chunk documents (if needed)
+CREATE OR REPLACE TABLE `{project_id}.{dataset}.document_chunks` AS
+WITH chunks AS (
+ SELECT
+ {id_column} AS doc_id,
+ chunk_index,
+ TRIM(chunk) AS chunk_text
+ FROM `{project_id}.{source_table}`,
+  UNNEST(REGEXP_EXTRACT_ALL({content_column}, r'.{{1,{chunk_size}}}(?:\\s|$)')) AS chunk
+ WITH OFFSET AS chunk_index
+)
+SELECT
+ CONCAT(doc_id, '_', chunk_index) AS chunk_id,
+ doc_id,
+ chunk_index,
+ chunk_text
+FROM chunks
+WHERE LENGTH(chunk_text) > 50;
+
+-- Step 4: Generate embeddings
+CREATE OR REPLACE TABLE `{project_id}.{kb_table}` AS
+SELECT
+ chunk_id,
+ doc_id,
+  content,
+ ml_generate_embedding_result AS embedding
+FROM ML.GENERATE_EMBEDDING(
+ MODEL `{project_id}.{dataset}.rag_embedding_model`,
+ (SELECT chunk_id, doc_id, chunk_text AS content
+ FROM `{project_id}.{dataset}.document_chunks`),
+ STRUCT('RETRIEVAL_DOCUMENT' AS task_type)
+)
+WHERE LENGTH(ml_generate_embedding_status) = 0;
+
+-- Step 5: Create vector index
+CREATE OR REPLACE VECTOR INDEX kb_embedding_idx
+ON `{project_id}.{kb_table}`(embedding)
+OPTIONS (
+ index_type = 'IVF',
+ distance_type = 'COSINE',
+ ivf_options = '{{"num_lists": 500}}'
+);
+
+-- Setup complete! Use rag_pipeline.py --query to ask questions.
+SELECT
+ 'RAG pipeline setup complete' AS status,
+ (SELECT COUNT(*) FROM `{project_id}.{kb_table}`) AS total_chunks;"""
+
+ return sql
+
+
+def generate_rag_query_sql(
+ project_id: str,
+ kb_table: str,
+ embedding_model: str,
+ generation_model: str,
+ query_text: str,
+ num_sources: int = 5,
+ max_output_tokens: int = 512,
+ temperature: float = 0.2,
+ include_sources: bool = True,
+) -> str:
+ """Generate SQL for RAG query."""
+ escaped_query = query_text.replace("'", "''")
+
+ sources_select = ""
+ if include_sources:
+ sources_select = """,
+ (SELECT ARRAY_AGG(STRUCT(
+ chunk_id,
+ LEFT(content, 200) AS excerpt,
+ ROUND(1.0 - distance, 3) AS relevance
+ ))
+ FROM retrieved_context) AS sources"""
+
+ sql = f"""-- RAG Query: {query_text[:50]}...
+DECLARE user_query STRING DEFAULT '{escaped_query}';
+
+WITH query_embedding AS (
+ SELECT ml_generate_embedding_result AS embedding
+ FROM ML.GENERATE_EMBEDDING(
+ MODEL `{embedding_model}`,
+ (SELECT user_query AS content),
+ STRUCT('RETRIEVAL_QUERY' AS task_type)
+ )
+),
+retrieved_context AS (
+ SELECT
+ base.chunk_id,
+ base.content,
+ distance
+ FROM VECTOR_SEARCH(
+ TABLE `{kb_table}`,
+ 'embedding',
+ TABLE query_embedding,
+ top_k => {num_sources},
+ distance_type => 'COSINE',
+ options => '{{"fraction_lists_to_search": 0.01}}'
+ )
+ ORDER BY distance
+),
+context_string AS (
+  SELECT STRING_AGG(
+    CONCAT('[Source ', CAST(rn AS STRING), ']: ', content),
+    '\\n\\n' ORDER BY rn
+  ) AS context
+  FROM (
+    SELECT content, ROW_NUMBER() OVER (ORDER BY distance) AS rn
+    FROM retrieved_context
+  )
+),
+rag_prompt AS (
+ SELECT CONCAT(
+ 'You are a helpful assistant. Answer the question based ONLY on the following context. ',
+ 'If the answer is not in the context, say "I don\\'t have enough information to answer that."\\n\\n',
+ 'Context:\\n', context, '\\n\\n',
+ 'Question: ', user_query, '\\n\\n',
+ 'Answer:'
+ ) AS prompt
+ FROM context_string
+)
+SELECT
+ user_query AS question,
+ JSON_VALUE(ml_generate_text_result, '$.predictions[0].content') AS answer{sources_select}
+FROM ML.GENERATE_TEXT(
+ MODEL `{generation_model}`,
+  (SELECT prompt FROM rag_prompt),
+  STRUCT({max_output_tokens} AS max_output_tokens, {temperature} AS temperature)
+);"""
+
+ return sql
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Build and run RAG pipelines in BigQuery",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ epilog=__doc__,
+ )
+
+ parser.add_argument(
+ "--project-id",
+ required=True,
+ help="Google Cloud project ID",
+ )
+ parser.add_argument(
+ "--kb-table",
+ help="Knowledge base embeddings table (dataset.table)",
+ )
+ parser.add_argument(
+ "--query",
+ help="Question to answer using RAG",
+ )
+ parser.add_argument(
+ "--setup",
+ action="store_true",
+ help="Setup new RAG pipeline from source documents",
+ )
+ parser.add_argument(
+ "--source-table",
+ help="Source documents table for setup (dataset.table)",
+ )
+ parser.add_argument(
+ "--content-column",
+ default="content",
+ help="Column containing document text (default: content)",
+ )
+ parser.add_argument(
+ "--id-column",
+ default="id",
+ help="Column containing document ID (default: id)",
+ )
+ parser.add_argument(
+ "--chunk-size",
+ type=int,
+ default=1000,
+ help="Characters per chunk (default: 1000)",
+ )
+ parser.add_argument(
+ "--embedding-model",
+ help="Embedding model (default: auto-detect)",
+ )
+ parser.add_argument(
+ "--generation-model",
+ help="Generation model (default: auto-detect)",
+ )
+ parser.add_argument(
+ "--num-sources",
+ type=int,
+ default=5,
+ help="Number of sources to retrieve (default: 5)",
+ )
+ parser.add_argument(
+ "--max-tokens",
+ type=int,
+ default=512,
+ help="Max output tokens (default: 512)",
+ )
+ parser.add_argument(
+ "--temperature",
+ type=float,
+ default=0.2,
+ help="Generation temperature (default: 0.2)",
+ )
+ parser.add_argument(
+ "--no-sources",
+ action="store_true",
+ help="Don't include source references in output",
+ )
+ parser.add_argument(
+ "--dry-run",
+ action="store_true",
+ help="Print SQL without executing",
+ )
+ parser.add_argument(
+ "--output",
+ choices=["text", "json", "sql"],
+ default="text",
+ help="Output format (default: text)",
+ )
+
+ args = parser.parse_args()
+
+ # Handle setup mode
+ if args.setup:
+ if not args.source_table:
+ print("Error: --source-table required for setup mode")
+ return 1
+
+ kb_table = (
+ args.kb_table or f"{args.source_table.split('.')[0]}.kb_embeddings"
+ )
+
+ sql = generate_setup_sql(
+ project_id=args.project_id,
+ source_table=args.source_table,
+ kb_table=kb_table,
+ content_column=args.content_column,
+ id_column=args.id_column,
+ chunk_size=args.chunk_size,
+ )
+
+ print("-- RAG Pipeline Setup SQL")
+ print(sql)
+
+ if args.dry_run:
+ print("\n-- Dry run mode: SQL not executed")
+ return 0
+
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=args.project_id)
+
+ print("\nExecuting setup (this may take several minutes)...")
+            for statement in sql.split(";"):
+                # Drop comment-only lines so fragments that start with "--" still run
+                body = "\n".join(
+                    line for line in statement.splitlines()
+                    if line.strip() and not line.strip().startswith("--")
+                ).strip()
+                if body:
+                    print(f"  Running: {body[:60]}...")
+                    client.query(body + ";").result()
+
+ print("\nSetup complete!")
+ return 0
+
+ except ImportError:
+ print("\n-- Run in BigQuery Console to execute")
+ return 0
+
+ # Handle query mode
+ if not args.query:
+ print("Error: --query required (or use --setup for pipeline setup)")
+ return 1
+
+ if not args.kb_table:
+ print("Error: --kb-table required for query mode")
+ return 1
+
+ # Determine models
+ dataset = args.kb_table.split(".")[0]
+ embedding_model = (
+ args.embedding_model or f"{args.project_id}.{dataset}.rag_embedding_model"
+ )
+ generation_model = (
+ args.generation_model
+ or f"{args.project_id}.{dataset}.rag_generation_model"
+ )
+
+ sql = generate_rag_query_sql(
+ project_id=args.project_id,
+ kb_table=f"{args.project_id}.{args.kb_table}",
+ embedding_model=embedding_model,
+ generation_model=generation_model,
+ query_text=args.query,
+ num_sources=args.num_sources,
+ max_output_tokens=args.max_tokens,
+ temperature=args.temperature,
+ include_sources=not args.no_sources,
+ )
+
+ if args.output == "sql" or args.dry_run:
+ print(sql)
+ if args.dry_run:
+ print("\n-- Dry run mode: SQL not executed")
+ return 0
+
+ # Execute query
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=args.project_id)
+
+ print(f"Querying: {args.query[:60]}...")
+ result = list(client.query(sql).result())[0]
+
+ if args.output == "json":
+ output = {
+ "question": result.question,
+ "answer": result.answer,
+ }
+ if hasattr(result, "sources") and result.sources:
+ output["sources"] = [
+ {
+ "chunk_id": s["chunk_id"],
+ "excerpt": s["excerpt"],
+ "relevance": s["relevance"],
+ }
+ for s in result.sources
+ ]
+ print(json.dumps(output, indent=2))
+ else:
+ print(f"\nQuestion: {result.question}")
+ print(f"\nAnswer: {result.answer}")
+
+ if hasattr(result, "sources") and result.sources:
+ print("\nSources:")
+ for i, source in enumerate(result.sources, 1):
+ print(f" [{i}] (relevance: {source['relevance']:.2f})")
+ print(f" {source['excerpt'][:100]}...")
+
+ return 0
+
+ except ImportError:
+ print("\n-- google-cloud-bigquery not installed")
+ print("-- Run: pip install google-cloud-bigquery")
+ print("-- Or execute SQL in BigQuery Console")
+ return 0
+
+ except Exception as e:
+ print(f"Error: {e}", file=sys.stderr)
+ return 1
+
+
+if __name__ == "__main__":
+ sys.exit(main())
diff --git a/src/google/adk/skills/bigquery-ai/scripts/semantic_search.py b/src/google/adk/skills/bigquery-ai/scripts/semantic_search.py
new file mode 100644
index 0000000000..117607fa26
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/scripts/semantic_search.py
@@ -0,0 +1,326 @@
+#!/usr/bin/env python3
+"""Perform semantic search on BigQuery embeddings.
+
+This script performs semantic search using VECTOR_SEARCH function,
+finding similar items based on query text or embeddings.
+
+Usage:
+ python semantic_search.py --project-id PROJECT --table TABLE --query "search text"
+
+Examples:
+ # Search by text query
+ python semantic_search.py \\
+ --project-id my-project \\
+ --table my_dataset.document_embeddings \\
+ --query "machine learning best practices"
+
+ # Search with more results
+ python semantic_search.py \\
+ --project-id my-project \\
+ --table my_dataset.embeddings \\
+ --query "data pipeline architecture" \\
+ --top-k 20
+
+ # Search with similarity threshold
+ python semantic_search.py \\
+ --project-id my-project \\
+ --table my_dataset.embeddings \\
+ --query "API documentation" \\
+ --threshold 0.3
+"""
+
+import argparse
+import json
+import sys
+from typing import Optional
+
+
+def generate_search_sql(
+ project_id: str,
+ embeddings_table: str,
+ embedding_model: str,
+ query_text: str,
+ top_k: int = 10,
+ distance_type: str = "COSINE",
+ threshold: Optional[float] = None,
+ content_column: str = "content",
+ id_column: str = "id",
+ use_index: bool = True,
+ fraction_lists: float = 0.01,
+) -> str:
+ """Generate SQL for semantic search."""
+ # Build options for index usage
+ if use_index:
+ options = (
+ f", options => '{{\"fraction_lists_to_search\": {fraction_lists}}}'"
+ )
+ else:
+ options = ", options => '{\"use_brute_force\": true}'"
+
+ # Build threshold filter
+ threshold_filter = ""
+ if threshold is not None:
+ threshold_filter = f"\nWHERE distance < {threshold}"
+
+ sql = f"""-- Semantic search for: {query_text[:50]}...
+WITH query_embedding AS (
+ SELECT ml_generate_embedding_result AS embedding
+ FROM ML.GENERATE_EMBEDDING(
+ MODEL `{embedding_model}`,
+ (SELECT '{query_text.replace("'", "''")}' AS content),
+ STRUCT('RETRIEVAL_QUERY' AS task_type)
+ )
+)
+SELECT
+ base.{id_column} AS id,
+ base.{content_column} AS content,
+ distance,
+ 1.0 - distance AS similarity_score
+FROM VECTOR_SEARCH(
+ TABLE `{embeddings_table}`,
+ 'embedding',
+ TABLE query_embedding,
+ top_k => {top_k},
+ distance_type => '{distance_type}'{options}
+){threshold_filter}
+ORDER BY distance ASC;"""
+
+ return sql
+
+
+def generate_batch_search_sql(
+ project_id: str,
+ embeddings_table: str,
+ embedding_model: str,
+ queries_table: str,
+ query_column: str,
+ top_k: int = 10,
+ distance_type: str = "COSINE",
+) -> str:
+ """Generate SQL for batch semantic search."""
+ sql = f"""-- Batch semantic search
+WITH query_embeddings AS (
+ SELECT
+ query_id,
+ query_text,
+ ml_generate_embedding_result AS embedding
+ FROM ML.GENERATE_EMBEDDING(
+ MODEL `{embedding_model}`,
+ (SELECT query_id, {query_column} AS content FROM `{queries_table}`),
+ STRUCT('RETRIEVAL_QUERY' AS task_type)
+ )
+ WHERE LENGTH(ml_generate_embedding_status) = 0
+)
+SELECT
+ query.query_id,
+ query.query_text,
+ base.id,
+ base.content,
+ distance,
+ 1.0 - distance AS similarity_score
+FROM VECTOR_SEARCH(
+ TABLE `{embeddings_table}`,
+ 'embedding',
+ TABLE query_embeddings,
+ top_k => {top_k},
+ distance_type => '{distance_type}'
+)
+ORDER BY query.query_id, distance ASC;"""
+
+ return sql
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Perform semantic search on BigQuery embeddings",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ epilog=__doc__,
+ )
+
+ parser.add_argument(
+ "--project-id",
+ required=True,
+ help="Google Cloud project ID",
+ )
+ parser.add_argument(
+ "--table",
+ required=True,
+ help="Embeddings table (dataset.table)",
+ )
+ parser.add_argument(
+ "--query",
+ help="Search query text",
+ )
+ parser.add_argument(
+ "--queries-table",
+ help="Table containing multiple queries for batch search",
+ )
+ parser.add_argument(
+ "--query-column",
+ default="query_text",
+ help="Column name for query text in batch mode (default: query_text)",
+ )
+ parser.add_argument(
+ "--embedding-model",
+ help="Embedding model to use (default: auto-detect or create)",
+ )
+ parser.add_argument(
+ "--top-k",
+ type=int,
+ default=10,
+ help="Number of results to return (default: 10)",
+ )
+ parser.add_argument(
+ "--threshold",
+ type=float,
+ help="Maximum distance threshold (0.0-1.0 for COSINE)",
+ )
+ parser.add_argument(
+ "--distance-type",
+ choices=["COSINE", "EUCLIDEAN", "DOT_PRODUCT"],
+ default="COSINE",
+ help="Distance metric (default: COSINE)",
+ )
+ parser.add_argument(
+ "--content-column",
+ default="content",
+ help="Column containing text content (default: content)",
+ )
+ parser.add_argument(
+ "--id-column",
+ default="id",
+ help="Column containing ID (default: id)",
+ )
+ parser.add_argument(
+ "--brute-force",
+ action="store_true",
+ help="Use brute force search (exact but slower)",
+ )
+ parser.add_argument(
+ "--fraction-lists",
+ type=float,
+ default=0.01,
+ help="Fraction of index lists to search (default: 0.01)",
+ )
+ parser.add_argument(
+ "--dry-run",
+ action="store_true",
+ help="Print SQL without executing",
+ )
+ parser.add_argument(
+ "--output",
+ choices=["table", "json", "sql"],
+ default="table",
+ help="Output format (default: table)",
+ )
+
+ args = parser.parse_args()
+
+ # Validate inputs
+ if not args.query and not args.queries_table:
+ print("Error: Either --query or --queries-table must be specified")
+ return 1
+
+ # Set defaults
+ dataset = args.table.split(".")[0]
+ embedding_model = (
+ args.embedding_model
+ or f"{args.project_id}.{dataset}.text_embedding_model"
+ )
+
+ # Generate SQL
+ if args.queries_table:
+ sql = generate_batch_search_sql(
+ project_id=args.project_id,
+ embeddings_table=f"{args.project_id}.{args.table}",
+ embedding_model=embedding_model,
+ queries_table=f"{args.project_id}.{args.queries_table}",
+ query_column=args.query_column,
+ top_k=args.top_k,
+ distance_type=args.distance_type,
+ )
+ else:
+ sql = generate_search_sql(
+ project_id=args.project_id,
+ embeddings_table=f"{args.project_id}.{args.table}",
+ embedding_model=embedding_model,
+ query_text=args.query,
+ top_k=args.top_k,
+ distance_type=args.distance_type,
+ threshold=args.threshold,
+ content_column=args.content_column,
+ id_column=args.id_column,
+ use_index=not args.brute_force,
+ fraction_lists=args.fraction_lists,
+ )
+
+ if args.output == "sql" or args.dry_run:
+ print(sql)
+ if args.dry_run:
+ print("\n-- Dry run mode: SQL not executed")
+ return 0
+
+ # Execute the SQL
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=args.project_id)
+
+ print(
+ "Searching for:"
+ f" {args.query[:50] if args.query else 'batch queries'}..."
+ )
+ query_job = client.query(sql)
+ results = list(query_job.result())
+
+ if not results:
+ print("No results found.")
+ return 0
+
+ if args.output == "json":
+ output = []
+ for row in results:
+ output.append({
+ "id": row.id,
+ "content": (
+ row.content[:200] + "..."
+ if len(row.content) > 200
+ else row.content
+ ),
+ "distance": row.distance,
+ "similarity_score": row.similarity_score,
+ })
+ print(json.dumps(output, indent=2))
+ else:
+ # Table output
+ print(f"\n{'Rank':<6}{'ID':<20}{'Similarity':<12}{'Content':<60}")
+ print("-" * 98)
+ for i, row in enumerate(results, 1):
+ content_preview = (
+ row.content[:57] + "..." if len(row.content) > 60 else row.content
+ )
+ content_preview = content_preview.replace("\n", " ")
+ print(
+ f"{i:<6}{str(row.id)[:18]:<20}{row.similarity_score:.4f} "
+ f" {content_preview}"
+ )
+
+ print(f"\nFound {len(results)} results")
+
+ return 0
+
+ except ImportError:
+ print("\n-- Note: google-cloud-bigquery not installed")
+ print("-- To execute, install it with: pip install google-cloud-bigquery")
+ print("-- Here's the SQL to run manually:")
+ print()
+ print(sql)
+ return 0
+
+ except Exception as e:
+ print(f"Error: {e}", file=sys.stderr)
+ return 1
+
+
+if __name__ == "__main__":
+ sys.exit(main())
diff --git a/src/google/adk/skills/bigquery-ai/scripts/setup_remote_model.py b/src/google/adk/skills/bigquery-ai/scripts/setup_remote_model.py
new file mode 100644
index 0000000000..a157ee73b3
--- /dev/null
+++ b/src/google/adk/skills/bigquery-ai/scripts/setup_remote_model.py
@@ -0,0 +1,247 @@
+#!/usr/bin/env python3
+"""Setup remote model connections in BigQuery for AI operations.
+
+This script creates BigQuery remote models connected to Vertex AI endpoints
+for text generation, embeddings, and other AI operations.
+
+Usage:
+ python setup_remote_model.py --project-id PROJECT --model-type text-generation
+ python setup_remote_model.py --project-id PROJECT --model-type embeddings
+ python setup_remote_model.py --project-id PROJECT --endpoint gemini-2.0-flash
+
+Examples:
+ # Create Gemini model for text generation
+ python setup_remote_model.py \\
+ --project-id my-project \\
+ --dataset my_dataset \\
+ --model-name gemini_model \\
+ --endpoint gemini-2.0-flash
+
+ # Create embedding model
+ python setup_remote_model.py \\
+ --project-id my-project \\
+ --dataset my_dataset \\
+ --model-name embedding_model \\
+ --endpoint text-embedding-005
+
+ # Use specific connection
+ python setup_remote_model.py \\
+ --project-id my-project \\
+ --dataset my_dataset \\
+ --model-name my_model \\
+ --endpoint gemini-1.5-pro \\
+ --connection my-project.us.my-connection
+"""
+
+import argparse
+import json
+import sys
+from typing import Optional
+
+# Model presets for common use cases
+MODEL_PRESETS = {
+ "text-generation": {
+ "endpoint": "gemini-2.0-flash",
+ "description": "Fast text generation with Gemini 2.0 Flash",
+ },
+ "text-generation-pro": {
+ "endpoint": "gemini-1.5-pro",
+ "description": "High-quality text generation with Gemini 1.5 Pro",
+ },
+ "embeddings": {
+ "endpoint": "text-embedding-005",
+ "description": "Text embeddings for semantic search",
+ },
+ "embeddings-multilingual": {
+ "endpoint": "text-multilingual-embedding-002",
+ "description": "Multilingual text embeddings",
+ },
+ "multimodal-embeddings": {
+ "endpoint": "multimodalembedding@001",
+ "description": "Multimodal embeddings for text, images, and video",
+ },
+ "claude-sonnet": {
+ "endpoint": "claude-3-5-sonnet@20241022",
+ "description": "Anthropic Claude 3.5 Sonnet",
+ },
+ "llama": {
+ "endpoint": "llama-3.1-70b-instruct-maas",
+ "description": "Meta Llama 3.1 70B",
+ },
+}
+
+
+def generate_create_model_sql(
+ project_id: str,
+ dataset: str,
+ model_name: str,
+ endpoint: str,
+ connection: Optional[str] = None,
+ replace: bool = True,
+) -> str:
+ """Generate CREATE MODEL SQL statement."""
+ replace_clause = "OR REPLACE " if replace else ""
+ connection_clause = f"`{connection}`" if connection else "DEFAULT"
+
+ sql = f"""CREATE {replace_clause}MODEL `{project_id}.{dataset}.{model_name}`
+ REMOTE WITH CONNECTION {connection_clause}
+ OPTIONS (ENDPOINT = '{endpoint}');"""
+
+ return sql
+
+
+def list_presets() -> None:
+ """Print available model presets."""
+ print("\nAvailable model presets:")
+ print("-" * 60)
+ for name, config in MODEL_PRESETS.items():
+ print(f" {name:25} - {config['description']}")
+ print(f" {'':25} Endpoint: {config['endpoint']}")
+ print("-" * 60)
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Setup BigQuery remote models for AI operations",
+ formatter_class=argparse.RawDescriptionHelpFormatter,
+ epilog=__doc__,
+ )
+
+ parser.add_argument(
+ "--project-id",
+ required=True,
+ help="Google Cloud project ID",
+ )
+ parser.add_argument(
+ "--dataset",
+ default="ai_models",
+ help="BigQuery dataset name (default: ai_models)",
+ )
+ parser.add_argument(
+ "--model-name",
+ help="Name for the model (auto-generated if not specified)",
+ )
+ parser.add_argument(
+ "--endpoint",
+ help="Vertex AI endpoint (e.g., gemini-2.0-flash, text-embedding-005)",
+ )
+ parser.add_argument(
+ "--model-type",
+ choices=list(MODEL_PRESETS.keys()),
+ help="Use a preset model type",
+ )
+ parser.add_argument(
+ "--connection",
+ help="BigQuery connection ID (uses DEFAULT if not specified)",
+ )
+ parser.add_argument(
+ "--region",
+ default="us",
+ help="Region for connection (default: us)",
+ )
+ parser.add_argument(
+ "--list-presets",
+ action="store_true",
+ help="List available model presets",
+ )
+ parser.add_argument(
+ "--dry-run",
+ action="store_true",
+ help="Print SQL without executing",
+ )
+ parser.add_argument(
+ "--output",
+ choices=["sql", "json"],
+ default="sql",
+ help="Output format (default: sql)",
+ )
+
+ args = parser.parse_args()
+
+ if args.list_presets:
+ list_presets()
+ return 0
+
+ # Resolve endpoint from preset or direct specification
+ if args.model_type:
+ preset = MODEL_PRESETS[args.model_type]
+ endpoint = preset["endpoint"]
+ default_model_name = args.model_type.replace("-", "_") + "_model"
+ elif args.endpoint:
+ endpoint = args.endpoint
+ default_model_name = (
+ endpoint.replace("-", "_").replace("@", "_").replace(".", "_")
+ )
+ else:
+ print("Error: Either --model-type or --endpoint must be specified")
+ print("Use --list-presets to see available presets")
+ return 1
+
+ model_name = args.model_name or default_model_name
+
+ # Generate SQL
+ sql = generate_create_model_sql(
+ project_id=args.project_id,
+ dataset=args.dataset,
+ model_name=model_name,
+ endpoint=endpoint,
+ connection=args.connection,
+ replace=True,
+ )
+
+ if args.output == "json":
+ result = {
+ "project_id": args.project_id,
+ "dataset": args.dataset,
+ "model_name": model_name,
+ "endpoint": endpoint,
+ "connection": args.connection or "DEFAULT",
+ "sql": sql,
+ "dry_run": args.dry_run,
+ }
+ print(json.dumps(result, indent=2))
+ else:
+ print(f"\n-- Create remote model: {model_name}")
+ print(f"-- Endpoint: {endpoint}")
+ print(f"-- Connection: {args.connection or 'DEFAULT'}")
+ print()
+ print(sql)
+ print()
+
+ if args.dry_run:
+ print("-- Dry run mode: SQL not executed")
+ return 0
+
+ # Execute the SQL
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=args.project_id)
+
+ print(f"Creating model {args.project_id}.{args.dataset}.{model_name}...")
+ query_job = client.query(sql)
+ query_job.result() # Wait for completion
+
+ print(f"Successfully created model: {model_name}")
+
+ # Verify the model
+ model_ref = f"{args.project_id}.{args.dataset}.{model_name}"
+ model = client.get_model(model_ref)
+ print(f"Model type: {model.model_type}")
+ print(f"Created: {model.created}")
+
+ return 0
+
+ except ImportError:
+ print("\n-- Note: google-cloud-bigquery not installed")
+ print("-- To execute, install it with: pip install google-cloud-bigquery")
+ print("-- Or run the SQL manually in BigQuery Console")
+ return 0
+
+ except Exception as e:
+ print(f"Error creating model: {e}", file=sys.stderr)
+ return 1
+
+
+if __name__ == "__main__":
+ sys.exit(main())
diff --git a/src/google/adk/skills/bigquery-analytics/SKILL.md b/src/google/adk/skills/bigquery-analytics/SKILL.md
new file mode 100644
index 0000000000..985e8672d7
--- /dev/null
+++ b/src/google/adk/skills/bigquery-analytics/SKILL.md
@@ -0,0 +1,551 @@
+---
+name: bigquery-analytics
+description: Execute advanced SQL analytics in BigQuery - window functions, aggregations, geospatial analysis, statistical functions, and BI integrations. Use when performing data analysis, building dashboards, or running complex SQL queries.
+license: Apache-2.0
+compatibility: BigQuery, Looker, Data Studio
+metadata:
+ author: Google Cloud
+ version: "1.0"
+ category: analytics
+adk:
+ config:
+ timeout_seconds: 600
+ max_parallel_calls: 10
+ allowed_callers:
+ - bigquery_agent
+ - analytics_agent
+ - bi_agent
+---
+
+# BigQuery Analytics Skill
+
+Execute advanced SQL analytics in BigQuery. This skill covers window functions, aggregations, geospatial analysis, statistical functions, and BI tool integrations.
+
+## When to Use This Skill
+
+Use this skill when you need to:
+- Write complex analytical SQL queries
+- Use window functions for rankings, running totals, and time-series analysis
+- Perform geospatial analysis on location data
+- Calculate statistical metrics and distributions
+- Build data for dashboards and BI tools
+- Optimize query performance for analytics workloads
+
+**Note**: For ML model training, use the `bqml` skill. For AI/text generation, use the `bigquery-ai` skill.
+
+## SQL Analytics Functions
+
+| Category | Functions | Use Cases |
+|----------|-----------|-----------|
+| **Aggregation** | SUM, AVG, COUNT, MIN, MAX | Basic metrics |
+| **Window** | ROW_NUMBER, RANK, LAG, LEAD | Rankings, time series |
+| **Statistical** | STDDEV, VARIANCE, CORR, PERCENTILE | Data distribution |
+| **Geospatial** | ST_DISTANCE, ST_CONTAINS, ST_AREA | Location analysis |
+| **Approximate** | APPROX_COUNT_DISTINCT, APPROX_QUANTILES | Large-scale estimates |
+
+## Quick Start
+
+### 1. Window Functions for Rankings
+
+```sql
+SELECT
+ product_name,
+ category,
+ revenue,
+ RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS category_rank,
+ revenue / SUM(revenue) OVER (PARTITION BY category) AS category_share
+FROM `project.dataset.sales`
+WHERE sale_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY);
+```
+
+### 2. Time Series Analysis
+
+```sql
+SELECT
+ sale_date,
+ daily_revenue,
+ AVG(daily_revenue) OVER (
+ ORDER BY sale_date
+ ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
+ ) AS rolling_7day_avg,
+ daily_revenue - LAG(daily_revenue, 7) OVER (ORDER BY sale_date) AS wow_change
+FROM (
+ SELECT DATE(sale_time) AS sale_date, SUM(amount) AS daily_revenue
+ FROM `project.dataset.transactions`
+ GROUP BY 1
+);
+```
+
+### 3. Geospatial Query
+
+```sql
+SELECT
+ store_name,
+ ST_DISTANCE(store_location, ST_GEOGPOINT(-122.4194, 37.7749)) / 1000 AS distance_km
+FROM `project.dataset.stores`
+WHERE ST_DWITHIN(store_location, ST_GEOGPOINT(-122.4194, 37.7749), 10000)
+ORDER BY distance_km;
+```
+
+## Window Functions
+
+### Ranking Functions
+
+```sql
+SELECT
+ employee_id,
+ department,
+ salary,
+ -- Dense rank (no gaps)
+ DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank,
+ -- Rank (with gaps for ties)
+ RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank,
+ -- Row number (unique)
+ ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num,
+ -- Percentile rank
+ PERCENT_RANK() OVER (PARTITION BY department ORDER BY salary) AS percentile
+FROM `project.dataset.employees`;
+```
+
+### Navigation Functions
+
+```sql
+SELECT
+ user_id,
+ event_time,
+ event_type,
+ -- Previous event
+ LAG(event_type) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_event,
+ -- Next event
+ LEAD(event_type) OVER (PARTITION BY user_id ORDER BY event_time) AS next_event,
+ -- First event in session
+ FIRST_VALUE(event_type) OVER (
+ PARTITION BY user_id ORDER BY event_time
+ ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+ ) AS first_event,
+ -- Time since last event
+ TIMESTAMP_DIFF(
+ event_time,
+ LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time),
+ SECOND
+ ) AS seconds_since_last
+FROM `project.dataset.events`;
+```
+
+### Running Aggregates
+
+```sql
+SELECT
+ transaction_date,
+ amount,
+ -- Running total
+ SUM(amount) OVER (ORDER BY transaction_date) AS running_total,
+ -- Running average
+ AVG(amount) OVER (ORDER BY transaction_date) AS running_avg,
+ -- Running count
+ COUNT(*) OVER (ORDER BY transaction_date) AS running_count,
+  -- 30-day moving average (RANGE frames require a numeric ORDER BY key)
+  AVG(amount) OVER (
+    ORDER BY UNIX_DATE(transaction_date)
+    RANGE BETWEEN 29 PRECEDING AND CURRENT ROW
+  ) AS moving_avg_30d
+FROM `project.dataset.transactions`;
+```
+
+### Frame Specifications
+
+```sql
+-- ROWS vs RANGE
+SELECT
+ date,
+ value,
+ -- ROWS: exact number of rows
+ AVG(value) OVER (ORDER BY date ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS rows_avg,
+  -- RANGE: by value range (ORDER BY key must be numeric, e.g. UNIX_DATE for dates)
+  AVG(value) OVER (
+    ORDER BY UNIX_DATE(date)
+    RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING
+  ) AS range_avg
+FROM table;
+
+-- Common frame patterns
+-- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- Running total
+-- ROWS BETWEEN 6 PRECEDING AND CURRENT ROW -- 7-day window
+-- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -- 3-point smoothing
+-- ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- Entire partition
+```
+
+## Aggregation Functions
+
+### Basic Aggregations
+
+```sql
+SELECT
+ category,
+ COUNT(*) AS total_count,
+ COUNT(DISTINCT customer_id) AS unique_customers,
+ SUM(amount) AS total_amount,
+ AVG(amount) AS avg_amount,
+ MIN(amount) AS min_amount,
+ MAX(amount) AS max_amount,
+ -- Conditional aggregation
+ COUNTIF(amount > 100) AS high_value_count,
+  SUM(IF(status = 'completed', amount, 0)) AS completed_amount
+FROM `project.dataset.orders`
+GROUP BY category;
+```
+
+### GROUPING SETS, ROLLUP, CUBE
+
+```sql
+-- ROLLUP: Hierarchical totals
+SELECT
+ COALESCE(region, 'ALL REGIONS') AS region,
+ COALESCE(product_category, 'ALL CATEGORIES') AS category,
+ SUM(revenue) AS total_revenue
+FROM `project.dataset.sales`
+GROUP BY ROLLUP(region, product_category);
+
+-- CUBE: All combinations
+SELECT
+ COALESCE(region, 'ALL') AS region,
+  COALESCE(CAST(year AS STRING), 'ALL') AS year,
+ SUM(revenue) AS total_revenue
+FROM `project.dataset.sales`
+GROUP BY CUBE(region, year);
+
+-- GROUPING SETS: Custom combinations
+SELECT
+ region,
+ product_category,
+ SUM(revenue) AS total_revenue
+FROM `project.dataset.sales`
+GROUP BY GROUPING SETS (
+ (region, product_category),
+ (region),
+ (product_category),
+ ()
+);
+```
+
+### Array Aggregations
+
+```sql
+SELECT
+ user_id,
+ -- Collect values into array
+ ARRAY_AGG(product_name) AS purchased_products,
+ -- Collect distinct values
+ ARRAY_AGG(DISTINCT category) AS categories,
+ -- Collect ordered values
+ ARRAY_AGG(product_name ORDER BY purchase_date DESC LIMIT 5) AS recent_products,
+ -- String aggregation
+ STRING_AGG(product_name, ', ') AS products_list
+FROM `project.dataset.purchases`
+GROUP BY user_id;
+```
+
+## Statistical Functions
+
+### Descriptive Statistics
+
+```sql
+SELECT
+ category,
+ COUNT(*) AS n,
+ AVG(value) AS mean,
+ STDDEV(value) AS std_dev,
+ VARIANCE(value) AS variance,
+ -- Coefficient of variation
+ STDDEV(value) / NULLIF(AVG(value), 0) AS cv,
+ -- Min/Max
+ MIN(value) AS min_val,
+ MAX(value) AS max_val,
+ -- Percentiles
+ APPROX_QUANTILES(value, 4)[OFFSET(2)] AS median,
+ APPROX_QUANTILES(value, 100)[OFFSET(25)] AS p25,
+ APPROX_QUANTILES(value, 100)[OFFSET(75)] AS p75
+FROM `project.dataset.metrics`
+GROUP BY category;
+```
+
+### Correlation and Covariance
+
+```sql
+SELECT
+ CORR(price, quantity) AS price_quantity_corr,
+ COVAR_POP(price, quantity) AS covariance_pop,
+ COVAR_SAMP(price, quantity) AS covariance_samp
+FROM `project.dataset.sales`;
+
+-- Correlation matrix
+WITH metrics AS (
+ SELECT metric_a, metric_b, metric_c FROM `project.dataset.data`
+)
+SELECT
+ 'metric_a' AS metric,
+ CORR(metric_a, metric_a) AS corr_a,
+ CORR(metric_a, metric_b) AS corr_b,
+ CORR(metric_a, metric_c) AS corr_c
+FROM metrics
+UNION ALL
+SELECT
+ 'metric_b',
+ CORR(metric_b, metric_a),
+ CORR(metric_b, metric_b),
+ CORR(metric_b, metric_c)
+FROM metrics;
+```
+
+### Distribution Analysis
+
+```sql
+-- Histogram buckets
+SELECT
+ FLOOR(value / 10) * 10 AS bucket,
+ COUNT(*) AS frequency,
+ REPEAT('*', CAST(COUNT(*) / 100 AS INT64)) AS histogram
+FROM `project.dataset.data`
+GROUP BY bucket
+ORDER BY bucket;
+
+-- Z-scores
+WITH stats AS (
+ SELECT AVG(value) AS mean, STDDEV(value) AS stddev
+ FROM `project.dataset.data`
+)
+SELECT
+ id,
+ value,
+ (value - mean) / NULLIF(stddev, 0) AS z_score
+FROM `project.dataset.data`, stats;
+```
+
+## Geospatial Analysis
+
+### Creating Geography Objects
+
+```sql
+-- Point from coordinates
+SELECT ST_GEOGPOINT(longitude, latitude) AS location
+FROM `project.dataset.places`;
+
+-- Well-Known Text (WKT)
+SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))') AS polygon;
+
+-- GeoJSON
+SELECT ST_GEOGFROMGEOJSON('{"type":"Point","coordinates":[-122.4194,37.7749]}');
+```
+
+### Distance and Area
+
+```sql
+SELECT
+ store_a.name AS store_a,
+ store_b.name AS store_b,
+ -- Distance in meters
+ ST_DISTANCE(store_a.location, store_b.location) AS distance_m,
+ -- Distance in kilometers
+ ST_DISTANCE(store_a.location, store_b.location) / 1000 AS distance_km
+FROM `project.dataset.stores` store_a
+CROSS JOIN `project.dataset.stores` store_b
+WHERE store_a.id < store_b.id;
+
+-- Area of polygons
+SELECT
+ region_name,
+ ST_AREA(boundary) / 1000000 AS area_sq_km
+FROM `project.dataset.regions`;
+```
+
+### Spatial Queries
+
+```sql
+-- Find points within distance
+SELECT store_name
+FROM `project.dataset.stores`
+WHERE ST_DWITHIN(
+ location,
+ ST_GEOGPOINT(-122.4194, 37.7749), -- San Francisco
+ 5000 -- 5km radius
+);
+
+-- Find points within polygon
+SELECT customer_id
+FROM `project.dataset.customers`
+WHERE ST_CONTAINS(
+ (SELECT boundary FROM `project.dataset.regions` WHERE name = 'Bay Area'),
+ customer_location
+);
+
+-- Nearest neighbor
+SELECT
+ customer_id,
+ ARRAY_AGG(
+ store_name
+ ORDER BY ST_DISTANCE(customer_location, store_location)
+ LIMIT 3
+ ) AS nearest_stores
+FROM `project.dataset.customers`
+CROSS JOIN `project.dataset.stores`
+GROUP BY customer_id;
+```
+
+### Geospatial Joins
+
+```sql
+-- Assign customers to regions
+SELECT
+ c.customer_id,
+ r.region_name
+FROM `project.dataset.customers` c
+JOIN `project.dataset.regions` r
+ ON ST_CONTAINS(r.boundary, c.location);
+
+-- Find overlapping areas
+SELECT
+ a.name AS area_a,
+ b.name AS area_b,
+ ST_AREA(ST_INTERSECTION(a.boundary, b.boundary)) AS overlap_area
+FROM `project.dataset.zones` a
+JOIN `project.dataset.zones` b
+ ON ST_INTERSECTS(a.boundary, b.boundary)
+ AND a.id < b.id;
+```
+
+## Approximate Aggregations
+
+For large-scale analytics where exact results aren't required:
+
+```sql
+SELECT
+ -- Approximate count distinct (HyperLogLog++)
+ APPROX_COUNT_DISTINCT(user_id) AS approx_unique_users,
+ -- Exact for comparison
+ COUNT(DISTINCT user_id) AS exact_unique_users,
+ -- Approximate quantiles
+ APPROX_QUANTILES(amount, 100)[OFFSET(50)] AS approx_median,
+ -- Approximate top count
+ APPROX_TOP_COUNT(category, 10) AS top_categories,
+ -- Approximate top sum
+ APPROX_TOP_SUM(product_name, revenue, 10) AS top_products_by_revenue
+FROM `project.dataset.transactions`;
+```
+
+## Common Analytical Patterns
+
+### Cohort Analysis
+
+```sql
+WITH user_cohorts AS (
+ SELECT
+ user_id,
+ DATE_TRUNC(first_purchase_date, MONTH) AS cohort_month
+ FROM `project.dataset.users`
+),
+monthly_activity AS (
+ SELECT
+ user_id,
+ DATE_TRUNC(activity_date, MONTH) AS activity_month
+ FROM `project.dataset.activity`
+)
+SELECT
+ c.cohort_month,
+ DATE_DIFF(a.activity_month, c.cohort_month, MONTH) AS months_since_cohort,
+ COUNT(DISTINCT a.user_id) AS active_users,
+ COUNT(DISTINCT a.user_id) / (
+ SELECT COUNT(DISTINCT user_id)
+ FROM user_cohorts
+ WHERE cohort_month = c.cohort_month
+ ) AS retention_rate
+FROM user_cohorts c
+JOIN monthly_activity a ON c.user_id = a.user_id
+GROUP BY 1, 2
+ORDER BY 1, 2;
+```
+
+### Funnel Analysis
+
+```sql
+WITH funnel AS (
+ SELECT
+ user_id,
+ MAX(IF(event_name = 'page_view', 1, 0)) AS step_1_view,
+ MAX(IF(event_name = 'add_to_cart', 1, 0)) AS step_2_cart,
+ MAX(IF(event_name = 'checkout', 1, 0)) AS step_3_checkout,
+ MAX(IF(event_name = 'purchase', 1, 0)) AS step_4_purchase
+ FROM `project.dataset.events`
+ WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
+ GROUP BY user_id
+)
+SELECT
+ COUNT(*) AS total_users,
+ SUM(step_1_view) AS viewed,
+ SUM(step_2_cart) AS added_to_cart,
+ SUM(step_3_checkout) AS checked_out,
+ SUM(step_4_purchase) AS purchased,
+ SAFE_DIVIDE(SUM(step_2_cart), SUM(step_1_view)) AS view_to_cart_rate,
+ SAFE_DIVIDE(SUM(step_4_purchase), SUM(step_1_view)) AS conversion_rate
+FROM funnel;
+```
+
+### Year-over-Year Comparison
+
+```sql
+SELECT
+  FORMAT_DATE('%m', date) AS month,
+  SUM(IF(EXTRACT(YEAR FROM date) = EXTRACT(YEAR FROM CURRENT_DATE()), revenue, 0)) AS current_year,
+  SUM(IF(EXTRACT(YEAR FROM date) = EXTRACT(YEAR FROM CURRENT_DATE()) - 1, revenue, 0)) AS prior_year,
+  SAFE_DIVIDE(
+    SUM(IF(EXTRACT(YEAR FROM date) = EXTRACT(YEAR FROM CURRENT_DATE()), revenue, 0)),
+    SUM(IF(EXTRACT(YEAR FROM date) = EXTRACT(YEAR FROM CURRENT_DATE()) - 1, revenue, 0))
+  ) - 1 AS yoy_growth
+FROM `project.dataset.sales`
+WHERE EXTRACT(YEAR FROM date) >= EXTRACT(YEAR FROM CURRENT_DATE()) - 1
+GROUP BY 1
+ORDER BY 1;
+```
+
+## Query Optimization
+
+### Best Practices
+
+1. **Filter early**: Apply WHERE clauses as early as possible
+2. **Select only needed columns**: Avoid SELECT *
+3. **Use approximate functions**: For large-scale analytics
+4. **Partition pruning**: Always filter on partition column
+5. **Avoid CROSS JOINs**: Use only when necessary (the sketch below applies several of these practices)
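+
+A query sketch that applies these practices (table and column names are illustrative):
+
+```sql
+-- Filters on the partition column, selects only the needed columns,
+-- and uses an approximate aggregate for a large-scale estimate
+SELECT
+  DATE(event_time) AS event_date,
+  APPROX_COUNT_DISTINCT(user_id) AS approx_unique_users
+FROM `project.dataset.events`
+WHERE DATE(event_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
+GROUP BY event_date;
+```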
+
+### Analyzing Query Performance
+
+BigQuery has no `EXPLAIN` statement; use a dry run to estimate bytes processed
+before running a query, and inspect `INFORMATION_SCHEMA.JOBS` (or the query plan
+shown in the console) after it runs.
+
+```sql
+-- Bytes processed and slot usage for recent jobs in this project
+SELECT
+  job_id,
+  total_bytes_processed,
+  total_slot_ms
+FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
+ORDER BY total_bytes_processed DESC;
+```
+
+## References
+
+Load detailed documentation as needed:
+
+- `WINDOW_FUNCTIONS.md` - Complete window function reference
+- `GEOSPATIAL.md` - Advanced geospatial operations
+- `OPTIMIZATION.md` - Query performance tuning
+
+## Scripts
+
+Helper scripts for common operations:
+
+- `query_analyzer.py` - Analyze query performance
+- `data_profiler.py` - Generate data profiling reports
+
+## Limitations
+
+- Window functions process all rows before returning
+- Geospatial functions have precision limits
+- Approximate functions have error margins
+- Large CROSS JOINs can be expensive
diff --git a/src/google/adk/skills/bigquery-analytics/references/GEOSPATIAL.md b/src/google/adk/skills/bigquery-analytics/references/GEOSPATIAL.md
new file mode 100644
index 0000000000..887ecfeaab
--- /dev/null
+++ b/src/google/adk/skills/bigquery-analytics/references/GEOSPATIAL.md
@@ -0,0 +1,406 @@
+# BigQuery Geospatial Reference
+
+Complete guide to geospatial functions and analysis in BigQuery.
+
+## Geography Data Types
+
+BigQuery supports the `GEOGRAPHY` type for geospatial data.
+
+### Creating Geography Objects
+
+```sql
+-- Point from coordinates (longitude, latitude)
+SELECT ST_GEOGPOINT(-122.4194, 37.7749) AS san_francisco;
+
+-- From Well-Known Text (WKT)
+SELECT ST_GEOGFROMTEXT('POINT(-122.4194 37.7749)') AS point;
+SELECT ST_GEOGFROMTEXT('LINESTRING(0 0, 1 1, 2 0)') AS line;
+SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))') AS polygon;
+
+-- From GeoJSON
+SELECT ST_GEOGFROMGEOJSON('{"type":"Point","coordinates":[-122.4194,37.7749]}');
+
+-- From WKB (Well-Known Binary)
+SELECT ST_GEOGFROMWKB(wkb_column) FROM table;
+```
+
+### Geography Constructors
+
+| Function | Description | Example |
+|----------|-------------|---------|
+| `ST_GEOGPOINT(lng, lat)` | Create point | `ST_GEOGPOINT(-122, 37)` |
+| `ST_MAKELINE(points)` | Create line from points | `ST_MAKELINE([p1, p2, p3])` |
+| `ST_MAKEPOLYGON(ring)` | Create polygon from ring | `ST_MAKEPOLYGON(line)` |
+| `ST_GEOGFROMTEXT(wkt)` | Parse WKT | `ST_GEOGFROMTEXT('POINT(0 0)')` |
+| `ST_GEOGFROMGEOJSON(json)` | Parse GeoJSON | `ST_GEOGFROMGEOJSON(json_col)` |
+
+## Measurement Functions
+
+### Distance
+
+```sql
+-- Distance in meters
+SELECT ST_DISTANCE(
+ ST_GEOGPOINT(-122.4194, 37.7749), -- San Francisco
+ ST_GEOGPOINT(-118.2437, 34.0522) -- Los Angeles
+) AS distance_meters;
+-- Returns: ~559,044 meters
+
+-- Distance in kilometers
+SELECT ST_DISTANCE(point_a, point_b) / 1000 AS distance_km;
+
+-- Distance in miles
+SELECT ST_DISTANCE(point_a, point_b) / 1609.34 AS distance_miles;
+```
+
+### Length and Perimeter
+
+```sql
+-- Length of a line (meters)
+SELECT ST_LENGTH(ST_GEOGFROMTEXT('LINESTRING(0 0, 1 1, 2 0)')) AS length;
+
+-- Perimeter of a polygon (meters)
+SELECT ST_PERIMETER(ST_GEOGFROMTEXT('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'));
+```
+
+### Area
+
+```sql
+-- Area in square meters
+SELECT ST_AREA(ST_GEOGFROMTEXT('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'));
+
+-- Area in square kilometers
+SELECT ST_AREA(boundary) / 1000000 AS area_sq_km FROM regions;
+
+-- Area in acres
+SELECT ST_AREA(boundary) / 4046.86 AS area_acres FROM parcels;
+```
+
+## Spatial Relationships
+
+### Containment and Intersection
+
+```sql
+-- Point in polygon
+SELECT ST_CONTAINS(polygon, point); -- TRUE if point is inside polygon
+
+-- Covers (includes boundary)
+SELECT ST_COVERS(polygon, point); -- TRUE if point is inside or on boundary
+
+-- Intersects
+SELECT ST_INTERSECTS(geog1, geog2); -- TRUE if geometries share any point
+
+-- Disjoint
+SELECT ST_DISJOINT(geog1, geog2); -- TRUE if geometries don't touch
+
+-- Touches (only boundaries touch)
+SELECT ST_TOUCHES(geog1, geog2); -- TRUE if only boundaries meet
+
+-- Within
+SELECT ST_WITHIN(point, polygon); -- TRUE if point is inside polygon
+```
+
+### Distance Relationships
+
+```sql
+-- Within distance
+SELECT ST_DWITHIN(
+ store_location,
+ customer_location,
+ 5000 -- 5km in meters
+); -- TRUE if within 5km
+
+-- Find all stores within 10km
+SELECT store_name
+FROM stores
+WHERE ST_DWITHIN(
+ location,
+ ST_GEOGPOINT(-122.4194, 37.7749),
+ 10000
+);
+```
+
+## Spatial Operations
+
+### Intersection and Union
+
+```sql
+-- Intersection of two geometries
+SELECT ST_INTERSECTION(polygon_a, polygon_b) AS overlap;
+
+-- Union of geometries
+SELECT ST_UNION(geog1, geog2) AS combined;
+
+-- Union aggregate (combine many geometries)
+SELECT ST_UNION_AGG(boundary) AS merged_boundary
+FROM regions
+WHERE state = 'California';
+```
+
+### Buffers
+
+```sql
+-- Create buffer around point (radius in meters)
+SELECT ST_BUFFER(
+ ST_GEOGPOINT(-122.4194, 37.7749),
+ 1000 -- 1km radius
+) AS buffer_zone;
+
+-- Create buffer around line
+SELECT ST_BUFFER(route_line, 100) AS corridor; -- 100m buffer
+```
+
+### Simplification
+
+```sql
+-- Simplify geometry (reduce vertices)
+SELECT ST_SIMPLIFY(complex_polygon, 100); -- tolerance in meters
+
+-- Convex hull
+SELECT ST_CONVEXHULL(multi_point) AS hull;
+```
+
+### Centroid and Boundary
+
+```sql
+-- Centroid (center point)
+SELECT ST_CENTROID(polygon) AS center;
+
+-- Boundary of polygon
+SELECT ST_BOUNDARY(polygon) AS boundary_line;
+
+-- Bounding box
+SELECT ST_BOUNDINGBOX(geography) AS bbox;
+```
+
+## Accessors
+
+```sql
+-- Get coordinates
+SELECT ST_X(point) AS longitude; -- -122.4194
+SELECT ST_Y(point) AS latitude; -- 37.7749
+
+-- Number of points
+SELECT ST_NUMPOINTS(line) AS point_count;
+
+-- Check geometry type
+SELECT ST_GEOMETRYTYPE(geog) AS geom_type; -- 'ST_Point', 'ST_Polygon', etc.
+
+-- Check if valid
+SELECT ST_ISVALID(geog) AS is_valid;
+
+-- Check if empty
+SELECT ST_ISEMPTY(geog) AS is_empty;
+
+-- Dimension
+SELECT ST_DIMENSION(geog) AS dim; -- 0=point, 1=line, 2=polygon
+```
+
+## Output Functions
+
+```sql
+-- Convert to GeoJSON
+SELECT ST_ASGEOJSON(geography) AS geojson;
+
+-- Convert to WKT
+SELECT ST_ASTEXT(geography) AS wkt;
+
+-- Convert to WKB
+SELECT ST_ASBINARY(geography) AS wkb;
+```
+
+## Clustering and Aggregation
+
+### Geographic Clustering
+
+```sql
+-- Cluster points by grid
+SELECT
+ ST_SNAPTOGRID(location, 0.01) AS grid_cell, -- ~1km grid
+ COUNT(*) AS point_count
+FROM locations
+GROUP BY grid_cell;
+
+-- Geohash clustering
+SELECT
+ ST_GEOHASH(location, 5) AS geohash, -- precision 5 (~5km)
+ COUNT(*) AS count
+FROM locations
+GROUP BY geohash;
+```
+
+### Aggregate Functions
+
+```sql
+-- Collect points into multipoint
+SELECT ST_UNION_AGG(location) AS all_points
+FROM stores
+WHERE region = 'West';
+
+-- Centroid of all points
+SELECT ST_CENTROID(ST_UNION_AGG(location)) AS center
+FROM stores;
+
+-- Bounding box of all geometries
+SELECT ST_BOUNDINGBOX(ST_UNION_AGG(boundary))
+FROM regions;
+```
+
+## Spatial Joins
+
+### Point in Polygon Join
+
+```sql
+SELECT
+ c.customer_id,
+ r.region_name
+FROM customers c
+JOIN regions r
+ ON ST_CONTAINS(r.boundary, c.location);
+```
+
+### Nearest Neighbor
+
+```sql
+-- Find nearest store for each customer
+SELECT
+ c.customer_id,
+ (
+ SELECT s.store_name
+ FROM stores s
+ ORDER BY ST_DISTANCE(c.location, s.location)
+ LIMIT 1
+ ) AS nearest_store
+FROM customers c;
+
+-- With distance
+SELECT
+ c.customer_id,
+ s.store_name,
+ ST_DISTANCE(c.location, s.location) AS distance_m
+FROM customers c
+CROSS JOIN stores s
+QUALIFY ROW_NUMBER() OVER (
+ PARTITION BY c.customer_id
+ ORDER BY ST_DISTANCE(c.location, s.location)
+) = 1;
+```
+
+### K Nearest Neighbors
+
+```sql
+SELECT
+ customer_id,
+ ARRAY_AGG(
+ STRUCT(store_name, distance_m)
+ ORDER BY distance_m
+ LIMIT 3
+ ) AS nearest_3_stores
+FROM (
+ SELECT
+ c.customer_id,
+ s.store_name,
+ ST_DISTANCE(c.location, s.location) AS distance_m
+ FROM customers c
+ CROSS JOIN stores s
+)
+GROUP BY customer_id;
+```
+
+## Public Datasets
+
+BigQuery has several public geospatial datasets:
+
+```sql
+-- US ZIP codes
+SELECT * FROM `bigquery-public-data.geo_us_boundaries.zip_codes`;
+
+-- US Census tracts
+SELECT * FROM `bigquery-public-data.geo_census_tracts.us_census_tracts_national`;
+
+-- OpenStreetMap
+SELECT * FROM `bigquery-public-data.geo_openstreetmap.planet_features`;
+
+-- World Port Index (global port locations)
+SELECT * FROM `bigquery-public-data.geo_international_ports.world_port_index`;
+```
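+
+As an illustration, a point-in-ZIP lookup against the public boundaries dataset
+might look like this (assuming the boundary column is `zip_code_geom` and your
+table stores a `location` GEOGRAPHY column):
+
+```sql
+SELECT
+  c.customer_id,
+  z.zip_code
+FROM `project.dataset.customers` c
+JOIN `bigquery-public-data.geo_us_boundaries.zip_codes` z
+  ON ST_CONTAINS(z.zip_code_geom, c.location);
+```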
+
+## Performance Optimization
+
+### Index Usage
+
+BigQuery applies spatial optimizations automatically (for example when a table is clustered on its GEOGRAPHY column). Optimize further by:
+
+1. Using `ST_DWITHIN` instead of `ST_DISTANCE < threshold`
+2. Using `ST_INTERSECTS` with bounding boxes
+3. Pre-filtering with geohash
+
+### Pre-filtering Example
+
+```sql
+-- Efficient: use spatial predicate
+SELECT * FROM locations
+WHERE ST_DWITHIN(location, @query_point, 10000);
+
+-- Less efficient: compute all distances then filter
+SELECT * FROM locations
+WHERE ST_DISTANCE(location, @query_point) < 10000;
+```
+
+### Geohash Pre-filter
+
+```sql
+-- Pre-filter with geohash before expensive spatial operations
+WITH candidates AS (
+ SELECT *
+ FROM locations
+ WHERE ST_GEOHASH(location, 4) IN (
+ ST_GEOHASH(@query_point, 4),
+ -- Include adjacent cells
+ 'abc1', 'abc2', 'abc3'
+ )
+)
+SELECT *
+FROM candidates
+WHERE ST_DWITHIN(location, @query_point, 5000);
+```
+
+## Common Patterns
+
+### Service Area Analysis
+
+```sql
+-- Find customers within each store's service radius
+SELECT
+ s.store_id,
+ COUNT(c.customer_id) AS customers_in_area
+FROM stores s
+LEFT JOIN customers c
+ ON ST_DWITHIN(s.location, c.location, s.service_radius_m)
+GROUP BY s.store_id;
+```
+
+### Route Analysis
+
+```sql
+-- Calculate total route distance
+SELECT
+ route_id,
+ ST_LENGTH(route_line) AS total_distance_m,
+ ST_NUMPOINTS(route_line) AS waypoints
+FROM routes;
+```
+
+### Hotspot Analysis
+
+```sql
+-- Identify dense clusters
+SELECT
+ ST_GEOHASH(location, 6) AS cell,
+ COUNT(*) AS incident_count,
+ ST_CENTROID(ST_UNION_AGG(location)) AS cell_center
+FROM incidents
+GROUP BY cell
+HAVING COUNT(*) > 10
+ORDER BY incident_count DESC;
+```
diff --git a/src/google/adk/skills/bigquery-analytics/references/WINDOW_FUNCTIONS.md b/src/google/adk/skills/bigquery-analytics/references/WINDOW_FUNCTIONS.md
new file mode 100644
index 0000000000..327948806b
--- /dev/null
+++ b/src/google/adk/skills/bigquery-analytics/references/WINDOW_FUNCTIONS.md
@@ -0,0 +1,386 @@
+# BigQuery Window Functions Reference
+
+Complete guide to analytical window functions in BigQuery.
+
+## Window Function Syntax
+
+```sql
+function_name(expression) OVER (
+ [PARTITION BY partition_expression [, ...]]
+ [ORDER BY sort_expression [ASC|DESC] [NULLS {FIRST|LAST}] [, ...]]
+ [window_frame_clause]
+)
+```
+
+## Ranking Functions
+
+### ROW_NUMBER
+
+Assigns unique sequential integers to rows.
+
+```sql
+SELECT
+ name,
+ department,
+ salary,
+ ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
+FROM employees;
+-- Result: 1, 2, 3, 4, 5... (no ties)
+```
+
+### RANK
+
+Assigns rank with gaps for ties.
+
+```sql
+SELECT
+ name,
+ score,
+ RANK() OVER (ORDER BY score DESC) AS rank
+FROM players;
+-- Scores: 100, 95, 95, 90 -> Ranks: 1, 2, 2, 4 (gap at 3)
+```
+
+### DENSE_RANK
+
+Assigns rank without gaps for ties.
+
+```sql
+SELECT
+ name,
+ score,
+ DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rank
+FROM players;
+-- Scores: 100, 95, 95, 90 -> Ranks: 1, 2, 2, 3 (no gap)
+```
+
+### NTILE
+
+Divides rows into N buckets.
+
+```sql
+SELECT
+ customer_id,
+ total_spent,
+ NTILE(4) OVER (ORDER BY total_spent DESC) AS quartile
+FROM customers;
+-- Assigns 1, 2, 3, or 4 to each row
+```
+
+### PERCENT_RANK
+
+Returns percentile rank (0 to 1).
+
+```sql
+SELECT
+ name,
+ salary,
+ PERCENT_RANK() OVER (ORDER BY salary) AS percentile
+FROM employees;
+-- Returns values between 0 and 1
+```
+
+### CUME_DIST
+
+Returns cumulative distribution.
+
+```sql
+SELECT
+ name,
+ salary,
+ CUME_DIST() OVER (ORDER BY salary) AS cumulative_distribution
+FROM employees;
+-- Returns fraction of rows <= current row
+```
+
+## Navigation Functions
+
+### LAG
+
+Access value from previous row.
+
+```sql
+SELECT
+ date,
+ value,
+ LAG(value, 1) OVER (ORDER BY date) AS prev_value,
+ LAG(value, 7, 0) OVER (ORDER BY date) AS week_ago_value -- with default
+FROM metrics;
+```
+
+### LEAD
+
+Access value from following row.
+
+```sql
+SELECT
+ date,
+ value,
+ LEAD(value, 1) OVER (ORDER BY date) AS next_value,
+ value - LEAD(value) OVER (ORDER BY date) AS change_to_next
+FROM metrics;
+```
+
+### FIRST_VALUE
+
+Get first value in window.
+
+```sql
+SELECT
+ user_id,
+ event_time,
+ event_type,
+ FIRST_VALUE(event_type) OVER (
+ PARTITION BY user_id
+ ORDER BY event_time
+ ) AS first_event
+FROM events;
+```
+
+### LAST_VALUE
+
+Get last value in window (requires frame specification).
+
+```sql
+SELECT
+ user_id,
+ event_time,
+ event_type,
+ LAST_VALUE(event_type) OVER (
+ PARTITION BY user_id
+ ORDER BY event_time
+ ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+ ) AS last_event
+FROM events;
+```
+
+### NTH_VALUE
+
+Get Nth value in window.
+
+```sql
+SELECT
+ user_id,
+ NTH_VALUE(product_name, 2) OVER (
+ PARTITION BY user_id
+ ORDER BY purchase_date
+ ) AS second_purchase
+FROM purchases;
+```
+
+## Aggregate Window Functions
+
+All aggregate functions can be used as window functions.
+
+### Running Totals
+
+```sql
+SELECT
+ date,
+ amount,
+ SUM(amount) OVER (ORDER BY date) AS running_total,
+ COUNT(*) OVER (ORDER BY date) AS running_count,
+ AVG(amount) OVER (ORDER BY date) AS running_avg
+FROM transactions;
+```
+
+### Partition Totals
+
+```sql
+SELECT
+ department,
+ employee,
+ salary,
+ SUM(salary) OVER (PARTITION BY department) AS dept_total,
+ salary / SUM(salary) OVER (PARTITION BY department) AS salary_share
+FROM employees;
+```
+
+### Moving Averages
+
+```sql
+SELECT
+ date,
+ value,
+ -- 7-day moving average
+ AVG(value) OVER (
+ ORDER BY date
+ ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
+ ) AS ma_7,
+ -- Centered moving average
+ AVG(value) OVER (
+ ORDER BY date
+ ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
+ ) AS ma_centered_7
+FROM daily_metrics;
+```
+
+## Window Frame Specifications
+
+### ROWS vs RANGE
+
+```sql
+-- ROWS: Physical row offset
+SELECT
+ date,
+ value,
+ SUM(value) OVER (
+ ORDER BY date
+ ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
+ ) AS sum_3_rows
+FROM data;
+
+-- RANGE: Logical value range (the ORDER BY key must be numeric)
+SELECT
+  date,
+  value,
+  SUM(value) OVER (
+    ORDER BY UNIX_DATE(date)
+    RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
+  ) AS sum_3_days
+FROM data;
+```
+
+### Frame Boundaries
+
+| Boundary | Description |
+|----------|-------------|
+| `UNBOUNDED PRECEDING` | Start of partition |
+| `n PRECEDING` | n rows/range before current |
+| `CURRENT ROW` | Current row |
+| `n FOLLOWING` | n rows/range after current |
+| `UNBOUNDED FOLLOWING` | End of partition |
+
+### Common Frame Patterns
+
+```sql
+-- Running total (default with ORDER BY)
+ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+
+-- Entire partition
+ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
+
+-- 7-day rolling window
+ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
+
+-- Centered window
+ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
+
+-- Previous row only
+ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
+
+-- Future rows only
+ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+```
+
+## Named Windows
+
+Define reusable window specifications.
+
+```sql
+SELECT
+ date,
+ value,
+ SUM(value) OVER rolling_week AS weekly_sum,
+ AVG(value) OVER rolling_week AS weekly_avg,
+ MAX(value) OVER rolling_week AS weekly_max
+FROM metrics
+WINDOW rolling_week AS (
+ ORDER BY date
+ ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
+);
+```
+
+## Practical Examples
+
+### Gap and Island Detection
+
+```sql
+WITH numbered AS (
+ SELECT
+ user_id,
+ login_date,
+ login_date - INTERVAL ROW_NUMBER() OVER (
+ PARTITION BY user_id ORDER BY login_date
+ ) DAY AS grp
+ FROM logins
+)
+SELECT
+ user_id,
+ MIN(login_date) AS streak_start,
+ MAX(login_date) AS streak_end,
+ COUNT(*) AS streak_days
+FROM numbered
+GROUP BY user_id, grp
+ORDER BY user_id, streak_start;
+```
+
+### Session Detection
+
+```sql
+WITH events_with_prev AS (
+ SELECT
+ user_id,
+ event_time,
+ LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_time
+ FROM events
+),
+session_starts AS (
+ SELECT
+ *,
+ CASE
+ WHEN prev_time IS NULL THEN 1
+ WHEN TIMESTAMP_DIFF(event_time, prev_time, MINUTE) > 30 THEN 1
+ ELSE 0
+ END AS is_session_start
+ FROM events_with_prev
+)
+SELECT
+ *,
+ SUM(is_session_start) OVER (
+ PARTITION BY user_id ORDER BY event_time
+ ) AS session_id
+FROM session_starts;
+```
+
+### Top N per Group
+
+```sql
+WITH ranked AS (
+ SELECT
+ *,
+ ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rn
+ FROM products
+)
+SELECT * FROM ranked WHERE rn <= 3;
+```
+
+### Running Difference
+
+```sql
+SELECT
+ date,
+ value,
+ value - LAG(value) OVER (ORDER BY date) AS daily_change,
+ (value - LAG(value) OVER (ORDER BY date)) / NULLIF(LAG(value) OVER (ORDER BY date), 0) AS pct_change
+FROM daily_metrics;
+```
+
+### Cumulative Distribution
+
+```sql
+SELECT
+ product_id,
+ revenue,
+ SUM(revenue) OVER (ORDER BY revenue DESC) AS cumulative_revenue,
+ SUM(revenue) OVER (ORDER BY revenue DESC) / SUM(revenue) OVER () AS cumulative_pct
+FROM products;
+```
+
+## Performance Tips
+
+1. **Minimize partitions**: Large partitions require more memory
+2. **Use bounded frames**: Avoid UNBOUNDED when possible
+3. **Pre-filter data**: Apply WHERE before window functions (see the sketch after this list)
+4. **Index considerations**: ORDER BY columns benefit from clustering
+5. **Avoid unnecessary ORDER BY**: Only include when needed
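+
+A minimal sketch of tip 3, filtering in a subquery before the window function runs
+(table and column names are illustrative):
+
+```sql
+SELECT
+  date,
+  value,
+  AVG(value) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma_7
+FROM (
+  SELECT date, value
+  FROM daily_metrics
+  WHERE date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)  -- filter before windowing
+);
+```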
diff --git a/src/google/adk/skills/bigquery-analytics/scripts/query_analyzer.py b/src/google/adk/skills/bigquery-analytics/scripts/query_analyzer.py
new file mode 100644
index 0000000000..625554918b
--- /dev/null
+++ b/src/google/adk/skills/bigquery-analytics/scripts/query_analyzer.py
@@ -0,0 +1,284 @@
+"""Analyze BigQuery query performance and suggest optimizations.
+
+This script helps analyze query execution plans and provides
+recommendations for improving query performance.
+
+Usage:
+ python query_analyzer.py --project PROJECT --query "SELECT ..."
+ python query_analyzer.py --project PROJECT --job-id JOB_ID
+"""
+
+import argparse
+import json
+import sys
+from typing import Any
+
+
+def analyze_dry_run(project: str, query: str) -> dict:
+ """Perform dry run analysis of a query."""
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=project)
+
+ job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
+ query_job = client.query(query, job_config=job_config)
+
+        return {
+            "total_bytes_processed": query_job.total_bytes_processed,
+            "total_bytes_billed": query_job.total_bytes_billed,
+            # Assumes on-demand pricing of roughly $5 per TiB billed.
+            "estimated_cost_usd": (
+                (query_job.total_bytes_billed or 0) / (1024**4) * 5
+            ),
+ "referenced_tables": [
+ f"{t.project}.{t.dataset_id}.{t.table_id}"
+ for t in (query_job.referenced_tables or [])
+ ],
+ "schema": [
+ {"name": f.name, "type": f.field_type}
+ for f in (query_job.schema or [])
+ ],
+ }
+ except Exception as e:
+ return {"error": str(e)}
+
+
+def analyze_job(project: str, job_id: str) -> dict:
+ """Analyze a completed job's performance."""
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=project)
+ job = client.get_job(job_id)
+
+ stats = job.query_plan if hasattr(job, "query_plan") else []
+ timeline = job.timeline if hasattr(job, "timeline") else []
+
+ analysis = {
+ "job_id": job_id,
+ "state": job.state,
+ "total_bytes_processed": job.total_bytes_processed,
+ "total_bytes_billed": job.total_bytes_billed,
+ "slot_millis": getattr(job, "slot_millis", None),
+ "cache_hit": getattr(job, "cache_hit", False),
+ "creation_time": str(job.created),
+ "start_time": str(job.started) if job.started else None,
+ "end_time": str(job.ended) if job.ended else None,
+ "execution_ms": (
+ (job.ended - job.started).total_seconds() * 1000
+ if job.ended and job.started
+ else None
+ ),
+ "stage_count": len(stats),
+ "stages": [],
+ }
+
+ for stage in stats:
+ stage_info = {
+ "name": stage.name,
+ "id": stage.id,
+ "status": stage.status,
+ "input_stages": (
+ list(stage.input_stages) if stage.input_stages else []
+ ),
+ "records_read": stage.records_read,
+ "records_written": stage.records_written,
+ "shuffle_output_bytes": stage.shuffle_output_bytes,
+ "steps": [],
+ }
+
+ for step in stage.steps or []:
+ stage_info["steps"].append({
+ "kind": step.kind,
+ "substeps": list(step.substeps) if step.substeps else [],
+ })
+
+ analysis["stages"].append(stage_info)
+
+ return analysis
+ except Exception as e:
+ return {"error": str(e)}
+
+
+def suggest_optimizations(analysis: dict) -> list:
+ """Generate optimization suggestions based on analysis."""
+ suggestions = []
+
+ # Check bytes processed
+ bytes_processed = analysis.get("total_bytes_processed", 0)
+ if bytes_processed and bytes_processed > 10 * 1024**3: # 10 GB
+ suggestions.append({
+ "severity": "HIGH",
+ "category": "Data Volume",
+ "suggestion": (
+ f"Query processes {bytes_processed / 1024**3:.2f} GB. "
+ "Consider partitioning/clustering tables, or adding filters."
+ ),
+ })
+
+ # Check for cache hit
+ if analysis.get("cache_hit") is False:
+        suggestions.append({
+            "severity": "LOW",
+            "category": "Caching",
+            "suggestion": (
+                "Query did not hit the cache. For repeated queries, keep the "
+                "SQL deterministic so results can be served from cache."
+            ),
+ })
+
+ # Check slot usage
+ slot_millis = analysis.get("slot_millis")
+ exec_ms = analysis.get("execution_ms")
+ if slot_millis and exec_ms and exec_ms > 0:
+ parallelism = slot_millis / exec_ms
+ if parallelism < 10:
+ suggestions.append({
+ "severity": "MEDIUM",
+ "category": "Parallelism",
+ "suggestion": (
+ f"Low parallelism detected ({parallelism:.1f}x). "
+ "Query may be bottlenecked on sequential operations."
+ ),
+ })
+
+ # Check stages
+ stages = analysis.get("stages", [])
+ for stage in stages:
+ # Large shuffle
+ shuffle_bytes = stage.get("shuffle_output_bytes", 0)
+ if shuffle_bytes and shuffle_bytes > 1 * 1024**3: # 1 GB
+ suggestions.append({
+ "severity": "MEDIUM",
+ "category": "Shuffle",
+ "suggestion": (
+ f"Stage '{stage['name']}' has large shuffle "
+ f"({shuffle_bytes / 1024**3:.2f} GB). "
+ "Consider reducing data before joins/aggregations."
+ ),
+ })
+
+ # Check for expensive operations
+ for step in stage.get("steps", []):
+ kind = step.get("kind", "")
+ if kind == "CROSS_JOIN":
+ suggestions.append({
+ "severity": "HIGH",
+ "category": "Query Pattern",
+ "suggestion": (
+ f"CROSS JOIN detected in stage '{stage['name']}'. "
+ "This can be very expensive. Consider adding join predicates."
+ ),
+ })
+ if kind == "SORT":
+ suggestions.append({
+ "severity": "LOW",
+ "category": "Sorting",
+ "suggestion": (
+ f"Sort operation in stage '{stage['name']}'. "
+ "ORDER BY on large results is expensive. "
+ "Add LIMIT or sort in application."
+ ),
+ })
+
+ # No suggestions
+ if not suggestions:
+ suggestions.append({
+ "severity": "INFO",
+ "category": "General",
+ "suggestion": "No obvious optimization opportunities found.",
+ })
+
+ return suggestions
+
+
+def format_output(analysis: dict, suggestions: list) -> str:
+ """Format analysis results for display."""
+ output = []
+ output.append("=" * 60)
+ output.append("BIGQUERY QUERY ANALYSIS")
+ output.append("=" * 60)
+
+ if "error" in analysis:
+ output.append(f"\nError: {analysis['error']}")
+ return "\n".join(output)
+
+ # Summary
+ output.append("\n## Summary")
+ bytes_p = analysis.get("total_bytes_processed", 0)
+ bytes_b = analysis.get("total_bytes_billed", 0)
+ output.append(f" Bytes Processed: {bytes_p / 1024**3:.4f} GB")
+ output.append(f" Bytes Billed: {bytes_b / 1024**3:.4f} GB")
+ output.append(
+ f" Estimated Cost: ${analysis.get('estimated_cost_usd', 0):.4f}"
+ )
+
+ if analysis.get("cache_hit"):
+ output.append(" Cache Hit: Yes (no bytes billed)")
+
+ exec_ms = analysis.get("execution_ms")
+ if exec_ms:
+ output.append(f" Execution Time: {exec_ms:.0f} ms")
+
+ # Referenced tables
+ tables = analysis.get("referenced_tables", [])
+ if tables:
+ output.append("\n## Referenced Tables")
+ for t in tables:
+ output.append(f" - {t}")
+
+ # Stages
+ stages = analysis.get("stages", [])
+ if stages:
+ output.append(f"\n## Query Stages ({len(stages)} stages)")
+ for stage in stages[:5]: # Limit to first 5
+ output.append(f"\n Stage: {stage['name']}")
+ output.append(f" Records Read: {stage.get('records_read', 'N/A')}")
+ output.append(
+ f" Records Written: {stage.get('records_written', 'N/A')}"
+ )
+
+ # Suggestions
+ output.append("\n## Optimization Suggestions")
+ for s in suggestions:
+ output.append(f"\n [{s['severity']}] {s['category']}")
+ output.append(f" {s['suggestion']}")
+
+ output.append("\n" + "=" * 60)
+ return "\n".join(output)
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Analyze BigQuery query performance"
+ )
+ parser.add_argument("--project", required=True, help="GCP project ID")
+ parser.add_argument("--query", help="SQL query to analyze (dry run)")
+ parser.add_argument("--job-id", help="Completed job ID to analyze")
+ parser.add_argument("--json", action="store_true", help="Output as JSON")
+
+ args = parser.parse_args()
+
+ if not args.query and not args.job_id:
+ print("Provide either --query or --job-id")
+ sys.exit(1)
+
+ if args.query:
+ analysis = analyze_dry_run(args.project, args.query)
+ else:
+ analysis = analyze_job(args.project, args.job_id)
+
+ suggestions = suggest_optimizations(analysis)
+
+ if args.json:
+ print(
+ json.dumps(
+ {"analysis": analysis, "suggestions": suggestions},
+ indent=2,
+ default=str,
+ )
+ )
+ else:
+ print(format_output(analysis, suggestions))
+
+
+if __name__ == "__main__":
+ main()
diff --git a/src/google/adk/skills/bigquery-data-management/SKILL.md b/src/google/adk/skills/bigquery-data-management/SKILL.md
new file mode 100644
index 0000000000..72290d6009
--- /dev/null
+++ b/src/google/adk/skills/bigquery-data-management/SKILL.md
@@ -0,0 +1,446 @@
+---
+name: bigquery-data-management
+description: Load, transform, and manage data in BigQuery - batch/streaming ingestion, partitioning, clustering, external tables, and data formats. Use when importing data, optimizing table structures, or connecting to external data sources.
+license: Apache-2.0
+compatibility: BigQuery, Cloud Storage, BigLake
+metadata:
+ author: Google Cloud
+ version: "1.0"
+ category: data-management
+adk:
+ config:
+ timeout_seconds: 900
+ max_parallel_calls: 5
+ allowed_callers:
+ - bigquery_agent
+ - data_engineer_agent
+ - etl_agent
+---
+
+# BigQuery Data Management Skill
+
+Comprehensive data loading, transformation, and table optimization in BigQuery. This skill covers ingestion patterns, table partitioning, clustering, and external data connections.
+
+## When to Use This Skill
+
+Use this skill when you need to:
+- Load data from various sources (GCS, local files, Cloud SQL, etc.)
+- Configure partitioned or clustered tables for performance
+- Set up external tables or BigLake connections
+- Transform data during loading
+- Manage data formats (Parquet, Avro, ORC, CSV, JSON)
+- Implement streaming ingestion patterns
+
+**Note**: For ML model training, use the `bqml` skill. For AI/text generation, use the `bigquery-ai` skill.
+
+## Data Loading Methods
+
+| Method | Use Case | Throughput | Cost |
+|--------|----------|------------|------|
+| `LOAD DATA` | Batch from GCS/local | High | Free (slot usage) |
+| `INSERT INTO` | Small inserts from query | Low | Query cost |
+| `MERGE` | Upsert operations | Medium | Query cost |
+| Streaming API | Real-time ingestion | Medium | Per-row cost |
+| Storage Write API | High-throughput streaming | Very High | Per-byte cost |
+| Data Transfer Service | Scheduled imports | Varies | Free + source cost |
+
+## Quick Start
+
+### 1. Load Data from Cloud Storage
+
+```sql
+-- Load CSV from GCS
+LOAD DATA OVERWRITE `project.dataset.my_table`
+FROM FILES (
+ format = 'CSV',
+ uris = ['gs://bucket/data/*.csv'],
+ skip_leading_rows = 1
+);
+```
+
+### 2. Create Partitioned Table
+
+```sql
+CREATE TABLE `project.dataset.events`
+(
+ event_id STRING,
+ event_name STRING,
+ event_timestamp TIMESTAMP,
+ user_id STRING,
+ event_data JSON
+)
+PARTITION BY DATE(event_timestamp)
+CLUSTER BY user_id, event_name;
+```
+
+### 3. Query External Data
+
+```sql
+CREATE EXTERNAL TABLE `project.dataset.external_logs`
+WITH CONNECTION `project.region.connection_id`
+OPTIONS (
+ format = 'PARQUET',
+ uris = ['gs://bucket/logs/*.parquet']
+);
+```
+
+## Supported Data Formats
+
+| Format | Extension | Compression | Best For |
+|--------|-----------|-------------|----------|
+| **Parquet** | .parquet | Snappy, GZIP | Analytics (recommended) |
+| **Avro** | .avro | Deflate, Snappy | Schema evolution |
+| **ORC** | .orc | Snappy, ZLIB | Hive compatibility |
+| **CSV** | .csv | GZIP | Simple data |
+| **JSON** | .json, .jsonl | GZIP | Semi-structured |
+| **NEWLINE_DELIMITED_JSON** | .jsonl | GZIP | Streaming data |
+
+## LOAD DATA Statement
+
+### Full Syntax
+
+```sql
+LOAD DATA [OVERWRITE] target_table
+[PARTITIONS (partition_clause)]
+[CLUSTER BY column_list]
+FROM FILES (
+ format = 'FORMAT',
+ uris = ['gs://bucket/path/*.ext'],
+ -- Format-specific options
+ skip_leading_rows = 1, -- CSV
+ field_delimiter = ',', -- CSV
+ quote = '"', -- CSV
+ allow_quoted_newlines = TRUE, -- CSV
+ allow_jagged_rows = FALSE, -- CSV
+ null_marker = 'NULL', -- CSV
+ encoding = 'UTF-8', -- CSV/JSON
+ hive_partition_uri_prefix = 'gs://bucket/data/', -- Hive
+ require_hive_partition_filter = TRUE,
+ projection_fields = ['field1', 'field2'] -- Specific columns
+)
+[WITH PARTITION COLUMNS]
+[WITH CONNECTION `connection_id`];
+```
+
+### Load CSV with Schema
+
+```sql
+LOAD DATA OVERWRITE `project.dataset.sales`
+(
+ sale_id INT64,
+ product_name STRING,
+ amount NUMERIC,
+ sale_date DATE
+)
+FROM FILES (
+ format = 'CSV',
+ uris = ['gs://bucket/sales/2024/*.csv'],
+ skip_leading_rows = 1,
+ allow_jagged_rows = TRUE,
+ null_marker = ''
+);
+```
+
+### Load Parquet with Partitions
+
+```sql
+LOAD DATA INTO `project.dataset.events`
+FROM FILES (
+ format = 'PARQUET',
+ uris = ['gs://bucket/events/year=*/month=*/*.parquet'],
+ hive_partition_uri_prefix = 'gs://bucket/events/'
+)
+WITH PARTITION COLUMNS (
+ year INT64,
+ month INT64
+);
+```
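+
+### Load Newline-delimited JSON
+
+A minimal sketch for the JSONL format listed above (bucket path and table name are
+illustrative):
+
+```sql
+LOAD DATA INTO `project.dataset.raw_events`
+FROM FILES (
+  format = 'NEWLINE_DELIMITED_JSON',
+  uris = ['gs://bucket/events/*.jsonl']
+);
+```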
+
+## Table Partitioning
+
+### Partition Types
+
+| Type | Syntax | Best For |
+|------|--------|----------|
+| **Time-unit (DATE)** | `PARTITION BY DATE(ts)` | Daily queries |
+| **Time-unit (DATETIME)** | `PARTITION BY DATETIME_TRUNC(dt, MONTH)` | Monthly aggregations |
+| **Time-unit (TIMESTAMP)** | `PARTITION BY TIMESTAMP_TRUNC(ts, HOUR)` | Hourly data |
+| **Integer range** | `PARTITION BY RANGE_BUCKET(id, ...)` | Sequential IDs |
+| **Ingestion time** | `PARTITION BY _PARTITIONDATE` | Append-only logs |
+
+### Time-based Partitioning
+
+```sql
+-- Partition by date column
+CREATE TABLE `project.dataset.user_events`
+(
+ user_id STRING,
+ event_type STRING,
+ event_time TIMESTAMP,
+ properties JSON
+)
+PARTITION BY DATE(event_time)
+OPTIONS (
+ partition_expiration_days = 365,
+ require_partition_filter = TRUE
+);
+```
+
+### Integer Range Partitioning
+
+```sql
+CREATE TABLE `project.dataset.orders`
+(
+ order_id INT64,
+ customer_id INT64,
+ order_total NUMERIC
+)
+PARTITION BY RANGE_BUCKET(order_id, GENERATE_ARRAY(0, 100000000, 1000000));
+```
+
+### Partition Management
+
+```sql
+-- Delete old partitions
+DELETE FROM `project.dataset.events`
+WHERE DATE(event_time) < DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY);
+
+-- Copy partition
+INSERT INTO `project.dataset.archive`
+SELECT * FROM `project.dataset.events`
+WHERE DATE(event_time) = '2024-01-01';
+```
+
+## Clustering
+
+### Create Clustered Table
+
+```sql
+CREATE TABLE `project.dataset.logs`
+(
+ log_id STRING,
+ log_level STRING,
+ service_name STRING,
+ message STRING,
+ timestamp TIMESTAMP
+)
+PARTITION BY DATE(timestamp)
+CLUSTER BY service_name, log_level;
+```
+
+### Clustering Guidelines
+
+1. **Order matters**: Most frequently filtered column first
+2. **Up to 4 columns**: Diminishing returns beyond 4
+3. **Low cardinality first**: Put columns with fewer unique values first
+4. **Combine with partitioning**: Cluster within partitions for best results (see the client-library sketch below)
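+
+A minimal sketch of the same layout from the Python client (table and column
+names mirror the SQL example above; adjust to your schema):
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client()
+
+table = bigquery.Table(
+    "project.dataset.logs",
+    schema=[
+        bigquery.SchemaField("log_id", "STRING"),
+        bigquery.SchemaField("log_level", "STRING"),
+        bigquery.SchemaField("service_name", "STRING"),
+        bigquery.SchemaField("message", "STRING"),
+        bigquery.SchemaField("timestamp", "TIMESTAMP"),
+    ],
+)
+# Partition by day, then cluster within each partition
+table.time_partitioning = bigquery.TimePartitioning(field="timestamp")
+table.clustering_fields = ["service_name", "log_level"]  # most-filtered column first
+client.create_table(table)
+```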
+
+### Re-cluster Existing Table
+
+```sql
+-- Rewrite the table to re-cluster immediately; keep the PARTITION BY clause,
+-- otherwise the replacement table loses its partitioning.
+-- (BigQuery also re-clusters automatically in the background at no cost.)
+CREATE OR REPLACE TABLE `project.dataset.logs`
+PARTITION BY DATE(timestamp)
+CLUSTER BY service_name, log_level
+AS SELECT * FROM `project.dataset.logs`;
+```
+
+## External Tables
+
+### BigLake Table (Managed)
+
+```sql
+CREATE EXTERNAL TABLE `project.dataset.biglake_sales`
+WITH CONNECTION `project.us.my_connection`
+OPTIONS (
+ format = 'PARQUET',
+ uris = ['gs://bucket/sales/*.parquet'],
+ metadata_cache_mode = 'AUTOMATIC'
+);
+```
+
+### External Table with Hive Partitioning
+
+```sql
+CREATE EXTERNAL TABLE `project.dataset.partitioned_logs`
+WITH PARTITION COLUMNS (
+ year INT64,
+ month INT64,
+ day INT64
+)
+OPTIONS (
+ format = 'PARQUET',
+ uris = ['gs://bucket/logs/*'],
+ hive_partition_uri_prefix = 'gs://bucket/logs/',
+ require_hive_partition_filter = TRUE
+);
+```
+
+### Object Tables (Unstructured Data)
+
+```sql
+CREATE EXTERNAL TABLE `project.dataset.images`
+WITH CONNECTION `project.us.my_connection`
+OPTIONS (
+ object_metadata = 'SIMPLE',
+ uris = ['gs://bucket/images/*']
+);
+```
+
+## INSERT and MERGE Operations
+
+### Insert from Query
+
+```sql
+INSERT INTO `project.dataset.summary`
+SELECT
+ DATE(event_time) AS date,
+ COUNT(*) AS event_count,
+ COUNT(DISTINCT user_id) AS unique_users
+FROM `project.dataset.events`
+WHERE DATE(event_time) = CURRENT_DATE()
+GROUP BY 1;
+```
+
+### MERGE (Upsert)
+
+```sql
+MERGE INTO `project.dataset.customers` AS target
+USING `project.dataset.customer_updates` AS source
+ON target.customer_id = source.customer_id
+WHEN MATCHED THEN
+ UPDATE SET
+ email = source.email,
+ updated_at = CURRENT_TIMESTAMP()
+WHEN NOT MATCHED THEN
+ INSERT (customer_id, email, created_at, updated_at)
+ VALUES (source.customer_id, source.email, CURRENT_TIMESTAMP(), CURRENT_TIMESTAMP());
+```
+
+### Multi-statement Transaction
+
+```sql
+BEGIN TRANSACTION;
+
+DELETE FROM `project.dataset.orders`
+WHERE status = 'cancelled' AND created_at < DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY);
+
+UPDATE `project.dataset.inventory`
+SET last_cleaned = CURRENT_TIMESTAMP()
+WHERE TRUE;
+
+COMMIT TRANSACTION;
+```
+
+## Streaming Ingestion
+
+### Storage Write API (Recommended)
+
+```python
+from google.cloud import bigquery_storage_v1
+from google.cloud.bigquery_storage_v1 import types
+from google.protobuf import descriptor_pb2
+
+client = bigquery_storage_v1.BigQueryWriteClient()
+parent = client.table_path("project", "dataset", "table")
+
+write_stream = client.create_write_stream(
+ parent=parent,
+ write_stream=types.WriteStream(type_=types.WriteStream.Type.COMMITTED)
+)
+
+# Append rows
+request = types.AppendRowsRequest(
+ write_stream=write_stream.name,
+ rows=types.AppendRowsRequest.ProtoData(
+ rows=types.ProtoRows(serialized_rows=[...])
+ )
+)
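+
+# Note: serialized_rows must be protobuf-encoded row messages, and the first
+# AppendRowsRequest in the stream should also carry the writer schema
+# (ProtoData.writer_schema). The request iterator is then sent with
+# client.append_rows(...), which yields AppendRowsResponse messages.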
+```
+
+### Legacy Streaming API
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client()
+table_ref = client.dataset("dataset").table("table")
+
+rows = [
+ {"user_id": "123", "event": "click", "timestamp": "2024-01-15T10:30:00Z"},
+ {"user_id": "456", "event": "view", "timestamp": "2024-01-15T10:30:01Z"},
+]
+
+errors = client.insert_rows_json(table_ref, rows)
+if errors:
+ print(f"Errors: {errors}")
+```
+
+## Data Transfer Service
+
+### Scheduled Query
+
+```sql
+-- Create in BigQuery Console or via API
+-- Runs daily at 6 AM UTC
+SELECT
+ DATE(event_time) AS date,
+ COUNT(*) AS total_events
+FROM `project.dataset.raw_events`
+WHERE DATE(event_time) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
+GROUP BY 1;
+```
+
+### Cross-region Copy
+
+```sql
+-- Copy dataset to another region
+-- Use BigQuery Data Transfer Service API
+-- or bq command: bq mk --transfer_config ...
+```
+
+## Best Practices
+
+### Partitioning Strategy
+
+1. **Choose the right granularity**: Match partition size to query patterns
+2. **Require partition filters**: Prevent full-table scans
+3. **Set expiration**: Auto-delete old partitions
+4. **Avoid over-partitioning**: Aim for >1GB per partition
+
+### Clustering Strategy
+
+1. **Cluster on filter columns**: Most queried columns first
+2. **Re-cluster periodically**: After many small inserts
+3. **Monitor effectiveness**: Check bytes scanned reduction (see the sketch below)
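+
+One way to check effectiveness is to track bytes processed by queries that
+reference the table before and after clustering; a sketch (the `logs` table
+name and the `region-us` qualifier are placeholders):
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client()
+
+# Average bytes processed per day by queries that touched the table.
+# A sustained drop after (re)clustering suggests the cluster keys are effective.
+sql = """
+SELECT
+  DATE(creation_time) AS day,
+  AVG(total_bytes_processed) AS avg_bytes_processed
+FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
+  UNNEST(referenced_tables) AS t
+WHERE t.table_id = 'logs'
+  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
+GROUP BY day
+ORDER BY day
+"""
+for row in client.query(sql).result():
+    print(row.day, int(row.avg_bytes_processed or 0))
+```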
+
+### Loading Best Practices
+
+1. **Use Parquet/Avro**: Better compression and performance
+2. **Batch small files**: Combine many small files into larger ones (roughly 100MB-1GB each)
+3. **Avoid streaming for bulk**: Use batch for large loads
+4. **Parallel loads**: Load multiple files simultaneously
+
+## References
+
+Load detailed documentation as needed:
+
+- `DATA_FORMATS.md` - Complete format specifications and options
+- `PARTITIONING.md` - Advanced partitioning strategies
+- `EXTERNAL_TABLES.md` - BigLake and external data connections
+- `STREAMING.md` - Real-time ingestion patterns
+
+## Scripts
+
+Helper scripts for common operations:
+
+- `validate_schema.py` - Validate data against table schema
+- `partition_manager.py` - Manage partition lifecycle
+- `load_monitor.py` - Monitor load job progress
+
+## Limitations
+
+- Maximum 10,000 partitions per table
+- Clustering limited to 4 columns
+- Streaming buffer not immediately queryable
+- External table query performance varies
+- Load jobs limited to 15TB per job
diff --git a/src/google/adk/skills/bigquery-data-management/references/DATA_FORMATS.md b/src/google/adk/skills/bigquery-data-management/references/DATA_FORMATS.md
new file mode 100644
index 0000000000..dfa6fb3ddb
--- /dev/null
+++ b/src/google/adk/skills/bigquery-data-management/references/DATA_FORMATS.md
@@ -0,0 +1,359 @@
+# BigQuery Data Formats Reference
+
+Complete guide to supported data formats for loading and external tables.
+
+## Format Comparison
+
+| Format | Compression | Schema | Nested Data | Best For |
+|--------|-------------|--------|-------------|----------|
+| Parquet | Excellent | Embedded | Full support | Analytics |
+| Avro | Good | Embedded | Full support | Streaming, CDC |
+| ORC | Excellent | Embedded | Full support | Hive migration |
+| CSV | Moderate | Required | None | Simple flat data |
+| JSON | Moderate | Optional | Full support | Semi-structured |
+
+## Parquet
+
+### Overview
+
+Apache Parquet is a columnar storage format optimized for analytics workloads.
+
+**Pros:**
+- Best query performance (columnar)
+- Excellent compression
+- Schema embedded in file
+- Predicate pushdown support
+
+**Cons:**
+- Not human-readable
+- Write overhead
+
+### Load Options
+
+```sql
+LOAD DATA INTO `project.dataset.table`
+FROM FILES (
+ format = 'PARQUET',
+ uris = ['gs://bucket/data/*.parquet'],
+ -- Optional: read specific columns only
+ projection_fields = ['column1', 'column2', 'column3'],
+ -- Optional: for BigLake tables
+ metadata_cache_mode = 'AUTOMATIC'
+);
+```
+
+### Supported Compression
+
+- Snappy (default, recommended)
+- GZIP
+- LZO
+- BROTLI
+- LZ4
+- ZSTD
+
+### Type Mapping
+
+| Parquet Type | BigQuery Type |
+|--------------|---------------|
+| BOOLEAN | BOOL |
+| INT32/INT64 | INT64 |
+| FLOAT/DOUBLE | FLOAT64 |
+| BYTE_ARRAY (UTF8) | STRING |
+| BYTE_ARRAY | BYTES |
+| INT96 | TIMESTAMP |
+| DATE | DATE |
+| DECIMAL | NUMERIC/BIGNUMERIC |
+| LIST | ARRAY |
+| MAP | STRUCT (key/value) |
+| STRUCT | STRUCT |
+
+## Avro
+
+### Overview
+
+Apache Avro is a row-based format with strong schema support.
+
+**Pros:**
+- Schema evolution support
+- Compact binary format
+- Good for streaming
+- Self-describing
+
+**Cons:**
+- Less efficient for analytics
+- Larger than Parquet for analytics
+
+### Load Options
+
+```sql
+LOAD DATA INTO `project.dataset.table`
+FROM FILES (
+ format = 'AVRO',
+ uris = ['gs://bucket/data/*.avro'],
+  -- Interpret Avro logical types (date, timestamp, decimal) as the
+  -- corresponding BigQuery types
+  use_avro_logical_types = TRUE
+);
+```
+
+### Type Mapping
+
+| Avro Type | BigQuery Type |
+|-----------|---------------|
+| boolean | BOOL |
+| int | INT64 |
+| long | INT64 |
+| float | FLOAT64 |
+| double | FLOAT64 |
+| bytes | BYTES |
+| string | STRING |
+| record | STRUCT |
+| array | ARRAY |
+| map | ARRAY<STRUCT<key STRING, value>> |
+| enum | STRING |
+| fixed | BYTES |
+| union | Nullable type |
+
+### Logical Types
+
+| Avro Logical Type | BigQuery Type |
+|-------------------|---------------|
+| date | DATE |
+| time-millis | TIME |
+| time-micros | TIME |
+| timestamp-millis | TIMESTAMP |
+| timestamp-micros | TIMESTAMP |
+| decimal | NUMERIC/BIGNUMERIC |
+
+## ORC
+
+### Overview
+
+Optimized Row Columnar format from Hive ecosystem.
+
+**Pros:**
+- Excellent compression
+- Good Hive compatibility
+- Predicate pushdown
+
+**Cons:**
+- Less common outside Hive
+- Limited tooling
+
+### Load Options
+
+```sql
+LOAD DATA INTO `project.dataset.table`
+FROM FILES (
+ format = 'ORC',
+ uris = ['gs://bucket/data/*.orc']
+);
+```
+
+### Supported Compression
+
+- ZLIB (default)
+- Snappy
+- LZO
+- LZ4
+- ZSTD
+
+## CSV
+
+### Overview
+
+Comma-separated values, the most universal format.
+
+**Pros:**
+- Human-readable
+- Universal compatibility
+- Easy to generate
+
+**Cons:**
+- No schema
+- Poor compression
+- Type inference needed
+- No nested data
+
+### Load Options
+
+```sql
+LOAD DATA INTO `project.dataset.table`
+FROM FILES (
+ format = 'CSV',
+ uris = ['gs://bucket/data/*.csv'],
+ -- Schema handling
+ skip_leading_rows = 1,
+ autodetect = TRUE, -- or provide explicit schema
+ -- Delimiters
+ field_delimiter = ',',
+ -- Quoting
+ quote = '"',
+ allow_quoted_newlines = TRUE,
+ -- Null handling
+ null_marker = '',
+ -- Error handling
+ allow_jagged_rows = FALSE,
+ max_bad_records = 0,
+ -- Encoding
+ encoding = 'UTF-8',
+ -- Compression
+ compression = 'GZIP' -- if files are compressed
+);
+```
+
+### Common Issues
+
+| Issue | Solution |
+|-------|----------|
+| Wrong delimiter | Set `field_delimiter` |
+| Quotes in values | Set `quote` and `allow_quoted_newlines` |
+| Header row | Set `skip_leading_rows = 1` |
+| Empty values | Set `null_marker` |
+| Encoding errors | Set `encoding = 'UTF-8'` |
+| Extra columns | Set `allow_jagged_rows = TRUE` |
+
+### Type Inference
+
+When using `autodetect = TRUE`:
+- Numbers → INT64 or FLOAT64
+- Booleans → BOOL
+- Dates/timestamps → DATE or TIMESTAMP (if the format is recognized)
+- Everything else → STRING
+
+Recommendation: Provide explicit schema for production.
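+
+A minimal sketch of supplying an explicit schema from the Python client (table
+and column names are placeholders):
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client()
+
+job_config = bigquery.LoadJobConfig(
+    source_format=bigquery.SourceFormat.CSV,
+    skip_leading_rows=1,
+    schema=[
+        bigquery.SchemaField("sale_id", "INTEGER"),
+        bigquery.SchemaField("product_name", "STRING"),
+        bigquery.SchemaField("amount", "NUMERIC"),
+        bigquery.SchemaField("sale_date", "DATE"),
+    ],
+)
+client.load_table_from_uri(
+    "gs://bucket/sales/2024/*.csv",
+    "project.dataset.sales",
+    job_config=job_config,
+).result()
+```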
+
+## JSON / NEWLINE_DELIMITED_JSON
+
+### Overview
+
+JSON format for semi-structured data.
+
+**Pros:**
+- Human-readable
+- Flexible schema
+- Nested data support
+
+**Cons:**
+- Verbose
+- Poor compression
+- Parsing overhead
+
+### Load Options
+
+```sql
+LOAD DATA INTO `project.dataset.table`
+FROM FILES (
+ format = 'NEWLINE_DELIMITED_JSON', -- One JSON object per line
+ uris = ['gs://bucket/data/*.jsonl'],
+ -- Schema handling
+ autodetect = TRUE,
+ -- Error handling
+ max_bad_records = 10,
+ ignore_unknown_values = TRUE,
+ -- Encoding
+ encoding = 'UTF-8'
+);
+```
+
+### GeoJSON Files
+
+```sql
+-- 'JSON' is an alias for NEWLINE_DELIMITED_JSON; BigQuery cannot load a file
+-- that is a single top-level JSON array, so convert such files to JSONL first.
+LOAD DATA INTO `project.dataset.table`
+FROM FILES (
+  format = 'JSON',
+  uris = ['gs://bucket/data/*.json'],
+  json_extension = 'GEOJSON'  -- For newline-delimited GeoJSON files
+);
+```
+
+### Nested Data
+
+```json
+// Input JSON
+{
+ "user_id": "123",
+ "profile": {
+ "name": "John",
+ "age": 30
+ },
+ "events": [
+ {"type": "click", "timestamp": "2024-01-15T10:00:00Z"},
+ {"type": "view", "timestamp": "2024-01-15T10:01:00Z"}
+ ]
+}
+```
+
+```sql
+-- Resulting schema
+CREATE TABLE example (
+ user_id STRING,
+  profile STRUCT<name STRING, age INT64>,
+  events ARRAY<STRUCT<type STRING, timestamp TIMESTAMP>>
+);
+```
+
+## Google Sheets
+
+### External Table from Sheets
+
+```sql
+CREATE EXTERNAL TABLE `project.dataset.sheet_data`
+OPTIONS (
+ format = 'GOOGLE_SHEETS',
+ uris = ['https://docs.google.com/spreadsheets/d/SHEET_ID/edit'],
+ skip_leading_rows = 1,
+  sheet_range = 'Sheet1!A1:Z1000'
+);
+```
+
+### Limitations
+
+- Maximum 100,000 rows
+- Read-only
+- Performance varies
+- Authentication required
+
+## Compression
+
+### Supported Compression by Format
+
+| Format | GZIP | SNAPPY | LZ4 | ZSTD | BROTLI |
+|--------|------|--------|-----|------|--------|
+| Parquet | Yes | Yes (default) | Yes | Yes | Yes |
+| Avro | Yes | Yes (default) | - | - | - |
+| ORC | Yes | Yes | Yes | Yes | - |
+| CSV | Yes | - | - | - | - |
+| JSON | Yes | - | - | - | - |
+
+### Compression Recommendations
+
+1. **Parquet**: Use Snappy for speed, ZSTD for size (see the pyarrow sketch after this list)
+2. **Avro**: Use Snappy (default)
+3. **CSV/JSON**: Use GZIP for storage, uncompressed for speed
+4. **ORC**: Use ZLIB for size, Snappy for speed
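+
+For example, when producing Parquet upstream with pyarrow, the codec is chosen
+at write time (a sketch; data and file names are placeholders):
+
+```python
+import pyarrow as pa
+import pyarrow.parquet as pq
+
+table = pa.table({"user_id": ["a", "b"], "amount": [10.0, 12.5]})
+
+# Snappy: fast compression, the usual default
+pq.write_table(table, "events_snappy.parquet", compression="snappy")
+
+# ZSTD: smaller files at some extra CPU cost
+pq.write_table(table, "events_zstd.parquet", compression="zstd")
+```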
+
+## Best Practices
+
+### Format Selection
+
+1. **Analytics workloads**: Parquet
+2. **Streaming/CDC**: Avro
+3. **Hive migration**: ORC
+4. **Quick exports**: CSV
+5. **APIs/events**: JSON
+
+### Schema Management
+
+1. **Provide explicit schemas** for production loads
+2. **Use schema files** for version control (see the sketch below)
+3. **Test schema changes** before deployment
+4. **Document transformations** for auditing
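+
+The Python client can round-trip a table schema through a JSON file, which is
+convenient for keeping schemas under version control (a sketch; paths and
+table names are placeholders):
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client()
+
+# Export the current table schema to a JSON file that can be committed to git
+table = client.get_table("project.dataset.sales")
+client.schema_to_json(table.schema, "schemas/sales_schema.json")
+
+# Later, load data using the versioned schema file
+job_config = bigquery.LoadJobConfig(
+    source_format=bigquery.SourceFormat.CSV,
+    skip_leading_rows=1,
+    schema=client.schema_from_json("schemas/sales_schema.json"),
+)
+```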
+
+### Performance Tips
+
+1. **File size**: Target 100MB-1GB per file
+2. **Avoid many small files**: Combine before loading
+3. **Use columnar formats**: Parquet/ORC for analytics
+4. **Enable predicate pushdown**: Filter at source
diff --git a/src/google/adk/skills/bigquery-data-management/references/PARTITIONING.md b/src/google/adk/skills/bigquery-data-management/references/PARTITIONING.md
new file mode 100644
index 0000000000..6e39f5136c
--- /dev/null
+++ b/src/google/adk/skills/bigquery-data-management/references/PARTITIONING.md
@@ -0,0 +1,329 @@
+# BigQuery Partitioning Reference
+
+Complete guide to table partitioning strategies and management.
+
+## Partition Types
+
+### Time-based Partitioning
+
+Partition data by a TIMESTAMP, DATE, or DATETIME column.
+
+```sql
+-- Partition by DATE column (daily)
+CREATE TABLE `project.dataset.events`
+(
+ event_id STRING,
+ event_time TIMESTAMP,
+ data JSON
+)
+PARTITION BY DATE(event_time);
+
+-- Partition by DATETIME with monthly granularity
+CREATE TABLE `project.dataset.monthly_summary`
+(
+ month_start DATETIME,
+ total NUMERIC
+)
+PARTITION BY DATETIME_TRUNC(month_start, MONTH);
+
+-- Partition by TIMESTAMP with hourly granularity
+CREATE TABLE `project.dataset.hourly_logs`
+(
+ log_time TIMESTAMP,
+ message STRING
+)
+PARTITION BY TIMESTAMP_TRUNC(log_time, HOUR);
+```
+
+### Granularity Options
+
+| Granularity | Function | Partitions/Year | Use Case |
+|-------------|----------|-----------------|----------|
+| HOUR | `TIMESTAMP_TRUNC(col, HOUR)` | 8,760 | High-frequency data |
+| DAY | `DATE(col)` | 365 | Most common |
+| MONTH | `DATE_TRUNC(col, MONTH)` | 12 | Low-volume data |
+| YEAR | `DATE_TRUNC(col, YEAR)` | 1 | Historical archives |
+
+### Integer Range Partitioning
+
+Partition by an integer column with defined ranges.
+
+```sql
+CREATE TABLE `project.dataset.orders`
+(
+ order_id INT64,
+ customer_id INT64,
+ amount NUMERIC
+)
+PARTITION BY RANGE_BUCKET(order_id, GENERATE_ARRAY(0, 1000000000, 10000000));
+
+-- Creates partitions: [0, 10000000), [10000000, 20000000), ...
+```
+
+### Ingestion-time Partitioning
+
+Partition by when data was loaded (system-managed).
+
+```sql
+CREATE TABLE `project.dataset.raw_logs`
+(
+ log_message STRING,
+ source STRING
+)
+PARTITION BY _PARTITIONDATE;
+
+-- Query specific partition
+SELECT * FROM `project.dataset.raw_logs`
+WHERE _PARTITIONDATE = '2024-01-15';
+```
+
+## Partition Options
+
+### Table Options
+
+```sql
+CREATE TABLE `project.dataset.events`
+(...)
+PARTITION BY DATE(event_time)
+OPTIONS (
+ -- Automatically delete partitions older than N days
+ partition_expiration_days = 365,
+
+ -- Require WHERE clause to include partition column
+ require_partition_filter = TRUE,
+
+ -- Description
+ description = 'User events partitioned by date'
+);
+```
+
+### Partition Expiration
+
+```sql
+-- Set expiration on existing table
+ALTER TABLE `project.dataset.events`
+SET OPTIONS (partition_expiration_days = 90);
+
+-- Remove expiration
+ALTER TABLE `project.dataset.events`
+SET OPTIONS (partition_expiration_days = NULL);
+```
+
+### Require Partition Filter
+
+```sql
+-- Enable filter requirement
+ALTER TABLE `project.dataset.events`
+SET OPTIONS (require_partition_filter = TRUE);
+
+-- Query must include partition filter
+SELECT * FROM `project.dataset.events`
+WHERE DATE(event_time) = '2024-01-15'; -- Required
+
+-- This will fail:
+-- SELECT * FROM `project.dataset.events`; -- Error!
+```
+
+## Partition Management
+
+### View Partition Information
+
+```sql
+-- List all partitions
+SELECT
+ table_name,
+ partition_id,
+ total_rows,
+ total_logical_bytes / 1024 / 1024 AS size_mb,
+ last_modified_time
+FROM `project.dataset.INFORMATION_SCHEMA.PARTITIONS`
+WHERE table_name = 'events'
+ORDER BY partition_id DESC;
+```
+
+### Delete Specific Partition
+
+```sql
+-- Delete by partition column
+DELETE FROM `project.dataset.events`
+WHERE DATE(event_time) = '2024-01-01';
+
+-- Partition decorators are not supported in GoogleSQL DML; to drop an entire
+-- partition, use the bq CLI instead:
+--   bq rm 'project:dataset.events$20240101'
+```
+
+### Copy Partition
+
+```sql
+-- Copy partition to another table
+INSERT INTO `project.dataset.archive`
+SELECT * FROM `project.dataset.events`
+WHERE DATE(event_time) = '2024-01-01';
+
+-- Copy a single partition with the bq CLI (decorators are not valid in
+-- GoogleSQL DML):
+--   bq cp 'project:dataset.events$20240101' 'project:dataset.archive$20240101'
+```
+
+### Update Partition
+
+```sql
+-- Overwrite entire partition
+MERGE INTO `project.dataset.events` AS target
+USING (SELECT * FROM `project.dataset.staging` WHERE DATE(event_time) = '2024-01-15') AS source
+ON FALSE -- Always not matched for overwrite
+WHEN NOT MATCHED BY SOURCE AND DATE(target.event_time) = '2024-01-15' THEN DELETE
+WHEN NOT MATCHED THEN INSERT ROW;
+```
+
+## Partitioned External Tables
+
+### Hive-style Partitioning
+
+```sql
+-- External table with Hive partitions
+CREATE EXTERNAL TABLE `project.dataset.logs`
+WITH PARTITION COLUMNS (
+ year INT64,
+ month INT64,
+ day INT64
+)
+OPTIONS (
+ format = 'PARQUET',
+ uris = ['gs://bucket/logs/*'],
+ hive_partition_uri_prefix = 'gs://bucket/logs/',
+ require_hive_partition_filter = TRUE
+);
+
+-- Query with partition filter
+SELECT * FROM `project.dataset.logs`
+WHERE year = 2024 AND month = 1 AND day = 15;
+```
+
+### Auto-detect Partitions
+
+```sql
+CREATE EXTERNAL TABLE `project.dataset.auto_partitioned`
+WITH PARTITION COLUMNS
+OPTIONS (
+ format = 'PARQUET',
+ uris = ['gs://bucket/data/*'],
+ hive_partition_uri_prefix = 'gs://bucket/data/'
+);
+```
+
+## Performance Optimization
+
+### Query Optimization
+
+```sql
+-- Good: Uses partition pruning
+SELECT * FROM `project.dataset.events`
+WHERE DATE(event_time) = '2024-01-15';
+
+-- Good: Range filter uses pruning
+SELECT * FROM `project.dataset.events`
+WHERE event_time BETWEEN '2024-01-01' AND '2024-01-31';
+
+-- Bad: Function prevents pruning
+SELECT * FROM `project.dataset.events`
+WHERE EXTRACT(YEAR FROM event_time) = 2024;
+
+-- Bad: Cast prevents pruning
+SELECT * FROM `project.dataset.events`
+WHERE CAST(event_time AS DATE) = '2024-01-15';
+```
+
+### Partition Pruning Check
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client()
+
+# A dry run reports how many bytes the query would scan without running it.
+# With effective pruning the estimate is far below the full table size.
+job = client.query(
+    "SELECT * FROM `project.dataset.events` "
+    "WHERE DATE(event_time) = '2024-01-15'",
+    job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False),
+)
+print(f"Estimated bytes scanned: {job.total_bytes_processed}")
+```
+
+## Design Patterns
+
+### Daily Partitions (Most Common)
+
+```sql
+CREATE TABLE `project.dataset.web_events`
+(
+ session_id STRING,
+ user_id STRING,
+ page_url STRING,
+ event_type STRING,
+ event_time TIMESTAMP
+)
+PARTITION BY DATE(event_time)
+CLUSTER BY user_id
+OPTIONS (
+ partition_expiration_days = 730, -- 2 years
+ require_partition_filter = TRUE
+);
+```
+
+### Monthly Aggregates
+
+```sql
+CREATE TABLE `project.dataset.monthly_revenue`
+(
+ month_start DATE,
+ product_category STRING,
+ total_revenue NUMERIC,
+ order_count INT64
+)
+PARTITION BY DATE_TRUNC(month_start, MONTH)
+CLUSTER BY product_category;
+```
+
+### Real-time with Hourly Partitions
+
+```sql
+CREATE TABLE `project.dataset.realtime_metrics`
+(
+ metric_name STRING,
+ metric_value FLOAT64,
+ recorded_at TIMESTAMP
+)
+PARTITION BY TIMESTAMP_TRUNC(recorded_at, HOUR)
+OPTIONS (
+ partition_expiration_days = 7 -- Keep 1 week
+);
+```
+
+### ID-based Sharding
+
+```sql
+CREATE TABLE `project.dataset.user_data`
+(
+ user_id INT64,
+ user_name STRING,
+ email STRING
+)
+PARTITION BY RANGE_BUCKET(user_id, GENERATE_ARRAY(0, 1000000000, 1000000));
+```
+
+## Limitations
+
+| Limit | Value |
+|-------|-------|
+| Maximum partitions per table | 10,000 |
+| Maximum partitions per load | 4,000 |
+| Minimum partition size (recommended) | 1 GB |
+| Partition expiration granularity | Days |
+
+## Best Practices
+
+1. **Choose appropriate granularity**: Match query patterns
+2. **Avoid over-partitioning**: >1GB per partition ideal
+3. **Use partition expiration**: Auto-cleanup old data
+4. **Require partition filters**: Prevent full scans
+5. **Combine with clustering**: Further optimize within partitions
+6. **Monitor partition sizes**: Balance across partitions
+7. **Test partition pruning**: Verify queries use pruning
diff --git a/src/google/adk/skills/bigquery-data-management/scripts/validate_schema.py b/src/google/adk/skills/bigquery-data-management/scripts/validate_schema.py
new file mode 100644
index 0000000000..f4d9362ea1
--- /dev/null
+++ b/src/google/adk/skills/bigquery-data-management/scripts/validate_schema.py
@@ -0,0 +1,248 @@
+"""Validate data files against BigQuery table schema.
+
+This script checks if data files (CSV, JSON, Parquet) are compatible
+with a BigQuery table schema before loading.
+
+Usage:
+ python validate_schema.py --project PROJECT --dataset DATASET --table TABLE --file FILE
+ python validate_schema.py --schema schema.json --file data.csv
+"""
+
+import argparse
+import json
+from pathlib import Path
+import sys
+
+
+def get_bigquery_schema(project: str, dataset: str, table: str) -> list:
+ """Fetch schema from BigQuery table."""
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=project)
+ table_ref = client.get_table(f"{project}.{dataset}.{table}")
+ return [
+ {"name": f.name, "type": f.field_type, "mode": f.mode}
+ for f in table_ref.schema
+ ]
+ except Exception as e:
+ print(f"Error fetching schema: {e}", file=sys.stderr)
+ sys.exit(1)
+
+
+def load_schema_file(schema_path: str) -> list:
+ """Load schema from JSON file."""
+ with open(schema_path) as f:
+ return json.load(f)
+
+
+def infer_csv_schema(file_path: str, delimiter: str = ",") -> list:
+ """Infer schema from CSV file header."""
+ import csv
+
+ with open(file_path, newline="") as f:
+ reader = csv.reader(f, delimiter=delimiter)
+ header = next(reader)
+ first_row = next(reader, None)
+
+ schema = []
+ for i, col in enumerate(header):
+ col_type = "STRING"
+ if first_row:
+ val = first_row[i] if i < len(first_row) else ""
+ col_type = infer_type(val)
+ schema.append({"name": col, "type": col_type, "mode": "NULLABLE"})
+ return schema
+
+
+def infer_json_schema(file_path: str) -> list:
+ """Infer schema from JSON file."""
+ with open(file_path) as f:
+ # Read first line for newline-delimited JSON
+ line = f.readline()
+ obj = json.loads(line)
+
+ schema = []
+ for key, value in obj.items():
+ col_type = infer_type_from_value(value)
+ schema.append({"name": key, "type": col_type, "mode": "NULLABLE"})
+ return schema
+
+
+def infer_parquet_schema(file_path: str) -> list:
+ """Infer schema from Parquet file."""
+ try:
+ import pyarrow.parquet as pq
+
+ table = pq.read_table(file_path)
+ schema = []
+ for field in table.schema:
+ bq_type = arrow_to_bigquery_type(str(field.type))
+ schema.append({
+ "name": field.name,
+ "type": bq_type,
+ "mode": "NULLABLE" if field.nullable else "REQUIRED",
+ })
+ return schema
+ except ImportError:
+ print("pyarrow required for Parquet validation", file=sys.stderr)
+ sys.exit(1)
+
+
+def arrow_to_bigquery_type(arrow_type: str) -> str:
+ """Convert Arrow type to BigQuery type."""
+ type_map = {
+ "int64": "INT64",
+ "int32": "INT64",
+ "float64": "FLOAT64",
+ "float32": "FLOAT64",
+ "double": "FLOAT64",
+ "bool": "BOOL",
+ "string": "STRING",
+ "binary": "BYTES",
+ "date32": "DATE",
+ "timestamp": "TIMESTAMP",
+ }
+ for key, val in type_map.items():
+ if key in arrow_type.lower():
+ return val
+ return "STRING"
+
+
+def infer_type(value: str) -> str:
+ """Infer BigQuery type from string value."""
+ if not value:
+ return "STRING"
+ try:
+ int(value)
+ return "INT64"
+ except ValueError:
+ pass
+ try:
+ float(value)
+ return "FLOAT64"
+ except ValueError:
+ pass
+ if value.lower() in ("true", "false"):
+ return "BOOL"
+ return "STRING"
+
+
+def infer_type_from_value(value) -> str:
+ """Infer BigQuery type from Python value."""
+ if isinstance(value, bool):
+ return "BOOL"
+ if isinstance(value, int):
+ return "INT64"
+ if isinstance(value, float):
+ return "FLOAT64"
+ if isinstance(value, list):
+ return "ARRAY"
+ if isinstance(value, dict):
+ return "STRUCT"
+ return "STRING"
+
+
+def validate_schema(expected: list, actual: list) -> list:
+ """Compare schemas and return list of issues."""
+ issues = []
+ expected_map = {f["name"].lower(): f for f in expected}
+ actual_map = {f["name"].lower(): f for f in actual}
+
+ # Check for missing columns
+ for name, field in expected_map.items():
+ if name not in actual_map:
+ if field.get("mode") == "REQUIRED":
+ issues.append(f"MISSING REQUIRED: Column '{name}' not found in data")
+ else:
+ issues.append(f"WARNING: Column '{name}' not found in data")
+
+ # Check for extra columns
+ for name in actual_map:
+ if name not in expected_map:
+ issues.append(f"EXTRA: Column '{name}' in data not in schema")
+
+ # Check type compatibility
+ for name, expected_field in expected_map.items():
+ if name in actual_map:
+ actual_field = actual_map[name]
+ if not types_compatible(expected_field["type"], actual_field["type"]):
+ issues.append(
+ f"TYPE MISMATCH: Column '{name}' expected "
+ f"{expected_field['type']}, got {actual_field['type']}"
+ )
+
+ return issues
+
+
+def types_compatible(expected: str, actual: str) -> bool:
+ """Check if types are compatible for loading."""
+ expected = expected.upper()
+ actual = actual.upper()
+
+ if expected == actual:
+ return True
+
+ # String can accept anything
+ if expected == "STRING":
+ return True
+
+ # Numeric compatibility
+ numeric_types = {"INT64", "FLOAT64", "NUMERIC", "BIGNUMERIC"}
+ if expected in numeric_types and actual in numeric_types:
+ return True
+
+ return False
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Validate data files against BigQuery schema"
+ )
+ parser.add_argument("--project", help="GCP project ID")
+ parser.add_argument("--dataset", help="BigQuery dataset")
+ parser.add_argument("--table", help="BigQuery table")
+ parser.add_argument("--schema", help="Path to schema JSON file")
+ parser.add_argument("--file", required=True, help="Data file to validate")
+ parser.add_argument("--delimiter", default=",", help="CSV delimiter")
+
+ args = parser.parse_args()
+
+ # Get expected schema
+ if args.schema:
+ expected_schema = load_schema_file(args.schema)
+ elif args.project and args.dataset and args.table:
+ expected_schema = get_bigquery_schema(
+ args.project, args.dataset, args.table
+ )
+ else:
+ print("Provide either --schema or --project/--dataset/--table")
+ sys.exit(1)
+
+ # Infer actual schema from file
+ file_path = Path(args.file)
+ if file_path.suffix.lower() == ".csv":
+ actual_schema = infer_csv_schema(args.file, args.delimiter)
+ elif file_path.suffix.lower() in (".json", ".jsonl"):
+ actual_schema = infer_json_schema(args.file)
+ elif file_path.suffix.lower() == ".parquet":
+ actual_schema = infer_parquet_schema(args.file)
+ else:
+ print(f"Unsupported file format: {file_path.suffix}")
+ sys.exit(1)
+
+ # Validate
+ issues = validate_schema(expected_schema, actual_schema)
+
+ if issues:
+ print("Schema validation issues found:")
+ for issue in issues:
+ print(f" - {issue}")
+ sys.exit(1)
+ else:
+ print("Schema validation passed")
+ sys.exit(0)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/src/google/adk/skills/bigquery-governance/SKILL.md b/src/google/adk/skills/bigquery-governance/SKILL.md
new file mode 100644
index 0000000000..b515dd5a41
--- /dev/null
+++ b/src/google/adk/skills/bigquery-governance/SKILL.md
@@ -0,0 +1,508 @@
+---
+name: bigquery-governance
+description: Implement data governance in BigQuery - IAM access control, column/row-level security, data masking, encryption, audit logging, and data catalog integration. Use when securing data, managing access, or implementing compliance requirements.
+license: Apache-2.0
+compatibility: BigQuery, IAM, Data Catalog, DLP
+metadata:
+ author: Google Cloud
+ version: "1.0"
+ category: governance
+adk:
+ config:
+ timeout_seconds: 300
+ max_parallel_calls: 3
+ allowed_callers:
+ - bigquery_agent
+ - security_agent
+ - compliance_agent
+---
+
+# BigQuery Governance Skill
+
+Implement comprehensive data governance in BigQuery including access control, data masking, encryption, audit logging, and compliance management.
+
+## When to Use This Skill
+
+Use this skill when you need to:
+- Configure IAM permissions for datasets and tables
+- Implement column-level and row-level security
+- Set up data masking and anonymization
+- Manage encryption (CMEK)
+- Enable and analyze audit logs
+- Integrate with Data Catalog for metadata management
+- Meet compliance requirements (GDPR, HIPAA, PCI-DSS)
+
+## Governance Features
+
+| Feature | Description | Use Case |
+|---------|-------------|----------|
+| **IAM** | Identity-based access control | User/group permissions |
+| **Column Security** | Hide sensitive columns | PII protection |
+| **Row Security** | Filter rows by user | Multi-tenant data |
+| **Data Masking** | Mask/redact values | Privacy compliance |
+| **CMEK** | Customer-managed keys | Key control |
+| **Audit Logs** | Activity tracking | Compliance auditing |
+
+## Quick Start
+
+### 1. Grant Dataset Access
+
+```sql
+-- Grant viewer access to dataset
+GRANT `roles/bigquery.dataViewer`
+ON SCHEMA `project.dataset`
+TO 'user:analyst@company.com';
+```
+
+### 2. Create Column-Level Policy
+
+```sql
+-- Create policy tag taxonomy
+-- (Done via Data Catalog API or Console)
+
+-- Apply policy tag to column
+ALTER TABLE `project.dataset.customers`
+ALTER COLUMN ssn SET OPTIONS (
+ policy_tags = ['projects/project/locations/us/taxonomies/123/policyTags/456']
+);
+```
+
+### 3. Create Row-Level Policy
+
+```sql
+CREATE ROW ACCESS POLICY region_filter
+ON `project.dataset.sales`
+GRANT TO ('user:regional_manager@company.com')
+FILTER USING (region = 'West');
+```
+
+## IAM Access Control
+
+### Predefined Roles
+
+| Role | Description | Typical Use |
+|------|-------------|-------------|
+| `bigquery.admin` | Full BigQuery access | Administrators |
+| `bigquery.dataOwner` | Full dataset access | Dataset owners |
+| `bigquery.dataEditor` | Read/write tables | Data engineers |
+| `bigquery.dataViewer` | Read-only access | Analysts |
+| `bigquery.jobUser` | Run queries | Query users |
+| `bigquery.user` | List datasets, run jobs | Basic access |
+
+### Grant Permissions
+
+```sql
+-- Grant role on dataset
+GRANT `roles/bigquery.dataViewer`
+ON SCHEMA `project.dataset`
+TO 'user:user@company.com';
+
+-- Grant role on table
+GRANT `roles/bigquery.dataViewer`
+ON TABLE `project.dataset.table`
+TO 'group:analysts@company.com';
+
+-- Grant to service account
+GRANT `roles/bigquery.dataEditor`
+ON SCHEMA `project.dataset`
+TO 'serviceAccount:etl@project.iam.gserviceaccount.com';
+
+-- Grant to all authenticated users
+GRANT `roles/bigquery.dataViewer`
+ON TABLE `project.dataset.public_data`
+TO 'allAuthenticatedUsers';
+```
+
+### Revoke Permissions
+
+```sql
+REVOKE `roles/bigquery.dataViewer`
+ON SCHEMA `project.dataset`
+FROM 'user:former_employee@company.com';
+```
+
+### View Permissions
+
+```sql
+-- List dataset permissions
+SELECT * FROM `project.dataset.INFORMATION_SCHEMA.OBJECT_PRIVILEGES`;
+
+-- List table permissions
+SELECT *
+FROM `project.dataset.INFORMATION_SCHEMA.OBJECT_PRIVILEGES`
+WHERE object_name = 'table_name';
+```
+
+## Column-Level Security
+
+### Setup Policy Tags
+
+Policy tags are created in Data Catalog and applied to columns.
+
+```python
+# Using Data Catalog API
+from google.cloud import datacatalog_v1
+
+client = datacatalog_v1.PolicyTagManagerClient()
+
+# Create taxonomy
+taxonomy = client.create_taxonomy(
+ parent=f"projects/{project}/locations/{location}",
+ taxonomy=datacatalog_v1.Taxonomy(
+ display_name="PII_Taxonomy",
+ description="Policy tags for PII data",
+ activated_policy_types=[
+ datacatalog_v1.Taxonomy.PolicyType.FINE_GRAINED_ACCESS_CONTROL
+ ]
+ )
+)
+
+# Create policy tag
+policy_tag = client.create_policy_tag(
+ parent=taxonomy.name,
+ policy_tag=datacatalog_v1.PolicyTag(
+ display_name="SSN",
+ description="Social Security Numbers"
+ )
+)
+```
+
+### Apply Policy Tags
+
+```sql
+-- Apply policy tag to column
+ALTER TABLE `project.dataset.customers`
+ALTER COLUMN ssn SET OPTIONS (
+ policy_tags = ['projects/project/locations/us/taxonomies/123/policyTags/ssn']
+);
+
+-- Apply to multiple columns
+ALTER TABLE `project.dataset.customers`
+ALTER COLUMN email SET OPTIONS (
+ policy_tags = ['projects/project/locations/us/taxonomies/123/policyTags/email']
+),
+ALTER COLUMN phone SET OPTIONS (
+ policy_tags = ['projects/project/locations/us/taxonomies/123/policyTags/phone']
+);
+```
+
+### Grant Fine-Grained Access
+
+Access to tagged columns is granted as an IAM binding for
+`roles/datacatalog.categoryFineGrainedReader` on the policy tag itself (via the
+Data Catalog API, gcloud, or the console); there is no BigQuery `GRANT`
+statement for policy tags.
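+
+A sketch using the Data Catalog client (the policy tag name and member reuse
+the examples above):
+
+```python
+from google.cloud import datacatalog_v1
+
+client = datacatalog_v1.PolicyTagManagerClient()
+policy_tag = "projects/project/locations/us/taxonomies/123/policyTags/ssn"
+
+# Read-modify-write the IAM policy attached to the policy tag
+policy = client.get_iam_policy(request={"resource": policy_tag})
+policy.bindings.add(
+    role="roles/datacatalog.categoryFineGrainedReader",
+    members=["user:compliance_officer@company.com"],
+)
+client.set_iam_policy(request={"resource": policy_tag, "policy": policy})
+```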
+
+## Row-Level Security
+
+### Create Row Access Policy
+
+```sql
+-- Basic filter policy
+CREATE ROW ACCESS POLICY sales_region_policy
+ON `project.dataset.sales`
+GRANT TO ('user:west_manager@company.com')
+FILTER USING (region = 'West');
+
+-- Policy using function
+CREATE ROW ACCESS POLICY dept_policy
+ON `project.dataset.employees`
+GRANT TO ('group:managers@company.com')
+FILTER USING (
+ department IN (
+ SELECT department FROM `project.dataset.manager_departments`
+ WHERE manager_email = SESSION_USER()
+ )
+);
+
+-- Policy for multiple groups
+CREATE ROW ACCESS POLICY multi_region_policy
+ON `project.dataset.sales`
+GRANT TO (
+ 'user:ceo@company.com',
+ 'group:executives@company.com'
+)
+FILTER USING (TRUE); -- Full access
+```
+
+### Manage Policies
+
+```sql
+-- View existing policies
+SELECT *
+FROM `project.dataset.INFORMATION_SCHEMA.ROW_ACCESS_POLICIES`
+WHERE table_name = 'sales';
+
+-- Drop policy
+DROP ROW ACCESS POLICY sales_region_policy
+ON `project.dataset.sales`;
+
+-- Drop all policies on table
+DROP ALL ROW ACCESS POLICIES ON `project.dataset.sales`;
+```
+
+### Best Practices
+
+1. **Use groups instead of individuals** for easier management
+2. **Test policies** with different users before production
+3. **Document policies** for audit purposes
+4. **Consider performance** - complex filters add overhead
+
+## Data Masking
+
+### Dynamic Data Masking
+
+Dynamic data masking is configured through **data policies** attached to policy
+tags, not through SQL DDL. The typical flow:
+
+1. Create a policy tag (see Column-Level Security above) and attach it to the
+   sensitive column.
+2. Create a data policy on the tag with a masking rule such as SHA-256 hash,
+   nullify, default masking value, or partial masking (via the BigQuery Data
+   Policy API, console, or Terraform).
+3. Grant `roles/bigquerydatapolicy.maskedReader` to users who should see masked
+   values; users with `roles/datacatalog.categoryFineGrainedReader` on the tag
+   continue to see raw values.
+
+### SHA256 Hashing
+
+```sql
+-- Hash sensitive values for analytics
+SELECT
+ TO_HEX(SHA256(email)) AS email_hash,
+ TO_HEX(SHA256(phone)) AS phone_hash,
+ purchase_amount
+FROM `project.dataset.transactions`;
+```
+
+### Tokenization
+
+```sql
+-- Create tokenized view
+CREATE VIEW `project.dataset.tokenized_customers` AS
+SELECT
+ customer_id,
+ FARM_FINGERPRINT(email) AS email_token,
+ CONCAT(
+ SUBSTR(ssn, 1, 3),
+ '-XX-XXXX'
+ ) AS masked_ssn,
+ state,
+ signup_date
+FROM `project.dataset.customers`;
+```
+
+### Data Redaction with DLP
+
+```python
+# Using Cloud DLP API for redaction
+from google.cloud import dlp_v2
+
+dlp = dlp_v2.DlpServiceClient()
+
+# De-identify configuration
+deidentify_config = {
+ "record_transformations": {
+ "field_transformations": [
+ {
+ "fields": [{"name": "email"}],
+ "primitive_transformation": {
+ "character_mask_config": {
+ "masking_character": "*",
+ "number_to_mask": 0,
+ "characters_to_ignore": [
+ {"characters_to_skip": "@."}
+ ]
+ }
+ }
+ }
+ ]
+ }
+}
+```
+
+## Encryption
+
+### Default Encryption
+
+BigQuery encrypts all data at rest by default using Google-managed keys.
+
+### Customer-Managed Encryption Keys (CMEK)
+
+```sql
+-- Create dataset with CMEK
+CREATE SCHEMA `project.dataset`
+OPTIONS (
+ default_kms_key_name = 'projects/project/locations/us/keyRings/ring/cryptoKeys/key'
+);
+
+-- Create table with CMEK
+CREATE TABLE `project.dataset.secure_table`
+(id INT64, data STRING)
+OPTIONS (
+ kms_key_name = 'projects/project/locations/us/keyRings/ring/cryptoKeys/key'
+);
+
+-- Query encryption info
+SELECT
+ table_name,
+ kms_key_name
+FROM `project.dataset.INFORMATION_SCHEMA.TABLE_OPTIONS`
+WHERE option_name = 'kms_key_name';
+```
+
+### Key Rotation
+
+CMEK keys can be rotated in Cloud KMS. BigQuery automatically uses new key versions for encryption while maintaining access to data encrypted with previous versions.
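+
+Rotation itself is configured in Cloud KMS rather than BigQuery; a sketch of
+setting a rotation schedule with the KMS client (project, key ring, and key
+names are placeholders):
+
+```python
+import time
+
+from google.cloud import kms
+
+client = kms.KeyManagementServiceClient()
+key_name = client.crypto_key_path("project", "us", "bq-keyring", "bq-key")
+
+# Rotate every 90 days, starting tomorrow
+key = {
+    "name": key_name,
+    "rotation_period": {"seconds": 60 * 60 * 24 * 90},
+    "next_rotation_time": {"seconds": int(time.time()) + 60 * 60 * 24},
+}
+update_mask = {"paths": ["rotation_period", "next_rotation_time"]}
+client.update_crypto_key(request={"crypto_key": key, "update_mask": update_mask})
+```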
+
+## Audit Logging
+
+### Enable Audit Logs
+
+Audit logs are enabled at the project level in Cloud Console or via gcloud:
+
+```bash
+# Enable data access logs by adding an auditConfigs entry for
+# bigquery.googleapis.com (DATA_READ / DATA_WRITE) to the project IAM policy
+# file, then re-applying it:
+gcloud projects set-iam-policy PROJECT policy.yaml
+```
+
+### Query Audit Logs
+
+```sql
+-- Query recent job history (INFORMATION_SCHEMA, no log export required)
+SELECT
+  creation_time,
+  user_email,
+  job_type,
+  statement_type,
+  total_bytes_processed
+FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
+WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR);
+
+-- Query from a Cloud Logging export (audit log sink to a BigQuery dataset)
+SELECT
+  timestamp,
+  protopayload_auditlog.methodName AS method,
+  protopayload_auditlog.resourceName AS resource,
+  protopayload_auditlog.authenticationInfo.principalEmail AS user
+FROM `project.dataset.cloudaudit_googleapis_com_data_access`
+WHERE DATE(timestamp) = CURRENT_DATE();
+```
+
+### Key Audit Events
+
+| Event | Method Name | Description |
+|-------|-------------|-------------|
+| Query Run | `jobservice.jobcompleted` | Query executed |
+| Data Read | `tabledata.list` | Table data accessed |
+| Table Create | `tables.insert` | Table created |
+| Permission Change | `setIamPolicy` | Permissions modified |
+
+## Data Catalog Integration
+
+### Tag Tables with Business Metadata
+
+```sql
+-- Add table description
+ALTER TABLE `project.dataset.customers`
+SET OPTIONS (description = 'Customer master data. Contains PII.');
+
+-- Add labels
+ALTER TABLE `project.dataset.customers`
+SET OPTIONS (
+ labels = [
+ ('data_classification', 'confidential'),
+ ('data_owner', 'customer_team'),
+ ('pii', 'true')
+ ]
+);
+```
+
+### Search Data Catalog
+
+```python
+from google.cloud import datacatalog_v1
+
+client = datacatalog_v1.DataCatalogClient()
+
+# Search for PII tables
+scope = datacatalog_v1.SearchCatalogRequest.Scope(
+ include_project_ids=["my-project"]
+)
+
+results = client.search_catalog(
+ scope=scope,
+ query="tag:pii=true"
+)
+
+for result in results:
+ print(result.relative_resource_name)
+```
+
+## Compliance Patterns
+
+### GDPR - Right to Erasure
+
+```sql
+-- Delete user data
+DELETE FROM `project.dataset.customers`
+WHERE customer_id = @customer_id;
+
+DELETE FROM `project.dataset.orders`
+WHERE customer_id = @customer_id;
+
+DELETE FROM `project.dataset.activity_logs`
+WHERE user_id = @customer_id;
+```
+
+### GDPR - Data Export
+
+```sql
+-- Export user data
+EXPORT DATA OPTIONS(
+ uri='gs://bucket/exports/user_*.json',
+ format='JSON'
+) AS
+SELECT * FROM `project.dataset.customers`
+WHERE customer_id = @customer_id;
+```
+
+### HIPAA - Minimum Necessary
+
+```sql
+-- Create limited view for specific use case
+CREATE VIEW `project.dataset.treatment_summary` AS
+SELECT
+ patient_id, -- De-identified
+ treatment_category,
+ treatment_date,
+ outcome
+FROM `project.dataset.treatments`;
+-- Excludes PHI columns
+```
+
+## References
+
+- `IAM_ROLES.md` - Complete IAM role reference
+- `POLICY_TAGS.md` - Policy tag setup guide
+- `AUDIT_QUERIES.md` - Common audit log queries
+
+## Scripts
+
+- `audit_report.py` - Generate access audit report
+- `permission_scanner.py` - Scan for permission issues
+
+## Limitations
+
+- Row-level policies: Max 100 per table
+- Column-level security: Requires Data Catalog
+- Data masking: Configured via data policies on policy tags (no SQL DDL)
+- Audit logs: 30-day retention (default)
diff --git a/src/google/adk/skills/bigquery-governance/scripts/audit_report.py b/src/google/adk/skills/bigquery-governance/scripts/audit_report.py
new file mode 100644
index 0000000000..ffb944fdf9
--- /dev/null
+++ b/src/google/adk/skills/bigquery-governance/scripts/audit_report.py
@@ -0,0 +1,282 @@
+"""Generate BigQuery access audit report.
+
+This script analyzes BigQuery access patterns and generates
+reports for security and compliance purposes.
+
+Usage:
+ python audit_report.py --project PROJECT
+ python audit_report.py --project PROJECT --days 7 --format json
+"""
+
+import argparse
+from datetime import datetime
+import json
+import sys
+
+
+def get_audit_report(project: str, days: int = 7) -> dict:
+ """Generate access audit report."""
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=project)
+
+ # Query access patterns
+ access_query = f"""
+ SELECT
+ user_email,
+ statement_type,
+ COUNT(*) AS operation_count,
+ COUNT(DISTINCT DATE(creation_time)) AS active_days,
+ ARRAY_AGG(DISTINCT t.dataset_id IGNORE NULLS) AS datasets_accessed,
+ MIN(creation_time) AS first_access,
+ MAX(creation_time) AS last_access
+ FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j,
+ UNNEST(referenced_tables) AS t
+ WHERE creation_time > TIMESTAMP_SUB(
+ CURRENT_TIMESTAMP(),
+ INTERVAL {days} DAY
+ )
+ GROUP BY user_email, statement_type
+ ORDER BY operation_count DESC
+ """
+
+ access_results = list(client.query(access_query).result())
+
+ # Query data modifications
+ modification_query = f"""
+ SELECT
+ user_email,
+ destination_table.dataset_id AS dataset,
+ destination_table.table_id AS table_name,
+ statement_type,
+ COUNT(*) AS modification_count,
+ MAX(creation_time) AS last_modified
+ FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+ WHERE creation_time > TIMESTAMP_SUB(
+ CURRENT_TIMESTAMP(),
+ INTERVAL {days} DAY
+ )
+ AND statement_type IN ('INSERT', 'UPDATE', 'DELETE', 'MERGE', 'TRUNCATE_TABLE')
+ GROUP BY 1, 2, 3, 4
+ ORDER BY modification_count DESC
+ LIMIT 50
+ """
+
+ modification_results = list(client.query(modification_query).result())
+
+ # Query failed operations
+ failure_query = f"""
+ SELECT
+ user_email,
+ error_result.reason AS error_reason,
+ error_result.message AS error_message,
+ COUNT(*) AS failure_count
+ FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+ WHERE creation_time > TIMESTAMP_SUB(
+ CURRENT_TIMESTAMP(),
+ INTERVAL {days} DAY
+ )
+ AND error_result IS NOT NULL
+ GROUP BY 1, 2, 3
+ ORDER BY failure_count DESC
+ LIMIT 20
+ """
+
+ failure_results = list(client.query(failure_query).result())
+
+ # Build report
+ user_activity = []
+ for row in access_results:
+ user_activity.append({
+ "user": row.user_email,
+ "operation_type": row.statement_type,
+ "count": row.operation_count,
+ "active_days": row.active_days,
+ "datasets": (
+ list(row.datasets_accessed) if row.datasets_accessed else []
+ ),
+ "first_access": str(row.first_access) if row.first_access else None,
+ "last_access": str(row.last_access) if row.last_access else None,
+ })
+
+ modifications = []
+ for row in modification_results:
+ modifications.append({
+ "user": row.user_email,
+ "dataset": row.dataset,
+ "table": row.table_name,
+ "operation": row.statement_type,
+ "count": row.modification_count,
+ "last_modified": (
+ str(row.last_modified) if row.last_modified else None
+ ),
+ })
+
+ failures = []
+ for row in failure_results:
+ failures.append({
+ "user": row.user_email,
+ "reason": row.error_reason,
+ "message": row.error_message[:100] if row.error_message else "",
+ "count": row.failure_count,
+ })
+
+ # Calculate summary
+ unique_users = len(set(r["user"] for r in user_activity))
+ total_operations = sum(r["count"] for r in user_activity)
+ total_modifications = sum(r["count"] for r in modifications)
+ total_failures = sum(r["count"] for r in failures)
+
+ return {
+ "project": project,
+ "period_days": days,
+ "generated_at": datetime.utcnow().isoformat(),
+ "summary": {
+ "unique_users": unique_users,
+ "total_operations": total_operations,
+ "total_modifications": total_modifications,
+ "total_failures": total_failures,
+ },
+ "user_activity": user_activity[:30],
+ "data_modifications": modifications,
+ "access_failures": failures,
+ }
+ except Exception as e:
+ return {"error": str(e)}
+
+
+def check_anomalies(report: dict) -> list:
+ """Check for anomalous access patterns."""
+ anomalies = []
+
+ # Check for unusual access times (would need timestamp analysis)
+ # Check for high failure rates
+ failures = report.get("access_failures", [])
+ for failure in failures:
+ if failure["count"] > 10 and "PERMISSION_DENIED" in (
+ failure.get("reason") or ""
+ ):
+ anomalies.append({
+ "severity": "HIGH",
+ "type": "Permission Denied",
+ "description": (
+ f"User {failure['user']} had {failure['count']} "
+ "permission denied errors. Possible unauthorized access attempt."
+ ),
+ })
+
+ # Check for bulk modifications
+ modifications = report.get("data_modifications", [])
+ for mod in modifications:
+ if mod["count"] > 100 and mod["operation"] in ("DELETE", "TRUNCATE_TABLE"):
+ anomalies.append({
+ "severity": "MEDIUM",
+ "type": "Bulk Deletion",
+ "description": (
+ f"User {mod['user']} performed {mod['count']} {mod['operation']} "
+ f"operations on {mod['dataset']}.{mod['table']}."
+ ),
+ })
+
+ # Check for new users accessing sensitive datasets
+ # (This would require a list of sensitive datasets)
+
+ return anomalies
+
+
+def format_report(report: dict, anomalies: list) -> str:
+ """Format report for display."""
+ output = []
+ output.append("=" * 70)
+ output.append("BIGQUERY ACCESS AUDIT REPORT")
+ output.append("=" * 70)
+
+ if "error" in report:
+ output.append(f"\nError: {report['error']}")
+ return "\n".join(output)
+
+ output.append(f"\nProject: {report['project']}")
+ output.append(f"Period: Last {report['period_days']} days")
+ output.append(f"Generated: {report['generated_at']}")
+
+ summary = report["summary"]
+ output.append("\n## Summary")
+ output.append(f" Unique Users: {summary['unique_users']}")
+ output.append(f" Total Operations: {summary['total_operations']:,}")
+ output.append(f" Data Modifications: {summary['total_modifications']:,}")
+ output.append(f" Failed Operations: {summary['total_failures']:,}")
+
+ if anomalies:
+ output.append("\n## ANOMALIES DETECTED")
+ output.append("-" * 70)
+ for anomaly in anomalies:
+ output.append(f"\n[{anomaly['severity']}] {anomaly['type']}")
+ output.append(f" {anomaly['description']}")
+
+ output.append("\n## User Activity (Top 20)")
+ output.append("-" * 70)
+ output.append(f"{'User':<40} {'Operation':<15} {'Count':>10}")
+ output.append("-" * 70)
+ for activity in report["user_activity"][:20]:
+ output.append(
+ f"{activity['user'][:40]:<40} "
+ f"{activity['operation_type']:<15} "
+ f"{activity['count']:>10,}"
+ )
+
+ if report["data_modifications"]:
+ output.append("\n## Data Modifications")
+ output.append("-" * 70)
+ output.append(f"{'User':<30} {'Table':<30} {'Op':<10} {'Count':>8}")
+ output.append("-" * 70)
+ for mod in report["data_modifications"][:15]:
+ table_name = f"{mod['dataset']}.{mod['table']}"[:30]
+ output.append(
+ f"{mod['user'][:30]:<30} "
+ f"{table_name:<30} "
+ f"{mod['operation']:<10} "
+ f"{mod['count']:>8,}"
+ )
+
+ if report["access_failures"]:
+ output.append("\n## Access Failures")
+ output.append("-" * 70)
+ for failure in report["access_failures"][:10]:
+ output.append(f"\nUser: {failure['user']}")
+ output.append(f" Reason: {failure['reason']}")
+ output.append(f" Count: {failure['count']}")
+
+ output.append("\n" + "=" * 70)
+ return "\n".join(output)
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Generate BigQuery access audit report"
+ )
+ parser.add_argument("--project", required=True, help="GCP project ID")
+ parser.add_argument(
+ "--days",
+ type=int,
+ default=7,
+ help="Number of days to analyze (default: 7)",
+ )
+ parser.add_argument(
+ "--format", choices=["text", "json"], default="text", help="Output format"
+ )
+
+ args = parser.parse_args()
+
+ report = get_audit_report(args.project, args.days)
+ anomalies = check_anomalies(report)
+
+ if args.format == "json":
+ output = {"report": report, "anomalies": anomalies}
+ print(json.dumps(output, indent=2, default=str))
+ else:
+ print(format_report(report, anomalies))
+
+
+if __name__ == "__main__":
+ main()
diff --git a/src/google/adk/skills/bigquery-integration/SKILL.md b/src/google/adk/skills/bigquery-integration/SKILL.md
new file mode 100644
index 0000000000..1ca6b15199
--- /dev/null
+++ b/src/google/adk/skills/bigquery-integration/SKILL.md
@@ -0,0 +1,572 @@
+---
+name: bigquery-integration
+description: Integrate BigQuery with external systems - client libraries, REST APIs, JDBC/ODBC drivers, Data Transfer Service, Dataflow, and third-party tools. Use when connecting BigQuery to applications, pipelines, or BI tools.
+license: Apache-2.0
+compatibility: BigQuery, Python, Java, Node.js, JDBC, ODBC
+metadata:
+ author: Google Cloud
+ version: "1.0"
+ category: integration
+adk:
+ config:
+ timeout_seconds: 300
+ max_parallel_calls: 10
+ allowed_callers:
+ - bigquery_agent
+ - integration_agent
+ - developer_agent
+---
+
+# BigQuery Integration Skill
+
+Integrate BigQuery with external systems including client libraries, REST APIs, JDBC/ODBC drivers, Data Transfer Service, and third-party tools.
+
+## When to Use This Skill
+
+Use this skill when you need to:
+- Connect applications to BigQuery using client libraries
+- Set up JDBC/ODBC connections for BI tools
+- Configure Data Transfer Service for data ingestion
+- Integrate with Dataflow for streaming
+- Connect BigQuery to Looker, Tableau, or other tools
+- Use the REST API directly
+
+## Integration Options
+
+| Method | Use Case | Best For |
+|--------|----------|----------|
+| **Python Client** | Data science, ETL | Python applications |
+| **Java Client** | Enterprise apps | Java/JVM applications |
+| **Node.js Client** | Web apps, APIs | JavaScript applications |
+| **REST API** | Any language | Custom integrations |
+| **JDBC/ODBC** | BI tools | Tableau, Power BI |
+| **Data Transfer** | Scheduled imports | External data sources |
+
+## Quick Start
+
+### Python Client
+
+```python
+from google.cloud import bigquery
+
+# Initialize client
+client = bigquery.Client(project='my-project')
+
+# Run query
+query = "SELECT * FROM `project.dataset.table` LIMIT 100"
+df = client.query(query).to_dataframe()
+
+# Load data
+job_config = bigquery.LoadJobConfig(
+ source_format=bigquery.SourceFormat.CSV,
+ skip_leading_rows=1,
+)
+client.load_table_from_uri(
+ "gs://bucket/data.csv",
+ "project.dataset.table",
+ job_config=job_config
+).result()
+```
+
+### REST API
+
+```bash
+curl -X POST \
+ -H "Authorization: Bearer $(gcloud auth print-access-token)" \
+ -H "Content-Type: application/json" \
+ "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT/queries" \
+ -d '{
+ "query": "SELECT * FROM `project.dataset.table` LIMIT 10",
+ "useLegacySql": false
+ }'
+```
+
+## Python Client Library
+
+### Installation
+
+```bash
+pip install google-cloud-bigquery
+pip install google-cloud-bigquery-storage # For faster reads
+pip install pandas # For DataFrame support
+pip install pyarrow # For Arrow optimization
+```
+
+### Query Execution
+
+```python
+from google.cloud import bigquery
+
+client = bigquery.Client()
+
+# Simple query
+query = """
+ SELECT name, SUM(amount) as total
+ FROM `project.dataset.sales`
+ GROUP BY name
+ ORDER BY total DESC
+ LIMIT 10
+"""
+
+# Execute and get results
+results = client.query(query).result()
+for row in results:
+ print(f"{row.name}: {row.total}")
+
+# To DataFrame
+df = client.query(query).to_dataframe()
+
+# With parameters
+query = """
+ SELECT * FROM `project.dataset.orders`
+ WHERE created_at > @start_date
+ AND status = @status
+"""
+job_config = bigquery.QueryJobConfig(
+ query_parameters=[
+ bigquery.ScalarQueryParameter("start_date", "DATE", "2024-01-01"),
+ bigquery.ScalarQueryParameter("status", "STRING", "completed"),
+ ]
+)
+results = client.query(query, job_config=job_config).result()
+```
+
+### Loading Data
+
+```python
+# From local file
+with open("data.csv", "rb") as f:
+ job = client.load_table_from_file(
+ f,
+ "project.dataset.table",
+ job_config=bigquery.LoadJobConfig(
+ source_format=bigquery.SourceFormat.CSV,
+ skip_leading_rows=1,
+ autodetect=True,
+ )
+ )
+ job.result()
+
+# From GCS
+job = client.load_table_from_uri(
+ "gs://bucket/data/*.parquet",
+ "project.dataset.table",
+ job_config=bigquery.LoadJobConfig(
+ source_format=bigquery.SourceFormat.PARQUET,
+ write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
+ )
+)
+job.result()
+
+# From DataFrame
+import pandas as pd
+
+df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
+job = client.load_table_from_dataframe(df, "project.dataset.table")
+job.result()
+```
+
+### Streaming Insert
+
+```python
+# Insert rows immediately (legacy streaming)
+rows = [
+ {"user_id": "123", "event": "click", "timestamp": "2024-01-15T10:00:00"},
+ {"user_id": "456", "event": "view", "timestamp": "2024-01-15T10:00:01"},
+]
+
+errors = client.insert_rows_json("project.dataset.events", rows)
+if errors:
+ print(f"Errors: {errors}")
+```
+
+### Table Management
+
+```python
+# Create table
+schema = [
+ bigquery.SchemaField("id", "INTEGER", mode="REQUIRED"),
+ bigquery.SchemaField("name", "STRING", mode="NULLABLE"),
+ bigquery.SchemaField("created_at", "TIMESTAMP", mode="NULLABLE"),
+]
+
+table = bigquery.Table("project.dataset.new_table", schema=schema)
+table.time_partitioning = bigquery.TimePartitioning(field="created_at")
+table = client.create_table(table)
+
+# Get table info
+table = client.get_table("project.dataset.table")
+print(f"Rows: {table.num_rows}, Size: {table.num_bytes}")
+
+# Delete table
+client.delete_table("project.dataset.table", not_found_ok=True)
+```
+
+### BigQuery Storage API (Faster Reads)
+
+```python
+from google.cloud import bigquery_storage
+
+# Read directly (faster for large results)
+bqstorage_client = bigquery_storage.BigQueryReadClient()
+
+df = client.query(query).to_dataframe(
+ bqstorage_client=bqstorage_client
+)
+```
+
+## Java Client Library
+
+### Maven Dependency
+
+```xml
+<dependency>
+  <groupId>com.google.cloud</groupId>
+  <artifactId>google-cloud-bigquery</artifactId>
+  <version>2.34.0</version>
+</dependency>
+```
+
+### Query Execution
+
+```java
+import com.google.cloud.bigquery.*;
+
+BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
+
+String query = "SELECT * FROM `project.dataset.table` LIMIT 100";
+QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).build();
+
+TableResult results = bigquery.query(queryConfig);
+for (FieldValueList row : results.iterateAll()) {
+ String name = row.get("name").getStringValue();
+ long amount = row.get("amount").getLongValue();
+ System.out.println(name + ": " + amount);
+}
+```
+
+### Loading Data
+
+```java
+// From GCS
+LoadJobConfiguration loadConfig = LoadJobConfiguration.newBuilder(
+ TableId.of("dataset", "table"),
+ "gs://bucket/data/*.csv"
+)
+ .setFormatOptions(CsvOptions.newBuilder().setSkipLeadingRows(1).build())
+ .setAutodetect(true)
+ .build();
+
+Job job = bigquery.create(JobInfo.of(loadConfig));
+job.waitFor();
+```
+
+## Node.js Client Library
+
+### Installation
+
+```bash
+npm install @google-cloud/bigquery
+```
+
+### Query Execution
+
+```javascript
+const {BigQuery} = require('@google-cloud/bigquery');
+
+const bigquery = new BigQuery();
+
+async function runQuery() {
+ const query = `
+ SELECT name, SUM(amount) as total
+ FROM \`project.dataset.sales\`
+ GROUP BY name
+ LIMIT 10
+ `;
+
+ const [rows] = await bigquery.query({query});
+ rows.forEach(row => console.log(`${row.name}: ${row.total}`));
+}
+
+runQuery();
+```
+
+### Loading Data
+
+```javascript
+async function loadFromGCS() {
+ const [job] = await bigquery
+ .dataset('dataset')
+ .table('table')
+ .load('gs://bucket/data.csv', {
+ sourceFormat: 'CSV',
+ skipLeadingRows: 1,
+ autodetect: true,
+ });
+
+ console.log(`Job ${job.id} completed.`);
+}
+```
+
+## REST API
+
+### Authentication
+
+```bash
+# Get access token
+ACCESS_TOKEN=$(gcloud auth print-access-token)
+
+# Or use service account
+gcloud auth activate-service-account --key-file=key.json
+```
+
+### Query
+
+```bash
+curl -X POST \
+ -H "Authorization: Bearer $ACCESS_TOKEN" \
+ -H "Content-Type: application/json" \
+ "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT/queries" \
+ -d '{
+ "query": "SELECT * FROM `project.dataset.table` LIMIT 10",
+ "useLegacySql": false,
+ "maxResults": 100
+ }'
+```
+
+### Get Query Results
+
+```bash
+curl -X GET \
+ -H "Authorization: Bearer $ACCESS_TOKEN" \
+ "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT/queries/JOB_ID"
+```
+
+### List Tables
+
+```bash
+curl -X GET \
+ -H "Authorization: Bearer $ACCESS_TOKEN" \
+ "https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT/datasets/DATASET/tables"
+```
+
+## JDBC/ODBC Drivers
+
+### JDBC Connection
+
+```java
+// JDBC URL format
+String url = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;"
+ + "ProjectId=PROJECT_ID;"
+ + "OAuthType=0;"
+ + "OAuthServiceAcctEmail=service@project.iam.gserviceaccount.com;"
+ + "OAuthPvtKeyPath=/path/to/key.json;";
+
+Connection conn = DriverManager.getConnection(url);
+Statement stmt = conn.createStatement();
+ResultSet rs = stmt.executeQuery("SELECT * FROM dataset.table");
+```
+
+### ODBC Connection String
+
+```
+Driver={Simba ODBC Driver for Google BigQuery};
+Catalog=PROJECT_ID;
+OAuthMechanism=0;
+Email=service@project.iam.gserviceaccount.com;
+KeyFilePath=/path/to/key.json;
+```
+
+### BI Tool Connections
+
+**Tableau:**
+1. Use "Google BigQuery" connector
+2. Select authentication method
+3. Choose project and dataset
+
+**Power BI:**
+1. Get Data > Google BigQuery
+2. Sign in with Google account
+3. Select tables/views
+
+**Looker:**
+1. Admin > Connections
+2. New Connection > BigQuery
+3. Configure project and authentication
+
+## Data Transfer Service
+
+### Supported Sources
+
+| Source | Description |
+|--------|-------------|
+| Google Ads | Marketing data |
+| Campaign Manager | Advertising data |
+| Google Play | App analytics |
+| YouTube | Channel analytics |
+| Cloud Storage | File imports |
+| Amazon S3 | Cross-cloud transfer |
+| Teradata | Migration |
+| Amazon Redshift | Migration |
+
+### Create Transfer (Python)
+
+```python
+from google.cloud import bigquery_datatransfer
+
+client = bigquery_datatransfer.DataTransferServiceClient()
+
+# Cloud Storage transfer
+transfer_config = bigquery_datatransfer.TransferConfig(
+ destination_dataset_id="my_dataset",
+ display_name="GCS Daily Import",
+ data_source_id="google_cloud_storage",
+ schedule="every 24 hours",
+ params={
+ "data_path_template": "gs://bucket/data/dt=*/*.csv",
+ "destination_table_name_template": "daily_data_{run_date}",
+ "file_format": "CSV",
+ "skip_leading_rows": "1",
+ }
+)
+
+response = client.create_transfer_config(
+ parent=f"projects/{project_id}/locations/US",
+ transfer_config=transfer_config
+)
+```
+
+### Scheduled Queries
+
+```python
+transfer_config = bigquery_datatransfer.TransferConfig(
+ destination_dataset_id="analytics",
+ display_name="Daily Aggregation",
+ data_source_id="scheduled_query",
+ schedule="every day 06:00",
+ params={
+ "query": """
+ INSERT INTO `project.analytics.daily_summary`
+ SELECT
+ DATE(timestamp) as date,
+ COUNT(*) as events
+ FROM `project.raw.events`
+ WHERE DATE(timestamp) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
+ GROUP BY 1
+ """,
+ "destination_table_name_template": "",
+ "write_disposition": "WRITE_APPEND",
+ }
+)
+```
+
+## Dataflow Integration
+
+### Read from BigQuery
+
+```python
+import apache_beam as beam
+from apache_beam.io.gcp.bigquery import ReadFromBigQuery
+
+with beam.Pipeline() as pipeline:
+ rows = (
+ pipeline
+ | ReadFromBigQuery(
+ query="SELECT * FROM `project.dataset.table`",
+ use_standard_sql=True
+ )
+ | beam.Map(process_row)
+ )
+```
+
+### Write to BigQuery
+
+```python
+from apache_beam.io.gcp.bigquery import WriteToBigQuery
+
+with beam.Pipeline() as pipeline:
+ (
+ pipeline
+ | beam.Create([{"name": "Alice", "score": 100}])
+ | WriteToBigQuery(
+ table="project:dataset.table",
+ schema="name:STRING,score:INTEGER",
+ create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
+ write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
+ )
+ )
+```
+
+### Streaming to BigQuery
+
+```python
+from apache_beam.io.gcp.bigquery import WriteToBigQuery, BigQueryDisposition
+
+(
+ pipeline
+ | "Read PubSub" >> beam.io.ReadFromPubSub(topic="projects/p/topics/t")
+ | "Parse JSON" >> beam.Map(json.loads)
+ | "Write BQ" >> WriteToBigQuery(
+ table="project:dataset.streaming_table",
+ method=WriteToBigQuery.Method.STREAMING_INSERTS
+ )
+)
+```
+
+## Pub/Sub Integration
+
+### BigQuery Subscription
+
+```bash
+# Create BigQuery subscription
+gcloud pubsub subscriptions create my-bq-subscription \
+ --topic=my-topic \
+ --bigquery-table=PROJECT:DATASET.TABLE \
+ --write-metadata
+```
+
+### Schema Requirements
+
+```sql
+-- Table schema for Pub/Sub subscription
+CREATE TABLE `project.dataset.pubsub_messages`
+(
+ subscription_name STRING,
+ message_id STRING,
+ publish_time TIMESTAMP,
+ data STRING, -- or BYTES
+ attributes JSON
+);
+```
+
+## Connected Sheets
+
+### Enable Connected Sheets
+
+1. Open Google Sheets
+2. Data > Data connectors > Connect to BigQuery
+3. Select project and dataset
+4. Write query or select table
+
+### Scheduled Refresh
+
+Connected Sheets can be configured to refresh automatically on a recurring schedule (for example, daily or weekly) from the connector's refresh options, keeping the sheet in sync with the underlying BigQuery data.
+
+## References
+
+- `PYTHON_EXAMPLES.md` - Python code examples
+- `API_REFERENCE.md` - REST API endpoints
+- `JDBC_SETUP.md` - JDBC configuration guide
+
+## Scripts
+
+- `connection_test.py` - Test BigQuery connectivity
+- `bulk_loader.py` - Bulk data loading utility
+- `api_client.py` - REST API wrapper
+
+## Limitations
+
+- JDBC/ODBC: Query timeout 6 hours
+- Streaming: 100,000 rows/second per table
+- Data Transfer: Source-specific limits
+- REST API: 10 MB response size
diff --git a/src/google/adk/skills/bigquery-integration/scripts/connection_test.py b/src/google/adk/skills/bigquery-integration/scripts/connection_test.py
new file mode 100644
index 0000000000..9676a7a54c
--- /dev/null
+++ b/src/google/adk/skills/bigquery-integration/scripts/connection_test.py
@@ -0,0 +1,217 @@
+"""Test BigQuery connectivity and permissions.
+
+This script verifies BigQuery connection and checks
+available permissions for the authenticated user.
+
+Usage:
+ python connection_test.py --project PROJECT
+ python connection_test.py --project PROJECT --dataset DATASET
+"""
+
+import argparse
+from datetime import datetime
+import json
+import sys
+
+
+def test_connection(project: str, dataset: str = None) -> dict:
+ """Test BigQuery connection and permissions."""
+ results = {
+ "project": project,
+ "timestamp": datetime.utcnow().isoformat(),
+ "tests": [],
+ }
+
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=project)
+ results["tests"].append({
+ "test": "Client initialization",
+ "status": "PASS",
+ "message": "Successfully created BigQuery client",
+ })
+ except Exception as e:
+ results["tests"].append(
+ {"test": "Client initialization", "status": "FAIL", "message": str(e)}
+ )
+ return results
+
+ # Test list datasets
+ try:
+ datasets = list(client.list_datasets(max_results=5))
+ results["tests"].append({
+ "test": "List datasets",
+ "status": "PASS",
+ "message": f"Found {len(datasets)} datasets",
+ "details": [d.dataset_id for d in datasets],
+ })
+ except Exception as e:
+ results["tests"].append(
+ {"test": "List datasets", "status": "FAIL", "message": str(e)}
+ )
+
+ # Test simple query
+ try:
+ query = "SELECT 1 as test_value"
+ result = list(client.query(query).result())
+ if result[0].test_value == 1:
+ results["tests"].append({
+ "test": "Execute simple query",
+ "status": "PASS",
+ "message": "Successfully executed test query",
+ })
+ else:
+ results["tests"].append({
+ "test": "Execute simple query",
+ "status": "FAIL",
+ "message": "Query returned unexpected result",
+ })
+ except Exception as e:
+ results["tests"].append(
+ {"test": "Execute simple query", "status": "FAIL", "message": str(e)}
+ )
+
+ # Test INFORMATION_SCHEMA access
+ try:
+ query = """
+ SELECT COUNT(*) as job_count
+ FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
+ WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
+ """
+ result = list(client.query(query).result())
+ results["tests"].append({
+ "test": "INFORMATION_SCHEMA access",
+ "status": "PASS",
+ "message": f"Found {result[0].job_count} jobs in last hour",
+ })
+ except Exception as e:
+ results["tests"].append({
+ "test": "INFORMATION_SCHEMA access",
+ "status": "FAIL",
+ "message": str(e),
+ })
+
+ # Test specific dataset if provided
+ if dataset:
+ # Test dataset access
+ try:
+ ds = client.get_dataset(f"{project}.{dataset}")
+ results["tests"].append({
+ "test": f"Access dataset {dataset}",
+ "status": "PASS",
+ "message": f"Dataset location: {ds.location}",
+ })
+ except Exception as e:
+ results["tests"].append({
+ "test": f"Access dataset {dataset}",
+ "status": "FAIL",
+ "message": str(e),
+ })
+
+ # Test list tables
+ try:
+ tables = list(client.list_tables(f"{project}.{dataset}", max_results=10))
+ results["tests"].append({
+ "test": f"List tables in {dataset}",
+ "status": "PASS",
+ "message": f"Found {len(tables)} tables",
+ "details": [t.table_id for t in tables],
+ })
+ except Exception as e:
+ results["tests"].append({
+ "test": f"List tables in {dataset}",
+ "status": "FAIL",
+ "message": str(e),
+ })
+
+ # Test dry run (query cost estimation)
+ try:
+    # INFORMATION_SCHEMA.TABLES always exists, so the dry run has a valid target.
+    schema_path = (
+        f"{dataset}.INFORMATION_SCHEMA" if dataset
+        else "region-us.INFORMATION_SCHEMA"
+    )
+    query = f"SELECT * FROM `{project}.{schema_path}.TABLES` LIMIT 100"
+ job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
+ job = client.query(query, job_config=job_config)
+ results["tests"].append({
+ "test": "Dry run query",
+ "status": "PASS",
+ "message": f"Estimated bytes: {job.total_bytes_processed}",
+ })
+ except Exception as e:
+ results["tests"].append(
+ {"test": "Dry run query", "status": "FAIL", "message": str(e)}
+ )
+
+ # Summary
+ passed = sum(1 for t in results["tests"] if t["status"] == "PASS")
+ failed = sum(1 for t in results["tests"] if t["status"] == "FAIL")
+ results["summary"] = {
+ "total_tests": len(results["tests"]),
+ "passed": passed,
+ "failed": failed,
+ "status": "HEALTHY" if failed == 0 else "UNHEALTHY",
+ }
+
+ return results
+
+
+def format_results(results: dict) -> str:
+ """Format test results for display."""
+ output = []
+ output.append("=" * 60)
+ output.append("BIGQUERY CONNECTION TEST")
+ output.append("=" * 60)
+ output.append(f"\nProject: {results['project']}")
+ output.append(f"Timestamp: {results['timestamp']}")
+
+ output.append("\n## Test Results")
+ output.append("-" * 60)
+
+ for test in results["tests"]:
+ status_icon = "[PASS]" if test["status"] == "PASS" else "[FAIL]"
+ output.append(f"\n{status_icon} {test['test']}")
+ output.append(f" {test['message']}")
+ if "details" in test:
+ for detail in test["details"][:5]:
+ output.append(f" - {detail}")
+ if len(test.get("details", [])) > 5:
+ output.append(f" ... and {len(test['details']) - 5} more")
+
+ summary = results["summary"]
+ output.append("\n## Summary")
+ output.append("-" * 60)
+ output.append(f" Total Tests: {summary['total_tests']}")
+ output.append(f" Passed: {summary['passed']}")
+ output.append(f" Failed: {summary['failed']}")
+ output.append(f" Status: {summary['status']}")
+
+ output.append("\n" + "=" * 60)
+ return "\n".join(output)
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Test BigQuery connectivity and permissions"
+ )
+ parser.add_argument("--project", required=True, help="GCP project ID")
+ parser.add_argument("--dataset", help="Specific dataset to test")
+ parser.add_argument(
+ "--format", choices=["text", "json"], default="text", help="Output format"
+ )
+
+ args = parser.parse_args()
+
+ results = test_connection(args.project, args.dataset)
+
+ if args.format == "json":
+ print(json.dumps(results, indent=2))
+ else:
+ print(format_results(results))
+
+ # Exit with error code if tests failed
+ sys.exit(0 if results["summary"]["status"] == "HEALTHY" else 1)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/src/google/adk/skills/bigquery-storage/SKILL.md b/src/google/adk/skills/bigquery-storage/SKILL.md
new file mode 100644
index 0000000000..1feae7277e
--- /dev/null
+++ b/src/google/adk/skills/bigquery-storage/SKILL.md
@@ -0,0 +1,552 @@
+---
+name: bigquery-storage
+description: Manage BigQuery storage architecture - create/modify tables, schema evolution, snapshots, clones, time travel, and storage optimization. Use when designing table structures, managing schema changes, or optimizing storage costs.
+license: Apache-2.0
+compatibility: BigQuery
+metadata:
+ author: Google Cloud
+ version: "1.0"
+ category: storage
+adk:
+ config:
+ timeout_seconds: 300
+ max_parallel_calls: 5
+ allowed_callers:
+ - bigquery_agent
+ - data_engineer_agent
+ - dba_agent
+---
+
+# BigQuery Storage Skill
+
+Manage BigQuery storage architecture including table creation, schema evolution, snapshots, clones, time travel, and storage optimization.
+
+## When to Use This Skill
+
+Use this skill when you need to:
+- Create and manage tables, views, and materialized views
+- Modify table schemas (add/drop columns, change types)
+- Create table snapshots and clones
+- Use time travel to query historical data
+- Optimize storage costs and usage
+- Manage datasets and data organization
+
+**Note**: For data loading operations, use `bigquery-data-management` skill.
+
+## Storage Architecture
+
+| Object | Description | Use Case |
+|--------|-------------|----------|
+| **Table** | Structured data storage | Primary data storage |
+| **View** | Virtual table from query | Abstraction layer |
+| **Materialized View** | Pre-computed view | Query acceleration |
+| **Snapshot** | Point-in-time backup | Data protection |
+| **Clone** | Zero-copy table copy | Development/testing |
+
+## Quick Start
+
+### 1. Create a Table
+
+```sql
+CREATE TABLE `project.dataset.users`
+(
+ user_id STRING NOT NULL,
+ email STRING,
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP(),
+  profile STRUCT<first_name STRING, last_name STRING>,
+  tags ARRAY<STRING>
+)
+OPTIONS (
+ description = 'User profiles',
+ labels = [('team', 'data'), ('env', 'prod')]
+);
+```
+
+### 2. Create a View
+
+```sql
+CREATE VIEW `project.dataset.active_users` AS
+SELECT * FROM `project.dataset.users`
+WHERE last_login > DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY);
+```
+
+### 3. Create a Materialized View
+
+```sql
+CREATE MATERIALIZED VIEW `project.dataset.daily_stats`
+OPTIONS (enable_refresh = true, refresh_interval_minutes = 60)
+AS
+SELECT DATE(event_time) AS date, COUNT(*) AS events
+FROM `project.dataset.events`
+GROUP BY 1;
+```
+
+## Table Creation
+
+### Standard Table
+
+```sql
+CREATE TABLE `project.dataset.table_name`
+(
+ -- Column definitions
+ id INT64 NOT NULL,
+ name STRING,
+ created_at TIMESTAMP,
+ -- Complex types
+  address STRUCT<street STRING, city STRING, state STRING, zip STRING>,
+  phone_numbers ARRAY<STRING>,
+ metadata JSON
+)
+-- Partitioning
+PARTITION BY DATE(created_at)
+-- Clustering
+CLUSTER BY name
+-- Table options
+OPTIONS (
+ description = 'Table description',
+ labels = [('key', 'value')],
+ expiration_timestamp = TIMESTAMP '2025-12-31',
+ partition_expiration_days = 365,
+ require_partition_filter = TRUE,
+ friendly_name = 'My Table'
+);
+```
+
+### Create Table As Select (CTAS)
+
+```sql
+CREATE TABLE `project.dataset.new_table`
+PARTITION BY date_column
+CLUSTER BY category
+OPTIONS (description = 'Derived table')
+AS
+SELECT * FROM `project.dataset.source_table`
+WHERE condition;
+```
+
+### Create If Not Exists
+
+```sql
+CREATE TABLE IF NOT EXISTS `project.dataset.table`
+(id INT64, name STRING);
+```
+
+### Create Or Replace
+
+```sql
+CREATE OR REPLACE TABLE `project.dataset.table`
+(id INT64, name STRING);
+```
+
+## Data Types
+
+### Scalar Types
+
+| Type | Description | Example |
+|------|-------------|---------|
+| `INT64` | 64-bit integer | `12345` |
+| `FLOAT64` | 64-bit float | `3.14159` |
+| `NUMERIC` | Exact decimal (38,9) | `123.456789` |
+| `BIGNUMERIC` | Exact decimal (76,38) | Large precise numbers |
+| `BOOL` | Boolean | `TRUE`, `FALSE` |
+| `STRING` | UTF-8 text | `'Hello'` |
+| `BYTES` | Binary data | `b'\\x00\\x01'` |
+| `DATE` | Calendar date | `DATE '2024-01-15'` |
+| `TIME` | Time of day | `TIME '10:30:00'` |
+| `DATETIME` | Date and time | `DATETIME '2024-01-15 10:30:00'` |
+| `TIMESTAMP` | Point in time (UTC) | `TIMESTAMP '2024-01-15 10:30:00 UTC'` |
+| `GEOGRAPHY` | Geospatial | `ST_GEOGPOINT(-122, 37)` |
+| `JSON` | JSON data | `JSON '{"key": "value"}'` |
+| `INTERVAL` | Duration | `INTERVAL 1 DAY` |
+
+### Complex Types
+
+```sql
+-- STRUCT (named fields)
+STRUCT<
+  name STRING,
+  age INT64,
+  address STRUCT<city STRING, zip STRING>
+>
+
+-- ARRAY
+ARRAY<STRING>
+ARRAY<STRUCT<id INT64, name STRING>>
+
+-- Nested example
+CREATE TABLE example (
+  id INT64,
+  orders ARRAY<STRUCT<
+    order_id STRING,
+    items ARRAY<STRUCT<sku STRING, quantity INT64>>,
+    total NUMERIC
+  >>
+);
+```
+
+## Schema Evolution
+
+### Add Columns
+
+```sql
+-- Add single column
+ALTER TABLE `project.dataset.table`
+ADD COLUMN new_column STRING;
+
+-- Add column with default
+ALTER TABLE `project.dataset.table`
+ADD COLUMN status STRING DEFAULT 'active';
+
+-- Add nested column
+ALTER TABLE `project.dataset.table`
+ADD COLUMN profile STRUCT<bio STRING, avatar_url STRING>;
+```
+
+### Drop Columns
+
+```sql
+-- Drop single column
+ALTER TABLE `project.dataset.table`
+DROP COLUMN column_name;
+
+-- Drop multiple columns
+ALTER TABLE `project.dataset.table`
+DROP COLUMN col1,
+DROP COLUMN col2;
+
+-- Drop if exists
+ALTER TABLE `project.dataset.table`
+DROP COLUMN IF EXISTS maybe_column;
+```
+
+### Rename Columns
+
+```sql
+ALTER TABLE `project.dataset.table`
+RENAME COLUMN old_name TO new_name;
+```
+
+### Change Column Type
+
+```sql
+-- Widen type (INT64 to FLOAT64)
+ALTER TABLE `project.dataset.table`
+ALTER COLUMN numeric_col SET DATA TYPE FLOAT64;
+
+-- Widen NUMERIC to BIGNUMERIC
+ALTER TABLE `project.dataset.table`
+ALTER COLUMN amount SET DATA TYPE BIGNUMERIC;
+```
+
+### Set Column Options
+
+```sql
+-- Set default value
+ALTER TABLE `project.dataset.table`
+ALTER COLUMN status SET DEFAULT 'pending';
+
+-- Remove default
+ALTER TABLE `project.dataset.table`
+ALTER COLUMN status DROP DEFAULT;
+
+-- Relax a NOT NULL constraint (NOT NULL cannot be added to an existing column)
+ALTER TABLE `project.dataset.table`
+ALTER COLUMN id DROP NOT NULL;
+```
+
+## Views
+
+### Standard View
+
+```sql
+CREATE VIEW `project.dataset.view_name` AS
+SELECT
+ user_id,
+ COUNT(*) AS order_count,
+ SUM(total) AS total_spent
+FROM `project.dataset.orders`
+GROUP BY user_id;
+```
+
+### Parameterized View (SQL UDF)
+
+```sql
+CREATE TABLE FUNCTION `project.dataset.orders_by_status`(status_param STRING)
+AS (
+ SELECT * FROM `project.dataset.orders`
+ WHERE status = status_param
+);
+
+-- Usage
+SELECT * FROM `project.dataset.orders_by_status`('completed');
+```
+
+### Authorized View
+
+```sql
+-- Grant access to underlying data through view
+ALTER VIEW `project.dataset.view_name`
+SET OPTIONS (
+ description = 'Authorized view for limited access'
+);
+
+-- In dataset permissions, authorize the view
+```
+
+## Materialized Views
+
+### Create Materialized View
+
+```sql
+CREATE MATERIALIZED VIEW `project.dataset.mv_daily_sales`
+OPTIONS (
+ enable_refresh = true,
+ refresh_interval_minutes = 60,
+ max_staleness = INTERVAL 4 HOUR
+)
+AS
+SELECT
+ DATE(sale_time) AS sale_date,
+ product_category,
+ SUM(amount) AS total_sales,
+ COUNT(*) AS transaction_count
+FROM `project.dataset.sales`
+GROUP BY 1, 2;
+```
+
+### Refresh Options
+
+```sql
+-- Manual refresh
+CALL BQ.REFRESH_MATERIALIZED_VIEW('project.dataset.mv_name');
+
+-- Alter refresh settings
+ALTER MATERIALIZED VIEW `project.dataset.mv_name`
+SET OPTIONS (
+ enable_refresh = true,
+ refresh_interval_minutes = 30
+);
+```
+
+### Supported Operations
+
+- Aggregations: SUM, COUNT, AVG, MIN, MAX, etc.
+- GROUP BY
+- INNER JOIN (limited; combined with aggregation and a filter in the sketch below)
+- Filters (WHERE)
+- Window functions (limited)
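+
+A minimal sketch that combines these operations (the `orders` and `customers` tables and their columns are illustrative):
+
+```sql
+CREATE MATERIALIZED VIEW `project.dataset.mv_regional_orders`
+OPTIONS (enable_refresh = true, refresh_interval_minutes = 60)
+AS
+SELECT
+  c.region,
+  DATE(o.order_time) AS order_date,
+  SUM(o.amount) AS total_amount,
+  COUNT(*) AS order_count
+FROM `project.dataset.orders` o
+INNER JOIN `project.dataset.customers` c
+  ON o.customer_id = c.customer_id
+WHERE o.status = 'completed'
+GROUP BY 1, 2;
+```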
+
+## Snapshots
+
+### Create Snapshot
+
+```sql
+CREATE SNAPSHOT TABLE `project.dataset.orders_snapshot_20240115`
+CLONE `project.dataset.orders`
+OPTIONS (
+ expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
+);
+```
+
+### Restore from Snapshot
+
+```sql
+-- Restore to new table
+CREATE TABLE `project.dataset.orders_restored`
+CLONE `project.dataset.orders_snapshot_20240115`;
+
+-- Replace existing table
+CREATE OR REPLACE TABLE `project.dataset.orders`
+CLONE `project.dataset.orders_snapshot_20240115`;
+```
+
+## Clones
+
+### Create Clone (Zero-Copy)
+
+```sql
+-- Table clone
+CREATE TABLE `project.dataset.orders_clone`
+CLONE `project.dataset.orders`;
+
+-- Clone from point in time
+CREATE TABLE `project.dataset.orders_yesterday`
+CLONE `project.dataset.orders`
+FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);
+```
+
+### Clone Use Cases
+
+- Development/testing environments (sketched below)
+- What-if analysis
+- Quick backups before changes
+- A/B testing datasets
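+
+A minimal development workflow sketch (the `dev_dataset` target is illustrative); a clone is billed only for data that later diverges from the base table:
+
+```sql
+-- Clone production data into a dev dataset for experimentation
+CREATE TABLE `project.dev_dataset.orders_experiment`
+CLONE `project.dataset.orders`;
+
+-- ...run schema or data experiments against the clone...
+
+-- Drop the clone when finished
+DROP TABLE `project.dev_dataset.orders_experiment`;
+```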
+
+## Time Travel
+
+### Query Historical Data
+
+```sql
+-- Query as of specific time
+SELECT * FROM `project.dataset.orders`
+FOR SYSTEM_TIME AS OF TIMESTAMP '2024-01-15 10:00:00 UTC';
+
+-- Query from N hours ago
+SELECT * FROM `project.dataset.orders`
+FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR);
+
+-- Query from N days ago (up to 7 days)
+SELECT * FROM `project.dataset.orders`
+FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY);
+```
+
+### Restore Deleted Data
+
+```sql
+-- Recover accidentally deleted rows
+INSERT INTO `project.dataset.orders`
+SELECT * FROM `project.dataset.orders`
+FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
+WHERE order_id NOT IN (SELECT order_id FROM `project.dataset.orders`);
+```
+
+### Time Travel Window
+
+- Default: 7 days
+- Configurable per table: 2-7 days
+- Storage charged for historical versions
+
+```sql
+-- Set time travel window
+ALTER TABLE `project.dataset.table`
+SET OPTIONS (max_time_travel_hours = 48); -- 2 days
+```
+
+## Dataset Management
+
+### Create Dataset
+
+```sql
+CREATE SCHEMA `project.dataset_name`
+OPTIONS (
+ location = 'US',
+ default_table_expiration_days = 90,
+ default_partition_expiration_days = 365,
+ description = 'Dataset description',
+ labels = [('team', 'analytics')]
+);
+```
+
+### Alter Dataset
+
+```sql
+ALTER SCHEMA `project.dataset`
+SET OPTIONS (
+ default_table_expiration_days = 180,
+ description = 'Updated description'
+);
+```
+
+### Drop Dataset
+
+```sql
+-- Drop empty dataset
+DROP SCHEMA `project.dataset`;
+
+-- Drop with all contents
+DROP SCHEMA `project.dataset` CASCADE;
+```
+
+## Table Operations
+
+### Copy Table
+
+```sql
+-- Copy within project
+CREATE TABLE `project.dataset.table_copy`
+COPY `project.dataset.original_table`;
+
+-- Copy across datasets
+CREATE TABLE `project.other_dataset.table`
+COPY `project.dataset.table`;
+```
+
+### Rename Table
+
+```sql
+ALTER TABLE `project.dataset.old_name`
+RENAME TO new_name;
+```
+
+### Set Table Options
+
+```sql
+ALTER TABLE `project.dataset.table`
+SET OPTIONS (
+ description = 'New description',
+ expiration_timestamp = TIMESTAMP '2025-12-31',
+ labels = [('status', 'archive')]
+);
+```
+
+### Drop Table
+
+```sql
+DROP TABLE `project.dataset.table`;
+DROP TABLE IF EXISTS `project.dataset.table`;
+```
+
+## Storage Optimization
+
+### Check Storage Usage
+
+```sql
+SELECT
+ table_name,
+ ROUND(total_logical_bytes / 1024 / 1024 / 1024, 2) AS logical_gb,
+ ROUND(total_physical_bytes / 1024 / 1024 / 1024, 2) AS physical_gb,
+ ROUND(time_travel_physical_bytes / 1024 / 1024 / 1024, 2) AS time_travel_gb
+FROM `project.region-us.INFORMATION_SCHEMA.TABLE_STORAGE`
+WHERE table_schema = 'dataset';
+```
+
+### Long-term Storage
+
+Tables (and individual partitions of partitioned tables) that have not been modified for 90 consecutive days are automatically billed at the long-term storage rate, roughly half the active-storage price; no action or data movement is required.
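+
+A sketch of how to see the active vs. long-term split per table (assumes the dataset lives in `region-us`; adjust the region qualifier):
+
+```sql
+SELECT
+  table_name,
+  ROUND(active_logical_bytes / 1024 / 1024 / 1024, 2) AS active_gb,
+  ROUND(long_term_logical_bytes / 1024 / 1024 / 1024, 2) AS long_term_gb
+FROM `project.region-us.INFORMATION_SCHEMA.TABLE_STORAGE`
+WHERE table_schema = 'dataset'
+ORDER BY long_term_logical_bytes DESC;
+```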
+
+### Reduce Time Travel
+
+```sql
+-- Reduce to minimum (2 days)
+ALTER TABLE `project.dataset.table`
+SET OPTIONS (max_time_travel_hours = 48);
+```
+
+### Delete Old Data
+
+```sql
+-- Delete old partitions
+DELETE FROM `project.dataset.events`
+WHERE DATE(event_time) < DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY);
+```
+
+## References
+
+- `DATA_TYPES.md` - Complete type reference
+- `SCHEMA_EVOLUTION.md` - Schema change patterns
+- `OPTIMIZATION.md` - Storage cost optimization
+
+## Scripts
+
+- `storage_report.py` - Generate storage usage report
+- `schema_diff.py` - Compare table schemas
+
+## Limitations
+
+- Time travel: Maximum 7 days
+- Snapshots: Count against storage quota
+- Schema changes: Some type changes not allowed
+- Clones: Independent of the base table after creation; billed only for data that diverges from it
diff --git a/src/google/adk/skills/bigquery-storage/scripts/storage_report.py b/src/google/adk/skills/bigquery-storage/scripts/storage_report.py
new file mode 100644
index 0000000000..c0c67c573a
--- /dev/null
+++ b/src/google/adk/skills/bigquery-storage/scripts/storage_report.py
@@ -0,0 +1,214 @@
+"""Generate BigQuery storage usage report.
+
+This script provides detailed storage analysis including
+table sizes, time travel usage, and cost estimates.
+
+Usage:
+ python storage_report.py --project PROJECT
+ python storage_report.py --project PROJECT --dataset DATASET
+ python storage_report.py --project PROJECT --format json
+"""
+
+import argparse
+from datetime import datetime
+import json
+import sys
+
+
+def get_storage_report(project: str, dataset: str = None) -> dict:
+ """Generate storage usage report."""
+ try:
+ from google.cloud import bigquery
+
+ client = bigquery.Client(project=project)
+
+ # Build query
+    if dataset:
+      # TABLE_STORAGE is a region-scoped view, so qualify it with the region
+      # (US assumed here, matching the branch below) and filter on table_schema.
+      query = f"""
+        SELECT
+          table_schema AS dataset_id,
+          table_name,
+          total_rows,
+          total_logical_bytes,
+          total_physical_bytes,
+          time_travel_physical_bytes
+        FROM `{project}.region-us.INFORMATION_SCHEMA.TABLE_STORAGE`
+        WHERE table_schema = '{dataset}'
+        ORDER BY total_physical_bytes DESC
+      """
+ else:
+ query = f"""
+ SELECT
+ table_catalog AS project_id,
+ table_schema AS dataset_id,
+ table_name,
+ total_rows,
+ total_logical_bytes,
+ total_physical_bytes,
+ time_travel_physical_bytes
+ FROM `{project}.region-us.INFORMATION_SCHEMA.TABLE_STORAGE`
+ ORDER BY total_physical_bytes DESC
+ LIMIT 100
+ """
+
+ results = client.query(query).result()
+
+ tables = []
+ total_logical = 0
+ total_physical = 0
+ total_time_travel = 0
+
+ for row in results:
+ table_info = {
+ "dataset": row.dataset_id if hasattr(row, "dataset_id") else "",
+ "table": row.table_name,
+ "rows": row.total_rows,
+ "logical_bytes": row.total_logical_bytes,
+ "physical_bytes": row.total_physical_bytes,
+ "time_travel_bytes": row.time_travel_physical_bytes or 0,
+ }
+
+ # Calculate compression ratio
+ if row.total_logical_bytes and row.total_physical_bytes:
+ table_info["compression_ratio"] = round(
+ row.total_logical_bytes / row.total_physical_bytes, 2
+ )
+
+ tables.append(table_info)
+ total_logical += row.total_logical_bytes or 0
+ total_physical += row.total_physical_bytes or 0
+ total_time_travel += row.time_travel_physical_bytes or 0
+
+ # Calculate costs (approximate)
+ # Active storage: $0.02/GB/month, Long-term: $0.01/GB/month
+ active_cost = (total_physical / 1024**3) * 0.02
+ time_travel_cost = (total_time_travel / 1024**3) * 0.02
+
+ return {
+ "project": project,
+ "dataset": dataset,
+ "generated_at": datetime.utcnow().isoformat(),
+ "summary": {
+ "table_count": len(tables),
+ "total_logical_gb": round(total_logical / 1024**3, 4),
+ "total_physical_gb": round(total_physical / 1024**3, 4),
+ "time_travel_gb": round(total_time_travel / 1024**3, 4),
+ "estimated_monthly_cost_usd": round(
+ active_cost + time_travel_cost, 2
+ ),
+ },
+ "tables": tables[:50], # Limit to top 50
+ }
+ except Exception as e:
+ return {"error": str(e)}
+
+
+def format_bytes(bytes_val: int) -> str:
+ """Format bytes to human-readable string."""
+ for unit in ["B", "KB", "MB", "GB", "TB"]:
+ if bytes_val < 1024:
+ return f"{bytes_val:.2f} {unit}"
+ bytes_val /= 1024
+ return f"{bytes_val:.2f} PB"
+
+
+def format_report(report: dict) -> str:
+ """Format report for display."""
+ output = []
+ output.append("=" * 70)
+ output.append("BIGQUERY STORAGE REPORT")
+ output.append("=" * 70)
+
+ if "error" in report:
+ output.append(f"\nError: {report['error']}")
+ return "\n".join(output)
+
+ output.append(f"\nProject: {report['project']}")
+ if report.get("dataset"):
+ output.append(f"Dataset: {report['dataset']}")
+ output.append(f"Generated: {report['generated_at']}")
+
+ summary = report["summary"]
+ output.append("\n## Summary")
+ output.append(f" Tables: {summary['table_count']}")
+ output.append(f" Logical Size: {summary['total_logical_gb']:.4f} GB")
+ output.append(f" Physical Size: {summary['total_physical_gb']:.4f} GB")
+ output.append(f" Time Travel: {summary['time_travel_gb']:.4f} GB")
+ output.append(
+ f" Est. Monthly Cost: ${summary['estimated_monthly_cost_usd']:.2f}"
+ )
+
+ output.append("\n## Top Tables by Physical Size")
+ output.append("-" * 70)
+ output.append(f"{'Dataset':<20} {'Table':<25} {'Physical':<12} {'Ratio':<8}")
+ output.append("-" * 70)
+
+ for table in report["tables"][:20]:
+ physical = format_bytes(table["physical_bytes"])
+ ratio = table.get("compression_ratio", "-")
+ output.append(
+ f"{table['dataset'][:20]:<20} "
+ f"{table['table'][:25]:<25} "
+ f"{physical:<12} "
+ f"{ratio}"
+ )
+
+ # Time travel analysis
+ high_tt_tables = [
+ t for t in report["tables"] if t["time_travel_bytes"] > 1024**3 # > 1GB
+ ]
+ if high_tt_tables:
+ output.append("\n## Tables with High Time Travel Usage (>1GB)")
+ output.append("-" * 70)
+ for table in high_tt_tables[:10]:
+ tt_size = format_bytes(table["time_travel_bytes"])
+ output.append(f" {table['dataset']}.{table['table']}: {tt_size}")
+
+ # Recommendations
+ output.append("\n## Recommendations")
+
+ if summary["time_travel_gb"] > summary["total_physical_gb"] * 0.2:
+ output.append(
+ " - Time travel storage is significant. Consider reducing "
+ "max_time_travel_hours on tables with frequent updates."
+ )
+
+ low_compression = [
+ t
+ for t in report["tables"]
+ if t.get("compression_ratio", 0) < 1.5 and t["physical_bytes"] > 1024**3
+ ]
+ if low_compression:
+ output.append(
+ " - Some large tables have low compression. "
+ "Consider using columnar formats for better compression."
+ )
+
+ output.append("\n" + "=" * 70)
+ return "\n".join(output)
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Generate BigQuery storage usage report"
+ )
+ parser.add_argument("--project", required=True, help="GCP project ID")
+ parser.add_argument("--dataset", help="Specific dataset to analyze")
+ parser.add_argument(
+ "--format", choices=["text", "json"], default="text", help="Output format"
+ )
+
+ args = parser.parse_args()
+
+ report = get_storage_report(args.project, args.dataset)
+
+ if args.format == "json":
+ print(json.dumps(report, indent=2, default=str))
+ else:
+ print(format_report(report))
+
+
+if __name__ == "__main__":
+ main()
diff --git a/src/google/adk/skills/bqml/SKILL.md b/src/google/adk/skills/bqml/SKILL.md
new file mode 100644
index 0000000000..3f78a77333
--- /dev/null
+++ b/src/google/adk/skills/bqml/SKILL.md
@@ -0,0 +1,369 @@
+---
+name: bqml
+description: Train and deploy traditional ML models in BigQuery using SQL - classification, regression, clustering, time series forecasting, and recommendations. Use when building predictive models on BigQuery data without data movement.
+license: Apache-2.0
+compatibility: BigQuery
+metadata:
+ author: Google Cloud
+ version: "2.0"
+ category: machine-learning
+adk:
+ config:
+ timeout_seconds: 600
+ max_parallel_calls: 3
+ allowed_callers:
+ - bigquery_agent
+ - data_science_agent
+ - ml_agent
+---
+
+# BQML Skill
+
+BigQuery ML (BQML) enables training, evaluating, and deploying machine learning models directly in BigQuery using SQL. No data movement required.
+
+## When to Use This Skill
+
+Use BQML when you need to:
+- Train classification models (predict categories)
+- Train regression models (predict numeric values)
+- Build time series forecasting models
+- Create clustering/segmentation models
+- Build recommendation systems
+- Detect anomalies in data
+- Import and deploy external models (TensorFlow, ONNX, XGBoost)
+
+**Note**: For generative AI tasks (text generation, embeddings, semantic search), use the `bigquery-ai` skill instead.
+
+## Supported Model Types
+
+| Category | Model Types | Use Cases |
+|----------|-------------|-----------|
+| **Classification** | LOGISTIC_REG, BOOSTED_TREE_CLASSIFIER, RANDOM_FOREST_CLASSIFIER, DNN_CLASSIFIER | Churn prediction, fraud detection, sentiment |
+| **Regression** | LINEAR_REG, BOOSTED_TREE_REGRESSOR, RANDOM_FOREST_REGRESSOR, DNN_REGRESSOR | Price prediction, demand forecasting |
+| **Clustering** | KMEANS | Customer segmentation, anomaly detection |
+| **Time Series** | ARIMA_PLUS, ARIMA_PLUS_XREG | Sales forecasting, demand planning |
+| **Recommendations** | MATRIX_FACTORIZATION | Product recommendations, content suggestions |
+| **Dimensionality Reduction** | PCA, AUTOENCODER | Feature engineering, anomaly detection |
+| **Imported Models** | TENSORFLOW, ONNX, XGBOOST | Deploy pre-trained models |
+
+## Quick Start
+
+### 1. Create a Model
+
+```sql
+-- Logistic regression for classification
+CREATE OR REPLACE MODEL `project.dataset.churn_model`
+OPTIONS(
+ model_type='LOGISTIC_REG',
+ input_label_cols=['churned'],
+ auto_class_weights=TRUE
+) AS
+SELECT
+ tenure,
+ monthly_charges,
+ total_charges,
+ contract_type,
+ churned
+FROM `project.dataset.customer_data`
+WHERE signup_date < '2024-01-01'; -- Training data
+```
+
+### 2. Evaluate the Model
+
+```sql
+SELECT * FROM ML.EVALUATE(
+ MODEL `project.dataset.churn_model`,
+ (SELECT * FROM `project.dataset.customer_data`
+ WHERE signup_date >= '2024-01-01') -- Test data
+);
+```
+
+### 3. Make Predictions
+
+```sql
+SELECT
+ customer_id,
+ predicted_churned,
+ predicted_churned_probs
+FROM ML.PREDICT(
+ MODEL `project.dataset.churn_model`,
+ (SELECT * FROM `project.dataset.new_customers`)
+);
+```
+
+## Core ML Functions
+
+| Function | Description |
+|----------|-------------|
+| `ML.EVALUATE` | Get model evaluation metrics |
+| `ML.PREDICT` | Make predictions with trained model |
+| `ML.EXPLAIN_PREDICT` | Predictions with feature attributions |
+| `ML.FEATURE_INFO` | Get feature statistics |
+| `ML.GLOBAL_EXPLAIN` | Global feature importance |
+| `ML.CONFUSION_MATRIX` | Confusion matrix for classifiers |
+| `ML.ROC_CURVE` | ROC curve data for binary classifiers |
+| `ML.FORECAST` | Time series forecasting |
+| `ML.DETECT_ANOMALIES` | Anomaly detection |
+| `ML.RECOMMEND` | Generate recommendations |
+
+## Model Training Examples
+
+### Classification (Boosted Trees)
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.fraud_detector`
+OPTIONS(
+ model_type='BOOSTED_TREE_CLASSIFIER',
+ input_label_cols=['is_fraud'],
+ num_parallel_tree=5,
+ max_iterations=50,
+ learn_rate=0.1,
+ early_stop=TRUE,
+ data_split_method='AUTO_SPLIT'
+) AS
+SELECT
+ transaction_amount,
+ merchant_category,
+ time_since_last_transaction,
+ is_international,
+ is_fraud
+FROM `project.dataset.transactions`;
+```
+
+### Regression (Linear)
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.price_predictor`
+OPTIONS(
+ model_type='LINEAR_REG',
+ input_label_cols=['price'],
+ optimize_strategy='BATCH_GRADIENT_DESCENT',
+ l2_reg=0.1
+) AS
+SELECT
+ square_feet,
+ bedrooms,
+ bathrooms,
+ neighborhood,
+ price
+FROM `project.dataset.housing_data`;
+```
+
+### Time Series Forecasting
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.sales_forecast`
+OPTIONS(
+ model_type='ARIMA_PLUS',
+ time_series_timestamp_col='date',
+ time_series_data_col='daily_sales',
+ auto_arima=TRUE,
+ holiday_region='US',
+ horizon=30
+) AS
+SELECT date, daily_sales
+FROM `project.dataset.sales_history`
+WHERE date < CURRENT_DATE();
+
+-- Generate forecasts
+SELECT * FROM ML.FORECAST(
+ MODEL `project.dataset.sales_forecast`,
+ STRUCT(30 AS horizon, 0.9 AS confidence_level)
+);
+```
+
+### Clustering
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.customer_segments`
+OPTIONS(
+ model_type='KMEANS',
+ num_clusters=5,
+ kmeans_init_method='KMEANS++',
+ standardize_features=TRUE
+) AS
+SELECT
+ recency,
+ frequency,
+ monetary_value
+FROM `project.dataset.customer_rfm`;
+
+-- Assign clusters
+SELECT
+ customer_id,
+ CENTROID_ID AS segment
+FROM ML.PREDICT(
+ MODEL `project.dataset.customer_segments`,
+ (SELECT * FROM `project.dataset.customer_rfm`)
+);
+```
+
+### Recommendations
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.product_recommender`
+OPTIONS(
+ model_type='MATRIX_FACTORIZATION',
+ user_col='user_id',
+ item_col='product_id',
+ rating_col='rating',
+ feedback_type='EXPLICIT',
+ num_factors=20
+) AS
+SELECT user_id, product_id, rating
+FROM `project.dataset.user_ratings`;
+
+-- Generate recommendations
+SELECT * FROM (
+  SELECT
+    *,
+    ROW_NUMBER() OVER (
+      PARTITION BY user_id ORDER BY predicted_rating DESC
+    ) AS rec_rank
+  FROM ML.RECOMMEND(
+    MODEL `project.dataset.product_recommender`,
+    (SELECT DISTINCT user_id FROM `project.dataset.active_users`)
+  )
+)
+WHERE rec_rank <= 5;  -- top 5 recommendations per user
+```
+
+## Model Evaluation
+
+### Classification Metrics
+
+```sql
+-- Evaluation metrics
+SELECT
+ precision,
+ recall,
+ accuracy,
+ f1_score,
+ log_loss,
+ roc_auc
+FROM ML.EVALUATE(MODEL `project.dataset.classifier`);
+
+-- Confusion matrix
+SELECT * FROM ML.CONFUSION_MATRIX(
+ MODEL `project.dataset.classifier`,
+ (SELECT * FROM test_data)
+);
+
+-- ROC curve
+SELECT * FROM ML.ROC_CURVE(
+ MODEL `project.dataset.classifier`,
+ (SELECT * FROM test_data)
+);
+```
+
+### Regression Metrics
+
+```sql
+SELECT
+ mean_absolute_error,
+ mean_squared_error,
+ mean_squared_log_error,
+ median_absolute_error,
+ r2_score,
+ explained_variance
+FROM ML.EVALUATE(MODEL `project.dataset.regressor`);
+```
+
+## Explainability
+
+### Feature Importance
+
+```sql
+-- Global feature importance
+SELECT *
+FROM ML.GLOBAL_EXPLAIN(MODEL `project.dataset.model`)
+ORDER BY attribution DESC;
+```
+
+### Prediction Explanations
+
+```sql
+-- Per-prediction explanations
+SELECT
+ customer_id,
+ predicted_label,
+ top_feature_attributions
+FROM ML.EXPLAIN_PREDICT(
+ MODEL `project.dataset.churn_model`,
+ (SELECT * FROM `project.dataset.customers` LIMIT 100),
+ STRUCT(5 AS top_k_features)
+);
+```
+
+## Model Management
+
+### Get Model Information
+
+```sql
+-- Model metadata
+SELECT * FROM ML.MODEL_INFO(MODEL `project.dataset.model`);
+
+-- Training info
+SELECT *
+FROM ML.TRAINING_INFO(MODEL `project.dataset.model`);
+
+-- Feature info
+SELECT * FROM ML.FEATURE_INFO(MODEL `project.dataset.model`);
+```
+
+### Export Model
+
+```sql
+-- Export to Cloud Storage
+EXPORT MODEL `project.dataset.model`
+OPTIONS(URI='gs://bucket/model/');
+```
+
+### Drop Model
+
+```sql
+DROP MODEL IF EXISTS `project.dataset.model`;
+```
+
+## Hyperparameter Tuning
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.tuned_model`
+OPTIONS(
+ model_type='BOOSTED_TREE_CLASSIFIER',
+ input_label_cols=['label'],
+ -- Hyperparameter search
+ num_trials=20,
+ max_parallel_trials=5,
+ hparam_tuning_objectives=['ROC_AUC'],
+ -- Parameter ranges
+ learn_rate=HPARAM_RANGE(0.01, 0.3),
+ max_tree_depth=HPARAM_CANDIDATES([4, 6, 8, 10]),
+ subsample=HPARAM_RANGE(0.5, 1.0)
+) AS
+SELECT * FROM training_data;
+```
+
+## References
+
+Load detailed documentation as needed:
+
+- `MODEL_TYPES.md` - Complete list of model types with parameters
+- `BEST_PRACTICES.md` - Tips for effective BQML usage
+- `SQL_EXAMPLES.md` - Common SQL patterns and examples
+
+## Scripts
+
+Helper scripts for common operations:
+
+- `validate_model.py` - Validate model configuration
+- `export_metrics.py` - Export evaluation metrics to JSON
+
+## Best Practices
+
+1. **Data Splitting**: Use `data_split_method='AUTO_SPLIT'` for an automatic train/test split, or `data_split_method='CUSTOM'` with a split column for reproducible splits (see the sketch after this list)
+2. **Feature Engineering**: BQML handles basic transformations; pre-compute complex features
+3. **Class Imbalance**: Use `auto_class_weights=TRUE` for imbalanced datasets
+4. **Early Stopping**: Enable `early_stop=TRUE` to prevent overfitting
+5. **Regularization**: Use L1/L2 regularization for linear models
+6. **Model Selection**: Start simple (linear models) before complex (boosted trees, DNN)
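+
+A sketch of the custom-split variant referenced in item 1 (table and column names are illustrative); a `BOOL` split column marks the rows held out for evaluation:
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.churn_model_custom_split`
+OPTIONS(
+  model_type='LOGISTIC_REG',
+  input_label_cols=['churned'],
+  data_split_method='CUSTOM',
+  data_split_col='is_eval'  -- TRUE rows are held out for evaluation
+) AS
+SELECT
+  tenure,
+  monthly_charges,
+  churned,
+  -- Deterministic 80/20 split
+  MOD(ABS(FARM_FINGERPRINT(CAST(customer_id AS STRING))), 10) >= 8 AS is_eval
+FROM `project.dataset.customer_data`;
+```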
+
+## Limitations
+
+- Training data must fit in BigQuery (no streaming)
+- Limited deep learning capabilities vs. Vertex AI
+- Some model types require specific data formats
+- Hyperparameter tuning has limited search space
diff --git a/src/google/adk/skills/bqml/references/BEST_PRACTICES.md b/src/google/adk/skills/bqml/references/BEST_PRACTICES.md
new file mode 100644
index 0000000000..dbd831a843
--- /dev/null
+++ b/src/google/adk/skills/bqml/references/BEST_PRACTICES.md
@@ -0,0 +1,120 @@
+# BQML Best Practices
+
+## Data Preparation
+
+### Feature Engineering
+- **Normalize numeric features**: Use `ML.STANDARD_SCALER` or `ML.MIN_MAX_SCALER` (see the `TRANSFORM` sketch after this list)
+- **Handle missing values**: BQML handles NULLs automatically, but explicit imputation may improve results
+- **Encode categoricals**: BQML auto-encodes, but one-hot encoding can help for high-cardinality features
+- **Create interaction features**: Combine related features when domain knowledge suggests
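+
+A minimal `TRANSFORM` sketch for the scalers mentioned above (table and column names are illustrative); transformations declared here are applied automatically at prediction time:
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.scaled_model`
+TRANSFORM(
+  ML.STANDARD_SCALER(tenure) OVER() AS tenure_scaled,
+  ML.MIN_MAX_SCALER(monthly_charges) OVER() AS charges_scaled,
+  churned
+)
+OPTIONS(
+  model_type='LOGISTIC_REG',
+  input_label_cols=['churned']
+) AS
+SELECT tenure, monthly_charges, churned
+FROM `project.dataset.customer_data`
+```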
+
+### Data Quality
+- Remove duplicates before training
+- Handle outliers appropriately (cap, remove, or transform)
+- Ensure sufficient training data (rule of thumb: 10x features minimum)
+- Check for data leakage from target to features
+
+### Train/Test Split
+```sql
+-- Use a hash-based split for reproducibility
+SELECT *,
+ MOD(ABS(FARM_FINGERPRINT(CAST(id AS STRING))), 10) AS split_group
+FROM table
+-- split_group < 8 for training, >= 8 for evaluation
+```
+
+## Model Selection
+
+### Choose the Right Model Type
+| Problem Type | Recommended Models |
+|-------------|-------------------|
+| Binary classification | LOGISTIC_REG, BOOSTED_TREE_CLASSIFIER |
+| Multi-class | LOGISTIC_REG (with auto_class_weights), DNN_CLASSIFIER |
+| Regression | LINEAR_REG, BOOSTED_TREE_REGRESSOR |
+| Time series | ARIMA_PLUS |
+| Clustering | KMEANS |
+| Recommendations | MATRIX_FACTORIZATION |
+
+### Start Simple
+1. Begin with linear models (fast, interpretable)
+2. Move to boosted trees if linear underperforms
+3. Use deep learning only when data volume justifies complexity
+
+## Hyperparameter Tuning
+
+### Automated Tuning
+```sql
+CREATE OR REPLACE MODEL `project.dataset.model`
+OPTIONS(
+ model_type='BOOSTED_TREE_CLASSIFIER',
+ num_trials=20,
+ max_parallel_trials=5,
+ hparam_tuning_objectives=['ROC_AUC']
+) AS
+SELECT * FROM training_data
+```
+
+### Key Parameters by Model Type
+
+**Boosted Trees**:
+- `num_parallel_tree`: 1-10 (start with 1)
+- `max_iterations`: 20-500
+- `learn_rate`: 0.01-0.3
+- `subsample`: 0.5-1.0
+
+**DNN**:
+- `hidden_units`: Start with [64, 32] or [128, 64, 32]
+- `dropout`: 0.1-0.5
+- `batch_size`: 256-4096
+
+## Evaluation
+
+### Classification Metrics
+- **Precision/Recall**: When classes are imbalanced
+- **ROC AUC**: Overall discriminative ability
+- **Log Loss**: For probability calibration
+- **Confusion Matrix**: Understand error types
+
+### Regression Metrics
+- **RMSE**: Penalizes large errors
+- **MAE**: More robust to outliers
+- **R-squared**: Explained variance
+- **MAPE**: Percentage error interpretation
+
+### Cross-Validation
+BQML does not provide a built-in cross-validation function. Approximate k-fold
+cross-validation by hashing rows into folds and training one model per held-out fold:
+```sql
+-- Assign each row to one of 5 folds
+SELECT *,
+  MOD(ABS(FARM_FINGERPRINT(CAST(id AS STRING))), 5) AS fold
+FROM `project.dataset.data`
+-- Train 5 models, each excluding one fold (e.g. via data_split_method='CUSTOM'),
+-- then evaluate each on its held-out fold and average the metrics
+```
+
+## Production Deployment
+
+### Model Versioning
+- Use descriptive model names with versions: `model_v1`, `model_v2`
+- Document model changes in metadata
+- Keep training queries in version control
+
+### Monitoring
+- Track prediction drift over time (see the sketch after this list)
+- Monitor feature distributions
+- Set up alerts for model performance degradation
+- Retrain on schedule or when metrics decline
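+
+A drift-tracking sketch over a hypothetical scored table (a `churn_predictions` table with `prediction_date` and `churn_probability` columns is assumed):
+
+```sql
+SELECT
+  prediction_date,
+  COUNT(*) AS scored_rows,
+  AVG(churn_probability) AS avg_score,
+  APPROX_QUANTILES(churn_probability, 100)[OFFSET(50)] AS median_score
+FROM `project.dataset.churn_predictions`
+GROUP BY prediction_date
+ORDER BY prediction_date
+```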
+
+### Cost Optimization
+- Use `DATA_SPLIT_METHOD='RANDOM'` for large datasets
+- Limit `max_iterations` during experimentation
+- Use slots reservation for production training
+- Consider batch predictions vs. real-time for cost
+
+## Common Pitfalls
+
+1. **Data Leakage**: Features that contain target information
+2. **Class Imbalance**: Use `auto_class_weights=TRUE` or resampling
+3. **Overfitting**: Monitor train vs. eval metrics, use regularization
+4. **Feature Scaling**: Required for some models (DNN, linear with regularization)
+5. **Timestamp Handling**: Ensure proper time-based splits for time series
diff --git a/src/google/adk/skills/bqml/references/MODEL_TYPES.md b/src/google/adk/skills/bqml/references/MODEL_TYPES.md
new file mode 100644
index 0000000000..6bfc9dc9c0
--- /dev/null
+++ b/src/google/adk/skills/bqml/references/MODEL_TYPES.md
@@ -0,0 +1,142 @@
+# BQML Model Types Reference
+
+## Supervised Learning
+
+### Linear Regression
+- **Type**: `LINEAR_REG`
+- **Use case**: Predict continuous numeric values
+- **Example**: Predict sales, prices, quantities
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.model_name`
+OPTIONS(
+ model_type='LINEAR_REG',
+ input_label_cols=['target_column']
+) AS
+SELECT * FROM `project.dataset.training_data`
+```
+
+### Logistic Regression
+- **Type**: `LOGISTIC_REG`
+- **Use case**: Binary or multiclass classification
+- **Example**: Predict churn, fraud detection
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.model_name`
+OPTIONS(
+ model_type='LOGISTIC_REG',
+ input_label_cols=['label_column'],
+ auto_class_weights=TRUE
+) AS
+SELECT * FROM `project.dataset.training_data`
+```
+
+### Boosted Tree Classifier
+- **Type**: `BOOSTED_TREE_CLASSIFIER`
+- **Use case**: Complex classification with feature interactions
+- **Parameters**: `num_parallel_tree`, `max_iterations`, `learn_rate`
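+
+A minimal training sketch (table and column names are illustrative):
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.model_name`
+OPTIONS(
+  model_type='BOOSTED_TREE_CLASSIFIER',
+  input_label_cols=['label_column'],
+  max_iterations=50,
+  learn_rate=0.1,
+  early_stop=TRUE
+) AS
+SELECT * FROM `project.dataset.training_data`
+```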
+
+### Boosted Tree Regressor
+- **Type**: `BOOSTED_TREE_REGRESSOR`
+- **Use case**: Complex regression with non-linear relationships
+
+### Random Forest Classifier
+- **Type**: `RANDOM_FOREST_CLASSIFIER`
+- **Use case**: Ensemble classification
+
+### Random Forest Regressor
+- **Type**: `RANDOM_FOREST_REGRESSOR`
+- **Use case**: Ensemble regression
+
+### DNN Classifier
+- **Type**: `DNN_CLASSIFIER`
+- **Use case**: Deep learning for classification
+- **Parameters**: `hidden_units`, `dropout`, `batch_size`
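+
+A minimal training sketch (table and column names are illustrative):
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.model_name`
+OPTIONS(
+  model_type='DNN_CLASSIFIER',
+  input_label_cols=['label_column'],
+  hidden_units=[64, 32],
+  dropout=0.2,
+  batch_size=256
+) AS
+SELECT * FROM `project.dataset.training_data`
+```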
+
+### DNN Regressor
+- **Type**: `DNN_REGRESSOR`
+- **Use case**: Deep learning for regression
+
+### Wide & Deep
+- **Type**: `DNN_LINEAR_COMBINED_CLASSIFIER` / `DNN_LINEAR_COMBINED_REGRESSOR`
+- **Use case**: Combines memorization (wide) with generalization (deep)
+
+## Unsupervised Learning
+
+### K-Means Clustering
+- **Type**: `KMEANS`
+- **Use case**: Customer segmentation, anomaly detection
+- **Parameters**: `num_clusters`, `kmeans_init_method`
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.model_name`
+OPTIONS(
+ model_type='KMEANS',
+ num_clusters=5
+) AS
+SELECT feature1, feature2 FROM `project.dataset.data`
+```
+
+### PCA (Principal Component Analysis)
+- **Type**: `PCA`
+- **Use case**: Dimensionality reduction
+- **Parameters**: `num_principal_components`
+
+### Autoencoder
+- **Type**: `AUTOENCODER`
+- **Use case**: Anomaly detection, feature learning
+
+## Time Series
+
+### ARIMA Plus
+- **Type**: `ARIMA_PLUS`
+- **Use case**: Time series forecasting
+- **Parameters**: `time_series_timestamp_col`, `time_series_data_col`, `horizon`
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.model_name`
+OPTIONS(
+ model_type='ARIMA_PLUS',
+ time_series_timestamp_col='date',
+ time_series_data_col='sales',
+ horizon=30
+) AS
+SELECT date, sales FROM `project.dataset.time_series_data`
+```
+
+## Matrix Factorization
+
+### Matrix Factorization
+- **Type**: `MATRIX_FACTORIZATION`
+- **Use case**: Recommendation systems
+- **Parameters**: `user_col`, `item_col`, `rating_col`, `num_factors`
+
+## Imported Models
+
+### TensorFlow
+- **Type**: `TENSORFLOW`
+- **Use case**: Import trained TensorFlow SavedModel
+
+### ONNX
+- **Type**: `ONNX`
+- **Use case**: Import ONNX models
+
+### XGBoost
+- **Type**: `XGBOOST`
+- **Use case**: Import XGBoost models
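+
+A sketch of importing and serving a pre-trained model (the Cloud Storage path is illustrative; ONNX and XGBoost imports follow the same pattern with their respective `model_type`):
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.imported_tf_model`
+OPTIONS(
+  model_type='TENSORFLOW',
+  model_path='gs://bucket/saved_model/*'
+);
+
+-- Serve predictions with the imported model
+SELECT * FROM ML.PREDICT(
+  MODEL `project.dataset.imported_tf_model`,
+  (SELECT * FROM `project.dataset.input_data`)
+)
+```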
+
+## LLM Integration
+
+### Remote Models
+- **Type**: `remote` with `REMOTE_SERVICE_TYPE`
+- **Use case**: Connect to Vertex AI LLMs
+- **Supported**: Gemini, PaLM, Claude (via Model Garden)
+
+```sql
+CREATE OR REPLACE MODEL `project.dataset.llm_model`
+REMOTE WITH CONNECTION `project.region.connection_name`
+OPTIONS(
+ REMOTE_SERVICE_TYPE='CLOUD_AI_LARGE_LANGUAGE_MODEL_V1',
+ endpoint='gemini-pro'
+)
+```
diff --git a/src/google/adk/skills/bqml/references/SQL_EXAMPLES.md b/src/google/adk/skills/bqml/references/SQL_EXAMPLES.md
new file mode 100644
index 0000000000..f805be671f
--- /dev/null
+++ b/src/google/adk/skills/bqml/references/SQL_EXAMPLES.md
@@ -0,0 +1,197 @@
+# BQML SQL Examples
+
+## Creating Models
+
+### Classification Model
+```sql
+CREATE OR REPLACE MODEL `project.dataset.churn_model`
+OPTIONS(
+ model_type='BOOSTED_TREE_CLASSIFIER',
+ input_label_cols=['churned'],
+ auto_class_weights=TRUE,
+ max_iterations=50,
+ early_stop=TRUE
+) AS
+SELECT
+ customer_id,
+ tenure_months,
+ monthly_charges,
+ total_charges,
+ contract_type,
+ payment_method,
+ churned
+FROM `project.dataset.customer_data`
+WHERE data_split = 'TRAIN'
+```
+
+### Regression Model
+```sql
+CREATE OR REPLACE MODEL `project.dataset.sales_forecast`
+OPTIONS(
+ model_type='LINEAR_REG',
+ input_label_cols=['sales'],
+ l2_reg=0.1,
+ max_iterations=100
+) AS
+SELECT
+ product_category,
+ region,
+ month,
+ marketing_spend,
+ sales
+FROM `project.dataset.sales_data`
+```
+
+### Time Series Forecasting
+```sql
+CREATE OR REPLACE MODEL `project.dataset.demand_forecast`
+OPTIONS(
+ model_type='ARIMA_PLUS',
+ time_series_timestamp_col='date',
+ time_series_data_col='demand',
+ auto_arima=TRUE,
+ horizon=30,
+ holiday_region='US'
+) AS
+SELECT
+ date,
+ demand
+FROM `project.dataset.daily_demand`
+```
+
+### Clustering
+```sql
+CREATE OR REPLACE MODEL `project.dataset.customer_segments`
+OPTIONS(
+ model_type='KMEANS',
+ num_clusters=5,
+ standardize_features=TRUE
+) AS
+SELECT
+ recency,
+ frequency,
+ monetary_value
+FROM `project.dataset.rfm_data`
+```
+
+## Evaluating Models
+
+### Get Evaluation Metrics
+```sql
+SELECT *
+FROM ML.EVALUATE(MODEL `project.dataset.churn_model`,
+ (SELECT * FROM `project.dataset.customer_data` WHERE data_split = 'TEST')
+)
+```
+
+### Confusion Matrix
+```sql
+SELECT *
+FROM ML.CONFUSION_MATRIX(MODEL `project.dataset.churn_model`,
+ (SELECT * FROM `project.dataset.customer_data` WHERE data_split = 'TEST')
+)
+```
+
+### ROC Curve
+```sql
+SELECT *
+FROM ML.ROC_CURVE(MODEL `project.dataset.churn_model`,
+ (SELECT * FROM `project.dataset.customer_data` WHERE data_split = 'TEST')
+)
+```
+
+### Feature Importance
+```sql
+SELECT *
+FROM ML.FEATURE_IMPORTANCE(MODEL `project.dataset.churn_model`)
+ORDER BY importance_weight DESC
+```
+
+## Making Predictions
+
+### Basic Prediction
+```sql
+SELECT *
+FROM ML.PREDICT(MODEL `project.dataset.churn_model`,
+ (SELECT * FROM `project.dataset.new_customers`)
+)
+```
+
+### Prediction with Threshold
+```sql
+SELECT
+ customer_id,
+ predicted_churned,
+ predicted_churned_probs[OFFSET(1)].prob AS churn_probability
+FROM ML.PREDICT(MODEL `project.dataset.churn_model`,
+ (SELECT * FROM `project.dataset.new_customers`)
+)
+WHERE predicted_churned_probs[OFFSET(1)].prob > 0.7
+```
+
+### Explainable Predictions
+```sql
+SELECT *
+FROM ML.EXPLAIN_PREDICT(MODEL `project.dataset.churn_model`,
+ (SELECT * FROM `project.dataset.new_customers`),
+ STRUCT(3 AS top_k_features)
+)
+```
+
+### Time Series Forecast
+```sql
+SELECT *
+FROM ML.FORECAST(MODEL `project.dataset.demand_forecast`,
+ STRUCT(30 AS horizon, 0.95 AS confidence_level)
+)
+```
+
+## Advanced Patterns
+
+### Batch Scoring with Partitioning
+```sql
+CREATE OR REPLACE TABLE `project.dataset.predictions`
+PARTITION BY DATE(prediction_date)
+AS
+SELECT
+ CURRENT_DATE() AS prediction_date,
+ p.*
+FROM ML.PREDICT(MODEL `project.dataset.model`,
+ (SELECT * FROM `project.dataset.input_data`)
+) p
+```
+
+### Model Comparison
+```sql
+WITH model_metrics AS (
+ SELECT 'model_v1' AS model, * FROM ML.EVALUATE(MODEL `project.dataset.model_v1`)
+ UNION ALL
+ SELECT 'model_v2' AS model, * FROM ML.EVALUATE(MODEL `project.dataset.model_v2`)
+)
+SELECT model, precision, recall, f1_score, roc_auc
+FROM model_metrics
+```
+
+### Incremental Training
+```sql
+CREATE OR REPLACE MODEL `project.dataset.model`
+OPTIONS(
+ model_type='BOOSTED_TREE_CLASSIFIER',
+ warm_start=TRUE -- Continue from existing model
+) AS
+SELECT * FROM `project.dataset.new_training_data`
+```
+
+### Transform at Prediction Time
+```sql
+SELECT *
+FROM ML.PREDICT(MODEL `project.dataset.model`,
+ (
+ SELECT
+ IFNULL(feature1, 0) AS feature1,
+ LOG(feature2 + 1) AS feature2,
+ LOWER(category) AS category
+ FROM `project.dataset.raw_input`
+ )
+)
+```
diff --git a/src/google/adk/skills/bqml/scripts/export_metrics.py b/src/google/adk/skills/bqml/scripts/export_metrics.py
new file mode 100644
index 0000000000..40d024564f
--- /dev/null
+++ b/src/google/adk/skills/bqml/scripts/export_metrics.py
@@ -0,0 +1,188 @@
+#!/usr/bin/env python3
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Export BQML model evaluation metrics to JSON format."""
+
+import argparse
+from datetime import datetime
+import json
+import sys
+
+
+def format_metrics(raw_metrics: dict, model_name: str = None) -> dict:
+ """Format evaluation metrics for export.
+
+ Args:
+ raw_metrics: Raw metrics from ML.EVALUATE
+ model_name: Optional model name to include
+
+ Returns:
+ Formatted metrics dictionary
+ """
+ formatted = {
+ "timestamp": datetime.utcnow().isoformat() + "Z",
+ "model_name": model_name,
+ "metrics": {},
+ }
+
+ # Classification metrics
+ classification_keys = [
+ "precision",
+ "recall",
+ "accuracy",
+ "f1_score",
+ "log_loss",
+ "roc_auc",
+ ]
+
+ # Regression metrics
+ regression_keys = [
+ "mean_absolute_error",
+ "mean_squared_error",
+ "mean_squared_log_error",
+ "median_absolute_error",
+ "r2_score",
+ "explained_variance",
+ ]
+
+ # Clustering metrics
+ clustering_keys = [
+ "davies_bouldin_index",
+ "mean_squared_distance",
+ ]
+
+ all_keys = classification_keys + regression_keys + clustering_keys
+
+ for key in all_keys:
+ if key in raw_metrics:
+ value = raw_metrics[key]
+ # Round floats for readability
+ if isinstance(value, float):
+ value = round(value, 6)
+ formatted["metrics"][key] = value
+
+ # Determine model type from available metrics
+ if any(k in raw_metrics for k in ["roc_auc", "precision", "recall"]):
+ formatted["model_category"] = "classification"
+ elif any(k in raw_metrics for k in ["mean_squared_error", "r2_score"]):
+ formatted["model_category"] = "regression"
+ elif any(k in raw_metrics for k in ["davies_bouldin_index"]):
+ formatted["model_category"] = "clustering"
+ else:
+ formatted["model_category"] = "unknown"
+
+ return formatted
+
+
+def summarize_metrics(metrics: dict) -> str:
+ """Create a human-readable summary of metrics.
+
+ Args:
+ metrics: Formatted metrics dictionary
+
+ Returns:
+ Summary string
+ """
+ lines = []
+ lines.append(f"Model: {metrics.get('model_name', 'Unknown')}")
+ lines.append(f"Category: {metrics.get('model_category', 'Unknown')}")
+ lines.append(f"Evaluated: {metrics.get('timestamp', 'Unknown')}")
+ lines.append("")
+
+ m = metrics.get("metrics", {})
+
+ if metrics.get("model_category") == "classification":
+ lines.append("Classification Metrics:")
+ if "accuracy" in m:
+ lines.append(f" Accuracy: {m['accuracy']:.4f}")
+ if "precision" in m:
+ lines.append(f" Precision: {m['precision']:.4f}")
+ if "recall" in m:
+ lines.append(f" Recall: {m['recall']:.4f}")
+ if "f1_score" in m:
+ lines.append(f" F1 Score: {m['f1_score']:.4f}")
+ if "roc_auc" in m:
+ lines.append(f" ROC AUC: {m['roc_auc']:.4f}")
+
+ elif metrics.get("model_category") == "regression":
+ lines.append("Regression Metrics:")
+ if "r2_score" in m:
+ lines.append(f" R² Score: {m['r2_score']:.4f}")
+ if "mean_absolute_error" in m:
+ lines.append(f" MAE: {m['mean_absolute_error']:.4f}")
+ if "mean_squared_error" in m:
+ lines.append(f" MSE: {m['mean_squared_error']:.4f}")
+
+ elif metrics.get("model_category") == "clustering":
+ lines.append("Clustering Metrics:")
+ if "davies_bouldin_index" in m:
+ lines.append(f" Davies-Bouldin Index: {m['davies_bouldin_index']:.4f}")
+ if "mean_squared_distance" in m:
+ lines.append(f" Mean Squared Distance: {m['mean_squared_distance']:.4f}")
+
+ return "\n".join(lines)
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Export BQML evaluation metrics to JSON"
+ )
+ parser.add_argument(
+ "--metrics",
+ type=str,
+ required=True,
+ help="JSON string with raw metrics from ML.EVALUATE",
+ )
+ parser.add_argument(
+ "--model-name",
+ type=str,
+ help="Model name to include in output",
+ )
+ parser.add_argument(
+ "--output",
+ type=str,
+ help="Output file path (default: stdout)",
+ )
+ parser.add_argument(
+ "--summary",
+ action="store_true",
+ help="Print human-readable summary instead of JSON",
+ )
+
+ args = parser.parse_args()
+
+ try:
+ raw_metrics = json.loads(args.metrics)
+ except json.JSONDecodeError as e:
+ print(f"Error parsing metrics JSON: {e}", file=sys.stderr)
+ sys.exit(1)
+
+ formatted = format_metrics(raw_metrics, args.model_name)
+
+ if args.summary:
+ output = summarize_metrics(formatted)
+ else:
+ output = json.dumps(formatted, indent=2)
+
+ if args.output:
+ with open(args.output, "w") as f:
+ f.write(output)
+ print(f"Metrics exported to: {args.output}")
+ else:
+ print(output)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/src/google/adk/skills/bqml/scripts/validate_model.py b/src/google/adk/skills/bqml/scripts/validate_model.py
new file mode 100644
index 0000000000..68d0f64634
--- /dev/null
+++ b/src/google/adk/skills/bqml/scripts/validate_model.py
@@ -0,0 +1,189 @@
+#!/usr/bin/env python3
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Validate BQML model configuration before training."""
+
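+# Illustrative CLI usage (flags match the argparse setup in main() below;
+# the model type, label column, and config values are hypothetical):
+#   python validate_model.py --model-type BOOSTED_TREE_CLASSIFIER --label-cols churned
+#   python validate_model.py --config '{"model_type": "KMEANS", "options": {"num_clusters": 4}}'
+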
+import argparse
+import json
+import sys
+
+VALID_MODEL_TYPES = {
+ "LINEAR_REG",
+ "LOGISTIC_REG",
+ "KMEANS",
+ "MATRIX_FACTORIZATION",
+ "PCA",
+ "AUTOENCODER",
+ "DNN_CLASSIFIER",
+ "DNN_REGRESSOR",
+ "DNN_LINEAR_COMBINED_CLASSIFIER",
+ "DNN_LINEAR_COMBINED_REGRESSOR",
+ "BOOSTED_TREE_CLASSIFIER",
+ "BOOSTED_TREE_REGRESSOR",
+ "RANDOM_FOREST_CLASSIFIER",
+ "RANDOM_FOREST_REGRESSOR",
+ "ARIMA_PLUS",
+ "TENSORFLOW",
+ "ONNX",
+ "XGBOOST",
+}
+
+SUPERVISED_TYPES = {
+ "LINEAR_REG",
+ "LOGISTIC_REG",
+ "DNN_CLASSIFIER",
+ "DNN_REGRESSOR",
+ "DNN_LINEAR_COMBINED_CLASSIFIER",
+ "DNN_LINEAR_COMBINED_REGRESSOR",
+ "BOOSTED_TREE_CLASSIFIER",
+ "BOOSTED_TREE_REGRESSOR",
+ "RANDOM_FOREST_CLASSIFIER",
+ "RANDOM_FOREST_REGRESSOR",
+}
+
+
+def validate_model_config(config: dict) -> dict:
+ """Validate model configuration.
+
+ Args:
+ config: Model configuration dictionary with keys:
+ - model_type: Type of model
+ - input_label_cols: Label column(s) for supervised learning
+ - features: List of feature columns
+ - options: Additional model options
+
+ Returns:
+ Validation result with 'valid' boolean and 'errors'/'warnings' lists.
+ """
+ errors = []
+ warnings = []
+
+ # Check model type
+  model_type = (config.get("model_type") or "").upper()
+ if not model_type:
+ errors.append("model_type is required")
+ elif model_type not in VALID_MODEL_TYPES:
+ errors.append(
+ f"Invalid model_type: {model_type}. "
+ f"Valid types: {', '.join(sorted(VALID_MODEL_TYPES))}"
+ )
+
+ # Check label columns for supervised learning
+ if model_type in SUPERVISED_TYPES:
+ label_cols = config.get("input_label_cols", [])
+ if not label_cols:
+ errors.append(
+ f"input_label_cols required for supervised model type: {model_type}"
+ )
+
+ # Check features
+ features = config.get("features", [])
+ if not features:
+ warnings.append("No features specified - will use all columns from query")
+
+ # Validate options
+ options = config.get("options", {})
+
+ # Check max_iterations
+ max_iter = options.get("max_iterations")
+ if max_iter is not None:
+ if max_iter < 1:
+ errors.append("max_iterations must be >= 1")
+ elif max_iter > 500:
+ warnings.append(
+ f"max_iterations={max_iter} is high - consider starting lower"
+ )
+
+ # Check learn_rate for boosted trees
+ if "BOOSTED" in model_type:
+ learn_rate = options.get("learn_rate")
+ if learn_rate is not None:
+ if learn_rate <= 0 or learn_rate > 1:
+ errors.append("learn_rate must be in (0, 1]")
+ elif learn_rate > 0.3:
+ warnings.append(
+ f"learn_rate={learn_rate} is high - may cause overfitting"
+ )
+
+ # Check num_clusters for KMEANS
+ if model_type == "KMEANS":
+ num_clusters = options.get("num_clusters")
+ if num_clusters is None:
+ warnings.append("num_clusters not specified - will use default")
+ elif num_clusters < 2:
+ errors.append("num_clusters must be >= 2")
+
+ return {
+ "valid": len(errors) == 0,
+ "errors": errors,
+ "warnings": warnings,
+ "model_type": model_type,
+ }
+
+
+def main():
+ parser = argparse.ArgumentParser(
+ description="Validate BQML model configuration"
+ )
+ parser.add_argument(
+ "--config",
+ type=str,
+ help="JSON string or file path with model configuration",
+ )
+ parser.add_argument(
+ "--model-type",
+ type=str,
+ help="Model type (alternative to --config)",
+ )
+ parser.add_argument(
+ "--label-cols",
+ type=str,
+ nargs="+",
+ help="Label columns for supervised learning",
+ )
+
+ args = parser.parse_args()
+
+ # Build config from arguments
+ if args.config:
+ try:
+ # Try as JSON string first
+ config = json.loads(args.config)
+ except json.JSONDecodeError:
+ # Try as file path
+ try:
+ with open(args.config) as f:
+ config = json.load(f)
+      except FileNotFoundError:
+        print(f"Error: Config file not found: {args.config}", file=sys.stderr)
+        sys.exit(1)
+      except json.JSONDecodeError as e:
+        print(f"Error: Invalid JSON in config file {args.config}: {e}", file=sys.stderr)
+        sys.exit(1)
+ else:
+ config = {
+ "model_type": args.model_type or "",
+ "input_label_cols": args.label_cols or [],
+ }
+
+ # Validate
+ result = validate_model_config(config)
+
+ # Output result
+ print(json.dumps(result, indent=2))
+
+ # Exit with error code if invalid
+ sys.exit(0 if result["valid"] else 1)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/src/google/adk/skills/markdown_skill.py b/src/google/adk/skills/markdown_skill.py
new file mode 100644
index 0000000000..0bb3165d08
--- /dev/null
+++ b/src/google/adk/skills/markdown_skill.py
@@ -0,0 +1,599 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""MarkdownSkill implementation for Agent Skills standard SKILL.md format.
+
+This module provides support for loading skills defined using the Agent Skills
+standard (https://agentskills.io), enabling skills built with SKILL.md files
+to be used directly as ADK Skills.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+import re
+from typing import Any
+from typing import Dict
+from typing import List
+from typing import Optional
+from typing import Union
+
+from pydantic import BaseModel
+from pydantic import ConfigDict
+from pydantic import Field
+from pydantic import PrivateAttr
+
+from ..utils.feature_decorator import experimental
+from .base_skill import BaseSkill
+from .base_skill import SkillConfig
+
+
+class MarkdownSkillMetadata(BaseModel):
+ """Metadata extracted from SKILL.md frontmatter.
+
+ Follows the Agent Skills standard specification for frontmatter fields.
+ """
+
+ model_config = ConfigDict(extra="allow")
+
+ name: str = Field(
+ description="Unique identifier for the skill (max 64 chars).",
+ )
+ description: str = Field(
+ description="LLM-readable description of what the skill does.",
+ )
+ license: Optional[str] = Field(
+ default=None,
+ description="License name or reference to bundled license file.",
+ )
+ compatibility: Optional[str] = Field(
+ default=None,
+ description="Environment requirements (max 500 chars).",
+ )
+ metadata: Dict[str, Any] = Field(
+ default_factory=dict,
+ description="Additional metadata (author, version, etc.).",
+ )
+ allowed_tools: Optional[str] = Field(
+ default=None,
+ description="Space-delimited list of pre-approved tools.",
+ alias="allowed-tools",
+ )
+
+ # ADK-specific extensions
+ adk: Optional[Dict[str, Any]] = Field(
+ default=None,
+ description="ADK-specific configuration extensions.",
+ )
+
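+# Illustrative SKILL.md frontmatter that maps onto MarkdownSkillMetadata
+# (the values shown are hypothetical):
+#
+#   ---
+#   name: pdf-processing
+#   description: Extract text and form fields from PDF files.
+#   license: Apache-2.0
+#   compatibility: Requires python3 and network access
+#   allowed-tools: bash python
+#   metadata:
+#     version: "1.0.0"
+#   ---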
+
+@experimental
+class MarkdownSkill(BaseSkill):
+ """Skill loaded from Agent Skills standard SKILL.md format.
+
+ Supports progressive disclosure with three loading stages:
+ - Stage 1 (Discovery): Only name and description loaded (~100 tokens)
+ - Stage 2 (Activation): Full SKILL.md content loaded (~2000-5000 tokens)
+ - Stage 3 (Execution): Scripts and references loaded on-demand
+
+ Example:
+ ```python
+ # Load a skill from directory
+ skill = MarkdownSkill.from_directory("/path/to/pdf-processing")
+
+ # Stage 1: Metadata only (loaded at construction)
+ print(skill.name) # "pdf-processing"
+ print(skill.description) # "Extract text from PDFs..."
+
+ # Stage 2: Full instructions (loaded on demand)
+ instructions = skill.get_instructions()
+
+ # Stage 3: Access scripts/references (loaded on demand)
+ script = skill.get_script("extract_text.py")
+ reference = skill.get_reference("FORMS.md")
+ ```
+ """
+
+ model_config = ConfigDict(
+ extra="forbid",
+ arbitrary_types_allowed=True,
+ )
+
+ # Path to the skill directory
+ skill_path: Path = Field(
+ description="Absolute path to the skill directory.",
+ )
+
+ # Parsed frontmatter metadata
+ skill_metadata: MarkdownSkillMetadata = Field(
+ description="Parsed metadata from SKILL.md frontmatter.",
+ )
+
+ # Private attributes for caching (not part of model)
+ _instructions_cache: Optional[str] = PrivateAttr(default=None)
+ _scripts_cache: Dict[str, str] = PrivateAttr(default_factory=dict)
+ _references_cache: Dict[str, str] = PrivateAttr(default_factory=dict)
+ _current_stage: int = PrivateAttr(default=1)
+
+ @classmethod
+ def from_directory(
+ cls,
+ skill_dir: Union[str, Path],
+ validate_name: bool = True,
+ ) -> "MarkdownSkill":
+ """Load a skill from a directory containing SKILL.md.
+
+ Args:
+ skill_dir: Path to the skill directory.
+ validate_name: Whether to validate that name matches directory name.
+
+ Returns:
+ MarkdownSkill instance with Stage 1 (metadata) loaded.
+
+ Raises:
+ FileNotFoundError: If SKILL.md doesn't exist.
+ ValueError: If frontmatter is invalid or name doesn't match directory.
+ """
+ skill_path = Path(skill_dir).resolve()
+ skill_md_path = skill_path / "SKILL.md"
+
+ if not skill_md_path.exists():
+ raise FileNotFoundError(f"SKILL.md not found in {skill_dir}")
+
+ # Parse only frontmatter for Stage 1
+ content = skill_md_path.read_text(encoding="utf-8")
+ metadata = cls._parse_frontmatter(content)
+
+ # Validate name matches directory (optional but recommended)
+ if validate_name and metadata.name != skill_path.name:
+ raise ValueError(
+ f"Skill name '{metadata.name}' must match "
+ f"directory name '{skill_path.name}'"
+ )
+
+ # Build configuration from metadata
+ config = cls._build_config(metadata)
+
+ # Determine allowed_callers
+ allowed_callers = ["code_execution_20250825"]
+ if metadata.adk and "allowed_callers" in metadata.adk:
+ allowed_callers = metadata.adk["allowed_callers"]
+
+ return cls(
+ name=metadata.name,
+ description=metadata.description,
+ skill_path=skill_path,
+ skill_metadata=metadata,
+ config=config,
+ allowed_callers=allowed_callers,
+ )
+
+ @staticmethod
+ def _parse_frontmatter(content: str) -> MarkdownSkillMetadata:
+ """Parse YAML frontmatter from SKILL.md content.
+
+ Args:
+ content: Full content of the SKILL.md file.
+
+ Returns:
+ MarkdownSkillMetadata with parsed values.
+
+ Raises:
+ ValueError: If frontmatter is missing or invalid.
+ """
+ import yaml
+
+ if not content.startswith("---"):
+ raise ValueError("SKILL.md must start with YAML frontmatter (---)")
+
+ # Split frontmatter from body
+ parts = content.split("---", 2)
+ if len(parts) < 3:
+ raise ValueError("Invalid frontmatter format. Expected '---' delimiters.")
+
+ frontmatter_yaml = parts[1].strip()
+ if not frontmatter_yaml:
+ raise ValueError("Empty frontmatter")
+
+ try:
+ frontmatter = yaml.safe_load(frontmatter_yaml)
+ except yaml.YAMLError as e:
+ raise ValueError(f"Invalid YAML in frontmatter: {e}") from e
+
+ if not isinstance(frontmatter, dict):
+ raise ValueError("Frontmatter must be a YAML mapping")
+
+ # Validate required fields
+ if "name" not in frontmatter:
+ raise ValueError("Missing required field 'name' in frontmatter")
+ if "description" not in frontmatter:
+ raise ValueError("Missing required field 'description' in frontmatter")
+
+ return MarkdownSkillMetadata(**frontmatter)
+
+ @staticmethod
+ def _build_config(metadata: MarkdownSkillMetadata) -> SkillConfig:
+ """Build SkillConfig from metadata.
+
+ Args:
+ metadata: Parsed frontmatter metadata.
+
+ Returns:
+ SkillConfig with appropriate settings.
+ """
+ config_kwargs = {}
+
+ # Check for ADK-specific config
+ if metadata.adk and "config" in metadata.adk:
+ adk_config = metadata.adk["config"]
+ if "max_parallel_calls" in adk_config:
+ config_kwargs["max_parallel_calls"] = adk_config["max_parallel_calls"]
+ if "timeout_seconds" in adk_config:
+ config_kwargs["timeout_seconds"] = adk_config["timeout_seconds"]
+ if "allow_network" in adk_config:
+ config_kwargs["allow_network"] = adk_config["allow_network"]
+ if "memory_limit_mb" in adk_config:
+ config_kwargs["memory_limit_mb"] = adk_config["memory_limit_mb"]
+
+ # Parse compatibility for network requirements
+ if metadata.compatibility:
+ compat_lower = metadata.compatibility.lower()
+ if "network" in compat_lower or "internet" in compat_lower:
+ config_kwargs.setdefault("allow_network", True)
+
+ return SkillConfig(**config_kwargs)
+
+ # ===========================================================================
+ # Progressive Disclosure Implementation
+ # ===========================================================================
+
+ def get_instructions(self) -> str:
+ """Get full SKILL.md instructions (Stage 2).
+
+ Loads and caches the markdown body on first access.
+
+ Returns:
+ The markdown content after the frontmatter.
+ """
+ if self._instructions_cache is None:
+ skill_md_path = self.skill_path / "SKILL.md"
+ content = skill_md_path.read_text(encoding="utf-8")
+
+ # Extract body after frontmatter
+ parts = content.split("---", 2)
+ self._instructions_cache = parts[2].strip() if len(parts) > 2 else ""
+ self._current_stage = max(self._current_stage, 2)
+
+ return self._instructions_cache
+
+ def get_script(self, script_name: str) -> Optional[str]:
+ """Get script content from scripts/ directory (Stage 3).
+
+ Args:
+ script_name: Name of the script file.
+
+ Returns:
+ Script content or None if not found.
+ """
+ if script_name not in self._scripts_cache:
+ script_path = self.skill_path / "scripts" / script_name
+ if script_path.exists() and script_path.is_file():
+ self._scripts_cache[script_name] = script_path.read_text(
+ encoding="utf-8"
+ )
+ self._current_stage = 3
+ else:
+ return None
+
+ return self._scripts_cache.get(script_name)
+
+ def get_script_path(self, script_name: str) -> Optional[Path]:
+ """Get absolute path to a script file.
+
+ Args:
+ script_name: Name of the script file.
+
+ Returns:
+ Absolute Path or None if not found.
+ """
+ script_path = self.skill_path / "scripts" / script_name
+ if script_path.exists() and script_path.is_file():
+ return script_path
+ return None
+
+ def get_reference(self, ref_name: str) -> Optional[str]:
+ """Get reference content from references/ directory (Stage 3).
+
+ Args:
+ ref_name: Name of the reference file.
+
+ Returns:
+ Reference content or None if not found.
+ """
+ if ref_name not in self._references_cache:
+ ref_path = self.skill_path / "references" / ref_name
+ if ref_path.exists() and ref_path.is_file():
+ self._references_cache[ref_name] = ref_path.read_text(encoding="utf-8")
+ self._current_stage = 3
+ else:
+ return None
+
+ return self._references_cache.get(ref_name)
+
+ def get_asset_path(self, asset_name: str) -> Optional[Path]:
+ """Get absolute path to an asset file (Stage 3).
+
+ Args:
+ asset_name: Relative path within assets/ directory.
+
+ Returns:
+ Absolute Path or None if not found.
+ """
+ asset_path = self.skill_path / "assets" / asset_name
+ if asset_path.exists():
+ self._current_stage = 3
+ return asset_path
+ return None
+
+ def list_scripts(self) -> List[str]:
+ """List available scripts in the skill.
+
+ Returns:
+ List of script file names.
+ """
+ scripts_dir = self.skill_path / "scripts"
+ if scripts_dir.exists() and scripts_dir.is_dir():
+ return sorted([f.name for f in scripts_dir.iterdir() if f.is_file()])
+ return []
+
+ def list_references(self) -> List[str]:
+ """List available references in the skill.
+
+ Returns:
+ List of reference file names.
+ """
+ refs_dir = self.skill_path / "references"
+ if refs_dir.exists() and refs_dir.is_dir():
+ return sorted([f.name for f in refs_dir.iterdir() if f.is_file()])
+ return []
+
+ def list_assets(self) -> List[str]:
+ """List available assets in the skill.
+
+ Returns:
+ List of relative paths to asset files.
+ """
+ assets_dir = self.skill_path / "assets"
+ if assets_dir.exists() and assets_dir.is_dir():
+ return sorted([
+ str(f.relative_to(assets_dir))
+ for f in assets_dir.rglob("*")
+ if f.is_file()
+ ])
+ return []
+
+ @property
+ def current_stage(self) -> int:
+ """Get the current progressive disclosure stage.
+
+ Returns:
+ 1 for Discovery, 2 for Activation, 3 for Execution.
+ """
+ return self._current_stage
+
+ def has_scripts(self) -> bool:
+ """Check if the skill has any scripts."""
+ scripts_dir = self.skill_path / "scripts"
+ return scripts_dir.exists() and any(scripts_dir.iterdir())
+
+ def has_references(self) -> bool:
+ """Check if the skill has any references."""
+ refs_dir = self.skill_path / "references"
+ return refs_dir.exists() and any(refs_dir.iterdir())
+
+ def has_assets(self) -> bool:
+ """Check if the skill has any assets."""
+ assets_dir = self.skill_path / "assets"
+ return assets_dir.exists() and any(assets_dir.rglob("*"))
+
+ # ===========================================================================
+ # BaseSkill Abstract Method Implementations
+ # ===========================================================================
+
+ def get_tool_declarations(self) -> List[dict[str, Any]]:
+ """Return tool declarations extracted from SKILL.md.
+
+ Generates declarations for:
+ - Script-based tools (from scripts/ directory)
+ - Reference loading tools (from references/ directory)
+ - ADK-specific tool declarations (from frontmatter)
+
+ Returns:
+ List of tool declaration dictionaries.
+ """
+ declarations = []
+
+ # Check for ADK-specific tool declarations in frontmatter
+ if self.skill_metadata.adk and "tools" in self.skill_metadata.adk:
+ for tool_def in self.skill_metadata.adk["tools"]:
+ declarations.append({
+ "name": tool_def.get("name", "unknown"),
+ "description": tool_def.get("description", ""),
+ "parameters": tool_def.get("parameters", {}),
+ })
+ return declarations
+
+ # Auto-generate declarations from scripts
+ for script_name in self.list_scripts():
+ script_path = self.skill_path / "scripts" / script_name
+ docstring = self._extract_script_docstring(script_path)
+
+ # Create safe tool name
+ safe_name = script_name.replace(".", "_").replace("-", "_")
+
+ declarations.append({
+ "name": f"run_{safe_name}",
+ "description": docstring or f"Execute {script_name}",
+ "parameters": {"args": "Command-line arguments for the script"},
+ })
+
+ # Add reference loading declarations
+ for ref_name in self.list_references():
+ safe_name = ref_name.replace(".", "_").replace("-", "_")
+ declarations.append({
+ "name": f"load_{safe_name}",
+ "description": f"Load reference document: {ref_name}",
+ })
+
+ return declarations
+
+ def get_orchestration_template(self) -> str:
+ """Return example orchestration code for this skill.
+
+ Generates a template based on available scripts and tools.
+
+ Returns:
+ A Python async function as a string showing example usage.
+ """
+ safe_name = self.name.replace("-", "_").replace(".", "_")
+ scripts = self.list_scripts()
+
+ if not scripts:
+ return f'''async def use_{safe_name}(tools):
+ """Example orchestration for {self.name} skill.
+
+ This skill provides instructions but no bundled scripts.
+ Follow the instructions in the SKILL.md file.
+ """
+ return {{"status": "ready", "skill": "{self.name}"}}
+'''
+
+ # Generate script calls for first few scripts
+ script_lines = []
+ for i, script in enumerate(scripts[:3]):
+ safe_script = script.replace(".", "_").replace("-", "_")
+ script_lines.append(
+ f' result_{i} = await tools.run_{safe_script}(args="")'
+ )
+
+ script_calls = "\n".join(script_lines)
+ result_list = ", ".join(f"result_{i}" for i in range(min(3, len(scripts))))
+
+ return f'''async def use_{safe_name}(tools):
+ """Example orchestration for {self.name} skill.
+
+ Available scripts: {", ".join(scripts)}
+ """
+{script_calls}
+ return {{"results": [{result_list}]}}
+'''
+
+ def get_skill_prompt(self) -> str:
+ """Generate LLM-friendly skill description with progressive detail.
+
+ Returns:
+ Formatted string describing the skill and its capabilities.
+ """
+ base_prompt = super().get_skill_prompt()
+
+ # Add available resources
+ resources = []
+
+ scripts = self.list_scripts()
+ if scripts:
+ resources.append(f"Scripts: {', '.join(scripts)}")
+
+ refs = self.list_references()
+ if refs:
+ resources.append(f"References: {', '.join(refs)}")
+
+ assets = self.list_assets()
+ if assets:
+ # Only show first few assets to avoid overwhelming
+ shown_assets = assets[:5]
+ if len(assets) > 5:
+ shown_assets.append(f"... and {len(assets) - 5} more")
+ resources.append(f"Assets: {', '.join(shown_assets)}")
+
+ if resources:
+ base_prompt += "\n\nAvailable resources:\n" + "\n".join(
+ f" - {r}" for r in resources
+ )
+
+ # Add compatibility info if present
+ if self.skill_metadata.compatibility:
+ base_prompt += f"\n\nRequirements: {self.skill_metadata.compatibility}"
+
+ return base_prompt
+
+ @staticmethod
+ def _extract_script_docstring(script_path: Path) -> Optional[str]:
+ """Extract docstring from a Python script.
+
+ Args:
+ script_path: Path to the script file.
+
+ Returns:
+ First line of docstring or None if not found/not Python.
+ """
+ if script_path.suffix != ".py":
+ # For non-Python scripts, try to extract from comments
+ if script_path.suffix in (".sh", ".bash"):
+ return MarkdownSkill._extract_shell_description(script_path)
+ return None
+
+ try:
+ content = script_path.read_text(encoding="utf-8")
+      # Try to extract the module docstring. Scripts commonly start with a
+      # shebang and license header, so search for the first docstring rather
+      # than anchoring at the start of the file.
+      # Handle both """ and ''' styles.
+      for quote in ['"""', "'''"]:
+        pattern = rf"{quote}(.+?){quote}"
+        match = re.search(pattern, content, re.DOTALL)
+ if match:
+ docstring = match.group(1).strip()
+ # Return first line only
+ return docstring.split("\n")[0].strip()
+ except Exception:
+ pass
+
+ return None
+
+ @staticmethod
+ def _extract_shell_description(script_path: Path) -> Optional[str]:
+ """Extract description from shell script comments.
+
+ Args:
+ script_path: Path to the shell script.
+
+ Returns:
+ Description from comment or None if not found.
+ """
+ try:
+ content = script_path.read_text(encoding="utf-8")
+ lines = content.split("\n")
+
+ for line in lines:
+ line = line.strip()
+ # Skip shebang and empty lines
+ if line.startswith("#!") or not line:
+ continue
+ # Found a comment line
+ if line.startswith("#"):
+ return line[1:].strip()
+ # Found non-comment, stop looking
+ break
+ except Exception:
+ pass
+
+ return None
diff --git a/src/google/adk/skills/script_executor.py b/src/google/adk/skills/script_executor.py
new file mode 100644
index 0000000000..bb11982e27
--- /dev/null
+++ b/src/google/adk/skills/script_executor.py
@@ -0,0 +1,499 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""ScriptExecutor for safely executing scripts from Agent Skills bundles.
+
+This module provides sandboxed execution of Python, Bash, and JavaScript
+scripts bundled with skills.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+from pathlib import Path
+import shutil
+import time
+from typing import ClassVar
+from typing import Dict
+from typing import List
+from typing import Optional
+from typing import Union
+
+from pydantic import BaseModel
+from pydantic import ConfigDict
+from pydantic import Field
+
+from ..utils.feature_decorator import experimental
+
+logger = logging.getLogger("google_adk." + __name__)
+
+
+class ScriptExecutionResult(BaseModel):
+ """Result from script execution.
+
+ Contains the output, errors, and execution metadata from running
+ a script.
+ """
+
+ model_config = ConfigDict(extra="forbid")
+
+ success: bool = Field(
+ description="Whether the script executed successfully (return code 0).",
+ )
+ stdout: str = Field(
+ default="",
+ description="Standard output from the script.",
+ )
+ stderr: str = Field(
+ default="",
+ description="Standard error from the script.",
+ )
+ return_code: int = Field(
+ default=0,
+ description="Exit code from the script.",
+ )
+ execution_time_ms: float = Field(
+ default=0.0,
+ description="Execution time in milliseconds.",
+ )
+ timed_out: bool = Field(
+ default=False,
+ description="Whether execution was terminated due to timeout.",
+ )
+
+
+class ScriptExecutionError(Exception):
+ """Raised when script execution fails."""
+
+ def __init__(
+ self, message: str, result: Optional[ScriptExecutionResult] = None
+ ):
+ super().__init__(message)
+ self.result = result
+
+
+@experimental
+class ScriptExecutor(BaseModel):
+ """Executes scripts from Agent Skills bundles.
+
+ Provides sandboxed execution of Python, Bash, and JavaScript
+ scripts bundled with skills.
+
+ Security features:
+ - Execution timeout
+ - Working directory isolation
+ - Environment variable filtering
+ - Optional container sandboxing (requires Docker)
+
+ Example:
+ ```python
+ executor = ScriptExecutor(
+ timeout_seconds=30.0,
+ allow_network=False,
+ )
+
+ result = await executor.execute_script(
+ script_path=Path("/path/to/skill/scripts/extract.py"),
+ args=["--input", "file.pdf"],
+ working_dir=Path("/tmp/workspace"),
+ )
+
+ if result.success:
+ print(result.stdout)
+ else:
+ print(f"Error: {result.stderr}")
+ ```
+ """
+
+ model_config = ConfigDict(extra="forbid")
+
+ timeout_seconds: float = Field(
+ default=60.0,
+ description="Maximum execution time in seconds.",
+ gt=0,
+ )
+ allow_network: bool = Field(
+ default=False,
+ description="Whether to allow network access (container mode only).",
+ )
+ memory_limit_mb: int = Field(
+ default=256,
+ description="Memory limit in megabytes (container mode only).",
+ gt=0,
+ )
+ use_container: bool = Field(
+ default=False,
+ description="Use container isolation (requires Docker).",
+ )
+ allowed_env_vars: List[str] = Field(
+ default_factory=lambda: [
+ "PATH",
+ "HOME",
+ "LANG",
+ "LC_ALL",
+ "PYTHONPATH",
+ "PYTHONIOENCODING",
+ ],
+ description="Environment variables to pass through to scripts.",
+ )
+ max_output_size: int = Field(
+ default=1024 * 1024, # 1MB
+ description="Maximum size of stdout/stderr in bytes.",
+ gt=0,
+ )
+
+  # Mapping of file extensions to candidate interpreters (first available wins).
+  # Declared as a ClassVar so pydantic treats it as a constant, not a model field.
+  INTERPRETERS: ClassVar[Dict[str, List[str]]] = {
+ ".py": ["python3", "python"],
+ ".sh": ["bash", "sh"],
+ ".bash": ["bash"],
+ ".js": ["node"],
+ ".mjs": ["node"],
+ }
+
+ async def execute_script(
+ self,
+ script_path: Union[str, Path],
+ args: Optional[List[str]] = None,
+ working_dir: Optional[Union[str, Path]] = None,
+ env: Optional[Dict[str, str]] = None,
+ stdin: Optional[str] = None,
+ ) -> ScriptExecutionResult:
+ """Execute a script file.
+
+ Args:
+ script_path: Path to the script file.
+ args: Command-line arguments to pass to the script.
+ working_dir: Working directory for execution. Defaults to script's
+ parent directory.
+ env: Additional environment variables to set.
+ stdin: Optional input to pass to script's stdin.
+
+ Returns:
+ ScriptExecutionResult with stdout, stderr, and status.
+
+ Raises:
+ FileNotFoundError: If script doesn't exist.
+ ValueError: If script type is not supported.
+ """
+ script_path = Path(script_path).resolve()
+ args = args or []
+ start_time = time.time()
+
+ # Validate script exists
+ if not script_path.exists():
+ raise FileNotFoundError(f"Script not found: {script_path}")
+
+ if not script_path.is_file():
+ raise ValueError(f"Not a file: {script_path}")
+
+ # Determine interpreter
+ interpreter = self._get_interpreter(script_path)
+
+ # Build command
+ cmd = [interpreter, str(script_path)] + args
+
+ # Build safe environment
+ safe_env = self._build_safe_env(env)
+
+ # Set working directory
+ cwd = Path(working_dir).resolve() if working_dir else script_path.parent
+
+ if not cwd.exists():
+ raise FileNotFoundError(f"Working directory not found: {cwd}")
+
+ logger.debug(
+ "Executing script: %s with args: %s in %s",
+ script_path.name,
+ args,
+ cwd,
+ )
+
+ try:
+ if self.use_container:
+ result = await self._execute_in_container(cmd, cwd, safe_env, stdin)
+ else:
+ result = await self._execute_subprocess(cmd, cwd, safe_env, stdin)
+
+ execution_time = (time.time() - start_time) * 1000
+ result.execution_time_ms = execution_time
+
+ logger.debug(
+ "Script %s completed: success=%s, return_code=%d, time=%.2fms",
+ script_path.name,
+ result.success,
+ result.return_code,
+ execution_time,
+ )
+
+ return result
+
+ except asyncio.TimeoutError:
+ execution_time = (time.time() - start_time) * 1000
+ logger.warning(
+ "Script %s timed out after %.2fs",
+ script_path.name,
+ self.timeout_seconds,
+ )
+ return ScriptExecutionResult(
+ success=False,
+ stderr=f"Execution timed out after {self.timeout_seconds}s",
+ return_code=-1,
+ execution_time_ms=execution_time,
+ timed_out=True,
+ )
+
+ except Exception as e:
+ execution_time = (time.time() - start_time) * 1000
+ logger.error("Script %s failed: %s", script_path.name, e)
+ return ScriptExecutionResult(
+ success=False,
+ stderr=str(e),
+ return_code=-1,
+ execution_time_ms=execution_time,
+ )
+
+ def _get_interpreter(self, script_path: Path) -> str:
+ """Determine the interpreter for a script.
+
+ Args:
+ script_path: Path to the script.
+
+ Returns:
+ Interpreter command.
+
+ Raises:
+ ValueError: If script type is not supported.
+ """
+ suffix = script_path.suffix.lower()
+
+ if suffix not in self.INTERPRETERS:
+ supported = ", ".join(self.INTERPRETERS.keys())
+ raise ValueError(
+ f"Unsupported script type: {suffix}. Supported: {supported}"
+ )
+
+ # Find available interpreter
+ for interpreter in self.INTERPRETERS[suffix]:
+ if shutil.which(interpreter):
+ return interpreter
+
+ raise ValueError(
+ f"No interpreter found for {suffix}. Tried: {self.INTERPRETERS[suffix]}"
+ )
+
+ def _build_safe_env(
+ self, additional_env: Optional[Dict[str, str]] = None
+ ) -> Dict[str, str]:
+ """Build a safe environment for script execution.
+
+ Args:
+ additional_env: Additional environment variables to include.
+
+ Returns:
+ Filtered environment dictionary.
+ """
+ safe_env = {}
+
+ # Only pass allowed environment variables
+ for var in self.allowed_env_vars:
+ if var in os.environ:
+ safe_env[var] = os.environ[var]
+
+ # Ensure UTF-8 encoding
+ safe_env.setdefault("PYTHONIOENCODING", "utf-8")
+ safe_env.setdefault("LANG", "en_US.UTF-8")
+
+ # Add additional environment variables
+ if additional_env:
+ safe_env.update(additional_env)
+
+ return safe_env
+
+ async def _execute_subprocess(
+ self,
+ cmd: List[str],
+ cwd: Path,
+ env: Dict[str, str],
+ stdin: Optional[str] = None,
+ ) -> ScriptExecutionResult:
+ """Execute script using subprocess.
+
+ Args:
+ cmd: Command and arguments.
+ cwd: Working directory.
+ env: Environment variables.
+ stdin: Optional stdin input.
+
+ Returns:
+ ScriptExecutionResult with output.
+ """
+ proc = await asyncio.create_subprocess_exec(
+ *cmd,
+ stdout=asyncio.subprocess.PIPE,
+ stderr=asyncio.subprocess.PIPE,
+ stdin=asyncio.subprocess.PIPE if stdin else None,
+ cwd=str(cwd),
+ env=env,
+ )
+
+ try:
+ stdin_bytes = stdin.encode("utf-8") if stdin else None
+ stdout, stderr = await asyncio.wait_for(
+ proc.communicate(input=stdin_bytes),
+ timeout=self.timeout_seconds,
+ )
+
+ # Truncate output if too large
+ stdout_str = self._truncate_output(
+ stdout.decode("utf-8", errors="replace")
+ )
+ stderr_str = self._truncate_output(
+ stderr.decode("utf-8", errors="replace")
+ )
+
+ return ScriptExecutionResult(
+ success=proc.returncode == 0,
+ stdout=stdout_str,
+ stderr=stderr_str,
+ return_code=proc.returncode or 0,
+ )
+
+ except asyncio.TimeoutError:
+ # Kill the process on timeout
+ try:
+ proc.kill()
+ await proc.wait()
+ except ProcessLookupError:
+ pass
+ raise
+
+ async def _execute_in_container(
+ self,
+ cmd: List[str],
+ cwd: Path,
+ env: Dict[str, str],
+ stdin: Optional[str] = None,
+ ) -> ScriptExecutionResult:
+ """Execute script in a Docker container.
+
+ Args:
+ cmd: Command and arguments.
+ cwd: Working directory (mounted as volume).
+ env: Environment variables.
+ stdin: Optional stdin input.
+
+ Returns:
+ ScriptExecutionResult with output.
+
+ Raises:
+ ValueError: If Docker is not available.
+ """
+ # Check Docker availability
+ if not shutil.which("docker"):
+ raise ValueError(
+ "Docker not found. Install Docker or set use_container=False."
+ )
+
+ # Build Docker command
+ docker_cmd = [
+ "docker",
+ "run",
+ "--rm",
+ f"--memory={self.memory_limit_mb}m",
+ "--cpus=1",
+ f"-v",
+ f"{cwd}:/workspace:rw",
+ "-w",
+ "/workspace",
+ ]
+
+ # Add network isolation if required
+ if not self.allow_network:
+ docker_cmd.extend(["--network", "none"])
+
+ # Add environment variables
+ for key, value in env.items():
+ docker_cmd.extend(["-e", f"{key}={value}"])
+
+ # Determine base image based on interpreter
+ interpreter = cmd[0]
+ if interpreter in ("python3", "python"):
+ image = "python:3.11-slim"
+ elif interpreter in ("node",):
+ image = "node:20-slim"
+ else:
+ image = "ubuntu:22.04"
+
+ docker_cmd.append(image)
+ docker_cmd.extend(cmd)
+
+ logger.debug("Docker command: %s", " ".join(docker_cmd))
+
+ # Execute in container
+ return await self._execute_subprocess(
+ docker_cmd, cwd, os.environ.copy(), stdin
+ )
+
+ def _truncate_output(self, output: str) -> str:
+ """Truncate output if it exceeds max size.
+
+ Args:
+ output: Output string.
+
+ Returns:
+ Truncated output with notice if truncated.
+ """
+ if len(output) <= self.max_output_size:
+ return output
+
+ truncated = output[: self.max_output_size]
+ return (
+ truncated
+ + f"\n\n... (output truncated, exceeded {self.max_output_size} bytes)"
+ )
+
+ def check_interpreter_available(self, script_suffix: str) -> bool:
+ """Check if an interpreter is available for a script type.
+
+ Args:
+ script_suffix: File extension (e.g., ".py").
+
+ Returns:
+ True if an interpreter is available.
+ """
+ suffix = script_suffix.lower()
+ if suffix not in self.INTERPRETERS:
+ return False
+
+ for interpreter in self.INTERPRETERS[suffix]:
+ if shutil.which(interpreter):
+ return True
+
+ return False
+
+ def get_available_interpreters(self) -> Dict[str, str]:
+ """Get available interpreters for all supported script types.
+
+ Returns:
+ Dictionary mapping extensions to available interpreters.
+ """
+ available = {}
+ for suffix, interpreters in self.INTERPRETERS.items():
+ for interpreter in interpreters:
+ if shutil.which(interpreter):
+ available[suffix] = interpreter
+ break
+ return available
diff --git a/src/google/adk/skills/skill_manager.py b/src/google/adk/skills/skill_manager.py
new file mode 100644
index 0000000000..1aebb17e39
--- /dev/null
+++ b/src/google/adk/skills/skill_manager.py
@@ -0,0 +1,188 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Skills manager for ADK.
+
+Manages skill registration, discovery, and execution.
+"""
+
+from __future__ import annotations
+
+import time
+from typing import Any
+from typing import Dict
+from typing import List
+from typing import Optional
+
+from pydantic import BaseModel
+from pydantic import ConfigDict
+from pydantic import Field
+
+from ..utils.feature_decorator import experimental
+from .base_skill import BaseSkill
+
+
+class SkillInvocationResult(BaseModel):
+ """Result from a skill invocation."""
+
+ model_config = ConfigDict(extra="forbid")
+
+ success: bool = Field(
+ description="Whether the skill execution was successful.",
+ )
+ result: Any = Field(
+ default=None,
+ description="The result of the skill execution.",
+ )
+ error: Optional[str] = Field(
+ default=None,
+ description="Error message if execution failed.",
+ )
+ call_log: List[dict] = Field(
+ default_factory=list,
+ description="Log of tool calls made during execution.",
+ )
+ execution_time_ms: float = Field(
+ default=0.0,
+ description="Execution time in milliseconds.",
+ )
+
+
+@experimental
+class SkillsManager:
+ """Manages skill registration, discovery, and execution.
+
+ The SkillsManager provides:
+ - Skill registration and lookup
+ - Skill discovery for agents
+ - Execution coordination
+
+ Example:
+ ```python
+ manager = SkillsManager()
+ manager.register_skill(my_skill)
+
+ skill = manager.get_skill("my_skill")
+ all_skills = manager.get_all_skills()
+ ```
+ """
+
+ def __init__(self):
+ """Initialize the skills manager."""
+ self._skills: Dict[str, BaseSkill] = {}
+
+ def register_skill(self, skill: BaseSkill) -> None:
+ """Register a skill.
+
+ Args:
+ skill: The skill to register.
+
+ Raises:
+ ValueError: If a skill with the same name is already registered.
+ """
+ if skill.name in self._skills:
+ raise ValueError(f"Skill '{skill.name}' already registered")
+ self._skills[skill.name] = skill
+
+ def unregister_skill(self, name: str) -> bool:
+ """Unregister a skill by name.
+
+ Args:
+ name: The name of the skill to unregister.
+
+ Returns:
+ True if the skill was unregistered, False if it wasn't found.
+ """
+ if name in self._skills:
+ del self._skills[name]
+ return True
+ return False
+
+ def get_skill(self, name: str) -> Optional[BaseSkill]:
+ """Get a registered skill by name.
+
+ Args:
+ name: The name of the skill to retrieve.
+
+ Returns:
+ The skill if found, None otherwise.
+ """
+ return self._skills.get(name)
+
+ def get_all_skills(self) -> List[BaseSkill]:
+ """Get all registered skills.
+
+ Returns:
+ List of all registered skills.
+ """
+ return list(self._skills.values())
+
+ def get_skill_names(self) -> List[str]:
+ """Get names of all registered skills.
+
+ Returns:
+ List of skill names.
+ """
+ return list(self._skills.keys())
+
+ def has_skill(self, name: str) -> bool:
+ """Check if a skill is registered.
+
+ Args:
+ name: The name of the skill to check.
+
+ Returns:
+ True if the skill is registered.
+ """
+ return name in self._skills
+
+ def clear(self) -> None:
+ """Remove all registered skills."""
+ self._skills.clear()
+
+ @staticmethod
+ async def execute_skill(
+ skill: BaseSkill,
+ tool_results: Dict[str, Any],
+ ) -> SkillInvocationResult:
+ """Execute a skill with pre-computed tool results.
+
+    This is a simplified execution path that does not use programmatic tool
+    calling (PTC) code generation.
+ It applies the skill's result filtering to the provided results.
+
+ Args:
+ skill: The skill to execute.
+ tool_results: Dictionary of tool results to filter.
+
+ Returns:
+ SkillInvocationResult with filtered results.
+ """
+ start_time = time.time()
+
+ try:
+ filtered_result = skill.filter_result(tool_results)
+ execution_time = (time.time() - start_time) * 1000
+
+ return SkillInvocationResult(
+ success=True,
+ result=filtered_result,
+ execution_time_ms=execution_time,
+ )
+ except Exception as e:
+ execution_time = (time.time() - start_time) * 1000
+ return SkillInvocationResult(
+ success=False,
+ error=str(e),
+ execution_time_ms=execution_time,
+ )
diff --git a/src/google/adk/skills/skill_tool.py b/src/google/adk/skills/skill_tool.py
new file mode 100644
index 0000000000..236f4b261e
--- /dev/null
+++ b/src/google/adk/skills/skill_tool.py
@@ -0,0 +1,412 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""SkillTool - Wraps a Skill as a BaseTool for LLM invocation.
+
+This module provides the SkillTool class which exposes skills as ADK tools,
+enabling LLMs to interact with Agent Skills standard skills through a
+unified tool interface.
+"""
+
+from __future__ import annotations
+
+import logging
+from pathlib import Path
+from typing import Any
+from typing import Optional
+from typing import Union
+
+from google.genai import types
+from typing_extensions import override
+
+from ..tools.base_tool import BaseTool
+from ..tools.tool_context import ToolContext
+from ..utils.feature_decorator import experimental
+from .base_skill import BaseSkill
+from .markdown_skill import MarkdownSkill
+from .script_executor import ScriptExecutionResult
+from .script_executor import ScriptExecutor
+
+logger = logging.getLogger("google_adk." + __name__)
+
+
+@experimental
+class SkillTool(BaseTool):
+ """Wraps a Skill as a BaseTool for LLM invocation.
+
+ Provides three action types:
+ - "activate": Load full skill instructions (Stage 2)
+ - "run_script": Execute a bundled script (Stage 3)
+ - "load_reference": Load a reference document (Stage 3)
+
+ This tool enables LLMs to interact with Agent Skills standard skills
+ through the standard ADK tool interface.
+
+ Example:
+ ```python
+ from google.adk.skills import MarkdownSkill, SkillTool
+
+ # Load a skill
+ skill = MarkdownSkill.from_directory("/path/to/pdf-processing")
+
+ # Create tool wrapper
+ tool = SkillTool(skill)
+
+ # LLM can now invoke:
+ # - {"action": "activate"} → Returns full instructions
+ # - {"action": "run_script", "script": "extract.py", "args": ["file.pdf"]}
+ # - {"action": "load_reference", "reference": "FORMS.md"}
+ ```
+ """
+
+ def __init__(
+ self,
+ skill: BaseSkill,
+ script_executor: Optional[ScriptExecutor] = None,
+ working_dir: Optional[Union[str, Path]] = None,
+ ):
+ """Initialize the SkillTool.
+
+ Args:
+ skill: The skill to wrap.
+ script_executor: Optional custom script executor. If not provided,
+ a default executor will be created.
+ working_dir: Default working directory for script execution.
+ If not provided, uses the skill's directory.
+ """
+ # Create safe tool name from skill name
+ safe_name = skill.name.replace("-", "_").replace(".", "_")
+
+ super().__init__(
+ name=f"skill_{safe_name}",
+ description=self._build_description(skill),
+ )
+ self._skill = skill
+ self._script_executor = script_executor or ScriptExecutor()
+
+ # Set working directory
+ if working_dir:
+ self._working_dir = Path(working_dir).resolve()
+ elif isinstance(skill, MarkdownSkill):
+ self._working_dir = skill.skill_path
+ else:
+ self._working_dir = Path.cwd()
+
+ def _build_description(self, skill: BaseSkill) -> str:
+ """Build tool description from skill metadata.
+
+ Args:
+ skill: The skill to describe.
+
+ Returns:
+ Formatted description string.
+ """
+ lines = [
+ skill.description,
+ "",
+ "Actions:",
+ "- activate: Load full skill instructions",
+ ]
+
+ if isinstance(skill, MarkdownSkill):
+ scripts = skill.list_scripts()
+ if scripts:
+ lines.append(
+ f"- run_script: Execute bundled scripts ({', '.join(scripts[:5])})"
+ )
+ if len(scripts) > 5:
+ lines[-1] += f" and {len(scripts) - 5} more"
+
+ refs = skill.list_references()
+ if refs:
+ lines.append(
+ "- load_reference: Load reference documents"
+ f" ({', '.join(refs[:5])})"
+ )
+ if len(refs) > 5:
+ lines[-1] += f" and {len(refs) - 5} more"
+
+ return "\n".join(lines)
+
+ @override
+ def _get_declaration(self) -> Optional[types.FunctionDeclaration]:
+ """Get function declaration for LLM.
+
+ Returns:
+ FunctionDeclaration describing the tool's parameters.
+ """
+ # Build action enum based on skill capabilities
+ actions = ["activate"]
+ if isinstance(self._skill, MarkdownSkill):
+ if self._skill.has_scripts():
+ actions.append("run_script")
+ if self._skill.has_references():
+ actions.append("load_reference")
+
+ properties = {
+ "action": types.Schema(
+ type="STRING",
+ description=f"Action to perform: {', '.join(actions)}",
+ enum=actions,
+ ),
+ }
+
+ # Add script-related parameters if skill has scripts
+ if isinstance(self._skill, MarkdownSkill) and self._skill.has_scripts():
+ scripts = self._skill.list_scripts()
+ properties["script"] = types.Schema(
+ type="STRING",
+ description=(
+ "Script name for run_script action. Available:"
+ f" {', '.join(scripts)}"
+ ),
+ )
+ properties["args"] = types.Schema(
+ type="ARRAY",
+ items=types.Schema(type="STRING"),
+ description="Command-line arguments for the script",
+ )
+
+ # Add reference parameter if skill has references
+ if isinstance(self._skill, MarkdownSkill) and self._skill.has_references():
+ refs = self._skill.list_references()
+ properties["reference"] = types.Schema(
+ type="STRING",
+ description=(
+ "Reference file for load_reference action. Available:"
+ f" {', '.join(refs)}"
+ ),
+ )
+
+ return types.FunctionDeclaration(
+ name=self.name,
+ description=self.description,
+ parameters=types.Schema(
+ type="OBJECT",
+ properties=properties,
+ required=["action"],
+ ),
+ )
+
+ @override
+ async def run_async(
+ self,
+ *,
+ args: dict[str, Any],
+ tool_context: ToolContext,
+ ) -> Any:
+ """Execute the skill action.
+
+ Args:
+ args: Arguments from the LLM containing action and parameters.
+ tool_context: The tool execution context.
+
+ Returns:
+ Result dictionary with action-specific content.
+ """
+ action = args.get("action", "activate")
+
+ logger.debug(
+ "SkillTool %s executing action: %s",
+ self._skill.name,
+ action,
+ )
+
+ if action == "activate":
+ return self._handle_activate()
+
+ elif action == "run_script":
+ return await self._handle_run_script(args)
+
+ elif action == "load_reference":
+ return self._handle_load_reference(args)
+
+ else:
+ return {
+ "error": f"Unknown action: {action}",
+ "available_actions": ["activate", "run_script", "load_reference"],
+ }
+
+ def _handle_activate(self) -> dict[str, Any]:
+ """Handle skill activation (Stage 2).
+
+ Returns:
+ Dictionary with skill instructions and available resources.
+ """
+ if isinstance(self._skill, MarkdownSkill):
+ instructions = self._skill.get_instructions()
+ return {
+ "status": "activated",
+ "skill": self._skill.name,
+ "instructions": instructions,
+ "available_scripts": self._skill.list_scripts(),
+ "available_references": self._skill.list_references(),
+ "available_assets": self._skill.list_assets()[:20], # Limit assets
+ }
+ else:
+ # For non-markdown skills, return the skill prompt
+ return {
+ "status": "activated",
+ "skill": self._skill.name,
+ "prompt": self._skill.get_skill_prompt(),
+ "tool_declarations": self._skill.get_tool_declarations(),
+ }
+
+ async def _handle_run_script(self, args: dict[str, Any]) -> dict[str, Any]:
+ """Handle script execution (Stage 3).
+
+ Args:
+ args: Arguments containing script name and args.
+
+ Returns:
+ Dictionary with script execution results.
+ """
+ if not isinstance(self._skill, MarkdownSkill):
+ return {"error": "This skill does not support scripts"}
+
+ script_name = args.get("script")
+ if not script_name:
+ return {
+ "error": "Script name required",
+ "available_scripts": self._skill.list_scripts(),
+ }
+
+ script_args = args.get("args", [])
+ if isinstance(script_args, str):
+ # Handle case where args is passed as a single string
+ script_args = script_args.split() if script_args else []
+
+ # Get script path
+ script_path = self._skill.get_script_path(script_name)
+ if not script_path:
+ return {
+ "error": f"Script not found: {script_name}",
+ "available_scripts": self._skill.list_scripts(),
+ }
+
+ logger.info(
+ "Executing script: %s with args: %s",
+ script_name,
+ script_args,
+ )
+
+ # Execute script
+ try:
+ result: ScriptExecutionResult = (
+ await self._script_executor.execute_script(
+ script_path=script_path,
+ args=script_args,
+ working_dir=self._working_dir,
+ )
+ )
+
+ return {
+ "script": script_name,
+ "success": result.success,
+ "stdout": result.stdout,
+ "stderr": result.stderr,
+ "return_code": result.return_code,
+ "execution_time_ms": result.execution_time_ms,
+ "timed_out": result.timed_out,
+ }
+ except Exception as e:
+ logger.error("Script execution failed: %s", e)
+ return {
+ "script": script_name,
+ "success": False,
+ "error": str(e),
+ }
+
+ def _handle_load_reference(self, args: dict[str, Any]) -> dict[str, Any]:
+ """Handle reference loading (Stage 3).
+
+ Args:
+ args: Arguments containing reference name.
+
+ Returns:
+ Dictionary with reference content.
+ """
+ if not isinstance(self._skill, MarkdownSkill):
+ return {"error": "This skill does not support references"}
+
+ ref_name = args.get("reference")
+ if not ref_name:
+ return {
+ "error": "Reference name required",
+ "available_references": self._skill.list_references(),
+ }
+
+ content = self._skill.get_reference(ref_name)
+ if content is None:
+ return {
+ "error": f"Reference not found: {ref_name}",
+ "available_references": self._skill.list_references(),
+ }
+
+ return {
+ "reference": ref_name,
+ "content": content,
+ }
+
+ @property
+ def skill(self) -> BaseSkill:
+ """Get the wrapped skill."""
+ return self._skill
+
+ @property
+ def working_dir(self) -> Path:
+ """Get the working directory for script execution."""
+ return self._working_dir
+
+ @working_dir.setter
+ def working_dir(self, value: Union[str, Path]) -> None:
+ """Set the working directory for script execution."""
+ self._working_dir = Path(value).resolve()
+
+
+def create_skill_tools(
+ skills: list[BaseSkill],
+ script_executor: Optional[ScriptExecutor] = None,
+ working_dir: Optional[Union[str, Path]] = None,
+) -> list[SkillTool]:
+ """Create SkillTool instances for a list of skills.
+
+ Convenience function to create tool wrappers for multiple skills.
+
+ Args:
+ skills: List of skills to wrap.
+ script_executor: Optional shared script executor.
+ working_dir: Optional default working directory.
+
+ Returns:
+ List of SkillTool instances.
+
+ Example:
+ ```python
+ from google.adk.skills import AgentSkillLoader, create_skill_tools
+
+ loader = AgentSkillLoader()
+ loader.add_skill_directory("./skills")
+
+ tools = create_skill_tools(loader.get_all_skills())
+ ```
+ """
+ return [
+ SkillTool(
+ skill=skill,
+ script_executor=script_executor,
+ working_dir=working_dir,
+ )
+ for skill in skills
+ ]
diff --git a/tests/unittests/skills/__init__.py b/tests/unittests/skills/__init__.py
new file mode 100644
index 0000000000..0a2669d7a2
--- /dev/null
+++ b/tests/unittests/skills/__init__.py
@@ -0,0 +1,13 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
diff --git a/tests/unittests/skills/test_agent_skill_loader.py b/tests/unittests/skills/test_agent_skill_loader.py
new file mode 100644
index 0000000000..66ff46a655
--- /dev/null
+++ b/tests/unittests/skills/test_agent_skill_loader.py
@@ -0,0 +1,378 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for AgentSkillLoader class."""
+
+from __future__ import annotations
+
+from pathlib import Path
+import tempfile
+
+from google.adk.skills import AgentSkillLoader
+from google.adk.skills import MarkdownSkill
+from google.adk.skills import SkillsManager
+import pytest
+
+
+def create_skill_in_dir(base_dir: Path, name: str, description: str) -> Path:
+ """Helper to create a skill directory with SKILL.md."""
+ skill_path = base_dir / name
+ skill_path.mkdir(parents=True, exist_ok=True)
+
+ skill_md = skill_path / "SKILL.md"
+ skill_md.write_text(f"""---
+name: {name}
+description: {description}
+---
+
+# {name}
+
+Instructions for {name}.
+""")
+
+ return skill_path
+
+
+class TestAgentSkillLoader:
+ """Tests for AgentSkillLoader."""
+
+ @pytest.fixture
+ def skills_dir(self):
+ """Create a temporary directory with multiple skills."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ base = Path(tmpdir)
+
+ # Create multiple skills
+ create_skill_in_dir(base, "skill-one", "First test skill")
+ create_skill_in_dir(base, "skill-two", "Second test skill")
+ create_skill_in_dir(base, "skill-three", "Third test skill")
+
+ # Create a non-skill directory
+ (base / "not-a-skill").mkdir()
+
+ # Create a hidden directory (should be skipped)
+ hidden = base / ".hidden-skill"
+ hidden.mkdir()
+ (hidden / "SKILL.md").write_text(
+ "---\nname: hidden\ndescription: Hidden\n---\nContent"
+ )
+
+ yield base
+
+ def test_add_skill_directory(self, skills_dir):
+ """Test adding a skill directory."""
+ loader = AgentSkillLoader()
+
+ count = loader.add_skill_directory(skills_dir)
+
+ assert count == 3
+ assert len(loader) == 3
+
+ def test_add_skill_directory_not_found(self):
+ """Test adding non-existent directory."""
+ loader = AgentSkillLoader()
+
+ with pytest.raises(FileNotFoundError):
+ loader.add_skill_directory("/nonexistent/path")
+
+ def test_add_skill_directory_not_a_directory(self):
+ """Test adding a file instead of directory."""
+ with tempfile.NamedTemporaryFile() as f:
+ loader = AgentSkillLoader()
+
+ with pytest.raises(ValueError, match="not a directory"):
+ loader.add_skill_directory(f.name)
+
+ def test_add_single_skill(self, skills_dir):
+ """Test adding a single skill."""
+ loader = AgentSkillLoader()
+
+ result = loader.add_skill(skills_dir / "skill-one")
+
+ assert result is True
+ assert len(loader) == 1
+ assert "skill-one" in loader
+
+ def test_add_single_skill_not_found(self):
+ """Test adding non-existent skill."""
+ loader = AgentSkillLoader()
+
+ result = loader.add_skill("/nonexistent/path")
+
+ assert result is False
+ assert len(loader.get_load_errors()) == 1
+
+ def test_get_skill(self, skills_dir):
+ """Test getting a skill by name."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ skill = loader.get_skill("skill-one")
+
+ assert skill is not None
+ assert skill.name == "skill-one"
+ assert isinstance(skill, MarkdownSkill)
+
+ def test_get_skill_not_found(self, skills_dir):
+ """Test getting non-existent skill."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ skill = loader.get_skill("nonexistent")
+
+ assert skill is None
+
+ def test_get_all_skills(self, skills_dir):
+ """Test getting all skills."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ skills = loader.get_all_skills()
+
+ assert len(skills) == 3
+ assert all(isinstance(s, MarkdownSkill) for s in skills)
+
+ def test_get_skill_names(self, skills_dir):
+ """Test getting skill names."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ names = loader.get_skill_names()
+
+ assert sorted(names) == ["skill-one", "skill-three", "skill-two"]
+
+ def test_has_skill(self, skills_dir):
+ """Test checking if skill exists."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ assert loader.has_skill("skill-one") is True
+ assert loader.has_skill("nonexistent") is False
+
+ def test_contains(self, skills_dir):
+ """Test __contains__ method."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ assert "skill-one" in loader
+ assert "nonexistent" not in loader
+
+ def test_iter(self, skills_dir):
+ """Test __iter__ method."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ skills = list(loader)
+
+ assert len(skills) == 3
+
+ def test_clear(self, skills_dir):
+ """Test clearing all skills."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ loader.clear()
+
+ assert len(loader) == 0
+
+ def test_register_all(self, skills_dir):
+ """Test registering all skills with manager."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ manager = SkillsManager()
+ count = loader.register_all(manager)
+
+ assert count == 3
+ assert manager.has_skill("skill-one")
+ assert manager.has_skill("skill-two")
+ assert manager.has_skill("skill-three")
+
+ def test_generate_discovery_prompt(self, skills_dir):
+ """Test generating discovery prompt."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ prompt = loader.generate_discovery_prompt()
+
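+ # Expected format: an XML-style catalog that wraps one entry per skill.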
+ assert "" in prompt
+ assert "" in prompt
+ assert "skill-one" in prompt
+ assert "skill-two" in prompt
+ assert "skill-three" in prompt
+ assert "First test skill" in prompt
+
+ def test_generate_discovery_prompt_empty(self):
+ """Test generating prompt with no skills."""
+ loader = AgentSkillLoader()
+
+ prompt = loader.generate_discovery_prompt()
+
+ assert prompt == ""
+
+ def test_generate_discovery_prompt_without_resources(self, skills_dir):
+ """Test generating prompt without resource hints."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ prompt = loader.generate_discovery_prompt(include_resources=False)
+
+ assert "" in prompt
+ assert "has_scripts" not in prompt
+ assert "has_references" not in prompt
+
+ def test_generate_activation_prompt(self, skills_dir):
+ """Test generating activation prompt."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ prompt = loader.generate_activation_prompt("skill-one")
+
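+ # The activation prompt carries a "# Skill: <name>" heading followed by the full instructions.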
+ assert prompt is not None
+ assert "# Skill: skill-one" in prompt
+ assert "Instructions for skill-one" in prompt
+
+ def test_generate_activation_prompt_not_found(self, skills_dir):
+ """Test generating activation prompt for non-existent skill."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ prompt = loader.generate_activation_prompt("nonexistent")
+
+ assert prompt is None
+
+ def test_generate_summary(self, skills_dir):
+ """Test generating summary."""
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(skills_dir)
+
+ summary = loader.generate_summary()
+
+ assert "Agent Skills Loader Summary" in summary
+ assert "Skills discovered: 3" in summary
+ assert "skill-one" in summary
+
+ def test_load_errors(self):
+ """Test tracking load errors."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ base = Path(tmpdir)
+
+ # Create a skill with invalid SKILL.md
+ invalid_skill = base / "invalid-skill"
+ invalid_skill.mkdir()
+ (invalid_skill / "SKILL.md").write_text(
+ "Invalid content without frontmatter"
+ )
+
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(base)
+
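+ # Load errors are keyed by the path of the skill directory that failed to parse.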
+ errors = loader.get_load_errors()
+
+ assert len(errors) == 1
+ assert str(invalid_skill) in list(errors.keys())[0]
+
+ def test_multiple_directories(self):
+ """Test loading from multiple directories."""
+ with (
+ tempfile.TemporaryDirectory() as tmpdir1,
+ tempfile.TemporaryDirectory() as tmpdir2,
+ ):
+
+ base1 = Path(tmpdir1)
+ base2 = Path(tmpdir2)
+
+ create_skill_in_dir(base1, "skill-a", "Skill A")
+ create_skill_in_dir(base2, "skill-b", "Skill B")
+
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(base1)
+ loader.add_skill_directory(base2)
+
+ assert len(loader) == 2
+ assert "skill-a" in loader
+ assert "skill-b" in loader
+
+ def test_duplicate_skill_names(self):
+ """Test handling duplicate skill names."""
+ with (
+ tempfile.TemporaryDirectory() as tmpdir1,
+ tempfile.TemporaryDirectory() as tmpdir2,
+ ):
+
+ base1 = Path(tmpdir1)
+ base2 = Path(tmpdir2)
+
+ create_skill_in_dir(base1, "duplicate-skill", "First version")
+ create_skill_in_dir(base2, "duplicate-skill", "Second version")
+
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(base1)
+ loader.add_skill_directory(base2)
+
+ # Should have only one skill (second shadows first)
+ skill = loader.get_skill("duplicate-skill")
+ assert skill.description == "Second version"
+
+ def test_xml_escaping(self):
+ """Test that XML special characters are escaped."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ base = Path(tmpdir)
+
+ # Create skill with special characters in description
+ skill_path = base / "special-chars"
+ skill_path.mkdir()
+ (skill_path / "SKILL.md").write_text("""---
+name: special-chars
+description: Handle & "characters"
+---
+
+Content
+""")
+
+ loader = AgentSkillLoader()
+ loader.add_skill_directory(base)
+
+ prompt = loader.generate_discovery_prompt()
+
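+ # Special characters from the description should appear in XML-escaped (entity) form.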
+ assert "<special>" in prompt
+ assert "&" in prompt
+ assert ""characters"" in prompt
+
+ def test_validate_names_disabled(self):
+ """Test with name validation disabled."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ base = Path(tmpdir)
+
+ # Create skill with mismatched name
+ skill_path = base / "directory-name"
+ skill_path.mkdir()
+ (skill_path / "SKILL.md").write_text("""---
+name: different-name
+description: Test skill
+---
+
+Content
+""")
+
+ # With validation enabled, should not load
+ loader1 = AgentSkillLoader(validate_names=True)
+ loader1.add_skill_directory(base)
+ assert len(loader1) == 0
+
+ # With validation disabled, should load
+ loader2 = AgentSkillLoader(validate_names=False)
+ loader2.add_skill_directory(base)
+ assert len(loader2) == 1
+ assert "different-name" in loader2
diff --git a/tests/unittests/skills/test_base_skill.py b/tests/unittests/skills/test_base_skill.py
new file mode 100644
index 0000000000..8f00cf02f8
--- /dev/null
+++ b/tests/unittests/skills/test_base_skill.py
@@ -0,0 +1,179 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for base skill classes."""
+
+from __future__ import annotations
+
+from typing import Any
+from typing import List
+
+from google.adk.skills import BaseSkill
+from google.adk.skills import SkillConfig
+import pytest
+
+
+class TestSkillConfig:
+ """Tests for SkillConfig."""
+
+ def test_default_values(self):
+ """Test default configuration values."""
+ config = SkillConfig()
+
+ assert config.max_parallel_calls == 10
+ assert config.timeout_seconds == 60.0
+ assert config.allow_network is False
+ assert config.memory_limit_mb == 256
+
+ def test_custom_values(self):
+ """Test custom configuration values."""
+ config = SkillConfig(
+ max_parallel_calls=20,
+ timeout_seconds=120.0,
+ allow_network=True,
+ memory_limit_mb=512,
+ )
+
+ assert config.max_parallel_calls == 20
+ assert config.timeout_seconds == 120.0
+ assert config.allow_network is True
+ assert config.memory_limit_mb == 512
+
+ def test_forbid_extra_fields(self):
+ """Test that extra fields are forbidden."""
+ with pytest.raises(ValueError):
+ SkillConfig(unknown_field="value")
+
+
+class ConcreteSkill(BaseSkill):
+ """Concrete implementation for testing."""
+
+ name: str = "test_skill"
+ description: str = "A test skill"
+
+ def get_tool_declarations(self) -> List[dict[str, Any]]:
+ return [
+ {"name": "tool1", "description": "First tool"},
+ {"name": "tool2", "description": "Second tool"},
+ ]
+
+ def get_orchestration_template(self) -> str:
+ return """
+async def test_func(tools):
+ result = await tools.tool1()
+ return result
+"""
+
+
+class TestBaseSkill:
+ """Tests for BaseSkill."""
+
+ def test_create_skill(self):
+ """Test creating a skill with default values."""
+ skill = ConcreteSkill()
+
+ assert skill.name == "test_skill"
+ assert skill.description == "A test skill"
+ assert skill.config.max_parallel_calls == 10
+ assert skill.allowed_callers == ["code_execution_20250825"]
+
+ def test_create_skill_with_custom_config(self):
+ """Test creating a skill with custom config."""
+ config = SkillConfig(timeout_seconds=300.0)
+ skill = ConcreteSkill(config=config)
+
+ assert skill.config.timeout_seconds == 300.0
+
+ def test_get_tool_declarations(self):
+ """Test getting tool declarations."""
+ skill = ConcreteSkill()
+ declarations = skill.get_tool_declarations()
+
+ assert len(declarations) == 2
+ assert declarations[0]["name"] == "tool1"
+ assert declarations[1]["name"] == "tool2"
+
+ def test_get_orchestration_template(self):
+ """Test getting orchestration template."""
+ skill = ConcreteSkill()
+ template = skill.get_orchestration_template()
+
+ assert "async def test_func" in template
+ assert "tools.tool1()" in template
+
+ def test_filter_result_default(self):
+ """Test default filter_result returns unchanged data."""
+ skill = ConcreteSkill()
+ data = {"key": "value", "nested": {"inner": 123}}
+
+ result = skill.filter_result(data)
+
+ assert result == data
+
+ def test_is_programmatically_callable(self):
+ """Test checking if skill is programmatically callable."""
+ skill = ConcreteSkill()
+
+ assert skill.is_programmatically_callable() is True
+
+ def test_is_not_programmatically_callable(self):
+ """Test skill with empty allowed_callers."""
+ skill = ConcreteSkill(allowed_callers=[])
+
+ assert skill.is_programmatically_callable() is False
+
+ def test_get_skill_prompt(self):
+ """Test generating skill prompt."""
+ skill = ConcreteSkill()
+ prompt = skill.get_skill_prompt()
+
+ assert "test_skill" in prompt
+ assert "A test skill" in prompt
+ assert "tool1" in prompt
+ assert "tool2" in prompt
+ assert "async def test_func" in prompt
+
+
+class CustomFilterSkill(ConcreteSkill):
+ """Skill with custom filter logic."""
+
+ def filter_result(self, result: Any) -> Any:
+ if isinstance(result, dict) and "sensitive" in result:
+ result = result.copy()
+ del result["sensitive"]
+ return result
+
+
+class TestSkillFiltering:
+ """Tests for skill result filtering."""
+
+ def test_custom_filter(self):
+ """Test custom filtering logic."""
+ skill = CustomFilterSkill()
+ data = {"public": "info", "sensitive": "secret"}
+
+ result = skill.filter_result(data)
+
+ assert "public" in result
+ assert "sensitive" not in result
+
+ def test_filter_preserves_original(self):
+ """Test that filtering doesn't modify original data."""
+ skill = CustomFilterSkill()
+ original = {"public": "info", "sensitive": "secret"}
+ data = original.copy()
+
+ skill.filter_result(data)
+
+ assert "sensitive" in original
diff --git a/tests/unittests/skills/test_bigquery_skills.py b/tests/unittests/skills/test_bigquery_skills.py
new file mode 100644
index 0000000000..a82fb787da
--- /dev/null
+++ b/tests/unittests/skills/test_bigquery_skills.py
@@ -0,0 +1,369 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for BigQuery MarkdownSkill implementations."""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from google.adk.skills import MarkdownSkill
+from google.adk.skills import SKILLS_DIR
+import pytest
+
+
+class TestBigQueryDataManagementSkill:
+ """Tests for bigquery-data-management skill."""
+
+ @pytest.fixture
+ def skill(self):
+ """Load the skill from the skills directory."""
+ skill_path = SKILLS_DIR / "bigquery-data-management"
+ return MarkdownSkill.from_directory(skill_path)
+
+ def test_skill_loads_successfully(self, skill):
+ """Test that the skill loads without errors."""
+ assert skill is not None
+ assert skill.name == "bigquery-data-management"
+
+ def test_skill_has_description(self, skill):
+ """Test that the skill has a proper description."""
+ assert "data" in skill.description.lower()
+ assert len(skill.description) > 50
+
+ def test_skill_has_metadata(self, skill):
+ """Test that skill metadata is properly set."""
+ assert skill.skill_metadata.license == "Apache-2.0"
+ assert skill.skill_metadata.metadata.get("category") == "data-management"
+
+ def test_skill_has_instructions(self, skill):
+ """Test that skill instructions are available."""
+ instructions = skill.get_instructions()
+ assert len(instructions) > 100
+ assert "LOAD DATA" in instructions
+ assert "partitioning" in instructions.lower()
+
+ def test_skill_has_references(self, skill):
+ """Test that skill has reference documents."""
+ refs = skill.list_references()
+ assert len(refs) >= 2
+ assert "DATA_FORMATS.md" in refs
+ assert "PARTITIONING.md" in refs
+
+ def test_skill_has_scripts(self, skill):
+ """Test that skill has helper scripts."""
+ scripts = skill.list_scripts()
+ assert len(scripts) >= 1
+ assert "validate_schema.py" in scripts
+
+ def test_reference_content_loads(self, skill):
+ """Test that reference content can be loaded."""
+ content = skill.get_reference("DATA_FORMATS.md")
+ assert content is not None
+ assert "Parquet" in content
+ assert "CSV" in content
+
+ def test_progressive_disclosure_stages(self, skill):
+ """Test progressive disclosure works correctly."""
+ # Stage 1: Just loaded
+ assert skill.current_stage == 1
+
+ # Stage 2: Get instructions
+ skill.get_instructions()
+ assert skill.current_stage == 2
+
+ # Stage 3: Access scripts/references
+ skill.get_script("validate_schema.py")
+ assert skill.current_stage == 3
+
+
+class TestBigQueryAnalyticsSkill:
+ """Tests for bigquery-analytics skill."""
+
+ @pytest.fixture
+ def skill(self):
+ """Load the skill from the skills directory."""
+ skill_path = SKILLS_DIR / "bigquery-analytics"
+ return MarkdownSkill.from_directory(skill_path)
+
+ def test_skill_loads_successfully(self, skill):
+ """Test that the skill loads without errors."""
+ assert skill is not None
+ assert skill.name == "bigquery-analytics"
+
+ def test_skill_has_description(self, skill):
+ """Test that the skill has a proper description."""
+ assert "analytics" in skill.description.lower()
+ assert "SQL" in skill.description
+
+ def test_skill_has_instructions(self, skill):
+ """Test that skill instructions contain expected content."""
+ instructions = skill.get_instructions()
+ assert "window function" in instructions.lower()
+ assert "aggregation" in instructions.lower()
+ assert "geospatial" in instructions.lower()
+
+ def test_skill_has_references(self, skill):
+ """Test that skill has reference documents."""
+ refs = skill.list_references()
+ assert "WINDOW_FUNCTIONS.md" in refs
+ assert "GEOSPATIAL.md" in refs
+
+ def test_window_functions_reference(self, skill):
+ """Test window functions reference content."""
+ content = skill.get_reference("WINDOW_FUNCTIONS.md")
+ assert "ROW_NUMBER" in content
+ assert "RANK" in content
+ assert "LAG" in content
+
+ def test_geospatial_reference(self, skill):
+ """Test geospatial reference content."""
+ content = skill.get_reference("GEOSPATIAL.md")
+ assert "ST_DISTANCE" in content
+ assert "ST_CONTAINS" in content
+
+
+class TestBigQueryStorageSkill:
+ """Tests for bigquery-storage skill."""
+
+ @pytest.fixture
+ def skill(self):
+ """Load the skill from the skills directory."""
+ skill_path = SKILLS_DIR / "bigquery-storage"
+ return MarkdownSkill.from_directory(skill_path)
+
+ def test_skill_loads_successfully(self, skill):
+ """Test that the skill loads without errors."""
+ assert skill is not None
+ assert skill.name == "bigquery-storage"
+
+ def test_skill_has_description(self, skill):
+ """Test that the skill has a proper description."""
+ assert "storage" in skill.description.lower()
+
+ def test_skill_has_instructions(self, skill):
+ """Test that skill instructions contain expected content."""
+ instructions = skill.get_instructions()
+ assert "CREATE TABLE" in instructions
+ assert "schema" in instructions.lower()
+ assert "time travel" in instructions.lower()
+
+ def test_skill_has_scripts(self, skill):
+ """Test that skill has helper scripts."""
+ scripts = skill.list_scripts()
+ assert "storage_report.py" in scripts
+
+
+class TestBigQueryGovernanceSkill:
+ """Tests for bigquery-governance skill."""
+
+ @pytest.fixture
+ def skill(self):
+ """Load the skill from the skills directory."""
+ skill_path = SKILLS_DIR / "bigquery-governance"
+ return MarkdownSkill.from_directory(skill_path)
+
+ def test_skill_loads_successfully(self, skill):
+ """Test that the skill loads without errors."""
+ assert skill is not None
+ assert skill.name == "bigquery-governance"
+
+ def test_skill_has_description(self, skill):
+ """Test that the skill has a proper description."""
+ assert "governance" in skill.description.lower()
+ assert (
+ "access" in skill.description.lower()
+ or "security" in skill.description.lower()
+ )
+
+ def test_skill_has_instructions(self, skill):
+ """Test that skill instructions contain expected content."""
+ instructions = skill.get_instructions()
+ assert "IAM" in instructions
+ assert "row" in instructions.lower() and "security" in instructions.lower()
+ assert "masking" in instructions.lower()
+
+ def test_skill_has_scripts(self, skill):
+ """Test that skill has helper scripts."""
+ scripts = skill.list_scripts()
+ assert "audit_report.py" in scripts
+
+
+class TestBigQueryAdminSkill:
+ """Tests for bigquery-admin skill."""
+
+ @pytest.fixture
+ def skill(self):
+ """Load the skill from the skills directory."""
+ skill_path = SKILLS_DIR / "bigquery-admin"
+ return MarkdownSkill.from_directory(skill_path)
+
+ def test_skill_loads_successfully(self, skill):
+ """Test that the skill loads without errors."""
+ assert skill is not None
+ assert skill.name == "bigquery-admin"
+
+ def test_skill_has_description(self, skill):
+ """Test that the skill has a proper description."""
+ assert "admin" in skill.description.lower()
+
+ def test_skill_has_instructions(self, skill):
+ """Test that skill instructions contain expected content."""
+ instructions = skill.get_instructions()
+ assert "reservation" in instructions.lower()
+ assert "BI Engine" in instructions
+ assert "monitoring" in instructions.lower()
+
+ def test_skill_has_scripts(self, skill):
+ """Test that skill has helper scripts."""
+ scripts = skill.list_scripts()
+ assert "cost_report.py" in scripts
+
+
+class TestBigQueryIntegrationSkill:
+ """Tests for bigquery-integration skill."""
+
+ @pytest.fixture
+ def skill(self):
+ """Load the skill from the skills directory."""
+ skill_path = SKILLS_DIR / "bigquery-integration"
+ return MarkdownSkill.from_directory(skill_path)
+
+ def test_skill_loads_successfully(self, skill):
+ """Test that the skill loads without errors."""
+ assert skill is not None
+ assert skill.name == "bigquery-integration"
+
+ def test_skill_has_description(self, skill):
+ """Test that the skill has a proper description."""
+ assert "integrate" in skill.description.lower()
+
+ def test_skill_has_instructions(self, skill):
+ """Test that skill instructions contain expected content."""
+ instructions = skill.get_instructions()
+ assert "Python" in instructions
+ assert "JDBC" in instructions
+ assert "REST API" in instructions
+
+ def test_skill_has_scripts(self, skill):
+ """Test that skill has helper scripts."""
+ scripts = skill.list_scripts()
+ assert "connection_test.py" in scripts
+
+
+class TestBigQuerySkillsIntegration:
+ """Integration tests for all BigQuery skills."""
+
+ @pytest.fixture
+ def skill_names(self):
+ """List of all BigQuery skill directory names."""
+ return [
+ "bigquery-data-management",
+ "bigquery-analytics",
+ "bigquery-storage",
+ "bigquery-governance",
+ "bigquery-admin",
+ "bigquery-integration",
+ "bqml",
+ "bigquery-ai",
+ ]
+
+ def test_all_skills_exist(self, skill_names):
+ """Test that all expected skill directories exist."""
+ for name in skill_names:
+ skill_path = SKILLS_DIR / name
+ assert skill_path.exists(), f"Skill directory {name} not found"
+ assert (skill_path / "SKILL.md").exists(), f"SKILL.md not found in {name}"
+
+ def test_all_skills_load(self, skill_names):
+ """Test that all skills can be loaded."""
+ for name in skill_names:
+ skill_path = SKILLS_DIR / name
+ skill = MarkdownSkill.from_directory(skill_path)
+ assert skill.name == name
+
+ def test_all_skills_have_unique_names(self, skill_names):
+ """Test that all skills have unique names."""
+ loaded_names = []
+ for name in skill_names:
+ skill_path = SKILLS_DIR / name
+ skill = MarkdownSkill.from_directory(skill_path)
+ loaded_names.append(skill.name)
+
+ assert len(loaded_names) == len(
+ set(loaded_names)
+ ), "Duplicate skill names found"
+
+ def test_all_skills_have_valid_metadata(self, skill_names):
+ """Test that all skills have valid metadata."""
+ for name in skill_names:
+ skill_path = SKILLS_DIR / name
+ skill = MarkdownSkill.from_directory(skill_path)
+
+ assert skill.name, f"Skill {name} missing name"
+ assert skill.description, f"Skill {name} missing description"
+ assert len(skill.description) >= 50, f"Skill {name} description too short"
+
+ def test_skill_prompt_generation(self, skill_names):
+ """Test that all skills can generate prompts."""
+ for name in skill_names:
+ skill_path = SKILLS_DIR / name
+ skill = MarkdownSkill.from_directory(skill_path)
+ prompt = skill.get_skill_prompt()
+
+ assert skill.name in prompt
+ assert len(prompt) > 50
+
+
+class TestSkillToolDeclarations:
+ """Tests for skill tool declarations."""
+
+ def test_data_management_tool_declarations(self):
+ """Test data management skill tool declarations."""
+ skill = MarkdownSkill.from_directory(
+ SKILLS_DIR / "bigquery-data-management"
+ )
+ declarations = skill.get_tool_declarations()
+
+ # Should have auto-generated declarations from scripts
+ tool_names = [d["name"] for d in declarations]
+ assert any("validate_schema" in name for name in tool_names)
+
+ def test_analytics_tool_declarations(self):
+ """Test analytics skill tool declarations."""
+ skill = MarkdownSkill.from_directory(SKILLS_DIR / "bigquery-analytics")
+ declarations = skill.get_tool_declarations()
+
+ tool_names = [d["name"] for d in declarations]
+ assert any("query_analyzer" in name for name in tool_names)
+
+ def test_declarations_have_descriptions(self):
+ """Test that all tool declarations have descriptions."""
+ skill_names = [
+ "bigquery-data-management",
+ "bigquery-analytics",
+ "bigquery-admin",
+ "bigquery-governance",
+ "bigquery-integration",
+ ]
+
+ for name in skill_names:
+ skill = MarkdownSkill.from_directory(SKILLS_DIR / name)
+ declarations = skill.get_tool_declarations()
+
+ for decl in declarations:
+ assert "name" in decl, f"Tool in {name} missing name"
+ assert (
+ "description" in decl
+ ), f"Tool {decl.get('name')} in {name} missing description"
diff --git a/tests/unittests/skills/test_markdown_skill.py b/tests/unittests/skills/test_markdown_skill.py
new file mode 100644
index 0000000000..e60c271d1c
--- /dev/null
+++ b/tests/unittests/skills/test_markdown_skill.py
@@ -0,0 +1,448 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for MarkdownSkill class."""
+
+from __future__ import annotations
+
+from pathlib import Path
+import tempfile
+
+from google.adk.skills import MarkdownSkill
+from google.adk.skills import MarkdownSkillMetadata
+import pytest
+
+
+class TestMarkdownSkillMetadata:
+ """Tests for MarkdownSkillMetadata."""
+
+ def test_required_fields(self):
+ """Test that name and description are required."""
+ metadata = MarkdownSkillMetadata(
+ name="test-skill",
+ description="A test skill",
+ )
+
+ assert metadata.name == "test-skill"
+ assert metadata.description == "A test skill"
+
+ def test_optional_fields(self):
+ """Test optional fields."""
+ metadata = MarkdownSkillMetadata(
+ name="test-skill",
+ description="A test skill",
+ license="Apache-2.0",
+ compatibility="Python 3.8+",
+ metadata={"author": "test", "version": "1.0"},
+ )
+
+ assert metadata.license == "Apache-2.0"
+ assert metadata.compatibility == "Python 3.8+"
+ assert metadata.metadata["author"] == "test"
+
+ def test_adk_extensions(self):
+ """Test ADK-specific extensions."""
+ metadata = MarkdownSkillMetadata(
+ name="test-skill",
+ description="A test skill",
+ adk={
+ "config": {
+ "timeout_seconds": 120,
+ "allow_network": True,
+ },
+ "allowed_callers": ["custom_caller"],
+ },
+ )
+
+ assert metadata.adk["config"]["timeout_seconds"] == 120
+ assert metadata.adk["config"]["allow_network"] is True
+
+
+class TestMarkdownSkill:
+ """Tests for MarkdownSkill."""
+
+ @pytest.fixture
+ def skill_dir(self):
+ """Create a temporary skill directory."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ skill_path = Path(tmpdir) / "test-skill"
+ skill_path.mkdir()
+
+ # Create SKILL.md
+ skill_md = skill_path / "SKILL.md"
+ skill_md.write_text("""---
+name: test-skill
+description: A test skill for unit testing.
+license: Apache-2.0
+compatibility: Python 3.8+
+metadata:
+ author: test-author
+ version: "1.0"
+---
+
+# Test Skill
+
+## When to use this skill
+Use this skill for testing.
+
+## Instructions
+1. Step one
+2. Step two
+
+## Examples
+Example usage here.
+""")
+
+ # Create scripts directory
+ scripts_dir = skill_path / "scripts"
+ scripts_dir.mkdir()
+
+ # Create a Python script
+ (scripts_dir / "process.py").write_text('''"""Process data."""
+import sys
+print("Processing...")
+''')
+
+ # Create a shell script
+ (scripts_dir / "run.sh").write_text("""# Run something
+echo "Running"
+""")
+
+ # Create references directory
+ refs_dir = skill_path / "references"
+ refs_dir.mkdir()
+ (refs_dir / "REFERENCE.md").write_text(
+ "# Reference\n\nReference content."
+ )
+
+ # Create assets directory
+ assets_dir = skill_path / "assets"
+ assets_dir.mkdir()
+ (assets_dir / "template.txt").write_text("Template content")
+
+ yield skill_path
+
+ def test_from_directory(self, skill_dir):
+ """Test loading a skill from directory."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ assert skill.name == "test-skill"
+ assert skill.description == "A test skill for unit testing."
+ assert skill.skill_metadata.license == "Apache-2.0"
+ assert skill.skill_metadata.metadata["author"] == "test-author"
+
+ def test_from_directory_not_found(self):
+ """Test loading from non-existent directory."""
+ with pytest.raises(FileNotFoundError):
+ MarkdownSkill.from_directory("/nonexistent/path")
+
+ def test_from_directory_no_skill_md(self):
+ """Test loading from directory without SKILL.md."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ skill_path = Path(tmpdir) / "empty-skill"
+ skill_path.mkdir()
+
+ with pytest.raises(FileNotFoundError, match="SKILL.md not found"):
+ MarkdownSkill.from_directory(skill_path)
+
+ def test_from_directory_name_mismatch(self):
+ """Test that name must match directory name."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ skill_path = Path(tmpdir) / "wrong-name"
+ skill_path.mkdir()
+
+ skill_md = skill_path / "SKILL.md"
+ skill_md.write_text("""---
+name: correct-name
+description: Test skill
+---
+
+Content
+""")
+
+ with pytest.raises(ValueError, match="must match directory name"):
+ MarkdownSkill.from_directory(skill_path)
+
+ def test_from_directory_skip_name_validation(self):
+ """Test skipping name validation."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ skill_path = Path(tmpdir) / "wrong-name"
+ skill_path.mkdir()
+
+ skill_md = skill_path / "SKILL.md"
+ skill_md.write_text("""---
+name: correct-name
+description: Test skill
+---
+
+Content
+""")
+
+ # Should not raise when validation is disabled
+ skill = MarkdownSkill.from_directory(skill_path, validate_name=False)
+ assert skill.name == "correct-name"
+
+ def test_get_instructions(self, skill_dir):
+ """Test getting full instructions (Stage 2)."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ instructions = skill.get_instructions()
+
+ assert "# Test Skill" in instructions
+ assert "## When to use this skill" in instructions
+ assert "## Instructions" in instructions
+ assert skill.current_stage == 2
+
+ def test_get_script(self, skill_dir):
+ """Test getting script content (Stage 3)."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ script = skill.get_script("process.py")
+
+ assert script is not None
+ assert "Process data" in script
+ assert skill.current_stage == 3
+
+ def test_get_script_not_found(self, skill_dir):
+ """Test getting non-existent script."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ script = skill.get_script("nonexistent.py")
+
+ assert script is None
+
+ def test_get_script_path(self, skill_dir):
+ """Test getting script path."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ path = skill.get_script_path("process.py")
+
+ assert path is not None
+ assert path.exists()
+ assert path.name == "process.py"
+
+ def test_get_reference(self, skill_dir):
+ """Test getting reference content (Stage 3)."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ ref = skill.get_reference("REFERENCE.md")
+
+ assert ref is not None
+ assert "Reference content" in ref
+ assert skill.current_stage == 3
+
+ def test_get_reference_not_found(self, skill_dir):
+ """Test getting non-existent reference."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ ref = skill.get_reference("nonexistent.md")
+
+ assert ref is None
+
+ def test_get_asset_path(self, skill_dir):
+ """Test getting asset path."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ path = skill.get_asset_path("template.txt")
+
+ assert path is not None
+ assert path.exists()
+
+ def test_list_scripts(self, skill_dir):
+ """Test listing available scripts."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ scripts = skill.list_scripts()
+
+ assert "process.py" in scripts
+ assert "run.sh" in scripts
+
+ def test_list_references(self, skill_dir):
+ """Test listing available references."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ refs = skill.list_references()
+
+ assert "REFERENCE.md" in refs
+
+ def test_list_assets(self, skill_dir):
+ """Test listing available assets."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ assets = skill.list_assets()
+
+ assert "template.txt" in assets
+
+ def test_has_scripts(self, skill_dir):
+ """Test checking for scripts."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ assert skill.has_scripts() is True
+
+ def test_has_references(self, skill_dir):
+ """Test checking for references."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ assert skill.has_references() is True
+
+ def test_has_assets(self, skill_dir):
+ """Test checking for assets."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ assert skill.has_assets() is True
+
+ def test_get_tool_declarations(self, skill_dir):
+ """Test getting tool declarations."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ declarations = skill.get_tool_declarations()
+
+ # Should have declarations for scripts
+ names = [d["name"] for d in declarations]
+ assert any("process" in name for name in names)
+ assert any("run" in name for name in names)
+
+ def test_get_orchestration_template(self, skill_dir):
+ """Test getting orchestration template."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ template = skill.get_orchestration_template()
+
+ assert "async def" in template
+ assert "tools" in template
+
+ def test_get_skill_prompt(self, skill_dir):
+ """Test getting skill prompt."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ prompt = skill.get_skill_prompt()
+
+ assert skill.name in prompt
+ assert skill.description in prompt
+ assert "Scripts" in prompt
+ assert "References" in prompt
+
+ def test_progressive_disclosure_stages(self, skill_dir):
+ """Test progressive disclosure stages."""
+ skill = MarkdownSkill.from_directory(skill_dir)
+
+ # Stage 1: Only metadata loaded
+ assert skill.current_stage == 1
+
+ # Stage 2: Instructions loaded
+ skill.get_instructions()
+ assert skill.current_stage == 2
+
+ # Stage 3: Resources loaded
+ skill.get_script("process.py")
+ assert skill.current_stage == 3
+
+
+class TestMarkdownSkillFrontmatter:
+ """Tests for frontmatter parsing."""
+
+ def test_missing_frontmatter(self):
+ """Test handling missing frontmatter."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ skill_path = Path(tmpdir) / "no-frontmatter"
+ skill_path.mkdir()
+
+ skill_md = skill_path / "SKILL.md"
+ skill_md.write_text("# No Frontmatter\n\nContent without frontmatter.")
+
+ with pytest.raises(ValueError, match="must start with YAML frontmatter"):
+ MarkdownSkill.from_directory(skill_path)
+
+ def test_missing_name(self):
+ """Test handling missing name field."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ skill_path = Path(tmpdir) / "missing-name"
+ skill_path.mkdir()
+
+ skill_md = skill_path / "SKILL.md"
+ skill_md.write_text("""---
+description: Test skill
+---
+
+Content
+""")
+
+ with pytest.raises(ValueError, match="Missing required field 'name'"):
+ MarkdownSkill.from_directory(skill_path)
+
+ def test_missing_description(self):
+ """Test handling missing description field."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ skill_path = Path(tmpdir) / "missing-desc"
+ skill_path.mkdir()
+
+ skill_md = skill_path / "SKILL.md"
+ skill_md.write_text("""---
+name: missing-desc
+---
+
+Content
+""")
+
+ with pytest.raises(
+ ValueError, match="Missing required field 'description'"
+ ):
+ MarkdownSkill.from_directory(skill_path)
+
+ def test_invalid_yaml(self):
+ """Test handling invalid YAML."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ skill_path = Path(tmpdir) / "invalid-yaml"
+ skill_path.mkdir()
+
+ skill_md = skill_path / "SKILL.md"
+ skill_md.write_text("""---
+name: [invalid
+description: yaml
+---
+
+Content
+""")
+
+ with pytest.raises(ValueError, match="Invalid YAML"):
+ MarkdownSkill.from_directory(skill_path)
+
+ def test_adk_config_extension(self):
+ """Test ADK config extension in frontmatter."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ skill_path = Path(tmpdir) / "adk-config"
+ skill_path.mkdir()
+
+ skill_md = skill_path / "SKILL.md"
+ skill_md.write_text("""---
+name: adk-config
+description: Test ADK config
+adk:
+ config:
+ timeout_seconds: 120
+ allow_network: true
+ memory_limit_mb: 512
+ allowed_callers:
+ - custom_caller
+---
+
+Content
+""")
+
+ skill = MarkdownSkill.from_directory(skill_path)
+
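+ # Values under adk.config map onto skill.config; adk.allowed_callers feeds allowed_callers.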
+ assert skill.config.timeout_seconds == 120
+ assert skill.config.allow_network is True
+ assert skill.config.memory_limit_mb == 512
+ assert "custom_caller" in skill.allowed_callers
diff --git a/tests/unittests/skills/test_script_executor.py b/tests/unittests/skills/test_script_executor.py
new file mode 100644
index 0000000000..ecbbf06a31
--- /dev/null
+++ b/tests/unittests/skills/test_script_executor.py
@@ -0,0 +1,308 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for ScriptExecutor class."""
+
+from __future__ import annotations
+
+from pathlib import Path
+import tempfile
+
+from google.adk.skills import ScriptExecutionResult
+from google.adk.skills import ScriptExecutor
+import pytest
+
+
+class TestScriptExecutionResult:
+ """Tests for ScriptExecutionResult."""
+
+ def test_default_values(self):
+ """Test default values."""
+ result = ScriptExecutionResult(success=True)
+
+ assert result.success is True
+ assert result.stdout == ""
+ assert result.stderr == ""
+ assert result.return_code == 0
+ assert result.execution_time_ms == 0.0
+ assert result.timed_out is False
+
+ def test_custom_values(self):
+ """Test custom values."""
+ result = ScriptExecutionResult(
+ success=False,
+ stdout="output",
+ stderr="error",
+ return_code=1,
+ execution_time_ms=100.5,
+ timed_out=True,
+ )
+
+ assert result.success is False
+ assert result.stdout == "output"
+ assert result.stderr == "error"
+ assert result.return_code == 1
+ assert result.execution_time_ms == 100.5
+ assert result.timed_out is True
+
+
+class TestScriptExecutor:
+ """Tests for ScriptExecutor."""
+
+ @pytest.fixture
+ def executor(self):
+ """Create a default executor."""
+ return ScriptExecutor()
+
+ @pytest.fixture
+ def python_script(self):
+ """Create a temporary Python script."""
+ with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
+ f.write("""#!/usr/bin/env python3
+import sys
+print("Hello from Python!")
+print("Args:", sys.argv[1:])
+""")
+ f.flush()
+ script_path = Path(f.name)
+ yield script_path
+ script_path.unlink(missing_ok=True)
+
+ @pytest.fixture
+ def failing_script(self):
+ """Create a Python script that fails."""
+ with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
+ f.write("""#!/usr/bin/env python3
+import sys
+print("Error message", file=sys.stderr)
+sys.exit(1)
+""")
+ f.flush()
+ script_path = Path(f.name)
+ yield script_path
+ script_path.unlink(missing_ok=True)
+
+ @pytest.fixture
+ def slow_script(self):
+ """Create a slow Python script."""
+ with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
+ f.write("""#!/usr/bin/env python3
+import time
+time.sleep(10)
+print("Done")
+""")
+ f.flush()
+ script_path = Path(f.name)
+ yield script_path
+ script_path.unlink(missing_ok=True)
+
+ @pytest.fixture
+ def shell_script(self):
+ """Create a temporary shell script."""
+ with tempfile.NamedTemporaryFile(mode="w", suffix=".sh", delete=False) as f:
+ f.write("""#!/bin/bash
+echo "Hello from Bash!"
+echo "Args: $@"
+""")
+ f.flush()
+ script_path = Path(f.name)
+ yield script_path
+ script_path.unlink(missing_ok=True)
+
+ def test_default_config(self, executor):
+ """Test default configuration."""
+ assert executor.timeout_seconds == 60.0
+ assert executor.allow_network is False
+ assert executor.memory_limit_mb == 256
+ assert executor.use_container is False
+
+ def test_custom_config(self):
+ """Test custom configuration."""
+ executor = ScriptExecutor(
+ timeout_seconds=30.0,
+ allow_network=True,
+ memory_limit_mb=512,
+ )
+
+ assert executor.timeout_seconds == 30.0
+ assert executor.allow_network is True
+ assert executor.memory_limit_mb == 512
+
+ @pytest.mark.asyncio
+ async def test_execute_python_script(self, executor, python_script):
+ """Test executing a Python script."""
+ result = await executor.execute_script(python_script)
+
+ assert result.success is True
+ assert "Hello from Python!" in result.stdout
+ assert result.return_code == 0
+ assert result.execution_time_ms > 0
+
+ @pytest.mark.asyncio
+ async def test_execute_python_script_with_args(self, executor, python_script):
+ """Test executing a Python script with arguments."""
+ result = await executor.execute_script(
+ python_script,
+ args=["arg1", "arg2"],
+ )
+
+ assert result.success is True
+ assert "arg1" in result.stdout
+ assert "arg2" in result.stdout
+
+ @pytest.mark.asyncio
+ async def test_execute_shell_script(self, executor, shell_script):
+ """Test executing a shell script."""
+ result = await executor.execute_script(shell_script)
+
+ assert result.success is True
+ assert "Hello from Bash!" in result.stdout
+
+ @pytest.mark.asyncio
+ async def test_execute_failing_script(self, executor, failing_script):
+ """Test executing a failing script."""
+ result = await executor.execute_script(failing_script)
+
+ assert result.success is False
+ assert result.return_code == 1
+ assert "Error message" in result.stderr
+
+ @pytest.mark.asyncio
+ async def test_execute_with_timeout(self, slow_script):
+ """Test script timeout."""
+ executor = ScriptExecutor(timeout_seconds=0.5)
+
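+ # The fixture script sleeps for 10 seconds, far longer than the 0.5s limit, so it must time out.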
+ result = await executor.execute_script(slow_script)
+
+ assert result.success is False
+ assert result.timed_out is True
+ assert "timed out" in result.stderr.lower()
+
+ @pytest.mark.asyncio
+ async def test_execute_nonexistent_script(self, executor):
+ """Test executing non-existent script."""
+ with pytest.raises(FileNotFoundError):
+ await executor.execute_script("/nonexistent/script.py")
+
+ @pytest.mark.asyncio
+ async def test_execute_unsupported_script_type(self, executor):
+ """Test executing unsupported script type."""
+ with tempfile.NamedTemporaryFile(suffix=".unknown") as f:
+ with pytest.raises(ValueError, match="Unsupported script type"):
+ await executor.execute_script(f.name)
+
+ @pytest.mark.asyncio
+ async def test_execute_with_working_dir(self, executor, python_script):
+ """Test executing with custom working directory."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ result = await executor.execute_script(
+ python_script,
+ working_dir=tmpdir,
+ )
+
+ assert result.success is True
+
+ @pytest.mark.asyncio
+ async def test_execute_with_env(self, executor):
+ """Test executing with custom environment variables."""
+ with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
+ f.write("""#!/usr/bin/env python3
+import os
+print(os.environ.get("TEST_VAR", "not found"))
+""")
+ script_path = Path(f.name)
+
+ result = await executor.execute_script(
+ script_path,
+ env={"TEST_VAR": "test_value"},
+ )
+
+ assert result.success is True
+ assert "test_value" in result.stdout
+
+ @pytest.mark.asyncio
+ async def test_execute_with_stdin(self, executor):
+ """Test executing with stdin input."""
+ with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
+ f.write("""#!/usr/bin/env python3
+import sys
+data = sys.stdin.read()
+print(f"Received: {data}")
+""")
+ script_path = Path(f.name)
+
+ result = await executor.execute_script(
+ script_path,
+ stdin="hello world",
+ )
+
+ assert result.success is True
+ assert "Received: hello world" in result.stdout
+
+ def test_check_interpreter_available(self, executor):
+ """Test checking interpreter availability."""
+ # Python should be available
+ assert executor.check_interpreter_available(".py") is True
+
+ # Unknown extension should not be available
+ assert executor.check_interpreter_available(".unknown") is False
+
+ def test_get_available_interpreters(self, executor):
+ """Test getting available interpreters."""
+ interpreters = executor.get_available_interpreters()
+
+ # Python should be available on most systems
+ assert ".py" in interpreters
+
+ @pytest.mark.asyncio
+ async def test_output_truncation(self):
+ """Test that large output is truncated."""
+ executor = ScriptExecutor(max_output_size=100)
+
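+ # The script below prints 1000 characters, far more than the configured max_output_size of 100.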
+ with tempfile.NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
+ f.write("""#!/usr/bin/env python3
+print("x" * 1000)
+""")
+ script_path = Path(f.name)
+
+ result = await executor.execute_script(script_path)
+
+ assert result.success is True
+ assert len(result.stdout) < 1000
+ assert "truncated" in result.stdout.lower()
+
+
+class TestScriptExecutorValidation:
+ """Tests for ScriptExecutor validation."""
+
+ def test_invalid_timeout(self):
+ """Test that invalid timeout is rejected."""
+ with pytest.raises(ValueError):
+ ScriptExecutor(timeout_seconds=0)
+
+ with pytest.raises(ValueError):
+ ScriptExecutor(timeout_seconds=-1)
+
+ def test_invalid_memory_limit(self):
+ """Test that invalid memory limit is rejected."""
+ with pytest.raises(ValueError):
+ ScriptExecutor(memory_limit_mb=0)
+
+ with pytest.raises(ValueError):
+ ScriptExecutor(memory_limit_mb=-1)
+
+ def test_invalid_max_output_size(self):
+ """Test that invalid max output size is rejected."""
+ with pytest.raises(ValueError):
+ ScriptExecutor(max_output_size=0)
diff --git a/tests/unittests/skills/test_skill_manager.py b/tests/unittests/skills/test_skill_manager.py
new file mode 100644
index 0000000000..4652e9077a
--- /dev/null
+++ b/tests/unittests/skills/test_skill_manager.py
@@ -0,0 +1,259 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for skills manager."""
+
+from __future__ import annotations
+
+from typing import Any
+from typing import List
+
+from google.adk.skills import BaseSkill
+from google.adk.skills import SkillInvocationResult
+from google.adk.skills import SkillsManager
+import pytest
+
+
+class MockSkill(BaseSkill):
+ """Mock skill for testing."""
+
+ name: str = "mock_skill"
+ description: str = "A mock skill for testing"
+
+ def get_tool_declarations(self) -> List[dict[str, Any]]:
+ return [{"name": "mock_tool", "description": "A mock tool"}]
+
+ def get_orchestration_template(self) -> str:
+ return "async def mock(tools): pass"
+
+
+class MockSkill2(BaseSkill):
+ """Second mock skill for testing."""
+
+ name: str = "mock_skill_2"
+ description: str = "Another mock skill"
+
+ def get_tool_declarations(self) -> List[dict[str, Any]]:
+ return [{"name": "mock_tool_2", "description": "Another mock tool"}]
+
+ def get_orchestration_template(self) -> str:
+ return "async def mock2(tools): pass"
+
+
+class TestSkillsManager:
+ """Tests for SkillsManager."""
+
+ def test_register_skill(self):
+ """Test registering a skill."""
+ manager = SkillsManager()
+ skill = MockSkill()
+
+ manager.register_skill(skill)
+
+ assert manager.has_skill("mock_skill")
+ assert manager.get_skill("mock_skill") == skill
+
+ def test_register_duplicate_skill_raises(self):
+ """Test that registering a duplicate skill raises ValueError."""
+ manager = SkillsManager()
+ skill1 = MockSkill()
+ skill2 = MockSkill()
+
+ manager.register_skill(skill1)
+
+ with pytest.raises(ValueError, match="already registered"):
+ manager.register_skill(skill2)
+
+ def test_unregister_skill(self):
+ """Test unregistering a skill."""
+ manager = SkillsManager()
+ skill = MockSkill()
+
+ manager.register_skill(skill)
+ result = manager.unregister_skill("mock_skill")
+
+ assert result is True
+ assert not manager.has_skill("mock_skill")
+
+ def test_unregister_nonexistent_skill(self):
+ """Test unregistering a skill that doesn't exist."""
+ manager = SkillsManager()
+
+ result = manager.unregister_skill("nonexistent")
+
+ assert result is False
+
+ def test_get_skill(self):
+ """Test getting a skill by name."""
+ manager = SkillsManager()
+ skill = MockSkill()
+ manager.register_skill(skill)
+
+ retrieved = manager.get_skill("mock_skill")
+
+ assert retrieved == skill
+
+ def test_get_nonexistent_skill(self):
+ """Test getting a skill that doesn't exist."""
+ manager = SkillsManager()
+
+ result = manager.get_skill("nonexistent")
+
+ assert result is None
+
+ def test_get_all_skills(self):
+ """Test getting all registered skills."""
+ manager = SkillsManager()
+ skill1 = MockSkill()
+ skill2 = MockSkill2()
+
+ manager.register_skill(skill1)
+ manager.register_skill(skill2)
+
+ skills = manager.get_all_skills()
+
+ assert len(skills) == 2
+ assert skill1 in skills
+ assert skill2 in skills
+
+ def test_get_skill_names(self):
+ """Test getting all skill names."""
+ manager = SkillsManager()
+ skill1 = MockSkill()
+ skill2 = MockSkill2()
+
+ manager.register_skill(skill1)
+ manager.register_skill(skill2)
+
+ names = manager.get_skill_names()
+
+ assert set(names) == {"mock_skill", "mock_skill_2"}
+
+ def test_has_skill(self):
+ """Test checking if a skill exists."""
+ manager = SkillsManager()
+ skill = MockSkill()
+
+ assert not manager.has_skill("mock_skill")
+
+ manager.register_skill(skill)
+
+ assert manager.has_skill("mock_skill")
+
+ def test_clear(self):
+ """Test clearing all skills."""
+ manager = SkillsManager()
+ manager.register_skill(MockSkill())
+ manager.register_skill(MockSkill2())
+
+ manager.clear()
+
+ assert len(manager.get_all_skills()) == 0
+
+
+class TestSkillInvocationResult:
+ """Tests for SkillInvocationResult."""
+
+ def test_success_result(self):
+ """Test creating a success result."""
+ result = SkillInvocationResult(
+ success=True,
+ result={"data": "value"},
+ execution_time_ms=100.5,
+ )
+
+ assert result.success is True
+ assert result.result == {"data": "value"}
+ assert result.error is None
+ assert result.execution_time_ms == 100.5
+
+ def test_error_result(self):
+ """Test creating an error result."""
+ result = SkillInvocationResult(
+ success=False,
+ error="Something went wrong",
+ execution_time_ms=50.0,
+ )
+
+ assert result.success is False
+ assert result.result is None
+ assert result.error == "Something went wrong"
+
+ def test_default_values(self):
+ """Test default values."""
+ result = SkillInvocationResult(success=True)
+
+ assert result.result is None
+ assert result.error is None
+ assert result.call_log == []
+ assert result.execution_time_ms == 0.0
+
+
+class FilteringSkill(BaseSkill):
+ """Skill with custom filtering for testing."""
+
+ name: str = "filtering_skill"
+ description: str = "A skill that filters results"
+
+ def get_tool_declarations(self) -> List[dict[str, Any]]:
+ return []
+
+ def get_orchestration_template(self) -> str:
+ return "async def f(tools): pass"
+
+ def filter_result(self, result: Any) -> Any:
+ if isinstance(result, dict):
+ return {k: v for k, v in result.items() if not k.startswith("_")}
+ return result
+
+
+class TestSkillsManagerExecution:
+ """Tests for SkillsManager.execute_skill."""
+
+ @pytest.mark.asyncio
+ async def test_execute_skill_success(self):
+ """Test executing a skill successfully."""
+ skill = FilteringSkill()
+ tool_results = {"public": "data", "_private": "secret"}
+
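+ # FilteringSkill.filter_result drops any keys that start with an underscore.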
+ result = await SkillsManager.execute_skill(skill, tool_results)
+
+ assert result.success is True
+ assert result.result == {"public": "data"}
+ assert "_private" not in result.result
+ assert result.execution_time_ms > 0
+
+ @pytest.mark.asyncio
+ async def test_execute_skill_with_error(self):
+ """Test executing a skill that raises an error."""
+
+ class ErrorSkill(BaseSkill):
+ name: str = "error_skill"
+ description: str = "A skill that errors"
+
+ def get_tool_declarations(self) -> List[dict[str, Any]]:
+ return []
+
+ def get_orchestration_template(self) -> str:
+ return ""
+
+ def filter_result(self, result: Any) -> Any:
+ raise ValueError("Filter error")
+
+ skill = ErrorSkill()
+
+ result = await SkillsManager.execute_skill(skill, {"data": "value"})
+
+ assert result.success is False
+ assert "Filter error" in result.error
diff --git a/tests/unittests/skills/test_skill_tool.py b/tests/unittests/skills/test_skill_tool.py
new file mode 100644
index 0000000000..2c52cc4d4f
--- /dev/null
+++ b/tests/unittests/skills/test_skill_tool.py
@@ -0,0 +1,412 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Unit tests for SkillTool class."""
+
+from __future__ import annotations
+
+from pathlib import Path
+import tempfile
+from typing import Any
+from typing import List
+from unittest.mock import AsyncMock
+from unittest.mock import MagicMock
+
+from google.adk.skills import BaseSkill
+from google.adk.skills import create_skill_tools
+from google.adk.skills import MarkdownSkill
+from google.adk.skills import ScriptExecutor
+from google.adk.skills import SkillTool
+import pytest
+
+
+class ConcreteSkill(BaseSkill):
+ """Concrete skill for testing."""
+
+ def get_tool_declarations(self) -> List[dict[str, Any]]:
+ return [{"name": "test_tool", "description": "A test tool"}]
+
+ def get_orchestration_template(self) -> str:
+ return "async def test(tools): pass"
+
+
+class TestSkillTool:
+ """Tests for SkillTool."""
+
+ @pytest.fixture
+ def skill_dir(self):
+ """Create a temporary skill directory."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ skill_path = Path(tmpdir) / "test-skill"
+ skill_path.mkdir()
+
+ # Create SKILL.md
+ skill_md = skill_path / "SKILL.md"
+ skill_md.write_text("""---
+name: test-skill
+description: A test skill for unit testing.
+---
+
+# Test Skill
+
+## Instructions
+Follow these instructions to use the skill.
+""")
+
+ # Create scripts directory
+ scripts_dir = skill_path / "scripts"
+ scripts_dir.mkdir()
+
+ # Create a Python script
+ (scripts_dir / "hello.py").write_text('''"""Say hello."""
+print("Hello, World!")
+''')
+
+ # Create references directory
+ refs_dir = skill_path / "references"
+ refs_dir.mkdir()
+ (refs_dir / "GUIDE.md").write_text("# Guide\n\nReference guide content.")
+
+ yield skill_path
+
+ @pytest.fixture
+ def markdown_skill(self, skill_dir):
+ """Create a MarkdownSkill from the test directory."""
+ return MarkdownSkill.from_directory(skill_dir)
+
+ @pytest.fixture
+ def concrete_skill(self):
+ """Create a concrete BaseSkill."""
+ return ConcreteSkill(
+ name="concrete-skill",
+ description="A concrete test skill",
+ )
+
+ @pytest.fixture
+ def tool_context(self):
+ """Create a mock tool context."""
+ context = MagicMock()
+ context.invocation_context = MagicMock()
+ return context
+
+ def test_create_from_markdown_skill(self, markdown_skill):
+ """Test creating SkillTool from MarkdownSkill."""
+ tool = SkillTool(markdown_skill)
+
+ assert tool.name == "skill_test_skill"
+ assert "test skill for unit testing" in tool.description
+ assert tool.skill == markdown_skill
+
+ def test_create_from_concrete_skill(self, concrete_skill):
+ """Test creating SkillTool from concrete BaseSkill."""
+ tool = SkillTool(concrete_skill)
+
+ assert tool.name == "skill_concrete_skill"
+ assert "concrete test skill" in tool.description
+
+ def test_description_includes_actions(self, markdown_skill):
+ """Test that description includes available actions."""
+ tool = SkillTool(markdown_skill)
+
+ assert "activate" in tool.description
+ assert "run_script" in tool.description
+ assert "load_reference" in tool.description
+
+ def test_get_declaration(self, markdown_skill):
+ """Test getting function declaration."""
+ tool = SkillTool(markdown_skill)
+
+ declaration = tool._get_declaration()
+
+ assert declaration is not None
+ assert declaration.name == "skill_test_skill"
+ assert "action" in declaration.parameters.properties
+
+ @pytest.mark.asyncio
+ async def test_activate_action(self, markdown_skill, tool_context):
+ """Test activate action."""
+ tool = SkillTool(markdown_skill)
+
+ result = await tool.run_async(
+ args={"action": "activate"},
+ tool_context=tool_context,
+ )
+
+ assert result["status"] == "activated"
+ assert result["skill"] == "test-skill"
+ assert "Instructions" in result["instructions"]
+ assert "hello.py" in result["available_scripts"]
+ assert "GUIDE.md" in result["available_references"]
+
+ @pytest.mark.asyncio
+ async def test_activate_action_concrete_skill(
+ self, concrete_skill, tool_context
+ ):
+ """Test activate action with concrete skill."""
+ tool = SkillTool(concrete_skill)
+
+ result = await tool.run_async(
+ args={"action": "activate"},
+ tool_context=tool_context,
+ )
+
+ assert result["status"] == "activated"
+ assert result["skill"] == "concrete-skill"
+ assert "prompt" in result
+ assert "tool_declarations" in result
+
+ @pytest.mark.asyncio
+ async def test_run_script_action(self, markdown_skill, tool_context):
+ """Test run_script action."""
+ tool = SkillTool(markdown_skill)
+
+ result = await tool.run_async(
+ args={"action": "run_script", "script": "hello.py"},
+ tool_context=tool_context,
+ )
+
+ assert result["script"] == "hello.py"
+ assert result["success"] is True
+ assert "Hello, World!" in result["stdout"]
+
+ @pytest.mark.asyncio
+ async def test_run_script_with_args(self, skill_dir, tool_context):
+ """Test run_script action with arguments."""
+ # Create a script that uses arguments
+ scripts_dir = skill_dir / "scripts"
+ (scripts_dir / "greet.py").write_text('''"""Greet someone."""
+import sys
+name = sys.argv[1] if len(sys.argv) > 1 else "World"
+print(f"Hello, {name}!")
+''')
+
+ skill = MarkdownSkill.from_directory(skill_dir)
+ tool = SkillTool(skill)
+
+ result = await tool.run_async(
+ args={"action": "run_script", "script": "greet.py", "args": ["Alice"]},
+ tool_context=tool_context,
+ )
+
+ assert result["success"] is True
+ assert "Hello, Alice!" in result["stdout"]
+
+ @pytest.mark.asyncio
+ async def test_run_script_not_found(self, markdown_skill, tool_context):
+ """Test run_script action with non-existent script."""
+ tool = SkillTool(markdown_skill)
+
+ result = await tool.run_async(
+ args={"action": "run_script", "script": "nonexistent.py"},
+ tool_context=tool_context,
+ )
+
+ assert "error" in result
+ assert "not found" in result["error"].lower()
+ assert "available_scripts" in result
+
+ @pytest.mark.asyncio
+ async def test_run_script_no_script_name(self, markdown_skill, tool_context):
+ """Test run_script action without script name."""
+ tool = SkillTool(markdown_skill)
+
+ result = await tool.run_async(
+ args={"action": "run_script"},
+ tool_context=tool_context,
+ )
+
+ assert "error" in result
+ assert "Script name required" in result["error"]
+
+ @pytest.mark.asyncio
+ async def test_run_script_concrete_skill(self, concrete_skill, tool_context):
+ """Test run_script action with non-markdown skill."""
+ tool = SkillTool(concrete_skill)
+
+ result = await tool.run_async(
+ args={"action": "run_script", "script": "test.py"},
+ tool_context=tool_context,
+ )
+
+ assert "error" in result
+ assert "does not support scripts" in result["error"]
+
+ @pytest.mark.asyncio
+ async def test_load_reference_action(self, markdown_skill, tool_context):
+ """Test load_reference action."""
+ tool = SkillTool(markdown_skill)
+
+ result = await tool.run_async(
+ args={"action": "load_reference", "reference": "GUIDE.md"},
+ tool_context=tool_context,
+ )
+
+ assert result["reference"] == "GUIDE.md"
+ assert "Reference guide content" in result["content"]
+
+ @pytest.mark.asyncio
+ async def test_load_reference_not_found(self, markdown_skill, tool_context):
+ """Test load_reference action with non-existent reference."""
+ tool = SkillTool(markdown_skill)
+
+ result = await tool.run_async(
+ args={"action": "load_reference", "reference": "nonexistent.md"},
+ tool_context=tool_context,
+ )
+
+ assert "error" in result
+ assert "not found" in result["error"].lower()
+ assert "available_references" in result
+
+ @pytest.mark.asyncio
+ async def test_load_reference_no_name(self, markdown_skill, tool_context):
+ """Test load_reference action without reference name."""
+ tool = SkillTool(markdown_skill)
+
+ result = await tool.run_async(
+ args={"action": "load_reference"},
+ tool_context=tool_context,
+ )
+
+ assert "error" in result
+ assert "Reference name required" in result["error"]
+
+ @pytest.mark.asyncio
+ async def test_unknown_action(self, markdown_skill, tool_context):
+ """Test unknown action."""
+ tool = SkillTool(markdown_skill)
+
+ result = await tool.run_async(
+ args={"action": "unknown"},
+ tool_context=tool_context,
+ )
+
+ assert "error" in result
+ assert "Unknown action" in result["error"]
+ assert "available_actions" in result
+
+ @pytest.mark.asyncio
+ async def test_default_action_is_activate(self, markdown_skill, tool_context):
+ """Test that default action is activate."""
+ tool = SkillTool(markdown_skill)
+
+ result = await tool.run_async(
+ args={},
+ tool_context=tool_context,
+ )
+
+ assert result["status"] == "activated"
+
+ def test_custom_script_executor(self, markdown_skill):
+ """Test using custom script executor."""
+ executor = ScriptExecutor(timeout_seconds=30.0)
+ tool = SkillTool(markdown_skill, script_executor=executor)
+
+ assert tool._script_executor == executor
+
+ def test_custom_working_dir(self, markdown_skill):
+ """Test using custom working directory."""
+ with tempfile.TemporaryDirectory() as tmpdir:
+ tool = SkillTool(markdown_skill, working_dir=tmpdir)
+
+ # Resolve tmpdir to handle symlinks (e.g., /var -> /private/var on macOS); working_dir is expected to be stored resolved.
+ assert tool.working_dir == Path(tmpdir).resolve()
+
+ def test_working_dir_defaults_to_skill_path(self, markdown_skill):
+ """Test that working dir defaults to skill path."""
+ tool = SkillTool(markdown_skill)
+
+ assert tool.working_dir == markdown_skill.skill_path
+
+ def test_working_dir_setter(self, markdown_skill):
+ """Test setting working directory."""
+ tool = SkillTool(markdown_skill)
+
+ with tempfile.TemporaryDirectory() as tmpdir:
+ tool.working_dir = tmpdir
+ # Resolve tmpdir to handle symlinks (e.g., /var -> /private/var on macOS); working_dir is expected to be stored resolved.
+ assert tool.working_dir == Path(tmpdir).resolve()
+
+
+class TestCreateSkillTools:
+ """Tests for create_skill_tools function."""
+
+ def test_create_tools_from_skills(self):
+ """Test creating tools from multiple skills."""
+ skills = [
+ ConcreteSkill(name="skill-1", description="First skill"),
+ ConcreteSkill(name="skill-2", description="Second skill"),
+ ]
+
+ tools = create_skill_tools(skills)
+
+ assert len(tools) == 2
+ assert all(isinstance(t, SkillTool) for t in tools)
+ assert tools[0].name == "skill_skill_1"
+ assert tools[1].name == "skill_skill_2"
+
+ def test_create_tools_with_shared_executor(self):
+ """Test creating tools with shared executor."""
+ skills = [
+ ConcreteSkill(name="skill-1", description="First skill"),
+ ConcreteSkill(name="skill-2", description="Second skill"),
+ ]
+ executor = ScriptExecutor(timeout_seconds=30.0)
+
+ tools = create_skill_tools(skills, script_executor=executor)
+
+ assert all(t._script_executor == executor for t in tools)
+
+ def test_create_tools_with_working_dir(self):
+ """Test creating tools with shared working directory."""
+ skills = [
+ ConcreteSkill(name="skill-1", description="First skill"),
+ ]
+
+ with tempfile.TemporaryDirectory() as tmpdir:
+ tools = create_skill_tools(skills, working_dir=tmpdir)
+
+ # Resolve tmpdir to handle symlinks (e.g., /var -> /private/var on macOS); working_dir is expected to be stored resolved.
+ assert tools[0].working_dir == Path(tmpdir).resolve()
+
+ def test_create_tools_empty_list(self):
+ """Test creating tools from empty list."""
+ tools = create_skill_tools([])
+
+ assert len(tools) == 0
+
+
+class TestSkillToolNameSanitization:
+ """Tests for skill name sanitization in tool names."""
+
+ def test_hyphens_replaced(self):
+ """Test that hyphens are replaced with underscores."""
+ skill = ConcreteSkill(name="my-skill-name", description="Test")
+ tool = SkillTool(skill)
+
+ assert tool.name == "skill_my_skill_name"
+
+ def test_dots_replaced(self):
+ """Test that dots are replaced with underscores."""
+ skill = ConcreteSkill(name="my.skill.name", description="Test")
+ tool = SkillTool(skill)
+
+ assert tool.name == "skill_my_skill_name"
+
+ def test_mixed_special_chars(self):
+ """Test mixed special characters."""
+ skill = ConcreteSkill(name="my-skill.name", description="Test")
+ tool = SkillTool(skill)
+
+ assert tool.name == "skill_my_skill_name"