feat: Enable CLI support for content variants and custom extensions #46
base: main
Changes from all commits: 138a10c, 3b3c28c, e8704ab, 4f2b271
**Contributor:**

File seems useless? Is

**Author:**

Thanks for the feedback. The usage is currently indirect, which I agree is not very clear. I'll add clarification (or refactor) to make the purpose and usage of
@@ -0,0 +1,93 @@

```python
"""
SPARQL Queries for Databus Python Client

This module contains SPARQL queries used for interacting with the DBpedia Databus.
"""

# Query to fetch ontologies with proper content variant aggregation
# Uses GROUP_CONCAT to handle multiple content variants per distribution
ONTOLOGIES_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX databus: <https://databus.dbpedia.org/>
PREFIX dataid: <http://dataid.dbpedia.org/ns/core#>
PREFIX dataid-cv: <http://dataid.dbpedia.org/ns/cv#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT
  ?group ?art ?version ?title ?publisher ?comment ?description
  ?license ?file ?extension ?type ?bytes ?shasum
  (GROUP_CONCAT(DISTINCT ?variantStr; separator=", ") AS ?contentVariants)
WHERE {
  ?dataset dataid:account databus:ontologies .
  ?dataset dataid:group ?group .
  ?dataset dataid:artifact ?art .
  ?dataset dcat:distribution ?distribution .
  ?dataset dct:license ?license .
  ?dataset dct:publisher ?publisher .
  ?dataset rdfs:comment ?comment .
  ?dataset dct:description ?description .
  ?dataset dct:title ?title .
  ?distribution dcat:downloadURL ?file .
  ?distribution dataid:formatExtension ?extension .
  ?distribution dataid-cv:type ?type .
  ?distribution dcat:byteSize ?bytes .
  ?distribution dataid:sha256sum ?shasum .
  ?dataset dct:hasVersion ?version .

  # Excludes dev versions
  FILTER (!regex(?art, "--DEV"))

  # OPTIONAL: Check for variants, but don't fail if none exist
  OPTIONAL {
    ?distribution dataid:contentVariant ?cv .
    BIND(STR(?cv) AS ?variantStr)
  }
}
GROUP BY ?group ?art ?version ?title ?publisher ?comment ?description ?license ?file ?extension ?type ?bytes ?shasum
ORDER BY ?version
"""


def parse_content_variants_string(variants_str: str) -> dict:
    """
    Parse a comma-separated content variants string from SPARQL GROUP_CONCAT result.

    Parameters
    ----------
    variants_str : str
        Comma-separated string of content variants, e.g., "lang=en, type=full, sorted"

    Returns
    -------
    dict
        Dictionary of parsed content variants. For key=value pairs, both the key
        and value are returned as strings (no type conversion is performed, so
        "true" remains the string "true", not a boolean). For standalone values
        without an "=" sign, the value is recorded as the boolean ``True``.

        Example: "lang=en, type=full, sorted" -> {"lang": "en", "type": "full", "sorted": True}

    Notes
    -----
    - All values from key=value pairs are kept as strings. If you need boolean
      or numeric conversion, perform it after calling this function.
    - Standalone items (e.g., "sorted") are stored with boolean ``True`` as
      their value, indicating presence rather than a specific string value.
    """
    if not variants_str or variants_str.strip() == "":
        return {}

    variants = {}
    for part in variants_str.split(","):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            variants[key.strip()] = value.strip()
        elif part:
            # Handle standalone values (no key=value format)
            variants[part] = True

    return variants
```
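As a quick sanity check, the parser above can be exercised standalone. The function body below is copied from the diff; the input strings are illustrative examples, not values from the PR:

```python
def parse_content_variants_string(variants_str: str) -> dict:
    """Parse a GROUP_CONCAT variants string into a dict (copied from the diff above)."""
    if not variants_str or variants_str.strip() == "":
        return {}
    variants = {}
    for part in variants_str.split(","):
        part = part.strip()
        if "=" in part:
            key, value = part.split("=", 1)
            variants[key.strip()] = value.strip()
        elif part:
            # standalone tags record presence, not a string value
            variants[part] = True
    return variants


print(parse_content_variants_string("lang=en, type=full, sorted"))
# {'lang': 'en', 'type': 'full', 'sorted': True}
print(parse_content_variants_string(""))
# {}
```

Note that values stay strings ("type=full" yields `"full"`, never a coerced type), matching the docstring's contract.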
**Contributor:**

I would prefer to keep

**Author:**

Sure @Integer-Ctrl, I would incorporate the suggested changes.
@@ -11,6 +11,51 @@

```python
from databusclient.extensions import webdav


def parse_distribution_str(dist_str: str):
    """
    Parses a distribution string with format:
    URL|key=value|...|.extension

    Returns a dictionary suitable for the deploy API.
    """
    parts = dist_str.split('|')
    url = parts[0].strip()

    variants = {}
    format_ext = None
    compression = None

    # Iterate over the modifiers (everything after the URL)
    for part in parts[1:]:
        part = part.strip()

        # Case 1: Extension (starts with .)
        if part.startswith('.'):
            # purely heuristic: if it looks like compression (gz, zip, br), treat as compression
            # otherwise treat as format extension
            if part.lower() in ['.gz', '.zip', '.br', '.tar', '.zst']:
                compression = part.lstrip('.')  # remove leading dot for API compatibility if needed
            else:
                format_ext = part.lstrip('.')

        # Case 2: Content Variant (key=value)
        elif '=' in part:
            key, value = part.split('=', 1)
            variants[key.strip()] = value.strip()

        # Case 3: Standalone tag (treat as boolean variant or ignore?
        # For now, we assume it's a value for a default key or warn)
        else:
            print(f"WARNING: Unrecognized modifier '{part}' in distribution. Expected '.ext' or 'key=val'.")

    return {
        "url": url,
        "variants": variants,
        "formatExtension": format_ext,
        "compression": compression
    }


@click.group()
def app():
    """Databus Client CLI.
```

@@ -85,14 +130,22 @@

```python
    click.echo("[MODE] Classic deploy with distributions")
    click.echo(f"Deploying dataset version: {version_id}")

    # --- CHANGE START ---
    # Parse the input strings into structured objects
    parsed_distributions = [parse_distribution_str(d) for d in distributions]

    # Note: api_deploy.create_dataset now accepts this list of dicts
    dataid = api_deploy.create_dataset(
        version_id, title, abstract, description, license_url, parsed_distributions
        version_id=version_id,
        artifact_version_title=title,
        artifact_version_abstract=abstract,
        artifact_version_description=description,
        license_url=license_url,
        distributions=distributions
    )
    # --- CHANGE END ---
```
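To illustrate the modifier heuristics in `parse_distribution_str`, the sketch below copies the parsing logic from the hunk above and feeds it a sample pipe-separated string (the URL is a made-up example, not from the PR):

```python
def parse_distribution_str(dist_str: str):
    """Parse 'URL|key=value|...|.extension' into a deploy-API dict (copied from the diff above)."""
    parts = dist_str.split('|')
    url = parts[0].strip()
    variants = {}
    format_ext = None
    compression = None
    for part in parts[1:]:
        part = part.strip()
        if part.startswith('.'):
            # known compression suffixes are treated as compression, anything else as format
            if part.lower() in ['.gz', '.zip', '.br', '.tar', '.zst']:
                compression = part.lstrip('.')
            else:
                format_ext = part.lstrip('.')
        elif '=' in part:
            key, value = part.split('=', 1)
            variants[key.strip()] = value.strip()
        else:
            print(f"WARNING: Unrecognized modifier '{part}' in distribution. Expected '.ext' or 'key=val'.")
    return {
        "url": url,
        "variants": variants,
        "formatExtension": format_ext,
        "compression": compression,
    }


result = parse_distribution_str("https://example.org/ontology.nt|lang=en|.ttl|.gz")
# {'url': 'https://example.org/ontology.nt', 'variants': {'lang': 'en'},
#  'formatExtension': 'ttl', 'compression': 'gz'}
```

One consequence of the heuristic worth noting: a bare `.tar` is always classified as compression, so a format like `.tar` cannot be expressed as a format extension with this scheme.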
Comment on lines +133 to +147
🧩 Analysis chain

🏁 Script executed:

```shell
# Check the actual file content at the specified lines
cat -n databusclient/cli.py | sed -n '130,150p'
```

🏁 Script executed:

```shell
# Also search for the create_dataset function signature to understand expected parameters
rg -n "def create_dataset" --type py -A 10
```

🏁 Script executed:

```shell
# Find where api_deploy is defined/imported to understand the module
rg -n "api_deploy" databusclient/cli.py | head -20
```

Fix syntax error: Line 139 is missing a comma before the keyword arguments on line 140, and you're passing

Fix:

```diff
-    dataid = api_deploy.create_dataset(
-        version_id, title, abstract, description, license_url, parsed_distributions
-        version_id=version_id,
-        artifact_version_title=title,
-        artifact_version_abstract=abstract,
-        artifact_version_description=description,
-        license_url=license_url,
-        distributions=distributions
-    )
+    dataid = api_deploy.create_dataset(
+        version_id=version_id,
+        artifact_version_title=title,
+        artifact_version_abstract=abstract,
+        artifact_version_description=description,
+        license_url=license_url,
+        distributions=parsed_distributions,
+    )
```

🧰 Tools

🪛 GitHub Actions: Python CI (Lint & pytest)

[error] 140-140: Ruff check failed with SyntaxError: Expected ',', found name at databusclient/cli.py:140:13. Command: 'poetry run ruff check --output-format=github .'

🪛 GitHub Check: build

[failure] 140-140: Ruff
```python
    api_deploy.deploy(dataid=dataid, api_key=apikey)
    return
```
**Contributor:**

Validate required `url` before fallback download.

If a caller passes a dict without `url`, `_load_file_stats("")` raises a low-signal requests error. Fail fast with a clear exception before any network call.

🛠️ Proposed fix
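A minimal fail-fast check along the lines this comment suggests could look like the sketch below. The helper name `require_url` is hypothetical (the actual proposed fix is collapsed in the review), and the idea is simply to raise a descriptive error before any network call is attempted:

```python
def require_url(dist: dict) -> str:
    """Return the distribution's URL, raising a clear error if it is missing or blank."""
    url = dist.get("url")
    if not url or not str(url).strip():
        # fail fast instead of letting a blank URL reach the HTTP layer
        raise ValueError(f"Distribution entry is missing a required 'url' field: {dist!r}")
    return url


# A dict without 'url' now fails with a descriptive message instead of a
# low-signal requests error further down the stack.
try:
    require_url({"variants": {"lang": "en"}})
except ValueError as err:
    print(err)
```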