Skip to content

Conversation

@Tpt
Copy link
Contributor

@Tpt Tpt commented Dec 12, 2025

Makes it clear that inserting SAMPLE is something the user can do, not something SPARQL implementation should do automatically

Close #165


Preview | Diff

Makes it clear that inserting SAMPLE is something the user can do, not something SPARQL implementation should do automatically

Close #165
@Tpt Tpt requested review from afs, hartig, kasei and rubensworks December 12, 2025 17:13
@Tpt Tpt self-assigned this Dec 12, 2025
Co-authored-by: Olaf Hartig <[email protected]>
@kasei
Copy link
Contributor

kasei commented Dec 13, 2025

Is this section even true? I'm trying to square the language in this note (which exists in similar form in the 1.1 spec) with the translation in 18.3.4.1 Grouping and Aggregation which includes:

For each (X AS Var) in SELECT, each HAVING(X), and each ORDER BY X in Q
  For each unaggregated variable V in X
      Replace V with SAMPLE(V)

I read that as supporting STR(?x) as a valid expression for X.

I think the language in this section is probably true of the algebra, but I understood that to be the entire point of the translation logic that wraps un-grouped and un-aggregated variables in SAMPLE. Am I missing something?

I also think this note is a bit confusing, because it's not clear if it is describing a restriction on the example just presented (in which the presence of grouping on STR(?x) does not imply that you can project that expression without the extra AS clause) or if it's describing a general rule (which as I mention above, I'm now unsure about).

@Tpt Tpt changed the title Clarify a note on unagreggated variables Clarify that unagreggated variables in SELECT raise an error Dec 13, 2025
@Tpt Tpt requested a review from hartig December 13, 2025 11:08
@Tpt Tpt added spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature spec:bug Change fixing a bug in the specification (class 3) –see also spec:substantive and removed spec:substantive Change in the spec affecting its normative content (class 3) –see also spec:bug, spec:new-feature labels Dec 13, 2025
@Tpt
Copy link
Contributor Author

Tpt commented Dec 13, 2025

@kasei This is a great point! Thank you! I overlooked it even if it was in the issue motivating the MR (#165).

I think the SPARQL 1.1 specification is inconsistent and 18.3.4.1 Grouping and Aggregation section is contradicting 11.4 Aggregate Projection Restrictions. The testsuite seems to take the 11.4 section approach of not inserting SAMPLE but raising an error (agg08, agg09, agg10 and agg11 tests). I edited the MR to go into this direction but this is something that should be discussed by the WG.

Current quick survey:

  • error: Jena, Blazegraph, Oxigraph, QLever and Communica
  • insertion of SAMPLE: Virtuoso and RDFLib

@kasei
Copy link
Contributor

kasei commented Dec 14, 2025

@kasei This is a great point! Thank you! I overlooked it even if it was in the issue motivating the MR (#165).

I think the SPARQL 1.1 specification is inconsistent and 18.3.4.1 Grouping and Aggregation section is contradicting 11.4 Aggregate Projection Restrictions. The testsuite seems to take the 11.4 section approach of not inserting SAMPLE but raising an error (agg08, agg09, agg10 and agg11 tests). I edited the MR to go into this direction but this is something that should be discussed by the WG.

I don't think agg08 shows this. It has grouping on the expression (?O1 + ?O2) and then tries to use that same expression in projection, so it isn't about a "simple expressions consisting of just a variable". agg09, agg10, and agg11 similarly don't seem to be about using such a grouped "simple expression" in a non-simple projection.

Current quick survey:

  • error: Jena, Blazegraph, Oxigraph, QLever and Communica
  • insertion of SAMPLE: Virtuoso and RDFLib

That's interesting. Ignoring current implementations, my feeling here is the insertion case makes the most intuitive sense. Both aggregated values and grouped values should be available for use in any (non-aggregation) select expressions.

@Tpt
Copy link
Contributor Author

Tpt commented Dec 14, 2025

I don't think agg08 shows this. It has grouping on the expression (?O1 + ?O2) and then tries to use that same expression in projection, so it isn't about a "simple expressions consisting of just a variable".

Yes, it's not about single variable indeed. My bad.

agg09, agg10, and agg11 similarly don't seem to be about using such a grouped "simple expression" in a non-simple projection.

It seems to me we are not focusing on the same cases. I am focusing on "simple expressions consisting of just a variable" in the projection. I still think agg09 (and agg10 are relevant). Taking agg09:

SELECT ?P (COUNT(?O) AS ?C)
WHERE { ?S ?P ?O } GROUP BY ?S

There are two possible outcomes:

  1. We apply the section 11.4 Aggregate Projection Restrictions
In a query level which uses aggregates, only expressions consisting of aggregates and constants may be projected, with one exception. When GROUP BY is given with one or more simple expressions consisting of just a variable, those variables may be projected from the level.
Then this query is invalid because `?P` in the `SELECT` is not a grouped value.
  1. We apply 18.3.4.1 Grouping and Aggregation algorithm and the line
Replace V with SAMPLE(V)
Then the query become equivalent to
SELECT (SAMPLE(?P) AS ?P) (COUNT(?O) AS ?C)
WHERE { ?S ?P ?O } GROUP BY ?S

and we get results.

Hence my impression that the spec is inconsistent.

My MR is taking the first approach and changing algorithm but I don't feel strongly on it. I just think we should be consistent and make a choice here.

@hartig
Copy link
Contributor

hartig commented Dec 15, 2025

In light of the discussion, looking again at the following sentence from 11.4 Aggregate Projection Restrictions [SPARQL 1.1] ..

Other expressions, not using GROUP BY variables, or aggregates may have non-deterministic values projected from their groups using the SAMPLE aggregate.

(which was the sentence that you wanted to improve with this PR), may be this sentence was indeed meant to say that SAMPLE will be injected implicitly rather than saying that SAMPLE would have to be used explicitly. If that was indeed the meaning, then the sentence is consistent with the

Replace V with SAMPLE(V)

of the translation algorithm in 18.2.4.1 Grouping and Aggregation [SPARQL 1.1]. However, I would still consider this inconsistent with the following first paragraph of 11.4 Aggregate Projection Restrictions [SPARQL 1.1]:

In a query level which uses aggregates, only expressions consisting of aggregates and constants may be projected, with one exception. When GROUP BY is given with one or more simple expressions consisting of just a variable, those variables may be projected from the level.

This one says, indirectly but clearly, that it is not allowed to project a variable unless the variable is a GROUP BY variable.

So, in any case, there is an inconsistency in the spec.

Regarding the two possible outcomes, I would find the first one (throwing an error, as done by Jena, Blazegraph, Oxigraph, QLever and Communica) to be more intuitive. The PR in its current form (with commit 3f3e302 included) captures this option.

Yet, I agree that this is something to be discussed, at least in the TF if not the whole WG, because no matter which of the two options we pick, it is a change for some implementations.

@afs
Copy link
Contributor

afs commented Dec 15, 2025

Not completely sure but I think SAMPLE is inserted to put in an aggregate.
Then the next line ("For each aggregate") only has to consider aggregates.

I think the language in this section is probably true of the algebra, but I understood that to be the entire point of the translation logic that wraps un-grouped and un-aggregated variables in SAMPLE. Am I missing something?

agg08/agg09/agg10 are negative syntax tests so, yes, they don't get to the algebra.


Does anyone have an example query where it would hit that SAMPLE injection clause?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spec:bug Change fixing a bug in the specification (class 3) –see also spec:substantive

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clarify use of SAMPLE

5 participants