You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When applying a Distinct keyword in Spring Data JPA, it seems to lead to wrong results.
For example, if someone defines a UserRepository with a findDistinctByLastname(String lastname), what is supposed to happen? Should it filter out ALL entries of the table based upon lastname and then do a DISTINCT? Because this can have different results.
select distinct u.id from User u where u.lastname = Baggins -> you'll get two rows, Frodo and Bilbo. select distinct u.lastname from User u where u.lastname = ?1-> you'll get one row, one for Baggins.
By default, u.id are the columns selected when doing any form of custom finder with a Distinct applied. This turns out to be correct in the sense that there are no duplicates. But in this situation, the distinct didn't contribute anything. The second query actually doesn't work, because the results are List<String>, and Spring Data doesn't know how to map this onto List<User> without a converter.
This also impacts COUNT operations. For example, a countDistinctByLastname(String lastname) currently ends up yields select distinct count(distinct u.id) from User u where u.lastname = ?1. Now this is bug (see #1380), since it should really be just select count(distinct u.id) from User u where u.lastname = ?1.
Based on the earlier fact, this may surprise you in that countDistinctByLastname("Baggins") will produce a value of 2, since there were 2 rows that had the value of Baggins. But if you were able to to select count(distinct u.lastname) from User u where u.lastname = ?1, the result would be 1.
Since countByLastname("Baggins") produces the same result of 2, you may be wondering what the distinct does. In this case, nothing. To get an appreciable difference, you have to suppress things like primary keys.
In fact, anything involving DISTINCT almost needs to be hand-written. For example, do you want to count how many distinct last names there are?
select u.lastname, count(u.lastname) from User u
Do you want to count how distinct lastnames there are, using a LIKE parameter?
select u.lastname, count(u.lastname) from User u where u.lastname like '%$1%'
Neither of these queries will map onto the repository's original User type.
There is a chance that the query does some left outer joins that result in duplicate entries of the same User type, like "all users enrolled in a class", where if the User is linked to multiple Class entries. But again, is this something expressed through a oin with a DISTINCT applied?
select distinct u from User u JOIN u.classes c
Or are you better off doing some correlated subquery?
select u from User u where exists (
select 'found' from Class c
where c in u.classes
)
All of this reveals that our current support for DISTINCT is incomplete and needs to be fleshed out so we can properly support it.
The text was updated successfully, but these errors were encountered:
In light of a team discussion, we can only from queries as best we can.
We have fixed one distinct issue in #1380. We also updated ref docs to properly share that writing queries with distinct clauses can be tricky and that users should put careful thought in order to get proper results.
When applying a
Distinct
keyword in Spring Data JPA, it seems to lead to wrong results.For example, if someone defines a
UserRepository
with afindDistinctByLastname(String lastname)
, what is supposed to happen? Should it filter out ALL entries of the table based uponlastname
and then do aDISTINCT
? Because this can have different results.Imagine having a data set with:
select distinct u.id from User u where u.lastname = Baggins
-> you'll get two rows,Frodo
andBilbo
.select distinct u.lastname from User u where u.lastname = ?1
-> you'll get one row, one forBaggins
.By default,
u.id
are the columns selected when doing any form of custom finder with aDistinct
applied. This turns out to be correct in the sense that there are no duplicates. But in this situation, the distinct didn't contribute anything. The second query actually doesn't work, because the results areList<String>
, and Spring Data doesn't know how to map this ontoList<User>
without a converter.This also impacts
COUNT
operations. For example, acountDistinctByLastname(String lastname)
currently ends up yieldsselect distinct count(distinct u.id) from User u where u.lastname = ?1
. Now this is bug (see #1380), since it should really be justselect count(distinct u.id) from User u where u.lastname = ?1
.Based on the earlier fact, this may surprise you in that
countDistinctByLastname("Baggins")
will produce a value of2
, since there were 2 rows that had the value ofBaggins
. But if you were able to toselect count(distinct u.lastname) from User u where u.lastname = ?1
, the result would be 1.Since
countByLastname("Baggins")
produces the same result of 2, you may be wondering what the distinct does. In this case, nothing. To get an appreciable difference, you have to suppress things like primary keys.In fact, anything involving
DISTINCT
almost needs to be hand-written. For example, do you want to count how many distinct last names there are?select u.lastname, count(u.lastname) from User u
Do you want to count how distinct lastnames there are, using a LIKE parameter?
select u.lastname, count(u.lastname) from User u where u.lastname like '%$1%'
Neither of these queries will map onto the repository's original
User
type.There is a chance that the query does some left outer joins that result in duplicate entries of the same
User
type, like "all users enrolled in a class", where if theUser
is linked to multipleClass
entries. But again, is this something expressed through a oin with a DISTINCT applied?select distinct u from User u JOIN u.classes c
Or are you better off doing some correlated subquery?
All of this reveals that our current support for
DISTINCT
is incomplete and needs to be fleshed out so we can properly support it.The text was updated successfully, but these errors were encountered: