Skip to content

Different number of elements in paged result of non-distinct query #3220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
bielitom opened this issue Nov 8, 2023 · 3 comments
Closed

Different number of elements in paged result of non-distinct query #3220

bielitom opened this issue Nov 8, 2023 · 3 comments
Labels
status: declined A suggestion or change that we don't feel we should currently apply

Comments

@bielitom
Copy link

bielitom commented Nov 8, 2023

By default, all the results of the Repository query are returned as the distinct list. When pagination and joins are used the count of elements differs from the actually obtained results. In some cases it can totally break a pagination, mostly unpredictably.

This JPQL query: SELECT a FROM Author a INNER JOIN Book b ON b.author = a will return list of the unique Author entities independently from the number of the books per author (assuming, that author have non-zero number of books).
On the other hand, the native query will return the list as should be, with the duplicates caused by the join.

This behavior basically does not cause the problem, but when it comes to the pagination, it have some consequences.
Page.getTotalElements() method shows different number than returned number of the entities.

The problem looks like to be caused by the condition at org/springframework/data/support/PageableExecutionUtils.java:66
The condition works with the actual results and if the result list is smaller than pageSize it will not run countQuery supplier. This causes different getTotalElements() number on the pages where content.size()<getPageSize(). This can happen on every page because algorithm wrongly thinks, that it is the last page. On the affected pages the number of totalElements suddenly drops because it is calculated from the actual offset, not using the count query.

To get the acceptable results:

  • It should run the countQuery every time (you will get the wrong result, but the pagination will remain unchanged on all pages).
  • It should detect if the query is JPQL query and using joins. If yes, the distinct keyword in the countQuery should be used (to not to change the actual behavior). Native queries must be unaffected.

How to reproduce

Entities

@Entity Getter @Setter
public class Author {
    @Id @GeneratedValue(strategy= GenerationType.AUTO) private Long id;
    private String name;
    @OneToMany(mappedBy = "author") private List<Book> books = new ArrayList<>();

    public Author() {}
    public Author(String name) {
        this.name = name;
    }
}

@Entity Getter @Setter
public class Book {
    @Id @GeneratedValue(strategy= GenerationType.AUTO) private Long id;
    private String name;
    @ManyToOne(cascade = CascadeType.ALL) private Author author;

    public Book() { }
    public Book(String name, Author author) {
        this.name = name;
        this.author = author;
    }
}

Repository

public interface AuthorRepository extends JpaRepository<Author, Long> {
    @Query("SELECT a FROM Author a INNER JOIN Book b ON b.author = a")
    Page<Author> findAllWithBooks(Pageable pageable);

    @Query(value = "SELECT a.* FROM author a INNER JOIN book b ON b.author_id = a.id", nativeQuery = true)
    Page<Author> findAllWithBooksNative(Pageable pageable);

    @Query(value = "SELECT DISTINCT a.* FROM author a INNER JOIN book b ON b.author_id = a.id", nativeQuery = true)
    Page<Author> findAllWithBooksNativeDistinct(Pageable pageable);

    @Query("SELECT DISTINCT a FROM Author a INNER JOIN Book b ON b.author = a")
    Page<Author> findAllWithBooksDistinct(Pageable pageable);
}

Test Code

Author a;
a = authorRepository.save(new Author("Orson Scott Card"));
bookRepository.save(new Book("Ender's Game", a));

a = authorRepository.save(new Author("Douglas Adams"));
bookRepository.save(new Book("The Hitchhiker's Guide to the Galaxy", a));
bookRepository.save(new Book("The Restaurant at the End of the Universe", a));

a = authorRepository.save(new Author("Frank Herbert"));
bookRepository.save(new Book("Dune", a));
bookRepository.save(new Book("Destination: Void", a));

Assertions.assertEquals(3, authorRepository.count());
Assertions.assertEquals(5, bookRepository.count());

//Here we have 5 results presented as total results as expected, because of non-distinct select with the join
Assertions.assertEquals(5, authorRepository.findAllWithBooks(PageRequest.of(0, 1)).getTotalElements());

//If the page request is bigger than the actual number of Author entities we get getContent().size() == getTotalElements()
Assertions.assertEquals(3, authorRepository.findAllWithBooks(PageRequest.of(0, 100)).getContent().size());
Assertions.assertEquals(3, authorRepository.findAllWithBooks(PageRequest.of(0, 100)).getTotalElements());

//When we use distinct keyword in JPQL query (or in the Specification), the problem does not occur
Assertions.assertEquals(3, authorRepository.findAllWithBooksDistinct(PageRequest.of(0, 1)).getTotalElements());
Assertions.assertEquals(3, authorRepository.findAllWithBooksDistinct(PageRequest.of(0, 100)).getContent().size());
Assertions.assertEquals(3, authorRepository.findAllWithBooksDistinct(PageRequest.of(0, 100)).getTotalElements());

//Using the native query everything works as expected - non-distinct
Assertions.assertEquals(5, authorRepository.findAllWithBooksNative(PageRequest.of(0, 1)).getTotalElements());
Assertions.assertEquals(5, authorRepository.findAllWithBooksNative(PageRequest.of(0, 100)).getContent().size());
Assertions.assertEquals(5, authorRepository.findAllWithBooksNative(PageRequest.of(0, 100)).getTotalElements());

//Using the native query everything works as expected - distinct
Assertions.assertEquals(3, authorRepository.findAllWithBooksNativeDistinct(PageRequest.of(0, 1)).getTotalElements());
Assertions.assertEquals(3, authorRepository.findAllWithBooksNativeDistinct(PageRequest.of(0, 100)).getContent().size());
Assertions.assertEquals(3, authorRepository.findAllWithBooksNativeDistinct(PageRequest.of(0, 100)).getTotalElements());

Related: #750

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Nov 8, 2023
@mp911de
Copy link
Member

mp911de commented Nov 9, 2023

For this case, you must provide your own count query via @Query(countQuery = …). Spring Data attempts to derive a count query on a best effort basis. Because both, JPQL and SQL allow statements that affect the multiplicity, it is your code that needs to provide a count query if you provided a base query yielding non-distinct results.

As the examples above do not define a countQuery the natural consequence is that you yield invalid results.

@mp911de mp911de added the status: waiting-for-feedback We need additional information before we can continue label Nov 9, 2023
@bielitom
Copy link
Author

bielitom commented Nov 9, 2023

@mp911de
I didn't mention @Query(countQuery = …) because it may be problematic to maintain with longer queries or using JPA Specification (same results, not included in the example).

When it comes to the getCountQuery method in /org/springframework/data/jpa/repository/support/SimpleJpaRepository.java:776 it can correctly recognize distinct and non-distinct query. But as mentioned before, the entities returned from the query itself are always unique when JPQL is used.

Anyway, the problem is not really in the count query alone. The problem lies in the getPage method in PageableExecutionUtils.java, where the algorithm always thinks, that it is on the last page and totally ignores countQuery. It happens when the query does not return the full page of the results, because JPA naturally removes duplicates from the results.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Nov 9, 2023
@mp911de
Copy link
Member

mp911de commented Nov 10, 2023

because it may be problematic to maintain with longer queries

In such a case, the implementation should go in a custom implementation. Repositories and @Query have their limits, these aren't all purpose utilities.

Similar to

where the algorithm always thinks, that it is on the last page

We have certain assumptions that we meet. If your queries do not meet these assumptions, then you need to provide either more specific queries or follow the route of implementing a custom fragment.

That being said, we do not intend to introduce more complexity on our side.

@mp911de mp911de closed this as not planned Won't fix, can't repro, duplicate, stale Nov 10, 2023
@mp911de mp911de added status: declined A suggestion or change that we don't feel we should currently apply and removed status: waiting-for-triage An issue we've not yet triaged status: feedback-provided Feedback has been provided labels Nov 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: declined A suggestion or change that we don't feel we should currently apply
Projects
None yet
Development

No branches or pull requests

3 participants