Skip to content

In SearchOperations.searchForStream, 'maxResults' overwrites 'size' (pageSize) #3089

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
hy2850 opened this issue Apr 20, 2025 · 1 comment
Labels
status: waiting-for-triage An issue we've not yet triaged

Comments

@hy2850
Copy link

hy2850 commented Apr 20, 2025

Hi, I'm opening this issue to report something likely to be a bug.
When creating a SearchRequest for searchForStream method, maxResults parameter is overwriting size, while it seems like it shouldn't be.

Bug

I set both maxResult and Pageable.size (will refer to pageSize) fields for the query and passes it to SearchOperations.searchForStream.
When requestConverter.searchRequestinside the method creates a SearchRequest, maxResult overwrites pageSize.

In the below code, builder.size is called first with query.getPageable().getPageSize(), followed by another call with query.getMaxResults(), overwriting the result of the first call.

if (query.getPageable().isPaged()) {
builder //
.from((int) query.getPageable().getOffset()) //
.size(query.getPageable().getPageSize());
} else {
builder.from(0).size(INDEX_MAX_RESULT_WINDOW);
}
if (!isEmpty(query.getFields())) {
var fieldAndFormats = query.getFields().stream().map(field -> FieldAndFormat.of(b -> b.field(field))).toList();
builder.fields(fieldAndFormats);
}
if (!isEmpty(query.getStoredFields())) {
builder.storedFields(query.getStoredFields());
}
if (query.getIndicesOptions() != null) {
addIndicesOptions(builder, query.getIndicesOptions());
}
if (query.isLimiting()) {
builder.size(query.getMaxResults());
}

I was stuck at the case where,

  • 250,000 data available in the ES index
  • Want to fetch at most 150,000 documents in total (out of 250k), using SearchOperations.searchForStream (maxResult 150000)
  • Want to fetch 10,000 documents per page (per scroll search request), which fills almost all the response memory buffer (Pageable.ofSize(10000))

result

  • Expected : 15 search requests in total, a single search returning a page of 10k documents
  • Actual : a single search tries to fetch 150k documents, overflowing response memory buffer and causing below Exception

Caused by: org.springframework.dao.DataAccessResourceFailureException: entity content is too long [169386197] for the configured buffer limit [104857600]; nested exception is java.lang.RuntimeException: entity content is too long [169386197] for the configured buffer limit [104857600]


Possible solution

I think two parameters have totally separate roles, thus one shouldn't overwrite the other.

  • pageSize parameter should decide how many documents that I want ES to fetch per page,
  • while maxResult parameter decides how many documents I want to fetch in total during the whole stream operation.

👉 Therefore, the actual pageSize should be Min(maxResult, pageSize) if maxResult is set

Changes made : https://github.com/hy2850/spring-data-elasticsearch/commits/%233089-size-and-maxResults/


Further explanation with code for your understanding

SearchOperations.searchForStream returns a CloseableIterator that uses scroll API to search through a large set of data by pages.
Currently, maxResult is solely used to get maxCount for this method.

@Override
public <T> SearchHitsIterator<T> searchForStream(Query query, Class<T> clazz, IndexCoordinates index) {
Duration scrollTime = query.getScrollTime() != null ? query.getScrollTime() : Duration.ofMinutes(1);
long scrollTimeInMillis = scrollTime.toMillis();
// noinspection ConstantConditions
int maxCount = query.isLimiting() ? query.getMaxResults() : 0;
return StreamQueries.streamResults( //
maxCount, //
searchScrollStart(scrollTimeInMillis, query, clazz, index), //
scrollId -> searchScrollContinue(scrollId, scrollTimeInMillis, clazz, index), //
this::searchScrollClear);
}

The iterator keeps track of how many documents it has fetched so far, in currentCount instance variable.

@Override
public SearchHit<T> next() {
if (hasNext()) {
currentCount.incrementAndGet();
return currentScrollHits.next();
}
throw new NoSuchElementException();
}

When currentCount reachesmaxCount, the iterator is forced to return false for hasNext, even when there could be more documents available for searching.

@Override
public boolean hasNext() {
boolean hasNext = false;
if (!isClosed && continueScroll && (maxCount <= 0 || currentCount.get() < maxCount)) {
if (!currentScrollHits.hasNext()) {
SearchScrollHits<T> nextPage = continueScrollFunction.apply(scrollState.getScrollId());
currentScrollHits = nextPage.iterator();

So, I reached a conclusion that maxResult should only decide maxCount, not the pageSize, and this is a bug.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Apr 20, 2025
hy2850 added a commit to hy2850/spring-data-elasticsearch that referenced this issue Apr 20, 2025
hy2850 added a commit to hy2850/spring-data-elasticsearch that referenced this issue Apr 20, 2025
hy2850 added a commit to hy2850/spring-data-elasticsearch that referenced this issue Apr 20, 2025
hy2850 added a commit to hy2850/spring-data-elasticsearch that referenced this issue Apr 20, 2025
hy2850 added a commit to hy2850/spring-data-elasticsearch that referenced this issue Apr 20, 2025
@sothawo
Copy link
Collaborator

sothawo commented Apr 21, 2025

Thanks for reporting this, I will have look, but probably only towards the end of the week/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: waiting-for-triage An issue we've not yet triaged
Projects
None yet
Development

No branches or pull requests

3 participants