-
Notifications
You must be signed in to change notification settings - Fork 310
Use prepare(String)
for Prepared Statement preparation to prevent bind values from being cached
#1213
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Do you have an actual example or reproducer? Spring Data Cassandra 3.3.x uses positional bind markers to prepare and run statements. We had once a bug #1172 that created issues for statements using |
I think, use can produce the issue by writing any entity with (Reactive)CassandraTemplate::insert, ::update or using a *Repository on the entity with ::save oder ::insert. The positional bind markers you mentioned, are still contained in the prepare request (a DefaultPrepareRequest will be created with the DefaultSimpleStatement inside it, which contains the markers). So, the preparing will see different PrepareRequests (see hashCode-methods), where you probably intended having the same PrepareRequest. I think, by wrapping the SimpleStatement and leaving out the markers for the PrepareRequest, the problem could be fixed. |
PrepareRequest is inside the Cassandra-Driver, so the wrapping (punching-out holes) must occur before calling session.prepareAsync. |
With the investigation from your side, would you mind creating a pull request where we can explore the possibilities? |
My environment is locked in in a vm with no access to the outside. But perhaps we can have a call, both sitting in Germany? |
One way to get the behaviour we expect is to ensure equals/hashCode for PrepareRequest only includes properties that will actually be used as part of the PreparedStatement. This would include dropping positional values and keeping consistency level, for example. DefaultPrepareRequest does contain a String accepting constructor, of which we could pass in the parameterized (with bind markers) query String, however certain execution details of the query would be dropped that are usually contained on the SimpleStatement (consistency level etc). The DataStax docs do say if you want to customize the current behaviour then you can implement your own PrepareRequest, so maybe this is the way forward However, I would be very keen to hear their thoughts on the current behaviour. I think it should be also noted that due to the cache misses for each request, there will be an unnecessary amount of prepare requests being sent to the Cassandra nodes, which is waste of resources. |
Paging @adutra |
As another solution, I think it might be worth considering to create the PreparedStatement once and store it with the associated CassandraPersistentEntity or a similar place. This might cause problems with consistency level and other options, of course. |
Why not remove the values before preparing? public PreparedStatement createPreparedStatement(CqlSession session) throws DriverException {
return session.prepare(simpleStatement.setPositionalValues(Collections.emptyList()));
} at org.springframework.data.cassandra.core.CassandraTemplate.PreparedStatementHandler#createPreparedStatement In the javadocs of com.datastax.oss.driver.api.core.cql.SimpleStatement#setPositionalValues
I don't think this is the most elegant solution as it says that custom implementations may choose to be mutable. Let me know if I can contribute. |
I'm not sure that we want to host a fix/workaround for a driver issue. |
I don't think it's a driver issue. It is working as documented: At https://docs.datastax.com/en/developer/java-driver/4.13/manual/core/statements/prepared/
I do think it's strange behavior from the driver. |
I agree, the cache shouldn't take into account any actual bound values present in the statement being prepared. It's difficult to tell whether this should be considered a flaw in the driver or in SDC because implicitly, it is expected that what you are preparing should be a template statement. There is no sense in preparing an actual statement that you would like to see executed as is. IOW, it is expected that when calling That said, this constraint is not clearly stated anywhere in our docs, so I don't mind opening a ticket on our side to explore the idea of fixing/improving that. There you go: https://datastax-oss.atlassian.net/browse/JAVA-2996. But frankly, this looks to me as a misusage, so something to fix on your side. And I do like very much @Rednas92-spec suggestion, as this is exactly what I said above: use a statement template, not the actual statement: public PreparedStatement createPreparedStatement(CqlSession session) throws DriverException {
return session.prepare(simpleStatement
.setPositionalValues(Collections.emptyList())
.setNamedValues(Collections.emptyMap()));
} Paging @absurdfarce for awareness. |
Thanks @adutra for providing more context. Does it actually make a difference (during prepare) when specifying consistency levels or routing keys? ( I'm wondering whether it would make sense to just prepare plain CQL ( |
The consistency level would become the default consistency level for all bound statements. For routing keys and tokens, it doesn't make sense to provide those, as these are precisely what Cassandra computes for you when you prepare a statement. I will add a note to JAVA-2966. If you apply all desired settings to the bound statements themselves, then yes, you could – and probably should :-) – prepare just the CQL query. |
Thanks a lot. Then I'd switch our |
prepare(String)
for Prepared Statement preparation to prevent bind values from being cached
…d values from being cached. We now also apply query options from the initial statement to the bound statement by copying these if query options are set. Closes #1213
We've decided to address the issue in two steps: First, we introduced with #1237 a customization hook for allowing customization of the preparation. Existing applications can override the creation of Please ensure that you apply any statement options (page size, paging state, consistency level) as these are not retained into the Version 3.4 and 4.0 of Spring Data Cassandra are going to use |
When updating to spring.data.cassandra 3.3.1 from an older version (that one "bundled" with spring-boot 2.3.8) I realized a poor write performance due to the fact that each write eats about 7k memory which then later results in painful full-gc's. After a full-gc, memory is reclaimed and the same procedure begins...
So, this being a typical cache-going-postal pattern, I searched for found the cache in Cassandra java driver class CQLPrepareAsyncProcessor responsible for this behavior. This cache holds PrepareRequests as key and CompletableFuture as values.
Since the write paths of spring.data.cassandra (*CassandraRepository, *CassandraTemplate) call session.prepare with fully filled-out SimpleStatements (not as one might expect with punched-out holes for the actual values to be filled in later during the binding process), a correct prepared statement is delivered, but each individual SimpleStatement (via DefaultPrepareRequest) is stored in the cache, yielding the "memory leak". The cache works with weak references, so a full-gc can reclaim memory.
For a small fix, it should be possible override the createPreparedStatement-Methods in the PreparedStatementHandler classes (themselves inner classes of *CassandraTemplate). Perhaps a SimpleStatementWrapper can do the job.
As a WORKAROUND I found out, that setting usePreparedStatement=false in *CassandraTemplate circumvents the problem (but leads to a performance loss of about 20% in my use case).
I hope, you can help here!
Best regards,
Sven
The text was updated successfully, but these errors were encountered: