Skip to content

merge/persist distinction in SimpleJpaRepository not always correct #2546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jarnbjo opened this issue May 24, 2022 · 4 comments
Closed

merge/persist distinction in SimpleJpaRepository not always correct #2546

jarnbjo opened this issue May 24, 2022 · 4 comments
Assignees
Labels
status: declined A suggestion or change that we don't feel we should currently apply

Comments

@jarnbjo
Copy link

jarnbjo commented May 24, 2022

The current implementation of SimpleJpaRepository.save is as follows:

  • If the method is invoked with a new (unsaved) entity, in our case this is detected by id == null, the call is delegated to EntityManager.persist.
  • Otherwise, the call is delegated to EntityManager.merge.

When using JPA directly, the things to consider when choosing wether to use the persist or the merge methods in the EntityManager are unfortunately not that simple. Invoking the merge method with a managed entity can, for example in connection with cascading, have unpredictable results and is often discouraged. Accidentally, the optimization introduced with #1285 would also have solved our problem, but it was removed later in #1594.

The problem is as follows:

I have three entity types, Parent1, Parent2 and Child with both Parent1 and Parent2 having a OneToMany relationship to Child with CascadeType.ALL. The Child can in the DB reference either a Parent1 or a Parent2 or both, but these references are not modeled in the Java classes. The tables are defined as follows:

create table parent1 (
     id char(36) primary key
);

create table parent2 (
     id char(36) primary key
);

create table child (
    id char(36) primary key,
    parent1_id char(36),
    parent2_id char(36)
);

If I now in one transaction:

  • load a persistent entity p1 of type Parent1
  • create a new entity p2 of type Parent2
  • create a new entity c of type Child
  • add c to both p1 and p2
  • save p1 and p2 through their repositories

... I would expect to have only one instance of type Child persisted and the same instance references both by p1 and p2. In reality however, the outcome depends on in which order I save p1 and p2.

If I save p1 first and then p2, the following happens:

  • When saving p1, SimpleJpaRepository recognizes p1 as an already persistent entity and delegates the call to EntityManager.merge.
  • Since p1 is referencing a new entity c through a CascadeType.ALL reference, the EntityManager will also merge c and this is where the JPA specification might not be particularly obvious. Merging a new or detached entity means that the object is replaced with a managed copy. The replacement is in this case done in the list of children in p1, so p1 is now referencing a new, merged copy of c, let's call it c2.
  • When saving p2, SimpleJpaRepository recognizes p2 as a new entity and delegates the call to EntityManager.persist.
  • Here, the persist operation is also cascading, but since p2 is still referencing the original instance of c, which after the merge in the previous step is still not managed, the JPA implementation believes to see a new, unsaved enitity and creates a second managed Child.

After the save operations, I have one Parent1, one Parent2 and two Child entities in the database, with each of the Child entities referencing either Parent1 or Parent2. This result is not expected.

If I save p2 first and then p1, the following happens:

  • When saving p2, SimpleJpaRepository recognizes p2 as a new entity and delegates the call to EntityManager.persist.
  • The persist operation will cascade and cause both p2 and c to become managed entities. Note that in this case, the original instances will become managed and are not replaced with managed copies.
  • When saving p1, SimpleJpaRepository recognizes p1 as a persistent entity and delegates the call to EntityManager.merge.
  • In this case, p1 is still referencing the same c instance as referenced from p2 and this instance is already managed thanks to the previous persist operation. Now, the merge operation does not make a new copy of c and since c is already managed and up to date, does nothing.

After the save operations, I have one Parent1, one Parent2 and oneChild entitiy in the database, with the Child entity referencing both Parent1 and Parent2. This is the expected result.

Since JPA implementations are required to recognize changes in managed entities without any explicit call to merge or persist, the optimization introduced in #1285 (invoking save with a persistent and already managed entity will neither forward it to merge nor persist) incidentially would also have fixed this problem. This optimization was reverted in #1594 because it allegedly broke functionality, then it seems to be disputed if the broken functionality was caused by the optimization at all and also requested that the revert should be reverted.

And just to already here address the comments in #1594 (comment) arguing in favour of not reverting the revert:

  1. EntityManager.merge does not do the optimization itself because the purpose of the merge method is not to save changes in a managed entity, but to reattach a detached entity to the current transaction. IMHO, the merge method is simply incorrectly used in the Spring Data implementation.
  2. With this piece of code and parent being a managed entity and child being a new entity, calling em.merge(parent) would in this case still not make child a managed entity as the comment author seems to expect. As I already pointed out earlier, in this case, the merge method will replace the instance in the list of children in parent with a managed copy of child, but the child instance as we see it in the method scope is still not managed. If the intention is to make child a managed entity in this scenario, the Spring Data repository would have to delegate parent to persist instead of merge, as invoking persist with an already managed entity will ignore the root entity, but still cascade the persist operation to potentially new referenced objects and then attach the original child instance to the EntityManager as a managed (but yet unsaved) entity.

Implementations of the classes referenced in the text and a JUnit test showing the difference in behaviour depending on the order of the save methods: example.zip

Related issues:

#1285
#1594

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label May 24, 2022
@gregturn
Copy link
Contributor

gregturn commented Jun 3, 2022

For reference purposes, a later comment by @odrotbohm on #1594 shows that the optimization applied then reverted was inadequate, and introduced a different scenario that could break things.

#1594 (comment)

I believe we need to further capture these various scenarios into a batch of test cases. Otherwise, we're talking in circles. This is the only way to guard against the scenario presented above AND the scenario in this related comment.

@odrotbohm
Copy link
Member

I don't think we need to throw out the baby with the bathwater. The arrangement outlined is in clear violation of the notion of an aggregate. In particular, a child entity belonging to two aggregates. This fundamentally subverts the ability of the original aggregate to control its state, as the change to another aggregate would change the state of the original one behind its back. That's where the subtle mismatches in JPA-specific entity state come from.

If the current implementation of save(…) doesn't meet the requirements of the entity arrangement you envision, you're free to override the method with a custom implementation that supports the subtleties you need.

@gregturn gregturn self-assigned this Jun 8, 2022
@gregturn gregturn added status: declined A suggestion or change that we don't feel we should currently apply and removed status: waiting-for-triage An issue we've not yet triaged labels Jun 8, 2022
@gregturn
Copy link
Contributor

gregturn commented Jun 8, 2022

After further evaluation, this ticket doesn't warrant changes to Spring Data JPA.

@gregturn gregturn closed this as completed Jun 8, 2022
@jarnbjo
Copy link
Author

jarnbjo commented Jun 14, 2022

I don't think we need to throw out the baby with the bathwater. The arrangement outlined is in clear violation of the notion of an aggregate.

Of course it is not. A relational database model is not necessarily a tree, but can be an arbitrary graph where on entity type is both referencis multiple other types and is references by multiple types. Perhaps even my example was too complicated. The same problem as described here will also occur in an N:M relationship if an existing and a new entity on the 'N' side of the relationship are referencing a new entity on the 'M' side.

I also tried to explain why the merge method is incorrectly used in the repository implementation. You are using the merge method obviously in the belief that it is supposed to be used for updating an entity, but that is not what the method is there for. The method is there to reattach detached entites. Why are you ignoring this obvious fact?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: declined A suggestion or change that we don't feel we should currently apply
Projects
None yet
Development

No branches or pull requests

4 participants