Skip to content

Non deterministic issue with not initialize fields when projecting the same Node to two different interfaces in projection hierachy #2858

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
NilsWild opened this issue Jan 24, 2024 · 24 comments
Assignees
Labels
status: feedback-provided Feedback has been provided

Comments

@NilsWild
Copy link

spring-data-neo4j:7.21
neo4j 5.16-community
Java 17
Kotlin 1.9.22

Given a Node with two relationships that could point to the same entities and using projections to get the hierachy, there seems to be a non deteministic behavior leading to non initialized fields.

An example: Given a node like this:

@Node
class Person() {

    constructor(name: String) : this() {
        this.name = name
    }

    @Id
    var id = UUID.randomUUID()

    lateinit var name: String
    lateinit var lastname: Strin


    @Relationship(type = "FRIEND_OF")
    var friends: Set<Person> = setOf()

    @Relationship(type = "RELATED_TO")
    var family: Set<Person> = setOf()
}

I can create a Graph with two persons Alice and Bob. Both are friends with each other and Alice is related to Bob.

    val person = Person("Alice", "Smith")
    val friend = Person("Bob", "Kelso")

    person.friends = setOf(friend)
    person.family = setOf(friend)
    friend.friends = setOf(person)

    personRepository.save(person)

This works perfectly.
I also have a Projection to query a person with their friends, friends of friends and their relatives:

interface PersonWithFriendsAndFamily {
    val id: UUID
    val name: String
    val lastname: String
    val family: Set<Family>
    val friends: Set<Friends>

    interface Friends {
        val id: UUID
        val name: String
        val friends: Set<FriendsOfFriends>

        interface FriendsOfFriends {
            val id: UUID
            val name: String
        }
    }

    interface Family{
        val id: UUID
        val lastname: String
    }
}

@Repository
interface PersonRepository: org.springframework.data.repository.CrudRepository<Person, UUID> {
    fun findPersonWithFriendsAndFamilyById(id: UUID): PersonWithFriendsAndFamily?
}

Notice that the "Family" projection is missing the lastname field whereas the "Friends" and "FriendsOfFriends" Projections are missing the lastname field. Furthermore the "FriendsOfFriends" projection is missing the friends field.

When i now try to retrieve alice like this:

 val retrievedPerson = personRepository.findPersonWithFriendsAndFamilyById(person.id)

and retrieve her friends of friends (which would be her) like this:

retrievedPerson!!.friends.first().friends.first().name

There is a chance that the collection is reported as empty:

java.util.NoSuchElementException: Collection is empty.

Same thing if i want to retrieve the name of alice friend like this:

retrievedPerson!!.friends.first().name

There is a chance that the name field of the person entity is reported as not initialized. I suspect that this is due to only one of the projections for the respective node is taken into account and thus the respective fields are not queried by the repository. The behavior is not deterministic.

I provided the example project here: https://github.com/NilsWild/spring-issue-demo/tree/d20ed4eb9469592fe9ed5625f42791d414d6e01f
Just run the provided testcase multiple times and it will fail eventually.

@NilsWild NilsWild changed the title Non deterministic issue when projecting the same Node to two different interfaces in projection hierachy Non deterministic issue with not initialize fields when projecting the same Node to two different interfaces in projection hierachy Jan 24, 2024
@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Jan 24, 2024
@NilsWild
Copy link
Author

After some further investigation i can tell that it depends on the order of the queries that resolve the relationships. If the friends for relationships are fetched first everything works finde. If the related to relationship is fetched first the collection of the friends relationship is empty, as the projection for the related to relationship does not contain the friends field and is thus not included in the result.

@meistermeier meistermeier self-assigned this Jan 25, 2024
@meistermeier meistermeier added status: needs-investigation An issue that has been triaged but needs further investigation and removed status: waiting-for-triage An issue we've not yet triaged labels Jan 25, 2024
@NilsWild
Copy link
Author

I tried to investigate this further. I noticed that for certain queries or graphs the query strategy changes. In the example provided above the data is fetched using multiple queries. In another example the data is fetchen using a single query looking similar to this:

MATCH (person:`Person`)-[r_0:`FRIEND_OF`]->(m_0:`Person`) WHERE m_0.name = $friendName and person.name = $personName RETURN person{.id, .name,....,Person_FRIEND_OF_Person: [ ....

executing that single query actually returns all the data. However, the issue that the set is empty occurs as well. So it might not be about the queries but about handling the results returned by the query. Unfortunately I have no Idea where to start debugging in the spring-data-neo4j project. Neither do I know how to trigger that single query strategy so I can not provide a simple example project.

@meistermeier
Copy link
Collaborator

Thanks for your input. I am currently already working on this.
The reason why you observe different queries is based on the detection of (logical) domain cycles.
From what I see in your projections, you are describing a directed model. The query you posted in your last comment should be the one that should always_ be taken.
I am currently unsure why SDN decides to go the route given by your domain model (Person->Person->Person->...). The result of this "cycle detection" are the multiple queries, you are observing sometimes. Basically SDN cannot predict the amount of hops it should iterate over the same nodes and falls back into a data-driven approach.
As I said, this should not happen and I am glad that you have reported this.

@NilsWild
Copy link
Author

Just to be clear: The issue regarding non initialized fields also applies to the query in my last comment. Although the query actually returns the data if executed in the neo4j browser. So there might be two issues:

  1. generating the query
  2. mapping the query results

@meistermeier
Copy link
Collaborator

In another example the data is fetchen using a single query looking similar to this:

This is not part of the tests, right?
I can now at least make sense why the cascading queries get fired (every time with the code linked above): SDN does not know about the projections at a certain point and reflects the field friends: Set<Friends> back to Person, for example. Also it takes into consideration that the initial projection PersonWithFriendsAndFamily is also a Person and falls into its "cycle-mode". But as I said, this is reliable. Not great but works as this point as intended. (I will certainly not dismiss this because it is a bad user-experience)
For the mapping part: Yes I can see the failing test. Will focus on this on at the moment.

@NilsWild
Copy link
Author

Yeah its not part of the Test. I experienced it in my real project but couldn't create a demo as i don't know how to model my domain/projections to have the single query behavior.

@meistermeier
Copy link
Collaborator

You get the single query behaviour if you don't repeat an entity in your chain. As explained earlier this is unrelated to the projections. A PersonProjection->Friend is also a Person->Person from SDN's perspective. Something that we cannot change with a small patch, unfortunately. Also, keep in mind that Person->SomethingElse->Person (or even more complex) is also a logical cycle.

Good news regarding the mapping: I found the problem. It is rooted in the, what I thought is a good idea, concept to not iterate over an already discovered entity again. Like you said, fetching the "family Bob" first with the projection in mind, results into a dead end because there are no further relationships defined in the Family projection.
Going to the next relationship (FRIEND_OF) Alice has and coming back to Bob as "friend Bob", returns fast because SDN already knows Bob.
Let me try to see the impact on performance if we simply track the relationships that led here additionally to the nodes as the filter criteria.

@NilsWild
Copy link
Author

Thank you for the clearification. I didn't get it because in the project that i first discovered the issue in, has this cycle but uses a single query. I guess this is due to the cycle not containing the same entity from spring-data-neo4js POV as it is based on an inheritance hierachy and looks more like this:

Person->SpecialPerson->VerySpecialPerson

Where Person and VerySpecialPerson are effectively the same Node but propably not treated as possibly beeing the same entity.

And great that you were able to find the problem.

@NilsWild
Copy link
Author

I created another example that is closer to my original project. It uses the single query and fails consistently: https://github.com/NilsWild/spring-issue-demo/tree/f1533f8ff6cee6d9daa8fe262276fa906575bfbe

@NilsWild
Copy link
Author

Let me try to see the impact on performance if we simply track the relationships that led here additionally to the nodes as the filter criteria.

I thought about this statement. Wouldn't it make more sense to track the projectionInformation / returnType instead of the relationships that led there? If Bob is already known and should be projected to the same return type it can return fast if not it should be "another" Bob proxy.

Though i am not sure if that information is available at that point.

@meistermeier
Copy link
Collaborator

Though i am not sure if that information is available at that point.

That's a problem to solve this even nicer. The relationships give me the information of connected entities, unrelated to possible projections defined (and later being used).

Will check out your other example tomorrow. I am curious how this relates to the problem with cascading queries because the query bug is down in the subsequent query creation for cycles and not something that would also apply to the directed query. On the other hand I am pretty "confident" that there might be also a similar logic in this case :)

For now tracking the relationships that resolved the ids works. I am not yet sure if this is a performant solution because I keep a copy of the ids in a map around, that gets modified heavily when there are a lot of related nodes.

@NilsWild
Copy link
Author

As far as i can tell the query is fine and returns all the data needed to initialize the entity. However the entity might be cached and not initialized twice if the same entity is returned twice in the query results.

Let me know when you have a solution, so i can test it on my project and provide another example if needed.

meistermeier added a commit that referenced this issue Jan 26, 2024
If an entity has already been loaded by any relationship,
it gets marked as processed.
But this is not a valid state if there are multiple relationships
to this entity and it is loaded via different projections for each
relationship.
In those cases SDN will just stop to find other relationships.
This commit fixes this behaviour by also taking the relationship
the entity got loaded with into account.
@meistermeier
Copy link
Collaborator

I finally managed to fix the initial problematic bit regarding the loading of back-references.
There will be a 7.2.3-GH-2858-SNAPSHOT available in ~30 minutes.
Please also add the snapshot and milestone repositories to your pom:

  <repositories>
    <repository>
      <id>spring-milestones</id>
      <name>Spring Milestones</name>
      <url>https://repo.spring.io/milestone</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>spring-snapshots</id>
      <name>Spring Snapshots</name>
      <url>https://repo.spring.io/snapshot</url>
      <releases>
        <enabled>false</enabled>
      </releases>
    </repository>
  </repositories>

@NilsWild
Copy link
Author

Looks good. Solves the issue for the multiple queries.

@meistermeier
Copy link
Collaborator

Great, thanks for your feedback.
Will have a look at the other problem with the directed query on Monday. It took me too much time to come up with the proper fix AND keep the already existing tests green :)

@meistermeier
Copy link
Collaborator

I don't get a directed query in the commit, you have posted. The cyclic query makes sense for me because e.g. a EnvironmentReaction has a reactionTo that points to a ComponentReaction which again points back to a ReceivedMessage what an EnvironmentReaction is.

Nevertheless, I get a

org.springframework.beans.InvalidPropertyException: Invalid property 'headers' of bean class [de.example.ComponentReaction]: Getter for property 'headers' threw exception

Is this this problem you were also observing? (Or the commit hash is the wrong one)

@meistermeier meistermeier added status: waiting-for-feedback We need additional information before we can continue and removed status: needs-investigation An issue that has been triaged but needs further investigation labels Jan 29, 2024
@NilsWild
Copy link
Author

NilsWild commented Jan 29, 2024

Just double checked. You're right it's not a single query. The final query just gave me the impression. I get the following final query for "save and retrieve" test:

MATCH (behavior:`Behavior`) WHERE behavior.id = $id RETURN behavior{__nodeLabels__: labels(behavior), __elementId__: elementId(behavior), Behavior_OBSERVED_Message: [(behavior)-[:`OBSERVED`]->(behavior_messages:`Message`) | behavior_messages{.headers, .payload, __nodeLabels__: labels(behavior_messages), __elementId__: elementId(behavior_messages), Message_REACTION_TO_ComponentReaction: [(behavior_messages)-[:`REACTION_TO`]->(behavior_messages_reactionTo:`ComponentReaction`:`SentMessage`:`Message`) | behavior_messages_reactionTo{.payload, __nodeLabels__: labels(behavior_messages_reactionTo), __elementId__: elementId(behavior_messages_reactionTo)}], Message_DEPENDS_ON_ReceivedMessage: [(behavior_messages)-[:`DEPENDS_ON`]->(behavior_messages_dependsOn:`ReceivedMessage`:`Message`) | behavior_messages_dependsOn{.payload, __nodeLabels__: labels(behavior_messages_dependsOn), __elementId__: elementId(behavior_messages_dependsOn)}]}]}

The query includes all relevant attributes and relationships.

And yes you're right thats the problem i was referring to. Notice that besides the assertion made in line 40 the one of line 41 checking the dependsOn relationship fails as well. Don't know if those are separate problems or the same as one is referencing an attribute of the node and the other one a relationship.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jan 29, 2024
@meistermeier
Copy link
Collaborator

This is a tough one.
Already at this and spotted the problem.

  1. behavior_messages{.headers, .payload,...} all good, we get the headers
  2. behavior_messages_dependsOn{.payload, __nodeLabels__ as intended because the Dependency relationship does not define the headers.

Spring Data Neo4j usually maps an entity once per record. Again, given the randomised order of property traversal, this can either be the first case of ComponentReaction data (test passes) or the second one (test fails). A re-mapping of an entity within one record was never intended and I am currently trying to wrap my head around immutability reliability in such cases. Also this has an impact on the mapping performance because we would need to assume that each already mapped entity will get asked again to map its properties and relationships.

Would take some time to figure out if there is an potential "middle ground" that solves this.

@NilsWild
Copy link
Author

I see. However, i think that this is a very harsh functional limitation. For the headers field a workaround would be to just declare it everywhere. For the dependsOn relationship i can not imagine any workaround as it would lead to infinite recursion to include the relationship in every projection as there's no such mechanism as JsonIdentityInfo. So only custom queries could solve this for now i guess.

@meistermeier
Copy link
Collaborator

I think the solution should to fix this limitation. It's just that nobody before encountered this, and tbh I never thought about this scenario with different projections for an entity within one query.
Current status: I have a solution but there are still some existing tests that show that it is not yet the solution ;)
Bonus points for me: near to zero performance impact for non affected use-cases.

meistermeier added a commit that referenced this issue Jan 30, 2024
…pping.

Prior to this, an incomplete loaded entity, due to projection, was never
touched again to add missing properties loaded via a different relationship
and projection definition.
@meistermeier
Copy link
Collaborator

And we have a few commits more on the branch. As usual the snapshot (same name) should be available in ~30 minutes, if nothing breaks 🤞
Would, again, be great to get your feedback on functionality. Depending what the build time (incl. test) on CI will be, I might need to take another round of performance improvements. Currently it is more a concept written down in code from the top of my head. But, hey, your tests are passing all the time.

@meistermeier meistermeier added status: waiting-for-feedback We need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Jan 30, 2024
@NilsWild
Copy link
Author

Based on my tests i have for my project. I can confirm that this snapshot fixed the problem.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue labels Jan 30, 2024
@meistermeier
Copy link
Collaborator

Many thanks for the second confirmation. A little bit refactoring tomorrow and both fixes will get merged for the next round of releases.

meistermeier added a commit that referenced this issue Jan 31, 2024
…pping.

Prior to this, an incomplete loaded entity, due to projection, was never
touched again to add missing properties loaded via a different relationship
and projection definition.
@meistermeier meistermeier added this to the 7.2.3 (2023.1.3) milestone Jan 31, 2024
meistermeier added a commit that referenced this issue Jan 31, 2024
If an entity has already been loaded by any relationship,
it gets marked as processed.
But this is not a valid state if there are multiple relationships
to this entity and it is loaded via different projections for each
relationship.
In those cases SDN will just stop to find other relationships.
This commit fixes this behaviour by also taking the relationship
the entity got loaded with into account.
meistermeier added a commit that referenced this issue Jan 31, 2024
…pping.

Prior to this, an incomplete loaded entity, due to projection, was never
touched again to add missing properties loaded via a different relationship
and projection definition.
meistermeier added a commit that referenced this issue Jan 31, 2024
If an entity has already been loaded by any relationship,
it gets marked as processed.
But this is not a valid state if there are multiple relationships
to this entity and it is loaded via different projections for each
relationship.
In those cases SDN will just stop to find other relationships.
This commit fixes this behaviour by also taking the relationship
the entity got loaded with into account.
meistermeier added a commit that referenced this issue Jan 31, 2024
…pping.

Prior to this, an incomplete loaded entity, due to projection, was never
touched again to add missing properties loaded via a different relationship
and projection definition.
@meistermeier
Copy link
Collaborator

Thanks again for all your feedback. The fix(es) will be in the next 7.2. release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: feedback-provided Feedback has been provided
Projects
None yet
Development

No branches or pull requests

3 participants