Relationships save takes long time (Neo4jTemplate 'saveAs' method) #2235


Closed
aaramg opened this issue Apr 17, 2021 · 6 comments

@aaramg

aaramg commented Apr 17, 2021

Hi.
I have models similar to these:

@Data
@Node("User")
public class User {

    @Id
    private String id;
    // ... other fields omitted ...
    @Relationship("LIKES")
    private List<Like> likes = new ArrayList<>();
    // ... other fields omitted ...
}

@Data
@RelationshipProperties
public class Like {

    @Id
    @GeneratedValue
    private Long id;

    private Integer score;

    @TargetNode
    private Movie movie;
}

@Data
@Node("Movie")
public class Movie {

    @Id
    private String id;

    @Property("name")
    private String name;
}

I am trying to save a User object, but I only want to save its likes:

 @Relationship("LIKES")
 private List<Like> likes = new ArrayList<>();

I have created an interface to use as a mask:

public interface UserLikesMask {

    List<Like> getLikes();
}

neo4jTemplate.saveAs(user, UserLikesMask.class);

What happens:

  1. First of all: when List<Like> is changed to Set<Like> (both in the User domain class and in the mask interface), an exception is thrown inside neo4jTemplate. It says 'Cannot cast java.util.ArrayList to java.util.Set', but I am not using ArrayList anywhere, so I guess this may be a bug.
  2. I changed everything to List and it works. It updates the relationships (the list size is 40), but this takes 12 seconds!
    I have approximately 9000 users and 9000 * 40 = 360,000 LIKES relationships already (both users and likes were created randomly, just for testing).
  3. After that, if I run "saveAs" again, it adds another 40 relationships instead of erasing the existing ones and writing the new ones. So I guess I would need one more IO operation, a "get", to load the relationships with their ids before I can update them (I haven't tried this yet).
    It would be nice if calling "saveAs" with only 40 relationships meant that the user should end up with exactly 40 relationships, not 80 or 120. That is, the save would work in "write" mode (erase, then save) rather than in "append" mode (add another 40 on every call). Maybe saveAs could take a flag for this, I don't know :)

One more note: I am calling saveAs when both the user nodes and the movie nodes are already in the database.

Do you see any issues here, or am I doing something wrong? :)
Thank you in advance. :)

P.S. >> You are the best :) You are always ready to help with any topic :)

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Apr 17, 2021
@meistermeier meistermeier self-assigned this Apr 22, 2021
@meistermeier meistermeier added status: needs-investigation An issue that has been triaged but needs further investigation and removed status: waiting-for-triage An issue we've not yet triaged labels Apr 22, 2021
@meistermeier
Collaborator

Thanks for reporting your problem.

  1. This is already solved in the development branches and will get patched with the next release iteration.
  2. Does it take 12 seconds for the 360k saves or just for one user?
  3. I am currently investigating this problem.

@aaramg
Author

aaramg commented Apr 22, 2021

Thanks for the response.
2. The save covers 40 relationships (the user and movie nodes already exist). The 360K relationships are the ones already in the database; I mentioned them to give more context. When the database is empty, saving 40 relationships is very quick. But when there are already many likes (360K) on the same 40 movies, saving gets slower and slower.

P.S. >> Just a quick question that I guess will be useful for everyone: is it recommended to use Neo4j as an OLAP database?

@meistermeier
Collaborator

You should see a log message like

Instances of class <FQN of your domain class> with an assigned id will always be treated as new without version property!

That basically explains the duplication of the relationships. On every save, SDN assumes that the entity is new, because it has no version property and SDN therefore cannot determine whether it is already present in the database.
As a consequence, it will not execute the delete before the relationship creation.
So adding the version attribute will solve this problem.
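
A minimal sketch of what adding a version attribute to the User class from the original post could look like. It assumes Spring Data Neo4j 6 and Lombok on the classpath; the field name `version` and its `Long` type are assumptions made for illustration, not something stated in this thread.

```java
import lombok.Data;
import org.springframework.data.annotation.Version;
import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Node;

@Data
@Node("User")
public class User {

    // Assigned (manually set) id, as in the original post.
    @Id
    private String id;

    // Assumption: a Long field annotated with @Version, used for optimistic locking.
    // Per the explanation above, the version property is what SDN is missing
    // when it decides whether an entity with an assigned id is new.
    @Version
    private Long version;

    // ... the LIKES relationship list and other fields stay unchanged ...
}
```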
The documentation lacks information about this; we should add something there.
Using either generated ids or assigned (manually set) ids together with optimistic locking via the @Version attribute makes SDN save each node individually. Apart from the amount of data, this takes time because SDN needs to extract all information from the object, create (or load) the related object, and create the relationship.
Using a so-called externally generated id (https://docs.spring.io/spring-data/neo4j/docs/current/reference/html/#mapping.id-handling.external-id) would remove the need for optimistic locking and allow a batch save. But I cannot see any significant improvement when saving all users with all movies linked in my local example.

@meistermeier
Collaborator

Update: Using externally generated ids performs much worse than the single save. I cannot put my finger on it right now, but it is due to the number of relationships. On the other hand, saves of the individual entities (without relationships at first) are much faster.

@meistermeier
Collaborator

We introduced some performance tweaks in the latest snapshot version (6.0.9-SNAPSHOT). You could try it if you want. But as I said, handling 360k+ objects is not ideal for any object <graph/relational/document...> mapper.

@meistermeier meistermeier added blocked: awaiting feedback and removed status: needs-investigation An issue that has been triaged but needs further investigation labels May 3, 2021
@meistermeier
Collaborator

Closing this because there was no response in the last three weeks.
Feel free to reopen it if there is anything to add.
