Memory usage for pandas.read_xml #362

Closed
tcompa opened this issue Apr 11, 2023 · 2 comments · Fixed by #363

Comments

@tcompa
Collaborator

tcompa commented Apr 11, 2023

We have examples where create-ome-zarr runs out of memory when the limit is set to 1G or 2G, e.g. for an XML file of 160k lines (see fractal-analytics-platform/fractal-server#599 (comment)).

Maybe it's worth checking that we are using pandas.read_xml correctly. We can quickly debug the memory usage of this function, and possibly look around for known issues (pandas-dev/pandas#45442 - possibly related?).
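As a starting point for that debugging, one could measure how peak memory grows with the number of records. The sketch below is a stand-in, not a test of `pandas.read_xml` itself: it uses the standard-library `xml.etree.ElementTree`, a hypothetical flat `<Record>` layout (not the real microscope metadata format), and `tracemalloc`, which only sees memory allocated through Python's allocators. Memory allocated inside lxml/libxml2 (pandas' default parser) would not show up there, so peak RSS is the better number for the real task. The sketch compares building the full tree against streaming with `iterparse`, which keeps only a handful of elements alive at a time.

```python
import tracemalloc
import xml.etree.ElementTree as ET
from io import BytesIO


def make_xml(n_records: int) -> bytes:
    """Build a hypothetical flat XML document with one <Record> per image."""
    rows = "".join(
        f'<Record row="{i % 16}" column="{i % 24}" field="{i}" file="img_{i:07d}.tif"/>'
        for i in range(n_records)
    )
    return f"<Records>{rows}</Records>".encode()


def peak_full_parse(data: bytes) -> tuple[int, int]:
    """Parse the whole document into a tree; return (record count, peak bytes)."""
    tracemalloc.start()
    try:
        root = ET.fromstring(data)
        n_records = len(root)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return n_records, peak


def peak_streaming_parse(data: bytes) -> tuple[int, int]:
    """Stream the document with iterparse, discarding each record after use."""
    stream = BytesIO(data)  # created before tracing starts, so it is not counted
    tracemalloc.start()
    try:
        context = ET.iterparse(stream, events=("start", "end"))
        _, root = next(context)  # the first event is the start of the root element
        n_records = 0
        for event, elem in context:
            if event == "end" and elem.tag == "Record":
                n_records += 1  # a real task would extract the needed fields here
                root.clear()  # drop finished children so the tree never grows
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return n_records, peak


if __name__ == "__main__":
    data = make_xml(160_000)  # same order of magnitude as the failing example
    for name, fn in [("full tree", peak_full_parse), ("iterparse", peak_streaming_parse)]:
        n, peak = fn(data)
        print(f"{name}: {n} records, peak {peak / 2**20:.1f} MiB")
```

If the full-tree peak dominates, streaming is the usual fix. pandas 1.5 added an `iterparse` argument to `read_xml` built on the same idea; its documentation lists the exact constraints.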

If all looks reasonable on the XML-parsing side, should we set a more generous default memory limit in the manifest? It's a non-parallel task, so it should be easy for SLURM to schedule it even if it requires 4G.

@jluethi
Collaborator

jluethi commented Apr 11, 2023

I wouldn't invest too much time in this. 4G for parsing the XML of a microscope acquisition with ~1 million files is not unreasonable. See:

The XML file for the full 23 well example is waaaay bigger than the tiny examples, so it's not unreasonable that it could be a bit more memory-hungry. And if that's the case, the 23 well example is probably close to an upper bound of XML sizes we'd normally hit. It's not many wells, but imaging for ~14h, something on the order of a million images (=> a million lines in the XML file). Thus, we may want to adjust the default memory to be something like 4G for the Create OME Zarr task as well, after all.

=> let's increase this default to 4G

It could likely be optimized further, but the potential gain is not really worth the time investment for the time being.
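The reasoning above (a million images means a million XML lines, so larger acquisitions need more memory) can be turned into a rough sizing rule. The helper below is hypothetical and assumes peak memory grows roughly linearly with the number of lines. That is plausible for a parser that builds the full tree but has not been verified for `pandas.read_xml`. The `baseline_bytes` and `headroom` parameters are illustrative choices, not values from this thread.

```python
import math


def suggest_mem_gb(
    sample_lines: int,
    sample_peak_bytes: int,
    target_lines: int,
    baseline_bytes: int = 0,
    headroom: float = 1.5,
) -> int:
    """Hypothetical helper: extrapolate peak memory linearly in XML line count.

    baseline_bytes is the part of the peak that does not depend on file size
    (interpreter, imported libraries); headroom is a safety factor >= 1.
    Returns a whole number of GiB, at least 1.
    """
    if sample_lines <= 0 or target_lines <= 0:
        raise ValueError("line counts must be positive")
    if headroom < 1:
        raise ValueError("headroom should be >= 1")
    # Size-dependent memory per XML line, measured on a smaller sample file.
    per_line = (sample_peak_bytes - baseline_bytes) / sample_lines
    estimate = baseline_bytes + per_line * target_lines
    # Apply the safety factor and round up to whole GiB for the task manifest.
    return max(1, math.ceil(estimate * headroom / 2**30))
```

For example, `suggest_mem_gb(sample_lines=160_000, sample_peak_bytes=measured_peak, target_lines=1_000_000)`, where `measured_peak` is the peak RSS observed for the 160k-line file, would give a rough default for a ~1 million line acquisition. Rounding up to whole GiB keeps the value easy to read in a manifest.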

tcompa added a commit that referenced this issue Apr 11, 2023
…age-for-pandasread_xml

Increase memory requirements for create-ome-zarr tasks (close #362)

@tcompa
Collaborator Author

tcompa commented Apr 11, 2023

Now updated in fractal-tasks-core 0.9.2.
