Memory usage for pandas.read_xml #362

tcompa · 2023-04-11T12:53:32Z

We have examples where create-ome-zarr runs out of memory when the limit is set to 1G or 2G, e.g. for an XML file of 160k lines (see fractal-analytics-platform/fractal-server#599 (comment)).

Maybe it's worth checking that we are using pandas.read_xml correctly. We can quickly debug the memory usage of this function, and possibly look around for known issues (pandas-dev/pandas#45442 - possibly related?).
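A quick way to do that check could look like the sketch below. It uses `tracemalloc` from the standard library; the helper name `run_with_peak_memory` and the path `metadata.xml` are placeholders, not names from the task code. One caveat: `tracemalloc` only sees allocations made through Python's allocator, so memory held inside a C parser such as lxml may be under-reported, and the process RSS that SLURM enforces can be higher than the number printed here.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def run_with_peak_memory(
    func: Callable[..., Any], *args: Any, **kwargs: Any
) -> Tuple[Any, float]:
    """Call func(*args, **kwargs) and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes since start().
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak_bytes / 2**20


# Hypothetical usage with pandas (placeholder file name):
#
#   import pandas as pd
#   df, peak_mib = run_with_peak_memory(pd.read_xml, "metadata.xml")
#   print(f"peak traced memory: {peak_mib:.1f} MiB")
```

Comparing the reported peak for a small and a large XML file should at least show whether memory grows roughly linearly with the number of lines.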

If all looks reasonable on the XML-parsing side, should we set some more generous default memory in the manifest? It's a non-parallel task, and it should be simple for SLURM to schedule it even if it requires 4G.


jluethi · 2023-04-11T13:16:54Z

I wouldn't invest too much time into this. 4G for parsing a microscope acquisition of ~1 million files is not unreasonable. See:

The XML file for the full 23 well example is waaaay bigger than the tiny examples. So it's not unreasonable that it could be a bit more memory hungry. And if that's the case, the 23 well example is probably close to an upper bound of XML sizes we'd normally hit. It's not many wells, but they were imaged for ~14h, producing something on the order of a million images (=> a million lines in the XML file). Thus, we may want to adjust the default memory to be something like 4G for the Create OME Zarr task as well, after all.

=> let's increase this default to 4G

It likely could be optimized further, but the potential gain is not really worth the time investment for the time being.
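For reference, if this ever becomes worth revisiting, one possible direction would be to stream the XML instead of loading the whole tree at once. The sketch below is not what the task currently does; it uses `xml.etree.ElementTree.iterparse` from the standard library, and the function name `iter_records` and the element name `rec` are made up for illustration. If the real file uses XML namespaces, the tag would have to be given in `{namespace}name` form.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator


def iter_records(source, tag: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per <tag> element, parsing the XML incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            record = dict(elem.attrib)
            record["text"] = (elem.text or "").strip()
            yield record
            # clear() drops the element's attributes, text and children.
            # The emptied element itself stays attached to its parent.
            elem.clear()


# Tiny in-memory example with a hypothetical element name.
sample = io.BytesIO(
    b"<root><rec id='1'>a.tif</rec><rec id='2'>b.tif</rec><other/></root>"
)
for row in iter_records(sample, "rec"):
    print(row)
```

The resulting dicts could be collected in batches and turned into a DataFrame afterwards. Whether this actually lowers peak memory for our files would need to be measured.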


tcompa · 2023-04-11T14:38:07Z

Now updated in fractal-tasks-core 0.9.2.

tcompa added maintenance and removed maintenance labels Apr 11, 2023

jluethi added this to Fractal Project Management Apr 11, 2023

github-project-automation bot moved this to TODO in Fractal Project Management Apr 11, 2023

tcompa added a commit that referenced this issue Apr 11, 2023

Increase memory requirements for create-ome-zarr tasks (close #362)

8ac78d6

tcompa linked a pull request Apr 11, 2023 that will close this issue

Increase memory requirements for create-ome-zarr tasks (close #362) #363

Merged

tcompa closed this as completed in #363 Apr 11, 2023

tcompa added a commit that referenced this issue Apr 11, 2023

Merge pull request #363 from fractal-analytics-platform/362-memory-us…

c3c0dbf

…age-for-pandasread_xml Increase memory requirements for create-ome-zarr tasks (close #362)

github-project-automation bot moved this from TODO to Done in Fractal Project Management Apr 11, 2023

jluethi removed this from Fractal Project Management Apr 10, 2024
