PERF: Memory leaks after migrating from 0.25.3 to 1.1.3 #37031

Closed
netchose opened this issue Oct 10, 2020 · 1 comment
Labels
Performance (memory or execution speed performance)

Comments

@netchose
Hi,

Code sample:

        step = 100
        requestlist = []
        headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}

        for i in range(0, len(df), step):

            tmp_df = df.iloc[i:i + step].copy()

            final_json_string = []

            tmp_df.apply(lambda row: row_to_json_str(row), axis=1)

            final_json_string = "\n".join(final_json_string)
            final_json_string += "\n"

            url = f'{ES_URL}/{table_name}/_bulk'
            requestlist.append(
                grequests.post(url, headers=headers, data=final_json_string))
            del tmp_df
            gc.collect()

        r = grequests.imap(requestlist, size=4, exception_handler=exception_handler)

        del final_json_string
        del requestlist
        del df
        cnx.close()
        gc.collect()
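
The sample above depends on `row_to_json_str`, `ES_URL`, `cnx`, and a live Elasticsearch endpoint, none of which are shown in the report. For anyone trying to reproduce the chunking step in isolation, the following is a minimal sketch under stated assumptions: `iter_bulk_bodies` is a hypothetical stand-in for the loop body, the `{"index": {}}` action line assumes the target index is given in the URL path as in the sample, and the toy DataFrame replaces the `read_sql` result. It is not the reporter's code.

```python
import json

import pandas as pd


def iter_bulk_bodies(df, step=100):
    """Yield one newline-delimited JSON body per chunk of `step` rows.

    Hypothetical replacement for the apply/side-effect pattern in the
    report: each row becomes an action line followed by a document line.
    """
    for start in range(0, len(df), step):
        chunk = df.iloc[start:start + step]
        lines = []
        for record in chunk.to_dict(orient="records"):
            # Action line; assumes the index name is part of the request URL.
            lines.append(json.dumps({"index": {}}))
            # Document line; default=str covers values such as Timestamps.
            lines.append(json.dumps(record, default=str))
        # Bulk bodies are terminated by a trailing newline, as in the sample.
        yield "\n".join(lines) + "\n"


# Toy data standing in for the DataFrame returned by read_sql.
df = pd.DataFrame({"id": range(250), "name": [f"row{i}" for i in range(250)]})
bodies = list(iter_bulk_bodies(df, step=100))
```

Because the function is a generator, bodies can be consumed one at a time rather than collected in a list first, which keeps only one chunk's payload alive at once. Whether that changes the behavior described in this report is untested.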

Problem description

OS: Ubuntu 16.04.7 LTS
Python 3.8.1 (default, Jan 6 2020, 09:57:21)
[GCC 5.4.0 20160609] on linux

After migrating pandas from 0.25.3 to 1.1.3, memory usage increases after each call to the method.

After going back to 0.25.3, the script's memory usage is stable.
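
One way to check the "grows after each call" claim independently of the pandas version is to record how much traced memory is still allocated after every call. The sketch below uses only the standard library; `traced_growth`, `leaky`, and `clean` are hypothetical names for illustration, and the two workloads are toy stand-ins for the method in the report. Note that `tracemalloc` only sees allocations made through Python's allocators, so memory held by C extensions that bypass them will not show up.

```python
import gc
import tracemalloc


def traced_growth(func, calls=5):
    """Call `func` repeatedly and return, per call, the traced bytes
    still allocated relative to the baseline taken before the first call."""
    gc.collect()
    tracemalloc.start()
    try:
        baseline, _ = tracemalloc.get_traced_memory()
        growth = []
        for _ in range(calls):
            func()
            gc.collect()  # drop anything that is merely awaiting collection
            current, _ = tracemalloc.get_traced_memory()
            growth.append(current - baseline)
    finally:
        tracemalloc.stop()
    return growth


# Toy workloads (assumptions, not the reporter's method).
_kept = []


def leaky():
    # Retains 1 MB per call by keeping a reference in a module-level list.
    _kept.append(bytearray(1_000_000))


def clean():
    # Allocates 1 MB that is released as soon as the call returns.
    bytearray(1_000_000)


leaky_growth = traced_growth(leaky)
clean_growth = traced_growth(clean)
```

A series that keeps climbing, as `leaky_growth` does here, points at retained objects; a flat series like `clean_growth` suggests that any growth seen in process-level numbers comes from somewhere `tracemalloc` cannot observe.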

The DataFrame comes from pandas.read_sql.
The goal of the script is to convert a DataFrame into Elasticsearch JSON requests.

The loop below isn't exactly the same as the one above, but the memory profiler gives the same result: memory goes from 1669.2 MiB to 6218.0 MiB when the for loop is reached.

memory_usage

    Line #    Mem usage    Increment   Line Contents
    ================================================
       108   1669.2 MiB      0.0 MiB   list_df = [df[i:i + n] for i in range(0, df.shape[0], n)]
       109
       110   1669.2 MiB      0.0 MiB   requestlist = []
       111
       112   1669.2 MiB      0.0 MiB   headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
       113
       116   6218.0 MiB      0.0 MiB   for tmp_df in list_df:
       117   6214.4 MiB      0.0 MiB       final_json_string = []
       118
       119   6215.6 MiB      1.5 MiB       tmp_df.apply(lambda row: row_to_json_str(row), axis=1)
       120
       121   6216.9 MiB      1.3 MiB       final_json_string = "\n".join(final_json_string)
       122   6218.0 MiB      1.3 MiB       final_json_string += "\n"
       123
       124   6218.0 MiB      0.2 MiB       url = f'{ES_URL}/{table_name}/_bulk'
       125
       127
       128   6218.0 MiB      0.0 MiB       requestlist.append(
       129   6218.0 MiB      0.2 MiB           grequests.post(url, headers=headers, data=final_json_string))
       130
       132   6218.0 MiB      0.0 MiB   grequests.imap(requestlist, size=4, exception_handler=exception_handler)
       133   6218.0 MiB      0.0 MiB   del tmp_df
       134   6218.0 MiB      0.0 MiB   del requestlist
       135   6218.0 MiB      0.0 MiB   del list_df
       137   6218.0 MiB      0.0 MiB   del df
       139   6218.0 MiB      0.0 MiB   cnx.close()

thanks

netchose added the Bug and Needs Triage labels on Oct 10, 2020
dsaxton added the Performance label and removed the Bug and Needs Triage labels on Oct 11, 2020
dsaxton changed the title from "BUG:Memory leaks after migrating from 0.25.3 to 1.1.3" to "PERF: Memory leaks after migrating from 0.25.3 to 1.1.3" on Oct 11, 2020
@mroeschke
Member

Thanks for the report. It appears these were run with very old versions of pandas, so closing for now. Can reopen if these also appear with more recent versions of pandas.

3 participants