Pandas get_dummies validate "columns" input #28383
Comments
Can you post an example that we can copy/paste to replicate the behavior |
I already posted the example in quotes. It is: import pandas; p_csv = pandas.read_csv("/myDir/myFile.csv", index_col=0) |
That code sample is not self-contained, so no one can copy / paste to replicate your issue. The below link might help: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports |
alright. here is a copy-pastable version with random data ###CODE
|
That is liable to produce a DataFrame with shape on the order of (145000, 145000), which I think is on the order of 80GB, which certainly won't fit on my laptop. SIGKILL is likely coming from the OOM killer. |
That is what I guessed had happened. But why would the API function do that when there are only 6 different values for separator? |
(edited your comment to make it copy/paste more neatly, LMK if you object and I'll revert) |
do you get the expected result if you pass |
Yes! That worked, and very fast too. Now that you mention it, I do see that the expected parameter is "list-like". I didn't expect that to be a problem, since I didn't get any error here, yet I do get an error if I pass 'columns="fake_column"' |
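For anyone landing here later, a minimal sketch of the list-like form on made-up data (the frame below and its values are assumptions for illustration, not the data from this issue; `dtype=int` is only there so the indicator columns are integers on every pandas version):

```python
import pandas as pd

# Small stand-in frame; the real file from the issue is not reproduced here.
df = pd.DataFrame(
    {
        "SEPARATOR": ["comma", "tab", "comma", "semicolon"],
        "value": [1, 2, 3, 4],
    }
)

# `columns` is given as a list, even though only one column is encoded.
encoded = pd.get_dummies(df, columns=["SEPARATOR"], dtype=int)

print(encoded.columns.tolist())
print(encoded.shape)
```

The result has one indicator column per distinct value of "SEPARATOR" plus the untouched "value" column, so the width grows with the number of categories rather than with the number of rows.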
I think this should raise if |
I will take a shot at this. |
@WillAyd I created a pull request for this validation, but I wanted to confirm one thing. If |
Yea I think that would be good to emulate - nice find! |
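A rough sketch of the kind of validation being discussed, assuming the goal is to reject a `columns` argument that is not list-like. The wrapper name `get_dummies_checked` and the error message are hypothetical and are not taken from the pull request; `pandas.api.types.is_list_like` is an existing public helper that treats plain strings as not list-like.

```python
import pandas as pd
from pandas.api.types import is_list_like


def get_dummies_checked(data, columns=None, **kwargs):
    """Hypothetical wrapper: fail fast when `columns` is not list-like."""
    if columns is not None and not is_list_like(columns):
        raise TypeError(
            f"`columns` must be list-like, got {type(columns).__name__}"
        )
    return pd.get_dummies(data, columns=columns, **kwargs)


df = pd.DataFrame({"SEPARATOR": ["comma", "tab"], "value": [1, 2]})

# A list is accepted and passed straight through to pandas.get_dummies.
ok = get_dummies_checked(df, columns=["SEPARATOR"], dtype=int)

# A bare string is rejected before any encoding work starts.
try:
    get_dummies_checked(df, columns="SEPARATOR")
    rejected = False
except TypeError:
    rejected = True
```

Checking up front means a mistyped argument surfaces as an immediate error instead of a long-running call.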
Code Sample, a copy-pastable example if possible
import pandas

# Read the input file; the first column is used as the index.
p_csv = pandas.read_csv("my_dir/myFile", index_col=0)
sepanames = sorted(p_csv["SEPARATOR"].unique())
# Insert one 0/1 indicator column per separator name, directly after "SEPARATOR".
for i in range(0, 14):
    print(i)
    col = p_csv.columns.get_loc("SEPARATOR") + 1 + i
    p_csv.insert(col, "SEPARATOR_" + sepanames[i].upper(), p_csv["SEPARATOR"].apply(lambda x: int(x == sepanames[i])))
p_csv.to_csv("my_dir/new_file.csv")
''' p_csv= pandas.read_csv(("/myDir/myFile.csv"), index_col=0)
pandas.get_dummies(p_csv, prefix="SEPARATOR_", columns="SEPARATOR")
p_csv.to_csv("/myDir/myNew.csv")'''
FILES:
https://www.amazon.de/clouddrive/share/h37d1hqtrj5SrZTvKdrs9gXltVKUgo8Is9BxL8WH7Sf
Problem description
This is all of my code. The quoted part is what I tried first, but after 20 minutes it was terminated by SIGKILL without producing any result. The required files are available for download. I would expect the unquoted code to do the equivalent in this particular case, and it runs fine in under a minute.
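To make the claimed equivalence concrete, here is a small sketch on made-up data (the frame and its values are assumptions, and the category values are already upper case so the column names of both approaches match). It compares the manual insert loop with `get_dummies` called with a list for `columns`:

```python
import pandas as pd

# Stand-in data; not the file linked in this issue.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "SEPARATOR": ["COMMA", "TAB", "COMMA", "SEMICOLON"],
        "score": [0.1, 0.2, 0.3, 0.4],
    }
)

# Manual approach: insert one 0/1 column per distinct value after "SEPARATOR".
manual = df.copy()
names = sorted(manual["SEPARATOR"].unique())
start = manual.columns.get_loc("SEPARATOR") + 1
for i, name in enumerate(names):
    manual.insert(start + i, "SEPARATOR_" + name, (manual["SEPARATOR"] == name).astype(int))

# get_dummies approach, with `columns` passed as a list.
encoded = pd.get_dummies(df, columns=["SEPARATOR"], dtype=int)

# Compare only the indicator columns, which both approaches produce.
dummy_cols = ["SEPARATOR_" + name for name in names]
same = manual[dummy_cols].equals(encoded[dummy_cols])
print(same)
```

The two results differ in layout: the manual loop keeps the original "SEPARATOR" column and places the indicators next to it, while `get_dummies` drops the encoded column and appends the indicators at the end.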
Pandas Version: 0.25.1
Pandas git version: '171c71611886aab8549a8620c5b0071a129ad685'
Expected Output
No error and changed csv file