
BUG: don't allow an empty dataframe to have scalar assignment succeed (GH5744) #5745


Merged (1 commit, Dec 19, 2013)

Conversation


@jreback jreback commented Dec 19, 2013

closes #5744
related #5720

I think this is the correct behavior

In [1]: df = pd.DataFrame(columns=['a', 'b', 'c c'])

In [2]: df['d'] = 3
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

In [3]: df['c c']
Out[3]: Series([], Name: c c, dtype: object)
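For contrast, a minimal sketch (with a hypothetical frame, assuming a current pandas) of why scalar assignment is unambiguous once the frame has an index: the value simply broadcasts to every row.

```python
import pandas as pd

# Hypothetical frame with a populated index: scalar assignment
# broadcasts the value to every existing row.
df = pd.DataFrame({'a': [1, 2, 3]})
df['d'] = 3
print(df['d'].tolist())  # [3, 3, 3]
```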

jreback added a commit that referenced this pull request on Dec 19, 2013: "BUG: don't allow an empty dataframe to have scalar assignment succeed (GH5744)"
@jreback jreback merged commit f8b6208 into pandas-dev:master Dec 19, 2013

socheon commented Dec 19, 2013

Can we just allow the scalar assignment for an empty dataframe?
It can be useful to add new columns dynamically.
A common use case for me is to read a data file using read_csv and then add some derived columns.
Checking that the dataframe read in is not empty every time before adding a new column could be tedious.


jreback commented Dec 19, 2013

@socheon

in general, working with a dataframe that does not have an index is very odd to begin with
if you have an index, then there is no problem

you can read_csv and then add a column with a scalar, no problem

if you have an example of something you think ought to work, please put it up


socheon commented Dec 19, 2013

@jreback

At work, I receive many third-party csv files daily (e.g. new orders etc.) which I need to parse, add some derived columns to, and then load into a database. Occasionally, some of the files may be empty, containing just the column names. For example

Date,ProductID,Quantity,UnitCost
01282013,1,10,3
01282013,2,5,6

To load into database, I simply use

from datetime import datetime
df = pd.read_csv('data.csv', dtype={'Date': 'object'})
df.Date = df.Date.map(lambda s: datetime.strptime(s, '%m%d%Y'))
df['TotalCost'] = df.Quantity * df.UnitCost
load_into_database(df)

But on some days when there are no orders i.e. the input file is just

Date,ProductID,Quantity,UnitCost

The program would break when adding new columns to the empty dataframe. Of course, I could always check if len(df) > 0 before adding new columns. But I would still need to add the TotalCost column to the empty dataframe before passing it to the function load_into_database.

Anyway, the intended change will not be a problem for me. This example is probably too specific and has an easy workaround.
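For what it's worth, a minimal sketch of the workflow above for the header-only case (the file content here is a stand-in for the real data.csv, assuming a current pandas): assigning a derived Series aligns on the frame's empty index and so succeeds even with zero rows; only a bare scalar assignment was rejected by this change.

```python
import io
import pandas as pd

# Hypothetical header-only file, mirroring the "no orders" case above.
empty_csv = io.StringIO("Date,ProductID,Quantity,UnitCost\n")
df = pd.read_csv(empty_csv, dtype={'Date': 'object'})

# A derived Series aligns on the (empty) index, so this assignment
# succeeds even with zero rows.
df['TotalCost'] = df.Quantity * df.UnitCost
print(len(df), 'TotalCost' in df.columns)  # 0 True
```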


jreback commented Dec 19, 2013

@socheon

I hear ya!

just trying to guard against having an operation that looks like it succeeds but actually doesn't do anything.

but I see that in 0.12 this was OK...

let me see...maybe I will revert.


jreback commented Dec 19, 2013

@socheon alright...give it a try with master again...and let me know (the original behavior is restored)
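For reference, a minimal sketch of the restored behavior (assuming a current pandas): scalar assignment to an empty frame succeeds again and simply creates an empty column, without adding any rows.

```python
import pandas as pd

# Empty frame with only column labels, as in the original report.
df = pd.DataFrame(columns=['a', 'b', 'c c'])

# With the original behaviour restored, this no longer raises; the new
# column exists but the frame still has zero rows.
df['d'] = 3
print('d' in df.columns, len(df))  # True 0
```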


socheon commented Dec 20, 2013

Thanks. I tried again and master is good.

Successfully merging this pull request may close these issues.

BUG: Column indexing does not work in this case