variable dtype does not update when populating a dataframe #25294
Comments
Hmm, well, this way of populating a DataFrame is not idiomatic, and inferring the intended result is nearly impossible. Understood that this is a toy example, but the construction here should be done in one expression if you want to be explicit about dtypes. The previous behavior, while ambiguous, was in any case not correct; nothing in what you are doing indicates that you want floats, especially since you are inserting int values.
What do you mean by "inferring the intended result is nearly impossible"? From the names of the columns when creating the DataFrame, of course, but from what values are added to them, something like that was done with previous versions apparently. The dtype could be updated as it gets populated (although it would probably be deemed inefficient): add an
What would then be the recommended way to populate an empty DataFrame in Pandas, for example in a loop? Populating series and constructing the DataFrame at the end, or creating an empty dataframe with
Thanks for the quick reply! And very sorry for my limited experience with Pandas – and Python in general.
Well, your ideal state is probably what is described in #4464, so I'm going to close this as a duplicate. You have a few other things in there, but I'll say that, generally, appending to a DataFrame is very expensive. Your best approach is usually to construct the entire DataFrame from a sequence of values rather than creating an empty DataFrame and continually appending. In lieu of
For further and future usage questions, we ask that you turn to StackOverflow, as this tracker is for enhancement requests and bugs. SO will be a much better forum for Q&A on usage and will help other users with the same question more than this issue tracker could.
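To make the advice about building the DataFrame in one go concrete, here is a minimal sketch. The column names and values are made up for illustration and are not taken from the original report. It collects rows in a plain list inside the loop and constructs the DataFrame once at the end, so the dtypes are inferred from the actual values.

```python
import pandas as pd

# Hypothetical records; the column names "count" and "ratio" are
# illustrative assumptions, not from the original issue.
rows = []
for i in range(3):
    # Collect each record as a dict instead of appending to a DataFrame.
    rows.append({"count": i, "ratio": i / 2})

# Build the DataFrame once; dtypes are inferred from the collected values.
df = pd.DataFrame(rows)
print(df.dtypes)
```

If specific dtypes are needed regardless of the values, one option is to call astype on the result with a mapping of column name to dtype.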
No problem, thank you @WillAyd, I appreciate it.
I posted a question about this on StackOverflow, but thought it might be something worth reporting here.
Code Sample, a copy-pastable example if possible
Problem description
In my test:
Why is that? I couldn't find an explanation in the documentation.
Expected Output
The dtype of an empty variable gets updated when populating it for the first time, similarly to what infer_objects() does.
Output of pd.show_versions()
pandas: 0.23.4
pytest: 4.0.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.15.4
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.1
openpyxl: 2.5.11
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
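For readers unfamiliar with the infer_objects() call mentioned under Expected Output, here is a small sketch of the soft conversion it performs on object columns. The data is hypothetical and is stored as object dtype on purpose; this is not the code from the original report.

```python
import pandas as pd

# Hypothetical data, deliberately stored with object dtype.
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]}, dtype=object)
print(df.dtypes)  # both columns are object here

# infer_objects() returns a new DataFrame with better dtypes
# inferred for the object columns; df itself is left unchanged.
converted = df.infer_objects()
print(converted.dtypes)
```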