DOC: add examples to DataFrame.insert() #39313

Closed
sjvdm opened this issue Jan 21, 2021 · 7 comments · Fixed by #39500
@sjvdm

sjvdm commented Jan 21, 2021

Code Sample

import pandas as pd
# Create a DataFrame and insert a Series cast to the "string" dtype
df = pd.DataFrame({1: [1, 2, 3]}, index=[0, 1, 3])
df.insert(loc=0, column='test', value=pd.Series(["one", "two", "three"]).astype("string"))
print(df)
# The Series itself evaluates correctly on its own
print('==============')
print(pd.Series(["one", "two", "three"]).astype("string"))

Output:

   test  1
0   one  1
1   two  2
3  <NA>  3

==============

0      one
1      two
2    three
dtype: string

Problem description

Issue: StringDtype exhibits strange behaviour when a Series is inserted as a column: some elements are inserted as <NA>, even though the Series evaluates correctly outside of the insert function.

Expected Output

Values should be inserted as is.

    test  1
0    one  1
1    two  2
3  three  3

==============

0      one
1      two
2    three
dtype: string

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 3e89b4c
python : 3.8.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-240.1.1.el8_3.x86_64
Version : #1 SMP Thu Nov 19 17:20:08 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_ZA.UTF-8
LOCALE : en_ZA.UTF-8

pandas : 1.2.0
numpy : 1.19.5
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 49.6.0
Cython : 0.29.21
pytest : 6.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : 1.3.22
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

Workaround

Pass the Series' .values instead of the Series itself when inserting. In the example above:

df.insert(loc=0, column='test', value=pd.Series(["one", "two", "three"]).astype("string").values)

sjvdm added the Bug and Needs Triage labels Jan 21, 2021
@simonjayhawkins
Member

Thanks @sjvdm for the report.

I don't think this is a bug, since the value argument to df.insert is indexed (i.e. a Series) and hence pandas aligns it to the DataFrame's index. https://pandas.pydata.org/docs/user_guide/dsintro.html?highlight=alignment#intro-to-data-structures
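To make the alignment concrete, here is a minimal sketch using the data from the report (assuming pandas 1.x or later, where the "string" dtype exists). Inserting a Series behaves like reindexing it to the DataFrame's index first: labels 0 and 1 exist in both, while label 3 is absent from the Series' default index [0, 1, 2], so it becomes <NA>. The name aligned is illustrative only.

```python
import pandas as pd

# DataFrame from the report: its index is [0, 1, 3]
df = pd.DataFrame({1: [1, 2, 3]}, index=[0, 1, 3])

# The Series has the default index [0, 1, 2]
s = pd.Series(["one", "two", "three"]).astype("string")

# Inserting a Series aligns on index labels, not on position
df.insert(loc=0, column="test", value=s)

# Reindexing the Series to the DataFrame's index yields the same column:
# label 3 is missing from s, so it is filled with <NA>
aligned = s.reindex(df.index)

print(df["test"].equals(aligned))  # True
print(df)
```

The "three" value is not lost by insert itself; it sits at label 2, which the DataFrame does not have.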

Regarding the workaround of parsing to "Values" before inserting:

Indeed, passing an array rather than a Series (e.g. a NumPy array, or .values) to this function does not result in alignment. Alternatively, the Series could be given the same index as the DataFrame.
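Both options can be sketched side by side, assuming the DataFrame and Series from the report. Here s.array is used for the positional route; for the "string" dtype it is the same extension array that .values returns, so the column keeps its dtype. The column names by_position and by_label are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({1: [1, 2, 3]}, index=[0, 1, 3])
s = pd.Series(["one", "two", "three"]).astype("string")

# Option 1: pass an array; arrays carry no labels, so they are inserted by position
df.insert(loc=0, column="by_position", value=s.array)

# Option 2: give the Series the DataFrame's index so every label lines up
df.insert(loc=1, column="by_label", value=s.set_axis(df.index))

print(df)
```

Option 2 is the safer choice when the Series' order is not guaranteed to match the DataFrame's rows, because the intent (which row gets which value) is expressed through the labels.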

A PR enhancing the docs at https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.insert.html (maybe with examples) would be most welcome.

simonjayhawkins added the Docs and good first issue labels and removed the Bug and Needs Triage labels Jan 21, 2021
simonjayhawkins added this to the Contributions Welcome milestone Jan 21, 2021
simonjayhawkins changed the title "BUG: …" to "DOC: add examples to DataFrame.insert()" Jan 21, 2021
@sjvdm
Author

sjvdm commented Jan 21, 2021

Hi @simonjayhawkins,

You are right! Thanks for the quick response and thanks for the time to evaluate.

I had a typo in the index I originally passed to the DataFrame; fixing it aligns the indexes correctly.

import pandas as pd
# Create a DataFrame whose index now matches the Series' default index [0, 1, 2]
df = pd.DataFrame({1: [1, 2, 3]}, index=[0, 1, 2])
df.insert(loc=0, column='test', value=pd.Series(["one", "two", "three"]).astype("string").values)
print(df)
# For comparison, print the Series on its own
print('==============')
print(pd.Series(["one", "two", "three"]).astype("string"))
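With the index corrected to [0, 1, 2], the .values call is no longer strictly needed: the Series' default index already matches the DataFrame's, so alignment leaves every value in place. A minimal sketch, assuming the corrected DataFrame above:

```python
import pandas as pd

# Corrected index: matches the Series' default index [0, 1, 2]
df = pd.DataFrame({1: [1, 2, 3]}, index=[0, 1, 2])
s = pd.Series(["one", "two", "three"]).astype("string")

# Passing the Series itself is fine here because every label has a match
df.insert(loc=0, column="test", value=s)
print(df)
```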

sjvdm closed this as completed Jan 21, 2021
@simonjayhawkins
Member

Reopening, since enhancing the docs is probably worthwhile.

@nofarm3
Contributor

nofarm3 commented Jan 21, 2021

take

@nofarm3
Contributor

nofarm3 commented Jan 22, 2021

@sjvdm @simonjayhawkins
I see it was already added here:
d3e970b

I guess this issue is not relevant anymore.

nofarm3 removed their assignment Jan 22, 2021
@simonjayhawkins
Member

@nofarm3, the docs for insert have indeed been updated since the last release.

But they could still be updated further to clarify the alignment behavior when inserting a Series.

So maybe we could add an example like:

>>> df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=["cat", "dog"])
>>> df
     col1  col2
cat     1     3
dog     2     4
>>>
>>> df.insert(1, "col1a", pd.Series([5, 6], index=["dog", "elephant"]))
>>> df
     col1  col1a  col2
cat     1    NaN     3
dog     2    5.0     4
>>>
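To contrast the two behaviors in one place, the example above can be extended with a NumPy array, which has no labels and is therefore inserted by position. This is a sketch; the column name col1b is illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]}, index=["cat", "dog"])

# A Series is aligned on labels: only "dog" matches, "cat" gets NaN,
# and "elephant" is dropped because it is not in df's index
df.insert(1, "col1a", pd.Series([5, 6], index=["dog", "elephant"]))

# A NumPy array carries no labels, so its values are inserted by position
df.insert(2, "col1b", np.array([5, 6]))
print(df)
```

Note that col1a becomes float64 because NaN is needed for the unmatched "cat" row, while col1b keeps its integer dtype.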

@nofarm3
Contributor

nofarm3 commented Jan 22, 2021

take

nofarm3 pushed a commit to nofarm3/pandas that referenced this issue Jan 31, 2021
nofarm3 pushed a commit to nofarm3/pandas that referenced this issue Jan 31, 2021
MarcoGorelli added a commit that referenced this issue Feb 1, 2021
* DOC: add example to insert (#39313)

* DOC: fix style (#39313)

* Update pandas/core/frame.py

Co-authored-by: nofarmishraki
Co-authored-by: Marco Gorelli