wide_to_long should verify uniqueness #16382
@ohiobicyclist sorry, but I do not understand what you intend to do (with either the 0.19 or the 0.20 version). Per the `wide_to_long` docstring, each row is assumed to be uniquely identified by the id columns `i`. This is not directly checked, which is why you get that error. After you fix this and supply a unique row identifier, if you then forget to specify the correct stubnames (as you did in the first example), an empty data frame is returned (empty because no columns matching those stubs were found):

```
In [20]: pd.wide_to_long(test_df, ["A_", "B_"], i="x", j="colname")
Out[20]:
          B_B3 A_A4 A_A1 A_A6 A_A7 B_B7 B_B1 B_B5 B_B6 B_B4 A_A2 B_B2 A_A3 A_A5 A_ B_
x colname
```

With the correct stubnames:

```
pd.wide_to_long(test_df, ["A_A", "B_B"], i="x", j="colname")
           A_A  B_B
x colname
1 1 1 1
1 1 2
1 1 3
1 1 4
1 1 5
1 2 1
1 2 2
1 2 3
1 2 4
1 2 5
1 3 1
...
```

... which returns a new df with 5*7 = 35 rows, because the df had 7 stub columns and 5 id rows.
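The reshaping nuffe describes (id rows × suffixes = long rows) can be sketched with a small invented frame; the data and the use of three suffixes instead of the thread's seven are assumptions for brevity:

```python
import pandas as pd

# Invented wide frame: 5 rows uniquely identified by "x",
# with three numeric suffixes per stub.
wide = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "A_A1": [1, 2, 3, 4, 5],
    "A_A2": [6, 7, 8, 9, 10],
    "A_A3": [11, 12, 13, 14, 15],
    "B_B1": [1, 1, 1, 1, 1],
    "B_B2": [2, 2, 2, 2, 2],
    "B_B3": [3, 3, 3, 3, 3],
})

# Stub "A_A" matches A_A1/A_A2/A_A3 via the default numeric
# suffix pattern; likewise "B_B".
long_df = pd.wide_to_long(wide, ["A_A", "B_B"], i="x", j="colname")

# 5 id rows x 3 suffixes = 15 rows, one value column per stub.
print(long_df.shape)
```

With seven suffix columns per stub, as in the thread, the same arithmetic gives the 5*7 = 35 rows mentioned above.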
nuffe -- I'm a little less worried about the first example: worst comes to worst, I can just reset_index (twice, if the index isn't already unique) and use that as a unique column. (It's annoying that code that works in 0.19.2 breaks in 0.20.1 because of this, but at least there is a workaround.) However, I contend the following: after changing "A_Ax"/"B_Bx" to "A_Ax"/"B_Ax", "A_" and "B_" ARE the correct stub strings, with the stacked suffix strings being "A1", "A2", "A3", etc. Yet even that doesn't produce a table with data under 0.20.1.
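The reset_index workaround mentioned above can be sketched like this (the frame and values are invented; `"index"` is the column name `reset_index` creates by default):

```python
import pandas as pd

# "x" does NOT uniquely identify rows here, which is exactly the
# situation this issue is about.
df = pd.DataFrame({
    "x": [1, 1, 2],
    "A_1": [10, 11, 12],
    "A_2": [20, 21, 22],
})

# reset_index() materializes the unique RangeIndex as an "index"
# column, which can then serve as the id variable instead of "x".
df = df.reset_index()
long_df = pd.wide_to_long(df, ["A_"], i="index", j="num")

# 3 id rows x 2 suffixes = 6 rows; "x" survives as an extra column.
print(long_df.shape)
```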
@ohiobicyclist can you post the code for the example that should work here in the issue?
Sure, thanks.
That example has the same issue with the non-unique `x` column:

```
In [13]: pd.wide_to_long(test_df, ['A_A', 'B_A'], i='x', j='y').head()
Out[13]:
     A_A  B_A
x y
1 1    1    1
2 1    2    2
3 1    3    3
4 1    4    4
5 1    5    5
```
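Once the check proposed in this issue landed (pandas-dev#16403), a non-unique `i` is rejected up front with a `ValueError` instead of producing confusing output. A minimal sketch with invented data (requires a pandas version that includes that fix):

```python
import pandas as pd

# "x" is duplicated, so it cannot uniquely identify rows.
dup = pd.DataFrame({
    "x": [1, 1],
    "A_1": [1, 2],
    "A_2": [3, 4],
})

try:
    pd.wide_to_long(dup, ["A_"], i="x", j="num")
except ValueError as err:
    # pandas with the fix refuses up front rather than
    # returning an ambiguous result
    print("rejected:", err)
```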
@ohiobicyclist when you specify `suffix=''`:

```
In [7]: pd.wide_to_long(test_df, ["A_", "B_"], i="x", j="colname", suffix='').head()
Out[7]:
            A_   B_
x colname
1 A1       1.0  NaN
  A2       1.0  NaN
  A3       1.0  NaN
  A4       1.0  NaN
  A5       1.0  NaN
```
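The `suffix` argument is a regular expression matched against the part of the column name after the stub; the default matches digits only, so alphanumeric suffixes like "A1" need a broader pattern. A sketch with an invented frame, using `r"\w+"` (an assumption standing in for the thread's `suffix=''`, whose exact matching behavior differs across pandas versions):

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2],
    "A_A1": [1, 2],
    "A_A2": [3, 4],
    "B_B1": [5, 6],
    "B_B2": [7, 8],
})

# suffix is a regex for what follows the stub; the default (digits
# only) would not match "A1", so pass a broader pattern.
long_df = pd.wide_to_long(df, ["A_", "B_"], i="x", j="colname",
                          suffix=r"\w+")

# The two stubs have disjoint suffix sets (A1/A2 vs B1/B2), so the
# outer join fills the other stub's rows with NaN -- the same NaN
# pattern visible in Out[7] above.
print(long_df.shape)
```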
Was this helpful @ohiobicyclist? (I am not sure if your exact use case is better handled by another function, because, judging from your examples, I am not sure I understand exactly what you need to do.) But I agree that checking for a unique `i` is a good idea.
@Nuffe yes pls.
This (suffix="") is helpful. The thing that makes wide_to_long better than melt for my application is the dual-column (or multiple column) stacked output. A somewhat clearer example would be column names |
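The "dual-column stacked output" mentioned above is the key difference from melt, and can be illustrated with an invented frame (the `height_`/`weight_` names and values are assumptions, not from the thread):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2],
    "height_1": [1.1, 1.2],
    "height_2": [1.3, 1.4],
    "weight_1": [50, 60],
    "weight_2": [55, 65],
})

# melt stacks everything into a single value column...
single = df.melt(id_vars="id")   # columns: id, variable, value

# ...while wide_to_long keeps one value column per stub,
# stacked on a shared (id, visit) index.
double = pd.wide_to_long(df, ["height_", "weight_"], i="id", j="visit")
print(list(double.columns))
```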
(pandas-dev#16403) * BUG: wide_to_long should check for unique id vars (pandas-dev#16382) * Fix uncaught lint error * Add whatsnew note (bug fix) (cherry picked from commit 04356a8)
Code Sample, a copy-pastable example if possible
Problem description
The changelog lists "performance improvements" for pd.wide_to_long, but this is not an improvement for me; for these corner cases I would rather have the old behavior. Are these not mainstream enough to support?
Expected Output
pd_wide_to_long_changes.zip
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 42 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
feather: None
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
s3fs: None
pandas_gbq: None
pandas_datareader: None