BUG: Shift on DataFrame which has more than 1 block creates wrong result #35488

Closed
2 of 3 tasks
qinxuye opened this issue Jul 31, 2020 · 1 comment · Fixed by #35578
Labels: Bug, Internals (related to non-user accessible pandas implementation), Regression (functionality that used to work in a prior pandas version)
Milestone: 1.1.1

Comments

qinxuye commented Jul 31, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

In [1]: import pandas as pd                                                     

In [2]: import numpy as np                                                      

In [3]: df1 = pd.DataFrame(np.random.randint(1000, size=(5, 3)))                

In [4]: df2 = pd.DataFrame(np.random.randint(1000, size=(5, 2)))                

In [5]: df3 = pd.concat([df1, df2], axis=1)                                     

In [6]: df3                                                                     
Out[6]: 
     0    1    2    0    1
0   61  536  154  766  179
1  484   18   15  787  766
2  391  171  715  836  654
3  914  969  765  824  950
4  169  414  759   16  666

In [7]: len(df3._data.blocks)                                                   
Out[7]: 2

In [9]: df3.shift(2, axis=1)                                                    
Out[9]: 
    0   1      2   0   1
0 NaN NaN   61.0 NaN NaN
1 NaN NaN  484.0 NaN NaN
2 NaN NaN  391.0 NaN NaN
3 NaN NaN  914.0 NaN NaN
4 NaN NaN  169.0 NaN NaN

Problem description

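I suspect the shift is applied to each internal block separately instead of across the whole frame: the first block (the three columns from df1) is shifted by 2 within itself, and the second block (the two columns from df2) has only two columns, so shifting it by 2 leaves it entirely NaN.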
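To illustrate that hypothesis, here is a small NumPy-only sketch (it is not pandas' internal code). It uses the first row of df3 from the report and a hypothetical helper, `shift_right`, which only handles non-negative periods. Shifting each "block" on its own reproduces the wrong output above, while shifting all columns together reproduces the expected output.

```python
import numpy as np


def shift_right(block, periods):
    """Hypothetical helper: shift a 2-D array `periods` (>= 0) columns to the right, filling with NaN."""
    out = np.full(block.shape, np.nan)
    ncols = block.shape[1]
    if periods < ncols:
        out[:, periods:] = block[:, : ncols - periods]
    return out


# First row of df3 from the report; block A holds the first three columns, block B the last two.
values = np.array([[61, 536, 154, 766, 179]], dtype=float)
block_a, block_b = values[:, :3], values[:, 3:]

# Suspected buggy behaviour: each block is shifted independently, then the pieces are glued back.
per_block = np.hstack([shift_right(block_a, 2), shift_right(block_b, 2)])

# Intended behaviour: shift across all columns at once.
whole_frame = shift_right(values, 2)

print(per_block.tolist())    # [[nan, nan, 61.0, nan, nan]]
print(whole_frame.tolist())  # [[nan, nan, 61.0, 536.0, 154.0]]
```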

Expected Output

After forcing consolidation, the result is correct:

In [12]: df3._data._consolidate_inplace()                                       

In [13]: df3.shift(2, axis=1)                                                   
Out[13]: 
    0   1      2      0      1
0 NaN NaN   61.0  536.0  154.0
1 NaN NaN  484.0   18.0   15.0
2 NaN NaN  391.0  171.0  715.0
3 NaN NaN  914.0  969.0  765.0
4 NaN NaN  169.0  414.0  759.0
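A possible regression check, sketched under a few assumptions: it builds the frame with fixed values instead of random ones, and it compares `DataFrame.shift(..., axis=1)` against a hypothetical reference function, `expected_column_shift`, that shifts the raw 2-D values, so the internal block layout cannot affect the reference. Only values are compared, because the dtypes returned by `shift(axis=1)` are not guaranteed to match the all-float reference. On an affected version the final assertion should fail; on a fixed version it should pass.

```python
import numpy as np
import pandas as pd


def expected_column_shift(df, periods):
    """Hypothetical reference for df.shift(periods, axis=1) with periods >= 0,
    computed on the raw values so that block layout cannot matter."""
    values = df.to_numpy(dtype=float)
    out = np.full(values.shape, np.nan)
    ncols = values.shape[1]
    if periods < ncols:
        out[:, periods:] = values[:, : ncols - periods]
    return pd.DataFrame(out, index=df.index, columns=df.columns)


# Same construction as the report, with deterministic values.
df1 = pd.DataFrame(np.arange(15).reshape(5, 3))
df2 = pd.DataFrame(np.arange(100, 110).reshape(5, 2))
df3 = pd.concat([df1, df2], axis=1)  # may be backed by more than one internal block

result = df3.shift(2, axis=1)
expected = expected_column_shift(df3, 2)

# Compare values only; NaNs in matching positions count as equal.
np.testing.assert_array_equal(result.to_numpy(dtype=float), expected.to_numpy(dtype=float))
```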

Output of pd.show_versions()

In [14]: pd.show_versions()
/Users/qinxuye/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/setuptools/distutils_patch.py:26: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
"Distutils was imported before Setuptools. This usage is discouraged "

ImportError                               Traceback (most recent call last)
in
----> 1 pd.show_versions()

~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/util/_print_versions.py in show_versions(as_json)
    104     """
    105     sys_info = _get_sys_info()
--> 106     deps = _get_dependency_info()
    107 
    108     if as_json:

~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/util/_print_versions.py in _get_dependency_info()
     82     for modname in deps:
     83         mod = import_optional_dependency(
---> 84             modname, raise_on_missing=False, on_version="ignore"
     85         )
     86         result[modname] = _get_version(mod) if mod else None

~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/compat/_optional.py in import_optional_dependency(name, extra, raise_on_missing, on_version)
     97     minimum_version = VERSIONS.get(name)
     98     if minimum_version:
---> 99         version = _get_version(module)
    100         if distutils.version.LooseVersion(version) < minimum_version:
    101             assert on_version in {"warn", "raise", "ignore"}

~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/compat/_optional.py in _get_version(module)
     42 
     43     if version is None:
---> 44         raise ImportError(f"Can't determine version for {module.__name__}")
     45     return version
     46 

ImportError: Can't determine version for numba

qinxuye added the Bug and Needs Triage labels Jul 31, 2020
qinxuye changed the title from "BUG: shift on DataFrame which has more than 1 blocks creates wrong result" to "BUG: Shift on DataFrame which has more than 1 blocks creates wrong result" Jul 31, 2020
qinxuye changed the title from "BUG: Shift on DataFrame which has more than 1 blocks creates wrong result" to "BUG: Shift on DataFrame which has more than 1 block creates wrong result" Jul 31, 2020
@simonjayhawkins (Member)

@qinxuye Thanks for the report.

#34389 caused this change in behaviour. Reverting that PR is under consideration in #34407. cc @jbrockmendel @jorisvandenbossche

760ba37 is the first bad commit
commit 760ba37
Author: jbrockmendel [email protected]
Date: Tue May 26 15:47:18 2020 -0700

CLN: _consolidate_inplace less (#34389)

simonjayhawkins added the Internals label and removed the Needs Triage label Jul 31, 2020
simonjayhawkins added this to the 1.1.1 milestone Jul 31, 2020
simonjayhawkins added the Regression label Jul 31, 2020