bug in dataframe.join() #2189

Closed
saroele opened this issue Nov 7, 2012 · 4 comments

saroele commented Nov 7, 2012

Hi all,

I have a strange issue with pandas 0.9 that I think is a bug. I'm trying to use DataFrame.join(), and it works well on a randomly generated dataframe, but not on a dataframe created from my simulation result. The code below shows that join() on the second dataframe blows up the index, and the result is completely wrong.

To run this code you need this file in your working folder: https://dl.dropbox.com/u/6200325/mydf.dataframe. The script below can also be downloaded here: https://dl.dropbox.com/u/6200325/BugJoin.py

This is the result I get:

In [17]: run -i 'C:\Workspace\Python\Tests\BugJoin.py'

Before:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100000 entries, 2012-01-01 00:00:00 to 2023-05-29 15:00:00
Freq: H
Empty DataFrame

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100000 entries, 2012-01-01 00:00:00 to 2023-05-29 15:00:00
Freq: H
Data columns:
0 100000 non-null values
dtypes: float64(1)

After:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100000 entries, 2012-01-01 00:00:00 to 2023-05-29 15:00:00
Freq: H
Data columns:
0 100000 non-null values
dtypes: float64(1)

Before:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 108355 entries, 2010-01-01 00:00:00 to 2011-01-01 00:00:00
Empty DataFrame

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 108355 entries, 2010-01-01 00:00:00 to 2011-01-01 00:00:00
Data columns:
SID0000 108355 non-null values
dtypes: float64(1)

After:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 4054807 entries, 2010-01-09 16:00:00 to 2010-05-17 15:55:42
Data columns:
SID0000 4054807 non-null values
dtypes: float64(1)

This is the code from the script:

import numpy as np
import pandas as pd
from scipy.integrate import cumtrapz

df1 = pd.DataFrame(np.random.rand(1e5),
                   index=pd.date_range('2012-01-01', freq='H', periods=1e5))

df2 = pd.load('mydf.dataframe')

for dataframe in [df1, df2]:

    cum = pd.DataFrame(index=dataframe.index)
    for c in dataframe.columns:
        # we need to remove the empty values for the cumtrapz function to work
        ts = dataframe[c].dropna()

        # cumulative trapezoidal integration over time; the index is
        # converted from nanoseconds to seconds
        tscum = pd.DataFrame(data=cumtrapz(ts.values, ts.index.asi8 / 1e9, initial=0),
                             index=ts.index,
                             columns=[c])
        print '\nBefore:'
        print cum, '\n'
        print tscum, '\n'

        cum = cum.join(tscum, how='left')

        print 'After:'
        print cum
@changhiskhan (Contributor)

I think that's caused by df2 having a non-unique index. When you join dataframes on non-unique indices, the result is a Cartesian product of the matching rows (see the docs).

For example:

In [8]: from numpy.random import rand

In [9]: from pandas import DataFrame

In [10]: idx = ['a', 'a', 'b', 'b']

In [11]: df1 = DataFrame(rand(len(idx), 2), index=idx, columns=['A', 'B'])

In [12]: df2 = DataFrame(rand(len(idx), 2), index=idx, columns=['X', 'Y'])

In [13]: df1.join(df2, how='left')
Out[13]: 
          A         B         X         Y
a  0.522150  0.275144  0.625306  0.502149
a  0.522150  0.275144  0.770039  0.441860
a  0.483269  0.450600  0.625306  0.502149
a  0.483269  0.450600  0.770039  0.441860
b  0.812180  0.448433  0.758224  0.661106
b  0.812180  0.448433  0.293415  0.762360
b  0.952718  0.799325  0.758224  0.661106
b  0.952718  0.799325  0.293415  0.762360

@saroele (Author)

saroele commented Nov 7, 2012

The non-unique time index arises because the timeseries has discontinuities (continuous time, but steps in the values); this is typical for on/off systems, for example. At each discontinuity the same timestamp appears twice, with two different values.
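
For illustration, a minimal hypothetical series with such a step discontinuity might look like this (timestamps and values made up):

import pandas as pd

# Hypothetical on/off step at 00:01: that timestamp occurs twice, once with
# the value just before the switch and once with the value just after.
idx = pd.to_datetime(['2012-01-01 00:00', '2012-01-01 00:01',
                      '2012-01-01 00:01', '2012-01-01 00:02'])
ts = pd.Series([0.0, 0.0, 1.0, 1.0], index=idx)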

Is there an easy way to obtain the 'expected' behaviour on join? Does combine_first accept non-unique keys?

@changhiskhan (Contributor)

combine_first aligns the two frames first, so no (you can't reindex a non-unique index).

I'm assuming that by "expected" you want the duplicated entries to be joined in the order in which they appear, right? You'll have to do some contortion here, AFAIK. If you can assign an observation number per unique time, that would work (see the sketch below). Otherwise you can join the unique elements first, then group the duplicates by time and join each group ignoring the index. If most entries are duplicated, you might just group the whole thing.
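
A minimal sketch of the observation-number idea, using hypothetical frames; note that groupby(...).cumcount() only exists in later pandas versions:

import pandas as pd

# Hypothetical frames sharing duplicated timestamps.
idx = pd.to_datetime(['2010-01-01', '2010-01-01', '2010-01-02'])
left = pd.DataFrame({'A': [1.0, 2.0, 3.0]}, index=idx)
right = pd.DataFrame({'B': [10.0, 20.0, 30.0]}, index=idx)

# Number each observation within a duplicated timestamp, then merge on
# (timestamp, occurrence) so duplicates pair up in order instead of
# producing a Cartesian product.
left['occ'] = left.groupby(level=0).cumcount()
right['occ'] = right.groupby(level=0).cumcount()

merged = (left.reset_index()
              .merge(right.reset_index(), on=['index', 'occ'], how='left')
              .set_index('index')
              .drop('occ', axis=1))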

@wesm probably knows a better way if he wants to chime in.

@wesm (Member)

wesm commented Nov 9, 2012

It would be nice to have an "as of join" where we have the option to match duplicate observations in the order they're observed.
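
For reference, later pandas versions added pd.merge_asof, which implements this kind of as-of join by matching each left row to the most recent right row at or before its key (it does not pair duplicates in order, though). A small sketch with made-up data:

import pandas as pd

trades = pd.DataFrame({'time': pd.to_datetime(['2012-11-09 10:00:01',
                                               '2012-11-09 10:00:03']),
                       'price': [51.95, 52.00]})
quotes = pd.DataFrame({'time': pd.to_datetime(['2012-11-09 10:00:00',
                                               '2012-11-09 10:00:02']),
                       'bid': [51.93, 51.97]})

# Each trade picks up the most recent quote at or before its time.
print(pd.merge_asof(trades, quotes, on='time'))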

The bug you reported here seems to have been fixed by #2197. I ran your test script and the output looks good now:

In [4]: cum
Out[4]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 229049 entries, 2010-01-01 00:00:00 to 2011-01-01 00:00:00
Data columns:
SID0000    229049  non-null values
dtypes: float64(1)

@wesm wesm closed this as completed Nov 9, 2012