pd.stats.api.ols inconsistent estimates #6874
Comments
seems ok to me
I see the same thing using your example. However it seems the issue occurs when there are row labels. Is this the expected behavior?
Having duplicate labels rarely makes sense; this should probably raise an error when the index has duplicates. I don't know what statsmodels does in this case.
@jseabold does patsy/sm align on the index?
I noticed this issue when using groupby and ols with an indexed DataFrame. GroupBy splits have "duplicate" row labels. It seems that sm correctly ignores the duplicate row labels.
It would be helpful to show some code.
We check for alignment. https://github.com/statsmodels/statsmodels/blob/master/statsmodels/base/data.py#L308
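A minimal sketch of the kind of duplicate-index check suggested here. The helper name `require_unique_index` and the toy data are assumptions for illustration; this is not something pandas does for you inside `ols`.

```python
import pandas as pd


def require_unique_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: raise if the row labels contain duplicates."""
    if not df.index.is_unique:
        dupes = df.index[df.index.duplicated()].unique().tolist()
        raise ValueError(f"duplicate index labels: {dupes}")
    return df


# Made-up frame indexed by id, with repeated ids.
df = pd.DataFrame({"ret": [0.01, 0.02, -0.01, 0.03]}, index=["a", "a", "b", "b"])

try:
    require_unique_index(df)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# One possible workaround: drop the labels and use a fresh positional index
# before running the regression.
clean = require_unique_index(df.reset_index(drop=True))
```

Resetting the index discards the id labels, so keep them as a regular column (`reset_index()` without `drop=True`) if they are needed later.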
Sure. My data is organized by id and date. I have the dataframe indexed by id. It looks something like this (without the date column):
Referring you to statsmodels, as this functionality is not supported (though not deprecated as of yet).
What is the status of this issue?
@rsdenijs As @jreback pointed out in his last comment, this is no longer supported in pandas (these functions will also be effectively deprecated in the coming release, see #11898). So the status of this issue is that we do not plan to take any action on this. Can you use statsmodels for your use case? (For OLS everything should be in statsmodels; for the other functions in pandas there are still some things missing in statsmodels: statsmodels/statsmodels#2745)
I am running into an issue trying to run OLS using pandas 0.13.1.
Here is a simple example: I want to regress a variable on itself, in this case excess returns. The intercept should be 0, and the coefficient should be 1. pandas provides the wrong estimates, while statsmodels gives the correct estimates.
This is not due to the silly regression specification, as I have noticed the pandas.ols estimates are inconsistent for other specifications as well.
Has anyone else encountered this problem?
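As a reference point for what the estimates should be, here is a small sketch of the same sanity check done with plain least squares in NumPy. This is a stand-in, not the original poster's code and not `pd.stats.api.ols`; the column name `excess_ret` and the values are made up.

```python
import numpy as np
import pandas as pd

# Made-up excess returns with repeated row labels.
df = pd.DataFrame(
    {"excess_ret": [0.01, -0.02, 0.03, 0.00, 0.015]},
    index=["a", "a", "b", "b", "b"],
)

# Work on raw arrays so row labels cannot influence the fit.
y = df["excess_ret"].to_numpy()
X = np.column_stack([np.ones_like(y), y])  # intercept column + regressor

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(f"intercept={intercept:.6f}, slope={slope:.6f}")
```

Regressing a variable on itself should give an intercept of 0 and a coefficient of 1 up to floating-point error, so any estimator that departs from that on the same data is treating the rows differently.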