pd.concat loses frequency attribute for 'continuous' DataFrame appends #3232
Comments
@nehalecky as an aside, this is a good case for appending your data with an |
Thanks @jreback for the note. I do use HDFStore when I persist data to my local machine, and it works great for that, however, the application I am referring to above is persisting data to a remote store, which will eventually have to scale horizontally. For both those reasons, the use of hdf5 isn't an option, unfortunately. :( Actually, for preprocessing analysis, I am storing the raw record data as heavily compressed hdf5 binary in the db now (via pandas HDFStore). This allows me to retrieve individual records and load them directly to DataFrame, tying directly into my analysis stack, which is nice. I am really looking forward to whatever solutions are implemented for binary storage of data frame (#686 and all), but this is how I'm rolling for now. ;) BTW, this writeup (http://pandas.pydata.org/pandas-docs/dev/io.html#hdf5-pytables) is awesomeness, and your SO answer for pandas workflow is off the charts (http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas), thank you! |
Glad to hear the docs are useful! Here's another resource: http://pandas.pydata.org/pandas-docs/dev/cookbook.html#hdfstore Contributions needed! If you have the time, http://msgpack.org is, I think, a reasonable choice of format (though it doesn't support compression directly), and probably simple to implement (and storable in a db).
In the general case concat can join any two indices types, or Another angle would rely on the fact that although |
Hey @y-p, thanks for the tips. Agreed it's a very special case, and right now it isn't a major performance issue; however, when we begin to scale, it could be. In the meantime, I'll try to implement your suggestions and I'll keep you posted on how this performs when things begin to get bigger. :) Thanks again.
Still an issue on 23.2 |
Still an issue in 0.23.4.
So setting the frequency is about 13k times faster than resample and about 1.6k times faster than reindexing. If it is not known whether the indices are contiguous, I'd thus go with reindex. Any opinions/advice on this?
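For comparison, here is a minimal sketch of the three workarounds mentioned in this comment, run side by side on a tiny made-up frame. The column name, timestamps, and the "min" frequency are assumptions for illustration. The "setting" variant here rebuilds the index with `pd.DatetimeIndex(..., freq=...)` rather than assigning to `index.freq` directly; both validate the frequency against the data. No timings are shown, since they depend on data size and pandas version.

```python
import numpy as np
import pandas as pd

freq = "min"  # assumed sampling frequency for this illustration

old = pd.DataFrame(
    {"value": np.arange(3.0)},
    index=pd.date_range("2018-01-01 00:00", periods=3, freq=freq),
)
new = pd.DataFrame(
    {"value": np.arange(3.0, 6.0)},
    index=pd.date_range("2018-01-01 00:03", periods=3, freq=freq),
)
combined = pd.concat([old, new])

# Option 1: declare the frequency. Cheap, but raises ValueError if the
# timestamps do not actually conform to `freq` (e.g. there is a gap).
by_setting = combined.copy()
by_setting.index = pd.DatetimeIndex(by_setting.index, freq=freq)

# Option 2: reindex onto an explicit regular range. Any missing
# timestamps show up as NaN rows instead of raising.
full_range = pd.date_range(combined.index[0], combined.index[-1], freq=freq)
by_reindex = combined.reindex(full_range)

# Option 3: resample at the same frequency, aggregating each bin.
by_resample = combined.resample(freq).mean()

print(by_setting.index.freq, by_reindex.index.freq, by_resample.index.freq)
```

When the pieces are known to be contiguous, option 1 does the least work; options 2 and 3 are more forgiving when a gap might be present.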
@TomAugspurger did the concat_same_type implementation end up checking if freq could be preserved? |
Nope, but it would certainly make a good standalone PR. |
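As a rough illustration of what such a check might look like, here is a sketch of a hypothetical helper, `can_preserve_freq`. The name and logic are assumptions for discussion, not the pandas internals or the eventual fix. It treats the frequency as preservable only when both indexes share the same non-null freq and the second one starts exactly one step after the first one ends.

```python
import pandas as pd


def can_preserve_freq(left, right):
    """Hypothetical check: can left.freq survive appending right?"""
    # Both indexes must carry the same, non-null frequency.
    if left.freq is None or left.freq != right.freq:
        return False
    # Nothing to compare against if either side is empty.
    if len(left) == 0 or len(right) == 0:
        return False
    # The right index must start exactly one step after the left ends.
    return right[0] == left[-1] + left.freq


a = pd.date_range("2018-01-01", periods=3, freq="D")
b = pd.date_range("2018-01-04", periods=3, freq="D")  # contiguous with a
c = pd.date_range("2018-01-06", periods=3, freq="D")  # leaves a gap

print(can_preserve_freq(a, b), can_preserve_freq(a, c))
```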
Wow! Thanks @jreback and @mroeschke and everyone here! |
Hey all,

I have a DataFrame (df) that stores live sensor data that is captured at a specific frequency. New raw data from sensor is updated at a set interval (an attempt at bandwidth conservation), which is parsed into a new df object.

These new update dataframes are of the same frequency, and contain data that is 'continuous' in time (i.e., they pick up right where the last timestamp left off), and ultimately I would like to append this new data to the existing dataframe while preserving the main dataframe frequency attribute. I tried by using a concat of old and new dataframes, however, it seems that concat doesn't check this case for continuous time series, and loses its frequency attribute. This can be reproduced in code below:

These guys look good:

However, these guys, together, forget where they came from:

I currently get around this with a resample of the resulting df to set frequency, which isn't that big of a deal, however, thought I'd mention it so that a more elegant behavior could be implemented. I'll try and take a look when I have time, but I know that all you here are so much more familiar with pandas internals. Any pointers?

And, as always, thank you! :)
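A minimal sketch of the situation described above, using made-up data: the column name, timestamps, and the "min" frequency are assumptions for illustration, not the original snippet. On the pandas versions reported as affected in this thread, the combined index would report no freq even though the two pieces line up; other versions may behave differently, so no output is shown.

```python
import numpy as np
import pandas as pd

# Existing data: five samples at a fixed one-minute frequency.
old_index = pd.date_range("2013-04-01 00:00", periods=5, freq="min")
old = pd.DataFrame({"value": np.arange(5.0)}, index=old_index)

# New update: same frequency, starting one step after the last timestamp.
new_index = pd.date_range(old_index[-1] + old_index.freq, periods=5, freq="min")
new = pd.DataFrame({"value": np.arange(5.0, 10.0)}, index=new_index)

# Each piece on its own knows its frequency.
print(old.index.freq, new.index.freq)

# The concatenated frame is what the report is about: inspect its freq.
combined = pd.concat([old, new])
print(combined.index.freq)

# The timestamps themselves are still regular, so the frequency can be inferred.
print(pd.infer_freq(combined.index))
```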