ENH: Add option to resample data by a non-timeseries column (e.g. Price) #46794
Comments
This can be done using groupby with a binned grouper; there is no need to add any API. Please ask usage questions on Stack Overflow.
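A minimal sketch of what a "binned grouper" could look like for this data. It assumes the `Timestamp`, `Price`, `Volume` columns described in the issue, made-up sample values, and left-closed bins of width `brick_size` (a name borrowed from the discussion, not a pandas parameter). The point it shows is that every row whose price falls in the same bin ends up in one group, whenever the trade happened.

```python
import pandas as pd

# Made-up trades using the Timestamp / Price / Volume columns from the issue.
df = pd.DataFrame({
    "Timestamp": [1649289600174, 1649289601174, 1649289602174,
                  1649289603174, 1649289604174],
    "Price": [45501.0, 45508.0, 45512.0, 45495.0, 45503.0],
    "Volume": [0.5, 1.0, 0.2, 0.7, 0.3],
})

brick_size = 10

# Fixed-width price edges covering the observed range: 45490, 45500, ...
lo = int(df["Price"].min() // brick_size) * brick_size
hi = int(df["Price"].max() // brick_size + 1) * brick_size
edges = list(range(lo, hi + brick_size, brick_size))

# Left-closed bins such as [45500, 45510); pd.cut returns a categorical Series.
bins = pd.cut(df["Price"], bins=edges, right=False).rename("price_bin")

# The binned grouper: rows share a group whenever their price shares a bin,
# regardless of the order or time of the trades.
summary = df.groupby(bins, observed=True).agg(
    trades=("Price", "size"),
    volume=("Volume", "sum"),
    last_ts=("Timestamp", "last"),
)
print(summary)
```

Note that the first and last trades land in the same `[45500, 45510)` group even though another trade outside that band happened between them.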
I did spend many hours doing research before opening this feature request. If you could provide a simple one-liner example or a documentation page where I can get more details, that would be helpful for me and for other people who stumble upon this page via Google. Thanks
@dsstex you would need to provide a working example here to illustrate. In any event, this forum is for bug reports or enhancements that fit the API; without an example it's impossible to tell, but I believe what you are doing is possible with groupby.
Let me explain my case. This is what my original data looks like.
After resampling, the result should look like this (note: I'm using 10 as the brick_size here).
I believe you are talking about the pd.cut option, followed by a groupby.
That outputs something like this.
Please note: as you may well know, trade data is not unique. The price of bitcoin one year ago may have been 45555 USD, and it is at the same price again this year. If I use a bin size of 100, both would fall into (45500, 45600), so a groupby would put the data from one year ago and the current data into the same bin. So I don't see the point of doing a groupby here; I'm looking for a linear (sequential) implementation. Did I misunderstand your answer, or did you misunderstand my question? Thanks
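One common pandas idiom for the problem described here is "consecutive runs" grouping: label each row with its price band, then start a new group whenever the band changes from the previous row, so a band revisited later opens a fresh group. This is a swapped-in technique, not a Renko implementation: the bands below are fixed multiples of `brick_size` (a hypothetical name for illustration) rather than bricks anchored to the previous brick's close, and the data is made up.

```python
import pandas as pd

# Made-up trades: the price leaves the 45500-45510 band and later returns to it.
df = pd.DataFrame({
    "Timestamp": [1, 2, 3, 4, 5],
    "Price": [45501.0, 45508.0, 45512.0, 45495.0, 45503.0],
})

brick_size = 10

# Lower edge of each row's fixed price band (45501 -> 45500, 45512 -> 45510).
band = (df["Price"] // brick_size).astype(int) * brick_size

# Start a new run whenever the band differs from the previous row's band, so a
# band revisited later opens a new group instead of merging with the old one.
run_id = band.ne(band.shift()).cumsum().rename("run")

runs = df.assign(band=band).groupby(run_id).agg(
    band=("band", "first"),
    open=("Price", "first"),
    close=("Price", "last"),
    first_ts=("Timestamp", "first"),
    last_ts=("Timestamp", "last"),
)
print(runs)
```

Unlike a plain binned groupby, the two visits to the 45500 band come out as separate rows, in time order.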
@dsstex please ask a fully formed question on Stack Overflow.
I have already tried twice on Stack Overflow and once on Data Science Stack Exchange. I haven't received any answers, only comments, and I had to delete the questions after waiting multiple days so that I could ask again. I'll give it one more try. Note: it would be very unprofessional to push me to another forum without understanding my feature request. So I assume the following:
Thanks
@dsstex you are missing the point: this is NOT a forum for helping you; it is a forum for bug reports and feature requests. pandas is all volunteer, so please respect everyone's time.
I don't want to argue much here. If you are 100% sure that my feature request can be accomplished with pandas and that Stack Overflow is the right forum, then it is my mistake, because in that case I'm merely seeking support. But if what I'm seeking can't be achieved with the current version of pandas, then it is a valid feature request.
I have created a question here. https://stackoverflow.com/questions/71909107/pandas-resample-data-by-a-non-timeseries-column-e-g-price Thanks |
I haven't received any response to my SO question so far, even after 24 hours. With 25 views and no response, I would assume this is not simple in pandas. https://stackoverflow.com/questions/71909107/pandas-resample-data-by-a-non-timeseries-column-e-g-price I'm really struggling here. Could you assist me in any way? Even pointing me to the proper documentation would be really helpful. This page doesn't help me much with what I'm after. https://pandas.pydata.org/docs/reference/api/pandas.cut.html Thanks
@dsstex I am not sure why you think this is some kind of support channel; it's not. pandas is all volunteer.
I seriously have no idea why you are treating people like this. I COMPLETELY UNDERSTAND PANDAS IS ALL VOLUNTEER. You have made that clear from the beginning. I have contributed to several open-source PHP projects over the last decade through my past employment, so I can assure you I understand what you mean. I didn't open a ticket here without doing any research. The fact that I'm struggling here means either it is not possible in pandas (hence my feature request is a valid one) OR the relevant pandas documentation is very hard to find. If it is the latter, then a better approach would be to take my problem as feedback and improve the pandas documentation. P.S. It seems like you are determined not to provide me any assistance. I'm okay with that. However, I still believe you should let other pandas volunteers evaluate my ticket rather than making the decision at your sole discretion by closing it, because what I'm asking could be a genuine feature request.
It could also be that you haven't provided a minimal reproducible example; that may be why you've not had responses on Stack Overflow. Anyway, this has run its course.
@jreback @MarcoGorelli The line between genuine usage questions and feature requests is always a bit fuzzy (in the end, many feature requests are backed by a use case, which is often already somehow possible to do in pandas, but the feature request is about making this easier to do). For example, I think there is some feature request hidden here. @dsstex Thank you for thinking about how pandas can be improved. Now, I have to say that also for me your question was not very clear. It might be a bit late now, but I still wanted to give you some tips:
For your actual feature request, I think in the meantime it has been answered on StackOverflow by @MarcoGorelli. I also think it is not really a "resample" operation (using pandas' terminology), because a resample will group all data that fall into a certain (time) interval together, regardless of order of the rows in your DataFrame. After doing the |
Thanks Joris, some good points there. With regard to closing issues: people's time is very limited and there are a lot of open issues, so if one lacks a clear example with expected output, then arguably it's not worth spending too long on it. But I acknowledge that I locked this one prematurely, apologies!
Is your feature request related to a problem?
Renko Chart Wiki: https://en.wikipedia.org/wiki/Renko_chart
I'm trying to generate a Renko chart from trade tick data. The data contains `Timestamp`, `Price`, `Volume`. The `Timestamp` is in Unix milliseconds, e.g. `1649289600174`. pandas already supports OHLC resampling via `df.resample('10Min').agg({'Price': 'ohlc'})`. However, I would like to resample trade data based on price, not by time.
Describe the solution you'd like
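For reference, a minimal sketch of the existing time-based OHLC resampling mentioned above, assuming made-up trades with Unix-millisecond timestamps. `resample` needs a `DatetimeIndex`, so the milliseconds are converted with `pd.to_datetime(..., unit="ms")` first; the sketch uses the `"10min"` alias and `Resampler.ohlc()`, which is equivalent to `.agg('ohlc')` on the `Price` column.

```python
import pandas as pd

# Made-up trades; Timestamp is Unix milliseconds as described in the issue.
df = pd.DataFrame({
    "Timestamp": [1649289600174, 1649289660000, 1649290200000, 1649290260000],
    "Price": [45501.0, 45512.0, 45495.0, 45503.0],
    "Volume": [0.5, 0.2, 0.7, 0.3],
})

# resample() works on a DatetimeIndex, so convert the milliseconds first.
ts = (
    df.assign(Timestamp=pd.to_datetime(df["Timestamp"], unit="ms"))
      .set_index("Timestamp")
)

# Time-based OHLC bars: one row per 10-minute window.
bars = ts["Price"].resample("10min").ohlc()
bars["Volume"] = ts["Volume"].resample("10min").sum()
print(bars)
```

Every bar here covers a fixed slice of time; the request is for bars whose boundaries are set by price movement instead.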
I'm looking for a solution that would look something like `df.resample('10Num').agg({'Price': 'ohlc', 'Timestamp': 'last'})`. Here 10 is the brick size, and it is based on the close price. The keyword `Num` means: treat this as numeric-value resampling instead of time-series resampling, i.e. if the close price moves +10 or -10, aggregate that data. We should also have a flag to ignore downward movement.
If `ignore_down` is set to True, the agg function should ignore downward movement, e.g. from 100 to 90.
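The brick rule described above can be approximated today with a loop that labels rows, followed by an ordinary `groupby`. This is a minimal sketch under stated assumptions, not a pandas API: `renko_brick_ids` is a hypothetical helper; a brick closes on the first trade at least `brick_size` away from the anchor (the first price, then the previous brick's closing trade); a jump of several brick sizes still yields a single brick; and `ignore_down=True` is interpreted as "falling prices never close a brick". Classic Renko charts often use stricter rules (e.g. two-brick reversals), which this does not attempt.

```python
import pandas as pd


def renko_brick_ids(prices, brick_size, ignore_down=False):
    """Hypothetical helper: label each trade with the id of its brick.

    A brick closes on the trade whose price is at least brick_size above
    (or, unless ignore_down is True, below) the anchor price. The anchor
    starts at the first price and then moves to each closing trade's price.
    """
    ids = []
    brick = 0
    anchor = None
    for price in prices:
        if anchor is None:
            anchor = price
        ids.append(brick)  # the closing trade still belongs to its brick
        moved_up = price - anchor >= brick_size
        moved_down = (anchor - price >= brick_size) and not ignore_down
        if moved_up or moved_down:
            anchor = price  # the next brick is measured from this close
            brick += 1
    return ids


# Made-up prices to show the labelling and the follow-up aggregation.
df = pd.DataFrame({
    "Timestamp": [1, 2, 3, 4, 5, 6, 7],
    "Price": [100.0, 104.0, 110.0, 103.0, 99.0, 101.0, 112.0],
})
df["brick"] = renko_brick_ids(df["Price"], brick_size=10)
up_only = renko_brick_ids(df["Price"], brick_size=10, ignore_down=True)

# Roughly what the proposed resample('10Num').agg(...) call would return.
bricks = df.groupby("brick").agg(
    open=("Price", "first"),
    high=("Price", "max"),
    low=("Price", "min"),
    close=("Price", "last"),
    Timestamp=("Timestamp", "last"),
)
print(bricks)
```

If the data does not end exactly on a closing trade, the last group is an unfinished brick, and a caller may want to drop it.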
API breaking implications
N/A
Describe alternatives you've considered
At the moment, I'm creating the Renko chart manually using a Python loop.
Additional context
N/A