
ENH: Add option to resample data by a non-timeseries column (e.g. Price) #46794


Closed
dsstex opened this issue Apr 17, 2022 · 15 comments

Comments

@dsstex

dsstex commented Apr 17, 2022

Is your feature request related to a problem?

Renko Chart Wiki: https://en.wikipedia.org/wiki/Renko_chart

I'm trying to generate a Renko chart from trade tick data. The data contains Timestamp, Price, and Volume. The Timestamp is in unix milliseconds format, e.g. 1649289600174.

Pandas already supports OHLC resampling via df.resample('10Min').agg({'Price': 'ohlc'}). However, I would like to resample trade data based on Price, not by Time.

Describe the solution you'd like

I'm looking for a solution that would sort of look like

df.resample('10Num').agg({'Price': 'ohlc', 'Timestamp': 'last'}).

Here 10 is the brick size, and it is based on the close price. The keyword Num means: treat this as numeric-value resampling instead of timeseries resampling, i.e. whenever the close price moves +10 or -10, I would like to aggregate that data.

We should also have a flag to ignore down movement.

If ignore_down is set to True, then the agg function should ignore downside movement, e.g. a move from 100 to 90.

API breaking implications

N/A

Describe alternatives you've considered

At the moment, I'm creating the renko chart manually using a python loop.
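For reference, such a loop might look like the following. This is a minimal, illustrative sketch only (the function name renko_bricks and the exact brick-detection rule are my own, inferred from the description and the expected output in this thread): it emits a brick whenever the price has moved at least brick_size away from the price at which the previous brick closed, carrying the volume accumulated since then.

```python
import pandas as pd

# Example trade data matching the issue: Timestamp (unix ms), Price, Volume.
df = pd.DataFrame({
    "Timestamp": range(1649289600174, 1649289600190),
    "Price":  [100, 102, 105, 109, 110, 107, 102, 101,
               100, 103, 107, 102,  99,  97,  93,  90],
    "Volume": [100, 150, 200, 100, 200, 400, 500, 600,
               100, 200, 400, 200, 100, 100, 100, 100],
})

def renko_bricks(df, brick_size=10):
    """Emit a brick each time Price moves +/- brick_size from the last anchor."""
    bricks = []
    anchor = df["Price"].iloc[0]   # reference price for the current brick
    volume = 0                     # volume accumulated since the last brick
    for row in df.itertuples(index=False):
        volume += row.Volume
        if abs(row.Price - anchor) >= brick_size:
            bricks.append((row.Timestamp, row.Price, volume))
            anchor, volume = row.Price, 0
    return pd.DataFrame(bricks, columns=["Timestamp", "Price", "Volume"])

print(renko_bricks(df))
```

On the sample data above this reproduces the three bricks shown later in the thread (110/750, 100/1600, 90/1200).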

Additional context

N/A

@dsstex dsstex added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 17, 2022
@jreback
Contributor

jreback commented Apr 17, 2022

this can be done using groupby with a binned grouper

no need to add any api

pls ask usage questions on stack overflow
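One reading of this suggestion (a sketch only, not an official recipe; the bin edges and column names are made up for illustration) is to bin the price with pd.cut and aggregate per bin:

```python
import pandas as pd

df = pd.DataFrame({
    "Price":  [100, 102, 105, 109, 110, 107, 102],
    "Volume": [100, 150, 200, 100, 200, 400, 500],
})

# Bin prices into fixed-width intervals, then aggregate within each bin.
bins = pd.cut(df["Price"], bins=range(90, 121, 10))
agg = df.groupby(bins, observed=True).agg(
    open=("Price", "first"),
    close=("Price", "last"),
    Volume=("Volume", "sum"),
)
print(agg)
```

Note that this groups all rows falling into a bin together, regardless of their order in the DataFrame, which (as discussed further down in the thread) is not quite the Renko semantics the reporter is after.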

@jreback jreback added Groupby Usage Question and removed Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 17, 2022
@jreback jreback added this to the No action milestone Apr 17, 2022
@jreback jreback closed this as completed Apr 17, 2022
@dsstex
Author

dsstex commented Apr 17, 2022

@jreback

I did spend many hours doing research before opening this feature request. If you could provide a simple one-liner example, or a documentation page where I can get more details, that would be helpful for me and for other people who stumble upon this page via Google.

Thanks

@jreback
Contributor

jreback commented Apr 17, 2022

@dsstex you would need to provide a working example here to illustrate - but in any event this forum is for bug reports or enhancements that fit the api

without an example it's impossible

but i believe what you are doing is possible via groupby

@dsstex
Author

dsstex commented Apr 17, 2022

@jreback

Let me explain my case.

This is how my original data looks:

Timestamp           Price               Volume

1649289600174       100                 100
1649289600175       102                 150
1649289600176       105                 200
1649289600177       109                 100
1649289600178       110                 200
1649289600179       107                 400
1649289600180       102                 500
1649289600181       101                 600
1649289600182       100                 100
1649289600183       103                 200
1649289600184       107                 400
1649289600185       102                 200
1649289600186        99                 100
1649289600187        97                 100
1649289600188        93                 100
1649289600189        90                 100

After resampling, the result should look like this (note: I'm using 10 as the brick_size here):

1649289600178       110                 750
1649289600182       100                1600
1649289600189        90                1200

I believe you are talking about the pd.cut option, followed by a groupby.

import pandas as pd
import numpy as np

df = pd.DataFrame({'price': np.random.randint(1, 100, 100)})
df['bins'] = pd.cut(x=df['price'], bins=[0, 10, 20, 30, 40, 50, 60,
                                          70, 80, 90, 100])

That outputs something like this:

      price       bins
0       92  (90, 100]
1       15   (10, 20]
2       54   (50, 60]
3       55   (50, 60]
4       72   (70, 80]
..     ...        ...
95      88   (80, 90]
96      21   (20, 30]
97      91  (90, 100]
98      51   (50, 60]
99      18   (10, 20]

Please note: as you may well know, trade data is not unique. The price of bitcoin a year ago might have been 45555 USD, and it can be at the same price again this year. If I use a bin size of 100, both would fall in (45500, 45600].

A groupby would put both the year-old data and the current data in the same bin. So I have no idea what the point of doing a groupby here is.

I'm looking for a linear implementation. Did I misunderstand your answer, or did you misunderstand my question?
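To illustrate that concern concretely (a made-up toy example, not taken from the original data): with a plain cut + groupby, two separate visits to the same price range collapse into a single group:

```python
import pandas as pd

# Two separate "visits" to the 45500-45600 range, far apart in time.
df = pd.DataFrame({
    "Timestamp": [1, 2, 100, 101],
    "Price":     [45555, 45560, 45550, 45555],
    "Volume":    [10, 20, 30, 40],
})

bins = pd.cut(df["Price"], bins=range(45500, 45701, 100))
merged = df.groupby(bins, observed=True)["Volume"].sum()
# All four rows land in the single (45500, 45600] bin,
# even though they belong to two distinct episodes.
print(merged)
```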

Thanks

@jreback
Contributor

jreback commented Apr 17, 2022

@dsstex pls ask a fully formed question on stack overflow

@dsstex
Author

dsstex commented Apr 17, 2022

@jreback

I have already tried twice on Stack Overflow and once on the Data Science Stack Exchange, and haven't received any answers, only comments. I had to delete the questions after waiting multiple days so that I could ask again.

I'll give it a try one more time.

Note: It would be very unprofessional if you forced me to move to another forum without understanding my feature request. So I assume the following:

  1. You fully understood my feature request
  2. It is possible to accomplish what I'm asking in pandas without introducing new API

Thanks

@jreback
Contributor

jreback commented Apr 17, 2022

@dsstex you are missing the point

this is NOT a forum for helping you

rather it is a forum for bug reports and feature requests

pandas is all volunteer - please respect everyone's time

@dsstex
Author

dsstex commented Apr 17, 2022

@jreback

I don't want to argue much here.

If you are 100% sure that my feature request can be accomplished using pandas and that Stack Overflow is the right forum, then the mistake is mine, because in that case I'm merely seeking support.

But if what I'm asking for can't be achieved using the current version of pandas, then it is a valid feature request.

@dsstex
Author

dsstex commented Apr 18, 2022

@dsstex
Author

dsstex commented Apr 19, 2022

@jreback

I haven't received any response to my SO question, even after 24 hours. With 25 views and no answers, I would assume this is not simple in pandas. https://stackoverflow.com/questions/71909107/pandas-resample-data-by-a-non-timeseries-column-e-g-price

I'm really struggling here. Could you assist me in any way? Even pointing me to a proper documentation would be really helpful.

This page doesn't help me much with regard to what I'm after. https://pandas.pydata.org/docs/reference/api/pandas.cut.html

Thanks

@jreback
Contributor

jreback commented Apr 19, 2022

@dsstex i am not sure why you think this is some kind of support channel - it's not

pandas is all volunteer

@dsstex
Author

dsstex commented Apr 19, 2022

@jreback

I seriously have no idea why you are treating people like this.

I COMPLETELY UNDERSTAND THAT PANDAS IS ALL VOLUNTEER. You have made that clear from the beginning.

I have contributed to several open-source PHP projects over the last decade through my past employment, so I can assure you I understand what you mean.

I didn't open a ticket here without doing any research. The fact that I'm struggling here suggests that either it is not possible in pandas (hence my feature request is a valid one) OR the relevant pandas documentation is hard to find. If it is the latter, then a better approach would be to take my problem as feedback and improve the pandas documentation.

p.s. It seems like you are determined not to provide me any assistance. I'm okay with that part. However, I still personally believe you should let other pandas volunteers evaluate my ticket rather than closing it at your sole discretion, because what I'm asking for could be a genuine feature request.

@MarcoGorelli
Member

> The fact that I'm struggling here suggests that either it is not possible in pandas (hence my feature request is a valid one) OR the relevant pandas documentation is hard to find.

It could also be that you haven't provided a minimal reproducible example. That may be why you've not had any responses on StackOverflow.

Anyway, this has run its course.

@pandas-dev pandas-dev locked as resolved and limited conversation to collaborators Apr 19, 2022
@pandas-dev pandas-dev unlocked this conversation Apr 22, 2022
@jorisvandenbossche
Member

@jreback @MarcoGorelli The line between genuine usage questions and feature requests is always a bit fuzzy (in the end, many feature requests are backed by a use case which is often already somehow possible in pandas, but the feature request is about making it easier to do). For example, I think there is some feature request hidden here.
So as long as we don't have a better place or discussion forum for such questions/requests (StackOverflow also doesn't allow any discussion), I personally think we need to be more tolerant in accepting such questions here, or at least first ask for clarification and allow some discussion before closing the issue. It's not very welcoming to be shut down directly.


@dsstex Thank you for thinking about how pandas can be improved. I have to say that your question was not very clear to me either. It might be a bit late now, but I still wanted to give you some tips:

  • Try to provide an actual reproducible example (Marco already gave a link above, and another one is https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports). Ideally this means there is some code to construct an example dataframe that can be copy-pasted (which in this case should be possible, and was for example done in this SO answer: https://stackoverflow.com/a/71935563).
  • Try to avoid jargon, and don't assume we know trade data (e.g. I don't know what "brick size" is). In that sense, it can also help to explain more clearly how you get from the example input data to the expected result (which steps, in logic or pseudo-code, are taken to obtain it).
  • In general for feature requests, it is also good to think about how the request "generalizes". Currently it sounds very specific to finance, and if that is the case, that can actually be a reason not to include the feature in pandas (pandas already has a vast feature set, so the bar for adding yet another one should be quite high). To be clear, none of this is easy.

For your actual feature request, I think it has in the meantime been answered on StackOverflow by @MarcoGorelli. I also think it is not really a "resample" operation (in pandas' terminology), because a resample groups all data that fall into a certain (time) interval together, regardless of the order of the rows in your DataFrame. After doing the cut step to create the actual group key values, what you then want (as far as I understand) is a logic of "group by this key, but only group contiguous rows with a given key value".
That can be solved somewhat with the shift+cumsum trick (as shown in the SO answer), but personally I think this is something we should actually try to make easier in pandas. This was reported long ago in #5494 as well (which is closed now, I suppose in favor of this issue: #4059).
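As a rough illustration of that shift+cumsum trick (a sketch with made-up data; the SO answer linked above is the authoritative version): comparing each bin label with its predecessor and taking a cumulative sum gives every contiguous run of equal bins its own group id, so repeated visits to the same bin stay separate:

```python
import pandas as pd

df = pd.DataFrame({"Price": [92, 95, 41, 44, 93, 91]})

# Assign each price to a bin, then give every *contiguous* run of
# identical bins its own id: a new id starts whenever the bin changes.
df["bin"] = pd.cut(df["Price"], bins=[0, 50, 100])
run_id = (df["bin"] != df["bin"].shift()).cumsum()

# The two visits to (50, 100] now end up in separate groups.
out = df.groupby(run_id).agg(bin=("bin", "first"), last_price=("Price", "last"))
print(out)
```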

@MarcoGorelli
Member

Thanks Joris, some good points there.

With regards to closing issues: people's time is very limited and there are a lot of open issues, so if an issue arrives without a clear example and expected output, it's arguably not worth spending too long on it. But I acknowledge that I locked this one prematurely - apologies!
