When ingest dataframe, use alternative tagging #94


Closed
cjelsa opened this issue May 13, 2020 · 15 comments
Labels
enhancement New feature or request

@cjelsa

cjelsa commented May 13, 2020

In addition to ticket #79

Would it be possible to have, alongside data_frame_tag_columns=tag_columns, a 'data_frame_tag=' argument as well? That way a tag which doesn't appear in the DF can still be added to it.

For example, I have a DF with stock prices: timestamp, open, high, low, close (etc.) data. I would like to be able to add tags such as ticker, exchange, etc., which don't appear in the DF, by using a 'data_frame_tag=' argument, e.g. data_frame_tag='NASDAQ', 'AAPL'.

@bednar added the enhancement label May 13, 2020
@bednar
Contributor

bednar commented May 13, 2020

@cjelsa, thanks for the issue. We will take a look.

@bednar
Contributor

bednar commented May 13, 2020

Hi @cjelsa,

Did you try the default tags? They also work when ingesting a DataFrame.

https://github.com/influxdata/influxdb-client-python#default-tags

Regards

@cjelsa
Author

cjelsa commented May 13, 2020

No, I have not and I didn't know it worked for DataFrame ingestion as well.

I think I can get it working from there. So I use the PointSettings() function?

@bednar
Contributor

bednar commented May 14, 2020

Yes, you could use something like:

settings = PointSettings(**{"NASDAQ": "AAPL", "type": "technology"})

write_client = self.client.write_api(point_settings=settings)

@cjelsa
Author

cjelsa commented May 14, 2020

Hmm. It seems that PointSettings is not really working.

This is my code, where contract is a tuple:

point_settings = PointSettings()
point_settings.add_default_tag('ticker', contract[0])
point_settings.add_default_tag('exchange', contract[1])
point_settings.add_default_tag('currency', contract[2])
point_settings.add_default_tag('data_type', 'bar_data_1s') # discriminate if tick or bar (# seconds) data

Error:

NameError Traceback (most recent call last)
in
27
28 # set default tags for this batch
---> 29 point_settings = PointSettings()
30 point_settings.add_default_tag("ticker", contract[0])
31 point_settings.add_default_tag("exchange", contract[1])

NameError: name 'PointSettings' is not defined

@cjelsa
Author

cjelsa commented May 14, 2020

And this is also not right ;-)

File "", line 30
point_settings.add_default_tag('ticker': contract[0])
^
SyntaxError: invalid syntax

@bednar
Contributor

bednar commented May 15, 2020

Hi @cjelsa,

It looks like you are missing the import of PointSettings; it lives in the influxdb_client.client.write_api module.

I prepared an example: How to ingest DataFrame with default tags. You could use it as a starting point for your implementation.

Regards

@cjelsa
Author

cjelsa commented May 18, 2020

Hi,

That is exactly right, I didn't import that ;-).

Now the problem moves to the following:


TypeError Traceback (most recent call last)
in
28 # set default tags for this batch
29 point_settings = PointSettings()
---> 30 point_settings.add_default_tag('ticker', contract[0])
31 point_settings.add_default_tag('exchange', contract[1])
32 point_settings.add_default_tag('currency', contract[2])

TypeError: 'Stock' object is not subscriptable

Any idea?

@bednar
Contributor

bednar commented May 18, 2020

It is caused by contract[0], not by add_default_tag: the contract object doesn't support access by index.

Try to extract the tag values first and then set them in PointSettings:

ticker = contract[0]
exchange = contract[1]
currency = contract[2]

point_settings = PointSettings()
point_settings.add_default_tag('ticker', ticker)
point_settings.add_default_tag('exchange', exchange)
point_settings.add_default_tag('currency', currency)
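
The TypeError above is raised because the object does not support indexing, so contract[0] fails wherever it is written. A minimal sketch of reading the values by attribute instead, assuming the contract exposes them as attributes; the Stock class and its field names (symbol, exchange, currency) are hypothetical stand-ins for illustration and are not confirmed anywhere in this thread:

```python
from dataclasses import dataclass


@dataclass
class Stock:
    """Hypothetical stand-in for the contract object; field names are assumed."""
    symbol: str
    exchange: str
    currency: str


contract = Stock(symbol="PFE", exchange="SMART", currency="USD")

# contract[0] would raise: TypeError: 'Stock' object is not subscriptable
# Read the values by attribute and collect them as tag key/value pairs.
default_tags = {
    "ticker": contract.symbol,
    "exchange": contract.exchange,
    "currency": contract.currency,
}
```

Each key/value pair in default_tags can then be handed to add_default_tag(key, value) in the same way as the calls above.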

@cjelsa
Author

cjelsa commented May 18, 2020

Ok, yes. That was not the smartest thing ;-)

But it still looks like point_settings is not really working. Data gets ingested with data_frame_measurement_name, but without the point_settings default tags.

The Code:

client = InfluxDBClient(url=inflx_url, token=inflx_token, org=inflx_org)
write_api = client.write_api(write_options=SYNCHRONOUS)

p_settings = PointSettings()
p_settings.add_default_tag('ticker', c)
p_settings.add_default_tag('exchange', scope_primary_exchange)
p_settings.add_default_tag('currency', scope_currency)
p_settings.add_default_tag('data_type', 'bar_data_1s')

write_api.write(bucket=inflx_bucket, record=df, data_frame_measurement_name='IB_Hist_Data', point_settings=p_settings)

But no tags get written.

When I check:

Input:
p_settings.defaultTags

Output:
{'ticker': 'PFE',
'exchange': 'SMART',
'currency': 'USD',
'data_type': 'bar_data_1s'}

So the logic seems to work, only the writing to DB doesn't seem to work.

[Screenshot 2020-05-18 at 18:46:41]

@bednar
Contributor

bednar commented May 19, 2020

That's strange. Try displaying the raw data in Data Explorer, or enable debug output for the client:

client = InfluxDBClient(url=inflx_url, token=inflx_token, org=inflx_org, debug=True)

@cjelsa
Author

cjelsa commented May 19, 2020

Raw data doesn't change much, just the view, see picture.

[Screenshot 2020-05-19 at 11:14:15]

I tried again with debug set to true (I masked the token):

send: b'POST /api/v2/write?org=PA&bucket=data&precision=ns HTTP/1.1\r\nHost: localhost:9999\r\nAccept-Encoding: identity\r\nContent-Length: 137850\r\nContent-Encoding: identity\r\nContent-Type: text/plain\r\nAccept: application/json\r\nAuthorization: Token XXXXXXXXXXXc-XXXXXXXXXXXXXXXXXXXXX7Zg==\r\nUser-Agent: influxdb-client-python/1.8.0dev\r\n\r\n'
send: b'IB_Hist_Data close=37.76,high=37.76,low=37.75,open=37.75 1589824800000000000\nIB_Hist_Data close=37.76,high=37.76,low=37.75,open=37.75 1589824801000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.75,open=37.76 1589824802000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.76,open=37.76 1589824803000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.76,open=37.76 1589824804000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.76,open=37.76 1589824805000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.76,open=37.76 1589824806000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.76,open=37.76
..........

No sign of the tags...
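
In line protocol, tags sit between the measurement name and the first space, as comma-separated key=value pairs, so the debug output can be checked mechanically. A rough sketch of such a check; the tag_set helper is hypothetical, it assumes no escaped spaces or commas in names, and the tagged line is a made-up illustration rather than output from this thread:

```python
def tag_set(line: str) -> dict:
    """Return the tags of one line-protocol line (no escaped spaces/commas assumed)."""
    head = line.split(" ", 1)[0]   # "measurement[,tag=value...]"
    parts = head.split(",")[1:]    # drop the measurement name
    return dict(part.split("=", 1) for part in parts)


# First line from the debug output above: measurement, fields, timestamp.
untagged = "IB_Hist_Data close=37.76,high=37.76,low=37.75,open=37.75 1589824800000000000"
# Illustrative line showing where default tags would appear if they were written.
tagged = "IB_Hist_Data,currency=USD,ticker=PFE close=37.76 1589824800000000000"

print(tag_set(untagged))  # {}
print(tag_set(tagged))    # {'currency': 'USD', 'ticker': 'PFE'}
```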

@bednar
Contributor

bednar commented May 19, 2020

The p_settings is scoped to the write_api, so set it when you create the write_api rather than passing it to write():

client = InfluxDBClient(url=inflx_url, token=inflx_token, org=inflx_org)

p_settings = PointSettings()
p_settings.add_default_tag('ticker', c)
p_settings.add_default_tag('exchange', scope_primary_exchange)
p_settings.add_default_tag('currency', scope_currency)
p_settings.add_default_tag('data_type', 'bar_data_1s')

write_api = client.write_api(write_options=SYNCHRONOUS, point_settings=p_settings)
write_api.write(bucket=inflx_bucket, record=df, data_frame_measurement_name='IB_Hist_Data')
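
Why the earlier call failed silently can be illustrated with a toy model. This sketch is not the influxdb-client implementation; ToyWriteApi is a hypothetical class, and it assumes that write() accepts extra keyword arguments without reading them, which matches the behaviour reported in this thread:

```python
class ToyWriteApi:
    """Toy stand-in, NOT the influxdb-client implementation."""

    def __init__(self, default_tags=None):
        # Default tags are bound once, when the writer is created.
        self._default_tags = dict(default_tags or {})

    def write(self, measurement, fields, **kwargs):
        # Unknown keyword arguments (e.g. point_settings=...) are accepted
        # but never read, so they cannot affect the output.
        tags = "".join(f",{k}={v}" for k, v in sorted(self._default_tags.items()))
        field_set = ",".join(f"{k}={v}" for k, v in fields.items())
        return f"{measurement}{tags} {field_set}"


tags = {"ticker": "PFE", "exchange": "SMART"}

# Passing the tags to write(): silently ignored.
ignored = ToyWriteApi().write("IB_Hist_Data", {"close": 37.76}, point_settings=tags)
print(ignored)  # IB_Hist_Data close=37.76

# Passing the tags at construction: applied to every line.
applied = ToyWriteApi(default_tags=tags).write("IB_Hist_Data", {"close": 37.76})
print(applied)  # IB_Hist_Data,exchange=SMART,ticker=PFE close=37.76
```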

@cjelsa
Author

cjelsa commented May 19, 2020

Yes!

Thank you very much!

@bednar added this to the 1.8.0 milestone May 21, 2020
@bednar closed this as completed May 21, 2020
@thojdid

thojdid commented May 14, 2021

Thank you.
The example also really helped me after I had struggled with the data_frame_tag_columns argument, which never worked out for me.
