When ingest dataframe, use alternative tagging #94

cjelsa · 2020-05-13T14:03:34Z

In addition to ticket #79

Would it be possible to have a 'data_frame_tag=' argument alongside data_frame_tag_columns=tag_columns? That way, a tag that doesn't appear in the DF could still be added to it.

For example, I have a DF with stock prices: timestamp, open, high, low, close (etc.) data. I would like to be able to add tags such as ticker, exchange, etc., which don't appear in the DF, by using a 'data_frame_tag=' argument, e.g. data_frame_tag='NASDAQ', 'AAPL'

bednar · 2020-05-13T18:38:57Z

@cjelsa thanks for the issue. We will take a look.

bednar · 2020-05-13T18:45:31Z

Hi @cjelsa,

Did you try the default tags? They also work when ingesting a DataFrame.

https://github.com/influxdata/influxdb-client-python#default-tags

Regards

cjelsa · 2020-05-13T19:29:42Z

No, I have not and I didn't know it worked for DataFrame ingestion as well.

I think I can get it working from there. So I use the PointSettings() function?

bednar · 2020-05-14T05:07:09Z

Yes, you could use something like:

settings = PointSettings(**{"NASDAQ": "AAPL", "type": "technology"})

write_client = self.client.write_api(point_settings=settings)

cjelsa · 2020-05-14T17:58:36Z

Hmm. It seems that PointSettings is not really working.

This is my code, where contract is a tuple:

point_settings = PointSettings()
point_settings.add_default_tag('ticker', contract[0])
point_settings.add_default_tag('exchange', contract[1])
point_settings.add_default_tag('currency', contract[2])
point_settings.add_default_tag('data_type', 'bar_data_1s') # discriminate if tick or bar (# seconds) data

Error:

NameError Traceback (most recent call last)
in
27
28 # set default tags for this batch
---> 29 point_settings = PointSettings()
30 point_settings.add_default_tag("ticker", contract[0])
31 point_settings.add_default_tag("exchange", contract[1])

NameError: name 'PointSettings' is not defined

cjelsa · 2020-05-14T18:03:55Z

And this is also not right ;-)

File "", line 30
point_settings.add_default_tag('ticker': contract[0])
^
SyntaxError: invalid syntax

bednar · 2020-05-15T07:53:25Z

Hi @cjelsa,

It looks like you are missing the import of PointSettings.

I prepared an example: How to ingest DataFrame with default tags. You could use it as a starting point for your implementation.

Regards

cjelsa · 2020-05-18T11:07:33Z

Hi,

That is exactly right, I didn't import that ;-).

Now the problem moves to the following:

TypeError Traceback (most recent call last)
in
28 # set default tags for this batch
29 point_settings = PointSettings()
---> 30 point_settings.add_default_tag('ticker', contract[0])
31 point_settings.add_default_tag('exchange', contract[1])
32 point_settings.add_default_tag('currency', contract[2])

TypeError: 'Stock' object is not subscriptable

Any idea?

bednar · 2020-05-18T11:15:28Z

It is caused by contract[0], not by add_default_tag: the contract object doesn't support access by index.

Extract the tag values first, using whatever accessors the contract object provides in place of the index access, and then set them in PointSettings:

ticker = contract[0]
exchange = contract[1]
currency = contract[2]

point_settings = PointSettings()
point_settings.add_default_tag('ticker', ticker)
point_settings.add_default_tag('exchange', exchange)
point_settings.add_default_tag('currency', currency)
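
To illustrate the point about index access, here is a minimal sketch in plain Python. The `Stock` class below is a hypothetical stand-in, and its attribute names (`symbol`, `exchange`, `currency`) are assumptions; the real contract object may expose different ones.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    """Hypothetical stand-in for the contract object; attribute names are assumed."""
    symbol: str
    exchange: str
    currency: str


contract = Stock(symbol="PFE", exchange="SMART", currency="USD")

# Indexing an object that does not define __getitem__ raises TypeError
try:
    contract[0]
except TypeError as exc:
    error_message = str(exc)

# Reading attributes works, and yields plain strings usable as tag values
default_tags = {
    "ticker": contract.symbol,
    "exchange": contract.exchange,
    "currency": contract.currency,
}
print(error_message)
print(default_tags)
```

Each key/value pair in `default_tags` could then be passed to `point_settings.add_default_tag(key, value)`.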

cjelsa · 2020-05-18T16:48:48Z

Ok, yes. That was not the smartest thing ;-)

But it still looks like the point_settings method is not really working. Data gets ingested without the point_settings default tags, but with the data_frame_measurement_name.

The Code:

client = InfluxDBClient(url=inflx_url, token=inflx_token, org=inflx_org)
write_api = client.write_api(write_options=SYNCHRONOUS)

p_settings = PointSettings()
p_settings.add_default_tag('ticker', c)
p_settings.add_default_tag('exchange', scope_primary_exchange)
p_settings.add_default_tag('currency', scope_currency)
p_settings.add_default_tag('data_type', 'bar_data_1s')

write_api.write(bucket=inflx_bucket, record=df, data_frame_measurement_name='IB_Hist_Data', point_settings=p_settings)

But no tags get written.

When I check:

Input:
p_settings.defaultTags

Output:
{'ticker': 'PFE',
'exchange': 'SMART',
'currency': 'USD',
'data_type': 'bar_data_1s'}

So the logic seems to work, only the writing to DB doesn't seem to work.

bednar · 2020-05-19T06:07:19Z

That's strange. Try displaying the raw data in Data Explorer, or enable debug output for the client:

client = InfluxDBClient(url=inflx_url, token=inflx_token, org=inflx_org, debug=True)

cjelsa · 2020-05-19T09:17:05Z

Raw data doesn't change much, just the view, see picture.

I tried again with debug set to true (I masked the token):

send: b'POST /api/v2/write?org=PA&bucket=data&precision=ns HTTP/1.1\r\nHost: localhost:9999\r\nAccept-Encoding: identity\r\nContent-Length: 137850\r\nContent-Encoding: identity\r\nContent-Type: text/plain\r\nAccept: application/json\r\nAuthorization: Token XXXXXXXXXXXc-XXXXXXXXXXXXXXXXXXXXX7Zg==\r\nUser-Agent: influxdb-client-python/1.8.0dev\r\n\r\n'
send: b'IB_Hist_Data close=37.76,high=37.76,low=37.75,open=37.75 1589824800000000000\nIB_Hist_Data close=37.76,high=37.76,low=37.75,open=37.75 1589824801000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.75,open=37.76 1589824802000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.76,open=37.76 1589824803000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.76,open=37.76 1589824804000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.76,open=37.76 1589824805000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.76,open=37.76 1589824806000000000\nIB_Hist_Data close=37.77,high=37.77,low=37.76,open=37.76
..........

No sign of the tags...
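
For comparison, in line protocol a tagged point carries its tag set between the measurement name and the fields. A rough sketch of that difference in plain Python follows; `add_tags` is a hypothetical helper for illustration only (it is not how the client builds its payload), and it assumes an untagged input line and no spaces, commas, or equals signs that would need escaping.

```python
from typing import Dict


def add_tags(line: str, tags: Dict[str, str]) -> str:
    """Insert a tag set into an untagged line-protocol line (illustration only)."""
    # The first space separates the measurement name from the field set
    measurement, rest = line.split(" ", 1)
    # Tags are comma-separated key=value pairs, sorted by key here
    tag_set = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    return f"{measurement},{tag_set} {rest}"


untagged = "IB_Hist_Data close=37.76,high=37.76,low=37.75,open=37.75 1589824800000000000"
tagged = add_tags(
    untagged,
    {"ticker": "PFE", "exchange": "SMART", "currency": "USD", "data_type": "bar_data_1s"},
)
print(tagged)
```

If the default tags were applied, each line in the debug output would be expected to start with something like `IB_Hist_Data,currency=USD,...` rather than `IB_Hist_Data ` followed directly by the fields.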

bednar · 2020-05-19T10:00:00Z

The p_settings is scoped to the write_api, not to an individual write() call. Set it when you create the write_api:

from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS, PointSettings

client = InfluxDBClient(url=inflx_url, token=inflx_token, org=inflx_org)

# Default tags are applied to every point written through this write_api
p_settings = PointSettings()
p_settings.add_default_tag('ticker', c)
p_settings.add_default_tag('exchange', scope_primary_exchange)
p_settings.add_default_tag('currency', scope_currency)
p_settings.add_default_tag('data_type', 'bar_data_1s')

# Pass point_settings to write_api(), not to write()
write_api = client.write_api(write_options=SYNCHRONOUS, point_settings=p_settings)
write_api.write(bucket=inflx_bucket, record=df, data_frame_measurement_name='IB_Hist_Data')
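
Another option, not the one used in this thread, is to add the tags as constant columns on the DataFrame and name those columns in data_frame_tag_columns. A minimal sketch, assuming pandas is available; `add_constant_tags` is a hypothetical helper and the sample values are illustrative:

```python
import pandas as pd


def add_constant_tags(df: pd.DataFrame, tags: dict) -> pd.DataFrame:
    """Return a copy of df with one constant column per tag (hypothetical helper)."""
    return df.assign(**tags)


# Illustrative bar data indexed by timestamp
df = pd.DataFrame(
    {"open": [37.75, 37.75], "close": [37.76, 37.76]},
    index=pd.to_datetime(["2020-05-18 18:00:00", "2020-05-18 18:00:01"]),
)

tags = {"ticker": "PFE", "exchange": "SMART"}
tagged = add_constant_tags(df, tags)
print(tagged.columns.tolist())
```

The tagged frame could then be written with `write_api.write(bucket=inflx_bucket, record=tagged, data_frame_measurement_name='IB_Hist_Data', data_frame_tag_columns=list(tags))`. This repeats the tag values on every row, so default tags are likely the lighter choice when the values are the same for the whole batch.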

cjelsa · 2020-05-19T10:08:53Z

Yes!

Thank you very much!

thojdid · 2021-05-14T07:06:17Z

Thank you.
The example also really helped me after struggling with the argument data_frame_tag_columns which never worked out.

bednar added the enhancement (New feature or request) label May 13, 2020

bednar added this to the 1.8.0 milestone May 21, 2020

bednar closed this as completed May 21, 2020
