FluxTable / FluxRecord can't handle tables with duplicate column labels #500

Closed
pgorczak opened this issue Sep 12, 2022 · 6 comments · Fixed by #502

@pgorczak

pgorczak commented Sep 12, 2022

Specifications

Code sample to reproduce problem

Make a query for measurements that include a field called result or other labels that occur by default in the annotated CSV response header. Make the field name part of the annotated CSV response header via pivot. The annotated CSV returned by the API will have columns with duplicate labels.

Expected behavior

Being able to access all data returned by the API as annotated CSV

Actual behavior

The Flux CSV parser can't handle duplicate header names. It turns CSV rows into FluxRecords whose internal values are represented as Python dicts, so values from columns with the same label overwrite each other.

See flux_csv_parser.py from line 262 for the logic that collapses a list of columns into a dict based on the column label.
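
To illustrate the collision described above, here is a minimal standalone sketch. It does not use the client's parser; the `labels` and `row` lists are hypothetical stand-ins for one annotated CSV header and one data row, and the loop only mimics the idea of collapsing columns into a dict keyed by label.

```python
# Hypothetical header and row: the label "result" appears twice,
# once as the default result-name column and once as the pivoted field.
labels = ["result", "table", "_time", "location", "result"]
row = ["_result", 0, "2022-09-13T06:24:33.746Z", "Prague", 25.3]

# Collapse the columns into a dict keyed by label.
# The second "result" silently overwrites the first one.
values = {}
for label, value in zip(labels, row):
    values[label] = value

print(values["result"])          # 25.3
print(len(labels), len(values))  # 5 4
```

Five columns go in, but only four keys come out, so one value can no longer be reached by its label.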

Additional info

No response

@pgorczak pgorczak added the bug Something isn't working label Sep 12, 2022
@pgorczak pgorczak changed the title FluxRecord can't handle tables with duplicate column labels FluxTable / FluxRecord can't handle tables with duplicate column labels Sep 12, 2022
@bednar
Contributor

bednar commented Sep 12, 2022

Hi @pgorczak,

thanks for using our client.

I will take a look.

Regards

@bednar bednar self-assigned this Sep 12, 2022
@pgorczak
Author

Thank you @bednar :) I just clarified the expected behavior since the problem isn't about measurements but rather how the annotated CSV is parsed into Python objects

@bednar
Contributor

bednar commented Sep 13, 2022

Just for clarification, the problem is caused by Annotated CSV with duplicate column names:

#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,double
#group,false,false,true,true,false,true,true,false
#default,_result,,,,,,,
,result,table,_start,_stop,_time,_measurement,location,result
,,0,2022-09-13T06:14:40.469404272Z,2022-09-13T06:24:40.469404272Z,2022-09-13T06:24:33.746Z,my_measurement,Prague,25.3
,,0,2022-09-13T06:14:40.469404272Z,2022-09-13T06:24:40.469404272Z,2022-09-13T06:24:39.299Z,my_measurement,Prague,25.3
,,0,2022-09-13T06:14:40.469404272Z,2022-09-13T06:24:40.469404272Z,2022-09-13T06:24:40.454Z,my_measurement,Prague,25.3

A script that reproduces it:

from datetime import datetime, timezone

from influxdb_client import WritePrecision, InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

with InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org", debug=True) as client:
    query_api = client.query_api()

    # Write a point whose field is named "result", which clashes with the
    # default "result" column of the annotated CSV response.
    p = Point("my_measurement") \
        .tag("location", "Prague") \
        .field("result", 25.3) \
        .time(datetime.now(timezone.utc), WritePrecision.MS)
    write_api = client.write_api(write_options=SYNCHRONOUS)

    write_api.write(bucket="my-bucket", record=p)

    # pivot() turns the field name into a column label, producing a second "result" column.
    tables = query_api.query(
        'from(bucket:"my-bucket") |> range(start: -10m) |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")')
    for table in tables:
        print(table)
        for record in table.records:
            # record.values is a dict keyed by column label, so duplicate labels collide
            print(record.values)

@bednar
Contributor

bednar commented Sep 14, 2022

Hi @pgorczak,

#502 adds the possibility to access your data via record.row, which is an array of the column values from the annotated CSV row.

If you would like to use this fixed version before the regular release, please install the client via:

pip install git+https://github.com/influxdata/influxdb-client-python.git@record-row-array

What do you think about this solution?

Regards

@pgorczak
Author

Sorry for the late response @bednar . I somehow didn't get the notification.

Just tried record.row and it works like a charm. I think it's a useful alternative way to access row values. Thank you!
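
For readers landing here later, a rough sketch of why positional access avoids the collision. The helper `values_for_label` is hypothetical and not part of the client. With a client version that includes #502, the `row` list would presumably come from `record.row` and the labels from the table's column definitions (for example `[c.label for c in table.columns]`, which is an assumption about how you would collect them). The sample data below is made up to mirror the CSV in this thread.

```python
from typing import Any, List, Sequence


def values_for_label(labels: Sequence[str], row: Sequence[Any], label: str) -> List[Any]:
    """Return every value in `row` whose column label equals `label`, in column order."""
    return [value for name, value in zip(labels, row) if name == label]


# Hypothetical stand-ins for the column labels and for record.row.
labels = ["result", "table", "_start", "_stop", "_time", "_measurement", "location", "result"]
row = ["_result", 0, "2022-09-13T06:14:40Z", "2022-09-13T06:24:40Z",
       "2022-09-13T06:24:33.746Z", "my_measurement", "Prague", 25.3]

both = values_for_label(labels, row, "result")
print(both)  # ['_result', 25.3]
```

Because the row is kept as a list, both "result" columns stay reachable, and the caller decides which occurrence is wanted.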

@bednar
Contributor

bednar commented Sep 20, 2022

@pgorczak thanks for testing. I will keep this issue open until #502 is merged.

@bednar bednar reopened this Sep 20, 2022
@bednar bednar added this to the 1.33.0 milestone Sep 21, 2022