-
Epic
-
Resolution: Duplicate
-
Major
Trino's map_concat overwrites values of the same key
While reviewing the all_labels work during testing, rh-ee-plopezpe discovered that the persistent volume labels were not showing up in the tag options and asked me to investigate.
After downloading the CSV out of Trino, we do see both label columns populated:
persistentvolume_labels | persistentvolumeclaim_labels |
---|---|
label_volume:bravo | label_volume:stor_bravo |
PV labels are added in nise as:
    volumes:
      volume:
        volume_name: pvc-volume_1
        storage_class: gp2
        volume_request_gig: 2
        labels: label_volume:bravo
        volume_claims:
          volume_claim:
            volume_claim_name: data_bravo
            pod_name: bravo
            labels: label_volume:stor_bravo
            capacity_gig: 2
            volume_claim_usage_gig:
              full_period: 2
However, if you go to:
You won't see the PV label values in the API; only the PVC values are listed:
{ "key": "volume", "values": [ "stor_alpha", "stor_bravo", "stor_charlie", "stor_nod" ], "enabled": true }
We then modified the YAML so that the PV label uses a different key from the PVC labels:
    volumes:
      volume:
        volume_name: pvc-volume_1
        storage_class: gp2
        volume_request_gig: 8
        labels: label_notvol:notavclabel
        volume_claims:
          volume_claim:
            volume_claim_name: data_nod
            pod_name: nod
            labels: label_volume:stor_nod|label_test:nice|label_git:commit|label_stack:overflow|label_preference:tag_tests|label_storageclass:dua
            capacity_gig: 10
            volume_claim_usage_gig:
              full_period: 5
After enabling the key, this resulted in the values from that column showing up in the results:
    postgres=# select distinct(volume_labels) from reporting_ocpusagelineitem_daily_summary where persistentvolumeclaim_capacity_gigabyte_months is not null;
    volume_labels
    ------------------------------------------------------------------------------------------------------------------------------
    {"volume": "charlie"}
    {"git": "commit", "test": "nice", "stack": "overflow", "notvol": "notavclabel", "volume": "stor_nod", "storageclass": "dua"}
    {"volume": "bravo"}
    {"volume": "alpha"}
This made me take a closer look at how Trino's map_concat works. According to the Trino docs, map_concat returns the union of all the given maps. If a key is found in multiple given maps, that key's value in the resulting map comes from the last one of those maps. That is when I realized that, whenever the same key appears in more than one label column, we overwrite its value with the one from the last map in the list.
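The last-map-wins behaviour described above can be illustrated outside Trino. The following is a minimal Python sketch, not Trino code: the local `map_concat` helper is a hypothetical stand-in that only mimics the documented semantics, and the label values are the ones from the nise YAML in this issue.

```python
def map_concat(*maps):
    """Hypothetical stand-in that mimics the documented semantics of
    Trino's map_concat: union of all maps, last map wins on duplicate keys."""
    merged = {}
    for m in maps:
        # dict.update overwrites existing keys, so later maps win.
        merged.update(m)
    return merged


# Label maps taken from the example in this issue.
pv_labels = {"volume": "bravo"}
pvc_labels = {"volume": "stor_bravo"}

# The order of the arguments decides which value survives for "volume".
pv_then_pvc = map_concat(pv_labels, pvc_labels)
pvc_then_pv = map_concat(pvc_labels, pv_labels)

print(pv_then_pvc)  # {'volume': 'stor_bravo'}
print(pvc_then_pv)  # {'volume': 'bravo'}
```

In both orders one of the two values for "volume" is dropped, which matches what we see in the tag options.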
How to reproduce
1. Run the smoke test with a breakpoint
2. Use the make command to populate the jinja vars for the trino sql file, and then run it in trino.
3. Now in Postgres run this query and look at the results:
select distinct(volume_labels) from reporting_ocpusagelineitem_daily_summary where persistentvolumeclaim_capacity_gigabyte_months is not null;
Results:
    ------------------------------------------------------------------------------------------------------------------------------
    {"volume": "stor_bravo"}
    {"git": "commit", "test": "nice", "stack": "overflow", "notvol": "notavclabel", "volume": "stor_nod", "storageclass": "dua"}
    {"volume": "stor_charlie"}
    {"volume": "stor_alpha"}
4. Now update the Trino SQL so that sli.persistentvolume_labels is the last argument to map_concat:
    map_concat(
        cast(json_parse(coalesce(nli.node_labels, '{}')) as map(varchar, varchar)),
        cast(json_parse(coalesce(nsli.namespace_labels, '{}')) as map(varchar, varchar)),
        cast(json_parse(sli.persistentvolumeclaim_labels) as map(varchar, varchar)),
        cast(json_parse(sli.persistentvolume_labels) as map(varchar, varchar))
    ) as volume_labels,
5. Rerun the test and run the same Postgres query. The "volume" key's values now all come from the sli.persistentvolume_labels column, because it is the last map passed to map_concat.
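To see which keys are affected for a given row, it can help to list the keys that appear in more than one label map before they are merged. The sketch below is only an illustration in Python: `colliding_keys` is a hypothetical helper, not part of the existing code, and the inputs are the PVC and PV labels from the two nise examples above.

```python
def colliding_keys(*maps):
    """Hypothetical helper: return the keys present in more than one map,
    each with every value seen, in argument order."""
    seen = {}
    for m in maps:
        for key, value in m.items():
            seen.setdefault(key, []).append(value)
    # Keep only keys that occurred in at least two of the maps.
    return {key: values for key, values in seen.items() if len(values) > 1}


# First nise example: PVC and PV labels share the "volume" key.
collisions = colliding_keys({"volume": "stor_bravo"}, {"volume": "bravo"})
print(collisions)  # {'volume': ['stor_bravo', 'bravo']}

# Second nise example (subset of the labels): the PV label uses its own key.
no_collisions = colliding_keys(
    {"volume": "stor_nod", "test": "nice"},
    {"notvol": "notavclabel"},
)
print(no_collisions)  # {}
```

Any key reported here would keep only the last value after map_concat, so this is the set of values that can go missing from the tag options.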
- clones
-
COST-4296 Trino's map_concat overwrites values of the same key
- Closed