-
Task
-
Resolution: Done
-
Major
-
None
-
None
-
False
-
False
-
Not Started
-
Not Started
-
Not Started
-
Not Started
-
Not Started
-
Not Started
-
-
API Sprint 23, API Sprint 24
One of the migrations that we added recently is adding a unique index to the tags table.
In theory this should not be an issue but it may happen that two tags within a tenant were created with the same name. This will cause the migration to fail.
So we need some update documentation for 2.12
Customers can check in advance whether they have a duplicate tag by:
SELECT tenant_id, name, count(*) from tags GROUP BY tenant_id, name HAVING count(*) > 1;
If they see an empty result - that's good, no further actions needed.
If they see something like
+-----------+------+----------+ | tenant_id | name | count(*) | +-----------+------+----------+ | 2 | tag3 | 3 | +-----------+------+----------+ 1 row in set (0.001 sec)
This means there are 3 duplicate entries. Then they need to remove the duplicate indices. They can see how many of the tags are actually used in a tagging by
select tags.id, tags.name, count(taggings.id) from tags left join taggings ON tags.id = taggings.tag_id where tags.tenant_id = 2 and tags.name = 'tag3' GROUP BY tags.id, tags.name;
They may see a result like this:
+----+------+--------------------+ | id | name | count(taggings.id) | +----+------+--------------------+ | 1 | tag3 | 1 | | 3 | tag3 | 2 | | 6 | tag3 | 0 | +----+------+--------------------+ 3 rows in set (0.002 sec)
Which means that tag with ID 1 is used once, tag with id 3 is used twice and tag with id 6 is not used.
The resolution here is to move all taggings to a single tag id and remove the remaining entries. In this case
update taggings set tag_id = '1' where tag_id = '3';
And result is
Query OK, 2 rows affected (0.069 sec) Rows matched: 2 Changed: 2 Warnings: 0
Now counting usages again will show:
mysql> select tags.id, tags.name, count(taggings.id) from tags left join taggings ON tags.id = taggings.tag_id where tags.tenant_id
= 2 and tags.name = 'tag3' GROUP BY tags.id, tags.name;
+----+------+--------------------+
| id | name | count(taggings.id) |
+----+------+--------------------+
| 1 | tag3 | 3 |
| 3 | tag3 | 0 |
| 6 | tag3 | 0 |
+----+------+--------------------+
3 rows in set (0.001 sec)
We can finally remove the duplicate entries by
> delete from tags where id = '3'; > delete from tags where id = '6';
That's the end of the procedure. I tested the above procedure with MySQL. If not the same, the procedure should be very similar with PostgreSQL and Oracle.
It is kind of complicated but I believe that duplicate entries should rarely exist. Even less often should duplicate entries exist where more than one is used. In SaaS we only had one duplicate entry with no usages so I think it is a very low chance any customers to have.
So I think it should be fine and only the command to check existence would be needed to be run. Let me know what you think.