Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: 2025.2 (Flamingo)
Affects Version/s: 2025.2 (Flamingo)
Component/s: python-oslo-db
Labels:
None

Blocked:
False
Blocked Reason:
None
Ready:
False
Docs Approval:
?
AssignedTeam:
rhos-ops-platform-services-pidone
Regression:
None
Intelligence Requested:
Market:

Sprint:
Sprint 3
sprint_count:
1
Severity:
Moderate

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

To Reproduce

Steps to reproduce the behavior:

Configure an OpenStack service (Nova, Keystone, etc.) to use MariaDB as its database backend
Set the MariaDB isolation level to REPEATABLE-READ
Execute concurrent UPDATE operations on the same table rows within transactions
Observe that MariaDB error 1020 is raised but not categorized by oslo.db
See the error passed through as a generic OperationalError: sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1020, "Record has changed since last read in table '...'")

Expected behavior

The MariaDB error 1020 should be detected and wrapped in a specific oslo.db exception class (e.g., DBConsistencyError) that clearly indicates this is a transient consistency validation failure that can be resolved by retrying the transaction. This would allow OpenStack services to implement appropriate retry logic instead of treating it as a generic operational error.
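
As a rough illustration of the expected behavior, the sketch below translates an error whose text matches MariaDB error 1020 into a dedicated exception. It is not oslo.db code: `OperationalError` is a stand-in for `sqlalchemy.exc.OperationalError`, `DBConsistencyError` is only the name suggested above, and `translate` and the regex are assumptions made for this example.

```python
import re


# Hypothetical stand-ins so the sketch runs without SQLAlchemy or oslo.db.
class OperationalError(Exception):
    """Plays the role of sqlalchemy.exc.OperationalError."""


class DBConsistencyError(Exception):
    """Hypothetical retryable error, named as suggested in this report."""


# Matches the text of MariaDB error 1020 as quoted in the reproduction steps.
RECORD_CHANGED = re.compile(
    r"\b1020\b.*Record has changed since last read", re.DOTALL
)


def translate(exc):
    """Return a DBConsistencyError for error 1020, else exc unchanged."""
    if isinstance(exc, OperationalError) and RECORD_CHANGED.search(str(exc)):
        wrapped = DBConsistencyError(str(exc))
        wrapped.__cause__ = exc  # keep the original error for debugging
        return wrapped
    return exc
```

With a translation like this in one place, services could catch a single, specific exception type instead of inspecting OperationalError messages themselves.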

Screenshots

N/A - This is a database error handling issue

Device Info (please complete the following information):

Database: MariaDB (various versions supporting REPEATABLE-READ isolation)
OS Version: Various Linux distributions running OpenStack
Python Version: Python 3.x
SQLAlchemy Version: 1.4+ / 2.0+
oslo.db Version: 17.3.0 and any earlier version lacking MariaDB 1020 error handling

Bug impact

This bug impacts OpenStack deployments using MariaDB as the database backend under REPEATABLE-READ isolation level:

High: Services like Nova and Keystone experience unhandled OperationalErrors during concurrent operations
Medium: Lack of proper error categorization prevents implementation of appropriate retry mechanisms
Low: Generic error handling makes troubleshooting more difficult for operators

The issue becomes more prominent in high-concurrency environments where multiple transactions attempt to modify the same data simultaneously.

Known workaround

Currently, the workaround is to set `innodb_snapshot_isolation = OFF`. Applications can also catch the generic OperationalError and inspect the error message to detect MariaDB error 1020, but this requires service-specific code rather than leveraging oslo.db's centralized error handling.
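
For reference, the application-side workaround might look like the sketch below. It assumes the exception follows the usual pattern where SQLAlchemy exposes the driver exception on `.orig` and the driver puts the numeric error code first in `args`, as in the error shown in the reproduction steps. `is_record_changed` and the `Fake*` classes are hypothetical names used only so the example runs without a database.

```python
MARIADB_RECORD_CHANGED = 1020  # error code quoted in this report


def is_record_changed(exc):
    """True when exc (or the driver error it wraps) carries code 1020."""
    # Use the wrapped driver exception if present, else exc itself.
    driver_exc = getattr(exc, "orig", None) or exc
    args = getattr(driver_exc, "args", ())
    return bool(args) and args[0] == MARIADB_RECORD_CHANGED


# Stand-in exception classes (hypothetical) in place of pymysql/SQLAlchemy.
class FakeDriverError(Exception):
    pass


class FakeWrappedError(Exception):
    def __init__(self, orig):
        super().__init__(str(orig))
        self.orig = orig


err = FakeWrappedError(
    FakeDriverError(1020, "Record has changed since last read in table '...'")
)
print(is_record_changed(err))  # True
```

Every service that wants to retry would need its own copy of a check like this, which is the duplication this report asks oslo.db to remove.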

Additional context

This is related to MariaDB's new transaction validation model under REPEATABLE-READ isolation level. The error occurs when MariaDB detects that a row has changed since it was last read in the current transaction, which is a legitimate consistency check but should be treated as a retryable condition.
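
To make "retryable" concrete, here is a minimal sketch of re-running a whole transaction when the consistency check fails. It is an illustration under assumptions, not the oslo.db implementation: `DBConsistencyError` is the hypothetical exception suggested above, and `run_with_retry`, its parameters, and the backoff values are invented for this example. In practice a service would more likely rely on oslo.db's existing retry decorator, `oslo_db.api.wrap_db_retry`, once the error is categorized.

```python
import time


class DBConsistencyError(Exception):
    """Hypothetical retryable error, named as suggested in this report."""


def run_with_retry(txn, max_retries=3, base_delay=0.05, sleep=time.sleep):
    """Call txn() and re-run it from the start on a consistency error.

    txn must contain the whole transaction (reads and writes), so that
    each retry works from a fresh snapshot of the data.
    """
    for attempt in range(max_retries + 1):
        try:
            return txn()
        except DBConsistencyError:
            if attempt == max_retries:
                raise  # give up after the final attempt
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
```

The important design point is that the retry restarts the full transaction rather than only the failing UPDATE, because the rows read earlier in the transaction are exactly what MariaDB reports as stale.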

References

Assignee: Hervé Beraud

Reporter: Hervé Beraud

Team: rhos-dfg-pidone

Votes: 0

Watchers: 1

Created: 2025/07/16 1:28 PM

Updated: 2025/08/07 1:54 PM

Resolved: 2025/08/07 1:54 PM
