-
Bug
-
Resolution: Done
-
Critical
-
2.6.7
-
None
-
False
-
None
-
False
-
-
-
Important
Description of problem:
In the customer's environment, the forklift controller main container is crashing with the error below:
2024-09-24T19:53:48.771988483Z fatal error: concurrent map read and map write
2024-09-24T19:53:48.775136464Z
2024-09-24T19:53:48.775136464Z goroutine 207 [running]:
2024-09-24T19:53:48.775143344Z github.com/konveyor/forklift-controller/pkg/monitoring/metrics/forklift-controller.processMigration({{{0x28229dc, 0x9}, {0xc00618b540, 0x1c}}, {{0xc000f9af30, 0x26}, {0xc000f9af00, 0x21}, {0xc000acb800, 0x7}, ...}, ...}, ...)
The trace shows two goroutines created by "RecordMigrationMetrics":
2024-09-24T19:53:48.775136464Z goroutine 207 [running]:
2024-09-24T19:53:48.775143344Z github.com/konveyor/forklift-controller/pkg/monitoring/metrics/forklift-controller.processMigration({{{0x28229dc, 0x9}, {0xc00618b540, 0x1c}}, {{0xc000f9af30, 0x26}, {0xc000f9af00, 0x21}, {0xc000acb800, 0x7}, ...}, ...}, ...)
2024-09-24T19:53:48.775154761Z /remote-source/app/pkg/monitoring/metrics/forklift-controller/migration_metrics.go:97 +0x1b5
2024-09-24T19:53:48.775158743Z github.com/konveyor/forklift-controller/pkg/monitoring/metrics/forklift-controller.RecordMigrationMetrics.func1()
2024-09-24T19:53:48.775158743Z /remote-source/app/pkg/monitoring/metrics/forklift-controller/migration_metrics.go:70 +0x3ac
2024-09-24T19:53:48.775163255Z created by github.com/konveyor/forklift-controller/pkg/monitoring/metrics/forklift-controller.RecordMigrationMetrics in goroutine 1
2024-09-24T19:53:48.775163255Z /remote-source/app/pkg/monitoring/metrics/forklift-controller/migration_metrics.go:21 +0x65
2024-09-24T19:53:48.775171150Z
2024-09-24T19:53:48.775263359Z goroutine 206 [runnable]:
2024-09-24T19:53:48.775267266Z k8s.io/apimachinery/pkg/apis/meta/v1.(*Time).DeepCopy(...)
2024-09-24T19:53:48.775270634Z /remote-source/app/vendor/k8s.io/apimachinery/pkg/apis/meta/v1/zz_generated.deepcopy.go:1099
2024-09-24T19:53:48.775270634Z github.com/konveyor/forklift-controller/pkg/apis/forklift/v1beta1/plan.(*Timed).DeepCopyInto(0xc00068c620?, 0xc0033fca10)
2024-09-24T19:53:48.775274176Z /remote-source/app/pkg/apis/forklift/v1beta1/plan/zz_generated.deepcopy.go:269 +0xed
2024-09-24T19:53:48.775279094Z github.com/konveyor/forklift-controller/pkg/apis/forklift/v1beta1/plan.(*Task).DeepCopyInto(0xc00068c620, 0xc0033fca10)
2024-09-24T19:53:48.775279094Z /remote-source/app/pkg/apis/forklift/v1beta1/plan/zz_generated.deepcopy.go:234 +0x78
2024-09-24T19:53:48.775282711Z github.com/konveyor/forklift-controller/pkg/apis/forklift/v1beta1/plan.(*Step).DeepCopyInto(0xc000fdae10, 0xc0034045a0)
2024-09-24T19:53:48.775282711Z /remote-source/app/pkg/apis/forklift/v1beta1/plan/zz_generated.deepcopy.go:215 +0x179
2024-09-24T19:53:48.775286409Z github.com/konveyor/forklift-controller/pkg/apis/forklift/v1beta1/plan.(*VMStatus).DeepCopyInto(0xc001002820, 0xc0033f36c0)
2024-09-24T19:53:48.775290009Z /remote-source/app/pkg/apis/forklift/v1beta1/plan/zz_generated.deepcopy.go:317 +0x4f6
2024-09-24T19:53:48.775293532Z github.com/konveyor/forklift-controller/pkg/apis/forklift/v1beta1/plan.(*MigrationStatus).DeepCopyInto(0xc007900b58, 0xc00340a758)
2024-09-24T19:53:48.775293532Z /remote-source/app/pkg/apis/forklift/v1beta1/plan/zz_generated.deepcopy.go:96 +0x245
2024-09-24T19:53:48.775297183Z github.com/konveyor/forklift-controller/pkg/apis/forklift/v1beta1.(*PlanStatus).DeepCopyInto(0xc007900b18, 0xc00340a718)
2024-09-24T19:53:48.775300715Z /remote-source/app/pkg/apis/forklift/v1beta1/zz_generated.deepcopy.go:752 +0x73
2024-09-24T19:53:48.775300715Z github.com/konveyor/forklift-controller/pkg/apis/forklift/v1beta1.(*Plan).DeepCopyInto(0xc007900800, 0xc00340a400)
2024-09-24T19:53:48.775304322Z /remote-source/app/pkg/apis/forklift/v1beta1/zz_generated.deepcopy.go:665 +0xf4
2024-09-24T19:53:48.775307997Z github.com/konveyor/forklift-controller/pkg/apis/forklift/v1beta1.(*Plan).DeepCopy(...)
2024-09-24T19:53:48.775307997Z /remote-source/app/pkg/apis/forklift/v1beta1/zz_generated.deepcopy.go:675
2024-09-24T19:53:48.775311662Z github.com/konveyor/forklift-controller/pkg/apis/forklift/v1beta1.(*Plan).DeepCopyObject(0xc007900800)
2024-09-24T19:53:48.775311662Z /remote-source/app/pkg/apis/forklift/v1beta1/zz_generated.deepcopy.go:681 +0x3a
2024-09-24T19:53:48.775315415Z sigs.k8s.io/controller-runtime/pkg/cache/internal.(*CacheReader).Get(0xc000a28be0, {0x508c360?, 0x321da99?}, {{0xc000d17978?, 0x31f93ae?}, {0xc000bb1cc0?, 0x280549a?}}, {0x36c6d60, 0xc00340a000}, {0x0, ...})
2024-09-24T19:53:48.775328659Z /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/cache/internal/cache_reader.go:88 +0x135
2024-09-24T19:53:48.775332077Z sigs.k8s.io/controller-runtime/pkg/cache.(*informerCache).Get(0xc000918288, {0x36a4ad8, 0x508c360}, {{0xc000d17978?, 0x5087e48?}, {0xc000bb1cc0?, 0x5?}}, {0x36c6d60?, 0xc00340a000?}, {0x0, ...})
2024-09-24T19:53:48.775345061Z /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/cache/informer_cache.go:88 +0x1f7
2024-09-24T19:53:48.775359610Z sigs.k8s.io/controller-runtime/pkg/client.(*client).Get(0xc0008226c0, {0x36a4ad8, 0x508c360}, {{0xc000d17978?, 0x17?}, {0xc000bb1cc0?, 0x7?}}, {0x36c6d60?, 0xc00340a000?}, {0x0, ...})
2024-09-24T19:53:48.775368330Z /remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:348 +0x491
2024-09-24T19:53:48.775371748Z github.com/konveyor/forklift-controller/pkg/monitoring/metrics/forklift-controller.RecordMigrationMetrics.func1()
2024-09-24T19:53:48.775371748Z /remote-source/app/pkg/monitoring/metrics/forklift-controller/migration_metrics.go:37 +0x1e3
2024-09-24T19:53:48.775375332Z created by github.com/konveyor/forklift-controller/pkg/monitoring/metrics/forklift-controller.RecordMigrationMetrics in goroutine 1
2024-09-24T19:53:48.775375332Z /remote-source/app/pkg/monitoring/metrics/forklift-controller/migration_metrics.go:21 +0x65
2024-09-24T19:53:48.775379323Z
I can see that both the migration and plan controllers call RecordMigrationMetrics, which in turn creates new goroutines that run continuously in parallel without synchronization. If I understand this correctly, a race condition occurs when one goroutine reads from a map while another writes to it, which triggers the fatal error "concurrent map read and map write".
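To illustrate the suspected race and one common way to avoid it, here is a minimal Go sketch. It is not the forklift code: statusCounts, Inc, Get and record are hypothetical names, and I have not confirmed which map processMigration actually touches. It only shows the general pattern of guarding a map shared by several goroutines with a sync.RWMutex.

```go
package main

import (
	"fmt"
	"sync"
)

// statusCounts is a hypothetical stand-in for a map shared by the metrics
// goroutines; the real forklift data structure may look different.
type statusCounts struct {
	mu     sync.RWMutex
	counts map[string]int
}

func newStatusCounts() *statusCounts {
	return &statusCounts{counts: make(map[string]int)}
}

// Inc takes the write lock, so no other goroutine reads or writes the map
// while the entry is updated.
func (s *statusCounts) Inc(status string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.counts[status]++
}

// Get takes the read lock; readers may run together, but never alongside a writer.
func (s *statusCounts) Get(status string) int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.counts[status]
}

// record simulates several goroutines updating and reading the same map,
// similar in spirit to two callers each starting a metrics goroutine.
func record(s *statusCounts, workers, perWorker int) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				s.Inc("Succeeded")
				_ = s.Get("Succeeded")
			}
		}()
	}
	wg.Wait()
}

func main() {
	s := newStatusCounts()
	record(s, 2, 1000)
	fmt.Println(s.Get("Succeeded")) // prints 2000
}
```

Without the lock, the same access pattern on a plain map can abort the process with the fatal error seen above, and running with "go run -race" would normally report it. Another option, if it fits the design, would be to start the metrics goroutine only once rather than from both controllers.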
Version-Release number of selected component (if applicable):
mtv-operator.v2.6.7
How reproducible:
Observed in the customer's environment; no reliable reproducer yet.
Steps to Reproduce:
1. 2. 3.
Actual results:
The forklift controller main container crashes with the error "fatal error: concurrent map read and map write".
Expected results:
Additional info: