  Project Quay / PROJQUAY-3754

Quay 3.7.0 Geo-replication: Clair APP POD in the 2nd Quay deployment crashes when using the shared Postgres DB


Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Blocker
    • Affects Version/s: quay-v3.7.0
    • Fix Version/s: quay-v3.7.0
    • Component/s: quay-operator

    Description


      This issue was found when deploying Quay 3.7.0 in geo-replication mode, with two Quay deployments in two OCP clusters managed by two separate Operators. After Quay was deployed in the 1st region, the Quay and Clair pods were ready. The 2nd Quay was then deployed in the 2nd OCP cluster with the same clair-config.yaml (a Postgres DB shared by both Clair deployments). In the 2nd Quay deployment, however, the Clair APP POD failed to start; its log shows the following error:

      {"level":"error","component":"main","error":"service initialization failed: failed to initialize indexer: failed to register configured scanners: failed getting id for scanner \"dpkg\": failed to connect to `host=35.236.115.104 user=quayrdsdb database=postgres`: failed to write startup message (read tcp 10.129.2.26:57578->35.236.115.104:5432: i/o timeout)","time":"2022-05-05T11:26:47Z","message":"fatal error"}

      Quay Image: quay-operator-bundle-container-v3.7.0-112

      Quay config.yaml:

      FEATURE_QUOTA_MANAGEMENT: true
      FEATURE_PROXY_CACHE: true
      FEATURE_STORAGE_REPLICATION: true
      SERVER_HOSTNAME: quaygeo.qe.devcluster.openshift.com
      DB_CONNECTION_ARGS:
        autorollback: true
        threadlocals: true
      DB_URI: postgresql://quayrdsdb:quayrdsdb@terraform-20220505085433237000000001.c2dhnkrh69gr.us-west-2.rds.amazonaws.com:5432/quay
      BUILDLOGS_REDIS:
        host: 18.221.161.154
        port: 6379
      USER_EVENTS_REDIS:
        host: 18.221.161.154
        port: 6379
      DISTRIBUTED_STORAGE_DEFAULT_LOCATIONS:
        - primary
        - secondary
      DISTRIBUTED_STORAGE_PREFERENCE:
        - primary
        - secondary
      DISTRIBUTED_STORAGE_CONFIG:
        primary:
          - S3Storage
          - s3_bucket: quay370a
            storage_path: /quay370
            s3_access_key: ***
            s3_secret_key: ***
            host: s3.us-west-2.amazonaws.com
        secondary:
          - S3Storage
          - s3_bucket: quay370b
            storage_path: /quay370
            s3_access_key: ***
            s3_secret_key: ***
            host: s3.us-east-2.amazonaws.com

      Config bundle secret:

      oc create secret generic --from-file config.yaml=./quay_config_1.yaml --from-file ssl.cert=./ssl.cert --from-file ssl.key=./ssl.key --from-file clair-config.yaml=./clair-config.yaml georep-config-bundle

      QuayRegistry CR:

      QuayRegistry CR in OCP Cluster1:
      
      apiVersion: quay.redhat.com/v1
      kind: QuayRegistry
      metadata:
        name: quay370
      spec:
        configBundleSecret: georep-config-bundle
        components:
        - kind: quay
          managed: true
          overrides:
            env:
            - name: QUAY_DISTRIBUTED_STORAGE_PREFERENCE
              value: primary
        - kind: postgres
          managed: false
        - kind: redis
          managed: false
        - kind: objectstorage
          managed: false
        - kind: tls
          managed: false
        - kind: clairpostgres
          managed: false
      
      QuayRegistry CR in OCP Cluster2:
      
      apiVersion: quay.redhat.com/v1
      kind: QuayRegistry
      metadata:
        name: quay370
      spec:
        configBundleSecret: georep-config-bundle
        components:
        - kind: quay
          managed: true
          overrides:
            env:
            - name: QUAY_DISTRIBUTED_STORAGE_PREFERENCE
              value: secondary
        - kind: postgres
          managed: false
        - kind: redis
          managed: false
        - kind: objectstorage
          managed: false
        - kind: tls
          managed: false
        - kind: clairpostgres
          managed: false
       

      clair-config.yaml:

      indexer:
          connstring: host=35.236.115.104 port=5432 dbname=postgres user=quayrdsdb password=quayrdsdb sslmode=require
          layer_scan_concurrency: 6
          migrations: true
          scanlock_retry: 11
      log_level: debug
      matcher:
          connstring: host=35.236.115.104 port=5432 dbname=postgres user=quayrdsdb password=quayrdsdb sslmode=require
          migrations: true
      notifier:
          connstring: host=35.236.115.104 port=5432 dbname=postgres user=quayrdsdb password=quayrdsdb sslmode=require
          migrations: true 

      1st Quay deployment in 1st OCP Cluster:

      oc get pod
      NAME                                          READY   STATUS      RESTARTS   AGE
      quay-operator.v3.7.0-648df8c596-lg9bc         1/1     Running     0          146m
      quay370-clair-app-7694fb8498-dnnjm            1/1     Running     0          50m
      quay370-clair-app-7694fb8498-vbk5d            1/1     Running     0          50m
      quay370-quay-app-5f6b5955-mtjhl               1/1     Running     0          50m
      quay370-quay-app-5f6b5955-tt2mn               1/1     Running     0          50m
      quay370-quay-app-upgrade-89vrc                0/1     Completed   0          50m
      quay370-quay-config-editor-76dffbf759-wqzms   1/1     Running     0          50m
      quay370-quay-mirror-66b98bdc8-2smzm           1/1     Running     0          50m
      quay370-quay-mirror-66b98bdc8-msjs2           1/1     Running     0          50m 

      2nd Quay deployment in OCP Cluster2, where the Clair APP POD uses the same shared Postgres DB:

      oc get pod
      NAME                                         READY   STATUS             RESTARTS        AGE
      quay-operator.v3.7.0-74d7cc69c8-skd8n        1/1     Running            0               134m
      quay370-clair-app-5c7975586-4vzvm            0/1     CrashLoopBackOff   6 (2m2s ago)    8m57s
      quay370-clair-app-5c7975586-c4qwp            0/1     CrashLoopBackOff   6 (2m20s ago)   9m10s
      quay370-quay-app-f5d4d778c-lqcr7             1/1     Running            0               8m40s
      quay370-quay-app-f5d4d778c-nghpz             1/1     Running            0               8m42s
      quay370-quay-app-upgrade-cg7cq               0/1     Completed          0               9m10s
      quay370-quay-config-editor-9cd99b659-bn9ll   1/1     Running            0               9m10s
      quay370-quay-mirror-6d8c8cb7d8-pkrnw         1/1     Running            0               9m10s
      quay370-quay-mirror-6d8c8cb7d8-tvhpb         1/1     Running            0               8m55s
      
      oc logs quay370-clair-app-5c7975586-4vzvm
      
      {"level":"debug","component":"initialize/Logging","time":"2022-05-05T11:26:37Z","message":"logging initialized"}
      {"level":"info","component":"main","version":"v4.4.1","time":"2022-05-05T11:26:37Z","message":"starting"}
      {"level":"info","component":"main","lint":"introspection address not provided, default will be used (at $.introspection_addr)","time":"2022-05-05T11:26:37Z"}
      {"level":"info","component":"main","lint":"automatically sizing number of concurrent requests (at $.indexer.index_report_request_concurrency)","time":"2022-05-05T11:26:37Z"}
      {"level":"info","component":"main","lint":"large values will increase latency (at $.indexer.scanlock_retry)","time":"2022-05-05T11:26:37Z"}
      {"level":"info","component":"main","lint":"no delivery mechanisms specified (at $.notifier)","time":"2022-05-05T11:26:37Z"}
      {"level":"debug","component":"main","time":"2022-05-05T11:26:37Z","message":"found cgroups v1 and cpu controller"}
      {"level":"debug","component":"main","time":"2022-05-05T11:26:37Z","message":"falling back to root hierarchy"}
      {"level":"info","component":"main","cur":4,"prev":8,"time":"2022-05-05T11:26:37Z","message":"set GOMAXPROCS value"}
      {"level":"info","component":"main","version":"v4.4.1","time":"2022-05-05T11:26:37Z","message":"ready"}
      {"level":"info","component":"main","time":"2022-05-05T11:26:37Z","message":"launching introspection server"}
      {"level":"info","component":"main","time":"2022-05-05T11:26:37Z","message":"launching http transport"}
      {"level":"info","component":"main","time":"2022-05-05T11:26:37Z","message":"registered signal handler"}
      {"level":"info","component":"introspection/New","address":":8089","time":"2022-05-05T11:26:37Z","message":"no introspection address provided; using default"}
      {"level":"info","component":"initialize/Services","time":"2022-05-05T11:26:37Z","message":"begin service initialization"}
      {"level":"warn","component":"introspection/New","time":"2022-05-05T11:26:37Z","message":"no health check configured; unconditionally reporting OK"}
      {"level":"info","component":"introspection/Server.withPrometheus","endpoint":"/metrics","server":":8089","time":"2022-05-05T11:26:37Z","message":"configuring prometheus"}
      {"level":"info","component":"introspection/New","time":"2022-05-05T11:26:37Z","message":"no distributed tracing enabled"}
      {"level":"info","component":"libindex/New","time":"2022-05-05T11:26:39Z","message":"created database connection"}
      {"level":"debug","component":"internal/ctxlock/Locker.reconnect","gen":"1","time":"2022-05-05T11:26:46Z","message":"set up"}
      {"level":"info","component":"initialize/Services","time":"2022-05-05T11:26:47Z","message":"end service initialization"}
      {"level":"error","component":"main","error":"service initialization failed: failed to initialize indexer: failed to register configured scanners: failed getting id for scanner \"dpkg\": failed to connect to `host=35.236.115.104 user=quayrdsdb database=postgres`: failed to write startup message (read tcp 10.129.2.26:57578->35.236.115.104:5432: i/o timeout)","time":"2022-05-05T11:26:47Z","message":"fatal error"} 
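
      One way to narrow down the `i/o timeout` above is to check, from a pod or debug container in OCP Cluster2 that has Python available, whether a plain TCP connection to the host and port in the Clair connstring succeeds at all. The following is a minimal Python sketch of that check. `parse_connstring` and `probe` are hypothetical helpers, not part of Clair or Quay, and the parser assumes simple unquoted `key=value` pairs like the ones in the clair-config.yaml above. The sketch tests TCP reachability only. It does not perform the TLS negotiation required by `sslmode=require` or the Postgres startup exchange, so a `reachable` result does not rule out a failure at those later stages.

```python
import socket


def parse_connstring(connstring: str) -> dict:
    """Split a simple libpq-style "key=value key=value" string into a dict.

    Assumption: values contain no spaces or quotes, as in the
    clair-config.yaml connstrings in this report.
    """
    params = {}
    for token in connstring.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed connstring token: {token!r}")
        params[key] = value
    return params


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a plain TCP connect and classify the outcome.

    Returns "reachable", "timeout", "refused", or "error: <details>".
    No TLS or Postgres protocol traffic is sent.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"
    except socket.timeout:
        # No answer within `timeout`; this often means packets are being
        # dropped somewhere on the path (for example by a firewall).
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"
```

      For example, with `p = parse_connstring(conn)` where `conn` is the indexer connstring, call `probe(p["host"], int(p.get("port", "5432")))` once from each cluster. A `timeout` from OCP Cluster2 alongside `reachable` from OCP Cluster1 would be consistent with network filtering between Cluster2 and the database host rather than with a fault in Clair itself.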


          People

            Assignee: Jonathan King (jonathankingfc)
            Reporter: luffy zhang (lzha1981)
            Votes: 0
            Watchers: 3
