Uploaded image for project: 'Project Quay'
  1. Project Quay
  2. PROJQUAY-2558

Clair stopped scanning images

    XMLWordPrintable

Details

    • Bug
    • Resolution: Done
    • Critical
    • None
    • None
    • clair, quay
    • 0

    Description

      After scanning all images for a day, Clair once again stopped scanning images and is returning a 500 on certain indexing requests. In the logs we see the following:

      2021-09-16T09:18:48.953507453Z {"level":"info","component":"internal/indexer/layerscannner/layerScanner.scan","kind":"repository","layer":"sha256:8c16f80565cf1dffeb961c9616f4d07641c771755cab827393074a1317f8c5
      5f","manifest":"sha256:be4ddc89ca2c4a0dad707924e70b1ec8a9cb6d1a2bfd54a6b05a0d0ad8b74ca3","scanner":"rhel-repository-scanner","state":"ScanLayers","error":"cpe: wfn attr edition is invalid: cpe: string contain
      s space characters","time":"2021-09-16T09:18:48Z"}
      2021-09-16T09:18:48.953507453Z {"level":"debug","component":"internal/indexer/layerscannner/layerScanner.scan","kind":"repository","layer":"sha256:8c16f80565cf1dffeb961c9616f4d07641c771755cab827393074a1317f8c
      55f","manifest":"sha256:be4ddc89ca2c4a0dad707924e70b1ec8a9cb6d1a2bfd54a6b05a0d0ad8b74ca3","scanner":"rhel-repository-scanner","state":"ScanLayers","time":"2021-09-16T09:18:48Z","message":"scan done"}
      2021-09-16T09:18:48.981384230Z {"level":"debug","component":"rhel/RepositoryScanner.Scan","kind":"repository","layer":"sha256:2ef40b342a17b2ce657254db1d42638dd03302ffb29309f97b81e4ff516b8aa6","manifest":"sha2
      56:be4ddc89ca2c4a0dad707924e70b1ec8a9cb6d1a2bfd54a6b05a0d0ad8b74ca3","scanner":"rhel-repository-scanner","state":"ScanLayers","version":"1.1","uri":"https://catalog.redhat.com/api/containers/v1/images/nvr/s2i
      -base-docker-1-6.28","time":"2021-09-16T09:18:48Z","message":"making container API request"}
      2021-09-16T09:18:48.981626765Z {"level":"debug","component":"rhel/RepositoryScanner.Scan","kind":"repository","layer":"sha256:2ef40b342a17b2ce657254db1d42638dd03302ffb29309f97b81e4ff516b8aa6","manifest":"sha2
      56:be4ddc89ca2c4a0dad707924e70b1ec8a9cb6d1a2bfd54a6b05a0d0ad8b74ca3","scanner":"rhel-repository-scanner","state":"ScanLayers","version":"1.1","nvr":"s2i-base-docker-1-6.28","arch":"x86_64","cpes":[],"time":"2
      021-09-16T09:18:48Z","message":"Got CPEs from container API"}
      2021-09-16T09:18:48.981626765Z {"level":"debug","component":"rhel/RepositoryScanner.Scan","kind":"repository","layer":"sha256:2ef40b342a17b2ce657254db1d42638dd03302ffb29309f97b81e4ff516b8aa6","manifest":"sha2
      56:be4ddc89ca2c4a0dad707924e70b1ec8a9cb6d1a2bfd54a6b05a0d0ad8b74ca3","scanner":"rhel-repository-scanner","state":"ScanLayers","version":"1.1","time":"2021-09-16T09:18:48Z","message":"done"}
      2021-09-16T09:18:48.981626765Z {"level":"info","component":"internal/indexer/layerscannner/layerScanner.scan","kind":"repository","layer":"sha256:2ef40b342a17b2ce657254db1d42638dd03302ffb29309f97b81e4ff516b8a
      a6","manifest":"sha256:be4ddc89ca2c4a0dad707924e70b1ec8a9cb6d1a2bfd54a6b05a0d0ad8b74ca3","scanner":"rhel-repository-scanner","state":"ScanLayers","error":"Get \"https://catalog.redhat.com/api/containers/v1/im
      ages/nvr/s2i-base-docker-1-6.28\": context canceled","time":"2021-09-16T09:18:48Z"}
      2021-09-16T09:18:48.981626765Z {"level":"debug","component":"internal/indexer/layerscannner/layerScanner.scan","kind":"repository","layer":"sha256:2ef40b342a17b2ce657254db1d42638dd03302ffb29309f97b81e4ff516b8
      aa6","manifest":"sha256:be4ddc89ca2c4a0dad707924e70b1ec8a9cb6d1a2bfd54a6b05a0d0ad8b74ca3","scanner":"rhel-repository-scanner","state":"ScanLayers","time":"2021-09-16T09:18:48Z","message":"scan done"}
      2021-09-16T09:18:48.981662026Z {"level":"info","component":"internal/indexer/controller/Controller.Index","manifest":"sha256:be4ddc89ca2c4a0dad707924e70b1ec8a9cb6d1a2bfd54a6b05a0d0ad8b74ca3","state":"ScanLaye
      rs","time":"2021-09-16T09:18:48Z","message":"layers scan done"}
      2021-09-16T09:18:48.981662026Z {"level":"error","component":"internal/indexer/controller/Controller.Index","manifest":"sha256:be4ddc89ca2c4a0dad707924e70b1ec8a9cb6d1a2bfd54a6b05a0d0ad8b74ca3","state":"ScanLay
      ers","error":"failed to scan all layer contents: cpe: wfn attr edition is invalid: cpe: string contains space characters","time":"2021-09-16T09:18:48Z","message":"error during scan"}
      2021-09-16T09:18:49.053887115Z {"level":"info","component":"libindex/Libindex.Index","manifest":"sha256:be4ddc89ca2c4a0dad707924e70b1ec8a9cb6d1a2bfd54a6b05a0d0ad8b74ca3","time":"2021-09-16T09:18:49Z","message
      ":"index request done"}
      2021-09-16T09:18:49.053955978Z {"level":"info","component":"httptransport/New","remote_addr":"100.126.40.101:55614","method":"POST","request_uri":"/indexer/api/v1/index_report","status":500,"time":"2021-09-16
      T09:18:49Z","message":"handled HTTP request"}
      

      Even though timestamps do not match for this particular request, I believe that this is causing the 500 Python error in Quay's log:

      021-09-16T12:34:25.934653151Z securityworker stdout | 2021-09-16 12:34:25,933 [115] [DEBUG] [urllib3.connectionpool] http://caddy:80 "POST /indexer/api/v1/index_report HTTP/1.1" 500 163
      2021-09-16T12:34:25.935555953Z securityworker stdout | 2021-09-16 12:34:25,934 [115] [ERROR] [util.secscan.v4.api] Security scanner endpoint responded with non-200 HTTP status code: 500
      2021-09-16T12:34:25.935555953Z securityworker stdout | NoneType: None
      2021-09-16T12:34:25.935871884Z securityworker stdout | 2021-09-16 12:34:25,935 [115] [ERROR] [data.secscan_model.secscan_v4_model] Failed to perform indexing, security scanner API error
      2021-09-16T12:34:25.935871884Z securityworker stdout | Traceback (most recent call last):
      2021-09-16T12:34:25.935871884Z securityworker stdout |   File "/quay-registry/util/secscan/v4/api.py", line 204, in index
      2021-09-16T12:34:25.935871884Z securityworker stdout |     resp = self._perform(actions["Index"](body))
      2021-09-16T12:34:25.935871884Z securityworker stdout |   File "/quay-registry/util/secscan/v4/api.py", line 296, in _perform
      2021-09-16T12:34:25.935871884Z securityworker stdout |     raise Non200ResponseException(resp)
      2021-09-16T12:34:25.935871884Z securityworker stdout | util.secscan.v4.api.Non200ResponseException
      2021-09-16T12:34:25.935871884Z securityworker stdout | During handling of the above exception, another exception occurred:
      2021-09-16T12:34:25.935871884Z securityworker stdout | Traceback (most recent call last):
      2021-09-16T12:34:25.935871884Z securityworker stdout |   File "/quay-registry/data/secscan_model/secscan_v4_model.py", line 286, in perform_indexing
      2021-09-16T12:34:25.935871884Z securityworker stdout |     (report, state) = self._secscan_api.index(manifest, layers)
      2021-09-16T12:34:25.935871884Z securityworker stdout |   File "/quay-registry/util/secscan/v4/api.py", line 208, in index
      2021-09-16T12:34:25.935871884Z securityworker stdout |     raise APIRequestFailure(ex)
      2021-09-16T12:34:25.935871884Z securityworker stdout | util.secscan.v4.api.APIRequestFailure
      2021-09-16T12:34:25.936309381Z securityworker stdout | 2021-09-16 12:34:25,935 [115] [DEBUG] [data.database] Disconnecting from database.
      

      This error has caused all images in Quay to be locked in a constant state of "queued", the images are not being added to the manifestsecuritystatus table and we see a discrepancy of around 700 manifests between this table and the manifest table. This all started hapenning after a particular image was pushed yesterday. Matcher is working properly, all images that are already scanned have their results shown in Quay.

      The image in question seems to be the image with a manifest id 192103:

      registry-quay-app-55fc54774d-8k82r/quay-app/quay-app/logs/current.log:5379:2021-09-16T12:34:22.385692607Z securityworker stdout | 2021-09-16 12:34:22,384 [115] [DEBUG] [peewee] ('SELECT "t1"."id", "t1"."repository_id", "t1"."dige
      st", "t1"."media_type_id", "t1"."manifest_bytes", "t1"."config_media_type", "t1"."layers_compressed_size" FROM "manifest" AS "t1" WHERE ("t1"."id" = %s) LIMIT %s OFFSET %s', [192103, 1, 0])
      
      

      After this image has been pushed, no other images were scanned. This happened yesterday.

      Please check.

      Attachments

        Issue Links

          Activity

            People

              jcroslan@redhat.com Joseph Crosland
              rhn-support-ibazulic Ivan Bazulic
              Votes:
              0 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: