  1. Clair
  2. CLAIRDEV-45

OSBS layers contain insufficient information to make CPE assertions


Details

    • Epic
    • Resolution: Unresolved
    • Major
    • None
    • None
    • indexer
    • None
    • Precise Build Metadata
    • True
    • Waiting on build system changes so that the required information is present.
    • False
    • To Do
    • 50

    Description

      TL;DR

      Red Hat's vulnerability data is organized according to Common Platform Enumeration (CPE) names. With the CPE and the Name-Epoch-Version-Release-Architecture (NEVRA) string for a given rpm package, one can evaluate whether any given advisory in the vulnerability data applies to it.
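
      To make the NEVRA fields concrete, here is a minimal sketch that splits a string of the assumed form "name-[epoch:]version-release.arch". The parseNEVRA helper and the NEVRA struct are hypothetical names for illustration, not Claircore or rpm APIs, and the x86_64 architecture in the example is assumed. Some tools place the epoch before the name; that layout is not handled here.

```go
package main

import (
	"fmt"
	"strings"
)

// NEVRA holds the fields of a Name-Epoch-Version-Release-Architecture string.
type NEVRA struct {
	Name, Epoch, Version, Release, Arch string
}

// parseNEVRA is a hypothetical helper that splits strings shaped like
// "name-[epoch:]version-release.arch". A missing epoch is reported as "0".
func parseNEVRA(s string) (NEVRA, error) {
	var n NEVRA
	// The architecture follows the last dot.
	i := strings.LastIndex(s, ".")
	if i < 0 {
		return n, fmt.Errorf("nevra %q: missing architecture", s)
	}
	n.Arch, s = s[i+1:], s[:i]
	// The release follows the last remaining dash.
	i = strings.LastIndex(s, "-")
	if i < 0 {
		return n, fmt.Errorf("nevra: missing release")
	}
	n.Release, s = s[i+1:], s[:i]
	// The version (with an optional "epoch:" prefix) follows the next dash.
	i = strings.LastIndex(s, "-")
	if i < 0 {
		return n, fmt.Errorf("nevra: missing version")
	}
	n.Name = s[:i]
	n.Epoch, n.Version = "0", s[i+1:]
	if e, v, ok := strings.Cut(s[i+1:], ":"); ok {
		n.Epoch, n.Version = e, v
	}
	return n, nil
}

func main() {
	n, err := parseNEVRA("zlib-1.2.12-5.x86_64")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", n)
}
```

      Deciding whether an advisory applies additionally requires rpm's version comparison rules, which this sketch does not attempt.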

      Claircore's model of software has a "Package" type which contains the name, version, etc. of a discovered package for any given package manager under consideration. A Package can be associated with "Distribution" and "Repository" types that are discovered in parallel processes*.

      Claircore's handling of Red Hat produced layers (a.k.a. rhel) overloads the Repository type to pass through the CPEs. Roughly, the rhel Repository Indexer† looks for content_manifest files, takes the discovered content_sets values (called "repositories"), looks them up in a mapping provided by PST, and constructs Repository objects for them. The "Coalescing" step (where all the per-layer domain types are brought together to produce a coherent view of the container image) then takes the Repository and Package objects and pairs every Package with every Repository (read: CPE). That is to say, given 3 Packages and 3 Repositories, the result is 9 "logical" packages. All of these are present in the system for the matching step‡.
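
      The pairing performed during coalescing can be sketched as a plain cross product. The types below are simplified stand-ins, not Claircore's actual types, and the coalesce function and all package and repository names are hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins for the domain types discussed in this issue.
type Package struct{ Name string }
type Repository struct{ Name string }

// LogicalPackage is one Package paired with one Repository (read: CPE).
type LogicalPackage struct {
	Package    Package
	Repository Repository
}

// coalesce pairs every Package with every Repository, because nothing in
// the inputs says which repository a package actually came from.
func coalesce(pkgs []Package, repos []Repository) []LogicalPackage {
	out := make([]LogicalPackage, 0, len(pkgs)*len(repos))
	for _, p := range pkgs {
		for _, r := range repos {
			out = append(out, LogicalPackage{Package: p, Repository: r})
		}
	}
	return out
}

func main() {
	pkgs := []Package{{"pkg-a"}, {"pkg-b"}, {"pkg-c"}}
	repos := []Repository{{"repo-x"}, {"repo-y"}, {"repo-z"}}
	fmt.Println(len(coalesce(pkgs, repos))) // 3 x 3 = 9 logical packages
}
```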

      The obvious problem with this is that it will generate both "phantom packages" and apparent duplicates. For example: given a container with two CPEs associated with it – baseos and appstream – and a package – zlib-1.2.12-5 – two logical packages will be created: zlib-1.2.12-5/baseos and zlib-1.2.12-5/appstream. The package manager only got the package from one place, so one of these packages is a phantom. When compared against an advisory for zlib that lists both baseos and appstream, each CPE in the advisory will match one of the logical packages. At a glance this appears to be a duplicate, but it is not: it results from either a lack of specificity in the indexing step (producing the phantom package) or excessive generality in the CPE list (including both baseos and appstream, when the package is only present in one).
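
      The zlib example can be reproduced with a small sketch. The LogicalPackage and Advisory types and the match function are hypothetical simplifications: matching here compares only the package name and repository, and skips version comparison entirely.

```go
package main

import "fmt"

// LogicalPackage is a package paired with one repository (read: CPE).
type LogicalPackage struct{ Name, Version, Repo string }

// Advisory is a simplified advisory: an affected package name and the
// repositories (CPEs) the advisory lists.
type Advisory struct {
	Package string
	Repos   []string
}

// match returns every logical package whose name and repository both
// appear in the advisory. Version comparison is omitted for brevity.
func match(adv Advisory, lps []LogicalPackage) []LogicalPackage {
	var hits []LogicalPackage
	for _, lp := range lps {
		if lp.Name != adv.Package {
			continue
		}
		for _, r := range adv.Repos {
			if r == lp.Repo {
				hits = append(hits, lp)
				break
			}
		}
	}
	return hits
}

func main() {
	// One real package and one phantom, produced by the cross product.
	lps := []LogicalPackage{
		{"zlib", "1.2.12-5", "baseos"},
		{"zlib", "1.2.12-5", "appstream"},
	}
	adv := Advisory{Package: "zlib", Repos: []string{"baseos", "appstream"}}
	for _, hit := range match(adv, lps) {
		fmt.Printf("%s-%s/%s\n", hit.Name, hit.Version, hit.Repo)
	}
}
```

      Both logical packages are reported, even though only one of them corresponds to something the package manager installed.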

      The solution to this problem would seem to be:

      1. Examine the package manager's metadata and determine the actual repository that the package came from.
      2. Record the repositories the package manager knows about in addition to the content_sets and delay the CPE resolution.
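
      The two steps above might look roughly like the following sketch. It assumes a hypothetical RepoID field populated from the package manager's metadata, and a hypothetical repository-to-CPE mapping consulted only at resolution time; the CPE strings are placeholders, not real CPE names.

```go
package main

import "fmt"

// Package is a simplified stand-in. RepoID is a hypothetical field holding
// the repository the package manager reports as the package's source.
type Package struct {
	Name, Version, RepoID string
}

// resolveCPEs looks up CPEs only for the repository the package came from.
// The boolean is false when the source repository is unknown or unmapped,
// so callers can tell "no CPEs" apart from "cannot say".
func resolveCPEs(p Package, repoToCPEs map[string][]string) ([]string, bool) {
	if p.RepoID == "" {
		return nil, false
	}
	cpes, ok := repoToCPEs[p.RepoID]
	return cpes, ok
}

func main() {
	// Placeholder mapping; real values would come from the provided mapping.
	mapping := map[string][]string{
		"baseos":    {"placeholder-cpe-baseos"},
		"appstream": {"placeholder-cpe-appstream"},
	}
	zlib := Package{Name: "zlib", Version: "1.2.12-5", RepoID: "baseos"}
	cpes, ok := resolveCPEs(zlib, mapping)
	fmt.Println(cpes, ok)
}
```

      With the source repository recorded per package, zlib resolves to a single CPE and no phantom package is created.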

      The reason this wasn't done earlier is that rpm does not record repository information. dnf (or yum) does record it, but making use of it was never prioritized for implementation.

      PROJQUAY-5182 was reported and seemed to stem from something like the issue above – the Package/Repository association heuristic being too broad. Through the investigation into PROJQUAY-5182 (and subsequently PROJQUAY-5185), we determined that there's insufficient information in a layer produced by OSBS to determine which CPE(s) a Package should be associated with. Moreover, a Package may not be associated with any content_set (and correspondingly, CPE) that the content_manifest contains.

      This is an overarching issue to track the problem while we design and implement a solution with the relevant teams.


      *: That is, every domain type (Package, Distribution, Repository) is discovered by an independent piece of code that is unconcerned with any other type.

      †: An "Indexer" is Claircore's terminology for the code that examines a layer for a domain type. It's (in theory) a pure function of type LayerContent -> DomainType. Modeling Indexers this way means their results can be memoized as (IndexerImplementation, LayerContent) -> DomainType, which is the key to reducing work when presented with multiple copies of the same layer.
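
      The memoization described in this footnote can be sketched as a cache keyed on the indexer's identity and a digest of the layer content. All names here (Indexer, Memo, the "packages" label) are hypothetical, and DomainType is reduced to a slice of strings; this is not how Claircore stores its results.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// LayerContent stands in for the bytes of a layer.
type LayerContent []byte

// Indexer is a pure function from layer content to discovered items.
type Indexer func(LayerContent) []string

// key identifies one (IndexerImplementation, LayerContent) pair.
type key struct {
	indexer string
	digest  [sha256.Size]byte
}

// Memo caches indexer results and counts how often an indexer really ran.
type Memo struct {
	cache map[key][]string
	calls int
}

func NewMemo() *Memo { return &Memo{cache: make(map[key][]string)} }

// Index returns the cached result for this indexer and layer if present,
// and otherwise runs the indexer once and stores its result.
func (m *Memo) Index(name string, ix Indexer, layer LayerContent) []string {
	k := key{indexer: name, digest: sha256.Sum256(layer)}
	if v, ok := m.cache[k]; ok {
		return v
	}
	m.calls++
	v := ix(layer)
	m.cache[k] = v
	return v
}

func main() {
	m := NewMemo()
	ix := func(l LayerContent) []string { return []string{string(l)} }
	layer := LayerContent("same layer bytes")
	m.Index("packages", ix, layer)
	m.Index("packages", ix, layer) // identical layer: served from the cache
	fmt.Println("indexer runs:", m.calls)
}
```

      Because the function is assumed to be pure, a second copy of the same layer never reaches the indexer; an indexer with side effects or hidden inputs would make this cache unsound.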

      ‡: "Matching" is the Claircore term for matching an index of container contents with a set of vulnerabilities. This is sometimes what people mean when they say "scan."

            People

              hdonnay Henry Donnay
              hdonnay Henry Donnay
              Joseph Crosland