Uploaded image for project: 'Machine Config Operator'
  1. Machine Config Operator
  2. MCO-428

Create an end-user friendly error library

XMLWordPrintable

    • Icon: Story Story
    • Resolution: Unresolved
    • Icon: Undefined Undefined
    • None
    • None
    • None
    • False
    • None
    • False
    • OCPSTRAT-554 - Improving error handling, propagation, collection, and disambiguation for users
    • 0
    • 0

      An idea discussed in An Action Plan For Actionable Errors is the creation of a library that handles unique error identifiers (aka error codes) and makes their management and application to errors more ergonomic. After soliciting feedback from other OpenShifters, the end result needs to be a bit more flexible than originally imagined. It should consist of the following components:

      Unique Error Identifier

      A unique error identifier should be:

      • Defined in a single place.
      • Immutable.
      • Applied to any Go error with a single line of code (e.g., return errCode.Wrap(err)).

       

      The API for creating a Unique Error Identifier could look like this for example:

       

      errCode := NewErrorCode(“ErrCode”)

       

       

      One should be able to associate an optional reason and resolution (defined below) associated with each error code. The API could look like this:

       

       

      errCode := NewErrorCode(“ErrCode”).WithReason(“I’m a reason”).WithResolution(“Here’s what you do”)

       

       

      Alternatively, a more literal-friendly style could look like this:

       

       

      errCode := NewErrorCodeFromDefinition(ErrorDefinition{
        Code: “ErrCode”,
        Reason: “I’m the reason”,
        Resolution: “I’m the resolution”,
      })
      

       

       

      Reason and Resolution

      A reason is why the error occurred. Sometimes, the error code can easily express the root-cause (e.g., ErrPrintPaperJam). Other times, it cannot. Similarly, a resolution is how to resolve the problem. While a bit grandiose in name and vision, a resolution could also contain a link to documentation such as a URL or knowledgebase article number.

       

      While it was initially seen as useful to associate each error code with a predefined and static reason and resolution, in practice this might prove to be more difficult. To that end, a unique error identifier should allow a default reason and resolution, but allow overrides where it makes sense (without mutating the default reason and resolution). The API could look like this for example:

       

      wrapped := errCode.WrapWithReason(err, “I’m a different reason”)
      wrapped = errCode.WrapWithResolution(wrapped, “Do this instead”)
      return wrapped 
      

       

      In short, having the reason and resolution predefined can be very useful for cases where one knows exactly what they are. However, that’s not always the case and we should have a way to provide local context.

      Investigate

      Within Go, it’s common to check for the presence of an error, potentially add some additional context to it and bubble it back up. Sometimes, one can disambiguate between one error or another by using some of the built-in error-handling and comparison tools that the Go stdlib provides (e.g., errors.Is() / errors.As() / comparing to sentinel values). In some cases, this information can be used to determine what course of action a program should take next. For example, if an HTTP request returns HTTP 301 or 302, it would be appropriate to make another HTTP request to the location returned as opposed to stopping and bubbling the error back up.

       

      With this in mind, we can build some of this logic into our program to interrogate the various errors and add more context based upon what we know about the given error. A potential API could look like this:

       

      return errCode.Investigate(originalErr, func(err error) error {
        if os.ErrNoExist(err) {
          return errCode.WrapWithResolution(err, “file could not be found”)
        }
       
        if k8sErrors.IsNotFound(err) {
          msg := fmt.Sprintf(“missing MachineConfigPool ‘%s’, create it by running $ oc create mcp/%s”, mcpName, mcpName)
          return errCode.WrapWithResolution(err, msg)
        }
       
        If k8sErrors.IsConflict(err) {
          return errCode.WrapWithResolution(err, “a conflict has been found, do _ to fix it”)
        }
        
        return err
      })
      

       

      In this way, we can add a bit of additional context and make the root cause easier to locate. And in cases where we can't provide that information, we can at least provide the next step.

       

      Done When:

      • Determine where this library can live. I feel that it would be useful to more than just the MCO and OpenShift, so I would like to open-source it. However, if that's not possible, it can live within the MCO repository within a Go module of its own.
      • We have code checked in which implements, at a minimum, the unique error identifier with an overridable reason and resolution.
      • All new code has a minimum of 80-90% test coverage, as determined using the Go test coverage tool ($ go test -coverprofile coverage.out -v . && go tool cover -html=coverage.out).
      • At least one unique error identifier within any component of the MCO is added to consume this new library.
      • (stretch) The Investigate component is implemented.

       

              Unassigned Unassigned
              zzlotnik@redhat.com Zack Zlotnik
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Created:
                Updated: