  1. WildFly
  2. WFLY-8849

Correct runtime-only operations on profile resources

Details

    Description

      OVERVIEW:

      WFCORE-389 and WFCORE-2858 are about supporting runtime-only ops on profile resources, something which we officially don't do (although there are violations of this policy, as shown below). As part of deciding whether to complete WFCORE-389 and WFCORE-2858 for Core 3 / WildFly 11, I have analyzed the existing runtime-only ops in WildFly, looking for any issues. This JIRA is about correcting those issues. The intent is to make initial changes that don't alter behavior in any negative way, so that WFCORE-389 and WFCORE-2858 can proceed, but to make them in such a way that there is no harm if 389 and 2858 don't proceed. For some items there will be a follow-up issue to make further decisions.

      I performed 3 searches for operations that declare themselves as runtime-only, looking for any aspect of their behavior that might be problematic for WFCORE-389. I created a document with the results, which I'll attach. I put a classifier to the left of each item as shorthand for the item's status.

      CLASSIFIERS:

      NP – non-profile. Op is not used in profile resources, either because it is not registered in subsystems, is only in deployment=*/subsystem=x resources, or the subsystem does not register it in an HC. Any NP op is not relevant to WFCORE-389.

      WR – WRite. The op is not read-only.

      WR? – the op is not declared as read-only but seems to only be doing reads

      AO – the op is registered on the profile but execution is a no-op unless the process is admin only

      RU? – the op doesn't seem to really be runtime-only

      !!! – Miscellaneously problematic ops

      FIXES:

      1) A number of ops have WR?/NP classifiers. The NP means these aren't relevant to WFCORE-389 but correcting the metadata so they are declared as read-only is a useful minor task.

      2) The "migrate" ops in web, jacorb and messaging. These are registered on the profile (allowing profile migration) but will fail if the process isn't admin-only. An admin-only HC has no slaves or servers, so there is no domain rollout of this op, and hence WFCORE-389 is not relevant. This is all by design; it allows users to migrate the subsystem in a domain profile. However, there is a question about them declaring themselves runtime-only, since they modify config. Correcting this is another useful minor task.

      3) The "describe-migration" ops. Same discussion as for "migrate" plus these don't seem to be write ops, so a minor useful side task is to correct the metadata to describe them as read-only.

      4) ModClusterConfigResourceDefinition registers 4 ops as runtime-only that seem to be modifying configuration; i.e. they are not runtime-only. These have a tangential relationship to WFCORE-389 in that they are pre-existing ops that break the no-runtime-only-on-profile rule that WFCORE-389 is about rescinding. I'm not aware of any issues reported about them so that's a tiny bit of additional evidence that the kernel can handle such ops. But, a subtask of this issue is to correct the metadata for these so they will not be affected by any subsequent changes related to runtime-only ops.

      5) JcaCachedConnectionManagerDefinition.CcmOperations has two operations that are not declared to be read-only that are registered on the profile resource. So these are pre-existing ops that break the no-runtime-only-on-profile rule that WFCORE-389 is about rescinding. A twist with these is they seem to actually be read-only and should be described as such. But if we do that we must implement WFCORE-2858 to avoid breaking existing behavior.

      Nothing will be done about these as part of this work, but I'll file an issue to get it sorted.

      6) The JSF subsystem's "list-active-jsf-impl" op. A read-only, runtime-only op that does runtime work (scanning modules) in Stage.MODEL on the profile resource. Lots of rules being broken! What this op does now if invoked against the profile is tell you what jsf impls are present on the DC. Which is not the same thing as telling you what impls are present on "the domain" since different hosts in the domain can have different sets of modules. So the op needs a rethink.
      a) If we correct the Stage.MODEL problem, we can't do WFCORE-2849. So we need to choose between the two.
      b) If we do WFCORE-2858, this op will now start getting rolled out to the domain servers resulting in getting data from all servers. This is arguably the correct behavior, as now the user learns the true situation in the domain, not just on the DC. But if we decide we don't want that we'll need to add OperationEntry.Flag.HOST_CONTROLLER_ONLY to the operation definition to prevent that rollout.
      c) If we do roll it out to the servers we can consider having it no longer do runtime work on the profile; i.e. don't analyze the DC, just the servers. That would remove the conflict with WFCORE-2849, but would be an incompatible change in behavior. I find it hard to believe anyone would be using this op in scripts though; not against the profile.
      d) We could just stop registering it on the profile, but that's a loss of functionality.
      Choice b) would let WFCORE-2858 go forward and preserve the status quo for this op, with a) c) and d) still options for the future.

      7) The transaction subsystem's "probe" operation. A read-only, runtime-only op registered on the profile resource but which is functionally a no-op if invoked on the profile resource. But WFCORE-2858 would mean this now gets rolled out to all servers in the domain that use the profile, triggering an actual probe on all. So, if we do WFCORE-2858 we could:
      a) Accept this, and let the op roll out. That should be an RFE though, with analysis that rolling it out would be harmless.
      b) Remove the op from the profile. It never did anything useful (just a no-op that isn't rolled out) so removing it is only a semi-breaking change.
      c) Add OperationEntry.Flag.HOST_CONTROLLER_ONLY to the operation definition to prevent that rollout.
      Choice c) would let WFCORE-2858 go forward and preserve the status quo for this op, with a) and b) still options for the future, so that's what will be done as part of this work.
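
      As a rough illustration of the rollout reasoning used in this item (and again in items 9 and 11), here is a minimal, self-contained sketch. It is not the WildFly kernel code: RolloutSketch, its Flag enum and the wfcore2858Enabled switch are hypothetical stand-ins that only model what this description says, namely that a non-read-only op on the profile is rolled out, a read-only op currently is not, WFCORE-2858 would start rolling out read-only runtime-only ops, and HOST_CONTROLLER_ONLY prevents rollout.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the DC's rollout decision for an op invoked on a profile resource.
// The names below are illustrative stand-ins, not the real OperationEntry.Flag enum.
final class RolloutSketch {

    enum Flag { READ_ONLY, RUNTIME_ONLY, HOST_CONTROLLER_ONLY }

    // wfcore2858Enabled is an assumed switch standing in for "WFCORE-2858 has been implemented".
    static boolean rollsOutToServers(Set<Flag> flags, boolean wfcore2858Enabled) {
        if (flags.contains(Flag.HOST_CONTROLLER_ONLY)) {
            return false; // option c): the op stays on the host controller
        }
        if (!flags.contains(Flag.READ_ONLY)) {
            return true; // write ops are already rolled out today
        }
        // Read-only ops only roll out if they are runtime-only and WFCORE-2858 is done.
        return wfcore2858Enabled && flags.contains(Flag.RUNTIME_ONLY);
    }

    public static void main(String[] args) {
        Set<Flag> probe = EnumSet.of(Flag.READ_ONLY, Flag.RUNTIME_ONLY);
        Set<Flag> guardedProbe =
                EnumSet.of(Flag.READ_ONLY, Flag.RUNTIME_ONLY, Flag.HOST_CONTROLLER_ONLY);

        System.out.println(rollsOutToServers(probe, false));        // false: status quo
        System.out.println(rollsOutToServers(probe, true));         // true: new behavior under WFCORE-2858
        System.out.println(rollsOutToServers(guardedProbe, true));  // false: status quo preserved
    }
}
```

      In this model, adding the host-controller-only flag is the only change that keeps the answer the same before and after WFCORE-2858, which is why choice c) preserves the status quo.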

      8) The messaging-activemq broadcast-group resource has problematic 'start' and 'stop' ops. These are not registered as runtime-only, but they are. They are registered on the profile resource and are not read-only, so the DC rolls them out to the domain. So, they are pre-existing ops that break the no-runtime-only-on-profile rule that WFCORE-389 is about rescinding. We have two options here:
      a) remove these ops on the profile as violations of the no-runtime-only-on-profile rule. This would be a breaking change. But it may be the correct thing to do anyway if it is unsafe to invoke these on the profile and have that roll out to all servers.
      b) Correct the description of these to declare runtime-only.
      Nothing will be done on these as part of this work, but a separate issue will be filed.

      9) The messaging-activemq broadcast-group resource also has a problematic get-connector-pairs-as-json op. This is a read-only op so it currently will not roll out. It will also fail if executed against the profile resource, as it fails if there is no activemq server present. So, the options here are:
      a) Remove the op from the profile resource. It never worked anyway.
      b) Allow it to roll out. This would be new behavior though.
      c) Add OperationEntry.Flag.HOST_CONTROLLER_ONLY to the operation definition to prevent that rollout.
      IMHO option c) is kind of silly, leaving a broken op in place, but it's a valid "emergency" step to prevent roll out inadvertently being turned on while a decision between a) and b) is made. So that's what will be done as part of this work.

      10) The messaging-activemq cluster-connection resource has problematic 'start' and 'stop' ops. These are not registered as runtime-only, but they are. They are registered on the profile resource and are not read-only, so the DC rolls them out to the domain. So, they are pre-existing ops that break the no-runtime-only-on-profile rule that WFCORE-389 is about rescinding. We have two options here:
      a) remove these ops on the profile as violations of the no-runtime-only-on-profile rule. This would be a breaking change. But it may be the correct thing to do anyway if it is unsafe to invoke these on the profile and have that roll out to all servers.
      b) Correct the description of these to declare runtime-only.
      Nothing will be done on these as part of this work, but a separate issue will be filed.

      11) The messaging-activemq cluster-connection resource also has a problematic get-nodes op. This is a read-only op so it currently will not roll out. It will also fail if executed against the profile resource, as it fails if there is no activemq server present. So, the options here are:
      a) Remove the op from the profile resource. It never worked anyway.
      b) Allow it to roll out. This would be new behavior though.
      c) Add OperationEntry.Flag.HOST_CONTROLLER_ONLY to the operation definition to prevent that rollout.
      IMHO option c) is kind of silly, leaving a broken op in place, but it's a valid "emergency" step to prevent roll out inadvertently being turned on while a decision between a) and b) is made. So that's what will be done as part of this work.

      12) A number of ops are using withFlags(OperationEntry.Flag.RUNTIME_ONLY) instead of setRuntimeOnly(). The effect is the same so this is harmless but a minor useful side task is to switch to setRuntimeOnly(). That will make it easier to find these ops.
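
      To make the "effect is the same" point concrete, here is a small stand-in sketch. OpDefinitionSketch and its nested Builder are hypothetical and only mimic the two method names mentioned above; they are not the real WildFly operation definition builder. The sketch just shows that both spellings end up with the same flag set, while the dedicated setter is easier to search for.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for an operation definition builder; not the WildFly class.
final class OpDefinitionSketch {

    enum Flag { READ_ONLY, RUNTIME_ONLY }

    // Immutable result: an op name plus the flags it was built with.
    static final class Op {
        final String name;
        final Set<Flag> flags;

        Op(String name, Set<Flag> flags) {
            this.name = name;
            this.flags = flags;
        }
    }

    static final class Builder {
        private final String name;
        private final EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);

        Builder(String name) {
            this.name = name;
        }

        // Generic route: any flag can be passed, so usages are harder to grep for.
        Builder withFlags(Flag... extra) {
            for (Flag flag : extra) {
                flags.add(flag);
            }
            return this;
        }

        // Dedicated route: same effect as withFlags(Flag.RUNTIME_ONLY).
        Builder setRuntimeOnly() {
            flags.add(Flag.RUNTIME_ONLY);
            return this;
        }

        Op build() {
            return new Op(name, EnumSet.copyOf(flags));
        }
    }

    public static void main(String[] args) {
        Op viaFlags = new Builder("probe").withFlags(Flag.RUNTIME_ONLY).build();
        Op viaSetter = new Builder("probe").setRuntimeOnly().build();
        System.out.println(viaFlags.flags.equals(viaSetter.flags)); // true
    }
}
```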

            People

              bstansbe@redhat.com Brian Stansberry