
Type: Enhancement
Resolution: Done
Priority: Major
Fix Version/s: 3.0.0.Alpha2
Affects Version/s: None
Component/s: Management
Labels: None

Git Pull Request:
https://github.com/wildfly/wildfly-core/pull/1566

When the handler for a configuration change operation determines that it cannot effect the change to the current runtime services, it places the process into "reload-required" state. From the moment this occurs until the reload is performed, the configuration model is inconsistent with the runtime services.

This can lead to problems when, prior to reload, the user makes further configuration changes. Those changes can succeed in Stage.MODEL, since the change is valid given the current state of the configuration model, but then when the handler attempts to update the runtime the changes fail because the runtime services are in a different state. Some common scenarios:

1) User removes a resource, triggering reload required. Then they re-add the resource, which fails with a DuplicateServiceException since the service from the original version of the resource hasn't been removed yet.

2) User makes some other config change to a resource which can't be effected immediately, so the server is put into reload-required. The user then adds another resource that depends on the services from the first resource, and that add fails because the runtime service from the first resource is not in the expected state.
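Scenario 1 can be reproduced in miniature with a toy model. This is only an illustrative sketch: `ToyProcess` and its methods are hypothetical names, not wildfly-core APIs, and it assumes the remove handler cannot stop the service in place, so it only updates the model and flags reload-required.

```java
import java.util.HashSet;
import java.util.Set;

// Toy stand-in for a managed process; none of these names are WildFly APIs.
class ToyProcess {
    final Set<String> model = new HashSet<>();   // persistent configuration
    final Set<String> runtime = new HashSet<>(); // installed runtime services
    boolean reloadRequired;

    // A model-stage check followed by a runtime-stage step.
    void add(String name) {
        if (!model.add(name)) {
            throw new IllegalArgumentException("model already contains " + name);
        }
        if (!runtime.add(name)) {
            // Plays the role of the DuplicateServiceException in scenario 1.
            model.remove(name); // roll back the model change
            throw new IllegalStateException("duplicate service " + name);
        }
    }

    // Assumption: the service cannot be stopped in place, so only the
    // model changes and the process is flagged reload-required.
    void remove(String name) {
        if (!model.remove(name)) {
            throw new IllegalArgumentException("model does not contain " + name);
        }
        reloadRequired = true;
    }

    // A reload rebuilds the runtime from the model.
    void reload() {
        runtime.clear();
        runtime.addAll(model);
        reloadRequired = false;
    }
}
```

With this sketch, `add("foo=bar")`, `remove("foo=bar")`, `add("foo=bar")` fails on the third call because the model accepts the add while the runtime still holds the old service. The same add succeeds after `reload()`.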

A naive fix would be to stop making any further runtime changes for steps that alter the persistent config once the process goes into reload-required state. (Runtime changes for ops that don't touch persistent config would be OK, e.g. reload itself, or runtime-only ops like popping a message off a JMS queue.)

The problem with the naive approach is that config changes that could take immediate effect no longer will. This could break existing scripts, or just be annoying in general. For example, suppose a server is in reload-required state but is still running, and the user wants to add a logger category or change the level of an existing one in order to get some diagnostic info. The logging change would not affect the runtime until the reload is done, forcing a reload just to get the diagnostic data.
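The naive rule amounts to a single process-wide gate. The sketch below is hypothetical (the class `NaiveRuntimeGate` and its methods are invented for illustration, not taken from wildfly-core) and only shows why the logging change gets caught by it.

```java
// Hypothetical sketch of the naive rule; not wildfly-core code.
class NaiveRuntimeGate {
    private boolean reloadRequired;

    void setReloadRequired() {
        reloadRequired = true;
    }

    // Returns true when the op's runtime step should be executed.
    // Ops that do not alter persistent config (reload, runtime-only ops)
    // always run; config-changing ops are blocked once the process is
    // reload-required, regardless of which resource they target.
    boolean runRuntimeStep(boolean altersPersistentConfig) {
        return !(reloadRequired && altersPersistentConfig);
    }
}
```

Because the gate knows nothing about which resource caused the reload-required state, a logger level change (`altersPersistentConfig == true`) is skipped just like any other config change.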

Stuart Douglas had an excellent suggestion today of looking into tying this in to capabilities and requirements. So, for example:

1) An op targeted at resource foo=bar causes the process to go into reload-required.
2) The kernel detects this and finds the registration for the foo=bar resource type, and sees that the resource provides capability org.wildfly.foo.bar.
3) The kernel records in the capability registry that org.wildfly.foo.bar is now "reload-required".
4) Thereafter, for any op that changes the model and then adds a runtime step, the kernel:
a) finds the registration for the resource type associated with that op's target address
b) finds any capabilities provided by the resource type
c) looks for direct or transitive requirements for those capabilities that are "reload-required"
d) if found, the runtime step is not executed, and instead the "server-requires-reload" response-header is added.
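The steps above can be sketched as a small registry that records requirements between capabilities and walks them transitively. This is a minimal illustration under stated assumptions, not the wildfly-core capability registry: the class `CapabilityReloadTracker` and its methods are hypothetical, and the check also treats a capability that is itself marked as a match, which is an assumption made so that the remove+add case is covered.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the wildfly-core capability registry API.
class CapabilityReloadTracker {
    // capability name -> capabilities it directly requires
    private final Map<String, Set<String>> requirements = new HashMap<>();
    // capabilities recorded as "reload-required" (step 3)
    private final Set<String> reloadRequired = new HashSet<>();

    void register(String capability, String... requires) {
        requirements.put(capability, new HashSet<>(Arrays.asList(requires)));
    }

    void markReloadRequired(String capability) {
        reloadRequired.add(capability);
    }

    // Steps 4b-4d: given the capabilities provided by the op's target
    // resource type, report whether the runtime step should be skipped.
    // Assumption: a provided capability that is itself marked also counts.
    boolean shouldSkipRuntimeStep(Set<String> providedCapabilities) {
        Deque<String> pending = new ArrayDeque<>(providedCapabilities);
        Set<String> visited = new HashSet<>();
        while (!pending.isEmpty()) {
            String capability = pending.pop();
            if (!visited.add(capability)) {
                continue; // already checked; also guards against cycles
            }
            if (reloadRequired.contains(capability)) {
                return true;
            }
            pending.addAll(requirements.getOrDefault(capability,
                    Collections.<String>emptySet()));
        }
        return false;
    }
}
```

When `shouldSkipRuntimeStep` returns true, the caller would skip the runtime step and add the "server-requires-reload" response header instead. An op on a resource whose capabilities have no path to a marked capability, such as the logging example, proceeds as usual.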

The effect is that the set of ops whose runtime changes are skipped is narrowed to those associated with capabilities that put the server into reload-required. Unrelated ops, e.g. the logging changes mentioned above, are unaffected.

Some fine points:

1) The restart-required and reload-required states need to be tracked separately. The information regarding any restart-required capabilities needs to survive a reload.
2) The information that a capability is reload/restart-required needs to survive the removal of the capability. This allows the remove+add scenario to work. The remove op removes the capability, but the fact it is still present in the runtime is tracked, so when the add comes in no runtime changes are made.
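Both fine points can be illustrated by keeping the stale markers separate from the set of registered capabilities. This is a hedged sketch only: `StaleCapabilityState` and its method names are hypothetical, and the real tracking in wildfly-core may be structured differently.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names are illustrative, not wildfly-core APIs.
class StaleCapabilityState {
    private final Set<String> registered = new HashSet<>();
    // Tracked separately so a reload clears one set but not the other.
    private final Set<String> reloadRequired = new HashSet<>();
    private final Set<String> restartRequired = new HashSet<>();

    void register(String capability) {
        registered.add(capability);
    }

    // Removing a capability deliberately leaves its stale markers in
    // place, so a later re-add still sees that the old runtime services
    // have not gone away (fine point 2).
    void remove(String capability) {
        registered.remove(capability);
    }

    void markReloadRequired(String capability) {
        reloadRequired.add(capability);
    }

    void markRestartRequired(String capability) {
        restartRequired.add(capability);
    }

    boolean isStale(String capability) {
        return reloadRequired.contains(capability)
                || restartRequired.contains(capability);
    }

    // A reload clears only reload-required markers (fine point 1).
    void onReload() {
        reloadRequired.clear();
    }

    // A full restart clears everything.
    void onRestart() {
        reloadRequired.clear();
        restartRequired.clear();
    }
}
```

In the remove+add scenario, `markReloadRequired` followed by `remove` and a fresh `register` still leaves `isStale` returning true, so the add would make no runtime changes until the reload.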

blocks
WFLY-4970 After removing a resource through the CLI, one with the same name can not be added [Closed]

causes
WFCORE-1710 WFCORE-1106 work ignores fact that HostCapabilityScope exists [Resolved]

is blocked by
WFCORE-1450 ResourceBuilderRoot drops data when integrating an externally created child ResourceDefinition [Resolved]

is related to
WFCORE-3385 Cannot batch-drop an extension and its subsystem [Resolved]
JBEAP-13640 Cannot batch-drop an extension and its subsystem [Closed]
WFLY-6873 WebCERTTestsSecurityDomainSetup prevents other TestCases to deploy arquillian service [Closed]

relates to
JBEAP-1700 Unable to configure https using CLI with attribute enabled-cipher-suites [Closed]
WFLY-5608 Unable to configure https using CLI with attribute enabled-cipher-suites [Closed]
WFCORE-368 Ability for processes to be placed in a state rejecting config changes and requiring a restart [Open]

