RHEL-65662

Replication issue between masters using cert based authentication

    • Bug
    • Resolution: Unresolved
    • Normal
    • rhel-9.6
    • None
    • 389-ds-base
    • No
    • Moderate
    • rhel-sst-idm-ds

      We have a very hot case from Swift. It currently has a lot of attention from upper management on both sides. This issue has caused them to halt their migration (from HPDS to RHDS) and to move those plans to a later date.

      I encourage everyone to read the problem statement in the case as well as take a look at the topology diagram that is attached to the case.

      This is going to be quite long.

      Also, the issue is currently resolved (by rebooting the systems). This report is for root cause analysis (RCA).

      As it currently stands, we have four Red Hat Directory Servers (there are more than four, but we only care about these four):

      nledsv11 - 8.4 (Ootpa) - 389-ds-base-1.4.3.22-4.1.
      nleddv11 - 8.4 (Ootpa) - 389-ds-base-1.4.3.22-4.1.
      chedsv11 - 8.4 (Ootpa) - 389-ds-base-1.4.3.22-4.1.
      cheddv11 - 8.4 (Ootpa) - 389-ds-base-1.4.3.22-4.1.

      Newest version of the RHDS packages: 389-ds-base-1.4.3.31-6. (They WILL NOT update, because they would need to get approval, and that would take a very long time.)

      They are currently in a topology that looks like this:

      nledsv11 <<---->> chedsv11
               \      /
                \    /
                  X
                /    \
               /      \
      nleddv11 <<---->> cheddv11

      All of these replication agreements are bidirectional. There is no firewall between these four servers (according to the customer).

      There are more RHDS servers below this in the topology, but they are all consumers of the above.
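As a rough aid for reading the agreement names in the logs, the diagram can be written down as a list of links. This is only a sketch: it assumes the "X" in the diagram stands for the two diagonal links and that no other links exist between these four hosts. `LINKS`, `agreements` and `targets` are illustrative names, not part of any RHDS tooling.

```python
# Links as read from the topology diagram (assumption: the "X" means the
# two diagonal links, and there are no other links between these hosts).
LINKS = [
    ("nledsv11", "chedsv11"),
    ("nleddv11", "cheddv11"),
    ("nledsv11", "cheddv11"),
    ("nleddv11", "chedsv11"),
]


def agreements(links):
    """Expand each bidirectional link into two one-way agreements."""
    result = []
    for left, right in links:
        result.append((left, right))
        result.append((right, left))
    return result


def targets(host, links=LINKS):
    """Return the hosts that `host` pushes changes to, sorted by name."""
    return sorted(dst for src, dst in agreements(links) if src == host)


if __name__ == "__main__":
    print(len(agreements(LINKS)), "one-way agreements")
    for name in ("nledsv11", "nleddv11", "chedsv11", "cheddv11"):
        print(name, "->", ", ".join(targets(name)))
```

Under these assumptions each host holds agreements to the two hosts on the other side of the diagram, in addition to any agreements to its consumers.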

      Something happened yesterday, Dec 14th, at 21:38 ([14/Dec/2022:21:38:45.407181836 +0000]), which led to the following errors:
      _________________________________________________________________

      [14/Dec/2022:21:38:45.407181836 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1

      [14/Dec/2022:21:38:45.408106375 +0000] - ERR - slapi_ldap_init_ext - failed: unable to set SSL/TLS options

      [14/Dec/2022:21:38:45.408896518 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1

      [14/Dec/2022:21:38:45.409660453 +0000] - ERR - slapi_ldap_init_ext - failed: unable to set SSL/TLS options

      [14/Dec/2022:21:38:45.410301001 +0000] - WARN - Security Initialization - SSL alert: Sending pin request to SVRCore. You may need to run systemd-tty-ask-password-agent to provide the password.

      [14/Dec/2022:21:38:45.410998859 +0000] - WARN - Security Initialization - SSL alert: Sending pin request to SVRCore. You may need to run systemd-tty-ask-password-agent to provide the password.

      [14/Dec/2022:21:38:45.411735826 +0000] - WARN - Security Initialization - SSL alert: SSL key file ((null)) for client authentication does not exist. Using Server-Key

      [14/Dec/2022:21:38:45.412344376 +0000] - WARN - Security Initialization - SSL alert: SSL key file ((null)) for client authentication does not exist. Using Server-Key

      [14/Dec/2022:21:38:45.413005362 +0000] - WARN - Security Initialization - SSL alert: SSL cert file ((null)) for client authentication does not exist. Using Internal (Software) Token:Server-Cert

      [14/Dec/2022:21:38:45.413702495 +0000] - WARN - Security Initialization - SSL alert: SSL cert file ((null)) for client authentication does not exist. Using Internal (Software) Token:Server-Cert

      [14/Dec/2022:21:38:45.414333128 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1

      [14/Dec/2022:21:38:45.414955165 +0000] - ERR - slapi_ldap_bind - Error: could not configure the server for cert auth - error -1 - make sure the server is correctly configured for SSL/TLS

      [14/Dec/2022:21:38:45.415591512 +0000] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=From-edswift_NLEDSV11RBS.production.sipn.swift.com_389-to-edswift_CHEDDV11RBS.production.sipn.swift.com_49636-o\3Dswift" (CHEDDV11RBS:49636) - Replication bind with EXTERNAL auth failed: LDAP error 0 (Success) ()

      [14/Dec/2022:21:38:45.416286729 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1

      [14/Dec/2022:21:38:45.417004982 +0000] - ERR - slapi_ldap_bind - Error: could not configure the server for cert auth - error -1 - make sure the server is correctly configured for SSL/TLS

      [14/Dec/2022:21:38:45.417726595 +0000] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=From-edswift_NLEDSV11RBS.production.sipn.swift.com_389-to-edswift_CHEDSV11RBS.production.sipn.swift.com_49636-o\3Dswift" (CHEDSV11RBS:49636) - Replication bind with EXTERNAL auth failed: LDAP error 0 (Success) ()

      [14/Dec/2022:21:38:48.419211570 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1

      [14/Dec/2022:21:38:48.419906252 +0000] - ERR - slapi_ldap_init_ext - failed: unable to set SSL/TLS options

      [14/Dec/2022:21:38:48.420675562 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1

      [14/Dec/2022:21:38:48.421502803 +0000] - ERR - slapi_ldap_init_ext - failed: unable to set SSL/TLS options
      _________________________________________________________________
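To make the excerpt above easier to triage, here is a minimal Python sketch that splits error-log lines into timestamp, severity, source and message, then counts entries per source. The line layout is inferred from the excerpt only; `LINE_RE`, `parse_line` and `SAMPLE` are illustrative names, and the sample lines are copied from the log above.

```python
import re
from collections import Counter
from datetime import datetime

# Layout assumed from the excerpt:
# [timestamp.nanoseconds zone] - LEVEL - source - message
LINE_RE = re.compile(
    r"^\[(?P<stamp>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2})\.(?P<nanos>\d{9}) "
    r"(?P<zone>[+-]\d{4})\] - (?P<level>[A-Z]+) - (?P<source>.+?) - (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for one error-log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    # %b expects English month abbreviations (true in the C/English locale).
    when = datetime.strptime(
        f"{match['stamp']} {match['zone']}", "%d/%b/%Y:%H:%M:%S %z"
    )
    # datetime stores microseconds, so the last three nanosecond digits are dropped.
    when = when.replace(microsecond=int(match["nanos"]) // 1000)
    return {
        "time": when,
        "level": match["level"],
        "source": match["source"],
        "message": match["message"],
    }


SAMPLE = [
    "[14/Dec/2022:21:38:45.407181836 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1",
    "[14/Dec/2022:21:38:45.408106375 +0000] - ERR - slapi_ldap_init_ext - failed: unable to set SSL/TLS options",
    "[14/Dec/2022:21:38:45.410301001 +0000] - WARN - Security Initialization - SSL alert: Sending pin request to SVRCore. You may need to run systemd-tty-ask-password-agent to provide the password.",
    "[14/Dec/2022:21:38:45.414333128 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1",
    "[14/Dec/2022:21:38:45.414955165 +0000] - ERR - slapi_ldap_bind - Error: could not configure the server for cert auth - error -1 - make sure the server is correctly configured for SSL/TLS",
]

records = [rec for rec in map(parse_line, SAMPLE) if rec is not None]
by_source = Counter((rec["level"], rec["source"]) for rec in records)

if __name__ == "__main__":
    for (level, source), count in by_source.most_common():
        print(level, source, count)
```

Run against the full error log from each SOS report, the same grouping should show whether anything other than the TLS context failures was logged around 21:38.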

      I do not think these errors are the real culprit; they are only a symptom of what actually happened, which is what we are trying to figure out.

      I did get this output from each system, though, just in case:
      _________________________________________________________________

      [nmae@cheddv11 ~]$ dsconf -D "cn=directory manager" ldap://cheddv11 security certificate list
      Enter password for cn=directory manager on ldap://cheddv11:
      Certificate Name: Server-Cert
      Subject DN: CN=cheddv11rbs.production.sipn.swift.com,OU=hosts,OU=ed,OU=swiftnet,O=swift,C=ww
      Issuer DN: O=SWIFT
      Expires: 2024-11-09 14:31:11
      Trust Flags: u,u,u

      [root@chedsv11 ~]# dsconf -D "cn=directory manager" ldap://chedsv11 security certificate list
      Enter password for cn=directory manager on ldap://chedsv11:
      Certificate Name: Server-Cert
      Subject DN: CN=chedsv11rbs.production.sipn.swift.com,OU=hosts,OU=ed,OU=swiftnet,O=swift,C=ww
      Issuer DN: O=SWIFT
      Expires: 2024-11-08 17:41:57
      Trust Flags: u,u,u

      [nmae@nleddv11 ~]$ dsconf -D "cn=directory manager" ldap://nleddv11 security certificate list
      Enter password for cn=directory manager on ldap://nleddv11:
      Certificate Name: Server-Cert
      Subject DN: CN=nleddv11rbs.production.sipn.swift.com,OU=hosts,OU=ed,OU=swiftnet,O=swift,C=ww
      Issuer DN: O=SWIFT
      Expires: 2024-11-08 19:50:39
      Trust Flags: u,u,u

      [nmae@nledsv11 ~]$ dsconf -D "cn=directory manager" ldap://nledsv11 security certificate list
      Enter password for cn=directory manager on ldap://nledsv11:
      Certificate Name: Server-Cert
      Subject DN: CN=nledsv11rbs.production.sipn.swift.com,OU=hosts,OU=ed,OU=swiftnet,O=swift,C=ww
      Issuer DN: O=SWIFT
      Expires: 2024-11-03 20:11:04
      Trust Flags: u,u,u
      _________________________________________________________________

      This shows that the certs are all good (none has expired). I could not find anything on those errors other than by checking the code, and they are pretty self-explanatory. But again, I do not think this is the actual issue, just a symptom.
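The expiry part of that check can be reproduced with a small Python sketch that compares each "Expires" value from the listings above with the time of the first error. It assumes the "Expires" values and the log timestamp can be compared directly (the listing does not state a time zone), and it says nothing about validity start dates or the CA chain, which the listing does not show. `EXPIRES`, `INCIDENT` and `days_remaining` are illustrative names.

```python
from datetime import datetime

# Time of the first error on nledsv11, taken from the error log.
INCIDENT = datetime(2022, 12, 14, 21, 38, 45)

# "Expires" values copied from the dsconf output for each host.
EXPIRES = {
    "cheddv11": "2024-11-09 14:31:11",
    "chedsv11": "2024-11-08 17:41:57",
    "nleddv11": "2024-11-08 19:50:39",
    "nledsv11": "2024-11-03 20:11:04",
}


def days_remaining(expires, at=INCIDENT):
    """Whole days between `at` and the certificate expiry (negative if expired)."""
    return (datetime.strptime(expires, "%Y-%m-%d %H:%M:%S") - at).days


if __name__ == "__main__":
    for host, expires in sorted(EXPIRES.items()):
        print(host, days_remaining(expires), "days left at the time of the incident")
```

Every Server-Cert had well over a year of validity left when the errors started, which is consistent with expiry not being the trigger.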

      According to the customer, this started on NLEDSV11 and then cascaded to the others:
      _________________________________________________________________

      The replication issue started first on NLEDSV11 at 21:38 GMT.
      CHEDSV11, NLEDDV11, CHEDDV11 had the issue start at 21:41 GMT.
      The time on all 4 hosts is in sync.
      We have verified in our firewall that during the time of the replication issue no replication traffic left the EDD hosts to any of the EDZ hosts. This shows no TLS session were being created when the issue occurred.
      We did verify ldapsearch over TLS between NLEDSV11 and the other 3 suppliers during the issue still worked.
      _________________________________________________________________

      We have a very small subset of data here for this incident.

      We have an SOS report from each machine covering the time of the incident. I will post some errors from the machines below (I will keep it as minimal as I can):

      From nledsv11: these are the errors shown above; this is where the "issue" originated.

      From chedsv11:

      [14/Dec/2022:21:41:48.877700150 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1
      [14/Dec/2022:21:41:48.878515793 +0000] - ERR - slapi_ldap_init_ext - failed: unable to set SSL/TLS options
      [14/Dec/2022:21:41:48.880100385 +0000] - WARN - Security Initialization - SSL alert: Sending pin request to SVRCore. You may need to run systemd-tty-ask-password-agent to provide the password.
      [14/Dec/2022:21:41:48.881244075 +0000] - WARN - Security Initialization - SSL alert: SSL key file ((null)) for client authentication does not exist. Using Server-Key
      [14/Dec/2022:21:41:48.882295578 +0000] - WARN - Security Initialization - SSL alert: SSL cert file ((null)) for client authentication does not exist. Using Internal (Software) Token:Server-Cert
      [14/Dec/2022:21:41:48.883310045 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1
      [14/Dec/2022:21:41:48.883972537 +0000] - ERR - slapi_ldap_bind - Error: could not configure the server for cert auth - error -1 - make sure the server is correctly configured for SSL/TLS
      [14/Dec/2022:21:41:48.884538351 +0000] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=From-edswift_CHEDSV11RBS.production.sipn.swift.com_389-to-edswift_NLEDDV11RBS.production.sipn.swift.com_49636-o\3Dswift" (NLEDDV11RBS:49636) - Replication bind with EXTERNAL auth failed: LDAP er

      From nleddv11:

      [14/Dec/2022:21:41:48.994831496 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1
      [14/Dec/2022:21:41:48.995642544 +0000] - ERR - slapi_ldap_init_ext - failed: unable to set SSL/TLS options
      [14/Dec/2022:21:41:48.996939235 +0000] - WARN - Security Initialization - SSL alert: Sending pin request to SVRCore. You may need to run systemd-tty-ask-password-agent to provide the password.
      [14/Dec/2022:21:41:48.997817156 +0000] - WARN - Security Initialization - SSL alert: SSL key file ((null)) for client authentication does not exist. Using Server-Key
      [14/Dec/2022:21:41:48.998517460 +0000] - WARN - Security Initialization - SSL alert: SSL cert file ((null)) for client authentication does not exist. Using Internal (Software) Token:Server-Cert
      [14/Dec/2022:21:41:48.999324911 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1
      [14/Dec/2022:21:41:48.999912585 +0000] - ERR - slapi_ldap_bind - Error: could not configure the server for cert auth - error -1 - make sure the server is correctly configured for SSL/TLS
      [14/Dec/2022:21:41:49.000497032 +0000] - ERR - NSMMReplicationPlugin - bind_and_check_pwp - agmt="cn=From-NLEDDV11RBS.production.sipn.swift.com_389-to-edswift_NLEDZV12RBS.production.sipn.swift.com_49636-o\3Dswift" (NLEDZV12RBS:49636) - Replication bind with EXTERNAL auth failed: LDAP error 0 (Success) ()

      From cheddv11:

      [14/Dec/2022:21:41:48.906881115 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1
      [14/Dec/2022:21:41:48.907864585 +0000] - ERR - slapi_ldap_init_ext - failed: unable to set SSL/TLS options
      [14/Dec/2022:21:41:48.908927730 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1
      [14/Dec/2022:21:41:48.909495731 +0000] - ERR - slapi_ldap_init_ext - failed: unable to set SSL/TLS options
      [14/Dec/2022:21:41:48.910837517 +0000] - WARN - Security Initialization - SSL alert: Sending pin request to SVRCore. You may need to run systemd-tty-ask-password-agent to provide the password.
      [14/Dec/2022:21:41:48.911461963 +0000] - ERR - setup_ol_tls_conn - failed: unable to create new TLS context - -1
      [14/Dec/2022:21:41:48.912103750 +0000] - ERR - slapi_ldap_init_ext - failed: unable to set SSL/TLS options

      _________________________________________________________________
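To put numbers on the cascade, here is a small Python sketch that computes how long after nledsv11 each of the other hosts logged its first error. The timestamps are the first lines of the excerpts in this report, so they are an assumption about onset: the full logs may contain earlier entries. `FIRST_ERROR`, `to_datetime` and `delays` are illustrative names.

```python
from datetime import datetime

# First timestamp shown for each host in the excerpts (all +0000).
FIRST_ERROR = {
    "nledsv11": "14/Dec/2022:21:38:45.407181836",
    "chedsv11": "14/Dec/2022:21:41:48.877700150",
    "cheddv11": "14/Dec/2022:21:41:48.906881115",
    "nleddv11": "14/Dec/2022:21:41:48.994831496",
}


def to_datetime(stamp):
    """Parse a log timestamp, truncating nanoseconds to microseconds."""
    # %f accepts at most six fractional digits, so drop the last three of nine.
    return datetime.strptime(stamp[:-3], "%d/%b/%Y:%H:%M:%S.%f")


origin = to_datetime(FIRST_ERROR["nledsv11"])
delays = {
    host: (to_datetime(stamp) - origin).total_seconds()
    for host, stamp in FIRST_ERROR.items()
    if host != "nledsv11"
}

if __name__ == "__main__":
    for host, seconds in sorted(delays.items(), key=lambda item: item[1]):
        print(f"{host}: {seconds:.3f} s after nledsv11")
```

In these excerpts the three other suppliers start failing a little over three minutes after nledsv11 and within a fraction of a second of each other, which matches the customer's 21:38 and 21:41 timeline.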

      Expectations from Swift:

      They want to know why this happened, because this scared them enough to make them not want to do their migration from HPDS to RHDS.
      They would also like us to open a bug for this, which you will most likely be seeing very shortly and in which I will repeat all of this.

      I will be available if anyone has any questions. I will be watching replies here. If you need to reach me on IRC, my nick is "toasty", and if you need something else, feel free to email me directly.

      Regards,

      Billy

              spichugi@redhat.com Simon Pichugin
              rhn-support-wrydberg William Rydberg
              IdM DS Dev
              IdM DS QE
              Evgenia Martyniuk