Undertow / UNDERTOW-2121

Predicate Language regex capture all groups


    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: 2.4.0.Final
    • Fix Version/s: None
    • Component/s: Predicate Language
    • Labels: None

      When more than one regex predicate is used, only the capture groups from the last matching regex end up in the predicate context exchange attachment. This appears to be consistent with Apache mod_rewrite, for example, where %N back-references refer to the last matched RewriteCond.
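      To make the current behaviour concrete, here is a minimal Java simulation. It assumes the predicate context is a TreeMap<String, Object> keyed by group number as a string, and that each regex() predicate writes groups 0..n after a Matcher.find(). The LastRegexWins class and its applyRegex method are hypothetical stand-ins, not Undertow's RegularExpressionPredicate.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simulation of the current behaviour; not Undertow's actual
// RegularExpressionPredicate, just the same idea in isolation.
public class LastRegexWins {

    // On a match, store groups 0..n under the keys "0".."n". Any value an
    // earlier regex stored under the same key is overwritten.
    public static boolean applyRegex(Map<String, Object> context, String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return false;
        }
        for (int i = 0; i <= m.groupCount(); i++) {
            context.put(Integer.toString(i), m.group(i));
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> context = new TreeMap<>();
        String query = "p1=123&p2=abc";
        applyRegex(context, "/article.aspx", "^/article\\.aspx");
        applyRegex(context, query, "p1=([0-9]+)");
        applyRegex(context, query, "p2=([a-z]+)");
        // Only the last regex's groups survive; "/article.aspx" and "123" are gone.
        System.out.println(context); // prints {0=p2=abc, 1=abc}
    }
}
```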
       
      However, while helping a user convert some IIS rewrite rules to Undertow's Predicate Language, I noticed that the IIS URL Rewrite module can track ALL capture groups across multiple conditions via the trackAllCaptures attribute.
       
      https://docs.microsoft.com/en-us/iis/extensions/url-rewrite-module/url-rewrite-module-20-configuration-reference#tracking-capture-groups-across-conditions
       

      <rule name="Back-references with trackAllCaptures set to true">
        <match url="^article\.aspx" />
        <conditions trackAllCaptures="true">
          <add input="{QUERY_STRING}" pattern="p1=([0-9]+)" />
          <add input="{QUERY_STRING}" pattern="p2=([a-z]+)" />
        </conditions>
        <action type="Rewrite" url="article.aspx/{C:1}/{C:2}" /> <!-- rewrite action uses back-references to both conditions -->
      </rule> 

       
      I think this could be useful in Undertow as well. Any thoughts or feedback on adding a parameter to the regex() predicate, like so:
       

      regex( "^/article\.aspx" )
        and regex( value="%q", pattern="p1=([0-9]+)", track-all-captures=true )
        and regex( value="%q", pattern="p2=([a-z]+)", track-all-captures=true )
        -> rewrite( "/article.aspx/${2}/${4}" ) 

       
      In my hypothetical example above, the capture groups stored in the predicate context exchange attachment for a URL of /article.aspx?p1=123&p2=abc would be as follows:

      • 0 - /article.aspx
      • 1 - p1=123
      • 2 - 123
      • 3 - p2=abc
      • 4 - abc
         
        To implement this, I would modify the RegularExpressionPredicate class to add the number of existing items in the TreeMap to each numeric map key when track-all-captures is set to true.
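        A minimal sketch of that offset logic, assuming the predicate context is a TreeMap<String, Object> keyed by group number as a string. The trackAllCaptures flag and the TrackAllCaptures class are hypothetical; this only simulates the idea and does not use Undertow's API. Run against the example URL, it reproduces the numbering listed above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed track-all-captures behaviour; no such
// parameter exists in Undertow today.
public class TrackAllCaptures {

    public static boolean applyRegex(Map<String, Object> context, String input,
                                     String regex, boolean trackAllCaptures) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return false;
        }
        // Shift group numbers by the number of entries already stored, so
        // earlier captures are kept instead of overwritten.
        int offset = trackAllCaptures ? context.size() : 0;
        for (int i = 0; i <= m.groupCount(); i++) {
            context.put(Integer.toString(offset + i), m.group(i));
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> context = new TreeMap<>();
        String query = "p1=123&p2=abc";
        applyRegex(context, "/article.aspx", "^/article\\.aspx", false);
        applyRegex(context, query, "p1=([0-9]+)", true);
        applyRegex(context, query, "p2=([a-z]+)", true);
        // prints {0=/article.aspx, 1=p1=123, 2=123, 3=p2=abc, 4=abc}
        System.out.println(context);
    }
}
```

        One side effect of string keys: a TreeMap orders "10" before "2", so iteration order stops being numeric once more than ten groups are stored. Lookups such as ${4} are unaffected.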
         
        A possible issue is that when the regex predicate is used in combination with any other predicate that also adds predicate context values, the count would be thrown off. For example, if the first predicate in my example above were replaced with a path-prefix() predicate, which sets the "remaining" key in the predicate context exchange attachment, the regex predicates would start their capture groups at 1 instead of 0, since there would already be one item in the TreeMap. I'm not sure of a great way around this, since there is no way to tell which items in the TreeMap came from the regex predicate short of parsing each key as an integer, which feels a bit sloppy. Really, I wish the regex capture groups were namespaced inside the predicate context TreeMap to tell them apart from other values.
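        A minimal sketch of the counting problem: a size()-based offset also counts the "remaining" entry that path-prefix() stores, while counting only keys that parse as integers (the workaround mentioned above) does not. The CaptureOffset class and its methods are hypothetical, and the "remaining" value used here is illustrative only.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a non-regex entry such as path-prefix()'s "remaining"
// key skews a size()-based offset; counting integer keys avoids that.
public class CaptureOffset {

    // Naive offset: counts every entry in the context, including "remaining".
    public static int naiveOffset(Map<String, Object> context) {
        return context.size();
    }

    // Workaround: only count keys that parse as integers (regex groups).
    public static int numericKeyOffset(Map<String, Object> context) {
        int count = 0;
        for (String key : context.keySet()) {
            if (isGroupKey(key)) {
                count++;
            }
        }
        return count;
    }

    static boolean isGroupKey(String key) {
        try {
            Integer.parseInt(key);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> context = new TreeMap<>();
        // Simulate path-prefix() having stored a "remaining" value (illustrative).
        context.put("remaining", "");
        System.out.println(naiveOffset(context));      // 1: first tracked regex would start at "1"
        System.out.println(numericKeyOffset(context)); // 0: first tracked regex starts at "0"
    }
}
```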
         

            Assignee: Flavia Rainone (flaviarnn)
            Reporter: Brad Wood (bdw429s)
            Votes: 0
            Watchers: 2
