  1. Product Technical Learning
  2. PTL-5834

RH134-513: ch01s05 mixes basic and extended regex syntax without clarification


      URL:
      Reporter RHNID:
      Section: 5 -
      Language: en-US (English)
      Workaround:

      Description: In ch01s05, "Match Text in Command Output with Regular Expressions", the chapter focuses on using the grep command with basic regular expression syntax, as documented in the regex(7) man page. (grep uses basic regex syntax unless it is called with -E, which this section never does.)

      However, the mistitled table "Regular Expressions in Bash" uses extended regular expression syntax.

      For example, the book has an earlier example with the following basic regular expression:

      'c.\{2\}t'

      which matches a 'c', any two characters, and a 't'. As an extended regular expression (using grep -E), the equivalent syntax would be

      'c.{2}t'

      but we are using basic regular expressions throughout.
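
The difference can be checked directly. The following is a minimal sketch, assuming GNU grep; the sample words (cat, coat, chart, ca{2}t) are made up for illustration and do not come from the book:

```shell
# Hypothetical sample input, one word per line (not from the course text).
words=$(printf '%s\n' cat coat chart 'ca{2}t')

# Basic syntax (grep default): \{2\} is a bound, so this is
# "c", any two characters, then "t".
bre_matches=$(printf '%s\n' "$words" | grep 'c.\{2\}t')

# Extended syntax (grep -E): the unescaped {2} is the bound.
ere_matches=$(printf '%s\n' "$words" | grep -E 'c.{2}t')

# Basic syntax with unescaped braces: { and } are ordinary characters,
# so this looks for "c", any one character, the literal text "{2}", then "t".
literal_matches=$(printf '%s\n' "$words" | grep 'c.{2}t')

printf 'basic:          %s\n' "$bre_matches"
printf 'extended:       %s\n' "$ere_matches"
printf 'basic, no "\\":  %s\n' "$literal_matches"
```

With this input, the first two commands select the same word, while the third selects only the word that contains literal braces. That is the behavior a reader would hit when typing the table's {n} form into plain grep.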

       

      The table has lines like this that use extended regular expression syntax:

      {n}      The preceding item is matched exactly n times.

       
      However, the correct table entry in basic regular expression syntax is: 

      \{n\}    The preceding item is matched exactly n times.

       
       
      Similar modifications are needed for the following examples that use curly brace syntax.  Note carefully what regex(7) says about basic ("obsolete") regex syntax: 

             Obsolete ("basic") regular  expressions  differ  in  several  respects.
             '|',  '+',  and  '?' are ordinary characters and there is no equivalent
             for their functionality.  The delimiters for bounds are "\{" and  "\}",
             with  '{'  and  '}' by themselves ordinary characters.  The parentheses
             for nested subexpressions are "\(" and "\)", with '(' and ')' by  them‐
             selves ordinary characters.  '^' is an ordinary character except at the
             beginning of the RE or(!) the beginning of a  parenthesized  subexpres‐
             sion,  '$'  is  an ordinary character except at the end of the RE or(!)
             the end of a parenthesized subexpression, and '*' is an ordinary  char‐
             acter  if  it  appears at the beginning of the RE or the beginning of a
             parenthesized subexpression (after a possible leading '^').
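
The rules quoted above can be illustrated with a short sketch. It assumes GNU grep, and the sample lines are hypothetical:

```shell
# Hypothetical sample input, one string per line.
lines=$(printf '%s\n' abab 'a+b' aab '(ab)')

# \( \) delimit a subexpression and \{2\} is a bound on it:
# the whole line must be "ab" repeated exactly twice.
grouped=$(printf '%s\n' "$lines" | grep -x '\(ab\)\{2\}')

# An unescaped + is an ordinary character in basic syntax,
# so this matches the literal line "a+b" and not "aab".
plus_literal=$(printf '%s\n' "$lines" | grep -x 'a+b')

# Unescaped parentheses are ordinary characters in basic syntax.
parens_literal=$(printf '%s\n' "$lines" | grep -x '(ab)')

printf '%s\n' "$grouped" "$plus_literal" "$parens_literal"
```

The -x option restricts matches to whole lines, which keeps each result to a single sample string.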
      

      The grep man page also says this (the list of meta-characters apparently omits the closing curly brace):

         Basic vs Extended Regular Expressions
             In basic regular expressions the meta-characters ?, +, {, |, (,  and  )
             lose  their  special  meaning; instead use the backslashed versions \?,
             \+, \{, \|, \(, and \).
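
A brief sketch of the backslashed forms that the grep man page describes. This assumes GNU grep specifically, since \? and \| in basic syntax are the behavior documented in that man page rather than in regex(7); the sample words are hypothetical:

```shell
# Hypothetical sample input, one word per line.
lines=$(printf '%s\n' color colour colouur gray)

# \? makes the preceding "u" optional in basic syntax (GNU grep).
optional_u=$(printf '%s\n' "$lines" | grep -x 'colou\?r')

# \| is alternation in basic syntax (GNU grep).
either=$(printf '%s\n' "$lines" | grep -x 'color\|gray')

printf 'optional u:\n%s\n' "$optional_u"
printf 'alternation:\n%s\n' "$either"
```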
      

       

              glsbugs-hybridcloud@redhat.com PTL - RHEL Team
              rht-sbonnevi Steven Bonneville