C++ class template for PCRE (Perl compatible regular expressions)

This class performs full-text search-and-replace and text extraction using the Perl Compatible Regular Expressions (PCRE) library. It links against PCRE and wraps it so that search pattern, replacement text and modifiers are given in one string, as known from Perl (s/pattern/replace/modifiers). This makes it easy to add user-definable text pre/post processing (e.g. via a configuration file) to your C++ software:

Note: Since C++11 the standard library provides regular expressions (#include <regex>).

// Config file, command line, or the like:
// preprocess= s/^ .*? (find\n) .* $/replace: \1/smix
 
// C++ Program:
std::string text_to_preprocess = "...";
 
// apply_to() modifies text_to_preprocess in place (passed by reference)
sw::pcre_regex(config.preprocess_regex()).apply_to(text_to_preprocess);

Note: For regular expressions that are defined in your C++ source code, it is better to use the pcrecpp::RE class directly. This is more flexible and avoids unnecessary overhead. The pcre/pcrecpp authors did a really good job on the interface.

Files

pcre.hh Example program: pcref.cc Makefile

Class source code

/**
 * @package de.atwillys.cc.swl
 * @license BSD (simplified)
 * @author Stefan Wilhelm (stfwi)
 *
 * @file pcre.hh
 * @ccflags -Ipcre/include -Wno-long-long
 * @ldflags -lpcrecpp || libpcrecpp.a libpcre.a
 * @platform linux, bsd, windows
 * @standard >= c++98
 *
 * -----------------------------------------------------------------------------
 *
 * PCRE wrapper class template with implicit pattern parsing for text
 * extraction / replacement.  As search/match/replace/extract specifications
 * are given as one string, this class is suitable to be easily used as user
 * definable pre/post processing, e.g. via command line arguments or configuration
 * files.
 *
 * Perl-like patterns e.g.:
 *
 *  - '/pattern/mods'              returns first match
 *  - '/pattern/extract/mods'      returns first match (with replacement spec)
 *  - 's/pattern/replace/mods'     replaces first occurrence (all with `g`)
 *
 *  - allowed separators: `/`, `|`, `#`, `%` (e.g. `m|pattern|opts`)
 *
 *  - allowed modifiers:
 *
 *   `i`  Ignore case (as in Perl).
 *   `x`  Permit whitespace and comments in the pattern (as in Perl).
 *   `m`  Multi line: `^` and `$` match start/end of each line (as in Perl).
 *   `s`  `.` matches newlines as well (as in Perl).
 *   `g`  Global: replace all occurrences, not only the first one.
 *   `$`  `$` matches only at the very end of the text.
 *   `!`  Meaning of `*?` and `*` swapped (`*?` now consumes as much as possible).
 *   `*`  Disable parenthesis (subexpression) capturing.
 *   `X`  Extra (PCRE strict escape parsing).
 *   `U`  Disable UTF-8 support.
 *
 * Pattern examples:
 *
 *  "/([xy])=([\\d\\.e]+)/\\1:\\2/"     Extract first of x,y=float, reformat `=` to `:`
 *  "m/([xy])=([\\d\\.e]+)/$1:$2/"     Same as above
 *  "s/([xyz])=([\\d\\.e]+)/\\1:\\2/g" Replace all x,y,z=float from `=` to `:`
 *  "s| [\\n]*(abc) [\\s]* |X|smix"    Replace abc with X, ignore case, multiline
 *
 * Usage example:
 *
 *  pcre_regex re;
 *  re.pattern(my_pattern).apply_to(string_reference);
 *  if(re.ok()) { ... } else { throw re.error(); }
 *
 * Template specialisation (std::string):
 *
 *  - typedef detail::basic_pcre<std::string> pcre_regex;
 *
 * -----------------------------------------------------------------------------
 *
 * Hint: Getting/building PCRE from source
 *
 * Add the target `update-pcre` shown below to your makefile. It retrieves the
 * sources from the official SVN repository into the subdirectory `pcre`, builds
 * them, and strips everything except the includes and libs.
 *
 *   +++ Makefile +++
 *
 *   .PHONY: update-pcre
 *   update-pcre:
 *     @-rm -rf pcre
 *     @mkdir pcre
 *     @cd pcre; svn co svn://vcs.exim.org/pcre/code/trunk src
 *     @cd pcre/src; ./autogen.sh
 *     @cd pcre/src; ./configure --enable-utf --prefix=$(shell pwd)/pcre/
 *     @cd pcre/src; make
 *     @cd pcre/src; make install
 *     @cd pcre/src; make clean
 *     @cd pcre; rm -rf bin
 *     @cd pcre; rm -rf share
 *     @cd pcre/lib; rm -rf pkgconfig
 *     @cd pcre; rm -rf src
 *
 * -----------------------------------------------------------------------------
 * +++ BSD license header +++
 * Copyright (c) 2009-2014, Stefan Wilhelm (stfwi, <cerbero s@atwilly s.de>)
 * All rights reserved.
 * Redistribution and use in source and binary forms, with or without modification,
 * are permitted provided that the following conditions are met: (1) Redistributions
 * of source code must retain the above copyright notice, this list of conditions
 * and the following disclaimer. (2) Redistributions in binary form must reproduce
 * the above copyright notice, this list of conditions and the following disclaimer
 * in the documentation and/or other materials provided with the distribution.
 * (3) Neither the name of atwillys.de nor the names of its contributors may be
 * used to endorse or promote products derived from this software without specific
 * prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
 * AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
 * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER
 * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
 * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
 * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
 * DAMAGE.
 * -----------------------------------------------------------------------------
 */
#ifndef SW__PCRE_HH
#define SW__PCRE_HH
 
#include <pcrecpp.h>
#include <cctype>
#include <string>
#include <iostream>
#include <vector>
 
namespace sw { namespace detail {
 
template <typename str_t>
class basic_pcre {
public:
 
  /**
   * Construct empty PCRE
   */
  inline basic_pcre() : is_replace_(false), is_global_(false), sep_('/'),
      separators_("/|#%"), pattern_(), srch_(), flgs_(), repl_(), error_(),
      re_("")
  { ; }
 
  /**
   * PCRE with pattern to compile (immediately compiled)
   * @param const str_t pattern__
   */
  inline basic_pcre(const str_t pattern__) : is_replace_(false), is_global_(false),
      sep_('/'), separators_("/|#%"), pattern_(), srch_(), flgs_(),
      repl_(), error_(), re_("")
  { pattern(pattern__); }
 
  /**
   * Copy constructor
   * @param re__
   */
  inline basic_pcre(const basic_pcre &re__) : is_replace_(re__.is_replace_),
      is_global_(re__.is_global_), sep_(re__.sep_), separators_(re__.separators_),
      pattern_(re__.pattern_), srch_(re__.srch_), flgs_(re__.flgs_), repl_(re__.repl_),
      error_(re__.error_), re_(re__.re_)
  { ; }
 
  /**
   * Destructor
   */
  virtual ~basic_pcre()
  { ; }
 
public:
 
  /**
   * Returns the complete pattern given
   * @return const str_t &
   */
  inline const str_t & pattern() const
  { return pattern_; }
 
  /**
   * Returns parsed search part of the pattern
   * @return const str_t &
   */
  inline const str_t & search() const
  { return srch_; }
 
  /**
   * Returns parsed replace part of the pattern (empty if no replace)
   * @return const str_t &
   */
  inline const str_t & replace() const
  { return repl_; }
 
  /**
   * Returns search/replace options part of the pattern
   * @return const str_t &
   */
  inline const str_t & modifiers() const
  { return flgs_; }
 
  /**
   * Returns an error text, empty string if no error
   * @return const str_t &
   */
  inline const str_t & error() const
  { return error_; }
 
  /**
   * Returns true if there is no error.
   * @return bool
   */
  inline bool ok() const
  { return error_.empty(); }
 
  /**
   * Returns true if global search/replace (all occurrences, not only first one)
   * is set.
   * @return bool
   */
  inline bool is_global() const
  { return is_global_; }
 
  /**
   * Returns true if the pattern says that the expression shall replace, not
   * search.
   * @return bool
   */
  inline bool is_replace() const
  { return is_replace_; }
 
public:
 
  /**
   * Quote a string
   * @param const str_t& s
   * @return str_t
   */
  inline static str_t quote(const str_t& s)
  { return pcrecpp::RE::QuoteMeta(pcrecpp::StringPiece(s)); }
 
public:
 
  /**
   * Resets the object, clear all contents.
   * @return basic_pcre& *this
   */
  inline basic_pcre& clear()
  { pattern_ = srch_ = repl_ = flgs_ = error_ = ""; is_replace_ = is_global_ = false; return *this; }
 
  /**
   * Sets the pattern to search/replace, parses the pattern components
   * and compiles the regex string. Does not explicitly throw exceptions,
   * but sets an error string fetchable using `error()`.
   * @param const str_t &pattern
   * @return basic_pcre& *this
   */
  basic_pcre& pattern(const str_t &pattern)
  {
    str_t pt, op, rp; // pattern, options, replace
    char sep; // separator
    bool is_rp = false, is_match = false;
    clear();
 
    pattern_ = pt = pattern; // e.g. [sm]/^(.*?)$//[ig]
 
    if(pt.length()<3) { // e.g. "//" or "||"
      error_ = "Empty pattern";
      return *this;
    }
 
    // Optional first pattern characters 's', 'm'
    if(pt[0] == 's') { // definitely search replace, otherwise check
      is_rp = true;
      pt = pt.length()>1 ? pt.substr(1) : "";
    } else if(pt[0] == 'm') { // definitely match
      is_match = true;
      pt = pt.length()>1 ? pt.substr(1) : "";
    }
    // Pattern separator detection
    if(str_t(separators_).find(pt[0]) == str_t::npos) {
      error_ = "Invalid pattern (must start with one of the separators: ";
      error_ += separators_ + ")";
      return *this;
    }
    sep = pt[0];
    pt = pt.substr(1);
 
    const str_t aflags = "ixsmU!$X*g"; // allowed modifiers/flags/options
    unsigned k;
    for(k=pt.length()-1; k>1; k--) {
      if(pt[k] == sep) break;
      if(aflags.find_first_of(pt[k]) == str_t::npos) {
        error_ = str_t("Unknown modifier/option '") + pt[k] + "'";
        return *this;
      }
    }
    if(pt[k] != sep) {
      error_ = "Empty pattern";
      return *this;
    }
 
    if(k<pt.length()-1) op = pt.substr(k+1);
    pt = pt.substr(0, k);
 
    do {
      str_t pt1 = pt + "\\"; // Temporary \ to fit in tailing backslashes
      pt.clear(); pt.reserve(pt1.length()*2);
 
      // Search pattern: First unescaped character means "end of search",
      // except if explicitly defined only search using 'm' as first pattern
      // character.
      bool done = false;
      for(k=0; !done && k<pt1.length()-1; k++) {
        switch(pt1[k]) {
          case '\\':
            if(pt1[k+1]=='\\' || pt1[k+1]==sep) {
              pt.push_back(pt1[++k]);
            } else {
              pt.push_back(pt1[k]);
            }
            break;
          case '\n':
            // re-escape (depends on shell)
            pt += "\\n";
            break;
          case '\t':
            pt += "\\t";
            break;
          case '\r':
            pt += "\\r";
            break;
          default:
            if(!is_match && pt1[k]==sep) {
              done = true;
            } else { // user didn't escape the separator, try to "see it right".
              pt.push_back(pt1[k]);
            }
        }
      }
 
      if((k<pt1.length()-1 || done)) { // last char is "\"
        // Replace pattern: rest of it, unescaped separators ignored and used
        // as normal character, but escaping allowed.
        // References: $1... and \1... allowed.
        for(; k<pt1.length()-1; k++) {
          switch(pt1[k]) {
            case '\\':
              switch(pt1[k+1]) {
                case '\\': rp.push_back(pt1[++k]); break;
                case '$': rp.push_back(pt1[++k]); break;
                case '0': rp.push_back('\0'); ++k; break;
                case 'a': rp.push_back('\a'); ++k; break;
                case 'b': rp.push_back('\b'); ++k; break;
                case 'f': rp.push_back('\f'); ++k; break;
                case 'n': rp.push_back('\n'); ++k; break;
                case 'r': rp.push_back('\r'); ++k; break;
                case 't': rp.push_back('\t'); ++k; break;
                case 'v': rp.push_back('\v'); ++k; break;
                default:
                  rp.push_back( (pt1[k+1]==sep) ? pt1[++k] : pt1[k]);
              }
              break;
            case '$':
              rp += (std::isdigit(static_cast<unsigned char>(pt1[k+1]))) ? "\\" : "$"; // replace $1 --> \1
              break;
            default:
              rp.push_back(pt1[k]);
          }
        }
      }
    } while(0);
 
    if(pt.empty()) {
      error_ = "Empty pattern";
      return *this;
    }
 
    pcrecpp::RE_Options opts;
    opts.set_match_limit(10000); // const for now.
    opts.set_match_limit_recursion(500); // const for now.
    opts.set_caseless((op.find('i') != str_t::npos));
    opts.set_utf8((op.find('U') == str_t::npos));
    opts.set_extended((op.find('x') != str_t::npos));
    opts.set_dotall((op.find('s') != str_t::npos));
    opts.set_multiline((op.find('m') != str_t::npos));
    opts.set_ungreedy((op.find('!') != str_t::npos));
    opts.set_dollar_endonly((op.find('$') != str_t::npos));
    opts.set_extra((op.find('X') != str_t::npos));
    opts.set_no_auto_capture((op.find_first_of("*") != str_t::npos));
    re_ = pcrecpp::RE(pt, opts);
    is_global_ = is_rp && (op.find_first_of("g") != str_t::npos);
    is_replace_ = is_rp;
    srch_ = pt;
    repl_ = rp;
    flgs_ = op;
    if(!re_.error().empty()) error_ = re_.error();
    return *this;
  }
 
  /**
   * Applies the search/replace pattern regex to the given string.
   * THE STRING WILL BE MODIFIED.
   * @param str_t & subject
   * @return basic_pcre& *this
   */
  basic_pcre& apply_to(str_t &subject)
  {
    if(!error_.empty()) return *this;
    if(srch_.empty()) { error_ = "No pattern given to search/replace."; return *this; }
 
    if(is_replace_) {
      if(is_global_) {
        re_.GlobalReplace(repl_, &subject);
      } else {
        re_.Replace(repl_, &subject);
      }
    } else {
      str_t out;
      str_t rep = repl_.empty() ? str_t("\\0") : repl_;
      re_.Extract(rep, subject, &out);
 
 
      subject = out;
    }
    if(!re_.error().empty()) {
      error_ = re_.error();
      subject = "";
    }
    return *this;
  }
 
  /**
   * Applies the search/replace pattern regex to a copy of the given string and
   * returns the result (empty string on error). The given string is not modified.
   * @param const str_t& subject
   * @return str_t
   */
  str_t operator () (const str_t& subject)
  { str_t s=subject; apply_to(s); if(!ok()) s=""; return s; }
 
protected:
 
  bool is_replace_;
  bool is_global_;
  typename str_t::value_type sep_;
  str_t separators_; // Allowed separators
  str_t pattern_;    // The whole pattern given
  str_t srch_;
  str_t flgs_;
  str_t repl_;
  str_t error_;
  pcrecpp::RE re_;   // PCRE main object
 
};
 
/**
 * ostream <<
 * @param std::basic_ostream<typename str_t::value_type>& os
 * @param const basic_pcre<str_t> re
 * @return std::basic_ostream<typename str_t::value_type>&
 */
template <typename str_t>
std::basic_ostream<typename str_t::value_type>& operator << (
   std::basic_ostream<typename str_t::value_type>& os,
   const basic_pcre<str_t>& re
)
{
  #define nl std::endl
  os << nl << "{" << nl
     << " - ok: " << (re.ok() ? "yes" : "no") << nl
     << " - pattern: \"" << re.pattern() << "\"" << nl;
  if(re.is_replace()) {
    os << " - search: \""  << re.search() << "\"" << nl;
  } else {
    os << " - match: \""  << re.search() << "\"" << nl;
  }
  if(!re.replace().empty()) {
    os << " - replace: \"" << re.replace() << "\"" << nl;
  }
  os << " - modifiers: \"" <<  re.modifiers() << "\"," << nl;
  if(re.is_global()) {
    os << "   - global: replace all matches (`g`)" << nl;
  } else {
    os << "   - not global: replace only first match (no `g`)" << nl;
  }
  if(re.modifiers().find('i') != str_t::npos) {
    os << "   - case insensitive matching (`i`)" << nl;
  } else {
    os << "   - case sensitive matching (no `i`)" << nl;
  }
  if(re.modifiers().find('x') != str_t::npos) {
    os << "   - whitespace and comments in pattern permitted (`x`)" << nl;
  } else {
    os << "   - comments/unmatched spaces in pattern not permitted (no `x`)" << nl;
  }
  if(re.modifiers().find('m') != str_t::npos) {
    os << "   - Multiline (^/$ match start/end of each line) (`m`)" << nl;
  } else {
    os << "   - Single line (^/$ match start/end of the whole text) (no `m`)" << nl;
  }
  if(re.modifiers().find('s') != str_t::npos) {
    os << "   - `.` matches newlines as well (`s`)" << nl;
  } else {
    os << "   - `.` does not match newlines (no `s`)" << nl;
  }
  if(re.modifiers().find('$') != str_t::npos) {
    os << "   - `$` matches only at the end. (`$`)" << nl;
  }
  if(re.modifiers().find('!') != str_t::npos) {
    os << "   - Meaning of `*?` and `*` swapped. (`!`)" << nl;
  }
  if(re.modifiers().find('*') != str_t::npos) {
    os << "   - Sub pattern matching disabled (`*`)" << nl;
  }
  if(re.modifiers().find('X') != str_t::npos) {
    os << "   - (PCRE:) Extra strict pattern escape parsing. (`X`)" << nl;
  }
  if(re.modifiers().find('U') != str_t::npos) {
    os << "   - UTF support disabled. (`U`)" << nl;
  } else {
    os << "   - UTF support enabled. (no `U`)" << nl;
  }
  os << "}" << nl;
  return os;
  #undef nl
}
 
}}
 
namespace sw {
  typedef detail::basic_pcre<std::string> pcre_regex;
}
#endif

Example program

/**
 * @package de.atwillys.cc.app
 * @license BSD (simplified)
 * @author stfwi
 *
 * @ccflags: -Ipcre/include -Wno-long-long
 * @ldflags: pcre/lib/libpcrecpp.a pcre/lib/libpcre.a
 *
 * -----------------------------------------------------------------------------
 *
 * PCRE based text filter.
 *
 * -----------------------------------------------------------------------------
 * +++ BSD license header (You know that ...) +++
 * Copyright (c) 2013, StfWi
 * All rights reserved.
 * Redistribution and use in source and binary forms, with or without modification,
 * are permitted provided that the following conditions are met: (1) Redistributions
 * of source code must retain the above copyright notice, this list of conditions
 * and the following disclaimer. (2) Redistributions in binary form must reproduce
 * the above copyright notice, this list of conditions and the following disclaimer
 * in the documentation and/or other materials provided with the distribution.
 * (3) Neither the name of atwillys.de nor the names of its contributors may be
 * used to endorse or promote products derived from this software without specific
 * prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
 * AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
 * BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
 * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER
 * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
 * OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
 * WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
 * DAMAGE.
 * -----------------------------------------------------------------------------
 */
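#include "pcre.hh"
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#if defined(__linux__) || (defined(__APPLE__) && defined(__MACH__))
#define IS_NIX
#include <unistd.h>     // read(), STDIN_FILENO
#include <sys/select.h> // select(), fd_set
#endif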
 
#define APP_NAME "pcref"
#define APP_VER  "1.0"
 
using namespace std;
 
/**
 * Prints help text
 */
void help()
{
  #define nl  << std::endl
  #define nl2 << std::endl << std::endl
  #define an << APP_NAME <<
  cerr
  << "NAME" nl2
  << "  " << APP_NAME nl2
  << "SYNOPSIS" nl2
  << "  " an " [-h|--help|-v] '<pattern1>' ['<pattern2>'] [...]" nl2
  << "DESCRIPTION" nl2
  << "  Perl Compatible Regular Expression text Filter." nl2
  << "  The program allows performing match-extract / search-replace operations" nl
  << "  with pattern known from PCRE (or Perl: stdout = stdin =~ <pattern>), where" nl
  << "  one of [`/`, `|`, `#`] can be chosen as pattern separator." nl2
  << "    `" an " 'm/pattern/'` or `" an " '/pattern/'`: Prints first match to" nl
  << "     stdout." nl2
  << "    `" an " 's/pattern/replace/'`: Replaces all occurrences of the pattern with" nl
  << "    the replace text (accepted subexpression references are `\\1`,`\\2`, etc," nl
  << "    and `$1`,`$2` etc, both have the same meaning)." nl2
  << " Modifiers:" nl2
  << "   Modifiers are appended to the pattern as known from Perl / PCRE" nl
  << "   (`" an " /pattern/modifiers` or `" an " /pattern/replace/modifiers`)." nl2
  << "   `i`  Ignore case (as in Perl)." nl
  << "   `x`  Permit whitespace and comments in the pattern (as in Perl)." nl
  << "   `m`  Multi line: `^` and `$` match start/end of each line (as in Perl)." nl
  << "   `s`  `.` matches newlines as well (as in Perl)." nl2
  << "   `1`  (Character 'one') Extract/replace only first match, not the whole text." nl
  << "   `$`  `$` matches only at the end (else normal dollar sign)." nl
  << "   `!`  Meaning of `*?` and `*` swapped (`*?` now consumes as much as possible)." nl
  << "   `*`  Disable parenthesis (subexpression) matching." nl
  << "   `X`  Extra (PCRE strict escape parsing)." nl
  << "   `U`  Disable UTF support." nl2
  << " Sequential execution:" nl2
  << "   You can specify multiple expressions as command line arguments, they" nl
  << "   will be processed sequentially, and the final result will be printed" nl
  << "   to stdout. E.g." nl2
  << "     echo 'ABC DEF YES' | " an " 's/ABC[\\s]?/X/' '/(\\w+)\\s(\\w+)/$1=$2/'" nl
  << "                                ( --> XDEF YES)    ( --> XDEF=YES)" nl2
  << " Examples:" nl2
  << "   - Remove trailing spaces of each line:" nl2
  << "     " an " 's/^(.*?)[\\s]+(\\n|$)/$1$2/m'" nl2
  << "   - Extract body from HTML:" nl2
  << "     " an " '|< [\\s]* body .*? > (.*?) <[\\s]* / [\\s]* body |$1|smix1'" nl2
  << "   - Section of an ini-file to json object:" nl2
  << "     " an " '/(.*)/\\n$1\\n/sm' \\" nl
  << "           '/.*? \\n \\[SECTION_NAME\\] [\\s]* (.*?) \\n (\\[|$) /$1/smix1' \\" nl
  << "           's/^([\\w]+) [\\s]* = [\\s]* (.*) ($|\\n)/$1: \"$2\"/imx' \\" nl
  << "           's#\\n#, #imx' \\" nl
  << "           'm|(.*)|{ $1 }|'" nl2
  << " Annotations:" nl2
  << "   - The replace function is global by default, as this is the most often" nl
  << "     used. You can switch it off to replace only one using the modifier `1`." nl2
  << "   - The match operation (optionally) takes a replace part to rearrange" nl
  << "     the matched string using subexpressions (`m/<pattern>/replace/mods`)," nl
  << "     so that the match operation is practically an extract operation." nl2
  << "   - Replace returns the input string if no pattern matches, extract an" nl
  << "     empty string if a pattern does not match." nl2
  << "   - The program always reads the complete text (to memory) before processing." nl
  << "     Hence, large texts cause a higher memory consumption." nl2
  << "   - On error the program does not return any text to stdout." nl2
  << "   - The program understands common escape sequences in the replace text:" nl
  << "     \\n, \\r, \\t, \\v, \\f, \\a, \\b." nl2
  << "ARGUMENTS" nl2
  << "  -h,  --help     Show this help" nl2
  << "  -v,  --verbose  Increased verbosity (outputs to stderr)" nl2
  << "  -vv, --debug    High verbosity (debug information if compiled with)" nl2
  << "  <pattern>       A perl compatible regex pattern as described above." nl2
  << "RETURN VALUES" nl2
  << "  returns 0 on success," nl
  << "          1 on error" nl2
  << "SEE ALSO" nl2
  << "  perlre, pcregrep, grep, egrep, sed, awk, ex" nl2
  << APP_NAME << " v" << APP_VER << ", stfwi; credits to libpcre author(s)." nl
  ;;
  #undef nl
  #undef nl2
  #undef an
}
 
typedef std::string str_t;
typedef std::vector<sw::pcre_regex> pcre_vector;
 
/**
 * Main
 * @param int argc
 * @param char** argv
 * @return int
 */
int main(int argc, char** argv)
{
  try {
 
    // Command line arguments
    if(argc < 2) throw "No expression given (try " APP_NAME " --help)";
    str_t s;
    pcre_vector rx;
    int verbosity = 0;
 
    // Command line first arg (the very rudimentary way ...)
    int i=1;
    if(argc > 1 && argv[1]) {
      str_t arg = argv[1];
      if(arg == "-h" || arg == "--help") {
        help();
        return 1;
      } else if(arg == "-v" || arg == "--verbose") {
        verbosity = 1;
        i++;
      } else if(arg == "-vv" || arg == "-v2" || arg == "--debug") {
        verbosity = 2;
        i++;
      }
    }
 
    // Assign and parse patterns before dealing with the text
    for(; i<argc && argv[i]; i++) {
      rx.push_back(sw::pcre_regex(argv[i]));
      if(!rx.back().ok()) {
        s = "Expression "; // ref s existing in main()
        if(i>=10) s.push_back('0'+(i/10));
        s.push_back('0'+(i%10));
        s += ": ";
        s += rx.back().error();
        throw s;
      }
    }
 
    #ifdef IS_NIX
    fd_set fds; struct timeval t; t.tv_sec = 2; t.tv_usec = 0;
    FD_ZERO(&fds); FD_SET(STDIN_FILENO, &fds);
    if(select(2, &fds, NULL, NULL, &t) <= 0 || !FD_ISSET(STDIN_FILENO, &fds)) {
      throw "Pipe in your text data.";
    }
    int n = 0; char buf[512]; buf[511] = '\0';
    while((n=::read(STDIN_FILENO, buf, 511)) > 0) { buf[n]='\0'; s += buf; }
    if(n!=0) throw "Failed to read from stdin";
    #else
    s.clear();
    char c; while(cin.get(c)) s += c;
    #endif
 
    // Verbose: print before applying expressions
    if(verbosity > 0) {
      for(unsigned i=0; i<rx.size(); i++) {
        cerr << "Expression " << ((int)(i+1)) << ": " << rx[i];
      }
    }
 
    for(unsigned i=0; i<rx.size(); i++) {
      sw::pcre_regex &re = rx[i];
      if(!re.apply_to(s).ok()) { // apply_to() modifies s in place
        s.clear(); // reassign s, output no more valid.
        s.reserve(32);
        s = "Expression ";
        if(i+1>=10) s.push_back('0'+((i+1)/10));
        s.push_back('0'+((i+1)%10));
        s += ": ";
        s += re.error();
        throw s;
      }
    }
 
    cout << s;
  } catch(const str_t &e) {
    cerr << "Error: " << e << endl;
    return 1;
  } catch(const char *e) {
    cerr << "Error: " << e << endl;
    return 1;
  }
  return 0;
}

Makefile

CC=g++ -c
LD=g++
CCFLAGS=-Wall -O3 -pedantic -Wno-long-long
LDFLAGS=
OUTNAME=pcref
INC=-Ipcre/include
LIBS=pcre/lib/libpcrecpp.a pcre/lib/libpcre.a
 
#-------------------------------------------------------------------------------
# OS specific
#-------------------------------------------------------------------------------
ifeq ($(shell uname), Linux)
LDFLAGS+= -static-libgcc -s
#LDFLAGS+= -static
endif
ifeq ($(shell uname), Darwin)
endif
#-------------------------------------------------------------------------------
 
.PHONY: all
all: $(OUTNAME) clean
 
.PHONY: clean
clean:
    @-rm -f *.o >/dev/null 2>&1
 
.PHONY: info
info: $(OUTNAME)
    @# Well let's use the program directly to format its info ...
    @file $(OUTNAME) | ./$(OUTNAME) '/^.*?:[\s]*(.*)/ - \1/ms' 's/,/\n -/ms'
    -@readelf -d $(OUTNAME) 2>/dev/null | grep -i shared | ./$(OUTNAME) \
          's/^.*?:[\s]*\[(.*?)\].*$$/\1/m' 's/[\s]+$$//sm' 's/\n/, /sm' \
          's/^(.*?)$$/ - dependencies: \1\n/sm'
    @du -h $(OUTNAME) | ./$(OUTNAME) '/^([\w\d\.]+)/ - size: $$1\n/smx'
 
.PHONY: test
test: $(OUTNAME)
    @cd test; ./test.sh
 
#.PHONY: $(OUTNAME)
$(OUTNAME): main.cc
    @$(CC) -o main.o main.cc $(CCFLAGS) $(INC)
    @$(LD) -o $(OUTNAME) main.o $(LDFLAGS) $(LIBS)
    -@rm -f main.o
    @echo Output binary is named \"$(OUTNAME)\"
 
.PHONY: get-pcre
get-pcre:
    @-rm -rf pcre
    @mkdir pcre
    @cd pcre; svn co svn://vcs.exim.org/pcre/code/trunk src
    @cd pcre/src; ./autogen.sh
    @cd pcre/src; ./configure --enable-utf --prefix=$(shell pwd)/pcre/
    @cd pcre/src; make
    @cd pcre/src; make install
    @cd pcre/src; make clean
    @cd pcre; rm -rf bin
    @cd pcre; rm -rf share
    @cd pcre/lib; rm -f *.dylib
    @cd pcre/lib; rm -f *.la
    @cd pcre/lib; rm -rf pkgconfig
    @cd pcre; rm -rf src
