C++ class template for PCRE (Perl compatible regular expressions)
Using this class you can easily add user-definable text pre/post-processing (e.g. driven by a configuration file) to your C++ software. It wraps the Perl Compatible Regular Expressions (PCRE) library, which it links against, to perform full-text search-and-replace and text extraction. Search pattern, replacement text and modifiers are defined in one string, as known from Perl (s/pattern/replace/modifiers).
Note: Since C++11 the C++ standard library also provides regular expressions (#include <regex>).
A typical use looks like this:
// Config file, command line, or the like:
// preprocess= s/^ .*? (find\n) .* $/replace: \1/smix
// C++ Program:
std::string text_to_preprocess = "...";
// operator() takes the text by const reference and returns the processed copy
// (use apply_to() to modify a string in place)
text_to_preprocess = sw::pcre_regex(config.preprocess_regex())(text_to_preprocess);
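A one-string specification like the one above has to be split into search part, replacement and modifiers before anything can be compiled. The following is a minimal, simplified sketch of that splitting step in plain standard C++. It is not the parser used in pcre.hh (which, for instance, also validates the modifiers and re-escapes control characters), and the names `parsed_spec` and `split_spec` are hypothetical.

```cpp
#include <string>

// Hypothetical result type for this sketch, not part of pcre.hh.
struct parsed_spec {
  bool ok;                 // false if the specification could not be split
  bool is_replace;         // true if the specification starts with 's'
  std::string search, replace, modifiers;
};

// Splits "[s|m]<sep>search<sep>[replace<sep>]modifiers" at unescaped
// separators. Assumption: the same separator set as in pcre.hh ("/|#%").
inline parsed_spec split_spec(const std::string& spec)
{
  parsed_spec r; r.ok = false; r.is_replace = false;
  std::string::size_type i = 0;
  if(!spec.empty() && (spec[0] == 's' || spec[0] == 'm')) {
    r.is_replace = (spec[0] == 's');
    i = 1;
  }
  if(i >= spec.size() || std::string("/|#%").find(spec[i]) == std::string::npos) {
    return r; // no valid separator
  }
  const char sep = spec[i++];
  std::string parts[3];
  int n = 0; // index of the part currently being filled
  for(; i < spec.size(); ++i) {
    if(spec[i] == '\\' && i+1 < spec.size()) {
      // An escaped separator becomes a plain character, other escapes are kept.
      if(spec[i+1] != sep) parts[n].push_back('\\');
      parts[n].push_back(spec[++i]);
    } else if(spec[i] == sep && n < 2) {
      ++n; // unescaped separator: next part starts
    } else {
      parts[n].push_back(spec[i]);
    }
  }
  if(n == 0 || parts[0].empty()) return r; // no closing separator / empty search
  r.search = parts[0];
  if(n == 2) { r.replace = parts[1]; r.modifiers = parts[2]; }
  else       { r.modifiers = parts[1]; }
  r.ok = true;
  return r;
}
```

With this sketch, `split_spec("m|x+|i")` yields the search part `x+`, an empty replacement and the modifiers `i`.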
Note: For regex operations that are defined in your C++ source code, use the pcrecpp::RE class directly instead of this class. This is more flexible and avoids unnecessary overhead. (The pcre/pcrecpp authors did a really good job on the interface.)
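For patterns that are fixed in the source code, the C++11 `<regex>` facility mentioned above is another option that needs no external library. The following minimal sketch shows such a hard-coded replacement. It uses `std::regex` (ECMAScript syntax), not PCRE, so patterns are not always interchangeable, and the helper name `eq_to_colon` is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites every "x=<number>", "y=<number>" or
// "z=<number>" into "x:<number>" etc. The pattern is fixed at compile time,
// so no run-time pattern parsing is required.
inline std::string eq_to_colon(const std::string& text)
{
  static const std::regex re("([xyz])=([0-9.e]+)");
  // std::regex_replace replaces all matches; "$1" and "$2" refer to the groups.
  return std::regex_replace(text, re, "$1:$2");
}
```

For example, `eq_to_colon("x=1.5 y=2e3 w=4")` leaves `w=4` untouched because `w` is not part of the character class.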
Files
pcre.hh, example program: pcref.cc, Makefile
Class source code
/**
* @package de.atwillys.cc.swl
* @license BSD (simplified)
* @author Stefan Wilhelm (stfwi)
*
* @file pcre.hh
* @ccflags -Ipcre/include -Wno-long-long
* @ldflags -lpcrecpp || libpcrecpp.a libpcre.a
* @platform linux, bsd, windows
* @standard >= c++98
*
* -----------------------------------------------------------------------------
*
* PCRE wrapper class template with implicit pattern parsing for text
* extraction / replacement. As search/match/replace/extract specifications
* are given as one string, this class is suitable to be easily used as user
* definable pre/post processing, e.g. via command line arguments or configuration
* files.
*
* Perl-like patterns e.g.:
*
* - '/pattern/mods' returns first match
* - '/pattern/extract/mods' returns first match (with replacement spec)
* - 's/pattern/replace/mods' replaces the first occurrence, or all occurrences
*   if the modifier `g` is given
*
* - allowed separators: `/`, `|`, `#`, `%` (e.g. `m|pattern|opts`)
*
* - allowed modifiers:
*
* `i` Ignore case (as in Perl).
* `x` Permit whitespaces and comments in the pattern (as in Perl).
* `m` Multi line: `^` and `$` match start/end of each line (as in Perl).
* `s` `.` matches newlines as well (as in Perl).
* `g` Global: replace all occurrences, not only the first one (`s/.../` only).
* `$` `$` matches only at the end (else normal dollar sign).
* `!` Meaning of `*?` and `*` swapped (`*?` now consumes as much as possible).
* `*` Disable parentheses (subexpression) capturing.
* `X` Extra (PCRE strict escape parsing).
* `U` Disable UTF-8 support.
*
* Pattern examples:
*
* "/([xy])=([\\d\\.e]+)/\\1:\\2/" Extract first of x,y=float, reformat `=` to `:`
* "m/([xy])=([\\d\\.e]+)/$1:$2/" Same as above
* "s/([xyz])=([\\d\\.e]+)/\\1:\\2/g" Replace all x,y,z=float from `=` to `:`
* "s| [\\n]*(abc) [\\s]* |X|smix" Replace abc with X, ignore case, multiline
*
* Usage example:
*
* pcre_regex re;
* re.pattern(my_pattern).apply_to(string_reference);
* if(re.ok()) { ... } else { throw re.error(); }
*
* Template specialisation (std::string):
*
* - typedef detail::basic_pcre<std::string> pcre_regex;
*
* -----------------------------------------------------------------------------
*
* Hint: Getting/building PCRE from source
*
* The makefile target `update-pcre` shown below retrieves the sources from
* the official SVN repository into the subdirectory `pcre`, builds them,
* and strips everything except the includes and libs.
*
* +++ Makefile +++
*
* .PHONY: update-pcre
* update-pcre:
* @-rm -rf pcre
* @mkdir pcre
* @cd pcre; svn co svn://vcs.exim.org/pcre/code/trunk src
* @cd pcre/src; ./autogen.sh
* @cd pcre/src; ./configure --enable-utf --prefix=$(shell pwd)/pcre/
* @cd pcre/src; make
* @cd pcre/src; make install
* @cd pcre/src; make clean
* @cd pcre; rm -rf bin
* @cd pcre; rm -rf share
* @cd pcre/lib; rm -rf pkgconfig
* @cd pcre; rm -rf src
*
* -----------------------------------------------------------------------------
* +++ BSD license header +++
* Copyright (c) 2009-2014, Stefan Wilhelm (stfwi, <cerberos@atwillys.de>)
* All rights reserved.
* Redistribution and use in source and binary forms, with or without modification,
* are permitted provided that the following conditions are met: (1) Redistributions
* of source code must retain the above copyright notice, this list of conditions
* and the following disclaimer. (2) Redistributions in binary form must reproduce
* the above copyright notice, this list of conditions and the following disclaimer
* in the documentation and/or other materials provided with the distribution.
* (3) Neither the name of atwillys.de nor the names of its contributors may be
* used to endorse or promote products derived from this software without specific
* prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
* AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
* BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER
* OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
* OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
* WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
* DAMAGE.
* -----------------------------------------------------------------------------
*/
#ifndef SW__PCRE_HH
#define SW__PCRE_HH
#include <pcrecpp.h>
#include <string>
#include <iostream>
#include <vector>
#include <cctype> // std::isdigit
namespace sw { namespace detail {
template <typename str_t>
class basic_pcre {
public:
/**
* Construct empty PCRE
*/
inline basic_pcre() : is_replace_(false), is_global_(false), sep_('/'),
separators_("/|#%"), pattern_(), srch_(), flgs_(), repl_(), error_(),
re_("")
{ ; }
/**
* PCRE with pattern to compile (immediately compiled)
* @param const str_t pattern__
*/
inline basic_pcre(const str_t pattern__) : is_replace_(false), is_global_(false),
sep_('/'), separators_("/|#%"), pattern_(), srch_(), flgs_(),
repl_(), error_(), re_("")
{ pattern(pattern__); }
/**
* Copy constructor
* @param re__
*/
inline basic_pcre(const basic_pcre &re__) : is_replace_(re__.is_replace_),
is_global_(re__.is_global_), sep_(re__.sep_), separators_(re__.separators_),
pattern_(re__.pattern_), srch_(re__.srch_), flgs_(re__.flgs_), repl_(re__.repl_),
error_(re__.error_), re_(re__.re_)
{ ; }
/**
* Destructor
*/
virtual ~basic_pcre()
{ ; }
public:
/**
* Returns the complete pattern given
* @return const str_t &
*/
inline const str_t & pattern() const
{ return pattern_; }
/**
* Returns parsed search part of the pattern
* @return const str_t &
*/
inline const str_t & search() const
{ return srch_; }
/**
* Returns parsed replace part of the pattern (empty if no replace)
* @return const str_t &
*/
inline const str_t & replace() const
{ return repl_; }
/**
* Returns search/replace options part of the pattern
* @return const str_t &
*/
inline const str_t & modifiers() const
{ return flgs_; }
/**
* Returns an error text, empty string if no error
* @return const str_t &
*/
inline const str_t & error() const
{ return error_; }
/**
* Returns true if there is no error.
* @return bool
*/
inline bool ok() const
{ return error_.empty(); }
/**
* Returns true if global search/replace (all occurrences, not only first one)
* is set.
* @return bool
*/
inline bool is_global() const
{ return is_global_; }
/**
* Returns true if the pattern says that the expression shall replace, not
* search.
* @return bool
*/
inline bool is_replace() const
{ return is_replace_; }
public:
/**
* Quote a string
* @param const str_t& s
* @return str_t
*/
inline static str_t quote(const str_t& s)
{ return pcrecpp::RE::QuoteMeta(pcrecpp::StringPiece(s)); }
public:
/**
* Resets the object, clear all contents.
* @return basic_pcre& *this
*/
inline basic_pcre& clear()
{ pattern_ = srch_ = repl_ = flgs_ = error_ = ""; is_replace_ = is_global_ = false; return *this; }
/**
* Sets the pattern to search/replace, parses the pattern components
* and compiles the regex string. Does not explicitly throw exceptions,
* but sets an error string that can be fetched using `error()`.
* @param const str_t &pattern
* @return basic_pcre& *this
*/
basic_pcre& pattern(const str_t &pattern)
{
str_t pt, op, rp; // pattern, options, replace
char sep; // separator
bool is_rp = false, is_match = false;
clear();
pattern_ = pt = pattern; // e.g. [sm]/^(.*?)$//[ig]
if(pt.length()<3) { // e.g. "//" or "||"
error_ = "Empty pattern";
return *this;
}
// Optional first pattern characters 's', 'm'
if(pt[0] == 's') { // definitely search replace, otherwise check
is_rp = true;
pt = pt.length()>1 ? pt.substr(1) : "";
} else if(pt[0] == 'm') { // definitely match
is_match = true;
pt = pt.length()>1 ? pt.substr(1) : "";
}
// Pattern separator detection
if(str_t(separators_).find(pt[0]) == str_t::npos) {
error_ = "Invalid pattern (must start with one of the separators: ";
error_ += separators_ + ")";
return *this;
}
sep = pt[0];
pt = pt.substr(1);
const str_t aflags = "ixsmU!$X*g"; // allowed modifiers/flags/options
unsigned k;
for(k=pt.length()-1; k>1; k--) {
if(pt[k] == sep) break;
if(aflags.find_first_of(pt[k]) == str_t::npos) {
error_ = str_t("Unknown modifier/option '") + pt[k] + "'";
return *this;
}
}
if(pt[k] != sep) {
error_ = "Empty pattern";
return *this;
}
if(k<pt.length()-1) op = pt.substr(k+1);
pt = pt.substr(0, k);
do {
str_t pt1 = pt + "\\"; // Temporary \ to fit in trailing backslashes
pt.clear(); pt.reserve(pt1.length()*2);
// Search pattern: First unescaped character means "end of search",
// except if explicitly defined only search using 'm' as first pattern
// character.
bool done = false;
for(k=0; !done && k<pt1.length()-1; k++) {
switch(pt1[k]) {
case '\\':
if(pt1[k+1]=='\\' || pt1[k+1]==sep) {
pt.push_back(pt1[++k]);
} else {
pt.push_back(pt1[k]);
}
break;
case '\n':
// re-escape (depends on shell)
pt += "\\n";
break;
case '\t':
pt += "\\t";
break;
case '\r':
pt += "\\r";
break;
default:
if(!is_match && pt1[k]==sep) {
done = true;
} else { // user didn't escape the separator, try to "see it right".
pt.push_back(pt1[k]);
}
}
}
if((k<pt1.length()-1 || done)) { // last char is "\"
// Replace pattern: rest of it, unescaped separators ignored and used
// as normal character, but escaping allowed.
// References: $1... and \1... allowed.
for(; k<pt1.length()-1; k++) {
switch(pt1[k]) {
case '\\':
switch(pt1[k+1]) {
case '\\': rp.push_back(pt1[++k]); break;
case '$': rp.push_back(pt1[++k]); break;
case '0': rp.push_back('\0'); ++k; break;
case 'a': rp.push_back('\a'); ++k; break;
case 'b': rp.push_back('\b'); ++k; break;
case 'f': rp.push_back('\f'); ++k; break;
case 'n': rp.push_back('\n'); ++k; break;
case 'r': rp.push_back('\r'); ++k; break;
case 't': rp.push_back('\t'); ++k; break;
case 'v': rp.push_back('\v'); ++k; break;
default:
rp.push_back( (pt1[k+1]==sep) ? pt1[++k] : pt1[k]);
}
break;
case '$':
rp += (std::isdigit(pt1[k+1])) ? "\\" : "$"; // replace $1 --> \1
break;
default:
rp.push_back(pt1[k]);
}
}
}
} while(0);
if(pt.empty()) {
error_ = "Empty pattern";
return *this;
}
pcrecpp::RE_Options opts;
opts.set_match_limit(10000); // const for now.
opts.set_match_limit_recursion(500); // const for now.
opts.set_caseless((op.find('i') != str_t::npos));
opts.set_utf8((op.find('U') == str_t::npos));
opts.set_extended((op.find('x') != str_t::npos));
opts.set_dotall((op.find('s') != str_t::npos));
opts.set_multiline((op.find('m') != str_t::npos));
opts.set_ungreedy((op.find('!') != str_t::npos));
opts.set_dollar_endonly((op.find('$') != str_t::npos));
opts.set_extra((op.find('X') != str_t::npos));
opts.set_no_auto_capture((op.find_first_of("*") != str_t::npos));
re_ = pcrecpp::RE(pt, opts);
is_global_ = is_rp && (op.find_first_of("g") != str_t::npos);
is_replace_ = is_rp;
srch_ = pt;
repl_ = rp;
flgs_ = op;
if(!re_.error().empty()) error_ = re_.error();
return *this;
}
/**
* Applies the search/replace pattern regex to the given string.
* THE STRING WILL BE MODIFIED.
* @param str_t & subject
* @return basic_pcre& *this
*/
basic_pcre& apply_to(str_t &subject)
{
if(!error_.empty()) return *this;
if(srch_.empty()) { error_ = "No pattern given to search/replace."; return *this; }
if(is_replace_) {
if(is_global_) {
re_.GlobalReplace(repl_, &subject);
} else {
re_.Replace(repl_, &subject);
}
} else {
str_t out;
str_t rep = repl_.empty() ? str_t("\\0") : repl_;
re_.Extract(rep, subject, &out);
subject = out;
}
if(!re_.error().empty()) {
error_ = re_.error();
subject = "";
}
return *this;
}
/**
* Applies the search/replace pattern regex to a copy of the given string and
* returns the result. The given string itself is not modified; an empty string
* is returned on error.
* @param const str_t& subject
* @return str_t
*/
str_t operator () (const str_t& subject)
{ str_t s=subject; apply_to(s); if(!ok()) s=""; return s; }
protected:
bool is_replace_;
bool is_global_;
typename str_t::value_type sep_;
str_t separators_; // Allowed separators
str_t pattern_; // The whole pattern given
str_t srch_;
str_t flgs_;
str_t repl_;
str_t error_;
pcrecpp::RE re_; // PCRE main object
};
/**
* ostream <<
* @param std::basic_ostream<typename str_t::value_type>& os
* @param const basic_pcre<str_t> re
* @return std::basic_ostream<typename str_t::value_type>&
*/
template <typename str_t>
std::basic_ostream<typename str_t::value_type>& operator << (
std::basic_ostream<typename str_t::value_type>& os,
const basic_pcre<str_t>& re
)
{
#define nl std::endl
os << nl << "{" << nl
<< " - ok: " << (re.ok() ? "yes" : "no") << nl
<< " - pattern: \"" << re.pattern() << "\"" << nl;
if(re.is_replace()) {
os << " - search: \"" << re.search() << "\"" << nl;
} else {
os << " - match: \"" << re.search() << "\"" << nl;
}
if(!re.replace().empty()) {
os << " - replace: \"" << re.replace() << "\"" << nl;
}
os << " - modifiers: \"" << re.modifiers() << "\"" << nl;
if(re.is_global()) {
os << " - global: replace all matches (`g`)" << nl;
} else {
os << " - not global: replace only first match (no `g`)" << nl;
}
if(re.modifiers().find('i') != str_t::npos) {
os << " - case insensitive matching (`i`)" << nl;
} else {
os << " - case sensitive matching (no `i`)" << nl;
}
if(re.modifiers().find('x') != str_t::npos) {
os << " - whitespaces and comments in pattern permitted (`x`)" << nl;
} else {
os << " - comments/unmatched spaces in pattern not permitted (no `x`)" << nl;
}
if(re.modifiers().find('m') != str_t::npos) {
os << " - Multiline (^/$ match start/end of each line) (`m`)" << nl;
} else {
os << " - Single line (^/$ match start/end of the whole text) (no `m`)" << nl;
}
if(re.modifiers().find('s') != str_t::npos) {
os << " - `.` matches newlines as well (`s`)" << nl;
} else {
os << " - `.` does not match newlines (no `s`)" << nl;
}
if(re.modifiers().find('$') != str_t::npos) {
os << " - `$` matches only at the end. (`$`)" << nl;
}
if(re.modifiers().find('!') != str_t::npos) {
os << " - Meaning of `*?` and `*` swapped. (`!`)" << nl;
}
if(re.modifiers().find('*') != str_t::npos) {
os << " - Sub pattern matching disabled (`*`)" << nl;
}
if(re.modifiers().find('X') != str_t::npos) {
os << " - (PCRE:) Extra strict pattern escape parsing. (`X`)" << nl;
}
if(re.modifiers().find('U') != str_t::npos) {
os << " - UTF support disabled. (`U`)" << nl;
} else {
os << " - UTF support enabled. (no `U`)" << nl;
}
os << "}" << nl;
return os;
#undef nl
}
}}
namespace sw {
typedef detail::basic_pcre<std::string> pcre_regex;
}
#endif
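The replacement part of a pattern accepts both `$1` and `\1` as subexpression references, whereas the underlying pcrecpp rewrite strings use the backslash form. The following minimal sketch isolates that conversion step. It only covers the `$N` and `\$` cases and leaves the other escape sequences handled by the class aside, and the function name `dollar_to_backslash_refs` is hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: converts Perl-style "$1" references into "\1"
// references. "\$" is treated as an escaped, literal dollar sign, and a "$"
// that is not followed by a digit is kept as it is.
inline std::string dollar_to_backslash_refs(const std::string& replacement)
{
  std::string out;
  out.reserve(replacement.size());
  for(std::string::size_type i = 0; i < replacement.size(); ++i) {
    const char c = replacement[i];
    const bool has_next = (i+1 < replacement.size());
    if(c == '\\' && has_next && replacement[i+1] == '$') {
      out.push_back('$'); // "\$" --> literal "$"
      ++i;
    } else if(c == '$' && has_next
        && std::isdigit(static_cast<unsigned char>(replacement[i+1]))) {
      out.push_back('\\'); // "$1" --> "\1", the digit is copied in the next step
    } else {
      out.push_back(c);
    }
  }
  return out;
}
```

The cast to `unsigned char` avoids undefined behaviour of `std::isdigit` for negative `char` values, which can occur with UTF-8 input.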
Example program
/**
* @package de.atwillys.cc.app
* @license BSD (simplified)
* @author stfwi
*
* @ccflags: -Ipcre/include -Wno-long-long
* @ldflags: pcre/lib/libpcrecpp.a pcre/lib/libpcre.a
*
* -----------------------------------------------------------------------------
*
* PCRE based text filter.
*
* -----------------------------------------------------------------------------
* +++ BSD license header (You know that ...) +++
* Copyright (c) 2013, StfWi
* All rights reserved.
* Redistribution and use in source and binary forms, with or without modification,
* are permitted provided that the following conditions are met: (1) Redistributions
* of source code must retain the above copyright notice, this list of conditions
* and the following disclaimer. (2) Redistributions in binary form must reproduce
* the above copyright notice, this list of conditions and the following disclaimer
* in the documentation and/or other materials provided with the distribution.
* (3) Neither the name of atwillys.de nor the names of its contributors may be
* used to endorse or promote products derived from this software without specific
* prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
* AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING,
* BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
* A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER
* OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
* EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
* OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
* STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
* WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
* DAMAGE.
* -----------------------------------------------------------------------------
*/
#include "pcre.hh"
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
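#if defined(__linux__) || (defined(__APPLE__) && defined(__MACH__))
#define IS_NIX
#include <unistd.h> // read(), STDIN_FILENO
#include <sys/select.h> // select(), fd_set
#include <sys/time.h> // struct timeval
#endif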
#define APP_NAME "pcref"
#define APP_VER "1.0"
using namespace std;
/**
* Prints help text
*/
void help()
{
#define nl << std::endl
#define nl2 << std::endl << std::endl
#define an << APP_NAME <<
cerr
<< "NAME" nl2
<< " " << APP_NAME nl2
<< "SYNOPSIS" nl2
<< " " an " [-h|--help|-v] '<pattern1>' ['<pattern2>'] [...]" nl2
<< "DESCRIPTION" nl2
<< " Perl Compatible Regular Expression text Filter." nl2
<< " The program allows performing match-extract / search-replace operations" nl
<< " with patterns known from PCRE (or Perl: stdout = stdin =~ <pattern>), where" nl
<< " one of [`/`, `|`, `#`, `%`] can be chosen as pattern separator." nl2
<< " `" an " 'm/pattern/'` or `" an " '/pattern/'`: Prints first match to" nl
<< " stdout." nl2
<< " `" an " 's/pattern/replace/'`: Replaces the first occurrence of the pattern" nl
<< " with the replace text, or all occurrences if the modifier `g` is given" nl
<< " (accepted subexpression references are `\\1`,`\\2`, etc," nl
<< " and `$1`,`$2` etc, both have the same meaning)." nl2
<< " Modifiers:" nl2
<< " Modifiers are appended to the pattern as known from Perl / PCRE" nl
<< " (`" an " /pattern/modifiers` or `" an " /pattern/replace/modifiers`)." nl2
<< " `i` Ignore case (as in Perl)." nl
<< " `x` Permit whitespaces and comments in the pattern (as in Perl)." nl
<< " `m` Multi line: `^` and `$` match start/end of each line (as in Perl)." nl
<< " `s` `.` matches newlines as well (as in Perl)." nl2
<< " `g` Global: replace all matches, not only the first one." nl
<< " `$` `$` matches only at the end (else normal dollar sign)." nl
<< " `!` Meaning of `*?` and `*` swapped (`*?` now consumes as much as possible)." nl
<< " `*` Disable parentheses (subexpression) capturing." nl
<< " `X` Extra (PCRE strict escape parsing)." nl
<< " `U` Disable UTF support." nl2
<< " Sequential execution:" nl2
<< " You can specify multiple expressions as command line arguments, they" nl
<< " will be processed sequentially, and the final result will be printed" nl
<< " to stdout. E.g." nl2
<< " echo 'ABC DEF YES' | " an " 's/ABC[\\s]?/X/' '/(\\w+)\\s(\\w+)/$1=$2/'" nl
<< " ( --> XDEF YES) ( --> XDEF=YES)" nl2
<< " Examples:" nl2
<< " - Remove trailing spaces of each line:" nl2
<< " " an " 's/^(.*?)[\\s]+(\\n|$)/$1$2/mg'" nl2
<< " - Extract body from HTML:" nl2
<< " " an " '|< [\\s]* body .*? > (.*?) <[\\s]* / [\\s]* body |$1|smix'" nl2
<< " - Section of an ini-file to json object:" nl2
<< " " an " '/(.*)/\\n$1\\n/sm' \\" nl
<< " '/.*? \\n \\[SECTION_NAME\\] [\\s]* (.*?) \\n (\\[|$) /$1/smix' \\" nl
<< " 's/^([\\w]+) [\\s]* = [\\s]* (.*) ($|\\n)/$1: \"$2\"/gimx' \\" nl
<< " 's#\\n#, #gimx' \\" nl
<< " 'm|(.*)|{ $1 }|'" nl2
<< " Annotations:" nl2
<< " - The replace function replaces only the first match by default. Use the" nl
<< " modifier `g` to replace all matches." nl2
<< " - The match operation (optionally) takes a replace part to rearrange" nl
<< " the matched string using subexpressions (`m/<pattern>/replace/mods`)," nl
<< " so that the match operation is practically an extract operation." nl2
<< " - Replace returns the input string if no pattern matches, extract an" nl
<< " empty string if a pattern does not match." nl2
<< " - The program always reads the complete text (to memory) before processing." nl
<< " Hence, large texts cause a higher memory consumption." nl2
<< " - On error the program does not return any text to stdout." nl2
<< " - The program understands common escape sequences in the replace text:" nl
<< " \\n, \\r, \\t, \\v, \\f, \\a, \\b." nl2
<< "ARGUMENTS" nl2
<< " -h, --help Show this help" nl2
<< " -v, --verbose Increased verbosity (outputs to stderr)" nl2
<< " -vv, --debug High verbosity (debug information if compiled with)" nl2
<< " <pattern> A perl compatible regex pattern as described above." nl2
<< "RETURN VALUES" nl2
<< " returns 0 on success," nl
<< " 1 on error" nl2
<< "SEE ALSO" nl2
<< " perlre, pcregrep, grep, egrep, sed, awk, ex" nl2
<< APP_NAME << " v" << APP_VER << ", stfwi; credits to libpcre author(s)." nl
;
#undef nl
#undef nl2
#undef an
}
typedef std::string str_t;
typedef std::vector<sw::pcre_regex> pcre_vector;
/**
* Main
* @param int argc
* @param char** argv
* @return int
*/
int main(int argc, char** argv)
{
try {
// Command line arguments
if(argc < 2) throw "No expression given (try " APP_NAME " --help)";
str_t s;
pcre_vector rx;
int verbosity = 0;
// Command line first arg (the very rudimentary way ...)
int i=1;
if(argc > 1 && argv[1]) {
str_t arg = argv[1];
if(arg == "-h" || arg == "--help") {
help();
return 1;
} else if(arg == "-v" || arg == "--verbose") {
verbosity = 1;
i++;
} else if(arg == "-vv" || arg == "-v2" || arg == "--debug") {
verbosity = 2;
i++;
}
}
// Assign and parse patterns before dealing with the text
for(; i<argc && argv[i]; i++) {
rx.push_back(sw::pcre_regex(argv[i]));
if(!rx.back().ok()) {
std::ostringstream ss; // builds "Expression <argument index>: <error text>"
ss << "Expression " << i << ": " << rx.back().error();
throw ss.str();
}
}
#ifdef IS_NIX
fd_set fds; struct timeval t; t.tv_sec = 2; t.tv_usec = 0;
FD_ZERO(&fds); FD_SET(STDIN_FILENO, &fds);
if(select(STDIN_FILENO+1, &fds, NULL, NULL, &t) <= 0 || !FD_ISSET(STDIN_FILENO, &fds)) {
throw "Pipe in your text data.";
}
ssize_t n = 0; char buf[512];
// append exactly the bytes read, so that embedded NUL bytes are preserved
while((n=::read(STDIN_FILENO, buf, sizeof(buf))) > 0) { s.append(buf, static_cast<size_t>(n)); }
if(n!=0) throw "Failed to read from stdin";
#else
s.clear();
char c; while(cin.get(c)) s += c;
#endif
// Verbose: print before applying expressions
if(verbosity > 0) {
for(unsigned i=0; i<rx.size(); i++) {
cerr << "Expression " << ((int)(i+1)) << ": " << rx[i];
}
}
for(unsigned i=0; i<rx.size(); i++) {
sw::pcre_regex &re = rx[i];
if(!re.apply_to(s).ok()) { // apply_to() modifies s in place and returns the regex object
std::ostringstream ss; // s is no longer valid output, report the error instead
ss << "Expression " << (i+1) << ": " << re.error();
throw ss.str();
}
}
cout << s;
} catch(const str_t &e) {
cerr << "Error: " << e << endl;
return 1;
} catch(const char *e) {
cerr << "Error: " << e << endl;
return 1;
}
return 0;
}
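The sequential execution used by the program (each expression works on the output of the previous one, and a failed extraction yields an empty string) can be sketched independently of PCRE. The following minimal example uses `std::regex` from C++11 as a stand-in for `sw::pcre_regex`, so syntax details differ from the real program; in particular `std::regex_replace` always replaces all matches. The names `filter_step` and `run_chain` are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for one parsed expression.
struct filter_step {
  bool is_replace;   // true: search/replace, false: match/extract
  std::regex re;     // compiled search pattern
  std::string fmt;   // replacement / extraction format ("$1" style references)
};

// Applies all steps one after another; every step receives the output of
// the previous step as its input.
inline std::string run_chain(const std::vector<filter_step>& steps, std::string text)
{
  typedef std::vector<filter_step>::const_iterator iter_t;
  for(iter_t it = steps.begin(); it != steps.end(); ++it) {
    std::string out;
    if(it->is_replace) {
      // Replace keeps the unmatched parts of the input.
      out = std::regex_replace(text, it->re, it->fmt);
    } else {
      // Extract returns only the formatted first match, or an empty string.
      std::smatch m;
      if(std::regex_search(text, m, it->re)) out = m.format(it->fmt);
    }
    text.swap(out);
  }
  return text;
}
```

With a replace step for `ABC\s?` (replacement `X`) followed by an extract step for `(\w+)\s(\w+)` (format `$1=$2`), the input `ABC DEF YES` first becomes `XDEF YES` and then `XDEF=YES`.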
Makefile
CC=g++ -c
LD=g++
CCFLAGS=-Wall -O3 -pedantic -Wno-long-long
LDFLAGS=
OUTNAME=pcref
INC=-Ipcre/include
LIBS=pcre/lib/libpcrecpp.a pcre/lib/libpcre.a

#-------------------------------------------------------------------------------
# OS specific
#-------------------------------------------------------------------------------
ifeq ($(shell uname), Linux)
LDFLAGS+= -static-libgcc -s
#LDFLAGS+= -static
endif
ifeq ($(shell uname), Darwin)
endif
#-------------------------------------------------------------------------------

.PHONY: all
all: $(OUTNAME) clean

.PHONY: clean
clean:
	@-rm -f *.o >/dev/null 2>&1

.PHONY: info
info: $(OUTNAME)
	@# Well let's use the program directly to format its info ...
	@file $(OUTNAME) | ./$(OUTNAME) '/^.*?:[\s]*(.*)/ - \1/ms' 's/,/\n -/ms'
	-@readelf -d $(OUTNAME) 2>/dev/null | grep -i shared | ./$(OUTNAME) \
	  's/^.*?:[\s]*\[(.*?)\].*$$/\1/m' 's/[\s]+$$//sm' 's/\n/, /sm' \
	  's/^(.*?)$$/ - dependencies: \1\n/sm'
	@du -h $(OUTNAME) | ./$(OUTNAME) '/^([\w\d\.]+)/ - size: $$1\n/smx'

.PHONY: test
test: $(OUTNAME)
	@cd test; ./test.sh

#.PHONY: $(OUTNAME)
$(OUTNAME): main.cc
	@$(CC) -o main.o main.cc $(CCFLAGS) $(INC)
	@$(LD) -o $(OUTNAME) main.o $(LDFLAGS) $(LIBS)
	-@rm -f main.o
	@echo Output binary is named \"$(OUTNAME)\"

.PHONY: get-pcre
get-pcre:
	@-rm -rf pcre
	@mkdir pcre
	@cd pcre; svn co svn://vcs.exim.org/pcre/code/trunk src
	@cd pcre/src; ./autogen.sh
	@cd pcre/src; ./configure --enable-utf --prefix=$(shell pwd)/pcre/
	@cd pcre/src; make
	@cd pcre/src; make install
	@cd pcre/src; make clean
	@cd pcre; rm -rf bin
	@cd pcre; rm -rf share
	@cd pcre/lib; rm -f *.dylib
	@cd pcre/lib; rm -f *.la
	@cd pcre/lib; rm -rf pkgconfig
	@cd pcre; rm -rf src