I also expected REGEXP_COUNT to be slower than the API, but I did not expect that it would be that much slower.


-----Original Message-----
From: WDSCI-L [mailto:wdsci-l-bounces@xxxxxxxxxxxxxxxxxx] On Behalf Of Mark Murphy
Sent: Monday, December 2, 2019 21:07
To: Rational Developer for IBM i / Websphere Development Studio Client for System i & iSeries
Subject: Re: [WDSCI-L] Source Search

REGEXP_COUNT uses International Components for Unicode, but if you use the
C API for that rather than going through the database, I would expect that
to be much faster.

On Sun, Nov 3, 2019 at 11:31 AM Tools/400 <thomas.raddatz@xxxxxxxxxxx> wrote:

FYI: I cannot get regcomp() and regexec() to work with character
classes such as "\s". I tried various things without success. Using
REGEXP_COUNT works like a charm but is incredibly slow (200 times slower
than regexec()).

Therefore I posted the problem to the rpg400-l mailing list, hoping to
get help there: "Regular expression (regcomp()) ccsid issue".
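
For readers following along, here is a minimal sketch of the regcomp()/regexec() route using the POSIX bracket class [[:space:]] in place of "\s" ("\s" is not part of the POSIX regular expression syntax, although some implementations accept it as an extension). The helper name line_matches is hypothetical and not from this thread. Whether the bracket form sidesteps the problem on IBM i is an untested assumption: square brackets, like the backslash, sit at different code points in different EBCDIC code pages, so a CCSID mismatch could affect them as well.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread).
 * Returns 1 if `line` matches `pattern` (POSIX extended syntax,
 * case-insensitive), 0 if it does not match, and -1 if the pattern
 * fails to compile. */
int line_matches(const char *pattern, const char *line)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

A call such as line_matches("dcl-f[[:space:]]+filea", source_line) would then be made once per source line; compiling the pattern once and reusing it across lines would avoid repeated regcomp() calls.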


On Nov 2, 2019 at 11:26, Tools/400 wrote:

Interesting stuff. Thank you for letting us know.

Because of the "\s" issue, I assume that it is a CCSID problem. That is
what needs to be debugged. I hope that I can do that today or tomorrow.



On Nov 1, 2019 at 17:48, Craig Richards wrote:
A slightly more efficient version might be


or in your case
dcl-f(?> +)filea
dcl-f(?>[ ]+)filea

(I'm very surprised the \s suggested by David did not work - that's
standard stuff).

Essentially this is wrapping the one-or-more whitespaces \s+ with (?>)
which is called Atomic Grouping.

The \s+ is greedy, which is to say it will grab as many whitespace
characters as it can and then look at the next part of the expression to
carry on matching (in your case the filea). If that fails to match, it
backtracks: if it grabbed 3 spaces, it will drop one and then try to
match again, and so on until it can't backtrack anymore.

The atomic grouping stops that backtracking process - essentially, once
the engine gets past the closing parenthesis, it throws away all saved
states so it cannot go back and try with, say, 2 spaces and then one space.
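
The difference can be made concrete with a small hand-rolled sketch. This is not a regex engine and not how any particular library is implemented; it hard-codes one pattern shape (a literal prefix, one or more spaces, a literal suffix, tried only at the start of the text) and counts how often the suffix comparison is attempted. The function name match_at_start and its parameters are made up for illustration.

```c
#include <string.h>

/* Hypothetical illustration: match  prefix, one or more spaces, suffix
 * at the start of `text`.
 *   atomic == 0: greedy run of spaces that gives spaces back on failure
 *   atomic != 0: the run is kept whole, in the spirit of (?> +)
 * *tries receives the number of suffix comparisons performed.
 * Returns 1 on a match, 0 otherwise. */
int match_at_start(const char *text, const char *prefix, const char *suffix,
                   int atomic, int *tries)
{
    size_t plen = strlen(prefix);
    size_t slen = strlen(suffix);
    const char *p;
    size_t run = 0;

    *tries = 0;
    if (strncmp(text, prefix, plen) != 0)
        return 0;

    p = text + plen;
    while (p[run] == ' ')
        run++;                      /* greedy: take every space available */

    while (run >= 1) {
        (*tries)++;
        if (strncmp(p + run, suffix, slen) == 0)
            return 1;
        if (atomic)
            break;                  /* atomic: never give a space back */
        run--;                      /* backtrack: give back one space */
    }
    return 0;
}
```

For the text "dcl-f   fileb" (three spaces) with prefix "dcl-f" and suffix "filea", the backtracking variant compares the suffix three times before giving up, while the atomic variant gives up after one comparison. Both report the same result here, because giving back a space can never make "filea" match.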

Maybe not an issue for you, and maybe not supported if not even \s is
supported, but it's a good performance thing to be aware of for the
situations where it's obvious that once you've done a greedy match and the
next bit has failed, there is no point in dropping the last character of
the greedy match and retrying the expression again.


This is the Rational Developer for IBM i / Websphere Development Studio
Client for System i & iSeries (WDSCI-L) mailing list



This mailing list archive is Copyright 1997-2022 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].
