I also expected REGEXP_COUNT to be slower than the API, but I did not expect that it would be that much slower.

Thomas.

-----Original Message-----
From: WDSCI-L [mailto:wdsci-l-bounces@xxxxxxxxxxxxxxxxxx] On Behalf Of Mark Murphy
Sent: Monday, December 2, 2019 21:07
To: Rational Developer for IBM i / Websphere Development Studio Client for System i & iSeries
Subject: Re: [WDSCI-L] Source Search

REGEXP_COUNT uses International Components for Unicode, but if you use the
C API for that rather than going through the database, I would expect that
to be much faster.

On Sun, Nov 3, 2019 at 11:31 AM Tools/400 <thomas.raddatz@xxxxxxxxxxx>
wrote:

FYI: I cannot get regcomp() and regexec() to work with character
classes such as "\s". I tried various things without success. Using
REGEXP_COUNT works like a charm but is incredibly slow (200 times slower
than regexec()).

Therefore I posted the problem at the rpg400-l mailing list hoping to
get help there: "Regular expression (regcomp()) ccsid issue".

Thomas.

On 02.11.2019 at 11:26, Tools/400 wrote:
Craig,

Interesting stuff. Thank you for letting us know.

Because of the "\s" issue, I assume that it is a CCSID problem. That is
what needs to be debugged. I hope that I can do that today or tomorrow.

Regards,

Thomas.

On 01.11.2019 at 17:48, Craig Richards wrote:
A slightly more efficient version might be

dcl-f(?>\s+)filea

or in your case
dcl-f(?> +)filea
or
dcl-f(?>[ ]+)filea

(I'm very surprised the \s suggested by David did not work - that's
pretty standard stuff).

Essentially this wraps the one-or-more-whitespace expression \s+ in
(?>...), which is called atomic grouping.

The \s+ is greedy, which is to say it will grab as many whitespace
characters as it can and then look at the next part of the expression to
carry on matching (in your case the filea). If that fails to match, it
will backtrack, so if it grabbed 3 spaces, it will drop one and then try
to match again, and so on until it can't backtrack anymore.

The atomic grouping stops that backtracking process - essentially once
it gets past the closing parenthesis, it throws away all saved states so
it doesn't go back and try with, say, 2 spaces and then 1 space.

Maybe not an issue for you, and maybe not supported if not even \s is
supported, but it's a good performance thing to be aware of for the
situations where it's obvious that once you've done a greedy match and
the next bit has failed, there is no point in dropping the last character
of the greedy match and retrying the expression again.

regards,
Craig



--
This is the Rational Developer for IBM i / Websphere Development Studio
Client for System i & iSeries (WDSCI-L) mailing list
To post a message email: WDSCI-L@xxxxxxxxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: https://lists.midrange.com/mailman/listinfo/wdsci-l
or email: WDSCI-L-request@xxxxxxxxxxxxxxxxxx
Before posting, please take a moment to review the archives
at https://archive.midrange.com/wdsci-l.

