Not jumping back in on solutions such as explicit looping, %scanrpl and
%xlate as all three (and others) will meet the needs of the original
poster, but the reason I was hoping Barbara might jump in is that I have no
idea how many comparisons a built-in such as %xlate might require due to
possible optimizations by the compiler and runtime.

If we assume that in the original input string the vast majority of bytes
are >= x'40' then a simple analysis of each input string byte would require
closer to 64 comparisons (rather than 32) in order to determine that the
from byte is not there.

But suppose the compiler/run-time analyzed the from string and pulled out
the low and high code points (x'00' and x'3F' in this case), then did a
simple compare of the current string byte against high and low (low could
be ignored in this specific case) and, if the byte is not in the range,
just moved on to the next string byte. The majority of the data would then
require one or two compares, and string bytes within the range an average
of 32 compares (if done blindly). This could drastically change the
run-time of %xlate and could be applied in other cases such as
%xlate('abc...xyz' :'ABC...XYZ' :String), as it would cut compares to one
or two for many (though not all) diacritics, numerics, many (though not
all) punctuation marks, etc. For a string byte that is in the range, if
the compiler/runtime resequenced the from and to values into strict
code-point sequence, a binary search of from to determine existence could
reduce the number of compares further, to well under 32 (about six or
seven for 64 from values, compared to a simple sequential scan/compare of
from values).
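
As a rough illustration of the range pre-check, here is a sketch in
free-form RPG. It is not a claim about what %xlate actually does. The names
fromStr, toStr, lowByte, highByte, curByte and fromPos are made up for the
example, and the in-range lookup is a plain sequential %scan rather than a
binary search.

```rpgle
**free
// Hypothetical sketch: skip the from-string search for any byte that
// falls outside the from string's low/high code-point range.

dcl-ds str;
  strChar char(1) dim(1000);
end-ds;

dcl-s fromStr char(64);
dcl-s toStr char(64) inz;             // 64 blanks, as in the thread
dcl-s lowByte char(1) inz(*hival);
dcl-s highByte char(1) inz(*loval);
dcl-s curByte char(1);
dcl-s i int(5);
dcl-s fromPos int(5);

fromStr = X'000102030405060708090A0B0C0D0E0F+
            101112131415161718191A1B1C1D1E1F+
            202122232425262728292A2B2C2D2E2F+
            303132333435363738393A3B3C3D3E3F';

// One-time analysis of the from string: find its low and high bytes.
for i = 1 to %len(fromStr);
  curByte = %subst(fromStr : i : 1);
  if curByte < lowByte;
    lowByte = curByte;
  endif;
  if curByte > highByte;
    highByte = curByte;
  endif;
endfor;

str = X'12' + 'some string' + X'350A13';

for i = 1 to %elem(strChar);
  // One or two compares reject any byte outside the range.
  if strChar(i) <= highByte and strChar(i) >= lowByte;
    // Only in-range bytes pay for the search of the from string.
    fromPos = %scan(strChar(i) : fromStr);
    if fromPos > 0;
      strChar(i) = %subst(toStr : fromPos : 1);
    endif;
  endif;
endfor;

*inlr = *on;
```

With mostly printable data, nearly every byte fails the first compare
(greater than x'3F') and never reaches the %scan.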

Another possible optimization (though specific to SBCS, as is the case
here, and a storage disaster if using graphic or Unicode data) would be to
simply allocate a from-control variable of 256 "flags", one per base-0
offset value x'00' - x'FF', and set each flag to yes/no according to
whether there is a from-value for that code point; then a corresponding
to-control variable containing 256 code point replacements populated with
the to values. Take the numeric value of the current input string byte as
an offset into from-control, see if the corresponding flag is set to yes
and, if so, use the to-control replacement code point at that offset value
and move on to the next string byte. Again a significant reduction in the
number of compares. Or just don't bother with flags: have a pre-populated
to-control variable of x'00' - x'FF', update the to-control offsets found
in the from string with the corresponding to string value, and just always
use the to-control value corresponding to the current input string byte
value.
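
A minimal sketch of the flag-less variant in free-form RPG, again only as
an illustration and not a description of the actual %xlate implementation.
The names toControl, byteConv, byteNum, byteChar, fromStr and toStr are
hypothetical. The one-byte data structure is just one way to view a
character as a number from 0 to 255.

```rpgle
**free
// Hypothetical sketch: translate through a 256-entry table indexed by
// the numeric value of each input byte (SBCS only).

dcl-ds str;
  strChar char(1) dim(1000);
end-ds;

dcl-ds byteConv;                      // same byte seen as number and char
  byteNum uns(3);
  byteChar char(1) overlay(byteNum);
end-ds;

dcl-s toControl char(1) dim(256);
dcl-s fromStr char(64);
dcl-s toStr char(64) inz;             // 64 blanks, as in the thread
dcl-s i int(5);

fromStr = X'000102030405060708090A0B0C0D0E0F+
            101112131415161718191A1B1C1D1E1F+
            202122232425262728292A2B2C2D2E2F+
            303132333435363738393A3B3C3D3E3F';

// Pre-populate to-control so every code point maps to itself.
for i = 0 to 255;
  byteNum = i;
  toControl(i + 1) = byteChar;
endfor;

// Overwrite the entries named in the from string with the to values.
for i = 1 to %len(fromStr);
  byteChar = %subst(fromStr : i : 1);
  toControl(byteNum + 1) = %subst(toStr : i : 1);
endfor;

str = X'12' + 'some string' + X'350A13';

// One table lookup per input byte; no search of the from string.
for i = 1 to %elem(strChar);
  byteChar = strChar(i);
  strChar(i) = toControl(byteNum + 1);
endfor;

*inlr = *on;
```

The table costs 256 bytes and a one-time setup pass, which only pays off
when the input is long enough to amortize it.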

Other optimizations are possible (and can certainly lead to the classic SQL
quandary of when does optimization work end up costing more than just doing
it with brute force) and as I mentioned earlier, I have no idea what RPG
actually generates in the case of %xlate. Only people working on the
compiler and run-time could say what the current optimizations (if any)
are.

On Fri, Nov 19, 2021 at 11:35 AM Charles Wilt <charles.wilt@xxxxxxxxx>
wrote:

Not surprising...

One comparison vs an average of 32 comparisons...

Charles

On Fri, Nov 19, 2021 at 9:18 AM Jon Paris <jon.paris@xxxxxxxxxxxxxx>
wrote:

My first thought was the same as Bruce's - that an individual character
loop would be by far the fastest approach.

You always have to remember that no magic is involved and that in order to
implement %Xlate or %ScanRpl RPG still has to implement a character at a
time loop unless there is an underlying system function that can perform
the task - and even then that just moves that loop down closer to the
hardware.

But so many in this thread seemed convinced that the BIF approach would be
faster I decided to test and compare Francois' code with a simple loop.

The time difference is dramatic. The single char loop is orders of
magnitude faster.

Run the following and see for yourself. On my system I get the following:

DSPLY Xlate took: 1094
DSPLY Loop took: 59

And those results are consistent.

Here's the code I used - if you spot a bug let me know.

**free

dcl-ds str;
  strChar char(1) dim(1000);
end-ds;

dcl-s blank64 char(64) inz;
dcl-s i int(5);
dcl-s startTime timestamp;
dcl-s endTime timestamp;

// Test 1: %Xlate. Each of the 1000 iterations reloads str and
// translates the whole 1000-byte field.
startTime = %timestamp();

for i = 1 to %elem(strChar);

  str = X'12' + 'some string' + X'350A13';
  str = %Xlate(X'000102030405060708090A0B0C0D0E0F+
               101112131415161718191A1B1C1D1E1F+
               202122232425262728292A2B2C2D2E2F+
               303132333435363738393A3B3C3D3E3F'
             : blank64
             : str);
endfor;

endTime = %timestamp();

Dsply ( 'Xlate took: ' + %char(%diff( endTime : startTime : *MS )));

// Test 2: explicit loop. Makes a single pass, testing each of the
// 1000 bytes of str (as left by test 1) once.
startTime = %timestamp();

for i = 1 to %elem(strChar);

  if strChar(i) < X'40';
    strChar(i) = *Blank;
  endif;

endfor;

endTime = %timestamp();

Dsply ( 'Loop took: ' + %char(%diff( endTime : startTime : *MS )));


*InLr = *On;



On Nov 18, 2021, at 6:45 PM, Francois Lavoie <
Francois.Lavoie@xxxxxxxxxxxxxxxxxxxx> wrote:

Here's the code I tested: ultra fast and ultra compact. A one-liner beats
any slow, multi-line, one-character-at-a-time loop.

dcl-s str char(1000);

str = X'12' + 'some string' + X'350A13';
str = %Xlate(X'000102030405060708090A0B0C0D0E0F+
             101112131415161718191A1B1C1D1E1F+
             202122232425262728292A2B2C2D2E2F+
             303132333435363738393A3B3C3D3E3F'
           :'                                '+
            '                                '
           :str);


Make sure the 2nd parm of the %Xlate BIF has exactly 64 blanks.


-----Original Message-----
From: RPG400-L <rpg400-l-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of Bruce Vining
Sent: Thursday, November 18, 2021 18:33
To: RPG programming on IBM i <rpg400-l@xxxxxxxxxxxxxxxxxx>
Subject: Re: [EXTERNAL] Re: Remove unprintable chars


I'm going to exit this thread as I'm clearly missing something, but do
believe that the explicit loop looking for < blank (as proposed by Alan) is
the best performer. I really hope that Barbara notices this thread and
joins in as I'm always open to education from her :)

To me Xlate (and I agree with Scott on Xlating to x'40' rather than x'39'
or some such followed by a ScanRpl) is "somewhere" (I don't know if it
might be an implicit RPG compiler generated lookup or a MI instruction as I
haven't checked MI in a long, long time) having to look up the current byte
being processed and having to search to see if it is in the Xlate from
string in order to replace it with the Xlate to string, which certainly
suggests more machine processing to me. Doing a direct loop on < blank
eliminates this additional processing of specific from values. Please note
that I'm looking at this strictly from a system performance (cycles) point
of view.

I also agree with Scott that < x'40' might be a problem in some
environments, and I'm thinking of DBCS shift controls in particular (though
there is nothing in this thread suggesting DBCS needs).

--
This is the RPG programming on IBM i (RPG400-L) mailing list
To post a message email: RPG400-L@xxxxxxxxxxxxxxxxxx
To subscribe, unsubscribe, or change list options,
visit: https://lists.midrange.com/mailman/listinfo/rpg400-l
or email: RPG400-L-request@xxxxxxxxxxxxxxxxxx
Before posting, please take a moment to review the archives
at https://archive.midrange.com/rpg400-l.

Please contact support@xxxxxxxxxxxxxxxxxxxx for any subscription
related questions.

Help support midrange.com by shopping at amazon.com with our affiliate
link: https://amazon.midrange.com

This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.
