On 20-Jul-2016 11:32 -0500, Fuchs, James M wrote:
On Wednesday, July 20, 2016 11:02 AM CRPence wrote:
On 20-Jul-2016 10:07 -0500, Fuchs, James M wrote:
I am at a loss. Have an Awk script that I need to run in the
QSH/QShell environment but it will not run to completion. The
AWK script runs without issue if I run it on a PC but when I run
it on the AS400 in QShell it only processes/recognizes the first
pattern match.
<<SNIP>>
Verify that the /special-characters/ were properly translated into
the expected hex code point of the EBCDIC CCSID <<SNIP>>
The script and input files are in the IFS and are ASCII coded files,
CCSID is 1252
When I wrote the reply quoted above, I for some reason incorrectly
recalled awk as being a QSH utility rather than one running in PASE.
The files should be ASCII and, coming from Win, the 1252 is presumably
correct; even so, the DMP provides the hex data for verifying.
As already clarified in other followup messages, the IBM i runs the
AIX equivalent of awk, via PASE
[http://archive.midrange.com/java400-l/201607/msg00017.html], and so
defaults to an expectation that the stream files will have the LineFeed
(LF) as the end of record (EOR) delimiter for the script. Nevertheless,
the awk script being run is intended to deal with alternate EORs in the
data from the input file; yet that code fails to handle the situation,
because it depends on the Record Separator (RS) value supporting an
awk-regexp, which is a feature that the AIX version of awk does not
support. When the code uses the tilde (~) character as RS and is run
against the sample data, the LF [or CRLF] that follows the tilde is
treated as the first [and second] character(s) of the next record; plus,
the Field Separator (FS) implicitly always understands \n to be a
separator. These issues are not handled in the original script. I will
reply in a moment with a proposed revision, but to better see the
effect, as just described:
Running the following script against the data given in an earlier
followup [http://archive.midrange.com/midrange-l/201607/msg00431.html]
helps to show, with the output following that script, how the
original\unchanged (Orig:) records, for all but the first, would appear
on a new line when passed to print(), because the CR and\or the LF is
not trimmed; the control character(s) end up becoming the first
character(s) of the next /record/ of input. Those unexpected characters
also cause the tokenized field values to be unexpected\corrupted when
running the original script. This script removes those control
characters and shows the changed (Chgd:) record:
Script as file awktestscript:
BEGIN {
  # RS="~\r\n|~\n"; # futile assignment; RS functions as set next:
  RS="~"; # tilde set as RS, despite desire to handle ~\r\n|~\n
  FS="*";
}
{
  print("NR: " NR );
}
$0 !~ /\n/ {
  print("Orig: " $0);
}
/^\n/ {
  sub(/^\n/, ""); # LTrim LF from BOL
  print("Orig: ␊" $0); # show original rcd prefixed with \n
  print("Chgd: " $0);
}
/^\r\n/ {
  sub(/^\r\n/, ""); # LTrim CRLF from BOL
  print("Orig: ␍␊" $0); # show original rcd prefixed with \r\n
  print("Chgd: " $0);
}
In QSH, the invocation of awk with the script named awktestscript,
naming as the input file awkinputcrlf, which holds the sample data as
if created on Win and transmitted from Win [thus including <CRLF>]:
awk -f awktestscript awkinputcrlf
NR: 1
Orig: ISA*00* *00* *ZZ*10301 *ZZ*TN001988
*151204*1217*^*00501*000000001*0*P*:
NR: 2
Orig: ␍␊GS*FA*10301*TN001988*20151204*121752*1*X*005010X231A1
Chgd: GS*FA*10301*TN001988*20151204*121752*1*X*005010X231A1
NR: 3
Orig: ␍␊ST*999*0001*005010X231A1
Chgd: ST*999*0001*005010X231A1
NR: 4
Orig: ␍␊AK1*HC*11723001*005010X223A2
Chgd: AK1*HC*11723001*005010X223A2
NR: 5
Orig: ␍␊AK2*837*011723001*005010X223A2
Chgd: AK2*837*011723001*005010X223A2
NR: 6
Orig: ␍␊IK5*A
Chgd: IK5*A
NR: 7
Orig: ␍␊AK9*A*1*1*1
Chgd: AK9*A*1*1*1
NR: 8
Orig: ␍␊SE*6*0001
Chgd: SE*6*0001
NR: 9
Orig: ␍␊GE*1*1
Chgd: GE*1*1
NR: 10
Orig: ␍␊IEA*1*000000001
Chgd: IEA*1*000000001
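The trimming shown above can also be folded into a single rule. What
follows is only a minimal sketch, not necessarily the revision mentioned
earlier; it assumes nothing beyond a single-character RS, which is what
the AIX awk honors. The function name split_segments and the summary
line it prints are hypothetical, chosen just for illustration:

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative only): split the input on
# the tilde and drop any CR and/or LF left at the start of each record.
# Only a single-character RS is used, so no regexp support for RS is
# assumed.
split_segments() {
    awk '
    BEGIN { RS = "~"; FS = "*" }
    {
        sub(/^\r/, "")       # drop a leading CR, if any
        sub(/^\n/, "")       # drop a leading LF, if any
        if ($0 == "") next   # skip the empty remainder after the last tilde
        # changing $0 re-splits the fields, so $1 and NF are now clean
        print NR ": " $1 " (" NF " fields)"
    }' "$@"
}

# Usage, with the file name from the invocation above:
#   split_segments awkinputcrlf
```

Because the two sub() calls are applied to every record, the same rule
covers input whose segments end with ~<CRLF>, with ~<LF>, or with the
tilde alone.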