Re: Awk script running in QShell -- MIDRANGE-L

On 20-Jul-2016 11:32 -0500, Fuchs, James M wrote:

On Wednesday, July 20, 2016 11:02 AM CRPence wrote:

On 20-Jul-2016 10:07 -0500, Fuchs, James M wrote:

I am at a loss. Have an Awk script that I need to run in the
QSH/QShell environment but it will not run to completion. The
AWK script runs without issue if I run it on a PC but when I run
it on the AS400 in QShell it only processes/recognizes the first
pattern match.
<<SNIP>>

Verify that the /special-characters/ were properly translated into
the expected hex code point of the EBCDIC CCSID <<SNIP>>

The script and input files are in the IFS and are ASCII coded files,
CCSID is 1252

When I wrote the reply quoted above, I for some reason incorrectly recalled awk as being a QSH utility rather than one that runs in PASE. The files should therefore be ASCII, and for files that came from Win, the CCSID 1252 is presumably correct; even so, the DMP provides the hex data for verifying.
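As an alternative way to eyeball the same hex data from a shell, the following is a minimal sketch that assumes the standard od, tr, and wc utilities are available; the file name sample_crlf.edi and its two-record content are hypothetical stand-ins, not the actual input file. It dumps the bytes so that any \r ahead of each \n is visible, and counts the CR bytes.

```shell
# Hypothetical sample: two tilde-terminated records with Windows CRLF line ends.
printf 'IK5*A~\r\nAK9*A*1*1*1~\r\n' > sample_crlf.edi

# Character dump; a CRLF line end shows up as \r followed by \n.
od -c sample_crlf.edi

# Count the CR bytes; zero would mean the file has LF-only line ends.
cr_count=$(tr -cd '\r' < sample_crlf.edi | wc -c)
echo "CR bytes found: $cr_count"
```

A nonzero count suggests the file still carries the Win line ends and that each record after the first will begin with CR and LF when the tilde alone is the RS.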

As already clarified in other followup messages, the IBM i runs the AIX equivalent of awk, via PASE [http://archive.midrange.com/java400-l/201607/msg00017.html], so awk is going to default to an expectation that the stream files will have the LineFeed (LF) as the end of record (EOR) delimiter. The awk script being run is intended to deal with alternate EORs in the data from the input file; yet that code fails to handle the situation, because of a dependence on the Record Separator (RS) value supporting an awk-regexp, which is a feature that the AIX version of awk does not support. When the code uses the tilde (~) character as RS against the sample data, the LF [or CRLF] that follows each tilde is treated as the first [and second] character(s) of the next record; in addition, the Field Separator (FS) implicitly always understands \n to be a separator. Neither issue is handled in the original script. I will reply in a moment with a proposed revision, but to better see the effect, as just described:

Running the following script against the data given in an earlier followup [http://archive.midrange.com/midrange-l/201607/msg00431.html] helps to show the effect, per the output that follows the script. For all but the first record, the original\unchanged (Orig:) record would, when passed to print(), appear to start on a new line, because the CR and\or the LF is not trimmed; the control character(s) end up becoming the first character(s) of the next /record/ of input. Those unexpected characters also cause the tokenized field values to be unexpected\corrupted when running the original script. This script removes those control characters and shows the changed (Chgd:) record:

Script as file awktestscript:

BEGIN {
    # RS="~\r\n|~\n"; # futile assignment; RS functions as set next:
    RS="~";  # tilde set as RS, despite desire to handle ~\r\n|~\n
    FS="*";
}
{
    print("NR: " NR);
}
$0 !~ /\n/ {               # rcd holds no LF at all, e.g. the first rcd
    print("Orig: " $0);
}
/^\n/ {
    sub(/^\n/, "");        # LTrim LF from BOL
    print("Orig: ␊" $0);   # original rcd; ␊ stands in for the leading \n
    print("Chgd: " $0);
}
/^\r\n/ {
    sub(/^\r\n/, "");      # LTrim CRLF from BOL
    print("Orig: ␍␊" $0);  # original rcd; ␍␊ stands in for the leading \r\n
    print("Chgd: " $0);
}

In QSH, awk is invoked with the script named awktestscript and with the input file named awkinputcrlf; that file holds the sample data as if created on Win and transmitted from Win [thus including <CRLF>]:

awk -f awktestscript awkinputcrlf
NR: 1
Orig: ISA*00* *00* *ZZ*10301 *ZZ*TN001988 *151204*1217*^*00501*000000001*0*P*:
NR: 2
Orig: ␍␊GS*FA*10301*TN001988*20151204*121752*1*X*005010X231A1
Chgd: GS*FA*10301*TN001988*20151204*121752*1*X*005010X231A1
NR: 3
Orig: ␍␊ST*999*0001*005010X231A1
Chgd: ST*999*0001*005010X231A1
NR: 4
Orig: ␍␊AK1*HC*11723001*005010X223A2
Chgd: AK1*HC*11723001*005010X223A2
NR: 5
Orig: ␍␊AK2*837*011723001*005010X223A2
Chgd: AK2*837*011723001*005010X223A2
NR: 6
Orig: ␍␊IK5*A
Chgd: IK5*A
NR: 7
Orig: ␍␊AK9*A*1*1*1
Chgd: AK9*A*1*1*1
NR: 8
Orig: ␍␊SE*6*0001
Chgd: SE*6*0001
NR: 9
Orig: ␍␊GE*1*1
Chgd: GE*1*1
NR: 10
Orig: ␍␊IEA*1*000000001
Chgd: IEA*1*000000001
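
The trimming shown above can also be folded into ordinary record processing. The following is only a minimal sketch of one portable approach, not necessarily the revision proposed in the followup; the file name sample_crlf.edi and its three-record content are hypothetical. It keeps RS as the single tilde character, trims a leading CR and then a leading LF from each record, and skips the empty remainder that can follow the last tilde, so the same code should cope with either LF or CRLF line ends.

```shell
# Hypothetical sample: three tilde-terminated records with CRLF line ends.
printf 'ST*999*0001~\r\nAK1*HC*11723001~\r\nSE*6*0001~\r\n' > sample_crlf.edi

# RS is one character, so no regexp support for RS is required.
result=$(awk 'BEGIN { RS = "~"; FS = "*" }
{
    sub(/^\r/, "")          # LTrim CR, if any, from BOL
    sub(/^\n/, "")          # LTrim LF, if any, from BOL
    if ($0 == "") next      # skip the empty remainder after the last tilde
    print NR ": " $1 " has " NF " fields"
}' sample_crlf.edi)

printf '%s\n' "$result"
```

Because sub() changes $0, awk splits the record into fields again after the trim, so $1 and NF reflect the cleaned record rather than the one that began with the control characters.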