Re: Awk script running in QShell -- MIDRANGE-L

On 21-Jul-2016 13:09 -0500, CRPence wrote:

<<SNIP>> I infer the primary problem is with the value being
assigned to the Record Separator (RS) variable in\by the script, or
in the way the awk utility is [not] handling what gets assigned for
that value [as contrasted with the results seen by the OP on a PC].
Unsure why the string of '~|~\n|~\r\n' is not a valid\functional
awk-regexp for denoting three possible values for EOL when
establishing a value for $0 on either Mac or IBMi, but that seems to
be an [if not the] issue. <<SNIP>>

Reading the docs on the Record Separator (RS) /special variable/ in an AIX manual, I determined, as probably should have been expected, that RS does not support the extended regular expression [https://www.ibm.com/support/knowledgecenter/ssw_aix_61/com.ibm.aix.cmds1/awk.htm#awk__a10498fe], whilst the Field Separator (FS) /special variable/ apparently does. And that RS apparently honors only the first character of its value; i.e. RS="~\r\n|~\n"; is no different than RS="~";
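A quick way to check which behavior a given awk exhibits is a small probe such as the following. This is only a sketch: the three-record input and the variable name rs_mode are made up for illustration, and the result depends on the awk implementation [POSIX leaves the effect of a multi-character RS unspecified], so no particular answer is claimed here.

```shell
# Probe: does this awk treat a multi-character RS as a regular expression,
# or does it use only the first character ("~")?
# The input "a~\nb~\nc~\n" and the name rs_mode are illustrative only.
rs_mode=$(printf 'a~\nb~\nc~\n' |
  awk 'BEGIN { RS = "~\r\n|~\n" }
       # With a regexp RS the second record is exactly "b";
       # with RS taken as "~" it is "\nb" instead.
       NR == 2 { print (($0 == "b") ? "regex" : "first character only") }
       # If RS never matched at all, the whole input was one record.
       END { if (NR < 2) print "neither" }')
printf 'multi-character RS handled as: %s\n' "$rs_mode"
```

An awk that reports "first character only" matches what the AIX documentation describes; one that reports "regex" is the behavior the original script relied upon.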

Thus for the script [http://pastebin.com/qepNCBsG] and data [http://archive.midrange.com/midrange-l/201607/msg00431.html] given by the OP, the issue seems to be that when RS='~', the /records/ [i.e. the value of $0 upon each read] include all of the line data up to, but excluding, the tilde character. Thus the CRLF [\r\n] or the LF [\n] characters [the record delimiter of the file] that follow the tilde [which serves as the record separator for the data] become the first character(s) of the next line\record of input defined by $0. The exact same issue occurs for awk on my Mac; see [http://archive.midrange.com/midrange-l/201607/msg00521.html]
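The effect is easy to reproduce outside the OP's script. The following is a minimal sketch, assuming made-up segment text [not the OP's data] and a hypothetical variable name, leading; it sets RS to a single tilde and reports whether each of the first two records begins with the LF left over from the prior line.

```shell
# Made-up segments, each terminated by "~" and then LF, like the sample data.
# The name "leading" is illustrative only.
leading=$(printf 'ISA*00~\nGS*HC~\n' |
  awk 'BEGIN { RS = "~" }      # single-character RS: records end at each tilde
       NR <= 2 {
         # The LF that followed the previous tilde is now the first character.
         printf "record %d starts with LF: %s\n", NR,
                (substr($0, 1, 1) == "\n" ? "yes" : "no")
       }')
printf '%s\n' "$leading"
```

The first record reports "no" and the second reports "yes"; a pattern such as /^ISA/ or /^GS/ in the script would therefore fail to match every record after the first.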

Understanding that is the key to solving the problem. And while not intending to imply that the following is a _good_ or even a proper means of resolution, the changes were successful in my tests: they produced the expected results both with the given test data and with that test data changed to have the tildes removed, and with either CRLF or just LF as the record delimiter for the input file.

Changing the awk script from [http://pastebin.com/qepNCBsG] to add the following lines of comments\code [and to remove the CR from the CRLF as end-of-record] makes the script functional on both my Mac and the IBM i, irrespective of whether the data file has CRLF or just LF. I expect these revisions, which depend on a Dynamic Regexp [assigned to the variable named RSvar] to truncate the record separator characters, should resolve the issue somewhat generally. However, I did not test with blank lines, and that is something the original script might handle on Windows.

The changes per diff:

$ diff awkscript.orig.txt awkscript.fix.txt
12d11
< RS = sep "|" sep "\n|" sep "\r\n";
13a13,14
> # RS = "\n"; defaults, so RSvar need not handle this case
> RSvar = sep "$|" sep "\\r$" ; # EOR=( ?\n | ?\r\n ) where sep=?
15c16,17
< RS = "~|~\n|~\r\n|\n|\r\n|\32+"; # Input Record Separator
---
> # RS = "\n"; defaults, so RSvar need not handle this case
> RSvar = "~$|~\\r$|\r$" ; # EOR=( ~\n | ~\r\n | \r\n )
171a174,179
>
> $0 ~ RSvar {
> gsub(RSvar, ""); # strip EOL [and \r] from end of $0, if exists
> # print("$0: " $0);
> }
>

Or, to see the changes in context [in case the line numbers from the diff are not an exact match], some otherwise unchanged lines are marked with the comment "# *reference only; unchanged*". Those merely identify the lines to locate; the added code is inserted between them, and any lines of code previously between them are removed:

if (ARGC > 3) { # *reference only; unchanged*
    sep = ARGV[3]; # *reference only; unchanged*
    ### following two lines added ###
    # RS = "\n"; defaults, so RSvar need not handle this case
    RSvar = sep "$|" sep "\\r$" ; # EOR=( ?\n | ?\r\n ) where sep=?
    ### previous two lines added ###
    ARGV[3] = ""; # *reference only; unchanged*
} else { # *reference only; unchanged*
    ### following two lines added ###
    # RS = "\n"; defaults, so RSvar need not handle this case
    RSvar = "~$|~\\r$|\r$" ; # EOR=( ~\n | ~\r\n | \r\n )
    ### previous two lines added ###
} # *reference only; unchanged*

…

PayerName["00000"] = "Unknown"; # *reference only; unchanged*
} # *reference only; unchanged*
### following six lines added ###

$0 ~ RSvar {
    gsub(RSvar, ""); # strip EOL [and \r] from end of $0, if exists
    # print("$0: " $0);
}

### previous six lines added ###
/^ISA/ { # *reference only; unchanged*
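
The stripping technique shown in the excerpts above can also be tried in isolation. What follows is only a sketch under stated assumptions, not the OP's script: the segment text and the shell variable names [strip, crlf, lf] are made up, and the CR is written as "\r" in every alternative of RSvar so that the awk string itself carries the carriage return. RS is left at its default of "\n".

```shell
# Stand-alone sketch of the RSvar approach; RS stays at the default "\n".
# Sample segments and the names strip, crlf and lf are illustrative only.
strip='BEGIN { RSvar = "~$|~\r$|\r$" }      # EOR=( ~\n | ~\r\n | \r\n )
       $0 ~ RSvar { gsub(RSvar, "") }       # drop a trailing "~", "~\r" or "\r"
       { printf "[%s]", $0 }                # brackets make leftover characters visible
       END { print "" }'

# Same two segments, once with CRLF line ends and once with just LF.
crlf=$(printf 'ISA*00~\r\nGS*HC~\r\n' | awk "$strip")
lf=$(printf 'ISA*00~\nGS*HC~\n' | awk "$strip")

printf 'CRLF input: %s\n' "$crlf"
printf 'LF input:   %s\n' "$lf"
```

Both variants print [ISA*00][GS*HC], i.e. neither the tilde nor the CR remains in $0 by the time a pattern like /^ISA/ is evaluated. Blank lines are not exercised here either.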