On 21-Jul-2016 13:09 -0500, CRPence wrote:
<<SNIP>> I infer the primary problem is with the value being
assigned to the Record Separator (RS) variable in\by the script, or
in the way the awk utility is [not] handling what gets assigned for
that value [as contrasted with the results seen by the OP on a PC].
Unsure why the string of '~|~\n|~\r\n' is not a valid\functional
awk-regexp for denoting three possible values for EOL when
establishing a value for $0 on either Mac or IBMi, but that seems to
be an [if not the] issue. <<SNIP>>
Reading the docs on the Record Separator (RS) /special variable/ in
an AIX manual, I determined that, as might be expected, the RS does not
support an extended regular expression
[https://www.ibm.com/support/knowledgecenter/ssw_aix_61/com.ibm.aix.cmds1/awk.htm#awk__a10498fe],
whilst the Field Separator (FS) /special variable/ apparently does. And
further, that the RS apparently accepts only the first character; i.e.
RS="~\r\n|~\n"; is no different than RS="~";
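A quick way to see which behavior a given awk has might look like the following. This is only a sketch: rs_mode is a hypothetical helper name, the two-line input is made up, and the two labels it prints are my own wording. Some awk implementations do accept a regexp for RS, so the answer can differ from one system to the next.

```shell
# rs_mode: hypothetical helper reporting how the local awk treats a
# multi-character RS. The input 'a~\nb\n' is invented for the test.
rs_mode() {
    printf 'a~\nb\n' | awk '
        BEGIN { RS = "~\n|~" }
        # If RS is honoured as a regexp, record 2 starts with "b".
        # If only the first character "~" is used, record 2 starts with LF.
        NR == 2 {
            if (substr($0, 1, 1) == "\n")
                print "first-char-only"
            else
                print "regexp"
        }
    '
}

rs_mode
```

If this prints first-char-only, then the alternation assigned to RS is effectively being reduced to RS="~" on that system.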
Thus for the script [http://pastebin.com/qepNCBsG] and data
[http://archive.midrange.com/midrange-l/201607/msg00431.html] given by
the OP, the issue seems to be that when RS='~', the /records/ [i.e. the
value of $0 upon the read] are going to include all of the line data up
to the tilde character, but excluding the tilde. Thus the CRLF [\r\n] or
the LF [\n] characters [as the record delimiter of the file] that follow
the tilde [which serves as the record separator for the data] will
become the first character(s) of the next line\record of input defined
by $0. The exact same issue occurs for awk on my Mac; see
[http://archive.midrange.com/midrange-l/201607/msg00521.html].
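The effect described above can be reproduced with a couple of made-up lines; the sample data here is invented for illustration and is not the OP's file, and leading_char is a hypothetical helper name. With RS set to the single tilde, the second record begins with whatever end-of-line characters followed the first tilde:

```shell
# leading_char: hypothetical helper naming the first character of the
# second record when awk splits its input on the tilde alone.
leading_char() {
    awk 'BEGIN { RS = "~" }
         NR == 2 {
             c = substr($0, 1, 1)
             if (c == "\r")      print "CR"
             else if (c == "\n") print "LF"
             else                print "other"
         }'
}

# LF-delimited file: record 2 is "\nGS*HC"
printf 'ISA*00~\nGS*HC~\n'   | leading_char
# CRLF-delimited file: record 2 is "\r\nGS*HC"
printf 'ISA*00~\r\nGS*HC~\n' | leading_char
```

In either case a pattern such as /^GS/ would no longer match at the start of $0 for that record, because the EOL characters now sit in front of the data.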
Understanding that is the key to solving the problem. And while not
intending to imply that the following is a _good_ or even a proper means
of resolution, the changes were successful in my tests: they produced
the expected results both with the given test data and with that test
data changed to have the tildes removed, and did so for both of those
data variants with either CRLF or just LF as the record delimiter for
the input file.
Changing the awk script from [http://pastebin.com/qepNCBsG] to add the
following lines of comments\code [which also remove the CR from a CRLF
end-of-record] enables the script to be functional on both my Mac and on
the IBM i, irrespective of the data file having either CRLF or just LF.
I expect the following revisions to the awk script, which change it to
depend on a Dynamic Regexp [assigned to the variable named RSvar] for
truncating the record separator characters, should resolve the issue
somewhat generally. However I did not test with blank lines, and that is
something the original script might handle on Windows:
The changes per diff:
$ diff awkscript.orig.txt awkscript.fix.txt
12d11
< RS = sep "|" sep "\n|" sep "\r\n";
13a13,14
> # RS = "\n"; defaults, so RSvar need not handle this case
> RSvar = sep "$|" sep "\\r$" ; # EOR=( ?\n | ?\r\n ) where sep=?
15c16,17
< RS = "~|~\n|~\r\n|\n|\r\n|\32+"; # Input Record Separator
---
> # RS = "\n"; defaults, so RSvar need not handle this case
> RSvar = "~$|~\\r$|\r$" ; # EOR=( ~\n | ~\r\n | \r\n )
171a174,179
>
> $0 ~ RSvar {
> gsub(RSvar, ""); # strip EOL [and \r] from end of $0, if exists
> # print("$0: " $0);
> }
>
Or, to see where the changes go in context [in case the line numbers
from the diff are not an exact match], some lines, otherwise unchanged,
were given the comment "# *reference only; unchanged*". This is merely
to help identify them as the lines to be located; the code changes are
inserted between them, and the lines of code that were between them
prior to the editing are removed:
if (ARGC > 3) { # *reference only; unchanged*
    sep = ARGV[3]; # *reference only; unchanged*
    ### following two lines added ###
    # RS = "\n"; defaults, so RSvar need not handle this case
    RSvar = sep "$|" sep "\\r$" ; # EOR=( ?\n | ?\r\n ) where sep=?
    ### previous two lines added ###
    ARGV[3] = ""; # *reference only; unchanged*
} else { # *reference only; unchanged*
    ### following two lines added ###
    # RS = "\n"; defaults, so RSvar need not handle this case
    RSvar = "~$|~\\r$|\r$" ; # EOR=( ~\n | ~\r\n | \r\n )
    ### previous two lines added ###
} # *reference only; unchanged*
…
PayerName["00000"] = "Unknown"; # *reference only; unchanged*
} # *reference only; unchanged*
### following six lines added ###
$0 ~ RSvar {
    gsub(RSvar, ""); # strip EOL [and \r] from end of $0, if exists
    # print("$0: " $0);
}
### previous six lines added ###
/^ISA/ { # *reference only; unchanged*
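To try the added rule in isolation, something like the following might serve. It is a minimal sketch: strip_eor is a hypothetical wrapper name, the four sample lines are invented, and only the RSvar assignment and the gsub rule are taken from the changes above; the rest of the OP's script is left out.

```shell
# strip_eor: hypothetical wrapper around just the added rule. RS stays at
# its default of "\n", so awk removes the LF itself and RSvar only has to
# strip a trailing "~", "~\r" or "\r" from each record.
strip_eor() {
    awk 'BEGIN { RSvar = "~$|~\\r$|\r$" }   # value from the no-argument branch
         $0 ~ RSvar { gsub(RSvar, "") }     # same rule as added to the script
         { print }'
}

# Invented sample mixing the variants: "~\r\n", "~\n", "\r\n" and "\n"
printf 'ISA*00~\r\nGS*HC~\nST*835\r\nSE*1\n' | strip_eor
```

Each record should come out with neither a tilde nor a CR at its end, whichever of the line endings the input used. Note that in the branch where sep comes from ARGV[3], the separator is concatenated into the regexp as-is, so a separator that happens to be a regexp metacharacter [e.g. '*' or '|'] would presumably need escaping first; I have not tested that case.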