On Tue, Mar 27, 2018 at 6:06 PM, B. Dietz <bdietz400@xxxxxxxxx> wrote:
> maybe Scotts CSVUTIL would work.
> https://www.scottklement.com/csv/
> would allow you to resume picking your columns.
As one would expect, his code is quite good. But his logic doesn't
*quite* parse CSV data the way Excel does for certain pathological
corner cases. I don't think Excel will create CSV files that Scott's
logic won't handle, but one can construct CSV files which Excel parses
differently than Scott's logic.
In particular, Scott's utility (according to the verbal description of
the logic; I haven't actually run the code) enters the QUOTED state
after encountering the first quote character anywhere in the field.
Excel only enters the QUOTED state if the *first* character in the
field is a quote. If a field contains a quote anywhere else, and
that's the first time a quote appears in the field, Excel treats it as
a literal quote (neither an escape character nor a delimiter). So, for
example, if the CSV contains the following line:
foo",bar"
Scott's logic says this is one field, which contains the 7-character
string 'foo,bar'. Excel parses that CSV as two fields, each containing
a 4-character string with trailing literal quote: 'foo"' and 'bar"'.
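
To make the difference concrete, here is a minimal Python sketch of the
two rules. It is my own toy model, not Scott's code and not Excel's
implementation: the function name split_fields and the quote_anywhere
flag are made up for illustration, and the "quote anywhere" branch is
only my reading of the verbal description.

```python
def split_fields(line, quote_anywhere):
    """Split one CSV line into fields (toy model, hypothetical helper).

    quote_anywhere=True : any quote outside the QUOTED state enters it
                          (the rule as I read the verbal description).
    quote_anywhere=False: a quote enters the QUOTED state only when it is
                          the first character of the field (Excel-like).
    """
    fields, buf = [], []
    quoted = False
    at_start = True  # True while no character of the current field was seen
    i = 0
    while i < len(line):
        ch = line[i]
        if quoted:
            if ch == '"':
                if i + 1 < len(line) and line[i + 1] == '"':
                    buf.append('"')  # doubled quote inside quotes: literal quote
                    i += 1           # skip the second quote of the pair
                else:
                    quoted = False   # closing quote
            else:
                buf.append(ch)       # commas are ordinary data while quoted
        elif ch == '"' and (quote_anywhere or at_start):
            quoted = True            # opening quote
        elif ch == ',':
            fields.append(''.join(buf))
            buf = []
            at_start = True          # next character starts a new field
            i += 1
            continue
        else:
            buf.append(ch)           # includes a mid-field quote in Excel-like mode
        at_start = False
        i += 1
    fields.append(''.join(buf))
    return fields


print(split_fields('foo",bar"', quote_anywhere=True))   # ['foo,bar']
print(split_fields('foo",bar"', quote_anywhere=False))  # ['foo"', 'bar"']
```

Both modes agree on well-formed input such as "foo,bar"; they only
diverge when a quote first shows up in the middle of a field.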
Excel would never generate such a CSV itself. Faced with the
7-character string 'foo,bar', Excel would generate
"foo,bar"
And the two 4-character strings 'foo"' and 'bar"' would be saved by Excel as
"foo""","bar"""
I cannot stress enough that for practical purposes, I don't think you
will run into any problems with Scott's code. I am just being
academic. (I will note, though, that Python's standard csv module does
follow Excel's logic by default, and is configurable to some extent.)
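
For anyone who wants to check the Python side, a quick sketch using
only the standard csv module with its default "excel" dialect. The
helpers parse_line and write_line are names I made up for this
example; this shows what Python does, not Excel itself.

```python
import csv
import io


def parse_line(line, **fmtparams):
    """Parse a single CSV line with csv.reader and return its fields."""
    return next(csv.reader([line], **fmtparams))


def write_line(fields, **fmtparams):
    """Write one row with csv.writer and return it without the line ending."""
    buf = io.StringIO()
    csv.writer(buf, **fmtparams).writerow(fields)
    return buf.getvalue().rstrip("\r\n")


# A quote that is not the first character of the field is kept literally.
print(parse_line('foo",bar"'))        # ['foo"', 'bar"']

# Writing quotes a field that contains a comma or a quote,
# and doubles any embedded quotes.
print(write_line(["foo,bar"]))        # "foo,bar"
print(write_line(['foo"', 'bar"']))   # "foo""","bar"""

# One example of the configurability: a different quote character.
print(parse_line("'foo,bar'", quotechar="'"))  # ['foo,bar']
```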
John Y.