Subject: RE: can this reject bad data? From: "Needles,Stephen J" Date: Wed, 7 May 2014 20:19:18 +0000 List-archive: List-help: List-id: Midrange Systems Technical Discussion List-post: List-subscribe: , List-unsubscribe: ,

You're right Rob, so I took it upon myself to solve this weird issue. And I wanted to solve this wholly within SQL rather than use RPG.

Here is my solution. I expect that someone has a cleaner method, but I didn't want to burn too much time on it.

Given a similar data set to before, a list of dec(8,0) numbers that are expected to represent dates in YYYYMMDD format.

....+....1.
NBRDTE
20,140,228
20,140,229
20,140,430
20,140,431
20,140,304
20,140,532
20,140,631
19,840,228
19,840,229
19,880,228
19,890,228
20,000,229
21,000,229
16,000,229
17,000,229
16,000,228
16,000,230
17,000,228
17,010,229
24,000,229
22,000,229

Running this SQL will convert those numbers that meet the WHEN criteria in the SQL's CASE logic while leaving nulls if it fails the CASE.

SELECT nbrdte,
case when mod(dec(substr(digits(nbrdte),1,4),4,0),4) = 0 and // no remainder when divide by 4 means Leap Year when...
mod(dec(substr(digits(nbrdte),1,4),4,0),100) <> 0 and // it is NOT a year divisible by 100
((dec(substr(digits(nbrdte),5,2),2,0) = 2 and // the rest of the WHEN is to determine the appropriate
dec(substr(digits(nbrdte),7,2),2,0) <= 29) or // number of days for each month
(dec(substr(digits(nbrdte),5,2),2,0) in (1,3,5,7,8,10,12) and
dec(substr(digits(nbrdte),7,2),2,0) <= 31) or
(dec(substr(digits(nbrdte),5,2),2,0) in (4,6,9,11) and
dec(substr(digits(nbrdte),7,2),2,0) <= 30))
then timestamp_format(digits(nbrdte),'YYYYMMDD') // convert the time using the TIME_FORMAT scalar

when mod(dec(substr(digits(nbrdte),1,4),4,0),100) = 0 and // this is a Century Year and
mod(dec(substr(digits(nbrdte),1,4),4,0),400) <> 0 and // it is NOT evenly divisible by 400 means
((dec(substr(digits(nbrdte),5,2),2,0) = 2 and // NOT
dec(substr(digits(nbrdte),7,2),2,0) <= 28) or //Leap Year
(dec(substr(digits(nbrdte),5,2),2,0) in (1,3,5,7,8,10,12) and
dec(substr(digits(nbrdte),7,2),2,0) <= 31) or
(dec(substr(digits(nbrdte),5,2),2,0) in (4,6,9,11) and
dec(substr(digits(nbrdte),7,2),2,0) <= 30))
then timestamp_format(digits(nbrdte),'YYYYMMDD') // convert the time using the TIME_FORMAT scalar

when mod(dec(substr(digits(nbrdte),1,4),4,0),4) <> 0 and // year not divisible by 4 and
mod(dec(substr(digits(nbrdte),1,4),4,0),100) <> 0 and // not a century year (divisible by 100) means
((dec(substr(digits(nbrdte),5,2),2,0) = 2 and // NOT
dec(substr(digits(nbrdte),7,2),2,0) <= 28) or // Leap Year
(dec(substr(digits(nbrdte),5,2),2,0) in (1,3,5,7,8,10,12) and
dec(substr(digits(nbrdte),7,2),2,0) <= 31) or
(dec(substr(digits(nbrdte),5,2),2,0) in (4,6,9,11) and
dec(substr(digits(nbrdte),7,2),2,0) <= 30))
then timestamp_format(digits(nbrdte),'YYYYMMDD') // convert the time using the TIME_FORMAT scalar

when mod(dec(substr(digits(nbrdte),1,4),4,0),100) = 0 and // Year is Century Year AND IS
mod(dec(substr(digits(nbrdte),1,4),4,0),400) = 0 and // divisible by 400 means
((dec(substr(digits(nbrdte),5,2),2,0) = 2 and
dec(substr(digits(nbrdte),7,2),2,0) <= 29) or // Leap Year
(dec(substr(digits(nbrdte),5,2),2,0) in (1,3,5,7,8,10,12) and
dec(substr(digits(nbrdte),7,2),2,0) <= 31) or
(dec(substr(digits(nbrdte),5,2),2,0) in (4,6,9,11) and
dec(substr(digits(nbrdte),7,2),2,0) <= 30))
then timestamp_format(digits(nbrdte),'YYYYMMDD') // convert the time using the TIME_FORMAT scalar
// else leave as NULL
end as Converted_Date
FROM needles.fakedates
order by 2

This will produce a result that can be sorted or processed to identify those rows that failed by looking for the Converted_Date values as NULLs

NBRDTE CONVERTED_DATE
16000228 1600-02-28 00:00:00.000000
16000229 1600-02-29 00:00:00.000000
17000228 1700-02-28 00:00:00.000000
19840228 1984-02-28 00:00:00.000000
19840229 1984-02-29 00:00:00.000000
19880228 1988-02-28 00:00:00.000000
19890228 1989-02-28 00:00:00.000000
20000229 2000-02-29 00:00:00.000000
20140228 2014-02-28 00:00:00.000000
20140304 2014-03-04 00:00:00.000000
20140430 2014-04-30 00:00:00.000000
24000229 2400-02-29 00:00:00.000000
20140229 -
20140431 -
20140532 -
20140631 -
21000229 -
17000229 -
16000230 -
17010229 -
22000229 -

Probably not as clean as idate (didn't look at it), but it does the job. I think that I'll wrap this in a UDF.

Steve Needles

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of rob@xxxxxxxxx
Sent: Tuesday, May 06, 2014 1:30 PM
To: Midrange Systems Technical Discussion
Subject: RE: can this reject bad data?

Now assume you have two million rows and only three invalid dates scattered throughout the table.
How do you find those?

Any attempt to do an
ORDER BY 2 to get the invalid dates to all bubble up or down will fail in an error:
QRY2283 - Selection error involving field *N.

Tried the 'easy' way before using something I long ago downloaded, idate.

Rob Berendt
--
IBM Certified System Administrator - IBM i 6.1 Group Dekko Dept 1600 Mail to: 2505 Dekko Drive
Garrett, IN 46738
Ship to: Dock 108
6928N 400E
Kendallville, IN 46755
http://www.dekko.com

From: "Needles,Stephen J" <SNEEDLES@xxxxxxxxxxxxxxxx>
To: Midrange Systems Technical Discussion <midrange-l@xxxxxxxxxxxx>
Date: 05/06/2014 02:10 PM
Subject: RE: can this reject bad data?
Sent by: "MIDRANGE-L" <midrange-l-bounces@xxxxxxxxxxxx>

Assume a table with date data stored as numbers defined as 8,0.

NBRDTE
20,140,228
20,140,229
20,140,430
20,140,431
20,140,304

In a 6.1 or newer box, you can use "timestamp_format" to convert these
numbers into dates, if they are valid dates.

Running this SQL will convert those numbers that can be translated into
dates and fail to convert the others.

SELECT nbrdte,
timestamp_format(digits(nbrdte),'YYYYMMDD')
FROM fakedates

NBRDTE TIMESTAMP_FORMAT
20,140,228 2014-02-28-00.00.00.000000
20,140,229 ++++++++++++++++++++++++++
20,140,430 2014-04-30-00.00.00.000000
20,140,431 ++++++++++++++++++++++++++
20,140,304 2014-03-04-00.00.00.000000

So you could use this to test the numbers entered as representing dates or
to identify those numbers that are not correct representations of dates.

Steve Needles

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of
Hoteltravelfundotcom
Sent: Tuesday, May 06, 2014 10:26 AM
To: Midrange Systems Technical Discussion
Subject: can this reject bad data?

There is some bad data in our system. there is a lack of

validate checking. in this case, can this sql reject for processing

any such bad dates? bad dates are defined as month > 12 and days
accordinlgy

SELECT
ALL T01.UAUSRN,
SUBSTR(DIGITS(UAFLWD),5,2) CONCAT '/' CONCAT
SUBSTR(DIGITS(UAFLWD)
,7,2) CONCAT '/' CONCAT SUBSTR(DIGITS(UAFLWD),3,2) AS
FOLLOWUP,
T01.UAFLWD
FROM ASTDTA.NOTEHDUA T01 INNER JOIN
WHERE UAFLWD BETWEEN 20000101 AND 20991231
AND UAPRGD < 1
AND UANOTT = 'E'
ORDER BY 001 ASC, 008 ASC
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing
list To post a message email: MIDRANGE-L@xxxxxxxxxxxx To subscribe,
unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx Before posting, please take a
moment to review the archives at http://archive.midrange.com/midrange-l.

________________________________
This communication, including attachments, is confidential, may be subject
to legal privileges, and is intended for the sole use of the addressee.
Any use, duplication, disclosure or dissemination of this communication,
other than by the addressee, is prohibited. If you have received this
communication in error, please notify the sender immediately and delete or
destroy this communication and all copies.

TRVDiscDefault::1201