× The internal search function is temporarily non-functional. The current search engine is no longer viable and we are researching alternatives.
As a stop gap measure, we are using Google's custom search engine service.
If you know of an easy to use, open source, search engine ... please contact support@midrange.com.



Thanks for all of the replies, the previous thread gave me everything I
needed.

CTE and the PARTITION keyword!

-----Original Message-----
From: midrange-l-bounces@xxxxxxxxxxxx
[mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of CRPence
Sent: 07 September 2009 17:33
To: midrange-l@xxxxxxxxxxxx
Subject: Re: SQL Question - sampling matching rows

In that case I was just going to suggest the same prior message
thread already referenced by Dennis; omitted from my message,
pending clarification. I had penned but not sent another message in
this thread, in part to clarify additionally, the originally noted
"all six record" from the parent as an apparent requirement:

"That solution will provide fewer than X rows and no rows for a
parent value when there are less than X or zero rows in the child.
If the intent is sampling to reflect in some way the actual data,
then the given query against just the child table is probably
reasonable and desirable [albeit giving a falsely even-skewed
distribution of data in the row counts, versus proportional
sampling]. If the originally stated "all records" of the parent
file is absolutely required, then a left outer join would be used in
the CTE instead, to produce the necessary row with the NULL value
for the child row."

If a minimum number of X rows [more than one] or proportionally
reflecting the child data was required for each parent record, then
probably the UDTF idea noted in the same referenced message thread
would be more appropriate to generate some valid rows [i.e. to
/create rows/ versus, per the inability to effect, the
selecting\sampling of actual rows from the child table].

Regards, Chuck

Neill Harper wrote:
This is exactly correct, it is acceptable for the number
of rows returned per key code to be less than x, if there
are not x rows to return.

CRPence wrote:

Neill Harper wrote:
I had a requirement in SQL today that I could not find
an elegant solution to. <<SNIP>>

Do the following "how to?" questions reflect what is desired:?
. "SELECT two random child rows for each parent row"
. "Produce a /sample/ set of matching rows; i.e. data
sampling"

Is the following a valid rewrite\restatement of the scenario?:

Given:

CREATE TABLE A ( KEYA CHAR(10) NOT NULL
, PRIMARY KEY (KEYA) )
INSERT INTO A VALUES
('A'),('B'),('C'),('D'),('E'),('F')
CREATE TABLE B ( FKEYB CHAR(10) NOT NULL
, JUNK INT
, FOREIGN KEY (FKEYB) REFERENCES A (KEYA) )
INSERT INTO B VALUES
('A',1),('A',2),('A',3),('A',4)
,('B',1),('B',2),('B',3),('B',4)
,('C',1),('C',2),('C',3),('C',4)
,('D',1),('D',2),('D',3),('D',4)
,('E',1),('E',2),('E',3),('E',4)
,('F',1),('F',2),('F',3),('F',4)

Desired:

In one SQL statement [if possible], I would like....
For all rows in file A, to return X number of child
rows from file B where the key relationship matches
(preferably random, but not an absolute requirement).

For example, the SQL request to CALL RANDOM_B(X)
where X=2, should generate a result set like;
i.e. two random child rows, for each parent row:
('A',1),('A',4)
('B',2),('B',3)
('C',1),('C',2)
('D',3),('D',4)
('E',2),('E',4)
('F',1),('F',2)

If that describes the scenario, how important is the X rows
per matching parent key; i.e. can fewer than X be allowed,
especially in cases where there are not even X rows from which
to choose?


As an Amazon Associate we earn from qualifying purchases.

This thread ...

Replies:

Follow On AppleNews
Return to Archive home page | Return to MIDRANGE.COM home page

This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page. If you have questions about this, please contact [javascript protected email address].

Operating expenses for this site are earned using the Amazon Associate program and Google Adsense.