Re: SQL Question - sampling matching rows -- MIDRANGE-L

In that case I was just going to suggest the same prior message thread already referenced by Dennis; omitted from my message, pending clarification. I had penned but not sent another message in this thread, in part to clarify additionally, the originally noted "all six record" from the parent as an apparent requirement:

"That solution will provide fewer than X rows and no rows for a parent value when there are less than X or zero rows in the child. If the intent is sampling to reflect in some way the actual data, then the given query against just the child table is probably reasonable and desirable [albeit giving a falsely even-skewed distribution of data in the row counts, versus proportional sampling]. If the originally stated "all records" of the parent file is absolutely required, then a left outer join would be used in the CTE instead, to produce the necessary row with the NULL value for the child row."

If a minimum number of X rows [more than one] or proportionally reflecting the child data was required for each parent record, then probably the UDTF idea noted in the same referenced message thread would be more appropriate to generate some valid rows [i.e. to /create rows/ versus, per the inability to effect, the selecting\sampling of actual rows from the child table].

Regards, Chuck

Neill Harper wrote:

This is exactly correct, it is acceptable for the number
of rows returned per key code to be less than x, if there
are not x rows to return.

CRPence wrote:

Neill Harper wrote:

I had a requirement in SQL today that I could not find
an elegant solution to. <<SNIP>>

Do the following "how to?" questions reflect what is desired:?
. "SELECT two random child rows for each parent row"
. "Produce a /sample/ set of matching rows; i.e. data
sampling"

Is the following a valid rewrite\restatement of the scenario?:

Given:

CREATE TABLE A ( KEYA CHAR(10) NOT NULL
, PRIMARY KEY (KEYA) )
INSERT INTO A VALUES
('A'),('B'),('C'),('D'),('E'),('F')
CREATE TABLE B ( FKEYB CHAR(10) NOT NULL
, JUNK INT
, FOREIGN KEY (FKEYB) REFERENCES A (KEYA) )
INSERT INTO B VALUES
('A',1),('A',2),('A',3),('A',4)
,('B',1),('B',2),('B',3),('B',4)
,('C',1),('C',2),('C',3),('C',4)
,('D',1),('D',2),('D',3),('D',4)
,('E',1),('E',2),('E',3),('E',4)
,('F',1),('F',2),('F',3),('F',4)

Desired:

In one SQL statement [if possible], I would like....
For all rows in file A, to return X number of child
rows from file B where the key relationship matches
(preferably random, but not an absolute requirement).

For example, the SQL request to CALL RANDOM_B(X)
where X=2, should generate a result set like;
i.e. two random child rows, for each parent row:
('A',1),('A',4)
('B',2),('B',3)
('C',1),('C',2)
('D',3),('D',4)
('E',2),('E',4)
('F',1),('F',2)

If that describes the scenario, how important is the X rows
per matching parent key; i.e. can fewer than X be allowed,
especially in cases where there are not even X rows from which
to choose?