Hi Jerry,
> Occasionally I'm sent large amounts of text (i.e. surgical pathology
> and/or DNA reports). Recently a hospital was trying to send me a report
> that was over 32k.
I don't understand why 32k is an issue for you. What does your code do
with the data? Why does it matter if the data is larger than 32k?
After reading your message over a few times, I can only assume that you
are either referring to RPG's 64k variable limit in V5R4 and earlier, or
you are referring to QDCXLATE's 32702-byte limit (which isn't even 32k)
on its parameter size. Since QDCXLATE isn't a great tool anyway, and
since you already know you can call it in 32k chunks, I'll assume you
really meant to ask about 64k instead of 32k.
> As time goes by, I suspect these types of reports and the amount of data
> they include will become larger and more prevalent, and I'm trying to
> prepare for that eventuality.
So write your code so that it doesn't have a limit to the size it'll
receive... Personally, I have code that can receive terabytes of data
successfully. (Though, I must admit I haven't tested it out that
high... so I guess at this point it's just a theoretical limit... but
still... I've tested it waaaaay higher than 32k... indeed, I'm always
careful to test it higher than 4.2 GB to make sure I'm not using a 10u 0
improperly somewhere.)
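In C terms, that last caution is just this (a trivial sketch, but the
same idea applies in any language):

    /* a 4-byte unsigned counter (RPG's 10u 0) wraps at about
       4.29 billion; use a 64-bit type for running byte totals */
    unsigned long long totalBytes = 0;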
> Is there any way to trap/detect if a data stream is larger than what
> the socket's input field can handle?
Since you mention a stream, I'll assume you're referring to a stream
socket (SOCK_STREAM) like TCP, not a datagram or raw socket.
I guess you don't understand the concept of a stream. The data is
flowing down a stream, where each byte is like a car on a train. Ugh, I
don't know if these analogies are doing any good...
You seem to be thinking in terms of a buffer. One program writes data
into a buffer (like a data area or user space) and another program reads
that data out. Then the process continues in a loop. (That's not how a
stream works -- but you seem to be thinking that way.) You seem to be
thinking that if the sender writes 70k of data, then the receiver must
read 70k of data, or lose the data that he didn't read. That's not true
at all. That's not how a stream works.
Think of it like a file. When you write to a file, each record is
limited to 32766 bytes at the most, right? That doesn't mean the WHOLE
file is limited to 32k. You can write more than one record... except
that a stream isn't organized into records... err... perhaps you can
think of it as 1 byte per record... ugh, this example is probably too
confusing...
Okay, in simpler terms, just think of it as a queue. A queue of bytes.
Like people lined up to go on the roller coaster at your local
amusement park. Each time a new car comes down the track, it takes
people out of the line and gives them a ride. The people who didn't
fit will wait for the next car. You might call this a "continuous
stream of people" or a "queue of people". You don't have to have a
car/train on the roller coaster that's large enough for everyone in
line to get on at once. If it can only take 17 people at a time, and
there are 340 people in line, that just means it'll take 20 trips to
get all of the people. But you'll get them all in the end... While
those people are waiting, more people can be added to the end of the
line. It doesn't matter; the ride will get to those new people
eventually.
The stream socket is the same way. When you recv() 32k from the socket,
it doesn't have to be everything. You can read 32k, then read another
32k, then another... each time it takes THAT many bytes off of the
stream (if there are that many available at the moment you call recv).
The remaining bytes in the stream will wait and be returned the /next/
time you call recv. Meanwhile, the sender is free to keep adding more
bytes to the stream; he doesn't have to wait for you to clear all of
the data out.
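Here's what that loop looks like in C. (Just a sketch: "sock" is
assumed to be an already-connected stream socket, and knowing the total
length up front is an assumption about your protocol, not something TCP
gives you.)

    #include <sys/socket.h>    /* recv() */

    char buf[32768];
    long total = 70000;   /* bytes the sender said to expect */
    long got = 0;

    while (got < total) {
        long chunk = total - got;
        if (chunk > (long) sizeof(buf))
            chunk = sizeof(buf);

        int rc = recv(sock, buf, chunk, 0);
        if (rc <= 0)
            break;        /* error, or sender closed the connection */

        /* only buf[0] .. buf[rc-1] are valid -- process those rc
           bytes here (append to a file, a user space, whatever) */
        got += rc;
    }

Nothing is lost: the 70k shows up over however many trips through the
loop it takes.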
Don't think of it as a fixed-length piece of data. Therein lies
madness. Think of it as a continuous stream of data...
> Can I extract a portion of the data?
Hopefully I've already answered that, but... YES, of course... That's
the default action.
> I ask that since I've been testing receiving data streams over 32k, but in
> debug the data appears as a string of @'s and I'm not sure what's causing
> that. I'm assuming it's 'error' related.
No idea about that. Sounds like a bug in your code to me. Maybe you're
not taking the return value from recv() into consideration? Maybe
you're thinking that if you ask for 32k of data, it'll always return 32k
of data? (Again, you're thinking in fixed-length chunks, rather than a
stream...)
But, it's almost certainly some sort of bug in your code.
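For what it's worth, the usual shape of that bug looks like this in C
(again just a sketch; "sock" is a connected socket):

    char buf[32768];
    int rc = recv(sock, buf, sizeof(buf), 0);
    if (rc <= 0) {
        /* error or closed connection -- handle it */
    }

    /* recv() often returns FEWER bytes than you asked for.  Only
       buf[0] .. buf[rc-1] are meaningful; the rest of the buffer
       still holds whatever was in it before the call.  Display or
       translate all 32768 bytes and you'll see garbage. */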
> Also I know that QDCXLATE is limited to 32767 bytes. Are there any
> alternatives for translating text larger than 32k or should I just
> deconstruct and then reconstruct it in 32k chunks?
I use QDCXLATE in simple one-off programs, etc. I use iconv() in any
non-trivial application. iconv() is more complicated, but it's a much
better tool than QDCXLATE, and it doesn't have the 32k limit.
Calling QDCXLATE one chunk at a time (in a loop) will also work... but
it's really hard to get exactly the right translation table with
QDCXLATE, so I greatly prefer iconv().
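Here's roughly what an iconv() conversion looks like in C. (A sketch
only: the encoding names are placeholders -- they vary by system, and
on IBM i you'd more likely open the descriptor with QtqIconvOpen() and
CCSID numbers.)

    #include <iconv.h>
    #include <errno.h>

    /* translate inlen bytes of "in" into "out"; returns the number
       of bytes produced, or -1 on error */
    long translate(char *in, size_t inlen, char *out, size_t outlen)
    {
        char *outp = out;

        /* placeholder names: convert ISO8859-1 to EBCDIC CCSID 37 */
        iconv_t cd = iconv_open("IBM-037", "ISO8859-1");
        if (cd == (iconv_t) -1)
            return -1;

        while (inlen > 0) {
            if (iconv(cd, &in, &inlen, &outp, &outlen) == (size_t) -1) {
                if (errno == E2BIG)
                    break;              /* output buffer full */
                iconv_close(cd);
                return -1;              /* bad/incomplete sequence */
            }
        }

        iconv_close(cd);
        return (long) (outp - out);
    }

Notice there's no 32k ceiling anywhere in that: you hand iconv()
however many bytes you have, and if it stops early the updated
pointers tell you exactly where to pick up on the next call.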
> At the very least I'd like to be able to detect a large string, pull some
> identifiable data out of it, and send off an email to the sender letting
> them know I can't handle a data stream larger than x bytes. Does that
> make sense?
It makes sense... though I'm not familiar with your application's
protocol, so I don't know how you'd do it, specifically.
Personally, I'd just write my code to handle any size of data and be
done with it. (Of course, I know NOTHING about your application or what
you're doing with the data, so maybe I'm totally wrong...)
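If you do want a cap, though, the guard itself is simple. (A sketch
with a made-up MAX_BYTES; where the "identifiable data" lives is
entirely up to your protocol.)

    #define MAX_BYTES 1000000     /* made-up limit */

    char buf[32768];
    long total = 0;
    int  rc;

    while ((rc = recv(sock, buf, sizeof(buf), 0)) > 0) {
        total += rc;
        if (total > MAX_BYTES) {
            /* stop here: pull the sender's identifying fields out
               of the data you've already kept, then send your
               "too large" e-mail */
            break;
        }
        /* otherwise keep/process these rc bytes as usual */
    }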
> Any suggestions/comments as to how I should attack this?
Think in terms of an ongoing, never-ending stream. Or perhaps a pipe.
Don't think of it in terms of fixed-length records... Then design
your application to handle it appropriately.
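If it helps, here's one shape "handle any size" can take in C: append
to a buffer that grows as needed, until the sender closes the
connection. (A sketch with made-up names; in real code you'd probably
process the data as it arrives instead of holding it all in memory.)

    #include <stdlib.h>
    #include <string.h>

    char   chunk[32768];
    char  *data = NULL;      /* grows to fit whatever arrives */
    size_t size = 0;
    int    rc;

    while ((rc = recv(sock, chunk, sizeof(chunk), 0)) > 0) {
        char *tmp = realloc(data, size + rc);
        if (tmp == NULL)
            break;           /* out of memory -- handle it */
        data = tmp;
        memcpy(data + size, chunk, rc);
        size += rc;          /* no limit besides available storage */
    }

    /* rc == 0 means the sender closed the connection normally;
       rc < 0 is an error.  Either way, data[0] .. data[size-1]
       is everything that was sent. */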