Hi Jerry,
> Occasionally I'm sent large amounts of text (i.e. surgical pathology
> and/or DNA reports). Recently a hospital was trying to send me a report
> that was over 32k.
I don't understand why 32k is an issue for you. What does your code do
with the data? Why does it matter if the data is larger than 32k?
After reading your message over a few times, I can only assume that you
are either referring to RPG's 64k variable limit in V5R4 and earlier, or
you are referring to QDCXLATE's 32702-byte limit (which isn't even 32k)
on its parameter size. Since QDCXLATE isn't a great tool anyway, and
since you already know you can call it in 32k chunks, I'll assume you
really meant to ask about 64k instead of 32k.
> As time goes by, I suspect these types of reports and the amount of data
> they include will become larger and more prevalent, and I'm trying to
> prepare for that eventuality.
So write your code so that it doesn't have a limit to the size it'll
receive... Personally, I have code that can receive terabytes of data
successfully. (Though, I must admit I haven't tested it out that
high... so I guess at this point it's just a theoretical limit... but
still... I've tested it waaaaay higher than 32k... indeed, I'm always
careful to test it higher than 4.2 GB to make sure I'm not using a 10u 0
improperly somewhere.)
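In C terms, that last caution is just this (a trivial sketch, but the
same idea applies in any language):

    /* a 4-byte unsigned counter (RPG's 10u 0) wraps at about
       4.29 billion; use a 64-bit type for running byte totals */
    unsigned long long totalBytes = 0;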
> Is there any way to trap/detect if a data stream is larger than what
> the socket's input field can handle?
Since you mention a stream, I'll assume you're referring to a stream
socket (SOCK_STREAM) like TCP, not a datagram or raw socket.
I guess you don't understand the concept of a stream. The data is
flowing down a stream, where each byte is like a car on a train. Ugh, I
don't know if these analogies are doing any good...
You seem to be thinking in terms of a buffer. One program writes data
into a buffer (like a data area or user space) and another program reads
that data out. Then the process continues in a loop. (That's not how a
stream works -- but you seem to be thinking that way.) You seem to be
thinking that if the sender writes 70k of data, then the receiver must
read 70k of data, or lose the data that he didn't read. That's not true
at all. That's not how a stream works.
Think of it like a file. When you write to a file, each record is
limited to 32766 bytes at the most, right? That doesn't mean the WHOLE
file is limited to 32k. You can write more than one record... except
that a stream isn't organized into records... err... perhaps you can
think of it as 1 byte per record... ugh, this example is probably too
confusing...
Okay, in simpler terms, just think of it as a queue. A queue of bytes.
Like people lined up to go on the roller coaster at your local
amusement park. Each time a new car comes down the track, it takes
people out of the line and gives them a ride. The people who didn't
fit will wait for the next car. You might call this a "continuous
stream of people" or a "queue of people". You don't have to have a
car/train on the roller coaster that's large enough for everyone in
line to get on at once. If it can only take 17 people at a time, and
there are 340 people in line, that just means it'll take 20 trips to
get all of the people. But you'll get them all in the end... While
those people are waiting, more people can be added to the end of the
line. It doesn't matter; the ride will get to those new people
eventually.
The stream socket is the same way. When you recv() 32k from the socket,
it doesn't have to be everything. You can read 32k, then read another
32k, then another... each time it takes THAT many bytes off of the
stream (if there are that many available at the moment you call recv).
The remaining bytes in the stream will wait and be returned the /next/
time you call recv. Meanwhile, the sender is free to keep adding more
bytes to the stream; he doesn't have to wait for you to clear all of
the data out.
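Here's what that loop looks like in C. (Just a sketch: "sock" is
assumed to be an already-connected stream socket, and knowing the total
length up front is an assumption about your protocol, not something TCP
gives you.)

    #include <sys/socket.h>    /* recv() */

    char buf[32768];
    long total = 70000;   /* bytes the sender said to expect */
    long got = 0;

    while (got < total) {
        long chunk = total - got;
        if (chunk > (long) sizeof(buf))
            chunk = sizeof(buf);

        int rc = recv(sock, buf, chunk, 0);
        if (rc <= 0)
            break;        /* error, or sender closed the connection */

        /* only buf[0] .. buf[rc-1] are valid -- process those rc
           bytes here (append to a file, a user space, whatever) */
        got += rc;
    }

Nothing is lost: the 70k shows up over however many trips through the
loop it takes.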
Don't think of it as a fixed-length piece of data. Therein lies
madness. Think of it as a continuous stream of data...
> Can I extract a portion of the data?
Hopefully I've already answered that, but... YES, of course... That's
the default action.
> I ask that since I've been testing receiving data streams over 32k, but in
> debug the data appears as a string of @'s and I'm not sure what's causing
> that. I'm assuming it's 'error' related.
No idea about that. Sounds like a bug in your code to me. Maybe you're
not taking the return value from recv() into consideration? Maybe
you're thinking that if you ask for 32k of data, it'll always return 32k
of data? (Again, you're thinking in fixed-length chunks, rather than a
stream...)
But, it's almost certainly some sort of bug in your code.
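For what it's worth, the usual shape of that bug looks like this in C
(again just a sketch; "sock" is a connected socket):

    char buf[32768];
    int rc = recv(sock, buf, sizeof(buf), 0);
    if (rc <= 0) {
        /* error or closed connection -- handle it */
    }

    /* recv() often returns FEWER bytes than you asked for.  Only
       buf[0] .. buf[rc-1] are meaningful; the rest of the buffer
       still holds whatever was in it before the call.  Display or
       translate all 32768 bytes and you'll see garbage. */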
> Also I know that QDCXLATE is limited to 32767 bytes. Are there any
> alternatives for translating text larger than 32k or should I just
> deconstruct and then reconstruct it in 32k chunks?
I use QDCXLATE in simple one-off programs, etc. I use iconv() in any
non-trivial application. iconv() is more complicated, but it's a much
better tool than QDCXLATE, and it doesn't have the 32k limit.
Calling QDCXLATE one chunk at a time (in a loop) will also work... but
it's really hard to get exactly the right translation table with
QDCXLATE, so I greatly prefer iconv().
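Here's roughly what an iconv() conversion looks like in C. (A sketch
only: the encoding names are placeholders -- they vary by system, and
on IBM i you'd more likely open the descriptor with QtqIconvOpen() and
CCSID numbers.)

    #include <iconv.h>
    #include <errno.h>

    /* translate inlen bytes of "in" into "out"; returns the number
       of bytes produced, or -1 on error */
    long translate(char *in, size_t inlen, char *out, size_t outlen)
    {
        char *outp = out;

        /* placeholder names: convert ISO8859-1 to EBCDIC CCSID 37 */
        iconv_t cd = iconv_open("IBM-037", "ISO8859-1");
        if (cd == (iconv_t) -1)
            return -1;

        while (inlen > 0) {
            if (iconv(cd, &in, &inlen, &outp, &outlen) == (size_t) -1) {
                if (errno == E2BIG)
                    break;              /* output buffer full */
                iconv_close(cd);
                return -1;              /* bad/incomplete sequence */
            }
        }

        iconv_close(cd);
        return (long) (outp - out);
    }

Notice there's no 32k ceiling anywhere in that: you hand iconv()
however many bytes you have, and if it stops early the updated
pointers tell you exactly where to pick up on the next call.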
> At the very least I'd like to be able to detect a large string, pull some
> identifiable data out of it, and send off an email to the sender letting
> them know I can't handle a data stream larger than x bytes. Does that
> make sense?
It makes sense... though I'm not familiar with your application's
protocol, so I don't know how you'd do it, specifically.
Personally, I'd just write my code to handle any size of data and be
done with it. (Of course, I know NOTHING about your application or what
you're doing with the data, so maybe I'm totally wrong...)
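If you do want a cap, though, the guard itself is simple. (A sketch
with a made-up MAX_BYTES; where the "identifiable data" lives is
entirely up to your protocol.)

    #define MAX_BYTES 1000000     /* made-up limit */

    char buf[32768];
    long total = 0;
    int  rc;

    while ((rc = recv(sock, buf, sizeof(buf), 0)) > 0) {
        total += rc;
        if (total > MAX_BYTES) {
            /* stop here: pull the sender's identifying fields out
               of the data you've already kept, then send your
               "too large" e-mail */
            break;
        }
        /* otherwise keep/process these rc bytes as usual */
    }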
> Any suggestions/comments as to how I should attack this?
Think in terms of an ongoing, never-ending stream. Or perhaps a pipe.
Don't think of it in terms of fixed-length records... Then design
your application to handle it appropriately.
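If it helps, here's one shape "handle any size" can take in C: append
to a buffer that grows as needed, until the sender closes the
connection. (A sketch with made-up names; in real code you'd probably
process the data as it arrives instead of holding it all in memory.)

    #include <stdlib.h>
    #include <string.h>

    char   chunk[32768];
    char  *data = NULL;      /* grows to fit whatever arrives */
    size_t size = 0;
    int    rc;

    while ((rc = recv(sock, chunk, sizeof(chunk), 0)) > 0) {
        char *tmp = realloc(data, size + rc);
        if (tmp == NULL)
            break;           /* out of memory -- handle it */
        data = tmp;
        memcpy(data + size, chunk, rc);
        size += rc;          /* no limit besides available storage */
    }

    /* rc == 0 means the sender closed the connection normally;
       rc < 0 is an error.  Either way, data[0] .. data[size-1]
       is everything that was sent. */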