On Fri, 13 Jun 2003, Richard B Baird wrote:
>
> 1. what are the performance ramifications of using these string and IFS
> functions in a service program rather than, say, creating modules and
> binding them together into one program, or by simply defining all my
> procs in the driver program, eliminating the binding altogether?

Assuming your service program isn't ACTGRP(*NEW) (which nobody should
ever use for a service program!!) the differences between direct binding
and calling a service program routine should be fairly nominal.

Though, if you think about it, it would be pretty easy to make a test
copy of your programs, pull the code for the procedures in, and just see
how much faster it is.  I really doubt it would outweigh the advantages
of having them separate, though.

> 2. what is the best way of passing potentially large strings to and
> from a sub-proc?

By reference.  Never pass long strings by value...  Don't use long
strings as fixed-length CONST, either.  (Though, variable-length CONST
is fine.)

> (see my examples for how I did it - mostly varying strings.
> I don't like the 256a return values.

I *really* don't like the 32k return values.  That's 32k of data that it
has to make a copy of every time your procedure returns.  Instead, pass
the variable by reference.  Then it doesn't have to allocate new memory
for it, it doesn't have to make a copy of it, etc.  That should make a
difference in performance.

> 3. I've put all my IFS open/read/close stuff in a subproc for ease of
> use, but I'm not saving as much in ease as i'm losing in performance.

I have no idea what you're losing in performance.  I doubt the
open/close has any impact at all on performance... a few milliseconds
for the total run time, maybe.

The read procedure has a huge problem in that it reads one byte at a
time from the file.  You REALLY need to read things into a larger buffer
if you want it to perform well.  This will make a HUGE difference.  But,
more on that later, when I reach the part where I pick apart your
code :)

> 4. my csv parsers do 1 variable at a time - this may be where I'm
> wasting a lot of time - but i was hoping to reuse it later - it
> receives the entire record, a delimiter, and a 'field index' and
> returns the nth variable in the record.  can anyone improve upon this
> design without making it 'file specific'?

Hmmm... there are a LOT of ways to approach this.  However, if it's well
designed, a field-at-a-time approach can be a good way of doing things.

> below are some of my subprocs - cut them to pieces please - show no
> mercy.

Thank you, I will.

> P ReadStrmF       B                   export
> D ReadStrmF       PI         32765A   varying
> D  peFD                         10I 0
> D  peEOF                          N

Okay, here's that 32k return value that I was complaining about before.
(Hmmm... "peFD", "peEOF"??  That's the same naming convention I use!!)
Instead, I'd pass the parms like this:

     D ReadStrmF       PR             1N
     D  peFD                         10I 0 value
     D  peRecord                  32765A   varying options(*varsize)
     D  peMaxLen                     10I 0 value

I made the EOF indicator the return value.  Since that's only 1 byte,
the time it takes to make a copy of it is totally insignificant.  This
also gives you another advantage: you can use it in a DOW loop or IF
statement.  For example:

      /free
         fd = my_open(etc: etc);

         dow not ReadStrmF(fd: rec: %size(rec));
            // do something to the record here
         enddo;

         my_close(fd);
      /end-free

(Note the "not" -- since the return value is an EOF indicator, you want
to loop while it's *OFF.)

For peFD, I passed it by value.  That guarantees, when you call it, that
the procedure will not modify the value, and it allows you to use
expressions for the parameter.  And, best of all, it only needs to pass
32 bits.  When you pass by reference, you're passing 128 bits (the size
of a pointer on the iSeries), so passing by value should be marginally
faster.

For peRecord, I passed it by reference, and I also added
options(*varsize).  Before, you did it as a return value.  Return values
are passed by VALUE, so instead of passing a 128-bit pointer, it had to
actually copy the entire 32k variable.  32,767 bytes copied instead of
16.  That'll make a difference.

Furthermore, because it's options(*varsize), you don't always have to
declare your caller's variable as 32k long.  If you wanted it to be only
100 bytes for some records, it would still work.

That variable size is also why I added the peMaxLen parameter.  Since
we're allowing various sizes to be passed, we can use peMaxLen to make
sure we don't overflow the ACTUAL size of the variable.
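Just to make that concrete, here's a rough sketch of how ReadStrmF's
body might look with those parms.  This is off the top of my head and
untested, it assumes your existing prototype for the read() API, and it
still reads one byte at a time -- which is the next thing I'm going to
complain about:

     P ReadStrmF       B                   export
     D ReadStrmF       PI             1N
     D  peFD                         10I 0 value
     D  peRecord                  32765A   varying options(*varsize)
     D  peMaxLen                     10I 0 value

     D ch              S              1A
      /free
         peRecord = '';

         dou ch = x'0A';
            if read(peFD: %addr(ch): 1) < 1;
               return *on;          // EOF (or error): no more records
            endif;

            // keep the byte unless it's part of the line ending, and
            // never grow the string past the caller's ACTUAL size:
            if ch <> x'0D' and ch <> x'0A'
               and %len(peRecord) < peMaxLen;
               peRecord = peRecord + ch;
            endif;
         enddo;

         return *off;               // got a complete line
      /end-free
     P                 E

(A real version would also handle a final line that has no trailing
x'0A'.)  One nit on the calling example above: %size of a varying field
includes its 2-byte length prefix, so you may really want to pass
%size(rec) - 2 for peMaxLen.  Anyway, back to your code: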
> C                   dou       Char = x'0A' or Char = x'0a'

Hmm... this is probably a bug, but... why are you checking for x'0A'
twice in the same statement?!  This makes no sense, they're the same
thing.

> C                   eval      Eof = Read(peFD: %addr(Char): 1)

Here's the big one.  This is where your performance is falling flat.
Never read one byte at a time where performance is an issue.

You might try using the ILE C routines for "buffered stream I/O".  Just
replace the open(), read() and close() APIs with _C_IFS_fopen(),
_C_IFS_fgets(), _C_IFS_fclose(), etc.  The really nice thing about
_C_IFS_fgets() is that it reads up until the end-of-line delimiter for
you, so you don't have to search for the x'0A'.  The downside to the
_C_IFS_fopen() family of functions is that they're not as versatile
(IMHO) as the open() family.
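Here's roughly what the prototypes look like, if you want to experiment
with that route.  I'm writing these from memory, so double-check them
against the C include members -- and the file name and buffer sizes are
just made-up examples:

     D fopen           PR              *   extproc('_C_IFS_fopen')
     D  filename                       *   value options(*string)
     D  mode                           *   value options(*string)

     D fgets           PR              *   extproc('_C_IFS_fgets')
     D  line                           *   value
     D  maxlen                       10I 0 value
     D  stream                         *   value

     D fclose          PR            10I 0 extproc('_C_IFS_fclose')
     D  stream                         *   value

     D file            S               *
     D buf             S           1000A
     D rec             S           1000A   varying
      /free
         file = fopen('/tmp/myfile.txt': 'r');
         if file = *null;
            // couldn't open the file
         endif;

         dow fgets(%addr(buf): %size(buf): file) <> *null;
            rec = %str(%addr(buf));
            // fgets() leaves the end-of-line character in the string,
            // so you'll probably want to strip it off of rec here
         enddo;

         fclose(file);
      /end-free

You may still need options on the mode string to get the EBCDIC
translation you want; check the ILE C runtime documentation for the
details.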
If you do continue to use open(), read() and close(), you can improve
performance dramatically by simply reading the data in chunks into a
buffer, and then reading it from the buffer byte-by-byte instead of
reading the file byte-by-byte.

Keep in mind that every time you read a byte, it has to transfer at
least one disk sector into memory.  If a sector is 512 bytes, then
you're reading one sector 200 times to get a 200-byte string.  But if
you read 1024 bytes at a time, it loads only two sectors to get 5 full
strings.  Think about that... it takes 1/100th the time to get 5 times
as much data!!

Okay, that's not entirely true.  Operating systems do various things to
optimize file reads, including loading things into memory ahead of time,
etc, etc.  However, that example does illustrate (even if it's an
extreme case) why reading larger buffers performs better.

The fopen/fread/fgets/fclose style APIs are optimized by loading things
into static buffers behind the scenes, and searching those buffers to
find the end-of-line characters, etc.  You can accomplish the same type
of thing with the open/read/close style APIs simply by loading things in
larger chunks.  Like, if you load 1024 bytes at a time from the file,
then just loop through those 1024 bytes to find the x'0A', things will
perform better.

I've got an example of a (very simple) buffered read routine on my web
site here:

     http://www.scottklement.com/rpg/ifs_ebook/textreading.html

Of course, you'd have to convert it to use a varying string... but I
think you can probably handle that.

> C                   if        Length > 0
> C                   call      'QDCXLATE'
> C                   parm                    Length
> C                   parm                    String
> C                   parm      'QEBCDIC'     Table
> C                   end

If you open the file with O_TEXTDATA specified, you shouldn't need to
manually translate it to EBCDIC.  That should perform a bit better.  If
nothing else, using iconv() instead of QDCXLATE will perform better,
because you'd be calling a service program instead of loading/running a
*PGM object.

> * Return that line
> C                   eval      StringOut = %trimr(String)
> C                   return    StringOut

I don't understand why you're trimming StringOut.  Wasn't that a varying
field?!  Why would it have trailing blanks?  And if it does have
trailing blanks, you probably wanted to keep them!

As for your field parsing routines, I'm going to have to pick those
apart tomorrow or something, since it's now 2:00am, and I'm suddenly
REALLY tired...  But, I think I've probably given you some food for
thought.
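P.S. In case you haven't used O_TEXTDATA before, here's a quick sketch.
The path is just an example, and the constant values are the ones from
my copy of the header member, so verify them against QSYSINC before
trusting me:

     D open            PR            10I 0 extproc('open')
     D  path                           *   value options(*string)
     D  oflag                        10I 0 value
     D  mode                         10U 0 value options(*nopass)
     D  ccsid                        10U 0 value options(*nopass)

     D O_RDONLY        C                   1
     D O_TEXTDATA      C                   16777216

     D fd              S             10I 0
      /free
         // with O_TEXTDATA, the system translates between the file's
         // CCSID and your job's CCSID as the data is read:
         fd = open('/tmp/myfile.txt': O_RDONLY + O_TEXTDATA);
         if fd < 0;
            // open() failed -- check errno
         endif;
      /end-free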