Hello Rob,
On 16.02.2025 at 23:10, Rob Berendt <robertowenberendt@xxxxxxxxx> wrote:
About those NFS backed image catalogs...
How did you get started on that?
Mainly by reading and understanding Diego's messages and web links from 2021, in the thread "SavSys21 to virtual tape and restore from virtual tape".
I'm very comfortable and fluent with the limited environment of V4R5, a real twinax console, and no LPARs. Learning the basics needed for installing and running newer machines was a chore for me. As you might recall from my earlier comments, I still find HMC and LPAR handling rather cumbersome compared to VMware ESXi, which I partly support professionally for customers. (My main area of professional expertise is Linux servers in general, and networking in a broad sense, albeit rather focused on Cisco equipment; I get along very well with their CLI.)
So, I first had to grok the concept of a service tools adapter in a given LPAR. The IBM website and some comments in the group weren't too helpful for me back then. Now I know: add two network adapters to a given LPAR, one for the guest OS in the LPAR and one for the service tools adapter. The latter is configured from within DST/SST only. NFS-based image catalogs use said service tools adapter exclusively for communication; the NFS client on the IBM i LPAR is not involved at all.
Being fluent in Linux and already having Linux based NFS servers (hardware) as backing store for ESXi virtualization, I merely exported a separately created directory on Linux.
Example /etc/exports entry:
/i-backups 10.0.0.50/32(rw,no_root_squash,no_subtree_check,async)
After changing /etc/exports, run exportfs -ra.
(10.0.0.50 is the IP address of the service tools adapter!)
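To verify the export is actually active on the Linux side, something like the following should do (assuming the usual nfs-utils tools are installed):

exportfs -v | grep i-backups
showmount -e localhost

exportfs -v lists the active exports with their effective options; showmount -e lists what clients are offered.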
"Async" is important to force writes to be cached by the Linux machine, hence improving performance considerably if the remote side uses mainly synchronous writes for ensuring data being written to disk.
It's of utmost importance to be aware that the service tools adapter has a fixed, non-changeable MTU of 1500 bytes, at least with 7.3. If the NFS server has a different (higher, for jumbo frames) MTU setting, you'll get many misleading error messages. Check twice! If the (switching) infrastructure in the middle has a higher MTU, that's fine and not an issue.
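A quick check on the NFS server side (eth0 is just a placeholder for whatever interface serves the export):

ip link show eth0 | grep -o 'mtu [0-9]*'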
Next, create required files on the NFS store, in the exported directory:
- An image file to hold the data,
- an index file (just ASCII, use preferred text editor).
truncate -s 25G 0-$(date '+%Y%m%d').udf
=> Yields a 25G sparse file in an instant, using no disk space at all and not imposing the needless, lengthy I/O of filling an ordinary file, as running dd (as suggested by the IBM documentation) would. (When you write to the sparse file, it grows until all desired data is written or the maximum size is reached. So you can safely create larger images to save into and still not waste disk space or I/Os. ls -l shows the apparent size, du shows the actual size. Sparse files IMHO are the best thing since sliced bread for handling large but not precisely known or varying amounts of data.)
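To see both sizes for yourself, right after the truncate:

ls -lh 0-20250217.udf    (apparent size: 25G)
du -h 0-20250217.udf     (blocks actually allocated: zero at this point, growing as saves are written)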
(Names starting with 0 are images for full saves, aka Save 21. Those starting with 1 are for differential saves. That's a habit from my "dump" based Linux backups, where full backups are named level 0 and differentials level 1. Names don't matter as long as they're used consistently in the file system and the index file.)
Create said index text file, which must be named VOLUME_LIST (mind the case!). Each image file goes on a separate line. Line endings are UNIX-like: just LFs. After the file name, append a blank and a "W" to make the image(s) read/write capable:
0-20250217.udf W
That can be easily automated with simple shell scripts and cron on the Linux machine; see the sketch after the sed example below. The first entry in VOLUME_LIST gets attached upon varying on the appropriate *devd on IBM i. If you add new image entries at the top of that file, you don't even need to bother attaching the desired image before running inzopt and the saves. This can easily be done with sed:
sed -i "1i 0-$(date '+%Y%m%d').udf W" /i-backups/VOLUME_LIST
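A minimal sketch of what I mean, run as a daily cron job on the Linux box (directory, image size, and naming scheme are the assumptions from the examples above, adjust to taste):

#!/bin/sh
# Create a fresh sparse image for today's full save and register it
# at the top of VOLUME_LIST so it gets attached on the next vary on.
DIR=/i-backups
IMG="0-$(date '+%Y%m%d').udf"
truncate -s 25G "$DIR/$IMG"
sed -i "1i $IMG W" "$DIR/VOLUME_LIST"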
Note: You cannot create or modify remote (NFS based) image catalogs with the IBM i provided commands. Those only work on local QSYS.LIB based objects.
VOLUME_LIST may not contain entries without a corresponding image file. If it does, you'll get a generic error from IBM i upon varying on the *devd, hinting at an issue in the LIC. But it's not. Check twice! On the other hand, files without an entry in VOLUME_LIST are completely ignored by the image catalog logic.
Note that the contents of VOLUME_LIST are read only when you vary on the corresponding *devd, but then every time. In other words, changing VOLUME_LIST while the *devd is varied on will not propagate the changes to IBM i.
Next, create the virtual optical device:
crtdevopt devd(optbkup01) online(*no) rsrcname(*vrt) lclintneta(*srvlan) rmtintneta('10.0.0.30') netimgdir('/i-backups')
(10.0.0.30 is the IP address of the NFS server machine!)
Note: online(*no) should force a "manual" vary on each time, ensuring the current contents of VOLUME_LIST are read.
Vary on the device:
vrycfg cfgobj(optbkup01) cfgtype(*dev) status(*on)
The system log on the Linux NFS server will emit a message that an NFS mount request came in.
Check if the image(s) appear(s):
wrkimgclge imgclg(*dev) dev(optbkup01)
Finally, initialize the image file, so you can actually use it:
inzopt vol(*mounted) newvol('backup-20250217') dev(optbkup01) check(*no) threshold(*calc) medfmt(*udf)
Now you can run either a Save 21, or a savchgobj, savdlo dlo(*chg), and sav chgperiod(*lastsave) to backup (changed) things accordingly. See the discussion about differential saves we had prior.
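For reference, the differential variant I run looks roughly like this; the library selection and omit list are my assumptions, pick whatever matches your environment:

savchgobj obj(*all) lib(*allusr) dev(optbkup01)
savdlo dlo(*chg) dev(optbkup01)
sav dev('/qsys.lib/optbkup01.devd') obj(('/*') ('/qsys.lib' *omit) ('/qdls' *omit)) chgperiod(*lastsave)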
After the saves are finished, vary off. That's important, so changes to VOLUME_LIST happening until the next save get picked up by the subsequent vary on. See above.
vrycfg cfgobj(optbkup01) cfgtype(*dev) status(*off)
The system log on the Linux NFS server will emit a message that an NFS unmount request came in.
You see, that's really straightforward and comparatively easy to handle once you avoid the pitfalls pointed out above. I would expect some limitations regarding maximum image sizes, which are most likely different for saving/restoring and for IPLing from, and also dependent on the OS release.
What I've described so far is just handling the NFS optical image stuff. If you have further questions, just ask: I might have missed something.
For my preferred way of restoring Save 21s from remote image files, see below. For single file restores, you can just vary on the *devd, use wrkimgclge to pick the desired image (by date of the save), and run the appropriate rstobj, rstdlo, or rst command. And vary off afterwards, of course.
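For example, restoring a single object from the attached image might look like this (object and library names are of course just placeholders):

wrkimgclge imgclg(*dev) dev(optbkup01)
rstobj obj(mypgm) savlib(mylib) dev(optbkup01)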
I'm interested for this one job I'm doing.
Currently their P9 with four spinners going to LTO6 is dead slow compared to my P9 with internal IBM i to IBM i hosted SSD's going to VTL.
Four rotating disks are most likely the primary bottleneck on this machine. The Power6 I'm nursing has had disk after disk develop bad sectors over the years (and one fail completely), plus one disk that is 100% OK but for which SMART complains about counter values indicating failure in the near future.
Thanks to Larry's help, I rearranged the RAID5, adding the SMART-complaining disk as a hot spare and configuring the remaining four disks as RAID5. There was (is!) no need for lots of DASD capacity for the duties of that machine, but when it was obtained, all disk slots were filled, also for performance reasons. I was surprised how slow writes had become after the rearrangement. This might in part also be due to a failing cache battery.
Anyway, that machine is on its way to decommissioning, as soon as I manage to orderly migrate the projects on it to the freshly installed, newer Power7 with underlying VIOS and SSD storage. Just copying would be possible, but "orderly" also involves writing proper documentation, putting the source code into a git repository server, etc. I want to relieve myself of "you're the only one knowing this" duties.
Takes almost 10 times longer for less data. As much as I like VTL, I thought I should try NFS since they have such stuff, and this is yet another of those cases where the future of the hardware is in doubt. Well, anyway, comparisons never hurt.
Definitely. But I think you'll experience disappointment.
Said NFS solution is, to me, a low-entry, low-cost remedy for a single machine in a remote DC where changing tapes just isn't feasible, attaching to existing backup infrastructure is at least challenging, and IBM i is a "loner" besides probably more critical "customary" machines. Also, the amount of data to be handled probably appears laughably small compared to what you're used to. A full save of the "empty" P7 occupies 20 GB. Later, when data and applications are migrated, that might be 21 GB. Or still be rounded down to 20. :-)
Hence, I want to find the best possible solution yielding the least manual intervention possible but with a solid recovery plan. Defective hardware, driving there, etc. is all someone else's problem. :-)
What I've observed is that between pressing Enter to start a Save 21 and the image file actually growing, minutes pass without apparent activity on the console screen running the Save 21. I suspect the Save 21 logic stops TCP/IP but fails to detect that TCP/IP has successfully been stopped, so nothing happens for at least the default timeout of 300 seconds, followed by the same TCP/IP "server stopping" messages. I'll investigate this further, because it's annoying.
I know, any backup should fit with a restore plan. I could say more but this subject is not to chase that squirrel but to at least look at that NFS option.
Well, the restore option I desire most is to import the remote backup directory via NFS into VIOS, essentially mounting over the directory where the various guest installation images are located. I have not yet found time to actually locate those images in the VIOS directory tree, nor have I tried whether VIOS (being a somewhat stripped-down AIX) can act as an NFS client. I'm very skilled in Linux, but my first-hand experience with AIX and its possible peculiarities is close to zero.
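If VIOS does offer the usual AIX facilities, I'd expect something along these lines to work; this is an untested assumption on my part, including the mount point:

oem_setup_env                        (drop from the restricted padmin shell to the AIX root shell)
mount 10.0.0.30:/i-backups /mnt      (plain NFS mount of the backup export; options may need tweaking)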
If this won't work, there's always the possibility of manually uploading the last Save 21 image to VIOS. In turn, the file's "sparseness" will be lost and actual zeros will be written. The restore will be delayed because an extra copy step is required. Not particularly desirable. I'll keep posting about my findings!
:wq! PoC