Re: SNDDST and multithreading -- MIDRANGE-L

On 17-Dec-2013 07:29 -0800, Michael Schutte wrote:

Every month we are getting an error from SNDDST saying that it cannot
be run in jobs with multiple threads. I get that this command cannot
run with multiple threads. But I am curious why we are getting
multiple threads in an interactive session.

The /system/ can create additional threads, system threads, irrespective of the Allow Multiple Threads (ALWMLTTHD) attribute for the job. Only user threads are prevented from being created in the job contrary to the threading limitations.

The program trying to use this command is vendor code. All of their
code is written in RPG3 and CLP running in the default activation
group. We now own the code and as we make changes to the code, we
convert to RPGLE. We have created service programs, some with named
activation groups, others with named activation groups.

Some with named ActGrp and some with ¿caller? or ¿new?

We have also introduced SQLRPGLE into some programs (FYI, when we
make changes to the vendor code, we move to a new library higher in
the LIBL, we leave the original code alone, just our decision to do
that). Side note, all programs called interactively are called by a
menu program written in RPG3 by the same vendor. As we convert the
called programs from RPG3 to RPGLE we sometimes add H-SPEC
actgrp(*CALLER), other times we do not. When using *CALLER, it's my
understanding that the program runs in the default activation group.
Not sure if this matters to anyone trying to help me with this
situation.

In the *CALLER scenario for an SQLRPGLE called from the DftActGrp, a reusable query will not close [Close SQL Cursor (CLOSQLCSR) parameter], so the associated threads can remain active, awaiting more work to complete the query activity. Similarly, any open\active query implemented with multiple threads that has not been full-closed could presumably cause the same issue.
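
As a rough sketch of where that close behavior is chosen: the SQL precompiler command takes the Close SQL Cursor parameter (keyword CLOSQLCSR, as far as I know) at create time. The library, program, and source file names below are hypothetical, and whether *ENDMOD actually ends the idle threads in this scenario is an assumption that would need to be tested:

```cl
/* Hypothetical names: MYLIB, MYSQLPGM, QRPGLESRC.                    */
/* *ENDMOD asks for cursors to be closed when the module is exited,   */
/* rather than waiting for the activation group to end.               */
CRTSQLRPGI OBJ(MYLIB/MYSQLPGM) SRCFILE(MYLIB/QRPGLESRC) +
             SRCMBR(MYSQLPGM) OBJTYPE(*PGM) CLOSQLCSR(*ENDMOD)
```

The trade-off is that a cursor closed at module exit cannot be reused on the next call, so the query has to be opened again each time.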

Depending on the origin of the multiple threads: if they are due to query activity, using Change Query Attributes (CHGQRYA) to assign a Parallel Processing Degree (DEGREE) of *NONE can suppress the use of multiple threads for a query. If however the threads are due to User Defined Function (UDF) invocation versus either I/O parallel or SMP processing, then the change to the Query Attributes [AFAIK] does not apply.
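
A minimal sketch of that CHGQRYA request, assuming it is issued in the interactive job itself (for example from the menu program's startup CL) before any of the SQL programs run; presumably only queries opened after the change are affected:

```cl
/* JOB(*) is the default and refers to the job issuing the command.   */
/* DEGREE(*NONE) requests no parallel processing for queries.         */
CHGQRYA JOB(*) DEGREE(*NONE)
```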

I've been reading articles on the web about multithreading (new
concept to me). I remember once reading that to start a new thread,
the program has to specifically call the spawn routine. I'm certain
that none of us at this shop has done this in any programs (lack of
understanding being the biggest hurdle). I've also read that Only
batch immediate jobs and prestart jobs provide multithread-capable
support here:
<http://publib.boulder.ibm.com/html/as400/v4r5/ic2924/index.htm?info/RZAHWOVEPO.HTM>

Warning: The above link froze my browsers for several minutes.
The above link is further described by the following:

v4r5 InfoCenter -> Programming -> Programming Support -> Multithreaded Applications
_Multithreaded applications_
"
_What is a thread_?
...
_Thread basics_:
...
_Programming with threads_:
...
_Code snippets that show how to use threads_:
..."

With that information, I found the effective equivalent v7r1 link is:
<http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/topic/rzahw/rzahwovepo.htm>
wherein there is a link to:
<http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/topic/rzahw/rzahwprint.htm>
_PDF file for Multithreaded applications_

The statement about the spawn is that "The spawn() API is the only programming method that can start a batch immediate or prestart job that is capable of supporting multiple threads." However, an earlier statement notes that "In the i5/OS® kernel threads support, only a subset of the supported job types can create threads. Interactive and communication jobs do not provide multithread-capable support." And that the "OS examines all job types except communications jobs and interactive jobs for the ALWMLTTHD parameter." So while the spawn() API may effect a multithreaded job type of BCI or PJ, that does not suggest other job types are precluded from multi-threading per use of ALWMLTTHD; i.e. allowing multiple threads, beyond just system-initiated /system threads/. For example, even a batch job started with Submit Job (SBMJOB), either with the Allow Multiple Threads (ALWMLTTHD) parameter set to *YES or instead deferring to a Job Description (*JOBD) set to allow threads, will start a multithreaded job of type BCH; the threads within that job would not be started with the spawn() API, but with other thread APIs such as pthread_create() (a POSIX thread API). So anyhow, hopefully to clarify:

<http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/topic/apis/293735.htm>
_Running threaded programs_
"When you run a threaded program, the job that runs a threaded program must be specially initialized by the system to support threads. Currently, several mechanisms allow you to start a job that is capable of creating multiple kernel threads:
..."
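
To illustrate just the SBMJOB case I described, here is a sketch of the two ways a batch job could be started as multithread-capable. All object and job names are hypothetical:

```cl
/* Variant 1: request thread support directly on the submit.          */
SBMJOB CMD(CALL PGM(MYLIB/THRDPGM)) JOB(THRDJOB) ALWMLTTHD(*YES)

/* Variant 2: set the attribute on a job description, and let the     */
/* submit defer to it; *JOBD is the default for ALWMLTTHD on SBMJOB.  */
CHGJOBD JOBD(MYLIB/THRDJOBD) ALWMLTTHD(*YES)
SBMJOB CMD(CALL PGM(MYLIB/THRDPGM)) JOB(THRDJOB) +
         JOBD(MYLIB/THRDJOBD) ALWMLTTHD(*JOBD)
```

Neither variant creates any thread by itself; the program called in the job would still have to create its user threads, e.g. with pthread_create().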

But that's documentation for V4R5; we are currently on V7R1. But if
this is still the case, why are our interactive jobs spawning multiple
threads?

Again, most likely system-initiated threads; i.e. system-threads that are allowed to disregard the ALWMLTTHD(*NO) specification of an interactive job.

Unfortunately, I do not have any job logs to include in this email,
however as I recall the error was message CPFA0A8 "Operation not
allowed in a job running multiple threads."

When the error occurs, I expect the output from WRKJOB OPTION(*ALL) OUTPUT(*PRINT) SLTTHD(*ALL) should include both the *THREAD information and the thread stacks. With the details from the threads, perhaps what created those other threads will be obvious. There is however an allusion to a potential restriction for the above WRKJOB invocation, so OPTION(*PGMSTK) might have to be specified; i.e. the docs suggest a failure [but not what specific failure] if OPTION(*ALL) is invoked "in a job that allows multiple threads" versus being invoked in a job that _has_ multiple threads. Another, more likely, issue is that the system threads for the Query Engine will sit idle for an open query not actively running, such that the threads have no active stack, pending receipt of further work.

The Dump Job (DMPJOB) with Job Threads (JOBTHD) parameter as *YES asking for thread information might be required to get the necessary information, because the threads may have no active stack. While I am not aware of any system vs user nor any identification of what initiated the thread visible from Work With Job (WRKJOB) output, perhaps something in the DMPJOB gives more details; e.g.:
DMPJOB PGM(*NONE) JOBARA(*NONE) ADROBJ(*NO) JOBTHD(*YES)

A job trace [Start Trace (STRTRC)] started at the point of the failure is potentially helpful, but the tracing must catch and include in its output the future activity of those other threads, even if only the termination of the threads.

I also came across this website. Multithreaded programs in Java
<http://pic.dhe.ibm.com/infocenter/wmqv7/v7r1/topic/com.ibm.mq.doc/ja11160_.htm>
"The Java™ runtime environment is inherently multithreaded."

While essentially true, though for a different reason, the MQM docs are probably not the best reference for the issue at hand. More appropriate is the following doc reference; see my emphasis with asterisks for the JVM:
<http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/topic/rzahw/rzahwjavco.htm>
IBM i 7.1 Information Center -> Programming -> Multithreaded applications -> Language access and threads
_Threads considerations for Java language_
"Java™ threads operate on top of the i5/OS® kernel threads model using the java.lang.Thread class. Each Java thread is one of the many tasks that run in the process.

You can do all of the activities that are listed in the Threads Management section.

The Java virtual machine (JVM) *always creates several threads* to perform services such as Java garbage collection. The system uses these threads; applications should not use them.

You can use native methods to access system functions that are not available in Java. Native methods are not *PGM objects. They are procedures that are exported from Integrated Language Environment® (ILE) service programs (*SRVPGM). These *native methods always run* *in multithreaded processes*; therefore, they must be threadsafe. The ILE COBOL, RPG IV, CL, C, and C++ compilers are threadsafe.
..."

So I can see where this could be our issue. Some new emails that we
send out are being sent using RPGMAIL. RPGMAIL uses Java. So if the
user did something to send out an email, I can see this being our
culprit. But there's only one program that I recall that gets sent
using RPGMAIL and I know that it hasn't been sent because I am copied
on it. All other RPGMAIL emails are sent from batch jobs that are
submitted from the Job scheduler and therefore wouldn't have an
effect on any one interactive session.

Perhaps then, JAVA has been excluded?

This doesn't happen for all users, just a select few. The ones that
get the error are different from month to month.

Maybe review the PRTSQLINF of the SQLRPGLE [service] programs to look for UDF utilization and any parallel implementations.
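
For example, a sketch of that request for one program and one service program; the library and object names are hypothetical, and the output goes to a spooled file to be reviewed for the statements, any UDF references, and the access plan information:

```cl
/* Hypothetical names: MYLIB, MYSQLPGM, MYSQLSRV.                     */
PRTSQLINF OBJ(MYLIB/MYSQLPGM) OBJTYPE(*PGM)
PRTSQLINF OBJ(MYLIB/MYSQLSRV) OBJTYPE(*SRVPGM)
```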

The boss started asking why has this just started. The truth is we
really don't know if it just now started to happen as users used to
ignore these kinds of messages. As far as they were concerned the
program completed successfully because accounts receivables had
generated an invoice (but no email was sent).

Perhaps change the code to react to the failure of the SNDDST by doing something that the user cannot merely ignore, or by performing something that [unlike the SNDDST] does not fail when multiple threads are active in the process.

Others, in discussions found on the web for a matching failure, suggest using something other than SNDDST to avoid the restriction; e.g. either use a command [or API\program-call] to effect the send of the message\mail that is not restricted to a process with only the one\initial thread, or run the SNDDST in another process such as in a submitted job [per SBMJOB] or off-load to a job awaiting an entry on a queue that notifies of the SNDDST work to be done.
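
A minimal sketch of the submitted-job variation, assuming the user profile of the submitted job is enrolled in the system distribution directory as SNDDST requires. The address, job name, and message text are placeholders, not anything from the vendor code:

```cl
/* Run SNDDST in its own batch job, so that any extra threads in the  */
/* interactive job no longer matter to the send.                      */
SBMJOB CMD(SNDDST TYPE(*LMSG) +
             TOINTNET(('someone@example.com')) +
             DSTD('Invoice notice') +
             LONGMSG('Your invoice has been generated.')) +
         JOB(SNDINVMAIL) ALWMLTTHD(*NO)
```

The design cost is that the interactive program only learns that the submit worked, not that the mail was sent, so a failure of the SNDDST would have to be noticed from the submitted job's joblog or some other notification.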

But if this is new what could be spawning the multiple threads? Is it
the SQL, Activation Groups, or is the Java?

I am betting on some SQL, and the ActGrp choice may play a role, due to the effect on the [lack of] closure of cursors. Possibly the query implementation for certain\specific inquiries is done using parallel processing, such that only those jobs later become the /incident/ with the failing symptom; or perhaps they always implement with parallel, but only sometimes does a user end up taking that particular path through the code.

The open-files list [WRKJOB OPTION(*OPNF)] might help to infer some query activity, and the activation group for the open. If the job remains active with multiple threads, and the user can be allowed to issue a SNDDST for testing, then servicing [STRSRVJOB] and debug [STRDBG] could be issued against the job. After verifying the SNDDST still fails, reclaim each of the activations [RCLACTGRP] and test again if the SNDDST still fails [and\or the WRKJOB OPTION(*THREAD) to see if the threads went away]; a SQL cursor getting closed as a side effect will appear in the joblog as a query debug message. That could help to find the program\query responsible for the extra threads.