Automatic problem capture with event-driven snap

Snap takes a back-in-time snapshot of the CICS internal trace and writes it to an auxiliary trace data set. In effect, snap captures a history of what recently occurred in CICS: when a problem is detected, snap peeks back in time to see what went wrong. When paired with tools that can be programmed to respond to error conditions or other important events in CICS, snap becomes a powerful problem-determination aid, performing a lightweight capture of the events leading up to the problem.

In this section, you will learn how to:

  • Trigger a snap via an operator command
  • Automate problem capture using IBM Tivoli NetView for z/OS
  • Set up a C\Prof collection server as a dedicated snap server

Triggering a snap via operator command

To set up automatic problem capture with C\Prof, you must modify your automation tools to issue a snap request to a C\Prof collection server as soon as possible after the error event is detected. This ensures that you capture as much of the event history as possible before subsequent activity in CICS overwrites the internal trace.

Request

The snap request is made using an MVS system command as follows:

MODIFY <collector-jobname>,SNAP,<cics-regions> /* <optional-comment>

The parameters in the MODIFY command have the following meanings.

  • <collector-jobname>

    The job name of your C\Prof collection server.

  • <cics-regions>

    The CICS region or regions that you wish to snap. You can specify one of the following options:

    • The job name of a CICS region. The CICS region does not have to be defined in your C\Prof collection server configuration file to take a snapshot.

    • A group of CICS regions where the group is defined in the collector configuration file using the GROUP control statement (for example, a group of regions using multiregion operation (MRO)).

    • A wildcard pattern such as CICS* to match multiple CICS regions and groups.

  • <optional-comment>

    The optional comment at the end of the command is echoed into the collector SYSPRINT when the collector issues message TXC0502I Operator command echo in response to the snap request. You can use this to annotate your snap request with additional problem information as desired. At a minimum, it is recommended that you annotate your snap request with information about the CICS error message or event that was used to trigger the snapshot.
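For example, assuming a collection server job named CPROFCOLL and a CICS region named CICSP1 (both names are illustrative), the following commands snap a single region and all regions matching a pattern, respectively:

F CPROFCOLL,SNAP,CICSP1 /* DFHSM0133 short on storage detected
F CPROFCOLL,SNAP,CICS*  /* Snap all regions and groups matching CICS*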

Result

Depending on the size of the internal trace, a snapshot takes no more than a few seconds to complete. The resulting auxiliary trace data set can be analyzed using the following methods:

  • Open it in the C\Prof ISPF dialog via option 1 Regions or option 3 Trace on the primary option menu.
  • Import it into the C\Prof transaction profiler to gain an application perspective of the trace.

What you see in the trace will depend on the trace point levels set in your CICS internal trace, the size of the trace table, and how quickly the snapshot was requested after the problem occurred. For deeper snapshot history, increase the size of the CICS internal trace table in each CICS region you wish to snap.

The CICS internal trace in each region must be in the STARTED state to capture more than just exception events. If you are using the C\Prof trace control program (TRACECONTROL control statement) in that CICS region, consider how the settings for ACTIVATETRACE and RESETTRACE might affect your snap requests as they can be configured to change the status, size, and trace point levels of the CICS internal trace.
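If console command support is enabled in the region, one way to start the internal trace and adjust its table size ahead of snap requests is from the MVS console. The following is a sketch, assuming a CICS region job name of CICSP1; the exact CEMT operands vary by CICS release:

F CICSP1,CEMT SET INTTRACE START TABLESIZE(32768)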

Automation example using IBM Tivoli NetView

The following simple example demonstrates how you could use IBM Tivoli NetView for z/OS to set up automated problem capture. Use the following information as a guide when integrating with the automation products available in your own mainframe environment.

To set up automated problem capture using IBM Tivoli NetView, complete the following steps:

  1. Start the CICS internal trace in the regions you wish to monitor. Configure trace point levels as desired and set the size of the trace table to a minimum of 32 MB.

    Note: If the CICS internal trace is not started, only exception events will be captured.
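    For example, one way to do this is with SIT overrides in the region startup (a sketch; trace point level settings are site-specific and omitted here):

    * Start the CICS internal trace with a 32 MB trace table
    INTTR=ON,
    TRTABSZ=32768,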

  2. Start a C\Prof collection server. For convenience, you may wish to start the collector as a started task.

  3. Create an IBM Tivoli NetView automation table entry to respond to a message from CICS. In the following example, a message handling script is invoked when CICS becomes short on storage:

    ...
    *---------------------------------------------------------------------*
    * Handle DFHSM0133 CICS SHORT on STORAGE                              *
    * Call MSGHDL with message ID and message text                        *
    *---------------------------------------------------------------------*
    IF MSGID = 'DFHSM0133'
       THEN EXEC(CMD('MSGHDL ' 'DFHSM0133 ' RESTMSG) ROUTE(ONE AUTOMGR))
            DISPLAY(Y) NETLOG(Y);
    ...
    

    Adding an automation table entry for snap in IBM Tivoli NetView

  4. Create or update a message handling script to request a snap from the C\Prof collection server. In the following example, a snap operator command is requested for the running C\Prof collection server CPROFCOLL. The CICS region is supplied on the operator command and a comment containing the text of the message is also appended.

    ...
    /*---------------------------------------------------------------------------------*/
    /* REXX for DFHSM0133 CICS SHORT on STORAGE                                         */
    /*---------------------------------------------------------------------------------*/
    WHEN msgid = 'DFHSM0133' Then ,
     Do
      /* Initiate C\Prof SNAP on collector CPROFCOLL to capture recent events */
      /* The msgtext is written to the C\Prof collection server job log       */
      /* (region and msgtext are assumed to be set earlier in this exec)      */
      'F CPROFCOLL,SNAP,' || region || ' /*' || msgtext
     End
    ...
    

    Initiating the snap operator command using REXX

    Note: Additional operator commands can be found in Controlling the collector via operator commands.

  5. To apply changes, reload the IBM Tivoli NetView active automation table.

  6. To direct a message to IBM Tivoli NetView for an automated response, use the AUTO(YES) keyword in the z/OS message processing facility (MPF), as shown in the example below.
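For example, the following MPFLSTxx parmlib entry marks DFHSM0133 as eligible for automation without suppressing it from the console (a sketch; your MPF member and suppression settings may differ):

DFHSM0133,SUP(NO),AUTO(YES)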

Result

The next time the message is reported by CICS, NetView will trigger a C\Prof snap. The resulting auxiliary trace data set will be reported in the SYSPRINT message log. For example:

TXC0391I SNAP to trace data set is complete; 32768K bytes
         written to DSN=USR.CPROF.CICSP1.D161208.T175102.X001

If your C\Prof configuration file uses a checkpoint data set, the resulting auxiliary trace data set will now be available for viewing via option 1 Regions (see Opening registered auxiliary trace data sets) or option 3 Trace (see Opening registered auxiliary trace data sets for ad hoc analysis) on the C\Prof primary option menu. Alternatively, you can import the snapshot into the C\Prof transaction profiler to gain an application perspective of the trace. For more information, see Importing auxiliary trace data sets into the profiler.

Tip: If you find that your snapshots do not contain the data you need, try the following:

  • Increase the table size of the CICS internal trace. A larger trace table means that you can store more history in the trace before it is overwritten by subsequent activity in CICS.
  • Reduce the time spent in your automation tools between problem detection and triggering of the snapshot. The faster you can trigger the snap, the less likely it is that important events have been overwritten.
  • Activate additional trace point domains to gather more event data. Activating the trace points supplied in ACTIVATETRACE=(level[,START[,upsize][,recording-mode]]) will supply the profiler with a complete end-to-end view of any transactions in progress.
  • Remember that the CICS internal trace is a shared resource. The status, size, and trace points written to the trace can be changed by users who have access to transaction CETR, or by other C\Prof collection tasks (particularly if they are using the C\Prof trace control program, described further in Configuring automatic trace control for specific CICS regions).
  • If you are using the C\Prof trace control program with this CICS region, be sure that it is configured to leave the status of the trace in the STARTED state when collection ends so that you capture more than just exception events.

Setting up a C\Prof collection server as a dedicated snap server

If you need to separate your automated problem capture system from other C\Prof collection tasks, you can run a dedicated C\Prof collection server that restricts operations to snap requests made via the snap MVS system command.

Use the following examples to help you get started.

Dedicated snap server with dynamic data set allocation

To snap the CICS internal trace to new dynamically allocated auxiliary trace data sets, use the following example JCL:

//JOBCARD JOB (ACCOUNT),'NAME'
/*JOBPARM SYSAFF=SYS1
//*
//S1     EXEC PGM=TXCMAIN
//STEPLIB  DD DISP=SHR,DSN=TXC.V1R2M0.STXCAUTH
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
SERVER                     /* Run the collector in SERVER mode
AUXTRSW=ARCHIVE            /* Allocate data sets dynamically
+PREFIX=CPROF              /* Define a data set prefix
CHECKPT=+PREFIX.CHECKPT    /* Checkpoint new data sets
ARCHDSN=+PREFIX.+CICS      /* New data set naming pattern
AUXILIARY SPACE=(CYL,10,10) UNIT=DISK BLKSIZE=24576 /* Aux trace attributes
/*

JCL for a dedicated snap server with dynamic data set allocation

The JCL contains the following elements:

  • S1 EXEC statement

    The C\Prof utility program TXCMAIN.

  • STEPLIB

    STXCAUTH must be APF authorized. For further information, see Before you begin.

  • SYSPRINT

    The collector message log. C\Prof messages that appear in the SYSPRINT are described in Messages and Codes.

  • SYSIN

    The C\Prof collection server is configured as follows:

    • Start the collector in SERVER mode (SERVER). As there is no LIMIT control statement specified, the collector will run indefinitely until an operator command is issued to stop listening for snap requests. For more information on operator commands, see Controlling the collector via operator commands.

    • New data sets are allocated dynamically as required (AUXTRSW=ARCHIVE).

    • New data sets are registered in the checkpoint data set (CHECKPT=+PREFIX.CHECKPT). The optional CHECKPT control statement is used to register the data sets created by C\Prof so you can find them in the C\Prof ISPF dialog. When used in combination with the RETPD option on the AUXILIARY control statement, you can use the C\Prof housekeeping feature to perform cleanup of expired data sets. For more information on housekeeping, see Cleaning up old collection data via housekeeping.

    • New data set names are generated using a pattern (ARCHDSN=+PREFIX.+CICS).

    • The allocation attributes for new auxiliary trace data sets are specified (AUXILIARY SPACE=(CYL,10,10) UNIT=DISK BLKSIZE=24576). The amount of I/O for the snap request is reduced by using a larger block size for auxiliary trace data sets. Auxiliary trace data sets created using this block size will no longer be compatible with the CICS trace utility print program. For more information, see BLKSIZE=value. With this configuration file, user requests to record for profiling will be rejected as we have omitted the necessary SUMMARY and DETAIL control statements.

    • No CICS regions have been specified (CICS) and no socket connection has been defined (SOCKET). This is because snap requests are made via an operator command that requires you to specify the CICS region.

Dedicated snap server with static data sets

If you would like to use pre-defined data set names instead of dynamically generated names, use the following example JCL:

//JOBCARD JOB (ACCOUNT),'NAME'
/*JOBPARM SYSAFF=SYS1
//*
//S1 EXEC PGM=TXCMAIN
//STEPLIB DD DISP=SHR,DSN=TXC.V1R2M0.STXCAUTH
//SYSPRINT DD SYSOUT=*
//TRACE01 DD DISP=SHR,DSN=CPROF.DFHAUXT1
//TRACE02 DD DISP=SHR,DSN=CPROF.DFHAUXT2
//TRACE03 DD DISP=SHR,DSN=CPROF.DFHAUXT3
//TRACE04 DD DISP=SHR,DSN=CPROF.DFHAUXT4
//TRACE05 DD DISP=SHR,DSN=CPROF.DFHAUXT5
//SYSIN DD *
SERVER
AUXTRSW=ALL           /* Static mode options are NO, NEXT, and ALL
CICS=* OUTDD=TRACE*   /* Record to DD names prefixed with TRACE
/*

JCL for a dedicated snap server writing to static data sets

The JCL contains the following elements:

  • S1 EXEC statement

    The C\Prof utility program TXCMAIN.

  • TRACEnn

    The TRACEnn DD statements define the names of the output auxiliary trace data sets used during recording. You can use any name you wish provided that you use the same prefix in the OUTDD parameter on the CICS control statement. The maximum number of auxiliary trace data sets that you may specify is 32.

  • SYSIN

    The C\Prof collection server is configured as follows:

    • Start the collector in SERVER mode (SERVER). As there is no LIMIT control statement specified, the collector will run indefinitely until an operator command is issued to stop listening for snap requests. For more information on operator commands, see Controlling the collector via operator commands.

    • Write the results of each snap request to a sequence of auxiliary trace data sets. Once the data sets are full, begin overwriting from the first data set (AUXTRSW=ALL).

      In this example, the first snap request received by the C\Prof collection server will be written to TRACE01, the second snap request will be written to TRACE02, and so on until the last data set, TRACE05, is used on the fifth snap request. Once all data sets have been used, snap requests resume from TRACE01, overwriting any data that was present. This process continues indefinitely until the collector is stopped via operator command. For more information on operator commands, see Controlling the collector via operator commands.

      Additional static mode options for the AUXTRSW control statement can be found in AUXTRSW=option.

    • Use all DD names starting with TRACE to store the resulting auxiliary trace data sets (OUTDD=TRACE*). In this configuration, the collector can take a snap of any CICS region (CICS=*).
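For example, assuming this dedicated server runs under the job name CPROFSNAP (a hypothetical name for illustration), an operator or automation product could request a snapshot of region CICSP1 as follows:

F CPROFSNAP,SNAP,CICSP1 /* Ad hoc snap using the static data sets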