HP MPE SUPPORT – TROUBLESHOOTING A CRASH

A Service IT Direct customer has HP MPE Support / HP MPE Maintenance on their HP 3000 A500 server running MPE/iX 7.5. The system had shut down with no outward sign that a hardware malfunction was the cause. The user rebooted the system and it was working normally. I went on-site to look at the system and its logs to try to determine what could have caused the shutdown.

Since the server was under our HP MPE Support, we performed the usual visual checks of all the peripherals and their cables. I also checked the power source and its cables and connections. Everything looked intact, and nothing stood out as a possible cause of the shutdown.

I logged on to the Maintenance Processor to see if the system had logged any hardware errors. The only thing I found was an entry showing that a software error had been recorded at the time the customer thought the system had gone down. I also checked the status of the processors and the memory DIMMs. They were all working normally and showed no errors.

I then checked the system hardware logs, and nothing stood out there either. At this point the customer had to leave. The system was working fine, so I said we would keep an eye on it for any additional errors or problems that might show up on the system console.

The system went down again a day or two later. I repeated all the checks I had done on my previous on-site visit and got exactly the same results. I asked the customer what time the system had gone down and what applications or jobs were running at the time. The only job running appeared to be the one that runs the customer's backup.

I checked the job status reports, and it appeared that the backup job had never completed. I checked the tape drive and it was working normally. I then checked their tapes and they all looked good. I reviewed the job that had been running at the time to make sure it was sound, and also checked that no other job was trying to do the same thing. I have seen jobs fail when two jobs tried to access the same files and data structures simultaneously.

I then checked their disks with the command “dstat all” and then used the “discfree” command. It turned out that the private volume group the customer used the most had little free space.
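
The free-space check described above can be sketched as a small calculation. This is only an illustration in Python, not something that runs on the HP 3000 and not a parser for “discfree” output: the volume set names, the sector counts, and the 10 percent threshold are all made-up assumptions, and the figures would have to be read off the real report by hand.

```python
from dataclasses import dataclass


@dataclass
class VolumeSet:
    """Capacity figures for one volume set, entered by hand (hypothetical)."""
    name: str
    total_sectors: int
    free_sectors: int

    @property
    def free_percent(self) -> float:
        # Guard against a bad entry before dividing.
        if self.total_sectors <= 0:
            raise ValueError(f"{self.name}: total_sectors must be positive")
        return 100.0 * self.free_sectors / self.total_sectors


def low_on_space(volume_sets, threshold_percent=10.0):
    """Return the names of volume sets whose free space is below the threshold.

    The 10 percent default is an arbitrary example, not an HP guideline.
    """
    return [vs.name for vs in volume_sets
            if vs.free_percent < threshold_percent]


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the customer's system.
    sets = [
        VolumeSet("SYSTEM_SET_EXAMPLE", total_sectors=8_000_000, free_sectors=3_200_000),
        VolumeSet("PROD_SET_EXAMPLE", total_sectors=16_000_000, free_sectors=400_000),
    ]
    for vs in sets:
        print(f"{vs.name}: {vs.free_percent:.1f}% free")
    print("Low on space:", low_on_space(sets))
```

With the example figures, only the second volume set falls under the threshold. The right threshold depends on how much working space the jobs on that volume group need, which is a judgment call for each site.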

I asked the customer if they would remove any data they no longer needed from that volume group to increase the amount of free space. They agreed and had their admin personnel do it. Once the space was freed, the system stopped going down with the software error. Eventually the customer added some disks to this private volume group, and we have not had any more problems.

Because this customer has HP MPE Support / HP MPE Maintenance, our engineers kept in contact with the customer and made sure that the issue was solved to the customer’s satisfaction. All Service IT Direct engineers share the same commitment to customer support and will see every problem through to its resolution.

Call us at 888-596-4720 to discuss our various support options or visit our web site at ServiceITDirect.com.

Written by Ray Torres – Service IT Direct Critical Systems Engineer