Troubleshooting Storage Protect scheduled backup failure reports

This article will help you if you have received an email reporting a Missed, Failed or Severed backup. Such an email has the subject line "HFS Scheduled backup missed/failed report".

If you have received this message, some or all of your data has NOT been backed up. It is therefore important both to resolve the issue as soon as possible, so that future scheduled backups can run, and to perform a manual backup, so that your current data is backed up and secure.

The email lists one or more nodenames. Since each node has its own unique nodename, password and installation of Storage Protect, you will need to repeat this process for each node listed in the email.

The email will state for each node whether the scheduled backup was Missed, Failed or Severed:

Dear HFS User

Following is a list of machines which missed or failed their scheduled backups last night:

 Nodename          Server     Status Scheduled start  Schedule name
 ----------------- ---------- ------ ---------------- -------------
 SNTEST28092023-IT OX_HFS_P31 Missed 2023-10-02 01:30 DAILY_0130
 SNTEST29092023-IT OX_HFS_P22 Failed 2023-10-02 02:00 DAILY_0200

For each node, check which status has been given and choose from the options below.
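If a report lists many nodes, it may help to extract the nodenames and statuses programmatically. The following Python sketch is a hypothetical helper, not an HFS tool; it assumes the whitespace-separated column layout shown in the sample email above (nodename, server, status, date, time, schedule name) and that nodenames contain no spaces.

```python
from typing import NamedTuple

class ReportRow(NamedTuple):
    nodename: str
    server: str
    status: str
    scheduled_start: str
    schedule_name: str

STATUSES = {"Missed", "Failed", "Severed"}

def parse_report(text: str) -> list[ReportRow]:
    """Return one ReportRow per node line in the body of a report email."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Node lines have six fields: nodename, server, status, date, time,
        # schedule name. The header and dashed lines fail this check.
        if len(fields) == 6 and fields[2] in STATUSES:
            nodename, server, status, date, clock, schedule = fields
            rows.append(ReportRow(nodename, server, status, f"{date} {clock}", schedule))
    return rows

report = """\
 Nodename          Server     Status Scheduled start  Schedule name
 ----------------- ---------- ------ ---------------- -------------
 SNTEST28092023-IT OX_HFS_P31 Missed 2023-10-02 01:30 DAILY_0130
 SNTEST29092023-IT OX_HFS_P22 Failed 2023-10-02 02:00 DAILY_0200
"""
for row in parse_report(report):
    print(row.nodename, row.status)
```

Each row can then be worked through using the section below that matches its status.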


Missed

You have a node whose scheduled backup is reported to have been MISSED.

If you already know why this scheduled backup was missed then you may just wish to run a manual backup.

To troubleshoot why your scheduled backup was missed, please proceed through the following steps in order:

Is the node in question still active?

The first thing to check is whether the machine that used this nodename has been replaced, rebuilt, or is no longer used.

If the node has been replaced, rebuilt or is no longer used then please deregister the node so that it is no longer on the backup schedule.
Check the HFS Backup Services Portal to make sure that you do not have two or more similarly named registered nodes. It could be that one is active and being backed up, whilst the other no longer exists and is therefore being reported as missed. If any nodes registered to you do not represent existing machines, please deregister them.

Is your Storage Protect node locked?

If your Storage Protect node is locked then you will not be able to back up until it is unlocked. To check your node's status do as follows:

Go to the HFS Backup Services Portal.
Select the problem node.
If the Summary tab shows your node's Locked status as Yes, contact hfs@ox.ac.uk to request that it be unlocked.

Check your machine

If your machine was NOT switched on overnight:

You need to leave your machine on overnight to run a scheduled backup.
If the machine was off when the schedule was due to run then the schedule will have been missed.

If your machine was switched on overnight:

Your machine may still have switched itself off or gone into sleep mode, meaning that the scheduled backup was missed.
Windows users:
1. Click Start, type ‘powercfg.cpl’ in the Windows search box and select this Control Panel item when it appears in the list. Select Change when the computer sleeps and check that Put the computer to sleep is set to Never.
2. In the Power Options window, also select Choose what the power buttons do and check that When I press the power button is not set to Sleep.
3. If your machine is a laptop and you close its lid when you leave it on for backup, also check that When I close the lid is set to Do nothing.
Mac users:
- Go to System Preferences > Energy Saver and check that the computer (not just the display) is set never to go to sleep.

If power management was set correctly:

You must have a physical or VPN network connection to the Oxford University network for the scheduled backups to run.
If the machine was on and connected to the Oxford University network (physically or via VPN), please see our page on checking the Storage Protect scheduler.

Summary

You should now have done enough troubleshooting to know why the scheduled backup was missed, and hopefully put corrective measures in place so that subsequent scheduled backups are successful.

Once the issues have been rectified, we suggest you run a manual backup so that you can:

test that the issue has been resolved.
ensure that we have an up-to-date copy of your data.

If you have been unable to determine the likely cause of the missed backups, please contact us at hfs@ox.ac.uk, including your log files.

To find out when your next scheduled backup is due, please see the FAQ item "When is my scheduled backup due to run?"

Failed

You have a node whose scheduled backup is reported to have FAILED.

A failed backup generally means that Storage Protect successfully started a backup but was unable to complete it. Further investigation is required to determine how much of your data was backed up: some, all or none of it may have been sent to the Storage Protect servers.

Objects on a user's machine that may cause a schedule to fail include:

Files that are exclusively locked open by another program and cannot be backed up, such as database files.
Files that are corrupt, making them unreadable.
Files that are excessively large, causing the network connection to time out.
Folder/file structures that exceed Storage Protect's maximum file name or path length restrictions.
Folder/file structures that cause memory issues on the client machine, making the backup fail.

Another possibility is that Storage Protect is wrongly configured. If it is looking for a file system or partition that does not exist, the backup is deemed to have failed: for example, if Storage Protect is set to back up D: but there is no D: drive present.

TROUBLESHOOTING FAILED SCHEDULED BACKUPS

1. Checking the dsmsched.log file

If you are IT support staff, or an advanced user who is confident reviewing and interpreting log files, please follow the suggestions below. You will need to open the file dsmsched.log, whose location is platform specific (see Log File Locations under Sending us your log files).

Once dsmsched.log is open (see below), search for ANS entries. These take the form ANS####?, where each # is a digit and ? is a letter giving the severity: E (Error), W (Warning), I (Informational) or S (Severe).

Informational (ANS####I) messages will not indicate the cause of a failed or severed scheduled backup; the problem is usually indicated by an error (ANS####E) message. The relevant message could occur at any point during the failed backup, so it is important to check everything dsmsched.log records for the whole night on which the backup failed. The remainder of this page explains how to view dsmsched.log and then lists the most commonly found error messages, along with their solutions.

Examining dsmsched.log using a text editor

Browse to the appropriate location and open dsmsched.log.
Once the log is open, which may take a while if it is large, scroll to the bottom of the log file where the most recent information will be.
Use the text editor's search function to check back through the log for ANS entries, checking those that end in either an E or a W.
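On large logs, a short script can perform the same search. This Python sketch (the function names are hypothetical) lists every ANS message whose severity letter is E, W or S and counts how often each code occurs, based on the ANS####? format described above. It only looks at the first ANS code on each line.

```python
import re
from collections import Counter

# ANS, four digits, then a severity letter: I, W, E or S (e.g. ANS1512E).
ANS_CODE = re.compile(r"\bANS\d{4}([IWES])\b")

def problem_messages(lines, severities=("E", "W", "S")):
    """Yield (line_number, code, text) for ANS messages of the given severities.

    Only the first ANS code on each line is considered.
    """
    for number, line in enumerate(lines, start=1):
        match = ANS_CODE.search(line)
        if match and match.group(1) in severities:
            yield number, match.group(0), line.rstrip("\n")

def count_codes(lines):
    """Count how often each error, warning or severe ANS code appears."""
    return Counter(code for _, code, _ in problem_messages(lines))

sample = [
    "06-10-2023 23:00:35 ANS1228E Sending of object '/home/hfsuser/Desktop/myfile' failed.",
    "06-10-2023 23:00:35 ANS4037E Object '/home/hfsuser/Desktop/myfile' changed during processing.  Object skipped.",
    "06-10-2023 23:00:35 ANS1802E Incremental backup of '/' finished with 1 failure(s)",
]
for number, code, text in problem_messages(sample):
    print(number, code)
```

To scan a real log, open dsmsched.log with `encoding="utf-8", errors="replace"` (log files may contain file names that are not valid UTF-8) and pass the file object to `problem_messages` or `count_codes`.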

2. Error messages

For each ANS####E or ANS####W entry, review the text following the code to determine whether it could have caused the scheduled backup to fail. You will most likely find the message 'ANS1512E Scheduled event ... failed' along with at least one other message.

Examples of common messages that cause scheduled backup failures are listed below.

> 'ANS4037E Object ... changed during processing'

Storage Protect may send most of your data but still report the scheduled backup as failed if some files were left open. Not every file failure causes a schedule failure: Storage Protect only deems a schedule to have failed if one or more files were prevented from being backed up in certain ways. Windows in particular sometimes locks open files in such a way that Storage Protect reports the schedule as failed when in fact only a small number of files were not backed up.

In general, it is best to close all files and programs before a backup runs. To locate the problem, first check dsmsched.log to see whether any file failures were caused by files being changed while Storage Protect was trying to back them up. There may be lines like:

06-10-2023 23:00:35 ANS1228E Sending of object '/home/hfsuser/Desktop/myfile' failed.
06-10-2023 23:00:35 ANS4037E Object '/home/hfsuser/Desktop/myfile' changed during processing.  Object skipped.
06-10-2023 23:00:35 ANS1802E Incremental backup of '/' finished with 1 failure(s)

The end-of-schedule report near the end of dsmsched.log shows the total number of failed files. To find it, it is usually easiest to go to the end of the file and scroll upwards until you reach a report similar to the following example:

06-10-2023 23:00:35 --- SCHEDULEREC STATUS BEGIN
06-10-2023 23:00:35 Total number of objects inspected:      170,348
06-10-2023 23:00:35 Total number of objects backed up:            3
06-10-2023 23:00:35 Total number of objects updated:              0
06-10-2023 23:00:35 Total number of objects rebound:              0
06-10-2023 23:00:35 Total number of objects deleted:              0
06-10-2023 23:00:35 Total number of objects expired:              0
06-10-2023 23:00:35 Total number of objects failed:               1
06-10-2023 23:00:35 Total number of objects encrypted:            0
06-10-2023 23:00:35 Total objects deduplicated:                   0
06-10-2023 23:00:35 Total number of objects grew:                 0
06-10-2023 23:00:35 Total number of retries:                      5
06-10-2023 23:00:35 Total number of bytes inspected:           6.79 GB
06-10-2023 23:00:35 Total number of bytes processed:            401  B
06-10-2023 23:00:35 Total bytes before deduplication:             0  B
06-10-2023 23:00:35 Total bytes after deduplication:              0  B
06-10-2023 23:00:35 Total number of bytes transferred:         1.54 KB
06-10-2023 23:00:35 Data transfer time:                        0.00 sec
06-10-2023 23:00:35 Network data transfer rate:                0.00 KB/sec
06-10-2023 23:00:35 Aggregate data transfer rate:              0.14 KB/sec
06-10-2023 23:00:35 Objects compressed by:                        0%
06-10-2023 23:00:35 Deduplication reduction:                   0.00%
06-10-2023 23:00:35 Total data reduction ratio:              100.00%
06-10-2023 23:00:35 Elapsed processing time:               00:00:10
06-10-2023 23:00:35 --- SCHEDULEREC STATUS END
06-10-2023 23:00:35 --- SCHEDULEREC OBJECT END DAILY_2300 06-10-2023 23:00:00
06-10-2023 23:00:35 Scheduled event 'DAILY_2300' completed successfully.
06-10-2023 23:00:35 Sending results for scheduled event 'DAILY_2300'.
06-10-2023 23:00:35 Results sent to server for scheduled event 'DAILY_2300'.
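Locating the end-of-schedule report can also be scripted. This Python sketch (hypothetical function names) returns the most recent SCHEDULEREC STATUS block as a dictionary of label to value. It assumes each line starts with a DD-MM-YYYY HH:MM:SS timestamp, as in the example above; the date format in your own log depends on the client's date format settings, so the pattern may need adjusting.

```python
import re

# Timestamp prefix as in the example log, e.g. "06-10-2023 23:00:35 ".
LINE_PREFIX = re.compile(r"^\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2} ")

def last_status_report(lines):
    """Return the most recent SCHEDULEREC STATUS block as {label: value}, or None."""
    report, current = None, None
    for line in lines:
        text = LINE_PREFIX.sub("", line.rstrip("\n"))
        if text.startswith("--- SCHEDULEREC STATUS BEGIN"):
            current = {}
        elif text.startswith("--- SCHEDULEREC STATUS END"):
            report, current = current, None
        elif current is not None and ":" in text:
            # Split on the first colon only, so "00:00:10" values stay intact.
            label, _, value = text.partition(":")
            current[label.strip()] = value.strip()
    return report

def failed_objects(report):
    """Convert the 'Total number of objects failed' value to an int."""
    return int(report["Total number of objects failed"].replace(",", ""))

sample = """\
06-10-2023 23:00:35 --- SCHEDULEREC STATUS BEGIN
06-10-2023 23:00:35 Total number of objects inspected:      170,348
06-10-2023 23:00:35 Total number of objects failed:               1
06-10-2023 23:00:35 --- SCHEDULEREC STATUS END""".splitlines()
print(failed_objects(last_status_report(sample)))  # 1
```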

It is quite normal for a few files to fail to back up. In particular, log files that are being written to at the time of the backup will fail; with a local policy of daily log rotation, the rotated log data will be backed up at the next backup.

If you find that the files that failed also failed on days when the schedule completed successfully, then those file failures are very unlikely to be what caused the schedule to fail as a whole.
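One way to apply this check across several nights is sketched below in Python. The function names are hypothetical; the sketch assumes the ANS1228E and "completed successfully" line formats shown in the examples on this page, and it groups lines by the calendar date in their timestamp, so a backup that runs past midnight will be split across two dates.

```python
import re
from collections import defaultdict

# Line formats taken from the examples on this page (DD-MM-YYYY timestamps).
SEND_FAILED = re.compile(r"^(\d{2}-\d{2}-\d{4}) \S+ ANS1228E Sending of object '(.+)' failed")
COMPLETED = re.compile(r"^(\d{2}-\d{2}-\d{4}) \S+ Scheduled event '.+' completed successfully")

def failures_by_date(lines):
    """Map each log date to the objects that failed to send on that date,
    and collect the dates on which a schedule completed successfully."""
    failed = defaultdict(set)
    good_days = set()
    for line in lines:
        if m := SEND_FAILED.match(line):
            failed[m.group(1)].add(m.group(2))
        elif m := COMPLETED.match(line):
            good_days.add(m.group(1))
    return failed, good_days

def unlikely_causes(lines, bad_day):
    """Objects that failed on bad_day but also failed on a successful day,
    and so are unlikely to explain the schedule failure."""
    failed, good_days = failures_by_date(lines)
    seen_on_good_days = set().union(*(failed[d] for d in good_days if d != bad_day))
    return failed[bad_day] & seen_on_good_days

# Made-up sample lines in the same format as the examples above.
log = [
    "05-10-2023 23:00:30 ANS1228E Sending of object '/var/log/app.log' failed.",
    "05-10-2023 23:00:35 Scheduled event 'DAILY_2300' completed successfully.",
    "06-10-2023 23:00:30 ANS1228E Sending of object '/var/log/app.log' failed.",
    "06-10-2023 23:00:31 ANS1228E Sending of object '/data/db.sqlite' failed.",
    "06-10-2023 23:01:20 ANS1512E Scheduled event 'DAILY_2300' failed. Return code =12.",
]
print(unlikely_causes(log, "06-10-2023"))  # {'/var/log/app.log'}
```

Any failed object not in the returned set (here /data/db.sqlite) is the better candidate for investigation.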

If you cannot close the files causing the schedule failure before the scheduled backup runs, you should exclude them from backup. Files that are continually open, such as database files, fall into this category.

> 'ANS1071E Invalid domain name entered' or 'ANS1063E The specified path is not a valid file system or logical volume name' or 'ANS1134E Drive is an invalid drive specification'

If Storage Protect is configured to back up drives or partitions that it cannot see, then scheduled backups will fail with a message like one of the following:

ANS1071E Invalid domain name entered: '/data/fred'

ANS1063E The specified path is not a valid file system or logical volume name

ANS1134E Drive is an invalid drive specification

Either the error message itself, or a message preceding or following it, will state which drive or partition is causing the problem.

There are three likely reasons for such an error:

The listed domain entry does not exist as a drive or partition:
- A folder/directory could have been specified as a separate domain. This causes an error because normally only drives or partitions can be used as domains: Storage Protect cannot find a drive called /data/fred, so it deems the schedule to have failed. In this example, /data/fred is actually a folder/directory within the larger partition /data or within the root partition /.
- Alternatively, it could be that a drive is listed as part of the backup domain but is no longer present on the machine. Perhaps the drive has been removed, or perhaps (on Windows machines only) the Storage Protect backup domain contains references to UNC paths that are no longer valid (for example if the machine has been renamed).
There is a space in the domain name. In this case the domain name must be enclosed in quotation marks, because otherwise Storage Protect will assume that you mean several domains. For example, the ANS1071E message above would occur if you wanted to back up the drive /data/fred backup but specified the incorrect DOMAIN /data/fred backup instead of the correct DOMAIN "/data/fred backup".
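On Linux and macOS, the DOMAIN entries can be checked quickly before you edit them. This Python sketch (hypothetical function names) reads the DOMAIN lines of dsm.opt text using shlex, so that a quoted name such as "/data/fred backup" stays a single entry while an unquoted one splits in two, and then flags entries that are not mount points. It assumes comment lines start with '*', skips ALL-LOCAL and '-' exclusions, does not handle Windows drive letters or UNC paths, and will also flag valid virtual mount points, so treat its output as a list of entries to look at rather than a verdict.

```python
import os
import shlex

def domain_entries(opt_text):
    """Collect the entries named on DOMAIN lines of dsm.opt text.

    shlex keeps a quoted name such as "/data/fred backup" together as one
    entry; unquoted, the same text becomes two entries.
    """
    entries = []
    for line in opt_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):  # blank line or comment
            continue
        tokens = shlex.split(stripped)
        if tokens[0].upper() == "DOMAIN":
            entries.extend(tokens[1:])
    return entries

def suspicious_domains(entries, is_mount=os.path.ismount):
    """Return entries that are not mount points, ignoring ALL-LOCAL and
    '-' exclusions. Valid virtual mount points will also be listed."""
    return [
        entry for entry in entries
        if entry.upper() != "ALL-LOCAL"
        and not entry.startswith("-")
        and not is_mount(entry)
    ]

opt = "* sample dsm.opt\nDOMAIN / /data/fred backup\n"
entries = domain_entries(opt)
print(entries)  # ['/', '/data/fred', 'backup']
print(suspicious_domains(entries))
```

To check a real file, pass the contents of your dsm.opt (for example via `pathlib.Path(...).read_text()`) to `domain_entries`.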

To fix this problem:

Run Storage Protect (Mac users must use Storage Protect Tools for Administrators), go to Edit > (Client) Preferences > Backup tab, and correct your backup domain.

> 'ANS1149E No domain is available for incremental backup. The domain may be empty or all file systems in the domain are excluded.'

The message indicates a problem similar to that described in the previous section; however rather than the backup domain having been set incorrectly, it has instead not been set at all.

This can be fixed by changing the backup domain so that it includes at least one valid drive or partition. To do this, see our instructions on excluding drives and partitions from backup; but instead of excluding a drive, ensure that at least one is included in the backup domain.

> 'ANS1492S Invalid virtual mountpoint ...: File not found' (Linux/Unix only)

This error message indicates a problem similar to those described in the previous sections: Storage Protect could not find a directory that has been nominated in dsm.sys as a virtual mount point (the VIRTUALMOUNTPOINT option). For more information, see the Storage Protect client documentation on virtual mount points.

To fix the problem, remove the line in dsm.sys or else correct it to point to an existing directory. Then check for the offending virtual mount point's name in dsm.opt: if your domain is not set to ALL-LOCAL then you will need to remove or correct it there too. Lastly, stop and restart the Storage Protect scheduler.
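A similar Python sketch (hypothetical function names) lists the VIRTUALMOUNTPOINT entries in dsm.sys text and reports any whose directory no longer exists, which is the situation behind ANS1492S. It assumes one option per line, case-insensitive option names and '*' comment lines; the sample fragment is made up.

```python
import os
import shlex

def virtual_mount_points(sys_text):
    """Return the directories named on VIRTUALMOUNTPOINT lines of dsm.sys text."""
    points = []
    for line in sys_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):  # blank line or comment
            continue
        tokens = shlex.split(stripped)
        # Option names are matched case-insensitively.
        if tokens[0].upper() == "VIRTUALMOUNTPOINT" and len(tokens) > 1:
            points.append(tokens[1])
    return points

def missing_mount_points(sys_text, is_dir=os.path.isdir):
    """Virtual mount points whose directory no longer exists."""
    return [path for path in virtual_mount_points(sys_text) if not is_dir(path)]

# Made-up dsm.sys fragment for illustration.
sample = """\
SErvername  HFS_EXAMPLE
   VIRTUALMountpoint /home/shared
   VIRTUALMountpoint /data/old_project
"""
print(virtual_mount_points(sample))  # ['/home/shared', '/data/old_project']
print(missing_mount_points(sample))
```

Any path reported by `missing_mount_points` is a candidate for removal or correction in dsm.sys, after which the scheduler still needs restarting.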

> 'ANS1512E Scheduled event ... failed' - but no other ANS warning/error messages

Sometimes Storage Protect may think that the schedule has failed because of a communication problem with the server. In this case, you will be able to tell from the end of dsmerror.log and dsmsched.log that no files failed during the backup.

For example, you may see a report like the following in the dsmsched.log file:

06-10-2023 23:01:20 --- SCHEDULEREC STATUS BEGIN
06-10-2023 23:01:20 Total number of objects inspected:      170,341
06-10-2023 23:01:20 Total number of objects backed up:            6
06-10-2023 23:01:20 Total number of objects updated:              0
06-10-2023 23:01:20 Total number of objects rebound:              0
06-10-2023 23:01:20 Total number of objects deleted:              0
06-10-2023 23:01:20 Total number of objects expired:              0
06-10-2023 23:01:20 Total number of objects failed:               0
06-10-2023 23:01:20 Total number of objects encrypted:            0
06-10-2023 23:01:20 Total objects deduplicated:                   0
06-10-2023 23:01:20 Total number of objects grew:                 0
06-10-2023 23:01:20 Total number of retries:                      5
06-10-2023 23:01:20 Total number of bytes inspected:           6.79 GB
06-10-2023 23:01:20 Total number of bytes processed:            402  B
06-10-2023 23:01:20 Total bytes before deduplication:             0  B
06-10-2023 23:01:20 Total bytes after deduplication:              0  B
06-10-2023 23:01:20 Total number of bytes transferred:         1.53 KB
06-10-2023 23:01:20 Data transfer time:                        0.00 sec
06-10-2023 23:01:20 Network data transfer rate:                0.00 KB/sec
06-10-2023 23:01:20 Aggregate data transfer rate:              0.13 KB/sec
06-10-2023 23:01:20 Objects compressed by:                        0%
06-10-2023 23:01:20 Deduplication reduction:                   0.00%
06-10-2023 23:01:20 Total data reduction ratio:              100.00%
06-10-2023 23:01:20 Elapsed processing time:               00:00:11
06-10-2023 23:01:20 --- SCHEDULEREC STATUS END
06-10-2023 23:01:20 --- SCHEDULEREC OBJECT END DAILY_2300 06-10-2023 23:00:00
06-10-2023 23:01:20 ANS1512E Scheduled event 'DAILY_2300' failed. Return code =12.
06-10-2023 23:01:20 Sending results for scheduled event 'DAILY_2300'.
06-10-2023 23:01:20 Results sent to server for scheduled event 'DAILY_2300'.

Storage Protect inspected 170,341 objects and backed up 6 of them; the number of failed objects is zero.

In this case, the Storage Protect client encountered an error when signing off from the server and recorded this as a failure. However, the scheduled backup itself clearly completed, so the failure message can be ignored.
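The check described above can be expressed as a small Python function (hypothetical name). It treats a schedule as a sign-off-only failure when ANS1512E is logged, the end-of-schedule report before it shows zero failed objects, and no other ANS error, warning or severe message appeared since the previous schedule result. This is a heuristic based on the log formats shown on this page, not an official test; a True result still warrants a look at dsmerror.log.

```python
import re

ANS_PROBLEM = re.compile(r"\bANS\d{4}[EWS]\b")
FAILED_COUNT = re.compile(r"Total number of objects failed:\s+([\d,]+)")

def sign_off_only_failure(lines):
    """Return True if the most recent schedule result is ANS1512E while the
    end-of-schedule report before it shows zero failed objects and no other
    ANS error, warning or severe message was logged since the previous result."""
    failed_count, other_problems, verdict = None, 0, False
    for line in lines:
        if "ANS1512E" in line:
            verdict = failed_count == 0 and other_problems == 0
            failed_count, other_problems = None, 0  # start afresh for the next schedule
        elif "completed successfully" in line:
            verdict = False
            failed_count, other_problems = None, 0
        elif m := FAILED_COUNT.search(line):
            failed_count = int(m.group(1).replace(",", ""))
        elif ANS_PROBLEM.search(line):
            other_problems += 1
    return verdict

excerpt = [
    "06-10-2023 23:01:20 Total number of objects failed:               0",
    "06-10-2023 23:01:20 ANS1512E Scheduled event 'DAILY_2300' failed. Return code =12.",
]
print(sign_off_only_failure(excerpt))  # True
```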

> 'ANS4023E Error processing ...: file input/output error' or 'ANS4046E There is an error processing ... the object is corrupted and unreadable' or 'ANS4047E There is a read error on .... The file is skipped.'

If Storage Protect is having trouble reading certain files, then it could be because they are corrupted. If this is the case then you will see error messages in your dsmerror.log about certain files being unreadable by Storage Protect. For example, they may take the form:

ANS4023E Error processing '/var/log/test.log': file input/output error.

ANS4046E There is an error processing '/var/log/test.log': the object is corrupted and unreadable.

ANS4047E There is a read error on '/var/log/test.log'. The file is skipped.

If the fault is software-related (for example, file system corruption) rather than a hardware fault, the problem can usually be fixed by checking the disk. Basic steps are as follows, though you may want to do further research before carrying them out.

Windows:
1. In File Explorer (This PC, or My Computer on older versions of Windows), right-click the offending drive (such as C:) and select Properties > Tools > Error checking > Check.
2. If available, tick the box marked Automatically fix file system errors.
3. Run the scan. If prompted, confirm the request to run a disk check when the computer next restarts, so that file system errors are fixed on the next reboot.
4. If the problem persists, a more thorough disk check can be performed by running chkdsk /r from an administrator command prompt.
Mac:
1. In Finder, select Applications > Utilities > Disk Utility
2. In the left-hand window, select the relevant drive
3. Select the First Aid tab > Verify Disk or, if appropriate, Repair Disk (on newer versions of macOS, click First Aid and then Run).
Linux:
1. Use the fsck command to check your disk; the file system should normally be unmounted first. Please refer to your system documentation for the appropriate procedure.

If file errors persist despite your attempts to fix them, or if you are concerned that your hard disk may have a hardware fault, please ask your local IT support for advice.

> Windows VSS (Volume Shadow Copy Service) problem causes backup to fail (Windows servers only)

By default, Storage Protect is set to back up Windows system files (System State) for server accounts. It does this by using the Windows VSS (Volume Shadow Copy Service). If your Windows server is failing its backups then this may be caused by a problem related to the interaction of Storage Protect with VSS.

If you find errors reported in dsmerror.log which mention System State or VSS then this is likely to be the cause of the failed backups.

For example:

20-11-2018 20:28:42 ANS5250E An unexpected error was encountered.
TSM function name : baProcessRequest
TSM function      : VSS Create Local Backup failed
TSM return code   : 1
TSM file          : incrdrv.cpp (6866)

In such a situation, you will probably also find that you can back up your data drives manually, but not System State.
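Because entries such as the ANS5250E example above span several lines, a plain search of dsmerror.log for "VSS" may show only a continuation line. This Python sketch (hypothetical function name) groups each timestamped line with the untimestamped lines that follow it and returns the complete entries that mention VSS or System State. It assumes the DD-MM-YYYY HH:MM:SS timestamp format shown in the example.

```python
import re

TIMESTAMP = re.compile(r"^\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2} ")
KEYWORDS = ("vss", "system state", "systemstate")

def vss_entries(lines):
    """Return complete dsmerror.log entries that mention VSS or System State.

    An entry is a timestamped line plus any following lines without a timestamp.
    """
    entries, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) and current:
            entries.append(current)
            current = []
        current.append(line)
    if current:
        entries.append(current)
    return ["\n".join(entry) for entry in entries
            if any(word in " ".join(entry).lower() for word in KEYWORDS)]

example = [
    "20-11-2018 20:28:42 ANS5250E An unexpected error was encountered.",
    "TSM function name : baProcessRequest",
    "TSM function      : VSS Create Local Backup failed",
    "TSM return code   : 1",
    "TSM file          : incrdrv.cpp (6866)",
]
for entry in vss_entries(example):
    print(entry)
```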

If you do not need to back up System State data then you can work around this issue by excluding it from backup. Storage Protect effectively classes System State as a separate drive, meaning that you can exclude it from the backup domain by using our instructions on how to exclude files, folders and drives from backup.

If you wish to back up System State, check that you have the latest version of Storage Protect for your version of Windows, as recent versions fix certain issues with System State backup. If required, you can download the latest HFS Storage Protect package.

If a Storage Protect upgrade does not fix the problem, please contact us at hfs@ox.ac.uk including your log files.

3. Summary

You should now have done enough troubleshooting to know why the scheduled backup failed, and hopefully put corrective measures in place so that subsequent scheduled backups are successful. If you have been unable to determine the likely cause of the failures, please contact us at hfs@ox.ac.uk, including your log files.

If you have not already done so, we recommend that you run a manual backup.

To find out when your next scheduled backup is due, please see the FAQ item "When is my scheduled backup due to run?"

Severed

You have a node whose scheduled backup is reported to have been SEVERED.

A SEVERED backup generally means a loss of communication between the Storage Protect client on your machine and the HFS Storage Protect server, whilst the backup was in progress. Possible reasons include:

Intervention at the client (user) end - the user forcibly cancelled the backup/stopped Storage Protect services, or switched the machine off during the backup.
Intervention at the server end - the backup may have been cut off from the Storage Protect server for exceeding a daily limit.
Failure, such as a crash, of the client machine.
Storage Protect client machine going into sleep mode/state of hibernation.
Several large files causing multiple connection timeouts between the server and client.
Dropped network connection either at the client end, or somewhere on the network between the client and the server.
Firewall intervention prohibiting/delaying network traffic.

If you are aware of your machine crashing or the backup being forcibly cancelled then you may wish to simply run a manual backup.

To troubleshoot why your backup was SEVERED, please follow the troubleshooting steps given above for FAILED backups.

Sending us your log files

If you have been through the troubleshooting steps provided and your issue has not been resolved, you need to log a call with the HFS Backup Services Team by replying to the email you received advising you of the MISSED, FAILED or SEVERED backup. In order to provide effective support for this issue the HFS Backup Services Team will need all of your log and configuration files.

The easiest way to gather these files is with the HFS Hub app, as follows:

HFS Hub > Logs > Save logs > enter a filename and select a location where you wish to create a compressed file which contains all the necessary log and config files > click Save. Please send us this file.

Alternatively, if you do not have the HFS Hub app installed, please send us the following files:

Log File Locations
Platform	File	Location
Windows	dsmerror.log, dsmsched.log	C:\Program Files\tivoli\tsm\baclient
Linux	dsmerror.log, dsmsched.log, tsm-install.log	/var/log or /opt/tivoli/tsm/client/ba/bin
macOS	dsmerror.log, dsmsched.log, tsm-install.log	/Library/Logs/Tivoli/TSM

Config File Locations
Platform	File	Location
Windows	dsm.opt	C:\Program Files\tivoli\tsm\baclient
Linux	dsm.sys, dsm.opt	/usr/tivoli/tsm/client/ba/bin or /opt/tivoli/tsm/client/ba/bin
macOS	dsm.sys, dsm.opt	/Library/Preferences/Tivoli Storage Manager
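The HFS Hub app remains the simplest way to collect these files. If it is not available, the following Python sketch (a hypothetical script, not an HFS tool) gathers whichever of the files listed above exist on the current machine into a single zip archive. It assumes you run it with sufficient permissions to read the Storage Protect folders, and it relies on platform.system(), which returns "Windows", "Linux" or "Darwin" (macOS).

```python
import platform
import zipfile
from pathlib import Path

# Folders taken from the Log File Locations and Config File Locations tables.
LOCATIONS = {
    "Windows": [Path(r"C:\Program Files\tivoli\tsm\baclient")],
    "Linux": [
        Path("/var/log"),
        Path("/opt/tivoli/tsm/client/ba/bin"),
        Path("/usr/tivoli/tsm/client/ba/bin"),
    ],
    "Darwin": [  # macOS
        Path("/Library/Logs/Tivoli/TSM"),
        Path("/Library/Preferences/Tivoli Storage Manager"),
    ],
}
WANTED = ("dsmerror.log", "dsmsched.log", "tsm-install.log", "dsm.opt", "dsm.sys")

def collect_files(system=None):
    """Return the listed log and config files that exist on this machine."""
    system = system or platform.system()
    return [
        folder / name
        for folder in LOCATIONS.get(system, [])
        for name in WANTED
        if (folder / name).is_file()
    ]

def bundle(archive="hfs-logs.zip", system=None):
    """Write the collected files into a zip archive and return their paths."""
    files = collect_files(system)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store the full path (minus any drive colon) so that files with
            # the same name in different folders do not overwrite each other.
            zf.write(path, arcname=path.as_posix().lstrip("/").replace(":", ""))
    return files
```

Calling `bundle()` writes hfs-logs.zip to the current directory; please check the archive's contents before sending it.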