We have been using NDMP to backup NetApp snapshot for about 2 years now. Since yesterday a new issue as arise. All ndmp backup are failing, either on status 86, media position or on status 24, socket write failed(24).
20/01/2015 11:43:19 AM - Info nbjm(pid=5220) starting backup job (jobid=175381) for client dick-san01, policy ddc_tld0_ndmp_nfsdsprd_01, schedule incr_daily
20/01/2015 11:43:19 AM - Info nbjm(pid=5220) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=175381, request id:{2E765ED3-9D2A-4DDD-8082-799E8E1C4CF0})
20/01/2015 11:43:19 AM - requesting resource smtlhq143-hcart2-robot-tld-0-dick-san01
20/01/2015 11:43:19 AM - requesting resource smtlhq143.NBU_CLIENT.MAXJOBS.dick-san01
20/01/2015 11:43:19 AM - requesting resource smtlhq143.NBU_POLICY.MAXJOBS.ddc_tld0_ndmp_nfsdsprd_01
20/01/2015 11:43:19 AM - Info bpbrm(pid=9432) dick-san01 is the host to backup data from
20/01/2015 11:43:19 AM - Info bpbrm(pid=9432) reading file list for client
20/01/2015 11:43:19 AM - Info bpbrm(pid=9432) starting ndmpagent on client
20/01/2015 11:43:19 AM - Info ndmpagent(pid=6756) Backup started
20/01/2015 11:43:19 AM - Info bptm(pid=7096) start
20/01/2015 11:43:19 AM - granted resource smtlhq143.NBU_CLIENT.MAXJOBS.dick-san01.viarail.intra
20/01/2015 11:43:19 AM - granted resource smtlhq143.NBU_POLICY.MAXJOBS.ddc_tld0_ndmp_nfsdsprd_01
20/01/2015 11:43:19 AM - granted resource 015257
20/01/2015 11:43:19 AM - granted resource ddc-tld0-lto5-003
20/01/2015 11:43:19 AM - granted resource smtlhq143-hcart2-robot-tld-0-dick-san01
20/01/2015 11:43:19 AM - estimated 3222944486 Kbytes needed
20/01/2015 11:43:19 AM - Info nbjm(pid=5220) started backup (backupid=dick-san01_1421772199) job for client dick-san01, policy ddc_tld0_ndmp_nfsdsprd_01, schedule incr_daily on storage unit smtlhq143-hcart2-robot-tld-0-dick-san01
20/01/2015 11:43:19 AM - started process bpbrm (9432)
20/01/2015 11:43:19 AM - connecting
20/01/2015 11:43:19 AM - connected; connect time: 0:00:00
20/01/2015 11:43:20 AM - Info bptm(pid=7096) using 64 data buffers
20/01/2015 11:43:20 AM - Info bptm(pid=7096) using 65536 data buffer size
20/01/2015 11:43:20 AM - Info bptm(pid=7096) start backup
20/01/2015 11:43:20 AM - Info bptm(pid=7096) Waiting for mount of media id 015257 (copy 1) on server smtlhq143.
20/01/2015 11:43:20 AM - mounting 015257
20/01/2015 11:44:15 AM - Info bptm(pid=7096) media id 015257 mounted on drive index 0, drivepath nrst1a, drivename ddc-tld0-lto5-003, copy 1
20/01/2015 11:44:15 AM - mounted; mount time: 0:00:55
20/01/2015 11:44:19 AM - Info ndmpagent(pid=6756) dick-san01: SCSI: TAPE READ: short read for nrst1a
20/01/2015 11:44:19 AM - positioning 015257 to file 73
20/01/2015 11:45:19 AM - Error bptm(pid=7096) io_ioctl_ndmp (MTFSF) failed on media id 015257, drive index 0, return code 18 (NDMP_XDR_DECODE_ERR) (bptm.c.7157)
20/01/2015 11:45:19 AM - Error ndmpagent(pid=6756) eof is set, connection 000000000101AF80
20/01/2015 11:45:19 AM - Error ndmpagent(pid=6756) NDMP backup failed, path = UNKNOWN
20/01/2015 11:45:20 AM - Error ndmpagent(pid=6756) connection 0x101b360 ndmp_message_process_one_failed, status = NDMP_ILLEGAL_STATE_ERR
20/01/2015 11:45:20 AM - Error ndmpagent(pid=6756) eof is set - connection 0x101b360
20/01/2015 11:45:20 AM - Info bptm(pid=7096) EXITING with status 86 <----------
20/01/2015 11:45:20 AM - Info ndmpagent(pid=0) done. status: 86: media position error
20/01/2015 11:45:20 AM - end writing
media position error(86)
Error 2 (start writing then fail)
20/01/2015 9:58:47 AM - Info nbjm(pid=5220) starting backup job (jobid=175367) for client dick-san01, policy ddc_tld0_ndmp_nfsdsprd_05, schedule incr_daily
20/01/2015 9:58:47 AM - Info nbjm(pid=5220) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=175367, request id:{A7DF3C66-7600-4CE0-B086-CD43C24F91D5})
20/01/2015 9:58:47 AM - requesting resource smtlhq143-hcart2-robot-tld-0-dick-san01
20/01/2015 9:58:47 AM - requesting resource smtlhq143.NBU_CLIENT.MAXJOBS.dick-san01
20/01/2015 9:58:47 AM - requesting resource smtlhq143.NBU_POLICY.MAXJOBS.ddc_tld0_ndmp_nfsdsprd_05
20/01/2015 9:58:48 AM - Info bpbrm(pid=7036) dick-san01 is the host to backup data from
20/01/2015 9:58:48 AM - Info bpbrm(pid=7036) reading file list for client
20/01/2015 9:58:48 AM - Info bpbrm(pid=7036) starting ndmpagent on client
20/01/2015 9:58:48 AM - Info ndmpagent(pid=9764) Backup started
20/01/2015 9:58:48 AM - Info bptm(pid=1392) start
20/01/2015 9:58:48 AM - Info bptm(pid=1392) using 64 data buffers
20/01/2015 9:58:48 AM - Info bptm(pid=1392) using 65536 data buffer size
20/01/2015 9:58:48 AM - granted resource smtlhq143.NBU_CLIENT.MAXJOBS.dick-san01
20/01/2015 9:58:48 AM - granted resource smtlhq143.NBU_POLICY.MAXJOBS.ddc_tld0_ndmp_nfsdsprd_05
20/01/2015 9:58:48 AM - granted resource 015176
20/01/2015 9:58:48 AM - granted resource ddc-tld0-lto5-002
20/01/2015 9:58:48 AM - granted resource smtlhq143-hcart2-robot-tld-0-dick-san01
20/01/2015 9:58:48 AM - estimated 1340055948 Kbytes needed
20/01/2015 9:58:48 AM - Info nbjm(pid=5220) started backup (backupid=dick-san01_1421765928) job for client dick-san01, policy ddc_tld0_ndmp_nfsdsprd_05, schedule incr_daily on storage unit smtlhq143-hcart2-robot-tld-0-dick-san01
20/01/2015 9:58:48 AM - started process bpbrm (7036)
20/01/2015 9:58:48 AM - connecting
20/01/2015 9:58:48 AM - connected; connect time: 0:00:00
20/01/2015 9:58:49 AM - Info bptm(pid=1392) start backup
20/01/2015 9:58:49 AM - Info bptm(pid=1392) Waiting for mount of media id 015176 (copy 1) on server smtlhq143.
20/01/2015 9:58:49 AM - mounting 015176
20/01/2015 9:59:45 AM - Info bptm(pid=1392) media id 015176 mounted on drive index 2, drivepath nrst0a, drivename ddc-tld0-lto5-002, copy 1
20/01/2015 9:59:45 AM - mounted; mount time: 0:00:56
20/01/2015 9:59:49 AM - Info ndmpagent(pid=9764) dick-san01: SCSI: TAPE READ: short read for nrst0a
20/01/2015 9:59:49 AM - positioning 015176 to file 1
20/01/2015 9:59:53 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: Using inowalk incremental dump algorithm for qtree
20/01/2015 9:59:53 AM - positioned 015176; position time: 0:00:04
20/01/2015 9:59:53 AM - begin writing
20/01/2015 10:00:04 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: Date of this level 2 dump: Tue Jan 20 02:39:03 2015.
20/01/2015 10:00:04 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: Date of last level 1 dump: Mon Jan 19 02:40:09 2015.
20/01/2015 10:00:04 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: Dumping /vol/v_nfsds_prd_05/q_nfsds_prd_05/.snapshot/smvi_backup_datastore_prod_05_mfm_only_novmsnap-daily_recent to NDMP connection
20/01/2015 10:00:04 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: mapping (Pass I)[regular files]
20/01/2015 10:00:40 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: mapping (Pass II)[directories]
20/01/2015 10:00:40 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: estimated 848756807 KB.
20/01/2015 10:00:40 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: dumping (Pass III) [directories]
20/01/2015 10:00:40 AM - Info ndmpagent(pid=9764) dick-san01: DUMP: dumping (Pass IV) [regular files]
20/01/2015 10:01:40 AM - Info ndmpagent(pid=9764) dick-san01: MOVER: signaling backup EOW at 10485760000 (160000 records)
20/01/2015 10:02:49 AM - Info ndmpagent(pid=9764) dick-san01: MOVER: signaling backup EOW at 20971520000 (320000 records)
20/01/2015 10:02:50 AM - Error ndmpagent(pid=9764) terminated by parent process
20/01/2015 10:02:50 AM - Info ndmpagent(pid=0) done
20/01/2015 10:03:03 AM - Info bptm(pid=1392) EXITING with status 24 <----------
20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.
20/01/2015 10:03:03 AM - Info ndmpagent(pid=0) done. status: 24: socket write failed
20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.
20/01/2015 10:03:03 AM - end writing; write time: 0:03:10
socket write failed(24)
20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) MoverGetState called with no session
20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) NDMP backup failed, path = /vol/v_nfsds_prd_05/q_nfsds_prd_05/.snapshot/smvi_backup_datastore_prod_05_mfm_only_novmsnap-daily_recent
20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.
20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.
20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.
20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.
20/01/2015 10:03:03 AM - Error ndmpagent(pid=9764) Connection was closed but has not yet been destroyed.
I was able to write a dump 0f of Netapp snapshot to tape using NepApp command line.
Robot has been restarted.
still no luck. any help will be appreciated. Thank you.