Bug #46360

Updated by Ramana Raja almost 4 years ago

During `fs subvolume clone`, libcephfs hit the "Disk quota exceeded" (EDQUOT) error, and this caused the subvolume clone to be stuck in progress instead of entering the failed state. I could see the following traceback in the mgr log:

 <pre> 
   File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/fs_util.py", line 117, in copy_file 
     written += fs.write(dst_fd, data[written:], -1) 
   File "cephfs.pyx", line 1463, in cephfs.LibCephFS.write 
 cephfs.Error: error in write: Disk quota exceeded [Errno 122] 

 During handling of the above exception, another exception occurred: 

 Traceback (most recent call last): 
   File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/async_job.py", line 44, in run 
     self.async_job.execute_job(vol_job[0], vol_job[1], should_cancel=lambda: thread_id.should_cancel()) 
   File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/async_cloner.py", line 309, in execute_job 
     clone(self.vc, volname, job[0].decode('utf-8'), job[1].decode('utf-8'), self.state_table, should_cancel) 
   File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/async_cloner.py", line 222, in clone 
     start_clone_sm(volume_client, volname, index, groupname, subvolname, state_table, should_cancel) 
   File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/async_cloner.py", line 202, in start_clone_sm 
     (next_state, finished) = handler(volume_client, volname, index, groupname, subvolname, should_cancel) 
   File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/async_cloner.py", line 159, in handle_clone_in_progress 
     do_clone(volume_client, volname, groupname, subvolname, should_cancel) 
   File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/async_cloner.py", line 155, in do_clone 
     bulk_copy(fs_handle, src_path, dst_path, should_cancel) 
   File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/async_cloner.py", line 144, in bulk_copy 
     cptree(source_path, dst_path) 
   File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/async_cloner.py", line 129, in cptree 
     copy_file(fs_handle, d_full_src, d_full_dst, mo, cancel_check=should_cancel) 
   File "/home/rraja/git/ceph/src/pybind/mgr/volumes/fs/fs_util.py", line 120, in copy_file 
     raise VolumeException(-e.args[0], e.args[1]) 
 TypeError: bad operand type for unary -: 'str' 
 </pre> 


 Digging further, I found that if a libcephfs return code is not mapped to a specific Python exception in cephfs.pyx (that is, it is not in `errno_to_exception`), then cephfs.pyx raises an exception with different arguments than it normally does. See `make_ex` in cephfs.pyx:

 <pre> 
 cdef make_ex(ret, msg): 
     """ 
     Translate a librados return code into an exception. 
     """ 
     ret = abs(ret) 
     if ret in errno_to_exception: 
         return errno_to_exception[ret](ret, msg) 
     else: 
         return Error(msg + ': {} [Errno {:d}]'.format(os.strerror(ret), ret)) 
 </pre> 

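 So it sometimes raises cephfs.Error(ret, msg) and sometimes cephfs.Error(msg). mgr/volumes only handles cephfs.Error(ret, msg) correctly. In the single-argument case, e.args[0] is the message string rather than the errno, so the `-e.args[0]` in copy_file raises the TypeError seen in the traceback, and the clone never reaches the failed state.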
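
The mismatch can be reproduced outside of Ceph with a small sketch. Everything here is a stand-in and an assumption, not the actual Ceph code: `Error` and `VolumeException` are plain local classes, `make_ex_unmapped` only mirrors the `else` branch of `make_ex` quoted above, and `to_volume_exception` is a hypothetical helper showing one way a caller could tolerate both argument shapes. The `EIO` fallback is an arbitrary choice for illustration.

```python
import errno
import os


class Error(Exception):
    """Stand-in for cephfs.Error (assumed to behave like a plain Exception)."""


class VolumeException(Exception):
    """Stand-in for the mgr/volumes exception that takes (errno, message)."""

    def __init__(self, error_code, error_message):
        super().__init__(error_code, error_message)
        self.errno = error_code
        self.error_str = error_message


def make_ex_unmapped(ret, msg):
    # Mirrors the `else` branch of make_ex: a single formatted string argument.
    ret = abs(ret)
    return Error(msg + ': {} [Errno {:d}]'.format(os.strerror(ret), ret))


def to_volume_exception(e, fallback_errno=errno.EIO):
    # Hypothetical helper: accept both Error(ret, msg) and Error(msg).
    if len(e.args) >= 2 and isinstance(e.args[0], int):
        return VolumeException(-abs(e.args[0]), e.args[1])
    # Only a message is available, so fall back to a generic errno.
    return VolumeException(-fallback_errno, str(e))


# The one-argument form breaks code that assumes args[0] is an errno.
err = make_ex_unmapped(-122, "error in write")
try:
    VolumeException(-err.args[0], err.args[1])
except TypeError as te:
    print("reproduced:", te)

# The hypothetical helper handles both shapes without raising.
print(to_volume_exception(err).errno)
print(to_volume_exception(Error(2, "error in open")).errno)
```

A fix on the cephfs.pyx side, so that `make_ex` always passes `(ret, msg)`, would avoid losing the original errno; the helper above only illustrates the caller-side symptom.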
