Bug #2524

closed

librados crashed while connecting to cluster

Added by Xiaopong Tran almost 12 years ago. Updated almost 12 years ago.

Status: Won't Fix
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development

Description

Librados crashed while connecting to the cluster.

Here is some log information. Unfortunately, I don't have more information regarding the crash. It is quite hard to reproduce.


Thread::try_create(): pthread_create failed with error 11common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fa83c660700 time 2012-06-07 18:36:59.553989
common/Thread.cc: 108: FAILED assert(ret == 0)
ceph version 0.47.2 (8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
1: (Thread::create(unsigned long)+0x8a) [0x7fa860bd7f8a]
2: (SimpleMessenger::Pipe::connect()+0x3377) [0x7fa860b8e577]
3: (SimpleMessenger::Pipe::writer()+0x8ae) [0x7fa860b8f8ce]
4: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x7fa860a7629d]
5: (()+0x7e9a) [0x7fa86979ae9a]
6: (clone()+0x6d) [0x7fa8692c04bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted
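
For readers decoding the log above: on Linux, error 11 is EAGAIN, which POSIX documents as the code pthread_create() returns when the system lacks the resources to create another thread or a thread limit has been reached. The report itself does not confirm the root cause, so the following is only a minimal diagnostic sketch. Both helper names are hypothetical and are not part of Ceph or librados.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of Ceph or librados): map a pthread_create()
   return code to the symbolic error names POSIX documents for that call. */
const char *pthread_create_error_name(int rc)
{
    switch (rc) {
    case 0:      return "OK";
    case EAGAIN: return "EAGAIN"; /* out of resources, or thread limit reached */
    case EINVAL: return "EINVAL"; /* invalid thread attributes */
    case EPERM:  return "EPERM";  /* no permission for the requested scheduling */
    default:     return "UNKNOWN";
    }
}

/* Hypothetical helper: read the soft RLIMIT_NPROC limit, which on Linux caps
   the number of processes and threads owned by the calling user.
   Returns 0 and stores the limit in *out on success, -1 on failure. */
int soft_thread_limit(rlim_t *out)
{
    struct rlimit rl;

    assert(out != NULL);
    if (getrlimit(RLIMIT_NPROC, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}
```

Comparing the value from soft_thread_limit() with the number of threads the process holds under load would be one way to check whether the EAGAIN reading applies here.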


I'm not sure how to run "objdump -rdS <executable>", because librados is loaded from inside an Erlang application. I have written an Erlang NIF binding for the librados API (https://github.com/renzhi/erlrados) so that we can access the storage from Erlang. The NIF functions are packaged into a shared library (.so) that is loaded by the Erlang program, and calls are made from Erlang through the NIF functions to the librados API. Please let me know what you'd like me to provide to troubleshoot this.

Normally it runs fine, but during load testing we often got crashes when calling any of the following functions:

rados_connect()
rados_shutdown() *
rados_conf_read_file() *
rados_ioctx_destroy()

The ones marked with a star crash most often, but the crash is quite hard to reproduce. It usually happens when we have many concurrent clients, i.e. many clients calling the librados API at the same time to read from and write to the storage.
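
The backtrace shows the messenger creating a thread while connecting. If the crashes come from many clients each opening and closing their own connection (an assumption, not something this report confirms), one possible mitigation is to share a single connection among concurrent callers. The sketch below is a generic reference-counted handle with caller-supplied callbacks. All names are hypothetical. In the setup described here, the open callback would wrap rados_create(), rados_conf_read_file() and rados_connect(), and the close callback would wrap rados_shutdown(). The demo callbacks are stand-ins so the sketch compiles without librados.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical callback types: open_cb returns a connection handle or NULL,
   close_cb tears that handle down. */
typedef void *(*open_cb)(void);
typedef void (*close_cb)(void *);

typedef struct {
    pthread_mutex_t lock;   /* serializes open, close and refcount changes */
    void *handle;           /* shared connection, NULL while closed */
    int refs;               /* number of callers currently holding the handle */
    open_cb do_open;
    close_cb do_close;
} shared_conn;

void shared_conn_init(shared_conn *s, open_cb o, close_cb c)
{
    assert(s != NULL && o != NULL && c != NULL);
    pthread_mutex_init(&s->lock, NULL);
    s->handle = NULL;
    s->refs = 0;
    s->do_open = o;
    s->do_close = c;
}

/* The first caller opens the connection; later callers reuse it.
   Returns NULL if opening failed, in which case no reference is taken. */
void *shared_conn_acquire(shared_conn *s)
{
    void *h;

    pthread_mutex_lock(&s->lock);
    if (s->refs == 0)
        s->handle = s->do_open();
    if (s->handle != NULL)
        s->refs++;
    h = s->handle;
    pthread_mutex_unlock(&s->lock);
    return h;
}

/* The last caller to release closes the connection. */
void shared_conn_release(shared_conn *s)
{
    pthread_mutex_lock(&s->lock);
    if (s->refs > 0 && --s->refs == 0) {
        s->do_close(s->handle);
        s->handle = NULL;
    }
    pthread_mutex_unlock(&s->lock);
}

/* Stand-in callbacks for illustration only; they just count calls. */
static int demo_opens, demo_closes, demo_token;
static void *demo_open(void) { demo_opens++; return &demo_token; }
static void demo_close(void *h) { (void)h; demo_closes++; }
```

With this shape, open and close each run once per burst of concurrent callers instead of once per request, and they never overlap, because every transition happens under the mutex.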


Files

objdump.txt (1.19 MB), Xiaopong Tran, 06/07/2012 06:45 PM
test_rados.dump.txt (52.7 KB), Xiaopong Tran, 06/08/2012 03:15 AM
main.cpp (4.53 KB), Xiaopong Tran, 06/08/2012 03:15 AM
Makefile (389 Bytes), Xiaopong Tran, 06/08/2012 03:15 AM