Bug #2524 (closed): librados crashed while connecting to cluster

Added by Xiaopong Tran almost 12 years ago. Updated almost 12 years ago.

Status: Won't Fix
Priority: High
Assignee: -
Category: -
Target version: -
% Done: 0%
Source: Development
Tags:
Backport:
Regression:
Severity:
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):

Description

Librados crashed while connecting to the cluster.

Here is some log information. Unfortunately, I don't have more information regarding the crash. It is quite hard to reproduce.


Thread::try_create(): pthread_create failed with error 11
common/Thread.cc: In function 'void Thread::create(size_t)' thread 7fa83c660700 time 2012-06-07 18:36:59.553989
common/Thread.cc: 108: FAILED assert(ret == 0)
ceph version 0.47.2 (8bf9fde89bd6ebc4b0645b2fe02dadb1c17ad372)
1: (Thread::create(unsigned long)+0x8a) [0x7fa860bd7f8a]
2: (SimpleMessenger::Pipe::connect()+0x3377) [0x7fa860b8e577]
3: (SimpleMessenger::Pipe::writer()+0x8ae) [0x7fa860b8f8ce]
4: (SimpleMessenger::Pipe::Writer::entry()+0xd) [0x7fa860a7629d]
5: (()+0x7e9a) [0x7fa86979ae9a]
6: (clone()+0x6d) [0x7fa8692c04bd]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
Aborted


I'm not sure how to do "objdump -rdS <executable>", because this is wrapped in an Erlang application. I have written an Erlang NIF binding to the librados API (https://github.com/renzhi/erlrados) so that we can access the storage from Erlang. The NIF functions are packaged into a shared library (.so) that is loaded by the Erlang program, and calls are made from Erlang through the NIF functions to the librados API. Please let me know what you'd like me to provide to troubleshoot this.
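
For context, a minimal sketch of how such a NIF might wrap the connect call (this is not the actual erlrados code; the module name erlrados_sketch, the config path, and the immediate shutdown are placeholders to keep the example self-contained):

#include <erl_nif.h>
#include <rados/librados.h>

/* Hypothetical NIF wrapping rados_connect(); module name and immediate
 * shutdown are placeholders, not the actual erlrados code. */
static ERL_NIF_TERM connect_nif(ErlNifEnv *env, int argc, const ERL_NIF_TERM argv[])
{
    rados_t cluster;
    (void)argc; (void)argv;

    if (rados_create(&cluster, NULL) < 0 ||
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf") < 0 ||
        rados_connect(cluster) < 0)
        return enif_make_atom(env, "error");

    /* A real NIF would keep the handle in a resource object for later calls;
     * here it is shut down right away to keep the sketch self-contained. */
    rados_shutdown(cluster);
    return enif_make_atom(env, "ok");
}

static ErlNifFunc nif_funcs[] = {
    {"connect", 0, connect_nif}
};

ERL_NIF_INIT(erlrados_sketch, nif_funcs, NULL, NULL, NULL, NULL)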

Normally it runs OK, but during load testing we got crashes quite often when calling any one of the following functions:

rados_connect()
rados_shutdown() *
rados_conf_read_file() *
rados_ioctx_destroy()

The ones with the star happen the most. But it is quite hard to reproduce. It usually happens when we have a lot of concurrent clients, i.e. a lot of clients making calls to the librados API at the same time to read and write to the storage.


Files

objdump.txt (1.19 MB), Xiaopong Tran, 06/07/2012 06:45 PM
test_rados.dump.txt (52.7 KB), Xiaopong Tran, 06/08/2012 03:15 AM
main.cpp (4.53 KB), Xiaopong Tran, 06/08/2012 03:15 AM
Makefile (389 Bytes), Xiaopong Tran, 06/08/2012 03:15 AM
Actions #1

Updated by Greg Farnum almost 12 years ago

This assert means that either a malloc or a call to pthread_create failed. It's probably resource exhaustion of some type; if you can provide more details we might be able to suggest something.
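
For reference, error 11 in the log above is EAGAIN on Linux, which is what pthread_create() returns when a resource or thread limit is hit. A small standalone check (illustration only, not part of Ceph or the test program):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static void *noop(void *arg) { return arg; }

int main(void)
{
    pthread_t t;
    int ret = pthread_create(&t, NULL, noop, NULL);

    if (ret != 0)    /* pthread_create returns the error code directly */
        fprintf(stderr, "pthread_create: %s (%d)\n", strerror(ret), ret);
    else
        pthread_join(t, NULL);

    printf("EAGAIN == %d\n", EAGAIN);    /* prints 11 on Linux */
    return 0;
}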

Actions #2

Updated by Xiaopong Tran almost 12 years ago

This is weird if the problem is caused by resource exhaustion. I run this app on a machine with an i7 CPU (8 cores) and 16GB of RAM. Right before the crash, ps showed the VSZ of the application at around 142MB. I only had a few other apps running at the time, and free showed at least 8GB of RAM still unused. When it crashed, I only had about 20 connections, which is still far from the thousands or tens of thousands we are expecting.

What would you suggest I do to try to reproduce that? Right now, I've found no pattern. I can help with testing this.

Actions #3

Updated by Xiaopong Tran almost 12 years ago

objdump on the NIF shared library.

Actions #4

Updated by Xiaopong Tran almost 12 years ago

Alright, more information. I was thinking maybe it was the max number of open files, or that the stack size was too low, so I changed these two parameters to the following values:

xp@china:~/workspace-xp/stress.rados$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 128005
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 100000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 50000
cpu time (seconds, -t) unlimited
max user processes (-u) 128005
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

And I have plenty of memory:

xp@china:~/workspace-xp/stress.rados$ free
total used free shared buffers cached
Mem: 16402252 2697460 13704792 0 70864 689208
-/+ buffers/cache: 1937388 14464864
Swap: 15625212 0 15625212

But it still crashed. The attached files include:

- result from objdump
- The source code of the program
- Makefile

This is how I run the program:

./test_rados -n 50000 -p test1001 -i infile

-n specifies how many threads I want to have. Each thread will create a cluster handle, connect to it, and write the input file to the pool.

-p name of the pool

-i input file to write to rados
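
For reference, roughly what each of the -n threads does per the description above (a simplified sketch, not the attached main.cpp; the config path, object name, and payload are placeholders):

#include <pthread.h>
#include <rados/librados.h>

static void *stress_thread(void *arg)
{
    rados_t cluster;    /* every thread gets its own cluster handle ... */
    rados_ioctx_t io;
    (void)arg;

    if (rados_create(&cluster, NULL) < 0 ||
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf") < 0 ||
        rados_connect(cluster) < 0)    /* ... and its own messenger threads */
        return NULL;

    if (rados_ioctx_create(cluster, "test1001", &io) == 0) {
        rados_write_full(io, "infile-copy", "data", 4);    /* stand-in for the input file */
        rados_ioctx_destroy(io);
    }
    rados_shutdown(cluster);
    return NULL;
}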

Please let me know what other information is needed to help troubleshoot this issue.

Actions #5

Updated by Xiaopong Tran almost 12 years ago

Ah, formatting... sorry

xp@china:~/workspace-xp/stress.rados$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128005
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 100000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 50000
cpu time               (seconds, -t) unlimited
max user processes              (-u) 128005
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
xp@china:~/workspace-xp/stress.rados$ free
             total       used       free     shared    buffers     cached
Mem:      16402252    2697460   13704792          0      70864     689208
-/+ buffers/cache:    1937388   14464864
Swap:     15625212          0   15625212

BTW, I think it would be great if the API returned an error code nicely instead of just crashing.

Actions #6

Updated by Sage Weil almost 12 years ago

Can you cat /proc/sys/kernel/threads-max? On my system it's only 127837.

Actions #7

Updated by Sage Weil almost 12 years ago

Sage Weil wrote:

Can you cat /proc/sys/kernel/threads-max? On my system it's only 127837.

Yeah, for each librados 'instance' (via rados_create) there are several threads that run (3-4 minimum, but more depending on how many osds you talk to). On my system this will exhaust the kernel's per-process thread limit well before -n 40000: at roughly 4 threads per instance, 127837 / 4 is only about 32,000 instances.

Actions #8

Updated by Xiaopong Tran almost 12 years ago

This is on my system:

xp@china:~/workspace-xp/stress.rados$ cat /proc/sys/kernel/threads-max 
256010

Does it create a thread for every configured osd, or only one thread for each osd where the data is located? If it's the first case, and we have a lot of osds, that could be quite an issue. For testing, we only have 3 osds.

Actions #9

Updated by Xiaopong Tran almost 12 years ago

I bumped up the threads-max to:

xp@china:~/workspace-xp/stress.rados$ cat /proc/sys/kernel/threads-max 
768030

And it still crashed at less than 5000 connections :(

Actions #10

Updated by Sage Weil almost 12 years ago

Xiaopong Tran wrote:

This is on my system:
[...]

Does it create a thread for every configured osd, or only one thread for each osd where the data is located? If it's the first case, and we have a lot of osds, that could be quite an issue. For testing, we only have 3 osds.

There are two threads per peer it is communicating with, plus several more in overhead for each instance. How many you really consume depends somewhat on the size of the cluster and how much IO each instance does.

The client library is really not intended to be instantiated thousands of times in the same process... that is horribly inefficient because you are duplicating all of the cluster session state, TCP connections, etc. You're better off sharing a single instance. librados is thread safe.

If you're just trying to do load testing, generate more workload on each client instance, and spawn multiple processes. That's much closer to what would happen in a real environment anyway.
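
As a rough sketch of that suggested pattern (one shared rados_t, with worker threads creating their own IO contexts; the pool name "test1001", the thread count, and the payload below are assumptions for illustration):

#include <pthread.h>
#include <rados/librados.h>
#include <stdio.h>

static rados_t cluster;    /* single shared librados instance */

static void *worker(void *arg)
{
    long id = (long)arg;
    char oid[64];
    rados_ioctx_t io;

    if (rados_ioctx_create(cluster, "test1001", &io) < 0)
        return NULL;
    snprintf(oid, sizeof(oid), "obj-%ld", id);
    rados_write_full(io, oid, "payload", 7);    /* each thread does its own IO */
    rados_ioctx_destroy(io);
    return NULL;
}

int main(void)
{
    pthread_t tids[16];

    if (rados_create(&cluster, NULL) < 0 ||
        rados_conf_read_file(cluster, "/etc/ceph/ceph.conf") < 0 ||
        rados_connect(cluster) < 0) {
        fprintf(stderr, "cannot connect to cluster\n");
        return 1;
    }
    for (long i = 0; i < 16; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int i = 0; i < 16; i++)
        pthread_join(tids[i], NULL);
    rados_shutdown(cluster);
    return 0;
}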

Actions #11

Updated by Xiaopong Tran almost 12 years ago

Thanks for the update. Yes, we do have different models, including a pool with a set number of rados_t instances, etc. But since we are still testing, we'd like to know how it behaves with different ways of connecting to the cluster. And there's issue #2525, which Josh opened right after this one, which we might have to check as well.
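
For illustration, the "pool of rados_t instances" model mentioned above could look roughly like this (the pool size, naming, and round-robin hand-out are assumptions, not the actual code):

#include <pthread.h>
#include <rados/librados.h>

#define HANDLE_POOL_SIZE 4    /* assumed fixed number of cluster handles */

static rados_t handles[HANDLE_POOL_SIZE];
static pthread_mutex_t handle_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned next_handle;

/* Create and connect all handles up front. */
int handle_pool_init(const char *conf_path)
{
    for (int i = 0; i < HANDLE_POOL_SIZE; i++) {
        if (rados_create(&handles[i], NULL) < 0 ||
            rados_conf_read_file(handles[i], conf_path) < 0 ||
            rados_connect(handles[i]) < 0)
            return -1;
    }
    return 0;
}

/* Hand out handles round-robin; librados is thread safe, so callers
 * can share them concurrently. */
rados_t handle_pool_get(void)
{
    pthread_mutex_lock(&handle_lock);
    rados_t c = handles[next_handle++ % HANDLE_POOL_SIZE];
    pthread_mutex_unlock(&handle_lock);
    return c;
}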

Actions #12

Updated by Sage Weil almost 12 years ago

  • Status changed from New to Won't Fix