Project

General

Profile

Bug #2064

ceph-client: messenger: nocrc flag not implemented correctly

Added by Alex Elder about 12 years ago. Updated almost 12 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Target version:
-
% Done:

0%

Source:
Tags:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Crash signature (v1):
Crash signature (v2):

Description

The "nocrc" option is supposed to disable CRC32 calculation on messages
sent between ceph entities. The default is that the CRC should be
computed on messages written and verified when they're read.

In read_partial_message() and write_partial_msg_pages() however, it looks
like the logic is reversed, so if the "nocrc" flag is set, then CRC's
are used, and if it is not set they are not. I.e., by default we are
not computing and checking message CRC's.

History

#1 Updated by Sage Weil about 12 years ago

  • Priority changed from Normal to Urgent

#2 Updated by Alex Elder about 12 years ago

This is a trivial fix. The only thing that needs to be done really is
testing the result to make sure things don't fall apart. Unfortunately
I have had no access to the Teuthology since yesterday afternoon, and
until that is fixed I won't have any confidence in the fix for this.

#3 Updated by Alex Elder about 12 years ago

The problem also exists in write_partial_msg_pages(). In other words, it
looks like neither the reader nor the writer has been (by default) computing
CRC's for the data portion of messages. The header and footer, and the
front and middle sections of messages are computed unconditionally.

Note that it looks to me like the footer crc is not checked on receive.
I'll file a new bug on that once I've verified that...

#4 Updated by Alex Elder about 12 years ago

  • Status changed from New to In Progress

I've been unable to test rigorously today so I analyzed the affected code.

If the "nocrc" flag is supplied, the ceph_options->flags field gets the
CEPH_OPT_NOCRC set. (And conversely if "crc" is supplied, that flag is
cleared.)

The CEPH_OPT_NOCRC flag is checked in two places:
- In ceph_show_options(), "nocrc" is added to the options shown if it's set.
- And in ceph_create_client(), "nocrc" flag on the newly-created client's
messenger gets assigned (via ceph_test_opt()) either true or false based
on whether CEPH_OPT_NOCRC is set or cleared, respectively.

A messenger's "nocrc" flag is used in two places:
- write_partial_msg_pages()
- read_partial_message()
In both of these the value is assigned to a local variable, and that
variable is used to determine whether to compute a CRC on the data
portion of the message and either send or verify it.

That's it. These last two uses of the "nocrc" flag is where the logic
is inverted. The fix is to logically negate the assigned value.

In write_partial_msg_pages(), the assigned flag is used consistently
to determine whether to map a page to a memory address, with the result
passed to crc32c(). It is later used to determine whether to unmap
the mapped page address, and finally, it is used to decide whether the
CEPH_MSG_FOOTER_NOCRC should be set in the message footer flags. This
all appears to be correct.

In read_partial_message() the assigned flag is passed to either
read_partial_message_pages() or read_partial_message_bio(), and
later if it's used--if the CEPH_MSG_FOOTER_NOCRC flag is not set in
the message footer--to determine whether to check the data CRC
received in the message footer matches the CRC calculated for the
data portion of the message. Assuming the two called functions
use the values correctly, this looks correct.

read_partial_message_pages() receives no more than one page from
the socket, and if the "do_datacrc" flag indicates to, updates
the computed data CRC based on the data received. This looks
right.

The logic in read_partial_message_bio() matches the above, using
the length of the current bio segment rather than the current page
as the limit to the size of the data received.

So in summary, I have confidence that simply inverting the
"nocrc" values as described above will produce the desired,
correct behavior.

Now it just has to be tested...

#5 Updated by Alex Elder about 12 years ago

This is fixed by this commit:

086da4c6f8  libceph: fix inverted crc option logic

That is now present in the ceph-client testing branch (subject to nightly
testing), and will be moved into the master branch soon.

#6 Updated by Alex Elder about 12 years ago

  • Status changed from In Progress to 7

Update: the commit had to be rebased, so it's id is now: 4d3e7aa992

#7 Updated by Alex Elder about 12 years ago

It got rebased once more, and this should be the last:
37675b0f42a8f7699c3602350d1c3b2a1698a3d3
This has been sent as part of a pull request to Linus for
Linux 3.4. Will close this for good when I see it's been
accepted into mainline.

#8 Updated by Alex Elder almost 12 years ago

  • Status changed from 7 to Resolved

Linus pulled in the changes without any immediate trouble, so
I'm marking this and a few others resolved.

Also available in: Atom PDF