On Tue, Dec 11, 2012 at 3:06 PM, Sébastien Barthélémy
<barthelemy [..] ...> wrote:
> Hello,
>
> if I got it right, DataObjectLockFree advertises more than it actually does.
> See attached patch.
Hmm. Thanks for fixing; the code diverged a lot from the docs. What
remains undocumented is the importance of max_threads. We should
document this variable instead of:
+ *
+ * The internal buffer can get full if too many concurrent reads are
+ * taking too long. In such a case, the write occurs anyway and late
+ * readers can get corrupted values.
Which should imo read:
+ *
+ * The internal buffer will get full if more concurrent threads than
+ * max_threads are accessing this object. In such a case, the read
+ * occurs anyway and late readers can get corrupted values.
And then also fix up the documentation of the constructor. So for
clarity, this is a single-writer, 'max_threads - 1' readers
implementation, and it can't detect by itself whether the user violates this.
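
To make that contract concrete, here is a minimal usage sketch; the
LockFreeData class and its constructor are hypothetical stand-ins, not the
RTT signature. One thread writes, at most max_threads - 1 threads read, and
nothing in the object can flag a violation at run time:

#include <thread>
#include <vector>

// Hypothetical stand-in for DataObjectLockFree; only the usage contract
// matters in this sketch.
template <class T>
class LockFreeData {
public:
    explicit LockFreeData(unsigned max_threads) : max_threads_(max_threads) {}
    void Set(const T&) { /* called from exactly one writer thread */ }
    T    Get() const   { return T(); /* at most max_threads - 1 concurrent readers */ }
private:
    unsigned max_threads_;
};

int main() {
    const unsigned max_threads = 4;              // 1 writer + 3 readers
    LockFreeData<double> data(max_threads);

    std::thread writer([&] { data.Set(1.0); });  // the single writer
    std::vector<std::thread> readers;
    for (unsigned i = 0; i + 1 < max_threads; ++i)
        readers.emplace_back([&] { (void)data.Get(); });

    writer.join();
    for (auto& r : readers) r.join();
}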
I suppose you were bitten by these false promises?
Peter
DataObjectLockFree: fix facts in doxygen
On Wed, Dec 12, 2012 at 1:07 PM, Sébastien Barthélémy
<barthelemy [..] ...> wrote:
>
>
> On Tue, Dec 11, 2012 at 3:46 PM, Peter Soetens <peter [..] ...>
> wrote:
>>
>> Which should imo read:
>>
>> + *
>> + * The internal buffer will get full if more concurrent threads than
>> + * max_threads are accessing this object. In such a case, the read
>> + * occurs anyway and late readers can get corrupted values.
>
>
>
> I mistook the return at line 210 for a break. So with this new reading, I think
> data corruption cannot occur if the buffer gets full: new readers will get
> an old value, but it won't be corrupted.
>
> So please do not consider my previous patch but instead the two attached to
> this email.
>
> The first one fixes the doxygen regarding the single writer thing.
>
> The second improves the comments (to the best of my understanding), especially
> regarding the race condition.
>
> Regarding the way this race condition is dealt with (in the loop at line
> 170), if I got it right, a reader thread can get delayed if a write occurs
> while it is between lines 171 and 172: it has to spin the loop again (it
> kind of polls the buffer).
>
> If this happens repeatedly, the reader might be delayed forever, even if
> the writer has lower priority.
On the contrary! This is the whole reason for using lock-free loops
instead of mutexes: if the writer has lower priority, it won't preempt
the read and the read returns immediately. If the writer has higher
priority, the writer returns immediately. It is always the higher-priority
thread which is favoured, never the lower-priority one. Indeed, the
lower-priority thread can starve, but that's your architecture doing that,
not my algorithm.
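
To illustrate, a sketch of that retry structure (hypothetical names, not the
RTT source): the reader marks the newest slot as in use, re-checks that the
writer did not publish a newer one in between, and if it did, drops the slot
and loops again. Neither side ever blocks the other:

#include <atomic>

template <class T>
class LockFreeReadSketch {
    struct Slot {
        T data{};
        std::atomic<int> readers{0};   // the writer skips slots with readers > 0
    };

public:
    T Get() {
        Slot* reading;
        for (;;) {
            reading = read_ptr.load(std::memory_order_acquire);
            reading->readers.fetch_add(1, std::memory_order_acq_rel);
            // A write may have completed between the two loads of read_ptr;
            // if so, release the stale slot and retry. This is the spinning
            // ("polling") a delayed reader goes through.
            if (reading == read_ptr.load(std::memory_order_acquire))
                break;
            reading->readers.fetch_sub(1, std::memory_order_acq_rel);
        }
        T copy = reading->data;        // safe: the writer never reuses a held slot
        reading->readers.fetch_sub(1, std::memory_order_acq_rel);
        return copy;
    }

private:
    Slot pool[4];                      // sized from max_threads in the real object
    std::atomic<Slot*> read_ptr{&pool[0]};
};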
>
> I agree that this would be an occurrence of "very high frequency systematic
> bad luck", but still, isn't that lack of determinism a concern?
It's 100% deterministic for the highest priority thread contending on this data.
>
> I have no great solution to propose. Replacing
>
> reading != read_ptr
>
> with
>
> reading != write_ptr
>
> would reduce the probability of occurrence, at the price of some loss of
> freshness.
>
>
> Hopefully, I got it wrong, or some pre-condition I'm not aware of ensures
> Murphy cannot win in this case.
He can't. We left nothing to coincidence.
Peter
DataObjectLockFree: fix facts in doxygen
On Tue, 11 Dec 2012, Sébastien Barthélémy wrote:
> Hello,
>
> if I got it right, DataObjectLockFree advertises more than it actually does. See attached patch.
Can you comment on your patch, please?
Because I am a bit confused by the remarks in the patch about "too many
readers" or "reading takes too long", without any quantification of what
"too" means.
Anyway, the Lock Free pattern has as a pre-condition for its use that there are
at most N clients of the data, so that it can provide N+1 buffer spaces
for the shared data.
Did you then still find problems with the code in cases where this
pre-condition is satisfied? Since that would be a big bug, I guess.
Or are you trying to add more robustness to the implementation in case
the pre-condition is not satisfied? If so, I think this is not the
right place to realise such robustness. (The right place is at deployment,
which has to guarantee/check that the pre-conditions are fulfilled.)
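
For illustration, the kind of check that belongs at deployment time could
look like this (a hypothetical helper, not an RTT API):

#include <cassert>
#include <cstddef>

// Verify the lock-free pre-conditions before wiring the components together,
// instead of adding run-time robustness to the data object itself.
inline void check_lockfree_preconditions(std::size_t n_writers,
                                         std::size_t n_readers,
                                         std::size_t max_threads)
{
    assert(n_writers == 1 &&
           "DataObjectLockFree supports a single writer");
    assert(n_writers + n_readers <= max_threads &&
           "more clients than max_threads: the pre-condition is violated");
}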
Maybe I am completely misinterpreting your patch, so please don't be shy
about telling me frankly :-)
DataObjectLockFree: fix facts in doxygen
On Wed, Dec 12, 2012 at 1:07 PM, Sébastien Barthélémy
<barthelemy [..] ...> wrote:
>
>
> On Tue, Dec 11, 2012 at 3:46 PM, Peter Soetens <peter [..] ...>
> wrote:
>>
>> Which should imo read:
>>
>> + *
>> + * The internal buffer will get full if more concurrent threads than
>> + * max_threads are accessing this object. In such a case, the read
>> + * occurs anyway and late readers can get corrupted values.
>
>
>
> I mistook the return at line 210 for a break. So with this new reading, I think
> data corruption cannot occur if the buffer gets full: new readers will get
> an old value, but it won't be corrupted.
>
> So please do not consider my previous patch but instead the two attached to
> this email.
>
> The first one fixes the doxygen regarding the single writer thing.
>
> The second improves the comments (to the best of my understanding), especially
> regarding the race condition.
>
> Regarding the way this race condition is dealt with (in the loop at line
> 170), if I got it right, a reader thread can get delayed if a write occurs
> while it is between lines 171 and 172: it has to spin the loop again (it
> kind of polls the buffer).
>
> If this happens repeatedly, the reader might be delayed forever, even if
> the writer has lower priority.
>
> I agree that this would be an occurrence of "very high frequency systematic
> bad luck", but still, isn't that lack of determinism a concern?
>
> I have no great solution to propose. Replacing
>
> reading != read_ptr
>
> with
>
> reading != write_ptr
>
> would reduce the probability of occurrence, at the price of some loss of
> freshness.
>
>
> Hopefully, I got it wrong, or some pre-condition I'm not aware of ensures
> Murphy cannot win in this case.
Please stop quoting data corruption in your patches. We have set up
extreme unit tests exercising every combination of concurrent reads and
writes for hours on multi-core machines, and this algorithm will never
corrupt your data. Also check out the RTT unit tests, which have never
found a bug in this very algorithm with multiple threads/cores.
A write might fail (i.e. result in a nop) if you don't respect
max_threads, that's it.
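
Putting the corrected facts of this thread together, the doxygen block could
read roughly as follows (a sketch of the intent, not the committed wording):

/**
 * A lock-free data object for a single writer and multiple readers.
 *
 * Pre-condition: at most one thread calls Set() and at most
 * (max_threads - 1) threads call Get() concurrently. The object cannot
 * detect a violation of this pre-condition by itself.
 *
 * If more threads access the object than max_threads allows, a Set() may
 * fail silently (it becomes a nop); readers then keep returning the last
 * successfully written value. Data is never corrupted.
 */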
Peter
DataObjectLockFree: fix facts in doxygen
On Tue, 11 Dec 2012, Sébastien Barthélémy wrote:
> On Tue, Dec 11, 2012 at 3:50 PM, Herman Bruyninckx <
> Herman [dot] Bruyninckx [..] ...> wrote:
>
>> Anyway, the Lock Free pattern has as pre-condition for its use that there
>> are
>> maximally N clients of the data, so that it can foresee N+1 buffer spaces
>> of the shared data.
>> Did you then still find problems with the code in cases where this
>> pre-condition is satisfied?
>
>
> Yes.
>
> Among the N clients there can be only one writer. This additional
> precondition is clear from the implementation, and even clearer from the
> comments at line
> 119.<http://gitorious.org/orocos-toolchain/rtt/blobs/master/rtt/base/DataObjectLockFree.hpp#line194>
> Alas, it was not documented. Worse, the doxygen claimed the contrary (see
> the truth table).
>
>
>> Since that would be a big bug, I guess.
>>
>
> I do not know if having several writers for a non-buffered data port is a
> common scenario.
It should not be! Because there can be only one "owner" of a particular piece
of newly created data. But I'm all for making these "hidden assumptions"
clearer (even though they are "natural"), so thanks for noticing
these problems!
> But it would certainly be a hard one to find, since there is no lock nor
> warning, only random data corruption.
>
>> Or are you trying to add more robustness into the implementation in case
>> the pre-condition is not satisfied?
>>
>
> No, I'm just fixing the doc so that it matches the implementation.
>
> Cheers
Best regards,
Herman Bruyninckx