[Bug 627] New: Race condition in disconnecting asynchronous event handlers

For more information about this bug, visit <https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=627>
Summary: Race condition in disconnecting asynchronous event handlers
Product: RTT
Version: 1.8.0
Platform: AMD 64bit
OS/Version: All
Status: NEW
Severity: normal
Priority: P3
Component: Real-Time Toolkit (RTT)
AssignedTo: orocos-dev [..] ...
ReportedBy: peter [dot] soetens [..] ...
CC: orocos-dev [..] ...
Estimated Hours: 0.0

The event-test disclosed another race condition when an asynchronous handler is
disconnected and destroyed.

This is the backtrace:

Core was generated by `./event-test'.
Program terminated with signal 6, Aborted.
[New process 4835]
[New process 4834]
[New process 4836]
#0 0x00007f4c8c286015 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x00007f4c8c286015 in raise () from /lib/libc.so.6
#1 0x00007f4c8c287b83 in abort () from /lib/libc.so.6
#2 0x00007f4c8cb2af94 in __gnu_cxx::__verbose_terminate_handler () from
/usr/lib/libstdc++.so.6
#3 0x00007f4c8cb29396 in ?? () from /usr/lib/libstdc++.so.6
#4 0x00007f4c8cb293c3 in std::terminate () from /usr/lib/libstdc++.so.6
#5 0x00007f4c8cb29c6f in __cxa_pure_virtual () from /usr/lib/libstdc++.so.6
#6 0x00007f4c8d481e66 in boost::_mfi::mf0<void,
RTT::detail::EventCatcher>::operator() (this=0x40ad8fd0, p=0x1b40140) at
/usr/include/boost/bind/mem_fn_template.hpp:49
#7 0x00007f4c8d481eac in boost::_bi::list1<boost::arg<1>
>::operator()<boost::_mfi::mf0<void, RTT::detail::EventCatcher>, boost::_bi::list1<RTT::detail::EventCatcher*&> > (this=0x40ad8fe0,
f=@0x40ad8fd0, a=@0x40ad8f50)
at /usr/include/boost/bind.hpp:232
#8 0x00007f4c8d481ef2 in boost::_bi::bind_t<void, boost::_mfi::mf0<void, RTT::detail::EventCatcher>, boost::_bi::list1<boost::arg<1> >
>::operator()<RTT::detail::EventCatcher*> (this=0x40ad8fd0, a1=@0x1b3afa8)
at /usr/include/boost/bind/bind_template.hpp:32
#9 0x00007f4c8d482083 in
RTT::ListLockFree<RTT::detail::EventCatcher*>::apply<boost::_bi::bind_t<void, boost::_mfi::mf0<void, RTT::detail::EventCatcher>, boost::_bi::list1<boost::arg<1> > > > (this=0x1b3ac28, func=
{f_ = {f_ = &virtual table offset 16}, l_ =
{<boost::_bi::storage1 fields>}}) at
/home/sspr/src/www/orocos-1.0/export/build/orocos-rtt-1.8.1/src/impl/../ListLockFree.hpp:451
#10 0x00007f4c8d4811cc in RTT::EventProcessor::step (this=0x1b3ac10) at
/home/sspr/src/www/orocos-1.0/export/build/orocos-rtt-1.8.1/src/EventProcessor.cpp:116
#11 0x00007f4c8d4cf67c in RTT::OS::RunnableInterface::loop (this=0x1b3ac10) at
/home/sspr/src/www/orocos-1.0/export/build/orocos-rtt-1.8.1/src/os/RunnableInterface.cpp:60
#12 0x00007f4c8d4a6a70 in RTT::NonPeriodicActivity::loop (this=0x1b3ac58) at
/home/sspr/src/www/orocos-1.0/export/build/orocos-rtt-1.8.1/src/NonPeriodicActivity.cpp:84
#13 0x00007f4c8d4cf342 in RTT::OS::singleThread_f (t=0x1b3ac68) at
/home/sspr/src/www/orocos-1.0/export/build/orocos-rtt-1.8.1/src/os/SingleThread.cpp:111
#14 0x00007f4c8eb813ea in start_thread () from /lib/libpthread.so.0
#15 0x00007f4c8c339cbd in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb) info threads
3 process 4836 0x00007f4c8eb872f1 in sem_wait () from /lib/libpthread.so.0
2 process 4834 0x00007f4c8eb874bd in sem_post () from /lib/libpthread.so.0
* 1 process 4835 0x00007f4c8c286015 in raise () from /lib/libc.so.6

A thread (CompletionProcessor) is trying to process an EventCatcher object
which has already been cleaned up because its handler object was cleaned up as
well.

The workaround is to keep your handlers (RTT::Handler) for asynchronous
callbacks alive long enough :-/

I don't know yet how to fix this. It would require reference counting and
allowing the EventProcessor to clean up EventCatcher objects (only when the
race occurs), which wouldn't be quite real-time (but is better than
seg-faulting).
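
To make the reference-counting idea concrete, here is a minimal sketch in
plain C++11. It is not RTT code: Catcher, Processor, Handle and demo() are
hypothetical stand-ins for EventCatcher, EventProcessor and the handler
object, and the mutex-protected vector replaces ListLockFree purely for
brevity, so this version is not lock-free or real-time. The only point is
that the processor owns a counted reference, so the catcher cannot be freed
while a step is still working on it.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for RTT::detail::EventCatcher: the callback plus a
// flag that the owning handle clears on disconnect.
struct Catcher {
    std::function<void()> callback;
    std::atomic<bool> connected{true};
    explicit Catcher(std::function<void()> cb) : callback(std::move(cb)) {}
};

// Hypothetical stand-in for RTT::EventProcessor.
class Processor {
public:
    void add(std::shared_ptr<Catcher> c) {
        std::lock_guard<std::mutex> lock(mutex_);
        catchers_.push_back(std::move(c));
    }

    // One processing step. The snapshot holds a reference to every catcher,
    // so none of them can be destroyed while this step is running.
    void step() {
        std::vector<std::shared_ptr<Catcher>> snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            snapshot = catchers_;
        }
        for (const auto& c : snapshot) {
            if (c->connected.load()) c->callback();
        }
        // Drop disconnected catchers. If the handle is already gone, the
        // last reference is released in this thread when 'snapshot' goes
        // out of scope, which is the non-real-time cleanup mentioned above.
        std::lock_guard<std::mutex> lock(mutex_);
        catchers_.erase(
            std::remove_if(catchers_.begin(), catchers_.end(),
                           [](const std::shared_ptr<Catcher>& c) {
                               return !c->connected.load();
                           }),
            catchers_.end());
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return catchers_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::shared_ptr<Catcher>> catchers_;
};

// Hypothetical owner-side handle: disconnects when it is destroyed.
class Handle {
public:
    explicit Handle(std::shared_ptr<Catcher> c) : catcher_(std::move(c)) {}
    ~Handle() { disconnect(); }
    void disconnect() {
        if (catcher_) {
            catcher_->connected.store(false);
            catcher_.reset();
        }
    }

private:
    std::shared_ptr<Catcher> catcher_;
};

// Single-threaded walk-through; returns how often the callback ran.
int demo() {
    Processor proc;
    int calls = 0;
    {
        auto catcher = std::make_shared<Catcher>([&calls] { ++calls; });
        proc.add(catcher);
        Handle handle(catcher);
        proc.step();  // still connected: the callback runs once
    }                 // handle destroyed; the processor keeps the catcher alive
    proc.step();      // callback skipped, catcher released by the processor
    assert(proc.size() == 0);
    return calls;
}
```

Note that this only protects the catcher object itself. A callback that has
already passed the connected check can still run once after disconnect, so
it must not touch state that the handler's owner destroys.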

Peter

[Bug 627] Race condition in disconnecting asynchronous event han

--- Comment #1 from Herman Bruyninckx <herman [dot] bruyninckx [..] ...> 2009-03-03 11:16:26 ---
(In reply to comment #0)
> The event-test disclosed another race condition when an asynchronous
> handler is disconnected and destroyed.
>
[...]
> A thread (CompletionProcessor) is trying to process an EventCatcher object
> which has already been cleaned up because its handler object was cleaned up as
> well.
>
> The work around is to keep your handlers (RTT::Handler) for asynchronous
> callbacks long enough alive :-/
>
> I don't know yet how to fix this. It would require reference counting and
> allowing the EventProcessor to cleanup EventCatcher objects (only when the race
> occurs),
> which wouldn't be quite real-time (but is better than seg-faulting).

This is a typical case of "Coordination" between several activities. The
_only_ way that it can be solved is indeed by a non-realtime coordination
activity that has full knowledge about the computations and communications
that are going on in the system that it coordinates.

Hence, I would not try to find the "bug" in the current code and improve
it: this kind of thing requires _extra_ stuff that isn't yet in Orocos.