Dear All,
the post about "messages" was turning into a debate about event handling, so I have created this separate post.
As I have stated in the wiki page [Contribute! Which weakness have you detected in RTT?], I have seen a serious issue in the current implementation of synchronous event handling.
We all agree (I guess) that, from a theoretical point of view, something like an "immediate" reaction to an event must be available in RTT.
On the other hand, if the event handle is executed in the thread of the component which emitted the event (TaskA), it is easy to show that we are very vulnerable to catastrophic errors (and it is impossible to prevent such a situation in the user's code!!!).
The only solution I can see is that the subscriber to the event (TaskB) creates an extra thread responsible for executing its synchronous handles.
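To make the proposal concrete, here is a minimal sketch in plain standard C++ of a subscriber-owned thread that runs the handles. `HandlerThread`, `post()` and `handler_runs_off_emitter_thread()` are hypothetical names invented for this illustration; none of this is RTT API. The emitter only enqueues, so user code never runs in TaskA's thread.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper owned by the subscriber (TaskB); not part of RTT.
class HandlerThread {
public:
    HandlerThread() : worker_([this] { run(); }) {}

    ~HandlerThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_one();
        worker_.join();  // the worker drains the queue, then exits
    }

    // Called from the emitter's thread (TaskA): only enqueues, never runs user code.
    void post(std::function<void()> handler) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(handler));
        }
        wakeup_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wakeup_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // only reached when stopping_ is set
            auto handler = std::move(pending_.front());
            pending_.pop_front();
            lock.unlock();
            handler();  // a buggy handler blocks this thread only
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable wakeup_;
    std::deque<std::function<void()>> pending_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};

// Returns true if the handler ran, and ran on a thread other than the caller's.
bool handler_runs_off_emitter_thread() {
    bool ran = false;
    std::thread::id handler_thread;
    {
        HandlerThread task_b_handlers;
        task_b_handlers.post([&] {
            ran = true;
            handler_thread = std::this_thread::get_id();
        });
    }  // the destructor joins, so the handler has finished here
    return ran && handler_thread != std::this_thread::get_id();
}
```

Note that a handle which never returns would still hang TaskB's worker (and its shutdown), but TaskA is no longer affected.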
I know, I know!! It sounds suboptimal, resource-consuming and "dirty", but I insist that the current implementation is much less safe.
I hope you can help find a better solution.
Davide
synch event handling
OK, I was too optimistic in thinking that a detailed explanation was not needed...
This is the weakness I see.
That means that it is executing code that its writer doesn't know because it is part of the callee's code.
This is indeed a huge problem: think about it.
You put in a lab a robot full of state-of-the-art Orocos components: motors-controllers, global controller with impedance control, obstacle avoidance, human detection for safety...
You give the robot to a student for experiments thinking "it can't go wrong".
But the young researcher subscribes synchronously to some event, which maybe was just a warning or a "reached position".
Since there is a bug in the handle (an infinite loop), vital components such as impedance control and safety are blocked.
As a result... robot broken or, even worse, people injured.
Is this the future of Orocos?
We also should think about this kind of risk when it comes to methods, right?
Wrong! When you call a method, you know what you are doing and the risk you are taking. In my example, the software of the "newbie student" is the one that calls the provided methods. He knows what he is executing! In the worst case it is the buggy thread that is blocked, not the good one.
With synch events, the "good" component is blocked by the "bad" component, and there is no way to foresee, prevent or avoid it from the good component's point of view.
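The mechanism can be shown with a few lines of standard C++. This is only a sketch of what "synchronous" means here, with a hypothetical `SyncEvent` class that is not the RTT Event API: the handler observes the emitter's own thread id, so `emit()` cannot return before the slowest (or buggy) handler does.

```cpp
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal event type; this is NOT the RTT Event API.
class SyncEvent {
public:
    using Handler = std::function<void()>;

    void subscribe(Handler h) { handlers_.push_back(std::move(h)); }

    // Synchronous emit: every handler runs to completion in the caller's thread.
    void emit() const {
        for (const auto& h : handlers_) h();
    }

private:
    std::vector<Handler> handlers_;
};

// Returns true if the handler observed the same thread id as the emitter.
bool handler_runs_in_emitter_thread() {
    SyncEvent reached_position;
    std::thread::id handler_thread;
    reached_position.subscribe([&] { handler_thread = std::this_thread::get_id(); });

    reached_position.emit();  // called from the "good" component
    return handler_thread == std::this_thread::get_id();
}
```

Replace the body of the handler with `for (;;) {}` and the emitting component never gets its thread back.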
The same philosophy is applied to single process.
There is currently an obsession with single-process deployment in RTT. Deployment with Corba is considered an optional feature that, so far, hasn't been that important. "Of course we have Corba, but forget about it, use a single process"
If so, this is not a real problem for collocated caller and callee: if the former "crashes", the latter crashes too, since they are both in the same process space.
No real problem? It is a terrible problem! Once more: even the strongest and most robust code in the world can be corrupted by an attached component with some bugs!
Considering the goal of Orocos (and thinking about BRICS), I think that we need to put security first.
if both are not in the same process space, you can _only_ use asynchronous events anyway; the 'synchronous' part is then nothing else but registering the fact that an event was emitted by the caller, and then forwarding this to the "middleware". That middleware will have to have some (configurable) policy about what to do with such "remote event handling" errors.
Absolutely right! I think that synchronous event handling should be replaced by an asynch handle with a mechanism of prioritized execution (best effort to execute the handle as soon as possible).
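A rough sketch of what I mean by prioritized asynchronous execution, in standard C++ only. `PrioritizedMailbox`, `post()`, `run_pending()` and the numeric priorities are assumptions made up for this example, not an existing RTT mechanism. The emitter only enqueues; the subscriber's own thread later runs the pending handles, highest priority first.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical mailbox owned by the subscriber; names are illustrative, not RTT API.
class PrioritizedMailbox {
public:
    using Handler = std::function<void()>;

    // Called by the emitter: enqueues only, no user code is executed here.
    void post(int priority, Handler handler) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(Item{priority, next_seq_++, std::move(handler)});
    }

    // Called by the subscriber's own thread, e.g. at the start of each cycle.
    // Runs the highest-priority handlers first; ties are served in FIFO order.
    std::size_t run_pending() {
        std::size_t executed = 0;
        for (;;) {
            Handler handler;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (pending_.empty()) return executed;
                handler = pending_.top().handler;  // copy: top() is const
                pending_.pop();
            }
            handler();  // runs outside the lock so handlers may post() again
            ++executed;
        }
    }

private:
    struct Item {
        int priority;
        std::uint64_t seq;
        Handler handler;
    };
    struct Lower {
        bool operator()(const Item& a, const Item& b) const {
            if (a.priority != b.priority) return a.priority < b.priority;
            return a.seq > b.seq;  // older entries first among equals
        }
    };

    std::mutex mutex_;
    std::priority_queue<Item, std::vector<Item>, Lower> pending_;
    std::uint64_t next_seq_ = 0;
};

// Posts three handlers out of order and returns the order in which they ran.
std::vector<int> demo_order() {
    PrioritizedMailbox mailbox;
    std::vector<int> order;
    mailbox.post(1, [&] { order.push_back(1); });    // e.g. a warning
    mailbox.post(10, [&] { order.push_back(10); });  // e.g. a safety stop
    mailbox.post(5, [&] { order.push_back(5); });
    mailbox.run_pending();
    return order;
}
```

The "as soon as possible" part then depends only on how often, and at which scheduler priority, the subscriber calls `run_pending()`.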
Davide
synch event handling
On Monday 27 April 2009 12:21:50 faconti [..] ... wrote:
> ok, I was too optimistic thinking that a detailed explanation was not
> needed...
>
> This is the weakness I see.
>
> know because it is part of the callee's code.
> problem: think about it.
> You put in a lab a robot full of state-of-the-art Orocos components:
> motors-controllers, global controller with impedance control, obstacle
> avoidance, human detection for safety... You give the robot to a student
> for experiments thinking "it can't go wrong". But the young researcher
> subscribes synchronously to some event, that maybe was just a warning or a
> "reached position". Since in the handle there is a bug (an infinite loop),
> the vital components such as impedance control and safety are blocked. As a
> result... robot broken or, even worse, people injured.
> Is this the future of Orocos?
Safety has been of a lesser priority because of the 'system level' aspect of
it (as Herman puts it). For example: we don't provide a WatchDog component,
you should if your application requires it. Orocos provides all (?) the tools
to create advanced and intelligent watchdogs [1].
BUT, from a component level view, there's also a safety aspect which should
guarantee that your 'server' thread does not go out for lunch because a
'client' process did something stupid: the rule has always been: don't trust
the client. The student example is perfect for this: the student writes a
faulty client application, it fails, the component should survive it.
The question is how far the RTT should go to protect you. As Herman puts it,
throwing in another thread in user code will always work, but if this becomes a
pattern, there's clear evidence that users (like in Sander's example) are
working around Orocos deficiencies. Not fixing these is plain stupid and indeed
hurts users or people in many ways.
Since Events are at the root of this discussion, the solution clearly lies in
what Events will evolve to in RTT 2.0. Maybe an event publisher will be able
to choose if syn/asyn callbacks are allowed or not, or everything can be
expressed as messages. I honestly don't know yet. What *is* clear, is that
the 'local' safety concern lies with the server component and not the client,
and thus should be specified there.
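Just to illustrate the first option (the publisher deciding which callbacks it accepts), here is a sketch in standard C++. `PolicyEvent`, `CallbackPolicy` and all method names are hypothetical and say nothing about what RTT 2.0 will actually look like.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch; neither the class nor the policy names exist in RTT.
enum class CallbackPolicy { AllowSync, AsyncOnly };

class PolicyEvent {
public:
    using Handler = std::function<void()>;

    explicit PolicyEvent(CallbackPolicy policy) : policy_(policy) {}

    // Refused (returns false) when the publishing component forbids inline callbacks.
    bool subscribe_sync(Handler h) {
        if (policy_ == CallbackPolicy::AsyncOnly) return false;
        sync_.push_back(std::move(h));
        return true;
    }

    // Always accepted: the handler is only queued when the event is emitted.
    void subscribe_async(Handler h) { async_.push_back(std::move(h)); }

    void emit() {
        for (const auto& h : sync_) h();                      // runs in the publisher's thread
        for (const auto& h : async_) deferred_.push_back(h);  // left for the subscriber
    }

    // Stand-in for the subscriber's thread picking up its queued work.
    std::size_t run_deferred() {
        std::vector<Handler> batch;
        batch.swap(deferred_);
        for (const auto& h : batch) h();
        return batch.size();
    }

private:
    CallbackPolicy policy_;
    std::vector<Handler> sync_, async_, deferred_;
};
```

A safety-critical component would construct its events with `CallbackPolicy::AsyncOnly`, so a client's attempt to subscribe synchronously fails at subscription time rather than at run time.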
Peter
[1] I'm not voting against an OCL::WatchDog component. I believe it's required
to solve global/system level problems and that providing a basic one would
help users get started.
synch event handling
On Mon, 27 Apr 2009, Peter Soetens wrote:
> On Monday 27 April 2009 12:21:50 faconti [..] ... wrote:
>> ok, I was too optimistic thinking that a detailed explanation was not
>> needed...
>>
>> This is the weakness I see.
>>
>> know because it is part of the callee's code.
>> problem: think about it.
>> You put in a lab a robot full of state-of-the-art Orocos components:
>> motors-controllers, global controller with impedance control, obstacle
>> avoidance, human detection for safety... You give the robot to a student
>> for experiments thinking "it can't go wrong". But the young researcher
>> subscribes synchronously to some event, that maybe was just a warning or a
>> "reached position". Since in the handle there is a bug (an infinite loop),
>> the vital components such as impedance control and safety are blocked. As a
>> result... robot broken or, even worse, people injured.
>> Is this the future of Orocos?
>
> Safety has been of a lesser priority because of the 'system level' aspect of
> it (as Herman puts it). For example: we don't provide a WatchDog component,
> you should if your application requires it. Orocos provides all (?) the tools
> to create advanced and intelligent watchdogs [1].
>
> BUT, from a component level view, there's also a safety aspect which should
> guarantee that your 'server' thread does not go out for lunch because a
> 'client' process did something stupid: the rule has always been: don't trust
> the client. The student example is perfect for this: the student writes a
> faulty client application, it fails, the component should survive it.
This corroborates what I wanted to express too: safety is to a large extent
the result of a good architecture (on top of bug-free components, of
course).
> The question is how far the RTT should go to protect you. As Herman puts it,
> throwing in another thread in user code will always work, but if this becomes a
> pattern, there's clear evidence that users (like in Sander's example) are
> working around Orocos deficiencies. Not fixing these is plain stupid and
> indeed hurts users or people in many ways.
I don't think that Sander is "working around Orocos deficiencies"! His use
case is just not so "distributed" and "asynchronous" as the use cases of
many other people on this list...
> Since Events are at the root of this discussion, the solution clearly lies in
> what Events will evolve to in RTT 2.0. Maybe an event publisher will be able
> to choose if syn/asyn callbacks are allowed or not,
This is not a scalable solution: the publisher should not be forced to make
decisions for the whole _system_! Whatever RTT provides, there will always
be a need for deciding about such policies at "deployment time" (and even
later, if possible...).
> or everything can be
> expressed as messages. I honestly don't know yet. What *is* clear, is that
> the 'local' safety concern lies with the server component and not the client,
> and thus should be specified there.
>
> Peter
>
> [1] I'm not voting against an OCL::WatchDog component. I believe it's required
> to solve global/system level problems and that providing a basic one would
> help users get started.
I support this! :-)
Herman