RTT 1.10.3 and OCL 1.10.2 bug-fix releases

Both RTT and OCL feature new bug-fix releases which accumulate the fixes for the bugs reported over the last months. Users tracking the svn 1.10 branches will find nothing new in these releases.

For OCL, the fixes include:

  • Fixing Ctrl-D at the TaskBrowser prompt
  • Fixes in the DeploymentComponent for saving properties and setting activities.
  • Fix endless loop when unloading of a component fails.

And for RTT, the fixes include:

  • Remote Commands over CORBA did not evaluate their arguments.
  • Fix bug #737: Autosave adds unnecessary element when writing arrays.
  • Plenty of cmake fixes wrt Omniorb/TAO detection logic.
  • Fix bug #759: ControlTaskServer::CleanupServer causing segfault.
  • Improve compatibility with recent Xenomai releases.
  • Fix bug in ListLockFree where initial size was ignored.

See http://www.orocos.org/rtt/source and http://www.orocos.org/ocl/source for the downloads.

Peter

[Orocos] RTT 1.10.3 and OCL 1.10.2 bug-fix releases

On Apr 7, 2010, at 13:15 , Peter Soetens wrote:

> * Fix bug #759: ControlTaskServer::CleanupServer causing segfault.

I still believe that the original cause of this bug remains. I still have problems quitting cleanly, e.g.

2.242 [ Debug  ][Deployer]  done
2.242 [ Debug  ][~ExecutionEngine] Destroying ExecutionEngine of Deployer
2.242 [ Info   ][ShutdownOrb] Cleaning up ControlTaskServers...
pure virtual method called
terminate called without an active exception
 
Program received signal SIGABRT, Aborted.
0x00007fff85aeafe6 in __kill ()
(gdb) bt
#0  0x00007fff85aeafe6 in __kill ()
#1  0x00007fff85b8be32 in abort ()
#2  0x00007fff86c585d2 in __gnu_cxx::__verbose_terminate_handler ()
#3  0x00007fff86c56ae1 in __cxxabiv1::__terminate ()
#4  0x00007fff86c56b16 in std::terminate ()
#5  0x00007fff86c56fd6 in __cxa_pure_virtual ()
#6  0x000000010167251d in RTT::Corba::ControlTaskServer::~ControlTaskServer ()
#7  0x0000000101672993 in RTT::Corba::ControlTaskServer::CleanupServers ()
#8  0x0000000101672cd1 in RTT::Corba::ControlTaskServer::DoShutdownOrb ()
#9  0x0000000101672ea7 in RTT::Corba::ControlTaskServer::ShutdownOrb ()
#10 0x0000000100007fe9 in main ()
(gdb) 

[Orocos] RTT 1.10.3 and OCL 1.10.2 bug-fix releases

On Wed, Apr 7, 2010 at 9:07 PM, S Roderick <kiwi [dot] net [..] ...> wrote:
> I still believe that the original cause of this bug remains. I still have problems quitting cleanly, e.g.
>
> 2.242 [ Debug  ][Deployer]  done
> 2.242 [ Debug  ][~ExecutionEngine] Destroying ExecutionEngine of Deployer
> 2.242 [ Info   ][ShutdownOrb] Cleaning up ControlTaskServers...
> pure virtual method called
> terminate called without an active exception
>
> Program received signal SIGABRT, Aborted.
> 0x00007fff85aeafe6 in __kill ()
> (gdb) bt
> #0  0x00007fff85aeafe6 in __kill ()
> #1  0x00007fff85b8be32 in abort ()
> #2  0x00007fff86c585d2 in __gnu_cxx::__verbose_terminate_handler ()
> #3  0x00007fff86c56ae1 in __cxxabiv1::__terminate ()
> #4  0x00007fff86c56b16 in std::terminate ()
> #5  0x00007fff86c56fd6 in __cxa_pure_virtual ()
> #6  0x000000010167251d in RTT::Corba::ControlTaskServer::~ControlTaskServer ()
> #7  0x0000000101672993 in RTT::Corba::ControlTaskServer::CleanupServers ()
> #8  0x0000000101672cd1 in RTT::Corba::ControlTaskServer::DoShutdownOrb ()
> #9  0x0000000101672ea7 in RTT::Corba::ControlTaskServer::ShutdownOrb ()
> #10 0x0000000100007fe9 in main ()
> (gdb)
 
Darn, this does indeed look familiar :-( Could you post the output of
valgrind for reference?
 
I'll take a look at it next week.
 
Peter

[Orocos] RTT 1.10.3 and OCL 1.10.2 bug-fix releases

On Apr 7, 2010, at 15:53 , Peter Soetens wrote:

> Darn, this looks indeed familiar :-( Could you post the output of
> valgrind for reference ?
> 
> I'll take a look at it next week.
> 
> Peter
 
I think that I've found it.
 
There is an _implicit_ assumption within RTT::Corba::ControlTaskServer that ControlTasks are deleted _after_ their associated ControlTaskServer (as the ControlTaskServer dtor uses "mtaskcontext->getName()"). This means that you cannot destroy an OCL::CorbaDeploymentComponent (which destroys all ControlTasks) before running ControlTaskServer::DestroyOrb (as that is what finally deletes all control task servers).
 
Peter, I think you and I both violated this without realising it. You in 52386e8eef3ca58d1b03fa7992ce1452c3abaa00, and me in a private patch related to the rtlogging branch.
 
The following patch probably won't apply cleanly, but it gets across what is required to fix this nasty little issue.
 
I think, however, that the underlying design is bad. This could easily crop up again. There is a one-way association from ControlTaskServer to ControlTask. Anyone deleting a ControlTask _must_ explicitly deal with any associated ControlTaskServer, or this bug will occur again. It appears to me that ControlTaskServer is only using the name from ControlTask. If this is true, why not just store the name and be done with it? Is it this simple?
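
To make the "just store the name" idea concrete, here is a minimal sketch. It does not use the real RTT classes: NamedTask and NameCachingServer are hypothetical stand-ins for the task and for ControlTaskServer, and the log vector exists only so the behaviour can be observed. The sketch covers only the destructor's use of the task pointer; any other access to the task through the server is outside its scope.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task; NOT the real RTT class.
class NamedTask {
public:
    explicit NamedTask(std::string name) : name_(std::move(name)) {}
    const std::string& getName() const { return name_; }
private:
    std::string name_;
};

// Hypothetical stand-in for ControlTaskServer.
class NameCachingServer {
public:
    // Copy the name at construction, while the task is known to be alive.
    NameCachingServer(const NamedTask& task, std::vector<std::string>& log)
        : name_(task.getName()), log_(log) {}

    // The destructor touches only the cached copy, never the task itself.
    ~NameCachingServer() { log_.push_back("cleanup " + name_); }

private:
    std::string name_;               // owned copy of the task name
    std::vector<std::string>& log_;  // illustration only: records cleanup
};

// Destroys the task first and the server second, i.e. the order that
// triggered the abort described in this thread.
std::vector<std::string> destroy_task_before_server() {
    std::vector<std::string> log;
    std::unique_ptr<NamedTask> task(new NamedTask("Deployer"));
    std::unique_ptr<NameCachingServer> server(new NameCachingServer(*task, log));
    task.reset();    // the task goes away first
    server.reset();  // still fine: the destructor does not dereference the task
    return log;
}
```

Calling destroy_task_before_server() from a small main() exercises the problematic destruction order without touching freed memory in the destructor.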
 
Cheers
Stephen

[Orocos] RTT 1.10.3 and OCL 1.10.2 bug-fix releases

On Wed, Apr 28, 2010 at 16:31, S Roderick <kiwi [dot] net [..] ...> wrote:

> I think that I've found it.
>
> There is an _implicit_ assumption within RTT::ControlTaskServer that
> ControlTasks are deleted _after_ their associated ControlTaskServer (as the
> ControlTaskServer dtor uses "mtaskcontext->getName()" ). What this means, is
> that you can not destroy an OCL::CorbaDeploymentComponent (which destroys
> all ControlTasks) before running ControlTaskServer::DestroyOrb (as that will
> finally delete all control task servers).
>
> Peter, you and I both violated this without realising it I think. You in
> 52386e8eef3ca58d1b03fa7992ce1452c3abaa00, and me in a private patch related
> to the rtlogging branch.
>
> The following patch won't apply clean I think, but it gets across what is
> required to fix this nasty little issue.
>
> I think, however, that the underlying design is bad. This could easily crop
> up again. There is a one-way association from ControlTaskServer to
> ControlTask. Anyone deleting a ControlTask _must_ explicitly deal with any
> associated ControlTaskServer, or this bug will occur again. It appears to me
> that ControlTaskServer is only using the name from ControlTask. If this is
> true, why not just store the name and be done with it? Is it this simple?
>
 
The design is indeed bad. The problem is that the server process can crash
from the moment the TaskContext is deleted, since from that moment *both*
ControlTaskServer and ControlTask_i point to invalid memory. So your
proposal only fixes the first one; any remote client call would still cause the
second to crash. Your patch is really a 'causality hack' btw :-).
 
The solution is to have a system that informs interested parties when a
TaskContext disappears or is being deleted. I can only do this cleanly in
2.x. In 1.x, we could insert a specially crafted TaskObject in the interface
that calls us (ControlTaskServer) back when it's being cleaned up. That's a
hack, but we could hide this TaskObject when exporting the interface to the
CORBA layer. Call it something like 'corbaserver'. Maybe I'm overlooking
something...
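
The notification idea can be sketched roughly as follows. This is not the TaskObject-based mechanism described above and not RTT code: ObservableTask, NotifiedServer, notifyOnDestroy and remoteGetName are hypothetical names, and a plain callback list stands in for the hidden 'corbaserver' object. The sketch is single-threaded and assumes the server outlives the task; a real version would also need a way to unregister and some locking.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical task that announces its own destruction to registered listeners.
class ObservableTask {
public:
    explicit ObservableTask(std::string name) : name_(std::move(name)) {}
    ~ObservableTask() {
        // Runs before the members are destroyed, so listeners see a valid object.
        for (auto& cb : on_destroy_) cb();
    }
    const std::string& getName() const { return name_; }
    void notifyOnDestroy(std::function<void()> cb) {
        on_destroy_.push_back(std::move(cb));
    }
private:
    std::string name_;
    std::vector<std::function<void()>> on_destroy_;
};

// Hypothetical server that drops its pointer when told the task is going away.
class NotifiedServer {
public:
    explicit NotifiedServer(ObservableTask& task) : task_(&task) {
        // Assumption: this server outlives the task (no unregister step here).
        task.notifyOnDestroy([this] { task_ = nullptr; });
    }
    NotifiedServer(const NotifiedServer&) = delete;
    NotifiedServer& operator=(const NotifiedServer&) = delete;

    // Stands in for a remote client call that needs the task.
    std::string remoteGetName() const {
        return task_ ? task_->getName() : std::string("<task gone>");
    }
private:
    ObservableTask* task_;
};

// Calls the server before and after the task is deleted underneath it.
std::vector<std::string> task_deleted_under_server() {
    std::vector<std::string> seen;
    std::unique_ptr<ObservableTask> task(new ObservableTask("Deployer"));
    NotifiedServer server(*task);
    seen.push_back(server.remoteGetName());  // task still alive
    task.reset();                            // destructor notifies the server
    seen.push_back(server.remoteGetName());  // server no longer touches the task
    return seen;
}
```

The point of the design is that every party holding a raw pointer to the task learns about its deletion, so both the destructor path and later remote calls can check for a null pointer instead of dereferencing freed memory.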
 
Peter

[Orocos] RTT 1.10.3 and OCL 1.10.2 bug-fix releases

On Apr 28, 2010, at 11:36 , Peter Soetens wrote:

> The design is indeed bad. The problem is that the server process can crash from the moment the TaskContext is deleted. Since from that moment *both* ControlTaskServer and ControlTask_i point to invalid memory. So your proposal only fixes the first one. Any remote client call would cause the second to crash though. Your patch is really a 'causality hack' btw :-). 
 
Yes, it is a hack! Gets the job done for now though ...
 
> The solution is to have a system to inform interested parties that a TaskContext disappears /  is being deleted. I can only do this cleanly in 2.x. In 1.x, we could insert a specially crafted TaskObject in the interface that calls us (ControlTaskServer) back when it's being cleaned up. That's a hack, but we could hide this TaskObject when exporting the interface to the CORBA layer. Call it something like 'corbaserver'. Maybe I'm overlooking something...
 
Yes, we need to manage the two-way association somehow. If you can fix this in v2 and then leave a nasty big warning with v1, maybe that is good enough for now? We'd also need to ensure that the next v1 point release notes spell out how and why the issue occurs, for those not using the deployers. Seems I'm the only one to date who has encountered this ... or at least the only one who complains when I do ... :-)
 
Can you fix this cleanly (and easily) in v2?
S

[Orocos] RTT 1.10.3 and OCL 1.10.2 bug-fix releases

On Wednesday 28 April 2010 18:25:27 Stephen Roderick wrote:
> Yes, it is a hack! Gets the job done for now though ...
> 
> > The solution is to have a system to inform interested parties that a
> > TaskContext disappears /  is being deleted. I can only do this cleanly in
> > 2.x. In 1.x, we could insert a specially crafted TaskObject in the
> > interface that calls us (ControlTaskServer) back when it's being cleaned
> > up. That's a hack, but we could hide this TaskObject when exporting the
> > interface to the CORBA layer. Call it something like 'corbaserver'. Maybe
> > I'm overlooking something...
> 
> Yes, we need to manage the two-way association somehow. If you can fix this
>  in v2 and then leave a nasty big warning with v1, maybe that is good
>  enough for now? We'd also need to ensure that the next v1 point release
>  notes spelled out how and why the issue occurs, for those not using the
>  deployers. Seems I'm the only one to date that has encountered this ... or
>  at least the only one that complains when I do ... :-)
> 
> Can you fix this cleanly (and easily) in v2?
 
It's the 'service discovery' feature of 2.2, where the creation or destruction of
components, their interfaces and connections is announced to the
process or to the network, allowing automatic reconnection when a component
goes out and comes back in.
 
See the bug report for the patch I had in mind. It's even binary compatible, 
but it adds the 'corbaservice' to any component running with a CORBA server.
 
I did not apply your patch, but I did apply some memory leak fixes for the 
CORBA code, which I don't think influence this bug.
 
If this works for you, I'll push my fixes to svn & github.
 
Peter

[Orocos] RTT 1.10.3 and OCL 1.10.2 bug-fix releases

On Apr 29, 2010, at 09:43 , Peter Soetens wrote:

> It's the 'service discovery' feature of 2.2, where creation or destruction of 
> components, their interfaces and connections are being announced to the 
> process, or to the network. Allowing automatic reconnection when a component 
> goes out and back in.
> 
> See the bug report for the patch I had in mind. It's even binary compatible, 
> but it adds the 'corbaservice' to any component running with a CORBA server.
> 
> I did not apply your patch, but I did apply some memory leak fixes for the 
> CORBA code, which I don't think influence this bug.
> 
> If this works for you, I'll push my fixes to svn & github.
> 
> Peter
 
Probably will be early next week before I get a chance to look at this, but it sounds good. Thanks!!
Stephen