[Bug 883] New: Real-time logging segfaults

http://bugs.orocos.org/show_bug.cgi?id=883

Summary: Real-time logging segfaults
Product: Toolchain
Version: master
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P3
Component: RTT
AssignedTo: orocos-dev [..] ...
ReportedBy: steven [dot] bellens [..] ...
CC: orocos-dev [..] ...
Estimated Hours: 0.0

Using the real-time logging service, I regularly get segfaults. GDB backtrace
gives me:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffadffb700 (LWP 21995)]
0x00007ffff7d56494 in malloc_ex (size=96, mem_pool=0x628900)
at /home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:727
727 EXTRACT_BLOCK_HDR(b, tlsf, fl, sl);
(gdb) bt
#0 0x00007ffff7d56494 in malloc_ex (size=96, mem_pool=0x628900)
at /home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:727
#1 0x00007ffff7d56b9f in oro_rt_malloc (size=85)
at /home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:634
#2 0x00007ffff7a228d9 in allocate (n=<optimized out>, this=<optimized out>)
at
/home/u0063509/stacks/orocos_toolchain_ros/install/include/rtt/os/oro_allocator.hpp:276
#3 std::basic_string<char, std::char_traits RTT::os::rt_allocator<char> >::_Rep::_S_create (__capacity=60,
__old_capacity=<optimized out>,
__alloc=<optimized out>)
at
/usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/basic_string.tcc:609
#4 0x00007ffff7a2296d in std::basic_string<char,
std::char_traits >::_S_construct<char const*> (
__beg=0x62a7b8 "Calculated theta_df and phi_df: [-0.000102996,
0.000807035 ]", __end=<optimized out>, __a=<optimized out>)
at
/usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/basic_string.tcc:140
#5 0x00007ffff7a22a63 in _S_construct_aux<char const*> (__a=...,
__end=<optimized out>,
__beg=0x62a7b8 "Calculated theta_df and phi_df: [-0.000102996,
0.000807035 ]")
at
/usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/b---Type
<return> to continue, or q <return> to quit---µ
asic_string.h:1692
#6 _S_construct<char const*> (__a=..., __end=<optimized out>,
__beg=0x62a7b8 "Calculated theta_df and phi_df: [-0.000102996,
0.000807035 ]")
at
/usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/basic_string.h:1713
#7 std::basic_string<char, std::char_traits RTT::os::rt_allocator<char> >::basic_string (this=0x7fffadffaa00,
__s=0x62a7b8 "Calculated theta_df and phi_df: [-0.000102996,
0.000807035 ]", __a=...)
at
/usr/lib/gcc/x86_64-redhat-linux/4.6.0/../../../../include/c++/4.6.0/bits/basic_string.tcc:217
#8 0x00007ffff7a1acc9 in OCL::logging::Category::_logUnconditionally2 (
this=0x7d42a0, priority=600, message=
"Calculated theta_df and phi_df: [-0.000102996, 0.000807035 ]")
at /home/u0063509/stacks/orocos_toolchain_ros/ocl/logging/Category.cpp:97
#9 0x00007ffff7a330b2 in OCL::logging::CategoryStream::flush (
this=0x7fffadffab20)
at
/home/u0063509/stacks/orocos_toolchain_ros/ocl/logging/CategoryStream.cpp:21
#10 0x00007ffff7a3331b in OCL::logging::CategoryStream::~CategoryStream (
this=0x7fffadffab20, __in_chrg=<optimized out>)
at
/home/u0063509/stacks/orocos_toolchain_ros/ocl/logging/CategoryStream.cpp---Type
<return> to continue, or q <return> to quit---
:16
#11 0x00007fffc015f9db in asctec::HybridControl::updateHook (this=0x757710)
at /home/u0063509/stacks/pelican/pel_hybrid_cont/src/hybrid_control.cpp:96
#12 0x00007ffff7d4cc68 in RTT::ExecutionEngine::processChildren (this=0x6dd810)
at
/home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/ExecutionEngine.cpp:331
#13 0x00007ffff7d5a7c3 in RTT::os::thread_function (t=0x77f180)
at /home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/Thread.cpp:182
#14 0x0000003fe8a07af1 in start_thread (arg=0x7fffadffb700)
at pthread_create.c:305
#15 0x0000003fe86dfb7d in clone ()
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

hybrid_control.cpp: 96:

logger->getRTStream(log4cpp::Priority::INFO) << "Calculated theta_df and
phi_df: [" << m_theta_df << ", " << m_phi_df <<     " ]";

(both variables are doubles)

It doesn't always crash on this particular line, but it does always point to a
"logger->getRTStream" statement.

--
Configure bugmail: http://bugs.orocos.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
You are the assignee for the bug.
--
Orocos-Dev mailing list
Orocos-Dev [..] ...
http://lists.mech.kuleuven.be/mailman/listinfo/orocos-dev

[Bug 883] Real-time logging segfaults

http://bugs.orocos.org/show_bug.cgi?id=883

Steven Bellens <steven [dot] bellens [..] ...> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED

[Bug 883] Real-time logging segfaults

Hi Mark,

On Wed, Sep 26, 2012 at 8:45 AM, <m [dot] horsthuis [..] ...> wrote:

> Hello,
>
> >> Also, one thing that I'm not sure about is that each component logs in
> the
> same category. Currently, each category is associated with a RTT::Port to
> send logging events. So, if different threads are logging in the same
> category, this mean that there's many threads writing to a single port.
> Does
> that make sense?
>
> > The structure of this approach matches the underlying
> log4j/log4cpp/log4cxx
> approach, good, bad or otherwise. It is very common to have multiple
> categories end up in one parent category which does the actual logging.
> For a
> multi-component system, even with each component logging to separate
> categories, the hierarchy may well lead to multiple components (hence
> threads) logging to the same port.
>
> > The standard approach that I originally proposed had a BufferPort
> attached
> to that port, and we ended up using a locking policy on that port IIRC. It
> might well have been for exactly this reason. Either way, the projects and
> customers I work with have logged literal terabytes of data with the v1
> implementation and we've not noticed any issues yet (that we know of! ;-).
>
> I had the same problem. The application crashes with segaults continiously,
> often within seconds. I added a mutex to the log_port in the category class
> and the crashes are gone. I don't think that this is the best solution but
> for me it works.
>

Is this RTT 1.x or 2.x logging ?

Peter

[Bug 883] Real-time logging segfaults

Hi Peter,

On 09/27/2012 03:07 PM, Peter Soetens wrote:
> Hi Mark,
>
> On Wed, Sep 26, 2012 at 8:45 AM, <m [dot] horsthuis [..] ...
> <mailto:m [dot] horsthuis [..] ...>> wrote:
>
> Hello,
>
> >> Also, one thing that I'm not sure about is that each component
> logs in the
> same category. Currently, each category is associated with a
> RTT::Port to
> send logging events. So, if different threads are logging in the same
> category, this mean that there's many threads writing to a single
> port. Does
> that make sense?
>
> > The structure of this approach matches the underlying
> log4j/log4cpp/log4cxx
> approach, good, bad or otherwise. It is very common to have multiple
> categories end up in one parent category which does the actual
> logging. For a
> multi-component system, even with each component logging to separate
> categories, the hierarchy may well lead to multiple components (hence
> threads) logging to the same port.
>
> > The standard approach that I originally proposed had a
> BufferPort attached
> to that port, and we ended up using a locking policy on that port
> IIRC. It
> might well have been for exactly this reason. Either way, the
> projects and
> customers I work with have logged literal terabytes of data with
> the v1
> implementation and we've not noticed any issues yet (that we know
> of! ;-).
>
> I had the same problem. The application crashes with segaults
> continiously,
> often within seconds. I added a mutex to the log_port in the
> category class
> and the crashes are gone. I don't think that this is the best
> solution but
> for me it works.
>
>
> Is this RTT 1.x or 2.x logging ?
>
> Peter
>
RTT/OCL 2.x logging (OCL::Logging)

Mark

[Bug 883] Real-time logging segfaults

http://bugs.orocos.org/show_bug.cgi?id=883

--- Comment #4 from Steven Bellens <steven [dot] bellens [..] ...> 2011-07-28 09:22:45 CEST ---
(In reply to comment #3)
[...]
>
> You'll have to recompile RTT with 'Debug' setting and re-run the gdb session.
> The values of the optimized out variables are necessary, and due to
> optimisation it may point at the wrong line of code causing the error.

(gdb) info locals
tlsf = 0x628900
b = 0x62cc30
b2 = 0x62c7e0
next_b = 0x62c800
fl = 0
sl = 12
tmp_size = 16

>
> Also enable statistics as the others suggest to get the most data.

Could you explain how I do that?

Steven

>
> If the pool is exhausted, the code should return NULL, not segfault. So it's a
> bug either way.
>
> Peter

[Bug 883] Real-time logging segfaults

2011/10/7 Steven Bellens <steven [dot] bellens [..] ...>

> 2011/10/6 Philippe Hamelin <philippe [dot] hamelin [..] ...>:
> > 2011/10/6 Steven Bellens <steven [dot] bellens [..] ...>
> >>
> >> 2011/10/6 Steven Bellens <steven [dot] bellens [..] ...>:
> >> >
> >> > http://bugs.orocos.org/show_bug.cgi?id=883
> >> >
> >> > --- Comment #7 from Steven Bellens <steven [dot] bellens [..] ...>
> >> > 2011-10-06 15:33:08 CEST ---
> >> > (In reply to comment #6)
> >> >> (In reply to comment #5)
> >> >> > (In reply to comment #4)
> >> >> > > (In reply to comment #3)
> >> >> > > [...]
> >> >> > > >
> >> >> > > > You'll have to recompile RTT with 'Debug' setting and re-run
> the
> >> >> > > > gdb session.
> >> >> > > > The values of the optimized out variables are necessary, and
> due
> >> >> > > > to
> >> >> > > > optimisation it may point at the wrong line of code causing the
> >> >> > > > error.
> >> >> > >
> >> >> > > (gdb) info locals
> >> >> > > tlsf = 0x628900
> >> >> > > b = 0x62cc30
> >> >> > > b2 = 0x62c7e0
> >> >> > > next_b = 0x62c800
> >> >> > > fl = 0
> >> >> > > sl = 12
> >> >> > > tmp_size = 16
> >> >> > >
> >> >> > >
> >> >> > > >
> >> >> > > > Also enable statistics as the others suggest to get the most
> >> >> > > > data.
> >> >> > >
> >> >> > > Could you explain how I do that?
> >> >> > >
> >> >> >
> >> >> > set the Cmake options OS_RT_MALLOC_DEBUG and OS_RT_MALLOC_STATS
> >> >>
> >> >> I set the OS_RT_MALLOC_DEBUG option (the other one was already
> >> >> enabled), but it
> >> >> doesn't change the output.
> >> >>
> >> >> >
> >> >> > Is this issue still present ?
> >> >>
> >> >> Yes, still present.
> >> >>
> >> >> Steven
> >> >
> >> > OS details:
> >> > Linux pma-09-032 2.6.40.4-5.fc15.x86_64 #1 SMP Tue Aug 30 14:38:32 UTC
> >> > 2011
> >> > x86_64 x86_64 x86_64 GNU/Linux
> >> >
> >> > I put together a small example, which crashes here after about 10 - 20
> >> > seconds:
> >> >
> >> > https://gitorious.org/rt-logging/rtlog_test
> >> >
> >> > Note that the time-to-crash does vary (sometimes a lot!), I didn't
> >> > manage to
> >> > find anything deterministic in that.
> >>
> >> Changing the logging statements to use
> >>
> >> logger->debug(RTT::rt_string
> >>
> >> instead of
> >>
> >> logger->getRTStream(log4cpp::Priority::INFO) <<
> >>
> >> Seems to solve the problem. I'm however using the latter because I want
> to
> >> output variables as well. So it looks like a problem in the getRTStream
> >> and
> >> subsequently the CategoryStream implementation?
> >>
> >
> > This is strange because we use getRTStream everwhere in our application
> and
> > we don't have this problem.
>
> Can you take a look at the example and check whether you get the crash as
> well?
>
>
Your example doesn't link on my machine with Orocos 2.3.1. I'm not using ROS
and it seems that log4cpp isn't include in the pkg-config file of
ocl-logging. Also, I don't have the log4cxx appender in my version, but I
modified the deployment script to use the OstreamAppender.

You just have to put a try-catch in your updateHook to see that this is a
std::bad_alloc exception (memory exhausted). What I suspect is that there's
too much logging events in the queue so that the appender isn't able to
manage them. There's a hardcoded buffer of 100 data in the appender input
port. So, if the buffer is full you can easily exhaust the default real-time
memory pool of 20 kb. Anyways, I must admit we should handle more cleanly
this exception. To solve your problem, try to execute the deployer with this
parameter:

--rtalloc-mem-size 200000

With 200kb, I don't have std::bad_alloc exceptions anymore on my system.
Indeed, after quitting I get the following stats:

TLSF bytes allocated=200000 overhead=6384 max-used=58064
currently-used=55520 still-allocated=49136

As you can see, the maximum used memory is ~58kb, so this is the bare
minimum you would need for your test case.

Also, one thing that I'm not sure about is that each component logs in the
same category. Currently, each category is associated with a RTT::Port to
send logging events. So, if different threads are logging in the same
category, this mean that there's many threads writing to a single port. Does
that make sense?

Philippe

[Bug 883] Real-time logging segfaults

2011/10/7 Philippe Hamelin <philippe [dot] hamelin [..] ...>:
> 2011/10/7 Steven Bellens <steven [dot] bellens [..] ...>
>>
>> 2011/10/6 Philippe Hamelin <philippe [dot] hamelin [..] ...>:
>> > 2011/10/6 Steven Bellens <steven [dot] bellens [..] ...>
>> >>
>> >> 2011/10/6 Steven Bellens <steven [dot] bellens [..] ...>:
>> >> >
>> >> > http://bugs.orocos.org/show_bug.cgi?id=883
>> >> >
>> >> > --- Comment #7 from Steven Bellens <steven [dot] bellens [..] ...>
>> >> > 2011-10-06 15:33:08 CEST ---
>> >> > (In reply to comment #6)
>> >> >> (In reply to comment #5)
>> >> >> > (In reply to comment #4)
>> >> >> > > (In reply to comment #3)
>> >> >> > > [...]
>> >> >> > > >
>> >> >> > > > You'll have to recompile RTT with 'Debug' setting and re-run
>> >> >> > > > the
>> >> >> > > > gdb session.
>> >> >> > > > The values of the optimized out variables are necessary, and
>> >> >> > > > due
>> >> >> > > > to
>> >> >> > > > optimisation it may point at the wrong line of code causing
>> >> >> > > > the
>> >> >> > > > error.
>> >> >> > >
>> >> >> > > (gdb) info locals
>> >> >> > > tlsf = 0x628900
>> >> >> > > b = 0x62cc30
>> >> >> > > b2 = 0x62c7e0
>> >> >> > > next_b = 0x62c800
>> >> >> > > fl = 0
>> >> >> > > sl = 12
>> >> >> > > tmp_size = 16
>> >> >> > >
>> >> >> > >
>> >> >> > > >
>> >> >> > > > Also enable statistics as the others suggest to get the most
>> >> >> > > > data.
>> >> >> > >
>> >> >> > > Could you explain how I do that?
>> >> >> > >
>> >> >> >
>> >> >> > set the Cmake options OS_RT_MALLOC_DEBUG and OS_RT_MALLOC_STATS
>> >> >>
>> >> >> I set the OS_RT_MALLOC_DEBUG option (the other one was already
>> >> >> enabled), but it
>> >> >> doesn't change the output.
>> >> >>
>> >> >> >
>> >> >> > Is this issue still present ?
>> >> >>
>> >> >> Yes, still present.
>> >> >>
>> >> >> Steven
>> >> >
>> >> > OS details:
>> >> > Linux pma-09-032 2.6.40.4-5.fc15.x86_64 #1 SMP Tue Aug 30 14:38:32
>> >> > UTC
>> >> > 2011
>> >> > x86_64 x86_64 x86_64 GNU/Linux
>> >> >
>> >> > I put together a small example, which crashes here after about 10 -
>> >> > 20
>> >> > seconds:
>> >> >
>> >> > https://gitorious.org/rt-logging/rtlog_test
>> >> >
>> >> > Note that the time-to-crash does vary (sometimes a lot!), I didn't
>> >> > manage to
>> >> > find anything deterministic in that.
>> >>
>> >> Changing the logging statements to use
>> >>
>> >> logger->debug(RTT::rt_string
>> >>
>> >> instead of
>> >>
>> >> logger->getRTStream(log4cpp::Priority::INFO) <<
>> >>
>> >> Seems to solve the problem. I'm however using the latter because I want
>> >> to
>> >> output variables as well. So it looks like a problem in the getRTStream
>> >> and
>> >> subsequently the CategoryStream implementation?
>> >>
>> >
>> > This is strange because we use getRTStream everwhere in our application
>> > and
>> > we don't have this problem.
>>
>> Can you take a look at the example and check whether you get the crash as
>> well?
>>
>
> Your example doesn't link on my machine with Orocos 2.3.1. I'm not using ROS
> and it seems that log4cpp isn't include in the pkg-config file of
> ocl-logging. Also, I don't have the log4cxx appender in my version, but I
> modified the deployment script to use the OstreamAppender.
> You just have to put a try-catch in your updateHook to see that this is a
> std::bad_alloc exception (memory exhausted). What I suspect is that there's
> too much logging events in the queue so that the appender isn't able to
> manage them. There's a hardcoded buffer of 100 data in the appender input
> port. So, if the buffer is full you can easily exhaust the default real-time
> memory pool of 20 kb. Anyways, I must admit we should handle more cleanly
> this exception. To solve your problem, try to execute the deployer with this
> parameter:
>  --rtalloc-mem-size 200000
> With 200kb, I don't have std::bad_alloc exceptions anymore on my system.
> Indeed, after quitting I get the following stats:
> TLSF bytes allocated=200000 overhead=6384 max-used=58064
> currently-used=55520 still-allocated=49136
> As you can see, the maximum used memory is ~58kb, so this is the bare
> minimum you would need for your test case.

Thanks, Philippe, for this detailed analysis. I tested it on my real
application, and there I get max-used numbers around 48000, so my
problem is indeed a memory exhaust. I added an rtalloc-mem-size 100000
option and everything seems to be running fine now.

Thanks,

Steven

> Also, one thing that I'm not sure about is that each component logs in the
> same category. Currently, each category is associated with a RTT::Port to
> send logging events. So, if different threads are logging in the same
> category, this mean that there's many threads writing to a single port. Does
> that make sense?
> Philippe
>

[Bug 883] Real-time logging segfaults

On Oct 7, 2011, at 09:39 , Philippe Hamelin wrote:

> 2011/10/7 Steven Bellens <steven [dot] bellens [..] ...>
> 2011/10/6 Philippe Hamelin <philippe [dot] hamelin [..] ...>:
> > 2011/10/6 Steven Bellens <steven [dot] bellens [..] ...>
> >>
> >> 2011/10/6 Steven Bellens <steven [dot] bellens [..] ...>:
> >> >
> >> > http://bugs.orocos.org/show_bug.cgi?id=883
> >> >
> >> > --- Comment #7 from Steven Bellens <steven [dot] bellens [..] ...>
> >> > 2011-10-06 15:33:08 CEST ---
> >> > (In reply to comment #6)
> >> >> (In reply to comment #5)
> >> >> > (In reply to comment #4)
> >> >> > > (In reply to comment #3)
> >> >> > > [...]
> >> >> > > >
> >> >> > > > You'll have to recompile RTT with 'Debug' setting and re-run the
> >> >> > > > gdb session.
> >> >> > > > The values of the optimized out variables are necessary, and due
> >> >> > > > to
> >> >> > > > optimisation it may point at the wrong line of code causing the
> >> >> > > > error.
> >> >> > >
> >> >> > > (gdb) info locals
> >> >> > > tlsf = 0x628900
> >> >> > > b = 0x62cc30
> >> >> > > b2 = 0x62c7e0
> >> >> > > next_b = 0x62c800
> >> >> > > fl = 0
> >> >> > > sl = 12
> >> >> > > tmp_size = 16
> >> >> > >
> >> >> > >
> >> >> > > >
> >> >> > > > Also enable statistics as the others suggest to get the most
> >> >> > > > data.
> >> >> > >
> >> >> > > Could you explain how I do that?
> >> >> > >
> >> >> >
> >> >> > set the Cmake options OS_RT_MALLOC_DEBUG and OS_RT_MALLOC_STATS
> >> >>
> >> >> I set the OS_RT_MALLOC_DEBUG option (the other one was already
> >> >> enabled), but it
> >> >> doesn't change the output.
> >> >>
> >> >> >
> >> >> > Is this issue still present ?
> >> >>
> >> >> Yes, still present.
> >> >>
> >> >> Steven
> >> >
> >> > OS details:
> >> > Linux pma-09-032 2.6.40.4-5.fc15.x86_64 #1 SMP Tue Aug 30 14:38:32 UTC
> >> > 2011
> >> > x86_64 x86_64 x86_64 GNU/Linux
> >> >
> >> > I put together a small example, which crashes here after about 10 - 20
> >> > seconds:
> >> >
> >> > https://gitorious.org/rt-logging/rtlog_test
> >> >
> >> > Note that the time-to-crash does vary (sometimes a lot!), I didn't
> >> > manage to
> >> > find anything deterministic in that.
> >>
> >> Changing the logging statements to use
> >>
> >> logger->debug(RTT::rt_string
> >>
> >> instead of
> >>
> >> logger->getRTStream(log4cpp::Priority::INFO) <<
> >>
> >> Seems to solve the problem. I'm however using the latter because I want to
> >> output variables as well. So it looks like a problem in the getRTStream
> >> and
> >> subsequently the CategoryStream implementation?
> >>
> >
> > This is strange because we use getRTStream everwhere in our application and
> > we don't have this problem.
>
> Can you take a look at the example and check whether you get the crash as well?
>
>
> Your example doesn't link on my machine with Orocos 2.3.1. I'm not using ROS and it seems that log4cpp isn't include in the pkg-config file of ocl-logging. Also, I don't have the log4cxx appender in my version, but I modified the deployment script to use the OstreamAppender.
>
> You just have to put a try-catch in your updateHook to see that this is a std::bad_alloc exception (memory exhausted). What I suspect is that there's too much logging events in the queue so that the appender isn't able to manage them. There's a hardcoded buffer of 100 data in the appender input port. So, if the buffer is full you can easily exhaust the default real-time memory pool of 20 kb. Anyways, I must admit we should handle more cleanly this exception. To solve your problem, try to execute the deployer with this parameter:
>
> --rtalloc-mem-size 200000
>
> With 200kb, I don't have std::bad_alloc exceptions anymore on my system. Indeed, after quitting I get the following stats:
>
> TLSF bytes allocated=200000 overhead=6384 max-used=58064 currently-used=55520 still-allocated=49136

This is precisely why we output regular diagnostics on the memory pool state. It is incredibly easy to exhaust the pool without trying, particularly if you have things like buffers of RT strings floating around. We actually use buffers on the order of 20 MB in our (admittedly desktop based) systems, but we do log a _lot_ of data.

The fact that ~48k is still allocated might be worth looking into ...

One thing I should mention is that with a 100 element buffer, once the buffer has been filled once with logging data and had all the data removed and logged (ie the buffer is now "empty"), you are from then on carrying allocation for 100 "unused" RT strings around. So for a 20k buffer with ~6k of overhead, that means if your strings average 140 bytes or more, you exhaust the memory pool!! These "unused" RT strings occur as the elements in the buffer are not destroyed after they're popped, and so the memory hasn't been deallocated yet. It's only the next time that a particular element is used (ie when the buffer rolls over that element again) that the current memory is deallocated, and memory is allocated for the new string (ok, that simplifies it as it may not necessarily deallocate, it may reuse, but the point remains). So you basically have to budget for a buffer completely filled with allocated RT strings of whatever your maximum string length is. So say a 1000-char max logging line, and a 100 element buffer, then you need a memory pool of at least 1000 * 100 + 6k overhead = ~106k (even with max 200 char lines, you still need 26k, which is more than the default.

The above is mentioned in passing on the following page, but probably should be spelled out in more detail ...?

http://www.orocos.org/wiki/rtt/rtt-20/real-time-logging/using-real-time-...

> As you can see, the maximum used memory is ~58kb, so this is the bare minimum you would need for your test case.

The 20k default is way too low for anything but toy systems. I wonder if we should increase this?

> Also, one thing that I'm not sure about is that each component logs in the same category. Currently, each category is associated with a RTT::Port to send logging events. So, if different threads are logging in the same category, this mean that there's many threads writing to a single port. Does that make sense?

The structure of this approach matches the underlying log4j/log4cpp/log4cxx approach, good, bad or otherwise. It is very common to have multiple categories end up in one parent category which does the actual logging. For a multi-component system, even with each component logging to separate categories, the hierarchy may well lead to multiple components (hence threads) logging to the same port.

The standard approach that I originally proposed had a BufferPort attached to that port, and we ended up using a locking policy on that port IIRC. It might well have been for exactly this reason. Either way, the projects and customers I work with have logged literal terabytes of data with the v1 implementation and we've not noticed any issues yet (that we know of! ;-).
S

Re: [Bug 883] Real-time logging segfaults

Hello,

>> Also, one thing that I'm not sure about is that each component logs in the same category. Currently, each category is associated with a RTT::Port to send logging events. So, if different threads are logging in the same category, this mean that there's many threads writing to a single port. Does that make sense?

> The structure of this approach matches the underlying log4j/log4cpp/log4cxx approach, good, bad or otherwise. It is very common to have multiple categories end up in one parent category which does the actual logging. For a multi-component system, even with each component logging to separate categories, the hierarchy may well lead to multiple components (hence threads) logging to the same port.

> The standard approach that I originally proposed had a BufferPort attached to that port, and we ended up using a locking policy on that port IIRC. It might well have been for exactly this reason. Either way, the projects and customers I work with have logged literal terabytes of data with the v1 implementation and we've not noticed any issues yet (that we know of! ;-).

I had the same problem. The application crashes with segaults continiously, often within seconds. I added a mutex to the log_port in the category class and the crashes are gone. I don't think that this is the best solution but for me it works.

Mark

[Bug 883] Real-time logging segfaults

http://bugs.orocos.org/show_bug.cgi?id=883

Peter Soetens <peter [..] ...> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |peter [..] ...

--- Comment #3 from Peter Soetens <peter [..] ...> 2011-07-27 21:45:25 CEST ---
(In reply to comment #2)
> Per request, gdb 'info locals' after the crash:
>
> (gdb) info locals
> tlsf = 0x62b870
> b = <optimized out>
> b2 = <optimized out>
> next_b = <optimized out>
> fl = 1
> sl = 8
> tmp_size = <optimized out>
> (gdb)

You'll have to recompile RTT with 'Debug' setting and re-run the gdb session.
The values of the optimized out variables are necessary, and due to
optimisation it may point at the wrong line of code causing the error.

Also enable statistics as the others suggest to get the most data.

If the pool is exhausted, the code should return NULL, not segfault. So it's a
bug either way.

Peter

[Bug 883] New: Real-time logging segfaults

On Jul 27, 2011, at 04:50 , Steven Bellens wrote:

>
> http://bugs.orocos.org/show_bug.cgi?id=883
>
> Summary: Real-time logging segfaults
> Product: Toolchain
> Version: master
> Platform: All
> OS/Version: All
> Status: NEW
> Severity: normal
> Priority: P3
> Component: RTT
> AssignedTo: orocos-dev [..] ...
> ReportedBy: steven [dot] bellens [..] ...
> CC: orocos-dev [..] ...
> Estimated Hours: 0.0
>
>
> Using the real-time logging service, I regularly get segfaults. GDB backtrace
> gives me:
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffadffb700 (LWP 21995)]
> 0x00007ffff7d56494 in malloc_ex (size=96, mem_pool=0x628900)
> at /home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:727
> 727 EXTRACT_BLOCK_HDR(b, tlsf, fl, sl);
> (gdb) bt
> #0 0x00007ffff7d56494 in malloc_ex (size=96, mem_pool=0x628900)
> at /home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:727
> #1 0x00007ffff7d56b9f in oro_rt_malloc (size=85)
> at /home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:634

Most likely you've exhausted the memory pool. There are some functions to monitor the usage of a pool, suggest you add those in: get_used_size(), get_max_size()

TLSF does not deal with this situation gracefully .... :-(
S

[Bug 883] New: Real-time logging segfaults

Le mercredi 27 juillet 2011, S Roderick <kiwi [dot] net [..] ...> a écrit :
> On Jul 27, 2011, at 04:50 , Steven Bellens wrote:
>
>>
>> http://bugs.orocos.org/show_bug.cgi?id=883
>>
>> Summary: Real-time logging segfaults
>> Product: Toolchain
>> Version: master
>> Platform: All
>> OS/Version: All
>> Status: NEW
>> Severity: normal
>> Priority: P3
>> Component: RTT
>> AssignedTo: orocos-dev [..] ...
>> ReportedBy: steven [dot] bellens [..] ...
>> CC: orocos-dev [..] ...
>> Estimated Hours: 0.0
>>
>>
>> Using the real-time logging service, I regularly get segfaults. GDB
backtrace
>> gives me:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x7fffadffb700 (LWP 21995)]
>> 0x00007ffff7d56494 in malloc_ex (size=96, mem_pool=0x628900)
>> at
/home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:727
>> 727 EXTRACT_BLOCK_HDR(b, tlsf, fl, sl);
>> (gdb) bt
>> #0 0x00007ffff7d56494 in malloc_ex (size=96, mem_pool=0x628900)
>> at
/home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:727
>> #1 0x00007ffff7d56b9f in oro_rt_malloc (size=85)
>> at
/home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:634
>
> Most likely you've exhausted the memory pool. There are some functions to
monitor the usage of a pool, suggest you add those in: get_used_size(),
get_max_size()

The allocator usually throws a std::bad_alloc exception when the memory is
exhausted. This doesn't seems to be the case here.

Philippe

[Bug 883] New: Real-time logging segfaults

On Jul 27, 2011, at 13:46 , Philippe Hamelin wrote:

>
>
> Le mercredi 27 juillet 2011, S Roderick <kiwi [dot] net [..] ...> a écrit :
> > On Jul 27, 2011, at 04:50 , Steven Bellens wrote:
> >
> >>
> >> http://bugs.orocos.org/show_bug.cgi?id=883
> >>
> >> Summary: Real-time logging segfaults
> >> Product: Toolchain
> >> Version: master
> >> Platform: All
> >> OS/Version: All
> >> Status: NEW
> >> Severity: normal
> >> Priority: P3
> >> Component: RTT
> >> AssignedTo: orocos-dev [..] ...
> >> ReportedBy: steven [dot] bellens [..] ...
> >> CC: orocos-dev [..] ...
> >> Estimated Hours: 0.0
> >>
> >>
> >> Using the real-time logging service, I regularly get segfaults. GDB backtrace
> >> gives me:
> >>
> >> Program received signal SIGSEGV, Segmentation fault.
> >> [Switching to Thread 0x7fffadffb700 (LWP 21995)]
> >> 0x00007ffff7d56494 in malloc_ex (size=96, mem_pool=0x628900)
> >> at /home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:727
> >> 727 EXTRACT_BLOCK_HDR(b, tlsf, fl, sl);
> >> (gdb) bt
> >> #0 0x00007ffff7d56494 in malloc_ex (size=96, mem_pool=0x628900)
> >> at /home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:727
> >> #1 0x00007ffff7d56b9f in oro_rt_malloc (size=85)
> >> at /home/u0063509/stacks/orocos_toolchain_ros/rtt/rtt/os/tlsf/tlsf.c:634
> >
> > Most likely you've exhausted the memory pool. There are some functions to monitor the usage of a pool, suggest you add those in: get_used_size(), get_max_size()
>
> The allocator usually throws a std::bad_alloc exception when the memory is exhausted. This doesn't seems to be the case here.

We've found that to not be the case. Sometimes std::alloc, sometimes assert(). I don't remember what the fundamental cause of the assert() is ...

S

[Bug 883] Real-time logging segfaults

http://bugs.orocos.org/show_bug.cgi?id=883

--- Comment #2 from Steven Bellens <steven [dot] bellens [..] ...> 2011-07-27 15:06:42 CEST ---
Per request, gdb 'info locals' after the crash:

(gdb) info locals
tlsf = 0x62b870
b = <optimized out>
b2 = <optimized out>
next_b = <optimized out>
fl = 1
sl = 8
tmp_size = <optimized out>
(gdb)

[Bug 883] Real-time logging segfaults

http://bugs.orocos.org/show_bug.cgi?id=883

--- Comment #1 from Steven Bellens <steven [dot] bellens [..] ...> 2011-07-27 14:59:18 CEST ---
Additionally, valgrind output, which doesn't tell us anything more:

WrenchController [R]> ==17138== Thread 12:
==17138== Invalid write of size 8
==17138== at 0x4D19494: malloc_ex (tlsf.c:727)
==17138== by 0x4D19B9E: oro_rt_malloc (tlsf.c:634)
==17138== by 0x50038D8: std::basic_string<char, std::char_traits RTT::os::rt_allocator<char> >::_Rep::_S_create(unsigned long, unsigned long,
RTT::os::rt_allocator<char> const&) (oro_allocator.hpp:276)
==17138== by 0x500396C: char* std::basic_string<char,
std::char_traits const*>(char const*, char const*, RTT::os::rt_allocator<char> const&,
std::forward_iterator_tag) (basic_string.tcc:140)
==17138== by 0x5003A62: std::basic_string<char, std::char_traits RTT::os::rt_allocator<char> >::basic_string(char const*,
RTT::os::rt_allocator<char> const&) (basic_string.h:1692)
==17138== by 0x4FFBCC8: OCL::logging::Category::_logUnconditionally2(int,
std::basic_string<char, std::char_traits const&) (Category.cpp:97)
==17138== by 0x50140B1: OCL::logging::CategoryStream::flush()
(CategoryStream.cpp:21)
==17138== by 0x501431A: OCL::logging::CategoryStream::~CategoryStream()
(CategoryStream.cpp:16)
==17138== by 0x2302A5E6: asctec::WrenchController::updateHook()
(wrench_controller.cpp:91)
==17138== by 0x4D0FC67: RTT::ExecutionEngine::processChildren()
(ExecutionEngine.cpp:331)
==17138== by 0x4D1D7F5: RTT::os::thread_function(void*) (Thread.cpp:135)
==17138== by 0x3FE8A07AF0: start_thread (pthread_create.c:305)
==17138== Address 0x36342d206f742084 is not stack'd, malloc'd or (recently)
free'd
==17138==
==17138==
==17138== Process terminating with default action of signal 11 (SIGSEGV)
==17138== General Protection Fault
==17138== at 0x4D19494: malloc_ex (tlsf.c:727)
==17138== by 0x4D19B9E: oro_rt_malloc (tlsf.c:634)
==17138== by 0x50038D8: std::basic_string<char, std::char_traits RTT::os::rt_allocator<char> >::_Rep::_S_create(unsigned long, unsigned long,
RTT::os::rt_allocator<char> const&) (oro_allocator.hpp:276)
==17138== by 0x500396C: char* std::basic_string<char,
std::char_traits const*>(char const*, char const*, RTT::os::rt_allocator<char> const&,
std::forward_iterator_tag) (basic_string.tcc:140)
==17138== by 0x5003A62: std::basic_string<char, std::char_traits RTT::os::rt_allocator<char> >::basic_string(char const*,
RTT::os::rt_allocator<char> const&) (basic_string.h:1692)
==17138== by 0x4FFBCC8: OCL::logging::Category::_logUnconditionally2(int,
std::basic_string<char, std::char_traits const&) (Category.cpp:97)
==17138== by 0x50140B1: OCL::logging::CategoryStream::flush()
(CategoryStream.cpp:21)
==17138== by 0x501431A: OCL::logging::CategoryStream::~CategoryStream()
(CategoryStream.cpp:16)
==17138== by 0x2302A5E6: asctec::WrenchController::updateHook()
(wrench_controller.cpp:91)
==17138== by 0x4D0FC67: RTT::ExecutionEngine::processChildren()
(ExecutionEngine.cpp:331)
==17138== by 0x4D1D7F5: RTT::os::thread_function(void*) (Thread.cpp:135)
==17138== by 0x3FE8A07AF0: start_thread (pthread_create.c:305)
==17138==
==17138== HEAP SUMMARY:
==17138== in use at exit: 2,412,974 bytes in 24,373 blocks
==17138== total heap usage: 83,574,287 allocs, 83,549,914 frees,
5,145,241,795 bytes allocated
==17138==
==17138== LEAK SUMMARY:
==17138== definitely lost: 1,248 bytes in 37 blocks
==17138== indirectly lost: 41,349 bytes in 1,009 blocks
==17138== possibly lost: 416,495 bytes in 3,887 blocks
==17138== still reachable: 1,953,882 bytes in 19,440 blocks
==17138== suppressed: 0 bytes in 0 blocks
==17138== Rerun with --leak-check=full to see details of leaked memory
==17138==
==17138== For counts of detected and suppressed errors, rerun with: -v
==17138== Use --track-origins=yes to see where uninitialised values come from
==17138== ERROR SUMMARY: 7 errors from 6 contexts (suppressed: 188 from 8)
Killed