Required steps to use TLSF in hard realtime?

We are writing a new hard realtime, Orocos-based software system, and
are trying to get hard realtime performance under preempt_rt patched
Linux kernels.

In our realtime sections, we are passing around ROS messages that
contain strings and vectors. In order to make this HRT-safe, we are
setting the allocator to RTT::os::rt_allocator<uint8_t>. However, we
are missing our deadlines during copies of the data structures as well
as during writing to ports.
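
For reference, the kind of allocator substitution we mean looks roughly
like the sketch below; the type names are only illustrative (not our
actual message types), and the exact header may differ between RTT
versions.

    #include <string>
    #include <vector>
    #include <rtt/os/oro_allocator.hpp>  // RTT::os::rt_allocator (header name may vary)

    // Rebind the container allocators of a message-like struct to the
    // TLSF-backed real-time allocator, so that growth inside the control
    // loop draws from the real-time memory pool instead of plain malloc().
    typedef std::basic_string<char, std::char_traits<char>,
                              RTT::os::rt_allocator<char> > rt_string;

    struct SensorSample {                                  // illustrative only
        rt_string frame_id;                                // ROS Header-style string
        std::vector<double, RTT::os::rt_allocator<double> > readings;
    };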

We are manually installing Orocos, which seems to enable
OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP by default, even though some
Orocos documentation warns against them. Should these be turned off? I
suspect that if the memory pool needs to grow, then disabling the
growing functionality will lead to a std::bad_alloc exception. Is this
the case?

If these settings need to be disabled, are they disabled in the Ubuntu
packages in the repositories? If so, we'll just switch to using those
packages.

We need to be able to manipulate these messages in HRT -- is there
anything else we're missing that we need to do in order to make this
work?

Thank you for any help.

Thank You,
Johnathan Van Why
Dynamic Robotics Laboratory
Oregon State University

Required steps to use TLSF in hard realtime?

On May 20, 2013, at 17:39 , Johnathan Van Why wrote:

> [...]

Some things to consider
- turn off OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP
- use a real time memory pool of sufficient size
- pre-allocate as much as possible before going real-time
- make sure that your priorities are good, particularly w.r.t. the existing interrupt thread priorities (default=50)
- make sure that you lock memory appropriately (a minimal sketch follows below), see
https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO
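
A minimal sketch of the locking and pre-faulting step, loosely following
the RT_PREEMPT howto (the sizes are just examples):

    #include <string.h>     // memset
    #include <sys/mman.h>   // mlockall
    #include <cstdio>

    static void prefault_stack()
    {
        // Touch a chunk of stack so its pages are resident before going real-time.
        unsigned char dummy[64 * 1024];      // example size
        memset(dummy, 0, sizeof(dummy));
    }

    int main()
    {
        // Lock all current and future pages into RAM before the real-time
        // part of the application starts.
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall failed");
        prefault_stack();
        // ... create, configure and start the Orocos components here ...
        return 0;
    }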

What are your real-time requirements? What performance are you trying to achieve, and on what hardware?

As with others on this list, we achieve real-time performance on stock Dell and similar desktop hardware with the RT kernel from Ubuntu 10.04. Our application uses plenty of strings and vectors, and primarily runs at 500 Hz. We don't use ROS though, and are still using Orocos v1.

HTH
S

Required steps to use TLSF in hard realtime?

On Tue, May 21, 2013 at 3:23 AM, S Roderick <kiwi [dot] net [..] ...> wrote:
> [...]

> "Writing ROS messages around": does that mean you use the ROS message data
> structure as shared variables (fine!), or that you use the normal TCP/IP
> communication approach (not so fine!)?
> The latter would be a bit strange in a single-threaded context, isn't it?

We're using them primarily as local variables (within one thread).
They are being transmitted to ROS by Orocos's RTT-ROS integration, but
they first leave our components via Orocos ports.

> I am very curious to learn why exactly ROS messaging is necessary;
> introducing an inherently non realtime safe aspect in a hard realtime
> design is asking for troubles.

We're logging these types by streaming them to ROS, then using
ROS-based tools to analyze the data. This requires that these be ROS
messages.

> Some things to consider
> - turn off OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP
> - use a real time memory pool of sufficient size
> - pre-allocate as much as possible before going real-time
> - make sure that your priorities are good, particularly w.r.t. the existing interrupt thread priorities (default=50)
> - make sure that you lock memory appropriately, see
> https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO

Okay, I've turned those off and recompiled. Also, I've bumped up the
memory pool size to 256 MiB (which is quite large, IMO). We don't
dynamically allocate unless our architecture requires it (which,
unfortunately, is fairly often, due to the ROS messages).

Our realtime loop is at priority 80, well above the kernel IRQs, and
we are both locking memory and pre-faulting the stack.
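
For reference, the priority is set roughly like this in our deployment
code (the function name and component pointer are illustrative):

    #include <rtt/Activity.hpp>
    #include <rtt/TaskContext.hpp>

    // Illustrative: give the control component a periodic SCHED_FIFO
    // activity at priority 80 with a 1 kHz period.
    void attach_rt_activity(RTT::TaskContext* controller)
    {
        controller->setActivity(
            new RTT::Activity(ORO_SCHED_RT, /*priority=*/80, /*period=*/0.001));
    }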

With a mostly-idle system, I saw a latency of over 451 microseconds
within minutes of starting our system. The code that took this long to
execute wrote to 1 port, executed the line
"RTT::os::TimeService::Instance()->getNSecs()", did a comparison, and
called and returned from 2 functions. I do not think that writing to a
port or getting the current time in nanoseconds should take more than
a few microseconds.
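
The measurement itself is essentially the following pattern (the port
type and the 100 µs budget are placeholders, not our real values):

    #include <rtt/OutputPort.hpp>
    #include <rtt/os/TimeService.hpp>

    // Illustrative timing check around a single port write.
    void timed_write(RTT::OutputPort<double>& port, double value)
    {
        RTT::os::TimeService::nsecs start =
            RTT::os::TimeService::Instance()->getNSecs();

        port.write(value);

        RTT::os::TimeService::nsecs end =
            RTT::os::TimeService::Instance()->getNSecs();
        if (end - start > 100000) {          // 100 us budget, example only
            // record the overrun for later inspection
        }
    }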

> What are your real-time requirements? What performance are you trying to achieve, and on what hardware?

We are trying to run at 1 kHz on standard desktop hardware. Our
lowest-end machines are high-end Intel Atoms, but if we need a more
powerful computer we can use one. We're hoping to reduce our maximum
latencies below 300 microseconds, although I suspect we should be able
to reduce them below 200 microseconds as cyclictest never experiences
latencies above 160 microseconds for us.


Required steps to use TLSF in hard realtime?

On Tue, 21 May 2013, Johnathan Van Why wrote:

> On Tue, May 21, 2013 at 3:23 AM, S Roderick <kiwi [dot] net [..] ...> wrote:
>> [...]
>
>> "Writing ROS messages around": does that mean you use the ROS message data
>> structure as shared variables (fine!), or that you use the normal TCP/IP
>> communication approach (not so fine!)?
>> The latter would be a bit strange in a single-threaded context, isn't it?
>
> We're using them primarily as local variables (within one thread).
Ok, this does not involve any latency, good!

> They are being transmitted to ROS by Orocos's RTT-ROS integration, but
> they first leave our components via Orocos ports.

This does introduce latency, but that is inevitable. You are experiencing
problems with the realtime allocation, if I got that right? Isn't it
possible to allocate a maximum buffer size at configuration time of your
application, so that you don't have to use runtime allocation? This is one
of the standard guidelines for hard realtime programming.
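
For instance, something along these lines, done once at configuration
time (the struct and the sizes are of course only illustrative):

    #include <string>
    #include <vector>

    // Illustrative stand-in for a ROS-style message with a string and a vector.
    struct LogSample {
        std::string frame_id;
        std::vector<double> data;
    };

    // Call once from configureHook(), before the real-time loop starts, so
    // that later assignments never have to grow the containers.
    void reserve_worst_case(LogSample& sample)
    {
        sample.frame_id.reserve(64);     // guessed worst-case string length
        sample.data.reserve(1024);       // guessed worst-case number of samples
    }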

>> I am very curious to learn why exactly ROS messaging is necessary;
>> introducing an inherently non realtime safe aspect in a hard realtime
>> design is asking for troubles.
>
> We're logging these types by streaming them to ROS, then using
> ROS-based tools to analyze the data. This requires that these be ROS
> messages.

Ok, understood!

>> Some things to consider
>> - turn off OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP
>> - use a real time memory pool of sufficient size
>> - pre-allocate as much as possible before going real-time
>> - make sure that your priorities are good, particularly w.r.t. the existing interrupt thread priorities (default=50)
>> - make sure that you lock memory appropriately, see
>> https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO
>
> Okay, I've turned those off and recompiled. Also, I've bumped up the
> memory pool size to 256 MiB (which is quite large, IMO). We don't
> dynamically allocate unless our architecture requires it (which,
> unfortunately, is fairly often, due to the ROS messages).

Ouch, oops... Then your hardware seems not to be ready for the software
architecture you've put on it...? And you should switch to a more effective
logging solution. What that other solution is depends on your hardware, of
course; it might be as simple as a DMA-based mass data transfer on
PCI-based hardware, or going to a realtime Ethernet protocol such as
EtherCAT to send out huge buffers (up to 4G if I am not mistaken).

> Our realtime loop is at priority 80, well above the kernel IRQs, and
> we are both locking memory and pre-faulting the stack.
>
> With a mostly-idle system, I saw a latency of over 451 microseconds
> within minutes of starting our system. The code that took this long to
> execute wrote to 1 port, executed the line
> "RTT::os::TimeService::Instance()->getNSecs()", did a comparison, and
> called and returned from 2 functions. I do not think that writing to a
> port or getting the current time in nanoseconds should take more than
> a few microseconds.

>> What are your real-time requirements? What performance are you trying to achieve, and on what hardware?
>
> We are trying to run at 1 kHz on standard desktop hardware. Our
> lowest-end machines are high-end Intel Atoms, but if we need a more
> powerful computer we can use one.

My experience is that "powerful" is not the main issue in such hard
realtime problems, but the motherboard's interrupt latency. Modern PC
boards are optimized for average throughput, not for low latencies, and
we've been bitten several times badly with "powerful" computers.

Of course, there is always the possibility that you have encountered a bug
in the RTT infrastructure.

> We're hoping to reduce our maximum
> latencies below 300 microseconds, although I suspect we should be able
> to reduce them below 200 microseconds as cyclictest never experiences
> latencies above 160 microseconds for us.

This is still a rather high number... If your architecture is such that a
couple of context switches are involved (which can happen even if you do
only a few port operations and out-of-process communications), the total
latency can be a lot larger than you expect...

Finding the real causes of poor realtime performance is tough, sigh...


Herman

Required steps to use TLSF in hard realtime?

On May 22, 2013, at 02:27 , Herman Bruyninckx wrote:

>> [...]
>> Okay, I've turned those off and recompiled. Also, I've bumped up the
>> memory pool size to 256 MiB (which is quite large, IMO). We don't
>> dynamically allocate unless our architecture requires it (which,
>> unfortunately, is fairly often, due to the ROS messages).
>
> Auch, oops... Then your hardware seems not to be ready for the software
> architecture you've put on it...? And you should switch to a more effective
> logging solution. What that other solution is depends on your hardware, of
> course; it might be as simple as a DMA based mass data transfer on PCI
> based hardware, or going to a realtime Ethernet protocol such as Ethercat
> to send out huge buffers (up to 4G if I am not mistaken).

What makes you think that the logging solution is at fault? I don't see that in anything that Johnathan has written.

> [...]
>
> My experience is that "powerful" is not the main issue in such hard
> realtime problems, but the motherboard's interrupt latency. Modern PC
> boards are optimized for average throughput, not for low latencies, and
> we've been bitten several times badly with "powerful" computers.

Agreed. I'd check all the SMI and BIOS settings. We've been bitten by that before too.

> Of course, there is always the possibility that you have encountered a bug
> in the RTT infrastructure.
>
>> We're hoping to reduce our maximum
>> latencies below 300 microseconds, although I suspect we should be able
>> to reduce them below 200 microseconds as cyclictest never experiences
>> latencies above 160 microseconds for us.
>
> This is still a rather high number... If your architecture is such that a
> couple of context switches are used (which can happen if you do even only a
> few port operations and communications out of process), the total latency
> can be a lot larger than you expect...
>
> Finding the real causes of poor realtime performance is tough, sigh...

On an average Dell we achieve tens of microseconds of worst-case latency on a 500 Hz cycle.

Which kernel are you using? Did you patch it yourself or get a prepackaged RT kernel? Which version of Orocos? What else is running on the computer? Are you logged in graphically or using ssh? Have you reduced your system down to a small test case that demonstrates the problem?

Herman's right, there are a lot of things you need to look at to determine the cause of poor RT performance ... :-(
S

Required steps to use TLSF in hard realtime?

On Wed, 22 May 2013, Stephen Roderick wrote:

> [...]
> What makes you think that the logging solution is at fault? I don't see
> that in anything that Johnathan has written.

I am of course not sure that the logging is at fault, but his own words are
that the logging is the one and only reason for dynamic allocation.

> [...]
> Agreed. I'd check all the SMI and BIOS settings. We've been bitten by that before too.

I still find it difficult to give constructive guidelines about _how_
exactly you can check these things and be _sure_ about whether the check is
positive or not... Maybe you have such guidelines?

> [...]
> On an average Dell we achieve 10's of microseconds of worst-case latency on a 500 Hz cycle.

That conforms to the numbers we experience on "good hard realtime ready"
hardware, mostly independently of the "CPU power". Often the numbers are
even better. But on "bad" hardware (or with bad SMI and BIOS settings), the
numbers can be huge...

> Which kernel are you using? Did you patch it yourself or get a
> prepackaged RT kernel? Which version of Orocos? What else is running on
> the computer? Are you logged in graphically or using ssh? Have you
> reduced your system down to a small test case that demonstrates the
> problem?
>
> Herman's right, there are a lot of things you need to look at to
> determine the cause of poor RT performance ... :-(

It's very unfortunate to be right this time... :-(

> S

Herman

Required steps to use TLSF in hard realtime?

On Wed, May 22, 2013 at 4:59 AM, Herman Bruyninckx
<Herman [dot] Bruyninckx [..] ...> wrote:
>>> [...]
>>> This does introduce latency, but this is inevitable. But you experience
>>> problems with the realtime allocation, if I got that right? Isn't it
>>> possible to allocate a maximum buffer size at the configuration time of
>>> your application, so that you don't have to use runtime allocation? This
>>> is
>>> one of the standard guidelines for hard realtime programming.

We set a buffered connection policy with a buffer of 10,000 messages.
However, these messages contain strings and, occasionally, vectors,
because ROS puts a string into its "Header" message type (which is
used for timestamps). As a result, these strings and vectors still
need to be grown at runtime, since we don't know how large they can
be.

I had not previously been setting data samples for my ports since I
did not believe it to be necessary with the realtime allocator.
However, I just tried setting a data sample, and it appears that my
latencies have decreased. Is it necessary to set a data sample
even if I'm using a buffered connection and the realtime allocator?

If so, I can set samples, although I'll have to guess at a size
because I won't always know a large message will be produced until it
is actually sent (and tell users not to transmit messages larger than
that).
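
Concretely, what I just tried looks roughly like the following (the
message type and the guessed size are only placeholders):

    #include <rtt/OutputPort.hpp>
    #include <std_msgs/String.h>   // assuming a ROS message type with a string field

    // Illustrative: hand a pre-sized sample to the port so the connection's
    // internal storage is allocated up front rather than in the control loop.
    void init_logging_port(RTT::OutputPort<std_msgs::String>& port)
    {
        std_msgs::String sample;
        sample.data.assign(256, ' ');   // guessed worst-case payload size
        port.setDataSample(sample);
    }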

>>> [...]
>>> Auch, oops... Then your hardware seems not to be ready for the software
>>> architecture you've put on it...? And you should switch to a more
>>> effective
>>> logging solution. What that other solution is depends on your hardware,
>>> of
>>> course; it might be as simple as a DMA based mass data transfer on PCI
>>> based hardware, or going to a realtime Ethernet protocol such as Ethercat
>>> to send out huge buffers (up to 4G if I am not mistaken).

The system we're building is designed to make writing interchangeable
"controllers" easy, so it has to be able to log various data types
(without requiring the user to write a serialization function). The
automatic serialization is another benefit of ROS messages.

I've looked at the NetCDF data reporter in OCL, but it does not appear
to have sufficient performance for us (we're logging around 400 KB/s).
If you're aware of any other technique for fast file-based logging
capable of handling arbitrary data types, I'm open for suggestions.

>> [...]
>> Agreed. I'd check all the SMI and BIOS settings. We've been bitten by that
>> before too.
>
>
> I still find it difficult to give constructive guidelines about _how_
> exactly you can check these things and be _sure_ about whether the check is
> positive or not... Maybe you have such guidelines?

I've looked around in the BIOS, and haven't found anything
"suspicious". Also, I've run hwlatdetect, which claims to be able to
detect SMI events, but it did not see any sources of large latencies.

My statement about using a more powerful computer if necessary was a
mistake -- I'm simultaneously debugging an issue with logging
throughput that may necessitate more powerful hardware (where this
statement would make sense), and evidently got my threads mixed up for
a minute. I've never conflated hard realtime performance with
throughput performance.

>>> [...]
>>> This is still a rather high number... If your architecture is such that a
>>> couple of context switches are used (which can happen if you do even only
>>> a
>>> few port operations and communications out of process), the total latency
>>> can be a lot larger than you expect...

Does writing to a port trigger a context switch? I'd assume not for a
normal port (since port-based communication is lockless), but I'd
understand if it's different for an event port. I'm not sure which
rtt_rosnode uses, however.

>> [...]
>> Which kernel are you using? Did you patch it yourself or get a
>> prepackaged RT kernel? Which version of Orocos? What else is running on
>> the computer? Are you logged in graphically or using ssh? Have you
>> reduced your system down to a small test case that demonstrates the
>> problem?

When I started this thread, I was using 3.6.11 patched with the rt25
patch. I had previously run cyclictest and found it to have
latencies under 200 µs, as I reported earlier in this thread. However,
I just re-ran cyclictest and found much larger latencies. I'm not sure
what changed, so I configured a new 3.8.11 kernel with the rt8 patch
and now the large latencies are gone.

I'm running the "master" branch of Orocos, whatever version that may
be (I have yet to find a way to check). I'm logged in graphically, and
running various "normal" applications (web browser, terminal, PDF
reader). To test for hard realtime performance, I'm using a kernel
compilation with a large thread count to stress the system, but large
latencies are occurring even without the high load.


Required steps to use TLSF in hard realtime?

On Wed, May 22, 2013 at 9:39 PM, Johnathan Van Why <jrvanwhy [..] ...> wrote:
> [...]
> We set a buffered connection policy with a buffer of 10,000 messages.

Ouch! This can only work if you're using the LOCKED locking policy, not the
LOCK_FREE one. Any buffer larger than 50-100 elements should go locked, since
the copying is going to take ages....
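
In code, the difference is roughly this (the port type and buffer size
are just an example):

    #include <rtt/ConnPolicy.hpp>
    #include <rtt/InputPort.hpp>
    #include <rtt/OutputPort.hpp>

    // Illustrative: connect a large buffered connection with the LOCKED
    // locking policy instead of the default LOCK_FREE one.
    void connect_logging(RTT::OutputPort<double>& out, RTT::InputPort<double>& in)
    {
        RTT::ConnPolicy policy =
            RTT::ConnPolicy::buffer(10000, RTT::ConnPolicy::LOCKED);
        out.connectTo(&in, policy);
    }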

> However, these messages contain strings and, occasionally, vectors,
> because ROS put a string into their "Header" message type (which is
> used for timestamps). As a result, these strings and vectors still
> need to be grown at runtime, since we don't know how large they can
> be.
>
> I had not previously been setting data samples for my ports since I
> did not believe it to be necessary with the realtime allocator.
> However, I just tried setting a data sample, and it appears that my
> latencies have been decreased. Is it necessary to set a data sample
> even if I'm using a buffered connection and the realtime allocator?
>
> If so, I can set samples, although I'll have to guess on a size
> because I won't always know a large message will be produced until it
> is actually sent (and tell users not to transmit messages larger than
> that).

I don't think it's required to use setDataSample. Also, as Stephen writes, the
mechanism preserves memory, even after you popped an element
(independent of your use of setDataSample)...

> [...]
> The system we're building is designed to make writing interchangeable
> "controllers" easy, so it has to be able to log various data types
> (without requiring the user to write a serialization function). The
> automatic serialization is another benefit of ROS messages.
>
> I've looked at the NetCDF data reporter in OCL, but it does not appear
> to have sufficient performance for us (we're logging around 400KB/s).
> If you're aware of any other technique for fast file-based logging
> capable of handling arbitrary data types, I'm open for suggestions.

Hmm, current/recent OCL's Netcdf logging is the fastest we can have... are you
sure it won't work?

> [...]
> When I started this thread, I was using 3.6.11 patched with the rt25
> patch. I had previously run a cyclic test and found it to have
> latencies under 200 µs, as I reported earlier in this thread. However,
> I just re-ran cyclictest and found much larger latencies. I'm not sure
> what changed, so I configured a new 3.8.11 kernel with the rt8 patch
> and now the large latencies are gone.
>
> I'm running the "master" branch of Orocos, whatever version that may
> be (I have yet to find a way to check). I'm logged in graphically, and
> running various "normal" applications (web browser, terminal, PDF
> reader). To test for hard realtime performance, I'm using a kernel
> compilation with a large thread count to stress the system, but large
> latencies are occurring even without the high load.

A 'correct' Orocos application should run at the same latencies as the
cyclictest...

Peter


Required steps to use TLSF in hard realtime?

On May 27, 2013, at 16:57 , Peter Soetens wrote:

>> [...]
>> I've looked at the NetCDF data reporter in OCL, but it does not appear
>> to have sufficient performance for us (we're logging around 400KB/s).
>> If you're aware of any other technique for fast file-based logging
>> capable of handling arbitrary data types, I'm open for suggestions.
>
> Hmm, current/recent OCL's NetCDF logging is the fastest we can have... are you
> sure it won't work?

We log ~1 MB/s using text files. So on an average desktop with a reasonable HDD, this is entirely possible.
S

Required steps to use TLSF in hard realtime?

2013/5/22 Johnathan Van Why <jrvanwhy [..] ...>

> On Wed, May 22, 2013 at 4:59 AM, Herman Bruyninckx
> <Herman [dot] Bruyninckx [..] ...> wrote:
> > On Wed, 22 May 2013, Stephen Roderick wrote:
> >
> >> On May 22, 2013, at 02:27 , Herman Bruyninckx wrote:
> >>
> >>> On Tue, 21 May 2013, Johnathan Van Why wrote:
> >>>
> >>>> On Tue, May 21, 2013 at 3:23 AM, S Roderick <kiwi [dot] net [..] ...> wrote:
> >>>>>
> >>>>> On May 20, 2013, at 17:39 , Johnathan Van Why wrote:
> >>>>>
> >>>>>> We are writing a new hard realtime, Orocos-based software system,
> and
> >>>>>> are trying to get hard realtime performance under preempt_rt patched
> >>>>>> Linux kernels.
> >>>>>>
> >>>>>> In our realtime sections, we are passing around ROS messages that
> >>>>>> contain strings and vectors. In order to make this HRT-safe, we are
> >>>>>> setting the allocator to RTT::os::rt_allocator<uint8_t>. However, we
> >>>>>> are missing our deadlines during copies of the data structures as
> well
> >>>>>> as during writing to ports.
> >>>>>>
> >>>>>> We are manually installing Orocos, which seems to default
> >>>>>> OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP to enabled, which is warned
> >>>>>> against in some Orocos documentation. Should these be turned off? I
> >>>>>> suspect that if the memory pool needs to be grown, then disabling
> the
> >>>>>> growing functionality will lead to a bad_alloc exception. Is this
> the
> >>>>>> case?
> >>>>>>
> >>>>>> If these settings need to be disabled, are they disabled in the
> Ubuntu
> >>>>>> packages in the repositories? If so, we'll just switch to using
> those
> >>>>>> packages.
> >>>>>>
> >>>>>> We need to be able to manipulate these messages in HRT -- is there
> >>>>>> anything else we're missing that we need to do in order to make this
> >>>>>> work?
> >>>>>>
> >>>>>> Thank you for any help.
> >>>>>>
> >>>>>> Thank You,
> >>>>>> Johnathan Van Why
> >>>>>> Dynamic Robotics Laboratory
> >>>>>> Oregon State University
> >>>>
> >>>>
> >>>>> "Writing ROS messages around": does that mean you use the ROS message
> >>>>> data
> >>>>> stucture as shared variables (fine!), or that you use the normal
> TCP/IP
> >>>>> communication approach (not so fine!)?
> >>>>> The latter would be a bit strange in a single-threaded context, isn't
> >>>>> it?
> >>>>
> >>>>
> >>>> We're using them primarily as local variables (within one thread).
> >>>
> >>> Ok, this does not involve any latency, good!
> >>>
> >>>> They are being transmitted to ROS by Orocos's RTT-ROS integration, but
> >>>> they first leave our components via Orocos ports.
> >>>
> >>>
> >>> This does introduce latency, but this is inevitable. But you experience
> >>> problems with the realtime allocation, if I got that right? Isn't it
> >>> possible to allocate a maximum buffer size at the configuration time of
> >>> your application, so that you don't have to use runtime allocation?
> This
> >>> is
> >>> one of the standard guidelines for hard realtime programming.
>
> We set a buffered connection policy with a buffer of 10,000 messages.
> However, these messages contain strings and, occasionally, vectors,
> because ROS put a string into their "Header" message type (which is
> used for timestamps). As a result, these strings and vectors still
> need to be grown at runtime, since we don't know how large they can
> be.
>
> I had not previously been setting data samples for my ports since I
> did not believe it to be necessary with the realtime allocator.
> However, I just tried setting a data sample, and it appears that my
> latencies have been decreased. Is it necessary to set a data sample
> even if I'm using a buffered connection and the realtime allocator?
>
> If so, I can set samples, although I'll have to guess on a size
> because I won't always know a large message will be produced until it
> is actually sent (and tell users not to transmit messages larger than
> that).
>

This is a very important thing to do in RT contexts. I'm quite sure you
can't prove any realtime bound if everything is not bounded ;p
That's the reason the setDataSample() function exists. You do your
memory allocation once and for all during a non-critical phase (like init),
and then you more or less have to live with it. It is an (implicit) bound on
your memory.
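As a concrete sketch of that pattern in an RTT component (the message type my_msgs::Sample and its string field are made up for illustration):

    #include <rtt/TaskContext.hpp>
    #include <rtt/OutputPort.hpp>
    #include <my_msgs/Sample.h>                  // hypothetical message type

    class Controller : public RTT::TaskContext {
        RTT::OutputPort<my_msgs::Sample> log_port_;
        my_msgs::Sample msg_;                    // reused every cycle

    public:
        explicit Controller(const std::string& name)
            : RTT::TaskContext(name), log_port_("log_out")
        {
            addPort(log_port_);
        }

        bool configureHook()
        {
            // Non-critical phase: do all allocation here and bound it.
            msg_.header.frame_id.assign(64, ' ');   // guessed worst case
            log_port_.setDataSample(msg_);          // sizes connection storage
            msg_.header.frame_id.clear();           // capacity is kept in practice
            return true;
        }

        void updateHook()
        {
            // Realtime phase: only writes, nothing grows past the bound above.
            log_port_.write(msg_);
        }
    };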

> >>>>> I am very curious to learn why exactly ROS messaging is necessary;
> >>>>> introducing an inherently non realtime safe aspect in a hard realtime
> >>>>> design is asking for troubles.
> >>>>
> >>>>
> >>>> We're logging these types by streaming them to ROS, then using
> >>>> ROS-based tools to analyze the data. This requires that these be ROS
> >>>> messages.
> >>>
> >>>
> >>> Ok, understood!
> >>>
> >>>>> Some things to consider
> >>>>> - turn off OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP
> >>>>> - use a real time memory pool of sufficient size
> >>>>> - pre-allocate as much as possible before going real-time
> >>>>> - make sure that your priorites are good, particularly w.r.t. the
> >>>>> existing interrupt thread priorities (default=50)
> >>>>> - make sure that you lock memory appropriately, see
> >>>>> https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO
> >>>>
> >>>>
> >>>> Okay, I've turned those off and recompiled. Also, I've bumped up the
> >>>> memory pool size to 256 MiB (which is quite large, IMO). We don't
> >>>> dynamically allocate unless our architecture requires it (which,
> >>>> unfortunately, is fairly often, due to the ROS messages).
> >>>
> >>>
> >>> Auch, oops... Then your hardware seems not to be ready for the software
> >>> architecture you've put on it...? And you should switch to a more
> >>> effective
> >>> logging solution. What that other solution is depends on your hardware,
> >>> of
> >>> course; it might be as simple as a DMA based mass data transfer on PCI
> >>> based hardware, or going to a realtime Ethernet protocol such as
> Ethercat
> >>> to send out huge buffers (up to 4G if I am not mistaken).
>
> The system we're building is designed to make writing interchangeable
> "controllers" easy, so it has to be able to log various data types
> (without requiring the user to write a serialization function). The
> automatic serialization is another benefit of ROS messages.
>
> I've looked at the NetCDF data reporter in OCL, but it does not appear
> to have sufficient performance for us (we're logging around 400KB/s).
> If you're aware of any other technique for fast file-based logging
> capable of handling arbitrary data types, I'm open for suggestions.
>
> >> What makes you think that the logging solution is at fault? I don't see
> >> that in anything that Johnathan has written.
> >
> >
> > I am of course not sure that the logging is at fault, but his own words
> > are that the logging is the one and only reason for dynamic allocation.
> >
> >
> >>>> Our realtime loop is at priority 80, well above the kernel IRQs, and
> >>>> we are both locking memory and pre-faulting the stack.
> >>>>
> >>>> With a mostly-idle system, I saw a latency of over 451 microseconds
> >>>> within minutes of starting our system. The code that took this long to
> >>>> execute wrote to 1 port, executed the line
> >>>> "RTT::os::TimeService::Instance()->getNSecs()", did a comparison, and
> >>>> called and returned from 2 functions. I do not think that writing to a
> >>>> port or getting the current time in nanoseconds should take more than
> >>>> a few microseconds.
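For reference, a self-contained sketch of that kind of instrumentation, using only the TimeService call quoted above (the helper and the "worst" bookkeeping are illustrative, not RTT API):

    #include <rtt/OutputPort.hpp>
    #include <rtt/os/TimeService.hpp>

    // Time a single port write and keep the worst case seen so far; inspect
    // 'worst' from outside the realtime path.
    template <class Msg>
    void timed_write(RTT::OutputPort<Msg>& port, const Msg& msg,
                     RTT::os::TimeService::nsecs& worst)
    {
        RTT::os::TimeService* ts = RTT::os::TimeService::Instance();
        RTT::os::TimeService::nsecs t0 = ts->getNSecs();
        port.write(msg);
        RTT::os::TimeService::nsecs dt = ts->getNSecs() - t0;
        if (dt > worst)
            worst = dt;
    }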
> >>>
> >>>
> >>>>> What are your real-time requirements? What performance are you trying
> >>>>> to achieve, and on what hardware?
> >>>>
> >>>>
> >>>> We are trying to run at 1 kHz on standard desktop hardware. Our
> >>>> lowest-end machines are high-end Intel Atoms, but if we need a more
> >>>> powerful computer we can use one.
> >>>
> >>>
> >>> My experience is that "powerful" is not the main issue in such hard
> >>> realtime problems, but the motherboard's interrupt latency. Modern PC
> >>> boards are optimized for average throughput, not for low latencies, and
> >>> we've been bitten several times badly with "powerful" computers.
> >>
> >>
> >> Agreed. I'd check all the SMI and BIOS settings. We've been bitten by
> that
> >> before too.
> >
> >
> > I still find it difficult to give constructive guidelines about _how_
> > exactly you can check these things and be _sure_ about whether the check
> is
> > positive or not... Maybe you have such guidelines?
>
> I've looked around in the BIOS, and haven't found anything
> "suspicious". Also, I've run hwlatdetect, which claims to be able to
> detect SMI events, but it did not see any sources of large latencies.
>
> My statement about using a more powerful computer if necessary was a
> mistake -- I'm simultaneously debugging an issue with logging
> throughput that may necessitate more powerful hardware (where this
> statement would make sense), and evidently got my threads mixed up for
> a minute. I've never conflated hard realtime performance with
> throughput performance.
>
> >>> Of course, there is always the possibiity that you have encountered a
> bug
> >>> in the RTT infrastructure.
> >>>
> >>>> We're hoping to reduce our maximum
> >>>> latencies below 300 microseconds, although I suspect we should be able
> >>>> to reduce them below 200 microseconds as cyclictest never experiences
> >>>> latencies above 160 microseconds for us.
> >>>
> >>>
> >>> This is still a rather high number... If your architecture is such
> that a
> >>> couple of context switches are used (which can happen if you do even
> only
> >>> a
> >>> few port operations and communications out of process), the total
> latency
> >>> can be a lot larger than you expect...
>
> Does writing to a port trigger a context switch? I'd assume not for a
> normal port (since port-based communication is lockless), but I'd
> understand if it's different for an event port. I'm not sure which kind
> rtt_rosnode uses, however.
>
> >>> Finding the real causes of poor realtime performance is tough, sigh...
> >>
> >>
> >> On an average Dell we achieve 10's of microseconds of worst-case latency
> >> on a 500 Hz cycle.
> >
> >
> > That conforms to the numbers we experience on "good hard realtime ready"
> > hardware, mostly independently of the "CPU power". Often the numbers are
> > even better. But on "bad" hardware (or with bad SMI and BIOS settings),
> the
> > numbers can be huge...
> >
> >
> >> Which kernel are you using? Did you patch it yourself or get a
> >> prepackaged RT kernel? Which version of Orocos? What else is running on
> >> the computer? Are you logged in graphically or using ssh? Have you
> >> reduced your system down to a small test case that demonstrates the
> >> problem?
>
> When I started this thread, I was using kernel 3.6.11 patched with the rt25
> patch. I had previously run cyclictest and found
> latencies under 200 µs, as I reported earlier in this thread. However,
> I just re-ran cyclictest and found much larger latencies. I'm not sure
> what changed, so I configured a new 3.8.11 kernel with the rt8 patch
> and now the large latencies are gone.
>
> I'm running the "master" branch of Orocos, whatever version that may
> be (I have yet to find a way to check). I'm logged in graphically, and
> running various "normal" applications (web browser, terminal, PDF
> reader). To test for hard realtime performance, I'm using a kernel
> compilation with a large thread count to stress the system, but large
> latencies are occurring even without the high load.
>
> >> Herman's right, there are a lot of things you need to look at to
> >> determine the cause of poor RT performance ... :-(
> >
> >
> > It's very unfortunate to be right this time... :-(
> >
> >> S
> >
> >
> > Herman
> --
> Orocos-Users mailing list
> Orocos-Users [..] ...
> http://lists.mech.kuleuven.be/mailman/listinfo/orocos-users
>

Required steps to use TLSF in hard realtime?

On May 23, 2013, at 06:38 , Willy Lambert wrote:

>
>
>
> 2013/5/22 Johnathan Van Why <jrvanwhy [..] ...>
> On Wed, May 22, 2013 at 4:59 AM, Herman Bruyninckx
> <Herman [dot] Bruyninckx [..] ...> wrote:
> > On Wed, 22 May 2013, Stephen Roderick wrote:
> >
> >> On May 22, 2013, at 02:27 , Herman Bruyninckx wrote:
> >>
> >>> On Tue, 21 May 2013, Johnathan Van Why wrote:
> >>>
> >>>> On Tue, May 21, 2013 at 3:23 AM, S Roderick <kiwi [dot] net [..] ...> wrote:
> >>>>>
> >>>>> On May 20, 2013, at 17:39 , Johnathan Van Why wrote:
> >>>>>
> >>>>>> We are writing a new hard realtime, Orocos-based software system, and
> >>>>>> are trying to get hard realtime performance under preempt_rt patched
> >>>>>> Linux kernels.
> >>>>>>
> >>>>>> In our realtime sections, we are passing around ROS messages that
> >>>>>> contain strings and vectors. In order to make this HRT-safe, we are
> >>>>>> setting the allocator to RTT::os::rt_allocator<uint8_t>. However, we
> >>>>>> are missing our deadlines during copies of the data structures as well
> >>>>>> as during writing to ports.
> >>>>>>
> >>>>>> We are manually installing Orocos, which seems to default
> >>>>>> OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP to enabled, which is warned
> >>>>>> against in some Orocos documentation. Should these be turned off? I
> >>>>>> suspect that if the memory pool needs to be grown, then disabling the
> >>>>>> growing functionality will lead to a bad_alloc exception. Is this the
> >>>>>> case?
> >>>>>>
> >>>>>> If these settings need to be disabled, are they disabled in the Ubuntu
> >>>>>> packages in the repositories? If so, we'll just switch to using those
> >>>>>> packages.
> >>>>>>
> >>>>>> We need to be able to manipulate these messages in HRT -- is there
> >>>>>> anything else we're missing that we need to do in order to make this
> >>>>>> work?
> >>>>>>
> >>>>>> Thank you for any help.
> >>>>>>
> >>>>>> Thank You,
> >>>>>> Johnathan Van Why
> >>>>>> Dynamic Robotics Laboratory
> >>>>>> Oregon State University
> >>>>
> >>>>
> >>>>> "Writing ROS messages around": does that mean you use the ROS message
> >>>>> data
> >>>>> stucture as shared variables (fine!), or that you use the normal TCP/IP
> >>>>> communication approach (not so fine!)?
> >>>>> The latter would be a bit strange in a single-threaded context, isn't
> >>>>> it?
> >>>>
> >>>>
> >>>> We're using them primarily as local variables (within one thread).
> >>>
> >>> Ok, this does not involve any latency, good!
> >>>
> >>>> They are being transmitted to ROS by Orocos's RTT-ROS integration, but
> >>>> they first leave our components via Orocos ports.
> >>>
> >>>
> >>> This does introduce latency, but this is inevitable. But you experience
> >>> problems with the realtime allocation, if I got that right? Isn't it
> >>> possible to allocate a maximum buffer size at the configuration time of
> >>> your application, so that you don't have to use runtime allocation? This
> >>> is
> >>> one of the standard guidelines for hard realtime programming.
>
> We set a buffered connection policy with a buffer of 10,000 messages.
> However, these messages contain strings and, occasionally, vectors,
> because ROS put a string into their "Header" message type (which is
> used for timestamps). As a result, these strings and vectors still
> need to be grown at runtime, since we don't know how large they can
> be.
>
> I had not previously been setting data samples for my ports since I
> did not believe it to be necessary with the realtime allocator.
> However, I just tried setting a data sample, and it appears that my
> latencies have been decreased. Is it necessary to set a data sample
> even if I'm using a buffered connection and the realtime allocator?
>
> If so, I can set samples, although I'll have to guess on a size
> because I won't always know a large message will be produced until it
> is actually sent (and tell users not to transmit messages larger than
> that).
>
> This is a very important thing to do in RT contexts. I'm quite sure you can't prove any realtiness if everything is not bounded ;p
> That's the reason for the existence of setSample() function. You do your memory allocation once for all during non critical phasis (like init), and then you more or less have to live with it. It is a (implicit) bounding of your memory.

Yes, and no. As long as you've pre-faulted the real-time memory buffer, TLSF guarantees real-time access to the memory pool. So you _should_ not need to call setDataSample() up front (though I think it's still a good idea, where possible). We don't pre-fault on our system, but our buffer is only 20 MB. Johnathan said they're using 256 MB, which is really large. I suspect you need to pre-fault the entire pool first (see the Linux RT wiki for why/how).

You also have to realise that with a (circular) buffer port of real-time strings, when the buffer is first set up all those strings have size 0 and so have no memory allocated. As your application uses each buffer item, memory is allocated for each string. _But_ that allocation remains unchanged even after the item is "popped" from the buffer - basically, popping doesn't release the memory. So what you theoretically need is "number of buffer items * max possible message size" of memory in the RT memory pool.
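For instance (illustrative numbers only): a 10,000-element buffer whose messages can each grow to roughly 1 KiB will eventually pin about 10,000 * 1 KiB, i.e. on the order of 10 MiB of pool memory, plus allocator overhead. The 1 KiB figure is a guess; the real bound is whatever your largest message turns out to be.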
S

Required steps to use TLSF in hard realtime?

On Thu, May 23, 2013 at 3:45 AM, S Roderick <kiwi [dot] net [..] ...> wrote:
> On May 23, 2013, at 06:38 , Willy Lambert wrote:
>
>
>
>
> 2013/5/22 Johnathan Van Why <jrvanwhy [..] ...>
>>
>> On Wed, May 22, 2013 at 4:59 AM, Herman Bruyninckx
>> <Herman [dot] Bruyninckx [..] ...> wrote:
>> > On Wed, 22 May 2013, Stephen Roderick wrote:
>> >
>> >> On May 22, 2013, at 02:27 , Herman Bruyninckx wrote:
>> >>
>> >>> On Tue, 21 May 2013, Johnathan Van Why wrote:
>> >>>
>> >>>> On Tue, May 21, 2013 at 3:23 AM, S Roderick <kiwi [dot] net [..] ...> wrote:
>> >>>>>
>> >>>>> On May 20, 2013, at 17:39 , Johnathan Van Why wrote:
>> >>>>>
>> >>>>>> We are writing a new hard realtime, Orocos-based software system,
>> >>>>>> and
>> >>>>>> are trying to get hard realtime performance under preempt_rt
>> >>>>>> patched
>> >>>>>> Linux kernels.
>> >>>>>>
>> >>>>>> In our realtime sections, we are passing around ROS messages that
>> >>>>>> contain strings and vectors. In order to make this HRT-safe, we are
>> >>>>>> setting the allocator to RTT::os::rt_allocator<uint8_t>. However,
>> >>>>>> we
>> >>>>>> are missing our deadlines during copies of the data structures as
>> >>>>>> well
>> >>>>>> as during writing to ports.
>> >>>>>>
>> >>>>>> We are manually installing Orocos, which seems to default
>> >>>>>> OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP to enabled, which is warned
>> >>>>>> against in some Orocos documentation. Should these be turned off? I
>> >>>>>> suspect that if the memory pool needs to be grown, then disabling
>> >>>>>> the
>> >>>>>> growing functionality will lead to a bad_alloc exception. Is this
>> >>>>>> the
>> >>>>>> case?
>> >>>>>>
>> >>>>>> If these settings need to be disabled, are they disabled in the
>> >>>>>> Ubuntu
>> >>>>>> packages in the repositories? If so, we'll just switch to using
>> >>>>>> those
>> >>>>>> packages.
>> >>>>>>
>> >>>>>> We need to be able to manipulate these messages in HRT -- is there
>> >>>>>> anything else we're missing that we need to do in order to make
>> >>>>>> this
>> >>>>>> work?
>> >>>>>>
>> >>>>>> Thank you for any help.
>> >>>>>>
>> >>>>>> Thank You,
>> >>>>>> Johnathan Van Why
>> >>>>>> Dynamic Robotics Laboratory
>> >>>>>> Oregon State University
>> >>>>
>> >>>>
>> >>>>> "Writing ROS messages around": does that mean you use the ROS
>> >>>>> message
>> >>>>> data
>> >>>>> stucture as shared variables (fine!), or that you use the normal
>> >>>>> TCP/IP
>> >>>>> communication approach (not so fine!)?
>> >>>>> The latter would be a bit strange in a single-threaded context,
>> >>>>> isn't
>> >>>>> it?
>> >>>>
>> >>>>
>> >>>> We're using them primarily as local variables (within one thread).
>> >>>
>> >>> Ok, this does not involve any latency, good!
>> >>>
>> >>>> They are being transmitted to ROS by Orocos's RTT-ROS integration,
>> >>>> but
>> >>>> they first leave our components via Orocos ports.
>> >>>
>> >>>
>> >>> This does introduce latency, but this is inevitable. But you
>> >>> experience
>> >>> problems with the realtime allocation, if I got that right? Isn't it
>> >>> possible to allocate a maximum buffer size at the configuration time
>> >>> of
>> >>> your application, so that you don't have to use runtime allocation?
>> >>> This
>> >>> is
>> >>> one of the standard guidelines for hard realtime programming.
>>
>> We set a buffered connection policy with a buffer of 10,000 messages.
>> However, these messages contain strings and, occasionally, vectors,
>> because ROS put a string into their "Header" message type (which is
>> used for timestamps). As a result, these strings and vectors still
>> need to be grown at runtime, since we don't know how large they can
>> be.
>>
>> I had not previously been setting data samples for my ports since I
>> did not believe it to be necessary with the realtime allocator.
>> However, I just tried setting a data sample, and it appears that my
>> latencies have been decreased. Is it necessary to set a data sample
>> even if I'm using a buffered connection and the realtime allocator?

I take back this statement. I must have just gotten lucky; overnight,
I ran a test and saw a 26ms latency. The line that took 26ms was the
one that writes to a port.

>> If so, I can set samples, although I'll have to guess on a size
>> because I won't always know a large message will be produced until it
>> is actually sent (and tell users not to transmit messages larger than
>> that).
>
>
> This is a very important thing to do in RT contexts. I'm quite sure you
> can't prove any realtiness if everything is not bounded ;p
> That's the reason for the existence of setSample() function. You do your
> memory allocation once for all during non critical phasis (like init), and
> then you more or less have to live with it. It is a (implicit) bounding of
> your memory.
>
>
> Yes, and no. As long as you've pre-faulted the real-time memory buffer, then
> TLSF guarantees real-time access to the memory pool. So you _should_ not
> need to do setSample() up front (though I think it's still a good idea,
> where possible). We don't pre-fault on our system, but our buffer is only
> 20MB. Johnathon said they're using 250MB, which is really large. I suspect
> you need to pre-fault the entire pool first (see the Linux RT wiki for
> why/how).

How do you pre-fault the pool? I'm pre-faulting the stack and locking
memory (both steps I found on the wiki), but don't know how to
pre-fault the entire pool. Where on the wiki is this described?

> You also have to realise that with a (circular) buffer port of real-time
> strings, when the buffer is first setup all those strings have 0 size and so
> have no memory allocated. As your application uses each buffer item, memory
> is allocated for each string. _But_ the allocation remains unchanged even
> after that item is "popped" from the buffer - basically, popping doesn't
> relieve memory. So what you theoretically need is "number buffer items * max
> possible message size" of memory in the RT memory pool.
> S

Thank you for all your help,
Johnathan Van Why
Dynamic Robotics Laboratory
Oregon State University

Required steps to use TLSF in hard realtime?

On May 23, 2013, at 14:38 , Johnathan Van Why wrote:

> On Thu, May 23, 2013 at 3:45 AM, S Roderick <kiwi [dot] net [..] ...> wrote:
>> On May 23, 2013, at 06:38 , Willy Lambert wrote:
>>
>>
>>
>>
>> 2013/5/22 Johnathan Van Why <jrvanwhy [..] ...>
>>>
>>> On Wed, May 22, 2013 at 4:59 AM, Herman Bruyninckx
>>> <Herman [dot] Bruyninckx [..] ...> wrote:
>>>> On Wed, 22 May 2013, Stephen Roderick wrote:
>>>>
>>>>> On May 22, 2013, at 02:27 , Herman Bruyninckx wrote:
>>>>>
>>>>>> On Tue, 21 May 2013, Johnathan Van Why wrote:
>>>>>>
>>>>>>> On Tue, May 21, 2013 at 3:23 AM, S Roderick <kiwi [dot] net [..] ...> wrote:
>>>>>>>>
>>>>>>>> On May 20, 2013, at 17:39 , Johnathan Van Why wrote:
>>>>>>>>
>>>>>>>>> We are writing a new hard realtime, Orocos-based software system,
>>>>>>>>> and
>>>>>>>>> are trying to get hard realtime performance under preempt_rt
>>>>>>>>> patched
>>>>>>>>> Linux kernels.
>>>>>>>>>
>>>>>>>>> In our realtime sections, we are passing around ROS messages that
>>>>>>>>> contain strings and vectors. In order to make this HRT-safe, we are
>>>>>>>>> setting the allocator to RTT::os::rt_allocator<uint8_t>. However,
>>>>>>>>> we
>>>>>>>>> are missing our deadlines during copies of the data structures as
>>>>>>>>> well
>>>>>>>>> as during writing to ports.
>>>>>>>>>
>>>>>>>>> We are manually installing Orocos, which seems to default
>>>>>>>>> OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP to enabled, which is warned
>>>>>>>>> against in some Orocos documentation. Should these be turned off? I
>>>>>>>>> suspect that if the memory pool needs to be grown, then disabling
>>>>>>>>> the
>>>>>>>>> growing functionality will lead to a bad_alloc exception. Is this
>>>>>>>>> the
>>>>>>>>> case?
>>>>>>>>>
>>>>>>>>> If these settings need to be disabled, are they disabled in the
>>>>>>>>> Ubuntu
>>>>>>>>> packages in the repositories? If so, we'll just switch to using
>>>>>>>>> those
>>>>>>>>> packages.
>>>>>>>>>
>>>>>>>>> We need to be able to manipulate these messages in HRT -- is there
>>>>>>>>> anything else we're missing that we need to do in order to make
>>>>>>>>> this
>>>>>>>>> work?
>>>>>>>>>
>>>>>>>>> Thank you for any help.
>>>>>>>>>
>>>>>>>>> Thank You,
>>>>>>>>> Johnathan Van Why
>>>>>>>>> Dynamic Robotics Laboratory
>>>>>>>>> Oregon State University
>>>>>>>
>>>>>>>
>>>>>>>> "Writing ROS messages around": does that mean you use the ROS
>>>>>>>> message
>>>>>>>> data
>>>>>>>> stucture as shared variables (fine!), or that you use the normal
>>>>>>>> TCP/IP
>>>>>>>> communication approach (not so fine!)?
>>>>>>>> The latter would be a bit strange in a single-threaded context,
>>>>>>>> isn't
>>>>>>>> it?
>>>>>>>
>>>>>>>
>>>>>>> We're using them primarily as local variables (within one thread).
>>>>>>
>>>>>> Ok, this does not involve any latency, good!
>>>>>>
>>>>>>> They are being transmitted to ROS by Orocos's RTT-ROS integration,
>>>>>>> but
>>>>>>> they first leave our components via Orocos ports.
>>>>>>
>>>>>>
>>>>>> This does introduce latency, but this is inevitable. But you
>>>>>> experience
>>>>>> problems with the realtime allocation, if I got that right? Isn't it
>>>>>> possible to allocate a maximum buffer size at the configuration time
>>>>>> of
>>>>>> your application, so that you don't have to use runtime allocation?
>>>>>> This
>>>>>> is
>>>>>> one of the standard guidelines for hard realtime programming.
>>>
>>> We set a buffered connection policy with a buffer of 10,000 messages.
>>> However, these messages contain strings and, occasionally, vectors,
>>> because ROS put a string into their "Header" message type (which is
>>> used for timestamps). As a result, these strings and vectors still
>>> need to be grown at runtime, since we don't know how large they can
>>> be.
>>>
>>> I had not previously been setting data samples for my ports since I
>>> did not believe it to be necessary with the realtime allocator.
>>> However, I just tried setting a data sample, and it appears that my
>>> latencies have been decreased. Is it necessary to set a data sample
>>> even if I'm using a buffered connection and the realtime allocator?
>
> I take back this statement. I must have just gotten lucky; overnight,
> I ran a test and saw a 26ms latency. The line that took 26ms was the
> one that writes to a port.

Ouch ....

>>> If so, I can set samples, although I'll have to guess on a size
>>> because I won't always know a large message will be produced until it
>>> is actually sent (and tell users not to transmit messages larger than
>>> that).
>>
>>
>> This is a very important thing to do in RT contexts. I'm quite sure you
>> can't prove any realtiness if everything is not bounded ;p
>> That's the reason for the existence of setSample() function. You do your
>> memory allocation once for all during non critical phasis (like init), and
>> then you more or less have to live with it. It is a (implicit) bounding of
>> your memory.
>>
>>
>> Yes, and no. As long as you've pre-faulted the real-time memory buffer, then
>> TLSF guarantees real-time access to the memory pool. So you _should_ not
>> need to do setSample() up front (though I think it's still a good idea,
>> where possible). We don't pre-fault on our system, but our buffer is only
>> 20MB. Johnathon said they're using 250MB, which is really large. I suspect
>> you need to pre-fault the entire pool first (see the Linux RT wiki for
>> why/how).
>
> How do you pre-fault the pool? I'm pre-faulting the stack and locking
> memory (both steps I found on the wiki), but don't know how to
> pre-fault the entire pool. Where on the wiki is this described?

It's not described, as I don't think anyone has ever mentioned doing it. You'd have to get hold of the memory that TLSF is _about_ to use for the memory pool, and pre-fault all of that, _before_ TLSF writes anything in there. Or you might be able to get hold of the pool and just read it all - I don't recall the specifics of what the pre-faulting requires.
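For what it's worth, the generic page-touching part would look something like the sketch below; the hard part, as noted, is getting a pointer and size for TLSF's pool, which is not exposed (pool_start and pool_size here are hypothetical):

    #include <unistd.h>
    #include <cstddef>

    // Touch every page of a region so it is resident before the realtime
    // phase; reading and writing back the same byte leaves contents intact.
    static void prefault_region(void* start, std::size_t len)
    {
        volatile char* p = static_cast<volatile char*>(start);
        const std::size_t page =
            static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
        for (std::size_t i = 0; i < len; i += page)
            p[i] = p[i];
    }

    // Before going realtime (per the RT_PREEMPT howto already referenced):
    //   mlockall(MCL_CURRENT | MCL_FUTURE);      // needs <sys/mman.h>
    //   prefault_region(pool_start, pool_size);  // hypothetical pool handle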

HTH
S

Required steps to use TLSF in hard realtime?

On Mon, 20 May 2013, Johnathan Van Why wrote:

> We are writing a new hard realtime, Orocos-based software system, and
> are trying to get hard realtime performance under preempt_rt patched
> Linux kernels.
>
> In our realtime sections, we are passing around ROS messages that
> contain strings and vectors. In order to make this HRT-safe, we are
> setting the allocator to RTT::os::rt_allocator<uint8_t>. However, we
> are missing our deadlines during copies of the data structures as well
> as during writing to ports.
>
> We are manually installing Orocos, which seems to default
> OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP to enabled, which is warned
> against in some Orocos documentation. Should these be turned off? I
> suspect that if the memory pool needs to be grown, then disabling the
> growing functionality will lead to a bad_alloc exception. Is this the
> case?
>
> If these settings need to be disabled, are they disabled in the Ubuntu
> packages in the repositories? If so, we'll just switch to using those
> packages.
>
> We need to be able to manipulate these messages in HRT -- is there
> anything else we're missing that we need to do in order to make this
> work?

The first thing to consider is whether you indeed need inter-component
messaging... Remember that _only one single_ process/thread can really be
guaranteed to be hard realtime, so you could consider replacing a
component-based design (which typically involves lots of context switching)
with a single-threaded design using shared memory. Hard realtime and
communicating components are not really a match made in heaven, you know...

> Thank you for any help.
>
> Thank You,
> Johnathan Van Why
> Dynamic Robotics Laboratory
> Oregon State University

Herman

Required steps to use TLSF in hard realtime?

On Mon, May 20, 2013 at 9:26 PM, Herman Bruyninckx
<Herman [dot] Bruyninckx [..] ...> wrote:
> On Mon, 20 May 2013, Johnathan Van Why wrote:
>
>> We are writing a new hard realtime, Orocos-based software system, and
>> are trying to get hard realtime performance under preempt_rt patched
>> Linux kernels.
>>
>> In our realtime sections, we are passing around ROS messages that
>> contain strings and vectors. In order to make this HRT-safe, we are
>> setting the allocator to RTT::os::rt_allocator<uint8_t>. However, we
>> are missing our deadlines during copies of the data structures as well
>> as during writing to ports.
>>
>> We are manually installing Orocos, which seems to default
>> OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP to enabled, which is warned
>> against in some Orocos documentation. Should these be turned off? I
>> suspect that if the memory pool needs to be grown, then disabling the
>> growing functionality will lead to a bad_alloc exception. Is this the
>> case?
>>
>> If these settings need to be disabled, are they disabled in the Ubuntu
>> packages in the repositories? If so, we'll just switch to using those
>> packages.
>>
>> We need to be able to manipulate these messages in HRT -- is there
>> anything else we're missing that we need to do in order to make this
>> work?
>
>
> The first thing to consider is whether you indeed need inter-component
> messaging... Remember that _only one single_ process/thread can really be
> guaranteed to be hard realtime, so you could consider replacing a
> component-based design (which typically involves lots of context switching)
> with a single-threaded design using shared memory. Hard realtime and
> communicating components is not really a match made in heaven, you know...

We only have one HRT thread. That thread internally copies a few ROS
messages around (we try to minimize this, but it is occasionally
necessary and currently non-deterministic) and writes to ports (for
logging purposes -- also non-deterministic right now).

Thank You,
Johnathan Van Why
Dynamic Robotics Laboratory
Oregon State University

>> Thank you for any help.
>>
>> Thank You,
>> Johnathan Van Why
>> Dynamic Robotics Laboratory
>> Oregon State University
>
>
> Herman

Required steps to use TLSF in hard realtime?

On Tue, 21 May 2013, Johnathan Van Why wrote:

> On Mon, May 20, 2013 at 9:26 PM, Herman Bruyninckx
> <Herman [dot] Bruyninckx [..] ...> wrote:
>> On Mon, 20 May 2013, Johnathan Van Why wrote:
>>
>>> We are writing a new hard realtime, Orocos-based software system, and
>>> are trying to get hard realtime performance under preempt_rt patched
>>> Linux kernels.
>>>
>>> In our realtime sections, we are passing around ROS messages that
>>> contain strings and vectors. In order to make this HRT-safe, we are
>>> setting the allocator to RTT::os::rt_allocator<uint8_t>. However, we
>>> are missing our deadlines during copies of the data structures as well
>>> as during writing to ports.
>>>
>>> We are manually installing Orocos, which seems to default
>>> OS_RT_MALLOC_SBRK and OS_RT_MALLOC_MMAP to enabled, which is warned
>>> against in some Orocos documentation. Should these be turned off? I
>>> suspect that if the memory pool needs to be grown, then disabling the
>>> growing functionality will lead to a bad_alloc exception. Is this the
>>> case?
>>>
>>> If these settings need to be disabled, are they disabled in the Ubuntu
>>> packages in the repositories? If so, we'll just switch to using those
>>> packages.
>>>
>>> We need to be able to manipulate these messages in HRT -- is there
>>> anything else we're missing that we need to do in order to make this
>>> work?
>>
>>
>> The first thing to consider is whether you indeed need inter-component
>> messaging... Remember that _only one single_ process/thread can really be
>> guaranteed to be hard realtime, so you could consider replacing a
>> component-based design (which typically involves lots of context switching)
>> with a single-threaded design using shared memory. Hard realtime and
>> communicating components is not really a match made in heaven, you know...
>
> We only have one HRT thread. That thread internally copies a few ROS
> messages around (we try to minimize this, but it is occasionally
> necessary and currently non-deterministic)

"Writing ROS messages around": does that mean you use the ROS message data
stucture as shared variables (fine!), or that you use the normal TCP/IP
communication approach (not so fine!)?
The latter would be a bit strange in a single-threaded context, isn't it?

I am very curious to learn why exactly ROS messaging is necessary;
introducing an inherently non-realtime-safe aspect in a hard realtime
design is asking for trouble.

> and writes to ports (for
> logging purposes, -- also non-deterministic right now).
>
> Thank You,
> Johnathan Van Why
> Dynamic Robotics Laboratory
> Oregon State University
>
>>> Thank you for any help.
>>>
>>> Thank You,
>>> Johnathan Van Why
>>> Dynamic Robotics Laboratory
>>> Oregon State University
>>
>>
>> Herman
>

--
University of Leuven, Mechanical Engineering, Robotics Research Group
<http://people.mech.kuleuven.be/~bruyninc> Tel: +32 16 328056
Vice-President Research euRobotics AISBL <http://www.eu-robotics.net>
Open RObot COntrol Software <http://www.orocos.org>
Associate Editor JOSER <http://www.joser.org>