This mail is to inform you of my impressions of the new data flow
framework. First of all, the concept behind the new structure is given
in the wiki: http://www.orocos.org/wiki/rtt/rtt-2.0/dataflow
The idea is that *outputs* are send-and-forget, while *inputs* specify
a 'policy': e.g. 'I want to read all samples -> so buffer the input',
or 'I want lock-based protection', or ...
The policy is specified in a 'ConnPolicy' object which you can give to
the input port to use as a default, or override when the ports are
connected during deployment.
This is the basic use case of the new code:
#include <rtt/Port.hpp>
using namespace RTT;

// Component A:
OutputPort<double> a_output("MyOutput");
//...
double x = ...;
a_output.write( x );

// Component B buffers data produced by A (default buffer size == 20):
bool init_connection = true; // read last written value after connection
bool pull = true;            // fetch data directly from output port during read
InputPort<double> b_input("MyInput",
    internal::ConnPolicy::buffer(20, internal::ConnPolicy::LOCK_FREE,
                                 init_connection, pull));
//...
double x;
while ( b_input.read( x ) ) {
    // process sample x...
}
// buffer empty

// Component C gets the most recent data produced by A:
bool init_connection = true; // read last written value after connection
bool pull = true;            // fetch data directly from output port during read
InputPort<double> c_input("MyInput",
    internal::ConnPolicy::data(internal::ConnPolicy::LOCK_FREE,
                               init_connection, pull));
//...
double x;
if ( c_input.read( x ) ) {
    // use last value of x...
} else {
    // no new data
}

// Finally connect some ports. The order/direction of connecting no
// longer matters, it will always do as expected!
a_output.connectTo( b_input );  // or: b_input.connectTo( a_output );
a_output.connectTo( c_input );  // or the other way around

// Change the buffer policy for B by giving a policy during connectTo:
b_input.disconnect();
b_input.connectTo( a_output,
    internal::ConnPolicy::buffer(20, internal::ConnPolicy::LOCK_FREE,
                                 init_connection, pull));
Note: ConnPolicy will probably move to RTT or RTT::base.
Since each InputPort takes a default policy (which is type = DATA,
lock_policy = LOCK_FREE, init=false, pull=false) we can keep using the
old DeploymentComponent + XML scripts. The 'only' addition necessary
is to extend the XML elements such that a connection policy can also
be defined, overriding the default. I propose to update the
deployment manual such that the connection semantics are clearer. What
we now call a 'connection', I would propose to call a 'Topic',
analogous to ROS. So you'd define an OutputPort -> Topic and Topic ->
InputPort mapping in your XML file. We could easily generalize Topic
to also allow port names, such that in simple setups you just set
OutputPort -> InputPort. I'm not even proposing a new XML format here,
because when we write:
RTT 1.0, from DeploymentComponent example:
<struct name="Ports" type="PropertyBag">
  <simple name="a_output_port" type="string"><value>AConnection</value></simple>
  <simple name="a_input_port" type="string"><value>BConnection</value></simple>
</struct>
We actually mean to write (what's in a name):
RTT 2.0
<struct name="Topics" type="PropertyBag">
  <simple name="a_output_port" type="string"><value>ATopic</value></simple>
  <simple name="b_input_port" type="string"><value>BTopic</value></simple>
</struct>
In 2.0, you need to take care that exactly one port writes a given
topic (i.e. one OutputPort) and that all other ports on it are
InputPorts. If this is not the case, the deployer refuses to set up
the connections.
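The check the deployer would perform can be pictured in a few lines. The sketch below is purely illustrative (none of these names are actual RTT or deployer API): it counts the OutputPorts mapped to each topic and rejects any topic that does not have exactly one writer.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative sketch only (not actual RTT/deployer API): model the
// rule that exactly one OutputPort writes each topic; every other
// port mapped to the topic must be an InputPort.
enum PortKind { OUTPUT_PORT, INPUT_PORT };

typedef std::map<std::string, std::vector<PortKind> > TopicMap;

// Returns true if every topic has exactly one writing OutputPort.
bool validateTopics(const TopicMap& topics) {
    for (TopicMap::const_iterator it = topics.begin(); it != topics.end(); ++it) {
        int writers = 0;
        for (size_t i = 0; i < it->second.size(); ++i)
            if (it->second[i] == OUTPUT_PORT)
                ++writers;
        if (writers != 1)
            return false; // the deployer would refuse this topic
    }
    return true;
}
```

A deployer could run such a pass over the parsed "Topics" bag before making any connection, so misconfigurations fail early rather than at runtime.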
So far for deployment. The whole mechanism is reported to work
transparently over CORBA, but I still need to verify that statement
personally.
As before, the receiver can subscribe an RTT::Event to receive
notifications when a new data sample is ready. The scripting interface
of ports consists only of 'bool read( sample )' and 'bool write( sample )'.
Peter
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
OK, my brain probably worked while it was supposed to sleep, so I actually
*do* have some kind of a solution.
Caveat: it is a limited solution, i.e. it will be in the category of "for
power users that know what they are doing".
Right now, what happens is that we create one channel per port-to-port
connection. That has the tremendous advantage of simplifying the
implementation, reducing the number of assumptions (in particular, there is at
most one writer and one reader per data channel), while keeping the door open
for some optimizations.
I don't want to break that model. During our discussions, it actually
proved successful in solving some of the problems Peter had (like: how
many concurrent threads can access a lock-free data structure in a
data flow connection? Always 2!)
Now, it is actually possible to have a MO/SI model, by allowing an input port
to have multiple incoming channels, and having InputPort::read round-robin on
those channels. As an added nicety, one can listen to the "new data" event and
access only the port for which we have an indication that new data can be
available. Implementing that would require very little added code, since the
management of multiple channels is already present in OutputPort.
However, that is not highly generic: as I already stated, a generic
implementation would require setting up a policy to manage the
multiplexing. Still, it offers people "in the know" a simple way of
doing MO/SI. More complex schemes would still have to rely on a
multiplexing component.
What are your thoughts ? Would such a behaviour be acceptable if flagged as
"advanced, use at your own risks" ?
NB: I won't have time to switch to Peter's RTT 2.0 branch soon, so either he
will have to do it, or you will have to wait ;-)
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
On Fri, Aug 21, 2009 at 10:07, Sylvain Joyeux<sylvain [dot] joyeux [..] ...> wrote:
> OK, my brain probably worked while it was supposed to sleep, so I actually
> *do* have some kind of a solution.
>
> Caveat: it is a limited solution, i.e. it will be in the category of "for
> power users that know what they are doing".
>
> Right now, what happens is that we create one channel per port-to-port
> connections. That has the tremendous advantage of simplifying the
> implementation, reducing the number of assumptions (in particular, there is at
> most one writer and one reader per data channel), while keeping the door open
> for some optimizations.
>
> I don't want to break that model. During our discussions, it actually showed
> to be successful in solving some of the problems Peter had (like: how many
> concurrent threads can access a lock-free data structure in a data flow
> connection ? Always 2 !)
Ack.
>
> Now, it is actually possible to have a MO/SI model, by allowing an input port
> to have multiple incoming channels, and having InputPort::read round-robin on
> those channels. As an added nicety, one can listen to the "new data" event and
> access only the port for which we have an indication that new data can be
> available. Implementing that would require very little added code, since the
> management of multiple channels is already present in OutputPort.
Well, polling plus keeping a pointer to the last read channel, such
that we always try that one first, and only start polling the others
if that channel turned out 'empty'. This empty detection is
problematic for shared data connections (as opposed to buffered ones),
because once they are written, they always show a valid value. We
might need to add that once an input port is read, the sample is
consumed, and a next read will return false (no new data).
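The "consume on read" idea could look like this. The sketch is illustrative only (ConsumingDataObject is a made-up name, not an RTT class): the shared data object keeps a freshness flag that write() sets and read() clears, so a second read without an intervening write returns false while still handing out the old value.

```cpp
// Illustrative sketch, not RTT code: a single-slot data object where
// reading consumes the "new data" indication.
template <typename T>
class ConsumingDataObject {
public:
    ConsumingDataObject() : value_(), fresh_(false) {}

    void write(const T& sample) { value_ = sample; fresh_ = true; }

    // Hands out the last written value; returns true only if that
    // value was written since the previous read.
    bool read(T& sample) {
        sample = value_;
        bool was_fresh = fresh_;
        fresh_ = false;  // consume the sample
        return was_fresh;
    }
private:
    T value_;
    bool fresh_;
};
```

This is exactly the behaviour Sylvain objects to below, since an initialized connection would start reporting false again after one read.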
>
> However, that is not highly-generic: as I already stated, the generic
> implementation would require the set up of a policy to manage the
> multiplexing. Now, it actually offers the people "in the know" with a simple
> way of doing MO/SI. More complex scheme would still have to rely on a
> multiplexing component.
We all agree here.
>
> What are your thoughts ? Would such a behaviour be acceptable if flagged as
> "advanced, use at your own risks" ?
I like it. It keeps the simplicity/robustness of the data flow
implementation, while offering backwards compatibility. Even more, the
use case is so common that I'm reluctant to force complexity into each
application by making the user add another component to the dataflow
each time this occurs. I know that the purists have already hit their
reply button to tell me that all policy should be decided in
components and not in the infrastructure, but unfortunately for them,
I value the word of a single user over the decision of a committee. If
you want something else than what the data flow does, you'll have to
start adding components, and I'm sure they will be contributed once
there's a need for them.
Peter
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
> > Now, it is actually possible to have a MO/SI model, by allowing an input port
> > to have multiple incoming channels, and having InputPort::read round-robin on
> > those channels. As an added nicety, one can listen to the "new data" event and
> > access only the port for which we have an indication that new data can be
> > available. Implementing that would require very little added code, since the
> > management of multiple channels is already present in OutputPort.
>
> Well, a 'polling' + keeping a pointer to the last read channel, such
> that we try that one always first, and only start polling if that
> channel turned out 'empty'. This empty detection is problematic for
> shared data connections (opposed to buffered), because once they are
> written, they always show a valid value. We might need to add that
> once an input port is read, the sample is consumed, and a next read
> will return false (no new data).
Is there really a use case for multiple incoming but unbuffered
connections? It seems to me that the result would be quite arbitrary.
Regards
Markus
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>
> > > Now, it is actually possible to have a MO/SI model, by allowing an input port
> > > to have multiple incoming channels, and having InputPort::read round-robin on
> > > those channels. As an added nicety, one can listen to the "new data" event and
> > > access only the port for which we have an indication that new data can be
> > > available. Implementing that would require very little added code, since the
> > > management of multiple channels is already present in OutputPort.
> >
> > Well, a 'polling' + keeping a pointer to the last read channel, such
> > that we try that one always first, and only start polling if that
> > channel turned out 'empty'. This empty detection is problematic for
> > shared data connections (opposed to buffered), because once they are
> > written, they always show a valid value. We might need to add that
> > once an input port is read, the sample is consumed, and a next read
> > will return false (no new data).
>
> Is there really a usecase for multiple incoming but unbuffered
> connections? It seems to me that the result would be quite arbitrary.
Of course there is. If you think at a broader scope, there could be a
coordination component controlling the individual components such that
the results are not arbitrary at all.
In fact this is a good example of explicit vs. implicit coordination.
Regards
Markus
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>
>>>> Now, it is actually possible to have a MO/SI model, by allowing
>>>> an input port
>>>> to have multiple incoming channels, and having InputPort::read
>>>> round-robin on
>>>> those channels. As an added nicety, one can listen to the "new
>>>> data" event and
>>>> access only the port for which we have an indication that new
>>>> data can be
>>>> available. Implementing that would require very little added
>>>> code, since the
>>>> management of multiple channels is already present in OutputPort.
>>>
>>> Well, a 'polling' + keeping a pointer to the last read channel, such
>>> that we try that one always first, and only start polling if that
>>> channel turned out 'empty'. This empty detection is problematic for
>>> shared data connections (opposed to buffered), because once they
>>> are
>>> written, they always show a valid value. We might need to add that
>>> once an input port is read, the sample is consumed, and a next read
>>> will return false (no new data).
>>
>> Is there really a usecase for multiple incoming but unbuffered
>> connections? It seems to me that the result would be quite arbitrary.
>
> Of course there is. If you think at a more broader scope there could
> be a coordination component controlling the individual components such
> that the results are not arbitrary at all.
>
> In fact this is a good example of explicit vs. implicit coordination.
This is _exactly_ the situation we have in our projects. Multiple
components with unbuffered output connections, to a single input
connection on another component. A coordination component ensures that
only one of the input components is running at a time, but they are
all connected.
Here, we want the latest data value available. No more, no less.
Otherwise, Markus is correct. Having more than one input component
running simultaneously would be arbitrary and give nonsense output data.
Stephen
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
On Mon, 24 Aug 2009, S Roderick wrote:
> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>
>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>>
>>>>> Now, it is actually possible to have a MO/SI model, by allowing
>>>>> an input port
>>>>> to have multiple incoming channels, and having InputPort::read
>>>>> round-robin on
>>>>> those channels. As an added nicety, one can listen to the "new
>>>>> data" event and
>>>>> access only the port for which we have an indication that new
>>>>> data can be
>>>>> available. Implementing that would require very little added
>>>>> code, since the
>>>>> management of multiple channels is already present in OutputPort.
>>>>
>>>> Well, a 'polling' + keeping a pointer to the last read channel, such
>>>> that we try that one always first, and only start polling if that
>>>> channel turned out 'empty'. This empty detection is problematic for
>>>> shared data connections (opposed to buffered), because once they
>>>> are
>>>> written, they always show a valid value. We might need to add that
>>>> once an input port is read, the sample is consumed, and a next read
>>>> will return false (no new data).
>>>
>>> Is there really a usecase for multiple incoming but unbuffered
>>> connections? It seems to me that the result would be quite arbitrary.
>>
>> Of course there is. If you think at a more broader scope there could
>> be a coordination component controlling the individual components such
>> that the results are not arbitrary at all.
>>
>> In fact this is a good example of explicit vs. implicit coordination.
>
> This is _exactly_ the situation we have in our projects. Multiple
> components with unbuffered output connections, to a single input
> connection on another component. A coordination component ensures that
> only one of the input components is running at a time, but they are
> all connected.
>
> Here, we want the latest data value available. No more, no less.
>
> Otherwise, Markus is correct. Having more than one input component
> running simultaneously would be arbitrary and give nonsense output data.
Indeed... So, the conclusion I draw from this (sub)discussion is the
following: the _coordinated_ multi-writer use case is so special that it
does not deserve its own feature in the Data Ports part of RTT. (The
Coordinator will (have to) know about all its "data providers", and
make/delete the connections to them explicitly. So, there is no need to
"help him out" by this specific data port policy implementation.)
Herman
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
> > Now, it is actually possible to have a MO/SI model, by allowing an input port
> > to have multiple incoming channels, and having InputPort::read round-robin on
> > those channels. As an added nicety, one can listen to the "new data" event and
> > access only the port for which we have an indication that new data can be
> > available. Implementing that would require very little added code, since the
> > management of multiple channels is already present in OutputPort.
>
> Well, a 'polling' + keeping a pointer to the last read channel, such
> that we try that one always first, and only start polling if that
> channel turned out 'empty'. This empty detection is problematic for
> shared data connections (opposed to buffered), because once they are
> written, they always show a valid value. We might need to add that
> once an input port is read, the sample is consumed, and a next read
> will return false (no new data).
I don't like the idea of read() returning false on an already
initialized data connection. If you want a connection that tells you
whether it has been written since the last read(), use a buffer. Maybe
have read() return a tri-state: NO_SAMPLE, UPDATED_SAMPLE, OLD_SAMPLE,
with NO_SAMPLE being false?
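The tri-state suggestion can be sketched as follows. Names and class are illustrative only, not the actual RTT API (although, as a hedge, RTT 2.x did later adopt a FlowStatus enum in this spirit). The key point is that NO_SAMPLE is 0, so existing `if (port.read(x))` code keeps working, while OLD_SAMPLE and UPDATED_SAMPLE both test true.

```cpp
// Illustrative sketch only: a read() that distinguishes "never
// written", "written since last read" and "stale but valid".
enum FlowStatus { NO_SAMPLE = 0, OLD_SAMPLE = 1, UPDATED_SAMPLE = 2 };

template <typename T>
class TriStateDataObject {
public:
    TriStateDataObject() : value_(), written_(false), fresh_(false) {}

    void write(const T& sample) { value_ = sample; written_ = true; fresh_ = true; }

    FlowStatus read(T& sample) {
        if (!written_)
            return NO_SAMPLE;       // connection never initialized
        sample = value_;            // a valid value is always handed out
        if (fresh_) {
            fresh_ = false;
            return UPDATED_SAMPLE;  // new since the last read
        }
        return OLD_SAMPLE;          // same value as before, still true-ish
    }
private:
    T value_;
    bool written_, fresh_;
};
```

This keeps the "initialized connections always yield a value" property while still letting callers detect staleness when they care.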
Sylvain
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
On Sun, 23 Aug 2009, Sylvain Joyeux wrote:
> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>> Now, it is actually possible to have a MO/SI model, by allowing an input port
>>> to have multiple incoming channels, and having InputPort::read round-robin on
>>> those channels. As an added nicety, one can listen to the "new data" event and
>>> access only the port for which we have an indication that new data can be
>>> available. Implementing that would require very little added code, since the
>>> management of multiple channels is already present in OutputPort.
>>
>> Well, a 'polling' + keeping a pointer to the last read channel, such
>> that we try that one always first, and only start polling if that
>> channel turned out 'empty'. This empty detection is problematic for
>> shared data connections (opposed to buffered), because once they are
>> written, they always show a valid value. We might need to add that
>> once an input port is read, the sample is consumed, and a next read
>> will return false (no new data).
>
> I don't like the idea of read() returning false on an already initialized data
> connection. If you want a connection telling you if it has been written since
> last read(), use a buffer. Maybe having read() return a tri-state: NO_SAMPLE,
> UPDATED_SAMPLE, OLD_SAMPLE with NO_SAMPLE being false ?
I think RTT should only provide the simplest and easiest-to-implement
policy: each reader gets the last value that was written, and new writes
overwrite that value.
More complex policies belong to dedicated port components, each providing
one (or more) of those policies.
Herman
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
On Fri, 21 Aug 2009, Sylvain Joyeux wrote:
> OK, my brain probably worked while it was supposed to sleep, so I actually
> *do* have some kind of a solution.
>
> Caveat: it is a limited solution, i.e. it will be in the category of "for
> power users that know what they are doing".
...as is all of RTT, isn't it? :-)
> Right now, what happens is that we create one channel per port-to-port
> connections. That has the tremendous advantage of simplifying the
> implementation, reducing the number of assumptions (in particular, there is at
> most one writer and one reader per data channel), while keeping the door open
> for some optimizations.
>
> I don't want to break that model. During our discussions, it actually showed
> to be successful in solving some of the problems Peter had (like: how many
> concurrent threads can access a lock-free data structure in a data flow
> connection ? Always 2 !)
I agree with this approach. RTT should offer the simplest, most
deterministic and realtime-ready version. Any additional complexity is the
responsibility of: (i) external middleware that is made RTT-interoperable
in one way or another, or (ii) specialized communication _Components_.
> Now, it is actually possible to have a MO/SI model, by allowing an input port
> to have multiple incoming channels, and having InputPort::read round-robin on
> those channels.
Why do you suggest this particular round-robin "scheduling" policy?
> As an added nicety, one can listen to the "new data" event and
> access only the port for which we have an indication that new data can be
> available.
I think this event-driven reading is not an "added nicety", but more
fundamental than round-robin scheduling of reads.
> Implementing that would require very little added code, since the
> management of multiple channels is already present in OutputPort.
> However, that is not highly-generic: as I already stated, the generic
> implementation would require the set up of a policy to manage the
> multiplexing. Now, it actually offers the people "in the know" with a simple
> way of doing MO/SI. More complex scheme would still have to rely on a
> multiplexing component.
Agreed. But Round Robin _is_ already a multiplexing policy.
Herman
> What are your thoughts ? Would such a behaviour be acceptable if flagged as
> "advanced, use at your own risks" ?
>
> NB: I won't have time to switch to Peter's RTT 2.0 branch soon, so either he
> will have to do it, or you will have to wait ;-)
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
> > As an added nicety, one can listen to the "new data" event and
> > access only the port for which we have an indication that new data can be
> > available.
>
> I think this event-driven reading is not an "added nicety", but more
> fundamental than round-robin scheduling of reads.
Wrong terminology, sorry. It is not round-robin, but basically trying
each incoming channel one after the other, starting from the first,
until one has data. It is not a "proposed strategy"; it is basically
the only one that can be implemented without bloating the data flow
implementation.
The only "selectable policies" we could integrate would deal with the
order in which the ports are read(), i.e. round-robin, trying first
the last port which had data, ... Reading the *samples* in a FIFO
manner could not be done, for instance.
PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]
On Fri, 21 Aug 2009, Sylvain Joyeux wrote:
>>> As an added nicety, one can listen to the "new data" event and
>>> access only the port for which we have an indication that new data can be
>>> available.
>>
>> I think this event-driven reading is not an "added nicety", but more
>> fundamental than round-robin scheduling of reads.
>
> Wrong terminology, sorry. It is not a round-robin, but it is basically trying
> each incoming channel one after the other, starting from the first, until one
> has data.
Ok, it's _polling_ versus _events_, which is indeed the classical
trade-off question :-) But RTT should not make this trade-off itself:
it should offer _both_ (but not more!), since both have lots of use
cases within the scope of Orocos!
> It is not a "proposed strategy", it is basically the only one that
> can be implemented without bloating the data flow implementation.
I agree.
> The only "selectable policies" we could integrate would deal with the order in
> which the ports are read(), i.e. round-robin, trying first last port which had
> data, ... Reading the *samples* in a FIFO manner could not be done for
> instance.
I fully agree.
Herman
Re: Data Flow 2.0 Example
> This mail is to inform you of my impressions of the new data flow
> framework. First of all, the concept behind the new structure is given
> in the wiki: http://www.orocos.org/wiki/rtt/rtt-2.0/dataflow
> The idea is that *outputs* are send and forget, while *inputs* specify
> a 'policy': e.g. 'I want to read all samples -> so buffer the input'
> or: ' I want lock-based protection' or:...
> The policy is specified in a 'ConnPolicy' object which you can give to
> the input, to use as a default, or override during the connection of
> the ports during deployment.
cut...
Peter
Thank you, Peter and Sylvain. I'll try to do some tests as soon as I can.
I believe that some kind of multiplexers are indeed a very good solution for making many-to-one connections. Being a person with a "signals" point of view, I have always preferred explicit components for merging outputs, hence we have always used 1.0 in that way.
My idea has been to add an optional feature to ports to make them cloneable (or duplicatable): initially, such a port is not instantiated, but every time one makes a new connection involving that port, a new instance of it is created. People who know Simulink or 20-sim will recognize it from e.g. a summer.
If I understand correctly, one-to-many connections are still possible in 2.0, right?
I understand and respect the decision to make outputs "send and forget". However, for our systems this seems inconvenient: one output is typically connected to many inputs that do not buffer, and hence the data is copied many times. I haven't checked the code, but would it be doable to implement a policy for outputs to hold data and for unbuffered inputs to get data from a connected output?
Finally: one more vote _against_ "Topic"; I think we have a very nice signal-oriented data flow now, and "Connection" is fine. Why should it change? The fact that ROS uses Topic is really of no interest to me; I find Topic counter-intuitive, and Connection very much to the point.
Cheers, Theo.