Mixture of Gaussian implementation

Hi all,

With this mail I want to start a discussion on how to implement a mixture of
Gaussians in BFL.

I would propose making a mixtureGaussian class a child class of discretePdf.
Some functions would then need to be reimplemented, e.g. the sampleFrom function.
This sampleFrom function would first call the sampleFrom function of
discretePdf (discretePdf::sampleFrom(...)) and then, using the obtained
sample, draw a sample from the corresponding Gaussian.
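
The two-stage sampling described above can be sketched as follows. This is
only a minimal illustration using the C++ standard library, not BFL code: the
Component struct and the sampleFromMixture function are hypothetical names,
and the components are 1-D to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical 1-D mixture component; BFL's own Gaussian class is not used.
struct Component {
    double weight;  // mixing weight (need not be normalised here)
    double mean;
    double stddev;  // must be > 0
};

// Stage 1: draw a component index from the discrete distribution over the
// weights. Stage 2: draw a sample from the Gaussian with that index.
template <class Rng>
double sampleFromMixture(const std::vector<Component>& comps, Rng& rng) {
    assert(!comps.empty());

    std::vector<double> weights;
    weights.reserve(comps.size());
    for (const Component& c : comps) weights.push_back(c.weight);

    // std::discrete_distribution normalises the weights internally.
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    const Component& chosen = comps[static_cast<std::size_t>(pick(rng))];

    std::normal_distribution<double> gauss(chosen.mean, chosen.stddev);
    return gauss(rng);
}
```

Calling sampleFromMixture(comps, rng) with, say, a std::mt19937 generator
returns one draw; which component it came from is discarded, as it would be
for a sample from the mixture itself.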

A special need for mixture of Gaussians is the ability to change the number of
components (addComponent, deleteComponent). These functions would need the
function DimensionSet and therefore this discussion is linked to bug #463.

Eager to hear your opinions,

Tinne

_______________________________________________
I hereby promise not to top-post on the
BFL mailing list
BFL [..] ...
http://lists.mech.kuleuven.be/mailman/listinfo/bfl

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

Mixture of Gaussian implementation

On Dec 7, 2007 10:09 AM, Tinne De Laet <tinne [dot] delaet [..] ...> wrote:
> With this mail I want to start a discussion on how to implement a mixture of
> Gaussians in BFL.
>
> I would propose making a mixtureGaussian class a child class of discretePdf.
> Some functions would then need to be reimplemented, e.g. the sampleFrom function.
> This sampleFrom function would first call the sampleFrom function of
> discretePdf (discretePdf::sampleFrom(...)) and then, using the obtained
> sample, draw a sample from the corresponding Gaussian.

What's the advantage of inheritance compared to creating a class that
inherits from pdf and has a pointer to a discretePdf and a
vector of size "discretePdf".dimensionGet()?

> A special need for mixture of Gaussians is the ability to change the number of
> components (addComponent, deleteComponent). These functions would need the
> function DimensionSet and therefore this discussion is linked to bug #463.

This had somewhat to do with the definition of "dimension". The
dimension attribute was originally added (at the time bfl did not yet
support discrete and hybrid pdfs) to describe the size of the
continuous argument of BFL.

When adding discrete, I think I must have somehow decided, "well,
let's leave the dimension in the code of Pdf and give it that meaning
in the case of a discrete pdf."

Now it seems to me that this wasn't the best choice after all?

Concerning the mixture of gaussians: as addComponent and
deleteComponent are non-realtime operations anyway, one might consider
deleting the discretepdf and creating a new one afterwards too as a
solution that doesn't need the DimensionSet() call.

Klaas

Mixture of Gaussian implementation

> > I would propose making a mixtureGaussian class a child class of
> > discretePdf. Some functions would then need to be reimplemented, e.g. the
> > sampleFrom function. This sampleFrom function would first call the
> > sampleFrom function of discretePdf (discretePdf::sampleFrom(...)) and
> > then, using the obtained sample, draw a sample from the corresponding
> > Gaussian.
>
> What's the advantage of inheritance compared to creating a class that
> inherits from pdf and has a pointer to a discretePdf and a
> vector of size "discretePdf".dimensionGet()?
That was indeed another option.
In my view, however, the mixture of Gaussians is a special case of a
discretePdf (though this is open for discussion).
I believe it could inherit some functions from discretePdf, so making it a
child class would avoid some duplicate implementation.

> > A special need for mixture of Gaussians is the ability to change the
> > number of components (addComponent, deleteComponent). These functions
> > would need the function DimensionSet and therefore this discussion is
> > linked to bug #463.
>
> This had somewhat to do with the definition of "dimension". The
> dimension attribute was originally added (at the time bfl did not yet
> support discrete and hybrid pdfs) to describe the size of the
> continuous argument of BFL.
>
> When adding discrete, I think I must have somehow decided, "well,
> let's leave the dimension in the code of Pdf and give it that meaning
> in the case of a discrete pdf."
>
> Now it seems to me that this wasn't the best choice after all?
Mmm, I'm not so sure yet.
In a Gaussian it represents the dimension of the state (and therefore of the
mean and the covariance).
If you consider the discretePdf as a 1-of-K representation (i.e. a
representation in which the one of the K components that corresponds to the
actual state is 1 and all others are 0, e.g. 1 0 0, 0 1 0, 0 0 1), the
dimension of a discretePdf has the same meaning as the dimension of a
Gaussian, i.e. the dimension of the state.
Following this line of thought, I believe the same reasons for using dimension
in a Gaussian should hold for using dimension in a discretePdf.
Could you therefore clarify why you introduced a dimension for the Gaussian?
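
To make the 1-of-K reading concrete, here is a small sketch. The function
name oneOfK is hypothetical and not part of BFL; it only illustrates that the
length of the encoded vector equals the number of discrete states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Encode the 0-based discrete state `state` out of `numStates` possible
// states as a 1-of-K vector: a 1 at the state's position, 0 elsewhere.
std::vector<int> oneOfK(std::size_t state, std::size_t numStates) {
    assert(state < numStates);
    std::vector<int> encoded(numStates, 0);
    encoded[state] = 1;
    return encoded;
}
```

With three states, state 1 is encoded as 0 1 0, so under this reading the
"dimension" of the discrete pdf is 3.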

> Concerning the mixture of gaussians: as addComponent and
> deleteComponent are non-realtime operations anyway, one might consider
> deleting the discretepdf and creating a new one afterwards too as a
> solution that doesn't need the DimensionSet() call.
I also considered this solution. In my humble opinion, however, it can result
in a lot of copying, which can get quite time-consuming (you have to copy all
the Gaussians...).

Tinne

Mixture of Gaussian implementation

On Fri, 2007-12-07 at 11:20 +0100, Tinne De Laet wrote:
> > > I would propose making a mixtureGaussian class a child class of
> > > discretePdf. Some functions would then need to be reimplemented, e.g. the
> > > sampleFrom function. This sampleFrom function would first call the
> > > sampleFrom function of discretePdf (discretePdf::sampleFrom(...)) and
> > > then, using the obtained sample, draw a sample from the corresponding
> > > Gaussian.
> >
> > What's the advantage of inheritance compared to creating a class that
> > inherits from pdf and has a pointer to a discretePdf and a
> > vector of size "discretePdf".dimensionGet()?
> That was indeed another option.
> In my view, however, the mixture of Gaussians is a special case of a
> discretePdf (though this is open for discussion).
> I believe it could inherit some functions from discretePdf, so making it a
> child class would avoid some duplicate implementation.

I don't think that inheriting from pdf will lead to duplicate
implementation. I think that most functions have to be reimplemented anyway:
probabilityGet, sampleFrom, ...

I also think that inheriting from a discretePdf will cause a lot of
confusion:
1) What is the meaning of the dimension of a MoG?
If the class MoG inherits from a discretePdf, then it means the # of
Gaussians, but I would expect it to be the dimension of the space it can
represent.

2) What is the meaning of probabilitySet and probabilityGet?
Do they get/set the weights, or get/set the probability? How
can you set a probability of a MoG?

You _could_ overload probabilitySet and probabilityGet:
  Probability probabilityGet(int nr) -> get weight
  Probability probabilityGet(ColumnVector) -> get probability

But won't that create even more confusion?
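
One way to sidestep the overload question is to give the two quantities
different names. The sketch below is an assumption for illustration only, not
BFL's API: the names MixtureSketch, weightGet and densityGet are hypothetical,
the components are 1-D, and the weights are assumed to be normalised already.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical 1-D component; not BFL's Gaussian class.
struct Component {
    double weight;  // assumed normalised over all components
    double mean;
    double stddev;  // must be > 0
};

class MixtureSketch {
public:
    explicit MixtureSketch(std::vector<Component> comps)
        : comps_(std::move(comps)) {}

    // Weight of component k: the "discrete" quantity.
    double weightGet(std::size_t k) const {
        assert(k < comps_.size());
        return comps_[k].weight;
    }

    // Mixture density at x: the weighted sum of the component densities.
    double densityGet(double x) const {
        const double sqrtTwoPi = 2.5066282746310002;
        double p = 0.0;
        for (const Component& c : comps_) {
            const double z = (x - c.mean) / c.stddev;
            p += c.weight * std::exp(-0.5 * z * z) / (c.stddev * sqrtTwoPi);
        }
        return p;
    }

private:
    std::vector<Component> comps_;
};
```

With separate names, a call site states directly whether it wants a
component weight or the value of the mixture density, which is the ambiguity
raised above.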

> > Concerning the mixture of gaussians: as addComponent and
> > deleteComponent are non-realtime operations anyway, one might consider
> > deleting the discretepdf and creating a new one afterwards too as a
> > solution that doesn't need the DimensionSet() call.
> I also considered this solution. In my humble opinion this can however result
> in a lot of copying which can get quite time consuming (you have to copy all
> the Gaussians .... )
You only have to copy the weights of the gaussians.

I think that it would be best to remove the function DimensionSet and
make the variable _dimension protected instead of private. I don't
think we need a function that changes the dimension in an
uncontrolled way. If you want to do that, it is better to create a new
object.

For a discretePdf it could be interesting to add or remove a state.
Adding these kinds of functions would solve the problem for a MoG in the
cleanest way IMHO.

To summarise, I would implement MoG like this:
* Inherit from pdf for MoG
* Remove DimensionSet from pdf.h
* Make _dimension protected, so that when an uninitialised instance is
created it can still set _dimension if the dimension is 0.
* Add functions StateAdd() and StateRemove(stateNr) to discrete pdf
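
As a rough illustration of the last point, adding and removing a state could
look like the sketch below. This is a stand-in written for this thread, not
BFL's discrete pdf: the class name DiscretePdfSketch is hypothetical, and both
the probability argument of StateAdd and the renormalisation after each
change are assumptions that the proposal above does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a discrete pdf; not BFL's actual class.
class DiscretePdfSketch {
public:
    // Start with a uniform distribution over numStates states.
    explicit DiscretePdfSketch(std::size_t numStates)
        : probs_(numStates,
                 numStates ? 1.0 / static_cast<double>(numStates) : 0.0) {}

    std::size_t NumStatesGet() const { return probs_.size(); }

    double ProbabilityGet(std::size_t state) const {
        assert(state < probs_.size());
        return probs_[state];
    }

    // Append a state with unnormalised mass `prob`, then renormalise.
    void StateAdd(double prob) {
        assert(prob >= 0.0);
        probs_.push_back(prob);
        Normalize();
    }

    // Remove state `stateNr` and renormalise the remaining states.
    void StateRemove(std::size_t stateNr) {
        assert(stateNr < probs_.size());
        probs_.erase(probs_.begin() + static_cast<std::ptrdiff_t>(stateNr));
        Normalize();
    }

private:
    void Normalize() {
        const double sum = std::accumulate(probs_.begin(), probs_.end(), 0.0);
        if (sum > 0.0) {
            for (double& p : probs_) p /= sum;
        }
    }

    std::vector<double> probs_;
};
```

Only the probabilities are touched here, so a MoG built on top of such a
class would not need a DimensionSet() call to add or drop a component.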

François

Mixture of Gaussian implementation

> To summarise, I would implement MoG like this:
> * Inherit from pdf for MoG
I agree.
> * Remove DimensionSet from pdf.h
To make it all clearer, I propose an even bigger change.
I propose to remove the _dimension variable from the pdf class and instead
implement it in the classes that really need it (i.e. Gaussian). In that
case dimension really is the dimension of the state variable.
For the discrete pdf I propose to introduce a variable _numStates that holds
the number of discrete states.
For the MoG, both the _dimension and the _numStates variables can then be
implemented and used. This way the meaning of these variables will be
consistent.
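
A minimal sketch of how this split could look is given below. All class
names ending in Sketch, and the ComponentAdd function, are hypothetical and
only illustrate where _dimension and _numStates would live; covariances and
the actual pdf interface are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Base class without a _dimension member (hypothetical, not BFL's Pdf).
class PdfSketch {
public:
    virtual ~PdfSketch() = default;
};

// Continuous pdf: _dimension is the size of the state vector.
class GaussianSketch : public PdfSketch {
public:
    explicit GaussianSketch(const std::vector<double>& mean)
        : _dimension(mean.size()), _mean(mean) {}
    std::size_t DimensionGet() const { return _dimension; }

protected:
    std::size_t _dimension;
    std::vector<double> _mean;  // covariance omitted for brevity
};

// Discrete pdf: _numStates is the number of discrete states.
class DiscretePdfSketch : public PdfSketch {
public:
    explicit DiscretePdfSketch(std::size_t numStates)
        : _numStates(numStates) {}
    std::size_t NumStatesGet() const { return _numStates; }

protected:
    std::size_t _numStates;
};

// Mixture of Gaussians: carries both variables, each with one meaning.
class MixtureSketch : public PdfSketch {
public:
    explicit MixtureSketch(std::size_t dimension)
        : _dimension(dimension), _numStates(0) {}
    std::size_t DimensionGet() const { return _dimension; }
    std::size_t NumStatesGet() const { return _numStates; }

    // Every component must live in the same state space as the mixture.
    void ComponentAdd(double weight, const GaussianSketch& g) {
        assert(g.DimensionGet() == _dimension);
        _weights.push_back(weight);
        _components.push_back(g);
        ++_numStates;
    }

protected:
    std::size_t _dimension;   // dimension of the state variable
    std::size_t _numStates;   // number of mixture components
    std::vector<double> _weights;
    std::vector<GaussianSketch> _components;
};
```

In this layout adding a component changes _numStates but never _dimension,
which matches the two distinct meanings proposed above.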

> * Make _dimension protected, so that when an uninitialised instance is
> created it can still set _dimension if the dimension is 0.
Both _dimension and _numStates will be protected.
> * Add a function StateAdd() and StateRemove(stateNr) to discrete pdf
Indeed, also add them to MoG.

I will try to make a proposal for the changes related to _dimension and
_numStates.

Tinne

Mixture of Gaussian implementation

On Dec 7, 2007 3:34 PM, François Cauwe <francois [..] ...> wrote:
> On Fri, 2007-12-07 at 11:20 +0100, Tinne De Laet wrote:
> > > > I would propose making a mixtureGaussian class a child class of
> > > > discretePdf. Some functions would then need to be reimplemented, e.g. the
> > > > sampleFrom function. This sampleFrom function would first call the
> > > > sampleFrom function of discretePdf (discretePdf::sampleFrom(...)) and
> > > > then, using the obtained sample, draw a sample from the corresponding
> > > > Gaussian.
> > >
> > > What's the advantage of inheritance compared to creating a class that
> > > inherits from pdf and has a pointer to a discretePdf and a
> > > vector of size "discretePdf".dimensionGet()?
> > That was indeed another option.
> > In my view, however, the mixture of Gaussians is a special case of a
> > discretePdf (though this is open for discussion).
> > I believe it could inherit some functions from discretePdf, so making it a
> > child class would avoid some duplicate implementation.
>
> I don't think that inheriting from pdf will lead to duplicate
> implementation. I think that most functions have to be reimplemented anyway:
> probabilityGet, sampleFrom, ...
>
> I also think that inheriting from a discretePdf will cause a lot of
> confusion:
> 1) What is the meaning of the dimension of a MoG?
> If the class MoG inherits from a discretePdf, then it means the # of
> Gaussians, but I would expect it to be the dimension of the space it can
> represent.
>
> 2) What is the meaning of probabilitySet and probabilityGet?
> Do they get/set the weights, or get/set the probability? How
> can you set a probability of a MoG?
>
> You _could_ overload probabilitySet and probabilityGet:
>   Probability probabilityGet(int nr) -> get weight
>   Probability probabilityGet(ColumnVector) -> get probability
>
> But won't that create even more confusion?
>
> > > Concerning the mixture of gaussians: as addComponent and
> > > deleteComponent are non-realtime operations anyway, one might consider
> > > deleting the discretepdf and creating a new one afterwards too as a
> > > solution that doesn't need the DimensionSet() call.
> > I also considered this solution. In my humble opinion, however, it can result
> > in a lot of copying, which can get quite time-consuming (you have to copy all
> > the Gaussians...).
> You only have to copy the weights of the gaussians.
>
> I think that it would be best to remove the function DimensionSet and
> make the variable _dimension protected instead of private. I don't
> think we need a function that changes the dimension in an
> uncontrolled way. If you want to do that, it is better to create a new
> object.
>
> For a discretePdf it could be interesting to add or remove a state.
> Adding these kinds of functions would solve the problem for a MoG in the
> cleanest way IMHO.
>
> To summarise, I would implement MoG like this:
> * Inherit from pdf for MoG
> * Remove DimensionSet from pdf.h
> * Make _dimension protected, so that when an uninitialised instance is
> created it can still set _dimension if the dimension is 0.
> * Add functions StateAdd() and StateRemove(stateNr) to discrete pdf

It's like you can read my mind :-)
I agree with all of the above.

Klaas