Started reworking and reorganizing the top-level documentation of the compute graph.

pull/94/head
Christophe Favergeon 3 years ago
parent dfb67ee993
commit 5a4b6064d1

@@ -13,7 +13,7 @@ With a dynamic flow and scheduling, there is no more any way to ensure that ther
* Another node may decide to do nothing and skip the execution
* Another node may decide to raise an error.
With dynamic scheduling, a node must implement the function `prepareForRunning` and decide what to do.
With dynamic flow, a node must implement the function `prepareForRunning` and decide what to do.
3 error / status codes are reserved for this purpose. They are defined in the header `cg_status.h`. This header is not included by default, but if you define your own error codes, they should be coherent with `cg_status.h` and use the same values for the 3 status / error codes which are used in dynamic mode:
@@ -23,9 +23,9 @@ With dynamic scheduling, a node must implement the function `prepareForRunning`
Any other returned value will stop the execution.
The dynamic mode (also named asynchronous), is enabled with option : `asynchronous`
The dynamic mode (also named asynchronous mode) is enabled with the option `asynchronous` of the configuration object used with the scheduling functions.
The system will still compute a scheduling and FIFO sizes as if the flow was static. We can see the static flow as an average of the dynamic flow. In dynamic mode, the FIFOs may need to be bigger than the ones computed in static mode. The static estimation is giving a first idea of what the size of the FIFOs should be. The size can be increased by specifying a percent increase with option `FIFOIncrease`.
The system will still compute a synchronous scheduling and FIFO sizes as if the flow were static. We can see the static flow as an average of the dynamic flow. In dynamic mode, the FIFOs may need to be bigger than the ones computed in static mode. The static estimation gives a first idea of what the size of the FIFOs should be. The size can be increased by specifying a percentage increase with the option `FIFOIncrease`.
For pure compute functions (like the CMSIS-DSP ones), which are not packaged into a C++ class, there is no way to customize the decision logic in case of a problem with a FIFO. There is a global option for them: `asyncDefaultSkip`.
@@ -82,7 +82,7 @@ If the `getReadBuffer` and `getWriteBuffer` are causing an underflow or overflow
## Graph constraints
The dynamic / asynchronous mode is using a synchronous graph as average / ideal case. But it is important to understand that we are no more in static / synchronous mode and some static graph may be too complex for the dynamic mode. Let's take the following graph as example:
The dynamic mode uses a synchronous graph as the average / ideal case. But it is important to understand that we are no longer in static / synchronous mode and some static graphs may be too complex for the dynamic mode. Let's take the following graph as an example:
![async_topological2](documentation/async_topological2.png)
@@ -104,14 +104,14 @@ sink
If we use a strategy of skipping the execution of a node in case of overflow / underflow, what will happen is:
* Schedule execution 1
* Schedule iteration 1
* First `src` node execution is successful since there is a sample
* All other execution attempts will be skipped
* Schedule execution 2
* Schedule iteration 2
* First `src` node execution is successful since there is a sample
* All other execution attempts will be skipped
* ...
* Schedule execution 5:
* Schedule iteration 5:
* First `src` node execution is successful since there is a sample
* 4 other `src` node executions are skipped
* The `filter` execution can finally take place since enough data has been generated
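The behaviour described above can be reproduced with a small, self-contained Python simulation. This is only a sketch that mimics the skip strategy; it does not use the compute graph tools, and it models only the `src` → `filter` part of the graph (one new input sample per schedule iteration, 5 `src` attempts and 1 `filter` attempt per iteration, as in the scenario above):

```python
# Sketch of the skip strategy: the synchronous schedule attempts 5 "src"
# executions per iteration, but only 1 input sample is available per
# iteration, and "filter" needs 5 samples to run.
def run_schedule(iterations):
    fifo = 0          # samples between src and filter
    log = []          # (iteration, node) of successful executions
    for it in range(1, iterations + 1):
        available = 1  # one new sample per schedule iteration for src
        for node in ["src"] * 5 + ["filter"]:
            if node == "src":
                if available > 0:      # first attempt succeeds...
                    available -= 1
                    fifo += 1
                    log.append((it, "src"))
                # ...the 4 other attempts are skipped
            else:  # filter
                if fifo >= 5:          # enough data finally accumulated
                    fifo -= 5
                    log.append((it, "filter"))
                # otherwise the execution is skipped

    return log

log = run_schedule(5)
print(log[-1])  # → (5, 'filter') : filter runs for the first time in iteration 5
```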
@@ -143,5 +143,3 @@ As consequence, the recommendation in dynamic / asynchronous mode is to:
* Ensure that the amount of data produced and consumed on each FIFO end is the same (so that each node execution is attempted only once during a schedule)
* Use the maximum amount of samples required on both ends of the FIFO
* Here `src` is generating at most `1` sample and `filter` needs `5`. So we use `5` on both ends of the FIFO
* More complex graphs will create useless overhead in dynamic / asynchronous mode

@@ -0,0 +1,98 @@
# Introduction
Embedded systems are often used to implement streaming solutions : the software is processing and / or generating streams of samples. The software is made of components that have no concept of streams : they are working with buffers. As a consequence, implementing a streaming solution forces the developer to think about scheduling questions, FIFO sizing, etc.
The CMSIS-DSP compute graph is a **low overhead** solution to this problem : it makes it easier to build streaming solutions by connecting components and computing a scheduling at **build time**. The use of C++ templates also enables the compiler to have more information about the components for better code generation.
A dataflow graph is a representation of how compute blocks are connected to implement a streaming processing pipeline.
Here is an example with 3 nodes:
- A source
- A filter
- A sink
Each node is producing and consuming some amount of samples. For instance, the source node is producing 5 samples each time it is run. The filter node is consuming 7 samples each time it is run.
The FIFO lengths are represented on each edge of the graph : 11 samples for the leftmost FIFO and 5 for the other one.
In blue, the amount of samples generated or consumed by a node each time it is called.
<img src="examples/example1/docassets/graph1.PNG" alt="graph1" style="zoom:100%;" />
When the processing is applied to a stream of samples then the problem to solve is :
> **how the blocks must be scheduled and how the FIFOs connecting the blocks must be dimensioned**
The general problem can be very difficult. But, if some constraints are applied to the graph then some algorithms can compute a static schedule at build time.
When the following constraints are satisfied we say we have a Synchronous / Static Dataflow Graph:
- Static graph : the graph topology is not changing
- Each node is always consuming and producing the same number of samples (static / synchronous flow)
The CMSIS-DSP Compute Graph Tools are a set of Python scripts and C++ classes with the following features:
- A compute graph and its static flow can be described in Python
- The Python script will compute a static schedule and the optimal FIFO sizes
- A static schedule is:
- A periodic sequence of function calls
- A periodic execution where the FIFOs remain bounded
- A periodic execution with no deadlock : when a node is run there is enough data available to run it
- The Python script will generate a [Graphviz](https://graphviz.org/) representation of the graph
- The Python script will generate a C++ implementation of the static schedule
- The Python script can also generate a Python implementation of the static schedule (for use with the CMSIS-DSP Python wrapper)
There is no FIFO underflow or overflow due to the scheduling. If there are not enough cycles to run the processing, the real-time constraints will be broken and the solution won't work. But this problem is independent from the scheduling itself.
# Why it is useful
Without any scheduling tool for a dataflow graph, there is a problem of modularity : a change to a node may impact other nodes in the graph. For instance, if the number of samples consumed by a node is changed:
- You may need to change how many samples are produced by the predecessor blocks in the graph (assuming it is possible)
- You may need to change how many times the predecessor blocks must run
- You may have to change the FIFOs sizes
With the CMSIS-DSP Compute Graph (CG) Tools you don't have to think about those details while you are still experimenting with your data processing pipeline. It makes it easier to experiment, add or remove blocks, change their parameters.
The tools will generate a schedule and the FIFOs. Even if you don't use this at the end for a final implementation, the information could be useful : is the schedule too long ? Are the FIFOs too big ? Is there too much latency between the sources and the sinks ?
Let's look at an (artificial) example:
<img src="examples/example1/docassets/graph1.PNG" alt="graph1" style="zoom:100%;" />
Without a tool, the user would probably try to modify the number of samples so that the number of samples produced is equal to the number of samples consumed. With the CG Tools we know that such a graph can be scheduled and that the FIFO sizes need to be 11 and 5.
The periodic schedule generated for this graph has a length of 17 node executions. It is long for such a small graph because 5 and 7 are, indeed, not very well chosen values. But it is working even with those values.
The schedule is shown below (the number of samples in the FIFOs after the execution of each node is displayed in the brackets):
```
source [ 5 0]
source [10 0]
filter [ 3 5]
sink [ 3 0]
source [ 8 0]
filter [ 1 5]
sink [ 1 0]
source [ 6 0]
source [11 0]
filter [ 4 5]
sink [ 4 0]
source [ 9 0]
filter [ 2 5]
sink [ 2 0]
source [ 7 0]
filter [ 0 5]
sink [ 0 0]
```
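The evolution of the FIFOs in this schedule is easy to check with a few lines of Python. This is a standalone sketch of the bookkeeping only, independent of the generated code:

```python
# Replay the schedule above: source pushes 5 samples into the first FIFO,
# filter pops 7 from it and pushes 5 into the second one, sink pops 5.
schedule = ["source", "source", "filter", "sink",
            "source", "filter", "sink",
            "source", "source", "filter", "sink",
            "source", "filter", "sink",
            "source", "filter", "sink"]

fifo1, fifo2 = 0, 0
states = []
for node in schedule:
    if node == "source":
        fifo1 += 5
    elif node == "filter":
        fifo1 -= 7
        fifo2 += 5
    else:  # sink
        fifo2 -= 5
    states.append((fifo1, fifo2))

print(max(s[0] for s in states))  # → 11 : size needed for the first FIFO
print(max(s[1] for s in states))  # → 5  : size needed for the second FIFO
print(states[-1])                 # → (0, 0) : both FIFOs empty at the end
```

The maxima match the FIFO sizes 11 and 5 given earlier, and the final state `(0, 0)` is what makes the schedule periodic.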
At the end, both FIFOs are empty so the schedule can be run again : it is periodic !
The compute graph focuses on the synchronous / static case, but some extensions have been introduced for more flexibility:
* A [cyclo-static scheduling](CycloStatic.md) (nearly static)
* A [dynamic/asynchronous](Async.md) mode
Here is a summary of the different configurations supported by the compute graph. The cyclo-static scheduling is part of the static flow mode.
![supported_configs](documentation/supported_configs.png)

@@ -1,187 +1,28 @@
# Compute Graph for streaming with CMSIS-DSP
## Introduction
## Table of contents
Embedded systems are often used to implement streaming solutions : the software is processing and / or generating stream of samples. The software is made of components that have no concept of streams : they are working with buffers. As a consequence, implementing a streaming solution is forcing the developer to think about scheduling questions, FIFO sizing etc ...
1. ### [Introduction](Introduction.md)
The CMSIS-DSP compute graph is a **low overhead** solution to this problem : it makes it easier to build streaming solutions by connecting components and computing a scheduling at **build time**. The use of C++ template also enables the compiler to have more information about the components for better code generation.
2. ### [How to get started](examples/simple/README.md)
A dataflow graph is a representation of how compute blocks are connected to implement a streaming processing.
3. ### [Examples](examples/README.md)
Here is an example with 3 nodes:
4. ### [Python API](documentation/PythonAPI.md)
- A source
- A filter
- A sink
5. ### [C++ Default nodes](documentation/CPPNodes.md)
Each node is producing and consuming some amount of samples. For instance, the source node is producing 5 samples each time it is run. The filter node is consuming 7 samples each time it is run.
6. ### [Python default nodes](documentation/PythonNodes.md)
The FIFOs lengths are represented on each edge of the graph : 11 samples for the leftmost FIFO and 5 for the other one.
7. ### Extensions
In blue, the amount of samples generated or consumed by a node each time it is called.
1. #### [Cyclo-static scheduling](CycloStatic.md)
<img src="examples/example1/docassets/graph1.PNG" alt="graph1" style="zoom:100%;" />
2. #### [Dynamic / Asynchronous mode](Async.md)
When the processing is applied to a stream of samples then the problem to solve is :
8. ### [Maths principles](MATHS.md)
> **how the blocks must be scheduled and the FIFOs connecting the block dimensioned**
9. ### [FAQ](FAQ.md)
The general problem can be very difficult. But, if some constraints are applied to the graph then some algorithms can compute a static schedule at build time.
When the following constraints are satisfied we say we have a Synchronous / Static Dataflow Graph:
- Static graph : graph topology is not changing
- Each node is always consuming and producing the same number of samples (static flow)
The CMSIS-DSP Compute Graph Tools are a set of Python scripts and C++ classes with following features:
- A compute graph and its static flow can be described in Python
- The Python script will compute a static schedule and the FIFOs size
- A static schedule is:
- A periodic sequence of functions calls
- A periodic execution where the FIFOs remain bounded
- A periodic execution with no deadlock : when a node is run there is enough data available to run it
- The Python script will generate a [Graphviz](https://graphviz.org/) representation of the graph
- The Python script will generate a C++ implementation of the static schedule
- The Python script can also generate a Python implementation of the static schedule (for use with the CMSIS-DSP Python wrapper)
There is no FIFO underflow or overflow due to the scheduling. If there are not enough cycles to run the processing, the real-time will be broken and the solution won't work But this problem is independent from the scheduling itself.
## Why it is useful
Without any scheduling tool for a dataflow graph, there is a problem of modularity : a change on a node may impact other nodes in the graph. For instance, if the number of samples consumed by a node is changed:
- You may need to change how many samples are produced by the predecessor blocks in the graph (assuming it is possible)
- You may need to change how many times the predecessor blocks must run
- You may have to change the FIFOs sizes
With the CMSIS-DSP Compute Graph (CG) Tools you don't have to think about those details while you are still experimenting with your data processing pipeline. It makes it easier to experiment, add or remove blocks, change their parameters.
The tools will generate a schedule and the FIFOs. Even if you don't use this at the end for a final implementation, the information could be useful : is the schedule too long ? Are the FIFOs too big ? Is there too much latency between the sources and the sinks ?
Let's look at an (artificial) example:
<img src="examples/example1/docassets/graph1.PNG" alt="graph1" style="zoom:100%;" />
Without a tool, the user would probably try to modify the number of samples so that the number of sample produced is equal to the number of samples consumed. With the CG Tools we know that such a graph can be scheduled and that the FIFO sizes need to be 11 and 5.
The periodic schedule generated for this graph has a length of 19. It is big for such a small graph and it is because, indeed 5 and 7 are not very well chosen values. But, it is working even with those values.
The schedule is (the size of the FIFOs after the execution of the node displayed in the brackets):
```
source [ 5 0]
source [10 0]
filter [ 3 5]
sink [ 3 0]
source [ 8 0]
filter [ 1 5]
sink [ 1 0]
source [ 6 0]
source [11 0]
filter [ 4 5]
sink [ 4 0]
source [ 9 0]
filter [ 2 5]
sink [ 2 0]
source [ 7 0]
filter [ 0 5]
sink [ 0 0]
```
At the end, both FIFOs are empty so the schedule can be run again : it is periodic !
The compute graph is focusing on the synchronous / static case but some extensions have been introduced for more flexibility:
* A [cyclo-static scheduling](CycloStatic.md) (nearly static)
* A [dynamic/asynchronous](Dynamic.md) mode
Here is a summary of the different configuration supported by the compute graph. The cyclo-static scheduling is part of the static flow mode.
![supported_configs](documentation/supported_configs.png)
More details about the maths behind the code generator are available in a [separate document](MATHS.md).
## How to use the static scheduler generator
First, you must install the `CMSIS-DSP` PythonWrapper:
```
pip install cmsisdsp
```
The functions and classes inside the cmsisdsp wrapper can be used to describe and generate the schedule.
To start, you can create a `graph.py` file and include :
```python
from cmsisdsp.cg.scheduler import *
```
In this file, you can describe new type of blocks that you need in the compute graph if they are not provided by the python package by default.
Finally, you can execute `graph.py` to generate the C++ files.
The generated files need to include the `ComputeGraph/cg/src/GenericNodes.h` and the nodes used in the graph and which can be found in `cg/nodes/cpp`. Those headers are part of the CMSIS-DSP Pack. They are optional so you'll need to select the compute graph extension in the pack.
If you have declared new nodes in `graph.py` then you'll need to provide an implementation.
More details and explanations can be found in the documentation for the examples. The first example is a deep dive giving all the details about the Python and C++ sides of the tool:
* [Example 1 : how to describe a simple graph](examples/example1/README.md)
* [Example 2 : More complex example with delay and CMSIS-DSP](examples/example2/README.md)
* [Example 3 : Working example with CMSIS-DSP and FFT](examples/example3/README.md)
* [Example 4 : Same as example 3 but with the CMSIS-DSP Python wrapper](examples/example4/README.md)
* [Example 10 : The asynchronous mode](examples/example10/README.md)
Examples 5 and 6 are showing how to use the CMSIS-DSP MFCC with a synchronous data flow.
Example 7 is communicating with OpenModelica. The Modelica model (PythonTest) in the example is implementing a Larsen effect.
Example 8 is showing how to define a new custom datatype for the IOs of the nodes. Example 8 is also demonstrating a new feature where an IO can be connected up to 3 inputs and the static scheduler will automatically generate duplicate nodes.
## Frequently asked questions:
There is a [FAQ](FAQ.md) document.
## Options
There is a document describing the [list](documentation/Options.md) of available options
## How to build the examples
There is a document explaining [how to build the examples](examples/README.md).
## Limitations
- CMSIS-DSP integration must be improved to make it easier
- The code is requiring a lot more comments and cleaning
- A C version of the code generator is missing
- The code generation could provide more flexibility for memory allocation with a choice between:
- Global
- Stack
- Heap
## Default nodes
Here is a list of the nodes supported by default. More can be easily added:
- Unary:
- Unary function with header `void function(T* src, T* dst, int nbSamples)`
- Binary:
- Binary function with header `void function(T* srcA, T* srcB, T* dst, int nbSamples)`
- CMSIS-DSP function:
- It will detect if it is an unary or binary function.
- The name must not contain the prefix `arm` nor the the type suffix
- For instance, use `Dsp("mult",CType(F32),NBSAMPLES)` to use `arm_mult_f32`
- Other CMSIS-DSP function (with an instance variable) are requiring the creation of a Node if it is not already provided
- CFFT / ICFFT : Use of CMSIS-DSP CFFT. Currently only F32, F16 and Q15
- Zip / Unzip : To zip / unzip streams
- ToComplex : Map a real stream onto a complex stream
- ToReal : Extract real part of a complex stream
- FileSource and FileSink : Read/write float to/from a file (Host only)
- NullSink : Do nothing. Useful for debug
- InterleavedStereoToMono : Interleaved stereo converted to mono with scaling to avoid saturation of the addition
- Python only nodes:
- WavSink and WavSource to use wav files for testing
- VHTSDF : To communicate with OpenModelica using VHTModelica blocks

@@ -83,11 +83,11 @@ class FIFO<T,length,0,0>: public FIFOBase<T>
FIFO(uint8_t *buffer,int delay=0):mBuffer((T*)buffer),readPos(0),writePos(delay) {};
/* Not used in synchronous mode */
bool willUnderflowWith(int nb) const override {return false;};
bool willOverflowWith(int nb) const override {return false;};
int nbSamplesInFIFO() const override {return 0;};
bool willUnderflowWith(int nb) const final {return false;};
bool willOverflowWith(int nb) const final {return false;};
int nbSamplesInFIFO() const final {return 0;};
T * getWriteBuffer(int nb) override
T * getWriteBuffer(int nb) final
{
T *ret;
@@ -103,7 +103,7 @@ class FIFO<T,length,0,0>: public FIFOBase<T>
return(ret);
};
T* getReadBuffer(int nb) override
T* getReadBuffer(int nb) final
{
T *ret = mBuffer + readPos;
@@ -145,16 +145,16 @@ class FIFO<T,length,1,0>: public FIFOBase<T>
FIFO(uint8_t *buffer,int delay=0):mBuffer((T*)buffer),readPos(0),writePos(delay) {};
/* Not used in synchronous mode */
bool willUnderflowWith(int nb) const override {return false;};
bool willOverflowWith(int nb) const override {return false;};
int nbSamplesInFIFO() const override {return 0;};
bool willUnderflowWith(int nb) const final {return false;};
bool willOverflowWith(int nb) const final {return false;};
int nbSamplesInFIFO() const final {return 0;};
T * getWriteBuffer(int nb) override
T * getWriteBuffer(int nb) const final
{
return(mBuffer);
};
T* getReadBuffer(int nb) override
T* getReadBuffer(int nb) const final
{
return(mBuffer);
}
@@ -198,7 +198,7 @@ class FIFO<T,length,0,1>: public FIFOBase<T>
before using this function
*/
T * getWriteBuffer(int nb) override
T * getWriteBuffer(int nb) final
{
T *ret;
@@ -221,7 +221,7 @@ class FIFO<T,length,0,1>: public FIFOBase<T>
before using this function
*/
T* getReadBuffer(int nb) override
T* getReadBuffer(int nb) final
{
T *ret = mBuffer + readPos;
@@ -230,17 +230,17 @@ class FIFO<T,length,0,1>: public FIFOBase<T>
return(ret);
}
bool willUnderflowWith(int nb) const override
bool willUnderflowWith(int nb) const final
{
return((nbSamples - nb)<0);
}
bool willOverflowWith(int nb) const override
bool willOverflowWith(int nb) const final
{
return((nbSamples + nb)>length);
}
int nbSamplesInFIFO() const override {return nbSamples;};
int nbSamplesInFIFO() const final {return nbSamples;};
#ifdef DEBUGSCHED
void dump()
@@ -423,7 +423,7 @@ public:
Duplicate2(FIFOBase<IN> &src,FIFOBase<IN> &dst1,FIFOBase<IN> &dst2):
GenericNode12<IN,inputSize,IN,inputSize,IN,inputSize>(src,dst1,dst2){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willUnderflow() ||
this->willOverflow1() ||
@@ -435,7 +435,7 @@ public:
return(0);
};
int run() override {
int run() final {
IN *a=this->getReadBuffer();
IN *b1=this->getWriteBuffer1();
IN *b2=this->getWriteBuffer2();
@@ -475,7 +475,7 @@ public:
IN,inputSize,
IN,inputSize>(src,dst1,dst2,dst3){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willUnderflow() ||
this->willOverflow1() ||
@@ -489,7 +489,7 @@ public:
return(0);
};
int run() override {
int run() final {
IN *a=this->getReadBuffer();
IN *b1=this->getWriteBuffer1();
IN *b2=this->getWriteBuffer2();

@@ -0,0 +1,85 @@
#### Options for C Code Generation only
##### cOptionalArgs (default = "")
Optional arguments to pass to the C API of the scheduler function
It can be either a `string` or a list of `string`s, where each element is an argument of the function (and should be valid `C`).
##### codeArray (default = True)
When true, the scheduling is defined as an array. Otherwise, a list of function calls is generated.
A list of function calls may be easier to read but, if the schedule is long, it is not good for code size. In that case, it is better to encode the schedule as an array rather than a list of function calls.
When `codeArray` is True, the option `switchCase` can also be used.
##### switchCase (default = True)
`codeArray` must be true or this option is ignored.
When the schedule is encoded as an array, it can either be an array of function pointers (`switchCase` false) or an array of indexes for a state machine (`switchCase` true).
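The difference between these encodings can be illustrated with a short Python analogy. This is only a sketch: the real generator emits C / C++, and the node names below are made up for the illustration:

```python
# A tiny schedule over three hypothetical nodes.
trace = []

def source():  trace.append("source")
def filter_(): trace.append("filter")
def sink():    trace.append("sink")

# codeArray = False : the schedule is a flat list of explicit calls.
source(); source(); filter_(); sink()

# codeArray = True, switchCase = False : an array of function pointers.
schedule = [source, source, filter_, sink]
for f in schedule:
    f()

# codeArray = True, switchCase = True : an array of small indexes driving
# a state machine (more compact when the schedule is long).
nodes = {0: source, 1: filter_, 2: sink}
encoded = [0, 0, 1, 2]
for node_id in encoded:
    nodes[node_id]()

# The three encodings produce the same execution order.
print(trace)
```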
##### eventRecorder (default = False)
Enable the generation of `CMSIS EventRecorder` instrumentation in the code. The CMSIS-DSP Pack provides definitions for 3 events:
* Schedule iteration
* Node execution
* Error
##### customCName (default = "custom.h")
Name of the custom header in the generated C code. If you use several schedulers, you may want to use different headers for each one.
##### postCustomCName (default = "")
Name of a custom header in the generated C code that is included after all of the other includes.
##### genericNodeCName (default = "GenericNodes.h")
Name of the GenericNodes header in the generated C code. If you use several schedulers, you may want to use different headers for each one.
##### appNodesCName (default = "AppNodes.h")
Name of the AppNodes header in the generated C code. If you use several schedulers, you may want to use different headers for each one.
##### schedulerCFileName (default = "scheduler")
Name of the scheduler cpp and header files in the generated C code. If you use several schedulers, you may want to use different names for each one.
If the option is set to `xxx`, the names generated will be `xxx.cpp` and `xxx.h`
##### CAPI (default = True)
By default, the scheduler function is callable from C. When false, it is a standard C++ API.
##### CMSISDSP (default = True)
If you don't use any of the datatypes or functions of CMSIS-DSP, you don't need to include `arm_math.h` in the scheduler file. This option can thus be set to `False`.
##### asynchronous (default = False)
When true, the scheduling is for a dynamic / asynchronous flow. A node may not always produce or consume the same amount of data. As a consequence, a node execution can fail. Each node needs to implement a `prepareForRunning` function to identify and recover from FIFO underflows and overflows.
A synchronous schedule is used as a starting point and should describe the average case.
This option implies `codeArray` and `switchCase`, and disables `memoryOptimization`.
Synchronous FIFOs that are just buffers will be treated as real FIFOs in asynchronous mode.
More information is available in the documentation for [this mode](Dynamic.md).
##### FIFOIncrease (default 0)
In case of dynamic / asynchronous scheduling, the FIFOs may need to be bigger than what is computed assuming a static / synchronous scheduling. This option is used to increase the FIFO sizes. It represents a percentage increase.
For instance, a value of 10 means the FIFOs will have their size updated from `oldSize` to `1.1 * oldSize`, which is `(1 + 10%) * oldSize`.
If the value is a `float` instead of an `int`, it will be used as the scaling factor directly. For instance, `1.1` would scale the size by `1.1` and be equivalent to the setting `10` (for 10 percent).
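The equivalence between the two ways of writing the option can be sketched in a few lines of plain Python (an illustration of the scaling rule only; the exact rounding applied by the tool is an assumption here):

```python
import math

def fifo_scaling(value):
    """Return the scaling factor applied to a FIFO size.

    An int is read as a percentage increase, a float is used as-is.
    """
    if isinstance(value, float):
        return value
    return 1.0 + value / 100.0

old_size = 11
print(fifo_scaling(10))   # → 1.1 (10 percent increase)
print(fifo_scaling(1.1))  # → 1.1 (used as-is)

# A scaled size is not an integer in general; rounding up is assumed here.
new_size = math.ceil(old_size * fifo_scaling(10))  # 13 if the tool rounds up
```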
##### asyncDefaultSkip (default True)
Behavior of a pure function (like a CMSIS-DSP function) in asynchronous mode. When `True`, the execution is skipped if the function can't be executed. If `False`, an error is raised.
If another error recovery strategy is needed, the function must be packaged into a C++ class implementing a `prepareForRunning` function.
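Putting the asynchronous options together, a typical configuration could look like the following sketch (it assumes the `Configuration` class from the `cmsisdsp.cg.scheduler` package described elsewhere in this documentation; the option names are the ones defined above):

```python
from cmsisdsp.cg.scheduler import *

conf = Configuration()
conf.asynchronous = True      # dynamic / asynchronous flow
conf.FIFOIncrease = 20        # FIFOs 20% bigger than the static estimate
conf.asyncDefaultSkip = True  # pure functions skip execution on FIFO errors
```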

@@ -0,0 +1,54 @@
# CPP Nodes and classes
(DOCUMENTATION TO BE WRITTEN)
## Mandatory classes
FIFO
GenericNode
GenericNode12
GenericNode13
GenericNode21
GenericSource
GenericSink
Duplicate2
Duplicate3
## Optional nodes
CFFT
CIFFT
InterleavedStereoToMono
MFCC
NullSink
OverlapAndAdd
SlidingBuffer
ToComplex
ToReal
Unzip
Zip
### Host
FileSink
FileSource

@@ -0,0 +1,27 @@
### Options for the code generator
#### debugLimit (default = 0)
When `debugLimit` is > 0, the number of iterations of the scheduling is limited to `debugLimit`. Otherwise, the scheduling runs forever or until an error has occurred.
#### dumpFIFO (default = False)
When true, generate some code to dump the FIFO content at runtime. Only useful for debug.
In C++ code generation, it is only available when using the mode `codeArray == False`.
When this mode is enabled, the first line of the scheduler file is :
`#define DEBUGSCHED 1`
and it also enables some debug code in `GenericNodes.h`.
#### schedName (default = "scheduler")
Name of the scheduler function used in the generated code.
#### prefix (default = "")
Prefix to add before the FIFO buffer definitions. Those buffers are not static and are global. If you want to use several schedulers in your code, the buffer names used by each should be different.
Another possibility would be to make the buffers static by redefining the macro `CG_BEFORE_BUFFER`.

@@ -0,0 +1 @@
(DOCUMENTATION TO BE WRITTEN)

@@ -0,0 +1,42 @@
### Options for the graph
Those options need to be used on the graph object created with `Graph()`.
For instance :
```python
g = Graph()
g.defaultFIFOClass = "FIFO"
```
#### defaultFIFOClass (default = "FIFO")
Class used for the FIFOs by default. Can also be customized for each connection (`connect` or `connectWithDelay` call) with something like:
`g.connect(src.o,b.i,fifoClass="FIFOClassNameForThisConnection")`
#### duplicateNodeClassName (default = "Duplicate")
Prefix used to generate the duplicate node classes like `Duplicate2`, `Duplicate3` ...
### Options for connections
It is now possible to write something like:
```python
g.connect(src.o,b.i,fifoClass="FIFOSource")
```
The `fifoClass` argument makes it possible to choose a specific FIFO class in the generated C++ or Python.
Only the `FIFO` class is provided by default. Any new implementation must inherit from `FIFOBase<T>`.
There is also an option to set the scaling factor when used in asynchronous mode:
```python
g.connect(odd.o,debug.i,fifoScale=3.0)
```
When this option is set, it is used instead of the global setting. It must be a float.

@@ -0,0 +1,11 @@
### Options for the graphviz generator
#### horizontal (default = True)
Horizontal or vertical layout for the graph.
#### displayFIFOBuf (default = False)
By default, the graph displays the FIFO sizes. If you want to know which FIFO variable is used in the code, you can set this option to true and the graph will display the FIFO variable names.

@@ -1,220 +0,0 @@
## Options
Several options can be used in the Python to control the schedule generation. Some options are used by the scheduling algorithm and other options are used by the code generators or graphviz generator:
### Options for the graph
Those options needs to be used on the graph object created with `Graph()`.
For instance :
```python
g = Graph()
g.defaultFIFOClass = "FIFO"
```
#### defaultFIFOClass (default = "FIFO")
Class used for FIFO by default. Can also be customized for each connection (`connect` of `connectWithDelay` call) with something like:
`g.connect(src.o,b.i,fifoClass="FIFOClassNameForThisConnection")`
#### duplicateNodeClassName(default="Duplicate")
Prefix used to generate the duplicate node classes like `Duplicate2`, `Duplicate3` ...
### Options for the scheduling
Those options needs to be used on a configuration objects passed as argument of the scheduling function. For instance:
```python
conf = Configuration()
conf.debugLimit = 10
sched = g.computeSchedule(config = conf)
```
Note that the configuration object also contain options for the code generators.
#### memoryOptimization (default = False)
When the amount of data written to a FIFO and read from the FIFO is the same, the FIFO is just an array. In this case, depending on the scheduling, the memory used by different arrays may be reused if those arrays are not needed at the same time.
This option is enabling an analysis to optimize the memory usage by merging some buffers when it is possible.
#### sinkPriority (default = True)
Try to prioritize the scheduling of the sinks to minimize the latency between sources and sinks.
When this option is enabled, the tool may not be able to find a schedule in all cases. If it can't find a schedule, it will raise a `DeadLock` exception.
#### displayFIFOSizes (default = False)
During computation of the schedule, the evolution of the FIFO sizes is generated on `stdout`.
#### dumpSchedule (default = False)
During computation of the schedule, the human readable schedule is generated on `stdout`.
### Options for the code generator
#### debugLimit (default = 0)
When `debugLimit` is > 0, the number of iterations of the scheduling is limited to `debugLimit`. Otherwise, the scheduling is running forever or until an error has occured.
#### dumpFIFO (default = False)
When true, generate some code to dump the FIFO content at runtime. Only useful for debug.
In C++ code generation, it is only available when using the mode `codeArray == False`.
When this mode is enabled, the first line of the scheduler file is :
`#define DEBUGSCHED 1`
and it also enable some debug code in `GenericNodes.h`
#### schedName (default = "scheduler")
Name of the scheduler function used in the generated code.
#### prefix (default = "")
Prefix to add before the FIFO buffer definitions. Those buffers are not static and are global. If you want to use several schedulers in your code, the buffer names used by each should be different.
Another possibility would be to make the buffers static by redefining the macro `CG_BEFORE_BUFFER`.
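For instance, a sketch with two schedulers generated from two configurations (the names and prefixes are hypothetical):

```python
# Each scheduler gets its own function name and FIFO buffer prefix so the
# generated global buffer names do not clash when both are linked together.
confA = Configuration()
confA.schedName = "schedulerA"
confA.prefix = "schedA_"

confB = Configuration()
confB.schedName = "schedulerB"
confB.prefix = "schedB_"
```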
#### Options for C Code Generation only
##### cOptionalArgs (default = "")
Optional arguments to pass to the C API of the scheduler function.
It can be either a `string` or a list of `string`s where each element is an argument of the function (and should be valid `C`).
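For example (a sketch; the argument names are hypothetical):

```python
# As a single string containing all the C arguments:
conf.cOptionalArgs = "int nbSamples, float *coefs"
# Or as a list of strings, one per C argument:
conf.cOptionalArgs = ["int nbSamples", "float *coefs"]
```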
##### codeArray (default = True)
When true, the scheduling is defined as an array. Otherwise, a list of function calls is generated.
A list of function calls may be easier to read, but if the schedule is long it is not good for code size. In that case, it is better to encode the schedule as an array rather than a list of function calls.
When `codeArray` is True, the option `switchCase` can also be used.
##### switchCase (default = True)
`codeArray` must be true or this option is ignored.
When the schedule is encoded as an array, it can either be an array of function pointers (`switchCase` false) or an array of indexes for a state machine (`switchCase` true).
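A configuration sketch for the three possible encodings described above:

```python
# Schedule as a list of function calls (easier to read, bigger code):
conf.codeArray = False
# Schedule as an array of function pointers:
conf.codeArray, conf.switchCase = True, False
# Schedule as an array of indexes driving a switch/case state machine:
conf.codeArray, conf.switchCase = True, True
```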
##### eventRecorder (default = False)
Enable the generation of `CMSIS EventRecorder` instrumentation in the code. The CMSIS-DSP Pack provides definitions for 3 events:
* Schedule iteration
* Node execution
* Error
##### customCName (default = "custom.h")
Name of custom header in generated C code. If you use several schedulers, you may want to use different headers for each one.
##### postCustomCName (default = "")
Name of custom header in generated C code coming after all of the other includes.
##### genericNodeCName (default = "GenericNodes.h")
Name of GenericNodes header in generated C code. If you use several schedulers, you may want to use different headers for each one.
##### appNodesCName (default = "AppNodes.h")
Name of AppNodes header in generated C code. If you use several schedulers, you may want to use different headers for each one.
##### schedulerCFileName (default = "scheduler")
Name of the scheduler cpp and header files in generated C code. If you use several schedulers, you may want to use different file names for each one.
If the option is set to `xxx`, the names generated will be `xxx.cpp` and `xxx.h`
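A sketch for a project with a second, differently-named scheduler (all the names below are hypothetical):

```python
conf.schedulerCFileName = "audio_sched"     # generates audio_sched.cpp and audio_sched.h
conf.customCName = "audio_custom.h"
conf.genericNodeCName = "AudioGenericNodes.h"
conf.appNodesCName = "AudioAppNodes.h"
```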
##### CAPI (default = True)
By default, the scheduler function is callable from C. When false, it is a standard C++ API.
##### CMSISDSP (default = True)
If you don't use any of the datatypes or functions of CMSIS-DSP, you don't need to include `arm_math.h` in the scheduler file. This option can thus be set to `False`.
##### asynchronous (default = False)
When true, the scheduling is for a dynamic / asynchronous flow. A node may not always produce or consume the same amount of data. As a consequence, a scheduling can fail. Each node needs to implement a `prepareForRunning` function to identify and recover from FIFO underflows and overflows.
A synchronous schedule is used as a starting point and should describe the average case.
This implies `codeArray` and `switchCase`. This disables `memoryOptimization`.
FIFOs that would be just buffers in synchronous mode are considered as real FIFOs in asynchronous mode.
More information is available in the documentation for [this mode](Dynamic.md).
##### FIFOIncrease (default = 0)
In case of dynamic / asynchronous scheduling, the FIFOs may need to be bigger than what is computed assuming a static / synchronous scheduling. This option is used to increase the FIFO size. It represents a percent increase.
For instance, a value of 10 means the FIFOs will have their size updated from `oldSize` to `1.1 * oldSize`, which is `(1 + 10%) * oldSize`.
If the value is a `float` instead of an `int`, it will be used as-is. For instance, `1.1` would scale the size by `1.1` and be equivalent to the setting `10` (for 10 percent).
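The conversion from the setting to a scaling factor can be sketched as follows (an illustration of the rule above, not the tool's actual code):

```python
def fifo_scale(fifo_increase):
    """Convert a FIFOIncrease setting into a FIFO size scaling factor.

    An int is interpreted as a percent increase; a float is used as-is.
    """
    if isinstance(fifo_increase, float):
        return fifo_increase
    return 1.0 + fifo_increase / 100.0

print(fifo_scale(10))   # int: 10 percent increase
print(fifo_scale(1.1))  # float: used directly, equivalent to the setting 10
print(fifo_scale(100))  # 100 percent increase doubles the FIFO sizes
```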
##### asyncDefaultSkip (default True)
Behavior of a pure function (like CMSIS-DSP functions) in asynchronous mode. When `True`, the execution is skipped if the function can't be executed. If `False`, an error is raised.
If another error recovery is needed, the function must be packaged into a C++ class implementing a `prepareForRunning` function.
#### Options for Python code generation only
##### pyOptionalArgs (default = "")
Optional arguments to pass to the Python version of the scheduler function
##### customPythonName (default = "custom")
Name of custom header in generated Python code. If you use several schedulers, you may want to use different headers for each one.
##### appNodesPythonName (default = "appnodes")
Name of AppNodes header in generated Python code. If you use several schedulers, you may want to use different headers for each one.
##### schedulerPythonFileName (default = "sched")
Name of scheduler file in generated Python code. If you use several schedulers, you may want to use different file names for each one.
If the option is set to `xxx`, the name generated will be `xxx.py`
### Options for the graphviz generator
#### horizontal (default = True)
Horizontal or vertical layout for the graph.
#### displayFIFOBuf (default = False)
By default, the graph displays the FIFO sizes. If you want to know which FIFO variable is used in the code, you can set this option to true and the graph will display the FIFO variable names.
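For instance, a configuration sketch using both graphviz options:

```python
conf.horizontal = False      # vertical layout instead of the default horizontal one
conf.displayFIFOBuf = True   # show FIFO variable names instead of FIFO sizes
```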
### Options for connections
It is now possible to write something like:
```python
g.connect(src.o,b.i,fifoClass="FIFOSource")
```
The `fifoClass` argument makes it possible to choose a specific FIFO class in the generated C++ or Python.
Only the `FIFO` class is provided by default. Any new implementation must inherit from `FIFObase<T>`.
There is also an option to set the scaling factor when used in asynchronous mode:
```python
g.connect(odd.o,debug.i,fifoScale=3.0)
```
When this option is set, it will be used for this FIFO instead of the global setting. It must be a float.

# Python API
Python APIs to describe the nodes and graph and generate the C++, Python or Graphviz code.
1. ## [Graph class](Graph.md)
2. ## [Generic Node, Source and Sink classes](Generic.md)
3. ## Scheduler
1. ### [Scheduler options](SchedOptions.md)
2. ### [Code generation](CodegenOptions.md)
1. #### [C Code generation](CCodeGen.md)
2. #### [Python code generation](PythonGen.md)
3. ### [Graphviz representation](GraphvizGen.md)

# Python Nodes and classes
(DOCUMENTATION TO BE WRITTEN)
## Mandatory classes
FIFO
GenericNode
GenericNode12
GenericNode13
GenericNode21
GenericSource
GenericSink
OverlapAdd
SlidingBuffer
## Optional nodes
CFFT
CIFFT
InterleavedStereoToMono
MFCC
NullSink
ToComplex
ToReal
Unzip
Zip
Duplicate
Duplicate2
Duplicate3
### Host
FileSink
FileSource
WavSource
WavSink
NumpySink
VHTSource
VHTSink

## How to build the examples
First, you must install the `CMSIS-DSP` PythonWrapper:
```
pip install cmsisdsp
```
The functions and classes inside the cmsisdsp wrapper can be used to describe and generate the schedule.
You need a recent Graphviz `dot` tool supporting the HTML-like labels. You'll also need `cmake` and `make`.
In folder `ComputeGraph/example/build`, type the `cmake` command:
```bash
cmake -DHOST=YES \
-G "Unix Makefiles" ..
```
The Graphviz dot tool is requiring a recent version supporting the HTML-like labels.
The core include directory is `...CMSIS_5/Core` ...
If cmake is successful, you can type `make` to build the examples. It will also build CMSIS-DSP for the host.
For `example3` which is using an input file, `cmake` should have copied the input
```
python main.py
```
`example7` is communicating with `OpenModelica`. You need to install the VHTModelica blocks from the [AVH-SystemModeling](https://github.com/ARM-software/VHT-SystemModeling) project on our GitHub.
# List of examples
* [Simple example](simple/README.md) : How to get started
* [Example 1](example1/README.md) : Same as the simple example but explaining how to add arguments to the scheduler API and node constructors
* [Example 2](example2/README.md) : Explain how to use CMSIS-DSP pure functions (no state) and add delay on the arcs of the graph. Explain some configuration options for the schedule generation.
* [Example 3](example3/README.md) : A full signal processing example with CMSIS-DSP using FFT and sliding windows and an overlap-and-add node
* [Example 4](example4/README.md) : Same as example 3 but where we generate a Python implementation rather than a C++ implementation. The resulting graph can be executed thanks to the CMSIS-DSP Python wrapper
* [Example 5](example5/README.md) : Another pure Python example showing how to compute a sequence of Q15 MFCC and generate an animation (using also the CMSIS-DSP Python wrapper)
* [Example 6](example6/README.md) : Same as example 5 but with C++ code generation
* [Example 7](example7/README.md) : Pure Python example demonstrating a communication between the compute graph and OpenModelica to generate a Larsen effect
* [Example 8](example8/README.md) : Introduce structured datatype for the samples and implicit `Duplicate` nodes for the graph
* [Example 9](example9/README.md) : Check that duplicate nodes and arc delays are working together and a scheduling is generated
* [Example 10 : The dynamic dataflow mode](example10/README.md)

# Reference statistics
The different examples should return the following schedule statistics:
## Example 1
Schedule length = 17
Memory usage 64 bytes
## Example 2
Schedule length = 302
Memory usage 10720 bytes
## Example 3
Schedule length = 25
Memory usage 11264 bytes
## Example 4
Schedule length = 25
Memory usage 11264 bytes
## Example 5
Schedule length = 292
Memory usage 6614 bytes
## Example 6
Schedule length = 17
Memory usage 2204 bytes
## Example 7
Schedule length = 3
Memory usage 512 bytes
## Example 8
Schedule length = 37
Memory usage 288 bytes

Please refer to the [simple example](../simple/README.md) to have an overview of how to define a graph and its nodes and how to generate the C++ code for the static scheduler. This document only explains additional details.
This example is implementing a [dynamic / asynchronous mode](../../Async.md).
It is enabled in `graph.py` with:
`conf.asynchronous = True`
There is an option to increase the FIFO sizes compared to their synchronous values. To double the values (increase by `100%`) we write:
`conf.FIFOIncrease = 100`
The even source is generating a value only when the count is even.
The processing is adding its inputs. If no data is available on an input, 0 is used.
In case of FIFO overflow or underflow, any node will skip its execution.
All nodes are generating or consuming one sample but the FIFOs have a size of 2 because of the 100% increase requested in the configuration settings.
Thus in this example :
* A sample is not always generated on an edge
* A sample is not always available on an edge
The dataflow on each edge is thus not static and varies between iterations of the schedule.
## Expected outputs

# README
This example is inside the folder `examples/simple` of the Compute graph folder.
This example explains how to create a very simple synchronous compute graph with 3 nodes:
![simple](docassets/simple.png)
The processing node is working on packets of 7 values.
The graph is described with a Python script `create.py` and this document will explain how to write this Python script to define the nodes and their connections.
When this Python script is executed, it will compute a static schedule and generate a C++ implementation. This implementation is using some C++ wrappers that must have been defined somewhere. This document will explain how to write those wrappers and make them available to the C++ scheduler.
To run the script you first must install the CMSIS-DSP Python package:
A graphical representation of the graph is generated in graphviz dot format. If the `dot` tool is installed, the picture can be generated with:
`dot -Tpng -o simple.png simple.dot`
The executable can be built (as explained below) by compiling the files `scheduler.cpp` and `main.cpp`.
## How to write the Python script
Let's look at the required steps in reverse order, starting first with how to generate the C++ code for the scheduler.
### Generating the C++ code and the Graphviz representation
The Python script `create.py` will generate the C++ scheduler when run. This script assumes that the nodes and the graph have already been defined somewhere else. The first lines of this script import the nodes and graph definitions:
```python
from nodes import *
print("Schedule length = %d" % scheduling.scheduleLength)
print("Memory usage %d bytes" % scheduling.memory)
```
The scheduling length is the number of node executions required for one scheduling iteration.
The memory usage is the space required by all the FIFOs expressed in bytes.
Now that we have computed the scheduling, we are ready to generate the C++ implementation:
```python
scheduling.ccode("generated",conf)
```
`"generated"` is the name of the folder where the files are generated (relative to the working directory of the script). It is possible to customize the naming of the generated files using the `Configuration` object `conf`.
We can also generate a `graphviz` file that can then be processed with the `dot` tool to generate a picture of the graph:
We need the definitions from the CMSIS-DSP Python wrapper to define the datatypes:

```python
floatType = CType(F32)
```
#### How to instantiate the nodes
The nodes are created like any other Python object. The API is not standardized. The compute graph should be able to work with any library of standard components. In this example, the node APIs first list the inputs, then the outputs. And for each IO, we define the data type and the number of samples produced or consumed.
##### How to instantiate the source:
```python
src = Source("source",floatType,5)
```
A Python object `src` is created from the Python class `Source`. In the generated code, and in the pictures of the graph, this node will be named "source". This name must thus be a valid C variable name.
The datatype is the second argument of the constructor. It is the float datatype we defined just before. The last argument is the number of samples produced by the node at each execution: 5 samples.
##### How to instantiate the processing node:
```python
processing = ProcessingNode("processing",floatType,7,7)
```
It is very similar to the source. We just need to specify two sizes: the number of samples consumed and the number of samples produced. This node is using the same data type for both input and output.
As we will see later, the C++ implementation of the node is only supporting the case where the number of samples produced is equal to the number of samples consumed. If it is not the case, the solution won't build. It is caught at the type system level. This constraint could have been enforced at the Python level.
It demonstrates that a Python description of a node can be very generic and anticipate future use cases without introducing problems at runtime, since some validation occurs on the C++ side.
##### How to instantiate the sink:
```python
sink = Sink("sink",floatType,5)
```

The script `nodes.py` is defining the nodes needed for this example. The first line imports the generic node classes:

```python
from cmsisdsp.cg.scheduler import GenericNode,GenericSink,GenericSource
```
#### The source
The source is defined with:
```python
class Source(GenericSource):
return "Source"
```
It is a lot but it is not complex. Let's detail each part of this definition:
There is a last part in the definition of the node:

```python
return "Source"
```
This defines the name of the C++ class implementing the node.
This defines the name of the C++ wrapper implementing the node.
#### The processing node
The C++ template also provides some entry points to enable the scheduler to:
* Access to the FIFOs
* Running of the code
Those C++ templates should thus be very light and that's why we prefer to speak of C++ wrappers rather than C++ objects. The code for the algorithms will generally be outside of those wrappers (and will often be in C).
Those templates are defined in a file `AppNodes.h` included by the scheduler (it is possible to change the name from the Python script). This file must be provided by the user of the ComputeGraph framework.
### The source C++ wrapper
First, like with Python, we need to define the datatype:
This template can be used to implement different kinds of `Source` classes: with different datatypes and output sizes.
You don't need to be knowledgeable in C++ templates to start using them in the context of the compute graph. They are just here to define the plumbing.
The only thing to understand is that:

* `Source<X,Y>` is the datatype where the template arguments have been replaced by the types `X` and `Y`.
* `Source<X,Y>` is a different datatype than `Source<X',Y'>` if `X` and `X'` are for instance different types
* `X` and `Y` may be numbers (so a number is considered as a type in this context)

When you have declared a C++ template, you need to implement it. There are two ways to do it:

* You can define a generic implementation for `Source`
* And/or you can define specialized implementations for specific types (`Source<X,Y>`).
For the `Source` we have defined a generic implementation, so we need (like in the Python case) to inherit from `GenericSource`:
We also need to initialize the `GenericSource` parent since we are inheriting from it.
The constructor is here doing nothing more than initializing the parent and the implementation is empty `{}`
The implementation of `Source` needs to provide an entry point to be usable from the scheduler. It is the `run` function. As said before, since the algorithm is very simple it has been implemented in `run`. In general, `run` is just calling an external function with the buffers coming from the FIFOs.
```C++
int run() final {
    // ...
};
```
The first line is the important one:
```C++
OUT *b=this->getWriteBuffer();
```
We get a pointer to be able to write in the output FIFO. This pointer has the datatype `OUT` coming from the template so it can be anything.
**Those functions (`getWriteBuffer` and/or `getReadBuffer`) must always be used even if the node is doing nothing because FIFOs are only updated when those functions are used.**
So for each IO, the corresponding function must be called even if nothing is read or written on this IO. Of course, in a synchronous mode it would not make sense to do nothing with an IO. But, sometimes, for debug, it can be interesting to have nodes like a `NullSink` that would just consume everything but do nothing.
The code in the loop is casting an `int` (the loop index) into the `OUT` datatype. If it is not possible it won't typecheck and build.
```C++
for(int i=0;i<outputSize;i++)
{
    b[i] = (OUT)i;
}
```
So, although we have not provided a specific implementation of the template, this template can only work with specific `OUT` datatypes because of the implementation. It is not a generic implementation.
The return value of the function `run` informs the scheduler that no error occurred. In synchronous mode, errors (like underflow or overflow) cannot occur due to the scheduling but only because of broken real time behavior. So any error returned by a node will stop the scheduling.
```C++
class ProcessingNode<IN,inputOutputSize,IN,inputOutputSize>
```
This enforces that the `OUT` datatype is equal to the `IN` datatype since `IN` is used in both arguments.
It also envorces that the input and output sizes are the same since `inputOutputSize` is used in the two arguments for the size.
It also enforces that the input and output sizes are the same since `inputOutputSize` is used in the two arguments for the size.
Since the arguments of the template are still not fully specified and there is some remaining degree of freedom, we need to continue to define some template parameters:
```C++
class ProcessingNode:
public GenericNode<IN,inputSize,OUT,outputSize>
```
In the generic implementation we do not use `<>` after `ProcessingNode` since we do not specify specific values of the template arguments.
It is possible to have several specializations of the same class.
One could also have another specialization like:
It is a C API that can be used from C code.
In case of error, the function returns:
* the number of schedule iterations computed since the beginning
* an error code.
It is possible, from the Python script, to add arguments to this API when there is a need to pass additional information to the nodes.
