Started to rework the documentation for the ComputeGraph examples.

pull/94/head
Christophe Favergeon 3 years ago
parent 95dc3f3807
commit dfb67ee993

@ -20,7 +20,7 @@ The FIFOs lengths are represented on each edge of the graph : 11 samples for the
In blue, the number of samples generated or consumed by a node each time it is called.
<img src="documentation/graph1.PNG" alt="graph1" style="zoom:50%;" />
<img src="examples/example1/docassets/graph1.PNG" alt="graph1" style="zoom:100%;" />
When the processing is applied to a stream of samples, the problem to solve is:
@ -61,7 +61,7 @@ The tools will generate a schedule and the FIFOs. Even if you don't use this at
Let's look at an (artificial) example:
<img src="documentation/graph1.PNG" alt="graph1" style="zoom:50%;" />
<img src="examples/example1/docassets/graph1.PNG" alt="graph1" style="zoom:100%;" />
Without a tool, the user would probably try to modify the number of samples so that the number of samples produced is equal to the number of samples consumed. With the CG Tools we know that such a graph can be scheduled and that the FIFO sizes need to be 11 and 5.
@ -128,11 +128,11 @@ If you have declared new nodes in `graph.py` then you'll need to provide an impl
More details and explanations can be found in the documentation for the examples. The first example is a deep dive giving all the details about the Python and C++ sides of the tool:
* [Example 1 : how to describe a simple graph](documentation/examples/example1/README.md)
* [Example 2 : More complex example with delay and CMSIS-DSP](documentation/examples/example2/README.md)
* [Example 3 : Working example with CMSIS-DSP and FFT](documentation/examples/example3/README.md)
* [Example 4 : Same as example 3 but with the CMSIS-DSP Python wrapper](documentation/examples/example4/README.md)
* [Example 10 : The asynchronous mode](documentation/examples/example10/README.md)
* [Example 1 : how to describe a simple graph](examples/example1/README.md)
* [Example 2 : More complex example with delay and CMSIS-DSP](examples/example2/README.md)
* [Example 3 : Working example with CMSIS-DSP and FFT](examples/example3/README.md)
* [Example 4 : Same as example 3 but with the CMSIS-DSP Python wrapper](examples/example4/README.md)
* [Example 10 : The asynchronous mode](examples/example10/README.md)
Examples 5 and 6 show how to use the CMSIS-DSP MFCC with a synchronous data flow.
@ -146,262 +146,11 @@ There is a [FAQ](FAQ.md) document.
There is a document describing the [list](documentation/Options.md) of available options.
There is a document explaining [how to build the examples](examples/README.md).
## Limitations

@ -1 +1,220 @@
Options
## Options
Several options can be used in the Python script to control the schedule generation. Some options are used by the scheduling algorithm, and others are used by the code generators or the graphviz generator:
### Options for the graph
These options need to be set on the graph object created with `Graph()`.
For instance:
```python
g = Graph()
g.defaultFIFOClass = "FIFO"
```
#### defaultFIFOClass (default = "FIFO")
Class used for FIFOs by default. It can also be customized for each connection (`connect` or `connectWithDelay` call) with something like:
`g.connect(src.o,b.i,fifoClass="FIFOClassNameForThisConnection")`
#### duplicateNodeClassName (default = "Duplicate")
Prefix used to generate the duplicate node classes like `Duplicate2`, `Duplicate3` ...
### Options for the scheduling
These options need to be set on a configuration object passed as an argument to the scheduling function. For instance:
```python
conf = Configuration()
conf.debugLimit = 10
sched = g.computeSchedule(config = conf)
```
Note that the configuration object also contains options for the code generators.
#### memoryOptimization (default = False)
When the amount of data written to a FIFO and read from it is the same, the FIFO is just an array. In this case, depending on the scheduling, the memory used by different arrays may be reused if those arrays are not needed at the same time.
This option enables an analysis that optimizes memory usage by merging some buffers when possible.
#### sinkPriority (default = True)
Try to prioritize the scheduling of the sinks to minimize the latency between sources and sinks.
When this option is enabled, the tool may not be able to find a schedule in all cases. If it can't find a schedule, it will raise a `DeadLock` exception.
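A minimal sketch combining these scheduling options (assuming a graph `g` has already been created as above):

```python
conf = Configuration()
conf.memoryOptimization = True   # try to merge FIFO buffers that are never live at the same time
conf.sinkPriority = True         # schedule sinks as early as possible (may raise DeadLock on some graphs)
sched = g.computeSchedule(config = conf)
```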
#### displayFIFOSizes (default = False)
During the computation of the schedule, the evolution of the FIFO sizes is printed to `stdout`.
#### dumpSchedule (default = False)
During the computation of the schedule, a human readable schedule is printed to `stdout`.
### Options for the code generator
#### debugLimit (default = 0)
When `debugLimit` is > 0, the number of iterations of the scheduling is limited to `debugLimit`. Otherwise, the scheduling runs forever or until an error occurs.
#### dumpFIFO (default = False)
When true, some code is generated to dump the FIFO content at runtime. It is only useful for debug.
In C++ code generation, it is only available when using the mode `codeArray == False`.
When this mode is enabled, the first line of the scheduler file is:
`#define DEBUGSCHED 1`
and it also enables some debug code in `GenericNodes.h`.
#### schedName (default = "scheduler")
Name of the scheduler function used in the generated code.
#### prefix (default = "")
Prefix to add before the FIFO buffer definitions. Those buffers are not static and are global. If you want to use several schedulers in your code, the buffer names used by each one should be different.
Another possibility is to make the buffers static by redefining the macro `CG_BEFORE_BUFFER`.
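For instance, a small sketch (the prefix value is illustrative):

```python
conf = Configuration()
conf.prefix = "sched1"   # global FIFO buffers of this scheduler will get the "sched1" prefix
```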
#### Options for C Code Generation only
##### cOptionalArgs (default = "")
Optional arguments to pass to the C API of the scheduler function.
It can be either a `string` or a list of `string`s where each element is an argument of the function (and should be valid `C`).
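For instance, a sketch passing two extra arguments (the names are illustrative) to the generated C API:

```python
conf = Configuration()
conf.cOptionalArgs = ["int someVariable",
                      "const char *testString"]
```

Each string must be a valid C argument declaration; it is copied as-is into the generated scheduler prototype.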
##### codeArray (default = True)
When true, the schedule is defined as an array. Otherwise, a list of function calls is generated.
A list of function calls may be easier to read, but if the schedule is long it is not good for code size. In that case, it is better to encode the schedule as an array rather than as a list of function calls.
When `codeArray` is True, the option `switchCase` can also be used.
##### switchCase (default = True)
`codeArray` must be true or this option is ignored.
When the schedule is encoded as an array, it can either be an array of function pointers (`switchCase` false) or an array of indexes for a state machine (`switchCase` true)
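A sketch of the three resulting code generation styles:

```python
conf = Configuration()

# array of indexes driving a switch/case state machine (default)
conf.codeArray = True
conf.switchCase = True

# array of function pointers
conf.codeArray = True
conf.switchCase = False

# plain list of function calls (larger code for long schedules)
conf.codeArray = False
```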
##### eventRecorder (default = False)
Enable the generation of `CMSIS EventRecorder` instrumentation in the code. The CMSIS-DSP Pack provides definitions for 3 events:
* Schedule iteration
* Node execution
* Error
##### customCName (default = "custom.h")
Name of the custom header in the generated C code. If you use several schedulers, you may want to use different headers for each one.
##### postCustomCName (default = "")
Name of a custom header included in the generated C code after all of the other includes.
##### genericNodeCName (default = "GenericNodes.h")
Name of the GenericNodes header in the generated C code. If you use several schedulers, you may want to use different headers for each one.
##### appNodesCName (default = "AppNodes.h")
Name of the AppNodes header in the generated C code. If you use several schedulers, you may want to use different headers for each one.
##### schedulerCFileName (default = "scheduler")
Name of the scheduler cpp and header files in the generated C code. If you use several schedulers, you may want to use different names for each one.
If the option is set to `xxx`, the names generated will be `xxx.cpp` and `xxx.h`.
##### CAPI (default = True)
By default, the scheduler function is callable from C. When false, it is a standard C++ API.
##### CMSISDSP (default = True)
If you don't use any of the datatypes or functions of the CMSIS-DSP, you don't need to include `arm_math.h` in the scheduler file. This option can thus be set to `False`.
##### asynchronous (default = False)
When true, the scheduling is for a dynamic / asynchronous flow. A node may not always produce or consume the same amount of data. As a consequence, the execution of the schedule can fail. Each node needs to implement a `prepareForRunning` function to identify and recover from FIFO underflows and overflows.
A synchronous schedule is used as a starting point and should describe the average case.
This option implies `codeArray` and `switchCase`. It disables `memoryOptimization`.
Synchronous FIFOs that are just buffers will be considered as FIFOs in asynchronous mode.
More information is available in the documentation for [this mode](Dynamic.md).
##### FIFOIncrease (default 0)
In case of dynamic / asynchronous scheduling, the FIFOs may need to be bigger than what is computed assuming a static / synchronous scheduling. This option is used to increase the FIFO sizes. It represents a percent increase.
For instance, a value of 10 means the FIFOs will have their size updated from `oldSize` to `1.1 * oldSize`, which is `(1 + 10%) * oldSize`.
If the value is a `float` instead of an `int`, it is used as a scaling factor directly. For instance, `1.1` would scale the size by `1.1` and be equivalent to the setting `10` (for 10 percent).
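A sketch assuming an asynchronous graph where the FIFOs get 20% of margin:

```python
conf = Configuration()
conf.asynchronous = True
conf.FIFOIncrease = 20    # int: +20%, so newSize = 1.2 * oldSize
# or equivalently with a float scaling factor:
# conf.FIFOIncrease = 1.2
```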
##### asyncDefaultSkip (default True)
Behavior of a pure function (like a CMSIS-DSP function) in asynchronous mode. When `True`, the execution is skipped if the function can't be executed. When `False`, an error is raised.
If another error recovery is needed, the function must be packaged into a C++ class implementing a `prepareForRunning` function.
#### Options for Python code generation only
##### pyOptionalArgs (default = "")
Optional arguments to pass to the Python version of the scheduler function
##### customPythonName (default = "custom")
Name of the custom header in the generated Python code. If you use several schedulers, you may want to use different headers for each one.
##### appNodesPythonName (default = "appnodes")
Name of the AppNodes header in the generated Python code. If you use several schedulers, you may want to use different headers for each one.
##### schedulerPythonFileName (default = "sched")
Name of the scheduler file in the generated Python code. If you use several schedulers, you may want to use different names for each one.
If the option is set to `xxx`, the name generated will be `xxx.py`
### Options for the graphviz generator
#### horizontal (default = True)
Horizontal or vertical layout for the graph.
#### displayFIFOBuf (default = False)
By default, the graph displays the FIFO sizes. If you want to know which FIFO variable is used in the code, you can set this option to true and the graph will display the FIFO variable names.
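A sketch selecting a vertical layout and showing the FIFO variable names:

```python
conf = Configuration()
conf.horizontal = False       # vertical layout
conf.displayFIFOBuf = True    # display FIFO variable names instead of FIFO sizes
```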
### Options for connections
It is now possible to write something like:
```python
g.connect(src.o,b.i,fifoClass="FIFOSource")
```
The `fifoClass` argument allows choosing a specific FIFO class in the generated C++ or Python.
Only the `FIFO` class is provided by default. Any new implementation must inherit from `FIFObase<T>`.
There is also an option to set the scaling factor when used in asynchronous mode:
```python
g.connect(odd.o,debug.i,fifoScale=3.0)
```
When this option is set, it is used instead of the global `FIFOIncrease` setting. It must be a float.

@ -0,0 +1,37 @@
## How to build the examples
In folder `ComputeGraph/example/build`, type the `cmake` command:
```bash
cmake -DHOST=YES \
-DDOT="path to dot.EXE" \
-DCMSISCORE="path to cmsis core include directory" \
-G "Unix Makefiles" ..
```
The Graphviz `dot` tool needs to be a recent version supporting HTML-like labels.
If cmake is successful, you can type `make` to build the examples. It will also build CMSIS-DSP for the host.
If you don't have Graphviz, the option `-DDOT` can be removed.
If for some reason it does not work, you can go into an example folder (for instance example1), and type the commands:
```bash
python graph.py
dot -Tpdf -o test.pdf test.dot
```
It will generate the C++ files for the schedule and a pdf representation of the graph.
Note that the Python code relies on the CMSIS-DSP PythonWrapper, which now also contains the Python scripts for the Synchronous Data Flow.
For `example3`, which uses an input file, `cmake` should have copied the input test pattern `input_example3.txt` into the build folder. The output file will also be generated in the build folder.
`example4` is like `example3` but in pure Python and using the CMSIS-DSP Python wrapper (which must already be installed before trying the example). To run a Python example, you need to go into an example folder and type:
```bash
python main.py
```
`example7` communicates with `OpenModelica`. You need to install the VHTModelica blocks from the [VHT-SystemModeling](https://github.com/ARM-software/VHT-SystemModeling) project on our GitHub.

@ -1,375 +1,65 @@
# Example 1
In this example we will see how to describe the following graph:
Please refer to the [simple example](../simple/README.md) to have an overview of how to define a graph and its nodes and how to generate the C++ code for the static scheduler. This document only explains additional details:
<img src="docassets/graph1.PNG" alt="graph1" style="zoom:50%;" />
* How to define new arguments for the C implementation of the nodes
* How to define new arguments for the C API of the scheduler function
* More detailed description of the generated C++ scheduler
The framework comes with some default blocks. But for this example, we will create new blocks. The blocks that you need to create must be described with a simple Python class and a corresponding simple C++ class.
The graph is nearly the same as the one in the [simple example](../simple/README.md) but the processing node is just generating 5 samples in this example:
## The steps
<img src="docassets/graph1.PNG" alt="graph1" style="zoom:100%;" />
It looks complex because there is a lot of information but the process is always the same:
Contrary to the [simple example](../simple/README.md), there is only one Python script `graph.py` and it contains everything: nodes, graph description and C++ code generation.
1. You define new kinds of nodes in Python. They define the IOs, sample types and amount of data read/written on each IO
2. You create instances of those new kinds of nodes
3. You connect them in a graph and generate a schedule
4. In your AppNodes.h file, you implement the new kinds of nodes with C++ templates:
   1. The class is generally not doing a lot: defining the IOs and the function to call when run
5. If you need more control on the initialization, it is possible to pass additional arguments to the node constructors and to the scheduler function.
## Defining new arguments for a node and the scheduler
## Python code
For `ProcessingNode`, we are adding additional arguments in this example to show how a node can be initialized with extra values in the generated code.
Let's analyze the file `graph.py` in the `example1` folder. This file describes the graph and the nodes and calls the Python functions to generate the dot and C++ files.
First, we import the CG scheduler package so that the example can use the node and graph classes:
```python
from cmsisdsp.cg.scheduler import *
```
Then, we describe the new kinds of blocks that we need: Source, ProcessingNode and Sink.
```python
class Sink(GenericSink):
    def __init__(self,name,theType,inLength):
        GenericSink.__init__(self,name)
        self.addInput("i",theType,inLength)

    @property
    def typeName(self):
        return "Sink"
```
When creating a new kind of node (here a sink) we always need to do 2 things:
- Add a type in `typeName`. It will be used to create objects in C++ or Python. So it must be a valid C++ or Python class name;
- Add inputs and outputs. The convention is that an input is named "i" and an output "o". When there are several inputs they are named "ia", "ib", etc.
- For a sink you can only add an input. So the function addOutput is not available.
- The constructor takes a length and a type. They are used to create the IO.
- When there are several inputs or outputs, they are ordered using alphabetical order.
This order is important to know the ID of the corresponding IO in the C code.
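For instance, a hedged sketch of a node with two inputs following the naming convention above (the `Mixer` name and its ports are purely illustrative):

```python
class Mixer(GenericNode):
    def __init__(self,name,theType,inLength,outLength):
        GenericNode.__init__(self,name)
        # inputs are ordered alphabetically: "ia" is IO 0 and "ib" is IO 1 in the C++ constructor
        self.addInput("ia",theType,inLength)
        self.addInput("ib",theType,inLength)
        self.addOutput("o",theType,outLength)

    @property
    def typeName(self):
        return "Mixer"
```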
The definition of a new kind of Source is very similar:
```python
class Source(GenericSource):
    def __init__(self,name,theType,inLength):
        GenericSource.__init__(self,name)
        self.addOutput("o",theType,inLength)

    @property
    def typeName(self):
        return "Source"
```
Then for the processing node, we could define it directly. But often there will be several nodes in a graph, so it is useful to create a new Node block and inherit from it.
```python
class Node(GenericNode):
    def __init__(self,name,theType,inLength,outLength):
        GenericNode.__init__(self,name)
        self.addInput("i",theType,inLength)
        self.addOutput("o",theType,outLength)
```
Note that this new kind of block has no type. It just has an input and an output.
Now we can define the Processing node:
```python
class ProcessingNode(Node):
    @property
    def typeName(self):
        return "ProcessingNode"
```
We just define its type.
Once it is done, we can start creating instances of those nodes. We will also need to define the type for the samples (float32 in this example). The functions and constants are defined in `cg.types`.
If `processing` is the node, we can add arguments with the APIs `addLiteralArg` and `addVariableArg`.
```python
floatType=CType(F32)
processing.addLiteralArg(4,"testString")
processing.addVariableArg("someVariable")
```
It is also possible to use a custom datatype; `example8` gives an example:
* `addLiteralArg(4,"testString")` will pass the value `4` as first additional argument of the C++ constructor (after the FIFOs) and the string `"testString"` as second additional argument of the C++ constructor (after the FIFOs)
* `addVariableArg("someVariable")` will pass the variable `someVariable` as third additional argument of the C++ constructor (after the FIFOs)
```python
complexType=CStructType("complex","MyComplex",8)
```
This defines a new datatype that is mapped to the type `complex` in C/C++ and the class `MyComplex` in Python. The last argument is the size in bytes of the struct in C.
The type complex may be defined with:
```c
typedef struct {
    float re;
    float im;
} complex;
```
The constructor API will look like:
**Note that:**
- The value **must have** value semantic in C/C++. So avoid classes
- In Python, the classes have reference semantics, which implies some constraints:
  - You should never modify an object from the read buffer
  - You should change the fields of an object in the write buffer
  - If you need a new object: copy or create a new object. Never use an object from the read buffer as it is if you intend to customize it
Once a datatype has been defined and chosen, we can define the nodes for the graph:
```python
src=Source("source",floatType,5)
b=ProcessingNode("filter",floatType,7,5)
sink=Sink("sink",floatType,5)
```
For each node, we define :
- The name (name of variable in C++ or Python generated code)
- The type for the inputs and outputs
- The numbers of samples consumed / produced on the io
- Inputs are listed first for the number of samples
For `ProcessingNode` we are adding additional arguments to show how it is possible to add other arguments for initializing a node in the generated code:
```python
b.addLiteralArg(4)
b.addLiteralArg("Test")
b.addVariableArg("someVariable")
```
The C++ objects of type `ProcessingNode` take 3 arguments in addition to the IOs. For those arguments, we are passing an int, a string and a variable name.
Now that the nodes have been created, we can create the graph and connect the nodes:
```python
g = Graph()
g.connect(src.o,b.i)
g.connect(b.o,sink.i)
```
Then, before we generate a schedule, we can define some configuration:
```python
conf=Configuration()
conf.debugLimit=1
```
Since it is stream-based processing, the schedule should run forever. For testing, we can limit the number of iterations. Here the generated code will run just one iteration of the schedule.
This configuration object can be used as an argument of the scheduling function (named parameter `config`) and must be used as an argument of the code generating functions.
There are other fields for the configuration:
- `dumpFIFO`: Will dump the output FIFO content after each execution of a node (the code generator inserts calls to the FIFO dump function)
- `displayFIFOSizes`: During the computation of the schedule, the Python script displays the evolution of the FIFO lengths.
- `schedName`: The name of the scheduler function (`scheduler` by default)
- `cOptionalArgs` and `pyOptionalArgs` for passing additional arguments to the scheduling function
- `prefix` to prefix the names of the global buffers
- `memoryOptimization` : Experimental. It is attempting to reuse buffer memory and share it between several FIFOs
- `codeArray` : Experimental. When a schedule is very long, representing it as a sequence of function calls is not good for the code size of the generated solution. When this option is enabled, the schedule is described with an array. It implies that the pure function calls cannot be inlined any more and are replaced by new nodes which are automatically generated.
- `eventRecorder` : Enable the support for the CMSIS Event Recorder.
In example 1, we are passing a variable to initialize the node of type `ProcessingNode`. So, it would be great if this variable was an argument of the scheduler function. We define:
```python
conf.cOptionalArgs="int someVariable"
```

```C++
ProcessingNode(FIFOBase<IN> &src,FIFOBase<OUT> &dst,int,const char*,int)
```
This will be added after the error argument of the scheduling function.
This API is defined in `AppNodes.h` by the developer. The types are not generated by the scripts. Here the variable `someVariable` is chosen to have type `int`, hence the last argument of the constructor has type `int`. But this is not imposed by the Python script, which is just declaring the existence of a variable.
Once we have a configuration object, we can start to compute the schedule and generate the code:
In the generated scheduler, the constructor is used as:
```python
sched = g.computeSchedule()
print("Schedule length = %d" % sched.scheduleLength)
print("Memory usage %d bytes" % sched.memory)
```

```C++
ProcessingNode<float32_t,7,float32_t,5> processing(fifo0,fifo1,4,"testString",someVariable);
```
A schedule is computed. We also display:
This variable `someVariable` must come from somewhere. The API of the scheduler is:
- The length of the schedule
- The total amount of memory used by all the FIFOs
We could also have used:
```python
sched = g.computeSchedule(config=conf)
```

```C++
extern uint32_t scheduler(int *error,int someVariable);
```
to use the configuration object if we needed to dump the FIFOs lengths.
Now, that we have a schedule, we can generate the graphviz and the C++ code:
This new argument to the scheduler is defined in the Python script:
```python
with open("test.dot","w") as f:
    sched.graphviz(f)
sched.ccode("generated",conf)

conf.cOptionalArgs=["int someVariable"]
```
The C++ code will be generated in the `generated` folder of `example1`: `sched.cpp`
## The C++ code
The C++ code generated in `scheduler.cpp` and `scheduler.h` in the `generated` folder relies on some additional files which must be provided by the developer:
- `custom.h`: to define some custom initializations or `#define`s used by the code
- `AppNodes.h`: to define the new C++ blocks
Let's look at custom.h first:
### custom.h
```c++
#ifndef _CUSTOM_H_
#define _CUSTOM_H_

#endif
```
It is empty in `example1`. This file can be used to include or define some variables and constants used by the network.
### AppNodes.h
All the new nodes defined in the Python script must also be defined in the C++ code. They are very similar to the Python code but a bit more verbose.
```c++
template<typename IN, int inputSize>
class Sink: public GenericSink<IN, inputSize>
{
public:
    Sink(FIFOBase<IN> &src):GenericSink<IN,inputSize>(src){};

    int prepareForRunning() override
    {
        if (this->willUnderflow())
        {
            return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
        }
        return(0);
    };

    int run() override
    {
        IN *b=this->getReadBuffer();
        printf("Sink\n");
        for(int i=0;i<inputSize;i++)
        {
            std::cout << (int)b[i] << std::endl;
        }
        return(0);
    };
};
```
The `Sink` inherits from the `GenericSink`. In the constructor we pass the FIFOs: input FIFOs first (output FIFOs always follow the input FIFOs when they are used; for a sink, we have no output FIFOs).
In the template parameters, we pass the type/length for each IO: inputs first, followed by outputs (when there are some).
The node must have a `run` function which is implementing the processing.
The `prepareForRunning` function is used only in dynamic / asynchronous mode. But it must be defined (even if not used) in static / synchronous mode or the code won't build.
Here the sink is just dumping to stdout the content of the buffer. The amount of data read by `getReadBuffer` is defined in the `GenericSink` and is coming from the template parameter.
The `Source` definition is very similar:
```C++
template<typename OUT,int outputSize>
class Source: GenericSource<OUT,outputSize>
{
public:
    Source(FIFOBase<OUT> &dst):GenericSource<OUT,outputSize>(dst),mCounter(0){};

    int prepareForRunning() override
    {
        if (this->willOverflow())
        {
            return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
        }
        return(0);
    };

    int run() override
    {
        OUT *b=this->getWriteBuffer();
        printf("Source\n");
        for(int i=0;i<outputSize;i++)
        {
            b[i] = (OUT)mCounter++;
        }
        return(0);
    };

    int mCounter;
};
```
In this example, the source is just counting. And we only have output fifos.
`getWriteBuffer` and `getReadBuffer` must always be called on the io ports to ensure that
the FIFOs are not overflowing or underflowing (**even if the run function is doing nothing**).
No error detection is done because the static schedule is ensuring that no error will occur if you don't forget to call the functions in your nodes.
Finally, the processing node:
```C++
template<typename IN, int inputSize,typename OUT,int outputSize>
class ProcessingNode: public GenericNode<IN,inputSize,OUT,outputSize>
{
public:
    ProcessingNode(FIFOBase<IN> &src,FIFOBase<OUT> &dst,int,const char*,int):GenericNode<IN,inputSize,OUT,outputSize>(src,dst){};

    int prepareForRunning() override
    {
        if (this->willOverflow() ||
            this->willUnderflow())
        {
            return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
        }
        return(0);
    };

    int run() override
    {
        printf("ProcessingNode\n");
        IN *a=this->getReadBuffer();
        OUT *b=this->getWriteBuffer();
        b[0] =(OUT)a[3];
        return(0);
    };
};
```
The processing node is (very arbitrarily) copying the value at index 3 of the input to index 0 of the output.
The processing node takes 3 arguments after the FIFOs in the constructor because the Python script is defining 3 additional arguments for this node: an `int`, a string and another `int` passed through a variable in the scheduler.
The C++ code is generated in `scheduler.cpp` and `scheduler.h` in the `generated` folder.
### scheduler.cpp
#### Included headers
The generated code is first including the needed headers:
```C++
@ -386,22 +76,79 @@ The generated code is first including the needed headers:
- Application nodes
- scheduler API
Then, the generated code is defining the buffers for the FIFOs:
#### Macros
The generated code is then including some macro definitions that can all be redefined to customize some aspects of the generated scheduler. By default those macros, except `CHECKERROR`, are doing nothing:
* CHECKERROR
* Check for an error after each node execution. Default action is to branch out of the scheduler loop and return an error
* CG_BEFORE_ITERATION
* Code to execute before each iteration of the scheduler
* CG_AFTER_ITERATION
* Code to execute after each iteration of the scheduler
* CG_BEFORE_SCHEDULE
* Code to execute before starting the scheduler loop
* CG_AFTER_SCHEDULE
* Code to execute after the end of the scheduler loop
* CG_BEFORE_BUFFER
* Code before any buffer definition. Can be used, for instance, to align a buffer or to put this buffer in a specific memory section
* CG_BEFORE_FIFO_BUFFERS
* Code included before the definitions of the global FIFO buffers
* CG_BEFORE_FIFO_INIT
* Code to execute before the creation of the FIFO C++ objects
* CG_BEFORE_NODE_INIT
* Code to execute before the creation of the node C++ objects
* CG_AFTER_INCLUDES
* Code coming after the include files (useful to add other include files after the default ones)
* CG_BEFORE_SCHEDULER_FUNCTION
* Code defined before the scheduler function
* CG_BEFORE_NODE_EXECUTION
* Code executed before a node execution
* CG_AFTER_NODE_EXECUTION
* Code executed after a node execution and before the error checking
#### Memory buffers and FIFOs
Then, the generated code is defining the buffers for the FIFOs. First the sizes are defined:
```C++
CG_BEFORE_FIFO_BUFFERS
/***********
FIFO buffers
************/
#define FIFOSIZE0 11
float32_t buf0[FIFOSIZE0]={0};
#define FIFOSIZE1 5
float32_t buf1[FIFOSIZE1]={0};
```
The FIFOs may have a size different from the buffer size when a buffer is shared between different FIFOs. So, there are separate defines for the buffer sizes:
```C++
#define BUFFERSIZE1 11
CG_BEFORE_BUFFER
float32_t buf1[BUFFERSIZE1]={0};
#define BUFFERSIZE2 5
CG_BEFORE_BUFFER
float32_t buf2[BUFFERSIZE2]={0};
```
In case of buffer sharing, a shared buffer will be defined with the `int8_t` type. It is **very important** to align such a buffer by defining `CG_BEFORE_BUFFER`. See the [FAQ](../../FAQ.md) for more information about alignment issues.
#### Description of the schedule
```C++
static unsigned int schedule[17]=
{
2,2,0,1,2,0,1,2,2,0,1,2,0,1,2,0,1,
};
```
There are different code generation modes in the compute graph. By default, the schedule is encoded as a list of numbers and a `switch/case` is used to execute the node corresponding to an identification number.
#### Scheduler API
Then, the scheduling function is generated:
```C++
@ -414,6 +161,8 @@ The returned valued is the number of schedules fully executed when the error occ
The `someVariable` is defined in the Python script. The Python script can add as many arguments as needed with whatever type is needed.
#### Scheduler locals
The scheduling function is starting with a definition of some variables used for debug and statistics:
```C++
@ -425,59 +174,147 @@ int32_t debugCounter=1;
Then, it is followed with a definition of the FIFOs:
```C++
CG_BEFORE_FIFO_INIT;
/*
Create FIFOs objects
*/
FIFO<float32_t,FIFOSIZE0> fifo0(buf0);
FIFO<float32_t,FIFOSIZE1> fifo1(buf1);
FIFO<float32_t,FIFOSIZE0,0,0> fifo0(buf1);
FIFO<float32_t,FIFOSIZE1,1,0> fifo1(buf2);
```
The FIFO template has type:
```C++
template<typename T, int length, int isArray=0, int isAsync = 0>
class FIFO;
```
`isArray` is set to `1` when the Python code can deduce that the FIFO is always used as an array. In this case, the memory buffer may be shared with other FIFOs depending on the data flow dependencies of the graph.
`isAsync` is set to 1 when the graph is an asynchronous one.
Then, the nodes are created and connected to the FIFOs:
```C++
/*
Create node objects
*/
ProcessingNode<float32_t,7,float32_t,5> filter(fifo0,fifo1,4,"Test",someVariable);
ProcessingNode<float32_t,7,float32_t,5> processing(fifo0,fifo1,4,"testString",someVariable);
Sink<float32_t,5> sink(fifo1);
Source<float32_t,5> source(fifo0);
```
One can see that the processing node has 3 additional arguments in addition to the FIFOs. Those arguments are defined in the Python script. The third argument is `someVariable` and this variable must be in scope. That's why the Python script is adding an argument `someVariable` to the scheduler API. So, one can pass information to any node from outside of the scheduler using those additional arguments.
And finally, the function is entering the scheduling loop:
```C++
while((cgStaticError==0) && (debugCounter > 0))
{
nbSchedule++;
/* Run several schedule iterations */
CG_BEFORE_SCHEDULE;
while((cgStaticError==0) && (debugCounter > 0))
{
```
The content of the loop is a `switch / case`:
```C++
    CG_BEFORE_NODE_EXECUTION;
    switch(schedule[id])
    {
        case 0:
        {
            cgStaticError = processing.run();
        }
        break;

        case 1:
        {
            cgStaticError = sink.run();
        }
        break;

        case 2:
        {
            cgStaticError = source.run();
        }
        break;

        default:
        break;
    }
    CG_AFTER_NODE_EXECUTION;
    CHECKERROR;
```
`CHECKERROR` is a macro defined in `Sched.h`. It is just testing if `cgStaticError < 0` and breaking out of the loop if it is the case. This can be redefined by the user.
#### Error handling
Since an application may want to use several SDF graphs, the name of the `sched` and `customInit` functions can be customized in the `configuration` object on the Python side:
In case of error, the code is branching out to the end of the function:
```python
config.schedName = "sched"
```

```C++
errorHandling:
    CG_AFTER_SCHEDULE;
    *error=cgStaticError;
    return(nbSchedule);
```
A prefix can also be added before the name of the global FIFO buffers:
## Expected output
```python
config.prefix="bufferPrefix"
```

Output of the Python script:

```
Schedule length = 17
Memory usage 64 bytes
```
Output of the execution:
```
Start
Source
Source
ProcessingNode
Sink
3
0
0
0
0
Source
ProcessingNode
Sink
10
0
0
0
0
Source
Source
ProcessingNode
Sink
17
0
0
0
0
Source
ProcessingNode
Sink
24
0
0
0
0
Source
ProcessingNode
Sink
31
0
0
0
0
```
## Summary
The source is incrementing a counter and generates 0,1,2,3 ...
It looks complex because there is a lot of information but the process is always the same:
The processing node is copying the 4th sample of the input to the first sample of the output. So there is a delta of 7 between each new value written to the output.
1. You define new kinds of nodes in Python. They define the IOs, types and amount of data read/written on each IO
2. You create Python instances of those new kinds of nodes
3. You connect them in a graph and generate a schedule
4. In your AppNodes.h, you implement the new kinds of nodes with a C++ template:
   1. The template is generally defining the IOs and the function to call when run
   2. It should be minimal. The template is just a wrapper. Don't forget those nodes are created on the stack in the scheduler function. So they should not be too big. They should just be simple wrappers
5. If you need more control on the initialization, it is possible to pass additional arguments to the node constructors and to the scheduler function.
The sink is displaying the 5 samples at the input.
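A small sketch checking the printed values: the source counts 0,1,2,... and the processing node outputs its input sample of index 3 while consuming 7 samples per run, so successive sink outputs start at 3 and differ by 7:

```python
expected = [7*k + 3 for k in range(5)]
print(expected)   # [3, 10, 17, 24, 31] as in the execution output above
```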


@ -102,8 +102,7 @@ float32_t buf2[BUFFERSIZE2]={0};
CG_BEFORE_SCHEDULER_FUNCTION
uint32_t scheduler(int *error,const char *testString,
int someVariable)
uint32_t scheduler(int *error,int someVariable)
{
int cgStaticError=0;
uint32_t nbSchedule=0;
@ -120,7 +119,7 @@ uint32_t scheduler(int *error,const char *testString,
/*
Create node objects
*/
ProcessingNode<float32_t,7,float32_t,5> filter(fifo0,fifo1,4,testString,someVariable);
ProcessingNode<float32_t,7,float32_t,5> processing(fifo0,fifo1,4,"testString",someVariable);
Sink<float32_t,5> sink(fifo1);
Source<float32_t,5> source(fifo0);
@ -138,7 +137,7 @@ uint32_t scheduler(int *error,const char *testString,
{
case 0:
{
cgStaticError = filter.run();
cgStaticError = processing.run();
}
break;

@ -16,8 +16,7 @@ extern "C"
#endif
extern uint32_t scheduler(int *error,const char *testString,
int someVariable);
extern uint32_t scheduler(int *error,int someVariable);
#ifdef __cplusplus
}

@ -36,42 +36,29 @@ class ProcessingNode(Node):
### Define nodes
floatType=CType(F32)
src=Source("source",floatType,5)
b=ProcessingNode("filter",floatType,7,5)
b.addLiteralArg(4)
b.addVariableArg("testString","someVariable")
processing=ProcessingNode("processing",floatType,7,5)
processing.addLiteralArg(4,"testString")
processing.addVariableArg("someVariable")
sink=Sink("sink",floatType,5)
g = Graph()
g.connect(src.o,b.i)
g.connect(b.o,sink.i)
g.connect(src.o,processing.i)
g.connect(processing.o,sink.i)
print("Generate graphviz and code")
conf=Configuration()
conf.debugLimit=1
conf.cOptionalArgs=["const char *testString"
,"int someVariable"
conf.cOptionalArgs=["int someVariable"
]
#conf.displayFIFOSizes=True
# Prefix for global FIFO buffers
#conf.prefix="sched1"
#conf.dumpSchedule = True
sched = g.computeSchedule(config=conf)
#print(sched.schedule)
print("Schedule length = %d" % sched.scheduleLength)
print("Memory usage %d bytes" % sched.memory)
#
#conf.postCustomCName = "post.h"
#conf.CAPI = True
#conf.prefix="global"
#conf.dumpFIFO = True
#conf.CMSISDSP = False
#conf.switchCase = False
sched.ccode("generated",conf)
with open("test.dot","w") as f:

@ -6,6 +6,6 @@ int main(int argc, char const *argv[])
{
int error;
printf("Start\n");
uint32_t nbSched=scheduler(&error,"Test",1);
uint32_t nbSched=scheduler(&error,1);
return 0;
}

@ -9,10 +9,10 @@ digraph structs {
fontname="times"
filter [label=<
processing [label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="4">
<TR>
<TD ALIGN="CENTER" PORT="i">filter<BR/>(ProcessingNode)</TD>
<TD ALIGN="CENTER" PORT="i">processing<BR/>(ProcessingNode)</TD>
</TR>
</TABLE>>];
@ -32,13 +32,13 @@ source [label=<
source:i -> filter:i [label="f32(11)"
source:i -> processing:i [label="f32(11)"
,headlabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >7</FONT>
</TD></TR></TABLE>>
,taillabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >5</FONT>
</TD></TR></TABLE>>]
filter:i -> sink:i [label="f32(5)"
processing:i -> sink:i [label="f32(5)"
,headlabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >5</FONT>
</TD></TR></TABLE>>
,taillabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >5</FONT>

@ -1,5 +1,7 @@
# Example 10
Please refer to the [simple example](../simple/README.md) to have an overview of how to define a graph and its nodes and how to generate the C++ code for the static scheduler. This document only explains additional details.
This example is implementing a dynamic / asynchronous mode.
It is enabled in `graph.py` with:
@ -22,6 +24,38 @@ The even source is generating a value only when the count is even.
The processing is adding its inputs. If no data is available on an input, 0 is used.
In case of fifo overflow or underflow, any node will slip its execution.
In case of fifo overflow or underflow, any node will skip its execution.
All nodes are generating or consuming one sample but the FIFOs have a size of 2 because of the 100% increase requested in the configuration settings.
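A hedged sketch of configuration settings that would produce this behavior (the actual `graph.py` may differ slightly):

```python
conf = Configuration()
conf.asynchronous = True
conf.FIFOIncrease = 100   # +100%: FIFOs computed as 1 sample become 2 samples
```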
## Expected outputs
```
Schedule length = 9
Memory usage 34 bytes
```
```
Start
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
```

@ -2,7 +2,7 @@ from cmsisdsp.cg.scheduler import *
### Define new types of Nodes
class SinkAsync(GenericSink):
def __init__(self,name,theType,inLength):

@ -1,12 +1,15 @@
# Example 2
Please refer to [Example 1](example1.md) for the details about how to create a graph and the C++ support classes.
Please refer to the [simple example](../simple/README.md) to have an overview of how to define a graph and its nodes and how to generate the C++ code for the static scheduler.
In this example, we are just analyzing a much more complex graph to see some new features:
- Delay
- CMSIS-DSP functions
- Some default nodes: sliding buffer
- CMSIS-DSP function
- Constant node
- SlidingBuffer
This example is not really using an MFCC or a TensorFlow Lite node. It is just providing some wrappers to show how such nodes could be included in a graph:
The graph is:
@ -14,7 +17,7 @@ The graph is:
It is much more complex:
- First we have a source delayed by 10 samples ;
- First we have a stereo source delayed by 10 samples ;
- Then this stereo source is split into left/right samples using the default block Unzip
- The samples are divided by 2 using a CMSIS-DSP function
- The node HALF representing a constant is introduced (constant arrays are also supported)
@ -24,18 +27,11 @@ It is much more complex:
- Another sliding buffer
- And a block representing TensorFlow Lite for Micro (a fake TFLite node)
Note that those blocks (MFCC, TFLite) are doing nothing in this example. It is just to illustrate a more complex example that someone may want to experiment with for keyword spotting.
Note that those blocks (MFCC, TFLite) are doing nothing in this example. It is just to illustrate a more complex example typical of keyword spotting applications.
Examples 5 and 6 show how to use the CMSIS-DSP MFCC.
The new features compared to `example1` are:
- Delay
- CMSIS-DSP function
- Constant node
- SlidingBuffer
Let's look at all of this:
Let's look at the new features compared to example 1:
## Delay
@ -43,9 +39,7 @@ Let's look at all of this:
g.connectWithDelay(src.o, toMono.i,10)
```
To add a delay on a link between 2 nodes, you just use the `connectWithDelay` function. Delays can be useful for some graphs which are not schedulable. They are implemented by starting the schedule with a FIFO which is not empty but contain 0 samples.
To add a delay on a link between 2 nodes, you just use the `connectWithDelay` function. Delays can be useful for some graphs which are not schedulable. They are implemented by starting the schedule with a FIFO which is not empty but contains some 0 samples.
## CMSIS-DSP function
@ -59,16 +53,18 @@ sa=Dsp("scale",floatType,blockSize)
The corresponding CMSIS-DSP function will be named: `arm_scale_f32`
The code generated in `sched.cpp` will not require any C++ class. It will look like:
The code generated in `scheduler.cpp` will not require any C++ class. It will look like:
```C++
{
float32_t* i0;
float32_t* o2;
i0=fifo2.getReadBuffer(160);
o2=fifo4.getWriteBuffer(160);
arm_scale_f32(i0,HALF,o2,160);
cgStaticError = 0;
float32_t* i0;
float32_t* i1;
float32_t* o2;
i0=fifo3.getReadBuffer(160);
i1=fifo4.getReadBuffer(160);
o2=fifo5.getWriteBuffer(160);
arm_add_f32(i0,i1,o2,160);
cgStaticError = 0;
}
```
@ -84,23 +80,21 @@ A constant node is defined as:
half=Constant("HALF")
```
In the C++ code, `HALF` is expected to be a value defined in `custom.h`
In the C++ code, HALF is expected to be a value defined in custom.h
In the Python generated code, it would be in `custom.py`.
Constant values are not involved in the scheduling (they are ignored) and they have no io. So, to connect to a constant node we do:
Constant values are not involved in the scheduling (they are ignored) and they have no IO. So, to connect to a constant node we do:
```python
g.connect(half,sa.ib)
```
There is no "o", "oa" suffixes for the constant node half.
There are no "o", "oa" suffixes for the constant node `half`.
Constant nodes are just here to make it easier to use CMSIS-DSP functions.
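A hedged sketch putting the pieces together (the upstream and downstream node names and the `ia` / `o` port names are assumptions for illustration):

```python
half = Constant("HALF")
sa = Dsp("scale",floatType,blockSize)

g.connect(someNode.o, sa.ia)   # hypothetical upstream node feeding the samples to scale
g.connect(half, sa.ib)         # the constant becomes the scale argument of arm_scale_f32
g.connect(sa.o, otherNode.i)   # hypothetical downstream node
```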
## SlidingBuffer
Sliding buffers and OverlapAndAdd are used a lot so they are provided by default.
Sliding buffers and OverlapAndAdd are used a lot so they are provided in the `cg/nodes/cpp` folder of the `ComputeGraph` folder.
In Python, it can be used with:
@ -114,3 +108,18 @@ There is no C++ class to write for this since it is provided by default by the f
It is named `SlidingBuffer` and not `SlidingWindow` because no multiplication with a window is done. It must be implemented with another block, as will be demonstrated in [example 3](example3.md).
## Expected outputs
```
Schedule length = 302
Memory usage 10720 bytes
```
And when executed:
```
Start
Nb = 40
```
Execution is running for 40 iterations without errors.

@ -1,5 +1,7 @@
# Example 3
Please refer to the [simple example](../simple/README.md) to have an overview of how to define a graph and its nodes and how to generate the C++ code for the static scheduler. This document only explains additional details.
This example is implementing a working example with FFT. The graph is:
![graph3](docassets/graph3.PNG)
@ -29,9 +31,7 @@ Now, the constant is an array:
hann=Constant("HANN")
```
In custom.h, this array is defined as:
In `custom.h`, this array is defined as:
```C++
extern const float32_t HANN[256];
@ -41,47 +41,53 @@ extern const float32_t HANN[256];
## CMSIS-DSP FFT
The FFT node cannot be created using a `Dsp` node in Python because FFT is requiring specific initializations. So, a Python class and C++ class must be created :
The FFT node cannot be created using a `Dsp` node in Python because the FFT requires specific initializations. So, a Python class and a C++ class must be created. They are provided by default in the framework but let's look at how they are implemented:
```python
class CFFT(GenericNode):
def __init__(self,name,inLength):
def __init__(self,name,theType,inLength):
GenericNode.__init__(self,name)
self.addInput("i",floatType,2*inLength)
self.addOutput("o",floatType,2*inLength)
self.addInput("i",theType,2*inLength)
self.addOutput("o",theType,2*inLength)
@property
def typeName(self):
return "CFFT"
```
Look at the definition of the inputs and outputs : The FFT is using complex number so the ports have twice the number of float samples. The argument of the constructor is the FFT length in complex sample.
Look at the definition of the inputs and outputs: the FFT is using complex numbers so the ports have twice the number of float samples. The argument of the constructor is the FFT length in **complex** samples but `addInput` and `addOutput` require the number of samples of the base type: here float.
We suggest using as arguments of the blocks a number of samples which is meaningful for the block, and using the lengths in the standard data type (f32, q31 ...) when defining the IOs.
So here, the number of complex samples is used as arguments. But the IO are using the number of floats required to encode those complex numbers.
So here, the number of complex samples is used as argument. But the IOs are using the number of floats required to encode those complex numbers, hence a factor of 2.
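For instance, a hedged sketch instantiating this node for a 256-point complex FFT (so each IO carries 512 `float32_t` samples):

```python
floatType = CType(F32)
fft = CFFT("cfft",floatType,256)   # 256 complex samples -> IOs of 2*256 floats
```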
The corresponding C++ class is:
The C++ template is:
```C++
template<typename IN, int inputSize,typename OUT,int outputSize>
class CFFT: public GenericNode<IN,inputSize,OUT,outputSize>
class CFFT;
```
There are only specific implementations for specific datatypes. No generic implementation is provided.
For float we have:
```C++
template<int inputSize>
class CFFT<float32_t,inputSize,float32_t,inputSize>: public GenericNode<float32_t,inputSize,float32_t,inputSize>
{
public:
    CFFT(FIFOBase<float32_t> &src,FIFOBase<float32_t> &dst):GenericNode<float32_t,inputSize,float32_t,inputSize>(src,dst)
    {
        arm_status status;
        status=arm_cfft_init_f32(&sfft,inputSize>>1);
    };

    int prepareForRunning() override
    {
        if (this->willOverflow() ||
            this->willUnderflow())
        {
            return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
        }
@ -89,10 +95,11 @@ public:
        return(0);
    };

    int run() override
    {
        float32_t *a=this->getReadBuffer();
        float32_t *b=this->getWriteBuffer();
        memcpy((void*)b,(void*)a,inputSize*sizeof(float32_t));
        arm_cfft_f32(&sfft,b,0,1);
        return(0);
    };
@ -104,11 +111,9 @@ public:
It is verbose but not difficult. The constructor initializes the CMSIS-DSP FFT instance and connects to the FIFOs (through `GenericNode`).

The `run` function applies `arm_cfft_f32`. Since this function modifies its input buffer, there is a `memcpy`. It is not strictly needed here: the read buffer could be modified by the CFFT. It would just make debugging more difficult if you wanted to inspect the content of the FIFOs.

The function `prepareForRunning` is only used in asynchronous mode. Please refer to the documentation for the asynchronous mode.

This node is provided in `cg/nodes/cpp`, so there is no need to define it. You can just use it by including the right headers.
@ -125,3 +130,43 @@ from cmsisdsp.cg.scheduler import *
```
The scheduler module automatically includes the default nodes.
## Expected output
Output of Python script:
```
Schedule length = 25
Memory usage 11264 bytes
```
Output of execution:
```
Start
Nb = 40
```
It runs for 40 iterations of the scheduler without errors.
The Python script `debug.py` can be used to display the content of `input_example3.txt` and `../build/output_example3.txt`.

It should display the same sinusoid, but it is delayed in `output_example3.txt` by a few samples because of the sliding buffer. The sliding buffer generates 256 samples on its output each time 128 samples are received on its input. As a consequence, at the start, 256 samples are generated with the first half set to zero.
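The behaviour can be illustrated with a small stand-alone model of the sliding buffer (pure NumPy, independent of the framework):

```python
import numpy as np

window = np.zeros(256)                      # sliding buffer state: a 256-sample window
for k in range(3):
    block = np.full(128, float(k + 1))      # pretend the input blocks are all 1s, then 2s, then 3s
    window = np.concatenate((window[128:], block))
    print(k, window[:2], window[-2:])
# The first emitted window still has its first half at zero,
# which is the 128-sample delay observed in output_example3.txt
```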
We can check it in the debug script by comparing a delayed version of the original to the output.
You should get something like:
![sine](docassets/sine.png)
We have 40 executions of the schedule. In each schedule iteration there are two sink executions, and a sink produces 192 samples.

So the execution produces `40 * 2 * 192 == 15360` samples, a bit less than the `16000` samples in the input.
If we compare the input and output taking into account this length difference and the delay of 128 samples, we get (by running `debug.py`):
```
Comparison of input and output : max absolute error
6.59404862823898e-07
```

@ -0,0 +1,20 @@
import numpy as np
from pylab import figure, clf, plot, xlabel, ylabel, xlim, ylim, title, grid, axes, show,semilogx, semilogy
from numpy import genfromtxt

# Reference input fed to the graph and output produced by the C++ executable
ref_data = genfromtxt('input_example3.txt', delimiter=',')
figure()
plot(ref_data)
output_data = genfromtxt('../build/output_example3.txt', delimiter=',')
plot(output_data)
show()

print(ref_data.shape)
print(output_data.shape)

# The output is delayed by 128 samples (sliding buffer) and is a bit shorter
# than the input, so align both signals before comparing them
nb = output_data.shape[0] - 128
print("Comparison of input and output : max absolute error")
diff = output_data[128:] - ref_data[:nb]
print(np.max(np.abs(diff)))

@ -2,6 +2,8 @@
It is exactly the same example as example 3, but the code generation produces Python code instead of C++.
![graph4](docassets/graph4.png)
The Python code is generated with:
```python
@ -12,6 +14,12 @@ and it will generate a `sched.py` file.
The files `custom.py` and `appnodes.py` are also required.
The example can be run with:
`python main.py`
Do not confuse `graph.py`, which is used to describe the graph, with the other Python files that are used to execute the graph.
## custom.py
```python
@ -25,7 +33,7 @@ An array HANN is defined for the Hann window.
## appnodes.py
This file defines the new nodes that were used in `graph.py`.

In `appnodes.py` we include new kinds of nodes for simulation purposes:
@ -33,8 +41,6 @@ In `appnodes.py` we including new kind of nodes for simulation purpose:
from cmsisdsp.cg.scheduler import *
```
The CFFT is very similar to the C++ version of example 3. But there is no `prepareForRunning`. Dynamic / asynchronous mode is not implemented for Python.
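For reference, a hedged sketch of what the Python `run` method could look like with the CMSIS-DSP Python wrapper (the real code is in `appnodes.py`; the wrapper calls and the constructor signature below are assumptions, not the example's actual implementation):

```python
import cmsisdsp as dsp

class CFFT(GenericNode):   # GenericNode from the Python runtime used by the generated scheduler (assumption)
    def __init__(self, inputSize, outputSize, fifoin, fifoout):
        GenericNode.__init__(self, inputSize, outputSize, fifoin, fifoout)
        # inputSize is a number of floats, so the FFT length is inputSize/2 complex samples
        self.sfft = dsp.arm_cfft_instance_f32()
        dsp.arm_cfft_init_f32(self.sfft, inputSize >> 1)

    def run(self):
        a = self.getReadBuffer()
        o = self.getWriteBuffer()
        # The wrapper returns the transformed samples instead of working in place
        o[:] = dsp.arm_cfft_f32(self.sfft, a, 0, 1)
        return 0
```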
```python
@ -110,3 +116,22 @@ DISPBUF = np.zeros(16000)
nb,error = s.scheduler(DISPBUF)
```
The example can be run with:
`python main.py`
## Expected outputs
```
Generate graphviz and code
Schedule length = 25
Memory usage 11264 bytes
```
And when executed:
![sine](docassets/sine.png)
As you can see at the beginning, there is a small delay during which the output signal is zero.

@ -1,20 +0,0 @@
import numpy as np
from cmsisdsp.cg.static.nodes.simu import *
a=np.zeros(10)
f=FIFO(10,a)
f.dump()
nb = 1
for i in range(4):
w=f.getWriteBuffer(2)
w[0:2]=nb*np.ones(2)
nb = nb + 1
f.dump()
print(a)
for i in range(4):
w=f.getReadBuffer(2)
print(w)

@ -0,0 +1,25 @@
# Example 5
This is a pure Python example. It computes a sequence of MFCCs with an overlap of 0.5 s and creates an animation.
It can be run with:
`python main.py`
The `NumPy` sink at the end is just recording all the MFCC outputs as a list of buffers. This list is used to create an animation.
<img src="docassets/graph5.png" alt="graph5" style="zoom:100%;" />
## Expected output
```
Generate graphviz and code
Schedule length = 292
Memory usage 6614 bytes
```
And when executed you should get an animation looking like this:
![mfcc](docassets/mfcc.png)
The Python `main.py` contains a line which can be uncommented to record the animation as a `.mp4` video.
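A hedged, stand-alone sketch of how such a list of MFCC buffers can be animated and optionally saved (illustrative only; random data is used here so the script runs on its own, and `main.py` remains the reference):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Stand-in for the list of buffers recorded by the NumPy sink
mfccs = [np.random.randn(13) for _ in range(126)]

fig, ax = plt.subplots()
line, = ax.plot(mfccs[0])
ax.set_ylim(-3, 3)

def update(i):
    line.set_ydata(mfccs[i])
    return (line,)

anim = FuncAnimation(fig, update, frames=len(mfccs), interval=50, blit=True)
# anim.save("mfcc.mp4")   # recording as a video, as main.py allows by uncommenting a line
plt.show()
```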

@ -0,0 +1,15 @@
# Example 6
This example is similar to example 5 but with C code generation instead of Python.
![graph6](docassets/graph6.png)
## Expected output
```
nbMFCCOutputs = 126
Generate graphviz and code
Schedule length = 17
Memory usage 2204 bytes
```

@ -13,7 +13,7 @@ model PythonTest
Placement(visible = true, transformation(origin = {-82, 8}, extent = {{-10, -10}, {10, 10}}, rotation = 0)));
inner Modelica.Blocks.Noise.GlobalSeed globalSeed annotation(
Placement(visible = true, transformation(origin = {-86, -28}, extent = {{-10, -10}, {10, 10}}, rotation = 0)));
ARM.Sound.WaveOutput waveOutput(path = "C:\\benchresults\\cmsis\\CMSIS-DSP\\ComputeGraph\\examples\\example7\\output.wav") annotation(
Placement(visible = true, transformation(origin = {24, -32}, extent = {{-10, -10}, {10, 10}}, rotation = 0)));
equation
connect(vht.y, transferFunction.u) annotation(

@ -0,0 +1,62 @@
# Example 7
This is an example showing how a graph in Python (not C) can interact with an [OpenModelica](https://openmodelica.org/) model.
![graph7](docassets/graph7.png)
First you need to get the project [AVH-SystemModeling](https://github.com/ARM-software/AVH-SystemModeling) from our ARM-Software repository.
Then, you need to launch `OpenModelica` and choose `Open Model`.
Select `AVH-SystemModeling/VHTModelicaBlock/ARM/package.mo`
Then choose `Open Model` again and select `PythonTest.mo`.
You should see something like this in `OpenModelica`:
![modelica](docassets/modelica.png)
Customize the output path in the `Wave` node.
Refer to the `OpenModelica` documentation to learn how to build and run this simulation. Once it is started in Modelica, launch the Python script in `example7`:
`python main.py`
You should see :
```
Connecting as INPUT
Connecting as OUTPUT
```
In the Modelica window, the simulation should continue to `100%`.
In the simulation window, you should be able to plot the output wav and get something like:
![waveoutput](docassets/waveoutput.png)
A `.wav` file should have been generated so that you can listen to the result: a Larsen effect (audio feedback)!

The `Processing` node in the compute graph is implemented in `custom.py` and is a gain computed with the `CMSIS-DSP` Python wrapper:
```python
class Processing(GenericNode):
    def __init__(self,inputSize,outputSize,fifoin,fifoout):
        GenericNode.__init__(self,inputSize,outputSize,fifoin,fifoout)

    def run(self):
        i=self.getReadBuffer()
        o=self.getWriteBuffer()
        b=dsp.arm_scale_q15(i,0x6000,1)
        o[:]=b[:]
        return(0)
```
The gain has been chosen to create an instability: in Q15, `0x6000` is 0.75 and the shift of 1 doubles it, so the loop gain is 1.5, above unity.

@ -33,18 +33,11 @@ print("Generate graphviz and code")
conf=Configuration()
#conf.dumpSchedule = True
sched = g.computeSchedule(conf)
#print(sched.schedule)
print("Schedule length = %d" % sched.scheduleLength)
print("Memory usage %d bytes" % sched.memory)
#
# Pass the source and sink objects used to communicate with the VHT Modelica block
#conf.pyOptionalArgs=""
conf.pathToSDFModule="C:\\\\benchresults\\\\cmsis_docker\\\\CMSIS\\\\DSP\\\\SDFTools"
#conf.dumpFIFO=True
#conf.prefix="sched1"
sched.pythoncode(".",config=conf)
with open("test.dot","w") as f:

@ -0,0 +1,54 @@
# Example 8
This example illustrates:
* The `Duplicate` node to have a one-to-many connection at an output
* A structured datatype for the samples in the connections
![graph8](docassets/graph8.png)
## Structured datatype
It is possible to use a custom datatype:
```python
complexType=CStructType("complex","MyComplex",8)
```
This defines a new datatype that is mapped to the type `complex` in C/C++ and to the class `MyComplex` in Python. The last argument is the size in bytes of the struct in C.

The type `complex` may be defined with:
```c
typedef struct {
float re;
float im;
} complex;
```
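On the Python side, a minimal `MyComplex` could be as simple as the following sketch (the class actually used by the example may carry more):

```python
class MyComplex:
    def __init__(self, re=0.0, im=0.0):
        self.re = re
        self.im = im
```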
**Note that:**

- The value **must have** value semantics in C/C++, so avoid classes.
- In Python, the classes have reference semantics, which implies some constraints (see the sketch after this list):
  - You should never modify an object coming from the read buffer.
  - You should change the fields of an object in the write buffer, but not the object itself.
  - If you need a new object: copy it or create a new one. Never reuse an object from the read buffer as-is if you intend to customize it.
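As a sketch, the constraints above translate into a node's `run` method roughly like this (a fragment, not a complete node; the `re`/`im` fields come from the struct defined earlier):

```python
def run(self):
    i = self.getReadBuffer()
    o = self.getWriteBuffer()
    for k in range(len(o)):
        # OK: update the fields of the objects already present in the write buffer
        o[k].re = 2.0 * i[k].re
        o[k].im = 2.0 * i[k].im
        # Wrong: o[k] = i[k]  (this would store a reference to a read-buffer object)
    return 0
```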
The size of the C structure should take into account the padding that may be added to the struct.
When no buffer sharing is used, the size of buffers is always expressed in number of samples.
But in case of buffer sharing, the datatype of the buffer is `int8_t` and the size of the buffer must be computed by the Compute Graph taking into account any padding that may exist.
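One way to double-check the byte size passed to `CStructType` is to mirror the struct with `ctypes` on the host (a sketch; the authoritative value remains `sizeof(complex)` with the target compiler, since padding is target-dependent):

```python
import ctypes

class MyComplexC(ctypes.Structure):
    # Mirror of the C struct from this example; ctypes applies C-like alignment rules
    _fields_ = [("re", ctypes.c_float),
                ("im", ctypes.c_float)]

print(ctypes.sizeof(MyComplexC))   # 8: two 4-byte floats, no padding in this case
```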
## Duplicate node
In case of a one-to-many connection, the Python code will automatically add `Duplicate` nodes to the graph. Those `Duplicate` nodes do not appear directly in the graphviz output but only in a stylized way: as a dot.

Currently this is limited to 3 outputs. If you need more than 3 outputs on an IO, you'll have to insert the `Duplicate` nodes explicitly in the graph.
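For instance, simply connecting the same output to several inputs in the Python graph description is enough to trigger the automatic insertion (a fragment; the node names are illustrative):

```python
# One output feeding three inputs: the tool inserts a Duplicate3 node for us
g.connect(src.o, nodeA.i)
g.connect(src.o, nodeB.i)
g.connect(src.o, nodeC.i)
```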
In the generated code, you'll see the `Duplicate` nodes. For instance, in this example:
```C++
Duplicate3<complex,5,complex,5,complex,5,complex,5> dup0(fifo2,fifo3,fifo4,fifo5);
```

@ -0,0 +1,7 @@
# Example 9
This example just checks that duplicate node insertion and delay on a connection work well together.
The Python script is able to schedule the graph.
![graph9](docassets/graph9.png)

@ -383,13 +383,20 @@ The first line is the important one:
OUT *b=this->getWriteBuffer();
```
We get a pointer to be able to write into the output FIFO. This pointer has the datatype `OUT` coming from the template, so it can be anything. **Those functions (`getWriteBuffer` and/or `getReadBuffer`) must always be used, even if the node does nothing, because the FIFOs are only updated when those functions are called.**
The code in the loop casts an `int` (the loop index) into the `OUT` datatype. If that cast is not possible, it won't typecheck and build.
```C++
for(int i=0;i<outputSize;i++)
{
b[i] = (OUT)i;
}
```
So, although we have not provided a specific implementation of the template, this template can only work with specific `OUT` datatypes.
The return value of the `run` function informs the scheduler whether an error occurred. In synchronous mode, errors (like underflow or overflow) cannot occur because of the scheduling itself, but only because of broken real-time behavior. So any error returned by a node will stop the scheduling.
### The processing node
@ -515,7 +522,72 @@ The headers required by the software are:
* It is coming from the `../../cg/src` folder.
* It provides the basic definitions needed by the framework like `GenericNode`, `GenericSink`,`GenericSource`, `FIFO` ...
### Expected output
There are 7 executions of the `Sink` and `Source` and 5 executions of the `ProcessingNode`.
```
Start
Source
Source
ProcessingNode
Sink
1
2
3
4
5
Source
ProcessingNode
Sink
1
2
3
4
5
Source
Source
ProcessingNode
Sink
1
2
3
4
5
Sink
1
2
3
4
5
Source
ProcessingNode
Sink
1
2
3
4
5
Source
ProcessingNode
Sink
1
2
3
4
5
Sink
1
2
3
4
5
```
