Streamdoc (#94)

Reworked the documentation and examples for the compute graph.
pull/95/head
Christophe Favergeon 3 years ago committed by GitHub
parent d7e4dea51a
commit 4806c7f01d

@ -1,5 +1,7 @@
# Dynamic Data Flow
This feature is illustrated in the [Example 10 : The dynamic dataflow mode](examples/example10/README.md)
Versions of the compute graph corresponding to CMSIS-DSP version >= `1.14.3` and Python wrapper version >= `1.10.0` support a new dynamic / asynchronous mode.
With a dynamic flow, the flow of data is potentially changing at each execution. The IOs can generate or consume a different amount of data at each execution of their node (including no data).
@ -13,7 +15,7 @@ With a dynamic flow and scheduling, there is no more any way to ensure that ther
* Another node may decide to do nothing and skip the execution
* Another node may decide to raise an error.
With dynamic scheduling, a node must implement the function `prepareForRunning` and decide what to do.
With dynamic flow, a node must implement the function `prepareForRunning` and decide what to do.
3 error / status codes are reserved for this. They are defined in the header `cg_status.h`. This header is not included by default, but if you define your own error codes, they should be coherent with `cg_status` and use the same values for the 3 status / error codes used in dynamic mode:
@ -23,9 +25,9 @@ With dynamic scheduling, a node must implement the function `prepareForRunning`
Any other returned value will stop the execution.
The dynamic mode (also named asynchronous), is enabled with option : `asynchronous`
The dynamic mode (also named asynchronous) is enabled with the option `asynchronous` of the configuration object used with the scheduling functions.
The system will still compute a scheduling and FIFO sizes as if the flow was static. We can see the static flow as an average of the dynamic flow. In dynamic mode, the FIFOs may need to be bigger than the ones computed in static mode. The static estimation is giving a first idea of what the size of the FIFOs should be. The size can be increased by specifying a percent increase with option `FIFOIncrease`.
The system will still compute a synchronous scheduling and FIFO sizes as if the flow was static. We can see the static flow as an average of the dynamic flow. In dynamic mode, the FIFOs may need to be bigger than the ones computed in static mode. The static estimation gives a first idea of what the size of the FIFOs should be. The size can be increased by specifying a percent increase with the option `FIFOIncrease`.
For pure compute functions (like the CMSIS-DSP ones), which are not packaged into a C++ class, there is no way to customize the decision logic in case of a problem with a FIFO. There is a global option : `asyncDefaultSkip`.
@ -82,7 +84,7 @@ If the `getReadBuffer` and `getWriteBuffer` are causing an underflow or overflow
## Graph constraints
The dynamic / asynchronous mode is using a synchronous graph as average / ideal case. But it is important to understand that we are no more in static / synchronous mode and some static graph may be too complex for the dynamic mode. Let's take the following graph as example:
The dynamic mode is using a synchronous graph as average / ideal case. But it is important to understand that we are no longer in static / synchronous mode and some static graphs may be too complex for the dynamic mode. Let's take the following graph as an example:
![async_topological2](documentation/async_topological2.png)
@ -104,14 +106,14 @@ sink
If we use a strategy of skipping the execution of a node in case of overflow / underflow, what will happen is:
* Schedule execution 1
* Schedule iteration 1
* First `src` node execution is successful since there is a sample
* All other execution attempts will be skipped
* Schedule execution 2
* Schedule iteration 2
* First `src` node execution is successful since there is a sample
* All other execution attempts will be skipped
* ...
* Schedule execution 5:
* Schedule iteration 5:
* First `src` node execution is successful since there is a sample
* 4 other `src` node executions are skipped
* The `filter` execution can finally take place since enough data has been generated
@ -143,5 +145,3 @@ As consequence, the recommendation in dynamic / asynchronous mode is to:
* Ensure that the amount of data produced and consumed on each FIFO end is the same (so that each node execution is attempted only once during a schedule)
* Use the maximum amount of samples required on both ends of the FIFO
* Here `src` is generating at most `1` sample and `filter` needs `5`. So we use `5` on both ends of the FIFO
* More complex graphs will create a useless overhead in dynamic / asynchronous mode
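The skip strategy described above can be sketched with a small simulation (illustrative Python, not the generated C++ scheduler; the node names and the "one new input sample per schedule iteration" source behavior are assumptions taken from the description above):

```python
# Illustrative sketch: a static schedule of 5 `src` runs, one `filter` run
# and one `sink` run, executed in asynchronous mode with the skip strategy.
def run_iteration(iteration, fifo, out):
    executed = []
    samples_available = 1                  # one new input sample this iteration
    for node in ["src"] * 5 + ["filter", "sink"]:
        if node == "src":
            if samples_available > 0:      # only the first attempt succeeds
                fifo.append(iteration)
                samples_available -= 1
                executed.append("src")
            # the 4 other `src` attempts are skipped (nothing to read)
        elif node == "filter" and len(fifo) >= 5:
            out.append(sum(fifo[:5]))      # `filter` needs 5 samples
            del fifo[:5]
            executed.append("filter")
        elif node == "sink" and len(out) > 0:
            out.pop(0)
            executed.append("sink")
    return executed

fifo, out = [], []
history = [run_iteration(i, fifo, out) for i in range(1, 6)]
# iterations 1 to 4 only run `src`; at iteration 5, `filter` and `sink`
# can finally run because 5 samples have accumulated
```

Every node of the schedule is attempted at each iteration, which is the useless overhead mentioned above.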

@ -1,5 +1,7 @@
# Cyclo static scheduling
This feature is illustrated in the [cyclo](examples/cyclo/README.md) example.
Beginning with the version `1.7.0` of the Python wrapper and version >= `1.12` of CMSIS-DSP, cyclo static scheduling has been added.
## What is the problem it is trying to solve ?

@ -20,8 +20,8 @@ The read buffer and write buffers used to interact with a FIFO have the alignmen
If the number of samples read is `NR` and the number of samples written is `NW`, the alignments (in number of samples) may be:
* `r0 . NR` (where `r0 ` if an integer with `r0 >= 0`)
* `w . NW - r1 . NR` (where `r1 ` and `w` are integers with `r1 >= 0` and `w >= 0`)
* `r0 . NR` for a read buffer in the FIFO (where `r0` is an integer with `r0 >= 0`)
* `w . NW - r1 . NR` for a write buffer in the FIFO (where `r1` and `w` are integers with `r1 >= 0` and `w >= 0`)
If you need a stronger alignment, you'll need to choose `NR` and `NW` in the right way.
@ -29,7 +29,7 @@ For instance, if you need an alignment on a multiple of `16` bytes with a buffer
If you can't freely choose the values of `NR` and `NW`, then you may need to do a copy inside your component to align the buffer (of course, only if the overhead due to the lack of alignment is bigger than the cost of doing a copy).
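The alignment rules above can be checked numerically. A minimal sketch (the `gcd` reasoning is derived from the `r0 . NR` and `w . NW - r1 . NR` offset formulas above, taking the worst case over both FIFO ends):

```python
from math import gcd

# Every read buffer starts at an offset of r0*NR samples and every write
# buffer at an offset of w*NW - r1*NR samples. In the worst case over both
# ends, every offset is a multiple of gcd(NR, NW) samples.
def guaranteed_alignment_bytes(NR, NW, sample_bytes):
    return gcd(NR, NW) * sample_bytes

# float32 samples (4 bytes): NR=4, NW=8 guarantees 16-byte alignment
a16 = guaranteed_alignment_bytes(4, 8, 4)
# NR=5, NW=7 only guarantees the natural 4-byte alignment of float32
a4 = guaranteed_alignment_bytes(5, 7, 4)
```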
## Memory sharing
## Memory sharing example
When the `memoryOptimization` is enabled, the memory may be reused for different FIFOs to minimize the memory usage. But the scheduling algorithm is not trying to optimize this. So depending on how the graph was scheduled, the level of sharing may be different.
@ -42,21 +42,23 @@ If you share memory, you are using reference semantic and it should be hidden fr
One could define an audio buffer data type :
```c++
template<int nbSamples,int refCount>
template<int nbSamples,
int refCount>
struct SharedAudioBuf
{
float32_t *buf;
static int getNbSamples() {return nbSamples;};
};
template<int nbSamples,int refCount>
template<int nbSamples,
int refCount>
using SharedBuf = struct SharedAudioBuf<nbSamples,refCount>;
```
The template tracks the number of samples and the reference count.
The template tracks the number of samples and the reference count statically. `refCount` is not a value of the struct. It is a template argument : a number at the type level.
The FIFO are no more containing the float samples but only the shared buffers.
The FIFOs no longer contain the audio samples but only pointers to shared buffers of samples.
In this example, instead of having a length of 128 `float` samples, a FIFO would have a length of one `SharedBuf<128,r>` sample.
@ -64,7 +66,7 @@ An example of compute graph could be:
![shared_buffer](documentation/shared_buffer.png)
The copy of a `SharedBuf<NB,REF>` is copying a pointer to a buffer and not the buffer. It is reference semantic and the buffer should not be modified if the ref count if > 1.
A copy of the struct `SharedBuf<NB,REF>` copies a pointer to a buffer and not the buffer itself. It is reference semantic and the buffer should not be modified if the ref count is > 1.
In the above graph, there is a processing node doing in-place modification of the buffer and it could have a template specialization defined as:
@ -84,7 +86,7 @@ public GenericNode<SharedBuf<NB,1>,1,
The meaning is:
* The input and output FIFOs have a length of 1 sample
* The sample has a type `SharedBuf<NB,1>`
* The sample has a type `SharedBuf<NB,1>` for both input and output
* The reference count is statically known to be 1 so it is safe to do in place modifications of the buffer and the output buffer is a pointer to the input one
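The reference semantic can be illustrated with a Python analogue of `SharedAudioBuf` (a sketch, not the C++ template; in the C++ version `refCount` is a compile-time template argument, not a runtime field):

```python
class SharedAudioBuf:
    """Python analogue of the C++ SharedAudioBuf: the object holds a
    reference to a sample buffer, not the samples themselves."""
    def __init__(self, buf, ref_count):
        self.buf = buf              # reference to the shared sample buffer
        self.ref_count = ref_count  # a template argument in the C++ version

a = SharedAudioBuf([0.0] * 4, ref_count=2)
b = SharedAudioBuf(a.buf, ref_count=2)  # "copying" shares the same buffer
a.buf[0] = 0.5
# The write through `a` is visible through `b`: this is exactly why in-place
# modification is only safe when the reference count is statically 1.
```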
In case of duplication, the template specialization could look like:
@ -257,67 +259,133 @@ public:
The `input` and `output` arrays, used in the sink / source, are defined as extern. The source is reading from `input` and the sink is writing to `output`.
If we look at the asm code generated with `-Ofast` with armclang `AC6` and for one iteration of the schedule, we get:
The generated scheduler is:
```txt
PUSH {r4-r6,lr}
MOVW r5,#0x220
MOVW r1,#0x620
MOVT r5,#0x3000
MOV r4,r0
MOVT r1,#0x3000
MOV r0,r5
MOV r2,#0x200
BL __aeabi_memcpy4 ; 0x10000a94
MOVW r6,#0x420
MOV r0,r5
MOVT r6,#0x3000
MOVS r2,#0x80
VMOV.F32 s0,#0.5
MOV r1,r6
BL arm_offset_f32 ; 0x10002cd0
MOV r0,#0x942c
MOV r1,r6
MOVT r0,#0x3000
MOV r2,#0x200
BL __aeabi_memcpy4 ; 0x10000a94
MOVS r1,#0
MOVS r0,#1
STR r1,[r4,#0]
POP {r4-r6,pc}
```
```C++
uint32_t scheduler(int *error)
{
    int cgStaticError=0;
    uint32_t nbSchedule=0;
    int32_t debugCounter=1;

    CG_BEFORE_FIFO_INIT;
    /*
    Create FIFOs objects
    */
    FIFO<float32_t,FIFOSIZE0,1,0> fifo0(buf0);
    FIFO<float32_t,FIFOSIZE1,1,0> fifo1(buf1);

    CG_BEFORE_NODE_INIT;
    /*
    Create node objects
    */
    ProcessingNode<float32_t,128,float32_t,128> proc(fifo0,fifo1);
    Sink<float32_t,128> sink(fifo1);
    Source<float32_t,128> source(fifo0);

    /* Run several schedule iterations */
    CG_BEFORE_SCHEDULE;
    while((cgStaticError==0) && (debugCounter > 0))
    {
        /* Run a schedule iteration */
        CG_BEFORE_ITERATION;
        for(unsigned long id=0 ; id < 3; id++)
        {
            CG_BEFORE_NODE_EXECUTION;
            switch(schedule[id])
            {
                case 0:
                {
                    cgStaticError = proc.run();
                }
                break;

                case 1:
                {
                    cgStaticError = sink.run();
                }
                break;

                case 2:
                {
                    cgStaticError = source.run();
                }
                break;

                default:
                break;
            }
            CG_AFTER_NODE_EXECUTION;
            CHECKERROR;
        }
        debugCounter--;
        CG_AFTER_ITERATION;
        nbSchedule++;
    }

errorHandling:
    CG_AFTER_SCHEDULE;
    *error=cgStaticError;
    return(nbSchedule);
}
```
It is the code you would get if you was manually writing a call to the corresponding CMSIS-DSP function. All the C++ templates have disappeared. The switch / case used to implement the scheduler has also been removed.
The code was generated with `memoryOptimization` enabled and the Python script detected in this case that the FIFOs are used as arrays. As consequence, there is no FIFO update code. They are used as normal arrays.
The generated code is as efficient as something manually coded.
The sink and the sources have been replaced by a `memcpy`. The call to the CMSIS-DSP function is just loading the registers and branching to the CMSIS-DSP function.
The input buffer `input` is at address `0x30000620`.
The `output` buffer is at address `0x3000942c`.
We can see in the code:
If we look at the asm of the scheduler generated for a Cortex-M7 with `-Ofast` with armclang `AC6.19` and for **one** iteration of the schedule, we get (disassembly is from uVision IDE):
```txt
MOVW r1,#0x620
...
MOVT r1,#0x3000
0x000004B0 B570 PUSH {r4-r6,lr}
97: b[i] = input[i];
0x000004B2 F2402518 MOVW r5,#0x218
0x000004B6 F2406118 MOVW r1,#0x618
0x000004BA F2C20500 MOVT r5,#0x2000
0x000004BE 4604 MOV r4,r0
0x000004C0 F2C20100 MOVT r1,#0x2000
0x000004C4 F44F7200 MOV r2,#0x200
0x000004C8 4628 MOV r0,r5
0x000004CA F00BF8E6 BL.W 0x0000B69A __aeabi_memcpy4
0x000004CE EEB60A00 VMOV.F32 s0,#0.5
131: arm_offset_f32(a,0.5,b,inputSize);
0x000004D2 F2404618 MOVW r6,#0x418
0x000004D6 F2C20600 MOVT r6,#0x2000
0x000004DA 2280 MOVS r2,#0x80
0x000004DC 4628 MOV r0,r5
0x000004DE 4631 MOV r1,r6
0x000004E0 F002FC5E BL.W 0x00002DA0 arm_offset_f32
63: output[i] = b[i];
0x000004E4 F648705C MOVW r0,#0x8F5C
0x000004E8 F44F7200 MOV r2,#0x200
0x000004EC F2C20000 MOVT r0,#0x2000
0x000004F0 4631 MOV r1,r6
0x000004F2 F00BF8D2 BL.W 0x0000B69A __aeabi_memcpy4
163: CG_AFTER_ITERATION;
164: nbSchedule++;
165: }
166:
167: errorHandling:
168: CG_AFTER_SCHEDULE;
169: *error=cgStaticError;
170: return(nbSchedule);
0x000004F6 F2402014 MOVW r0,#0x214
0x000004FA F2C20000 MOVT r0,#0x2000
0x000004FE 6801 LDR r1,[r0,#0x00]
0x00000500 3101 ADDS r1,r1,#0x01
0x00000502 6001 STR r1,[r0,#0x00]
171: }
0x00000504 2001 MOVS r0,#0x01
0x00000506 2100 MOVS r1,#0x00
169: *error=cgStaticError;
0x00000508 6021 STR r1,[r4,#0x00]
0x0000050A BD70 POP {r4-r6,pc}
```
or
```
MOV r0,#0x942c
...
MOVT r0,#0x3000
```
It is the code you would get if you were manually writing calls to the corresponding CMSIS-DSP functions. All the C++ templates have disappeared. The switch / case used to implement the scheduler has also been removed.
just before the `memcpy`
The code was generated with `memoryOptimization` enabled and the Python script detected in this case that the FIFOs are used as arrays. As a consequence, there is no FIFO update code. They are used as normal arrays.
The generated code is as efficient as something manually coded.
The sink and the sources have been replaced by a `memcpy`. The call to the CMSIS-DSP function is just loading the registers and branching to the CMSIS-DSP function.
It is not always as ideal as in this example. But it demonstrates that the use of C++ templates and a Python code generator enables a low overhead solution to the problem of streaming and compute graphs.

@ -0,0 +1,98 @@
# Introduction
Embedded systems are often used to implement streaming solutions : the software is processing and / or generating streams of samples. The software is made of components that have no concept of streams : they are working with buffers. As a consequence, implementing a streaming solution forces the developer to think about scheduling questions, FIFO sizing etc ...
The CMSIS-DSP compute graph is a **low overhead** solution to this problem : it makes it easier to build streaming solutions by connecting components and computing a scheduling at **build time**. The use of C++ template also enables the compiler to have more information about the components for better code generation.
A dataflow graph is a representation of how compute blocks are connected to implement a streaming processing.
Here is an example with 3 nodes:
- A source
- A filter
- A sink
Each node is producing and consuming some amount of samples. For instance, the source node is producing 5 samples each time it is run. The filter node is consuming 7 samples each time it is run.
The FIFOs lengths are represented on each edge of the graph : 11 samples for the leftmost FIFO and 5 for the other one.
In blue, the amount of samples generated or consumed by a node each time it is called.
<img src="examples/example1/docassets/graph1.PNG" alt="graph1" style="zoom:100%;" />
When the processing is applied to a stream of samples then the problem to solve is :
> **how must the blocks be scheduled and how must the FIFOs connecting the blocks be dimensioned**
The general problem can be very difficult. But, if some constraints are applied to the graph then some algorithms can compute a static schedule at build time.
When the following constraints are satisfied we say we have a Synchronous / Static Dataflow Graph:
- Each node is always consuming and producing the same number of samples (static / synchronous flow)
The CMSIS-DSP Compute Graph Tools are a set of Python scripts and C++ classes with the following features:
- A compute graph and its static flow can be described in Python
- The Python script will compute a static schedule and the optimal FIFOs size
- A static schedule is:
- A periodic sequence of functions calls
- A periodic execution where the FIFOs remain bounded
- A periodic execution with no deadlock : when a node is run there is enough data available to run it
- The Python script will generate a [Graphviz](https://graphviz.org/) representation of the graph
- The Python script will generate a C++ implementation of the static schedule
- The Python script can also generate a Python implementation of the static schedule (for use with the CMSIS-DSP Python wrapper)
There is no FIFO underflow or overflow due to the scheduling. If there are not enough cycles to run the processing, the real-time will be broken and the solution won't work. But this problem is independent from the scheduling itself.
# Why it is useful
Without any scheduling tool for a dataflow graph, there is a problem of modularity : a change on a node may impact other nodes in the graph. For instance, if the number of samples consumed by a node is changed:
- You may need to change how many samples are produced by the predecessor blocks in the graph (assuming it is possible)
- You may need to change how many times the predecessor blocks must run
- You may have to change the FIFOs sizes
With the CMSIS-DSP Compute Graph (CG) Tools you don't have to think about those details while you are still experimenting with your data processing pipeline. It makes it easier to experiment, add or remove blocks, change their parameters.
The tools will generate a schedule and the FIFOs. Even if you don't use this at the end for a final implementation, the information could be useful : is the schedule too long ? Are the FIFOs too big ? Is there too much latency between the sources and the sinks ?
Let's look at an (artificial) example:
<img src="examples/example1/docassets/graph1.PNG" alt="graph1" style="zoom:100%;" />
Without a tool, the user would probably try to modify the number of samples so that the number of samples produced is equal to the number of samples consumed. With the CG Tools we know that such a graph can be scheduled and that the FIFO sizes need to be 11 and 5.
The periodic schedule generated for this graph has a length of 17. It is long for such a small graph because 5 and 7 are indeed not very well chosen values. But it is working even with those values.
The schedule is (the number of samples in each FIFO after the execution of a node is displayed in the brackets):
```
source [ 5 0]
source [10 0]
filter [ 3 5]
sink [ 3 0]
source [ 8 0]
filter [ 1 5]
sink [ 1 0]
source [ 6 0]
source [11 0]
filter [ 4 5]
sink [ 4 0]
source [ 9 0]
filter [ 2 5]
sink [ 2 0]
source [ 7 0]
filter [ 0 5]
sink [ 0 0]
```
At the end, both FIFOs are empty so the schedule can be run again : it is periodic !
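The evolution above can be replayed with a short sketch (illustrative Python, not the generated scheduler; the production / consumption amounts come from the graph: `source` pushes 5, `filter` pops 7 and pushes 5, `sink` pops 5):

```python
# Replay the schedule and track (fifo0, fifo1) after each node execution.
def run(schedule):
    fifo0 = fifo1 = 0
    trace = []
    for node in schedule:
        if node == "source":
            fifo0 += 5
        elif node == "filter":
            assert fifo0 >= 7          # no underflow: the schedule is valid
            fifo0 -= 7
            fifo1 += 5
        elif node == "sink":
            assert fifo1 >= 5
            fifo1 -= 5
        trace.append((fifo0, fifo1))
    return trace

schedule = ("source source filter sink source filter sink "
            "source source filter sink source filter sink "
            "source filter sink").split()
trace = run(schedule)
# max fifo0 level is 11, max fifo1 level is 5 (the computed FIFO sizes),
# and the final state is (0, 0): the schedule is periodic
```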
The compute graph is focusing on the synchronous / static case but some extensions have been introduced for more flexibility:
* A [cyclo-static scheduling](CycloStatic.md) (nearly static)
* A [dynamic/asynchronous](Async.md) mode
Here is a summary of the different configurations supported by the compute graph. The cyclo-static scheduling is part of the static flow mode.
![supported_configs](documentation/supported_configs.png)

@ -1,438 +1,34 @@
# Compute Graph for streaming with CMSIS-DSP
## Introduction
## Table of contents
Embedded systems are often used to implement streaming solutions : the software is processing and / or generating stream of samples. The software is made of components that have no concept of streams : they are working with buffers. As a consequence, implementing a streaming solution is forcing the developer to think about scheduling questions, FIFO sizing etc ...
1. ### [Introduction](Introduction.md)
The CMSIS-DSP compute graph is a **low overhead** solution to this problem : it makes it easier to build streaming solutions by connecting components and computing a scheduling at **build time**. The use of C++ template also enables the compiler to have more information about the components for better code generation.
2. ### How to get started
A dataflow graph is a representation of how compute blocks are connected to implement a streaming processing.
1. [Simple graph creation example](examples/simple/README.md)
Here is an example with 3 nodes:
2. [Simple graph creation example with CMSIS-DSP](examples/simpledsp/README.md)
- A source
- A filter
- A sink
3. ### [Examples](examples/README.md)
Each node is producing and consuming some amount of samples. For instance, the source node is producing 5 samples each time it is run. The filter node is consuming 7 samples each time it is run.
4. ### [Python API](documentation/PythonAPI.md)
The FIFOs lengths are represented on each edge of the graph : 11 samples for the leftmost FIFO and 5 for the other one.
5. ### [C++ Default nodes](documentation/CPPNodes.md)
In blue, the amount of samples generated or consumed by a node each time it is called.
6. ### [Python default nodes](documentation/PythonNodes.md)
<img src="documentation/graph1.PNG" alt="graph1" style="zoom:50%;" />
7. ### Extensions
When the processing is applied to a stream of samples then the problem to solve is :
1. #### [Memory optimizations](documentation/Memory.md)
> **how the blocks must be scheduled and the FIFOs connecting the block dimensioned**
2. #### [Cyclo-static scheduling](CycloStatic.md)
The general problem can be very difficult. But, if some constraints are applied to the graph then some algorithms can compute a static schedule at build time.
3. #### [Dynamic / Asynchronous mode](Async.md)
When the following constraints are satisfied we say we have a Synchronous / Static Dataflow Graph:
8. ### [Maths principles](MATHS.md)
- Static graph : graph topology is not changing
- Each node is always consuming and producing the same number of samples (static flow)
9. ### [FAQ](FAQ.md)
The CMSIS-DSP Compute Graph Tools are a set of Python scripts and C++ classes with following features:
- A compute graph and its static flow can be described in Python
- The Python script will compute a static schedule and the FIFOs size
- A static schedule is:
- A periodic sequence of functions calls
- A periodic execution where the FIFOs remain bounded
- A periodic execution with no deadlock : when a node is run there is enough data available to run it
- The Python script will generate a [Graphviz](https://graphviz.org/) representation of the graph
- The Python script will generate a C++ implementation of the static schedule
- The Python script can also generate a Python implementation of the static schedule (for use with the CMSIS-DSP Python wrapper)
There is no FIFO underflow or overflow due to the scheduling. If there are not enough cycles to run the processing, the real-time will be broken and the solution won't work But this problem is independent from the scheduling itself.
## Why it is useful
Without any scheduling tool for a dataflow graph, there is a problem of modularity : a change on a node may impact other nodes in the graph. For instance, if the number of samples consumed by a node is changed:
- You may need to change how many samples are produced by the predecessor blocks in the graph (assuming it is possible)
- You may need to change how many times the predecessor blocks must run
- You may have to change the FIFOs sizes
With the CMSIS-DSP Compute Graph (CG) Tools you don't have to think about those details while you are still experimenting with your data processing pipeline. It makes it easier to experiment, add or remove blocks, change their parameters.
The tools will generate a schedule and the FIFOs. Even if you don't use this at the end for a final implementation, the information could be useful : is the schedule too long ? Are the FIFOs too big ? Is there too much latency between the sources and the sinks ?
Let's look at an (artificial) example:
<img src="documentation/graph1.PNG" alt="graph1" style="zoom:50%;" />
Without a tool, the user would probably try to modify the number of samples so that the number of sample produced is equal to the number of samples consumed. With the CG Tools we know that such a graph can be scheduled and that the FIFO sizes need to be 11 and 5.
The periodic schedule generated for this graph has a length of 19. It is big for such a small graph and it is because, indeed 5 and 7 are not very well chosen values. But, it is working even with those values.
The schedule is (the size of the FIFOs after the execution of the node displayed in the brackets):
```
source [ 5 0]
source [10 0]
filter [ 3 5]
sink [ 3 0]
source [ 8 0]
filter [ 1 5]
sink [ 1 0]
source [ 6 0]
source [11 0]
filter [ 4 5]
sink [ 4 0]
source [ 9 0]
filter [ 2 5]
sink [ 2 0]
source [ 7 0]
filter [ 0 5]
sink [ 0 0]
```
At the end, both FIFOs are empty so the schedule can be run again : it is periodic !
The compute graph is focusing on the synchronous / static case but some extensions have been introduced for more flexibility:
* A [cyclo-static scheduling](CycloStatic.md) (nearly static)
* A [dynamic/asynchronous](Dynamic.md) mode
Here is a summary of the different configuration supported by the compute graph. The cyclo-static scheduling is part of the static flow mode.
![supported_configs](documentation/supported_configs.png)
More details about the maths behind the code generator are available in a [separate document](MATHS.md).
## How to use the static scheduler generator
First, you must install the `CMSIS-DSP` PythonWrapper:
```
pip install cmsisdsp
```
The functions and classes inside the cmsisdsp wrapper can be used to describe and generate the schedule.
To start, you can create a `graph.py` file and include :
```python
from cmsisdsp.cg.scheduler import *
```
In this file, you can describe the new types of blocks that you need in the compute graph if they are not provided by the python package by default.
Finally, you can execute `graph.py` to generate the C++ files.
The generated files need to include `ComputeGraph/cg/src/GenericNodes.h` and the headers for the nodes used in the graph, which can be found in `cg/nodes/cpp`. Those headers are part of the CMSIS-DSP Pack. They are optional so you'll need to select the compute graph extension in the pack.
If you have declared new nodes in `graph.py` then you'll need to provide an implementation.
More details and explanations can be found in the documentation for the examples. The first example is a deep dive giving all the details about the Python and C++ sides of the tool:
* [Example 1 : how to describe a simple graph](documentation/example1.md)
* [Example 2 : More complex example with delay and CMSIS-DSP](documentation/example2.md)
* [Example 3 : Working example with CMSIS-DSP and FFT](documentation/example3.md)
* [Example 4 : Same as example 3 but with the CMSIS-DSP Python wrapper](documentation/example4.md)
* [Example 10 : The asynchronous mode](documentation/example10.md)
Examples 5 and 6 are showing how to use the CMSIS-DSP MFCC with a synchronous data flow.
Example 7 is communicating with OpenModelica. The Modelica model (PythonTest) in the example is implementing a Larsen effect.
Example 8 is showing how to define a new custom datatype for the IOs of the nodes. Example 8 is also demonstrating a new feature where an IO can be connected up to 3 inputs and the static scheduler will automatically generate duplicate nodes.
## Frequently asked questions:
There is a [FAQ](FAQ.md) document.
## Options
Several options can be used in the Python script to control the schedule generation. Some options are used by the scheduling algorithm and other options are used by the code generators or the graphviz generator:
### Options for the graph
Those options need to be used on the graph object created with `Graph()`.
For instance :
```python
g = Graph()
g.defaultFIFOClass = "FIFO"
```
#### defaultFIFOClass (default = "FIFO")
Class used for the FIFOs by default. Can also be customized for each connection (`connect` or `connectWithDelay` call) with something like:
`g.connect(src.o,b.i,fifoClass="FIFOClassNameForThisConnection")`
#### duplicateNodeClassName (default = "Duplicate")
Prefix used to generate the duplicate node classes like `Duplicate2`, `Duplicate3` ...
### Options for the scheduling
Those options need to be used on a configuration object passed as argument of the scheduling function. For instance:
```python
conf = Configuration()
conf.debugLimit = 10
sched = g.computeSchedule(config = conf)
```
Note that the configuration object also contains options for the code generators.
#### memoryOptimization (default = False)
When the amount of data written to a FIFO and read from the FIFO is the same, the FIFO is just an array. In this case, depending on the scheduling, the memory used by different arrays may be reused if those arrays are not needed at the same time.
This option is enabling an analysis to optimize the memory usage by merging some buffers when it is possible.
#### sinkPriority (default = True)
Try to prioritize the scheduling of the sinks to minimize the latency between sources and sinks.
When this option is enabled, the tool may not be able to find a schedule in all cases. If it can't find a schedule, it will raise a `DeadLock` exception.
#### displayFIFOSizes (default = False)
During computation of the schedule, the evolution of the FIFO sizes is generated on `stdout`.
#### dumpSchedule (default = False)
During computation of the schedule, the human readable schedule is generated on `stdout`.
### Options for the code generator
#### debugLimit (default = 0)
When `debugLimit` is > 0, the number of iterations of the scheduling is limited to `debugLimit`. Otherwise, the scheduling is running forever or until an error has occurred.
#### dumpFIFO (default = False)
When true, generate some code to dump the FIFO content at runtime. Only useful for debug.
In C++ code generation, it is only available when using the mode `codeArray == False`.
When this mode is enabled, the first line of the scheduler file is :
`#define DEBUGSCHED 1`
and it also enables some debug code in `GenericNodes.h`
#### schedName (default = "scheduler")
Name of the scheduler function used in the generated code.
#### prefix (default = "")
Prefix to add before the FIFO buffer definitions. Those buffers are not static and are global. If you want to use several schedulers in your code, the buffer names used by each should be different.
Another possibility would be to make the buffers static by redefining the macro `CG_BEFORE_BUFFER`
#### Options for C Code Generation only
##### cOptionalArgs (default = "")
Optional arguments to pass to the C API of the scheduler function
It can either use a `string` or a list of `string` where an element is an argument of the function (and should be valid `C`).
##### codeArray (default = True)
When true, the scheduling is defined as an array. Otherwise, a list of function calls is generated.
A list of function calls may be easier to read but if the schedule is long, it is not good for code size. In that case, it is better to encode the schedule as an array rather than a list of function calls.
When `codeArray` is True, the option `switchCase` can also be used.
##### switchCase (default = True)
`codeArray` must be true or this option is ignored.
When the schedule is encoded as an array, it can either be an array of function pointers (`switchCase` false) or an array of indexes for a state machine (`switchCase` true)
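The two encodings can be illustrated in Python (a sketch of the idea, not the generated C code; the node names are taken from the scheduler example earlier in this document):

```python
# The three nodes of the schedule, as plain functions.
def proc():   return "proc"
def sink():   return "sink"
def source(): return "source"

# switchCase == False: the schedule is an array of function pointers.
func_array = [proc, sink, source]
ran_pointers = [f() for f in func_array]

# switchCase == True: the schedule is an array of indexes
# driving a state machine (the switch / case in the C code).
schedule = [0, 1, 2]
def dispatch(node_id):
    if node_id == 0: return proc()
    if node_id == 1: return sink()
    if node_id == 2: return source()
ran_indexes = [dispatch(i) for i in schedule]

# Both encodings execute the same sequence of nodes.
```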
##### eventRecorder (default = False)
Enable the generation of `CMSIS EventRecorder` instrumentation in the code. The CMSIS-DSP Pack is providing the definition of 3 events:
* Schedule iteration
* Node execution
* Error
##### customCName (default = "custom.h")
Name of the custom header in the generated C code. If you use several schedulers, you may want to use different headers for each one.
##### postCustomCName (default = "")
Name of a custom header in the generated C code, included after all of the other includes.
##### genericNodeCName (default = "GenericNodes.h")
Name of the GenericNodes header in the generated C code. If you use several schedulers, you may want to use different headers for each one.
##### appNodesCName (default = "AppNodes.h")
Name of the AppNodes header in the generated C code. If you use several schedulers, you may want to use different headers for each one.
##### schedulerCFileName (default = "scheduler")
Name of scheduler cpp and header in generated C code. If you use several scheduler, you may want to use different headers for each one.
If the option is set to `xxx`, the names generated will be `xxx.cpp` and `xxx.h`
##### CAPI (default = True)
By default, the scheduler function is callable from C. When false, it is a standard C++ API.
##### CMSISDSP (default = True)
If you don't use any of the CMSIS-DSP datatypes or functions, you don't need to include `arm_math.h` in the scheduler file. This option can then be set to `False`.
##### asynchronous (default = False)
When true, the scheduling is for a dynamic / asynchronous flow. A node may not always produce or consume the same amount of data, so a scheduling can fail. Each node needs to implement a `prepareForRunning` function to identify and recover from FIFO underflows and overflows.
A synchronous schedule is used as a starting point and should describe the average case.
This option implies `codeArray` and `switchCase`, and disables `memoryOptimizations`.
FIFOs that are just buffers in synchronous mode are considered real FIFOs in asynchronous mode.
More information is available in the documentation for [this mode](Dynamic.md).
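A hypothetical configuration fragment enabling this mode (assuming `conf` is the configuration object used with the scheduling functions):

```python
conf.asynchronous = True   # implies codeArray and switchCase
conf.FIFOIncrease = 20     # grow FIFO sizes 20% above the static estimate
```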
##### FIFOIncrease (default 0)
In case of dynamic / asynchronous scheduling, the FIFOs may need to be bigger than what is computed assuming a static / synchronous scheduling. This option is used to increase the FIFO sizes. It represents a percent increase.
For instance, a value of 10 means the FIFOs will have their size updated from `oldSize` to `1.1 * oldSize`, which is `(1 + 10%) * oldSize`.
If the value is a `float` instead of an `int`, it is used directly as the scale factor. For instance, `1.1` would multiply the size by `1.1` and be equivalent to the setting `10` (for 10 percent).
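The rule can be illustrated as follows (a sketch of the semantics above, not the actual generator code; the rounding details in the tool may differ):

```python
def scaled_fifo_size(old_size, fifo_increase):
    """Illustrative model of FIFOIncrease: an int is a percent
    increase, a float is used directly as the scale factor."""
    if isinstance(fifo_increase, float):
        scale = fifo_increase
    else:
        scale = 1.0 + fifo_increase / 100.0
    return int(old_size * scale)

# The settings 10 (10 percent) and 1.1 (direct factor) are equivalent.
assert scaled_fifo_size(100, 10) == 110
assert scaled_fifo_size(100, 1.1) == 110
```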
##### asyncDefaultSkip (default True)
Behavior of a pure function (like a CMSIS-DSP function) in asynchronous mode. When `True`, the execution is skipped if the function can't be executed. If `False`, an error is raised.
If another error recovery is needed, the function must be packaged into a C++ class implementing a `prepareForRunning` function.
#### Options for Python code generation only
##### pyOptionalArgs (default = "")
Optional arguments to pass to the Python version of the scheduler function.
##### customPythonName (default = "custom")
Name of the custom module in the generated Python code. If you use several schedulers, you may want to use a different module for each one.
##### appNodesPythonName (default = "appnodes")
Name of the AppNodes module in the generated Python code. If you use several schedulers, you may want to use a different module for each one.
##### schedulerPythonFileName (default = "sched")
Name of the scheduler file in the generated Python code. If you use several schedulers, you may want to use a different name for each one.
If the option is set to `xxx`, the name generated will be `xxx.py`.
### Options for the graphviz generator
#### horizontal (default = True)
Horizontal or vertical layout for the graph.
#### displayFIFOBuf (default = False)
By default, the graph displays the FIFO sizes. If you want to know which FIFO variable is used in the code, you can set this option to true and the graph will display the FIFO variable names.
### Options for connections
It is now possible to write something like:
```python
g.connect(src.o,b.i,fifoClass="FIFOSource")
```
The `fifoClass` argument makes it possible to select a specific FIFO class in the generated C++ or Python.
Only the `FIFO` class is provided by default. Any new implementation must inherit from `FIFOBase<T>`.
There is also an option to set the scaling factor when used in asynchronous mode:
```python
g.connect(odd.o,debug.i,fifoScale=3.0)
```
When this option is set, it is used instead of the global setting. It must be a float.
## How to build the examples
In folder `ComputeGraph/example/build`, type the `cmake` command:
```bash
cmake -DHOST=YES \
-DDOT="path to dot.EXE" \
-DCMSISCORE="path to cmsis core include directory" \
-G "Unix Makefiles" ..
```
A recent version of the Graphviz dot tool, supporting HTML-like labels, is required.
If cmake is successful, you can type `make` to build the examples. It will also build CMSIS-DSP for the host.
If you don't have Graphviz, the `-DDOT` option can be removed.
If for some reason it does not work, you can go into an example folder (for instance `example1`) and type the commands:
```bash
python graph.py
dot -Tpdf -o test.pdf test.dot
```
It will generate the C++ files for the schedule and a pdf representation of the graph.
Note that the Python code relies on the CMSIS-DSP PythonWrapper, which now also contains the Python scripts for the Synchronous Data Flow.
For `example3` which is using an input file, `cmake` should have copied the input test pattern `input_example3.txt` inside the build folder. The output file will also be generated in the build folder.
`example4` is like `example3` but in pure Python and using the CMSIS-DSP Python wrapper (which must already be installed before trying the example). To run a Python example, you need to go into an example folder and type:
```bash
python main.py
```
`example7` communicates with `OpenModelica`. You need to install the VHTModelica blocks from the [VHT-SystemModeling](https://github.com/ARM-software/VHT-SystemModeling) project on our GitHub.
## Limitations
- CMSIS-DSP integration must be improved to make it easier
- The code requires more comments and cleaning
- A C version of the code generator is missing
- The code generation could provide more flexibility for memory allocation with a choice between:
- Global
- Stack
- Heap
## Default nodes
Here is a list of the nodes supported by default. More can be easily added:
- Unary:
- Unary function with header `void function(T* src, T* dst, int nbSamples)`
- Binary:
- Binary function with header `void function(T* srcA, T* srcB, T* dst, int nbSamples)`
- CMSIS-DSP function:
- It will detect whether it is a unary or binary function.
- The name must not contain the prefix `arm` nor the type suffix
- For instance, use `Dsp("mult",CType(F32),NBSAMPLES)` to use `arm_mult_f32`
- Other CMSIS-DSP functions (with an instance variable) require the creation of a Node if one is not already provided
- CFFT / ICFFT : Use of CMSIS-DSP CFFT. Currently only F32, F16 and Q15
- Zip / Unzip : To zip / unzip streams
- ToComplex : Map a real stream onto a complex stream
- ToReal : Extract real part of a complex stream
- FileSource and FileSink : Read/write float to/from a file (Host only)
- NullSink : Do nothing. Useful for debug
- InterleavedStereoToMono : Interleaved stereo converted to mono with scaling to avoid saturation of the addition
- Python only nodes:
- WavSink and WavSource to use wav files for testing
- VHTSDF : To communicate with OpenModelica using VHTModelica blocks

@ -3,13 +3,11 @@
* Title: CFFT.h
* Description: Node for CMSIS-DSP cfft
*
* $Date: 30 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -45,7 +43,7 @@ public:
status=arm_cfft_init_f32(&sfft,inputSize>>1);
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -57,7 +55,7 @@ public:
return(0);
};
int run() override
int run() final
{
float32_t *a=this->getReadBuffer();
float32_t *b=this->getWriteBuffer();
@ -85,7 +83,7 @@ public:
status=arm_cfft_init_f16(&sfft,inputSize>>1);
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -97,7 +95,7 @@ public:
return(0);
};
int run() override
int run() final
{
float16_t *a=this->getReadBuffer();
float16_t *b=this->getWriteBuffer();
@ -124,7 +122,7 @@ public:
status=arm_cfft_init_q15(&sfft,inputSize>>1);
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -136,7 +134,7 @@ public:
return(0);
};
int run() override
int run() final
{
q15_t *a=this->getReadBuffer();
q15_t *b=this->getWriteBuffer();

@ -3,13 +3,11 @@
* Title: ICFFT.h
* Description: Node for CMSIS-DSP icfft
*
* $Date: 30 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -45,7 +43,7 @@ public:
status=arm_cfft_init_f32(&sifft,inputSize>>1);
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -57,7 +55,7 @@ public:
return(0);
};
int run() override
int run() final
{
float32_t *a=this->getReadBuffer();
float32_t *b=this->getWriteBuffer();
@ -85,7 +83,7 @@ public:
status=arm_cfft_init_f16(&sifft,inputSize>>1);
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -97,7 +95,7 @@ public:
return(0);
};
int run() override
int run() final
{
float16_t *a=this->getReadBuffer();
float16_t *b=this->getWriteBuffer();
@ -125,7 +123,7 @@ public:
status=arm_cfft_init_q15(&sifft,inputSize>>1);
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -137,7 +135,7 @@ public:
return(0);
};
int run() override
int run() final
{
q15_t *a=this->getReadBuffer();
q15_t *b=this->getWriteBuffer();

@ -3,13 +3,11 @@
* Title: InterleavedStereoToMono.h
* Description: Interleaved Stereo to mono stream in Q15
*
* $Date: 06 August 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -40,7 +38,7 @@ public:
InterleavedStereoToMono(FIFOBase<q15_t> &src,FIFOBase<q15_t> &dst):
GenericNode<q15_t,inputSize,q15_t,outputSize>(src,dst){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -52,7 +50,7 @@ public:
return(0);
};
int run() override
int run() final
{
q15_t *a=this->getReadBuffer();
q15_t *b=this->getWriteBuffer();
@ -72,7 +70,7 @@ public:
InterleavedStereoToMono(FIFOBase<q31_t> &src,FIFOBase<q31_t> &dst):
GenericNode<q31_t,inputSize,q31_t,outputSize>(src,dst){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -84,7 +82,7 @@ public:
return(0);
};
int run() override
int run() final
{
q31_t *a=this->getReadBuffer();
q31_t *b=this->getWriteBuffer();
@ -104,7 +102,7 @@ public:
InterleavedStereoToMono(FIFOBase<float32_t> &src,FIFOBase<float32_t> &dst):
GenericNode<float32_t,inputSize,float32_t,outputSize>(src,dst){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -116,7 +114,7 @@ public:
return(0);
};
int run() override
int run() final
{
float32_t *a=this->getReadBuffer();
float32_t *b=this->getWriteBuffer();

@ -3,13 +3,11 @@
* Title: MFCC.h
* Description: Node for CMSIS-DSP MFCC
*
* $Date: 06 October 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -58,7 +56,7 @@ public:
#endif
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -70,7 +68,7 @@ public:
return(0);
};
int run() override
int run() final
{
float32_t *a=this->getReadBuffer();
float32_t *b=this->getWriteBuffer();
@ -101,7 +99,7 @@ public:
#endif
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -113,7 +111,7 @@ public:
return(0);
};
int run() override
int run() final
{
float16_t *a=this->getReadBuffer();
float16_t *b=this->getWriteBuffer();
@ -140,7 +138,7 @@ public:
memory.resize(2*inputSize);
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -152,7 +150,7 @@ public:
return(0);
};
int run() override
int run() final
{
q31_t *a=this->getReadBuffer();
q31_t *b=this->getWriteBuffer();
@ -178,7 +176,7 @@ public:
memory.resize(2*inputSize);
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -190,7 +188,7 @@ public:
return(0);
};
int run() override
int run() final
{
q15_t *a=this->getReadBuffer();
q15_t *b=this->getWriteBuffer();

@ -3,13 +3,11 @@
* Title: NullSink.h
* Description: Sink doing nothing for debug
*
* $Date: 08 August 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -35,7 +33,7 @@ class NullSink: public GenericSink<IN, inputSize>
public:
NullSink(FIFOBase<IN> &src):GenericSink<IN,inputSize>(src){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willUnderflow()
)
@ -46,7 +44,7 @@ public:
return(0);
};
int run() override
int run() final
{
IN *b=this->getReadBuffer();

@ -3,13 +3,11 @@
* Title: OverlapAndAdd.h
* Description: Overlap And Add
*
* $Date: 25 October 2022
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2022 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -40,7 +38,7 @@ public:
memory.resize(overlap);
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -52,7 +50,7 @@ public:
return(0);
};
int run() override
int run() final
{
int i;
IN *a=this->getReadBuffer();

@ -3,13 +3,11 @@
* Title: SlidingBuffer.h
* Description: Sliding buffer
*
* $Date: 25 October 2022
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2022 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -40,7 +38,7 @@ public:
memory.resize(overlap);
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -52,7 +50,7 @@ public:
return(0);
};
int run() override
int run() final
{
IN *a=this->getReadBuffer();
IN *b=this->getWriteBuffer();

@ -3,13 +3,11 @@
* Title: ToComplex.h
* Description: Node to convert real to complex
*
* $Date: 30 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -44,7 +42,7 @@ public:
ToComplex(FIFOBase<IN> &src,FIFOBase<IN> &dst):GenericNode<IN,inputSize,IN,outputSize>(src,dst){
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -56,7 +54,7 @@ public:
return(0);
};
int run() override
int run() final
{
IN *a=this->getReadBuffer();
IN *b=this->getWriteBuffer();

@ -3,13 +3,11 @@
* Title: ToReal.h
* Description: Node to convert complex to reals
*
* $Date: 30 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -43,7 +41,7 @@ public:
ToReal(FIFOBase<IN> &src,FIFOBase<IN> &dst):GenericNode<IN,inputSize,IN,outputSize>(src,dst){
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow()
@ -55,7 +53,7 @@ public:
return(0);
};
int run() override
int run() final
{
IN *a=this->getReadBuffer();
IN *b=this->getWriteBuffer();

@ -3,13 +3,11 @@
* Title: Unzip.h
* Description: Node to unzip a stream of pair
*
* $Date: 30 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -46,7 +44,7 @@ public:
Unzip(FIFOBase<IN> &src,FIFOBase<IN> &dst1,FIFOBase<IN> &dst2):
GenericNode12<IN,inputSize,IN,output1Size,IN,output2Size>(src,dst1,dst2){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow1() ||
this->willOverflow2() ||
@ -62,7 +60,7 @@ public:
/*
2*outputSize1 == 2*outSize2 == inputSize
*/
int run() override
int run() final
{
IN *a=this->getReadBuffer();
IN *b1=this->getWriteBuffer1();

@ -3,13 +3,11 @@
* Title: Zip.h
* Description: Node to zip a pair of stream
*
* $Date: 06 August 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -39,7 +37,7 @@ public:
Zip(FIFOBase<IN> &src1,FIFOBase<IN> &src2,FIFOBase<IN> &dst):
GenericNode21<IN,inputSize,IN,inputSize,IN,outputSize>(src1,src2,dst){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow1() ||
@ -52,7 +50,7 @@ public:
return(0);
};
int run() override
int run() final
{
IN *a1=this->getReadBuffer1();
IN *a2=this->getReadBuffer2();

@ -3,13 +3,11 @@
* Title: FileSink.h
* Description: Node for creating File sinks
*
* $Date: 30 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -35,7 +33,7 @@ class FileSink: public GenericSink<IN, inputSize>
public:
FileSink(FIFOBase<IN> &src, std::string name):GenericSink<IN,inputSize>(src),output(name){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willUnderflow()
)
@ -46,7 +44,7 @@ public:
return(0);
};
int run() override
int run() final
{
IN *b=this->getReadBuffer();

@ -3,13 +3,11 @@
* Title: FileSource.h
* Description: Node for creating File sources
*
* $Date: 30 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -46,7 +44,7 @@ public:
};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willOverflow()
)
@ -57,7 +55,7 @@ public:
return(0);
};
int run() override
int run() final
{
string str;
int i;

@ -3,13 +3,11 @@
* Title: GenericNodes.h
* Description: C++ support templates for the compute graph with static scheduler
*
* $Date: 29 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2022 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
@ -83,16 +81,23 @@ class FIFO<T,length,0,0>: public FIFOBase<T>
FIFO(uint8_t *buffer,int delay=0):mBuffer((T*)buffer),readPos(0),writePos(delay) {};
/* Not used in synchronous mode */
bool willUnderflowWith(int nb) const override {return false;};
bool willOverflowWith(int nb) const override {return false;};
int nbSamplesInFIFO() const override {return 0;};
bool willUnderflowWith(int nb) const final {return false;};
bool willOverflowWith(int nb) const final {return false;};
int nbSamplesInFIFO() const final {return 0;};
T * getWriteBuffer(int nb) override
T * getWriteBuffer(int nb) final
{
T *ret;
if (readPos > 0)
{
/* This is re-aligning the read buffer.
Aligning buffer is better for vectorized code.
But it has an impact since more memcpy are
executed than required.
This is likely to be not so useful in practice
so a future version will optimize the memcpy usage
*/
memcpy((void*)mBuffer,(void*)(mBuffer+readPos),(writePos-readPos)*sizeof(T));
writePos -= readPos;
readPos = 0;
@ -103,7 +108,7 @@ class FIFO<T,length,0,0>: public FIFOBase<T>
return(ret);
};
T* getReadBuffer(int nb) override
T* getReadBuffer(int nb) final
{
T *ret = mBuffer + readPos;
@ -145,16 +150,16 @@ class FIFO<T,length,1,0>: public FIFOBase<T>
FIFO(uint8_t *buffer,int delay=0):mBuffer((T*)buffer),readPos(0),writePos(delay) {};
/* Not used in synchronous mode */
bool willUnderflowWith(int nb) const override {return false;};
bool willOverflowWith(int nb) const override {return false;};
int nbSamplesInFIFO() const override {return 0;};
bool willUnderflowWith(int nb) const final {return false;};
bool willOverflowWith(int nb) const final {return false;};
int nbSamplesInFIFO() const final {return 0;};
T * getWriteBuffer(int nb) override
T * getWriteBuffer(int nb) final
{
return(mBuffer);
};
T* getReadBuffer(int nb) override
T* getReadBuffer(int nb) final
{
return(mBuffer);
}
@ -198,7 +203,7 @@ class FIFO<T,length,0,1>: public FIFOBase<T>
before using this function
*/
T * getWriteBuffer(int nb) override
T * getWriteBuffer(int nb) final
{
T *ret;
@ -221,7 +226,7 @@ class FIFO<T,length,0,1>: public FIFOBase<T>
before using this function
*/
T* getReadBuffer(int nb) override
T* getReadBuffer(int nb) final
{
T *ret = mBuffer + readPos;
@ -230,17 +235,17 @@ class FIFO<T,length,0,1>: public FIFOBase<T>
return(ret);
}
bool willUnderflowWith(int nb) const override
bool willUnderflowWith(int nb) const final
{
return((nbSamples - nb)<0);
}
bool willOverflowWith(int nb) const override
bool willOverflowWith(int nb) const final
{
return((nbSamples + nb)>length);
}
int nbSamplesInFIFO() const override {return nbSamples;};
int nbSamplesInFIFO() const final {return nbSamples;};
#ifdef DEBUGSCHED
void dump()
@ -423,7 +428,7 @@ public:
Duplicate2(FIFOBase<IN> &src,FIFOBase<IN> &dst1,FIFOBase<IN> &dst2):
GenericNode12<IN,inputSize,IN,inputSize,IN,inputSize>(src,dst1,dst2){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willUnderflow() ||
this->willOverflow1() ||
@ -435,7 +440,7 @@ public:
return(0);
};
int run() override {
int run() final {
IN *a=this->getReadBuffer();
IN *b1=this->getWriteBuffer1();
IN *b2=this->getWriteBuffer2();
@ -475,7 +480,7 @@ public:
IN,inputSize,
IN,inputSize>(src,dst1,dst2,dst3){};
int prepareForRunning() override
int prepareForRunning() final
{
if (this->willUnderflow() ||
this->willOverflow1() ||
@ -489,7 +494,7 @@ public:
return(0);
};
int run() override {
int run() final {
IN *a=this->getReadBuffer();
IN *b1=this->getWriteBuffer1();
IN *b2=this->getWriteBuffer2();

@ -1,3 +1,29 @@
/* ----------------------------------------------------------------------
* Project: CMSIS DSP Library
* Title: cg_status.h
* Description: Error code for the Compute Graph
*
*
* Target Processor: Cortex-M and Cortex-A cores
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
* Licensed under the Apache License, Version 2.0 (the License); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an AS IS BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef _CG_STATUS_H_

@ -0,0 +1,106 @@
# C Code generation
## API
```python
def ccode(self,directory,config=Configuration())
```
It is a method of the `Schedule` object returned by `computeSchedule`.
It generates C++ code implementing the static schedule.
* `directory` : The directory where to generate the C++ files
* `config` : An optional configuration object
## Options for C Code Generation
### cOptionalArgs (default = "")
Optional arguments to pass to the C API of the scheduler function.
It can either be a `string` or a list of `string`s, where each element is an argument of the function (and should be valid `C`).
For instance:
```Python
conf.cOptionalArgs=["int someVariable"]
```
### codeArray (default = True)
When true, the scheduling is defined as an array. Otherwise, a list of function calls is generated.
A list of function calls may be easier to read, but if the schedule is long it is not good for code size. In that case, it is better to encode the schedule as an array rather than a list of function calls.
When `codeArray` is True, the option `switchCase` can also be used.
### switchCase (default = True)
`codeArray` must be true or this option is ignored.
When the schedule is encoded as an array, it can either be an array of function pointers (`switchCase` false) or an array of indexes for a state machine (`switchCase` true).
### eventRecorder (default = False)
Enable the generation of `CMSIS EventRecorder` instrumentation in the code. The CMSIS-DSP Pack provides definitions for 3 events:
* Schedule iteration
* Node execution
* Error
### customCName (default = "custom.h")
Name of the custom header in the generated C code. If you use several schedulers, you may want to use a different header for each one.
### postCustomCName (default = "")
Name of a custom header in the generated C code, included after all of the other includes. By default none is used.
### genericNodeCName (default = "GenericNodes.h")
Name of the GenericNodes header in the generated C code. If you use several schedulers, you may want to use a different header for each one.
### appNodesCName (default = "AppNodes.h")
Name of the AppNodes header in the generated C code. If you use several schedulers, you may want to use a different header for each one.
### schedulerCFileName (default = "scheduler")
Name of the scheduler `cpp` and header files in the generated C code. If you use several schedulers, you may want to use a different name for each one.
If the option is set to `xxx`, the names generated will be `xxx.cpp` and `xxx.h`.
### CAPI (default = True)
By default, the scheduler function is callable from C. When false, it is a standard C++ API.
### CMSISDSP (default = True)
If you don't use any of the CMSIS-DSP datatypes or functions, you don't need to include `arm_math.h` in the scheduler file. This option can then be set to `False`.
### asynchronous (default = False)
When true, the scheduling is for a dynamic / asynchronous flow. A node may not always produce or consume the same amount of data, so a scheduling can fail. Each node needs to implement a `prepareForRunning` function to identify and recover from FIFO underflows and overflows.
A synchronous schedule is used as a starting point and should describe the average case.
This option implies `codeArray` and `switchCase`, and disables `memoryOptimizations`.
FIFOs that are just buffers in synchronous mode are considered real FIFOs in asynchronous mode.
More information is available in the documentation for [this mode](../Async.md).
### FIFOIncrease (default 0)
In case of dynamic / asynchronous scheduling, the FIFOs may need to be bigger than what is computed assuming a static / synchronous scheduling. This option is used to increase the FIFO sizes. It represents a percent increase.
For instance, a value of `10` means the FIFOs will have their size updated from `oldSize` to `1.1 * oldSize`, which is `(1 + 10%) * oldSize`.
If the value is a `float` instead of an `int`, it is used directly as the scale factor. For instance, `1.1` would multiply the size by `1.1` and be equivalent to the setting `10` (for 10 percent).
### asyncDefaultSkip (default True)
Behavior of a pure function (like a CMSIS-DSP function) in asynchronous mode. When `True`, the execution is skipped if the function can't be executed. If `False`, an error is raised.
If another error recovery is needed, the function must be packaged into a C++ class implementing a `prepareForRunning` function.

@ -0,0 +1,455 @@
# CPP Nodes and classes
## Mandatory classes
Those classes are defined in `GenericNodes.h`, a header that is always included by the scheduler.
As a consequence, the definitions of those classes are always included: that is what mandatory means here.
### FIFO
FIFO classes are inheriting from the virtual class `FIFOBase`:
```C++
template<typename T>
class FIFOBase{
public:
virtual T* getWriteBuffer(int nb)=0;
virtual T* getReadBuffer(int nb)=0;
virtual bool willUnderflowWith(int nb) const = 0;
virtual bool willOverflowWith(int nb) const = 0;
virtual int nbSamplesInFIFO() const = 0;
};
```
The functions `willUnderflowWith`, `willOverflowWith` and `nbSamplesInFIFO` are only used in asynchronous mode.
If you implement a FIFO for the synchronous mode, you only need to implement `getWriteBuffer` and `getReadBuffer`.
A FIFO must be a template with the following signature:
```C++
template<typename T, int length, int isArray=0, int isAsync = 0>
class FIFO;
```
* `T` is a C datatype that must have value semantic : standard C type like `float` or `struct`
* `length` is the length of the FIFO in **samples**
* `isArray` is set to 1 when the scheduler has identified that the FIFO is always used as a buffer, so a more optimized implementation can be provided for this case
* `isAsync` is set to 1 for the asynchronous mode
If you implement your own FIFO class, it should come from a template with the same arguments. For instance:
```C++
template<typename T, int length, int isArray=0, int isAsync = 0>
class MyCustomFIFO;
```
and it should inherit from `FIFOBase<T>`.
`GenericNodes.h` provides 3 default implementations. They are specializations of the `FIFO` template:
#### FIFO for synchronous mode
```C++
template<typename T, int length>
class FIFO<T,length,0,0>: public FIFOBase<T>
```
#### Buffer for synchronous mode
In some cases, a FIFO is just used as a buffer. An optimized implementation is provided for this case:
```C++
template<typename T, int length>
class FIFO<T,length,1,0>: public FIFOBase<T>
```
In this mode, the FIFO implementation is very light. For instance, for `getWriteBuffer` we have:
```C++
T * getWriteBuffer(int nb) const final
{
return(mBuffer);
};
```
#### FIFO for asynchronous mode
```C++
template<typename T, int length>
class FIFO<T,length,0,1>: public FIFOBase<T>
```
This implementation is a bit heavier and provides useful implementations of the following functions:
```C++
bool willUnderflowWith(int nb) const;
bool willOverflowWith(int nb) const;
int nbSamplesInFIFO() const;
```
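The contract of those functions can be illustrated with a small Python model (a behavioral sketch only, not the C++ implementation; the `write` / `read` helpers stand in for the `getWriteBuffer` / `getReadBuffer` pattern):

```python
class PyFIFO:
    """Behavioral sketch of the FIFO contract (not the C++ implementation)."""
    def __init__(self, length):
        self.length = length     # capacity in samples
        self.data = []

    def nbSamplesInFIFO(self):
        return len(self.data)

    def willUnderflowWith(self, nb):
        return len(self.data) - nb < 0

    def willOverflowWith(self, nb):
        return len(self.data) + nb > self.length

    def write(self, samples):    # stands in for getWriteBuffer + filling it
        assert not self.willOverflowWith(len(samples))
        self.data.extend(samples)

    def read(self, nb):          # stands in for getReadBuffer
        assert not self.willUnderflowWith(nb)
        head, self.data = self.data[:nb], self.data[nb:]
        return head

f = PyFIFO(8)
f.write([1, 2, 3, 4, 5])
print(f.nbSamplesInFIFO())    # 5
print(f.willOverflowWith(4))  # True: 5 + 4 > 8
print(f.read(2))              # [1, 2]
```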
### Nodes
Nodes are inheriting from the virtual class:
```C++
class NodeBase
{
public:
virtual int run()=0;
virtual int prepareForRunning()=0;
};
```
`GenericNode`, `GenericSource` and `GenericSink` provide access to the FIFOs for each IO. The goal of those wrappers is to define the IOs (number of IOs, their type and length) and hide the FIFO API.
There are different versions depending on the number of inputs and/or outputs. Other nodes of that kind can be created by the user if different IO configurations are required:
#### GenericNode
The template is:
```C++
template<typename IN, int inputSize,
typename OUT, int outputSize>
class GenericNode:public NodeBase
```
There is one input and one output.
The constructor is:
```C++
GenericNode(FIFOBase<IN> &src,FIFOBase<OUT> &dst);
```
It takes the input and output FIFOs as arguments. The real type of the FIFO is hidden since the type `FIFOBase` is used, so `GenericNode` can be used with any FIFO implementation.
The main role of this `GenericNode` class is to provide functions to connect to the FIFOs.
The functions to access the FIFO buffers are:
```C++
OUT * getWriteBuffer(int nb = outputSize);
IN * getReadBuffer(int nb = inputSize);
```
`getWriteBuffer` returns a pointer to a buffer of length `nb` where the output samples are written.
`getReadBuffer` returns a pointer to a buffer of length `nb` from which the input samples are read.
`nb` must be chosen so that there is no underflow / overflow. In synchronous mode, this works by design when the length defined in the template arguments is used; that length is thus the default value for `nb`.
This value may be changed in cyclo-static or asynchronous mode. In asynchronous mode, additional functions are provided to test for a possible underflow / overflow **before** getting a pointer to the buffer.
This is done with the following functions, also provided by `GenericNode`:
```C++
bool willOverflow(int nb = outputSize);
bool willUnderflow(int nb = inputSize);
```
All of those functions introduced by `GenericNode` do nothing more than call the underlying FIFO methods, but they hide those FIFOs from the user code: the FIFOs can only be accessed through those APIs.
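The way a node can use those tests in dynamic mode is sketched below (the status names and values are illustrative only; the real codes are defined in `cg_status.h`):

```python
# Illustrative status names and values only: the real codes are defined
# in cg_status.h and must be used in real C++ nodes.
CG_SUCCESS = 0
CG_SKIP_EXECUTION = -1

class AsyncNode:
    """Sketch of the prepareForRunning / run split used in dynamic mode."""
    def __init__(self, input_fifo, needed):
        self.input_fifo = input_fifo   # plain list standing in for a FIFO
        self.needed = needed           # samples required per execution

    def prepareForRunning(self):
        # Decide what to do before running: skip when data is missing.
        if len(self.input_fifo) < self.needed:
            return CG_SKIP_EXECUTION
        return CG_SUCCESS

    def run(self):
        # Consume the samples (processing omitted in this sketch).
        del self.input_fifo[:self.needed]
        return CG_SUCCESS

fifo = [1, 2]
node = AsyncNode(fifo, needed=4)
print(node.prepareForRunning() == CG_SKIP_EXECUTION)   # True: only 2 of 4 samples
fifo.extend([3, 4])
print(node.prepareForRunning() == CG_SUCCESS)          # True: enough data now
```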
#### GenericNode12
Same as `GenericNode` but with two outputs.
```C++
template<typename IN, int inputSize,
typename OUT1, int output1Size,
typename OUT2, int output2Size>
class GenericNode12:public NodeBase
```
It provides:
```C++
IN * getReadBuffer(int nb=inputSize);
OUT1 * getWriteBuffer1(int nb=output1Size);
OUT2 * getWriteBuffer2(int nb=output2Size);
bool willUnderflow(int nb = inputSize);
bool willOverflow1(int nb = output1Size);
bool willOverflow2(int nb = output2Size);
```
#### GenericNode13
Same but with 3 outputs.
#### GenericNode21
Same but with 2 inputs and 1 output.
#### GenericSource
Similar to a `GenericNode` but with no inputs.
#### GenericSink
Similar to a `GenericNode` but with no outputs.
#### Duplicate2
This node is duplicating its input to 2 outputs.
The template is:
```C++
template<typename IN, int inputSize,
typename OUT1,int output1Size,
typename OUT2,int output2Size>
class Duplicate2;
```
Only one specialization of this template makes sense: the outputs must have the same type and the same length as the input.
```C++
template<typename IN, int inputSize>
class Duplicate2<IN,inputSize,
IN,inputSize,
IN,inputSize> :
public GenericNode12<IN,inputSize,
IN,inputSize,
IN,inputSize>
```
#### Duplicate3
Similar to `Duplicate2` but with 3 outputs.
## Optional nodes
Those nodes are not included by default. They can be found in `ComputeGraph/cg/nodes/cpp`.
To use any of them, you just need to include the corresponding header (for instance in your `AppNodes.h` file):
```C++
#include "CFFT.h"
```
### CFFT / CIFFT
Those nodes are for using the CMSIS-DSP FFT.
Template:
```C++
template<typename IN, int inputSize,
typename OUT,int outputSize>
class CFFT;
```
Specializations are provided only for `float32_t`, `float16_t` and `q15_t`.
The wrapper copies the input buffer before computing the FFT (since the CMSIS-DSP FFT modifies its input buffer). It would normally be possible to modify the input buffer even while it is in the input FIFO.
This implementation has made the choice of not touching the input FIFO, at the cost of an additional copy.
Other data types can easily be added based on the provided code: the user can simply implement other specializations.
The inverse FFT is provided by the `CIFFT` class.
### InterleavedStereoToMono
Deinterleaves a stream of stereo samples into **one** stream of mono samples.
Template:
```C++
template<typename IN, int inputSize,
typename OUT,int outputSize>
class InterleavedStereoToMono;
```
For the `q15_t` and `q31_t` specializations, the two inputs are divided by 2 before being added to avoid any overflow.
For the `float32_t` specialization, the output is multiplied by `0.5f` for consistency with the fixed-point versions.
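The behavior of the `float32_t` specialization can be sketched as (a behavioral model, not the CMSIS wrapper code):

```python
def interleaved_stereo_to_mono(samples):
    # Interleaved stream is L R L R ...; the mono output is
    # 0.5 * (left + right) for each pair, matching the float32_t rule.
    left = samples[0::2]
    right = samples[1::2]
    return [0.5 * (l + r) for l, r in zip(left, right)]

print(interleaved_stereo_to_mono([1.0, 3.0, 2.0, 4.0]))  # [2.0, 3.0]
```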
### MFCC
Those nodes are for using the CMSIS-DSP MFCC.
Template:
```C++
template<typename IN, int inputSize,
typename OUT,int outputSize>
class MFCC;
```
Specializations provided for `float32_t`, `float16_t`, `q31_t` and `q15_t`.
The MFCC requires a temporary buffer, so the wrappers allocate a memory buffer during the initialization of the node.
The buffer is allocated as a C++ vector. See the documentation of the MFCC in CMSIS-DSP to learn more about the size of this buffer.
### NullSink
Template:
```C++
template<typename IN, int inputSize>
class NullSink: public GenericSink<IN, inputSize>
```
It is useful for development and debug. This node does nothing but consume its input.
### OverlapAndAdd
Template:
```C++
template<typename IN,int windowSize, int overlap>
class OverlapAdd: public GenericNode<IN,windowSize,IN,windowSize-overlap>
```
There are two sizes in the template arguments: `windowSize` and `overlap`.
From those sizes, the template computes the number of samples consumed and produced by the node.
The implementation is generic but will only build for a type `IN` having an addition operator.
This node uses a small internal memory (a C++ vector) of size `overlap` that is allocated when the node is created.
This node overlaps successive inputs by `overlap` samples and adds the overlapping samples.
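A plausible behavioral model of the overlap-and-add (a sketch only, assuming `overlap` is at most half of `windowSize`; it is not necessarily the exact C++ implementation):

```python
class OverlapAdd:
    # Behavioral sketch: consumes window_size samples per run and
    # produces window_size - overlap samples.
    def __init__(self, window_size, overlap):
        self.window_size = window_size
        self.overlap = overlap
        self.mem = [0.0] * overlap       # tail carried between runs

    def run(self, window):
        assert len(window) == self.window_size
        out_len = self.window_size - self.overlap
        out = list(window[:out_len])
        for i in range(self.overlap):
            out[i] += self.mem[i]        # add the tail of the previous window
        self.mem = list(window[out_len:])
        return out

ola = OverlapAdd(window_size=4, overlap=2)
print(ola.run([1.0, 1.0, 1.0, 1.0]))   # [1.0, 1.0]  (memory was zero)
print(ola.run([1.0, 1.0, 1.0, 1.0]))   # [2.0, 2.0]  (previous tail added)
```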
### SlidingBuffer
Template:
```C++
template<typename IN,int windowSize, int overlap>
class SlidingBuffer: public GenericNode<IN,windowSize-overlap,IN,windowSize>
```
There are two sizes in the template arguments: `windowSize` and `overlap`.
From those sizes, the template computes the number of samples consumed and produced by the node.
The implementation is generic and will work with all types.
This node uses a small internal memory (a C++ vector) of size `overlap` allocated when the node is created.
This node moves a window over the input data with an overlap. The output is the content of the window.
Note that this node is not doing any multiplication with window functions that can be found in signal processing literature. This multiplication has to be implemented in the compute graph in a separate node.
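The sliding-window behavior can be sketched in Python (a behavioral model; `window_size` and `overlap` mirror the template arguments):

```python
class SlidingBuffer:
    # Behavioral sketch: consumes window_size - overlap samples per run
    # and produces window_size samples (kept overlap + new input).
    def __init__(self, window_size, overlap):
        self.window_size = window_size
        self.overlap = overlap
        self.mem = [0.0] * overlap       # kept part of the previous window

    def run(self, new_samples):
        assert len(new_samples) == self.window_size - self.overlap
        out = self.mem + list(new_samples)
        self.mem = out[len(out) - self.overlap:]
        return out

sb = SlidingBuffer(window_size=4, overlap=2)
print(sb.run([1.0, 2.0]))   # [0.0, 0.0, 1.0, 2.0]
print(sb.run([3.0, 4.0]))   # [1.0, 2.0, 3.0, 4.0]
```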
### ToComplex
Template:
```C++
template<typename IN, int inputSize,
typename OUT,int outputSize>
class ToComplex;
```
Convert a stream of reals a b c d ... to complexes a 0 b 0 c 0 d 0 ...
The implementation is generic and does not enforce the required size constraints.
### ToReal
Template:
```C++
template<typename IN, int inputSize,typename OUT,int outputSize>
class ToReal;
```
Convert a stream of complex a 0 b 0 c 0 ... to reals a b c ...
The implementation is generic and does not enforce the required size constraints.
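Both conversions can be sketched as (behavioral models of the interleaved re/im layout described above):

```python
def to_complex(reals):
    # a b c ... -> a 0 b 0 c 0 ... (interleaved real / imaginary parts)
    out = []
    for r in reals:
        out.extend([r, 0.0])
    return out

def to_real(cmplx):
    # a 0 b 0 ... -> a b ... (keep only the real parts)
    return cmplx[0::2]

print(to_complex([1.0, 2.0]))          # [1.0, 0.0, 2.0, 0.0]
print(to_real([1.0, 0.0, 2.0, 0.0]))   # [1.0, 2.0]
```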
### Unzip
Template:
```C++
template<typename IN, int inputSize,
typename OUT1,int output1Size,
typename OUT2,int output2Size>
class Unzip;
```
Unzip a stream a1 a2 b1 b2 c1 c2 ...
into 2 streams:
a1 b1 c1 ...
a2 b2 c2 ...
The implementation is generic and does not enforce the required size constraints.
### Zip
Template:
```C++
template<typename IN1, int inputSize1,
typename IN2,int inputSize2,
typename OUT,int outputSize>
class Zip;
```
Transform two input streams:
a1 b1 c1 ...
a2 b2 c2 ...
into one output stream:
a1 a2 b1 b2 c1 c2 ...
The implementation is generic and does not enforce the required size constraints.
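The `Zip` and `Unzip` stream transformations are easy to model in Python (behavioral sketches of the interleaving described above):

```python
def unzip(stream):
    # a1 a2 b1 b2 ... -> (a1 b1 ..., a2 b2 ...)
    return stream[0::2], stream[1::2]

def zip_streams(s1, s2):
    # a1 b1 ..., a2 b2 ... -> a1 a2 b1 b2 ...
    out = []
    for a, b in zip(s1, s2):
        out.extend([a, b])
    return out

print(unzip([1, 2, 3, 4, 5, 6]))          # ([1, 3, 5], [2, 4, 6])
print(zip_streams([1, 3, 5], [2, 4, 6]))  # [1, 2, 3, 4, 5, 6]
```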
### Host
Those nodes are for the host (Windows, Linux, Mac). They can be useful to experiment with a compute graph.
By default there are no nodes to read / write `.wav` files but you can easily add some if needed (`dr_wav.h` is a simple way to add `.wav` reading / writing and is freely available on the web).
#### FileSink
Template
```C++
template<typename IN, int inputSize>
class FileSink: public GenericSink<IN, inputSize>
```
Writes the input samples to a file. The implementation is generic and uses iostream to write the datatype.
The constructor has an additional argument : the name/path of the output file:
```C++
FileSink(FIFOBase<IN> &src, std::string name)
```
#### FileSource
Template:
```C++
template<typename OUT,int outputSize> class FileSource;
```
There is only one specialization, for the `float32_t` type.
It reads a text file with one float per line and generates a stream of floats.
At the end of the file, 0s are generated on the output indefinitely.
The constructor has an additional argument : the name/path of the input file:
```C++
FileSource(FIFOBase<float32_t> &dst,std::string name)
```

@ -0,0 +1,29 @@
# Common options for the code generators
Global options for the code generators. There are also specific options for the C, Python and Graphviz generators; they are described in other parts of the documentation.
## debugLimit (default = 0)
When `debugLimit` is > 0, the number of iterations of the scheduling is limited to `debugLimit`. Otherwise, the scheduling runs forever or until an error has occurred.
## dumpFIFO (default = False)
When `True`, some code is generated to dump the FIFO contents at **runtime**. Only useful for debug.
In C++ code generation, it is only available when using the mode `codeArray == False`.
When this mode is enabled, the first line of the scheduler file is:
`#define DEBUGSCHED 1`
and it also enables some debug code in `GenericNodes.h`.
## schedName (default = "scheduler")
Name of the scheduler function used in the generated code.
## prefix (default = "")
Prefix added before the FIFO buffer definitions. Those buffers are not static and are global. If you want to use several schedulers in your code, the buffer names used by each should be different.
Another possibility would be to make the buffers static by redefining the macro `CG_BEFORE_BUFFER`.

@ -0,0 +1,129 @@
# Generic and function nodes
The generic and function nodes are the basic nodes that you use to create other kinds of nodes in the graph.
There are 3 generic classes provided by the framework to be used to create new nodes:
* `GenericSource`
* `GenericNode`
* `GenericSink`
They are defined in `cmsisdsp.cg.scheduler`
There are 3 other classes that can be used to create new nodes from functions:
* `Unary`
* `Binary`
* `Dsp`
## Generic Nodes
Any new kind of node must inherit from one of those classes. Those classes provide the methods `addInput` and/or `addOutput` to define new IOs.
The method `typeName` from the parent class must be overridden.
A new kind of node is generally defined as:
```python
class ProcessingNode(GenericNode):
    def __init__(self,name,theType,inLength,outLength):
        GenericNode.__init__(self,name)
        self.addInput("i",theType,inLength)
        self.addOutput("o",theType,outLength)

    @property
    def typeName(self):
        return "ProcessingNode"
```
See the [simple](../examples/simple/README.md) example for more explanation about how to define a new node.
### Methods
The constructor of the node uses `addInput` and/or `addOutput` to define new IOs.
```python
def addInput(self,name,theType,theLength):
```
* `name` is the name of the input. It will become a property of the Python object so it must not conflict with existing properties. If `name` is, for instance, "i", then the input can be accessed with `node.i` in the code
* `theType` is the datatype of the IO. It must inherit from `CGStaticType` (see below for more details about defining the types)
* `theLength` is the amount of **samples** consumed by this IO at each execution of the node
```python
def addOutput(self,name,theType,theLength):
```
* `name` is the name of the output. It will become a property of the Python object so it must not conflict with existing properties. If `name` is, for instance, "o", then the output can be accessed with `node.o` in the code
* `theType` is the datatype of the IO. It must inherit from `CGStaticType` (see below for more details about defining the types)
* `theLength` is the amount of **samples** produced by this IO at each execution of the node
```python
@property
def typeName(self):
    return "ProcessingNode"
```
This method defines the name of the C++ class implementing the wrapper for this node.
### Datatypes
Datatypes for the IOs are inheriting from `CGStaticType`.
Currently there are two classes defined:
* `CType` for the standard CMSIS-DSP types
* `CStructType` for a C struct
#### CType
You create such a type with `CType(id)` where `id` is one of the constants coming from the Python wrapper:
* F64
* F32
* F16
* Q31
* Q15
* Q7
* UINT32
* UINT16
* UINT8
* SINT32
* SINT16
* SINT8
For instance, to define a `float32_t` type for an IO you can use `CType(F32)`
#### CStructType
The constructor has the following definition
```python
def __init__(self,name,python_name,size_in_bytes):
```
* `name` is the name of the C struct
* `python_name` is the name of the Python class implementing this type (when you generate a Python schedule)
* `size_in_bytes` is the size of the struct. It should take into account padding. It is used in case of buffer sharing since the datatype of the shared buffer is `int8_t`. The Python script must be able to compute the size of those buffers and needs to know the size of the structure.
In Python, there is no `struct`: this datatype is mapped to an object. Objects have reference semantics, whereas the compute graph FIFOs assume value semantics.
As a consequence, on the Python side you should never copy those structs directly since that would only copy the reference. You should instead copy the members of the struct.
If you don't plan on generating a Python scheduler, you can just use whatever name you want for `python_name`. It will be ignored by the C++ code generation.
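The recommended member-by-member copy can be sketched as follows (`MyComplex` here is a hypothetical Python class mapped to a C struct):

```python
class MyComplex:
    # Hypothetical Python class mapped to a C struct "complex"
    def __init__(self, re=0.0, im=0.0):
        self.re = re
        self.im = im

read_buf = [MyComplex(1.0, 2.0)]   # samples read from a FIFO
write_buf = [MyComplex()]          # samples to be written to a FIFO

# Wrong: write_buf[0] = read_buf[0] would copy the reference, so a later
# mutation of one sample would silently change the other.

# Right: copy the members so the two buffers stay independent.
write_buf[0].re = read_buf[0].re
write_buf[0].im = read_buf[0].im

print(write_buf[0] is read_buf[0])   # False: independent objects
```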
## Function and constant nodes
A compute graph C++ wrapper is useful when the software components you use have a state that needs to be initialized in the C++ constructor and preserved between successive calls to the `run` method of the wrapper.
Most CMSIS-DSP functions have no state. The compute graph framework provides some ways to easily use functions in the graph without having to write a wrapper.
This feature is relying on the nodes:
* `Unary`
* `Binary`
* `Dsp`
* `Constant`
All of this is explained in detail in the [simple example with CMSIS-DSP](../examples/simpledsp/README.md).

@ -0,0 +1,57 @@
# API of the Graph Class
## Creating a connection
Those methods must be applied to a graph object created with `Graph()`. The `Graph` class is defined inside `cmsisdsp.cg.scheduler` from the CMSIS-DSP Python wrapper.
```python
def connect(self,input_io,output_io,fifoClass=None,fifoScale = 1.0):
```
Typically this method is used as:
```python
the_graph = Graph()
# Connect the source output to the processing node input
the_graph.connect(src.o,processing.i)
```
There are two optional arguments:
* `fifoClass` : To use a different C++ class for implementing the connection between the two IOs (it is also possible to change the FIFO class globally by setting an option on the graph; see below). Only the `FIFO` class is provided by default. Any new implementation must inherit from `FIFOBase<T>`
* `fifoScale` : In asynchronous mode, it is a scaling factor to increase the length of the FIFO compared to what has been computed by the synchronous approximation. This setting can also be set globally using the scheduler options. `fifoScale` is overriding the global setting. It must be a `float` (not an `int`).
```python
def connectWithDelay(self,input_io,output_io,delay,fifoClass=None,fifoScale=1.0):
```
The only difference with the previous function is the `delay` argument. It can be used like:
```python
the_graph.connectWithDelay(src.o,processing.i,10)
```
The `delay` is the number of samples contained in the FIFO at start (initialized to zero). The FIFO length (computed by the scheduling) is generally bigger by this amount of samples. The result is that the output is delayed by `delay` samples.
It is generally useful when the graph has some loops to make it schedulable.
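The effect of the initial delay can be sketched with a pre-filled queue (an illustration of the FIFO semantics, not the generated code):

```python
from collections import deque

# connectWithDelay pre-fills the FIFO with `delay` zeros, so the sink
# first reads zeros and the source data comes out `delay` samples later.
delay = 3
fifo = deque([0] * delay)

produced = [1, 2, 3, 4]
consumed = []
for sample in produced:
    fifo.append(sample)              # source writes one sample
    consumed.append(fifo.popleft())  # sink reads one sample

print(consumed)   # [0, 0, 0, 1] : output delayed by 3 samples
```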
## Options for the graph
Those options need to be set on the graph object created with `Graph()`.
For instance :
```python
g = Graph()
g.defaultFIFOClass = "FIFO"
```
### defaultFIFOClass (default = "FIFO")
Class used for FIFOs by default. Can also be customized for each connection (`connect` or `connectWithDelay` call).
### duplicateNodeClassName (default = "Duplicate")
Prefix used to generate the duplicate node classes like `Duplicate2`, `Duplicate3` ...

@ -0,0 +1,24 @@
# Graphviz generation
## API
```python
def graphviz(self,f,config=Configuration())
```
It is a method of the `Schedule` object returned by `computeSchedule`.
* `f` : An open file where the graphviz description is written
* `config` : An optional configuration object
## Options for the graphviz generator
### horizontal (default = True)
Horizontal or vertical layout for the graph.
### displayFIFOBuf (default = False)
By default, the graph displays the FIFO sizes computed as a result of the scheduling. If you want to know the FIFO variable names used in the code, you can set this option to `True` and the graph will display the FIFO variable names instead.

@ -0,0 +1,79 @@
# Memory optimizations
## Buffers
Sometimes, a FIFO is in fact a buffer. In the graph below, the source is writing 5 samples and the sink is reading 5 samples.
![buffer](buffer.png)
The scheduling will obviously be something like:
`Source, Sink, Source, Sink ...`
In this case, the FIFO is used as a simple buffer. The read and the write are always taking place from the start of the buffer.
The schedule generator will detect FIFOs that are used as buffers, and the FIFO implementation is then replaced by a buffer: the third argument of the template (`isArray`) is set to one:
```C++
FIFO<float32_t,FIFOSIZE0,1,0> fifo0(buf1);
```
## Buffer sharing
When several FIFOs are used as buffers then it may be possible to share the underlying memory for all of those buffers. This optimization is enabled by setting `memoryOptimization` to `true` in the configuration object:
```python
conf.memoryOptimization=True
```
The optimization depends on how the graph has been scheduled.
With the following graph there is a possibility for buffer sharing:
![memory](memory.png)
Without `memoryOptimization`, the FIFOs consume 60 bytes (4 bytes * 5 samples * 3 FIFOs). With `memoryOptimization`, only 40 bytes are needed.
You cannot share memory for the input / output of a node since a node needs both to read and write for its execution. This imposes some constraints on the graph.
The constraints are internally represented by a different graph that represents when buffers are live at the same time : the interference graph. The input / output buffers of a node are live at the same time. Graph coloring is used to identify, from this graph of interferences, when memory for buffers can be shared.
The interference graph is highly dependent on how the compute graph is scheduled: a buffer is live when a write has taken place but no read has yet consumed the full content.
For the above compute graph and its computed schedule, the interference graph would be:
![inter](inter.png)
Adjacent vertices in the graph should use different colors. A coloring of this graph is equivalent to assigning memory areas. Graph coloring of the previous interference graph is giving the following buffer sharing:
![fifos](fifos.png)
The dimension of a shared buffer is the maximum over all the edges using this buffer.
In the C++ code it is represented as:
```C++
#define BUFFERSIZE0 20
CG_BEFORE_BUFFER
uint8_t buf0[BUFFERSIZE0]={0};
```
`uint8_t` is used (instead of the `float32_t` of this example) because different edges of the graph may use different datatypes.
It is really important that you use the macro `CG_BEFORE_BUFFER` to align this buffer so that the alignment is coherent with the datatype used on all the FIFOs.
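Graph coloring of an interference graph can be sketched with a toy greedy algorithm (an illustration only; the tool's actual algorithm may differ):

```python
# Toy greedy coloring of an interference graph: vertices are FIFO buffers,
# edges join buffers that are live at the same time. Buffers receiving the
# same color can share one memory area.
def greedy_coloring(vertices, edges):
    neighbors = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    color = {}
    for v in vertices:
        used = {color[n] for n in neighbors[v] if n in color}
        c = 0
        while c in used:
            c += 1               # pick the smallest unused color
        color[v] = c
    return color

# fifo1 is live at the same time as fifo0 and fifo2,
# so fifo0 and fifo2 can share a buffer:
print(greedy_coloring(["fifo0", "fifo1", "fifo2"],
                      [("fifo0", "fifo1"), ("fifo1", "fifo2")]))
# {'fifo0': 0, 'fifo1': 1, 'fifo2': 0}
```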
### Shared buffer sizing
Let's look at a more complex example to see how the size of the shared buffer is computed:
![shared_complex](shared_complex.png)
The source is generating 10 samples instead of 5. The FIFOs are using 80 bytes without buffer sharing.
With buffer sharing, 60 bytes are used. The buffer sharing is:
![shared_complex_buffer](shared_complex_buffer.png)
Buffer 1 is used by first and last edge in the graph. The dimension of this buffer is 40 bytes : big enough to be usable by edge 0 and edge 3 in the graph.

@ -0,0 +1,30 @@
# Python API
Python APIs to describe the nodes and graph and generate the C++, Python or Graphviz code.
1. ## [Graph class](Graph.md)
2. ## [Generic and function nodes](Generic.md)
3. ## Scheduler
1. ### [Schedule computation](SchedOptions.md)
2. ### Code generation
1. #### [C++ Code generation](CCodeGen.md)
2. #### [Python code generation](PythonGen.md)
3. #### [Graphviz representation](GraphvizGen.md)
4. #### [Common options](CodegenOptions.md)

@ -0,0 +1,34 @@
# Python code generation
## API
```python
def pythoncode(self,directory,config=Configuration())
```
It is a method of the `Schedule` object returned by `computeSchedule`.
It generates Python code to implement the static schedule.
* `directory` : The directory where the Python files are generated
* `config` : An optional configuration object
## Options for Python code generation
### pyOptionalArgs (default = "")
Optional arguments to pass to the Python version of the scheduler function
### customPythonName (default = "custom")
Name of the custom header in the generated Python code. If you use several schedulers, you may want to use different headers for each one.
### appNodesPythonName (default = "appnodes")
Name of the AppNodes header in the generated Python code. If you use several schedulers, you may want to use different headers for each one.
### schedulerPythonFileName (default = "sched")
Name of the scheduler file in the generated Python code. If you use several schedulers, you may want to use different file names for each one.
If the option is set to `xxx`, the generated file will be named `xxx.py`

@ -0,0 +1,65 @@
# Python Nodes and classes
(DOCUMENTATION TO BE WRITTEN)
## Mandatory classes
FIFO
GenericNode
GenericNode12
GenericNode13
GenericNode21
GenericSource
GenericSink
OverlapAdd
SlidingBuffer
## Optional nodes
CFFT
CIFFT
InterleavedStereoToMono
MFCC
NullSink
ToComplex
ToReal
Unzip
Zip
Duplicate
Duplicate2
Duplicate3
### Host
FileSink
FileSource
WavSource
WavSink
NumpySink
VHTSource
VHTSink

@ -0,0 +1,49 @@
# Schedule computation
## API
```python
def computeSchedule(self,config=Configuration()):
```
This is a method on the `Graph` object. It can take an optional `Configuration` object.
It returns a `Schedule` object. This object contains:
* A description of the static schedule
* The computed size of the FIFOs
* The FIFOs
* The buffers for the FIFOs (with sharing when possible if memory optimizations were enabled)
* A rewritten graph with `Duplicate` nodes inserted
## Options for the scheduling
Those options need to be set on a configuration object passed as argument of the scheduling function. For instance:
```python
conf = Configuration()
conf.debugLimit = 10
sched = g.computeSchedule(config = conf)
```
Note that the configuration object also contains options for the code generators. They are described in other parts of the documentation.
### memoryOptimization (default = False)
When the amount of data written to a FIFO and read from the FIFO is the same, the FIFO is just an array. In this case, depending on the scheduling, the memory used by different arrays may be reused if those arrays are not needed at the same time.
This option is enabling an analysis to optimize the memory usage by merging some buffers when it is possible.
### sinkPriority (default = True)
Try to prioritize the scheduling of the sinks to minimize the latency between sources and sinks.
When this option is enabled, the tool may not be able to find a schedule in all cases. If it can't find a schedule, it will raise a `DeadLock` exception.
### displayFIFOSizes (default = False)
During computation of the schedule, the evolution of the FIFO sizes is generated on `stdout`.
### dumpSchedule (default = False)
During computation of the schedule, the human readable schedule is generated on `stdout`.

@ -1,483 +0,0 @@
# Example 1
In this example we will see how to describe the following graph:
<img src="graph1.PNG" alt="graph1" style="zoom:50%;" />
The framework comes with some default blocks. But for this example, we will create new blocks. The blocks that you need to create must be described with a simple Python class and a corresponding simple C++ class.
## The steps
It looks complex because there is a lot of information but the process is always the same:
1. You define new kinds of nodes in Python. They define the IOs, sample types and amount of data read/written on each IO
2. You create instances of those new kinds of nodes
3. You connect them in a graph and generate a schedule
4. In your AppNodes.h file, you implement the new kinds of nodes with C++ templates:
   1. The class is generally not doing a lot: defining the IOs and the function to call when run
5. If you need more control over the initialization, it is possible to pass additional arguments to the node constructors and to the scheduler function.
## Python code
Let's analyze the file `graph.py` in the `example1` folder. This file is describing the graph and the node and is calling the Python functions to generate the dot and C++ files.
First, we import the Python package describing the compute graph and its scheduler:
```python
from cmsisdsp.cg.scheduler import *
```
Then, we describe the new kind of blocks that we need : Source, ProcessingNode and Sink.
```python
class Sink(GenericSink):
    def __init__(self,name,theType,inLength):
        GenericSink.__init__(self,name)
        self.addInput("i",theType,inLength)

    @property
    def typeName(self):
        return "Sink"
```
When creating a new kind of node (here a sink) we always need to do 2 things:
- Add a type in `typeName`. It will be used to create objects in C++ or Python, so it must be a valid C++ or Python class name;
- Add inputs and outputs. The convention is that an input is named "i" and an output "o". When there are several inputs they are named "ia", "ib", etc.
- For a sink you can only add an input, so the function `addOutput` is not available.
- The constructor takes a length and a type. They are used to create the IO.
- When there are several inputs or outputs, they are ordered alphabetically.
It is important to know the ID of the corresponding IO in the C code.
The definition of a new kind of Source is very similar:
```python
class Source(GenericSource):
    def __init__(self,name,theType,inLength):
        GenericSource.__init__(self,name)
        self.addOutput("o",theType,inLength)

    @property
    def typeName(self):
        return "Source"
```
Then, for the processing node, we could define it directly. But often there will be several processing nodes in a graph, so it is useful to create a new `Node` class and inherit from it.
```python
class Node(GenericNode):
    def __init__(self,name,theType,inLength,outLength):
        GenericNode.__init__(self,name)
        self.addInput("i",theType,inLength)
        self.addOutput("o",theType,outLength)
```
Note that this new kind of block has no type. It just has an input and an output.
Now we can define the Processing node:
```python
class ProcessingNode(Node):
    @property
    def typeName(self):
        return "ProcessingNode"
```
We just define its type.
Once this is done, we can start creating instances of those nodes. We will also need to define the type for the samples (`float32` in this example). The functions and constants are defined in `cg.types`.
```python
floatType=CType(F32)
```
It is also possible to use a custom datatype; `example8` gives an example:
```python
complexType=CStructType("complex","MyComplex",8)
```
This is defining a new datatype that is mapped to the type `complex` in C/C++ and the class `MyComplex` in Python. The last argument is the size in bytes of the struct in C.
The type complex may be defined with:
```c
typedef struct {
    float re;
    float im;
} complex;
```
**Note that:**
- The value **must have** value semantics in C/C++. So avoid classes
- In Python, the classes have reference semantics which implies some constraints:
  - You should never modify an object from the read buffer
  - You should only change the fields of an object in the write buffer
  - If you need a new object: copy or create a new object. Never use an object from the read buffer as-is if you intend to modify it
Once a datatype has been defined and chosen, we can define the nodes for the graph:
```python
src=Source("source",floatType,5)
b=ProcessingNode("filter",floatType,7,5)
sink=Sink("sink",floatType,5)
```
For each node, we define:
- The name (name of the variable in the C++ or Python generated code)
- The type for the inputs and outputs
- The number of samples consumed / produced on the IOs
- Inputs are listed first for the number of samples
For `ProcessingNode` we are adding additional arguments to show how it is possible to add other arguments for initializing a node in the generated code:
```python
b.addLiteralArg(4)
b.addLiteralArg("Test")
b.addVariableArg("someVariable")
```
The C++ objects of type `ProcessingNode` take 3 arguments in addition to the IOs. For those arguments, we pass an int, a string and a variable name.
Now that the nodes have been created, we can create the graph and connect the nodes:
```python
g = Graph()
g.connect(src.o,b.i)
g.connect(b.o,sink.i)
```
Then, before we generate a schedule, we can define some configuration:
```python
conf=Configuration()
conf.debugLimit=1
```
Since it is stream-based processing, the schedule should run forever. For testing, we can limit the number of iterations. Here the generated code will run just one iteration of the schedule.
This configuration object can be used as an argument of the scheduling function (named parameter `config`) and must be used as an argument of the code generating functions.
There are other fields for the configuration:
- `dumpFIFO` : Will dump the content of the output FIFOs after each execution of a node (the code generator inserts calls to the FIFO dump function)
- `displayFIFOSizes` : During the computation of the schedule, the Python script displays the evolution of the FIFO lengths.
- `schedName` : The name of the scheduler function (`scheduler` by default)
- `cOptionalArgs` and `pyOptionalArgs` : For passing additional arguments to the scheduling function
- `prefix` : To prefix the name of the global buffers
- `memoryOptimization` : Experimental. It attempts to reuse buffer memory and share it between several FIFOs
- `codeArray` : Experimental. When a schedule is very long, representing it as a sequence of function calls is not good for the code size of the generated solution. When this option is enabled, the schedule is described with an array. It implies that the pure function calls can no longer be inlined; they are replaced by new nodes which are automatically generated.
- `eventRecorder` : Enable the support for the CMSIS Event Recorder.
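As a rough illustration of the `codeArray` idea (a Python sketch with hypothetical node functions, not the C++ the generator emits), the schedule becomes an array of node indices driving a dispatch loop:

```python
# Sketch of the codeArray idea: the schedule is stored as an array of
# node indices instead of being unrolled into a long sequence of calls.
trace = []

def source():     trace.append("source")
def processing(): trace.append("processing")
def sink():       trace.append("sink")

nodes = [processing, sink, source]  # index -> node, as in a generated switch
schedule = [2, 2, 0, 1]             # compact description of one iteration

for node_id in schedule:            # dispatch loop replacing inlined calls
    nodes[node_id]()

print(trace)
```

The generated C++ uses the same pattern: a static `schedule` array and a `switch` on the node index inside the scheduling loop.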
In example 1, we pass a variable to initialize the node of type `ProcessingNode`. So, it would be useful if this variable was an argument of the scheduler function. We define:
```python
conf.cOptionalArgs="int someVariable"
```
This will be added after the error argument of the scheduling function.
Once we have a configuration object, we can start to compute the schedule and generate the code:
```python
sched = g.computeSchedule()
print("Schedule length = %d" % sched.scheduleLength)
print("Memory usage %d bytes" % sched.memory)
```
A schedule is computed. We also display:
- The length of the schedule
- The total amount of memory used by all the FIFOs
We could also have used:
```python
sched = g.computeSchedule(config=conf)
```
to use the configuration object if we needed to display the FIFO lengths.
Now that we have a schedule, we can generate the Graphviz representation and the C++ code:
```python
with open("test.dot","w") as f:
sched.graphviz(f)
sched.ccode("generated",conf)
```
The C++ code will be generated in the `generated` folder of `example1`: `scheduler.cpp`
## The C++ code
The C++ code generated in `scheduler.cpp` and `scheduler.h` in the `generated` folder relies on some additional files which must be provided by the developer:
- `custom.h` : to define some custom initialization or `#define` used by the code
- `AppNodes.h` : to define the new C++ blocks
Let's look at `custom.h` first:
### custom.h
```c++
#ifndef _CUSTOM_H_
#define _CUSTOM_H_
#endif
```
It is empty in `example1`. This file can be used to include or define some variables and constants used by the network.
### AppNodes.h
All the new nodes defined in the Python script must also be defined in the C++ code. They are very similar to the Python code but a bit more verbose.
```c++
template<typename IN, int inputSize>
class Sink: public GenericSink<IN, inputSize>
{
public:
Sink(FIFOBase<IN> &src):GenericSink<IN,inputSize>(src){};
int prepareForRunning() override
{
if (this->willUnderflow())
{
return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
}
return(0);
};
int run() override
{
IN *b=this->getReadBuffer();
printf("Sink\n");
for(int i=0;i<inputSize;i++)
{
std::cout << (int)b[i] << std::endl;
}
return(0);
};
};
```
The `Sink` inherits from `GenericSink`. In the constructor, we pass the FIFOs: input FIFOs first, followed by the output FIFOs when they are used (for a sink, there are no output FIFOs).
In the template parameters, we pass the type/length for each IO: inputs first, followed by outputs (when there are any).
The node must have a `run` function which implements the processing.
The `prepareForRunning` function is used only in dynamic / asynchronous mode. But it must be defined (even if not used) in static / synchronous mode or the code won't build.
Here the sink is just dumping the content of the buffer to stdout. The amount of data read by `getReadBuffer` is defined in the `GenericSink` and comes from the template parameter.
The `Source` definition is very similar:
```C++
template<typename OUT,int outputSize>
class Source: public GenericSource<OUT,outputSize>
{
public:
Source(FIFOBase<OUT> &dst):GenericSource<OUT,outputSize>(dst),mCounter(0){};
int prepareForRunning() override
{
if (this->willOverflow())
{
return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
}
return(0);
};
int run() override
{
OUT *b=this->getWriteBuffer();
printf("Source\n");
for(int i=0;i<outputSize;i++)
{
b[i] = (OUT)mCounter++;
}
return(0);
};
int mCounter;
};
```
In this example, the source is just counting, and we only have output FIFOs.
`getWriteBuffer` and `getReadBuffer` must always be called on the IO ports to ensure that
the FIFOs are not overflowing or underflowing (**even if the run function is doing nothing**).
No error detection is done because the static schedule ensures that no error will occur, provided you don't forget to call those functions in your nodes.
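To see why the static schedule makes runtime checks unnecessary, here is a standalone greedy simulation (plain Python, assuming the FIFO sizes `11` and `5` and the sample counts of example 1; the real scheduler may pick an equivalent but different order):

```python
# Standalone simulation (not the generated C++): with the FIFO sizes
# computed by the scheduler (11 and 5 samples), a valid schedule never
# overflows or underflows. Source writes 5, filter reads 7 / writes 5,
# sink reads 5.
CAP0, CAP1 = 11, 5      # FIFO sizes from the generated code
fifo0 = fifo1 = 0       # current fill levels
order = []

for _ in range(17):     # schedule length reported for example 1
    if fifo1 >= 5:                          # sink can consume
        fifo1 -= 5; order.append("sink")
    elif fifo0 >= 7 and fifo1 + 5 <= CAP1:  # filter can run
        fifo0 -= 7; fifo1 += 5; order.append("filter")
    elif fifo0 + 5 <= CAP0:                 # source can produce
        fifo0 += 5; order.append("source")
    assert 0 <= fifo0 <= CAP0 and 0 <= fifo1 <= CAP1

print(order.count("source"), order.count("filter"), order.count("sink"))
```

After 17 steps both FIFOs are empty again: the schedule is periodic, which is exactly the property the static analysis guarantees.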
Finally, the processing node:
```C++
template<typename IN, int inputSize,typename OUT,int outputSize>
class ProcessingNode: public GenericNode<IN,inputSize,OUT,outputSize>
{
public:
ProcessingNode(FIFOBase<IN> &src,FIFOBase<OUT> &dst,int,const char*,int):GenericNode<IN,inputSize,OUT,outputSize>(src,dst){};
int prepareForRunning() override
{
if (this->willOverflow() ||
this->willUnderflow()
)
{
return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
}
return(0);
};
int run() override
{
printf("ProcessingNode\n");
IN *a=this->getReadBuffer();
OUT *b=this->getWriteBuffer();
b[0] =(OUT)a[3];
return(0);
};
};
```
The processing node is (very arbitrarily) copying the value at index 3 of the input to index 0 of the output.
The processing node takes 3 arguments after the FIFOs in the constructor because the Python script defines 3 additional arguments for this node: an `int`, a `string` and another `int` passed through a variable in the scheduler.
### scheduler.cpp
The generated code is first including the needed headers:
```C++
#include "arm_math.h"
#include "custom.h"
#include "GenericNodes.h"
#include "AppNodes.h"
#include "scheduler.h"
```
- CMSIS-DSP header
- Custom definitions
- Generic nodes from `GenericNodes.h`
- Application nodes
- scheduler API
Then, the generated code is defining the buffers for the FIFOs:
```C++
/***********
FIFO buffers
************/
#define FIFOSIZE0 11
float32_t buf0[FIFOSIZE0]={0};
#define FIFOSIZE1 5
float32_t buf1[FIFOSIZE1]={0};
```
Then, the scheduling function is generated:
```C++
uint32_t scheduler(int *error,int someVariable) {
```
A value `<0` in `error` means there was an error during the execution.
The returned value is the number of schedule iterations fully executed when the error occurred.
`someVariable` is defined in the Python script. The Python script can add as many arguments as needed, with whatever types are needed.
The scheduling function starts with the definition of some variables used for debug and statistics:
```C++
int cgStaticError=0;
uint32_t nbSchedule=0;
int32_t debugCounter=1;
```
It is followed by the definition of the FIFOs:
```C++
/*
Create FIFOs objects
*/
FIFO<float32_t,FIFOSIZE0> fifo0(buf0);
FIFO<float32_t,FIFOSIZE1> fifo1(buf1);
```
Then, the nodes are created and connected to the FIFOs:
```C++
/*
Create node objects
*/
ProcessingNode<float32_t,7,float32_t,5> filter(fifo0,fifo1,4,"Test",someVariable);
Sink<float32_t,5> sink(fifo1);
Source<float32_t,5> source(fifo0);
```
One can see that the processing node has 3 additional arguments in addition to the FIFOs. Those arguments are defined in the Python script. The third argument is `someVariable` and this variable must be in scope; that's why the Python script adds an argument `someVariable` to the scheduler API. So, one can pass information to any node from outside of the scheduler using those additional arguments.
And finally, the function is entering the scheduling loop:
```C++
while((cgStaticError==0) && (debugCounter > 0))
{
nbSchedule++;
cgStaticError = source.run();
CHECKERROR;
```
`CHECKERROR` is a macro defined in the generated scheduler (when not already defined). It just tests if `cgStaticError < 0` and breaks out of the loop if it is the case. It can be redefined by the user.
Since an application may want to use several compute graphs, the names of the `sched` and `customInit` functions can be customized in the configuration object on the Python side:
```python
config.schedName = "sched"
```
A prefix can also be added before the name of the global FIFO buffers:
```python
config.prefix="bufferPrefix"
```
## Summary
It looks complex because there is a lot of information, but the process is always the same:
1. You define new kinds of nodes in Python. They define the IOs: the type and amount of data read/written on each IO
2. You create Python instances of those new kinds of nodes
3. You connect them in a graph and generate a schedule
4. In your `AppNodes.h`, you implement the new kinds of nodes with a C++ template:
   1. The template generally defines the IOs and the function to call when run
   2. It should be minimal. The template is just a wrapper. Don't forget those nodes are created on the stack in the scheduler function, so they should not be too big
5. If you need more control over the initialization, it is possible to pass additional arguments to the node constructors and to the scheduler function.

@ -1,27 +0,0 @@
# Example 10
This example is implementing a dynamic / asynchronous mode.
It is enabled in `graph.py` with:
`conf.asynchronous = True`
The FIFO sizes are doubled with:
`conf.FIFOIncrease = 100`
The graph implemented in this example is:
![graph10](graph10.png)
There is a global iteration count corresponding to one execution of the schedule.
The odd source is generating a value only when the count is odd.
The even source is generating a value only when the count is even.
The processing is adding its inputs. If no data is available on an input, 0 is used.
In case of FIFO overflow or underflow, any node will skip its execution.
All nodes generate or consume one sample, but the FIFOs have a size of 2 because of the 100% increase requested in the configuration settings.
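The behavior described above can be mimicked with a standalone Python sketch (an approximation only: the real nodes skip execution on overflow rather than using a bounded deque):

```python
# Standalone sketch of the dynamic flow: two sources that alternately
# produce, an adder that substitutes 0 for a missing input, and FIFOs
# with a capacity of 2 samples.
from collections import deque

odd_fifo, even_fifo = deque(maxlen=2), deque(maxlen=2)
out = []

for count in range(1, 7):        # global iteration count
    if count % 2 == 1:           # odd source fires only on odd counts
        odd_fifo.append(count)
    else:                        # even source fires only on even counts
        even_fifo.append(count)
    # the adder uses 0 when no data is available on an input
    a = odd_fifo.popleft() if odd_fifo else 0
    b = even_fifo.popleft() if even_fifo else 0
    out.append(a + b)

print(out)
```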

@ -1,127 +0,0 @@
# Example 3
This example is implementing a working example with FFT. The graph is:
![graph3](graph3.PNG)
The example is:
- Providing a file source which is reading a source file and then padding with zero
- A sliding window
- A multiplication with a Hann window
- A conversion to/from complex
- Use of CMSIS-DSP FFT/IFFT
- Overlap and add
- File sink writing the result into a file
The new features compared to previous examples are:
- The constant array HANN
- The CMSIS-DSP FFT
## Constant array
It is like in example 2 where the constant was a float.
Now, the constant is an array:
```python
hann=Constant("HANN")
```
In custom.h, this array is defined as:
```C++
extern const float32_t HANN[256];
```
## CMSIS-DSP FFT
The FFT node cannot be created using a `Dsp` node in Python because the FFT requires a specific initialization. So, a Python class and a C++ class must be created:
```python
class CFFT(GenericNode):
def __init__(self,name,inLength):
GenericNode.__init__(self,name)
self.addInput("i",floatType,2*inLength)
self.addOutput("o",floatType,2*inLength)
@property
def typeName(self):
return "CFFT"
```
Look at the definition of the inputs and outputs: the FFT uses complex numbers, so the ports have twice the number of float samples. The argument of the constructor is the FFT length in complex samples.
We suggest using, as arguments of the blocks, a number of samples which is meaningful for the block, and using the lengths in standard data types (f32, q31 ...) when defining the IOs.
So here, the number of complex samples is used as the argument, but the IOs use the number of floats required to encode those complex numbers.
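This float/complex bookkeeping can be checked with a short snippet (plain Python): N complex samples are stored as 2*N interleaved floats, which is why the CFFT ports declare `2*inLength` float samples:

```python
# N complex samples are stored as 2*N interleaved floats:
# real, imag, real, imag, ... (the layout used by CMSIS-DSP complex arrays)
def complex_to_floats(samples):
    floats = []
    for z in samples:
        floats.extend([float(z.real), float(z.imag)])
    return floats

spectrum = [1 + 2j, 3 + 4j]             # 2 complex samples
interleaved = complex_to_floats(spectrum)
print(len(interleaved))                 # 4 floats for 2 complex samples
```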
The corresponding C++ class is:
```C++
template<typename IN, int inputSize,typename OUT,int outputSize>
class CFFT: public GenericNode<IN,inputSize,OUT,outputSize>
{
public:
CFFT(FIFOBase<IN> &src,FIFOBase<OUT> &dst):
GenericNode<IN,inputSize,OUT,outputSize>(src,dst){
arm_status status;
status=arm_cfft_init_f32(&sfft,inputSize>>1);
};
int prepareForRunning() override
{
if (this->willOverflow() ||
this->willUnderflow()
)
{
return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
}
return(0);
};
int run() override {
IN *a=this->getReadBuffer();
OUT *b=this->getWriteBuffer();
memcpy((void*)b,(void*)a,outputSize*sizeof(IN));
arm_cfft_f32(&sfft,b,0,1);
return(0);
};
arm_cfft_instance_f32 sfft;
};
```
It is verbose but not difficult. The constructor initializes the CMSIS-DSP FFT instance and connects to the FIFOs (through `GenericNode`).
The run function applies `arm_cfft_f32`. Since this function modifies its input buffer, there is a `memcpy`. It is not strictly needed here: the read buffer could be modified in place by the CFFT, but that would make it more difficult to debug if you'd like to inspect the content of the FIFOs.
This node is provided in `cg/nodes/cpp`, so there is no need to define it. You can just use it by including the right headers.
It can be used by just doing in your `AppNodes.h` file :
```c++
#include "CFFT.h"
```
From Python side it would be:
```python
from cmsisdsp.cg.scheduler import *
```
The scheduler module automatically includes the default nodes.

@ -5,22 +5,22 @@ set(Python_FIND_REGISTRY "LAST")
find_package (Python COMPONENTS Interpreter)
function(sdf TARGET)
function(sdf TARGET SCRIPT DOTNAME)
if (DOT)
add_custom_command(TARGET ${TARGET} PRE_BUILD
BYPRODUCTS ${CMAKE_CURRENT_SOURCE_DIR}/test.pdf
COMMAND ${DOT} -Tpdf -o ${CMAKE_CURRENT_SOURCE_DIR}/test.pdf ${CMAKE_CURRENT_SOURCE_DIR}/test.dot
BYPRODUCTS ${CMAKE_CURRENT_SOURCE_DIR}/${DOTNAME}.pdf
COMMAND ${DOT} -Tpdf -o ${CMAKE_CURRENT_SOURCE_DIR}/${DOTNAME}.pdf ${CMAKE_CURRENT_SOURCE_DIR}/${DOTNAME}.dot
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/test.dot
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${DOTNAME}.dot
VERBATIM
)
endif()
add_custom_command(OUTPUT ${CMAKE_CURRENT_SOURCE_DIR}/generated/scheduler.cpp
${CMAKE_CURRENT_SOURCE_DIR}/test.dot
COMMAND ${Python_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/graph.py
${CMAKE_CURRENT_SOURCE_DIR}/${DOTNAME}.dot
COMMAND ${Python_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/${SCRIPT}
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/graph.py
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${SCRIPT}
VERBATIM
)
target_sources(${TARGET} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/generated/scheduler.cpp)
@ -73,6 +73,9 @@ add_subdirectory(example6 bin_example6)
add_subdirectory(example8 bin_example8)
add_subdirectory(example9 bin_example9)
add_subdirectory(example10 bin_example10)
add_subdirectory(simple bin_simple)
add_subdirectory(simpledsp bin_simpledsp)
add_subdirectory(cyclo bin_cyclo)
# Python examples
add_subdirectory(example4 bin_example4)

@ -0,0 +1,64 @@
## How to build the examples
First, you must install the `CMSIS-DSP` PythonWrapper:
```
pip install cmsisdsp
```
The functions and classes inside the cmsisdsp wrapper can be used to describe and generate the schedule.
You need a recent Graphviz dot tool supporting the HTML-like labels. You'll also need `cmake` and `make`.
In folder `ComputeGraph/example/build`, type the `cmake` command:
```bash
cmake -DHOST=YES \
-DDOT="path to dot.EXE" \
-DCMSISCORE="path to cmsis core include directory" \
-G "Unix Makefiles" ..
```
The core include directory is something like `CMSIS_5/Core` ...
If cmake is successful, you can type `make` to build the examples. It will also build CMSIS-DSP for the host.
If you don't have Graphviz, the option `-DDOT` can be removed.
If for some reason it does not work, you can go into an example folder (for instance example1), and type the commands:
```bash
python graph.py
dot -Tpdf -o test.pdf test.dot
```
It will generate the C++ files for the schedule and a pdf representation of the graph.
Note that the Python code is relying on the CMSIS-DSP PythonWrapper which is now also containing the Python scripts for the Synchronous Data Flow.
For `example3` which is using an input file, `cmake` should have copied the input test pattern `input_example3.txt` inside the build folder. The output file will also be generated in the build folder.
`example4` is like `example3` but in pure Python and using the CMSIS-DSP Python wrapper (which must already be installed before trying the example). To run a Python example, you need to go into an example folder and type:
```bash
python main.py
```
`example7` is communicating with `OpenModelica`. You need to install the VHTModelica blocks from the [AVH-SystemModeling](https://github.com/ARM-software/VHT-SystemModeling) project on our GitHub
# List of examples
* [Simple example without CMSIS-DSP](simple/README.md) : **How to get started**
* [Simple example with CMSIS-DSP](simpledsp/README.md) : **How to get started with CMSIS-DSP**
* [Example 1](example1/README.md) : Same as the simple example but explaining how to add arguments to the scheduler API and node constructors. This example is also giving a **detailed explanation of the C++ code** generated for the scheduler
* [Example 2](example2/README.md) : Explain how to use CMSIS-DSP pure functions (no state) and add delay on the arcs of the graph. Explain some configuration options for the schedule generation.
* [Example 3](example3/README.md) : A full signal processing example with CMSIS-DSP using FFT and sliding windows and overlap and add node
* [Example 4](example4/README.md) : Same as example 3 but where we generate a Python implementation rather than a C++ implementation. The resulting graph can be executed thanks to the CMSIS-DSP Python wrapper
* [Example 5](example5/README.md) : Another pure Python example showing how to compute a sequence of Q15 MFCC and generate an animation (using also the CMSIS-DSP Python wrapper)
* [Example 6](example6/README.md) : Same as example 5 but with C++ code generation
* [Example 7](example7/README.md) : Pure Python example demonstrating a communication between the compute graph and OpenModelica to generate a Larsen effect
* [Example 8](example8/README.md) : Introduce structured datatype for the samples and implicit `Duplicate` nodes for the graph
* [Example 9](example9/README.md) : Check that duplicate nodes and arc delays are working together and a scheduling is generated
* [Example 10 : The dynamic dataflow mode](example10/README.md)
* [Cyclo-static scheduling](cyclo/README.md)

@ -1,36 +0,0 @@
# Reference statistics
The different examples should return following schedule statistics:
## Example 1
Schedule length = 17
Memory usage 64 bytes
## Example 2
Schedule length = 302
Memory usage 10720 bytes
## Example 3
Schedule length = 25
Memory usage 11264 bytes
## Example 4
Schedule length = 25
Memory usage 11264 bytes
## Example 5
Schedule length = 292
Memory usage 6614 bytes
## Example 6
Schedule length = 17
Memory usage 2204 bytes
## Example 7
Schedule length = 3
Memory usage 512 bytes
## Example 8
Schedule length = 37
Memory usage 288 bytes

@ -0,0 +1,155 @@
/* ----------------------------------------------------------------------
* Project: CMSIS DSP Library
* Title: AppNodes.h
* Description: Application nodes for Example cyclo
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*
* Licensed under the Apache License, Version 2.0 (the License); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an AS IS BASIS, WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef _APPNODES_H_
#define _APPNODES_H_
#include <iostream>
template<typename IN, int inputSize>
class Sink: public GenericSink<IN, inputSize>
{
public:
Sink(FIFOBase<IN> &src):GenericSink<IN,inputSize>(src){};
int prepareForRunning() final
{
if (this->willUnderflow())
{
return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
}
return(0);
};
int run() final
{
IN *b=this->getReadBuffer();
printf("Sink\n");
for(int i=0;i<inputSize;i++)
{
std::cout << (int)b[i] << std::endl;
}
return(0);
};
};
template<typename OUT,int outputSize>
class Source: public GenericSource<OUT,outputSize>
{
public:
Source(FIFOBase<OUT> &dst):GenericSource<OUT,outputSize>(dst),
mPeriod(0),mValuePeriodStart(0){};
int getSamplesForPeriod() const
{
if (mPeriod == 0)
{
return(3);
}
return(2);
}
void updatePeriod(){
mPeriod++;
mValuePeriodStart = 3;
if (mPeriod == 2)
{
mPeriod = 0;
mValuePeriodStart = 0;
}
}
int prepareForRunning() final
{
/* Cyclo-static scheduling does not make sense in
asynchronous mode, so the default outputSize is used.
This function is never used in cyclo-static scheduling.
*/
if (this->willOverflow())
{
return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
}
return(0);
};
int run() final{
OUT *b=this->getWriteBuffer(getSamplesForPeriod());
printf("Source\n");
for(int i=0;i<getSamplesForPeriod();i++)
{
b[i] = mValuePeriodStart + (OUT)i;
}
updatePeriod();
return(0);
};
protected:
int mPeriod;
OUT mValuePeriodStart;
};
template<typename IN, int inputSize,typename OUT,int outputSize>
class ProcessingNode;
template<typename IN, int inputOutputSize>
class ProcessingNode<IN,inputOutputSize,IN,inputOutputSize>:
public GenericNode<IN,inputOutputSize,IN,inputOutputSize>
{
public:
ProcessingNode(FIFOBase<IN> &src,
FIFOBase<IN> &dst):GenericNode<IN,inputOutputSize,
IN,inputOutputSize>(src,dst){};
int prepareForRunning() final
{
if (this->willOverflow() ||
this->willUnderflow())
{
return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
}
return(0);
};
int run() final{
printf("ProcessingNode\n");
IN *a=this->getReadBuffer();
IN *b=this->getWriteBuffer();
for(int i=0;i<inputOutputSize;i++)
{
b[i] = a[i]+1;
}
return(0);
};
};
#endif

@ -0,0 +1,13 @@
cmake_minimum_required (VERSION 3.14)
include(CMakePrintHelpers)
project(cyclo)
add_executable(cyclo main.cpp)
sdf(cyclo create.py cyclo)
add_sdf_dir(cyclo)
target_include_directories(cyclo PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
target_include_directories(cyclo PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/generated)

@ -0,0 +1,18 @@
# Makefile for MSVC compiler on Windows
SHELL = cmd
CC = cl.exe
RM = del /Q /F
INCLUDES = /Igenerated /I../../cg/src /I.
WINFLAGS = /DWIN32 /D_WINDOWS /EHsc /Zi /Ob0 /Od /RTC1 -MDd
CFLAGS = $(INCLUDES) $(WINFLAGS)
all:
$(CC) /Fecyclo.exe $(CFLAGS) generated/scheduler.cpp main.cpp
clean:
$(RM) main.obj
$(RM) scheduler.obj
$(RM) cyclo.ilk
$(RM) cyclo.exe
$(RM) *.pdb

@ -0,0 +1,143 @@
# README
This example is inside the folder `examples/cyclo` of the Compute graph folder. Before reading this documentation, you need to understand the principles explained in the [simple example without CMSIS-DSP](../simple/README.md).
![cyclo](docassets/cyclo.png)
The nodes are:
* A source generating floating point values (0,1,2,3,4).
* A processing node adding 1 to those values
* A sink printing its input values (1,2,3,4,5)
The graph generates an infinite stream of values: 1,2,3,4,5,1,2,3,4,5,1,2,3,4,5 ... For this example, the number of iterations is limited so that it does not run forever.
The big difference compared to the [simple example without CMSIS-DSP](../simple/README.md) is the source node:
* The source node no longer generates samples in packets of 5
* The first call to the source node will generate 3 samples
* The second call to the source node will generate 2 samples
* Further executions just reproduce this pattern: 3,2,3,2 ...
The flow is not static, but it is periodically static : **cyclo-static scheduling**.
## C++ Implementation
The C++ wrapper must take into account this periodic pattern of sample generation.
The first call should generate only 3 samples and the second call only 2.
We want the first call to generate `0,1,2` and the second call to generate `3,4`.
The C++ wrapper has been modified for this. Here is the body of the `run` function:
```C++
OUT *b=this->getWriteBuffer(getSamplesForPeriod());
printf("Source\n");
for(int i=0;i<getSamplesForPeriod();i++)
{
b[i] = mValuePeriodStart + (OUT)i;
}
updatePeriod();
```
The `run` function generates only the number of samples required for the current period.
The generated values start from `mValuePeriodStart`.
The template for `Source` has not changed:
```C++
template<typename OUT,int outputSize>
class Source: public GenericSource<OUT,outputSize>
```
`outputSize` is a template parameter, so it cannot be the list `[3,2]`.
The generated code uses the max of the values, so here `3`:
```C++
Source<float32_t,3> source(fifo0);
```
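To double-check the period logic, here is a Python mirror of the C++ `Source` (a sketch for illustration, not generated code):

```python
# Python mirror of the cyclo-static C++ Source: periods of 3 then 2
# samples, producing 0,1,2 then 3,4, then repeating.
class CycloSource:
    def __init__(self):
        self.period = 0               # mPeriod in the C++ code
        self.value_period_start = 0   # mValuePeriodStart in the C++ code

    def run(self):
        n = 3 if self.period == 0 else 2       # getSamplesForPeriod()
        out = [self.value_period_start + i for i in range(n)]
        self.period += 1                        # updatePeriod()
        self.value_period_start = 3
        if self.period == 2:
            self.period = 0
            self.value_period_start = 0
        return out

src = CycloSource()
print([src.run() for _ in range(4)])
```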
## Expected output:
```
Schedule length = 26
Memory usage 88 bytes
```
The schedule length is `26` compared to `19` for the simple example where the source generates samples in packets of 5. The number of source node executions must be a multiple of 2 in this graph because the period of sample generation has length 2. In the original graph, the number of executions could be odd. That's why there are more executions in this cyclo-static scheduling.
The memory usage (FIFO) is the same as the one for the simple example without cyclo-static scheduling.
The expected output of the execution is still 1,2,3,4,5,1,2,3,4,5 ... but the scheduling is different. There are more source executions.
```
Start
Source
Source
Source
ProcessingNode
Sink
1
2
3
4
5
Source
Source
Source
ProcessingNode
Sink
1
2
3
4
5
Source
Source
Source
ProcessingNode
Sink
1
2
3
4
5
Sink
1
2
3
4
5
Source
Source
ProcessingNode
Sink
1
2
3
4
5
Source
Source
Source
ProcessingNode
Sink
1
2
3
4
5
Sink
1
2
3
4
5
```

@ -0,0 +1,31 @@
# Include definition of the nodes
from nodes import *
# Include definition of the graph
from graph import *
# Create a configuration object
conf=Configuration()
# The number of schedule iteration is limited to 1
# to prevent the scheduling from running forever
# (which should be the case for a stream computation)
conf.debugLimit=1
# Disable inclusion of CMSIS-DSP headers so that we don't have
# to recompile CMSIS-DSP for such a simple example
conf.CMSISDSP = False
# Compute a static scheduling of the graph
# The size of FIFO is also computed
scheduling = the_graph.computeSchedule(config=conf)
# Print some statistics about the compute schedule
# and the memory usage
print("Schedule length = %d" % scheduling.scheduleLength)
print("Memory usage %d bytes" % scheduling.memory)
# Generate the C++ code for the static scheduler
scheduling.ccode("generated",conf)
# Generate a graphviz representation of the graph
with open("cyclo.dot","w") as f:
scheduling.graphviz(f)

@ -0,0 +1,5 @@
#ifndef _CUSTOM_H_
#define _CUSTOM_H_
typedef float float32_t;
#endif

@ -0,0 +1,48 @@
digraph structs {
node [shape=plaintext]
rankdir=LR
edge [arrowsize=0.5]
fontname="times"
processing [label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="4">
<TR>
<TD ALIGN="CENTER" PORT="i">processing<BR/>(ProcessingNode)</TD>
</TR>
</TABLE>>];
sink [label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="4">
<TR>
<TD ALIGN="CENTER" PORT="i">sink<BR/>(Sink)</TD>
</TR>
</TABLE>>];
source [label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="4">
<TR>
<TD ALIGN="CENTER" PORT="i">source<BR/>(Source)</TD>
</TR>
</TABLE>>];
source:i -> processing:i [label="f32(11)"
,headlabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >7</FONT>
</TD></TR></TABLE>>
,taillabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >[3, 2]</FONT>
</TD></TR></TABLE>>]
processing:i -> sink:i [label="f32(11)"
,headlabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >5</FONT>
</TD></TR></TABLE>>
,taillabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >7</FONT>
</TD></TR></TABLE>>]
}


@ -0,0 +1,170 @@
/*
Generated with CMSIS-DSP Compute Graph Scripts.
The generated code is not covered by CMSIS-DSP license.
The support classes and code is covered by CMSIS-DSP license.
*/
#include "custom.h"
#include "GenericNodes.h"
#include "AppNodes.h"
#include "scheduler.h"
#if !defined(CHECKERROR)
#define CHECKERROR if (cgStaticError < 0) \
{\
goto errorHandling;\
}
#endif
#if !defined(CG_BEFORE_ITERATION)
#define CG_BEFORE_ITERATION
#endif
#if !defined(CG_AFTER_ITERATION)
#define CG_AFTER_ITERATION
#endif
#if !defined(CG_BEFORE_SCHEDULE)
#define CG_BEFORE_SCHEDULE
#endif
#if !defined(CG_AFTER_SCHEDULE)
#define CG_AFTER_SCHEDULE
#endif
#if !defined(CG_BEFORE_BUFFER)
#define CG_BEFORE_BUFFER
#endif
#if !defined(CG_BEFORE_FIFO_BUFFERS)
#define CG_BEFORE_FIFO_BUFFERS
#endif
#if !defined(CG_BEFORE_FIFO_INIT)
#define CG_BEFORE_FIFO_INIT
#endif
#if !defined(CG_BEFORE_NODE_INIT)
#define CG_BEFORE_NODE_INIT
#endif
#if !defined(CG_AFTER_INCLUDES)
#define CG_AFTER_INCLUDES
#endif
#if !defined(CG_BEFORE_SCHEDULER_FUNCTION)
#define CG_BEFORE_SCHEDULER_FUNCTION
#endif
#if !defined(CG_BEFORE_NODE_EXECUTION)
#define CG_BEFORE_NODE_EXECUTION
#endif
#if !defined(CG_AFTER_NODE_EXECUTION)
#define CG_AFTER_NODE_EXECUTION
#endif
CG_AFTER_INCLUDES
/*
Description of the scheduling.
*/
static unsigned int schedule[26]=
{
2,2,2,0,1,2,2,2,0,1,2,2,2,0,1,1,2,2,0,1,2,2,2,0,1,1,
};
CG_BEFORE_FIFO_BUFFERS
/***********
FIFO buffers
************/
#define FIFOSIZE0 11
#define FIFOSIZE1 11
#define BUFFERSIZE1 11
CG_BEFORE_BUFFER
float32_t buf1[BUFFERSIZE1]={0};
#define BUFFERSIZE2 11
CG_BEFORE_BUFFER
float32_t buf2[BUFFERSIZE2]={0};
CG_BEFORE_SCHEDULER_FUNCTION
uint32_t scheduler(int *error)
{
int cgStaticError=0;
uint32_t nbSchedule=0;
int32_t debugCounter=1;
CG_BEFORE_FIFO_INIT;
/*
Create FIFOs objects
*/
FIFO<float32_t,FIFOSIZE0,0,0> fifo0(buf1);
FIFO<float32_t,FIFOSIZE1,0,0> fifo1(buf2);
CG_BEFORE_NODE_INIT;
/*
Create node objects
*/
ProcessingNode<float32_t,7,float32_t,7> processing(fifo0,fifo1);
Sink<float32_t,5> sink(fifo1);
Source<float32_t,3> source(fifo0);
/* Run several schedule iterations */
CG_BEFORE_SCHEDULE;
while((cgStaticError==0) && (debugCounter > 0))
{
/* Run a schedule iteration */
CG_BEFORE_ITERATION;
for(unsigned long id=0 ; id < 26; id++)
{
CG_BEFORE_NODE_EXECUTION;
switch(schedule[id])
{
case 0:
{
cgStaticError = processing.run();
}
break;
case 1:
{
cgStaticError = sink.run();
}
break;
case 2:
{
cgStaticError = source.run();
}
break;
default:
break;
}
CG_AFTER_NODE_EXECUTION;
CHECKERROR;
}
debugCounter--;
CG_AFTER_ITERATION;
nbSchedule++;
}
errorHandling:
CG_AFTER_SCHEDULE;
*error=cgStaticError;
return(nbSchedule);
}

@ -0,0 +1,26 @@
/*
Generated with CMSIS-DSP Compute Graph Scripts.
The generated code is not covered by CMSIS-DSP license.
The support classes and code is covered by CMSIS-DSP license.
*/
#ifndef _SCHEDULER_H_
#define _SCHEDULER_H_
#ifdef __cplusplus
extern "C"
{
#endif
extern uint32_t scheduler(int *error);
#ifdef __cplusplus
}
#endif
#endif

@ -0,0 +1,39 @@
# Include definitions from the Python package to
# define datatype for the IOs and to have access to the
# Graph class
from cmsisdsp.cg.scheduler import *
# Include definition of the nodes
from nodes import *
# Define the datatype we are using for all the IOs in this
# example
floatType=CType(F32)
# Instantiate a Source node with a float datatype and
# a cyclo-static pattern of [3,2] samples (successive
# executions of the source in the C code will generate
# 3 then 2 samples)
# "source" is the name of the C variable that will identify
# this node
src=Source("source",floatType,[3,2])
# Instantiate a Processing node using a float data type for
# both the input and output. The number of samples consumed
# on the input and produced on the output is 7 each time
# the node is executed in the C code
# "processing" is the name of the C variable that will identify
# this node
processing=ProcessingNode("processing",floatType,7,7)
# Instantiate a Sink node with a float datatype and consuming
# 5 samples each time the node is executed in the C code
# "sink" is the name of the C variable that will identify
# this node
sink=Sink("sink",floatType,5)
# Create a Graph object
the_graph = Graph()
# Connect the source to the processing node
the_graph.connect(src.o,processing.i)
# Connect the processing node to the sink
the_graph.connect(processing.o,sink.i)

@ -0,0 +1,11 @@
#include <cstdio>
#include <cstdint>
#include "scheduler.h"
int main(int argc, char const *argv[])
{
int error;
printf("Start\n");
uint32_t nbSched=scheduler(&error);
return 0;
}

@ -0,0 +1,77 @@
# Include definitions from the Python package
from cmsisdsp.cg.scheduler import GenericNode,GenericSink,GenericSource
### Define new types of Nodes
class ProcessingNode(GenericNode):
"""
Definition of a ProcessingNode for the graph
Parameters
----------
name : str
Name of the C variable identifying this node
in the C code
theType : CGStaticType
The datatype for the input and output
inLength : int
The number of samples consumed by input
outLength : int
The number of samples produced on output
"""
def __init__(self,name,theType,inLength,outLength):
GenericNode.__init__(self,name)
self.addInput("i",theType,inLength)
self.addOutput("o",theType,outLength)
@property
def typeName(self):
"""The name of the C++ class implementing this node"""
return "ProcessingNode"
class Sink(GenericSink):
"""
Definition of a Sink node for the graph
Parameters
----------
name : str
Name of the C variable identifying this node
in the C code
theType : CGStaticType
The datatype for the input
inLength : int
The number of samples consumed by input
"""
def __init__(self,name,theType,inLength):
GenericSink.__init__(self,name)
self.addInput("i",theType,inLength)
@property
def typeName(self):
"""The name of the C++ class implementing this node"""
return "Sink"
class Source(GenericSource):
"""
Definition of a Source node for the graph
Parameters
----------
name : str
Name of the C variable identifying this node
in the C code
theType : CGStaticType
The datatype for the output
outLength : int
The number of samples produced on output
"""
def __init__(self,name,theType,outLength):
GenericSource.__init__(self,name)
self.addOutput("o",theType,outLength)
@property
def typeName(self):
"""The name of the C++ class implementing this node"""
return "Source"

@ -3,13 +3,10 @@
* Title: AppNodes.h
* Description: Application nodes for Example 1
*
* $Date: 29 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*

@ -6,7 +6,7 @@ project(Example1)
add_executable(example1 main.cpp)
sdf(example1)
sdf(example1 graph.py test)
add_sdf_dir(example1)
target_include_directories(example1 PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})

@ -0,0 +1,320 @@
# Example 1
Please refer to the [simple example](../simple/README.md) to have an overview of how to define a graph and its nodes and how to generate the C++ code for the static scheduler. This document only explains additional details:
* How to define new arguments for the C implementation of the nodes
* How to define new arguments for the C API of the scheduler function
* Detailed description of the generated C++ scheduler
The graph is nearly the same as the one in the [simple example](../simple/README.md), but the processing node only generates 5 samples in this example:
<img src="docassets/graph1.PNG" alt="graph1" style="zoom:100%;" />
Contrary to the [simple example](../simple/README.md), there is only one Python script, `graph.py`, and it contains everything: nodes, graph description and C++ code generation.
## Defining new arguments for a node and the scheduler
For `ProcessingNode`, we add additional arguments in this example to show how a node can be initialized in the generated code.
If `processing` is the node, we can add arguments with the APIs `addLiteralArg` and `addVariableArg`.
```python
processing.addLiteralArg(4,"testString")
processing.addVariableArg("someVariable")
```
* `addLiteralArg(4,"testString")` will pass the value `4` as first additional argument of the C++ constructor (after the FIFOs) and the string `"testString"` as second additional argument of the C++ constructor (after the FIFOs)
* `addVariableArg("someVariable")` will pass the variable `someVariable` as third additional argument of the C++ constructor (after the FIFOs)
The constructor API will look like:
```C++
ProcessingNode(FIFOBase<IN> &src,FIFOBase<OUT> &dst,int,const char*,int)
```
This API is defined in `AppNodes.h` by the developer. The types are not generated by the scripts. Here the variable `someVariable` is chosen to have type `int`, hence the last argument of the constructor has type `int`. But this is not imposed by the Python script, which just declares the existence of a variable.
In the generated scheduler, the constructor is used as:
```C++
ProcessingNode<float32_t,7,float32_t,5> processing(fifo0,fifo1,4,"testString",someVariable);
```
This variable `someVariable` must come from somewhere. The API of the scheduler is:
```C++
extern uint32_t scheduler(int *error,int someVariable);
```
This new argument to the scheduler is defined in the Python script:
```python
conf.cOptionalArgs=["int someVariable"]
```
## The C++ code
The C++ code is generated in `scheduler.cpp` and `scheduler.h` in the `generated` folder.
### scheduler.cpp
#### Included headers
The generated code first includes the needed headers:
```C++
#include "arm_math.h"
#include "custom.h"
#include "GenericNodes.h"
#include "AppNodes.h"
#include "scheduler.h"
```
- CMSIS-DSP header
- Custom definitions
- Generic nodes from `GenericNodes.h`
- Application nodes
- scheduler API
#### Macros
The generated code then includes some macro definitions that can all be redefined to customize aspects of the generated scheduler. By default those macros, except `CHECKERROR`, do nothing:
* CHECKERROR
* Check for an error after each node execution. Default action is to branch out of the scheduler loop and return an error
* CG_BEFORE_ITERATION
* Code to execute before each iteration of the scheduler
* CG_AFTER_ITERATION
* Code to execute after each iteration of the scheduler
* CG_BEFORE_SCHEDULE
* Code to execute before starting the scheduler loop
* CG_AFTER_SCHEDULE
* Code to execute after the end of the scheduler loop
* CG_BEFORE_BUFFER
* Code before any buffer definition. Can be used, for instance, to align a buffer or to put this buffer in a specific memory section
* CG_BEFORE_FIFO_BUFFERS
* Code included before the definitions of the global FIFO buffers
* CG_BEFORE_FIFO_INIT
* Code to execute before the creation of the FIFO C++ objects
* CG_BEFORE_NODE_INIT
* Code to execute before the creation of the node C++ objects
* CG_AFTER_INCLUDES
* Code coming after the include files (useful to add other include files after the default ones)
* CG_BEFORE_SCHEDULER_FUNCTION
* Code defined before the scheduler function
* CG_BEFORE_NODE_EXECUTION
* Code executed before a node execution
* CG_AFTER_NODE_EXECUTION
* Code executed after a node execution and before the error checking
#### Memory buffers and FIFOs
Then, the generated code defines the buffers for the FIFOs. First the sizes are defined:
```C++
CG_BEFORE_FIFO_BUFFERS
/***********
FIFO buffers
************/
#define FIFOSIZE0 11
#define FIFOSIZE1 5
```
The FIFOs may have sizes different from the buffers when a buffer is shared between different FIFOs. So, there are separate defines for the buffer sizes:
```C++
#define BUFFERSIZE1 11
CG_BEFORE_BUFFER
float32_t buf1[BUFFERSIZE1]={0};
#define BUFFERSIZE2 5
CG_BEFORE_BUFFER
float32_t buf2[BUFFERSIZE2]={0};
```
In case of buffer sharing, a shared buffer will be defined with `int8_t` type. It is **very important** to align such a buffer by defining `CG_BEFORE_BUFFER`. See the [FAQ](../../FAQ.md) for more information about alignment issues.
#### Description of the schedule
```C++
static unsigned int schedule[17]=
{
2,2,0,1,2,0,1,2,2,0,1,2,0,1,2,0,1,
};
```
There are different code generation modes in the compute graph. By default, the schedule is encoded as a list of numbers and a `switch/case` is used to execute the node corresponding to an identification number.
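The dispatch mechanism can be sketched in plain Python (the run functions below are hypothetical stand-ins for the generated C++ nodes, not the actual implementation):

```python
# Sketch of the generated switch/case dispatch: each entry of the
# schedule is a node identifier mapped to that node's run function.
def run_processing(): return 0   # stand-in for processing.run()
def run_sink(): return 0         # stand-in for sink.run()
def run_source(): return 0       # stand-in for source.run()

nodes = {0: run_processing, 1: run_sink, 2: run_source}
schedule = [2, 2, 0, 1, 2, 0, 1, 2, 2, 0, 1, 2, 0, 1, 2, 0, 1]

cg_static_error = 0
for node_id in schedule:
    cg_static_error = nodes[node_id]()
    if cg_static_error < 0:      # mirrors CHECKERROR: stop on error
        break
```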
#### Scheduler API
Then, the scheduling function is generated:
```C++
uint32_t scheduler(int *error,int someVariable) {
```
A value `<0` in `error` means there was an error during the execution.
The returned value is the number of schedules fully executed when the error occurred.
The `someVariable` is defined in the Python script. The Python script can add as many arguments as needed with whatever type is needed.
#### Scheduler locals
The scheduling function starts with a definition of some variables used for debug and statistics:
```C++
int cgStaticError=0;
uint32_t nbSchedule=0;
int32_t debugCounter=1;
```
It is followed by the definition of the FIFOs:
```C++
CG_BEFORE_FIFO_INIT;
/*
Create FIFOs objects
*/
FIFO<float32_t,FIFOSIZE0,0,0> fifo0(buf1);
FIFO<float32_t,FIFOSIZE1,1,0> fifo1(buf2);
```
The FIFO template has type:
```C++
template<typename T, int length, int isArray=0, int isAsync = 0>
class FIFO;
```
`isArray` is set to `1` when the Python code can deduce that the FIFO is always used as an array. In this case, the memory buffer may be shared with other FIFOs depending on the data flow dependencies of the graph.
`isAsync` is set to 1 when the graph is an asynchronous one.
Then, the nodes are created and connected to the FIFOs:
```C++
/*
Create node objects
*/
ProcessingNode<float32_t,7,float32_t,5> processing(fifo0,fifo1,4,"testString",someVariable);
Sink<float32_t,5> sink(fifo1);
Source<float32_t,5> source(fifo0);
```
And finally, the function enters the scheduling loop:
```C++
/* Run several schedule iterations */
CG_BEFORE_SCHEDULE;
while((cgStaticError==0) && (debugCounter > 0))
{
```
The content of the loop is a `switch / case`:
```C++
CG_BEFORE_NODE_EXECUTION;
switch(schedule[id])
{
case 0:
{
cgStaticError = processing.run();
}
break;
case 1:
{
cgStaticError = sink.run();
}
break;
case 2:
{
cgStaticError = source.run();
}
break;
default:
break;
}
CG_AFTER_NODE_EXECUTION;
CHECKERROR;
```
#### Error handling
In case of error, the code branches out to the end of the function:
```C++
errorHandling:
CG_AFTER_SCHEDULE;
*error=cgStaticError;
return(nbSchedule);
```
## Expected output
Output of the Python script:
```
Schedule length = 17
Memory usage 64 bytes
```
Output of the execution:
```
Start
Source
Source
ProcessingNode
Sink
3
0
0
0
0
Source
ProcessingNode
Sink
10
0
0
0
0
Source
Source
ProcessingNode
Sink
17
0
0
0
0
Source
ProcessingNode
Sink
24
0
0
0
0
Source
ProcessingNode
Sink
31
0
0
0
0
```
The source increments a counter and generates 0,1,2,3 ...
The processing node copies the 4th sample of its input to the first sample of its output. So there is a delta of 7 between each new value written to the output.
The sink displays the 5 samples on its input.
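This behavior can be reproduced with a short simulation (a sketch of the described data flow, not the generated code):

```python
# Sketch: the source produces consecutive integers; the processing node
# consumes 7 samples, copies the 4th one (index 3) to the first output
# sample and zeros the rest; the sink receives 5 samples.
counter = 0
sink_inputs = []
for _ in range(5):                           # five processing executions
    block = list(range(counter, counter + 7))
    counter += 7
    sink_inputs.append([block[3], 0, 0, 0, 0])

first_values = [samples[0] for samples in sink_inputs]  # 3, 10, 17, 24, 31
```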


@ -102,8 +102,7 @@ float32_t buf2[BUFFERSIZE2]={0};
CG_BEFORE_SCHEDULER_FUNCTION
uint32_t scheduler(int *error,const char *testString,
int someVariable)
uint32_t scheduler(int *error,int someVariable)
{
int cgStaticError=0;
uint32_t nbSchedule=0;
@ -120,7 +119,7 @@ uint32_t scheduler(int *error,const char *testString,
/*
Create node objects
*/
ProcessingNode<float32_t,7,float32_t,5> filter(fifo0,fifo1,4,testString,someVariable);
ProcessingNode<float32_t,7,float32_t,5> processing(fifo0,fifo1,4,"testString",someVariable);
Sink<float32_t,5> sink(fifo1);
Source<float32_t,5> source(fifo0);
@ -138,7 +137,7 @@ uint32_t scheduler(int *error,const char *testString,
{
case 0:
{
cgStaticError = filter.run();
cgStaticError = processing.run();
}
break;

@ -16,8 +16,7 @@ extern "C"
#endif
extern uint32_t scheduler(int *error,const char *testString,
int someVariable);
extern uint32_t scheduler(int *error,int someVariable);
#ifdef __cplusplus
}

@ -36,42 +36,29 @@ class ProcessingNode(Node):
### Define nodes
floatType=CType(F32)
src=Source("source",floatType,5)
b=ProcessingNode("filter",floatType,7,5)
b.addLiteralArg(4)
b.addVariableArg("testString","someVariable")
processing=ProcessingNode("processing",floatType,7,5)
processing.addLiteralArg(4,"testString")
processing.addVariableArg("someVariable")
sink=Sink("sink",floatType,5)
g = Graph()
g.connect(src.o,b.i)
g.connect(b.o,sink.i)
g.connect(src.o,processing.i)
g.connect(processing.o,sink.i)
print("Generate graphviz and code")
conf=Configuration()
conf.debugLimit=1
conf.cOptionalArgs=["const char *testString"
,"int someVariable"
conf.cOptionalArgs=["int someVariable"
]
#conf.displayFIFOSizes=True
# Prefix for global FIFO buffers
#conf.prefix="sched1"
#conf.dumpSchedule = True
sched = g.computeSchedule(config=conf)
#print(sched.schedule)
print("Schedule length = %d" % sched.scheduleLength)
print("Memory usage %d bytes" % sched.memory)
#
#conf.postCustomCName = "post.h"
#conf.CAPI = True
#conf.prefix="global"
#conf.dumpFIFO = True
#conf.CMSISDSP = False
#conf.switchCase = False
sched.ccode("generated",conf)
with open("test.dot","w") as f:

@ -6,6 +6,6 @@ int main(int argc, char const *argv[])
{
int error;
printf("Start\n");
uint32_t nbSched=scheduler(&error,"Test",1);
uint32_t nbSched=scheduler(&error,1);
return 0;
}

@ -9,10 +9,10 @@ digraph structs {
fontname="times"
filter [label=<
processing [label=<
<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="4">
<TR>
<TD ALIGN="CENTER" PORT="i">filter<BR/>(ProcessingNode)</TD>
<TD ALIGN="CENTER" PORT="i">processing<BR/>(ProcessingNode)</TD>
</TR>
</TABLE>>];
@ -32,13 +32,13 @@ source [label=<
source:i -> filter:i [label="f32(11)"
source:i -> processing:i [label="f32(11)"
,headlabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >7</FONT>
</TD></TR></TABLE>>
,taillabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >5</FONT>
</TD></TR></TABLE>>]
filter:i -> sink:i [label="f32(5)"
processing:i -> sink:i [label="f32(5)"
,headlabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >5</FONT>
</TD></TR></TABLE>>
,taillabel=<<TABLE BORDER="0" CELLPADDING="2"><TR><TD><FONT COLOR="blue" POINT-SIZE="12.0" >5</FONT>

@ -1,15 +1,12 @@
/* ----------------------------------------------------------------------
* Project: CMSIS DSP Library
* Title: AppNodes.h
* Description: Application nodes for Example 1
*
* $Date: 29 July 2021
* $Revision: V1.10.0
* Description: Application nodes for Example 10
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*

@ -6,7 +6,7 @@ project(Example10)
add_executable(example10 main.cpp)
sdf(example10)
sdf(example10 graph.py test)
add_sdf_dir(example10)
target_include_directories(example10 PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})

@ -0,0 +1,68 @@
# Example 10
Please refer to the [simple example](../simple/README.md) to have an overview of how to define a graph and its nodes and how to generate the C++ code for the static scheduler. This document only explains additional details.
This example implements a [dynamic / asynchronous mode](../../Async.md).
It is enabled in `graph.py` with:
`conf.asynchronous = True`
There is an option to increase the FIFO sizes compared to their synchronous values. To double the value (an increase of `100%`) we write:
`conf.FIFOIncrease = 100`
The graph implemented in this example is:
![graph10](docassets/graph10.png)
There is a global iteration count corresponding to one execution of the schedule.
The odd source generates a value only when the count is odd.
The even source generates a value only when the count is even.
The processing node adds its inputs. If no data is available on an input, 0 is used.
In case of FIFO overflow or underflow, any node will skip its execution.
All nodes are generating or consuming one sample but the FIFOs have a size of 2 because of the 100% increase requested in the configuration settings.
Thus in this example:
* A sample is not always generated on an edge
* A sample is not always available on an edge
The dataflow on each edge is thus not static and varies between iterations of the schedule.
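The skip-on-overflow and default-to-zero behavior can be sketched as follows (the status values and helper names are illustrative, not the actual `cg_status.h` definitions):

```python
# Sketch of the asynchronous decision logic for the processing node:
# skip the execution when the output FIFO would overflow, and use 0
# when an input FIFO is empty.
CG_SUCCESS = 0
CG_SKIP_EXECUTION = -1   # illustrative value; see cg_status.h

def prepare_processing(out_fifo, capacity):
    # Equivalent of prepareForRunning: check before touching the FIFOs
    if len(out_fifo) + 1 > capacity:
        return CG_SKIP_EXECUTION
    return CG_SUCCESS

def run_processing(odd_in, even_in, out_fifo):
    a = odd_in.pop(0) if odd_in else 0    # missing sample -> 0
    b = even_in.pop(0) if even_in else 0
    out_fifo.append(a + b)

out = []
status = prepare_processing(out, capacity=2)
if status == CG_SUCCESS:
    run_processing([5], [], out)
```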
## Expected outputs
```
Schedule length = 9
Memory usage 34 bytes
```
```
Start
0
0
1
1
2
2
3
3
4
4
5
5
6
6
7
7
8
8
9
9
```


@ -2,7 +2,7 @@ from cmsisdsp.cg.scheduler import *
### Define new types of Nodes
class SinkAsync(GenericSink):
def __init__(self,name,theType,inLength):

@ -3,13 +3,11 @@
* Title: AppNodes.h
* Description: Application nodes for Example 2
*
* $Date: 29 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*

@ -6,7 +6,7 @@ project(Example2)
add_executable(example2 main.cpp)
sdf(example2)
sdf(example2 graph.py test)
add_sdf_dir(example2)
target_include_directories(example2 PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})

@ -1,20 +1,23 @@
# Example 2
Please refer to [Example 1](example1.md) for the details about how to create a graph and the C++ support classes.
Please refer to the [simple example](../simple/README.md) to have an overview of how to define a graph and its nodes and how to generate the C++ code for the static scheduler.
The [simple example with CMSIS-DSP](../simpledsp/README.md) is giving more details about `Constant` nodes and CMSIS-DSP functions in the compute graph.
In this example, we analyze a much more complex graph to see some new features:
- Delay
- CMSIS-DSP functions
- Some default nodes : sliding buffer
- SlidingBuffer
This example is not really using an MFCC or a TensorFlow Lite node. It is just providing some wrappers to show how such nodes could be included in a graph:
The graph is:
![graph2](graph2.PNG)
![graph2](docassets/graph2.PNG)
It is much more complex:
- First we have a source delayed by 10 samples ;
- First we have a stereo source delayed by 10 samples ;
- Then this stereo source is split into left/right samples using the default block Unzip
- The samples are divided by 2 using a CMSIS-DSP function
- The node HALF representing a constant is introduced (constant arrays are also supported)
@ -24,18 +27,11 @@ It is much more complex:
- Another sliding buffer
- An a block representing TensorFlow Lite for Micro (a fake TFLite node)
Note that those blocks (MFCC, TFLite) are doing nothing in this example. It is just to illustrate a more complex example that someone may want to experiment with for keyword spotting.
Note that those blocks (MFCC, TFLite) are doing nothing in this example. It is just to illustrate a more complex example typical of keyword spotting applications.
Examples 5 and 6 are showing how to use the CMSIS-DSP MFCC.
The new features compared to `example1` are:
- Delay
- CMSIS-DSP function
- Constant node
- SlidingBuffer
Let's look at all of this:
Let's look at the new features compared to example 1:
## Delay
@ -43,9 +39,7 @@ Let's look at all of this:
g.connectWithDelay(src.o, toMono.i,10)
```
To add a delay on a link between 2 nodes, you just use the `connectWithDelay` function. Delays can be useful for some graphs which are not schedulable. They are implemented by starting the schedule with a FIFO which is not empty but contain 0 samples.
To add a delay on a link between 2 nodes, you just use the `connectWithDelay` function. Delays can be useful for some graphs which are not schedulable. They are implemented by starting the schedule with a FIFO which is not empty but contain some 0 samples.
## CMSIS-DSP function
@ -59,16 +53,18 @@ sa=Dsp("scale",floatType,blockSize)
The corresponding CMSIS-DSP function will be named: `arm_scale_f32`
The code generated in `sched.cpp` will not require any C++ class, It will look like:
The code generated in `scheduler.cpp` will not require any C++ class, It will look like:
```C++
{
float32_t* i0;
float32_t* o2;
i0=fifo2.getReadBuffer(160);
o2=fifo4.getWriteBuffer(160);
arm_scale_f32(i0,HALF,o2,160);
cgStaticError = 0;
float32_t* i0;
float32_t* i1;
float32_t* o2;
i0=fifo3.getReadBuffer(160);
i1=fifo4.getReadBuffer(160);
o2=fifo5.getWriteBuffer(160);
arm_add_f32(i0,i1,o2,160);
cgStaticError = 0;
}
```
@ -84,23 +80,21 @@ A constant node is defined as:
half=Constant("HALF")
```
In the C++ code, `HALF` is expected to be a value defined in `custom.h`
In the C++ code, HALF is expected to be a value defined in custom.h
In the Python generated code, it would be in custom.py
Constant values are not involved in the scheduling (they are ignored) and they have no io. So, to connect to a constant node we do:
Constant values are not involved in the scheduling (they are ignored) and they have no IO. So, to connect to a constant node we do:
```python
g.connect(half,sa.ib)
```
There is no "o", "oa" suffixes for the constant node half.
There is no "o", "oa" suffixes for the constant node `half`.
Constant nodes are just here to make it easier to use CMSIS-DSP functions.
## SlidingBuffer
Sliding buffers and OverlapAndAdd are used a lot so they are provided by default.
Sliding buffers and OverlapAndAdd are used a lot so they are provided in the `cg/nodes/cpp`folder of the `ComputeGraph` folder.
In Python, it can be used with:
@ -114,3 +108,18 @@ There is no C++ class to write for this since it is provided by default by the f
It is named `SlidingBuffer` but not `SlidingWindow` because no multiplication with a window is done. It must be implemented with another block as will be demonstrated in the [example 3](example3.md)
## Expected outputs
```
Schedule length = 302
Memory usage 10720 bytes
```
And when executed:
```
Start
Nb = 40
```
Execution is running for 40 iterations without errors.


@ -3,13 +3,10 @@
* Title: AppNodes.h
* Description: Application nodes for Example 3
*
* $Date: 29 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*

@ -6,7 +6,7 @@ project(Example3)
add_executable(example3 main.cpp custom.cpp)
sdf(example3)
sdf(example3 graph.py test)
add_sdf_dir(example3)
target_include_directories(example3 PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})

@ -0,0 +1,172 @@
# Example 3
Please refer to the [simple example](../simple/README.md) to have an overview of how to define a graph and its nodes and how to generate the C++ code for the static scheduler. This document only explains additional details.
This example is implementing a working example with FFT. The graph is:
![graph3](docassets/graph3.PNG)
The example is:
- Providing a file source which is reading a source file and then padding with zero
- A sliding window
- A multiplication with a Hann window
- A conversion to/from complex
- Use of CMSIS-DSP FFT/IFFT
- Overlap and add
- File sink writing the result into a file
The new features compared to previous examples are:
- The constant array HANN
- The CMSIS-DSP FFT
## Constant array
It is like in example 2 where the constant was a float.
Now, the constant is an array:
```python
hann=Constant("HANN")
```
In `custom.h`, this array is defined as:
```C++
extern const float32_t HANN[256];
```
## CMSIS-DSP FFT
The FFT node cannot be created using a `Dsp` node in Python because the FFT requires specific initializations. So, a Python class and a C++ class must be created. They are provided by default in the framework, but let's look at how they are implemented:
```python
class CFFT(GenericNode):
def __init__(self,name,theType,inLength):
GenericNode.__init__(self,name)
self.addInput("i",theType,2*inLength)
self.addOutput("o",theType,2*inLength)
@property
def typeName(self):
return "CFFT"
```
Look at the definition of the inputs and outputs: the FFT uses complex numbers, so the ports have twice the number of float samples. The argument of the constructor is the FFT length in **complex** samples, but `addInput` and `addOutput` require the number of samples in the base type: here float.
We suggest using, as arguments of the blocks, a number of samples which is meaningful for the blocks, and using lengths in the standard data types (f32, q31 ...) when defining the IOs.
So here, the number of complex samples is used as argument. But the IOs use the number of floats required to encode those complex numbers, hence a factor of 2.
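The factor of 2 corresponds to the usual interleaved real/imaginary storage of complex samples; a quick sketch of the bookkeeping:

```python
# Sketch: N complex samples are stored as 2*N interleaved floats
# (re0, im0, re1, im1, ...), which is the length declared on the IOs.
def pack_complex(samples):
    buf = []
    for re, im in samples:
        buf.extend([re, im])
    return buf

fft_length = 256                 # complex samples (constructor argument)
io_length = 2 * fft_length       # floats declared with addInput/addOutput
interleaved = pack_complex([(1.0, 2.0), (3.0, 4.0)])
```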
The C++ template is:
```C++
template<typename IN, int inputSize,typename OUT,int outputSize>
class CFFT;
```
There are only specific implementations for specific datatypes. No generic implementation is provided.
For float we have:
```C++
template<int inputSize>
class CFFT<float32_t,inputSize,float32_t,inputSize>: public GenericNode<float32_t,inputSize,float32_t,inputSize>
{
public:
CFFT(FIFOBase<float32_t> &src,FIFOBase<float32_t> &dst):GenericNode<float32_t,inputSize,float32_t,inputSize>(src,dst)
{
arm_status status;
status=arm_cfft_init_f32(&sfft,inputSize>>1);
};
int prepareForRunning() override
{
if (this->willOverflow() ||
this->willUnderflow())
{
return(CG_SKIP_EXECUTION_ID_CODE); // Skip execution
}
return(0);
};
int run() override
{
float32_t *a=this->getReadBuffer();
float32_t *b=this->getWriteBuffer();
memcpy((void*)b,(void*)a,inputSize*sizeof(float32_t));
arm_cfft_f32(&sfft,b,0,1);
return(0);
};
arm_cfft_instance_f32 sfft;
};
```
It is verbose but not difficult. The constructor initializes the CMSIS-DSP FFT instance and connects to the FIFOs (through `GenericNode`).
The run function applies `arm_cfft_f32`. Since this function modifies its input buffer, there is a `memcpy`. It is not really needed here: the read buffer could be modified by the CFFT directly. It would just make it more difficult to debug if you'd like to inspect the content of the FIFOs.
The function `prepareForRunning` is only used in asynchronous mode. Please refer to the documentation for the asynchronous mode.
This node is provided in `cg/nodes/cpp` so no need to define it. You can just use it by including the right headers.
It can be used by just doing in your `AppNodes.h` file :
```c++
#include "CFFT.h"
```
From Python side it would be:
```python
from cmsisdsp.cg.scheduler import *
```
The scheduler module is automatically including the default nodes.
## Expected output
Output of Python script:
```
Schedule length = 25
Memory usage 11264 bytes
```
Output of execution:
```
Start
Nb = 40
```
It is running for 40 iterations of the scheduler without errors.
The Python script `debug.py` can be used to display the content of `input_example3.txt` and `../build/output_example3.txt`.
It should display the same sinusoid, but it is delayed in `output_example3.txt` by a few samples because of the sliding buffer. The sliding buffer will generate 256 samples in output each time 128 samples are received in input. As a consequence, at start, 256 samples with half of them set to zero are generated.
We can check it in the debug script by comparing a delayed version of the original to the output.
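This startup behavior can be sketched with a minimal sliding buffer model (an illustration of the behavior, not the actual `SlidingBuffer` implementation):

```python
# Sketch: a 256-sample window sliding by 128. The state starts zeroed,
# so the first output window has its first half equal to zero.
class SlidingWindowModel:
    def __init__(self, window_size):
        self.state = [0.0] * window_size

    def push(self, samples):
        # Drop the oldest samples, append the new ones, emit the window
        self.state = self.state[len(samples):] + list(samples)
        return list(self.state)

model = SlidingWindowModel(256)
first_window = model.push([1.0] * 128)   # half zeros, half new samples
```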
You should get something like:
![sine](docassets/sine.png)
We have 40 executions of the schedule. In each schedule iteration we have two sinks. A sink produces 192 samples.
So, the execution produces `40 * 2 * 192 == 15360` samples, a bit less than the `16000` samples in input.
If we compare the input and output taking into account this length difference and the delay of 128 samples, we get (by running `debug.py`):
```
Comparison of input and output : max absolute error
6.59404862823898e-07
```

@ -0,0 +1,20 @@
import numpy as np
from pylab import figure, clf, plot, xlabel, ylabel, xlim, ylim, title, grid, axes, show,semilogx, semilogy
from numpy import genfromtxt
ref_data = genfromtxt('input_example3.txt', delimiter=',')
figure()
plot(ref_data)
output_data = genfromtxt('../build/output_example3.txt', delimiter=',')
plot(output_data)
show()
print(ref_data.shape)
print(output_data.shape)
nb = output_data.shape[0] - 128
print("Comparison of input and output : max absolute error")
diff = output_data[128:] - ref_data[:nb]
print(np.max(np.abs(diff)))


@ -2,6 +2,8 @@
It is exactly the same example as example 3 but the code generation is generating Python code instead of C++.
![graph4](docassets/graph4.png)
The Python code is generated with:
```python
@ -12,6 +14,12 @@ and it will generate a `sched.py` file.
A file `custom.py` and `appnodes.py` are also required.
The example can be run with:
`python main.py`
Do not confuse `graph.py`, which is used to describe the graph, with the other Python files that are used to execute the graph.
## custom.py
```python
@ -25,7 +33,7 @@ An array HANN is defined for the Hann window.
## appnodes.py
This file is defining the new nodes which were used in `graph.py`. In `graph.py` which are just defining new kind of nodes for scheduling purpose : type and sizes.
This file is defining the new nodes which were used in `graph.py`.
In `appnodes.py` we are including new kinds of nodes for simulation purposes:
@ -33,8 +41,6 @@ In `appnodes.py` we including new kind of nodes for simulation purpose:
from cmsisdsp.cg.scheduler import *
```
The CFFT is very similar to the C++ version of example 3. But there is no `prepareForRunning`. Dynamic / asynchronous mode is not implemented for Python.
```python
@ -110,3 +116,22 @@ DISPBUF = np.zeros(16000)
nb,error = s.scheduler(DISPBUF)
```
The example can be run with:
`python main.py`
## Expected outputs
```
Generate graphviz and code
Schedule length = 25
Memory usage 11264 bytes
```
And when executed:
![sine](docassets/sine.png)
As you can see at the beginning, there is a small delay during which the output signal is zero.

@ -3,13 +3,11 @@
# Title: appnodes.py
# Description: Application nodes for Example 4
#
# $Date: 29 July 2021
# $Revision: V1.10.0
#
# Target Processor: Cortex-M and Cortex-A cores
# -------------------------------------------------------------------- */
#
# Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
# Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0
#

@ -1,20 +0,0 @@
import numpy as np
from cmsisdsp.cg.static.nodes.simu import *
a=np.zeros(10)
f=FIFO(10,a)
f.dump()
nb = 1
for i in range(4):
w=f.getWriteBuffer(2)
w[0:2]=nb*np.ones(2)
nb = nb + 1
f.dump()
print(a)
for i in range(4):
w=f.getReadBuffer(2)
print(w)


@ -0,0 +1,25 @@
# Example 5
This is a pure Python example. It computes a sequence of MFCCs with an overlap of 0.5 s and creates an animation.
It can be run with:
`python main.py`
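An overlap of 0.5 s means consecutive analysis windows share half a second of signal, so the hop between window starts is the window length minus the overlap. A standalone sketch of this framing (NumPy; the sample rate and window length are illustrative assumptions, not values taken from the example's graph):

```python
import numpy as np

# Overlapping framing sketch (fs and window length are assumed values).
fs = 16000                 # samples per second
win = fs                   # 1 s analysis window
hop = win - fs // 2        # 0.5 s overlap -> hop of 8000 samples
sig = np.arange(3 * fs)    # 3 s of dummy signal
starts = range(0, len(sig) - win + 1, hop)
frames = [sig[s:s + win] for s in starts]
# 5 frames; frame k starts at k * 8000 samples
```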
The `NumPy` sink at the end just records all the MFCC outputs as a list of buffers. This list is then used to create the animation.
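Such a sink can be sketched as a class that copies each incoming buffer into a Python list (a hypothetical stand-in for illustration, not the real `cmsisdsp` node API):

```python
import numpy as np

# Hypothetical stand-in for a "NumPy sink" node (not the real cmsisdsp API):
# it copies every buffer it receives into a Python list for later plotting.
class RecordingSink:
    def __init__(self):
        self.buffers = []

    def run(self, data):
        # Copy the data: the FIFO memory is reused on the next schedule step.
        self.buffers.append(np.array(data, copy=True))
        return 0  # success status

sink = RecordingSink()
for k in range(3):
    sink.run(np.full(4, float(k)))
# sink.buffers now holds 3 buffers of 4 samples each
```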
<img src="docassets/graph5.png" alt="graph5" style="zoom:100%;" />
## Expected output
```
Generate graphviz and code
Schedule length = 292
Memory usage 6614 bytes
```
And when executed you should get an animation looking like this:
![mfcc](docassets/mfcc.png)
The Python `main.py` contains a line which can be uncommented to record the animation as a `.mp4` video.

@ -1,15 +1,12 @@
###########################################
# Project: CMSIS DSP Library
# Title: appnodes.py
# Description: Application nodes for Example 4
#
# $Date: 29 July 2021
# $Revision: V1.10.0
# Description: Application nodes for Example 5
#
# Target Processor: Cortex-M and Cortex-A cores
# -------------------------------------------------------------------- */
#
# Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
# Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0
#

Binary file not shown.


Binary file not shown.


@ -3,13 +3,10 @@
* Title: AppNodes.h
* Description: Application nodes for Example 6
*
* $Date: 29 July 2021
* $Revision: V1.10.0
*
* Target Processor: Cortex-M and Cortex-A cores
* -------------------------------------------------------------------- */
/*
* Copyright (C) 2010-2021 ARM Limited or its affiliates. All rights reserved.
* --------------------------------------------------------------------
*
* Copyright (C) 2021-2023 ARM Limited or its affiliates. All rights reserved.
*
* SPDX-License-Identifier: Apache-2.0
*

@ -6,7 +6,7 @@ project(Example6)
add_executable(example6 main.cpp mfccConfigData.c)
sdf(example6)
sdf(example6 graph.py test)
add_sdf_dir(example6)
target_include_directories(example6 PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})

Some files were not shown because too many files have changed in this diff.
