The DOL framework features a code generation back-end that allows applications
to be executed efficiently on the Sony/Toshiba/IBM Cell Broadband Engine (CBE). The
code generation back-end relies on a lightweight run-time system based on
protothreads and windowed FIFOs.
The technical details behind this back-end are described in the following paper:
W. Haid, L. Schor, K. Huang, I. Bacivarov, and L. Thiele. Efficient Execution
of Kahn Process Networks on Multi-Processor Systems Using Protothreads and
Windowed FIFOs. In Proc. IEEE Workshop on Embedded Systems for Real-Time
Multimedia (ESTIMedia), pages 35-44, Grenoble, France, Oct. 2009.
(online access)
(BibTex)
The aim of this page is to briefly summarize the main ideas and show how to
actually use the back-end. In addition, the page contains links to further documents
and hosts the material for replicating the experiments performed in the paper.
Protothreads:
Protothreads are
usually used for programming embedded systems that are constrained in terms of
memory and performance. They are a simple yet effective
approach to executing (non-preemptive) processes using a single CPU context and a
single stack. Therefore, the context switch overhead is very low and no
further multi-threading support is required to execute multiple processes on
a single processor.
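To make the single-stack idea concrete, here is a minimal sketch in C. It is not the DOL run-time system: the macro names are modeled on those of the protothreads library, but the definitions are simplified assumptions, and producer, consumer, run_all, and the one-slot mailbox are hypothetical names introduced only for this illustration.

```c
#include <assert.h>

/* Sketch only: the whole per-thread state is the line at which to resume. */
typedef struct { int resume_line; } pt_t;

enum { PT_WAITING = 0, PT_ENDED = 1 };

#define PT_INIT(pt)   ((pt)->resume_line = 0)
#define PT_BEGIN(pt)  switch ((pt)->resume_line) { case 0:
#define PT_WAIT_UNTIL(pt, cond)                       \
    do {                                              \
        (pt)->resume_line = __LINE__; case __LINE__:  \
        if (!(cond)) return PT_WAITING;               \
    } while (0)
#define PT_END(pt)    } (pt)->resume_line = 0; return PT_ENDED

/* Hypothetical one-slot mailbox standing in for a FIFO channel. */
int slot_full = 0;
int slot_value = 0;
int consumed_sum = 0;

/* Variables that must survive a blocking wait are static, because the
 * C stack frame is discarded every time the function returns. */
int producer(pt_t *pt)
{
    static int i;
    PT_BEGIN(pt);
    for (i = 1; i <= 3; i++) {
        PT_WAIT_UNTIL(pt, !slot_full);   /* block while the slot is occupied */
        slot_value = i * 10;
        slot_full = 1;
    }
    PT_END(pt);
}

int consumer(pt_t *pt)
{
    static int n;
    PT_BEGIN(pt);
    for (n = 0; n < 3; n++) {
        PT_WAIT_UNTIL(pt, slot_full);    /* block while the slot is empty */
        consumed_sum += slot_value;
        slot_full = 0;
    }
    PT_END(pt);
}

/* Cooperative round-robin scheduler: plain function calls, one stack.
 * Returns the number of scheduling rounds needed until both ended. */
int run_all(void)
{
    pt_t p, c;
    int p_done = 0, c_done = 0, rounds = 0;
    PT_INIT(&p);
    PT_INIT(&c);
    while (!p_done || !c_done) {
        if (!p_done) p_done = (producer(&p) == PT_ENDED);
        if (!c_done) c_done = (consumer(&c) == PT_ENDED);
        rounds++;
        assert(rounds < 1000);           /* guard against a livelock */
    }
    return rounds;
}
```

A "context switch" in this sketch is just a return followed by a later call that jumps back to the stored line, which is why no additional stack or operating-system thread support is needed.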
Windowed FIFO:
Unlike standard FIFOs, windowed FIFOs support direct access to a
contiguous data segment in the (circular) FIFO buffer. These segments are
called "windows", hence the name "windowed FIFO". Compared to
standard FIFOs, windowed FIFOs are more efficient because unnecessary memory
copies can be avoided. The Kahn process network semantics are not affected by
using windowed FIFOs instead of standard FIFOs.
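The following C sketch illustrates the window idea under simplifying assumptions (single producer, single consumer, no blocking, byte-sized tokens). The type wfifo_t, the capacity, and the function names wfifo_reserve, wfifo_commit, wfifo_capture, and wfifo_consume are hypothetical and are not taken from the DOL run-time system.

```c
#include <assert.h>
#include <stddef.h>

#define WFIFO_CAPACITY 8   /* hypothetical buffer size in bytes */

typedef struct {
    unsigned char buf[WFIFO_CAPACITY];
    size_t head;   /* index of the next byte to read  */
    size_t tail;   /* index of the next byte to write */
    size_t used;   /* number of bytes currently stored */
} wfifo_t;

/* Hand out a write window: a pointer directly into the circular buffer.
 * The window is contiguous, so it may be shorter than `want` when the
 * buffer is nearly full or the write position is close to the wrap-around. */
size_t wfifo_reserve(wfifo_t *f, size_t want, unsigned char **win)
{
    size_t free_bytes = WFIFO_CAPACITY - f->used;
    size_t to_end = WFIFO_CAPACITY - f->tail;
    size_t len = want;
    if (len > free_bytes) len = free_bytes;
    if (len > to_end) len = to_end;
    *win = &f->buf[f->tail];
    return len;
}

/* Make `len` bytes written through a write window visible to the reader. */
void wfifo_commit(wfifo_t *f, size_t len)
{
    assert(len <= WFIFO_CAPACITY - f->used);
    f->tail = (f->tail + len) % WFIFO_CAPACITY;
    f->used += len;
}

/* Hand out a read window: the data is read in place, not copied out. */
size_t wfifo_capture(const wfifo_t *f, size_t want, const unsigned char **win)
{
    size_t to_end = WFIFO_CAPACITY - f->head;
    size_t len = want;
    if (len > f->used) len = f->used;
    if (len > to_end) len = to_end;
    *win = &f->buf[f->head];
    return len;
}

/* Release `len` bytes of a read window so the space can be reused. */
void wfifo_consume(wfifo_t *f, size_t len)
{
    assert(len <= f->used);
    f->head = (f->head + len) % WFIFO_CAPACITY;
    f->used -= len;
}
```

A standard FIFO would copy each token from the caller's variable into the buffer on write and back out on read. Here the producer fills the reserved window directly and the consumer reads the captured window in place, so both copies disappear while tokens still leave the buffer in the order they were committed.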
The main features of the developed run-time system are:
- cooperative multi-threading on the PPE and the SPEs
- direct windowed FIFO communication between processes mapped
to SPEs (PPE not involved)
- overlapping of computation and communication by making use
of DMA engines (memory flow controllers)
In summary, this allows for an efficient, completely distributed execution of Kahn
process networks on the CBE.
CBE Package
The CBE package
includes the following directories and files. Refer to the
Get Started section below for instructions on how to set up the CBE framework on
a computer.
- dol: DOL distribution including the CBE run-time environment, code
generator, and programmer's guide. This is an extended version of DOL compared to
the one contained in
dol_ethz.zip.
- multiprocessor: experiments for execution on the CBE
- singleprocessor: experiments for execution on a single Linux workstation
- source: source files of all the experiments
- README: explanations for executing the experiments
Get Started
This section provides the basic steps, from downloading the CBE package
to executing an example application on the CBE.
More detailed information is available in the tool guide which is included in the
DOL framework. The programmer's guide describing CBE-specific issues is available
in the CBE package.
The requirements for executing applications leveraging the CBE
package are:
If the above-mentioned environments are in place, do the
following to execute an application. For illustration, a simple producer-consumer
example referred to as examplecell will be used.
Note: examplecell is in the source
directory of the CBE package and it includes the process network, the architecture,
and the mapping, all described in a DOL-compliant manner (see the
DOL documentation for further details).
- Set up the DOL framework, as described on the
DOL page.
- After the build directory has been created, change to
this directory:
$ cd build/bin/main
- Run the code generation for the example:
$ ant -f runexample.xml -Dnumber=cell cell
The output should look similar to the following:
$ ant -f runexample.xml -Dnumber=cell cell
Buildfile: runexample.xml
showversion:
showantversion:
[echo] Use Apache Ant version 1.6.5 compiled on February 17 2006.
showjavaversion1:
[echo] Use Java version 1.5.0_06 (required version: 1.5.0 or higher).
showjavacversion1:
[echo] Use Java version 1.5.0_06 (required version: 1.5.0 or higher).
cell:
prepare:
[echo] Create directory examplecell.
[echo] Copy C source files.
validate:
[echo] check XML compliance of examplecell_flattened.xml.
[java] /home/user/DOL/DOLCrt/dolPrototype/trunk/examples/examplecell/examplecell.xml is valid.
flatten1:
[echo] Create flattened XML examplecell_flattened.xml.
[java] .............................................
[javac] Compiling 1 source file to /home/user/DOL/DOLCrt/dolPrototype/trunk/build/bin/main/examplecell
dol_cell1:
[echo] Run cell generation.
[java] Read process network from XML file
[java] -- full filename: file:/home/user/DOL/DOLCrt/dolPrototype/trunk/build/bin/main/examplecell/examplecell_flattened.xml
[java] -- Process network model from XML [Finished]
[java] Read architecture from XML file
[java] -- full filename: file:/home/user/DOL/DOLCrt/dolPrototype/trunk/examples/examplecell/cell.xml
[java] -- Architecture model from XML [Finished]
[java] Read mapping from XML file
[java] -- full filename: file:/home/user/DOL/DOLCrt/dolPrototype/trunk/examples/examplecell/mapping.xml
[java] -- Mapping from XML [Finished]
[java] Consistency check:
[java] APPL: Checking resource name ...
[java] APPL: Checking channel ports ...
[java] APPL: Checking channel connection ...
[java] APPL: Checking Process connection ...
[java] ARCH: Checking resource name ...
[java] ARCH: Checking network simulators ...
[java] MAP: Checking multiple bindings ...
[java] MAP: Checking that all processes have a binding ...
[java] -- Consistency check [Finished]
[java] Generating Mapping in Dotty format:
[java] -- Generation [Finished]
[java] Generating Cell-package:
[java] Cell: Use predefined mapping.
[java] All other parameters are ignored.
[java] Read architecture from XML file
[java] -- full filename: file:/home/user/DOL/DOLCrt/dolPrototype/trunk/examples/examplecell/cell.xml
[java] -- Architecture model from XML [Finished]
[java] Read mapping from XML file
[java] -- full filename: file:/home/user/DOL/DOLCrt/dolPrototype/trunk/examples/examplecell/mapping.xml
[java] -- Mapping from XML [Finished]
[java] Mapped process generator to the PPU
[java] Mapped process square_0 to the SPU_4
[java] Mapped process square_1 to the SPU_1
[java] Mapped process square_2 to the SPU_2
[java] Mapped process square_3 to the SPU_3
[java] Mapped process square_4 to the SPU_4
[java] Mapped process square_5 to the SPU_5
[java] Mapped process square_6 to the SPU_1
[java] Mapped process square_7 to the SPU_2
[java] Mapped process square_8 to the SPU_3
[java] Mapped process consumer to the SPU_0
[java] Cell: Nr of SPE is 6
[java] Cell: Mapped some processes to the PPE
[java] -- Generation [Finished]
BUILD SUCCESSFUL
Total time: 7 seconds
- Copy the generated source files to the CBE platform (or simulator).
- On the CBE platform, change to the directory examplecell/cell
and compile the application by executing
make.
- The executable has now been created. Run the application by executing
./sc_application.
The output should look similar to the following:
$ ./sc_application
)PPE: spe thread start run
)PPE: spe thread start run
)PPE: spe thread start run
)PPE: spe thread start run
)PPE: spe thread start run
)PPE: spe thread start run
SPU 0: start to execute
SPU 4: start to execute
SPU 5: start to execute
SPU 1: start to execute
SPU 2: start to execute
SPU 3: start to execute
consumer: 0.000000
consumer: 512.000000
consumer: 1024.000000
consumer: 1536.000000
consumer: 2048.000000
consumer: 2560.000000
)PPE: spe thread finish run
)PPE: spe thread finish run
)PPE: spe thread finish run
)PPE: spe thread finish run
)PPE: spe thread finish run
)PPE: spe thread finish run
)PPE:) Complete running all super-fast SPEs and PPE
- Note: To execute the experiments, several run.sh scripts are
provided in the singleprocessor and the multiprocessor
directories.