Erika Enterprise Benchmark for Nios 2

Nios II/dsPIC performance tests on the FRSH kernel (year 2008)

Please check the FRESCOR IST Project Document WP4 D-EP7v2.pdf.

Nios II performance tests (year 2005)

Reference platform

The Altera Nios II is a flexible platform that allows different configurations with different levels of performance. For the Erika Enterprise performance measurements, we considered only a reference platform derived from the 'standard' Altera reference design.

Moreover, Erika Enterprise itself is highly configurable, providing different compile-time options that impact system performance. These options are in fact a series of '#ifdef' directives in the kernel source code that inevitably change the code footprint and memory usage.

Having said that, the hardware used for these measurements is the same used in the Erika Enterprise and RT-Druid tutorial, and runs on a Stratix 1S40 Altera evaluation board. The design was generated with Altera Quartus II 5.0, Altera Nios II IDE 5.0, and Evidence RT-Druid with Erika Enterprise 1.2. The design is composed of two Nios II/s cores running at 50 MHz with a 4 KB instruction cache each, and a few standard Altera peripherals. Code and data reside in external SRAM. Timings were measured using an Altera Performance Counter. The software projects used to perform the measurements are available in the full version of the products.

The hardware design on which the measurements were performed is neither the fastest nor the slowest design achievable with Nios II; it can be considered an "average" design.

Erika Enterprise code footprint

The following table presents the memory usage of a typical subset of Erika Enterprise primitives, configured with:

  • BCC1 conformance class.
  • Monostack Kernel.
  • Standard Status.
  • Support for Autostart of task and alarms.
  • Support for Nested Interrupts.
  • Support for StartupHook and ShutdownHook.
  • Startup Barrier (only multiprocessor).
  • Interprocessor interrupt (only multiprocessor).
  • Support for remote task activations (only multiprocessor).
  • Support for global resources (only multiprocessor).

Code Footprint

Primitive                              Single CPU (bytes)   Multiple CPU (bytes)
ActivateTask                           284                  300
GetResource                            64                   248
ReleaseResource                        208                  524
StartOS                                368                  604
TerminateTask                          568                  568
SetRelAlarm                            112                  112
CounterTick                            760                  792
IRQ handling (added wrt Nios II HAL)   212                  212
Interprocessor IRQ handling            -                    1424
Internal routines                      220                  220


  • Multiple CPU figures refer to the Master CPU.
  • The difference in footprint for GetResource and ReleaseResource, as well as the large size of the interprocessor IRQ handling, is due to the inline expansion of the spin-lock handling code.
  • The difference in the StartOS primitive is due to the implementation of the startup barrier in multiprocessor systems.

Erika Enterprise data footprint

The following tables report the memory usage of a typical Erika Enterprise application. The comments below clarify the parameters used for the applications considered.

Local Data Structure

Memory section   Single CPU (bytes)   Multiple CPU (bytes)
.rodata          48                   60
.data            48                   48
.sdata           64                   96
.sbss            12                   16
.bss             8                    8
.common          12                   12

Multiprocessor Global Shared Data Structure

Memory section   Single CPU (bytes)   Multiple CPU (bytes)
.rodata          -                    28
.data            -                    44
.sdata           -                    60
.sbss            -                    4


Single CPU numbers consider a BCC1 configuration containing 3 tasks, 1 resource, 1 application mode, 0 counters, 0 alarms, and the Autostart feature. Multiple CPU numbers consider the memory usage on the Master CPU of a 2-CPU BCC1 configuration containing 4 tasks (3 of them on the Master CPU), 1 local resource, 1 global resource, 1 application mode, 0 counters, 0 alarms, and the Autostart feature. Memory usage varies with the number of CPUs, tasks, and resources instantiated inside the OIL file.

ERIKA Enterprise execution times

The following timing figures are an example of the performance measurements that can be obtained from a typical deployment of Erika Enterprise. This is NOT the best performance you can get out of an Erika Enterprise deployment, because performance highly depends on:

  • the topology of the multiprocessor system;
  • the memory layout;
  • the Avalon bus traffic;
  • the compiler options.

In particular, in the case we considered, both CPU code and data were allocated in SRAM, which also means that multiple CPUs may collide when accessing the same memory bank. Tightly coupled memories were not used in the example.

Execution Times

Primitive                                               Single CPU        Multiple CPU
                                                        clocks    usec    clocks    usec
ActivateTask (local task, with preemption)              313       6.26    323       6.46
ActivateTask (local task, without preemption)           238       4.76    235       4.7
ActivateTask (remote task)                              -         -       780       15.6
GetResource (local resource)                            64        1.28    121       2.42
GetResource (global resource)                           -         -       386       7.72
ReleaseResource (local resource, without preemption)    120       2.4     144       2.88
ReleaseResource (local resource, with preemption)       -         -       218       4.36
ReleaseResource (global resource, without preemption)   -         -       143       2.86
ReleaseResource (global resource, with preemption)      -         -       308       6.16
TerminateTask (going to a ready task)                   551       11.02   536       10.72
TerminateTask (going to a stacked task)                 308       6.16    325       6.5


The Single CPU example was produced using the same hardware configuration as the Double CPU data, by creating, compiling and debugging a single-processor system. The BCC1 conformance class (with an O(n) ready queue queuing algorithm) was used. The difference between a local and a remote task activation is mainly due to the internal use of the Altera Avalon Mutex; this overhead will be reduced in future versions of Erika Enterprise. The small differences in the TerminateTask timings are most probably due to instruction cache misses (the code executed is the same).
