TNKernel port for ARM Cortex-M4 with FPU


1. The author

  The TNKernel port for the Cortex-M4 CPU (the code and the description) was written by Vyacheslav Ovsiyenko.
  For any additional information about the port, please contact him directly (tnkernel.m4f(at)

2. FPU context switching

  There are two basic approaches to switching the FPU context in an RTOS environment, described in [1].
  The Cortex-M4F TNKernel port implements both of them.

  The approach is selected with the TN_SUPPORT_FPU compilation parameter.
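
  As an illustration of how such a compile-time selection can look, here is a minimal sketch.
The numeric values and the FPU_MODE_* names are assumptions made for this example only; this
text does not state which value of TN_SUPPORT_FPU selects which mode, so check the port sources
for the real ones.

```c
/* ASSUMPTION: the FPU_MODE_* names and their values are hypothetical. */
#define FPU_MODE_NONE       0   /* no FPU support                   */
#define FPU_MODE_LAZY       1   /* "Lazy Stacking" method           */
#define FPU_MODE_ON_DEMAND  2   /* context switching on demand      */

#ifndef TN_SUPPORT_FPU
#define TN_SUPPORT_FPU FPU_MODE_NONE   /* default chosen for this sketch */
#endif

/* Returns a readable name of the mode selected at compile time. */
static const char *fpu_mode_name(void)
{
#if TN_SUPPORT_FPU == FPU_MODE_NONE
    return "no FPU support";
#elif TN_SUPPORT_FPU == FPU_MODE_LAZY
    return "lazy stacking";
#elif TN_SUPPORT_FPU == FPU_MODE_ON_DEMAND
    return "context switching on demand";
#else
#error "Unsupported TN_SUPPORT_FPU value"
#endif
}
```

  In a real build the parameter would normally be passed on the compiler command line
(for example with a -D option) rather than defaulted in the source.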


  With FPU support disabled, the port contains no FPU support at all. Code compiled with this option can also run on the Cortex-M3.


  The port supports the FPU using the "Lazy Stacking" method. The ASPEN and LSPEN bits in the FPCCR register
are set permanently. At each task context switch the port checks a flag (bit 4 of the exception return value)
that shows whether the task has used the FPU.
  If the FPU was used, the actual FPU context is saved on the current task stack. Only the upper half of the FP
registers (S16-S31) is saved explicitly; this triggers the hidden "lazy saving" of the lower half of the FP
registers (S0-S15) and the FP status word.
  Then the next task is checked for FPU usage. If it uses the FPU, its FPU context
is loaded from the next task stack.
  Only the upper half of the FP registers is restored explicitly. The Cortex-M4F reloads the other FP registers
and the FP status word automatically on return from the exception handler.
  This approach allows floating-point code to be used within interrupt and exception handlers,
but it requires more stack space and increases the context switching time. Even if a task used the FPU only once
(possibly long ago) and will never use it again, it still incurs the FPU context
stack operations at every context switch, due to the sticky nature of the FPCA bit.
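
  The flag check and the extra stack cost described above can be sketched in C as follows.
The helper names are hypothetical and this is not the port's code (the port does this work in
assembler). The sketch relies only on the architectural facts that bit 4 of EXC_RETURN is 0
when the stacked frame includes FP state, and that the extended frame adds 18 words (S0-S15,
FPSCR and one reserved word) to the basic 8-word frame.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 4 of EXC_RETURN: 1 = basic frame, 0 = extended frame with FP state. */
#define EXC_RETURN_NO_FP_FRAME  (1u << 4)

/* Hypothetical helper: true if the interrupted task has an FPU context. */
static bool task_used_fpu(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_NO_FP_FRAME) == 0u;
}

/* Extra stack bytes per context switch for a task with an FPU context:
 * 18 words stacked by hardware (S0-S15, FPSCR, reserved) plus
 * 16 words saved explicitly by software (S16-S31). */
static unsigned fpu_extra_stack_bytes(uint32_t exc_return)
{
    const unsigned hw_words = 18u;
    const unsigned sw_words = 16u;

    if (!task_used_fpu(exc_return))
        return 0u;
    return (hw_words + sw_words) * 4u;
}
```

  For example, 0xFFFFFFED (return to Thread mode on the process stack with an FP frame) gives
136 extra bytes, while 0xFFFFFFFD (the same return without an FP frame) gives none.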


  The port supports the FPU using context switching on demand. The port maintains a system variable
that holds the TCB pointer of the task that currently holds the FPU context. "Holds" means that the data
currently loaded in the FPU registers belong to this task.
  If the holder task is active (coincides with tn_curr_run_task), FPU usage is allowed: the
ASPEN and LSPEN bits in the FPCCR register are set. Otherwise (a non-holder task is active) the bits are cleared.
  When the context is switched to another task that does not hold the FPU, FPU usage is simply disabled.
  No other operations on the FPU context are performed at that point.
  If the new task (which does not hold the FPU) tries to access the FPU, a UsageFault exception is raised
with the NOCP flag set.
  The exception handler checks the conditions and performs the actual FPU context switch: it stores the
previous content of all FPU registers into the TCB of the previous FPU holder task and loads the new context
from the TCB of the next task. The system variable is also updated to point to the new holder.
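
  The holder bookkeeping described above can be modelled on a host machine with a small
simulation. Everything here is an assumption made for illustration: the names (task_t,
fpu_holder, switch_to, on_fpu_fault, fpu_write), the 33-word context layout, and the plain
array standing in for the FPU registers. The real port does this in the UsageFault handler
with the hardware registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FPU_CTX_WORDS 33            /* S0-S31 plus FPSCR (assumed layout) */

typedef struct {
    uint32_t fpu_ctx[FPU_CTX_WORDS]; /* FPU context stored in the TCB */
} task_t;

static uint32_t fpu_regs[FPU_CTX_WORDS]; /* stands in for the FPU registers */
static int      fpu_enabled;             /* stands in for the enable bits   */
static task_t  *fpu_holder;              /* task whose data is in fpu_regs  */
static task_t  *curr_task;               /* role of tn_curr_run_task        */
static unsigned fpu_swaps;               /* counts real FPU context swaps   */

/* Ordinary context switch: no FPU data moves, access is only gated. */
static void switch_to(task_t *next)
{
    curr_task   = next;
    fpu_enabled = (next == fpu_holder);
}

/* What the fault handler does when a non-holder task touches the FPU. */
static void on_fpu_fault(void)
{
    if (fpu_holder != NULL)
        memcpy(fpu_holder->fpu_ctx, fpu_regs, sizeof fpu_regs);
    memcpy(fpu_regs, curr_task->fpu_ctx, sizeof fpu_regs);
    fpu_holder  = curr_task;
    fpu_enabled = 1;
    fpu_swaps++;
}

/* A task's FPU access: takes the fault path first if FPU use is disabled. */
static void fpu_write(unsigned reg, uint32_t value)
{
    if (!fpu_enabled)
        on_fpu_fault();
    fpu_regs[reg] = value;
}
```

  In this model, switching between tasks costs nothing on the FPU side until a task that is
not the holder actually writes an FPU register; only then is the context swapped.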

  On the one hand, this approach provides a shorter context switch time; on the other hand, it does not allow
direct use of floating-point code (without explicit saving and restoring of the FPU content).
  The FPU context is switched only on actual demand, which is usually a very rare event, except
when several tasks use the FPU heavily at the same time.

  In any case, when the FPU is used, the following issue should be taken into consideration (quoted from [1]):

  "This issue is made more complicated by the fact that the Embedded Application Binary Interface (EABI)
  standard permits the use of the FPU for non-floating-point operations. For example, if the code
  is compiled with compile options which specify that a FPU is available, and the function being
  compiled requires many registers for data processing and ends up fully utilizing the registers
  in the general register bank, then the C compiler might use some of the registers in the FPU as
  temporary data storage"

  This implies "hidden" FPU usage by C code, even without explicit operations on floating-point
data types. It is not a common situation, but it is possible.
  So, generally speaking, the RTOS routines should be compiled with no-FPU options.

3. The port implementation

  The port contains the test project source code, the TNKernel sources (see below),
makefiles for the GCC, Keil and IAR compilers, and utilities (make.exe etc.).

  The assembler files (startup.s and tn_port_cm4f_asm.s) are common to all compilers and rely on
the C preprocessor (the appropriate commands are already included in the makefiles). This is convenient
for the test, but for "real" projects a better solution seems to be separate
assembler files for each compiler, written in that assembler's native syntax (for example,
IAR compilers v.6.40 and v.6.50 process mixed C/ASM macros in different ways).

TNKernel for the port comes with some optimizations:

    - the functions tn_cpu_save_sr() and tn_cpu_restore_sr() are inlined
    - the CDLL routines can optionally be inlined (by defining the USE_INLINE_CDLL compilation parameter)
    - the ffs_asm() routine is inlined (and uses the Cortex "clz" instruction)
    - the tn_switch_context_request() routine is added. A context switch request
      is raised only when the tn_next_task_to_run variable is updated.
      The routines tn_switch_context() and tn_int_exit() are not used (they are empty macros)
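
  To show how a find-first-set routine can be built on a count-leading-zeros operation, here is
a minimal sketch. The name ffs_clz and the 1-based result convention (0 for a zero argument) are
assumptions for this example, and __builtin_clz is a GCC/Clang built-in used in place of the
port's inline assembler.

```c
#include <stdint.h>

/* Returns the 1-based position of the lowest set bit, or 0 if x is 0. */
static inline unsigned ffs_clz(uint32_t x)
{
    uint32_t lowest;

    if (x == 0u)
        return 0u;               /* __builtin_clz(0) is undefined */

    lowest = x & (0u - x);       /* isolate the lowest set bit */
    return 32u - (unsigned)__builtin_clz(lowest);
}
```

  With this convention ffs_clz(0x18) gives 4, because bit 3 is the lowest set bit.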

With these optimizations the context switch time is reduced by about 30 percent.
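
  The idea behind the conditional context switch request can be modelled on a host machine as
shown below. This is a sketch under assumptions, not the port's code: set_next_task,
switch_context_request, fake_icsr and switch_requests are hypothetical names, a plain variable
stands in for the ICSR register, and the use of PendSV (PENDSVSET is bit 28 of ICSR) is the
usual Cortex-M practice rather than something stated in this text.

```c
#include <stdint.h>

#define PENDSVSET (1u << 28)          /* PendSV set-pending bit in ICSR */

typedef struct { int priority; } task_t;

static task_t  *next_task_to_run;     /* role of tn_next_task_to_run    */
static uint32_t fake_icsr;            /* stands in for the ICSR register */
static unsigned switch_requests;      /* counts raised requests          */

/* Raises the context switch request (modelled as pending PendSV). */
static void switch_context_request(void)
{
    fake_icsr |= PENDSVSET;
    switch_requests++;
}

/* Requests a switch only if the task selected to run next has changed. */
static void set_next_task(task_t *candidate)
{
    if (candidate != next_task_to_run) {
        next_task_to_run = candidate;
        switch_context_request();
    }
}
```

  Selecting the same task twice raises only one request, so no exception is taken when the
scheduling decision does not change.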


      [1]      "ARM Application Note 298: Cortex-M4(F) Lazy Stacking and Context Switching".

Downloads

  TNKernel Cortex M4 with FPU port source code.
  The file also contains examples for the STM32F407 CPU and projects (makefiles) for IAR ARM 6.xx, GCC 4.x (GNU ARM Embedded/CodeSourcery) and Keil 4.xx.