CPU Overruns and CPU Usage - A Debugging Guide

Embedded CPU Usage Problems - Diagnosis and Troubleshooting

This post covers everything you need to know about CPU usage for the PLECS Coder embedded microcontroller (MCU) targets. This includes the TI C2000, STM32, and XMC targets available for use with the PLECS Coder. It does not cover utilization concerns for the RT Box, which may be addressed in a separate post.

Background

Task Scheduler

All of the embedded coder targets supported by PLECS use the same approach for running control tasks - a rate-monotonic scheduler (RMS). The RMS is responsible for dispatching all of the tasks created (implicitly or explicitly) by a PLECS model deployed to an MCU. The basis of the RMS is simple: tasks that are expected to run at a faster rate are given a higher priority. This minimizes the chance of a task failing to complete before it is expected to run again, which would throw a control system into chaos.

In the embedded world, we add a few additional stipulations: every task must run at a step size that is an integer multiple of the base task's step size. The base task is therefore the fastest task, and it always runs to completion before any slower task starts. This allows the entire scheduler to run off of one interrupt, and generally permits higher CPU utilization limits than a scheduler that allows arbitrary task periods.

To understand how the RMS works in practice with our targets, see the below model and timing diagram:

As shown, the “Base Task” is executed every time period, while the “Voltage Control Task” and the “MPP Control Task” are executed every 2 and every 4 time periods, respectively. They are therefore prioritized by period: the “Base Task” always runs before the “Voltage Control Task” (and can interrupt it to do so), and the “Voltage Control Task” always runs before the “MPP Control Task”. This ensures that every task is able to complete before it is scheduled to run again. When all tasks have completed, the background task runs (handling non-critical operations such as communication and SFO calibration).

The RMS determines what is appropriate to run. Whenever the interrupt fires, the base task runs first. After the base task completes, the embedded dispatcher selects and runs the highest-priority task that is either pending or was previously preempted. It continues through the list of tasks until all have completed and the background task can run.
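The scheduling described above can be sketched in C. This is an illustrative toy model, not the actual PLECS dispatcher: the task names and the `dispatcher_tick` entry point are hypothetical, and real slower tasks run at lower interrupt priority so the base task can preempt them, whereas here they simply run to completion in priority order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: two slower tasks whose periods are integer
 * multiples (2x and 4x) of the base period, all dispatched from a
 * single base-task interrupt. */

#define NUM_SLOW_TASKS 2

static const uint32_t divider[NUM_SLOW_TASKS] = { 2, 4 }; /* period / base period */
static uint32_t run_count[NUM_SLOW_TASKS + 1];            /* [0]=base, [1..]=slow */

static void base_task(void)    { run_count[0]++; }  /* highest priority */
static void voltage_task(void) { run_count[1]++; }
static void mpp_task(void)     { run_count[2]++; }

static void (*const slow_task[NUM_SLOW_TASKS])(void) = { voltage_task, mpp_task };

/* Called once per base-task interrupt ("tick"). The base task always
 * runs first; a slower task is released when the tick count reaches a
 * multiple of its divider, and faster tasks are serviced before slower
 * ones because the list is ordered by period. */
void dispatcher_tick(uint32_t tick)
{
    base_task();
    for (int i = 0; i < NUM_SLOW_TASKS; i++) {
        if (tick % divider[i] == 0) {
            slow_task[i]();
        }
    }
    /* the background task would run here once nothing else is pending */
}
```

Over 8 ticks this yields 8 base-task runs, 4 voltage-task runs, and 2 MPP-task runs, matching the timing diagram above.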

The interrupt that drives this base task is determined either explicitly via a “Control Task Trigger” block being placed in the model, or via an implicit process to assign an appropriate interrupt if no “Control Task Trigger” is specified. In the above example, the interrupt selected is the ADC end of conversion signal. This means that once the ADC has finished converting results from its read operation (which is triggered by the PWM), an interrupt will fire that will trigger the “Base Task” to run, and subsequently any scheduled or ongoing tasks to run. Along with any triggered ADC, the PWM and Timer blocks (including the Peak Current Controller and HR Timer) can also be configured to drive the “Control Task Trigger”.

When a “Control Task Trigger” is not explicitly given, PLECS Coder will attempt to infer from the model a potential source for the interrupt. If no block (such as an ADC, PWM, or Timer block) is firing interrupts at the appropriate rate to run the base task at the selected step size, a timer will be implicitly created to run the base task.

CPU Overruns

Given this RMS scheduler (or any task scheduler), a critical question arises - what happens if a task is unable to complete before it is selected by the scheduler to run a new iteration? In the embedded controls world, this is a big problem. A task failing to complete can have a wide variety of control-breaking implications, ranging from “stale” data propagation, where downstream tasks operate on expired data, to critical operations such as status communication or duty cycle updates being “starved out” and never performed. As such, this is considered a critical failure and cannot be allowed to happen. It is referred to here as a “CPU Overrun” or “Task Overrun”.

When a CPU overrun occurs, the dispatcher will take immediate action to ensure a safe shutdown of the controls. This includes disabling all PWMs (if Powerstage Protection is used), stopping the further execution of any tasks, and issuing a PIL message to the user (which can be seen when connecting via external mode). Note that this does not reset the state of any other peripherals - for example, a digital output may be stuck high at the time of the panic, and a PWM channel not under the control of the Powerstage Protection block will continue switching. This information will be helpful later in this post for diagnosing whether a CPU overrun is occurring.
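The shutdown sequence can be summarized in a short sketch. All names here are illustrative placeholders, not actual PLECS dispatcher internals:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of overrun handling. */
static bool tasks_halted = false;
static bool pwm_disabled = false;
static const char *last_message = NULL;

/* Only channels under Powerstage Protection are forced off. */
static void disable_protected_pwms(void) { pwm_disabled = true; }

/* The message surfaces to the user when connecting via external mode. */
static void report_to_host(const char *msg) { last_message = msg; }

/* Called when a task is released before its previous iteration finished.
 * Note what is NOT done: other peripherals (GPIO state, unprotected PWM
 * channels) are left untouched, matching the behavior described above. */
void handle_overrun(void)
{
    disable_protected_pwms();
    tasks_halted = true;          /* no further task dispatch */
    report_to_host("CPU overrun detected");
}
```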

CPU Utilization

CPU overruns pose a major concern, as an overrun entirely halts the controls. To avoid an overrun, it is important to measure how close the code running on the CPU is to overrunning. One effective way to quantify this is to measure how much of a given task’s period is spent completing that task. This “CPU Utilization” metric indicates how close a given model is to a task overrun.

For the microcontroller targets, a “CPU Load” (TI C2000) or “Base Task Load” (STM32, XMC) block is offered to measure the percentage load of the Base Task. That is, it outputs the fraction of the base task period that is spent executing the base task. For a single-tasking model, this represents the entire CPU utilization, while for a multi-tasking model, it represents only a piece of the puzzle. It is an imperfect measure, but can serve as a “warning flag” that you may be risking an overrun, as discussed in the section below on diagnosing a CPU overrun.
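The underlying computation is simple: the load is the time spent inside the task divided by the task's period. A minimal sketch, assuming hypothetical timer cycle counts captured at task entry and exit:

```c
#include <assert.h>

/* Base-task load as a fraction of the base period, from cycle counts of
 * a free-running timer (names and units are illustrative). */
float base_task_load(unsigned long task_cycles, unsigned long period_cycles)
{
    return (float)task_cycles / (float)period_cycles;
}
```

For example, a base task taking 800 cycles out of a 1000-cycle period (e.g. 8 us of work in a 10 us period at 100 MHz) reports a load of 0.8.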

An interesting property of the RMS is that a safe total CPU utilization can be determined mathematically. For more information see here. The Liu and Layland bound states that for an RMS allowing arbitrary task periods, a total CPU utilization of 69.3% or less is guaranteed to be schedulable without risk of an overrun. Because all task periods for the PLECS Coder must be an integer multiple of the base task period, a total CPU utilization of up to roughly 80% will generally be tolerated. A utilization bound of 100% applies when every slower task period is a multiple of ALL faster task periods (i.e., a harmonic set such as 10us, 20us, 40us, and 120us tasks), but note that a few percentage points of that 100% must in practice remain available for scheduler and interrupt overhead, data transfer, and allowing the background task to run.

Diagnosing a CPU Overrun

To know if you are experiencing a CPU overrun, there are usually a few telltale signs that you can look out for:

  • Your code is “frozen” - LEDs are not blinking, PWM duty cycles are not updating, no response to signals is taking place

  • You are completely unable to connect via external mode - you may see the following error message:

  • You can connect initially to external mode, but then receive one of the following error messages:

  • If you are utilizing a powerstage protection block, your PWMs will be deactivated entirely

  • You experience one of the above issues and your base task sample time is quite fast (approaching 1e-5s or faster)

  • NOTE: As of today (2/23/2026) there is a bug where CPU overruns are not properly detected in single-tasking mode. You can instead spot an overrun by code taking longer than expected to run. For example, an LED scheduled to blink every second may blink every 2 seconds instead. You will also find that you cannot connect properly to external mode and will receive one of the above error messages.

Additionally, you can run the following test to determine a likely task overrun:

  • Insert a CPU load block into your model, and attach it to a scope/display (or a DAC if you wish to measure via hardware without the added overhead of external mode)
  • Enable external mode (if measuring via scope)
  • Increase your base sample time or remove/comment out items from your model until you are able to see the CPU load, either via external mode or some hardware method like the DAC

If, once you are finally able to read the CPU usage, a high number is displayed (in the 70 to 80% range), it is exceedingly likely that a task overrun was occurring before. Note, however, that this is a measure of the base task load only, so other tasks may be running too long even while the base task load is below this 70 to 80% range.

Recommendations for avoiding overruns

There are several steps you can take to ensure that your tasks complete on time. The following are common culprits that add unnecessary clock cycles to your execution time or reduce your available task time; addressing them can save a model from the brink of overrun.

  1. Ensure all look-up tables are “evenly-spaced”

When lookup tables are evenly spaced, the code can be greatly optimized to only require one math operation to translate the input value to the desired table output value. This makes the code significantly more efficient - a non-evenly spaced lookup table requires a binary search to evaluate, which can balloon the execution time.
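The difference can be illustrated with a small sketch. With evenly spaced breakpoints, the interval index falls out of one multiply (plus clamping); the table values and names below are purely illustrative:

```c
#include <assert.h>

/* Evenly spaced 1-D lookup with linear interpolation: one multiply to
 * find the interval, instead of a binary search over the breakpoints. */
#define LUT_SIZE 5
static const float lut_x0   = 0.0f;   /* first breakpoint */
static const float lut_step = 0.25f;  /* uniform spacing */
static const float lut_y[LUT_SIZE] = { 0.0f, 1.0f, 4.0f, 9.0f, 16.0f };

float lut_lookup(float x)
{
    /* single multiply maps the input onto a fractional index */
    float idx = (x - lut_x0) * (1.0f / lut_step);
    if (idx < 0.0f) idx = 0.0f;                        /* clamp below */
    if (idx > (float)(LUT_SIZE - 1)) idx = (float)(LUT_SIZE - 1); /* clamp above */
    int i = (int)idx;
    if (i > LUT_SIZE - 2) i = LUT_SIZE - 2;
    float frac = idx - (float)i;
    return lut_y[i] + frac * (lut_y[i + 1] - lut_y[i]);
}
```

A non-evenly spaced table would instead need a search over the breakpoint array on every call, with cost growing with table size.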

  2. Ensure all C-scripts are “safe” (and avoid them if possible)

C-scripts pose multiple issues for execution time. They inherently carry a slight overhead compared to performing the same operations with our model blocks. Additionally, a large amount of execution time can be added by operations that use doubles. Avoid declaring double variables and using functions that expect doubles - instead opt for floats and float functions. For example, using sin(value) will be significantly slower than using sinf(value). Also watch out for implicit conversions - float x = y * 1.2f is significantly faster than float x = y * 1.2 (note the f). Other tricks include avoiding math.h functions when possible (x * x is significantly faster than pow(x, 2)), preferring multiplication over division (x * 0.5f is faster than x / 2.0f), and opting for ternary operators, which can in many cases be faster than if-else statements.

  3. Move lower priority operations to slower running tasks

Not all operations need to run at the fastest possible rate. By moving less time-critical operations to slower tasks, the overall CPU load will drop. For example, communications, thermal ADC readings, state machine logic, and more can likely be moved to a slower task.
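In a PLECS model the clean way to do this is a genuinely slower task, but the same idea can be sketched as a decimation counter inside a hand-written fast routine (names and the divider value are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Run a low-priority operation every Nth base-task iteration,
 * e.g. a 10 kHz base task servicing comms at 100 Hz. */
#define COMM_DIVIDER 100

static unsigned comm_counter = 0;

/* Called once per base-task iteration; returns true when the slow
 * operation (e.g. communication handling) should run this cycle. */
bool comms_due(void)
{
    if (++comm_counter >= COMM_DIVIDER) {
        comm_counter = 0;
        return true;
    }
    return false;
}
```

The slow operation then runs once per COMM_DIVIDER iterations, so its cost is amortized instead of burdening every base-task cycle.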

  4. Watch out for an excessive number of “scoped” signals and non-inlined parameters

If you are attempting to read too many values via scope and display blocks, or have too many parameters that are not “inlined” (i.e., can be changed at run time via methods like external mode), this can add significant CPU usage to your model. If you are up against the CPU usage limit, try to keep only the scoped/displayed values and non-inlined parameters that are necessary.

  5. Increase the sample time for tasks in your model

If you have pared your model down to the most efficient it can be, then your best remaining option to run everything you need is to increase the sample time of tasks in your model. This gives the tasks longer to complete and provides more buffer against overruns. For example, raising the sample time of the base task from 1e-5s to 2e-5s will double the amount of time the base task has to complete before its next cycle.
