asm::delay blocks for 1.5 times longer on 0.6.5 #325
This is #312; previously the implementation was broken on Cortex-M7 cores (#236) where it would complete in fewer than the requested number of cycles, whereas now it always takes at least the requested number of cycles (or more, in these cases). It's definitely not a breaking API change, but it's come in on cortex-m 0.6.5 (and 0.6.6 and 0.6.7) because it's using 0.7.1's delay implementation, which I guess could have been more clearly indicated in the changelog. A non-breaking version bump changing what could have been "calibrated" delays could definitely cause users problems. That said, the docs for this method are extremely clear that it may delay indefinitely longer than requested and shouldn't be used if accurate timing is required, and don't make any promises about constant vs geometric factors. If there was an easy way to specialise it for CPU architecture so that it remains correct on all cortex-m cores that would be great, but as it is I think the current behaviour (correct on all CPUs, but slower than before) is better than the old behaviour (may undercount on some CPUs, but is more accurate on others), at least for 0.7. We could consider releasing 0.6.8 with the old delay behaviour and keeping the bug fix only for 0.7, but it would take some messing with the asm, which in 0.6.5+ is entirely re-exported from the new system in 0.7; we'd need to bring back the old |
I actually faced the same problem. I checked the assembly in both cases, with and without the inline-asm feature. The initialization may differ, but the interesting part is the same:
    0x0800040c <+160>: subs r1, #1
    0x0800040e <+162>: bne.n 0x800040c <seg1::__cortex_m_rt_main+160>
Here the subs instruction decrements r1, and the branch repeats it until r1 becomes zero. In total, though, these two instructions take 3 cycles, and here's why:
According to this info it can be 3 to 5 cycles in total, so we would do better to divide the input by 3 instead of 2. On my stm32f401 and stm32f446 microcontrollers it always takes 3 cycles. I am not sure about architectures other than cortex-m4, but they probably behave the same way.
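The arithmetic behind that suggestion can be sketched on the host. This is only a model, not the crate's code: `modeled_cycles` is a hypothetical helper, and it assumes the loop count is simply the requested value divided by a fixed divisor, with every iteration costing the same number of cycles.

```rust
/// Hypothetical model of a `subs`/`bne` busy loop (not the cortex-m source).
///
/// Assumption: the loop runs `requested / divisor` iterations and each
/// iteration costs `cycles_per_iteration` CPU cycles.
fn modeled_cycles(requested: u32, divisor: u32, cycles_per_iteration: u32) -> u64 {
    assert!(divisor > 0, "divisor must be non-zero");
    let iterations = u64::from(requested / divisor);
    iterations * u64::from(cycles_per_iteration)
}
```

Under these assumptions, `modeled_cycles(1_000_000, 2, 3)` gives 1_500_000, which matches the 1.5x overshoot reported in this issue, while `modeled_cycles(1_000_000, 3, 3)` gives 999_999. On a core where an iteration takes only 2 cycles, dividing by 3 would give 666_666, which is fewer cycles than requested.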
The nuisance is that it does change with architecture. In particular, Cortex-M7 has a dual-issue pipeline, which means two instructions may be retired per clock cycle in some cases. As an example, consider this code:

    use rtt_target::{rprintln, rtt_init_print};

    #[cortex_m_rt::entry]
    fn main() -> ! {
        rtt_init_print!();
        let mut cp = cortex_m::Peripherals::take().unwrap();
        // Enable tracing and the DWT cycle counter used for the measurement.
        cp.DCB.enable_trace();
        cp.SCB.enable_icache();
        cortex_m::peripheral::DWT::unlock();
        cp.DWT.enable_cycle_counter();
        rprintln!("delay(x)     | cpu cycles");
        rprintln!("-------------|-----------");
        for i in 1..10 {
            let delay = i * 1000;
            // Read the cycle counter immediately before and after the delay.
            let t0 = cortex_m::peripheral::DWT::cycle_count();
            cortex_m::asm::delay(delay);
            let t1 = cortex_m::peripheral::DWT::cycle_count();
            // wrapping_sub keeps the difference valid if the counter wraps.
            rprintln!("{:<13}| {}", delay, t1.wrapping_sub(t0));
        }
        loop {}
    }

I built it in release mode for STM32F401:
STM32F767:
The actual delay loop in the output binary is the same as you posted, a Actually, this kind of sucks - the whole reason we use |
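To compare readings like the ones taken above across chips, the ratio of measured to requested cycles can be computed from the two counter values. The following is a minimal host-side sketch; `elapsed_cycles` and `overshoot_permille` are hypothetical helper names, and it assumes a free-running 32-bit counter that wraps at most once between the two readings.

```rust
/// Elapsed cycles between two readings of a free-running 32-bit counter.
/// Assumption: the counter wraps at most once between `t0` and `t1`.
fn elapsed_cycles(t0: u32, t1: u32) -> u32 {
    t1.wrapping_sub(t0)
}

/// Measured cycles relative to requested cycles, in thousandths
/// (1000 means exact, 1500 means 1.5 times longer).
/// Returns `None` when `requested` is zero, since the ratio is undefined.
fn overshoot_permille(requested: u32, t0: u32, t1: u32) -> Option<u64> {
    if requested == 0 {
        return None;
    }
    Some(u64::from(elapsed_cycles(t0, t1)) * 1000 / u64::from(requested))
}
```

A value that stays near the same figure for every requested delay would indicate a multiplicative factor, whereas a value that approaches 1000 as the requested delay grows would indicate a constant overhead.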
@adamgreig I see. So if there is such a wide range of possible cycles per instruction, maybe there is a way to check it at compile time? Does the |
Those two examples were the exact same binary on the same target - the CPU isn't something we know at compile time. Possibly we should just document that it will retire |
Random thought: what about putting the delay in terms of |
This snippet prints 1_500_000 with cortex-m 0.6.5 but 1_000_000 with cortex-m 0.6.4.
Behavior observed on both the nRF52840 and the STM32L433.
I personally consider this to be a breaking change.
The API doc does say that the function blocks for "at least" N clock cycles, but I would expect the function at worst to be off by some constant/arithmetic factor (+k), not by a geometric factor (*k).
And I would only expect that arithmetic factor in the presence of interrupts.
I'm aware that the delay can be off by a geometric factor if there's Flash latency, but that's not what was observed above: the Flash configuration was not changed; only the cortex-m version was.