enable_icache is wrong on M7 without inline-asm feature #232

Closed
cbiffle opened this issue Jul 2, 2020 · 25 comments

@cbiffle
Contributor

cbiffle commented Jul 2, 2020

This has been a fun one to track down.

tl;dr: if enabling instruction and/or data caches on your M7 is causing weird crashes, turn on the cortex_m inline-asm feature and make sure you're compiling at --release. Oh, and you're on nightly now. Sorry.

Details

I've got an STM32H7B3 here, and shortly after enabling the icache I would sometimes get hard faults that looked suspiciously like it trying to execute random data. (I'm doing this early enough in boot that I don't have configurable faults broken out, but they were either BF or MMF.)

The key here is sometimes -- some builds worked okay. Until I tried turning on dcache. Then things went squirrely.

It came down to: how many instructions did the compiler decide to insert between turning on the cache and the dsb / isb sequence? If the answer is not zero and contains a branch or memory access, weird things will happen.

The current implementation of enable_icache reads as follows:

    #[inline]
    pub fn enable_icache(&mut self) {
        // Don't do anything if I-cache is already enabled
        if Self::icache_enabled() {
            return;
        }

        // NOTE(unsafe): No races as all CBP registers are write-only and stateless
        let mut cbp = unsafe { CBP::new() };

        // Invalidate I-cache
        cbp.iciallu();

        // Enable I-cache
        // NOTE(unsafe): We have synchronised access by &mut self
        unsafe { self.ccr.modify(|r| r | SCB_CCR_IC_MASK) };

        crate::asm::dsb();
        crate::asm::isb();
    }

The tail end of that function is almost right.

  • The modify will end in a str.
  • The barriers will follow.

However,

  • There's no use of compiler_fence or equivalent to prevent instructions from being rearranged above the barriers. (Perhaps we can assume this won't happen because they're extern "C"? I'm fairly certain I just saw it happen but I didn't save the listing.)
  • dsb and isb default to being function calls

This means the actual dynamic instruction trace here is

str r1, [r0, #0]
bl __dsb
dsb sy
bx lr
bl __isb
isb sy
bx lr

In other words, we've got one branch in the pipeline before the write even takes effect, and no fewer than three before the isb causes a prefetch/pipeline flush. These instructions often execute incorrectly. (On a reasonably fast M7, "often" becomes "always" in my testing.)

Enabling the inline-asm feature fixes this by changing the dynamic trace to be what you'd expect:

str r1, [r0, #0]
dsb sy
isb sy

This is probably an issue with any existing code attempting to use isb to synchronize with memory map or cache changes. To do this correctly, one might consider making enable_icache (and enable_dcache though it crashes less reliably) contingent on the inline-asm feature, but that only produces the right instruction sequence at --release.

These routines probably need to be written in assembly to be correct on stable toolchains (and on debug builds on nightly toolchains, for that matter, to bind the compiler's hands). I haven't thought about the equivalent disable cases in detail, but they and the clean/invalidate routines probably bear review as well.

@jonas-schievink
Contributor

jonas-schievink commented Jul 2, 2020

+1 for rewriting these routines (or at least the critical parts of them) in assembly, I don't think we can rely on LLVM to always produce the right instruction sequence here.

@adamgreig
Member

adamgreig commented Jul 2, 2020

Ah, as soon as I saw the email subject I realised exactly what had happened. Thanks for the detailed report. On the whole you're right and this will need to be fixed by moving the routine to assembly. We're really desperate for proper thumb intrinsics! Or I guess the new inline assembly would also satisfy.

There's no use of compiler_fence or equivalent to prevent instructions from being rearranged above the barriers. (Perhaps we can assume this won't happen because they're extern "C"? I'm fairly certain I just saw it happen but I didn't save the listing.)

https://github.com/rust-embedded/cortex-m/blob/master/src/asm.rs#L221 ;)

As it happens the 'extern C' does add something with the effect of a memory clobber so should prevent reordering of memory access in the output, but an explicit fence wouldn't hurt. We had much the same issue recently around ordering entering/leaving a CriticalSection.

@adamgreig
Member

For the record this is the same problem people hit more often when using the externally linked 'intrinsics' to modify the stack pointer (e.g. to bootload) where the msp::write() call might modify the stack pointer and then pop some stuff off the stack, which ends as well as you'd expect and is also fixed by using inline-asm. It's a pain.

@cbiffle
Contributor Author

cbiffle commented Jul 2, 2020

Any comment asking if a barrier is required should be replaced by a barrier. :-) You might lose some performance, but you're less likely to lose correctness.

I largely agree with @jonas-schievink, though I would go one step farther and suggest that dsb, dmb, and isb should be removed from the API. Any time you are writing code using explicit platform-dependent barrier instructions (as opposed to atomic types, or some library that contains barriers), you probably care about instruction reordering; Rust-level functions like this don't allow you to reason about it, and are almost impossible to use correctly, particularly across compiler versions. The current functions are footguns.

But if that seems too disruptive, then yes, going through every use and auditing why it's using a barrier, which ARM documentation it was following, whether the instruction order is actually significant, whether it's using the correct combination of machine and compiler fences, and then whether it needs to be rewritten in assembly anyway, would also work.

@adamgreig
Member

There are definitely use cases where dmb and dsb are required for correctness but it doesn't matter if it's a function call, for example interoperating with a DMA on a shared memory region. I wouldn't want to remove them from the API. I'm less certain about uses for isb() that don't care about it potentially turning into two branches. We should definitely document more carefully that they won't necessarily optimise into a single inline instruction.

I suspect every case using those barriers needs to prevent memory reordering across the barrier, but at present either the function call is an extern function call (which implies a memory clobber)* or has an explicit memory clobber in the inline asm, which will prevent memory reordering around the call*.

* gosh, i hope so, anyway
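For concreteness, an explicitly fenced wrapper could look something like this -- just a sketch of the idea, not the crate's API; the inner call is the existing cortex_m::asm::dsb:

use core::sync::atomic::{compiler_fence, Ordering};
use cortex_m::asm;

// Sketch only: belt-and-braces fencing so the compiler cannot move memory
// accesses across the barrier even if the inner call ends up as an outlined
// extern "C" shim.
#[inline(always)]
fn dsb_fenced() {
    compiler_fence(Ordering::SeqCst); // nothing sinks below this point
    asm::dsb();                       // hardware barrier (call or inline, depending on features)
    compiler_fence(Ordering::SeqCst); // nothing hoists above this point
}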

@cbiffle
Contributor Author

cbiffle commented Jul 2, 2020

I'm less certain about uses for isb() that don't care about it potentially turning into two branches.

I'm pretty sure that's not a thing, except for code that is cargo-culting isb because it saw it elsewhere. Which is pretty common unfortunately. isb() delenda est.

@cbiffle
Contributor Author

cbiffle commented Jul 2, 2020

Okay, I admit that I have found a case where isb() is not wrong: in a privileged-mode kernel, activating the MPU, and using isb() to ensure that the settings apply before switching to unprivileged code. It's only correct in that case because the kernel "knows" that its code is accessible both in the default map and after the MPU is enabled, so continuing to execute a pipeline(s) full of instructions from before the MPU got turned on is OK. (Though in practice, switching to unprivileged usually involves an exception return, which implies an isb on armv7-m.)
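Roughly, that case looks like this -- a sketch only, with the MPU_CTRL address and bits taken from the ARMv7-M ARM and all of the region setup elided:

use cortex_m::asm;

// Sketch of the MPU-enable case described above; not a complete configuration.
unsafe fn enable_mpu_then_sync() {
    const MPU_CTRL: *mut u32 = 0xE000_ED94 as *mut u32;
    const ENABLE: u32 = 1 << 0;     // MPU_CTRL.ENABLE
    const PRIVDEFENA: u32 = 1 << 2; // MPU_CTRL.PRIVDEFENA
    core::ptr::write_volatile(MPU_CTRL, ENABLE | PRIVDEFENA);
    asm::dsb(); // complete the MPU_CTRL write
    asm::isb(); // flush the pipeline before dropping to unprivileged code
}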

What about marking isb() unsafe? It is, technically, unsafe, in its current form -- my system was executing nonexistent instructions and stomping on memory before I fixed this.

@adamgreig
Member

"memory trouble? apply patented tincture __DSB(); __ISB(); topically as required"

What about marking isb() unsafe? It is, technically, unsafe, in its current form -- my system was executing nonexistent instructions and stomping on memory before I fixed this.

I think that's a bug in the isb() implementation rather than an incorrect safety contract; in principle I don't think executing isb should be able to cause memory unsafety (not executing it is what's potentially unsafe...).

How about:

  • Add pre-built assembly snippet which enables icache and immediately runs dsb and isb; always use that in tail of enable_icache, resolving the immediate issue
  • Document more clearly in all our pre-built "intrinsics" that calling them (except with release mode inline-asm) will result in a function call that may cause unwanted side effects
  • The Cortex-M7 docs are quite explicit that "you must always place a DSB and ISB instruction sequence after a cache maintenance operation", so I don't think we should remove any; the question is more whether any of them require that we make sure there are no extra branches involved first. I think that shouldn't affect the data cache, and doesn't cause problems when disabling the icache, in which case enable_icache is the only problem, but I'd welcome more eyes on that...
  • We need to be very sure that calls to DSB and ISB cannot be reordered around; we could add a compiler_fence(Ordering::SeqCst) to these methods if we can't satisfy ourselves that the external function call and/or inline-asm-with-memory-clobber does the job. It's a shame that compiler_fence has a panic path though, I'm not sure under what conditions that is optimised out.

@cbiffle
Contributor Author

cbiffle commented Jul 3, 2020

Add pre-built assembly snippet which enables icache and immediately runs dsb and isb; always use that in tail of enable_icache, resolving the immediate issue

sgtm.
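In spirit, something like this -- a sketch only, written with the new inline asm! for illustration (the real routine would need to live in an out-of-line assembly file for non-inline-asm builds); the CCR address and IC bit match the ARMv7-M ARM:

use core::arch::asm;

// Sketch only: keep the CCR.IC write and both barriers inside one asm block
// so the compiler cannot place a branch or memory access between them.
unsafe fn enable_icache_tail() {
    asm!(
        "ldr {tmp}, [{ccr}]",
        "orr {tmp}, {tmp}, {ic}",
        "str {tmp}, [{ccr}]", // CCR.IC = 1
        "dsb",                // complete the write
        "isb",                // refill the pipeline, with no branch in between
        ccr = in(reg) 0xE000_ED14u32, // SCB->CCR
        ic = in(reg) 1u32 << 17,      // CCR.IC
        tmp = out(reg) _,
        options(nostack),
    );
}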

Document more clearly in all our pre-built "intrinsics" that calling them (except with release mode inline-asm) will result in a function call that may cause unwanted side effects

I'm concerned that this doesn't proactively alert any current users that their code is wrong, the way a deprecation might. But, yes. This would be good.

The Cortex-M7 docs are quite explicit that "you must always place a DSB and ISB instruction sequence after a cache maintenance operation", so I don't think we should remove any

Yes, if you can cite docs, the barriers should be there. It looks like the cortex-m crate is using these barriers primarily in cache cases, so they are probably fine.

(I'll file a separate bug on how NVIC::mask ought to contain an isb.)

I think that shouldn't affect the data cache, and doesn't cause problems when disabling the icache, in which case enable_icache is the only problem, but I'd welcome more eyes on that...

I have definitely had intermittent failures when enabling the dcache (in fact, merely enabling the icache was not enough to trigger this in my code). I'm not sure if it was the dcache sequence or the interaction of the two, though. I'd use the asm snippet and the barriers in both cases.

In reading these cache maintenance functions -- are they written assuming they are executing in critical sections? Currently you could take an interrupt after enabling/disabling icache and before the barriers, which would be bad. (Worst case, the interrupt could have occurred shortly before the operation, and so the next instruction in the pipeline begins the interrupt handler.) Likewise between disabling dcache and cleaning.

These are privileged operations, so the standard cortex_m critical section should work (I'll file a separate bug about why I phrased that sentence like that).

It's a shame that compiler_fence has a panic path though, I'm not sure under what conditions that is optimised out.

Yeah, it really wants its Ordering as a const generic parameter, but that's of course not a thing yet.

@adamgreig
Member

re the docs, I'm looking at the Cortex-M7 User Guide, ARM DUI 0646B, pg296:

[screenshot: excerpt from the Cortex-M7 User Guide on placing DSB/ISB after cache maintenance operations]

@adamgreig
Member

I'm concerned that this doesn't proactively alert any current users that their code is wrong, the way a deprecation might.

What would you replace all the intrinsics with, if deprecating them? Or are you only thinking about ISB/DSB (which we'd still need to offer somehow, of course)?

I have definitely had intermittent failures when enabling the dcache

It would be interesting to work out what's causing the failure here in case we need to put more things into the asm blob. Is it some interaction with the stack when dcache is getting turned on/off? Do we end up needing to do cache cleaning+disable all in asm in case a call hits the stack between disable and clean?

In reading these cache maintenance functions -- are they written assuming they are executing in critical sections?

The ARM CMSIS cache management methods do not enter a critical section or talk about them in the documentation, but clearly taking an interrupt in the middle of this sequence would be ruinous, so I wonder what's up with that. We could consider adding a critical section to all the cache management operations, but the overhead might be annoying in the more frequently executed methods.

@cbiffle
Contributor Author

cbiffle commented Jul 4, 2020

What would you replace all the instrinsics with, if deprecating them? Or are you only thinking about ISB/DSB (which we'd still need to offer somehow, of course)?

Oh, sorry, I brought up deprecation as an example of how to communicate an issue to existing users. I don't know that you want to deprecate the existing intrinsics. Unless they prove unsalvageable of course.

Do we end up needing to do cache cleaning+disable all in asm in case a call hits the stack between disable and clean?

Probably. Any sequence provided as a literal assembly listing in either the v7-M ARM or the M4/M7 TRMs needs to be read very carefully before being rewritten in a higher-level language. I'd have to squint at cleaning and invalidation particularly.

The ARM CMSIS cache management methods do not enter a critical section or talk about them in the documentation, but clearly taking an interrupt in the middle of this sequence would be ruinous, so I wonder what's up with that.

Vendor code is often wrong. :-) I'm glad you are also seeing the issue there, I was concerned that I had missed something.

In a higher-level crate, if being in a critical section gives you a witness type, you could require that type to be passed in, proving a critical section exists, to avoid the overhead. (I felt like the interrupt::free function used to generate one of these, but I might be misremembering and thinking of some RTFM code.) In a lower-level crate, things probably just have to be unsafe with instructions that they need to be called without possibility of nonlocal control transfer to be used safely.

@adamgreig
Member

Probably. Any sequence provided as a literal assembly listing in either the v7-M ARM or the M4/M7 TRMs needs to be read very carefully before being rewritten in a higher-level language.

The CMSIS routines ARM recommends for cache management are fully in C so at least in principle I feel like we should be able to get Rust to do the right thing, the only caveat being the DSB/ISB intrinsics. I wonder if we can come to a clearer understanding of what precisely is causing the fault behaviour and so where a branch must be avoided and where it's acceptable; I'd rather not just rewrite the entire cache management in assembler just to be sure.

In a higher-level crate, if being in a critical section gives you a witness type, you could require that type to be passed in, proving a critical section exists, to avoid the overhead.

That's still the case; interrupt::free passes a &CriticalSection token to the closure, so we could implement unsafe underlying cache management operations and safe wrappers which require the critical section token and then call the unsafe methods.
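i.e. something shaped like this (names hypothetical, just to show the split):

use cortex_m::interrupt::{self, CriticalSection};

// Hypothetical shape: an unsafe core operation plus a safe wrapper that
// demands proof that interrupts are masked.
unsafe fn enable_icache_inner() {
    // ICIALLU + barriers + CCR.IC + barriers, in asm as discussed above.
}

fn enable_icache_in(_cs: &CriticalSection) {
    // Holding a &CriticalSection proves no interrupt can land mid-sequence.
    unsafe { enable_icache_inner() }
}

fn enable_icache() {
    // Convenience wrapper that supplies its own critical section.
    interrupt::free(|cs| enable_icache_in(cs));
}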

@cbiffle
Contributor Author

cbiffle commented Jul 4, 2020

The CMSIS routines ARM recommends for cache management are fully in C so at least in principle I feel like we should be able to get Rust to do the right thing, the only caveat being the DSB/ISB intrinsics.

This is a case where the correctness of the code, if written in a higher-level language, will require some understanding of compiler behavior.

  • The C compilers ARM targets either implement __DSB and friends directly, or support forced-inlined functions with assembly bodies. So they're operating with the equivalent of the inline-asm feature.
  • The C functions contain no abstractions or function calls, and are likely to be compiled as straight-line code even at -O0.

I agree that you could, in principle, write C code in Rust (i.e. avoiding register abstractions, using raw pointer writes) that would compile similarly in debug[1]. Would it be clearer/more maintainable/more obviously correct than assembly? Maybe! Can't rule it out until I've seen it, but I'd probably reach for assembly here for long-term stability.

[1]: note that I'm pretty sure I have observed the compiler out-lining core::ptr::write and friends at times, so that statement might not actually be true. This might have been a long time ago, or I might be misremembering.

The lack of a critical section requirement in the CMSIS code still appears to be a bug/omission in the CMSIS code. Based on past (and recent) experience I'd caution against assuming CMSIS code is correct without a deep understanding of how ARMCC processes it -- particularly in cases like this where the ARM recommendation interacts with vendor-implemented memory hierarchies and the like.

I wonder if we can come to a clearer understanding of what precisely is causing the fault behaviour and so where a branch must be avoided and where it's acceptable; I'd rather not just rewrite the entire cache management in assembler just to be sure.

I'll have a go at writing a reduction. But in the case of the basic cache maintenance operations, the ARM docs (which you excerpted) are pretty clear, and I'd caution against assuming that you can control instruction generation too precisely through several levels of indirection/compiler. It seems potentially fragile. (For instance, my inline-asm change only fixes things on --release, as far as I can tell, and that's only been tested on opt-level = "z" and codegen-units = 1 with lto.)

Re: witness types,

That's still the case

Oh good! So that would be an opportunity for a low-overhead safe API for unprivileged/monolithic apps.

@jonas-schievink
Contributor

We cannot currently write the equivalent to the C code in Rust due to lack of stable ARM intrinsics and inline assembly - either of the two would be needed to guarantee that dsb(); compiles down to a single instruction in debug mode.

@adamgreig
Member

I definitely agree we need to at least write the parts of the sequence that must not have a branch in the middle in assembly (e.g. enabling icache followed by ISB); I'm just wondering if we will also have to write all the rest of the cache maintenance (e.g. the clean and invalidate loops and so on) in asm too. Some of it's at least marginally nontrivial (e.g. invalidate_dcache_by_address).
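For reference, the shape of that one is roughly as follows -- a sketch only, with alignment handling elided and the Cortex-M7 line size hard-coded rather than read from CCSIDR:

use cortex_m::{asm, peripheral::CBP};

// Sketch only: invalidate D-cache lines covering [addr, addr + len).
unsafe fn invalidate_dcache_by_address_sketch(addr: usize, len: usize) {
    const LINE_SIZE: usize = 32; // Cortex-M7 L1 cache line size
    let mut cbp = CBP::new(); // NOTE(unsafe): CBP registers are write-only and stateless
    let mut line = addr & !(LINE_SIZE - 1);
    while line < addr + len {
        cbp.dcimvac(line as u32); // invalidate this line by address
        line += LINE_SIZE;
    }
    asm::dsb();
    asm::isb();
}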

@cbiffle
Contributor Author

cbiffle commented Jul 4, 2020

I've got a repro case, but it's intermittent -- I've gotten it to crash something like 1% of the time.

Machine state at crash looks suspiciously similar to this instability in Linux early in its M7 port. I'm reviewing the patches that followed that to see what they changed. (Note that Linux's cache maintenance operations are all in assembly.)

Instruction sequence around the crash is

 8000398:       4c31            ldr     r4, [pc, #196]  ; = e000ed14 - CCR
 800039a:       6820            ldr     r0, [r4, #0]    ; read CCR
 800039c:       0380            lsls    r0, r0, #14     ; bit 17?
 800039e:       d40a            bmi.n   80003b6         ; branch ahead if set
 80003a0:       2000            movs    r0, #0
 80003a2:       f8c4 023c       str.w   r0, [r4, #572]  ; ICIALLU - missing suggested barrier
 80003a6:       6820            ldr     r0, [r4, #0]    ; read CCR ... some more
 80003a8:       f440 3000       orr.w   r0, r0, #131072 ; set bit 17
 80003ac:       6020            str     r0, [r4, #0]    ; write it back
 80003ae:       f003 fcb6       bl      8003d1e <__dsb>  <---- fatal call to dsb
 80003b2:       f003 fcb7       bl      8003d24 <__isb>
 80003b6:       f003 fcb2       bl      8003d1e <__dsb>  if you can't tie knots,
 80003ba:       f003 fcb3       bl      8003d24 <__isb>  tie lots

08003d1e <__dsb>:
 8003d1e:       f3bf 8f4f       dsb     sy  <-- hard fault taken here
 8003d22:       4770            bx      lr

The word where the fault occurs seems to read as zero when the fault happens, rather than the correct instruction.

Edit: Ah, the fault is reproducible by running a different non-cached binary that does not have dsb at the same address and then resetting into a binary that attempts to enable the cache. Presumably the race is being revealed by stale contents. More updates as I have 'em.

@adamgreig
Member

Are you testing with dcache on/off/both, and also is the branch predictor enabled?

@cbiffle
Contributor Author

cbiffle commented Jul 4, 2020

I've tested with both, but in this case it is crashing immediately before attempting to enable the dcache. Branch predictor is on, though I didn't request it -- perhaps the runtime is enabling it?

Dump of the M7's IEBR registers shows

(gdb) mon mdw 0xe000efb0 4
0xe000efb0: 00000000 00020e91 00000000 00000000

Only interesting contents are in IEBR1 there, but IEBR1 sez:

  • Type of error = non-correctable
  • RAM bank = tag ram
  • RAM location:
    • Way: 0
    • Index: 0xe9
    • Line doubleword offset: 0
  • Locked: not locked by software
  • Valid: yes

So there has been a cache tag ECC error. If I'm doing my set-associative cache math right, that looks like it's on the line immediately after where the dsb function should be, which (given that the branch went to the last two bytes of the line, containing the first two bytes of a four-byte instruction) isn't totally shocking.

CFSR says usage error (invalid instruction). HFSR says forced (usage fault escalated). Not super useful there.

So it seems like a branch racing instruction cache initialization is bad. It would be harder to demonstrate similar issues on dcache initialization because you'd have to get a load to issue at the right time, but with the M7's ability to dual-issue loads it ought to be doable if you think it's important. (Me, I say let's take ARM's advice and not put instructions in between cache enable and barrier.)

I have no data on other cache operations yet. As for the invalidate-by-address operation, as long as none of the functions it calls are transitively referencing any statics (which... I'd have to read them all), the only issue I could see is that you can't control its use of dcached memory during the algorithm (in the form of stack, at the very least). I'd find it easier to reason about if the dcache invalidation algorithm were not using the dcache while invalidating, and so I'd write it in assembly, personally. Reasonable people may differ of course.

cbiffle added a commit to oxidecomputer/cortex-m that referenced this issue Jul 4, 2020
See rust-embedded#232, which this partially fixes -- there's still the question of
taking an interrupt in the midst of these sequences.
@cbiffle
Contributor Author

cbiffle commented Jul 4, 2020

Got a draft PR up -- this doesn't address the need for a critical section. It would be safe to simply use cps in the assembly sequences, because the SCB is only accessible to privileged code anyway. If you have no objections I'll amend the PR.

@adamgreig
Member

Thanks! Do you think those are the only two methods which require writing in assembly then?

Re the critical section, if you did just use cps in the assembly you'd still need to store the current state before restoring in case we're already operating with interrupts off; maybe better to just put an interrupt::free around the call in the public function.

@cbiffle
Contributor Author

cbiffle commented Jul 4, 2020

Thanks! Do you think those are the only two methods which require writing in assembly then?

No, I think those are the only two that we appear to agree need to be in assembly. :-) I'd do the other cache maintenance functions as I've said.

Re the critical section, if you did just use cps in the assembly you'd still need to store the current state

Yes, of course. The normal pattern is

mrs rX, PRIMASK
cpsid i
do stuff
msr PRIMASK, rX

Advantage: assembly routine becomes more clearly correct; interrupts are disabled for a shorter and more predictable period.

Disadvantage: critical section hidden in assembly. (Edit: additional disadvantage: y'all appear to be replicating all assembly sequences in two places without mechanisms to prevent divergence(?), so I guess any longer sequence is a risk.)
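For comparison, the same save/disable/restore expressed with the crate's existing helpers -- essentially what interrupt::free already does -- would be (sketch):

use cortex_m::{interrupt, register::primask};

// Sketch: mask interrupts around `f`, restoring the previous PRIMASK state.
fn with_interrupts_masked<R>(f: impl FnOnce() -> R) -> R {
    let was_active = primask::read().is_active(); // mrs rX, PRIMASK
    interrupt::disable();                         // cpsid i
    let r = f();                                  // do stuff
    if was_active {
        unsafe { interrupt::enable() };           // only re-enable if they were on before
    }
    r
}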

I could go either way. Got a preference?

bors bot added a commit that referenced this issue Jul 5, 2020
234: Use assembly sequences to enable caches. r=adamgreig a=cbiffle

See #232, which this partially fixes -- there's still the question of
taking an interrupt in the midst of these sequences.

Co-authored-by: Cliff L. Biffle <[email protected]>
@adamgreig
Member

(for future reference, further discussion was in #234).

Thanks very much for reporting this and getting the fix in; I know the crate hasn't seen enough use with priv/unpriv use cases and I'm glad to get fixes merged to help that.

@adamgreig
Member

I think we can close this out now thanks to #234, I've opened #238 to track future work. Thanks!

adamgreig pushed a commit that referenced this issue Jul 20, 2020
See #232, which this partially fixes -- there's still the question of
taking an interrupt in the midst of these sequences.
@njankowski-renesas

Hi All,

I found this thread while looking for related topics.

Have not tested, only looked.

I think the actual problem with the code being inserted before the DSB/ISB instructions was that instruction-cache lookup was enabled before the cache's first-time initialization (the invalidation) had completed.

The inserted code was wrong to be there (adding compiler barriers to DMB/DSB/ISB was correct), but it was not the reason for this crash.

The code as-is today could use additional memory barriers per Arm references, and the exception masking is likely unnecessary.

This code could still crash if an exception were to be taken after CCR.IC is enabled but before the DSB could execute during first initialization, but it is coincidentally protected by the exception masking.

Details

Cortex-M7 does not have automatic cache invalidation, so the cache contents are undefined after reset.

Before enabling the instruction cache or data cache, they must be invalidated.

The code must wait for this invalidation to complete by using a DSB/ISB pair before enabling the cache with the corresponding CCR.xC register bit.

On master, this code does not wait for the instruction cache invalidation to complete before writing CCR.IC.

Because on master the exceptions are masked and a DSB/ISB pair immediately follows setting CCR.IC, coincidentally no issue occurs since any pending ICIALLU operation is waited upon by that DSB/ISB pair.

CMSIS 4 had this same missing DSB/ISB pair in SCB_EnableICache after writing ICIALLU, which was fixed in CMSIS 5.

Also note that DSB/ISB functions/macros should also add a compiler barrier, which it seems they already do in this repository because of compiler_fence(Ordering::SeqCst) being present.

Refer to CMSIS 5 or 6 for examples on both topics.

I do not think disabling exceptions is required for cache maintenance.

If lookup is enabled by CCR.xC, the hardware should properly look up and allocate in the caches even with ongoing cache maintenance, except during this first-time initialization after reset.

An application is still ultimately responsible for ensuring cache maintenance is complete, whenever the application expects it to be completed.

Code

/// Enables I-cache if currently disabled.
///
/// This operation first invalidates the entire I-cache.
#[inline]
pub fn enable_icache(&mut self) {
    // Don't do anything if I-cache is already enabled
    if Self::icache_enabled() {
        return;
    }

    // CMSIS 5 AND 6 ALSO INCLUDE A DSB/ISB PAIR HERE AFTER CHECKING IF ICACHE IS ENABLED

    // NOTE(unsafe): No races as all CBP registers are write-only and stateless
    let mut cbp = unsafe { CBP::new() };

    // Invalidate I-cache
    cbp.iciallu();

    // MISSING DSB/ISB PAIR

    // Enable I-cache
    extern "C" {
        // see asm-v7m.s
        fn __enable_icache();
    }

    // NOTE(unsafe): The asm routine manages exclusive access to the SCB
    // registers and applies the proper barriers; it is technically safe on
    // its own, and is only `unsafe` here because it's `extern "C"`.
    unsafe {
        __enable_icache();
    }
}
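Concretely, the missing pair would slot in right after the invalidate, something like this (sketch of the fragment inside enable_icache; assumes the crate's dsb/isb helpers keep their compiler fences):

    // Invalidate I-cache
    cbp.iciallu();

    // Wait for the invalidation to complete before enabling lookup
    // (the DSB/ISB pair noted as missing above; mirrors CMSIS 5's SCB_EnableICache).
    crate::asm::dsb();
    crate::asm::isb();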

References

Arm Cortex-M7 Processor Technical Reference Manual
Revision r1p2
4.1.3 Initializing and enabling the L1 cache
Invalidate instruction cache

You can use the following code example to invalidate the entire instruction cache, if it has been included in the processor.
The operation is carried out by writing to the ICIALLU register in the PPB memory region.

    ICIALLU EQU 0xE000EF50
    MOV r0, #0x0
    LDR r11, =ICIALLU
    STR r0, [r11]
    DSB
    ISB
