Skip to content

Support extra parameter on attachInterrupt() #4519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

paulo-raca
Copy link
Contributor

When writing an arduino library, it is difficult to connect interrupt handlers with the component instance that should be notified.

In C, the common pattern to fix this is to specifying a void* parameter with the callback, where the listener can store any context necessary.

This patch adds the new function attachInterruptParam() to implement this pattern.

@paulo-raca
Copy link
Contributor Author

A very simple example of how this can be used to writing components:

class PinListener {
  int pin;

  static void isr(void* arg) {
    PinListener* listener = (PinListener*)arg;
    listener->onInterrupt();
  }

public:

  PinListener(int pin) {
    this->pin = pin;
    pinMode(pin, INPUT_PULLUP);
    attachInterruptParam(digitalPinToInterrupt(pin), isr, CHANGE, this);   
  }

  void onInterrupt() {
    Serial.print("Pin ");
    Serial.print(pin);
    Serial.print(" changed to ");
    Serial.println(digitalRead(pin));
  }
};



PinListener listeners[] = {
  PinListener(0),
  PinListener(1),
  PinListener(2),
  PinListener(7)
};



void setup() {
  Serial.begin(9600);
}

void loop() {
}

@NicoHood
Copy link
Contributor

NicoHood commented Feb 3, 2016

Doesnt this break the current API with no extra param? I am not sure if the code works in this case. Have you tested it?

Also this adds a lot of overhead, even though it could be very useable. Not sure about that.

However I like the compact definitions in WInterrupts.c.

If you ask me I'd rethink the whole API in general. I consumes quite some flash, even though only a singe PinInterrupt is used.

And I'd also change those:

 #define CHANGE 1
 #define FALLING 2
 #define RISING 3

to

 #define FALLING 1
 #define RISING 2
 #define CHANGE 3

So you have bit0 indicate falling, bit1 rising and both together a change.

@matthijskooijman
Copy link
Collaborator

The code looks good to me. The compatibility wrapper is a clever way to still support the existing parameterless version (@NicoHood, I think that still works), though it might be better for performance to not have the extra function call. One could instead store void* for the function pointers and cast them to the right type when calling, which I presume would be more efficient (perhaps smaller as well). The challenge then is how to decide whether to pass a parameter. Using NULL as a marker would work, but mean that you can never pass NULL, which isn't so great. Another option would be to use the LSB of the function pointer (since that is always 0 due to alignment), but that might be fragile as well. Perhaps storing a bitmask (which would only take 1 byte with the current boards) is ok, though.

The introduction of IMPLEMENT_ISR looks good to me, but ideally that would be in a separate commit, since it is totally unrelated to the main change in this PR.

@PaulStoffregen
Copy link
Contributor

Has anyone tested the performance impact?

@paulo-raca
Copy link
Contributor Author

Thanks for the feedback.

I'll split the IMPLEMENT_ISR macro into a separate PR.

@PaulStoffregen: How can I measure the performance impact?

@matthijskooijman, what do you think of using a poison value on voidFuncPtrParam (Maybe nothing) to mean "parameterless funcion".

#define IMPLEMENT_ISR(vect, interrupt) \
    ISR(vect) { \
        if (intFuncParam [interrupt] == nothing) {
            ((voidFuncPtr)intFunc[interrupt])();
        } else {
            ((voidFuncPtrParam)intFunc[interrupt])(intFuncParam[interrupt]);
        }
    }

The tradeoff is that ISR functions will take a little more flash to handle the condition inline

@NicoHood, Although I agree the current implementation wastes some flash with unused interrupts, I really don't know how to rewrite it in a more efficient way. I'm taking suggestions :D

compatFunc maintains compatibility with current parameterless API. I've tested it and it works.

About the constant values, I agree with you, that makes more sense, but looks like the values are defined by the hardware

@matthijskooijman
Copy link
Collaborator

@matthijskooijman, what do you think of using a poison value on voidFuncPtrParam (Maybe nothing) to mean "parameterless funcion".

That sounds reasonable, checking a pointer is probably a lot faster than calling a function. There is still overhead, of course, from loading the parameter and checking it.

As for measuring the overhead - disassembling and counting cycles would work, or doing some kind of runtime timekeeping (though that's more tricky to get accurate).

@matthijskooijman
Copy link
Collaborator

@matthijskooijman, what do you think of using a poison value on voidFuncPtrParam (Maybe nothing) to mean "parameterless funcion".

Wait, that's what I proposed with letting "NULL" to mean "no parameter". However, this will remove one value from the available parameter values, which I think is not a good plan (there is a fair chance that a user will pass that value anyway, especially if the void* argument is used to pass integers instead of pointers, and because function and data pointers can overlap).

What I thought you were saying, was to use a poisoned as the function pointer value, which means that the parameter contains the actual (parameterless) function pointer to call. This is more reliable because valid values for the function pointer array are only pointers into the flash, and never pointers into RAM or plain integers.

@paulo-raca
Copy link
Contributor Author

You are right.

Something like this?

        if (intFunc [interrupt] == compatFunc) {
                ((voidFuncPtr)intFuncParam[interrupt])();
        } else {
                intFunc[interrupt])(intFuncParam[interrupt]);
        }

@matthijskooijman
Copy link
Collaborator

Yup, that's what I had in mind indeed. You should probably get intFunc[interrupt] once and put it in a variable, though, since intFunc is volatile so the compiler likely won't optimize it otherwise. As for the compatFunc (which probably needs some better name), ideally it would be a function containing of just a "ret" instruction, so it only takes up 2 bytes. There might be some gcc attributes to achieve that (to prevent any pre- and postamble from being generated, possibly by making it a naked ISR).

paulo-raca pushed a commit to paulo-raca/Arduino that referenced this pull request Feb 4, 2016
The current code is very verbose and a painful to maintain (Change ISR implementation in 20 different places? No Thanks!).

(This was originally part of arduino#4519, but we all agreeded it deserved it's own PR)
@paulo-raca
Copy link
Contributor Author

I've made a separate PR for the IMPLEMENT_ISR macro on #4521

@paulo-raca
Copy link
Contributor Author

I haven't compared the performance yet, but the memory use on a pro micro (atmega32u4) is

  • using compatFunc: +66 bytes on flash, +10 bytes of ram
  • using @matthijskooijman's suggestion: +132 bytes on flash, +10 bytes of ram

@paulo-raca
Copy link
Contributor Author

Current code:

#define IMPLEMENT_ISR(vect, interrupt) \
  ISR(vect) { \
    intFunc[interrupt](); \
  }

ISR: Executes 3 instructions (7 cycles) from code + 34 instructions of boilerplate

..... Here goes prologue boilerplace: 17 instructions (Mostly `push`)
 1e8:   e0 91 00 01     lds     r30, 0x0100
 1ec:   f0 91 01 01     lds     r31, 0x0101
 1f0:   09 95           icall
..... Here goes epilogue boilerplace: 17 instructions (Mostly `pop`)

My code, using compatFunc:

https://github.com/paulo-raca/Arduino/blob/attachInterruptParam/hardware/arduino/avr/cores/arduino/WInterrupts.c

static void compatFunc(void* realCallback) {
  voidFuncPtr f = realCallback;
  f();
}

#define IMPLEMENT_ISR(vect, interrupt) \
  ISR(vect) { \
    intFunc[interrupt](intFuncParam [interrupt]); \
  }

ISR: Executes 5 instructions (11 cycles) from code + 34 instructions of boilerplate

 ..... Here goes prologue boilerplace: 17 instructions (Mostly `push`)
 202:   e0 91 00 01     lds     r30, 0x0100
 206:   f0 91 01 01     lds     r31, 0x0101
 20a:   80 91 32 01     lds     r24, 0x0132
 20e:   90 91 33 01     lds     r25, 0x0133
 212:   09 95           icall
 ..... Here goes epilogue boilerplace: 17 instructions (Mostly `pop`)

compatFunc: Executes 2 instructions (3 cycles)

 186:   fc 01           movw    r30, r24
 188:   09 94           ijmp

Summary:

  • Flash: +66 bytes
  • RAM: +10 bytes
  • ISR execution: +2 instructions and +4 cycles (with parameters), +4 instructions and +7 cycles (without parameters)

Suggested code, using if inside the ISR:

https://github.com/paulo-raca/Arduino/blob/attachInterruptParam2/hardware/arduino/avr/cores/arduino/WInterrupts.c

#define IMPLEMENT_ISR(vect, interrupt) \
  ISR(vect) { \
    voidFuncPtrParam cb = intFunc[interrupt]; \
    void* param = intFuncParam [interrupt]; \
    if (cb == NOARG_CALLBACK) { \
      ((voidFuncPtr)param)(); \
    } else { \
      cb(param); \
    } \
  }

ISR: Executes 9-11 (16-18 cycles) out of 12 instructions from code + 34 instructions of boilerplate
(note that the brne is 2 cycles when branching and 1 cycle when not)

..... Here goes prologue boilerplace: 17 instructions (Mostly `push`)
 1fe:   e0 91 00 01     lds     r30, 0x0100
 202:   f0 91 01 01     lds     r31, 0x0101
 206:   80 91 32 01     lds     r24, 0x0132
 20a:   90 91 33 01     lds     r25, 0x0133
 20e:   ef 3f           cpi     r30, 0xFF       ; 255
 210:   2f ef           ldi     r18, 0xFF       ; 255
 212:   f2 07           cpc     r31, r18
 214:   19 f4           brne    .+6             ; 0x21c <__vector_1+0x40>
 216:   fc 01           movw    r30, r24
 218:   09 95           icall
 21a:   01 c0           rjmp    .+2             ; 0x21e <__vector_1+0x42>
 21c:   09 95           icall
..... Here goes epilogue boilerplace: 17 instructions (Mostly `pop`)

Summary:

  • Flash: +132 bytes
  • RAM: +10 bytes
  • ISR execution: +4 instructions and +9 cycles (with parameters), +6 instructions and +11 cycles (without parameters)

Could be optimized to +4 instructions and +9 cycles (with parameters), +5 instructions and +9 cycles (without parameters)

@paulo-raca paulo-raca force-pushed the attachInterruptParam branch from 91a68d9 to 847bc9e Compare February 4, 2016 02:27
@matthijskooijman
Copy link
Collaborator

@paulo-raca, thanks for the thorough comparison :-D

I'm actually quite surprised about the mess the compiler makes when compiling my suggestion. There's a lot of overhead (12 instructions from my count) from storing and reloading variables on the stack, when they could just as well be kept around in registers (AFAICS all of the call-clobbered registers are pushed by the boilerplate, so they should be available). I wonder if it would be worth implementing this in hand-coded assembly perhaps...

Also interesting is that compatfunction has something similar going on, which is even more silly since it stores r24 and r25 on the stack and reloads them immediately to the same registers twice. It's almost as if you compiled without optimizations enabled?

@paulo-raca
Copy link
Contributor Author

You are right, I completely forgot to enable optimizations :/

I'll update it later

(I hope most of the ISR boilerplate goes away with it)

@matthijskooijman
Copy link
Collaborator

@paulo-raca, that suggests you compiled with avr-gcc manually? For a fair comparison, it would seem best to actually use the Arduino IDE for compilation, so you get the "official" compiler flags and compilation process. To extract the .elf file from the IDE for disassembly, just look through the verbose compilation output to find where it is stored.

@paulo-raca
Copy link
Contributor Author

@matthijskooijman, I updated the previous comment with optimized disassembly generated from the IDE.

Turns out that using compatFunction is extremely efficient and gives only +4 instructions overhead compared with the current implementation (2 instructions if you are using the callback with parameter)

@matthijskooijman
Copy link
Collaborator

I looked up the number of cycles in addition to the number of instructions, and I took the liberty of adding them in your comment. I believe the last code snippet could be optimized (by hand-coding in assembly, or perhaps by making the compiler smarter) to remove one icall and rjmp instruction, saving four bytes of flash and two cycles, but it' still slower than the compatFunc approach.

To me the compatFunc approach seems decent, the compiler seems smart enough to make it really small without any boilerplate. I tend to think about calls, and especially indirect ones, as expensive, but that seems to mostly true on x86 (which does heavy pipelining and speculative execution). On the AVR, an indirect calll is just three cycles (even a cycle faster than a direct call, interestingly enough).

So, your original approach seems the most efficient. Any users that do not need this new parameter feature will pay an additional 7 cycles. The current latency is 17*2+7 (assuming all 17 preamble boilerplate instructions are push at 2 cycles per instruction) 41 cycles. This new feature brings a measureable, but not enormous, cost.

On additional option we could consider is to simply pass the parameter, whether it is valid or not. Knowing AVR's calling conventions, we know that passing a parameter will simply occupy two registers. If the called function does not expect these parameters, it will just ignore them, without any additional problems. I don't see any way to actually verify this during compilation (e.g. by writing things in such a way that the compiler will see this optimization opportunity and exploit it, and/or an error be thrown if the compiler at some point handles calling things differently), but that does not need to be a problem necessarily (a decent comment should be written, though, to ensure this does not break when ported to other platforms).

If you would just pass the parameter, the additional overhead is two lds instructions, or 4 cycles for both cases, which seems reasonable to me.

@paulo-raca
Copy link
Contributor Author

I like the idea!

@paulo-raca paulo-raca force-pushed the attachInterruptParam branch from 847bc9e to c926025 Compare February 5, 2016 23:51
@paulo-raca
Copy link
Contributor Author

I've implemented it, and everything is looking good.

Thanks you for help, @matthijskooijman

@paulo-raca
Copy link
Contributor Author

Is there anything else I can do about this?
Can we have it merged?

As soon as it gets settled, I plan to do the same on Due

@facchinm
Copy link
Member

Hi @paulo-raca ,
I merged #4521, would you mind rebasing this one against current master ? 😄

@paulo-raca
Copy link
Contributor Author

Rebased. Thanks!

@paulo-raca
Copy link
Contributor Author

It's been a while... Should we merge or discard?

@paulo-raca paulo-raca closed this Jul 14, 2016
@tni
Copy link

tni commented May 5, 2017

On ARM Cortex M4, invoking that callback is 5 vs. 4 instructions.

The void* parameter would be passed in a register, so a function(void) could be called as function(void*).

\\

@Makuna:

The void* parameter works fine with lamdas, e.g.:

class S {
public:
    void callback();
};

void registerCallback(void (*callback_fn)(void*), void* context) {
    // ...
}

void test(S* s) {
    // ...
    registerCallback([](void* ctx){ ((S*) ctx)->callback();  }, s);
} 

@paulo-raca
Copy link
Contributor Author

I planned to follow up this patch with an ARM versions, but seems like it's not going anywhere :(

Anyway, I find that 5 instructions is still a pretty low overhead for what (I think) is a pretty useful feature.

Not sure about the lamdas, but I think it probably won't support lamdas with a capture list -- Just guessing, can you try it?

@tni
Copy link

tni commented May 5, 2017

No, you can't use lambda captures. They need storage that a function pointer doesn't have (you would need to use std::function). Lambdas without captures decay to function pointers.

The Cortex M4 overhead is 1 extra instruction. :-)

Copy link
Collaborator

@matthijskooijman matthijskooijman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sort of dropped off my radar, but a post on the mailing list got my attention again.

I just had another look at the code, and it looks good and acceptable for mergingm to me. @cmaglie @facchinm, would you have a look?

@paulo-raca, could you perhaps update your comment with assembly comparison with the assembly for the current approach?

@aaronw2
Copy link

aaronw2 commented Aug 29, 2017

This would be extremely useful for me as well. It works well with C++ since I can just pass the this pointer as a parameter.

I didn't know about this thread. In my implementation for the ESP32 I also pass the interrupt (pin) number to the handler, though with a bit more overhead I could live without that. It just means that I need to extract the pin number from my class instead.

When writing an arduino library, it is difficult to connect interrupt handlers with the component instance that should be notified.

In C, the common pattern to fix this is to specifying a `void*` parameter with the callback, where the listener can store any context necessary.

This patch adds the new function `attachInterruptParam()` to implement this pattern.
@paulo-raca
Copy link
Contributor Author

paulo-raca commented Aug 29, 2017

@matthijskooijman, the current approach adds 2 instructions / 4 cycles on AVR:

Before:

     d78:       e0 91 14 01     lds     r30, 0x0114     ; 0x800114 <intFunc+0x8>
     d7c:       f0 91 15 01     lds     r31, 0x0115     ; 0x800115 <intFunc+0x9>
     d80:       09 95           icall

After:

     d78:       e0 91 14 01     lds     r30, 0x0114     ; 0x800114 <intFunc+0x8>
     d7c:       f0 91 15 01     lds     r31, 0x0115     ; 0x800115 <intFunc+0x9>
     d80:       80 91 4a 01     lds     r24, 0x014A     ; 0x80014a <intFuncParam+0x8>
     d84:       90 91 4b 01     lds     r25, 0x014B     ; 0x80014b <intFuncParam+0x9>
     d88:       09 95           icall

On an Arduino Pro Micro (Atmega32U4) it adds 56 bytes of Flash and 10 bytes of RAM

@paulo-raca paulo-raca force-pushed the attachInterruptParam branch from c26f0ca to 9eb8ba0 Compare August 29, 2017 15:34
@paulo-raca
Copy link
Contributor Author

@aaronw2, thanks for bringing it up again :)

When it gets merged I'll also make patches for ESP8266, ESP32 and Due

@cmaglie
Copy link
Member

cmaglie commented Sep 1, 2017

@paulo-raca
Is the "non-portable" trick applicable also to ARM?
Do you have a reference implementation to compare the performance also there?

@paulo-raca
Copy link
Contributor Author

I haven't tried, but I believe it does.

According to @tni, the overhead on arm is only 1 instruction.

@matthijskooijman
Copy link
Collaborator

Also, even if ARM needs a bit more instructions, that's probably less of a problem given the usually faster MCUs. And since this is just an implementation detail, that does not affect the API, this should not hinder the changes on AVR.

@cmaglie
Copy link
Member

cmaglie commented Sep 1, 2017

I haven't tried, but I believe it does.

Ok, may you explain why calling a function with extra parameters is safe in the AVR case?
I'd like to have some clarification I'm not so easy to merge something that really looks like something you should not do :-)

Also, even if ARM needs a bit more instructions, that's probably less of a problem given the usually faster MCUs.

The point is how much is this "bit more" on the ARM side, so I'm going to give it a try to check it out.

@paulo-raca
Copy link
Contributor Author

Ok, may you explain why calling a function with extra parameters is safe in the AVR case?

On AVR, the extra arguments are sent on registers r24 and r25. If the function doesn't expect these arguments, these values on these registers are simply be ignored.

The big red warning mostly applies to ABIs where arguments are stored on the stack.
E.g., If the ISR pushes 2 arguments on the stack but the callback only pops one when returning, everything breaks.

paulo-raca added a commit to paulo-raca/ArduinoCore-sam that referenced this pull request Sep 1, 2017
In C, the common pattern to fix this is to specifying a void* parameter with the callback, where the listener can store any context necessary.

This patch adds the new function attachInterruptParam() to implement this pattern.

This is a port of arduino/Arduino#4519, but for Arduino Due.
@paulo-raca
Copy link
Contributor Author

paulo-raca commented Sep 1, 2017

I've just written an equivalent PR for Arduino Due.

The same trick also worked on ARM, and the ISR overhead is only 2 instructions.

@dplasa
Copy link

dplasa commented Sep 14, 2017

The big red warning mostly applies to ABIs where arguments are stored on the stack.
E.g., If the ISR pushes 2 arguments on the stack but the callback only pops one when returning,
everything breaks.

Is this really true? I believe the caller has to clean up the stack or else one could not realise functions with variable count of parameters like
int printf(char* format, ...);
where the called function cannot know the number of arguments passed to it.

At least that's how it's done in C on x86.

@paulo-raca
Copy link
Contributor Author

@dplasa, this is true for variadic functions, but the compiler might use a different calling convention on them.

E.g., on AVR:

void foo(char a, char b) {}
void bar(char a, ...) {}
void callFoo() {
  foo(1,2);
}
void callBar() {
  bar(1,2);
}

compiles to

[...]
void callFoo() {
[...]
  foo(1,2);
  2e:   62 e0           ldi     r22, 0x02       ; 2
  30:   81 e0           ldi     r24, 0x01       ; 1
  32:   e6 df           rcall   .-52            ; 0x0 <foo>
}
[...]
void callBar() {
[...]
  bar(1,2);
  42:   1f 92           push    r1
  44:   82 e0           ldi     r24, 0x02       ; 2
  46:   8f 93           push    r24
  48:   81 e0           ldi     r24, 0x01       ; 1
  4a:   8f 93           push    r24
  4c:   e5 df           rcall   .-54            ; 0x18 <bar>
  4e:   0f 90           pop     r0
  50:   0f 90           pop     r0
  52:   0f 90           pop     r0
}
[...]

On x86 there is a number of calling conventions where the callee should clean up the task: https://en.wikipedia.org/wiki/X86_calling_conventions#Callee_clean-up

@dplasa
Copy link

dplasa commented Oct 18, 2017

Still my argument holds :)

  • If the caller passes arguments on the stack - it cleans them up after the call (void callBar()...)
  • If the caller passes arguments in registers (void callFoo()) nothing needs to be cleaned

If an void ISR() without argument was called the compiler would probably just rcall the ISR code when the interrupt occurs.

If we change this to ISR(void* arg) then arg is either passed by register which is no deal to an ISR(void) function which just ignores the register...
If the compiler would generate code such that arg is passed over the stack, the caller will also clean up the stack - and again the ISR(void) just ignores the bit on the stack.

@paulo-raca
Copy link
Contributor Author

paulo-raca commented Jan 4, 2019

Hello,
Since the AVR core has been moved into a separate sepository, I created a new PR with the same changes: arduino/ArduinoCore-avr#58

There is also a version for SAM: arduino/ArduinoCore-sam#44

If these are merged in the official API, I'll also port this feature for ESP8266 / ESP32

@per1234 per1234 closed this Jan 4, 2019
@aaronw2
Copy link

aaronw2 commented Feb 1, 2019

I'm not aware of any Arduino compatible core that would have a problem with this. In every ABI I see supported, the first argument is always passed in a register.

@paulo-raca
Copy link
Contributor Author

paulo-raca commented Feb 1, 2019

I'm not aware either, but it's better to make it very explicit, since it is an abuse of the ABI

@aaronw2
Copy link

aaronw2 commented Feb 6, 2019

I might also add that in C, it is always the caller's responsibility to clean up the stack from any parameters that are passed. Passing an extra parameter will not cause a function call to fail. If passed on a stack, parameters are passed in a right to left order so that extra parameters on the stack will not cause a function to fail either. If the Pascal calling convention were used instead of C then this would be an issue since it is up to the callee to clean up the stack, but since this is C it is up to the caller to clean it up.

@paulo-raca
Copy link
Contributor Author

Ok, so what if I replace the comment with this?

// To support callbacks with and without parameters with minimum overhead,
// this relies on that fact that in C calling conventions extra argument on a
// function call are safely ignored without side-effects.

@aaronw2
Copy link

aaronw2 commented Feb 6, 2019

That sounds good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Architecture: AVR Applies only to the AVR microcontrollers (Uno, etc.) Component: Core Related to the code for the standard Arduino API
Projects
None yet
Development

Successfully merging this pull request may close these issues.