Inefficient bytecode generated compared to Scala 2.13 #12161

Closed
japgolly opened this issue Apr 20, 2021 · 2 comments · Fixed by #12157
@japgolly

In my UnivEq library I have ==*/!=* ops which are macros that expand to normal ==/!=.

Comparing the bytecode generated for:

def testBoolean(a: Boolean): Boolean =
  (a !=* a) || (a ==* a)

we get this for Scala 2.13 (note: no -opt flags used):

  public boolean testBoolean(boolean);
    Code:
       0: iload_1
       1: iload_1
       2: if_icmpeq     9
       5: iconst_1
       6: goto          10
       9: iconst_0
      10: ifne          26
      13: iload_1
      14: iload_1
      15: if_icmpne     22
      18: iconst_1
      19: goto          23
      22: iconst_0
      23: ifeq          30
      26: iconst_1
      27: goto          31
      30: iconst_0
      31: ireturn

and this for Scala 3:

  public boolean testBoolean(boolean);
    Code:
       0: iload_1
       1: istore_2
       2: iload_1
       3: istore_3
       4: iload_2
       5: iload_3
       6: if_icmpeq     13
       9: iconst_1
      10: goto          14
      13: iconst_0
      14: ifne          38
      17: iload_1
      18: istore        4
      20: iload_1
      21: istore        5
      23: iload         4
      25: iload         5
      27: if_icmpne     34
      30: iconst_1
      31: goto          35
      34: iconst_0
      35: ifeq          42
      38: iconst_1
      39: goto          43
      42: iconst_0
      43: ireturn

The Scala 3 bytecode seems inefficient compared to that of Scala 2.13. Ideally they'd be the same for something this simple, no?
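For readers unfamiliar with the setup, here is a minimal sketch of how ops like these could be written as Scala 3 inline extension methods. It is not the UnivEq library's actual definition: the `UnivEq` trait, its two given instances, and the `UnivEqOps` object below are simplified, hypothetical stand-ins, used only to show that the ops are meant to reduce to plain `==`/`!=` at the call site.

```scala
// Hypothetical stand-in for a type class witnessing that == is sound for A.
// (Assumption for illustration; not the real library's definition.)
trait UnivEq[A]

object UnivEq:
  given UnivEq[Boolean] = new UnivEq[Boolean] {}
  given UnivEq[Int]     = new UnivEq[Int] {}

object UnivEqOps:
  extension [A](a: A)
    // Both ops are inline, so each call site is rewritten to a plain
    // comparison during compilation; the evidence is only checked, never used.
    inline def ==*(b: A)(using UnivEq[A]): Boolean = a == b
    inline def !=*(b: A)(using UnivEq[A]): Boolean = a != b

import UnivEqOps.*

// Same shape as the method whose bytecode is compared in this issue.
def testBoolean(a: Boolean): Boolean =
  (a !=* a) || (a ==* a)
```

Compiling a file like this and running `javap -c` on the resulting class is one way to inspect the bytecode for `testBoolean`; the exact output will depend on the compiler version.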

@japgolly

Weirder: if I peek at the bytecode for using ==/!= directly without macros:

def testBoolean(a: Boolean): Boolean =
  (a != a) || (a == a)

Both Scala 2.13 and Scala 3 result in:

  public boolean testBoolean(boolean);
    Code:
       0: iload_1
       1: iload_1
       2: if_icmpne     10
       5: iload_1
       6: iload_1
       7: if_icmpne     14
      10: iconst_1
      11: goto          15
      14: iconst_0
      15: ireturn

Is there some kind of optimiser phase being applied to normal code but not the output of inlining?

@japgolly

@nicolasstucki I just tried your #12157 branch to see the effect it would have on the bytecode.

  • Recompiling the code as-is (i.e. with the manual inline matching and delegation for primitives), the generated bytecode doesn't change
  • Changing the inline ops to simply be a == b and a != b, the generated bytecode matches that of Scala 2.13

It's awesome to see that #12157 brings us to parity with Scala 2.13, but I'm not sure it should close this issue. Why is it that even the new bytecode for inline a == b is not the same as for a manual a == b?

This bytecode, from inline a == b,

       0: iload_1
       1: iload_1
       2: if_icmpeq     9
       5: iconst_1
       6: goto          10
       9: iconst_0
      10: ifne          26
      13: iload_1
      14: iload_1
      15: if_icmpne     22
      18: iconst_1
      19: goto          23
      22: iconst_0
      23: ifeq          30
      26: iconst_1
      27: goto          31
      30: iconst_0
      31: ireturn

should be the same as manual a == b below, no?

       0: iload_1
       1: iload_1
       2: if_icmpne     10
       5: iload_1
       6: iload_1
       7: if_icmpne     14
      10: iconst_1
      11: goto          15
      14: iconst_0
      15: ireturn

I would've thought that after macro expansion the resulting ASTs would be the same and therefore should result in the same, efficient bytecode. That's why I was wondering if there's some optimisation phase that's not being applied after macro expansion. WDYT?
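To make the difference between the listings easier to see, here is a hand-written source-level paraphrase of their control flow. It is only a reading aid: the object and method names are made up for illustration, and compiling this code is not guaranteed to reproduce any of the listings above.

```scala
object BytecodeShapes {
  // Shape of the short listing (manual ==/!=): each comparison
  // branches straight to the final result.
  def direct(a: Boolean): Boolean =
    if (a != a) true
    else if (a == a) true
    else false

  // Shape of the longer listing (Scala 2.13 macro, or inline a == b after
  // #12157): each comparison is first turned into a 0/1 value
  // (iconst_1 / iconst_0) and only then tested (ifne / ifeq).
  def materialised(a: Boolean): Boolean = {
    val left = if (a != a) true else false
    if (left) true
    else {
      val right = if (a == a) true else false
      if (right) true else false
    }
  }

  // Shape of the original Scala 3 listing: as above, but the operands are
  // also copied into fresh locals (istore / iload) before each comparison.
  def withTemporaries(a: Boolean): Boolean = {
    val x1 = a; val y1 = a
    val left = if (x1 != y1) true else false
    if (left) true
    else {
      val x2 = a; val y2 = a
      val right = if (x2 == y2) true else false
      if (right) true else false
    }
  }
}
```

All three methods compute the same result for every input; they differ only in how much intermediate work is done before the branch, which is the gap the question above is about.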

@Kordyjan added this to the 3.0.1 milestone Aug 2, 2023