Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up IV writes #6767

Merged
merged 9 commits into from Dec 2, 2022
Merged

Speed up IV writes #6767

merged 9 commits into from Dec 2, 2022

Conversation

tenderlove
Copy link
Member

@tenderlove tenderlove commented Nov 18, 2022

This PR takes advantage of shapes in the set instance variable instruction.

It handles a few different cases:

Shape Transitions No Transition
Needs to allocate IV buffer
No Allocation

For the time being, we still fire a write barrier in many cases. We never need to fire a write barrier when the object we reference is an immediate. If we know from the virtual stack that the object we're writing is an immediate, we will not generate WB code. If we don't know what type it is, then we do a runtime check and only fire the WB if it's a heap object.

Ideally we could fire the WB even less frequently. The generational algorithm only wants the WB so that we can know old to young references. Unfortunately, the GC also implements an incremental marking algorithm. If the GC is doing incremental marking, then it relies on the WB for executing any black / white / grey transitions.

In other words, object age is not the only factor for deciding whether or not the WB should execute, we also have to take in to account whether or not the GC is currently doing incremental marking.

There must be something clever we can do to handle the above situation, I'm just not sure about it right now.

Benchmarks look like this:

before: ruby 3.2.0dev (2022-11-18T20:04:10Z master 9e067df76b) +YJIT [x86_64-linux]
after: ruby 3.2.0dev (2022-11-18T21:13:32Z iv-write 6f7f05fb68) +YJIT [x86_64-linux]

--------------  -----------  ----------  ----------  ----------  ------------  -------------
bench           before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
30k_ifelse      233.7        0.0         227.4       0.1         1.03          1.00         
30k_methods     557.3        0.1         543.9       0.1         1.02          1.04         
activerecord    72.8         0.2         68.0        1.2         1.07          1.03         
binarytrees     139.8        1.7         136.0       0.8         1.03          1.02         
cfunc_itself    30.5         0.1         30.3        0.1         1.01          1.02         
chunky_png      436.1        0.4         455.3       0.3         0.96          0.96         
erubi           198.8        0.7         201.6       0.8         0.99          0.99         
erubi_rails     12.2         8.2         12.2        8.0         1.00          0.97         
etanni          334.6        0.3         337.2       0.4         0.99          1.00         
fannkuchredux   1398.2       0.2         1396.6      0.2         1.00          1.00         
fib             41.4         0.1         41.5        0.7         1.00          1.00         
getivar         24.5         0.1         24.5        0.1         1.00          1.03         
hexapdf         1441.0       1.6         1410.9      2.1         1.02          1.01         
keyword_args    38.3         0.2         36.8        0.1         1.04          1.06         
lee             628.6        0.8         641.8       1.1         0.98          0.98         
liquid-render   76.4         1.7         75.8        1.6         1.01          1.00         
mail            105.3        0.1         104.4       0.1         1.01          0.98         
nbody           64.9         0.4         55.8        0.7         1.16          1.12         
optcarrot       2276.9       0.5         1762.3      0.4         1.29          1.28         
psych-load      1358.3       0.1         1309.7      0.0         1.04          1.03         
railsbench      1284.0       2.2         1280.2      2.0         1.00          1.00         
respond_to      21.9         0.6         17.9        0.4         1.23          1.27         
ruby-lsp        50.5         19.5        50.1        20.4        1.01          0.99         
rubykon         4851.4       0.5         4642.6      2.6         1.04          1.03         
setivar         30.2         1.7         10.3        0.2         2.93          1.03         
setivar_object  50.4         0.2         32.0        0.3         1.57          1.03         
str_concat      26.4         1.7         26.3        3.2         1.00          1.02         
--------------  -----------  ----------  ----------  ----------  ------------  -------------
Legend:
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.
- after 1st itr: ratio of before/after time for the first benchmarking iteration.

I really really need to study the respond_to benchmark. I don't see any instance variables in that benchmark and yet this patch makes it faster.

@matzbot matzbot requested a review from a team November 18, 2022 22:00
@tenderlove
Copy link
Member Author

I wanted to measure the impact of this patch when writing non-immediate instance variables, so I made a benchmark to measure it.

We get a nice speed up there, though not as good as the immediate case:

before: ruby 3.2.0dev (2022-11-18T20:04:10Z master 9e067df76b) +YJIT [x86_64-linux]
after: ruby 3.2.0dev (2022-11-18T21:13:32Z iv-write 6f7f05fb68) +YJIT [x86_64-linux]

--------------  -----------  ----------  ----------  ----------  ------------  -------------
bench           before (ms)  stddev (%)  after (ms)  stddev (%)  before/after  after 1st itr
setivar_object  50.4         0.2         32.0        0.2         1.57          1.07         
--------------  -----------  ----------  ----------  ----------  ------------  -------------
Legend:
- before/after: ratio of before/after time. Higher is better for after. Above 1 represents a speedup.
- after 1st itr: ratio of before/after time for the first benchmarking iteration.

yjit/src/codegen.rs Outdated Show resolved Hide resolved
yjit/src/codegen.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@maximecb maximecb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good with a few comments

@maximecb
Copy link
Contributor

@tenderlove some test failures on arm, in the backend.

@tenderlove
Copy link
Member Author

@tenderlove some test failures on arm, in the backend.

Yep, @jemmaissroff and I are looking in to it. I'll re-request a review when we get it green. 🙇

@jemmaissroff
Copy link
Contributor

Yep, @jemmaissroff and I are looking in to it. I'll re-request a review when we get it green. 🙇

I worked with @k0kubun this week to fix upstream. I don't have permissions to rebase this PR off master. But I made #6839, a draft PR of this branch rebased off master to show that CI will be green once we rebase off the latest master.

yjit/src/codegen.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@maximecb maximecb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased off-of master and removed the double store of the shape id now that the issue with the backend is fixed.

Just one open comment to resolve wrt defining a constant for the shape id offset if possible.

@k0kubun k0kubun merged commit 41bacd9 into ruby:master Dec 2, 2022
@jemmaissroff jemmaissroff mentioned this pull request Dec 2, 2022
@tenderlove tenderlove deleted the iv-write branch December 4, 2022 04:49
@tenderlove
Copy link
Member Author

Thanks for merging this. Wow!

YJIT Benchmarks | Benchmarks and continuous checks for YJIT performance  2022-12-03 20-50-45

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants