Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rust YJIT #5826

Merged
merged 3 commits into from Apr 27, 2022
Merged

Rust YJIT #5826

merged 3 commits into from Apr 27, 2022

Conversation

XrXr
Copy link
Member

@XrXr XrXr commented Apr 19, 2022

In December 2021, we opened an issue to solicit feedback regarding the porting of the YJIT codebase from C99 to Rust. There were some reservations, but this project was given the go ahead by Ruby core developers and Matz. Since then, we have successfully completed the port of YJIT to Rust. We are opening this pull request to upstream Rust YJIT, effectively replacing the C version of YJIT.

The new Rust version of YJIT has reached parity with the C version, in that it passes all the CRuby tests, is able to run all of the YJIT benchmarks, and performs similarly to the C version (because it works the same way and largely generates the same machine code). We've even incorporated some design improvements, such as a more fine-grained constant invalidation mechanism which we expect will make a big difference in Ruby on Rails applications.

Because we want to be careful, YJIT is guarded behind a configure option:

./configure --enable-yjit # Build YJIT in release mode
./configure --enable-yjit=dev # Build YJIT in dev/debug mode

By default, YJIT does not get compiled and cargo/rustc is not required. If YJIT is built in dev mode, then cargo is used to fetch development dependencies, but when building in release, cargo is not required, only rustc. At the moment YJIT requires Rust 1.60.0 or newer.

The YJIT command-line options remain mostly unchanged, and more details about the build process are documented in doc/yjit/yjit.md.

The CI tests have been updated and do not take any more resources than before.

The development history of the Rust port is available at the following commit for interested parties: Shopify@1fd9573

Our hope is that Rust YJIT will be compiled and included as a part of system packages and compiled binaries of the Ruby 3.2 release. We do not anticipate any major problems as Rust is well supported on every platform which YJIT supports, but to make sure that this process works smoothly, we would like to reach out to those who take care of building systems packages before the 3.2 release is shipped and resolve any issues that may come up.


Please feel free to comment or ask questions on this pull request if you have any.

AppVeyor CI checks are currently red on the master branch, so they will be red on this PR as well.

yjit/src/asm/mod.rs Outdated Show resolved Hide resolved
yjit/src/asm/mod.rs Outdated Show resolved Hide resolved
defs/gmake.mk Outdated Show resolved Hide resolved
template/Makefile.in Outdated Show resolved Hide resolved
Comment on lines +307 to +316
$(Q) if [ -f '$(YJIT_LIBS)' ]; then \
set -eu && \
echo 'merging $(YJIT_LIBS) into $@' && \
$(RMALL) '$(CARGO_TARGET_DIR)/libyjit/' && \
$(MAKEDIRS) '$(CARGO_TARGET_DIR)/libyjit/' && \
$(CP) '$(YJIT_LIBS)' '$(CARGO_TARGET_DIR)/libyjit/' && \
(cd '$(CARGO_TARGET_DIR)/libyjit/' && $(AR) -x libyjit.a) && \
$(AR) $(ARFLAGS) $@ $$(find '$(CARGO_TARGET_DIR)/libyjit/' -name '*.o') ; \
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply include $(YJIT_LIBS) in $(LIBRUBY_A_OBJS) or $(INITOBJS)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did try that, but that creates a nested archive. I got some build errors since ld has trouble finding YJIT symbols when dealing with nested archives.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was messing around on x86_64-darwin, and this patch seems to work. Does this seem reasonable? If so, I will throw up a PR to see if we can make it pass on all platforms

diff --git a/template/Makefile.in b/template/Makefile.in
index fee62c8d30..de339847a0 100644
--- a/template/Makefile.in
+++ b/template/Makefile.in
@@ -304,16 +304,7 @@ $(LIBRUBY_A):
 		@$(RM) $@
 		@-[ -z "$(EXTSTATIC)" ] || $(PRE_LIBRUBY_UPDATE)
 		$(ECHO) linking static-library $@
-		$(Q) $(AR) $(ARFLAGS) $@ $(LIBRUBY_A_OBJS) $(INITOBJS)
-		$(Q) if [ -f '$(YJIT_LIBS)' ]; then \
-		  set -eu && \
-		  echo 'merging $(YJIT_LIBS) into $@' && \
-		  $(RMALL)    '$(CARGO_TARGET_DIR)/libyjit/' && \
-		  $(MAKEDIRS) '$(CARGO_TARGET_DIR)/libyjit/' && \
-		  $(CP) '$(YJIT_LIBS)' '$(CARGO_TARGET_DIR)/libyjit/' && \
-		  (cd '$(CARGO_TARGET_DIR)/libyjit/' && $(AR) -x libyjit.a) && \
-		  $(AR) $(ARFLAGS) $@ $$(find '$(CARGO_TARGET_DIR)/libyjit/' -name '*.o') ; \
-		fi
+		$(Q) $(AR) $(ARFLAGS) $@ $(LIBRUBY_A_OBJS) $(INITOBJS) $(YJIT_LIBS)
 		@-$(RANLIB) $@ 2> /dev/null || true
 
 verify-static-library: $(LIBRUBY_A)

Copy link
Member Author

@XrXr XrXr Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ianks I don't think that's going to work. If you run CI you should see the failure I saw in various different configurations.

Just for posterity, the specific errors I saw came from bigdecimal's extconf failing during linking. In general all of the extensions had trouble linking against object files that happen to depend on internal references to YJIT private symbols.

EDIT: I ran CI with the patch and this is what the error looks like: https://github.com/Shopify/ruby/runs/6131388624?check_suite_focus=true#step:12:772

The extconf log from bigdecimal is not in the GitHub log output but shows the link errors.

/// thankfully those cases are rare and don't cross the FFI boundary.
#[derive(Copy, Clone, PartialEq, Eq, Debug, Hash)]
#[repr(transparent)] // same size and alignment as simply `usize`
pub struct VALUE(pub usize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

For those interested, the docs say this:

The isize and usize types are pointer-sized signed and unsigned integers. They have the same layout as the pointer types for which the pointee is Sized, and are layout compatible with C's uintptr_t and intptr_t types

defs/gmake.mk Show resolved Hide resolved
You will need to install:
- A C compiler such as GCC or Clang
- GNU Make and Autoconf
- The Rust compiler `rustc` and Cargo (if you want to build in dev/debug mode)
Copy link
Contributor

@ianks ianks Apr 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 2021 edition is required, we should specify that. Not sure how up to date rustc is for the various package managers

Suggested change
- The Rust compiler `rustc` and Cargo (if you want to build in dev/debug mode)
- The Rust compiler `rustc` (>= 1.60.0) and Cargo (if you want to build in dev/debug mode)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The OP comment notes that the minimum Rust version is 1.60.0

At the moment YJIT requires Rust 1.60.0 or newer.

//
// What's up with the long prefix? The "rb_" part is to apease `make leaked-globals`
// which runs on upstream CI. The rationale for the check is unclear to Alan as
// we build with `-fvisibility=hidden` so only explicitly marked functions end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rustc doesn't do -fvisibility=hidden. Instead when linking a cdylib it uses a version script to hide exported symbols. This is because any library may end up being linked into a rust dylib in which case all symbols need to be exported for other crates to depend on. In any case you can only get symbol conflicts for symbols not marked as #[no_mangle] when using the exact same rustc version as the rustc version is hashed into mangled symbols. To further reduce the chance of conflicts you can pass a unique string to -Cmetadata which is hashed into mangled symbols too. That leaves the standard library of which most symbols are mangled and thus would only conflict when the exact same version is used which is probably fine. There are a couple of symbols in the standard library that aren't mangled though because of cyclic dependencies between libcore, liballoc and libstd. For them changing rustc to mark them as protected would be fine I think. In addition you could use a version script to prevent exporting any symbols starting with __rust_. This is true for the majority of the unmangled symbols. Only a couple use a different prefix (rust_eh_personality, __rg_alloc, __rdl_alloc, ...). Arguably they should be changed to use the __rust_ prefix too. In addition maybe rustc could start including the rustc version or a hash in them? In any case most of them haven't been changing a lot between versions and may thus be safe to export until rustc gets changed for better hygiene.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the rustc changes to my personal TODO list. Unless someone else makes these changes first it will probably take a while for me to get to it though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context around -fvisibility=hidden, and for working on rustc! I think risk of collision is acceptably low for now since the Rust code is built as a static lib and linked into a DSO for that configuration, and the linker takes care of only including parts of the Rust stdlib that we use? I haven't verified that, though, and I added that to my TODO list. Thanks!

Also I think I understand the rationale mentioned in the comment a bit better now. It looks like -fvisibility=hidden is only relevant for dynamic libs so the prefixes are for avoiding collisions in the static lib case. (The comment was about practices in the C parts of Ruby)

yjit/bindgen/src/main.rs Outdated Show resolved Hide resolved
@@ -0,0 +1,24 @@
// Silence dead code warnings until we are done porting YJIT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still necessary?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, still needed for now. We've accumulated a bunch of these and will need to do a pass to address them.

yjit/Cargo.toml Outdated Show resolved Hide resolved
yjit/Cargo.toml Outdated Show resolved Hide resolved
yjit/bindgen/src/main.rs Show resolved Hide resolved
yjit/src/asm/mod.rs Show resolved Hide resolved
yjit/src/asm/mod.rs Show resolved Hide resolved
yjit/src/asm/mod.rs Show resolved Hide resolved
XrXr added a commit to Shopify/ruby that referenced this pull request Apr 20, 2022
@sbeckeriv
Copy link

Dearest Maintainer,

I am very excited about this change! I checked out the branch and ran clippy (https://github.com/rust-lang/rust-clippy) cargo +nightly clippy --fix -Z unstable-options --allow-dirty the auto fix removed un-needed returns, clones, and allocations. There are 54 other warnings some of which are helpful for performance. If a clean clippy run could be had it would be nice to enforce in build process.

Thank you for you work on this! I am happy to leave each clippy suggestion as a github suggestion if that helps.

Becker

@ianks
Copy link
Contributor

ianks commented Apr 20, 2022

Dearest Maintainer,

I am very excited about this change! I checked out the branch and ran clippy (https://github.com/rust-lang/rust-clippy) cargo +nightly clippy --fix -Z unstable-options --allow-dirty the auto fix removed un-needed returns, clones, and allocations. There are 54 other warnings some of which are helpful for performance. If a clean clippy run could be had it would be nice to enforce in build process.

Thank you for you work on this! I am happy to leave each clippy suggestion as a github suggestion if that helps.

Becker

Good call! Might be worthwhile to add add a clippy check to CI soon so the lints don’t pile up. Could help with adoption and keeping up with best practices

XrXr added a commit to Shopify/ruby that referenced this pull request Apr 20, 2022
XrXr added a commit to Shopify/ruby that referenced this pull request Apr 20, 2022
@RossSmyth
Copy link

I would consider setting up a rustfmt configuration as the formatting is not very consistent.

yjit/src/disasm.rs Outdated Show resolved Hide resolved
@RossSmyth
Copy link

I do not know much about JITs and was just linked here by a friend. This is very impressive. There a few points that I just want to make sure are known from the view of a Rust dev.:

  • static mut is a huge footgun waiting. I mentioned it above, but if this crate ever plans to use mutithreading a large part of this crate will be trivially UB. Even without multithreading UB is quite easy because the guarantees that references assert are extremely strong, and static mut makes it very easy to accidentally mess this up. This is done by modifying the static mut T while a &mut T to it is still live, leading to the program being UB. This thread is a good read on the reasons: Consider deprecation of UB-happy static mut rust-lang/rust#53639. The TL;DR is to use a static UnsafeCell<T> and raw pointers for the most part.
  • C FFI vs. Rust
    The general recommended way to go about C FFI is to write the Rust functions without considering the C interface, and then make a separate crate that does the logic for conversions/validating things between C and Rust, while this code has a lot of things just in extern C functions. Though I understand why this isn't done for the most part, but since some do use a pattern similar, separating the FFI interface from the main logic, I would consider it.
  • Unsafe macros
    This is personal preference, but I prefer to leave unsafe explicit rather than wrapping in macros so it's more easily auditable later.

I may try running this with Miri for laughs later. Anyways thanks for the cool project! I'll be keeping an eye on it.

XrXr added a commit to Shopify/ruby that referenced this pull request Apr 20, 2022
@XrXr
Copy link
Member Author

XrXr commented Apr 20, 2022

Thanks for all the suggestions! I pushed some commits to adopt some small changes, but please understand that I simply can’t get to all of them in this PR, especially the suggestions that generate big diffs. I’ll try to respond to your comments, but sorry in advance if I don’t get to yours. I’m outnumbered and I’m not very good or quick with words. 🙏

Since we use linear history in this repo I would like to keep the commit set in this PR small to make the merge manageable. After the initial merge of course we’re open to adopting improvements! Regarding clippy/rustfmt specifically, we’ll think about setting something up because it looks like a lot of folks are interested.

If you are interested in contributing to YJIT, feel free to file PRs in this repo after the merge!

@scottjmaddox
Copy link

Very impressive! Quick question for you: I didn't see any license headers in the YJIT files. Should I understand the Rust YJIT code to be under the same license as the rest of the Ruby codebase? I'm interested in potentially using the dynamic assembler (and possibly more) in my own project.

@XrXr
Copy link
Member Author

XrXr commented Apr 20, 2022

Should I understand the Rust YJIT code to be under the same license as the rest of the Ruby codebase?

Right, we use the same license, no change from when YJIT is in C. I'm not a lawyer though!

I'm interested in potentially using the dynamic assembler (and possibly more) in my own project.

You are welcome to do that, but I feel there has got to be better alternatives on crates.io that's more convenient to use and more Rusty. Good library design is hard work, and Rust YJIT doesn't do that work because, well, we don't intend to be a Rust library on crates.io! Just be aware.

@ianks
Copy link
Contributor

ianks commented Apr 21, 2022

I would consider setting up a rustfmt configuration as the formatting is not very consistent.

As annoying as the initial diff might be, I’m very +1 on doing this as early as possible. I suspect a large majority of rust devs have this configured as part of their workflow / editor configuration, so enabling it will reduce the burden of contributing.

If it’s not in this PR, a follow up formatting PR + a quick-merge would work as well!

yjit/src/cruby.rs Outdated Show resolved Hide resolved
XrXr and others added 3 commits April 26, 2022 18:00
In December 2021, we opened an [issue] to solicit feedback regarding the
porting of the YJIT codebase from C99 to Rust. There were some
reservations, but this project was given the go ahead by Ruby core
developers and Matz. Since then, we have successfully completed the port
of YJIT to Rust.

The new Rust version of YJIT has reached parity with the C version, in
that it passes all the CRuby tests, is able to run all of the YJIT
benchmarks, and performs similarly to the C version (because it works
the same way and largely generates the same machine code). We've even
incorporated some design improvements, such as a more fine-grained
constant invalidation mechanism which we expect will make a big
difference in Ruby on Rails applications.

Because we want to be careful, YJIT is guarded behind a configure
option:

```shell
./configure --enable-yjit # Build YJIT in release mode
./configure --enable-yjit=dev # Build YJIT in dev/debug mode
```

By default, YJIT does not get compiled and cargo/rustc is not required.
If YJIT is built in dev mode, then `cargo` is used to fetch development
dependencies, but when building in release, `cargo` is not required,
only `rustc`. At the moment YJIT requires Rust 1.60.0 or newer.

The YJIT command-line options remain mostly unchanged, and more details
about the build process are documented in `doc/yjit/yjit.md`.

The CI tests have been updated and do not take any more resources than
before.

The development history of the Rust port is available at the following
commit for interested parties:
1fd9573

Our hope is that Rust YJIT will be compiled and included as a part of
system packages and compiled binaries of the Ruby 3.2 release. We do not
anticipate any major problems as Rust is well supported on every
platform which YJIT supports, but to make sure that this process works
smoothly, we would like to reach out to those who take care of building
systems packages before the 3.2 release is shipped and resolve any
issues that may come up.

[issue]: https://bugs.ruby-lang.org/issues/18481

Co-authored-by: Maxime Chevalier-Boisvert <maximechevalierb@gmail.com>
Co-authored-by: Noah Gibbs <the.codefolio.guy@gmail.com>
Co-authored-by: Kevin Newton <kddnewton@gmail.com>
Thanks to suggestions from Stranger6667 on GitHub.

Co-authored-by: Dmitry Dygalo <dmitry@dygalo.dev>
Thanks to suggestion from bjorn3 on GitHub.

Co-authored-by: bjorn3 <bjorn3@users.noreply.github.com>
@XrXr XrXr merged commit 0514d81 into ruby:master Apr 27, 2022
@XrXr XrXr deleted the rust-yjit-upstreaming branch April 27, 2022 16:28
Comment on lines +43 to +46
let foo = (0 as *const $struct_type);

unsafe {
let ptr_field = (&(*foo).$field_name as *const _ as usize);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI this is a dereference of a null pointer, which is UB. It probably doesn't codegen to an actual read of address 0, so you don't get a fault, but it's UB nonetheless. If possible, it would be good to use the memoffset crate instead.

I don't know if this was copy-pasted from elsewhere, but bindgen has the same UB code pattern in it. I wouldn't be surprised if this code has spread all over.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LLVM definitely doesn't read an address of 0 because this is an address computation, no memory is actually being read.

We've been trying to avoid using external crates which is the problem. I would like to use some sort of built in offsetof operator instead, it would look cleaner, but unfortunately it's only accessible through an external crate.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I greatly worry about pasting unsafe code from external crates (multiple times, I've fixed the same piece of UB in 2 or 3 crates because they all copy+pasted instead of having a dependency), memoffset is MIT and the source for offset_of! is right here: https://github.com/Gilnaa/memoffset/blob/66f4c956a72daabeafa87ba9aef339d9035d78b4/src/offset_of.rs#L90 and I think it would be better to copy+paste than definitely have UB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for letting us know! This macro is actually unused and I'm in the process of deleting it while addressing other dead code lints. We have #[allow(unused_macros)] at the moment which hides the lint. We'll be sure to try for a UB free implementation when we need one. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet