Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Data class implementation: Simple immutable value object #6353

Merged
merged 2 commits into from Sep 30, 2022

Conversation

zverok
Copy link
Contributor

@zverok zverok commented Sep 10, 2022

Source: #16122
Example docs rendering: Data

Design and implementation decisions made:

1. The "define data object method is called Data.define. As per Matz:

define might be a candidate. But I still prefer shorter one (e.g. def), but you can try to persuade me.

There were a few quite reasonable arguments towards define in that ticket. To add to them, my PoV:

  • I believe that nowadays (adding new APIs to the mature language), it is better to use full English words to remove confusion and increase readability;
  • def is strongly associated in Ruby with "defining a method" and became a separate word in Rubyist's dictionary, not a "generic shortcut for 'define'"
  • I believe that the "definition of the new type" (unlike the definition of a new method) is a situation where clarity is more important than saving 3 chars; and somewhat rarer.

2. define accepts keyword and positional args; they are converted to keyword args there, and checked in initialize

Measure = Data.define(:amount, :unit)
Measure.new(1, 'km') # => OK
Measure.new(amount: 1, unit: 'km') # => OK
Measure.new(1) # ArgumentError
Measure.new(amount: 1) # ArgumentError
Measure.new(1, 'km', 'h') # ArgumentError
Measure.new(amount: 1, unit: 'km', comment: 'slow') #=> ArgumentError

The fact that initialize accepts only keyword args and checks them makes it easy to define custom initialize with defaults, for example (it is all explicitly documented, see link above):

Measure = Data.define(:amount, :unit) do
  def initialize(amount:, unit: '-none-') = super
end

Measure[1] #=> #<data Measure amount=1, unit="-none-">

(This might be enough for not to invent a separate API for default values, but this discussion can be postponed.)

3. I didn't introduce any additional APIs yet (e.g. something like with which was discussed). So, the full API of the Data as rendered by RDoc is:
image

I believe it is enough for the introduction, and then the feedback might be accepted for how to make it more convenient.

4. I wrote custom docs for Data class instead of copy-pasting/editing docs for Struct. I have a strong belief that the approach to docs used is more appropriate:

  1. For separate methods, instead of detailed documenting of every behavior quirk by spelling it in many phrases, I provided the explanation of what it does logically and examples of how it can/should be used.
  2. For the entire class, I don't see a point in the recently introduced formal structure of "What's here" with many nested sections, at least for small classes. It sacrifices the lightweight-yet-enough explanations that can be consumed almost instantly, towards the feeling of "there is a book-worth of information to read," making the documentation user's experience more laborious.

Probably it is not a good place to discuss those approaches in general, but at least for the Data class, I believe the chosen approach is superior. If the core team believes it is not true, I think my changes can be merged and then formal docs provided by team members who consider them valuable.

struct.c Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
Copy link
Member

@nobu nobu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
@nobu nobu added the Feature label Sep 11, 2022
@zverok zverok force-pushed the data-class branch 2 times, most recently from 78a6767 to 3e4f762 Compare September 22, 2022 18:47
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
struct.c Outdated Show resolved Hide resolved
@nobu nobu merged commit ad65192 into ruby:master Sep 30, 2022
@zverok zverok deleted the data-class branch September 30, 2022 09:42
Comment on lines +2172 to +2186
rb_define_method(rb_cData, "initialize", rb_data_initialize_m, -1);
rb_define_method(rb_cData, "initialize_copy", rb_struct_init_copy, 1);

rb_define_method(rb_cData, "==", rb_data_equal, 1);
rb_define_method(rb_cData, "eql?", rb_data_eql, 1);
rb_define_method(rb_cData, "hash", rb_data_hash, 0);

rb_define_method(rb_cData, "inspect", rb_data_inspect, 0);
rb_define_alias(rb_cData, "to_s", "inspect");
rb_define_method(rb_cData, "to_h", rb_data_to_h, 0);

rb_define_method(rb_cData, "members", rb_data_members_m, 0);

rb_define_method(rb_cData, "deconstruct", rb_data_deconstruct, 0);
rb_define_method(rb_cData, "deconstruct_keys", rb_data_deconstruct_keys, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all these instance methods should be defined directly on the generated class (and not on Data) to allow future optimizations by specializing them for the members, i.e., https://bugs.ruby-lang.org/issues/16122#note-69 and as I mentioned in an earlier comment on this PR.
Otherwise such optimizations would have to create a new Module which is extra footprint and would be visible in ancestors.

Could you make a PR to adjust that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zverok It's unfortunate you didn't replied to this before the 3.2 release (maybe you didn't see it?).
The cost is we'll need to create an extra Module per Data subclass for optimization.
Hopefully that turns out to not be too much of a footprint drawback.

Given a pattern like

Measure = Data.define(:amount, :unit) do
  def initialize(amount:, unit: '-none-') = super
end

as used in https://bugs.ruby-lang.org/issues/16122#note-68

This actually only works if the initialize defined by Data.define is defined on Data and not on the Data subclass (named Measure here), otherwise the custom initialize would just overwrite it and super wouldn't work.

So for optimizing Data methods we will need to do the same trick as for Struct, to include a Module between the subclass and Data and define optimized methods there, so e.g., keyword arguments are dealt much more efficiently than the generic logic in Data#initialize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate you didn't replied to this before the 3.2 release (maybe you didn't see it?).

I didn't.
TBH, communicating via GitHub comment in already closed PR is not the most reliable way to reach me :)

I am receiving GH mentions about everything that happens in ruby/ruby repo, but since Feb 24, I didn't have resource to read through them, so I just recently marked all thousands of them as "read" and started to follow the development anew...

This is not to say I am guaranteed to see mentions unless they are in PRs I am explicitly checking. Not the best mode, probably, but I have no resource to rethink it either 🤷 I am much more accessible on Twitter, email, or in new tickets on the bug tracker, which I also follow but more rigorously.

Also, considering... well, everything, ... I am not guaranteed to be reachable at all, so I believe any discussions about Data or other features I've introduced are better to maintain on the tracker.

Which, I think, is the best place to discuss the problem you raised, because I don't have a strong opinion on this. Though, I would maintain that the "super should refer to the default implementation" is a useful rule for any method, not only initialize. But how to find a compromise between this rule and efficiency, that I am not sure.

rb_undef_alloc_func(rb_cData);
rb_define_singleton_method(rb_cData, "define", rb_data_s_def, -1);

rb_define_singleton_method(rb_cData, "members", rb_data_s_members_m, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*
* Examples:
*
* Measure = Data.new(:amount, :unit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not it be Data.define in this case? Also in some other docs below as well.

@tjad
Copy link

tjad commented Oct 9, 2022

@zverok

Shouldn't Data instances be frozen by default as they are immutable ?
The exact situation I am observing is calling Ractor.shareable? on a data instance.

CustomData = Data.define :attr1, :attr2
d = CustomData.new 1,2
d.frozen?  # produces false
Ractor.shareable? d #produces false

Otherwise I have to explicitly freeze the Data instance, which seems to serve no advantage over Struct in terms of usage with Ractor
If I understand correctly, effectively then multiple objects will be copied/made between Ractor instances when the Data instance is "unsharable"

@tjad
Copy link

tjad commented Oct 9, 2022

Nevermind, turned out to be limitation of my understanding of shareable. shareable means it is actually deeply frozen (or deeply immutable). As attr1 is not guaranteed to be immutable, the Data instance can't be shareable by default.

@ms-ati
Copy link

ms-ati commented Oct 15, 2022

Thank you 🎉 @zverok! Very happy with the design of initialize to handle defaults via keyword args.

Super hopeful 🙏 that we can get the #with method for the essential “copy a
Value object with one or more changes” usage pattern soon!

tomstuart added a commit to tomstuart/wasminna that referenced this pull request Nov 15, 2022
This was added in Ruby 3.2.0-preview3 [0].

[0] ruby/ruby#6353
@joeyparis
Copy link

Should this not be the singular "datum" rather than the plural "data"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
7 participants