New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hooks to ActiveJob around retries and discards #33751
Add hooks to ActiveJob around retries and discards #33751
Conversation
Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @schneems (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review. Please see the contribution instructions for more information. |
yield self, error | ||
ActiveSupport::Notifications.instrument("retry_stopped.active_job", payload) do | ||
yield self, error | ||
end | ||
else | ||
logger.error "Stopped retrying #{self.class} due to a #{error.class}, which reoccurred on #{executions} attempts. The original exception was #{error.cause.inspect}." | ||
raise error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is still worth to notify in this case. For this case we can just use the non-block format so we notify before the exception is raised.
payload = { | ||
job: self, | ||
adapter: self.class.queue_adapter, | ||
error: error, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our style (although there's no rubocop on for it yet) is to not have a trailing comma. Or maybe I imagined that because I personally don't like them 😉
@rafaelfranca @eileencodes Updated with your feedback in mind. I also decided to move the log calls inside |
logger.error "Retrying #{self.class} in #{wait} seconds, due to a #{error.class}. The original exception was #{error.cause.inspect}." | ||
retry_job wait: determine_delay(wait), queue: queue, priority: priority | ||
ActiveSupport::Notifications.instrument("enqueue_retry.active_job", payload) do | ||
retry_job wait: determine_delay(wait), queue: queue, priority: priority |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should retry_job
publish this notification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good question. My thought about this was that if you do the retry manually you don't need to be notified given you know you did the retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I just think from a UX perspective it seems weird that calling retry_job
doesn't publish the enqueue_retry
notification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move ActiveSupport::Notifications.instrument("enqueue_retry.active_job", payload)
to retry_job
?
job: self, | ||
adapter: self.class.queue_adapter, | ||
error: error, | ||
wait: wait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add :wait
to "enqueue_retry.active_job" table in guides/source/active_support_instrumentation.md
?
Add mention about it to the "Active Support Instrumentation" guide Related to rails#33751 (comment) Follow up rails#33751
Also this commit removes `:wait` from payload of `retry_stopped.active_job`. Related to rails#33751 (comment) Follow up rails#33751 /cc @kaspth, @rafaelfranca
Since rails#33751 was added `enqueue_retry.active_job` instrumentation to the `retry_on` method, then rails#33897 moved the instrumentation to `retry_job` method in order to ensure that this method publish `enqueue_retry.active_job` notification too. See related discussion rails#33751 (comment) Since `enqueue_retry.active_job` moved to `retry_job`, there is no guarantee that payload of `enqueue_retry.active_job` would have `:error`. See test `LoggingTest#test_enqueue_retry_logging_on_retry_job` as example of that case. Related to rails#33897 (comment) I think we can improve notification of `enqueue_retry.active_job`: - If there is no `event.payload[:error]`, then publish like "Retrying RescueJob in 3 seconds." only. - If `event.payload[:wait]` is `nil`, then publish "Retrying RescueJob in 0 seconds." instead of "Retrying RescueJob in nil seconds." - If there is `event.payload[:error]`, then publish "Retrying RescueJob in 3 seconds, due to a DefaultsError.". - Change the type of the message from `error` to `info.` Also, this commit removes part of messages - "The original exception was #{ex.cause.inspect}." of `enqueue_retry.active_job`, `retry_stopped.active_job`, and `discard.active_job` since I haven't found it useful. Please let me know whether you agree with that?
Summary
This started life as #33740 but it was decided that this would be the better implementation for now. The goal here is to add a few additional hooks to ActiveJob that can be used to perform logging and monitoring around retries and discards. Right now the only two options to do so are to implement some version of your own
retry_on
anddiscard_on
or do the monitoring on each individual job.Other Information
Because the notification is sent off after the block is executed I only include the retry stopped case where a block was provided. Since the other case just re-raises the exception I figured whatever exception logging you have in place should handle that case. I'm open to suggestions to implement that differently though.