Add support for recurring tasks (cron style jobs) #155
require "active_job"
require "active_job/queue_adapters"

require "zeitwerk"
❤️
I'm testing this branch with a frequent cart expiry job that runs every minute. My concern is the amount of noise generated in the jobs table. What are your thoughts on either having the option to add a condition to the task definition such as:
or a task specific version of the |
@klenis, this would depend on your job volume, but a job every minute would be 1,440 jobs per day and 10,080 jobs after one week. How would this compare to your current job volume? As a comparison, in HEY, the noise corresponding to recurring jobs is about 2,000 per day, but that's negligible compared to regular jobs (over 10M / day). I think this might be the case for most users because the lowest time interval you can schedule jobs to run recurringly is 1 second.

In case it helps, this is what we use on HEY to delete jobs that finished over 3 days ago:

    # config/application.rb
    # Keep finished Solid Queue jobs for 3 days
    config.solid_queue.clear_finished_jobs_after = 3.days

And then as part of our recurring tasks:

    clear_solid_queue_finished_jobs:
      class: "CronJob"
      schedule: "42 * * * *"
      args: "SolidQueue::Job.clear_finished_in_batches(batch_size: 1000)"

Would something like this work for you?
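For anyone estimating their own numbers, the arithmetic above can be reproduced with a few lines of Ruby. This is only a back-of-the-envelope sketch: `jobs_per_day` is a hypothetical helper, not part of Solid Queue, and the retained-rows figure assumes a 3-day retention window and ignores the lag between hourly cleanup runs.

```ruby
SECONDS_PER_DAY = 24 * 60 * 60

# Hypothetical helper: how many runs a fixed-interval recurring task produces per day.
def jobs_per_day(interval_seconds)
  SECONDS_PER_DAY / interval_seconds
end

every_minute_per_day = jobs_per_day(60)            # 1440
every_minute_per_week = every_minute_per_day * 7   # 10080

# Rough steady-state row count if finished jobs are kept for 3 days.
retained_with_3_day_window = every_minute_per_day * 3 # 4320

puts every_minute_per_day, every_minute_per_week, retained_with_3_day_window
```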
I guess the scheduled cleaner could be a viable solution. It would be nice to be able to pass Thank you for taking the time to respond and great job with Solid Queue 👏 |
Ohh, interesting idea! I hadn't thought about that as we didn't need that granularity when clearing jobs, but it's something I can certainly add 😊 Thank you! |
Hey @rosa 👋 Does BC run SolidQueue on a dedicated db? |
Yes! We use it for HEY only (for now), and it has its own DB that shares the hardware with the app's main DB. |
…g out We can have dispatcher processes that don't do concurrency maintenance.
It was always zero for the default polling interval, so it was doing nothing and we didn't even realise ^_^U
With the other boolean options.
In the dispatcher and the configuration.
Using concurrent-ruby's scheduled tasks. Each task schedules the next one, like GoodJob does. Add a simple test and allow dispatcher to be initialized without having to pass instantiated recurring tasks.
To avoid any confusion with Active Record's id.
To keep track of the jobs associated with each recurring task and to avoid creating duplicate ones.
…an once Only when the recurring job being enqueued is using Solid Queue as the adapter. This supports other adapters as well, but in that case we can't guarantee unique runs of the same task at the same time.
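To illustrate the idea behind tracking executions, here is a minimal in-memory sketch of how a unique constraint on a task key and run time lets several dispatchers race to enqueue the same run while only one succeeds. Everything here is an assumption for illustration: `RecurringExecutionLog` and `enqueue_once` are hypothetical names, a `Set` guarded by a `Mutex` stands in for the database's unique index, and the sketch does not model the transaction that ties the record to the job insert.

```ruby
require "set"

# In-memory stand-in for a table with a unique index on (task_key, run_at).
class RecurringExecutionLog
  class Duplicate < StandardError; end

  def initialize
    @seen = Set.new
    @lock = Mutex.new
  end

  # Raises Duplicate if this (task_key, run_at) pair was already recorded,
  # mimicking a unique-index violation.
  def record!(task_key, run_at)
    @lock.synchronize do
      raise Duplicate, "#{task_key} at #{run_at.to_i}" unless @seen.add?([task_key, run_at.to_i])
    end
  end
end

# Hypothetical dispatcher-side helper: enqueue only if recording the run succeeded.
def enqueue_once(log, queue, task_key, run_at)
  log.record!(task_key, run_at)
  queue << [task_key, run_at]
  true
rescue RecurringExecutionLog::Duplicate
  false
end

log = RecurringExecutionLog.new
queue = Queue.new # thread-safe queue standing in for the jobs table
run_at = Time.at(60)

# Four "dispatchers" try to enqueue the same task for the same time.
results = 4.times.map { Thread.new { enqueue_once(log, queue, "cleanup", run_at) } }.map(&:value)

puts "enqueued by #{results.count(true)} dispatcher(s), #{queue.size} job(s) in the queue"
```

With another adapter there is no such record to insert, which matches the caveat above that unique runs can't be guaranteed in that case.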
If we don't explicitly add a ruby2_keywords flag, Active Job will serialize any hash included in the arguments array with keys as `_aj_symbol_keys`, and when deserialized, it'd always be treated as a hash argument instead of keyword arguments. Depending on the job, this might work fine, but if the job uses keyword arguments, trying to execute the job with deserialized arguments will fail. However, the opposite is not true: if the job accepts a hash argument and we pass a hash with the ruby2_keywords flag, it'll work just fine as Active Job will serialize that with keys as `_aj_ruby2_keywords`, so we take advantage of that to simplify the task definition and not have to distinguish between args and kwargs.
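The plain-Ruby behaviour this relies on can be seen without Active Job. The sketch below assumes Ruby 3.0 or later and uses two hypothetical methods standing in for a job's `perform`; it only demonstrates how a flagged hash behaves when an arguments array is splatted, not Active Job's serialization itself.

```ruby
# Stand-in for a job whose perform takes keyword arguments.
def perform_with_kwargs(id, notify: false)
  [id, notify]
end

# Stand-in for a job whose perform takes a plain hash as its last argument.
def perform_with_hash(id, options)
  [id, options]
end

plain_args   = [42, { notify: true }]
flagged_args = [42, Hash.ruby2_keywords_hash({ notify: true })]

# Only the flagged copy is marked as "these were keywords".
flagged = Hash.ruby2_keywords_hash?(flagged_args.last)
unflagged = Hash.ruby2_keywords_hash?(plain_args.last)

# Splatting the flagged array passes the hash as keywords...
kwargs_result = perform_with_kwargs(*flagged_args)

# ...and a method that expects a positional hash still receives one.
hash_result = perform_with_hash(*flagged_args)

# Splatting the unflagged array passes the hash positionally, which a
# keyword-only signature rejects on Ruby 3.
plain_result =
  begin
    perform_with_kwargs(*plain_args)
  rescue ArgumentError => e
    e.class
  end

p flagged, unflagged, kwargs_result, hash_result, plain_result
```

This is why flagging the last hash covers both kinds of job signature, while leaving it unflagged only covers the hash-argument one.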
It'll be handy in Mission Control when we want to show the configured tasks because we need to aggregate them across dispatchers that might have different configurations.
For example, if we don't keep finished jobs around.
This is useful for those who decide not to have FKs that ensure recurring executions are deleted when jobs are cleared up, so they can just call this method periodically to clear orphaned executions.
Somehow I hadn't noticed that until now ^_^U
Make the loop be part of Poller. Allow having other Runnable processes that don't need an infinite loop. I'm still not super happy with these concerns. This needs more work that will come when I properly implement async mode. Right now this is all interleaved in the modules and it shouldn't be.
Thanks @rosa! We have similar job amounts/day--How big is your dedicated queue db? Did y'all consider sharing your main db? |
We did consider it, and in the beginning, when we started using Solid Queue in production, we were running it there (about 1M jobs per day). We looked into what the write load would look like when moving all the jobs, compared it to the load from the application, and realised it'd be a little less than multiplying the existing write load by 2, leaving less margin for peaks. In the end, we decided to be cautious and moved it to its own DB, which shares the hardware with the main app's DB and other DBs, just a separate database, as we still had a lot of margin in terms of IOPS supported by our disks there, CPU and memory.
@@ -265,3 +267,48 @@ Solid Queue has been inspired by [resque](https://github.com/resque/resque) and

## License
Shouldn't this be at the end?
Oops, @brunoprietog thanks for spotting this! It should totally be at the end 😆
Am I missing something in the solid_queue readme? Not quite sure how you'd set solid_queue up to use a separate database to the one that stores rails model data. |
@n-at-han-k you can use the

    # Use a separate DB for Solid Queue
    config.solid_queue.connects_to = { database: { writing: :solid_queue_primary, reading: :solid_queue_replica } }
This PR introduces support for recurring (aka. cron-style) tasks. They can be included in the dispatcher's configuration as:
`recurring_tasks` is a hash/dictionary, and the key will be the task key internally. Each task needs to have a class, which will be the job class to enqueue, and a schedule. The schedule is parsed using Fugit, so it accepts anything that Fugit accepts as a cron. You can also provide arguments to be passed to the job, as a single argument, a hash, or an array of arguments that can also include kwargs as the last element in the array.

The job in the example configuration above will be enqueued every second as:
Tasks are enqueued at their corresponding times by the dispatcher that owns them, and each task schedules the next one. This is pretty much inspired by what GoodJob does.
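The "each task schedules the next one" pattern can be sketched in plain Ruby as follows. This is not the PR's implementation: `SelfReschedulingTask` is a hypothetical class, the fixed `interval` replaces a parsed cron schedule, and the injected `scheduler` callable stands in for whatever runs a block after a delay (the commit messages mention concurrent-ruby's scheduled tasks for that role). The demo uses a manual scheduler so it runs deterministically without sleeping.

```ruby
# Hypothetical sketch of a task that re-schedules itself after every run.
class SelfReschedulingTask
  def initialize(key:, interval:, scheduler:, &action)
    @key = key
    @interval = interval   # seconds between runs (stand-in for a cron schedule)
    @scheduler = scheduler # callable: (delay_in_seconds, &block)
    @action = action       # what to do at each run, e.g. enqueue a job
  end

  def start(now = Time.now)
    schedule_next(now)
  end

  private

  def schedule_next(now)
    run_at = next_time(now)
    @scheduler.call(run_at - now) do
      @action.call(@key, run_at)
      # For simplicity the next run is computed from the scheduled time;
      # a real implementation would likely use the current time here.
      schedule_next(run_at)
    end
  end

  # Next multiple of the interval strictly after `now`.
  def next_time(now)
    Time.at((now.to_i / @interval + 1) * @interval)
  end
end

# Manual scheduler: collects pending work instead of waiting for real time to pass.
pending = []
manual_scheduler = ->(delay, &block) { pending << [delay, block] }

runs = []
task = SelfReschedulingTask.new(key: "cleanup", interval: 60, scheduler: manual_scheduler) do |key, run_at|
  runs << [key, run_at]
end

task.start(Time.at(30))

# Fire three scheduled runs; each one queues its successor.
3.times do
  _delay, block = pending.shift
  block.call
end

puts runs.map { |key, run_at| "#{key} @ #{run_at.to_i}" }
```

Because only one run is ever pending per task, a dispatcher that stops simply never schedules the next run, which keeps shutdown straightforward.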
It's possible to run multiple dispatchers with the same `recurring_tasks` configuration. To avoid enqueuing duplicate tasks at the same time, an entry in a new `solid_queue_recurring_executions` table is created in the same transaction as the job is enqueued. This table has a unique index on `task_key` and `run_at`, ensuring only one entry per task per time will be created. This only works if you have `preserve_finished_jobs` set to `true` (the default), and the guarantee applies as long as you keep the jobs around.

Finally, it's possible to configure jobs that aren't handled by Solid Queue. That is, you can have a job like this in your app:
You can still configure this in Solid Queue:
and the job will be enqueued via `perform_later` so it'll run in Resque. However, in this case we won't track any `solid_queue_recurring_execution` record for it and there won't be any guarantees that the job is enqueued only once each time.

This pull request also introduces a new configuration option for the dispatcher, to opt out of concurrency maintenance, via `concurrency_maintenance: false` (it's `true` by default). You can have multiple dispatchers and choose that some of them do concurrency maintenance but not all of them, as well as one/some of them being in charge of dispatching recurring tasks but not all of them.

Closes #104.
Pending: