Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel testing #31900

Merged
merged 1 commit into from Feb 16, 2018
Merged

Parallel testing #31900

merged 1 commit into from Feb 16, 2018

Conversation

eileencodes
Copy link
Member

@eileencodes eileencodes commented Feb 5, 2018

@tenderlove and I worked on this which adds parallel testing to Rails applications by default. New applications will have parallel testing enabled by default, and older applications can add it to their test helper:

class ActiveSupport::TestCase
  parallelize(workers: 2)
end

Parallel testing in this implementation utilizes forking processes over threads. The reason we (tenderlove and eileencodes) chose forking processes over threads is forking will be faster with single databases, which most applications will use locally. Using threads is beneficial when tests are IO bound but the majority of tests are not IO bound. NOTE: after some experimentation we added a threaded parallelization method as well, but forked processes are still the default.

If an application doesn't want to use parallel testing they can either remove the parallelize block from the test application or set PARALLEL_WORKERS to 1 or fewer. For environments where you want to change the default number of workers from what you've set in your test_helper.rb you can export / set an environment variable to change the number of workers used. The following will use 15 workers and split the tests into 15 processes.

PARALLEL_WORKERS=15 bin/rails test

Note: While parallel testing will work with multiple primary databases, it currently doesn't rollback fixtures correctly. I'm actively working on fixing that but decided it was out of scope for this particular feature, since fixing it is not a feature of parallel testing but rather a bug / inconsistency in how Rails is handled. The fix for that should be coming shortly. Parallel testing and multiple primary databases does work with tests if not using fixtures. I'm not sure why I thought this but I just tested it locally again and the fixtures work. I think I had a bug in my setup the last time I tested this.

If you have multiple databases they can be setup like this in your test_helper.rb

class ActiveSupport::TestCase
  parallelize(workers: 2)

  parallelize_setup do |worker|
    # create a db w/ worker. Runs after processes are forked
  end

  parallelize_teardown do |worker|
    # delete the test databases or other cleanup. Runs before processes are closed
  end
end

To do:

  • Documentation
  • Guides
  • CHANGELOG

cc/ @tenderlove @dhh

@eileencodes eileencodes added this to the 6.0.0 milestone Feb 5, 2018
@eileencodes eileencodes self-assigned this Feb 5, 2018
@tenderlove
Copy link
Member

This looks great! 3 things off the top of my head:

  1. I'm not sure if Windows supports unix sockets. We should test that and possibly opt-out Windows folks.
  2. JRuby definitely doesn't support fork. We might be able to get this to work with Process.spawn out of the box, but maybe we should opt-out JRuby folks for now too?
  3. I like the class-level parallelize API for setting the number of workers, but do we care if people call the method more than once? (I think the answer is "no", but I just want to check)

@nynhex
Copy link

nynhex commented Feb 5, 2018

This is great work! Thanks @eileencodes @tenderlove ❤️

@jasl
Copy link
Contributor

jasl commented Feb 6, 2018

It seems Windows 10 upcoming RS4 release will support unix sockets partially, see https://blogs.msdn.microsoft.com/commandline/2017/12/19/af_unix-comes-to-windows/

@metacritical
Copy link

Does anyone use rails on Windows either development or production? MRI is still large i dont know anyone using JRUby on production. Opting them out for this release is OK.

@GBH
Copy link
Contributor

GBH commented Feb 6, 2018

Is this a replacement for https://github.com/grosser/parallel_tests ?

@eileencodes
Copy link
Member Author

@metacritical I don't want to discount users of JRuby or Windows, however I think since we're a ways out from Rails 6.0 there's a ton of potential for this feature to evolve. I'd love to see a way to use threads over processes if someone finds that useful. I don't think it's necessary for the first iteration though. Once we have an API nailed down and this merged we can iterate on it and add a threads feature.

@GBH yes technically it replaces it, but not because we think the gem is doing anything incorrect. We're not personally using it at GitHub but we wrote our own implementation based on how we parallelize our own tests.

@url = "drbunix://#{file}"
@pool = []

DRb.start_service(@url, @queue)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should generate uri by DRb in order to guarantee uniqueness. See #31591.


require "drb"
require "drb/unix"
require "tempfile"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we can remove require "tempfile" since we don't use Tempfile in this file.

ActiveRecord::Tasks::DatabaseTasks.create(connection_spec)
ActiveRecord::Base.establish_connection(connection_spec)
if ActiveRecord::Base.connection.migration_context.needs_migration?
ActiveRecord::Tasks::DatabaseTasks.migrate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should always execute (and rely on) ActiveRecord::Tasks::DatabaseTasks.migrate to ensure full setup for test database since this creates tables like schema_migrations, ar_internal_metadata etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They'll always need a migration because they'll always be brand new. I think however I'm going to drop this implementation and load up the structure/schema instead because it will be faster.

ActiveRecord::Tasks::DatabaseTasks.migrate
end
ensure
ActiveRecord::Base.establish_connection(ActiveRecord::Base.configurations[Rails.env])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just question: can't realize why do we need to re-establish the connection?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to re-establish the original connection to AR Base since we want the test connection, not the other db connection. Otherwise the tests will try to all run against AR Base with the other db connection, not the test connection.

require "active_support/testing/parallelization"

module ActiveRecord
module TestDatabases
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should express this module as private api by # :nodoc:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't decided yet if I want to make this private or document it. I think it could be useful for setting up test dbs when you have multiple databases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to no doc this for now. Easier to document later than undocument.

@alissonbrunosa
Copy link

alissonbrunosa commented Feb 7, 2018

I'm afraid this problem is beyond the scope, but is there any way to define whether a test should run on threads or processes? For instance, if I had a test that is IO bound, it would be awesome to run on threads.
I don’t know, maybe this will solve the JRuby’s problem with fork as well.

@yeongsheng-tan
Copy link

Nice and sweet. 👏💖

end
end

def <<(o)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What not use delegator for << and pop methods?

end
end

def <<(o)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering what o is — object? Is there some name we could give it that'd help describe what it's intended to be?

@pangolingo
Copy link

Sometimes I set breakpoints in tests with Pry or Byebug. I assume the might not work with multiple processes. Could we add a note to the docs mentioning that? (Sorry I haven't had a chance to test this for myself.)

@yuki24
Copy link
Contributor

yuki24 commented Feb 8, 2018

@pangolingo when debugging with pry or byebug you should probably run a single test rather than running all of them. But at least byebug works well in a multi-threaded environment (pauses other threads when it hits byebug and resumes upon continue). I'm not sure how it behaves in a forked process.

end

# Cleanup required for parallel testing. This can be used to drop databases
# if you're app uses multiple write/read databases or other clean up before
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're => your 😅

end
```

The number of workers passes is the number of times the process will be forked. You may want to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passes => passed

process. The databases will be suffixed with the number corresponding to the worker. For example, if you
have 2 workers the tests will create `test-database-0` and `test-database-1` respectively.

If the number of workers passes is 1 or fewer the processes will not be forked and the tests will not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passes => passed

@eileencodes
Copy link
Member Author

Decided to add a threaded parallelizer after all. JRuby apps will automatically be generated using the threaded one. If you want to use threads just add with: :threads as a keyword argument.

I've updated docs, guides, and added a changelog.

connection_spec["database"] += "-#{i}"
ActiveRecord::Tasks::DatabaseTasks.create(connection_spec)
ActiveRecord::Base.establish_connection(connection_spec)
ActiveRecord::Tasks::DatabaseTasks.migrate
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan on replacing this with a structure load or a straight copy of the database later on but currently structure load doesn't work with multiple databases so sticking with migrate for now.

@eileencodes eileencodes changed the title WIP: Parallel testing Parallel testing Feb 15, 2018
@eileencodes eileencodes force-pushed the parallel-testing branch 2 times, most recently from 548f1de to 313a2a6 Compare February 15, 2018 20:12
Provides both a forked process and threaded parallelization options. To
use add `parallelize` to your test suite.

Takes a `workers` argument that controls how many times the process
is forked. For each process a new database will be created suffixed
with the worker number; test-database-0 and test-database-1
respectively.

If `ENV["PARALLEL_WORKERS"]` is set the workers argument will be ignored
and the environment variable will be used instead. This is useful for CI
environments, or other environments where you may need more workers than
you do for local testing.

If the number of workers is set to `1` or fewer, the tests will not be
parallelized.

The default parallelization method is to fork processes. If you'd like to
use threads instead you can pass `with: :threads` to the `parallelize`
method. Note the threaded parallelization does not create multiple
database and will not work with system tests at this time.

parallelize(workers: 2, with: :threads)

The threaded parallelization uses Minitest's parallel exector directly.
The processes paralleliztion uses a Ruby Drb server.

For parallelization via threads a setup hook and cleanup hook are
provided.

```
class ActiveSupport::TestCase
  parallelize_setup do |worker|
    # setup databases
  end

  parallelize_teardown do |worker|
    # cleanup database
  end

  parallelize(workers: 2)
end
```

[Eileen M. Uchitelle, Aaron Patterson]

connection_spec = ActiveRecord::Base.configurations[spec_name]

connection_spec["database"] += "-#{i}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about underscore "_#{i}" to keep file based databases using underscores in file names?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're temporary databases, unless it's going to break file based dbs I don't think underscore vs dash is a big deal.

@eileencodes eileencodes merged commit 7286d81 into rails:master Feb 16, 2018
@eileencodes eileencodes deleted the parallel-testing branch February 16, 2018 13:09
@IgorDobryn
Copy link

Awesome!

@Fudoshiki
Copy link
Contributor

Fudoshiki commented Feb 21, 2018

@eileencodes
In rails 5.2

# frozen_string_literal: true

ENV['RAILS_ENV'] ||= 'test'
require_relative '../config/environment'
require 'rails/test_help'
require 'rubocop/rake_task'

RuboCop::RakeTask.new

class ActiveSupport::TestCase
  # Run tests in parallel with specified workers
  parallelize(workers: 2)

  # Add more helper methods to be used by all tests here...
end

rails t show failed rubocop tests

Now

# frozen_string_literal: true

ENV['RAILS_ENV'] ||= 'test'
require_relative '../config/environment'
require 'rails/test_help'
require 'rubocop/rake_task'

RuboCop::RakeTask.new

class ActiveSupport::TestCase
  # Add more helper methods to be used by all tests here...
end

rails t ignoring RuboCop::RakeTask.new

How use that now?

@eileencodes
Copy link
Member Author

@Fudoshiki can you open a new issue explaining the problem you're having? From your comment I don't understand what the issue is. Thanks!

when :threads
Minitest::Parallel::Executor.new(workers)
else
raise ArgumentError, "#{with} is not a supported parallelization exectutor."

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this exectutor is a typo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was fixed by a4e226f

@johnvross
Copy link

Someone asked if anyone is using windows for rails development. Yes. I work for a state agency and use the windows subsystem for linux on windows for all development since the fall creators update and they fixed filewatchers. Just FYI we are out here.

@chrishough
Copy link

Does this apply to both rspec and minitest?

@eileencodes
Copy link
Member Author

Hey @johnvross I know you're out there and I'm sorry someone asked that question as it's not an opinion the core team holds. I think you should be supported through the threaded parallelizer, but I'm not sure the Unix sockets that dRB relies on will work for you.

@chrishough this is using Minitest's parallel executor so I don't think it will work for rspec per say but we wrote the API in such a way that it's easy for us to add support for another parallelizer. This feature is still very new and it will be awhile before Rails 6 is released.

@chrishough
Copy link

Thanks @eileencodes. I was hoping it would replace https://github.com/grosser/parallel_tests, and I am definitely curious to see how this plays out.

@WaKeMaTTa
Copy link

@eileencodes if we use rspec or cucumber how we can use this feature?

@thepeoplesbourgeois
Copy link

thepeoplesbourgeois commented Jul 16, 2018

@WaKeMaTTa I think that's the topic of the comments a few notes above yours; I think an adapter for rspec/cucumber needs to be written for this parallelizer

@WaKeMaTTa
Copy link

@thepeoplesbourgeois you are right. Thanks

yahonda added a commit to yahonda/rails that referenced this pull request Aug 27, 2019
Rails 6.0 introduces parallel testing and the default degree of parallelism is configured based on the number of CPU.
refer rails#34735 rails#31900

When any minitest executed under the OS where 2 or more CPU available,
SQLite 3 database files are not git-ignored.

Also updated other files which ignores SQLite database files.

* Steps to reproduce

```
$ git clone https://github.com/yahonda/rep_ignore_sqlite3_databases.git
$ cd rep_ignore_sqlite3_databases/
$ bin/rails test
$ git status
```

* Expected behavior:

- No `Untracked files:`

* Actual behavior:

- SQLite 3 database files appeared

```
$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	db/test.sqlite3-0
	db/test.sqlite3-1
	db/test.sqlite3-2
	db/test.sqlite3-3
	db/test.sqlite3-4
	db/test.sqlite3-5

nothing added to commit but untracked files present (use "git add" to track)
$
```
@atef-ds
Copy link

atef-ds commented Sep 25, 2019

@eileencodes i have an issue with this solution:
sometimes we need to populate our test db with fake data before running the test => with parallel test our new db test are empty and we lost our fake data...
To fix it, i did:

  1. my original db called "db_test" (contains some fake data),
    i add a new one calle db_test_template: createdb -T db_test db_test_template
  2. in test_helper:
parallelize(workers: 4)
parallelize_setup do |worker|
    configuration = ActiveRecord::Base.connection_config
    #delete db
    ActiveRecord::Tasks::DatabaseTasks.drop configuration.stringify_keys
    ActiveRecord::Base.establish_connection(configuration)

    configuration[:template] = ENV["POSTGRES_TEMPLATE"]
   #create a new db 
    ActiveRecord::Tasks::DatabaseTasks.create(configuration.stringify_keys)
  # now i have a db with data that i populated before in the original db
end

To run test :
POSTGRES_TEMPLATE=db_test_template rails test -v it works well for me
@eileencodes can we considerate somehow that we need to keep data that we populated before?

sanemat added a commit to sanemat/rails-boilerplate-mysql that referenced this pull request Dec 20, 2020
jyoun-godaddy pushed a commit to jyoun-godaddy/activestorage that referenced this pull request Jul 5, 2022
Rails 6.0 introduces parallel testing and the default degree of parallelism is configured based on the number of CPU.
refer rails/rails#34735 rails/rails#31900

When any minitest executed under the OS where 2 or more CPU available,
SQLite 3 database files are not git-ignored.

Also updated other files which ignores SQLite database files.

* Steps to reproduce

```
$ git clone https://github.com/yahonda/rep_ignore_sqlite3_databases.git
$ cd rep_ignore_sqlite3_databases/
$ bin/rails test
$ git status
```

* Expected behavior:

- No `Untracked files:`

* Actual behavior:

- SQLite 3 database files appeared

```
$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	db/test.sqlite3-0
	db/test.sqlite3-1
	db/test.sqlite3-2
	db/test.sqlite3-3
	db/test.sqlite3-4
	db/test.sqlite3-5

nothing added to commit but untracked files present (use "git add" to track)
$
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet