Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize CGI.escapeHTML for ASCII-compatible encodings #1164

Closed
wants to merge 4 commits into from

Conversation

k0kubun
Copy link
Member

@k0kubun k0kubun commented Dec 20, 2015

As commented in #156 (comment), I rewrote CGI.escapeHTML in C, which is used by ERB::Util#html_escape.
Since escaping HTML is expensive in rendering a template, I want it to be faster.
For now, I optimized it only for strings whose encoding is ASCII-compatible.

With this benchmark https://gist.github.com/k0kubun/b6af6062bc876190e280, it's about 7 times faster than original implementation in escaping html.

$ ruby bench_escape_html.rb
Calculating -------------------------------------
              before    11.448k i/100ms
               after    31.189k i/100ms
-------------------------------------------------
              before    216.403k (± 7.0%) i/s -      2.152M
               after      1.637M (±10.0%) i/s -     16.125M

Comparison:
               after:  1637408.5 i/s
              before:   216403.5 i/s - 7.57x slower

static VALUE rb_cCGI, rb_mUtil, rb_mEscape;

static VALUE
html_escaped_cat(VALUE str, char c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function returns no values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. Fixed in k0kubun@39cdd83.


if (modified) {
rb_str_cat(dest, cstr + beg, len - beg);
return dest;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the encoding of str is ignored and dest is always ASCII-8BIT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to associate original encoding and added test case for it in k0kubun@2162835.

@knu
Copy link
Member

knu commented Dec 24, 2015

How much is the final performance gain after fixing the return value not being duped when no replacement takes place?

@k0kubun
Copy link
Member Author

k0kubun commented Dec 24, 2015

You can check performance gain with this benchmark https://gist.github.com/k0kubun/8e1c7efb1e29991e1382.

This is the result of ruby compiled from latest revision b58b970.

$ ruby bench_escape.rb "'&\"<>"
Escape: '&"<>
Calculating -------------------------------------
              before    15.457k i/100ms
               after    63.931k i/100ms
-------------------------------------------------
              before    199.975k (± 5.1%) i/s -      1.005M
               after      1.453M (± 5.8%) i/s -      7.288M

Comparison:
               after:  1453098.6 i/s
              before:   199974.8 i/s - 7.27x slower

$ ruby bench_escape.rb "hello world"
Escape: hello world
Calculating -------------------------------------
              before    56.973k i/100ms
               after   100.344k i/100ms
-------------------------------------------------
              before      1.474M (± 6.1%) i/s -      7.350M
               after      4.419M (± 7.3%) i/s -     21.975M

Comparison:
               after:  4419281.2 i/s
              before:  1473860.3 i/s - 3.00x slower

@knu
Copy link
Member

knu commented Dec 24, 2015

Thanks. So, the new implementation is 3x faster even in the worst cases. Great job!

mrkn pushed a commit to mrkn/ruby that referenced this pull request Apr 17, 2016
* cgi/escape/escape.c: Optimize CGI.escapeHTML for
  ASCII-compatible encodings.  [Fix rubyGH-1164]

git-svn-id: svn+ssh://svn.ruby-lang.org/ruby/trunk@53220 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
k0kubun added a commit to k0kubun/haml that referenced this pull request Feb 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants