Wednesday, August 17, 2011

The Lesson V8 Can Teach Python and Other Dynamic Languages

Being unable to completely give up math for computers, I am naturally drawn to Project Euler, and as a result I solved a ridiculous number of the problems posted there while learning Python. A few months ago (March 13), after reading Secrets of the JavaScript Ninja, I decided to begin converting my solutions to Javascript. A month and a half later I came back to it, and finally, two months after that, I began to take it seriously.

After making this decision, I noticed my prime Sieve of Eratosthenes was mighty fast when I ran it in Chrome, maybe even faster than in my beloved Python. I tabled the thought for a little while, but never really forgot it. So a few weeks ago (early August 2011), I finally got a working install of node running on my machine and was able to explore the thought further. (I say finally because on two previous tries I gave up, owing to conflicts with my version of gcc, coupled with the fact that I had no good reason to use node.)
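For context, here is a minimal sketch of the sieve in question. My actual implementation lives in the project repo, so treat this as the textbook algorithm rather than the benchmarked code:
// Sieve of Eratosthenes: returns an array of all primes below n.
function sieve(n) {
  var isPrime = [];
  for (var i = 2; i < n; i++) {
    isPrime[i] = true;
  }
  // Cross off multiples of each prime, starting at its square.
  for (var p = 2; p * p < n; p++) {
    if (isPrime[p]) {
      for (var m = p * p; m < n; m += p) {
        isPrime[m] = false;
      }
    }
  }
  var primes = [];
  for (var k = 2; k < n; k++) {
    if (isPrime[k]) {
      primes.push(k);
    }
  }
  return primes;
}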

When I originally did the conversion, I had skipped problem 8 because my implementation required pulling in the problem data as text from a file. While hanging out with Boris and Eric from the Chrome Developer Relations team, I decided to give it another go on node (XHR requests not allowed) and found it to be quite simple with readFileSync in node's native fs module. After witnessing this, I decided over the weekend to harness the power of V8 -- the Javascript engine that powers Chrome and node -- and run all my scripts locally with node. So over a two-day period, I hack-hack-hacked my way through converting the Python solutions for problems 11 through 50 (the remaining unconverted ones) into their Javascript equivalents, while also converting a good portion of my hefty functions module.
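As a rough sketch of what that file read looks like (the filename and the parsing here are hypothetical, not lifted from my repo):
var fs = require('fs');

// readFileSync returns the file contents as a string when given an encoding.
var data = fs.readFileSync('problem8.txt', 'utf8');
// Strip whitespace/newlines and split into individual digit characters.
var digits = data.replace(/\s/g, '').split('');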

Once this was done, I found I could replace most of the nice parts of Python with equivalents of my own. For example, I was able to replace the functionality I needed from the Python set datatype with the following uniq helper:
function uniq(arr) {
  // Use object keys as a makeshift set; duplicate values collapse onto one key.
  var result = {};
  for (var i = 0; i < arr.length; i++) {
    result[arr[i]] = true;
  }

  return Object.keys(result);
}
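One caveat worth flagging: Object.keys hands the values back as strings, which was fine for my purposes. For example:
uniq([2, 3, 2, 5, 3]);  // ['2', '3', '5'] -- strings, not numbers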
I was also able to replace the (amazingly) useful Python handling of long integers with a non-native node package called bigint, which uses libgmp under the hood. Of course, for Python's secret sauce -- the list comprehension -- I was able to substitute enough filter, reduce and map statements to almost make it seem like I had never left Pythonland (see the sketch below). After doing all this, I also ended up writing my own operator.js to replace the wonderful Python native module operator, and my own timer.js to stand in for the Python native module time.
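To give a flavor of those substitutions, here is a hedged sketch (the range helper is mine, not necessarily what my functions module provides) of a Python list comprehension rendered with filter and map:
// Python equivalent: [x * x for x in range(10) if x % 3 == 0]
function range(n) {
  var result = [];
  for (var i = 0; i < n; i++) {
    result.push(i);
  }
  return result;
}

var squares = range(10).filter(function(x) {
  return x % 3 === 0;
}).map(function(x) {
  return x * x;
});
// squares is [0, 9, 36, 81]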

Finally, I had working code and could do a side-by-side comparison of V8 and the Python interpreter. Update: I have added a column for PyPy, a just-in-time implementation of Python. Averaging the runtime over 10 separate calls to each function, here is what I found:

Problem | Answer       | Python    | Javascript | Ratio (PY/JS) | PyPy
1*      | 233168       | 1804ms    | 1215ms     | 1.48          | 385ms
2*      | 4613732      | 247ms     | 102ms      | 2.42          | 85ms
3*      | 6857         | 4725ms    | 1508ms     | 3.13          | 582ms
4       | 906609       | 8708ms    | 149ms      | 58.44         | 282ms
5*      | 232792560    | 136ms     | 186ms      | 0.73          | 114ms
6*      | 25164150     | 10ms      | 4ms        | 2.50          | 6ms
7       | 104743       | 656ms     | 12ms       | 54.67         | 11ms
8*      | 40824        | 18045ms   | 5014ms     | 3.60          | 7042ms
9       | 31875000     | 610ms     | 3ms        | 203.33        | 8ms
10      | 142913828922 | 6628ms    | 167ms      | 39.69         | 116ms
11      | 70600674     | 49ms      | 2ms        | 24.50         | 11ms
12      | 76576500     | 5127ms    | 203ms      | 25.26         | 100ms
13*     | 5537376230   | 1795ms    | 10710ms    | 0.17          | 1423ms
14      | 837799       | 5572ms    | 1712ms     | 3.25          | 362ms
15*     | 137846528820 | 54ms      | 18ms       | 3.00          | 55ms
16*     | 1366         | 1844ms    | 265ms      | 6.96          | 462ms
17      | 21124        | 87ms      | 4ms        | 21.75         | 7ms
18*     | 1074         | 2291ms    | 1790ms     | 1.28          | 1090ms
19*     | 171          | 2254ms    | 336ms      | 6.71          | 342ms
20*     | 648          | 1061ms    | 9154ms     | 0.12          | 374ms
21      | 31626        | 18910ms   | 1038ms     | 18.22         | 728ms
22      | 871198282    | 188ms     | 7ms        | 26.86         | 8ms
23      | 4179871      | 83318ms   | 1120ms     | 74.39         | 1295ms
24*     | 2783915460   | 206ms     | 210ms      | 0.98          | 139ms
25      | 4782         | 5865ms    | 35ms       | 167.57        | 232ms
26      | 983          | 28ms      | 18ms       | 1.56          | 4ms
27      | -59231       | 645738ms  | 22536ms    | 28.65         | 28288ms
28*     | 669171001    | 8509ms    | 1037ms     | 8.21          | 981ms
29      | 9183         | 184ms     | 96ms       | 1.92          | 20ms
30      | 443839       | 52167ms   | 1037ms     | 50.31         | 877ms
31      | 73682        | 9606ms    | 257ms      | 37.38         | 154ms
32      | 45228        | 206888ms  | 12096ms    | 17.10         | 4266ms
33      | 100          | 300ms     | 6ms        | 50.00         | 15ms
34      | 40730        | 7462ms    | 2447ms     | 3.05          | 247ms
35      | 55           | 8617ms    | 848ms      | 10.16         | 242ms
36      | 872187       | 189788ms  | 2183ms     | 86.94         | 3532ms
37      | 748317       | 2389022ms | 71845ms    | 33.25         | 61551ms
38      | 932718654    | 506ms     | 10ms       | 50.60         | 12ms
39      | 840          | 178ms     | 6ms        | 29.67         | 12ms
40*     | 210          | 326ms     | 202ms      | 1.61          | 119ms
41      | 7652413      | 2627ms    | 133ms      | 19.75         | 65ms
42      | 162          | 65ms      | 7ms        | 9.29          | 8ms
43      | 16695334890  | 38ms      | 2ms        | 19.00         | 2ms
44      | 5482660      | 384013ms  | 27744ms    | 13.84         | 6621ms
45*     | 1533776805   | 17ms      | 4ms        | 4.25          | 8ms
46      | 5777         | 2864ms    | 202ms      | 14.18         | 65ms
47      | 134043       | 400967ms  | 12838ms    | 31.23         | 4425ms
48      | 9110846700   | 46ms      | 16ms       | 2.88          | 6ms
49      | 296962999629 | 115ms     | 8ms        | 14.38         | 13ms
50      | 997651       | 3277ms    | 80ms       | 40.96         | 51ms
*These were very quick to run, so the times reported are totals over 10000 runs rather than averages.
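For the curious, this is roughly the shape of the timing harness implied above (a sketch with names of my choosing; the real suite lives in the repo):
// Average runtime of fn over several calls, in milliseconds.
function averageTime(fn, runs) {
  var start = Date.now();
  for (var i = 0; i < runs; i++) {
    fn();
  }
  return (Date.now() - start) / runs;
}

// Unstarred problems: averageTime(solve, 10).
// Starred problems: the total over 10000 runs was reported instead,
// i.e. the elapsed time without the final division.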

As you'll notice, standard Python gets its butt kicked. I was kind of saddened by this, but in the end I was just giddy that our web is faster because of it (90% of my life is digital) and that we can do faster scripting on the server side (attribute to Boris Smus) because of the node project (thanks, Ryan Dahl).

Standard Python is actually slower in 46 of the 50 problems. In 28 of those 46, node is faster by a factor of 10 or greater; in 9 of those 28, by a factor of 50 or greater; and in 2 of those 9, by a factor of 100 or greater! The only 4 problems in which Python was faster were from the n = 10000 sample (the starred rows). In fact, I was able to pinpoint exactly why:
  • #5 - My own Javascript implementation of gcd is slower than Python's native one (from fractions import gcd), resulting in a difference of 50 ms over 10000 iterations
  • #13 - The node package bigint is slower than Python's native long int (Javascript is slower by a factor of 6)
  • #20 - The node package bigint is slower than Python's native long int (Javascript is slower by a factor of 8.5)
  • #24 - Having to perform two slices is slower in Javascript than in Python, and there is no good way to remove just one element (resulting in a difference of 4 ms over 10000 iterations; see the sketch after this list, and a little bit about that here)
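To make the problem 24 point concrete, here is a hedged sketch (the function names are mine) comparing the two-slice removal with the mutating splice alternative:
// Remove the element at index i by gluing two slices back together;
// this allocates two intermediate arrays, which is the cost noted above.
function removeBySlice(arr, i) {
  return arr.slice(0, i).concat(arr.slice(i + 1));
}

// Array.prototype.splice mutates the array in place instead.
function removeBySplice(arr, i) {
  arr.splice(i, 1);
  return arr;
}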
So what, you ask, is the lesson I speak of? Well, Javascript didn't use to be this fast. How did it get that way? The brilliant and inspired people behind V8 rethought the Javascript compile steps, and after much work we now have code that is closer to the metal (attribute to Eric Bidelman; i.e., closer to machine code) than we have ever had before. The use of just-in-time compilation and other incredible techniques has taken a formerly slow and clunky language (Javascript), one used well beyond its originally intended scope, and turned it into a damn fast dynamic language. Hopefully this movement will make its way to Python and other dynamic languages, and we can all have our code end up this close to the metal.

Update: In response to the comments, I ran the same code on the same machine, but with PyPy in place of Python. This is the direction I hope standard Python goes in, and I commend the guys over at PyPy pumping out faster and faster just-in-time implementations. I went through and counted 20 node wins, 29 PyPy wins and 1 tie. (I reserve the right to update the post with more detailed metrics.) While I do commend them, the results don't really belong in this post because PyPy is still an offshoot. (However, as I understand it, they both have ties to C, as PyPy uses GCC to compile to bytecode and V8 is written in C++. Feel free to supplement my knowledge in the comments.)

Update: All benchmarking was run on my Mac Pro desktop with a 3.2 GHz quad-core Intel Xeon processor and 12 GB of 1066 MHz DDR3 memory. I used Python version 2.6.1, node version 0.4.9, and PyPy version 1.5 (running on top of Python 2.7.1 with GCC 4.0.1).

29 comments:

  1. It'd be interesting to see what PyPy could make of this.

  2. I'd love to see these run on PyPy.

  3. Ha! Sorry for the double post, just refreshed the page.

  4. I'll give it a run and update the post (when I get home from work)

  5. Wow, I never thought to compare the execution speed of a language like JS and Py. I'm quite impressed. Google and the open source community (Chromium) have done some serious work, huh?

  6. Alright all ye commenters, post has been updated, PyPy benchmarking data has been included. (Feel free to benchmark yourselves by cloning the code from http://code.google.com/p/dhermes-project-euler/.)

  7. Interesting insights, thanks for posting. It goes somewhat with what the Dropbox folks said -- "Make sure all your inner loops are in C".

  8. Hi,

    I don't think you understand how PyPy works -- PyPy turns Python directly into native code, without GCC. GCC (or another C compiler) is only required to compile PyPy itself, because C compilers have more practice turning C into machine code than PyPy; there is no C compiler involved at runtime.

  9. You are quite right, I did not understand that. I will update appropriately. That is exciting news! Do you happen to know if the PyPy people have any chance of getting JIT into standard Python?

  10. I wonder how a CoffeeScript implementation would compare: whether it'd be equal to the JavaScript results, or whether it'd be slightly faster/slower on average. I imagine that CoffeeScript would be an easier conversion from Python, and perhaps just easier to program in overall.

  11. CPython (aka 'standard Python') doesn't really have the right architecture to support full-fledged JIT, so efforts along those lines (first psyco, then Unladen Swallow) have ultimately run out of steam.

    So, with PyPy 1.5 offering 2.7 compatibility for almost all pure Python code and cpyext offering tolerable compatibility for many C extension modules, we point people seeking more speed in PyPy's direction instead.

  12. Now try nightly PyPy builds; the tests should show around 15% better timings.

  13. "...the results don't really belong in this post because PyPy is still an offshoot".

    Actually, V8 should be compared with PyPy, since both are just-in-time virtual machines.
    Comparing V8 to CPython is like comparing apples to oranges.
    By the way, PyPy is feature complete and mature enough to be used in production (for example, Quora is running on PyPy).

  14. ...However, as I understand, they both have ties to C, as PyPy uses GCC to compile to bytecode and V8 is written in C++

    @Luis, I am not trying to misspeak, as I really don't know much about PyPy. What is the best way to say it? I agree that PyPy and V8 are closer to siblings than to distant cousins or unrelated projects. In fact, "The Lesson V8 Can Teach Python and Other Dynamic Languages" has already been digested and run very far with by the PyPy people. What I'm really trying to say is that V8 is used by Chrome, which now has over 160 million users, so it is definitely mainstream. In contrast, though PyPy may be used by Quora, it is not part of the standard distribution of Python. (I know this may sound silly, because there are multiple mainstream JS engines vs. only a single point of truth for Python distributions.)

  15. @Nick

    Instead of building JIT into Python the way psyco tries/tried, why not take a more radical approach in the way of PyPy? How much compatibility would actually be broken? Is there anything wrong with consulting with PyPy or any other project working on JIT for Python and merging in their features for 3.3/3.2.2/etc.?

  16. A merge was impossible; that's the reason for the external project, PyPy.
    It's not about broken compatibility; it's more that the interpreter had to be rewritten in a way that is usable for a JIT.

    The best way to say it? GCC is used _only_ in the process of, hmm, producing the PyPy binary (the end of the translation process, in PyPy terminology). At code-execution time, binary code is produced internally by PyPy, without any external utility.

    btw. sorry for my engrish

  17. By the way, Pypy 1.6 "Kickass Panda" has just been released with some performance improvements. It would be great to see your updated benchmarks... Is it possible?

  18. Luis,

    See a side-by-side here for the two versions of PyPy:
    http://code.google.com/p/dhermes-project-euler/source/browse/side_by_side_PyPy.txt

    Are you proposing I use the 1.6 data instead of the 1.5 data in the table above?

  19. "Are you proposing I use the 1.6 data instead of the 1.5 data in the table above?"

    Well, yes, it would be great to see the newest release's data, to see if there were any visible improvements on these benchmarks. But I don't want to bother you... :-)

  20. It's no bother; in fact, I've done it already (and posted it at the link from my previous comment: http://code.google.com/p/dhermes-project-euler/source/browse/side_by_side_PyPy.txt). The improvements aren't all that noticeable (I'm sure the upgrades were intended for more than just loops and big lists), so I'm going to leave the original PyPy 1.5 results up.

  21. wow that is one heck of a spec you've got for your machine.

  22. There is a new PyPy release around the corner; if you can run the benchmark with a nightly [1] and maybe open some bug reports [2] for the slowest examples, it would be awesome. Also, it is interesting to note that PyPy is faster on Linux x86_64 than on Mac OS X. As I never run anything in production on OS X, I think the benches would be more relevant if you ran them on Linux (I also use OS X on my dev machine).

    [1] http://buildbot.pypy.org/nightly/trunk/
    [2] https://bugs.pypy.org/

  23. Leonardo,

    I did a new side-by-side and the results are uploaded to the project site [1]. Every machine I am running Linux on is at least four years old, so I ran it on my OS X desktop (as with all the other benchmarks). Note: the times are averages over 10 runs, except that the problems marked with * are simply the total time for 10000 runs. If you'd like to set up an automated script, the whole source (including the benchmark suite [2]) can be cloned [3].

    As for the actual results from [1], there are no noticeable slowdowns, but there are some noticeable speedups (some by a factor of 2). So, no bugs to file for now.

    [1] http://code.google.com/p/dhermes-project-euler/source/browse/side_by_side_PyPy1.6vNightly.txt
    [2] http://code.google.com/p/dhermes-project-euler/source/browse/python_code/benchmark.py
    [3] http://code.google.com/p/dhermes-project-euler/source/checkout

  24. Since benchmarks are always fun, how about you add a column for python-numba? Numba translates roughly-annotated Python (nothing like, and fully fallback-compatible unlike, Cython) into LLVM, which is then marvelously JIT-compiled into machine code.
    Initial tests show improvements of the same order of magnitude as Cython, with far less additional typing...

  25. I wonder if you could do an update with either PyPy 1.9 or 2.0.

  26. The code is open source.

    All you need do is run:
    https://code.google.com/p/dhermes-project-euler/source/browse/python/benchmark.py
