Lost in JIT: 2012

Friday, December 7, 2012

Why I am no longer a voting member of the Python Software Foundation

I am from Europe. It's a place across a big body of water from the United States, generally to the east. There are a few key differences, among them the approach to democracy. We would typically have a large body of people making all the important decisions (like a parliament) and a small body of people making more mundane decisions (like a government). In a typical scenario, the government has considerably less power than the parliament, but it's also much more agile and hence better suited to making quick decisions. A good example is the budget: the government creates a budget that is then voted on by the parliament. As far as I understand, the idea is not to vote on every detail, but to draft a budget that the parliament then approves.

The PSF is almost like this. There is a large body of people (PSF members) and a small body of people (the PSF board). There is one crucial difference: the PSF members have power only on paper. The only votes that ever happen are for broadening the powers of the board, electing the board, or voting in new members. The board makes all the actual decisions.

This is not to say that the board makes bad decisions; I seriously cannot pinpoint a single time when that happened. I'm very happy with the board and with its policies. However, I don't feel I have any voting power. I perfectly understand why it is that way: the PSF members are a big group, and even finding a way for everyone to vote in a reasonable manner would be a mission. As a European, I think it's a mission worth attempting, but for now I will stay a non-voting member (also known as emeritus) and wait for the board to make a decision on everything.

Cheers,
fijal

Thursday, July 12, 2012

Call for a new open source economy model

DISCLAIMER: This post is incredibly self-serving. It only makes sense if you believe that open source is a cost-effective way of building software and if you believe my contributions to the PyPy project are beneficial to the ecosystem as a whole. If you would prefer me to go and "get a real job", you may as well stop reading here.

There is a lot of evidence that startup creation costs are plummeting. The most commonly mentioned factors are the cloud, which brings down hosting costs; Moore's law, which does the same; ubiquitous internet; platforms; and open source.

Putting all the other things aside, I would like to concentrate on open source today. Not because it's the most important factor -- I don't have enough data to support that -- but because I'm an open source platform provider working on PyPy.

Open source is cost-efficient. As Alex points out, PyPy is operating on a fraction of the funds and manpower that Google is putting into V8 or Mozilla into {Trace,Jaeger,Spider}Monkey, yet you can list all three projects in the same sentence without hesitation. You would call them "optimizing dynamic language VMs". The same can be said about projects like GCC.

Open source is also people - there is typically one or a handful of individuals who "maintain" the project. Those people are employed in a variety of professions. In my experience they either work on their own or for corporations (and corporate interests often take precedence over open source software), have shitty jobs (which don't actually require you to do a lot of work) or scramble along like me or Armin Rigo.

Let me step back a bit and explain what I do for a living. I work on NumPy, which has managed to generate slightly above $40,000 in donations so far. I do consulting on optimization under PyPy. I look for other jobs and do random stuff. I think I've been relatively lucky. Considering that I live in a relatively cheap place, I can dedicate roughly half of my time to other pieces of PyPy without too much trouble. That includes stuff that no one else cares about, like performance tools, buildbot maintenance, release management, making json faster, etc.

Now, the main problem for me with regard to this lifestyle is that you can only gather donations for "large" and "sellable" projects. How many people would actually donate to "reviews, documentation and random small performance improvements"? The other part is that predicting what will happen in the near future is always very hard for me. Will I be able to continue contributing to PyPy or will I need to find a "real job" at some point?

I believe we can come up with a solution that both creates a reasonable economy that makes working on open source a viable job and comes with relatively low overheads. Gittip and Kickstarter are recent additions to the table and I think both fit very well into some niches, although not particularly the one I'm talking about.

I might not have the solution, but I do have a few postulates about such an economic model:

It cannot be project-based (like Kickstarter). In my opinion, it's much more efficient just to tell individuals "do what you want". In other words -- no strings attached. It would be quite a lot of admin to deliver each simple feature as a Kickstarter project. There are shades of gray here: "do stuff on PyPy", for example, is a possible project that's vague enough to make sense.
It must be democratic -- I don't think a government agency or any sort of intermediate panel should decide.
It should be possible for both corporations and individuals to donate. This is probably the major shortcoming of Gittip.
There should be a cap, so we don't end up with a Hollywood-ish system where the privileged few make lots of money while everyone else is struggling. Personally, I would like to have a cap even before we achieve this sort of amount, at (say) 2/3 of what you could earn at a large company.
It might sound silly, but there can't be a requirement that a recipient must reside in the U.S. It might sound selfish, but this completely rules out Kickstarter for me.

The problem is that I don't really have any good solution -- can we make startups donate 5% of their future exit to fund individuals who work on open source with no strings attached? I heavily doubt it. Can we make VCs fund such work? The potential benefits are far beyond their event horizon, I fear. Can we make individuals donate enough money? I doubt it, but I would love to be proven wrong.

Yours, leaving more questions than answers,
fijal

Thursday, April 19, 2012

Call for a global Immigration Reform

I'm a technology nomad. We have traded camels for high-powered, fossil-fuel-burning
jets. I work from any place that has an internet connection, which is
typically within hundreds of meters of any physical location I happen to
be at. I create open source software that brings value to various people,
living mostly on loose change and scraps from large corporations.
It's not that much value (after all, who uses PyPy?), but the important part
is the sign: it's a small but positive change in the open source ecosystem
that in turn makes it cheaper to create software stacks, which end up in
young companies trying to make a dent in the universe. I'm a plumber fixing
your pipes, one of the many.

And I need a visa for that. I want to have a stamp in my passport that will
state all of the above and provide a few clues as to what it actually means:

I will not stay in your country for very long.

The exact place does not matter at all to me - it's all one big internet.

I will not seek employment at McDonald's, and I have a pretty good track record:
go read my Bitbucket contributions.

Open source is software you run into every day - and this is also because
of people like me.

And yet I'm failing. People running immigration are so detached from my reality
we don't even send postcards to each other. Every single border officer is
suspicious and completely confused as to why and how I do all of that.

How can we change it? How can we end the madness of pointless paperwork?

Cheers,

fijal

Tuesday, February 14, 2012

PyPy and its future challenges

Obviously I'm biased, but I think PyPy is progressing fairly well. However,
I would like to mention some areas where I think PyPy is lagging:
places where it is not living up to its promises, or where design decisions
simply didn't turn out as well as we had hoped. In a fairly arbitrary order:

Whole-program type inference. This decision has been haunting the
separate compilation effort for a while. It's also one of the reasons
why RPython errors are confusing and why the compilation time is so long.
This is less of a concern for users, but more of a concern for developers
and potential developers.
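
To illustrate why whole-program inference makes errors hard to localize, here is a minimal sketch. It is ordinary Python and runs fine on any interpreter; the assumption (stated in the comments) is an RPython-style annotator that demands one consistent type per argument across the entire program. The function names are made up for the example.

```python
# Plain Python accepts this program. Under whole-program type inference
# (assumption: an RPython-style annotator that requires a single
# consistent type for each argument), the two call sites conflict.
def double(x):
    return x + x


def main():
    a = double(21)      # x is inferred as an integer here...
    b = double("ab")    # ...and as a string here
    return a, b


if __name__ == "__main__":
    print(main())
```

Neither call site is wrong on its own. The conflict only exists once the whole program is analyzed, which is also why modules cannot easily be compiled separately.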

Memory impact. We have never scientifically measured the
memory impact of PyPy on examples. There are reports of outrageous PyPy
memory usage, but they're usually very cryptic ("my app uses 300M") and not
really reported in a way that's reproducible for us. We simply have to start
measuring memory impact on benchmarks. You can definitely help by providing
us with reproducible examples (they don't have to be small, but they have
to be open source).
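
A reproducible memory report could look roughly like the sketch below. It is not an official PyPy tool, just one way to get a number that can be compared across interpreters. It assumes a Unix-like system (the stdlib resource module), and workload is a hypothetical stand-in for the real application code.

```python
import platform
import resource
import sys


def peak_rss_kib():
    """Peak resident set size of this process, in KiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in KiB, macOS reports it in bytes.
    return peak // 1024 if sys.platform == "darwin" else peak


def workload(n=200000):
    # Hypothetical stand-in for the application being measured.
    return {i: str(i) for i in range(n)}


if __name__ == "__main__":
    before = peak_rss_kib()
    data = workload()
    after = peak_rss_kib()
    print("%s: peak RSS %d KiB before, %d KiB after"
          % (platform.python_implementation(), before, after))
```

Running the same script under CPython and under PyPy, and including the script itself in the report, gives something others can reproduce instead of a bare "300M".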

The next group are all connected. The fundamental question is: what to do
in the situation where the JIT does not help? There are many causes, but,
in general, PyPy is often inferior to CPython in all of these cases.
A representative, difficult example is running tests. Ideally, for
perfect unit tests, each piece of code should be executed only once, so
nothing runs often enough for the JIT to pay off. There
are other examples, like short-running scripts. It all can
be addressed by one or more of the following:

Slow runtime. Our runtime is slow. This is caused by a combination
of using a higher-level language than C and relative immaturity
compared to CPython. The
former is at least partly a GCC problem. We emit code that does not look
like hand-written C and GCC is doing a worse job at optimizing it. A good
example is operations on longs, which are about 2x slower than CPython's,
partly because GCC is unable to effectively optimize code generated
by PyPy's translator.
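
A rough way to compare long arithmetic between interpreters is a micro-benchmark like the one below. It is only a sketch: the function name, operand sizes and repeat counts are arbitrary choices, and the numbers it prints depend on the interpreter version and machine.

```python
import timeit


def long_arithmetic(rounds=2000):
    # Operands far beyond 64 bits, so every operation goes through the
    # runtime's arbitrary-precision integer code rather than machine ints.
    x = 3 ** 200
    total = 0
    for i in range(rounds):
        total += (x * (x + i)) % (x - 1)
    return total


if __name__ == "__main__":
    best = min(timeit.repeat(long_arithmetic, number=20, repeat=5))
    print("best of 5: %.4f s per 20 calls" % best)
```

Most of the time here is spent inside the runtime's long implementation, not in the loop itself, so the JIT has little to optimize away.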

Long JIT warmup time. This is again a combination of issues.
Partly this is one of the design decisions of tracing on the metalevel,
which takes more time, but partly this is an issue with our current
implementation that can be addressed. It's also true that in some edge
cases, like running large and complex programs with lots and lots
of megamorphic call sites, we don't do a very good job tracing. Because
a good example of this case is running PyPy's own test suite, I expect
we will invest some work into this eventually.
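
Warmup can be made visible by timing the same workload several times in one process, as in this sketch. The function names and sizes are invented for the example; on an interpreter without a JIT the batches should take roughly the same time.

```python
import time


def hot_loop(n=200000):
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def time_batches(batches=10):
    """Run the same workload repeatedly; return seconds per batch."""
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        hot_loop()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    for number, seconds in enumerate(time_batches()):
        print("batch %d: %.4f s" % (number, seconds))
```

On a tracing JIT the first batch pays for interpreting, tracing and compiling the loop, while later batches run the compiled code. A short script, or a test suite that runs each piece of code once, never gets past the equivalent of the first batch.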

Slow interpreter. This one is very similar to the slow runtime - it's
a combination of using RPython and the fact that we did not spend much
time optimizing it. Unlike the runtime, we might solve it by having an
unoptimizing JIT or some other medium-level solution that would work well
enough. Some effort has been invested, but, as usual, we lack enough
manpower to proceed as rapidly as we would like.

Thanks for bearing with me this far. This blog post was partly influenced
by accusations that we're doing dishonest PR claiming that PyPy is always fast.
I don't think this is the case and I hope I clarified some of the weak spots,
both here and on the performance page.

EDIT: For what it's worth, I don't mention interfacing with C here. That's not because I think it's not relevant; it simply did not quite fit with the other stuff in this blog post. Consider the list non-exhaustive.