[insert title here]

August 23, 2010

College, part 1

Filed under: Uncategorized — gdb @ 12:58 am

Have you ever sat back and tried to trace out the lines of fate that brought you to where you are today? While at home this week, I did exactly that. I spent time reflecting on both the major life decisions and serendipitous events that have molded me into the person I am today.

Before I proceed, I must present a disclaimer. This post is overtly egocentric. Any semblance of general life lessons among its contents is purely coincidental. So read on at your own risk; I cannot refund any time spent reading this post that you could have otherwise spent hacking.

Ever since attending the Canada/USA Mathcamp the summer before 8th grade, I had known I was going to end up at MIT. Mathcamp had been my first real glimpse of the awesomeness that can result from surrounding oneself with a group of motivated, enthusiastic students who all specialize in a similar field. And throughout camp, people had claimed that MIT was basically a year-round clone of Mathcamp. Once I returned to my high school, I could not help but dream of the day when instead I would return to a school where I would be constantly surrounded by nerds and geeks like myself who were as passionate about their field as I. Of course, I realized that not everyone at MIT was a mathematician, but the particular choice of field was far less important than the mere existence of this intensity.

When the time for college applications came around, I had submitted my MIT early application by the beginning of October. Since I was fixated on MIT, I barely even considered applying to other schools. Just to be certain, however, I decided to visit campus for Splash and partake in the overnight stay program.

Unfortunately, my first impressions of campus were quite negative, and my second, third, and fourth impressions weren’t much better. I recall being disappointed with the physical aspects of campus, with its bare pipes and uncomfortable classrooms. (I think the Splash classes I was in were in building 2 or so.) I had been expecting that the entire campus was going to look like the Stata center. However, my complaints about MIT’s aesthetic appearance paled in comparison to those about my exposure to its students.

My host for the overnight stay was a frat boy who lived in Baker. (For those who don’t know, Baker is the “social” dorm. Meaning, from what I understand, a lot of parties.) While there is nothing wrong with either of those attributes, neither of those lifestyles was what I was looking for. I recall meeting his physics-major roommate while he was working with someone on a physics problem set. The pair had already selected the appropriate equation and plugged in some variables, but they couldn’t figure out what to do from there. I looked at the paper, and immediately pointed out that they just had to cancel some common factors. And then they just stared at me. “How did you know how to do that?” they inquired, the awe apparent in their voices. It was at that moment that I realized that in fact not everyone at MIT was amazing at academics, even within their own chosen specialty.

The rest of my visit did little to convince me otherwise. My host talked largely of parties, athletics, and fraternities. I recall talking to another resident of Baker, who told me that after freshman year, half of the males go to the fraternities. The half remaining in the dorms, he said, are the antisocial ones. I had no intention of joining a fraternity, but nor did I relish the thought of spending my years in a dorm full of only antisocial types.

By the time I returned home, I had decided unequivocally that I was not after all destined for MIT. But I had no real idea where I did want to go. Hence, I elected to apply to six other schools (Harvard, Stanford, Princeton, Caltech, Duke, and Harvey Mudd) and decide later.

And lo and behold, after enough time passed, “later” became “now”. In the spring, I began playing the college visit game. I wrote off Princeton and Duke as being not a good fit for me, and so I didn’t bother visiting either. I took a California tour and visited the three California schools in one go. I had an amazing time at Stanford, meeting fascinating people and enjoying the great weather (except for these little caterpillars that were everywhere; they infested the trees, and if you parked your bike under a tree… anyway, I digress). I also met with several professors who painted an excellent picture of Stanford, and one of them gave me the email address of a math major (I recognized his name from the math competition world, so I knew he was quite good, but I had never met him) and recommended I send the student some questions.

And then I went to MIT’s CPW. I attended for three reasons: first, to make sure I really did not want to attend MIT. Second, to visit my friends who were going to be there. And third, to go hang around Harvard a bit.

At Harvard, I vaguely recall sitting in on a class and meeting with my admissions officer. However, what I most vividly remember is meeting with a computer science professor, and then walking out into the middle of a computer science event afterwards. Through chance, I had visited on the day when the computer science department advertises to its potential concentrators, and another professor had just finished explaining some of the intricacies of dual-concentrating with a field that does not require a thesis (fascinating material, of course). But most importantly, there were several older students around, with whom I struck up a conversation. It turned out they were math majors, and one was another name I recognized from the math competition world. We talked about both the academic Olympiads and what life at Harvard was like. Before leaving, some of the students gave me their email addresses, and they told me to email them if I had any further questions.

Long and hard I pondered. I narrowed my list of potential schools to Harvard and Stanford, but I could not choose between them. I felt that I had seen such a limited cross-section of each school, and I did not have enough information to make such a life-defining choice. But I had to choose. And finally, after much deliberation and expenditure of unhelpful advice (“pick one out of a hat, and if you feel disappointed by that choice, then switch to the other”) I made a decision. I chose Stanford.

I proceeded to proclaim my choice to the world. I emailed the people I had met with and told them of my choice. I marked Stanford as the school to send my AP results to. I did everything except actually submit my acceptance card.

A few days prior to the May 1st deadline, as I was filling out the Stanford acceptance card, I decided I may as well try to gather a little bit more data about each school. I emailed both the Harvard and Stanford math majors whose email addresses I had collected. The Harvard students all recommended that I attend Harvard. The Stanford student recommended that I attend Harvard. And so I reversed my earlier decision and chose Harvard.

Why would the Stanford math major recommend Harvard over his own school? He said that while he enjoyed Stanford, he felt that the very top math students are all at MIT or Harvard. It was very lonely for him. And based off of that information, regardless of how happy I would be at Stanford, I knew I would never get out of it what I really wanted.

Anyway, wow, I spewed a lot. If people are interested, I can also write about how I became a computer scientist, as well as how I ultimately ended up at MIT. If not, well, I won’t waste the bits :) .

August 15, 2010

Python signals + Python threads = awesome

Filed under: Uncategorized — gdb @ 1:37 am

Run the following program, and push control-c in the middle. What output do you expect?

#!/usr/bin/env python
import os, sys, time, threading
def do_fork():
    while True:
        if not os.fork():
            print 'hello from child'
            sys.exit(0)
        time.sleep(0.5)
t = threading.Thread(target=do_fork)
t.start()
t.join()

(Spoiler below.)

The output you were expecting is probably quite different from the output you ended up with. I get a tower of exceptions of the form

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.6/threading.py", line 484, in run
    self.__target(*self.__args, **self.__kwargs)
  File "foo.py", line 6, in do_fork
    print 'hello from child'
KeyboardInterrupt

at the rate of two every second. Namely, even though only one KeyboardInterrupt was sent (at the UNIX level, a SIGINT), all future children are being interrupted. I ran into this peculiar behavior while playing around with Python’s multiprocessing library, which fork()s a number of child processes, so this problem can happen in practice.

The man page for fork() specifies that children do not inherit pending signals, and one can check that a similar program written in C does not exhibit this behavior. So what’s happening here? In order to answer this question, I took a short source dive into Python.

A little grepping shows that PyOS_setsig in Python/pythonrun.c is responsible for setting up signal handlers. If one looks for its callers, this leads back to Modules/signalmodule.c. Here, one uncovers how Python deals with signals under the covers. When a handled signal signum is received, by default Python sets a flag in the Handlers[signum] struct. Then later a synchronous function is called which goes through the Handlers array and executes the Python-level signal handler. Note that this synchronous function runs in the main thread, because Python tries to guarantee that only the main thread will receive signals.
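The deferred, main-thread-only handling described above can be observed from Python itself. What follows is a minimal sketch, not code from the interpreter source: it assumes Python 3 on a Unix-like system, uses SIGUSR1 rather than SIGINT to keep KeyboardInterrupt out of the picture, and the names handled_in, on_signal, and send_signal are made up for illustration. A worker thread sends the signal, yet the Python-level handler should be run by the main thread.

```python
import os
import signal
import threading
import time

handled_in = []  # names of the threads that ran the Python-level handler


def on_signal(signum, frame):
    # Runs later, synchronously, once the interpreter notices the pending flag.
    handled_in.append(threading.current_thread().name)


def send_signal():
    # Deliver SIGUSR1 to our own process from a worker thread.
    os.kill(os.getpid(), signal.SIGUSR1)


signal.signal(signal.SIGUSR1, on_signal)
worker = threading.Thread(target=send_signal, name="worker")
worker.start()
worker.join()

# Give the main thread a chance to run the deferred handler.
deadline = time.time() + 5
while not handled_in and time.time() < deadline:
    time.sleep(0.01)

print(handled_in)
```

The list should name the main thread rather than "worker", which is the guarantee the paragraph above refers to.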

Now, what happens if a fork() occurs after a signal is received but before the check-for-pending-signals function is called? In this case, the child process is given an identical address space to its parent, including its parent’s Handlers array. Thus it will effectively inherit any pending signals.

To put the icing on the cake, in the above sample program, the main thread is stuck in an uninterruptible join(). This means that it will never take care of any pending signals, and all children are guaranteed to act as though they were KeyboardInterrupted. How’s that for an awesome interaction between threads and signals?
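One common way to keep the main thread responsive on interpreters where a bare join() cannot be interrupted is to join with a timeout in a loop. The sketch below is an assumption-laden illustration, not the patch mentioned in this post, and it does nothing about the inherited Handlers array itself; it only gives the main thread regular chances to process pending signals. The helper name join_interruptibly and the work function are hypothetical.

```python
import threading
import time


def join_interruptibly(thread, poll_interval=0.1):
    """Wait for thread while letting the main thread handle signals."""
    while thread.is_alive():
        # A join with a timeout returns periodically, so pending signal
        # handlers (such as the one raising KeyboardInterrupt) get to run.
        thread.join(poll_interval)


def work():
    time.sleep(0.3)  # stand-in for the forking loop in the sample program


t = threading.Thread(target=work)
t.start()
join_interruptibly(t)
print("worker finished:", not t.is_alive())
```

On newer Python 3 releases a plain join() can already be interrupted by signals on POSIX, so the loop mainly matters for older interpreters such as the 2.6 used here.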

For reference, I’ve reported this issue and submitted a patch. We’ll see what upstream thinks. Anyway, to ask an overly specific question, what surprising interactions have you found between components of high-level languages designed to simulate the layers below them?

August 9, 2010

Improving your python interactive session

Filed under: Uncategorized — gdb @ 5:08 am

One of the best parts of working in an interpreted language is that if one wants to do some quick testing (say, to check the signature of a method or verify the behavior of some code), one need only pull up an interpreter and type in the relevant bits of code. An interactive interpreter (or REPL) such as python or irb (the interactive ruby interpreter) makes this process even easier, allowing you to evaluate code line-by-line and see the result immediately.

Now, working in a REPL feels a lot like working in a shell, and unsurprisingly it is nice when it provides UI features to make editing more convenient. Some REPLs don’t bother to provide fanciness, such as handling of C-c or providing readline support. (The SMLNJ sml REPL is a fantastic example of this.) They are consequently quite cumbersome to work with. Fortunately, for such cases one can just use rlwrap, a readline wrapper that is invocable as rlwrap sml. However, while rlwrap does provide some nice readline history features, it can’t provide other features such as tab completion.

Now, the default settings for the Python interpreter are better than those of these rather underfeatured REPLs. When you type python, you should see a REPL with readline enabled. In particular, you have history and sensible handling of C-c. But I’ve always had two major nits with the interactive experience. First, there’s no tab completion. And secondly, your history does not persist across sessions.

Little known fact: with the appropriate configuration, one can make both of these happen. The interpreter will execute the file pointed to by the environment variable PYTHONSTARTUP, provided it is set. The Python documentation provides example code to enable both tab completion and saving of history.
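For a concrete picture, here is a minimal sketch of what such a startup file might contain. It is an illustration built on the standard readline, rlcompleter, and atexit modules, not a copy of the documentation's example; the history location ~/.pyhistory and the helper names are assumptions, and the readline module is assumed to be available on the platform.

```python
import atexit
import os
import readline
import rlcompleter  # importing this installs a completer into readline


def save_history(history_path):
    # Write the session's history out; ignore failures such as a read-only home.
    try:
        readline.write_history_file(history_path)
    except OSError:
        pass


def enable_completion_and_history(history_path):
    """Turn on tab completion and persist history to history_path."""
    readline.parse_and_bind("tab: complete")
    if os.path.exists(history_path):
        try:
            readline.read_history_file(history_path)
        except OSError:
            pass  # unreadable history should not break the prompt
    atexit.register(save_history, history_path)


# Hypothetical location for the history file.
enable_completion_and_history(os.path.expanduser("~/.pyhistory"))
```

To try it, save the file somewhere (say ~/.pythonrc.py, another made-up name) and point PYTHONSTARTUP at it from your shell's startup file.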

Long ago I created a Python startup file that does both of these as well as some other small setup tasks. It’s really made me start to enjoy my experience at the Python prompt. What dotfiles have you found to be useful for improving the user experience with a particular application?

July 29, 2010

Welcome to DefCon

Filed under: Uncategorized — gdb @ 8:46 pm

I arrived in Las Vegas for DefCon last night.  Exhausted from my travels, once in my hotel room I naturally pulled out my laptop wanting to get online.  I did not remember the SSID of the hotel’s network from last year, so I pulled up the list of available access points:

Available access points

As you can see, there were a number of Guest_Internet_Access access points.  There were so many that clearly they had to be the official access points of the hotel.  Seeing the name “lodgenet”, I vaguely recalled that as being the SSID of the network I had used last year, but I thought that maybe the hotel had transitioned to a new name.  So I opened up a browser and tried to navigate to “test.com”.  I was then redirected to the following page:

Ok, easy enough.  This page looked quite legitimate, and it was served over SSL.  Firefox indicated that it had a valid SSL certificate, and I even pulled up the details thereof:

But this was DefCon, and one can never be too cautious.  I looked at some of the branding:  copyright the “Guest Internet Access Corporation”?  That was an odd name for a company, especially one whose domain name was ip3networks.com.  I pulled out my iPhone and did some Googling for this name.  No hits.   I navigated to ip3networks.com, but no page loaded.  Something smelled rather suspicious.  I did find a reference to a company or a product of some sort called IP3 Networks, but it seemed to be owned by a different company.  So at this point, I was largely convinced that this was a scam.  To verify, I tried poking at the page a bit.  I pushed the submit button without filling in any data, but the original page just reloaded with no error message.  I then tried adding in some bogus data and submitted the page:

This was the final conclusive evidence that this was a scam. Look at the URL: this suggests an MVC framework such as Ruby on Rails (pulling up the source, the error div had id errorExplanation, which is a Railsism), and that I was the first person to submit such a form. The page was clearly not professionally made. Reloading the page with a GET request resulted in a 500 error. Clearly, someone had quickly cooked up a Rails app and dropped in a bunch of access points, hoping to harvest credit card numbers.

Anyway, needless to say I’m staying off the paid wireless: Guest_Internet_Access or other.  I’m currently on the DefCon wireless, which advertises itself as being one of the most hostile wireless networks in the world, but I’m taking care to tunnel all of my traffic.

I love DefCon.

July 24, 2010

If argparse is the future, I want to build a time machine so I don’t have to deal with it

Filed under: Uncategorized — gdb @ 6:01 pm

In some recent discussion on Zephyr, it was noted that since Python 2.7, the optparse library has been deprecated in favor of the newer and fancier argparse library. Newer and fancier means better, right? Well in this case, there’s a regression in the design, which will make the way Python handles command-line arguments quite different from how any other language does. This will lead to usability issues, certainly, and I fear it might even lead to security issues.

So what’s the problem, you ask? argparse has this slick “feature” where it will try to detect if a command-line argument looks like an option or an argument. So it will parse

./myprog -x -y

as having two options, -x and -y, and

./myprog -x foo

as having one option, -x, which was passed the value foo. Now, what happens if you actually want to give -x the value -y? It turns out you can do this, by passing it as

./myprog -x=-y

However, no other option parsing library behaves this way. The argparse docs claim that this behavior is desirable because the user probably made a typo and forgot to pass an argument to -x. However, this guesswork makes the library’s behavior far harder to predict, and it can lead to subtle bugs, including security bugs. In particular, this leads to the potential for “option-injection” attacks, where a user manages to pass a script an option it didn’t expect. For example, if I have a remctl script such as
#!/bin/sh
./myprog -x "$1" "$2"

that is callable by an untrusted user, one would expect that I have done my part to avoid unexpected behavior by properly quoting the user’s arguments. However, if the user were to pass, say, a “-o” and a “/etc/remctl/remctl.conf”, depending on my script’s internal workings, I may have just allowed him or her to inject an unintended option, which could lead to privilege escalation.
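The scenario above can be sketched in a few lines. This is a hypothetical stand-in for myprog, not a real program: it assumes -x is declared with an optional value (nargs="?"), that -o is an option naming a config file, and that build_parser and the "(no value)" marker are invented for illustration. With -x declared to require exactly one value, argparse would instead exit with an error when the next argument looks like an option.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for myprog's option handling.
    parser = argparse.ArgumentParser(prog="myprog")
    parser.add_argument("-x", nargs="?", const="(no value)")
    parser.add_argument("-o")  # hypothetical option naming a config file
    parser.add_argument("rest", nargs="*", default=[])
    return parser


# What the wrapper script effectively runs: ./myprog -x "$1" "$2"
user_args = ["-o", "/etc/remctl/remctl.conf"]
args = build_parser().parse_args(["-x"] + user_args)

# The quoted "$1" was treated as an option, not as the value of -x.
print("x =", args.x)
print("o =", args.o)
```

Under these assumptions, -x ends up with no value and the user-supplied strings are interpreted as the -o option and its argument, even though the wrapper quoted them correctly.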

So why, I ask, is this behavior in argparse? The only way to make your scripts safe is to pass all options in the form -x="$1", but this is unintuitive and nonstandard. Being nonstandard, doesn’t that defeat the point of having a standard option parsing library in the first place? So while argparse does look like it has a lot of genuine shiny, I’m not going to touch it with a ten foot pole unless this behavior is removed.
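To make the three invocations from this post concrete, here is a small sketch using argparse directly. It assumes a parser where -x takes an optional value (nargs="?") and -y is a plain flag; both declarations, and the "(no value)" marker, are illustrative choices rather than anything from a real program.

```python
import argparse

parser = argparse.ArgumentParser(prog="myprog")
parser.add_argument("-x", nargs="?", const="(no value)")
parser.add_argument("-y", action="store_true")

# ./myprog -x -y  : -y looks like an option, so -x gets no value.
two_options = parser.parse_args(["-x", "-y"])

# ./myprog -x foo : foo looks like an argument, so it becomes the value of -x.
with_value = parser.parse_args(["-x", "foo"])

# ./myprog -x=-y  : the "=" form binds -y to -x as its value.
dash_value = parser.parse_args(["-x=-y"])

print(two_options)
print(with_value)
print(dash_value)
```

Only the last form reliably hands -x a value that begins with a dash, which is why a wrapper script would need to write -x="$1" to be safe.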

(Note: after this Zephyr discussion, a few people filed this issue as a bug report, see http://bugs.python.org/issue9334. If no one else takes Steven up on his offer to review a patch, I might be convinced to write one myself.)

July 18, 2010

Source diving for sysadmins

Filed under: Uncategorized — gdb @ 5:50 am

See http://blog.ksplice.com/2010/07/source-diving-for-sysadmins/.

That is all.

July 11, 2010

Contributing back upstream

Filed under: Uncategorized — gdb @ 9:34 pm

This week, I made my first contribution to an open-source community besides SIPB or HCS, both organizations where I already know everyone who will be looking at my code. I submitted a bug report and filed a patch for an unrelated issue in Python’s multiprocessing library:

http://bugs.python.org/issue9207 (bug report)

http://bugs.python.org/issue9205 (patch)

When I submitted these items, I was literally shaking with nervousness.  Here I was, about to inject my observations and code into this unknown community.  What would happen if they rejected my changes outright?  Would I be labeled an outcast, forever a reject from the larger open-source community?  I felt that when I clicked “Submit”, I was really submitting a test whose results would determine my future.  I correspondingly spent far longer than was healthy composing those postings.

Fortunately, it looks like a multiprocessing developer is going to do something with both issues. I’ve had a bit of dialogue with him, and I feel that my thoughts have been valued. On the whole, it’s been a reasonable first experience in this field for me.

July 4, 2010

Why is software so poorly written?

Filed under: Uncategorized — gdb @ 6:38 pm

A few weeks ago I blogged about source diving, and how I’ve recently found myself reading other developers’ production code. My preliminary reaction is one of amazement. In particular, I have run into a surprising amount of code that is simply sloppy, broken in edge cases, or just poorly architected. I’m not claiming I could have done any better, but all the same I find myself shocked that the world’s standard for production code is not an order of magnitude higher.

I can’t help but wonder why this is the case.  I look at the engineered products around me, and none of them feel like they are lacking in design or execution.  Whether a can of tuna fish or a vending machine, manufactured goods always seem to have a functional perfection about them, conveying the impression that the designer thought through all of their uses, and the builder ensured that every last part worked to perfection.  The same holds true for buildings, airplanes, and cardboard boxes.  So why is it that for any given piece of software, it’s not a question of whether it’s buggy but rather just how buggy it is?  Why is code that is inefficient, or poorly designed, or just plain barely works running on every one of our machines?  And perhaps most importantly, why are the standards for software any different from those for the rest of the engineering world?

I think the answer is far from simple.  The obvious answer, that a lot of this code is written by hobbyists, does not really tell the full story.  In my experience, code produced by companies does not tend to be especially awesome.  I think the real answer has more to do with software being inherently more complicated than other fields of human engineering, coupled with the fact that it’s easy to ship new versions of the product if you mess it up the first time, as well as the fact that failures really don’t cause too much damage.

That being said, I’m still not really sure if software really is all that complicated, or whether that’s just an excuse that software engineers use to justify their laziness.  What are your thoughts?

June 28, 2010

The Curse of the Xtreme Virtual Machine Service

Filed under: Uncategorized — gdb @ 5:19 am

I’m beginning to think that XVM, one of the SIPB services that I maintain, is cursed. For some unknown reason, it seems that at every turn, XVM has managed to run into critical bugs in core pieces of its architecture. Even its hardware is not above failing in perhaps the most obscure ways possible.

I suppose one small consolation is that I know XVM’s troubles predate my involvement with the project, so I can conclude that the problem is not my proximity. Some time before I became involved, XVM kept being bitten by a bug in LVM. I don’t know the full details of the problem, but the solution involved allocating many times more metadata for the LVs than one should reasonably have to allocate. After that, XVM had some problems with the clustering software, which would deadlock in kernelspace while trying to coordinate access to its RAID. But with careful debugging in the first case and switching to a new clustering solution in the latter, XVM was functional once again.

Business was quite good for a time.  So good, in fact, that XVM ran out of capacity.  We had four hosts in the pool, and all of them were out of RAM.  Every day, we would receive email indicating that people were trying to create new virtual machines, but we had nowhere to boot them.  Our hosts were rock solid (and indeed, I think those four hosts have been up continuously for as long as I’ve been paying attention).

But one day, we received a grant to expand our capacity.  We bought some new hosts, and after some trouble getting them to boot (our temporary workaround: turn off ECC), we put them in the pool.  But try as we might, our clustering software kept spitting back an error, claiming it could not allocate memory to even list the logical volumes on the RAID.  And then to make matters worse, we noticed that the new hosts would crash after a few days.  With ECC on, they wouldn’t even boot.

When the semester wound down and we finally had time to debug, we sat down with our clustering software’s source and eventually traced the problem to a constant defining the maximum number of locks.  We bumped that a few orders of magnitude, and we haven’t seen any troubles since.  To make matters even more exciting, we couldn’t find a place in the source where that constant is actually used productively.  (One would expect that it defines the size of some static buffer, but the only usages we found were to throw errors if too many locks were taken out at a time.)

And then this past weekend, we noticed that one of our four new hosts, the only one we hadn’t yet configured, was actually able to boot.  We stuck in the drives from one of the failing hosts, and it still was able to boot.  We compared BIOS settings, and we noticed that one interesting difference was that the failing machine had “8-DIMM drive strength” disabled, while the succeeding one had it enabled.  Don’t ask me what that option means; it was undocumented, and some preliminary Googling has not revealed any useful information.  Turning this on allowed our other three hosts to boot.

And so at long last, everything has been resolved!  The curse has been lifted! …Or has it?  For alas, last night we realized that one of the new hosts had frozen.  Tonight we reproduced the error while having a serial console session active, and managed to obtain some interesting MCE output.

On the whole, I actually think I’m ok with how my involvement with XVM seems to be going.  There’s been a lot of frustration and heartbreak, but it’s also been an excellent learning experience, a continual source of new challenges, and it’s really awesome when we finally make some misbehaving component work properly.

June 20, 2010

Source diving

Filed under: Uncategorized — gdb @ 8:02 pm

A few days ago, I was confronted with a question about how Python’s multiprocessing library will behave when worker threads terminate unexpectedly. Without thought, I immediately began an interactive Python session and ran

>>> import multiprocessing
>>> multiprocessing.__file__
'/usr/lib/python2.6/multiprocessing/__init__.pyc'

(I can never remember where to look for system-wide Python packages, so I just let the system figure it out for me.)  I immediately began diving into the code I had located, searching the raw source of this unfamiliar system for details of its behavior.
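A related trick, offered here as a small sketch rather than as part of the session above: the standard inspect module can report the readable source file directly, which avoids landing on a compiled .pyc. The variable name source_path is arbitrary.

```python
import inspect
import multiprocessing

# __file__ may name a compiled .pyc (as in the session above on Python 2.6);
# inspect.getsourcefile asks for the readable .py source instead.
source_path = inspect.getsourcefile(multiprocessing)
print(source_path)
```

The printed path depends on where Python is installed, so it will differ from machine to machine.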

A few minutes into this, I paused. Why had my first instinct been to read the code rather than the documentation? Why on earth was I comfortable reading the code in the first place?

I remember for a long time being desperately afraid of other programmers’ code. I was a newcomer to the world of programming, and here were these lines of source written by far more competent programmers–lines of code that would be too complicated for my puny brain to comprehend. How could I, a mere mortal, hope to comprehend these gifts from the gods? Instead, when I needed to learn more about a system, my first and only line of defense was a Google search, hoping to hit a section of the documentation or an online forum where my question had already been anticipated and answered.

But at some point in the recent past (probably on the order of months), something has begun to change. For Python systems at least, I am more than comfortable immersing myself in others’ code. When I want to learn how one works, I know where to begin and how to mentally construct and traverse the chain of dependencies. I’m just beginning to do the same with some C systems, having source-dived Apache extensively and Git and openais/corosync very peripherally. In the future, I expect that as I gain a better understanding of how C systems are typically structured, I’ll feel more comfortable source diving other ones as well.

In any case, I’m not sure I can accurately determine what has led to this change.  It could be simply that I have recently had to understand undocumented aspects of systems’ behavior, but I don’t think that’s it.  Rather, I think this marks a turning point in my maturity as a programmer, moving past the stage of being unable or unwilling to peel back the abstraction layers when those abstractions are getting in the way.

Or it could be that I’m just becoming too lazy to Google.
