flowerhack

Something I've struggled with more and more these days is keeping up with all the stuff there is to read on the internet. Every day, there's an update from a friend's blog, or a cool article on a tech news site, or a post from my favorite blogger… I wind up letting myself get interrupted by all these irresistible information-trinkets, trying to skim each article quickly so I can move on to the next one, and searching for more once I've finished all my skimming, hungry for yet more little factoids.

One solution, obviously, would be to just cut down on the number of blogs and feeds that I subscribe to. I'm working on doing that, but it's a hard problem to tackle all at once—how do I decide which feeds are valuable, and which are useless? how do I handle a site which I really enjoy, but has a low signal-to-noise ratio? and so on.

If I could just filter these articles better, though, that'd be another solution altogether.

And it's a solution that works really well! I've been trying out a new article filtering system these past couple weeks, and while I hate to crow "success!" prematurely, I do feel like I'm spending less time "keeping up," and the articles I'm reading are more useful, and I don't feel weirdly anxious about missing out on news.

The idea's very similar to inbox zero, if you're familiar with that.

Basically: I browse the internet as I usually would, going through any blogs or RSS feeds and such that I like. When I see an article I'd like to read, though, instead of reading it right away…

1) I bookmark the article. (I use Pinboard for bookmarks but I'm sure other bookmark managers work too.) Most of the time, I try to tag the article with a relevant category right away—for instance, some tags I've used today are "https," "game design," and "security." If I'm pretty sure I'll want to read the article later on that day, it can remain untagged.

2) Once a day, I look at all my untagged bookmarks. All of these bookmarks must be read on the spot, or else tagged with a relevant category for later reading. No leaving untagged bookmarks lying around!

Using this method, I'm down to about 3-7 articles left untagged at the end of each day, which makes for about a half-hour of "keeping up" with the basic news and shorter pieces of the day. That's just about how long I spend commuting on the bus, so I can get my reading done while in transit. Awesome! It's like a little newspaper.

And the articles that were tagged for later wind up being more like little magazines: read less frequently, and specialized into different topic "groups" based on the tags.
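The newspaper/magazine split boils down to one rule: no tags means "inbox," any tag means "filed for later." Here's a tiny sketch of that rule in Python. The Bookmark class, the helper names, and the example.com URLs are all invented for illustration; this is not Pinboard's API, just plain data structures.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    # Hypothetical stand-in for a saved article.
    url: str
    tags: list = field(default_factory=list)


def inbox(bookmarks):
    """Untagged bookmarks: the daily 'newspaper' that must reach zero."""
    return [b for b in bookmarks if not b.tags]


def magazines(bookmarks):
    """Group tagged bookmarks by tag, one 'magazine' per topic."""
    groups = defaultdict(list)
    for b in bookmarks:
        for tag in b.tags:
            groups[tag].append(b)
    return dict(groups)


saved = [
    Bookmark("https://example.com/todays-news"),
    Bookmark("https://example.com/tls-deep-dive", ["https", "security"]),
    Bookmark("https://example.com/level-design", ["game design"]),
]

print(len(inbox(saved)), "left to read today")
print(sorted(magazines(saved)))
```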

Thus, when I'm in the mood to soak up some long-form journalism or some more technical reading (yay, lazy Sunday afternoons), I can pick one of those tags and spend an hour or two reading about a single topic. This lets me comprehend particularly technical articles better—rather than switching between wildly different areas of computing while going through all the "tech news" links of the day, I'll instead sit down and read several articles about, say, the logjam vulnerability at the same time, and do more of the "try it in your own terminal!"-type experimentation that I don't tend to do when I'm just trying to read over the day's news.

And at the end of the day, my "article inbox" (untagged articles) count remains fixed at zero, which leaves me feeling very relaxed and happy indeed :)

A couple weeks ago, I was reading a pretty excellent paper on internet password research. As I read, I found myself becoming vexed. Here was an eminently practical paper, giving very practical suggestions that any web developer could act upon right away... and yet I never would have read the thing at all if I hadn't had a friend in academia who happened to give me a link to it.

What other useful stuff has academia been doing without my noticing, and how can I find out about it?

Like (I suspect) most software engineers, I get my industry news/trends/updates/etc from a smattering of populist-ish sources such as: blogs of programmers I admire, blogs of friends, Hacker News (with reluctance), sometimes Slashdot, that sort of thing. If the password security paper had wanted to spread its ideas via one of those channels, it seems like it'd be pretty doable. Simply extracting the "take-away points" section of that paper into a blog post would make for solid reading, and a link to the full paper could be included for the curious.

I know that I would love it if more academics shared casually-worded summaries of their papers. Even when I'm planning to read the full paper, I'll try to ask someone else "so what's this actually about" before I do. Usually, the on-the-spot, casually-worded summary is more transparent than the abstract, and helps me direct my reading better.

There's also an argument to be made that perhaps software engineers as a whole should really have a source for industry news that doesn't involve upvotes or Randos On The Internet. I would be tremendously curious to hear from engineers in other fields about this—has some other group found a tidy compromise between social news sites and impenetrable academic journals? I suspect this sort of compromise is what the ACM Queue is trying to accomplish, but I only know about the Queue because of, well, another friend in academia. Maybe the Queue just needs a better PR department?

In any case, while not every field seems to be affected by social-industry-news-sharing the same way software is (medicine, for instance, requires that doctors complete X hours of professional development, done via formal exams, classes, and so on), this isn't a problem solely afflicting software engineering, either. I was prompted to write this post after stumbling onto an article about the Volokh Conspiracy, a blog by a handful of legal scholars that apparently has become just as influential as major law journals in shaping US legal thought (arguably more so, since they can offer faster feedback than the journals can). And as far as I can tell, arXiv has become the "open beta test" for papers in fields like math and astrophysics as well as CS, and I'll sometimes see arXiv posts linked on Facebook or whatnot.

I'd love comments from folks who understand how other fields handle this dilemma, or who have cool ideas I haven't thought of yet!

I spent a little while digging around in CPython recently, and thought I’d share my adventure here. It’s a bit of a riff on Allison Kaptur’s excellent guide to getting started with Python internals. I thought it might be neat to show how my own explorations went, step-by-step, so that perhaps other curious Pythonistas might follow along.


This is possibly a sign I'm a bit sleep-deprived at the moment (I did the waking-up-early-to-go-birdwatching thing this morning), but I found this bit from the Flickr API docs for their "photo search" function immensely charming:

[parameter:] accuracy (Optional)
Recorded accuracy level of the location information. Current range is 1-16 :
World level is 1
Country is ~3
Region is ~6
City is ~11
Street is ~16

Does this mean Flickr, at its lowest accuracy level, can distinguish between "photo taken on the moon" and "photo taken on earth"? That is the "world" level, after all... :)

I've been super-quiet on the Hacker School blogging and I hope to resume that soon; I've been so busy hacking and learning that I keep forgetting to blog, oops. Suffice to say I've been doing some rad stuff: yesterday I implemented a bitflipping attack on CBC mode encryption, today I spun up a quick Flask app that lets you search Bing via text message, now I'm working on a birding quiz app I've been planning to work on "someday" since April (eep!), and in between all that I've been learning Rust and RUST IS DELIGHTFUL FUN. I'll blather all about it in a post, for sure!

Today the upstart social networking site Ello reaffirmed their promise to never sell user data or ads. Which is good for them, I suppose, but the following line from their announcement made me frown:

With virtually everybody else relying on ads to make money, some members of the tech elite are finding it hard to imagine there is a better way.

But 2014 is not 2004, and the world has changed.

We... we had ad-free social networking in 2004. It was called "one of your friends got a Dreamhost and put some forum software on it and everyone hung out there." If the website got really big and popular, maybe the owner would ask for donations from the users, and usually folks would give enough to keep the place afloat, because everyone wanted to keep hanging out there.

It wasn't glamorous. It didn't give anyone rounds of VC funding or make anyone rich. Sometimes the site would crash from some "IPS driver error" and a grumpy teenager with the heart of a future sysadmin would crawl onto AIM at 2AM to tell everyone they were working on a fix.

But we existed. And for some reason I can't help but feel a little slighted. Ello didn't invent the concept of people hanging out online without ads. (Take, for instance, the very site you're on now, Dreamwidth: another great example of a community bootstrapping and sustaining itself.)

I had similar grumpy feelings when Pinterest was blowing up a few years back—not because of any ill will toward Pinterest, but because of the breathless, astonished tone reporters seemed to take when talking about Pinterest. In particular, they seemed staggered by the fact that the site's users were almost all women, bringing them together in a way never seen before, and how did Pinterest discover the secret of drawing women to the internet?!

And yet, the "social networks" I hung out on during my preteen and teenage years were composed almost entirely of young women. I'm not even sure why that was the case (we talked about gaming and tech a lot, which were supposedly "guy" interests when I was a kid), but it was a prevalent enough gender skew that, on the rare occasion when someone joined with an obviously male handle, we'd joke about how "but there are no boys on the internet!" We were there the whole time; we didn't just start using the internet when Pinterest came out.

I suppose it's the difference between a Social Network™ in the Facebook and Google+ sense, versus the "social networks" I remember. Those "social networks" were small, and never made front-page news (or any news at all), and were more concerned with keeping to themselves than recruiting new members. They were "social networks" in the "people getting together and hanging out" sense. But Social Networks™ are big, and self-promote, and have money and influence, because there are a lot more people on the internet nowadays and more money to be made.

Which is fine. I just don't think it should be billed as this Totally New Thing. All sorts of folks have been on the internet for a long while now. Let's acknowledge that, at least a little.

Also of interest: Paul Ford's tilde.club and "how LGBTQ nerds helped create online life as we know it."

I was reading Allison's blog post on how to start exploring Python internals, and one of the suggestions was: try implementing a Python library function without looking at it! I thought this sounded like splendid fun; also, one of the suggestions was namedtuple and I actually REALLY LIKE namedtuple but don't have occasion to use it often enough. So I dove in! Stuff I learned so far doing this:

Metaclasses! I already knew about these in a vague "it's like a thing that creates classes or something" sort of way, and since it seems like namedtuple creates class-like objects, I thought it'd be a good place to start. Probably the most interesting thing I discovered: the plain old type function, which I've always used just to check the types of objects, can also be used to dynamically create new classes! This seems like a super-odd and unintuitive dual functionality, and I found a throwaway comment that claimed this was due to historic/backwards compatibility reasons, but I wasn't able to determine what these reasons were. (Let me know if you know!)
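For the curious, here's a minimal sketch of the two faces of type(). The Bird class and its attributes are made up purely for illustration:

```python
# One-argument form: ask what type an object is.
assert type(42) is int


# Three-argument form: type(name, bases, namespace) builds a brand-new class.
def describe(self):
    return f"{type(self).__name__} with {self.legs} legs"


Bird = type("Bird", (object,), {"legs": 2, "describe": describe})

robin = Bird()
print(robin.describe())
```

The dictionary passed as the third argument plays the role of the class body, so functions placed in it become ordinary methods.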

With type() alone, you can create a pretty decent named tuple, which I coded up like so. Granted, it's (a) not a tuple at all, and (b) does some slightly frownyface manhandling of class properties, and (c) doesn't implement all the functionality of namedtuple... BUT, it does handle my most common use case for namedtuple, which tends to be: "Hey, I want a kind-of-throwaway class that'll be used only in a small section of the code—but that throwaway class will make what I'm doing SO MUCH MORE READABLE." Thus, tada! Instant objects with sensible properties!
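The linked code isn't reproduced in this text. As a rough sketch of the idea (not necessarily the same code), a factory built on type() might look something like this. The name janky_namedtuple and all of its details are assumptions for illustration:

```python
def janky_namedtuple(typename, field_names):
    """Build a simple class with one attribute per whitespace-separated field.

    Unlike collections.namedtuple, the result is not a tuple and is mutable.
    """
    fields = tuple(field_names.split())

    def __init__(self, *args):
        if len(args) != len(fields):
            raise TypeError(
                f"{typename} expects {len(fields)} values, got {len(args)}"
            )
        for name, value in zip(fields, args):
            setattr(self, name, value)

    def __repr__(self):
        pairs = ", ".join(f"{name}={getattr(self, name)!r}" for name in fields)
        return f"{typename}({pairs})"

    namespace = {"__init__": __init__, "__repr__": __repr__, "_fields": fields}
    return type(typename, (object,), namespace)


Sighting = janky_namedtuple("Sighting", "species count")
s = Sighting("robin", 3)
print(s.species, s.count)
```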

But for some reason I got to wondering: could you make a function that, say, knows to simply create a Foo when you call namedtuple('Foo', 'my properties'), rather than having to do Foo = namedtuple('Foo', 'my properties')? It turns out the answer is YES, but you have to do evil things to make it happen. Essentially, Python maintains dictionaries of variables for you—try typing globals() or locals() into your Python interpreter to see!

In order to auto-generate our Foo class, then, we want to add Foo to the local variable dictionary of the caller. (Meaning: if we're calling namedtuple('Foo', 'my properties') within our main method, we want Foo to be created in that main method, not just within the namedtuple call.) Turns out there's a sys._getframe function you can use to get, say, the current frame, or the parent frame... and then just tack Foo onto the parent frame and you're good to go!
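Here's a hedged sketch of the frame trick. One deliberate difference from the description above: it writes into the caller's global namespace (f_globals) rather than its locals, because writes to a function's local variable dictionary aren't reliably visible to the running function in CPython. The name declare_record is invented for this example, and sys._getframe is a CPython implementation detail.

```python
import sys


def declare_record(typename, field_names):
    """Create a simple record class and bind it by name in the caller's module."""
    fields = tuple(field_names.split())

    def __init__(self, *args):
        for name, value in zip(fields, args):
            setattr(self, name, value)

    cls = type(typename, (object,), {"__init__": __init__, "_fields": fields})

    # Frame 0 is declare_record itself; frame 1 is whoever called it.
    caller = sys._getframe(1)
    caller.f_globals[typename] = cls


declare_record("Foo", "x y")
foo = Foo(1, 2)  # Foo was never assigned here, yet the name resolves
print(foo.x, foo.y)
```

Linters will (rightly) complain that Foo is undefined, which is a decent hint about why this is a bad idea in real code.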

But that's all a terrible idea and you shouldn't do it. It's not good for you. It's not good for the planet. Don't be like me.

I've got an actual, good-for-the-planet implementation of namedtuple underway, so hopefully I can share a real gist of that with you all soon!

Edit: Ned pointed out that the super(self.__class__, self).__init__() call I had in my init functions for the janky and trolly tuples wasn't quite right—calling super on our hand-rolled class gets us a NoneType, so it doesn't really make sense to call it. I updated the code to be more correct now. Thanks, Ned!

Crypto challenge update: I can now decrypt repeating-key XOR and detect ECB encryption, woohoo! Now that I'm done with the first "set" of challenges, though, I think I'll take a bit of a break—they're super fun, and I'll come back to them later, but I want to start pairing more and explore some other things, too.

Tonight there was a round of presentations from other Hacker Schoolers and goodness they were awesome. Highlights included: Allison poking around to see how the recursion limit is implemented in Python and discovering amusing details therein, Eunsong's JavaScript-based molecular dynamics simulator, and Tanoy demonstrating both his live coding skills and his excellent taste in music by making a Jekyll blog and dropping it on Digital Ocean in less than the amount of time it takes to listen to one rap song.

To wind down this evening, I wanted to dust off my old Heroku account and deploy a Flask app there (I've been trying to move some things off my Linode, and this seemed like an easy one to handle), and ran into a bunch of annoyances with key management. The first key I tried to give Heroku was rejected because "that's already being used by another Heroku account," which suggests I've got yet another account on the internet I've forgotten about, oops. The second key I used authenticated fine, but I couldn't push to git—since my git is configured with a different key—so I had to edit a file in .ssh/config, but the change didn't seem to be helping, and eventually I figured out that I had both an id_rsa and an id_dsa key, and I was referencing the wrong one. Sigh, key management. Hopefully I won't forget about the existence of this Heroku account too, heh.

Alas, this post is late—I left my computer at Hacker School last night and thus couldn't post until I got back this morning. But I'm talking about what I did during Day 3 so this still counts as blogging every day, right?

Anyway! I made some real headway in the crypto challenges, which was satisfying, though, as one might expect, it turns out twiddling bits in Python is rather annoying compared to something like C. Python tries very, very hard not to let you operate on raw bits, so you end up doing a lot of awkward conversions. Like, for the task of "this hex string has been XOR'd against a single character; figure out what character that is," I wound up with some code that looked like this...

[chr(ord(byte) ^ key) for byte in hex_str.decode("hex")]

...which is (1) decoding the hex string, (2) reading that one byte at a time, (3) XOR'ing the value of the byte against the key, and (4) converting that back to a character representation. I wound up fumbling a bit getting those conversions nested correctly... I'm hoping to think of a more "systematic" way of handling these soon, maybe like a unicode sandwich for bit-twiddling. Or I could just convert everything to bitarrays and handle the problems that way; we'll see.
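A side note on versions: the one-liner above relies on Python 2's str.decode("hex"), which doesn't exist in Python 3. In Python 3, iterating over a bytes object yields integers, so most of the chr/ord shuffling disappears. A minimal sketch, with single_byte_xor as a made-up helper name:

```python
def single_byte_xor(hex_str, key):
    """XOR every byte of a hex-encoded string against one integer key (0-255)."""
    raw = bytes.fromhex(hex_str)         # (1) decode the hex string
    return bytes(b ^ key for b in raw)   # (2)-(4) bytes iterate as ints


# Round trip: XOR-ing twice with the same key returns the original bytes.
secret = single_byte_xor(b"birding".hex(), 0x58).hex()
print(single_byte_xor(secret, 0x58))
```

Keeping everything as bytes until the very end, and only decoding to text for display, is one way to get the "sandwich" discipline for bit-twiddling.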

I also spent the afternoon reacquainting myself with my faltering early attempt at implementing Raft in Python, which was last updated, uh, seven months ago. I hadn't realized I'd left it abandoned for so long! Definitely hoping to wrap that project up (or maybe just start over from scratch) before I leave New York...

First, a follow-up on yesterday's lulz with the eBird data: I lied a bit when I said it was a tar file that was being troublesome. The initial download was a tar file, which decompressed to a few README-ish files and a gz file, but the actual trouble came about when I tried to decompress the gz file, which contains the actual data.

I decided to see what gzip thought the size of the file should be when uncompressed, and, uh...

dhcp-0059526637-5b-99:ebd_relAug-2014 flowerhack$ gzip -l ebd_relAug-2014.txt.gz
         compressed        uncompressed  ratio uncompressed_name
         7232458369          2856865220 -153.2% ebd_relAug-2014.txt

Apparently gzip thinks my massive text file should be smaller once it's uncompressed??? (And definitely not >60GB like it tried to do?)
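One thing worth knowing here: the gzip format (RFC 1952) stores the uncompressed length in a 4-byte trailer field, ISIZE, as the size modulo 2^32, so for anything over 4 GiB the stored number wraps around. As far as I know, older versions of gzip -l report straight from that field. Whether that accounts for everything in the output above is an assumption, but it would produce exactly this kind of negative ratio. A sketch for poking at it, with stored_isize and candidate_sizes as invented helper names:

```python
import struct


def stored_isize(path):
    """Read the 4-byte little-endian ISIZE trailer of a single-member gzip file."""
    with open(path, "rb") as f:
        f.seek(-4, 2)  # 2 = seek relative to end of file
        return struct.unpack("<I", f.read(4))[0]


def candidate_sizes(isize, at_least, count=3):
    """Sizes congruent to isize mod 2**32, starting at the first one >= at_least."""
    size = isize
    while size < at_least:
        size += 2**32
    return [size + k * 2**32 for k in range(count)]


# Numbers taken from the gzip -l output above; assuming the data really
# compressed, the true size should be at least the compressed size.
for size in candidate_sizes(2856865220, at_least=7232458369):
    print(size)
```

The stored value only narrows the true size down to a family of candidates 4 GiB apart; actually decompressing (or streaming through the file and counting bytes) is the only way to know for sure.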


I decided I'd like to try and blog every day while I'm at Hacker School. This will make my blog updates a bit spammier than I normally like, but it also seems like a fun way for me to track my own progress and share what I'm up to with various interested parties, so!
