Tumblr Eats a Pelican: The Problem With Static Website Generators

Tl;dr: Don’t bother switching. Tumblr is better.

Guys, my blog is failing.

I’ve been blogging (via Tumblr) at http://www.rogueleaderr.com for two years and the Google analytics data speaks for itself:

Except for one random post about subverting Megabus wifi, my posts hardly ever see more than 200 hits (few of whom actually read to the end). I’m grateful for the readers I have, but I’m only getting marginally more amplification than shouting from a literal soapbox.

So I’ve decided to step up my blogging game

I’ve got some ideas I want to spread. And it’s time to find out whether no one else cares about those ideas or whether I’m just doing a bad job of broadcasting.

So I’m taking a few steps to step things up:

  1. Creating an email newsletter. (You can sign up here)
  2. Getting serious about promoting my blog posts.
  3. Redesigning my blog.

1 & 2 are topics for a different day. This post is about #3, my decision to switch from trusty Tumblr to a static website generator.

Spoiler: I will be wrecked by “what you know you know that just ain’t so…”

I should have listened to Mark Twain. That’s the main moral of this story. The longer story is that I let bad assumptions waste quite a lot of my time.

If you’ve spent any time on Hacker News et al., you’re probably familiar with the classic “hacker profile page”, as best exemplified by Tom Preston-Werner or Kenneth Reitz or Paul Graham. These inimitably classy and minimalist sites generally contain a short tagline-style bio of the hacker, a list of projects, a list of blog posts, and maybe a contact form or easter egg.

I may not have a tenth the skill of any of those chaps, but I know how to steal a good idea when I see one. I wanted my blog to look like theirs and so I assumed (falsely!) that I needed to build my blog the way they built theirs. That meant a static website generator since, after all, Tom Preston-Werner popularized the very idea of static website generators with his tool Jekyll.

If you’re not familiar, static website generators (hereafter “SWG’s”) let you store all the components of your website (including each individual blog post) as files in a directory (often in a markup language like Markdown). You run a script and, presto, everything is compiled into interlinked HTML files, which can be uploaded and served directly from a webserver or even for free from Github.
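To make the “run a script and presto” step concrete, here is a toy sketch of what a generator does. It is not how Pelican or Jekyll actually work; the function name `build_site`, the `.txt` input format (first line is the title) and the page layout are all assumptions made up for illustration.

```python
import html
from pathlib import Path


def build_site(source_dir: Path, output_dir: Path) -> list:
    """Compile every .txt post in source_dir into an HTML page plus an index.

    Hypothetical input format: first line is the title, the rest is the body,
    with blank lines separating paragraphs.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for post in sorted(source_dir.glob("*.txt")):
        title, _, body = post.read_text(encoding="utf-8").partition("\n")
        page_name = post.stem + ".html"
        # Escape the text so stray angle brackets cannot break the page.
        paragraphs = "".join(
            "<p>" + html.escape(p.strip()) + "</p>"
            for p in body.split("\n\n")
            if p.strip()
        )
        page = (
            "<html><body><h1>" + html.escape(title) + "</h1>" + paragraphs
            + '<p><a href="index.html">Home</a></p></body></html>'
        )
        (output_dir / page_name).write_text(page, encoding="utf-8")
        links.append(
            '<li><a href="' + page_name + '">' + html.escape(title) + "</a></li>"
        )
    # The index page is what makes the output "interlinked".
    index = "<html><body><ul>" + "".join(links) + "</ul></body></html>"
    (output_dir / "index.html").write_text(index, encoding="utf-8")
    return sorted(p.name for p in output_dir.glob("*.html"))
```

Calling `build_site(Path("content"), Path("output"))` would leave a folder of plain HTML files that any webserver can serve as-is. Real generators add templates, themes and Markdown parsing on top of this basic loop.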

Static is the move.

As I started researching SWG’s, I quickly realized that there are a lot of options. The most popular seems to be the Ruby-based Jekyll, the original heavyweight. I read Ruby at a 3rd grade level and SWG’s in general seem to require a lot of tweaking, so I was inclined to prefer a Python-based option.

There are plenty of those as well, but I found a glowing post about one called Pelican so, in the absence of any clear compelling differences among the options, I decided Pelican was the way to go. It’s pretty simple to set up a basic Pelican site and even to deploy it publicly on Github Pages (probably 30 minutes’ work). But a few minutes later, the problems started.

First, I wanted to create an “About Me” page. That’s quite easy, if you already know how to do it. If you don’t, the Pelican docs (and limited tutorials available online) require some pretty close study.

Next, I wanted to make my blog pretty. Pelican ships with some off the shelf themes. IMHO, they’re all somewhere between ugly and “meh.” So I hopped over to CreativeMarket (with a big discount from AppSumo burning a hole in my pocket) and looked for a theme. There are hundreds of WordPress and Tumblr themes for sale, but nothing for Pelican and only a few for “generic HTML site”. Before accepting the cost of customizing one of those to fit Pelican, I googled and found a generic Svbtle ripoff theme for Pelican.

Sure it’s not responsive, but only 7% of my traffic is mobile. So…goodbye forever iPhone losers!

I need to build a link to the past

Next problem: my blog gets a small but non-trivial amount of inbound search traffic. Simply shutting down the old site would kill all traffic to those archives. So I needed to import my posts from Tumblr. But as I’ve learned the hard way on my current project LinerNotes, Googlebot ain’t too bright. Even if I redirect the rogueleaderr.com domain to my pretty new static site, any change in the URL’s of my old posts will at least break any external links to my posts and at worst will cause Google to penalize my posts as both broken links and duplicate content. Plus, cool URI’s don’t change.

A site deployed behind a webserver like nginx could use URL rewrites to redirect incoming URL’s (assuming the URL slugs could be preserved). But…Github pages doesn’t allow redirects (except perhaps via Javascript; I never got that far.)

And I’m back to Tumblr

While googling for how to import from Tumblr to Pelican, I stumbled upon a Tumblr manual page and had a massive facepalm moment.

It turns out that Tumblr has a feature called “Pages” that lets you include and link to static pages right inside your Tumblog.

All you have to do is press a button and drop in a little HTML and you’ve got your own “About Me” page. In other words, basically all the customization power I was hoping to get from a static site is already available in Tumblr with none of the hassle of learning how to use a generator, managing the migration, and maintaining a separate site.

With about 45 minutes of work and the help of the Tumblr Svbtle Theme, I was able to create a site that looked and worked better than my Pelican site and preserved all my lovely links and Tumblr followers.

And it’s even responsive.

Moral of the story

Tumblr is a lot more powerful than I realized (as probably is WordPress, which I haven’t used). And Tumblr gives you an awful lot of bang for the setup buck. I’m now struggling to think of a use case that SWG’s allow and Tumblr doesn’t. (Maybe if you want lots of highly customized static pages, or if you want to run a webserver with complicated URL trickery.) If you’re publishing a lot of varied types of content, you’ll probably be better served by a fullblown CMS. SWG’s are only marginally less complicated to learn.

I’ve read worries about “what if Tumblr dies off like Posterous” – well, Tumblr allows export of your posts via an API, so you can deal with learning Pelican as soon as Yahoo writes off its $1bln investment.

One good argument I’ve heard is that storing a blog in a directory allows version control. That’s true, but I basically never change old posts. And it’s easy enough to save a copy of the HTML of a Tumblr theme before changing it.

All in all, the benefits of a Static Website Generator don’t outweigh the hassles.

Think I’m crazy? Let’s talk it out in the comments.

Did you like this post?

Then upvote it on Hacker News. Follow me on Twitter. Or subscribe to my newsletter.

Finding a co-founder is hard

Finding a technical co-founder for a new startup is notoriously hard. After struggling with it for a while, I decided to figure out exactly how hard it is.

To do that I simply estimated how many needles there are in the co-founder haystack. The answer is discouraging but intuitively accurate: there are only about two thousand potential co-founders in the entire United States.

How hard? Let’s use Fermi Estimation

In Fermi estimation, we wave our hands and spin minimal knowledge into approximate truth. The goal is to quickly get an estimate that’s close enough to correct for our purposes. For a link-bait blog post, “close enough” is not very close at all. And that’s close enough®.

We do Fermi estimation by breaking down our final quantity (number of viable co-founders) into a chain of constituent quantities. I model the problem like this:

Viable co-founders =

(% of programmers who are 2 standard deviations above the mean)

* (total number of programmers in the world)

* (% of world population living in the USA)

* (% of Americans in a startup appropriate age-range)

* (% of programmers who know a given technology)

Now we basically just guess each of those quantities (with a Google sense check if possible)

% of programmers who are 2 standard deviations above the mean = 2.5% (by math)

total number of programmers = 25 million (number of StackOverflow accounts)

% of world programmers in the USA = 5% (US pop as % of world pop)[1]

% of Americans in a startup appropriate age-range = 20% [2]

% of programmers who know a given technology = 30% (guess)

Now run the numbers. That all multiplies out to:

2.5% * 25,000,000 * 5% * 20% * 30% = drum roll 1875
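If you want to play with the guesses yourself, the chain of factors above is easy to reproduce. This is just a sketch of the same multiplication; the dictionary labels are my own shorthand and the numbers are the guesses listed above.

```python
from math import prod

# Each entry is one link in the Fermi chain, using the guesses above.
factors = {
    "share of programmers 2 SD above the mean": 0.025,
    "programmers in the world": 25_000_000,
    "share living in the USA": 0.05,
    "share in a startup-appropriate age range": 0.20,
    "share who know a given technology": 0.30,
}

viable_cofounders = prod(factors.values())
print(round(viable_cofounders))  # prints 1875
```

Because the estimate is a plain product, doubling any single guess doubles the answer, which is a quick way to see which assumptions matter most.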

So that’s it! Out of a US population of 319 million, a mere ~2000 people are viable co-founders. That’s just 1 in roughly 170,000 people.

Dang that’s hard.

Then factor in the fact that the vast majority of great programmers are already employed (or 3 years into starting their own successful company), and that only a small % will have your same interests and a compatible personality. It becomes easy to see why so many startups deliver bad technology or are torn apart by co-founder conflicts.

So…good luck!

P.S. If you’re looking for an opportunity and love music, metadata, and HCIR, shoot me an email at george j london on gmail. Or if you just like Fermi estimations, follow me on Twitter

July 8th Link Dump

So I’ve recently started sending a monthly newsletter to my close friends containing goings on about my life, plus a dump of the best links I found since the last letter. The goings-on aren’t open access, but maybe some of you will find the links interesting. So here they are!

Music:

  • Like Daft Punk’s “Get Lucky”? Here’s an awesome violin cover.

  • An awesome article on the raw stage presence of Father John Misty. (Here’s a preview: “The combination of melody, energy, and rage I’d just witnessed had manifested into something bigger than the sum of its parts, and I knew I’d seen authentic art – the kind that welcomed me to watch the explosion, but didn’t care if the scattered firework remnants singed my brow or impaled themselves in my gut on their way back to earth. You will be affected, was the promise, and beyond that, good luck.”)

  • Here’s a cute tool that lets you compose music in the browser.

  • An interactive map of just about every music genre out there.

  • June was a slow music discovery month for me, but here’s a playlist of my favorite songs I found.

Fun / Life:

  • I never thought I liked BMX, then I saw this. Unreal.

  • There’s an entire episode of the (amazing) TV show “Archer” where the titular character is replaced by a raptor that only makes dinosaur sounds. Watch here: holy crap dinosaur archer.

  • A map of the US with foreign-derived place names replaced by the names’ original meaning.

  • Wondering how Niral, Matt and I keep our apartment so cough immaculate? We use a “zone-defense” cleaning system, the administration of which I’ve recently automated.

  • This is a Japanese Yo-Yo Wizard.

  • Overwhelmed with links in your life? I recommend Pocket, an app that lets you save links for later and read them on your phone (even on the subway!)

Design / Marketing:

Tech:

  • Stanford is offering a technically-oriented MOOC that teaches you the hands-dirty essentials of how to build a basic web startup. Normally I’m skeptical on the time/value proposition of MOOC’s, but I’ve been watching some of the videos and this looks like exactly the class I wish I had when I was learning to program.

  • Pro-tip for Chrome: Command-click to open a link in new tab. Got that from an AppSumo video about Mac Shortcuts.

  • You can instantly make your website faster by automatically optimizing your image compression

  • Cool technical article on how the Echonest’s music algorithms understand “genre” to make the interactive map (above).

  • Advance your advanced python by learning about descriptors.

  • Best summary I’ve found about D-Wave, the company selling the world’s first (purportedly) quantum computer

George Related

  • Turns out “rogueleaderr.com” is not a good blog name. So I’m moving most of my blogging to the brand new “Urbem Futurum”. The name is meant to evoke a classically-grounded civic-oriented optimistic futurism. That’s the closest I can get to presenting my weltanschauung as a tagline.

  • Urbem Futurum has a first blog post. In honor of July 4th, here’s my explanation of why “Egypt needs the Federalist Papers”.

Until next time…

Hedging hogz

How to find co-founders / collaborators: advice for myself 2 years ago

[Today I read a mailing list question from an entrepreneur asking how to find developers who would work for no pay. That struck a chord since I wasted a lot of time trying to do that myself when I was starting out on LinerNotes. So I wrote this blurb that I wish I could send back in time to myself. Also NB that I’m still working solo, so take this with a grain of salt!]

The first thing you need to do is to reverse the way you’re approaching the problem.

Unless you’re extremely lucky, you’re not going to have a lot of success with “how can I get people to do what I want?” No one else cares what you want. They care what they want. So instead you need to think in terms of “how can I give other people what they want (and satisfy my own needs in the process)?”

No one is going to work for you for free (would you personally spend months helping a stranger out of the kindness of your heart?) They may work for no cash, but only if you can compensate them in some other way that’s as valuable as cash. So think about what the type of people you need (i.e. programmers) want and think about which of those things you can offer them. The best thing you can offer is traction (i.e. a hugely reduced risk of failure) because everyone wants a seat on a rocket ship. You might offer equity, but if you haven’t raised money yet then your equity is basically worthless.[1]  Or it might be some connection or hard expertise you can share. If you don’t have anything at all to offer, then you need to go focus on yourself for a while and figure out how to either develop yourself to be a more useful partner or how to acquire the resources you need to attract people.

Applying this framework will give you a “candidate archetype” of people with whom you can make a mutually agreeable transaction. Then figure out where that type of person hangs out (e.g. in the UT-Austin computer labs) and go talk directly to them. Computer people are friendly and most of them will talk to strangers as long as you’re humble in your initial contact.

If you’re talking to a lot of people and not finding anyone, the problem is probably with you. Either you’re talking to the wrong people (because you didn’t target well) or you don’t have enough to offer at this point as a partner.

[1] Rule of thumb – think about the amount of cash you’d sell your company for today, and assume the whole equity pie is worth less than 1/10 of that.

Future the Economy Part 4 – Computers and Productivity

Dear reader, this is an unusual day. For the first time (possibly) ever, I’m actually writing my next future the economy post while NOT airborne.

When I left you, we were talking about productivity. I claimed that productivity is the root of all wealth and nearly all improvements in the human condition. And I told you about the two ways that productivity improves:

  1. Doing the same thing faster
  2. Doing different things

It’s pretty obvious how computers let you do #1. Once upon a time, rocket scientists had to compute thousands of ballistic equations using slide rules. “Computer” used to be the job title of a person who simply did computations all day.

Now scientists can do the same thing, but faster. And Moore’s law tells us they’ll be able to do that about 50% faster every year. That’s pretty good, unless of course I want to do something that currently takes hundreds of thousands of years (e.g. perform a 1 second/byte calculation on 10 terabytes of data).

For that, I’m going to have to get creative. My only option is to choose an algorithm that does substantially fewer computations or does each computation substantially faster.

Computer scientists have a fancy way of talking about how long a certain type of algorithm will take. When you see expressions like “O(n^2)” (i.e. “big O notation”) or “polynomial time”, people are talking about how much slower an algorithm will run as the amount of input data increases. For some problems, like sorting a list of n integers, a good algorithm might take (n)*(log n) nanoseconds while a bad one takes n^2 nanoseconds. For a billion item list, that’s a difference of about 30 years.

By using the better algorithm, I’ve increased my productivity by about 4,000,000,000%.
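Here is a rough sketch of the arithmetic behind those figures. It assumes exactly one nanosecond per step and a base-2 logarithm, neither of which is a real measurement, so treat the result as order-of-magnitude only; a different log base moves the speedup by a small constant factor.

```python
import math

n = 1_000_000_000                      # a billion-item list
NS_PER_SECOND = 1e9
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

fast_ns = n * math.log2(n)             # the "good" n*log(n) algorithm
slow_ns = n ** 2                       # the "bad" n^2 algorithm

fast_seconds = fast_ns / NS_PER_SECOND
slow_years = slow_ns / NS_PER_SECOND / SECONDS_PER_YEAR
speedup = slow_ns / fast_ns            # how many times faster the good one is

print(f"good algorithm: about {fast_seconds:.0f} seconds")
print(f"bad algorithm:  about {slow_years:.0f} years")
print(f"speedup:        about {speedup:.1e}x")
```

Under these assumptions the good algorithm finishes in about half a minute and the bad one needs decades. The speedup is in the tens of millions, which is a few billion when written as a percentage.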

These examples are most dramatic in computer science but they apply just as much to normal life. Say I need to attend a meeting in Houston. That’s a ~20 hour travel ordeal OR a 1 hour Skype conversation. If they’re both equivalently good[1], I can 20x my productivity by choosing wisely.

Next time, we’ll get a little meta and talk about how to choose better algorithms.

[1] In real life, alternative solutions are rarely perfect substitutes. But they’re often close enough along the attributes that matter.

Postgres Fuzzy Search Using Trigrams (+/- Django)

When building websites, you’ll often want users to be able to search for something by name. On LinerNotes, users can search for bands, albums, genres etc from a search bar that appears on the homepage and in the omnipresent nav bar. And we need a way to match those queries to entities in our Postgres database.

At first, this might seem like a simple problem with a simple solution, especially if you’re using the ORM; just jam the user input into an ORM filter and retrieve every matching string. But there’s a problem: if you do

Bands.objects.filter(name="beatles")

You’ll probably get nothing back, because the name column in your “bands” table probably says “The Beatles” and as far as Postgres is concerned if it’s not exactly the same string, it’s not a match.

Users are naturally terrible at spelling, and even if they weren’t they’d be bad at guessing exactly how the name is formatted in your database. Of course you can use the LIKE keyword in SQL (or the equivalent ’__contains’ suffix in the ORM) to give yourself a little flexibility and make sure that “Beatles” returns “The Beatles”. But 1) the LIKE keyword requires you to evaluate a regex against every row in your table, or hope that you’ve configured your indices to support LIKE (a quick Google doesn’t tell me whether Django does that by default in the ORM) and 2) what if the user types “Beetles”?

Well, then you’ve got a bit of a problem. No matter how obvious it is to human you that “beatles” is close to “beetles”[1], to the computer they’re just two non-identical byte sequences. If you want the computer to understand them as similar you’re going to have to give it a metric for similarity and a method to make the comparison.

There are a few ways to do that. You can do what I did initially and whip out the power tools, i.e. a dedicated search system like Solr or ElasticSearch. These guys have notions of fuzziness built right in (Solr more automatically than ES). But they’re designed for full-text indexing of documents (e.g. full web pages) and they’re rather complex to set up and administer. ES has been enough of a hassle to keep running smoothly that I took the time to see if I could push the search workload to Postgres, and hence this article.

Unless you need to do something real fancy, it’s probably overkill to use them for just matching names.

Instead, we’re going to follow Starr Horne’s advice and use a Postgres EXTENSION that lets us build fuzziness into our query in a fast and fairly simple way. Specifically, we’re going to use an extension called pg_trgm (i.e. “Postgres Trigram”) which gives Postgres a “similarity” function that can evaluate how many three-character subsequences (i.e. “trigrams”) two strings share. This is actually a pretty good metric for fuzzy matching short strings like names.
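To get a feel for what a trigram “similarity” score means, here is a small Python approximation. It is a sketch, not pg_trgm’s actual implementation: it assumes the documented behaviour of lowercasing, splitting on non-alphanumeric characters, padding each word with two leading spaces and one trailing space, and dividing shared trigrams by total distinct trigrams. The function names are my own, and it only handles ASCII letters and digits.

```python
import re


def trigrams(text: str) -> set:
    """Return the set of padded, lowercased trigrams for each word in text."""
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(round(similarity("beatles", "beetles"), 2))  # prints 0.45
```

“beatles” and “beetles” share 5 of their 11 distinct trigrams, so they score about 0.45. That clears the 0.3 threshold that pg_trgm’s % operator uses by default, so the misspelling would still count as a match.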

To use pg_trgm, you’ll need to install the “Postgres Contrib” package. On ubuntu:

sudo apt-get install postgresql-contrib

**WARNING: THIS WILL TRY TO RESTART YOUR DATABASE**

then pop open psql and install pg_trgm (NB: this only works on Postgres 9.1+; Google for the instructions if you’re on a lower version.)

psql

CREATE EXTENSION pg_trgm;

-- check that it's installed:

\dx

Now you can do

SELECT *

FROM types_and_labels_view

WHERE label % 'Mountain Goats'

ORDER BY similarity(label, 'Mountain Goats') DESC

LIMIT 100;

And out will pop the 100 most similar names. This will still take a long time if your table is large, but we can improve that with a special type of index provided by pg_trgm:

CREATE INDEX labels_trigram_index ON types_and_labels_table USING gist (label gist_trgm_ops);

or

CREATE INDEX labels_trigram_index ON types_and_labels_table USING gin (label gin_trgm_ops);

(GIN is slower than GIST to build, but answers queries faster.)

That’ll take a while to build (possibly quite a while), but once it does you should be able to fuzzy search with ease and speed. If you’re using Django, you will have to drop into writing SQL to use this (until someone, maybe you, writes a Django extension to do this in the ORM.)
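For the Django case, “dropping into SQL” might look roughly like the sketch below. The helper `fuzzy_name_query` is hypothetical, not part of Django or pg_trgm; it only builds the query string and parameter list. Note the doubled %%: when parameters are passed, Django’s raw SQL layer treats a single % as the start of a placeholder, so the pg_trgm operator has to be escaped.

```python
def fuzzy_name_query(table: str, column: str, term: str, limit: int = 100):
    """Build a parameterized pg_trgm query; returns (sql, params)."""
    # Table and column names cannot be sent as query parameters, so only
    # accept plain identifiers here. The search term and the limit go
    # through %s placeholders and are escaped by the database driver.
    for identifier in (table, column):
        if not identifier.isidentifier():
            raise ValueError("unsafe identifier: " + repr(identifier))
    sql = (
        "SELECT * FROM " + table + " "
        "WHERE " + column + " %% %s "
        "ORDER BY similarity(" + column + ", %s) DESC "
        "LIMIT %s"
    )
    return sql, [term, term, limit]


sql, params = fuzzy_name_query("types_and_labels_table", "label", "Mountain Goats")
# In a Django project you could then run it with something like
#     SomeModel.objects.raw(sql, params)      # SomeModel is a placeholder name
# or  connection.cursor().execute(sql, params)
```

Keeping the user’s search term in the parameter list rather than pasting it into the SQL string is what protects this from SQL injection.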

And as a frustrating finishing note, my attempt to implement this on LinerNotes was not ultimately successful. It seems that index query performance is at least O(n) and with 50 million entities in my database queries take at least 10 seconds. I’ve read that performance is great up to about 100k records then drops off sharply from there. There are apparently some additional options for improving query performance, but I’ll be sticking with ElasticSearch for now.

[1] Sorry, Googlebot! Not sorry, Bingbot.