January 2012, George London

Dear Senator Gillibrand,

As one of your constituents, I’m writing to urge you as strongly as I can to oppose the passage of the Preventing Real Online Threats to Economic Creativity and Theft of Intellectual Property Act of 2011 (a.k.a. PROTECT-IP or PIPA).

Let me clearly explain why I oppose the bill.

I am a New York City-based entrepreneur. I am currently working to bring together information and metadata about music and to build a website that makes that information fun and easy to navigate and search. I am strongly resolved to provide only legal and non-copyrighted information (which means no pirated content, no enablement of piracy, and no intentional copyright infringement of any kind). If I’m fortunate, my business will grow to not only provide an invaluable service to music lovers everywhere but also provide high-paying jobs to hundreds (if not thousands) of people in the New York area.

So let me say unambiguously, without hyperbole or dramatization, that the passage of PIPA would very likely kill my business.

Here’s why:

1) I am financing this business using my very limited life savings. After years of working and saving, I have just enough to execute on my business plan. But complying with PIPA would require me to hire and possibly retain a specialized lawyer. That act alone would consume enough of my capital to seriously threaten my plans.

2) A major part of my business plan is to allow users to contribute information, including links. Complying with PIPA would require either developing custom monitoring algorithms or manually visiting each and every link to investigate. Either would cost more than I can afford at this stage without sabotaging my core business. PIPA allows Hollywood and the recording industry to abuse the power of government to unfairly and inefficiently foist the cost of protecting their own commercial interests onto vulnerable entrepreneurs like me.

3) PIPA forcibly sets entrepreneurs apart from our users by casting us as “hall monitors.” Successful entrepreneurs build – and depend on – thriving, passionate online communities by developing deep personal and emotional connections with their users. Nobody connects with a hall monitor.

4) PIPA establishes an infringement notification process that grants plaintiffs a disproportionate and unnecessary preemptive power to interrupt the operation of my website, in direct violation of my Fifth Amendment right to due process. The cost of fighting a groundless notification would almost certainly bankrupt me. And the RIAA has an established track record of abusing even the current, better-safeguarded DMCA-based system by serving inaccurate takedown notices for the primary purpose of harassing and stifling legitimate competition. Please see this linked article for a specific recent example (http://bit.ly/A6vASt).

5) I may be lucky enough to successfully start a business using my own savings, but scaling my business to become a large employer will almost certainly require outside capital. This is difficult enough in the current dismal economic climate. But it will become nearly impossible if I’m forced to ask venture capitalists or bankers to risk supporting a business that suddenly has enormous and unpredictable potential litigation liabilities. PIPA will almost certainly have a large chilling effect on investment in exactly the sort of technology our economy most needs to grow and to remain internationally competitive. Someone somewhere will build this business, but it won’t happen in America unless we take a much more balanced and forward-looking approach to protecting intellectual property.

The internet is a singular modern Wonder of the World, and it has brought America into an era of unprecedented and previously unimaginable intellectual and cultural richness. Even cultural realms which are supposedly threatened by piracy, like music and film, are experiencing a renaissance of creative output which is directly linked to the ever increasing availability of new (and old) ideas which are allowed to freely disseminate and cross-pollinate. Content providers absolutely should be fairly compensated for their labor, but with recognition of the simple reality that the internet has enabled an enormous cohort of new content creators (many of whom labor with no expectation of monetary reward) whose contributions also deserve to be respected and valued by our government.

Please do what you know is right and oppose both PIPA and any future version of the bill that contains provisions threatening the fundamental nature of the internet. Like so many parts of American society, the driving spirit behind the internet is sustained and nourished by creativity, ingenuity, openness, and good old-fashioned American freedom. I cannot possibly support or vote for any representative whose actions betray those crucial American values.

Senator Gillibrand – please don’t kill my dream. Please don’t stifle the creativity of hundreds of thousands of your voting constituents. And please don’t threaten the monument to human progress by peaceful collaboration that so many millions of people around the world have spent so many years working together to build.

I hope you’ll seriously consider what I’ve written here. I am not an uninformed reactionary; I am a constituent with a direct, tangible, personal stake in the outcome of this legislative process. I know you’re extremely busy, but I would very much appreciate a direct response to my concerns.

Sincerely,

George London

Founder & CEO of HypeJet.com

[Warning to non-technical followers…keep on walking. This is another obscure hyper-technical post. Also, forgive the bizarre looking images, but it’s not worth the effort to force Tumblr to show them correctly.]

So…you, like me, have spent the last few weeks playing around with various Semantic Web triple stores, trying to figure out which is best suited for whatever particular quasi-mysterious application you’re trying to build.

After an exciting but awkward and unfulfilling first time with Sesame’s native repository and a briefly passionate but now apparently fizzled relationship with Neo4j, you’ve finally found that special store that you’re ready to settle down with, at least until it stops scaling gracefully or something faster and better documented enters your field of view.

Let’s say you’ve even built a bit of application code, and have a cute little toy process running on your laptop that executes SPARQL queries against a locally hosted server.

Now what?

Well, if you’re like me, you just might want to start showing off your ugly little duckling to those friends and family who don’t know enough about technology to laugh at the inadequacy of your architectural endeavor.

So, naturally, you’re going to want to make your application publicly accessible.

Now with a normal Django or Rails web app using a MySQL database, deploying your demo is a snap using platform-as-a-service (PaaS) solutions like Heroku, which let you publicly deploy your application from git by typing in about three lines of code. Heroku has even recently added basic support for Java, so you can build your app right out of Eclipse and onto a Heroku server.

But what if you’re using some adapted open source code that builds with Ant instead of Maven? And more importantly, what if your SPARQL server needs a 25GB data file to answer queries?

Well then, my friend, you’re pretty much out of luck on the PaaS side. As far as I can tell, you have two choices:

1) Use an infrastructure-as-a-service (IaaS) provider like Amazon, which lets you spin up your own cloud-hosted servers. If you want to build a robust, scalable, secure solution, this is probably the way to go. And if/when I almost inevitably move in that direction, I will try to write a blog post explaining how to do this. But it requires quite a bit of “upfront investment” in learning how AWS works and how to create, boot and administer a Linux server that can run your code. Plus, it costs money if you want to use a non-trivial amount of computation or transfer and store a non-trivial amount of data.

2) You can do things the old-fashioned way, and turn your home computer into a web server that can handle SPARQL requests from the open internet. This has a lot of disadvantages – it’s probably insecure as all hell, your laptop has to be turned on and connected to the internet for it to work, and if you end up getting any real traffic, you’re going to be clogging up your bandwidth and CPU cycles handling SPARQL requests (plus many ISPs forbid you from running servers at home).

But rolling your own has a few trump card advantages.

First, it’s relatively easy (at least if you don’t have to figure out how to do it, which is why I’m writing this guide.)

Second, it lets you run your server with basically no additional configuration or porting or data uploading or anything – if you can run a SPARQL query against localhost, you can use your server as a remote host.

Third, it’s (nearly) free. You may elect to pay for a dynamic DNS service that costs $30/year (though there are free alternatives), but everything else uses software/services you already have.

So, here’s how to do it:

STEP 1: Make sure you have the pre-requisites

In theory, you can probably make this work with just about any computer and any internet connection. But for my purposes, I’m going to assume you have a configuration approximately similar to mine, i.e.:

OSX Lion

Running a SPARQL endpoint through a Tomcat server

Verizon FiOS or similar “always-on” internet connection, via a home router

STEP 2: Set up a static IP on your laptop

For this to work, you do NOT need a static IP from your ISP (which apparently costs extra). We’re going to use a service called “Dynamic DNS” that will let the internet find your network even when your ISP changes your IP address. But you do need a static IP on your laptop so that your router can figure out what to do with incoming traffic from the internet.

Here’s how to do this on Verizon FiOS if you have a standard Actiontec router. First, open your admin panel by going to 192.168.1.1 in your browser.

Enter your username / password (the default username is “admin” and the default password is, I think, the serial number of the router.) If you can’t remember your login, you can hard-reset the router by pressing the little reset button on the back for ten seconds. This will wipe your configuration, but these routers are pretty good at automatically setting themselves back up.

Now, inside your control panel, click “My Network”, then “Network Connections.” Find the entry for your local area network (in my case “Network (Home/Office)”), and click the little edit button in the rightmost column of the table. Scroll to the bottom and click “Settings”.

Now, find the line that says “End IP Address”. By default, this is set to something like 192.168.1.255. You need to set the last number to something less than 255 to give you some address space that’s not automatically assigned to devices connecting to your router. I set this to 192.168.1.100. Click “apply”.

For some reason, you can’t just give your laptop its own IP and expect the router to talk to it. So next we need to go into the “Advanced” heading on the router control panel and select “IP Address Distribution”.

Click “Connection List”. Then at the bottom of the table of connections click “New Static Connection”.

Type in a name for your laptop, the static IP address you want to use (should be something like 192.168.1.150), and the MAC address of your laptop. On OSX, you can find the MAC address by going to “System Preferences” -> “Network” -> “Wifi” -> “Advanced” -> “Hardware”. (I’m not going to show screens with my particular MAC and network details, to make it slightly harder to hack me.)

Go back to your router control panel and click “Apply”. Now, your laptop should attach to the router using the IP address you specified. If it doesn’t, try refreshing your IP by going to “System Preferences” -> “Network” -> “Wifi” -> “Advanced” -> “TCP/IP” and clicking “Renew DHCP Lease”. If that doesn’t work, restart your computer.
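Before you commit to a static address, it can help to sanity-check that it sits on your subnet but outside the range the router hands out automatically. Here is a minimal Python sketch of that check. The function name, the pool start address (192.168.1.2) and the /24 subnet are my assumptions for illustration; read the real values off your own router's settings page.

```python
import ipaddress


def is_safe_static_ip(candidate, pool_start, pool_end,
                      network="192.168.1.0/24", router="192.168.1.1"):
    """Return True if `candidate` is a usable static address.

    Usable here means: inside the subnet, not the network/broadcast/router
    address, and outside the router's automatic (DHCP) pool.
    """
    net = ipaddress.ip_network(network)
    ip = ipaddress.ip_address(candidate)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)

    reserved = (net.network_address, net.broadcast_address,
                ipaddress.ip_address(router))
    in_subnet = ip in net and ip not in reserved
    in_pool = start <= ip <= end
    return in_subnet and not in_pool


# Pool start of 192.168.1.2 is an assumed value; the end matches the
# 192.168.1.100 setting used in this walkthrough.
print(is_safe_static_ip("192.168.1.150", "192.168.1.2", "192.168.1.100"))
print(is_safe_static_ip("192.168.1.50", "192.168.1.2", "192.168.1.100"))
```

With the values from this walkthrough, 192.168.1.150 passes because it is above the shortened pool, while 192.168.1.50 is rejected because the router could still hand it to another device.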

STEP 3: Get a dynamic DNS provider.

You know those DNS servers on the internets that make it so that you can type www.google.com into your browser, and your computer magically starts exchanging packets with the servers at Google’s IP address, and the Google homepage magically loads?

Well, you can use that same basic technology to get around the fact that your ISP gives you an ever-changing address on the internet. The trick is a dynamic DNS service, which gives you a standard “whatever.mysite.com” URL, and automatically handles the nasty business of routing anyone who visits that URL to your router’s IP address. There are free services that do this, but they’re harder to use so I’m just using a fairly slick service called DynDNS (www.dyndns.com).

They require you to sign up for a “Pro-Trial” account which will start charging you after 14 days, but you can apparently cancel the account after a few days and still use them to route to ~5 IP addresses. They’re pretty simple to set up, but this video (http://revision3.com/systm/dyndns) covers the signup/setup process in detail, so I’ll refer you to it instead of repeating it here. At some point in the process, you’ll need to enter your router’s current IP address, and download a small client to your computer that will let DynDNS know if your IP address changes.
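Once the hostname is set up, a quick way to confirm it points where you expect is to resolve it and compare the result against the router address you entered. The sketch below does that with Python's standard library. The function name is hypothetical, and the hostname and address in the commented-out usage line are placeholders, not real values.

```python
import socket


def hostname_points_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if `hostname` currently resolves to `expected_ip`.

    `resolver` is injectable so the check can be exercised without
    touching the network; by default it does a real DNS lookup.
    """
    try:
        return resolver(hostname) == expected_ip
    except OSError:
        # Lookup failed, e.g. the name has not propagated yet.
        return False


# Example call (placeholder hostname and documentation-range address):
# hostname_points_to("whatever.mysite.com", "203.0.113.7")
```

If this returns False right after signup, it may just mean the record has not propagated yet, so give it a few minutes before assuming something is misconfigured.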

STEP 4: Set up port forwarding on your router.

Okay, so now the internet can find your router. But your router still needs to know what to do with incoming traffic. So if someone from the webs comes and gives the secret handshake, Mr. Router needs to send them to visit me. We do this with port forwarding.

Let’s go back to our router control panel. Click “Firewall Settings”. Click “Port Forwarding”. Pick your laptop out of the dropdown menu, and select “custom port” from the other menu. It seems that at least for me, Verizon blocks incoming traffic on port 80, the default HTTP port. But that doesn’t matter since it leaves high-numbered ports unblocked. So just enter a random high number like 60000 under port. Click add.

AND THAT’S PRETTY MUCH IT.

Now, anyone who visits “whatever.yoursite.com:60000/whatever” can access that resource on your local machine.
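To check that the forwarded port actually accepts connections, you can try opening a plain TCP connection to it. This is a minimal sketch using only the standard library; the function name is mine, and the hostname and port in the commented-out usage line are the placeholders from this post.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


# Example call (placeholder hostname from this post):
# port_is_open("whatever.yoursite.com", 60000)
```

One caveat: some home routers will not loop a request for your public address back into your own network, so a False result from inside your LAN is not conclusive. It is safer to run the check from a different connection, such as a phone hotspot. Also note that something has to be listening on that port on your laptop for the connection to succeed.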

If you actually want to run a SPARQL endpoint, there’s a bit more work to do. So,

STEP 5 (OPTIONAL): Configure Tomcat to deal with remote traffic

Most of the triple stores I’ve experimented with run as applications inside a Tomcat or Jetty servlet instance. If you don’t have one of those set up, you’re in for some not-particularly-fun work that’s way beyond the scope of this post (though you can try this post for a walkthrough of how to get started with a simple Sesame instance).

If you do have a Tomcat server running on your computer, you need one more step to actually use it as a SPARQL endpoint. Tomcat by default will run on Port 8080. We need to set it to run on whatever port we forwarded earlier on (e.g. 60000) so that traffic coming in on that port will hit the server and get a response.

To do this, you need to edit the “server.xml” file inside of your Tomcat installation. For me, the path to the containing folder is: /usr/local/apache-tomcat-7.0.23/conf

Inside of server.xml, look for the block that says:

<Connector port="XXXXX" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

And change XXXXX to whatever port you forwarded.
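After editing the file, you may want to double-check which port the HTTP connector is now set to before restarting Tomcat. Here is a rough Python sketch that parses the XML and lists the HTTP connector ports. The function name and the sample document are made up for illustration, and the filter assumes the connector's protocol attribute is either absent or starts with "HTTP"; connectors configured with a Java class name as the protocol would be skipped.

```python
import xml.etree.ElementTree as ET


def http_connector_ports(server_xml_text):
    """Return the ports of all HTTP <Connector> elements in a server.xml."""
    root = ET.fromstring(server_xml_text)
    ports = []
    for connector in root.iter("Connector"):
        # Tomcat treats a missing protocol attribute as HTTP/1.1.
        protocol = connector.get("protocol", "HTTP/1.1")
        if protocol.startswith("HTTP"):
            ports.append(int(connector.get("port")))
    return ports


# A cut-down, illustrative server.xml (not a complete Tomcat config).
SAMPLE = """<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="60000" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />
    <Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
  </Service>
</Server>"""

print(http_connector_ports(SAMPLE))  # prints [60000]
```

To run it against your real file, read the contents of server.xml from your Tomcat conf folder and pass the text to the function. The port it reports should match the one you forwarded on the router.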

And that’s actually it. Now when anyone sends a URL-encoded SPARQL query to “whatever.yoursite.com:60000/sparql” or whatever the appropriate URL is, the server will send back an appropriate response!
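For reference, here is roughly what "sending a URL-encoded SPARQL query" looks like from Python. This is a sketch under a few assumptions: the endpoint accepts the query as a `query` parameter on a GET request (the convention in the SPARQL protocol), the path really is /sparql on your store, and the server can return JSON results. The function names are mine.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_sparql_url(endpoint, query):
    """Return the GET URL carrying `query` as a URL-encoded parameter."""
    return endpoint + "?" + urlencode({"query": query})


def run_sparql(endpoint, query, timeout=10.0):
    """Send the query and return the raw response body as text."""
    request = Request(
        build_sparql_url(endpoint, query),
        # Ask for JSON results; your store may default to XML instead.
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder endpoint from this post; swap in your own hostname and path.
ENDPOINT = "http://whatever.yoursite.com:60000/sparql"
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
print(build_sparql_url(ENDPOINT, QUERY))
```

Only the URL-building step runs here. Calling `run_sparql(ENDPOINT, QUERY)` would make the actual request, which only works once the endpoint is really reachable.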

What this is useful for:

This is actually a pretty cool result in my opinion. While it is almost certainly not a good idea to run an openly accessible SPARQL endpoint off your home network because it could easily get hacked or flooded with traffic, you CAN use this method to make an endpoint available to trusted friends and rely on “security by obscurity”. As long as you don’t actually list the access address to your endpoint anywhere, you are probably not going to get bombarded with queries.

But the cooler result is that you can combine this architecture with a REST paradigm to build a fully publicly accessible application using a framework like Rails or Django, throw that up on Heroku, and route all the SPARQL queries to your server behind the scenes. If you have an always-on broadband connection and an old laptop lying around, you can throw Linux on the laptop, set up Tomcat, copy your data over there, and use that laptop as an always-on server to support your public-facing application.

That’s obviously not a scalable solution, but it is free, and way easier than trying to set up a whole AWS infrastructure. And if you’re only getting a handful of visitors to your public site each month, even an old laptop should be able to handle the traffic reasonably well.

Anyway, this walk-through is pretty configuration specific, but I imagine the process should be at least loosely analogous on any other setup. So hopefully this post will save you the trouble of figuring out how to do all this (which is definitely the hardest part). If you have any questions / problems / suggestions for how to do this better, just leave a comment or send me an email and I’ll try to help out or update the post!

Month: January 2012

My open letter to Senator Kirsten Gillibrand opposing the Protect-IP Act (PIPA)

[Quickstart] Turn your laptop into a public remote SPARQL endpoint (or pretty much any kind of public server).