An HTTP proxy with filters

This idea was sparked by Channel 4’s online player 4OD. It makes you watch adverts as if you were watching television: some before, some after, and even some in the middle. It makes sense, I guess; they still have to get paid. Except all this is happening on my computer, so I’m sure I can do something about it.

First thought: ad blockers. They don’t work. The adverts all play inside a Flash object that I still want to use (to watch the actual content), and all the ad blockers I can find are pretty simple: they just block URLs.

Some WebKit “Web Inspector”ing and a bit of wiresharking later and I find the URLs of the adverts being shown. They’re all from CDNs and have complicated session URLs, so I make a general pattern for their URLs to give to an ad blocker. When the video files are blocked, the 4OD player declares there is a problem (which I guess there is really), and stops playing. So it’s going to be more complicated than that.

More wiresharking. Let’s find out where the Flash object is getting these video URLs from. From an XML document, with a nice clear structure showing where the adverts are and when to play them. If only there were a way of just removing that <adverts> tag. A browser extension (like Greasemonkey or Safari extensions) could change the content of stuff fetched by the browser, but the XML document is being requested and processed by the Flash plugin, so that’s out. Some kernel extension to intercept network calls? Sounds messy and beyond me. A proxy, except a misbehaving one that alters content before it forwards it (like those stories of American ISPs that do the opposite of what I’m trying to do and insert ads)? That could work. Is there something that already does this? Maybe, but I’m not sure it’s configurable enough, and it costs money. So that’s out. Can I make one? Good question.

I’d been hearing a lot about Node.js, a platform built around JavaScript and designed to bring JavaScript’s evented model (as seen in AJAX) to general programming, especially IO. JavaScript was a language I already had a lot of experience with, and Node had libraries for HTTP requests and responses. And the evented model was supposed to make stuff like this fast and efficient. So it should be perfect.

The way proxies work is pretty intuitive, and there’s plenty of documentation about it. Making a simple one in Node took less than 50 lines, and it worked! Adding filtering rules was simple: I didn’t need to configure my proxy, I could just rewrite it however I wanted! The first thing I tried was removing the <adverts> tag from responses whose URLs matched the 4OD XML info document, and it worked. The adverts simply disappeared! A bit of smartening up and I had an easy to use (for me) system to handle any task like this whenever I need it!
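
For anyone curious what that looks like, here is a rough sketch of the idea rather than my actual code. It is a plain-HTTP forwarding proxy built on Node’s http module, with a list of rules that can rewrite a response body before it is passed on. The names stripAdverts, createFilteringProxy and exampleRules are made up for illustration, and so is the URL pattern, which is not the real 4OD address. It assumes the XML is UTF-8 text and it uses a regular expression where a proper XML parser would be more robust. It doesn’t handle HTTPS (CONNECT) at all.

```javascript
'use strict';
const http = require('http');

// Remove every <adverts>...</adverts> element (and any self-closing
// <adverts/>) from an XML string. A regex is a shortcut that works for
// simple, non-nested markup; the element name comes from the post.
function stripAdverts(xml) {
  return xml
    .replace(/<adverts\b[^>]*\/>/g, '')
    .replace(/<adverts\b[^>]*>[\s\S]*?<\/adverts>/g, '');
}

// Hypothetical rule list: `match` is tested against the requested URL and
// `rewrite` transforms the response body. The pattern is an assumption.
const exampleRules = [
  { match: /\/example-player\/info\.xml/, rewrite: stripAdverts },
];

// Build (but do not start) a proxy server that applies the given rules.
function createFilteringProxy(rules) {
  return http.createServer((clientReq, clientRes) => {
    // A client configured to use an HTTP proxy sends the absolute URL.
    let target;
    try {
      target = new URL(clientReq.url);
    } catch (err) {
      clientRes.writeHead(400);
      clientRes.end('Expected an absolute URL');
      return;
    }
    const rule = rules.find((r) => r.match.test(clientReq.url));

    const headers = { ...clientReq.headers };
    // Ask for an uncompressed body when we intend to edit it.
    if (rule) headers['accept-encoding'] = 'identity';

    const upstreamReq = http.request(
      {
        hostname: target.hostname,
        port: target.port || 80,
        path: target.pathname + target.search,
        method: clientReq.method,
        headers,
      },
      (upstreamRes) => {
        if (!rule) {
          // No rule matched: stream the response through untouched.
          clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
          upstreamRes.pipe(clientRes);
          return;
        }
        // A rule matched: buffer the whole body, rewrite it, then send it.
        const chunks = [];
        upstreamRes.on('data', (chunk) => chunks.push(chunk));
        upstreamRes.on('end', () => {
          const body = rule.rewrite(Buffer.concat(chunks).toString('utf8'));
          const outHeaders = { ...upstreamRes.headers };
          delete outHeaders['transfer-encoding'];
          outHeaders['content-length'] = Buffer.byteLength(body);
          clientRes.writeHead(upstreamRes.statusCode, outHeaders);
          clientRes.end(body);
        });
      }
    );

    upstreamReq.on('error', () => {
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end('Bad gateway');
    });

    // Forward the request body (if any) to the real server.
    clientReq.pipe(upstreamReq);
  });
}
```

To try it, call createFilteringProxy(exampleRules).listen(8080) and point the system or browser HTTP proxy setting at localhost:8080. Buffering the whole body before rewriting is fine for a small XML document, which is why unmatched responses (like the video files themselves) are streamed straight through instead.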