development and systems administration
Last month, January 2013, I started a new project called Sockethub, and I thought I’d write a basic overview of the goals of the project, along with some examples.
In brief: Sockethub implements a “polyglot” messaging service, for both social platforms like Facebook or Twitter and other systems like email and instant messaging. It assists open web-app developers by providing server-independent, server-side functionality.
Under the hood, it’s a socket server (Node.js + Redis) that takes requests from the app and carries them out. The web application specifies its intent using a defined JSON object, and Sockethub handles the protocol specifics. Sending private messages, publishing public posts, subscribing to feeds, chatting via XMPP: all of these are handled by Sockethub, with the web application using a consistent, unified JSON object to communicate with the server.
App developers need not concern themselves with the specifics of the platform they hope to integrate with. They simply define a message object and specify the platform they’d like to message on (e.g. smtp, facebook, xmpp), and the message will be sent. No protocol-specific code lives in the application.
Here’s what your app would send to Sockethub (via WebSockets) to deliver a message to someone on Facebook.
{
  "rid": "d93d3i3d90lamc03",
  "verb": "send",
  "platform": "facebook",
  "actor": {
    "address": "zuck"
  },
  "target": {
    "address": "barackobama"
  },
  "object": {
    "body": "it's haz not has"
  }
}
For the delivery of the message, that’s all that’s required. Note that “rid” means Request ID. It just needs to be a unique ID, chosen by the app, and Sockethub will include it whenever it sends back information about the request.
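As a rough illustration of the request format above, here is a minimal Python sketch that builds such a message and gives it a unique “rid”. The helper name build_message and the use of a random UUID are my own assumptions, not part of Sockethub; any string that is unique within the app should do.

```python
import json
import uuid


def build_message(platform, actor, target, body, verb="send"):
    """Return a Sockethub-style request as a JSON string.

    Hypothetical helper: it only assembles the fields shown in the
    example above and does not talk to a Sockethub server.
    """
    message = {
        # Any unique string works as a request ID; a random UUID is one option.
        "rid": uuid.uuid4().hex,
        "verb": verb,
        "platform": platform,
        "actor": {"address": actor},
        "target": {"address": target},
        "object": {"body": body},
    }
    return json.dumps(message)


# The resulting string is what the app would write to its WebSocket connection.
payload = build_message("facebook", "zuck", "barackobama", "it's haz not has")
print(payload)
```

The app would keep the “rid” around so it can match anything Sockethub sends back to the request that caused it.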
The Sockethub protocol uses JSON objects that loosely follow Activity Streams. A working draft of example types of Sockethub tasks can be found here.
Here’s an example of an email message:
{
  "rid": "klf9fwkw3ks0sdxme33",
  "verb": "send",
  "platform": "email",
  "actor": {
    "name": "Bob User",
    "address": "bob@example.com"
  },
  "target": {
    "name": "Foo Bar",
    "address": "foo@bar.com"
  },
  "object": {
    "body": "Hello World!"
  }
}
Leaving the implementation details of various platforms out of the application code frees the developer to focus on their application, and not on integrating any number of social networks, or pushing out a new release because of breaking API updates. Instead Sockethub worries about all that stuff (hooray!), speaking directly to all of the various protocols needed to get the job done, and just sends the message.
While still in its infancy, a lot of progress is being made. We currently have working demos for email and Facebook, with Twitter and XMPP in the process of being completed. There’s still a long way to go: we’d like to support user discovery/search, subscriptions (RSS, Twitter feeds, etc.), incoming emails (IMAP & POP), and much more.
So check it out, star it, clone it, ask questions, file bugs, request platforms/features - or help out and contribute code!
twitter: @sockethub
irc: #sockethub
github: http://github.com/sockethub/sockethub
Currently there aren’t many options when it comes to proxying WebSockets. Nginx doesn’t yet fully support WebSockets out of the box, though some people have opted to take an older version and patch it. I won’t go into why I eventually decided to go with HAProxy, but I will link you to an article which does a nice job of summarizing the current state of proxies and WebSockets.
Thankfully, there’s HAProxy. For some reason, however, it doesn’t seem to be so well known at the moment. There aren’t many overviews detailing its configuration, so I thought it would be useful to list a few common use cases and describe their setup.
Some potential ways to proxy to a WebSocket backend:
proxy based on sub-domain
proxy based on a URI
proxy using automatic detection
First, let’s get the top portion of our haproxy.cfg file out of the way. This is generally what I use for most configurations:
# this config needs haproxy-1.1.28 or haproxy-1.2.1
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    maxconn 4096
    chroot /usr/share/haproxy
    uid 99
    gid 99
    daemon
defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    option http-server-close
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000
One very important thing to point out in this config is the ‘option http-server-close’ line. Without it, some of the examples below can behave incorrectly. The option tells HAProxy to ignore the server’s ‘keepalive’ setting. If it were not specified, then in some cases the conditional rules (used below) would not be re-evaluated for each new request. Instead, HAProxy would reuse the previously established connection for the new request(s) and would fail to notice that a new request might be a socket request. In short, if you are using WebSockets in a mixed environment, always make sure ‘option http-server-close’ is set.
note: I previously used ‘option httpclose’, which disables keepalive on both the client and server, whereas http-server-close just disables keepalive on the server. Thanks to Willy Tarreau for pointing out the difference in the comments below. See the docs on http-server-close for more information.
Now, let’s move on to the various common types of configuration. For the sake of simplicity, we’re going to assume that your socket server is running on port 8000 on localhost. However, this could easily be a completely different server or port.
note: although HAProxy is great for load-balancing, in these examples I’m not covering that kind of setup and so I have no health checks running on the servers. Perhaps I will cover load balancing with HAProxy in a future blog post.
Some people like to have their WebSocket server running on an entirely separate sub-domain (e.g. websockets.example.com). This makes it easy to keep the services on completely separate machines, and also allows you to run your sockets on port 80 directly instead of using a second proxy on the sub-domain.
frontend public
    bind *:80
    acl is_websocket hdr_end(host) -i ws.example.com
    use_backend ws if is_websocket
    default_backend www
backend www
    timeout server 30s
    server www1 127.0.0.1:8080
backend ws
    timeout server 600s
    server ws1 127.0.0.1:8000
This will direct all traffic going to ws.example.com to your websocket server (in this example it’s localhost, but you could easily plug in the IP of a different server).
note: if you are using 1.5-dev or higher, you can use the ‘timeout tunnel’ option, which automatically sets a larger timeout for WS connections than for normal HTTP connections. This means you don’t have to set overly long timeouts on the client side. I haven’t used this option yet, which is why I don’t include it in the examples.
note: since the two ‘backend’ config blocks will not change for any of the examples, we’ll leave them out in the following two configurations.
An alternative to the above setup is to proxy based on the URI (e.g. example.com/websockets). This allows you to keep everything operating within the same virtual (or non-virtual) domain.
frontend public
    bind *:80
    acl is_example hdr_end(host) -i example.com
    acl is_websocket path_beg -i /websockets
    use_backend ws if is_websocket is_example
    default_backend www
This is how I usually set up my servers right now. The application I’m working on is, for the most part, a standard PHP application, but I use a Python socket server (Tornado) for handling certain tasks. We want to keep everything operating on the same domain, so when a socket connection is made with a specific URI we proxy it straight to Tornado; otherwise we send the request to our Nginx + Apache backend (see my previous blog post, 5 reasons to add Nginx to your LAMP stack right now).
The final example uses automatic detection of the websocket request by examining the HTTP headers for the Upgrade: WebSocket line. My personal preference is to use this along with a secondary test to make sure we really want to pass the request along to the socket server.
frontend public
    bind *:80
    acl is_websocket hdr(Upgrade) -i WebSocket
    acl is_websocket_server hdr_end(host) -i ws.example.com
    use_backend ws if is_websocket is_websocket_server
    default_backend www
This config will ensure that the request not only has the Upgrade: WebSocket header, but also that it’s accessing the correct location for the websocket server (ws.example.com).
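To make the three routing decisions easier to compare, here is a small Python sketch that mimics the ACL logic of the configs above. It is only a model of the rules, not how HAProxy works internally; the function names are hypothetical, and I am assuming that hdr_end and path_beg behave as plain suffix and prefix matches and that hdr is an exact match, all case-insensitive because of the -i flag.

```python
def by_subdomain(host):
    """Sub-domain rule: like 'hdr_end(host) -i ws.example.com'."""
    return "ws" if host.lower().endswith("ws.example.com") else "www"


def by_uri(host, path):
    """URI rule: both the host suffix and the path prefix must match."""
    is_example = host.lower().endswith("example.com")
    is_websocket = path.lower().startswith("/websockets")
    return "ws" if is_websocket and is_example else "www"


def by_upgrade_header(host, headers):
    """Automatic detection: Upgrade header plus the secondary host test."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    is_websocket = normalized.get("upgrade", "").lower() == "websocket"
    is_websocket_server = host.lower().endswith("ws.example.com")
    return "ws" if is_websocket and is_websocket_server else "www"


print(by_subdomain("ws.example.com"))
print(by_uri("example.com", "/websockets/chat"))
print(by_upgrade_header("ws.example.com", {"Upgrade": "WebSocket"}))
```

One thing the sketch makes visible: a suffix match on ws.example.com also accepts a host such as myws.example.com, so a stricter test may be worth considering if that matters for your setup.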
These are 3 common ways to configure HAProxy for WebSockets. Perhaps the setup you’ve found works best for you is different? Please share your comments, corrections or alternate configurations in the comments!
The Turpentine Ray, music video for the song “Space Station Mir”. Filmed in Prague, Czech Republic, by the talented Emile Rafael.
http://www.theturpentineray.com
http://theturpentineray.bandcamp.com
http://facebook.com/theturpentineray
The web has continued to grow at an ever-increasing speed, and along with that growth come increased demands on server response time. It’s becoming quite clear that, although Apache has been a long-standing pillar in the world of the web, it no longer makes sense as an ‘across the board’ solution to serving web pages.
This may seem completely obvious to some groups of people, while to others Apache is still that familiar face they don’t want to part with. Then there’s the common case of having an existing infrastructure built around Apache that is, more or less, operating acceptably. You can’t justify ripping out pieces just because there exists a (possibly) better solution. In these cases small, cautious steps are the only real way to go about changing things.
For a majority of people the transition is inevitable. To ease it, I suggest giving Nginx a try in a very non-intrusive way: as part of your existing server setup, without removing Apache altogether (just yet). Here are 5 reasons why you should add Nginx to your LAMP stack today.
OK, so this is definitely not a major reason. After all, you shouldn’t install something just because it’s easy. However I do think a lot of people tend to avoid new things out of fear they will have to spend hours learning new concepts and configuration oddities. This is most definitely not the case with Nginx.
I figured it would be best to mention this reason first, so that I can take you through the setup and lay the foundation.
On an Ubuntu system (12.04), setup took only a few minutes. First, install Nginx:
sudo apt-get install nginx
For the purposes of this article, I’m going to assume you are using Nginx version 1.1.x, which is what is packaged for Ubuntu 12.04. Otherwise some of the configuration options we add (like proxy_http_version) may not work.
The default configuration is fine as it is, though you will want to add some proxy directives to /etc/nginx/nginx.conf at the end of (but still inside) the http { } section:
http {
    ... default config skipped ...
    # proxy settings
    proxy_redirect off;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_max_temp_file_size 0;
    proxy_connect_timeout 90;
    proxy_send_timeout 90;
    proxy_read_timeout 90;
    proxy_buffer_size 128k;
    proxy_buffers 4 256k;
    proxy_busy_buffers_size 256k;
    proxy_temp_file_write_size 256k;
}
Now, you’ll want to set up your specific site config. On Ubuntu the setup is very similar to Apache. You have your /etc/nginx/sites-available/default config file. Here’s what mine looks like:
server {
    listen 80;
    server_name
    location / {
        proxy_pass http://127.0.0.1:8080/;
        proxy_redirect off;
        client_max_body_size 10m;
        client_body_buffer_size 128k;
    }
}
Remember, Nginx is going to act like a proxy. It will accept all incoming requests, and forward them to where they need to go. In this config we specify server_name as our front-facing domain name, listening on port 80. Then, for the location / directive, we specify the host & port that Apache will be listening on (I suggest localhost, port 8080).
Finally, we need to tell Apache to stop listening on port 80 and to listen on port 8080 instead. We do that with an easy change to /etc/apache2/ports.conf:
NameVirtualHost *:8080
Listen 127.0.0.1:8080
I’m assuming here that you have NameVirtualHost set up. If so, make sure to also modify your site config and change the port number there as well, in /etc/apache2/sites-available/default.
You’re done! Wasn’t that easy? Now just restart Apache and Nginx:
sudo /etc/init.d/apache2 restart
sudo /etc/init.d/nginx start
At this point, when you visit your site, Nginx should answer on port 80 and proxy the request to localhost:8080, which is where Apache now listens.
The first tweak you can make to boost performance right away is to set up Nginx to serve all static files for you. This takes load off Apache, as Nginx serves up static files much faster.
It’s extremely easy to set up. Just add the following directive to your /etc/nginx/sites-available/default config file:
# serve static files
location /res {
    root /var/www;
    expires 30d;
}
I keep all my static files in the /res directory off my webroot, but if you keep static files in several different directories, it’s easy to add multiple locations at once. For example:
location ~ ^/(images|javascript|js|css|flash|media|static)/ {
    # assumed: same root and expiry as the /res location
    root /var/www;
    expires 30d;
}
That directive will match any of those directories and serve their contents directly, instead of passing the request along to Apache.
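If you want to check which request paths that pattern would catch before touching the server, the same regular expression can be tried out in Python. This is just a quick sketch: the helper name served_by_nginx and the sample paths are made up for illustration, and it only tests the pattern itself, not Nginx’s full location-matching order.

```python
import re

# Same pattern as in the location directive above (case-sensitive, like "~").
STATIC_DIRS = re.compile(r"^/(images|javascript|js|css|flash|media|static)/")


def served_by_nginx(path):
    """Return True if the request path falls under one of the static directories."""
    return STATIC_DIRS.match(path) is not None


for sample in ["/css/site.css", "/images/logo.png", "/index.php", "/app/js/main.js"]:
    print(sample, served_by_nginx(sample))
```

Note that the pattern is anchored at the start of the path, so a nested directory such as /app/js/ is not matched and would still be passed along to Apache.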
Don’t forget to restart Nginx for the changes to take effect!
$ /etc/init.d/nginx restart
Nginx is very lightweight and efficient. When put to the test against Apache it repeatedly outperforms it, especially when it comes to handling concurrent connections.
This means that you are helping to increase the performance of Apache, as it won’t have to handle so many concurrent connections to serve up static content like images, css and js files.
This gives you immediate savings in resources for your server, even though you’ve actually added another layer to your stack.

graph from http://blog.zhuzhaoyuan.com/2012/02/apache-24-faster-than-nginx/
Nginx is a perfect choice for implementing things like long-polling or WebSockets. Having this proxy in place allows you to experiment with, and implement, new ways to get your content to the front-end. With Nginx proxying to an asynchronous Python+Tornado app, you can slowly start to build stand-alone additions to your existing web application.
For example, I’ve begun implementing WebSockets for an existing large-scale LAMP application. It would be impossible to do this with Apache + the existing PHP code (which is huge). However with Nginx proxying for me, it’s dead simple to start slowly improving things with a simple Python+Tornado WebSocket app to push new events to the front-end using JSON. All I need to do is add a simple directive to my /etc/nginx/sites-available/default site conf.
location /notify {
    proxy_pass http://127.0.0.1:8000;
    proxy_redirect off;
}
Once I fire up my Tornado service (which should be listening on port 8000), any requests to /notify will be proxied to the Tornado app, to handle any async calls I may need to carry out. This can help to speed up my existing PHP site by not only taking more load off Apache, but also allowing for smaller updates to be pushed to the user, instead of the user having to refresh to get them (or implementing some form of short-polling).
Let’s say you outgrow your VPS a year down the road. Already having Nginx in place will make things a lot easier on you.
You can define groups of servers in your nginx.conf, all of which provide the same content. Then specify that group to proxy to. So, for example, you could set up another webserver with the LAMP stack, and add it to your list of servers in the nginx.conf. Now Nginx will load balance between your listed servers, the existing one and the new one.
In this way, you can easily start to offload services until all you have left on your original server is Nginx proxying requests through to your new farm of webservers.
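To picture what balancing between the listed servers means, here is a tiny Python sketch of round-robin selection, which is the default way Nginx spreads requests over a group of servers. The backend addresses are placeholders I made up, and the sketch ignores weights, failures and everything else a real proxy has to deal with.

```python
from itertools import cycle

# Hypothetical backends: the existing LAMP server and a newly added one.
BACKENDS = ["127.0.0.1:8080", "10.0.0.2:8080"]


def round_robin(backends):
    """Return an endless iterator that hands out backends in turn."""
    return cycle(backends)


picker = round_robin(BACKENDS)
for request_number in range(4):
    # Each incoming request goes to the next server in the list.
    print(request_number, next(picker))
```

Adding a third server to the list is all it takes for it to start receiving its share of requests, which is the same idea as adding another server line to the group in nginx.conf.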

image from http://scalr.net/blog/feature/2-1-feature-highlight-scale-from-one-to-many-servers/
I’ve always been a huge fan of Apache, but over time I’ve found myself in more varied development environments, and surprisingly I think the first performance hiccup is often not the database, or the code, but usually Apache (I have no proof to back up this claim:). This is because Apache tries to solve too many problems with one tool, and 90% of the problems it tries to address only apply to 10% of the scenarios.
The reason Nginx and Lighttpd are seeing increased usage is that they address a few things really, really well. These few things are usually all we need for a website.
What I’ve found really helps with this transitional approach is that you can begin to test out your existing web application on Nginx, test the waters, and see if everything works correctly. If so, you can eventually phase out Apache altogether (if that’s what you decide is appropriate). Alternatively, you can just leave Apache there, because the benefits of having Nginx in the stack still completely outweigh the downside of adding one more piece to the puzzle.
This page has been translated into Spanish by Maria Ramos from Webhostinghub.com/support/edu