development and systems administration

February 27, 2013 1:51 am

Sockethub - The polyglot approach to the federated social web

Last month, January 2013, I started a new project called Sockethub, and I thought I’d write a basic overview of the goals of the project, along with some examples.

In brief: Sockethub implements a “polyglot” messaging service for social platforms like Facebook and Twitter, as well as other systems like email and instant messaging. It assists open web-app developers by providing server-independent, server-side functionality.

Under the hood, it’s a socket server (Node.js + Redis) that takes requests from the app and carries them out. The web application specifies its intent using a defined JSON object, and Sockethub handles the protocol specifics. Tasks such as sending private messages, making public posts, subscribing to feeds, and chatting via XMPP are all handled by Sockethub, with the web application getting a consistent, unified JSON object to communicate with the server.

App developers need not concern themselves with the specifics of the platform they hope to integrate with; they simply define a message object, specify the platform they’d like to message on (e.g. smtp, facebook, xmpp), and the message is sent. No protocol-specific code lives in the application.

Examples

Here’s what your app would send to Sockethub (via WebSockets) to deliver a message to someone on Facebook.

{
  "rid": "d93d3i3d90lamc03",
  "verb": "send",
  "platform": "facebook",
  "actor": {
      "address": "zuck"
  },
  "target": {
      "address": "barackobama"
  },
  "object": {
      "body": "it's haz not has"
  }
} 

For the delivery of the message, that’s all that’s required. Note that “rid” means Request ID; it just needs to be a unique ID of the app’s choosing, and Sockethub will include it when sending back any information about the request.
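For illustration, here’s roughly what sending that looks like from the app side. This is a minimal sketch; the endpoint URL is a placeholder, so point it at wherever your Sockethub instance is listening.

var ws = new WebSocket('ws://localhost:10550/sockethub'); // placeholder address

ws.onopen = function () {
  ws.send(JSON.stringify({
    rid: 'd93d3i3d90lamc03',
    verb: 'send',
    platform: 'facebook',
    actor:  { address: 'zuck' },
    target: { address: 'barackobama' },
    object: { body: "it's haz not has" }
  }));
};

ws.onmessage = function (event) {
  var msg = JSON.parse(event.data);
  // responses carry the same rid, so the app can match them to its requests
  console.log('update for request', msg.rid, msg);
};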

The Sockethub protocol uses JSON objects that loosely follow Activity Streams. A working draft of example types of Sockethub tasks can be found here.

Here’s an example of an email message:

{
  "rid": "klf9fwkw3ks0sdxme33",
  "verb": "send",
  "platform": "email",
  "actor": {
      "name": "Bob User",
      "address": "bob@example.com"
  },
  "target": {
      "name": "Foo Bar",
      "address": "foo@bar.com"
  },
  "object": {
      "body": "Hello World!"
  }
} 

Leaving the implementation details of the various platforms out of the application code frees developers to focus on their application, not on integrating any number of social networks or pushing out a new release because of breaking API updates. Instead, Sockethub worries about all that stuff (hooray!), speaking directly to the various protocols needed to get the job done and just sending the message.

Just the beginning

While still in its infancy, a lot of progress is being made. We currently have working demos for email and Facebook, with Twitter and XMPP in the process of being completed. There’s still a long way to go; we’d like to support user discovery/search, subscriptions (RSS, Twitter feeds, etc.), incoming email (IMAP & POP), and much more.

So check it out, star it, clone it, ask questions, file bugs, request platforms/features - or help out and contribute code!

twitter: @sockethub

irc: #sockethub

web: http://sockethub.org

github: http://github.com/sockethub/sockethub


September 20, 2012 6:54 pm

3 ways to configure HAProxy for WebSockets

Currently there aren’t many options when it comes to proxying WebSockets. Nginx doesn’t yet fully support WebSockets out of the box, though some people have opted to take an older version and patch it. I won’t go into why I eventually decided to go with HAProxy, but I will link you to an article which does a nice job of summarizing the current state of proxies and WebSockets.

Thankfully, there’s HAProxy; however, for some reason it doesn’t seem to be very well known at the moment. There aren’t many overviews detailing its configuration, so I thought it would be useful to list a few common use cases and describe their setup.

Some potential ways to proxy to a WebSocket backend:

  • proxy based on sub-domain

  • proxy based on a URI

  • proxy using automatic detection

First, let’s get the top portion of our haproxy.cfg file out of the way. This is generally what I use for most configurations:

# this config needs haproxy-1.1.28 or haproxy-1.2.1
global
  log  127.0.0.1  local0
  log  127.0.0.1  local1 notice
  maxconn  4096
  chroot   /usr/share/haproxy
  uid  99
  gid  99
  daemon

defaults
  log   global
  mode  http
  option  httplog
  option  dontlognull
  retries  3
  option  redispatch
  option  http-server-close
  maxconn  2000
  contimeout  5000
  clitimeout  50000
  srvtimeout  50000

One very important thing to point out in this config is the ‘option http-server-close’ line. Without it, some of the examples below can behave incorrectly. The option tells HAProxy to ignore the server’s keepalive setting. If it were not specified, then in some cases the conditional rules (used below) would not be re-evaluated for each new request; HAProxy would instead reuse the previously established connection for the new request(s), and would therefore fail to notice that the new request might be a socket request. In short, if you are using WebSockets in a mixed environment, always make sure ‘option http-server-close’ is set.

note: I previously used ‘option httpclose’, which disables keepalive on both the client and the server, whereas http-server-close only disables keepalive on the server. Thanks to Willy Tarreau for pointing out the difference in the comments below. See the docs on http-server-close for more information.

Now, let’s move on to the various common types of configuration. For the sake of simplicity, we’re going to assume that your socket server is running on port 8000 on localhost. However, this could easily be a completely different server or port.

note: although HAProxy is great for load-balancing, in these examples I’m not covering that kind of setup and so I have no health checks running on the servers. Perhaps I will cover load balancing with HAProxy in a future blog post.

1. Proxy based on sub-domain

Some people like to have their WebSocket server running on an entirely separate sub-domain (e.g. ws.example.com). This makes it easy to keep the services on completely separate machines, and also allows you to run your sockets on port 80 directly instead of going through a second proxy on the sub-domain.

frontend public
  bind *:80
  acl is_websocket hdr_end(host) -i ws.example.com
  use_backend ws if is_websocket
  default_backend www

backend www
  timeout server 30s
  server www1 127.0.0.1:8080

backend ws
  timeout server 600s
  server ws1 127.0.0.1:8000

This will direct all traffic going to ws.example.com to your WebSocket server (in this example it’s localhost, but you could easily plug in the IP of a different server).

note: if you are using 1.5-dev or higher, you can use the ‘timeout tunnel’ option, which automatically sets up a larger timeout for WebSocket connections than for normal HTTP connections. This means you don’t have to set overly long timeouts on the client side. I haven’t used this option yet, which is why I don’t include it in the examples.
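Based purely on the docs, it would be one extra line in the ws backend; treat the following as an untested sketch:

backend ws
  timeout server 30s
  timeout tunnel 600s
  server ws1 127.0.0.1:8000

Once a connection is upgraded, ‘timeout tunnel’ supersedes the server timeout, so plain HTTP requests keep the short timeout while established WebSocket connections get the long one.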

note: since the two ‘backend’ config blocks will not change for any of the examples, we’ll leave them out in the following two configurations.

2. Proxy based on URI

An alternative to the above setup is to proxy based on the URI (e.g. example.com/websockets). This allows you to keep everything operating within the same virtual (or non-virtual) domain.

frontend public
  bind *:80
  acl is_example hdr_end(host) -i example.com
  acl is_websocket path_beg -i /websockets
  use_backend ws if is_websocket is_example
  default_backend www

This is how I usually set up my servers right now, because the application I’m working on is, for the most part, a standard PHP application; however, I use a Python socket server (Tornado) for handling certain tasks. We want to keep everything operating on the same domain, so when a socket connection is made with a specific URI, we proxy it straight to Tornado; otherwise we send the request to our Nginx + Apache backend (see my previous blog post, 5 reasons to add Nginx to your LAMP stack now).

3. Proxy using WebSocket detection

The final example uses automatic detection of the WebSocket request by examining the HTTP headers for the Upgrade: WebSocket line. My personal preference is to use this along with a secondary test to make sure we really want to pass the request along to the socket server.

frontend public
  bind *:80
  acl is_websocket hdr(Upgrade) -i WebSocket
  acl is_websocket_server hdr_end(host) -i ws.example.com
  use_backend ws if is_websocket is_websocket_server
  default_backend www

This config will ensure that the request not only has the Upgrade: WebSocket header, but also that it’s accessing the correct location for the WebSocket server (ws.example.com).

These are 3 common ways to configure HAProxy for WebSockets. Perhaps the setup you’ve found works best for you is different? Please share your corrections or alternate configurations in the comments!

June 15, 2012 3:09 pm

5 reasons to add Nginx to your LAMP stack now

The web has continued to grow at an ever-increasing speed, and with that growth come increased demands on server response time. It’s becoming quite clear that, although Apache has been a long-standing pillar of the web, it no longer makes sense as an ‘across the board’ solution for serving web pages.

This may seem completely obvious to some groups of people, while to others, Apache is still that familiar face they don’t want to part with. Then there’s the common case of having an existing infrastructure built around Apache that is, more or less, operating acceptably. You can’t justify ripping out pieces just because there exists a (possibly) better solution. In these cases small, cautious steps are the only real way to go about changing things.

For a majority of people the transition is inevitable. To ease it, I suggest giving Nginx a try in a very non-intrusive way: as part of your existing server setup, without removing Apache altogether (just yet). Here are 5 reasons why you should add Nginx to your LAMP stack today.

1. It’s extremely lightweight and simple to set up

OK, so this is definitely not a major reason. After all, you shouldn’t install something just because it’s easy. However I do think a lot of people tend to avoid new things out of fear they will have to spend hours learning new concepts and configuration oddities. This is most definitely not the case with Nginx.

 I figured it would be best to mention this reason first, so that I can take you through the setup and lay the foundation.

On an Ubuntu system (12.04), setup took only a few minutes. First, install Nginx:

sudo apt-get install nginx

For the purposes of this article, I’m going to assume you are using Nginx version 1.1.x, which is what’s packaged for Ubuntu 12.04; otherwise some configuration options we add (like proxy_http_version) may not work.

The default configuration is fine as it is, though you will want to add some proxy directives to /etc/nginx/nginx.conf, at the end of (but still inside) the http { } section:

http {
    ... default config skipped ...

    # proxy settings
    proxy_redirect     off;

    proxy_http_version 1.1;
    proxy_set_header Connection "";

    proxy_set_header   Host             $host;
    proxy_set_header   X-Real-IP        $remote_addr;
    proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;

    proxy_max_temp_file_size 0;
    proxy_connect_timeout      90;
    proxy_send_timeout         90;
    proxy_read_timeout         90;

    proxy_buffer_size   128k;
    proxy_buffers   4 256k;
    proxy_busy_buffers_size   256k;
    proxy_temp_file_write_size 256k;
}

Now you’ll want to set up your site-specific config. On Ubuntu the layout is very similar to Apache’s: you have your /etc/nginx/sites-available/default config file. Here’s what mine looks like:

server {
    listen       80;
    server_name  example.com;   # your front-facing domain name

    location / {
        proxy_pass         http://127.0.0.1:8080/;
        proxy_redirect     off;
        client_max_body_size       10m;
        client_body_buffer_size    128k;
    }
}

Remember, Nginx is going to act like a proxy: it will accept all incoming requests and forward them to where they need to go. In this config we specify server_name as our front-facing domain name, listening on port 80. Then, for the location / directive, we specify the host & port that Apache will be listening on (I suggest localhost, port 8080).

Finally, we need to tell Apache to stop listening on port 80 and listen on port 8080 instead. We do that with an easy change to /etc/apache2/ports.conf:

NameVirtualHost *:8080
Listen 127.0.0.1:8080

I’m assuming here that you have NameVirtualHost set up. If so, make sure to also modify your site config and change the port number there as well, in /etc/apache2/sites-available/default.

You’re done! Wasn’t that easy? Now just restart Apache and start Nginx:

sudo /etc/init.d/apache2 restart
sudo /etc/init.d/nginx start

At this point, when you visit your domain, Nginx should answer on port 80 and proxy the request to localhost:8080, which is where Apache now listens.

2. It serves static files faster

The first tweak you can make to boost performance right away is to set up Nginx to serve all static files for you. This takes load off Apache, as Nginx serves static files much faster.

It’s extremely easy to set up; just add the following directive to your /etc/nginx/sites-available/default config file:

    # serve static files
    location /res {
        root    /var/www;
        expires 30d;
    }

I keep all my static files in the /res directory off my webroot, but if you keep static files in several different directories, it’s easy to match multiple locations at once. For example:

    location ~ ^/(images|javascript|js|css|flash|media|static)/ {
        # same handling as the /res example above
        root    /var/www;
        expires 30d;
    }

 That directive will match any of those directories and serve their contents directly, instead of passing the request along to Apache.

Don’t forget to restart Nginx for the changes to take effect!

$ /etc/init.d/nginx restart

3. It can handle many more concurrent connections

Nginx is very lightweight and efficient. When put to the test against Apache, it repeatedly comes out ahead, especially when it comes to handling concurrent connections.

 This means that you are helping to increase the performance of Apache, as it won’t have to handle so many concurrent connections to serve up static content like images, css and js files.

This gives you immediate savings in resources for your server, even though you’ve actually added another layer to your stack.
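If you want to verify this on your own setup, ApacheBench makes a quick (if rough) comparison easy; the request and concurrency counts below are just example values:

# 10,000 requests, 500 concurrent: first against Apache directly, then through Nginx
ab -n 10000 -c 500 http://127.0.0.1:8080/
ab -n 10000 -c 500 http://127.0.0.1/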

[graph: Apache vs Nginx benchmark, from http://blog.zhuzhaoyuan.com/2012/02/apache-24-faster-than-nginx/]

4. It opens up new doors

Nginx is a perfect choice for implementing things like long-polling or WebSockets. Having this proxy in place allows you to experiment with and implement new ways to get your content to the front-end. With Nginx proxying to an asynchronous Python + Tornado app, you can slowly start to build stand-alone additions to your existing web application.

For example, I’ve begun implementing WebSockets for an existing large-scale LAMP application. It would be impossible to do this with Apache + the existing PHP code (which is huge). However with Nginx proxying for me, it’s dead simple to start slowly improving things with a simple Python+Tornado WebSocket app to push new events to the front-end using JSON. All I need to do is add a simple directive to my /etc/nginx/sites-available/default site conf.

    location /notify {
        proxy_pass http://127.0.0.1:8000;
        proxy_redirect off;
    }

Once I fire up my Tornado service (which should be listening on port 8000), any requests to /notify will be proxied to the Tornado app, which handles any async calls I may need to carry out. This can help speed up my existing PHP site by not only taking more load off Apache, but also allowing small updates to be pushed to the user, instead of the user having to refresh to get them (or implementing some form of short-polling).
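On the browser side, connecting to that endpoint takes just a few lines. This is a minimal sketch; the message format is whatever your Tornado handler decides to push:

var ws = new WebSocket('ws://example.com/notify');

ws.onmessage = function (event) {
  var update = JSON.parse(event.data);
  // apply the pushed update to the page, instead of polling for it
  console.log('new event:', update);
};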

5. It makes future scaling much easier

Let’s say you outgrow your VPS a year down the road. Already having Nginx in place will make things a lot easier on you.

You can define groups of servers in your nginx.conf, all of which provide the same content, and then tell Nginx to proxy to the group. So, for example, you could set up another web server with the LAMP stack and add it to your list of servers in nginx.conf. Nginx will then load-balance between the listed servers, the existing one and the new one.
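A sketch of what that could look like in nginx.conf (the second server’s address is a placeholder for your new machine):

upstream webfarm {
    server 127.0.0.1:8080;
    server 192.168.0.2:8080;  # the new LAMP box
}

server {
    listen       80;
    location / {
        proxy_pass http://webfarm;
    }
}

By default, Nginx round-robins requests between the servers in the group.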

 In this way, you can easily start to offload services until all you have left on your original server is Nginx proxying requests through to your new farm of webservers.

[image: Scaling your LAMP stack with Nginx, from http://scalr.net/blog/feature/2-1-feature-highlight-scale-from-one-to-many-servers/]

Conclusion

I’ve always been a huge fan of Apache, but over time I’ve found myself in more varied development environments, and surprisingly I think the first performance hiccup is often not the database or the code, but Apache itself (I have no proof to back up this claim :). This is because Apache tries to solve too many problems with one tool, and 90% of the problems it tries to address only apply to 10% of the scenarios.

The reason Nginx and Lighttpd are seeing increased usage is that they each address a few things really, really well. Those few things are usually all we need for a website.

What I’ve found really helps with this transitional approach is that you can begin to test your existing web application on Nginx, test the waters, and see if everything works correctly. If so, you can eventually phase out Apache altogether (if that’s what you decide is appropriate). Alternatively, you can just leave Apache there, because the benefits of having Nginx in the stack still completely outweigh the downside of adding one more piece to the puzzle.

