Tuesday 27 March 2012

Squeezing the most performance out of your site

When writing a web page or a full-blown application, there are a few things that can turn your site from a greyhound into a plain old dog.
The biggest issue governing the speed of a web site, other than the speed of your server, is the speed of the internet connection.  The speed of light, at 186,000 miles a second (300,000 km/s), is fast; even so, for a photon to travel 1,000 miles from a client to a server and back takes around 10 ms.  The internet connections that you use also have to contend with ADSL routers, packet contention, and traffic-shaping protocols which smooth the flow of data between nodes.  A typical network connection may consist of ten or twenty router hops between client and server and back again.  On top of that you have the TCP protocol, which guarantees that the data has not been corrupted in transit.  And if you want SSL, you will double the latency again when making the initial connection, as the two sides set up a secure channel for data to flow.
That 10 ms round trip from client to server could be as bad as 50 ms... big deal!  Until you look at how your web application is built.
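To put a rough number on why that matters, here is a back-of-the-envelope sketch in javascript; the figures are illustrative assumptions rather than measurements:

    // Rough cost of latency when a page is built from many small requests.
    // Both numbers below are assumptions for illustration only.
    var roundTripMs = 50;   // one request/response once routers, TCP and SSL are factored in
    var requestCount = 40;  // icons, sprites, scripts and stylesheets on a typical page
    var serialCostMs = roundTripMs * requestCount;
    console.log(serialCostMs + ' ms spent purely on latency if the requests run one after another'); // 2000 ms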

Client side optimisation

A typical web application may need up to 30 icons or sprites, two or three javascript files and a css file or two to render properly (if you do not believe me, look at extjs, ckeditor etc); in all you could be hitting the server with forty requests.  And that does not include the processing time to generate the requests on the client, to interpret and respond to each request on the server, and the rendering effort that the web client needs to do.  All told, you could find that your beautiful web application, intended to take over the world, takes 10 seconds to load, by which time the potential customer has moved on to your rival's rather ugly but snappier web site.  To mitigate this you can:
  • Combine all your sprites and other small images into one file; this reduces the number of network round trips between the client and the server.  There is a great tool out there called SpriteMe which uses your browser's DOM to find the graphics referred to by the css, combines them into one bitmap, and gives you a list of the changes you need to make to your CSS so that each sprite reads the right portion of the bitmap.
  • Combine all your javascript files into one by concatenating them together, even files from different vendors (see the build sketch at the end of this section).
  • Minify your javascript.  There are great tools for doing just that; the Google Closure Compiler, for example, parses your javascript and spits it out stripped of padding and dead code.  A great tool.
  • If it makes sense, you can merge the css and javascript into the html file.  Note, though, that the css and javascript may be used by more than one page and can be cached by the browser, in which case it makes sense to keep them separate from the html.
  • Compress your html, javascript and css; this results in fewer network packets being sent back and forth per file.  There are a lot of tools that can do this, just search for them.
If you do all of this, less data will be transferred and, more importantly, the number of requests to the server will be slashed from forty to fewer than five, which should mean that your web site loads in a tenth of the time that it did before.
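As a concrete illustration of the concatenation step, here is a minimal Node.js build sketch; the input file names are hypothetical, and in practice you would pipe the result through a minifier such as the Closure Compiler afterwards:

    // build.js - concatenate several javascript files into one (Node.js sketch).
    // The input file names are placeholders for your own scripts.
    var fs = require('fs');
    var inputs = ['vendor/editor.js', 'app/models.js', 'app/views.js'];
    var combined = inputs.map(function (file) {
        return fs.readFileSync(file, 'utf8');
    }).join(';\n'); // the semicolon guards against files that omit their trailing one
    fs.writeFileSync('app.combined.js', combined);
    console.log('Wrote app.combined.js, ' + combined.length + ' characters');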

Protocol optimisation

The HTTP protocol back in the early 90's was great in that it was an efficient way of getting static, unstylised html from a glorified file server to an often text-based web browser on a green screen.  The way it is used now is way beyond what it was initially designed for, and it is creaking a bit in terms of efficiency.
HTTP is a stateless, half-duplex protocol, always initiated by the client, that does not remember anything about what happened in the past.  To manage state and implement web push we have to jump through all sorts of hoops to make it do what we want.  It can also be incredibly complicated, with a morass of standards that use a mass of different data formats, including uuencoding, base64, SHA and different text encodings including Unicode.  I personally think that it is no longer really fit for purpose, and I will personally welcome some rationality being injected into any new standard.
There are, however, some tweaks and major enhancements that change the game:
  • Keep-alive: hopefully your server has got this one switched on.  It is a set of http headers that tell the client and server to reuse the connection for the next request from the client.  This is a major performance enhancement, especially for SSL, as recreating connections adds latency to a new request.  It does have the downside that it keeps a network port open, which can be a limiting factor when there are thousands of clients trying to connect to the server (see the header sketch after this list).
  • Expires or a Cache-Control header: these headers tell your browser how long a file can be kept in cache before it needs to be reloaded.  The great thing about this is that if the browser knows it does not need to fetch a file but can get it from its own cache, it will do so, saving a lot of network latency (also shown in the header sketch after this list).
  • Web Sockets: this is the biggie.  It is new, so old browsers do not know about it, and you may also have problems with certain firewalls that do not recognise the protocol.  Where it is supported, however, you get essentially a raw socket between the server and the client, over which each party can communicate without waiting on the other or on any reply from the other.  This is a true revelation: you can stream requests to the server and get streams of data coming back, along with any server-originated event information.  The advantage is that while you are waiting for a reply from the server you can be busy sending requests for other files down the same channel, so the bitmap concatenation effort mentioned before becomes less important, as latency no longer prevents you from making further requests; you get much the same performance as you would by concatenating data in anticipation of such a request.  Web Sockets also lets you receive events from the server as soon as they happen, without the client having to poll for that information, reducing both the load on the server and delay times.
    It is all very well having this newfangled Web Sockets, but how do I use it?  There are numerous libraries that specialise in this and work as extensions to existing platforms, including php, node.js, asp.net and C++.  These tools vary in their maturity, though, and the WebSockets spec is still being tweaked.  A browser-side sketch follows below.
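To make the keep-alive and caching points concrete, here is a minimal Node.js sketch that serves the combined script with both headers set; the one-day lifetime is an arbitrary assumption, and the right value depends on how often your content changes:

    // server.js - illustrate keep-alive and cache headers (Node.js sketch).
    var http = require('http');
    var fs = require('fs');
    http.createServer(function (req, res) {
        res.writeHead(200, {
            'Content-Type': 'application/javascript',
            'Connection': 'keep-alive',               // reuse the TCP connection for the next request
            'Cache-Control': 'public, max-age=86400'  // let the browser keep this file for a day
        });
        res.end(fs.readFileSync('app.combined.js'));
    }).listen(8080);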
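And here is a browser-side sketch of the Web Sockets idea, using the standard WebSocket API; the URL and the message strings are made up for illustration, and the server end would be handled by one of the libraries mentioned above:

    // Browser-side Web Socket sketch: stream requests without waiting for replies.
    var socket = new WebSocket('ws://www.example.com/updates'); // hypothetical endpoint
    socket.onopen = function () {
        // Fire off several requests back to back; no need to wait for each reply.
        socket.send('get sprites');
        socket.send('get prices');
        socket.send('subscribe news');
    };
    socket.onmessage = function (event) {
        // Replies and server-originated events arrive here as and when they happen.
        console.log('server says: ' + event.data);
    };
    socket.onclose = function () {
        console.log('connection closed');
    };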

Serverside optimisation

This is where things get more interesting.  Developers are torn between performance and scalability; I often think that too much worry about the latter hurts the former, when there was no need to be worried in the first place!  In early web applications there was clearly too much data on the hard drive for it all to be loaded into memory and served to clients, and as hard drives are much slower than memory, you often needed a cluster of servers to handle the traffic to the site.  However, the content of most web sites nowadays can easily fit into the memory of a well-endowed server, and servers are now much faster and have many cores, allowing a lot of work to be done concurrently.
The problem with conventional stateless architectures is that every database object loaded into memory to satisfy a web request is thrown away afterwards.  This increases latency, as the server-side logic needs to wait for the database to do its work for every request.
A caching middle layer uses the database only to persist objects; all (or the most important) objects are stored in memory and retrieved from memory using some form of lookup grammar.  With this, server-side performance can increase tenfold and perceived client latency can halve, and server resources are used more efficiently, as there is less internal network traffic and the database is less busy, left to do the important stuff such as persistence and locking.
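A very stripped-down sketch of such a caching layer, again in javascript for illustration; the loadFromDatabase function stands in for whatever persistence call you actually use, and a real implementation would also need expiry, invalidation on writes and a cap on memory use:

    // A naive in-memory cache sitting in front of the database (sketch only).
    var cache = {};
    function getObject(key, loadFromDatabase, callback) {
        if (cache.hasOwnProperty(key)) {
            callback(cache[key]);         // served from memory, no database round trip
            return;
        }
        loadFromDatabase(key, function (obj) {
            cache[key] = obj;             // keep it for the next request
            callback(obj);
        });
    }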
You can also improve the performance of http and ajax requests by using multipart requests and responses.  Due to the half-duplex nature of HTTP, it is best to package all your requests at once rather than send them one after another, as you may find your application blocks more often than you would like.  If, however, you are using WebSockets then forget it; this optimisation makes no difference, as the connection is fully asynchronous and you can stream your requests without waiting for the replies.
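One way to approximate that batching idea over plain HTTP is to post several logical requests as a single JSON array rather than issuing them separately; the /batch endpoint and the shape of the payload here are assumptions for illustration:

    // Send several logical requests in one HTTP round trip (sketch).
    var batch = [
        { action: 'getUser',    id: 42 },
        { action: 'getOrders',  userId: 42 },
        { action: 'getAdverts', page: 'home' }
    ];
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/batch', true);               // hypothetical endpoint that understands batches
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.onload = function () {
        var replies = JSON.parse(xhr.responseText); // one array of replies, one round trip
        console.log(replies.length + ' replies received in a single response');
    };
    xhr.send(JSON.stringify(batch));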
Look at content delivery networks: these cache your data closer to the client than your central site does.  If your packets travel only 100 miles rather than 1,000, you will see quite considerable performance gains, particularly for poorly optimised websites.
Well worth reading: http://developer.yahoo.com/performance/rules.html
