Performance and scalability are not the same.
It is insufficient to run through a QA environment and say, “Everything seems fast enough!” You may load test the same QA environment and conclude that performance is good, so you’ll be able to scale. You may be wrong! Performance and scalability are two very different things. You can often get more scalability out of applications that perform well — they aren’t mutually exclusive — but you don’t actually need one to get the other.
Performance is often about algorithms, which may be fast but not scalable. Building a select box in a server page might be very, very fast when you only have 10 items to put in it, but it’ll be a dog if you try 1000. This happens all the time. In fact, it just happened where I work. The simplest solution for the search functionality at hand was to put all users of a specific type into a select box. It worked great for months when we didn’t have many users of that type. But the data grew… and one day it was a major performance issue for a major customer of ours. Oops! Joel Spolsky writes about Shlemiel the Painter’s Algorithm, an excellent anecdote about algorithms that seem fast but can’t scale.
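For anyone who hasn't read the Shlemiel story, here is a rough sketch of it in Python. The painter paints dots along a road but leaves the paint can at the start, so every dot costs a longer walk than the last. The function names and the one-step-per-dot distances are my own simplification for illustration, not something taken from Joel's article.

```python
def steps_walking_back_to_can(dots: int) -> int:
    """Painter leaves the can at the start and returns to it after every dot."""
    steps = 0
    for position in range(1, dots + 1):
        steps += 2 * position  # walk out to the dot, then back to the can
    return steps


def steps_carrying_can(dots: int) -> int:
    """Painter carries the can along, so each new dot costs a single step."""
    return dots


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, steps_walking_back_to_can(n), steps_carrying_can(n))
```

With 10 dots the difference is 110 steps against 10, which nobody notices. With 1000 dots it is 1,001,000 steps against 1000. Same algorithm, same code, very different day at work.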
Scalability doesn’t require good performance. It helps, but it’s not a prerequisite. Consider a page or process that is slow, say 10 seconds response time. If that page consistently takes 10 seconds on every box you throw at it, then you can say it is scalable, if inefficient. Capacity grows linearly with hardware while the average response time stays known. That gives you a measured metric with which to estimate the hardware required to support some future concurrent usage.
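As a back-of-the-envelope sketch of that estimate, you can use Little's law (requests in flight = arrival rate × response time) and divide by what one box can handle at once. The function name and the figure of 50 concurrent requests per box are assumptions for illustration; you would plug in whatever your own load tests measure.

```python
import math


def boxes_needed(requests_per_second: float,
                 response_time_s: float,
                 concurrent_per_box: int) -> int:
    """Estimate how many boxes are needed for a target request rate.

    Assumes response time stays flat as boxes are added, which is exactly
    the "scalable, if inefficient" case described above.
    """
    in_flight = requests_per_second * response_time_s  # Little's law
    return math.ceil(in_flight / concurrent_per_box)


if __name__ == "__main__":
    # Hypothetical numbers: 200 requests/s, 10 s per request, 50 per box.
    print(boxes_needed(200, 10, 50))
```

With those made-up numbers you need 40 boxes. That is a lot of hardware for a slow page, but it is a number you can budget for, which is the whole point.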
Scaling is also a measurement of how your application performs when spread across dozens or even hundreds of nodes! Does your application use a “shared nothing” architecture? Are the nodes chatty, doing multicasts to the entire cluster? If one node talks to every other node in the cluster to perform some kind of update in real time, it won’t scale. You’ll eventually hit a number of nodes where you spend more time keeping the other nodes in sync than doing real work. This is the same reason most OSes can’t keep accurate time below 10 ms. They would spend more time keeping time than doing real work.
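To put rough numbers on the chatty-cluster problem, here is a small sketch that counts messages when every node pushes one update to every other node. It assumes the simplest possible model (one message per peer per update, no batching), and the function name is made up for this example.

```python
def sync_messages_per_round(nodes: int) -> int:
    """Messages sent when every node sends one update to every other node."""
    return nodes * (nodes - 1)


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, sync_messages_per_round(n))
```

Going from 10 nodes to 100 gives you ten times the hardware but takes you from 90 messages per round to 9900, roughly a hundred times the sync traffic. A shared-nothing design sends none of these.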
I’ll close with an excellent visual of performance vs. scalability, given by John Engates, CTO of Rackspace. He recently gave a talk at a MySQL conference about scaling web applications, and I think these two pictures are worth more than their two thousand words: