How to Calculate the Median in SQL
by Daniel JamesCalculating the median of a set is more challenging than simply calculating the average or mean of a set. Add in the syntax and complexity of SQL and the task may seem insurmountable at first. But with some simple concepts and a few examples, calculating the median value is no sweat. This how-to will use Transact-SQL for its examples.
Finding The Median
Understand the difference between the median and the mean of a set. The median is the "middle value" of a set, while the mean is the average of all of the elements of a set. For example, given an ordered set of numbers {1, 2, 6, 9, 10, 11}, the median will be 7.5 ([6 + 9] / 2), but the mean is 6.5 ([1 + 2 + 6 + 9 + 10 + 11] / 6). To calculate the median, count the number of elements in the set. If the count is even, take the average of the element at the position to be found by counting the total elements and dividing by two and the position found by dividing the total number of elements by 2 and adding one. If the count is odd, take the element at the position marked by the total count divided by two and rounded up to the nearest integer.
Create an ordered set of numbers with a new integral index. For example, if the numeric data is stored in the "num" column of the "data" table, create a new temporary table containing the "num" value with a new index: CREATE TABLE #values ( ID int NOT NULL IDENTITY(1,1), num numeric(9,4) ) INSERT INTO #values (num) SELECT num FROM data ORDER BY num The ORDER BY statement is very important for calculating the median.
Select the data from the temporary table where the ID is equal to half of the count of records in the table. If there are an odd number of records, take the average of the two values as the final median. This can be accomplished in the single query below: SELECT AVERAGE(num) FROM #values v JOIN ( SELECT COUNT(*) AS cnt FROM #values _v ) c ON ( c.ID = CEILING(_v.cnt / 2.0) AND ( _v.cnt % 2 = 1 -- cnt is odd OR ( _v.cnt % 2 = 0 -- cnt is even AND c.ID = (_v.cnt / 2.0) + 1 ) ) ) This query uses a subquery to find the count of records so that the middle value can be determined. In both the case of an even count and an odd count, the use of CEILING to round up (_v.cnt / 2.0) is accurate. (e.g., Even: 6 / 2 = 3; Odd: 7 / 2 = 3.5, which is rounded up to 4) Additionally, when the count is even, add 1 to (_v.cnt / 2.0) to get the second element to include in the final median.
Tip
- check If you are using a recent version of Microsoft SQL Server, such as 2005 or 2008, you can use "Common Table Expressions" instead of creating a temporary table. Depending on your schema and server configuration, this may provide an increase in performance.
Items you will need
References
Photo Credits
- photo_camera median ahead road sign image by John Steel from Fotolia.com