2
$\begingroup$

I need to sort a table of apartment numbers in SQL. For those unfamiliar, these are not necessarily numbers, but numerical ordering needs to be applied as though they were.

For example, a possible set of apartment numbers would be ordered as-follows:

1
10
101
101-A
1000
200
200A
2000
C
D
E-100
F

Doing ORDER BY CONVERT( int, ApartmentNumber ) wouldn't work because not all apartment-numbers are string-encoded decimal integers. Similarly doing ORDER BY ApartmentNumber wouldn't work because it would place 101 after 1000.

The other QAs on StackOverflow generally concern themselves with either a known fixed-format of value Sorting string column containing numbers in SQL? or doing the sort with error-handling: Sorting field by numerical value and lexicographical value

In my own project, a previous developer used this trick:

ORDER BY
    RIGHT('00000000000000000000' + RTRIM( ApartmentNumber ), 20 ) ASC

...which feels even worse. Thing is: this trick is algorithmically sound - it just feels like a hack to have to have the database engine perform string allocations (multiple times too, if they don't optimize the concatenation and RTRIM into a single string operation).

Another approach proffered on SO is:

ORDER BY
    LEN( ApartmentNumber ),
    ApartmentNumber

...however this yields this incorrect ordering of the input set:

1
C
D
F
10
101
200
1000
2000
200A
101-A
E-100

I'm using SQL Server 2012 locally (in 2008 (Level 100) compatibility mode) and this application will be deployed to an Azure SQL server (Azure SQL V12).

UPDATE:

I've been considering some options, I think the "best" is to use a fixed-length char(10) field instead of varchar(10) and to ensure that the content of the field is always left-aligned, that way the sort-order will be ensured with a simple ORDER BY ApartmentNumber.

UPDATE 2: I realised the above idea (char(10)) doesn't solve the problem of 200A needing to be sorted before 2000. I think the best solution would be to normalize the values to a uniform integer representation and storing that int value in the database. The algorithm to do the conversion would be best (i.e. most succinctly) if it were not written in SQL.

$\endgroup$
8
  • 2
    $\begingroup$ which dbms are you using? $\endgroup$ Commented Aug 22, 2016 at 0:07
  • $\begingroup$ @vkp I was hoping there might be a good cross-platform solution, but I'm targeting SQL Server 2012 and SQL Azure V12. $\endgroup$ Commented Aug 22, 2016 at 0:09
  • $\begingroup$ Perhaps you need to come up with a set of masks that match up with all the various formats. $\endgroup$ Commented Aug 22, 2016 at 0:13
  • $\begingroup$ @shawnt00 Looking at the text - I see what it should really do is identify the first numeric value in the string, sort by that, and then do a lexicographical sort. $\endgroup$ Commented Aug 22, 2016 at 0:33
  • $\begingroup$ Given you're using SQL Server, are you able to use CLR stored procedures? $\endgroup$ Commented Aug 22, 2016 at 0:34

1 Answer 1

0
$\begingroup$

I don't think my original idea of using simple masking patterns was going to work well and I ended up reducing the problem to these four expressions. Hopefully you can find some value in it all.

order by
    case
         when patindex('%[0-9][A-Z-]%', apt) > 0
           then cast(substring(apt, 1, patindex('%[0-9][A-Z-]%', apt)) as int)
         when patindex('[0-9]%', apt) = 1
           then cast(apt as int)
         else 99999
    end /* numeric_prefix */,
    case
         when patindex('%[A-Z][0-9-]%', apt) > 0
           then left(apt, patindex('%[A-Z][0-9-]%', apt))
         when patindex('[A-Z]%', apt) = 1
           then apt
         else ''
    end /* char_prefix */,
    case 
         when patindex('%[A-Z-][0-9]%', apt) > 0
           then cast(substring(apt, patindex('%[A-Z-][0-9]%', apt) + 1, 10) as int)
         else 0
    end /* numeric_suffix */,
    replace(apt, '-', '') /* stripped_hyphen */,
    case when charindex('-', apt) = 0 then 0 else 1 end /* sort_hyphen_last */

http://rextester.com/MFYVN38156

Here's an single expression that you could store and use for sorting:

case
     when patindex('%[0-9][A-Z-]%', apt) > 0
     then right('*****' + left(apt, patindex('%[0-9][A-Z-]%', apt)), 5) +
          right('>>>>>' +
              replace(substring(apt, patindex('%[0-9][A-Z-]%', apt) + 1, 10), '-', ''), 5)
     when patindex('%[A-Z][0-9-]%', apt) > 0
     then right('>>>>>' + left(apt, patindex('%[A-Z][0-9-]%', apt)), 5) +
          right('*****' +
              replace(substring(apt, patindex('%[A-Z][0-9-]%', apt) + 1, 10), '-', ''), 5)
     when patindex('[0-9]%', apt) = 1 then right('*****' + apt, 5) + '>>>>>'
     when patindex('[A-Z]%', apt) = 1 then right('>>>>>' + apt, 5) + '*****'
     else '>>>>>*****'
end +
case when charindex('-', apt) = 0 then '' else '-' end /* collate Latin1_General_BIN */

> was chosen as a character that sorts between the digits and alphabetic characters. * falls before the digits. The output looks like this:

   apt     sort_string  
 -------- ------------- 
  1        ****1>>>>>   
  10       ***10>>>>>   
  101      **101>>>>>   
  101-A    **101>>>>A-  
  200      **200>>>>>   
  200A     **200>>>>A   
  200-A    **200>>>>A-  
  1000     *1000>>>>>   
  2000     *2000>>>>>   
  C        >>>>C*****   
  D        >>>>D*****   
  E        >>>>E*****   
  E10      >>>>E***10   
  E100     >>>>E**100   
  E-100    >>>>E**100-  
  E101     >>>>E**101   
  E-101    >>>>E**101-  
  E200     >>>>E**200   
  E-200    >>>>E**200-  
  E1000    >>>>E*1000   
  E-1001   >>>>E*1001-  
  F        >>>>F*****
$\endgroup$
Sign up to request clarification or add additional context in comments.

4 Comments

I'm curious what the runtime performance of this is. I note that if the individual WHEN cases are evaluated sequentially (instead of in-parallel) then they could be rearranged into a tree to avoid unnecessary evaluation.
How often are you going to be running this whole operation? Is performance really something you need to worry much about here?
This solution doesn't place negative numbers before positive numbers. This could be fixed by replacing the final line with case when charindex('-', apt) = 0 then '+' else '-' end /* collate Latin1_General_BIN */
Were we expecting negative apartment numbers?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.