NAME
sort - sort or merge files
SYNOPSIS
sort [-m] [-o output] [-bdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z recsz] [-T dir] [file ...]
sort [-c] [-AbdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z recsz] [-T dir] [file ...]
DESCRIPTION
sort performs one of the following functions:
1. Sorts lines of all the named files together and writes the
result to the specified output.
2. Merges lines of all the named (presorted) files together and
writes the result to the specified output.
3. Checks that a single input file is correctly presorted.
The standard input is read if - is used as a file name or no input
files are specified.
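The three functions can be tried from a shell. The following is a minimal sketch, assuming a POSIX shell, a mktemp utility that accepts -d (not described on this page), and made-up sample lines. LC_ALL=C is set so that the ordering is plain byte order.

```sh
#!/bin/sh
# The three modes of sort on small scratch files (made-up data).
# Expected output lines are listed in the comment after each command.
LC_ALL=C                      # plain byte ordering
export LC_ALL
work=$(mktemp -d)             # assumes mktemp supports -d
trap 'rm -rf "$work"' EXIT    # remove the scratch files on exit

printf '%s\n' pear apple fig > "$work/unsorted"
printf '%s\n' apple fig pear > "$work/one"     # already sorted
printf '%s\n' banana cherry  > "$work/two"     # already sorted

# 1. Sort: lines of all the named files are sorted together.
sort "$work/unsorted" "$work/two"   # apple banana cherry fig pear

# 2. Merge: -m only interleaves files that are already sorted.
sort -m "$work/one" "$work/two"     # apple banana cherry fig pear

# 3. Check: the result is the exit status; stderr is discarded.
sort -c "$work/one" && echo "one: sorted"
sort -c "$work/unsorted" 2>/dev/null || echo "unsorted: not sorted"
```

Note that -m does not sort its inputs: merging "$work/unsorted" with -m would not produce sorted output.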
Comparisons are based on one or more sort keys extracted from each
line of input. By default, there is one sort key, the entire input
line. Ordering is lexicographic by characters using the collating
sequence of the current locale. If the locale is not specified or is
set to the POSIX locale, then ordering is lexicographic by bytes in
machine-collating sequence. If the locale includes multi-byte
characters, single-byte characters are machine-collated before multi-byte
characters.
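A small illustration of the C/POSIX byte ordering, assuming a POSIX shell; c_sort is a hypothetical helper name, not part of sort. Results in other locales depend on their collation tables and are not shown here.

```sh
#!/bin/sh
# c_sort is a hypothetical helper: sort with LC_ALL=C, which overrides
# LANG and the other LC_* variables for this one command.
c_sort() {
    LC_ALL=C sort "$@"
}

# Byte order: 'A'-'Z' (0x41-0x5A) all precede 'a'-'z' (0x61-0x7A).
printf '%s\n' banana Apple apple Banana | c_sort
# Apple
# Banana
# apple
# banana
```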
Behavior Modification Options
The following options alter the default behavior:
-A Sorts on a byte-by-byte basis using each character's
encoded value. On some systems, extended characters
are treated as negative values and therefore sort
before ASCII characters. When sorting ASCII
characters in a non-C/POSIX locale, this option is
much faster.
-c Check that the single input file is sorted according
to the ordering rules. No output is produced; the
exit code is set to indicate the result.
-m Merge only; the input files are assumed to be already sorted.
Hewlett-Packard Company - 1 - HP-UX 11i Version 2: August 2003
-o output The argument given is the name of an output file to
use instead of the standard output. This file can be
the same as one of the input files.
-u Unique: suppress all but one in each set of lines
having equal keys. If used with the -c option, check
to see that there are no lines with duplicate keys,
in addition to checking that the input file is sorted.
-y [kmem] The amount of main memory used by the sort can have a
large impact on its performance. If this option is
omitted, sort begins using a system default memory
size, and continues to use more space as needed. If
this option is presented with a value, kmem, sort
starts using that number of kilobytes of memory,
unless the administrative minimum or maximum is
violated, in which case the corresponding extremum
will be used. Thus, -y 0 is guaranteed to start with
minimum memory. By convention, -y (with no argument)
starts with maximum memory.
-z recsz The size of the longest line read is recorded in the
sort phase so that buffers can be allocated during
the merge phase. If the sort phase is omitted via
the -c or -m options, a system default size is
used. Lines longer than the buffer size cause
sort to terminate abnormally. Supplying the
actual number of bytes in the longest line to be
merged (or some larger value) prevents abnormal
termination.
-T dir Use dir as the directory for temporary scratch files
rather than the default directory, which is one of
the following, tried in order: the directory
specified in the TMPDIR environment variable,
/var/tmp, and finally /tmp.
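A sketch of -o and -u, assuming a POSIX shell, mktemp, and invented sample data. It relies on the statement that the -o output file can be one of the input files.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL
f=$(mktemp)                   # hypothetical scratch file
trap 'rm -f "$f"' EXIT
printf '%s\n' pear apple pear fig apple > "$f"

# -o may name an input file, so this sorts f in place, and -u drops
# the repeated lines. ("sort f > f" would leave f empty: the shell
# truncates f before sort reads it.)
sort -u -o "$f" "$f"
cat "$f"                      # apple fig pear

# With -c, -u also rejects equal keys, even when they are in order.
status=0
printf '%s\n' apple fig fig | sort -c -u 2>/dev/null || status=$?
echo "exit status: $status"   # exit status: 1
```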
Ordering Rule Options
When ordering options appear before restricted sort key
specifications, the ordering rules are applied globally to all sort
keys. When attached to a specific sort key (described below), the
ordering options override all global ordering options for that key.
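To illustrate the override rule, the sketch below gives -r globally and attaches n to the second key only. It assumes a POSIX shell, LC_ALL=C, and sample lines made up for the purpose.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL
# -r comes before the -k options, so it applies globally. Key 1 has no
# ordering options of its own and is reversed; key 2 carries "n",
# which overrides the global options, so it is numeric and ascending.
printf '%s\n' 'a 2' 'b 1' 'a 10' | sort -r -k 1,1 -k 2,2n
# b 1
# a 2
# a 10
```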
The following options override the default ordering rules:
-d Quasi-dictionary order: only alphanumeric characters
and blanks (spaces and tabs), as defined by LC_CTYPE
are significant in comparisons (see environ(5)).
(XPG4 only.) The behavior is undefined for a sort key
to which -i or -n also applies.
-f Fold letters. Prior to being compared, all lowercase
letters are effectively converted into their
uppercase equivalents, as defined by LC_CTYPE.
-i In non-numeric comparisons, ignore all characters
which are non-printable, as defined by LC_CTYPE. For
the ASCII character set, octal character codes 001
through 037 and 0177 are ignored.
-n The sort key is restricted to an initial numeric
string consisting of optional blanks, an optional
minus sign, zero or more digits with optional radix
character, and optional thousands separators. The
radix and thousands separator characters are defined
by LC_NUMERIC. The field is sorted by arithmetic
value. An empty (missing) numeric field is treated
as arithmetic zero. Leading zeros and plus or minus
signs on zeros do not affect the ordering. The -n
option implies the -b option (see below).
-r Reverse the sense of comparisons.
-M Compare as months. The first several non-blank
characters of the field are folded to uppercase and
compared with the langinfo(5) items ABMON_1 < ABMON_2
< ... < ABMON_12. For example, American month names
are compared such that JAN < FEB < ... < DEC. An
invalid field is treated as being less than the ABMON_1
string, that is, less than all months. The -M option
implies the -b option (see below).
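The ordering options can be compared with the default ordering on invented sample lines. This is a sketch assuming a POSIX shell and LC_ALL=C, so that the LC_CTYPE character classes, the radix character, and the month abbreviations are those of the C locale.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# -d: only letters, digits and blanks are significant.
printf '%s\n' '#zeta' alpha -beta | sort      # #zeta -beta alpha
printf '%s\n' '#zeta' alpha -beta | sort -d   # alpha -beta #zeta

# -f: lowercase letters are compared as their uppercase equivalents.
printf '%s\n' banana apple Cherry | sort      # Cherry apple banana
printf '%s\n' banana apple Cherry | sort -f   # apple banana Cherry

# -n: arithmetic value; the empty line counts as zero.
printf '%s\n' 10 9 -3 '' 0.5 | sort -n        # -3 (empty) 0.5 9 10
printf '%s\n' 10 9 -3 '' 0.5 | sort           # (empty) -3 0.5 10 9

# -r reverses the sense of the comparison.
printf '%s\n' 10 9 -3 | sort -rn              # 10 9 -3

# -M: month abbreviations; the invalid "xyz" sorts before all months.
printf '%s\n' Mar jan xyz Feb | sort -M       # xyz jan Feb Mar
```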
Field Separator Options
The treatment of field separators can be altered using the options:
-t char Use char as the field separator character; char is
not considered to be part of a field (although it can
be included in a sort key). Each occurrence of char
is significant (for example, <char><char> delimits an
empty field). If -t is not specified, <blank>
characters will be used as default field separators;
each maximal sequence of <blank> characters that
follows a non-<blank> character is a field separator.
-b Ignore leading blanks when determining the starting
and ending positions of a restricted sort key. If
the -b option is specified before the first -k option
(+pos1 argument), it is applied to all -k options
(+pos1 and -pos2 arguments). Otherwise, the -b option can be attached
independently to each -k field_start or field_end
option (+pos1 or -pos2 argument; see below). Note
that the -b option is only effective when restricted
sort key specifications are given.
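A sketch of field separation, assuming a POSIX shell, LC_ALL=C, and made-up lines. It shows an empty field under -t, and the effect of b on a field_start when blanks are the default separators.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# -t: every ':' is significant, so "y::1" has an empty second field,
# and an empty key sorts before any non-empty one.
printf '%s\n' 'x:b:2' 'y::1' 'z:a:3' | sort -t: -k 2,2
# y::1
# z:a:3
# x:b:2

# Default separators: blanks belong to the following field, so field 2
# of "x  b" is "  b", which sorts before " a" (space < 'a').
printf '%s\n' 'y a' 'x  b' | sort -k 2,2      # "x  b" then "y a"

# With b on field_start, leading blanks are skipped: keys "a" and "b".
printf '%s\n' 'y a' 'x  b' | sort -k 2b,2     # "y a" then "x  b"
```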
Restricted Sort Key
-k keydef The keydef argument defines a restricted sort key.
The format of this definition is
field_start[type][,field_end[type]]
which defines a key field beginning at field_start
and ending at field_end. The characters at positions
field_start and field_end are included in the key
field, provided that field_end does not precede
field_start. A missing field_end means the end of
the line. Fields and characters within fields are
numbered starting with 1. Note that this is
different from the obsolete form of restricted sort
keys, where numbering starts at 0. See WARNINGS
below.
Specifying field_start and field_end involves the
notion of a field, a minimal sequence of characters
followed by a field separator or a new-line. By
default, the first blank of a sequence of blanks acts
as the field separator. All blanks in a sequence of
blanks are considered to be part of the next field;
for example, all blanks at the beginning of a line
are considered to be part of the first field.
The arguments field_start and field_end each have the
form m.n, optionally followed by one or more
of the type options b, d, f, i, n, r, or M. These
modifiers have, for this key only, the same effect
that their command-line counterparts have for the
entire record.
A field_start position specified by m.n is
interpreted to mean the nth character in the mth
field. A missing n means .1, indicating the first
character of the mth field. If the -b option is in
effect, n is counted from the first non-blank
character in the mth field.
A field_end position specified by m.n is interpreted
to mean the nth character in the mth field. If n is
missing, the mth field ends at the last character of
the field. If the -b option is in effect, n is
counted from the first non-<blank> character in the
mth field.
Multiple -k options are permitted and are significant
in command line order. A maximum of 9 -k options can
be given. If no -k option is specified, a default
sort key of the entire line is used. When there are
multiple sort keys, later keys are compared only
after all earlier keys compare equal. Lines whose
keys all compare equal are ordered with all bytes
significant; that is, the entire record is used as
the final key.
The -k option is intended to replace the obsolete
[+pos1 [-pos2]] notation, using field_start and
field_end respectively. The fully specified form
+w.x -y.z is equivalent to:
-k w+1.x+1,y.0 (if z == 0)
-k w+1.x+1,y+1.z (if z > 0)
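A sketch of -k field and character positions and of multiple keys, assuming a POSIX shell, LC_ALL=C, and invented sample lines.

```sh
#!/bin/sh
LC_ALL=C
export LC_ALL

# m.n positions: the key is characters 2 through 3 of field 1.
printf '%s\n' a30x b10y c20z | sort -k 1.2,1.3     # b10y c20z a30x

# Multiple keys: field 2 numerically first, then field 1 for ties.
printf '%s\n' 'carol 30' 'alice 25' 'bob 30' | sort -k 2,2n -k 1,1
# alice 25
# bob 30
# carol 30

# When every key compares equal, the whole line decides.
printf '%s\n' 'zed 1' 'amy 1' | sort -k 2,2n       # "amy 1" then "zed 1"
```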
Obsolete Restricted Sort Key
The notation +pos1 -pos2 restricts a sort key to one beginning at pos1
and ending at pos2. The characters at positions pos1 and pos2 are
included in the sort key (provided that pos2 does not precede pos1).
A missing -pos2 means the end of the line.
Specifying pos1 and pos2 involves the notion of a field, a minimal
sequence of characters followed by a field separator or a new-line.
By default, the first blank (space or tab) of a sequence of blanks
acts as the field separator. All blanks in a sequence of blanks are
considered to be part of the next field; for example, all blanks at
the beginning of a line are considered to be part of the first field.
pos1 and pos2 each have the form m.n optionally followed by one or
more of the flags bdfinrM. A starting position specified by +m.n is
interpreted to mean character n+1 in field m+1. A missing .n means
.0, indicating the first character of field m+1. If the b flag is in
effect, n is counted from the first non-blank in field m+1; +m.0b
refers to the first non-blank character in field m+1.
A last position specified by -m.n is interpreted to mean the nth
character (including separators) after the last character of the mth
field. A missing .n means .0, indicating the last character of the
mth field. If the b flag is in effect, n is counted from the last
leading blank in field m+1; -m.1b refers to the first non-blank in
field m+1.
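The equivalence between +w.x -y.z and -k listed under the -k option is mechanical, so it can be written as a small shell function. obsolete_to_k is hypothetical (it is not part of sort) and handles only the numeric positions; type flags such as b or n are not handled.

```sh
#!/bin/sh
# obsolete_to_k is a hypothetical helper, not part of sort. It prints the
# -k keydef equivalent to the obsolete "+w.x -y.z" positions, following
#   -k w+1.x+1,y.0    if z == 0
#   -k w+1.x+1,y+1.z  if z > 0
# Type flags (b, d, f, i, n, r, M) are not handled.
obsolete_to_k() {
    w=$1 x=$2 y=$3 z=$4
    if [ "$z" -eq 0 ]; then
        printf '%s\n' "-k $((w + 1)).$((x + 1)),$y.0"
    else
        printf '%s\n' "-k $((w + 1)).$((x + 1)),$((y + 1)).$z"
    fi
}

obsolete_to_k 1 0 2 0    # -k 2.1,2.0  (all of field 2)
obsolete_to_k 0 2 1 3    # -k 1.3,2.3
```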
EXTERNAL INFLUENCES
LC_COLLATE determines the default ordering rules applied to the sort.
LC_CTYPE determines the locale for interpretation of sequences of
bytes of text data as characters (e.g., single- versus multi-byte
characters in arguments and input files) and the behavior of character
classification for the -b, -d, -f, -i, and -n options.
LC_NUMERIC determines the definition of the radix and thousands
separator characters for the -n option.
LC_TIME determines the month names for the -M option.
LC_MESSAGES determines the language in which messages are displayed.
LC_ALL determines the locale to use to override the values of all the
other internationalization variables.
NLSPATH determines the location of message catalogs for the processing
of LC_MESSAGES.
LANG provides a default value for the internationalization variables
that are unset or null. If LANG is unset or null, the default value of
"C" (see lang(5)) is used.
If any of the internationalization variables contains an invalid
setting, sort behaves as if all internationalization variables are set
to "C". See environ(5).
International Code Set Support
Single- and multi-byte character code sets are supported.
EXAMPLES
Sort the contents of infile with the second field as the sort key:
sort -k 2,2 infile
Sort, in reverse order, the contents of infile1 and infile2, placing
the output in outfile and using the first two characters of the second
field as the sort key:
sort -r -o outfile -k 2.1,2.2 infile1 infile2
Sort, in reverse order, the contents of infile1 and infile2, using the
first non-blank character of the fourth field as the sort key:
sort -r -k 4.1b,4.1b infile1 infile2
Print the password file (/etc/passwd) sorted by numeric user ID (the
third colon-separated field):
sort -t: -k 3n,3 /etc/passwd
Print the lines of the presorted file infile, suppressing all but the
first occurrence of lines having the same third field:
sort -mu -k 3,3 infile
DIAGNOSTICS
sort exits with one of the following values:
0 All input files were output successfully, or -c was
specified and the input file was correctly presorted.
1 Under the -c option, the file was not ordered as specified,
or if the -c and -u options were both specified, two input
lines were found with equal keys. This exit status is not
returned if the -c option is not used.
>1 An error occurred, such as when one or more input lines are
too long.
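The exit values can be observed directly. A sketch assuming a POSIX shell; check_status and the path /nonexistent/input are hypothetical. For errors, this page only promises a value greater than 1, so the sketch checks no more than that.

```sh
#!/bin/sh
# check_status is a hypothetical helper: it feeds its arguments to
# "sort -c" as lines and prints the exit status of sort.
check_status() {
    status=0
    printf '%s\n' "$@" | LC_ALL=C sort -c 2>/dev/null || status=$?
    echo "$status"
}

check_status a b c        # 0: correctly presorted
check_status b a          # 1: not in order

# An error, here an unreadable input file, gives a value above 1.
err=0
LC_ALL=C sort /nonexistent/input 2>/dev/null || err=$?
echo "$err"               # greater than 1
```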
When the last line of an input file is missing a new-line character,
sort appends one, prints a warning message, and continues.
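A check of the appended new-line, assuming a POSIX shell. The warning message itself is implementation specific, so the sketch discards standard error and looks only at the output.

```sh
#!/bin/sh
# The input's last line, "a", has no trailing new-line. sort still
# writes it as a complete line, so the output holds two new-lines.
out=$(printf 'b\na' | LC_ALL=C sort 2>/dev/null)
lines=$(printf 'b\na' | LC_ALL=C sort 2>/dev/null | wc -l)
echo "$out"               # a, then b, one per line
echo $((lines))           # 2
```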
If an error occurs when accessing the tables that contain the
collation rules for the specified language, sort prints a warning
message and defaults to the POSIX locale.
If a -d, -f, or -i option is specified for a language with multi-byte
characters, sort prints a warning message and ignores the option.
WARNINGS
Numbering of fields and characters within fields (-k option) has
changed to conform to the POSIX standard. Beginning at HP-UX Release
9.0, the -k option numbers fields and characters within fields,
starting with 1. Prior to HP-UX Release 9.0, numbering started at 0.
A field separator specified by the -t option is recognized only if it
is a single-byte character.
The character type classification categories alpha, digit, space, and
print are not defined for multi-byte characters. For languages with
multi-byte characters, all characters are significant in comparisons.
For non-text input files, the behavior is undefined.
AUTHOR
sort was developed by OSF and HP.
FILES
SEE ALSO
comm(1), join(1), uniq(1), environ(5), lang(5).
STANDARDS CONFORMANCE
sort: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2