MySQL 5.7 Reference Manual Including MySQL NDB Cluster 7.5 and NDB Cluster 7.6
This section describes when MySQL can use an index to satisfy
an ORDER BY
clause, the
filesort
operation used when an index
cannot be used, and execution plan information available from
the optimizer about ORDER BY
.
An ORDER BY
with and without
LIMIT
may return rows in different orders,
as discussed in Section 8.2.1.17, “LIMIT Query Optimization”.
In some cases, MySQL may use an index to satisfy an
ORDER BY
clause and avoid the extra
sorting involved in performing a filesort
operation.
The index may also be used even if the ORDER
BY
does not match the index exactly, as long as
all unused portions of the index and all extra
ORDER BY
columns are constants in the
WHERE
clause. If the index does not
contain all columns accessed by the query, the index is used
only if index access is cheaper than other access methods.
Assuming that there is an index on
(
, the
following queries may use the index to resolve the
key_part1
,
key_part2
)ORDER BY
part. Whether the optimizer
actually does so depends on whether reading the index is
more efficient than a table scan if columns not in the index
must also be read.
In this query, the index on
(
enables
the optimizer to avoid sorting:
key_part1
,
key_part2
)
SELECT * FROM t1 ORDER BYkey_part1
,key_part2
;
However, the query uses SELECT *
,
which may select more columns than
key_part1
and
key_part2
. In that case,
scanning an entire index and looking up table rows to
find columns not in the index may be more expensive than
scanning the table and sorting the results. If so, the
optimizer is not likely to use the index. If
SELECT *
selects only the index
columns, the index is used and sorting avoided.
If t1
is an InnoDB
table, the table primary key is implicitly part of the
index, and the index can be used to resolve the
ORDER BY
for this query:
SELECTpk
,key_part1
,key_part2
FROM t1 ORDER BYkey_part1
,key_part2
;
In this query, key_part1
is
constant, so all rows accessed through the index are in
key_part2
order, and an index
on (
avoids
sorting if the key_part1
,
key_part2
)WHERE
clause is
selective enough to make an index range scan cheaper
than a table scan:
SELECT * FROM t1 WHEREkey_part1
=constant
ORDER BYkey_part2
;
In the next two queries, whether the index is used is
similar to the same queries without
DESC
shown previously:
SELECT * FROM t1 ORDER BYkey_part1
DESC,key_part2
DESC; SELECT * FROM t1 WHEREkey_part1
=constant
ORDER BYkey_part2
DESC;
In the next two queries,
key_part1
is compared to a
constant. The index is used if the
WHERE
clause is selective enough to
make an index range scan cheaper than a table scan:
SELECT * FROM t1 WHEREkey_part1
>constant
ORDER BYkey_part1
ASC; SELECT * FROM t1 WHEREkey_part1
<constant
ORDER BYkey_part1
DESC;
In the next query, the ORDER BY
does
not name key_part1
, but all
rows selected have a constant
key_part1
value, so the index
can still be used:
SELECT * FROM t1 WHEREkey_part1
=constant1
ANDkey_part2
>constant2
ORDER BYkey_part2
;
In some cases, MySQL cannot use indexes
to resolve the ORDER BY
, although it may
still use indexes to find the rows that match the
WHERE
clause. Examples:
The query uses ORDER BY
on different
indexes:
SELECT * FROM t1 ORDER BYkey1
,key2
;
The query uses ORDER BY
on
nonconsecutive parts of an index:
SELECT * FROM t1 WHEREkey2
=constant
ORDER BYkey1_part1
,key1_part3
;
The query mixes ASC
and
DESC
:
SELECT * FROM t1 ORDER BYkey_part1
DESC,key_part2
ASC;
The index used to fetch the rows differs from the one
used in the ORDER BY
:
SELECT * FROM t1 WHEREkey2
=constant
ORDER BYkey1
;
The query uses ORDER BY
with an
expression that includes terms other than the index
column name:
SELECT * FROM t1 ORDER BY ABS(key
); SELECT * FROM t1 ORDER BY -key
;
The query joins many tables, and the columns in the
ORDER BY
are not all from the first
nonconstant table that is used to retrieve rows. (This
is the first table in the
EXPLAIN
output that does
not have a const
join
type.)
The query has different ORDER BY
and
GROUP BY
expressions.
There is an index on only a prefix of a column named in
the ORDER BY
clause. In this case,
the index cannot be used to fully resolve the sort
order. For example, if only the first 10 bytes of a
CHAR(20)
column are
indexed, the index cannot distinguish values past the
10th byte and a filesort
is needed.
The index does not store rows in order. For example,
this is true for a HASH
index in a
MEMORY
table.
Availability of an index for sorting may be affected by the
use of column aliases. Suppose that the column
t1.a
is indexed. In this statement, the
name of the column in the select list is
a
. It refers to t1.a
,
as does the reference to a
in the
ORDER BY
, so the index on
t1.a
can be used:
SELECT a FROM t1 ORDER BY a;
In this statement, the name of the column in the select list
is also a
, but it is the alias name. It
refers to ABS(a)
, as does the reference
to a
in the ORDER BY
,
so the index on t1.a
cannot be used:
SELECT ABS(a) AS a FROM t1 ORDER BY a;
In the following statement, the ORDER BY
refers to a name that is not the name of a column in the
select list. But there is a column in t1
named a
, so the ORDER
BY
refers to t1.a
and the index
on t1.a
can be used. (The resulting sort
order may be completely different from the order for
ABS(a)
, of course.)
SELECT ABS(a) AS b FROM t1 ORDER BY a;
By default, MySQL sorts GROUP BY
queries as if
you also included col1
,
col2
, ...ORDER BY
in the query.
If you include an explicit col1
,
col2
, ...ORDER BY
clause that contains the same column list, MySQL optimizes
it away without any speed penalty, although the sorting
still occurs.
If a query includes GROUP BY
but you want
to avoid the overhead of sorting the result, you can
suppress sorting by specifying ORDER BY
NULL
. For example:
INSERT INTO foo SELECT a, COUNT(*) FROM bar GROUP BY a ORDER BY NULL;
The optimizer may still choose to use sorting to implement
grouping operations. ORDER BY NULL
suppresses sorting of the result, not prior sorting done by
grouping operations to determine the result.
GROUP BY
implicitly sorts by default
(that is, in the absence of ASC
or
DESC
designators for GROUP
BY
columns). However, relying on implicit
GROUP BY
sorting (that is, sorting in
the absence of ASC
or
DESC
designators) or explicit sorting
for GROUP BY
(that is, by using
explicit ASC
or DESC
designators for GROUP BY
columns) is
deprecated. To produce a given sort order, provide an
ORDER BY
clause.
If an index cannot be used to satisfy an ORDER
BY
clause, MySQL performs a
filesort
operation that reads table rows
and sorts them. A filesort
constitutes an
extra sorting phase in query execution.
To obtain memory for filesort
operations,
the optimizer allocates a fixed amount of
sort_buffer_size
bytes up
front. Individual sessions can change the session value of
this variable as desired to avoid excessive memory use, or
to allocate more memory as necessary.
A filesort
operation uses temporary disk
files as necessary if the result set is too large to fit in
memory. Some types of queries are particularly suited to
completely in-memory filesort
operations.
For example, the optimizer can use
filesort
to efficiently handle in memory,
without temporary files, the ORDER BY
operation for queries (and subqueries) of the following
form:
SELECT ... FROMsingle_table
... ORDER BYnon_index_column
[DESC] LIMIT [M
,]N
;
Such queries are common in web applications that display only a few rows from a larger result set. Examples:
SELECT col1, ... FROM t1 ... ORDER BY name LIMIT 10; SELECT col1, ... FROM t1 ... ORDER BY RAND() LIMIT 15;
For slow ORDER BY
queries for which
filesort
is not used, try lowering the
max_length_for_sort_data
system variable to a value that is appropriate to trigger a
filesort
. (A symptom of setting the value
of this variable too high is a combination of high disk
activity and low CPU activity.)
To increase ORDER BY
speed, check whether
you can get MySQL to use indexes rather than an extra
sorting phase. If this is not possible, try the following
strategies:
Increase the
sort_buffer_size
variable value. Ideally, the value should be large
enough for the entire result set to fit in the sort
buffer (to avoid writes to disk and merge passes), but
at minimum the value must be large enough to accommodate
15 tuples. (Up to 15 temporary disk files are merged and
there must be room in memory for at least one tuple per
file.)
Take into account that the size of column values stored
in the sort buffer is affected by the
max_sort_length
system
variable value. For example, if tuples store values of
long string columns and you increase the value of
max_sort_length
, the
size of sort buffer tuples increases as well and may
require you to increase
sort_buffer_size
. For
column values calculated as a result of string
expressions (such as those that invoke a string-valued
function), the filesort
algorithm
cannot tell the maximum length of expression values, so
it must allocate
max_sort_length
bytes
for each tuple.
To monitor the number of merge passes (to merge
temporary files), check the
Sort_merge_passes
status variable.
Increase the
read_rnd_buffer_size
variable value so that more rows are read at a time.
Change the tmpdir
system variable to point to a dedicated file system with
large amounts of free space. The variable value can list
several paths that are used in round-robin fashion; you
can use this feature to spread the load across several
directories. Separate the paths by colon characters
(:
) on Unix and semicolon characters
(;
) on Windows. The paths should name
directories in file systems located on different
physical disks, not different
partitions on the same disk.
With
EXPLAIN
(see Section 8.8.1, “Optimizing Queries with EXPLAIN”), you can check whether
MySQL can use indexes to resolve an ORDER
BY
clause:
In addition, if a filesort
is performed,
optimizer trace output includes a
filesort_summary
block. For example:
"filesort_summary": { "rows": 100, "examined_rows": 100, "number_of_tmp_files": 0, "sort_buffer_size": 25192, "sort_mode": "<sort_key, packed_additional_fields>" }
The sort_mode
value provides information
about the contents of tuples in the sort buffer:
<sort_key, rowid>
: This
indicates that sort buffer tuples are pairs that contain
the sort key value and row ID of the original table row.
Tuples are sorted by sort key value and the row ID is
used to read the row from the table.
<sort_key, additional_fields>
:
This indicates that sort buffer tuples contain the sort
key value and columns referenced by the query. Tuples
are sorted by sort key value and column values are read
directly from the tuple.
<sort_key,
packed_additional_fields>
: Like the
previous variant, but the additional columns are packed
tightly together instead of using a fixed-length
encoding.
EXPLAIN
does not distinguish
whether the optimizer does or does not perform a
filesort
in memory. Use of an in-memory
filesort
can be seen in optimizer trace
output. Look for
filesort_priority_queue_optimization
. For
information about the optimizer trace, see
Section 8.15, “Tracing the Optimizer”.