问题描述

我有一个大的 wordpress 数据库:

关键表中的行:

730K wp_posts
404K wp_terms
752K wp_term_relationships
27K wp_term_taxonomy
1.8 Million wp_postmeta

问题是我有一个查询需要 5 秒钟才能完成,我想在添加任何缓存之前优化查询。

mysql> SELECT wp_posts.ID
 FROM wp_posts
 INNER JOIN wp_term_relationships
 ON (wp_posts.ID = wp_term_relationships.object_id)
 LEFT JOIN wp_postmeta
 ON (wp_posts.ID = wp_postmeta.post_id
 AND wp_postmeta.meta_key = '_Original Post ID' )
 LEFT JOIN wp_postmeta AS mt1
 ON ( wp_posts.ID = mt1.post_id )
 WHERE 1=1
 AND wp_posts.ID NOT IN (731467)
 AND ( wp_term_relationships.term_taxonomy_id IN (5) )
 AND wp_posts.post_type = 'post'
 AND (wp_posts.post_status = 'publish'
 OR wp_posts.post_status = 'private')
 AND ( wp_postmeta.post_id IS NULL
 OR ( mt1.meta_key = '_Original Post ID'
 AND CAST(mt1.meta_value AS CHAR) = 'deleted' ) )
 GROUP BY wp_posts.ID
 ORDER BY  wp_posts.ID DESC
 LIMIT 0, 20;

这是结果:

+--------+
| ID     |
+--------+
| 731451 |
| 731405 |
| 731403 |
| 731397 |
| 731391 |
| 731385 |
| 731375 |
| 731363 |
| 731361 |
| 731353 |
| 731347 |
| 731345 |
| 731335 |
| 731331 |
| 731304 |
| 731300 |
| 731284 |
| 731273 |
| 731258 |
| 731254 |
+--------+

对查询进行解释会产生以下信息

+----+-------------+-----------------------+--------+------------------------------------------------------------+------------------+---------+----------------------------------------+--------+-----------------------------------------------------------+
| id | select_type | table                 | type   | possible_keys                                                 | key              | key_len | ref                                    | rows   |   Extra                                                     |
+----+-------------+-----------------------+--------+------------------------------------------------------------+------------------+---------+----------------------------------------+--------+-----------------------------------------------------------+
|  1 | SIMPLE      | wp_term_relationships | range  | PRIMARY,term_taxonomy_id                                      | term_taxonomy_id | 16      | NULL                                   | 130445 |   Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | wp_posts              | eq_ref | PRIMARY,post_name,type_status_date,post_parent,post_author | PRIMARY          | 8       | mydatabase.wp_term_relationships.object_id |      1 | Using where                                               |
|  1 | SIMPLE      | wp_postmeta           | ref    | post_id,meta_key                                           | post_id          | 8       | mydatabase.wp_term_relationships.object_id |      1 | Using where                                               |
|  1 | SIMPLE      | mt1                   | ref    | post_id                                                    | post_id          | 8       | mydatabase.wp_term_relationships.object_id |      1 | Using where                                               |
+----+-------------+-----------------------+--------+------------------------------------------------------------+------------------+---------+----------------------------------------+--------+-----------------------------------------------------------+

如何优化此查询以加载更快?我认为一个自定义索引将是去的方式,但不知道在哪些领域。另外我试图订购结果 wp_posts.ID DESC,但是得到同样的时间来执行查询。

最佳解决方案

我有完全相同的问题。这个问题不是可以修改的,而是可以修改一些你可能不应该使用的代码 (或者写一个过滤器或’drop-in’) 。问题是 SQL 语句中的 CAST 指令。它会在整个表完成任何事情之前,记录的数量,它需要一段时间。

捕获查询,删除以下"AND CAST(mt1.meta_value AS CHAR) = 'deleted'"并运行它,现在应该要快很多。

编辑:(更正) 将查询更改为"AND mt1.meta_value = 'deleted'"

我不知道开发人员在添加无用的 CAST 时是怎么想的,MySQL 没有它可以正常工作 (除了大小之外,TEXT 与 CHAR 没有任何区别) 。我相信有一些边的情况下,删除它不会给出所需的结果,但我还没有找到一个。

长活 Wordpress SQL X)

次佳解决方案

所以如果我在这个项目中使用 wordpress,我将要做的是创建一个反向的 post id 索引。我不认为这是”correct” 答案,有些人一定会完全不同意这种做法,但这正在为我生产中。

我从这个博客帖子中得到了这个想法:

https://www.igvita.com/2007/08/20/pseudo-reverse-indexes-in-mysql/

As I recently discovered, MySQL currently only supports storage of index values in ascending order….It took all of three weeks for AideRSS to hit the 3 million plus indexed blog posts, and in the process I could feel the site getting more sluggish: the descending order by clause was killing us. In the worst case, merging a union of several queries meant the performance hit of a filesort operation!

由于 mysql 的这个限制,下面的命令& 排序功能是查询中的瓶颈。

GROUP BY wp_posts.ID
ORDER BY  wp_posts.ID DESC

但是,从生产中平均 5-15 秒的平均值中删除这些条件,查询将在 20ms 内返回。但是问题是这些帖子按照最新到最新的顺序排列。我想要的是最新到最老的

SELECT wp_posts.*
FROM wp_posts
WHERE wp_posts.ID IN(
SELECT distinct(ID)
FROM wp_posts
INNER JOIN wp_term_relationships
ON (wp_posts.ID = wp_term_relationships.object_id)
LEFT JOIN wp_postmeta
ON (wp_posts.ID = wp_postmeta.post_id
AND wp_postmeta.meta_key = '_Original Post ID' )
LEFT JOIN wp_postmeta AS mt1
ON ( wp_posts.ID = mt1.post_id )
WHERE wp_posts.ID NOT IN (795025)
AND ( wp_term_relationships.term_taxonomy_id IN (1) )
AND wp_posts.post_type = 'post'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
AND ( wp_postmeta.post_id IS NULL
OR ( mt1.meta_key = '_Original Post ID'
AND CAST(mt1.meta_value AS CHAR) = 'deleted' ) ) ) limit 0, 20;

所以再次回到这个帖子:https://www.igvita.com/2007/08/20/pseudo-reverse-indexes-in-mysql/

Following Peter Zaitsev’s advice on faking a reverse index, I decided to sidestep our problem by creating a separate reverse timestamp for the publication time of an indexed blog post. The trick is, since all indexes are stored in ascending order, instead of storing the publication date, you need to store a ‘countdown’ value from some date in the future. A few SQL queries will do the trick:

而不是存储帖子创建日期”reversed”,我决定在 wp_posts 表格中以 Post 格式存储 Post ID 。

所以我在我的 wordpress 数据库的 mysql 中添加了另外一列到 wp_posts 表。此新列存储 int 负数。

alter table wp_posts add column reverse_post_id int;

更新当前帖子以获取新列的相反数字:

 update wp_posts set reverse_post_id = (ID/ -1);

然后我在这个新的 reverse_post_id 上创建一个索引:

  create index reverse_post_id_index on wp_posts(post_type,post_status,reverse_post_id);

目前我通过自定义 api 界面编程插入文章,所以我插入后创建反向的 post id 。在通过界面插入帖子后,我将添加一个钩子来在 wordpress 中创建 reverse_post_id 。

我还要添加一个 mysql 调度事件以一定的时间间隔来更新 wp_posts,其中 reverse_post_id 为空。

最终查询看起来像这样,在生产中的运行时间不超过 20 ms:

SELECT wp_posts.*
FROM wp_posts
WHERE wp_posts.ID IN(
SELECT distinct(ID)
FROM wp_posts **force index (reverse_post_id_index)**
INNER JOIN wp_term_relationships
ON (wp_posts.ID = wp_term_relationships.object_id)
LEFT JOIN wp_postmeta
ON (wp_posts.ID = wp_postmeta.post_id
AND wp_postmeta.meta_key = '_Original Post ID' )
LEFT JOIN wp_postmeta AS mt1
ON ( wp_posts.ID = mt1.post_id )
WHERE wp_posts.ID NOT IN (795025)
AND ( wp_term_relationships.term_taxonomy_id IN (1) )
AND wp_posts.post_type = 'post'
AND (wp_posts.post_status = 'publish'
OR wp_posts.post_status = 'private')
AND ( wp_postmeta.post_id IS NULL
OR ( mt1.meta_key = '_Original Post ID'
AND CAST(mt1.meta_value AS CHAR) = 'deleted' ) ) ) limit 0, 20;

请注意,添加 「强制索引 (reverse_post_id_index)」 将以最新的 desc 顺序返回 wp_posts,而不使用”order by” 操作。需要注意的是,reverese_post_id 不能为空。

再次,这可能不是正确的答案,但我发现使我的工作和我的情况的答案。

参考文献

注:本文内容整合自 Google/Baidu/Bing 辅助翻译的英文资料结果。如果您对结果不满意,可以加入我们改善翻译效果:薇晓朵技术论坛。