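Some webmasters use WordPress to run scraper sites that continuously pull in articles from all kinds of sources ("cloud scraping") and publish them to their site automatically. The biggest problem with a scraper site is that it ends up collecting many duplicate articles, so those duplicates need to be removed.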
Here are two ways to remove posts with duplicate titles in one pass:
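- Remove duplicate posts, keeping only one of each (this is the only method I tested, and it does work; always back up your data before deleting)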
CREATE TABLE my_tmp AS SELECT MIN(ID) AS col1 FROM wp_posts WHERE post_type = 'post' GROUP BY post_title;
DELETE FROM wp_posts WHERE post_type = 'post' AND ID NOT IN (SELECT col1 FROM my_tmp);
DROP TABLE my_tmp;
- Remove duplicate posts, keeping none of them
CREATE TABLE my_tmp AS SELECT ID AS col1 FROM wp_posts WHERE post_type = 'post' AND post_title IN (SELECT post_title FROM wp_posts WHERE post_type = 'post' GROUP BY post_title HAVING COUNT(*) > 1);
DELETE FROM wp_posts WHERE ID IN (SELECT col1 FROM my_tmp);
DROP TABLE my_tmp;
- The same "keep none" statements can also be run together on a single line
CREATE TABLE my_tmp AS SELECT ID AS col1 FROM wp_posts WHERE post_type = 'post' AND post_title IN (SELECT post_title FROM wp_posts WHERE post_type = 'post' GROUP BY post_title HAVING COUNT(*) > 1); DELETE FROM wp_posts WHERE ID IN (SELECT col1 FROM my_tmp); DROP TABLE my_tmp;
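To see what each method actually deletes, the statements can be tried on a throwaway table first. The sketch below is only a demonstration under stated assumptions: it uses Python's built-in sqlite3 module and a toy `wp_posts` table with just `ID`, `post_title` and `post_type` (the real WordPress table has many more columns), and it restricts both methods to `post_type = 'post'` with `HAVING COUNT(*) > 1`, which assumes you only want to deduplicate regular posts. The helper names (`make_db`, `keep_one`, `remove_all`, `remaining_ids`) are hypothetical. The temporary `my_tmp` table mirrors the statements above; MySQL rejects a DELETE whose subquery reads from the table being deleted from (error 1093), which is presumably why the statements go through a helper table.

```python
import sqlite3

# Hypothetical sample data for a toy wp_posts table: (ID, post_title, post_type).
SAMPLE_ROWS = [
    (1, "Hello", "post"),
    (2, "Hello", "post"),      # title used twice
    (3, "World", "post"),
    (4, "World", "post"),
    (5, "World", "post"),      # title used three times
    (6, "Unique", "post"),
    (7, "Hello", "revision"),  # not a post, must survive both methods
]


def make_db():
    """Build a fresh in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_type TEXT)"
    )
    conn.executemany("INSERT INTO wp_posts VALUES (?, ?, ?)", SAMPLE_ROWS)
    return conn


def keep_one(conn):
    """Method 1: keep the lowest ID for each post title, delete the other posts."""
    conn.executescript("""
        CREATE TABLE my_tmp AS
            SELECT MIN(ID) AS col1 FROM wp_posts
            WHERE post_type = 'post' GROUP BY post_title;
        DELETE FROM wp_posts
            WHERE post_type = 'post' AND ID NOT IN (SELECT col1 FROM my_tmp);
        DROP TABLE my_tmp;
    """)


def remove_all(conn):
    """Method 2: delete every post whose title occurs more than once."""
    conn.executescript("""
        CREATE TABLE my_tmp AS
            SELECT ID AS col1 FROM wp_posts
            WHERE post_type = 'post' AND post_title IN (
                SELECT post_title FROM wp_posts WHERE post_type = 'post'
                GROUP BY post_title HAVING COUNT(*) > 1);
        DELETE FROM wp_posts WHERE ID IN (SELECT col1 FROM my_tmp);
        DROP TABLE my_tmp;
    """)


def remaining_ids(conn):
    """Return the IDs still present, in ascending order."""
    return [row[0] for row in conn.execute("SELECT ID FROM wp_posts ORDER BY ID")]


if __name__ == "__main__":
    db = make_db()
    keep_one(db)
    print(remaining_ids(db))  # [1, 3, 6, 7]

    db = make_db()
    remove_all(db)
    print(remaining_ids(db))  # [6, 7]
```

With these sample rows, method 1 keeps IDs 1, 3 and 6 plus the revision row 7, while method 2 leaves only 6 and 7. With `HAVING COUNT(*) > 2` instead, "Hello", which appears exactly twice, would not be treated as a duplicate and posts 1 and 2 would survive.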
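The procedure is simple: paste the SQL statements above into the SQL box of your site's database tool (for example, the SQL tab in phpMyAdmin) and run them. If your WordPress installation uses a table prefix other than the default `wp_`, change `wp_posts` accordingly. These statements only delete rows from `wp_posts`; related rows in tables such as `wp_postmeta` and `wp_term_relationships` are left behind. (Note: back up your site before doing any of this.)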
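Because both methods delete rows permanently, a read-only preview can help confirm what will be removed before running them. The query below is essentially the grouping subquery from the "keep none" method run on its own, listing each duplicated post title with its count. It is shown against a toy in-memory SQLite table through Python's sqlite3 purely for demonstration (the table layout and the helper name `list_duplicate_titles` are assumptions); on a real site only the SQL string would be run against MySQL.

```python
import sqlite3


def list_duplicate_titles(conn):
    """Return (post_title, count) pairs for titles shared by more than one post."""
    return conn.execute(
        """
        SELECT post_title, COUNT(*) AS n
        FROM wp_posts
        WHERE post_type = 'post'
        GROUP BY post_title
        HAVING COUNT(*) > 1
        ORDER BY n DESC, post_title
        """
    ).fetchall()


# Toy table with only the columns the query needs (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT, post_type TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts (post_title, post_type) VALUES (?, ?)",
    [
        ("Hello", "post"), ("Hello", "post"),
        ("World", "post"), ("World", "post"), ("World", "post"),
        ("Unique", "post"),
    ],
)

print(list_duplicate_titles(conn))  # [('World', 3), ('Hello', 2)]
```

Each count minus one is the number of rows the "keep one" method would delete for that title; the "keep none" method would delete all of them.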