
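I have a table with some ids + titles. I want to make the title column unique, but it already has over 600,000 records, some of which are duplicates (sometimes dozens of times over).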

How can I delete all duplicates except one, so that I can add a unique key to the title column afterwards?


8 Answers


This command adds the unique key and drops every row that raises an error because of it, which removes the duplicates.

ALTER IGNORE TABLE table ADD UNIQUE KEY idx1(title); 

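Edit: Note that this command may not work with InnoDB tables in some versions of MySQL. See this post for a workaround. (Thanks to "an anonymous user" for this information.)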
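To make the "drop every row that violates the new key" behaviour concrete, here is a plain-Python model of it (no database involved). It assumes that the first row encountered for each title is the one kept and the later ones are discarded; treat that as an assumption, since which row survives depends on the order in which MySQL reads the table, not necessarily the lowest id. The function name and sample rows are hypothetical.

```python
def keep_first_per_title(rows):
    """Model of adding a unique key with IGNORE: scan (id, title) rows in
    order and discard any row whose title has already been seen."""
    seen = set()
    kept = []
    for row_id, title in rows:
        if title in seen:
            continue  # would violate the new unique key, so it is dropped
        seen.add(title)
        kept.append((row_id, title))
    return kept


rows = [(1, "bob"), (2, "alice"), (3, "bob"), (4, "bob"), (5, "alice")]
print(keep_first_per_title(rows))  # [(1, 'bob'), (2, 'alice')]
```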

answered 2010-05-19T16:45:22.287

Create a new table containing only the distinct rows of the original table. There may be other ways, but I find this the cleanest.

CREATE TABLE tmp_table AS SELECT DISTINCT [....] FROM main_table

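More specifically:
The faster way is to insert the distinct rows into a temporary table. Deleting the duplicates from a table of 8 million rows took me hours; using INSERT with DISTINCT took only 13 minutes.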

CREATE TABLE tempTableName LIKE tableName;
-- index the columns that define a duplicate; this may help SELECT DISTINCT
CREATE INDEX ix_all_id ON tableName(cellId,attributeId,entityRowId,value);
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value) SELECT DISTINCT cellId,attributeId,entityRowId,value FROM tableName;
-- empty the original table; dropping it here would leave nothing to insert back into
TRUNCATE TABLE tableName;
INSERT INTO tableName SELECT * FROM tempTableName;
DROP TABLE tempTableName;
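The copy-distinct-rows-and-copy-back approach can be sketched with Python's built-in sqlite3 module, assuming a hypothetical table `items(id, title)` rather than the answer's own columns. SQLite has no `CREATE TABLE ... LIKE` or `TRUNCATE`, so the copy is declared explicitly and the original is emptied with `DELETE`.

```python
import sqlite3


def dedupe_by_copy(conn):
    # Temporary copy of the hypothetical table items(id, title).
    conn.execute("CREATE TABLE items_tmp (id INTEGER PRIMARY KEY, title TEXT)")
    # One row per distinct title; id is omitted, so new ids are generated.
    conn.execute("INSERT INTO items_tmp (title) SELECT DISTINCT title FROM items")
    # Empty the original (not DROP) so the copy-back has a target table.
    conn.execute("DELETE FROM items")
    conn.execute("INSERT INTO items SELECT * FROM items_tmp")
    conn.execute("DROP TABLE items_tmp")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO items (title) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",), ("a",)],
)
dedupe_by_copy(conn)
titles = sorted(t for (t,) in conn.execute("SELECT title FROM items"))
print(titles)  # ['a', 'b', 'c']
```

Note that the surviving rows get new ids, just as in the MySQL version where the id column is left out of the `SELECT DISTINCT`.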
answered 2010-05-19T16:43:50.110

The following query deletes all duplicates except the row with the lowest value in the 'id' field:

DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id > t2.id AND t1.name = t2.name

Similarly, we can keep the row with the highest 'id' value as follows:

DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id < t2.id AND t1.name = t2.name
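SQLite does not support MySQL's multi-table `DELETE t1 FROM t1, t2` form, so this sketch (Python's sqlite3, hypothetical table `items(id, title)`) expresses the same self-join condition as a correlated `EXISTS` subquery. It is a translation of the idea, not the MySQL statement itself.

```python
import sqlite3


def delete_duplicates(conn, keep="lowest"):
    """Delete each row whose title also appears on a row with a lower id
    (keep="lowest") or a higher id (keep="highest")."""
    if keep == "lowest":
        op = "<"  # a row with a smaller id exists, so this one goes
    elif keep == "highest":
        op = ">"  # a row with a larger id exists, so this one goes
    else:
        raise ValueError("keep must be 'lowest' or 'highest'")
    # Inside the subquery the table is aliased t2, so items.* refers to the
    # row currently being considered for deletion.
    conn.execute(
        "DELETE FROM items WHERE EXISTS ("
        " SELECT 1 FROM items AS t2"
        f" WHERE t2.title = items.title AND t2.id {op} items.id)"
    )


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO items (title) VALUES (?)",
        [("bob",), ("alice",), ("bob",), ("bob",), ("alice",)],
    )
    return conn


for keep in ("lowest", "highest"):
    conn = make_db()
    delete_duplicates(conn, keep)
    print(keep, [r for (r,) in conn.execute("SELECT id FROM items ORDER BY id")])
# lowest [1, 2]
# highest [4, 5]
```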
answered 2017-11-19T11:34:47.097

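Since ALTER IGNORE TABLE has been deprecated in MySQL (it was removed in MySQL 5.7.4), you need to actually delete the duplicate data before adding the index.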

First, write a query that finds all the duplicates. Here I assume that email is the field that contains duplicates.

SELECT
    s1.email,
    s1.id,
    s1.created,
    s2.id,
    s2.created 
FROM 
    student AS s1 
INNER JOIN 
    student AS s2 
WHERE 
    /* Emails are the same */
    s1.email = s2.email AND
    /* DON'T select both accounts,
       only select the one created later.
       The serial id could also be used here */
    s2.created > s1.created 
;

Next, select only the distinct IDs of the duplicates:

SELECT 
    DISTINCT s2.id
FROM 
    student AS s1 
INNER JOIN 
    student AS s2 
WHERE 
    s1.email = s2.email AND
    s2.created > s1.created 
;
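Running the same preview query against a small sample (Python's sqlite3; the rows are made up) shows which IDs would be deleted. It assumes `created` is stored as ISO-8601 text, so string comparison matches chronological order; the join condition is moved into `ON`, which is equivalent here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, email TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO student (id, email, created) VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "2017-01-01"),
        (2, "b@example.com", "2017-01-02"),
        (3, "a@example.com", "2017-02-01"),
        (4, "a@example.com", "2017-03-01"),
    ],
)

# s2 is any row with the same email that was created later than s1,
# so only the oldest row per email never appears in the result.
dup_ids = [
    row_id
    for (row_id,) in conn.execute(
        "SELECT DISTINCT s2.id FROM student AS s1"
        " INNER JOIN student AS s2 ON s1.email = s2.email"
        " WHERE s2.created > s1.created ORDER BY s2.id"
    )
]
print(dup_ids)  # [3, 4]
```

Rows that share the same email and exactly the same `created` value are not caught by this condition; the comment in the first query notes that the serial id could be used instead.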

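Once you are sure the result contains only the duplicate IDs you want to delete, run the delete. You have to use (SELECT * FROM tblname) in place of the table name so that MySQL doesn't complain (otherwise it raises error 1093, "You can't specify target table ... for update in FROM clause").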

DELETE FROM
    student 
WHERE
    id
IN (
    SELECT 
        DISTINCT s2.id
    FROM 
        (SELECT * FROM student) AS s1 
    INNER JOIN 
        (SELECT * FROM student) AS s2 
    WHERE 
        s1.email = s2.email AND
        s2.created > s1.created 
);

Then create the unique index:

ALTER TABLE
    student
ADD UNIQUE INDEX
    idx_student_unique_email(email)
;
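The point of the delete step is that the unique index cannot be created while duplicates remain. This sketch (Python's sqlite3, made-up rows, `created` as ISO-8601 text) tries the index before and after the delete. SQLite accepts a subquery on the table being deleted from, so the `(SELECT * FROM student)` wrapper that MySQL needs is not required here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, email TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (1, "a@example.com", "2017-01-01"),
        (2, "b@example.com", "2017-01-02"),
        (3, "a@example.com", "2017-02-01"),
        (4, "a@example.com", "2017-03-01"),
    ],
)


def add_unique_email_index(conn):
    """Return True if the unique index was created, False if duplicates block it."""
    try:
        conn.execute("CREATE UNIQUE INDEX idx_student_unique_email ON student(email)")
        return True
    except sqlite3.IntegrityError:
        return False


before = add_unique_email_index(conn)  # fails: duplicates still present

# Delete every row that has an older row with the same email.
conn.execute(
    "DELETE FROM student WHERE id IN ("
    " SELECT s2.id FROM student AS s1"
    " INNER JOIN student AS s2 ON s1.email = s2.email"
    " WHERE s2.created > s1.created)"
)

after = add_unique_email_index(conn)
remaining = [r for (r,) in conn.execute("SELECT id FROM student ORDER BY id")]
print(before, after, remaining)  # False True [1, 2]
```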
answered 2017-03-23T16:12:08.230

This shows how to do it in SQL Server 2000. I'm not entirely familiar with MySQL syntax, but I'm sure there is something similar.

create table #titles (iid int identity (1, 1), title varchar(200))

-- Repeat this step many times to create duplicates
insert into #titles(title) values ('bob')
insert into #titles(title) values ('bob1')
insert into #titles(title) values ('bob2')
insert into #titles(title) values ('bob3')
insert into #titles(title) values ('bob4')


DELETE T  FROM 
#titles T left join 
(
  select title, min(iid) as minid from #titles group by title
) D on T.title = D.title and T.iid = D.minid
WHERE D.minid is null

Select * FROM #titles
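The same "keep MIN(iid) per title" idea can also be written as `DELETE ... WHERE iid NOT IN (SELECT MIN(iid) ... GROUP BY title)`. This is a swapped-in formulation rather than the LEFT JOIN above, sketched here with Python's sqlite3 and made-up titles. In MySQL the inner reference to the same table would need the `(SELECT * FROM ...)` wrapper discussed in other answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (iid INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO titles (title) VALUES (?)",
    [(t,) for t in ["bob", "bob1", "bob", "bob2", "bob1", "bob"]],
)

# Keep the smallest iid for each title; every other row is deleted.
conn.execute(
    "DELETE FROM titles WHERE iid NOT IN"
    " (SELECT MIN(iid) FROM titles GROUP BY title)"
)
rows = conn.execute("SELECT iid, title FROM titles ORDER BY iid").fetchall()
print(rows)  # [(1, 'bob'), (2, 'bob1'), (4, 'bob2')]
```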
answered 2010-05-19T16:43:59.327
delete from student where id in (
SELECT distinct(s1.`student_id`) from student as s1 inner join student as s2
where s1.`sex` = s2.`sex` and
s1.`student_id` > s2.`student_id` and
s1.`sex` = 'M'
    ORDER BY `s1`.`student_id` ASC
)
answered 2013-05-21T13:35:52.667

The solution posted by Nitin seems to be the most elegant/logical one.

However, it has one problem:

ERROR 1093 (HY000): You can't specify target table 'student' for update in FROM clause

This can be solved, however, by using (SELECT * FROM student) instead of student:

DELETE FROM student WHERE id IN (
SELECT distinct(s1.`student_id`) FROM (SELECT * FROM student) AS s1 INNER JOIN (SELECT * FROM student) AS s2
WHERE s1.`sex` = s2.`sex` AND
s1.`student_id` > s2.`student_id` AND
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)

Give your +1 to Nitin for coming up with the original solution.

answered 2013-09-05T17:27:02.790

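Deleting duplicates from a MySQL table is a common problem, and it usually comes with specific requirements. In case anyone is interested, here (Delete duplicate rows in MySQL) I explain how to use a temporary table to delete MySQL duplicates in a reliable and fast way, with examples for different use cases.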

In this case, something like this should work:

-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;

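-- add a unique constraint on the column that must become unique
-- (UNIQUE(id, title) would never fire, because id is already unique)
ALTER TABLE tmp_table1 ADD UNIQUE(title);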

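-- scan over the table to insert entries; INSERT IGNORE skips rows that would
-- violate the unique key, so the first row per title in ORDER BY order is kept
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY id;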

-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;
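The same temporary-table approach can be tried in SQLite through Python's sqlite3 module, where `INSERT OR IGNORE` plays the role of MySQL's `INSERT IGNORE`. Assumptions: the table is `table1(id, title)` with made-up rows, and SQLite needs one `ALTER TABLE ... RENAME TO` per table instead of MySQL's multi-table `RENAME TABLE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO table1 (id, title) VALUES (?, ?)",
    [(1, "x"), (2, "y"), (3, "x"), (4, "z"), (5, "y")],
)

# New table with the unique constraint on title only.
conn.execute(
    "CREATE TABLE tmp_table1 (id INTEGER PRIMARY KEY, title TEXT, UNIQUE (title))"
)
# Rows that would violate UNIQUE(title) are skipped, so the row inserted
# first for each title (lowest id, given ORDER BY id) is the one kept.
conn.execute(
    "INSERT OR IGNORE INTO tmp_table1 (id, title)"
    " SELECT id, title FROM table1 ORDER BY id"
)
# Keep the original as a backup, then put the deduplicated table in its place.
conn.execute("ALTER TABLE table1 RENAME TO backup_table1")
conn.execute("ALTER TABLE tmp_table1 RENAME TO table1")

rows = conn.execute("SELECT id, title FROM table1 ORDER BY id").fetchall()
print(rows)  # [(1, 'x'), (2, 'y'), (4, 'z')]
```

Keeping `backup_table1` around means the original data can be checked or restored before it is finally dropped.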
answered 2017-11-20T16:54:07.230