Re-indexing and removing duplicates



TCS
4-Jan-2006, 05:25 AM
I have a file which contains numerous duplications. I need to be able to run
a command to re-index and remove these duplicates. I have tried the dfsort
command with the -b and -d parameters; however, these just create the .bad
file. I know about the maintenance program and cleanup, but as there are so
many of these records it would take a long time to go through them all.


I would really appreciate someone's help on this, as it is happening to one of
our customers.


Regards

TimCS

DavePorter
4-Jan-2006, 06:40 AM
Hi there Tim,

How did the duplicate records come about ?
Is it corruption or because a new index has been added ?

Are you in a position to write a program to remove the duplicates ?

One option may be to read out the raw data to a text file (dfquery can
do that), use an editor to remove the duplicates (maybe one with a
macro facility like NoteTab Pro or similar, if possible), then use the
read.flx program to read the remaining data back into an empty table.
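
If an editor macro is too fiddly, a little script can do the de-duplication
on the dump instead. This is only a rough sketch, not tested against real
dfquery output - it assumes the dump is one record per line, tab-delimited,
with the duplicated key in the first column, and the file names are made up,
so adjust all of that to suit your table:

# dedup_dump.py -- rough sketch only, assumptions noted above.
# Keeps the first occurrence of each key and drops every repeat.

KEY_COLS = (0,)          # column positions that make up the "unique" key (assumption)
DELIM = "\t"             # assumed delimiter of the dump

seen = set()
with open("customer.txt", "r", encoding="latin-1") as src, \
     open("customer.dedup.txt", "w", encoding="latin-1") as dst:
    for line in src:
        fields = line.rstrip("\n").split(DELIM)
        key = tuple(fields[i] for i in KEY_COLS)
        if key in seen:
            continue     # repeat of a key we already kept - skip it
        seen.add(key)
        dst.write(line)

Then read the de-duplicated file back into the empty table with read.flx
as usual.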

HTH - cheers, Dave Porter



Michael Fenton
4-Jan-2006, 07:12 AM
If the .bad is a consistent list of duplicate recnums, and the duplicate
records appear in cleanup as identical pairs, then you could simply
read in the next line of the .bad, find one of the pair by its recnum,
delete it, and repeat. BUT if there are huge .bads, you need to know why
they are occurring. Are there program errors? Are there network or
server errors? Are there any bad-data errors listed in the .bad?

Michael Fenton

Tim Casson-Smith
4-Jan-2006, 07:43 AM
Guys

The duplicates occurred because of a hardware fault at the time. The index does not include the recnum. I have run an automatic cleanup after running a reindex through the file maintenance. However, it keeps stopping and stating that it cannot find records, which seems strange to me.

Regards

TimCS


Bob Worsley
4-Jan-2006, 08:44 AM
If you have large numbers of duplicates, DAC's utilities for removing them
will probably not work. They are designed for small numbers and will
usually crash miserably with a large bad file. This still happens in VDF
11, I proved it the other day. And that's not necessarily a bad thing, one
shouldn't be trying to remove huge quantities of dupes that way.

There are at least two ways to approach the problem, the hard way and the
easy way.
Hard way - if the dupes aren't complete duplicates, in that each subsequent
"dupe" contains more recent information than the previous one, you have a lot
of manual work ahead, because human intervention will be needed to make the
decisions. Someone would have to go through the file(s) by hand and delete
the unwanted records. Ick!
Easier way - dump to a flat file, set up your file so that the index really
is unique, add a bit of code to the import routine to ignore error 28 (the
duplicate-record error), and import. Kind of a crowbar method, but it will
work. The catch is that it's pot luck which of the duplicated records you end
up importing. If that's OK, go this route. Be sure to reindex afterward to
make sure the indexes didn't get broken during the process.

There is a third way, which is to write a routine that removes the
duplicates. You may well be able to do that, but only you know your data
well enough to decide what criteria would be necessary.
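
If it helps to decide between the hard way and the easier way, something
along these lines might tell you which keys are duplicated and whether the
duplicated records are byte-identical or genuinely conflict - identical ones
can be dropped blindly, conflicting ones need a human. Again this is only a
rough sketch in Python over a flat-file dump; the file name, the tab
delimiter and the key being in column 0 are all assumptions:

# dup_report.py -- rough sketch, assumptions noted above.
# Per duplicated key, report how many records there are and whether
# they are all identical or genuinely conflict.
from collections import defaultdict

records = defaultdict(list)
with open("customer.txt", "r", encoding="latin-1") as src:
    for line in src:
        key = line.split("\t", 1)[0]
        records[key].append(line.rstrip("\n"))

for key, rows in records.items():
    if len(rows) < 2:
        continue                       # not duplicated
    status = "identical" if len(set(rows)) == 1 else "CONFLICTING"
    print(f"{key}: {len(rows)} records, {status}")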

Bob Worsley

"TCS" <fred@bloggs.co.uk> wrote in message
news:UcN0#kREGHA.1712@dacmail.dataaccess.com...
> I have a file which contains numerous duplications. I need to be able to
run
> a command to re-index and remove these duplicates. I have tried the dfsort
> command with the -b parameters and the -d parameters. These however just
> create the .bad file. I know about the maintenance program and cleanup but
> as there are so many of these records it would take a long time to go
> through them all.
>
>
> I would really appreciate someones help on this as it is happening to one
of
> our customers.
>
>
> Regards
>
> TimCS
>
>

Ben Weijers
4-Jan-2006, 09:37 AM
Bob,

Would you please send me the .dat file (and .tag and such) with the dups
that crashes dbBuilder when trying to clean up. TIA,

Ben Weijers
Data Access Worldwide

Bob Worsley
4-Jan-2006, 01:14 PM
I'll have to think about where/when I did that, Ben. As I said, the
situation was with thousands of dupes; I had converted a non-unique index to
unique. And, considering what I did, I figured it really was out of the scope
of dbBldr to fix! <g> I'll try to duplicate it and send it if I can remember
which file I did it with.
Bob


"Ben Weijers" <Ben.Weijers@dataaccess.nl> wrote in message
news:J2p$rxTEGHA.1756@dacmail.dataaccess.com...
> Bob,
>
> WYP send me the dat file (and tag and such) with dups file that crashes
> dbBuilder when trying to cleanup. TIA,
>
> Ben Weijers
> Data Access Worldwide
>
>

Larry Heiges
4-Jan-2006, 01:58 PM
Maybe add recnum to each of the indexes, then write a program to step
through looking for dups. You may need some logic or human interface
to determine which dups to delete.

Larry Heiges
App-2-Win Systems, Inc.

JimNC9
4-Jan-2006, 02:12 PM
Run DFMAINT, select cleanup and set to "auto"


Jim /*

WebApp Hosting
http://www.advanceddesignsinc.com/Web%20Hosting.htm



"TCS" <fred@bloggs.co.uk> wrote in message
news:UcN0%23kREGHA.1712@dacmail.dataaccess.com...
>I have a file which contains numerous duplications. I need to be able to
>run
> a command to re-index and remove these duplicates. I have tried the dfsort
> command with the -b parameters and the -d parameters. These however just
> create the .bad file. I know about the maintenance program and cleanup but
> as there are so many of these records it would take a long time to go
> through them all.
>
>
> I would really appreciate someones help on this as it is happening to one
> of
> our customers.
>
>
> Regards
>
> TimCS
>
>

Knut Sparhell
4-Jan-2006, 09:51 PM

My approach:

1. Create all the indexes that fail because of duplicates as non-unique.

2. Make a program that runs through those indexes and either:

a) Deletes the duplicates, making a log of the deletions;
b) Changes the content of the indexed field(s) in a way that doesn't
lose any of the actual information, like adding "(DUPn)" at the end
of a string (see the sketch after this list); or
c) "Moves" the duplicates by changing the key, so that the index entry
changes as well.

3. Restore the indexes.

4. Check and correct manually, or make a program that helps the system
administrator find them and decide what to do with the "problem"
records, if any.
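
As a rough sketch of 2 b) - done against a flat-file dump rather than
directly against the table, and assuming a tab-delimited dump with the
indexed field in the first column (the file names are made up) - the second
and later occurrences of each key get "(DUPn)" appended, and every change is
logged so the administrator can review it in step 4:

# mark_dups.py -- rough sketch, assumptions noted above.
# Appends "(DUP1)", "(DUP2)", ... to repeats of a key and logs each change.
from collections import defaultdict

counts = defaultdict(int)
with open("customer.txt", "r", encoding="latin-1") as src, \
     open("customer.marked.txt", "w", encoding="latin-1") as dst, \
     open("dup_changes.log", "w", encoding="latin-1") as log:
    for line in src:
        fields = line.rstrip("\n").split("\t")
        key = fields[0]
        counts[key] += 1
        if counts[key] > 1:
            new_key = f"{key}(DUP{counts[key] - 1})"
            log.write(f"{key} -> {new_key}\n")
            fields[0] = new_key
        dst.write("\t".join(fields) + "\n")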

--
Knut Sparhell, Norway

TimCS
5-Jan-2006, 05:40 AM
Hi all

Sorry for not coming back sooner, but I discovered the auto option in cleanup, thanks. I really do appreciate everyone's input on this, as it was a major issue for the customer in question. I did have a bit of fun during this because I did not realise that only one index really needed to be rebuilt before the cleanup; I was getting "record not found" in the cleanup when I chose them all.

Regards

TimCS
