Find duplicate field values in ArcGIS using Python

As ESRI makes its move away from VBScript and towards Python, I’m slowly updating my stash of code snippets along the way. One piece of code I use quite often identifies duplicate field values in a layer’s attribute table. I find this particularly helpful when I’m cleaning up tax parcel data, looking for duplicate parcel-ID numbers or SBL strings. Since I’ve been working a lot with parcel data lately, I figured it was time to move this code over to Python, too. So, here it is in step-by-step fashion…

1 – Add a new “Short Integer” type field to your attribute table (I usually call mine "Dup").

2 – Open the field calculator for the new field

3 – Choose "Python" as the Parser

4 – Check "Show Codeblock"

5 – Insert the following code into the "Pre-Logic Script Code:" text box making sure you preserve the indents:


uniqueList = []
def isDuplicate(inValue):
  if inValue in uniqueList:
    return 1
  else:
    uniqueList.append(inValue)
    return 0


6 – In the lower text box, insert the following text, replacing "!InsertFieldToCheckHere!" with the name of the field you want to check for duplicates (keep the exclamation marks around the field name):


isDuplicate( !InsertFieldToCheckHere! )

This will populate your new field with 0’s and 1’s. A 1 marks a record whose value has already appeared earlier in the table; the first record with a given value gets a 0.
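
If you want to see what the code block does before running it on real data, here is a minimal sketch that runs the same isDuplicate function in plain Python, outside ArcGIS. The parcel_ids list is made-up sample data, and the Counter-based count at the end is a hypothetical variant (not part of the steps above) for when you want every copy of a repeated value identified rather than all but the first.

```python
from collections import Counter

# Same logic as the Pre-Logic Script Code above.
uniqueList = []
def isDuplicate(inValue):
  if inValue in uniqueList:
    return 1
  else:
    uniqueList.append(inValue)
    return 0

# Made-up sample values standing in for the rows of an attribute table.
parcel_ids = ["101-A", "102-B", "101-A", "103-C", "101-A"]

# The Field Calculator calls the function once per row, in table order.
flags = [isDuplicate(value) for value in parcel_ids]
print(flags)   # [0, 0, 1, 0, 1]

# Hypothetical variant: count every value first, then report how many
# times each row's value occurs. This needs the whole column up front,
# so it is a two-pass approach rather than a drop-in code block.
occurrences = Counter(parcel_ids)
counts = [occurrences[value] for value in parcel_ids]
print(counts)  # [3, 1, 3, 1, 3]
```

Note that the first "101-A" is flagged 0 and only the later copies get a 1, while in the count-based variant any value greater than 1 marks every copy of a repeated value.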

 


8 thoughts on “Find duplicate field values in ArcGIS using Python”

  1. Ralph

    Hi Don,

    thanks for sharing this.
    The shortcoming I see with the suggested approach is that the first occurrence of a duplicate (or triplicate, …) value will get the value “0” in the “Dup” field, indicating that it is not a duplicate. Of course, this can be a desirable feature in certain applications, but I think one often wants to identify every occurrence of a redundant value, not just all but the first.

    Also, the suggested analysis is only binary: “either duplicate, or more precisely, n-icate” or “not duplicate”. You don’t know the number of occurrences of a certain value.

    I suggested a similar method which avoids the two aforementioned shortcomings. It marks all redundant values as such, and it also gives the number of occurrences rather than just flagging redundancies.
    You can glean the algorithm here:
    http://geo.ebp.ch/2011/03/24/mehrfach-auftretende-werte-in-einer-attributtabelle-finden
    The site is in German, but I’m sure you can figure it out easily.

  2. Bruce B.

    I think the speed of this calculation could be amped up by putting the unique “list” into a dictionary, rather than a list. Works great; thanks, Don! I tend to disagree with Ralph. I think that, more often than not, especially if you’re dealing with some sort of field of unique identifiers, users will probably want to keep a single record of value and purge duplicates.

    uniqueDict = {}
    def isDuplicate(inValue):
      if inValue in uniqueDict:
        return "Duplicated Record"
      else:
        uniqueDict.update({inValue:inValue})
        return "Single Record"

    1. Mallikarjun

      It’s awesome… The time taken is exactly the same as calculating with a simple expression.

      Thanks a lot….

  3. Haley

    Good Morning! Is there a way to modify this code to use the field calculator to give both occurrences of the item the value of “1” instead of just the latter? I’m looking to do this in the field calculator, not the Python window (which may not be possible (?) since none of the blogs contain a way to do it using the field calculator), and I didn’t find the German site useful. Thanks! – H

