Find duplicate field values in ArcGIS using Python

As ESRI is making its move away from VBScript and towards Python, I’m also slowly updating my stash of code snippets along the way. One of those little pieces of code I use quite often is one that identifies duplicate field values in a layer’s attribute table. I find this particularly helpful when I’m cleaning up tax parcel data, looking for duplicate parcel-ID numbers or SBL strings. Since I’ve been working a lot with parcel data lately, I figured it was time to move this code over to Python, too. So, here it is in step-by-step fashion…

1 – Add a new “Short Integer” type field to your attribute table (I usually call mine "Dup").

2 – Open the field calculator for the new field

3 – Choose "Python" as the Parser

4 – Check "Show Codeblock"

5 – Insert the following code into the "Pre-Logic Script Code:" text box, making sure you preserve the indents:


uniqueList = []
def isDuplicate(inValue):
  if inValue in uniqueList:
    return 1
  else:
    uniqueList.append(inValue)
    return 0


6 – In the lower text box, insert the following text, replacing "!InsertFieldToCheckHere!" with whichever field you want to check for duplicates:


isDuplicate( !InsertFieldToCheckHere! )


This will populate your new field with 0’s and 1’s. A 1 marks a record whose value already appeared in an earlier record; the first record with a given value gets a 0.
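
To see what the field calculator does with this code, here is a minimal sketch that runs the same function outside ArcGIS. The list of parcel IDs is made up for illustration; in the field calculator, ArcGIS calls the function once per row instead of the list comprehension used here.

```python
uniqueList = []

def isDuplicate(inValue):
  # Same logic as the Pre-Logic Script Code above
  if inValue in uniqueList:
    return 1
  else:
    uniqueList.append(inValue)
    return 0

# Hypothetical parcel IDs, in table order (not real data)
parcel_ids = ["101-A", "102-B", "101-A", "103-C", "102-B"]

# One call per "row", just as the field calculator would do
dup_flags = [isDuplicate(v) for v in parcel_ids]
print(dup_flags)  # [0, 0, 1, 0, 1]
```

Note that the first "101-A" and the first "102-B" get 0; only the later repeats are flagged.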

 


Reader Comments

  1. Hi Don,
    thanks for sharing this.
    The shortcoming I see with the suggested approach is that the first occurrence of a duplicate (or triplicate, …) value will get the value “0” in the “Dup” field, indicating that it is not a duplicate. Of course, this can be a desirable feature in certain applications, but I think one often wants to identify all redundant values, not all but the first occurrence.
    Also, the suggested analysis is only binary: “either duplicate, or more precisely, n-icate” or “not duplicate”. You don’t know the number of occurrences of a certain value.
    I suggested a similar method which avoids the two aforementioned shortcomings. It indicates all redundant values as such, and it also indicates the number of occurrences rather than just flagging redundancies.
    You can glean the algorithm here:
    http://geo.ebp.ch/2011/03/24/mehrfach-auftretende-werte-in-einer-attributtabelle-finden
    The site is in German, but I’m sure you can figure it out easily.

  2. I think the speed of this calculation could be amped up by putting the unique “list” into a dictionary, rather than a list. Works great; thanks, Don! I tend to disagree with Ralph. I think that, more often than not, especially if you’re dealing with some sort of field of unique identifiers, users will probably want to keep a single record of each value and purge the duplicates.
    uniqueDict = {}
    def isDuplicate(inValue):
      if inValue in uniqueDict:
        return "Duplicated Record"
      else:
        uniqueDict.update({inValue:inValue})
        return "Single Record"

    1. It’s awesome… The time taken is exactly the same as calculating with a simple expression.
      Thanks a lot….
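
The speed-up from the dictionary suggestion above comes from hashing: a membership test against a list scans it item by item, while a dictionary or set lookup takes roughly constant time on average. Since only the keys matter here, a set is a slightly simpler fit. A minimal sketch, keeping the 0/1 return values of the original post (the name `seen` is just an illustrative choice):

```python
seen = set()

def isDuplicate(inValue):
  # Set membership is a hash lookup, so it stays fast on large tables
  if inValue in seen:
    return 1
  seen.add(inValue)
  return 0
```

The field calculator expression stays the same as in step 6.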

  3. Works exactly as described – very easy to use for a Python beginner…

  4. Good Morning! Is there a way to modify this code to use the field calculator to give both occurrences of the item the value of “1” instead of just the latter? I’m looking to do this in the field calculator, not the Python window (which may not be possible (?) since none of the blogs contain a way to do it using the field calculator), and I didn’t find the German site useful. Thanks! – H

  5. Great “quick and dirty” solution. This is Python the way it is supposed to be used in GIS. Thanks!

  6. Thanks! Is it possible to use two fields in conjunction, i.e. isDuplicate( !Field1! !Field2!)?
    (This approach obviously didn’t work. Is there a way to do it?)
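
One way to check two fields together is to pass both to the function, separated by a comma, and treat the pair as a single key. This is a sketch rather than a tested ArcGIS recipe; `Field1` and `Field2` are the placeholder names from the question, and `seen` is an illustrative name.

```python
seen = set()

def isDuplicate(value1, value2):
  # The tuple (value1, value2) acts as one combined key
  key = (value1, value2)
  if key in seen:
    return 1
  seen.add(key)
  return 0
```

The expression in the lower text box would then be isDuplicate( !Field1!, !Field2! ). A row is flagged only when both values together repeat an earlier row.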

  7. Good day!! Please, is there a way I could filter or select the records and their corresponding duplicates without going through all the records?
    Thanks.

  8. Hello, I am having a similar problem. I have tried to use the code from the German site, but it is just not working for me. I know basically nothing about all of this coding stuff, so I really need a detailed view of what I need to type in the Pre-Logic Script Code box. The name of my feature class is base_Clip2, and the attribute column I want to find duplicates in is called STREETNUMB. I just want the output column to say 0 for each row that doesn’t have a duplicate and 1 for each row that does. The pre-logic code I am using is from the German site, as follows:
    import arcpy
    uniqueList = {}
    ## Set the name of the feature class here
    fc = "testpoints"
    rows = arcpy.SearchCursor(fc)
    for row in rows:
      ## Set the name of the attribute here
      value = row.getValue("type")
      if value not in uniqueList:
        uniqueList[value] = 1
      else:
        uniqueList[value] = uniqueList[value] + 1
    def findIncidence(inValue):
      return uniqueList[inValue]
    Where I think I am getting confused is what I should input, and where, for my specific case.
    For uniqueList = {}, do I put anything in here, and if so, will it have ! or " or nothing before and after it?
    For fc = "testpoints", is this just base_Clip2, and do I put ! or " or nothing before and after it within the quotation marks?
    For value = row.getValue("type"), do I put STREETNUMB, and if so, do I put ! or " or nothing before and after it within the quotation marks?
    For the other [value]’s, do I put anything in these, or do I leave them alone since value is already defined?
    Same question for (inValue)?
    Please, I really need help with this. I just don’t understand what I am doing.

Comments are closed.