The Contains Duplicate problem is a common coding challenge where the goal is to determine if an array (or list in Python) contains any duplicate elements. This problem tests your understanding of basic data structures, array manipulation, and the use of hash-based collections for optimized searching.

In this post, we will explore different methods to solve this problem using Python, starting with a simple brute-force solution, followed by a more efficient sorting-based approach, and finally an optimal solution using a Python set.

Problem Definition

You are given a list of integers nums, and you need to determine if any value appears more than once in the list. Return True if any value appears at least twice, otherwise return False.

Example:

Input: nums = [1, 2, 3, 1]
Output: True

Input: nums = [1, 2, 3, 4]
Output: False

Input: nums = [1, 1, 1, 3, 3, 4, 3, 2, 4, 2]
Output: True

Constraints:

  • The length of the list can be up to (10^5) elements.
  • Each element can range from (-10^9) to (10^9).

Let’s explore various ways to solve this problem using Python.

Solution 1: Brute-Force Approach

Approach:

The brute-force method checks every pair of elements in the list to see if they are equal. If any two elements are the same, the function returns True. If no duplicates are found, it returns False.

This approach uses two nested loops to compare each element with every other element in the list.

Code Implementation:

def contains_duplicate(nums):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return True
    return False

# Test the function
nums = [1, 2, 3, 1]
print(contains_duplicate(nums))  # Output: True

Time and Space Complexity:

  • Time Complexity: O(n²). For each element, we check all other elements, which leads to a quadratic time complexity.
  • Space Complexity: O(1). The algorithm uses only a few variables and no additional data structures.

Explanation:

This solution uses two loops: the outer loop picks an element, and the inner loop compares it to all elements that follow. If two elements are found to be equal, the function returns True immediately. If no duplicates are found by the end of the loops, it returns False.

Drawbacks:

While this approach works, it is inefficient for large lists due to its O(n²) time complexity, making it unsuitable for lists with thousands or millions of elements.

Solution 2: Sorting Approach

Approach:

A more efficient way to solve this problem is by first sorting the list. Once the list is sorted, duplicates will be adjacent to each other, and we can simply check if any two consecutive elements are the same.

Code Implementation:

def contains_duplicate(nums):
    nums.sort()  # Sort the list
    for i in range(1, len(nums)):
        if nums[i] == nums[i - 1]:  # Check adjacent elements
            return True
    return False

# Test the function
nums = [1, 2, 3, 1]
print(contains_duplicate(nums))  # Output: True

Time and Space Complexity:

  • Time Complexity: O(n log n). The sorting operation dominates the time complexity, as sorting takes O(n log n).
  • Space Complexity: O(1) if sorting is done in place, but Python's sort() method operates in place by default, so no extra space is used aside from the input list.

Explanation:

In this solution, we sort the list first. Sorting brings all duplicate elements next to each other. After sorting, we iterate through the list and check if any two consecutive elements are the same. If they are, the function returns True. Otherwise, it returns False after completing the loop.

Benefits and Drawbacks:

  • Benefits: More efficient than the brute-force approach, as sorting reduces the number of comparisons.
  • Drawbacks: Sorting modifies the original list, which might not be desirable in some cases. Also, it has a time complexity of O(n log n), which is not optimal for large datasets.

Solution 3: Using a Set (Optimal Solution)

Approach:

The most efficient solution involves using a set. A set is an unordered collection of unique elements, and it allows for constant-time lookups and insertions on average. We can iterate through the list and check if the element is already in the set. If it is, we return True because a duplicate has been found. If not, we add the element to the set and continue.

Code Implementation:

def contains_duplicate(nums):
    seen = set()  # Create an empty set
    for num in nums:
        if num in seen:  # Check if num is already in the set
            return True
        seen.add(num)  # Add num to the set
    return False

# Test the function
nums = [1, 2, 3, 1]
print(contains_duplicate(nums))  # Output: True

Time and Space Complexity:

  • Time Complexity: O(n). We iterate through the list once, and both lookup and insertion operations in a set take O(1) on average.
  • Space Complexity: O(n). We use extra space for the set to store the elements.

Explanation:

In this solution, we use a set to store elements as we traverse the list. For each element, we check if it is already present in the set. If it is, we return True because a duplicate has been found. If not, we add the element to the set and move on to the next one. If we finish the loop without finding any duplicates, we return False.

Benefits and Drawbacks:

  • Benefits: This is the most efficient solution in terms of time complexity, processing each element in constant time on average.
  • Drawbacks: The space complexity is higher due to the extra memory used for the set. However, this trade-off is usually acceptable given the performance benefits.

Conclusion

In this post, we explored three different ways to solve the Contains Duplicate problem in Python:

  1. Brute-force approach: Simple to understand but inefficient for large lists with a time complexity of O(n²).
  2. Sorting approach: Faster than brute-force, with O(n log n) time complexity, but modifies the original list.
  3. Set approach: The optimal solution with O(n) time complexity and O(n) space complexity, making it the best choice for most cases.

When solving this problem, the set-based solution is typically the most practical, as it balances time efficiency with ease of implementation.