The Contains Duplicate problem is a common algorithmic challenge: given an array of integers, determine whether it contains any duplicate elements. It appears frequently in coding interviews and is a good exercise in array manipulation and hash-based data structures.

In this post, we will explore several approaches to solving this problem in C#, starting with a brute-force solution, moving to a more optimized sorting-based approach, and concluding with the most efficient method using a HashSet.

Problem Definition

You are given an array of integers nums. Return true if any value appears at least twice in the array, and false if every element is distinct.

Example:

Input: nums = [1, 2, 3, 1]
Output: true

Input: nums = [1, 2, 3, 4]
Output: false

Input: nums = [1, 1, 1, 3, 3, 4, 3, 2, 4, 2]
Output: true

Constraints:

  • The length of the array can be up to 10^5 elements.
  • Each element in the array can range from -10^9 to 10^9.

Let’s explore different ways to solve this problem using C#.

Solution 1: Brute-Force Approach

Approach:

In the brute-force approach, we compare every element in the array with every other element to check if they are the same. If we find two elements that are equal, we return true. If no duplicates are found by the end, we return false.

Code Implementation:

using System;

public class Solution {
    public bool ContainsDuplicate(int[] nums) {
        for (int i = 0; i < nums.Length; i++) {
            for (int j = i + 1; j < nums.Length; j++) {
                if (nums[i] == nums[j]) {
                    return true;  // Duplicate found
                }
            }
        }
        return false;  // No duplicates found
    }

    public static void Main() {
        int[] nums = {1, 2, 3, 1};
        Solution sol = new Solution();
        Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: true
    }
}

Time and Space Complexity:

  • Time Complexity: O(n²) due to two nested loops.
  • Space Complexity: O(1) since no additional space is used, aside from a few variables.

Explanation:

This brute-force solution uses two loops to compare each element of the array with every other element. If any two elements are equal, the function returns true. If no duplicates are found after all comparisons, the function returns false.

Drawbacks:

The brute-force approach is inefficient for large arrays due to its O(n²) time complexity: at the maximum input size of 10^5 elements, the nested loops perform on the order of 5 × 10^9 comparisons, which is impractical.

Solution 2: Sorting Approach

Approach:

A more efficient way to solve this problem is by sorting the array first. After sorting, any duplicate elements will be adjacent to each other. We can then simply check if any two consecutive elements are the same.

Code Implementation:

using System;

public class Solution {
    public bool ContainsDuplicate(int[] nums) {
        Array.Sort(nums);  // Sort the array
        for (int i = 1; i < nums.Length; i++) {
            if (nums[i] == nums[i - 1]) {  // Compare adjacent elements
                return true;
            }
        }
        return false;  // No duplicates found
    }

    public static void Main() {
        int[] nums = {1, 2, 3, 1};
        Solution sol = new Solution();
        Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: true
    }
}

Time and Space Complexity:

  • Time Complexity: O(n log n) due to the sorting operation.
  • Space Complexity: O(1) auxiliary space, since Array.Sort() sorts the array in place (setting aside the sort's internal recursion stack).

Explanation:

This approach works by first sorting the array using C#’s Array.Sort() method. Once the array is sorted, adjacent elements are checked for equality. If any two consecutive elements are the same, we return true. Otherwise, if no duplicates are found, the function returns false.

Benefits and Drawbacks:

  • Benefits: More efficient than brute force; once the array is sorted, any duplicates sit next to each other, so a single linear scan suffices.
  • Drawbacks: Sorting modifies the original array (a copy-based variant is sketched below), and the O(n log n) time complexity is still slower than the O(n) achievable with a hash-based approach.
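
Where mutating the caller's array is unacceptable, a minimal variant of the same approach sorts a copy instead, trading the O(1) space advantage for O(n) extra space:

using System;

public class Solution {
    public bool ContainsDuplicate(int[] nums) {
        int[] sorted = (int[])nums.Clone();  // Work on a copy so the caller's array is untouched
        Array.Sort(sorted);
        for (int i = 1; i < sorted.Length; i++) {
            if (sorted[i] == sorted[i - 1]) {
                return true;  // Adjacent equal elements after sorting mean a duplicate
            }
        }
        return false;  // No duplicates found
    }
}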

Solution 3: Using a HashSet (Optimal Solution)

Approach:

The most efficient way to solve this problem is with a HashSet. A HashSet stores only unique elements, and both insertion and lookup run in O(1) time on average. As we iterate through the array, we check whether the current element is already in the set. If it is, we return true, indicating a duplicate; otherwise, we add it to the set and continue.

Code Implementation:

using System;
using System.Collections.Generic;

public class Solution {
    public bool ContainsDuplicate(int[] nums) {
        HashSet<int> seen = new HashSet<int>();  // Create a HashSet to store unique elements
        foreach (int num in nums) {
            if (seen.Contains(num)) {  // Check if num is already in the set
                return true;  // Duplicate found
            }
            seen.Add(num);  // Add num to the set
        }
        return false;  // No duplicates found
    }

    public static void Main() {
        int[] nums = {1, 2, 3, 1};
        Solution sol = new Solution();
        Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: true
    }
}

Time and Space Complexity:

  • Time Complexity: O(n) since we are iterating through the array once, and set operations (insertions and lookups) take O(1) on average.
  • Space Complexity: O(n) due to the extra space used by the HashSet to store the elements.

Explanation:

In this approach, we use a HashSet to store elements as we iterate through the array. For each element, we check if it already exists in the set. If it does, we return true. If not, we add the element to the set. If no duplicates are found after the loop completes, the function returns false.

Benefits and Drawbacks:

  • Benefits: This is the most efficient solution in terms of time complexity, with O(n) performance. It does not modify the original array, and it can be written even more compactly (see the variant below).
  • Drawbacks: The extra memory required to store the set makes the space complexity O(n), but this is typically an acceptable trade-off for performance.
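
As a side note, HashSet&lt;int&gt;.Add itself returns false when the element is already in the set, so the separate Contains check can be folded into the Add call. A sketch of the same solution using that idiom:

using System;
using System.Collections.Generic;

public class Solution {
    public bool ContainsDuplicate(int[] nums) {
        HashSet<int> seen = new HashSet<int>();
        foreach (int num in nums) {
            if (!seen.Add(num)) {  // Add returns false if num was already present
                return true;  // Duplicate found
            }
        }
        return false;  // No duplicates found
    }
}

Besides being shorter, this hashes each element once instead of twice (once for Contains and once for Add).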

Edge Cases

When solving the Contains Duplicate problem, it is essential to account for edge cases so the solution works in all scenarios. Below are the critical cases to consider and test. (The snippets that follow assume a Solution instance named sol, as in the Main methods above.)

1. Empty Array:

If the input array is empty, there are no elements to compare, so the function should return false.

Example:

int[] nums = new int[] {};
Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: false

  • Expected Output: Since there are no elements in the array, there cannot be any duplicates, so the output should be false.

2. Array with One Element:

If the input array contains only one element, it is impossible to have duplicates, so the function should return false.

Example:

int[] nums = new int[] { 1 };
Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: false

  • Expected Output: Since there is only one element, no duplicate is possible, so the output should be false.

3. Array with All Unique Elements:

If the input array contains entirely unique elements (i.e., no duplicates), the function should return false.

Example:

int[] nums = new int[] { 1, 2, 3, 4, 5 };
Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: false

  • Expected Output: Since all the elements are unique, there are no duplicates, so the output should be false.

4. Array with All Duplicate Elements:

If the entire array consists of the same element repeated multiple times, the function should return true.

Example:

int[] nums = new int[] { 1, 1, 1, 1, 1 };
Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: true

  • Expected Output: Since every element in the array is the same, the function should return true.

5. Array with Negative Numbers:

The array may contain negative numbers. The solution should correctly handle these and return whether duplicates exist.

Example:

int[] nums = new int[] { -1, -2, -3, -1 };
Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: true

  • Expected Output: The element -1 appears twice, so the function should return true.

6. Array with Negative and Positive Numbers:

The array may contain both negative and positive numbers. Ensure the function treats a negative value and its positive counterpart (e.g., -3 and 3) as distinct.

Example:

int[] nums = new int[] { -1, 2, -3, 3 };
Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: false

  • Expected Output: Since all elements are unique, the function should return false.

7. Array with Zeros:

The array may contain zeros, and the function should handle multiple zeros correctly.

Example:

int[] nums = new int[] { 0, 1, 2, 0 };
Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: true

  • Expected Output: The element 0 appears twice, so the function should return true.

8. Array with Mixed Duplicates and Unique Elements:

The array may contain a mix of unique and duplicate elements. The function should return true as soon as a duplicate is found.

Example:

int[] nums = new int[] { 1, 2, 3, 2, 4 };
Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: true

  • Expected Output: The element 2 appears twice, so the function should return true.

9. Array with Large Numbers:

The array may contain very large or very small integers within the range [-10^9, 10^9]. Ensure that the solution works with large inputs.

Example:

int[] nums = new int[] { -1000000000, 1000000000 };
Console.WriteLine(sol.ContainsDuplicate(nums));  // Output: false

  • Expected Output: Since the two elements are different, the function should return false. (Both -10^9 and 10^9 fit comfortably in C#'s 32-bit int, whose range is roughly ±2.1 × 10^9, so no overflow concerns arise.)

10. Array with Maximum Length (10^5 elements):

If the array contains the maximum number of elements (100,000), the solution must handle the input efficiently. This is where the brute-force solution breaks down: its O(n²) time complexity makes it impractical at this size, while the sorting and HashSet solutions remain fast. A rough timing harness for this case is sketched below.
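
Here is a rough harness for this case, assuming the HashSet-based Solution class from Solution 3 (the EdgeCaseBenchmark class name and the all-distinct worst-case input are just for illustration; timings will vary by machine):

using System;
using System.Diagnostics;

public class EdgeCaseBenchmark {
    public static void Main() {
        int n = 100_000;
        int[] nums = new int[n];
        for (int i = 0; i < n; i++) {
            nums[i] = i;  // All distinct: the worst case, since the whole array must be scanned
        }

        Solution sol = new Solution();
        Stopwatch sw = Stopwatch.StartNew();
        bool result = sol.ContainsDuplicate(nums);
        sw.Stop();

        Console.WriteLine($"Result: {result}, elapsed: {sw.ElapsedMilliseconds} ms");
        // Expected: Result: False, in on the order of milliseconds for the HashSet solution.
    }
}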

Conclusion

In this post, we explored three different ways to solve the Contains Duplicate problem in C#:

  1. Brute-force approach: Simple but inefficient with O(n²) time complexity.
  2. Sorting approach: More efficient with O(n log n) time complexity, but it modifies the original array.
  3. HashSet approach: The optimal solution with O(n) time complexity and O(n) space complexity, making it the best choice for large datasets.

In practice, the HashSet approach is often the best choice because of its speed and simplicity, especially when working with large arrays.
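
As a closing aside, the same HashSet idea collapses into a one-liner, which is often all that is needed in everyday C# (the class name here is just for illustration):

using System;
using System.Collections.Generic;

public class OneLinerDemo {
    // The HashSet constructor deduplicates its input, so the set's Count is
    // smaller than nums.Length exactly when the array contains a duplicate.
    public static bool ContainsDuplicate(int[] nums) =>
        new HashSet<int>(nums).Count < nums.Length;

    public static void Main() {
        Console.WriteLine(ContainsDuplicate(new[] { 1, 2, 3, 1 }));  // True
        Console.WriteLine(ContainsDuplicate(new[] { 1, 2, 3, 4 }));  // False
    }
}

Note that this variant always materializes the full set, so it lacks the early exit of the loop version; when returning as soon as a duplicate is seen matters, the explicit loop from Solution 3 is still preferable.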