The Contains Duplicate problem is a common challenge in coding interviews and competitive programming. The task is to determine whether an array contains any duplicate elements. In this post, we will walk through different solutions to this problem using the C programming language. We'll start with a brute-force approach, proceed to a sorting-based solution, and then explore an efficient method using a hash table.

Problem Definition

You are given an array of integers nums and the task is to check whether any value appears at least twice in the array. The function should return 1 (true) if any value appears more than once, and 0 (false) otherwise.

Example:

Input: nums = [1,2,3,1]
Output: 1 (true)

Input: nums = [1,2,3,4]
Output: 0 (false)

Input: nums = [1,1,1,3,3,4,3,2,4,2]
Output: 1 (true)

Constraints:

  • The size of the array can range from 1 to (10^5).
  • Each element in the array can range from (-10^9) to (10^9).

Let’s explore different ways to solve this problem in C.

Solution 1: Brute-Force Approach

Approach:

In the brute-force approach, we compare every element in the array with every other element to check if they are the same. If we find two elements that are equal, we return 1. If no duplicates are found by the end of the comparisons, we return 0.

Code Implementation:

#include <stdio.h>

int containsDuplicate(int* nums, int numsSize) {
    for (int i = 0; i < numsSize; ++i) {
        for (int j = i + 1; j < numsSize; ++j) {
            if (nums[i] == nums[j]) {
                return 1;  // Duplicate found
            }
        }
    }
    return 0;  // No duplicates
}

int main() {
    int nums[] = {1, 2, 3, 1};
    int numsSize = sizeof(nums) / sizeof(nums[0]);
    
    printf("%d\n", containsDuplicate(nums, numsSize));
    return 0;
}

Time and Space Complexity:

  • Time Complexity: O(n²). For each element, we are comparing it with every other element.
  • Space Complexity: O(1). No extra space is required aside from a few variables.

Explanation:

The brute-force solution uses two nested loops. The outer loop selects an element, and the inner loop checks if the selected element matches any of the subsequent elements in the array. If we find two elements that are equal, we return 1. If no duplicates are found, we return 0.

Drawbacks:

This approach is inefficient for large arrays due to the O(n²) time complexity, making it impractical for inputs with thousands or millions of elements.

Solution 2: Sorting Approach

Approach:

In this approach, we first sort the array. Once the array is sorted, any duplicates will be next to each other. After sorting, we only need to check if any two consecutive elements are the same. Sorting reduces the number of comparisons significantly.

Code Implementation:

#include <stdio.h>
#include <stdlib.h>

int compare(const void* a, const void* b) {
    return (*(int*)a - *(int*)b);  // Comparison function for qsort
}

int containsDuplicate(int* nums, int numsSize) {
    qsort(nums, numsSize, sizeof(int), compare);  // Sort the array
    for (int i = 1; i < numsSize; ++i) {
        if (nums[i] == nums[i - 1]) {  // Compare adjacent elements
            return 1;  // Duplicate found
        }
    }
    return 0;  // No duplicates
}

int main() {
    int nums[] = {1, 2, 3, 1};
    int numsSize = sizeof(nums) / sizeof(nums[0]);
    
    printf("%d\n", containsDuplicate(nums, numsSize));
    return 0;
}

Time and Space Complexity:

  • Time Complexity: O(n log n). Sorting the array takes O(n log n), which dominates the time complexity.
  • Space Complexity: O(1). We use qsort() which sorts the array in place, so no additional space is needed beyond the input array.

Explanation:

The array is sorted using qsort(), and after sorting, we check if any two consecutive elements are the same. If they are, we return 1. If no consecutive duplicates are found, we return 0.

Benefits and Drawbacks:

  • Benefits: Sorting reduces the number of comparisons and is more efficient than the brute-force approach.
  • Drawbacks: Although faster than brute force, the sorting operation still takes O(n log n) time, which is not ideal for very large arrays. Additionally, sorting modifies the original array, which might not be acceptable in some situations.

Solution 3: Using a Hash Table (Optimal Solution)

Approach:

The most efficient approach to solving this problem is by using a hash table. We can create a hash table to store elements as we iterate through the array. For each element, we check if it already exists in the hash table. If it does, we return 1 because it's a duplicate. If not, we add it to the table and move on.

In C, we can use the stdlib.h functions along with a dynamically allocated array to implement a simple hash table for this problem.

Code Implementation:

#include <stdio.h>
#include <stdlib.h>

#define TABLE_SIZE 100003  // A large prime number for hash table size

int hashFunction(int key) {
    return abs(key) % TABLE_SIZE;  // Simple hash function
}

int containsDuplicate(int* nums, int numsSize) {
    int* hashTable = (int*)calloc(TABLE_SIZE, sizeof(int));  // Hash table

    for (int i = 0; i < numsSize; ++i) {
        int hashIndex = hashFunction(nums[i]);

        // Linear probing in case of collision
        while (hashTable[hashIndex] != 0) {
            if (hashTable[hashIndex] == nums[i]) {  // Duplicate found
                free(hashTable);  // Free allocated memory
                return 1;
            }
            hashIndex = (hashIndex + 1) % TABLE_SIZE;  // Move to next index
        }

        // Insert element into the hash table
        hashTable[hashIndex] = nums[i];
    }

    free(hashTable);  // Free allocated memory
    return 0;  // No duplicates
}

int main() {
    int nums[] = {1, 2, 3, 1};
    int numsSize = sizeof(nums) / sizeof(nums[0]);

    printf("%d\n", containsDuplicate(nums, numsSize));
    return 0;
}

Time and Space Complexity:

  • Time Complexity: O(n). Each element is processed once, and hash table operations (insertion and lookup) take constant time on average.
  • Space Complexity: O(n). We need extra space to store the elements in the hash table.

Explanation:

In this solution, we use a hash table to store elements as we traverse the array. The hashFunction() computes the hash index for each element. If a collision occurs (i.e., two elements hash to the same index), we use linear probing to resolve the collision by moving to the next available slot. If we encounter an element already in the hash table, we return 1. If we finish the loop without finding duplicates, we return 0.

Benefits and Drawbacks:

  • Benefits: This approach is the most efficient in terms of time complexity, processing each element in constant time on average.
  • Drawbacks: The space complexity is higher due to the use of a hash table, and implementing a hash table in C requires manual memory management.

Edge Cases

When solving the Contains Duplicate problem in C, it’s important to account for various edge cases to ensure the solution works in all scenarios. Below are some key edge cases that should be considered when implementing the solution:

1. Empty Array:

If the input array is empty, there are no elements to compare, so the function should return false.

Example:

int nums[] = {};
int size = sizeof(nums) / sizeof(nums[0]);
printf("%d\n", containsDuplicate(nums, size));  // Output: 0 (false)
  • Expected Output: Since there are no elements in the array, there cannot be any duplicates, so the function should return false.

2. Array with One Element:

If the array contains only one element, it is impossible to have duplicates, so the function should return false.

Example:

int nums[] = {1};
int size = sizeof(nums) / sizeof(nums[0]);
printf("%d\n", containsDuplicate(nums, size));  // Output: 0 (false)
  • Expected Output: Since there is only one element, it cannot have duplicates, so the function should return false.

3. Array with All Unique Elements:

If the input array contains all unique elements (i.e., no duplicates), the function should return false.

Example:

int nums[] = {1, 2, 3, 4, 5};
int size = sizeof(nums) / sizeof(nums[0]);
printf("%d\n", containsDuplicate(nums, size));  // Output: 0 (false)
  • Expected Output: Since all elements in the array are unique, the function should return false.

4. Array with All Duplicate Elements:

If the array consists entirely of the same element repeated multiple times, the function should return true.

Example:

int nums[] = {1, 1, 1, 1};
int size = sizeof(nums) / sizeof(nums[0]);
printf("%d\n", containsDuplicate(nums, size));  // Output: 1 (true)
  • Expected Output: Since every element in the array is the same, the function should return true.

5. Array with Negative Numbers:

The array may contain negative numbers. The solution should handle these correctly and identify duplicates among negative numbers.

Example:

int nums[] = {-1, -2, -3, -1};
int size = sizeof(nums) / sizeof(nums[0]);
printf("%d\n", containsDuplicate(nums, size));  // Output: 1 (true)
  • Expected Output: The element -1 appears twice, so the function should return true.

6. Array with Negative and Positive Numbers:

The array may contain both negative and positive numbers. The solution should handle and distinguish between them correctly.

Example:

int nums[] = {-1, 2, -3, 3};
int size = sizeof(nums) / sizeof(nums[0]);
printf("%d\n", containsDuplicate(nums, size));  // Output: 0 (false)
  • Expected Output: Since all elements are unique, the function should return false.

7. Array with Zeros:

The array may contain zero values, and the solution should handle multiple zeros correctly.

Example:

int nums[] = {0, 1, 2, 0};
int size = sizeof(nums) / sizeof(nums[0]);
printf("%d\n", containsDuplicate(nums, size));  // Output: 1 (true)
  • Expected Output: The element 0 appears twice, so the function should return true.

8. Array with Mixed Duplicates and Unique Elements:

The array may contain a mix of unique and duplicate elements. The function should return true as soon as a duplicate is found.

Example:

int nums[] = {1, 2, 3, 2};
int size = sizeof(nums) / sizeof(nums[0]);
printf("%d\n", containsDuplicate(nums, size));  // Output: 1 (true)
  • Expected Output: The element 2 appears twice, so the function should return true.

9. Array with Large Numbers:

The array may contain very large or very small integers within the range [-10^9, 10^9]. Ensure that the solution can handle these extreme values correctly.

Example:

int nums[] = {-1000000000, 1000000000};
int size = sizeof(nums) / sizeof(nums[0]);
printf("%d\n", containsDuplicate(nums, size));  // Output: 0 (false)
  • Expected Output: Since the two elements are different and within the valid range, the function should return false.

10. Array with Maximum Length (100,000 elements):

If the array contains the maximum number of elements (100,000), the solution should handle the large dataset efficiently. This is especially important for the brute-force solution, which may fail or run too slowly for such input sizes.

Example:

int nums[100000];
for (int i = 0; i < 100000; i++) nums[i] = i;
nums[99999] = 99998;  // Add a duplicate
printf("%d\n", containsDuplicate(nums, 100000));  // Output: 1 (true)
  • Expected Output: The array contains a duplicate at the last position, so the function should return true. The solution should handle the large input size efficiently.

Conclusion

In this post, we discussed three different solutions for solving the Contains Duplicate problem in C:

  1. Brute-force approach: A simple but inefficient solution with O(n²) time complexity.
  2. Sorting approach: A faster solution using sorting, but with O(n log n) complexity.
  3. Hash table approach: The most efficient solution with O(n) time complexity, using a hash table for quick lookups.

Among these solutions, the hash table approach is the most optimal in terms of time efficiency, making it the best choice for larger arrays.