The Contains Duplicate problem is a common coding challenge in software development, where the task is to determine if an array contains duplicate elements. This problem is a fundamental example of array manipulation and hashing.

In this post, we will explore several different ways to solve this problem in C++. Each solution will be progressively more efficient, starting from a brute-force approach and ending with a highly optimized method using hash sets.

Problem Definition

You are given an array of integers nums. The task is to determine if any value appears at least twice in the array. Return true if any value appears more than once, and false if every element is unique.

Example:

Input: nums = [1,2,3,1]
Output: true

Input: nums = [1,2,3,4]
Output: false

Input: nums = [1,1,1,3,3,4,3,2,4,2]
Output: true

Constraints:

  • The size of the array nums can range from 1 to (10^5).
  • Each element in the array can be as small as (-10^9) and as large as (10^9).

Now, let’s go through different methods to solve this problem.

Solution 1: Brute-Force Approach

Approach:

In this solution, we compare every element with every other element in the array to check for duplicates. If two elements are found to be the same, we return true. If no duplicate is found after all comparisons, we return false.

This solution uses two nested loops: one to pick an element and the other to check if any element after it is the same.

Code Implementation:

#include <iostream>
#include <vector>
using namespace std;

bool containsDuplicate(vector<int>& nums) {
    for (int i = 0; i < nums.size(); ++i) {
        for (int j = i + 1; j < nums.size(); ++j) {
            if (nums[i] == nums[j]) {
                return true;
            }
        }
    }
    return false;
}

int main() {
    vector<int> nums = {1, 2, 3, 1};
    cout << (containsDuplicate(nums) ? "true" : "false") << endl;
    return 0;
}

Time and Space Complexity:

  • Time Complexity: O(n²). We have two nested loops, where each element is compared with all other elements, leading to a quadratic time complexity.
  • Space Complexity: O(1). The algorithm does not require any additional space apart from a few variables.

Explanation:

In this brute-force solution, we iterate through every possible pair of elements and compare them. If any two elements are the same, we return true. If no duplicates are found by the end, we return false.

Drawbacks:

This approach becomes very slow when dealing with larger arrays. As the size of the array increases, the number of comparisons increases quadratically, which can cause performance issues for arrays with tens of thousands of elements. Thus, we need more efficient solutions.

Solution 2: Sorting Approach

Approach:

A more optimized approach involves sorting the array first. Once the array is sorted, duplicates will be adjacent to each other. This means we can simply check if any two consecutive elements in the sorted array are equal.

Code Implementation:

#include <iostream>
#include <vector>
#include <algorithm>
using namespace std;

bool containsDuplicate(vector<int>& nums) {
    sort(nums.begin(), nums.end());  // Sort the array
    for (int i = 1; i < nums.size(); ++i) {
        if (nums[i] == nums[i - 1]) {  // Compare adjacent elements
            return true;
        }
    }
    return false;
}

int main() {
    vector<int> nums = {1, 2, 3, 1};
    cout << (containsDuplicate(nums) ? "true" : "false") << endl;
    return 0;
}

Time and Space Complexity:

  • Time Complexity: O(n log n). The sorting operation takes O(n log n), which dominates the overall time complexity.
  • Space Complexity: O(1) if the sorting is done in place, otherwise O(n) if we use extra space for sorting.

Explanation:

In this solution, we sort the array first using the built-in sort() function. Sorting re-arranges the elements in non-decreasing order, so any duplicates will now be next to each other. After sorting, we just need to compare adjacent elements to find if there are any duplicates.

Benefits and Drawbacks:

  • Benefits: This approach is more efficient than the brute-force solution because sorting significantly reduces the number of comparisons.
  • Drawbacks: Sorting takes O(n log n), and while this is much faster than O(n²), it still isn’t the most efficient solution available. Additionally, sorting changes the order of the elements in the array, which might not be acceptable in certain applications.

Solution 3: Using a Hash Set (Optimal Solution)

Approach:

The most efficient way to solve the problem is by using a hash set. A hash set is a data structure that allows for fast lookups and insertions, both of which have an average time complexity of O(1). As we iterate through the array, we can check whether the current element is already present in the hash set. If it is, we return true because that means the element is a duplicate. If not, we add the element to the hash set and continue.

Code Implementation:

#include <iostream>
#include <vector>
#include <unordered_set>
using namespace std;

bool containsDuplicate(vector<int>& nums) {
    unordered_set<int> seen;  // Hash set to store elements
    for (int num : nums) {
        if (seen.find(num) != seen.end()) {  // If num is already in the set
            return true;
        }
        seen.insert(num);  // Add num to the set
    }
    return false;
}

int main() {
    vector<int> nums = {1, 2, 3, 1};
    cout << (containsDuplicate(nums) ? "true" : "false") << endl;
    return 0;
}

Time and Space Complexity:

  • Time Complexity: O(n). We only traverse the array once, and hash set operations (insertion and lookup) take constant time on average.
  • Space Complexity: O(n). We need extra space to store the elements in the hash set.

Explanation:

In this solution, we use an unordered set (a hash set) to keep track of elements we have encountered. As we iterate through the array, we check if the current element exists in the set. If it does, we return true immediately because it’s a duplicate. Otherwise, we insert the element into the set and move on. If we finish iterating through the array without finding any duplicates, we return false.

Benefits and Drawbacks:

  • Benefits: This is the most efficient solution in terms of time complexity because each element is processed only once, and set operations are fast.
  • Drawbacks: This solution requires extra space to store the elements in the set, which can be a downside if memory is constrained.

Edge Cases

When solving the Contains Duplicate problem in C++, it’s important to account for various edge cases to ensure the solution works in all scenarios. Below are some important edge cases to consider when implementing the solution:

1. Empty Array:

If the input array is empty, there are no elements to compare, so the function should return false.

Example:

std::vector<int> nums = {};
std::cout << containsDuplicate(nums) << std::endl;  // Output: 0 (false)
  • Expected Output: Since there are no elements in the array, there cannot be any duplicates, so the function should return false.

2. Array with One Element:

If the array contains only one element, it’s impossible to have duplicates, so the function should return false.

Example:

std::vector<int> nums = {1};
std::cout << containsDuplicate(nums) << std::endl;  // Output: 0 (false)
  • Expected Output: Since there is only one element, it cannot have duplicates, so the function should return false.

3. Array with All Unique Elements:

If the input array contains all unique elements (i.e., no duplicates), the function should return false.

Example:

std::vector<int> nums = {1, 2, 3, 4, 5};
std::cout << containsDuplicate(nums) << std::endl;  // Output: 0 (false)
  • Expected Output: Since all elements are unique, the function should return false.

4. Array with All Duplicate Elements:

If the array consists entirely of the same element repeated multiple times, the function should return true.

Example:

std::vector<int> nums = {1, 1, 1, 1};
std::cout << containsDuplicate(nums) << std::endl;  // Output: 1 (true)
  • Expected Output: Since every element in the array is the same, the function should return true.

5. Array with Negative Numbers:

The array may contain negative numbers. The solution should correctly handle and identify duplicates among negative numbers.

Example:

std::vector<int> nums = {-1, -2, -3, -1};
std::cout << containsDuplicate(nums) << std::endl;  // Output: 1 (true)
  • Expected Output: The element -1 appears twice, so the function should return true.

6. Array with Negative and Positive Numbers:

The array may contain both negative and positive numbers. The solution should correctly distinguish between them.

Example:

std::vector<int> nums = {-1, 2, -3, 3};
std::cout << containsDuplicate(nums) << std::endl;  // Output: 0 (false)
  • Expected Output: Since all elements are unique, the function should return false.

7. Array with Zeros:

The array may contain zero values, and the solution should handle multiple zeros correctly.

Example:

std::vector<int> nums = {0, 1, 2, 0};
std::cout << containsDuplicate(nums) << std::endl;  // Output: 1 (true)
  • Expected Output: The element 0 appears twice, so the function should return true.

8. Array with Mixed Duplicates and Unique Elements:

The array may contain a mix of unique and duplicate elements. The function should return true as soon as a duplicate is found.

Example:

std::vector<int> nums = {1, 2, 3, 2};
std::cout << containsDuplicate(nums) << std::endl;  // Output: 1 (true)
  • Expected Output: The element 2 appears twice, so the function should return true.

9. Array with Large Numbers:

The array may contain very large or very small integers within the range [-10^9, 10^9]. The solution should handle these extreme values correctly.

Example:

std::vector<int> nums = {-1000000000, 1000000000};
std::cout << containsDuplicate(nums) << std::endl;  // Output: 0 (false)
  • Expected Output: Since the two elements are different and within the valid range, the function should return false.

10. Array with Maximum Length (100,000 elements):

If the array contains the maximum number of elements (100,000), the solution should handle the large dataset efficiently. This is especially important for the brute-force solution, which may fail or run too slowly for such input sizes.

Example:

std::vector<int> nums(100000);
for (int i = 0; i < 100000; i++) nums[i] = i;
nums[99999] = 99998;  // Add a duplicate
std::cout << containsDuplicate(nums) << std::endl;  // Output: 1 (true)
  • Expected Output: The array contains a duplicate at the last position, so the function should return true. The solution should handle the large input size efficiently.

Conclusion

In this post, we explored three different ways to solve the Contains Duplicate problem in C++:

  1. Brute-force approach: A simple method with quadratic time complexity, which is too slow for large arrays.
  2. Sorting approach: A more efficient solution with O(n log n) time complexity, but it modifies the original array.
  3. Hash set approach: The optimal solution with O(n) time complexity and O(n) space complexity, which offers the best balance between efficiency and readability.

The hash set solution is the best choice in most scenarios due to its fast runtime and simplicity in implementation.