How to Solve the Contains Duplicate Problem in Java
The Contains Duplicate problem is a common algorithmic challenge where the goal is to determine whether a given array contains any duplicate elements. This problem is frequently encountered in coding interviews and real-world applications where unique values need to be enforced.
In this post, we will explore several approaches to solve this problem using Java, starting with a brute-force solution, moving to a more optimized sorting-based approach, and ending with the most efficient solution using a HashSet
.
Problem Definition
You are given an array of integers nums
. The task is to determine if any value appears at least twice in the array. Return true
if any value appears more than once, and false
otherwise.
Example:
Input: nums = [1, 2, 3, 1]
Output: true
Input: nums = [1, 2, 3, 4]
Output: false
Input: nums = [1, 1, 1, 3, 3, 4, 3, 2, 4, 2]
Output: true
Constraints:
- The length of the array can be up to (10^5) elements.
- Each element in the array can be as large as (10^9) or as small as (-10^9).
Let’s explore different solutions to this problem in Java.
Solution 1: Brute-Force Approach
Approach:
In the brute-force approach, we compare every element with every other element in the array to see if any two elements are the same. If two elements are found to be equal, we return true
. If no duplicates are found, we return false
.
Code Implementation:
public class ContainsDuplicate {
public static boolean containsDuplicate(int[] nums) {
for (int i = 0; i < nums.length; i++) {
for (int j = i + 1; j < nums.length; j++) {
if (nums[i] == nums[j]) {
return true; // Duplicate found
}
}
}
return false; // No duplicates found
}
public static void main(String[] args) {
int[] nums = {1, 2, 3, 1};
System.out.println(containsDuplicate(nums)); // Output: true
}
}
Time and Space Complexity:
- Time Complexity: O(n²) because of two nested loops.
- Space Complexity: O(1), since no extra space is used beyond a few variables.
Explanation:
This solution uses two loops to compare each element of the array with every other element. If any two elements are equal, the function immediately returns true
. If no duplicates are found after all comparisons, the function returns false
.
Drawbacks:
The brute-force solution is inefficient for large arrays because it has a time complexity of O(n²), which makes it slow when the array contains thousands of elements.
Solution 2: Sorting Approach
Approach:
A more efficient way to solve this problem is by first sorting the array. After sorting, any duplicates will be next to each other. Then, we simply check if any two consecutive elements are the same.
Code Implementation:
import java.util.Arrays;
public class ContainsDuplicate {
public static boolean containsDuplicate(int[] nums) {
Arrays.sort(nums); // Sort the array
for (int i = 1; i < nums.length; i++) {
if (nums[i] == nums[i - 1]) { // Compare adjacent elements
return true; // Duplicate found
}
}
return false; // No duplicates found
}
public static void main(String[] args) {
int[] nums = {1, 2, 3, 1};
System.out.println(containsDuplicate(nums)); // Output: true
}
}
Time and Space Complexity:
- Time Complexity: O(n log n) due to the sorting operation.
- Space Complexity: O(1) if sorting is done in place.
Explanation:
This approach works by first sorting the array using Arrays.sort()
. After the array is sorted, we check consecutive elements to see if they are equal. If any two adjacent elements are the same, we return true
. If no duplicates are found after traversing the array, we return false
.
Benefits and Drawbacks:
- Benefits: The sorting approach is more efficient than brute force since sorting reduces the number of comparisons.
- Drawbacks: Sorting modifies the original array, and the time complexity is still O(n log n), which can be slow for large datasets.
Solution 3: Using a HashSet (Optimal Solution)
Approach:
The most efficient way to solve this problem is by using a HashSet
. A HashSet
only stores unique elements, so as we iterate through the array, we can check if the current element already exists in the set. If it does, we return true
because that means a duplicate was found. Otherwise, we add the element to the set and continue.
Code Implementation:
import java.util.HashSet;
public class ContainsDuplicate {
public static boolean containsDuplicate(int[] nums) {
HashSet<Integer> set = new HashSet<>(); // Create a HashSet
for (int num : nums) {
if (set.contains(num)) { // If num is already in the set
return true; // Duplicate found
}
set.add(num); // Add num to the set
}
return false; // No duplicates found
}
public static void main(String[] args) {
int[] nums = {1, 2, 3, 1};
System.out.println(containsDuplicate(nums)); // Output: true
}
}
Time and Space Complexity:
- Time Complexity: O(n) because we are iterating through the array once, and the average time complexity for
HashSet
operations (insertion and lookup) is O(1). - Space Complexity: O(n) due to the extra space used by the
HashSet
.
Explanation:
In this solution, we use a HashSet
to store elements as we traverse the array. For each element, we check if it is already present in the set. If it is, we return true
because that means the element is a duplicate. If not, we add the element to the set and move on to the next element. If the entire array is processed without finding duplicates, we return false
.
Benefits and Drawbacks:
- Benefits: This solution is the most efficient in terms of time complexity, with O(n) performance, and it is straightforward to implement.
- Drawbacks: The space complexity is O(n) because we need extra memory to store the elements in the
HashSet
. However, this trade-off is usually acceptable given the significant performance gain.
Edge Cases
When solving the Contains Duplicate problem in Java, it is important to account for various edge cases to ensure the solution works in all scenarios. Below are some key edge cases that should be considered when implementing your solution:
1. Empty Array:
If the input array is empty, there are no elements to compare, so the function should return false
.
Example:
int[] nums = {};
System.out.println(containsDuplicate(nums)); // Output: false
- Expected Output: Since there are no elements in the array, there cannot be any duplicates, so the function should return
false
.
2. Array with One Element:
If the array contains only one element, it is impossible to have duplicates, so the function should return false
.
Example:
int[] nums = {1};
System.out.println(containsDuplicate(nums)); // Output: false
- Expected Output: Since there is only one element, it cannot have duplicates, so the function should return
false
.
3. Array with All Unique Elements:
If the input array contains all unique elements (i.e., no duplicates), the function should return false
.
Example:
int[] nums = {1, 2, 3, 4, 5};
System.out.println(containsDuplicate(nums)); // Output: false
- Expected Output: Since all elements in the array are unique, the function should return
false
.
4. Array with All Duplicate Elements:
If the array consists entirely of the same element repeated multiple times, the function should return true
.
Example:
int[] nums = {1, 1, 1, 1};
System.out.println(containsDuplicate(nums)); // Output: true
- Expected Output: Since every element in the array is the same, the function should return
true
.
5. Array with Negative Numbers:
The array may contain negative numbers. The solution should handle these correctly and identify duplicates among them.
Example:
int[] nums = {-1, -2, -3, -1};
System.out.println(containsDuplicate(nums)); // Output: true
- Expected Output: The element
-1
appears twice, so the function should returntrue
.
6. Array with Negative and Positive Numbers:
The array may contain both negative and positive numbers. The solution should handle and distinguish between these correctly.
Example:
int[] nums = {-1, 2, -3, 3};
System.out.println(containsDuplicate(nums)); // Output: false
- Expected Output: Since all elements are unique, the function should return
false
.
7. Array with Zeros:
The array may contain zero values, and the solution should handle multiple zeros correctly.
Example:
int[] nums = {0, 1, 2, 0};
System.out.println(containsDuplicate(nums)); // Output: true
- Expected Output: The element
0
appears twice, so the function should returntrue
.
8. Array with Mixed Duplicates and Unique Elements:
The array may contain a mix of unique and duplicate elements. The solution should return true
as soon as a duplicate is found.
Example:
int[] nums = {1, 2, 3, 2};
System.out.println(containsDuplicate(nums)); // Output: true
- Expected Output: The element
2
appears twice, so the function should returntrue
.
9. Array with Large Numbers:
The array may contain very large or very small integers within the range [-10^9, 10^9]
. Ensure that the solution can handle these extreme values correctly.
Example:
int[] nums = {-1000000000, 1000000000};
System.out.println(containsDuplicate(nums)); // Output: false
- Expected Output: Since the two elements are different and within the valid range, the function should return
false
.
10. Array with Maximum Length (100,000 elements):
If the array contains the maximum number of elements (100,000), the solution should handle the large dataset efficiently. This is especially important for the brute-force solution, which may fail or run too slowly for such input sizes.
Example:
int[] nums = new int[100000];
for (int i = 0; i < 100000; i++) nums[i] = i;
nums[99999] = 99998; // Add a duplicate
System.out.println(containsDuplicate(nums)); // Output: true
- Expected Output: The array contains a duplicate at the last position, so the function should return
true
. The solution should handle the large input size efficiently.
Conclusion
In this post, we explored three different solutions for solving the Contains Duplicate problem in Java:
- Brute-force approach: A simple but inefficient solution with O(n²) time complexity.
- Sorting approach: A faster solution with O(n log n) time complexity but modifies the original array.
- HashSet approach: The optimal solution with O(n) time complexity and O(n) space complexity, making it the best choice for large datasets.
When working with large arrays, the HashSet
solution is typically the most practical due to its efficiency in both time and space.