Top K Frequent Words
Given a non-empty list of words, return the k most frequent elements.
Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.
Example 1: Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2 Output: ["i", "love"] Explanation: "i" and "love" are the two most frequent words. Note that "i" comes before "love" due to a lower alphabetical order.
Example 2: Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4 Output: ["the", "is", "sunny", "day"] Explanation: "the", "is", "sunny" and "day" are the four most frequent words, with the number of occurrence being 4, 3, 2 and 1 respectively. Note:
You may assume k is always valid, 1 ≤ k ≤ number of unique elements. Input words contain only lowercase letters.
Follow up: Try to solve it in O(n log k) time and O(n) extra space.
Can you solve it in O(n) time with only O(k) extra space?
Solution: Using priority queue
We can use a hash table to cound the number of each words. Then we insert each word with its number of occurences into a priority queue. We sort the each element by it's occurences. We can start pop k elements from the queue.
Time complexity: O(n*logk) and O(n) extra space
class Solution {
public:
struct Comp {
bool operator()(const pair<int, string>& lhs, const pair<int, string>& rhs) const {
if (lhs.first != rhs.first)
return lhs.first < rhs.first;
return lhs.second > rhs.second;
}
};
vector<string> topKFrequent(vector<string>& words, int k) {
unordered_map<string, int> m;
for (auto w : words) m[w]++;
priority_queue<pair<int, string>, vector<pair<int, string>>, Comp> q;
for (auto it : m) q.emplace(it.second, it.first);
vector<string> res;
while (k > 0 && !q.empty()) {
res.push_back(q.top().second);
q.pop();
k -= 1;
}
return res;
}
};
Solution: Using Bucket Sort
The idea is to create a bucket of set string of size N+1, first we need to use a hash table to store word count. Then we start insert each element in hash table to its corresponding bucket, the set will ensure the uniqueness and ascending order.
vector<string> topKFrequent(vector<string>& words, int k) {
unordered_map<string, int> freq;
vector<set<string>> bucket(words.size()+1, set<string>());
for (auto num: words) freq[num]++;
for (auto it: freq) {
bucket[it.second].insert(it.first);
}
vector<string> ans;
for (int i=bucket.size()-1; i >=0; i--) {
for (auto word: bucket[i]) {
ans.push_back(word);
k -= 1;
if (k == 0) return ans;
}
}
}
Solution: Using properties of map and set
We can take advantage of map and set, each element in map and set are stored in sorted order. We can first use a hash table to count each word, then insert a pair of int and set into the map. The set will ensure the uniqueness.
set is commonly implemented as a red-black binary search tree. Insertion on this data structure has a worst-case of O(log(n)) complexity, as the tree is kept balanced.
Time complexity is: O(N*logN) Space complexity is: O(N)
vector<string> topKFrequent(vector<string>& words, int k) {
unordered_map<string, int> freq;
map<int, set<string>> m;
for (string word: words) freq[word]++;
for (auto it: freq) {
m[it.second].insert(it.first);
}
vector<string> ans;
for (auto it = m.rbegin(); it != m.rend(); it++) {
for (auto word: it->second) {
if (k > 0) {
ans.push_back(word);
k -= 1;
} else {
return ans;
}
}
}
return ans;
}