【LeetCode】792. 匹配子序列的单词数

题目描述

给定字符串 s 和字符串数组 words, 返回 words[i] 中是s的子序列的单词个数。
字符串的子序列是从原始字符串中生成的新字符串，可以从中删去一些字符(可以是none)，而不改变其余字符的相对顺序。
例如， “ace” 是 “abcde” 的子序列。

示例 1：

输入: s = “abcde”, words = [“a”,“bb”,“acd”,“ace”]
输出: 3
解释: 有三个是 s 的子序列的单词: “a”, “acd”, “ace”。

示例 2：

输入: s = “dsahjpjauf”, words = [“ahjpjau”,“ja”,“ahbwzgqnuk”,“tnmlanowax”]
输出: 2

提示：

1 <= s.length <= 5 * 104
1 <= words.length <= 5000
1 <= words[i].length <= 50
words[i]和 s 都只由小写字母组成。

朴素方法：暴力求解，时间复杂度太高，此题会超时，不可用。

class Solution {
public:
    bool isSubseq(string &s, string subS){
        for(int i=0, j=0; i<s.size(); i++){
                if(s[i] == subS[j]){
                    j++;
                    if(j == subS.size()) return true;
                }
        }
        return false;
    }
    int numMatchingSubseq(string s, vector<string>& words) {
        int res=0;
        for(int i=0; i<words.size(); i++){
            if(isSubseq(s, words[i]))    res++;
        }
        return res;
    }
};
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

方法一：朴素方法+二分查找（优化）

class Solution {
public:
    int numMatchingSubseq(string s, vector<string>& words) {
        // pos数组存放s中每个字符对应的下标
        vector<vector<int>> pos(26);
        int res=words.size(); // 存放结果个数

        for(int i=0; i<s.size(); i++)   pos[s[i]-'a'].push_back(i);

        // 遍历每个words
        for(auto &w : words){
            // i用于标记当前s匹配到的下标位置
            int i=-1;
            // 遍历words中的每个字符
            for(char c : w){
                auto& t=pos[c - 'a'];
                // j存储在t数组中二分查找的第一个大于或等于i的下标
                int j = upper_bound(t.begin(), t.end(), i) - t.begin();
                // 如果二分查找失败，返回t.end，此时j == t.size()
                if(j == t.size()){
                    res --;
                    break;
                }
                // 查找成功，更新i
                i = t[j]; 
            }
        }
        return res;
    }
};
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

方法二：分桶

class Solution {
public:
    int numMatchingSubseq(string s, vector<string>& words) {
        vector<queue<string>> d(26);
        // 把words中每个待匹配字符串根据首字母分到不同的桶中
        for(auto &w : words)    d[w[0] - 'a'].emplace(w);
        int res=0;

        // 遍历字符串s
        for(char& c : s){
            auto& q = d[c - 'a'];
            for(int k = q.size(); k>0; k--){
                auto t = q.front();
                q.pop();

                // 匹配
                if(t.size() == 1)   res ++; 
                // 不匹配，则将剩下的字符串重新分到另外一个桶中
                else d[t[1] - 'a'].emplace(t.substr(1));
            }
        }
        return res;
    }
};
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

方法三：分桶优化

class Solution {
public:
    int numMatchingSubseq(string s, vector<string>& words) {
        vector<queue<pair<int, int>>> d(26);
        // 把words中每个待匹配字符串根据首字母分到不同的桶中
        // 每个桶存储单词的下标i和当前匹配到的位置
        for(int i=0; i<words.size(); i++)   
        	d[words[i][0] - 'a'].emplace(i, 0);
        int res=0;

        // 遍历字符串s
        for(char& c : s){
            auto& q = d[c - 'a'];
            for(int t = q.size(); t>0; t--){
                auto [i, j] = q.front();
                q.pop();
                // 匹配
                if(++j == words[i].size())   res ++; 
                // 不匹配，则将剩下的字符串重新分到另外一个桶中
                else d[words[i][j] - 'a'].emplace(i, j);
            }
        }
        return res;
    }
};
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25

心得

我一开始写了朴素方法，其实也就是暴力求解。对于words中的每个单词，使用isSubstr函数在s中查找，如果查找到返回true，res加一。该方法对于部分测试点会超时。因此需要进行优化，方法一使用了二分查找提高在s中查找的效率。

方法一：二分查找
1.参考资料附上了二分查找的思想，二分查找使用函数upper_bound(begin, end, num)，如果能查找到，返回第一个>=num的值的地址，该值减去begin得到下标位置，如果不能查找到则返回end，该值减去begin得到数组的长度size。因此，如果返回值==size，说明查找失败。
2.官方题解有几个值得我学习的地方：
（1）c++里遇到迭代器要积极使用for(auto &)；
（2）求逆思想：res也可以先存储words.size()，如果遇到查找不到的情况则res–。
（3）二维数组vector和push_back函数的使用。

方法二：分桶
思想：
（1）将 words 中的所有单词根据首字母来分桶，即：把所有单词按照首字母分到26个桶中，每个桶中存储的是所有以该字母开头的所有单词。
（2）然后从 s 的第一个字符开始遍历，假设当前字符为 ‘a’，我们从 ‘a’ 开头的桶中取出所有单词。对于取出的每个单词，如果此时单词长度为 1，说明该单词已经匹配完毕，我们将答案加 1；否则将单词的首字母去掉，然后放入下一个字母开头的桶中，比如对于单词 “acd”，去掉首字母 ‘a’ 后，我们将其放入 ‘c’ 开头的桶中。
（3）遍历完s后就得到了答案。

方法三：分桶优化： 每个桶可以只存储单词的下标 i 以及该单词当前匹配到的位置 j，这样可以节省空间。

参考资料：
[1]查找算法：二分查找
[2]关于lower_bound( )和upper_bound( )的常见用法
[3]优秀题解：二分查找、分桶
[4]stl之emplace函数的使用

相关阅读:
windows中毒
 java实用代码-----HttpsUtil
【故障公告】k8s 开船记：增加控制舱（control-plane）造成的翻船
 计算机图形学：光线追踪（ray tracing）
Jmeter压测入门教程
 加拿大公司注册
 vue3 利用 Composition provide 实现祖先组件向后代组件传值
 基于java+springboot+vue实现的旅游管理系统（文末源码+lw+ppt）23-402
微信小程序开发---条件渲染和列表渲染
 sql高级语法的应用
原文地址：https://blog.csdn.net/weixin_43894455/article/details/127898802