(Analysis by Richard Qi)

First, we consider an easier problem. Suppose our task was to verify whether some string $T$ could have been a string Bessie was trying to type, and let $S$ be the input string.

We can solve this easier task using a simple linear time DP. Process each digit of $T$ one at a time. Say that a string of digits s can be "rearranged" to t if Bessie can type t while trying to type s, and let $dp[i]$ be a boolean value which is True if $S[1 \cdots i]$ can be rearranged to $T[1 \cdots i]$. Our goal is to check whether $dp[N] = True$, where $N=|S|$. Notice that $dp[i] = True$ iff $dp[i-j]$ is true for some $1 \leq j \leq 4$, and the substring $S[i-j+1 \cdots i]$ can be rearranged to $T[i-j+1 \cdots i]$. For checking these rearrangements of size at most $4$, it suffices to check whether the substrings $S[i-j+1 \cdots i]$ and $T[i-j+1 \cdots i]$ are both permutations of a single digit, are both permutations of two digits that share a side, or are both permutations of four digits that form a square.

Now, our motivation for solving the original problem is as follows. Suppose we listed out all $9^i$ possible strings of length $i$ ($T_1, T_2, T_3, \cdots$), and computed all $dp$ values up to $dp[i]$ for each of these strings. Then, we could continue this process for $i+1$ by copying each of the $9^i$ strings and their $dp$ values $9$ times, and then continuing each of these copies with a distinct digit from $1-9$, and finally computing $dp[i+1]$ for each of the new $9^{i+1}$ strings using the previous $dp[i]$ values. Then, at the end, we count all strings which end with $dp[N] = True$. Clearly, this gives the correct answer, but is way too slow. However, as we show next, we can simulate this process using only linear time, as it is not necessary to write out all the dp values and all the strings. To do this, we will continually improve this extremely slow process until it can be done in linear time.

First, after computing $dp[i]$, we may discard $dp[i-j]$ for all $j \geq 4$. In other words, we could continue computing all dp values for strings of length $N$ up to $dp[N]$ using only the dp values $dp[i-j]$ for $0 \leq j \leq 3$. This can easily be seen by observing our original dp algorithm for checking whether a string can be rearranged to another string; to compute $dp[i+1]$, we only need $dp[i-j]$ for $j \leq 3$. This saves quite a bit of time as we no longer need to copy the entire dp sequences.

Next, by the same reasoning, notice that we don't need to copy the entire length $i$ strings; we only need to copy the last $3$ digits $T[i-2], T[i-1], T[i]$ for each of the $9^i$ strings $T$.

Now, if we were to simulate the exponential process initially described, it has sped up significantly. We are still creating $9^i$ strings and dp sequences at each timestep $i$, but these strings and dp sequences are of constant length.

Additionally, we can remove any strings and dp sequences in the list at any point in time if they definitely will not result in a final dp value of $N$ (for example, if the last $4$ recorded dp values are False).

Here is the key insight: Now that the $9^i$ strings and dp sequences are of constant length, there cannot be more than a constant number of distinct (string, dp sequence) pairs. So, rather than listing them all out, we can store key value pairs of the form ((string, dp sequence), count), where count represents the number of the $9^i$ strings in the original exponential process that correspond to a specified string and dp sequence. Thus, transitioning from $i$ to $i+1$ can be done in constant time as there are only a constant number of unique (string, dp sequence) pairs. To recover the final answer, we sum the counts of all (string, dp sequence) pairs that have $dp[N] = True$. Thus, the original problem can now be done in $\mathcal O(N)$ time, but with large constant factor.

A naive implementation of the currently described algorithm is enough for $N\le 100$. Also, if the phone numbers don't contain some digits, one can prove that there are less states to keep track of at each timestep, which could allow a naive implementation to pass some of these subtasks.

Richard's Code: $(dp[i-3\ldots i], S[i-2\ldots i])$ pairs are stored in a map with an integer that is a bitmask representing the dp sequence, and a vector which is the last $3$ digits of the sequence. In the code, "bars" is the bitmask representing whether $dp[i-j] = True$ for all $0 \leq j \leq 3$. For each possible digit from $1$ to $9$, the string and dp sequence change, which are the variables "new_nums" and "new_bars", respectively.

#include <bits/stdc++.h>
using namespace std;

using ll = long long;
using str = string;

using pi = pair<int, int>;
#define mp make_pair
#define f first
#define s second

using vi = vector<int>;

#define sz(x) int((x).size())
#define all(x) begin(x), end(x)
#define sor(x) sort(all(x))
#define pb push_back
#define bk back()
#define ins insert

const int MOD = 1e9 + 7;

/**
* Description: Modular arithmetic.
* Source: KACTL
* Verification: https://open.kattis.com/problems/modulararithmetic
*/

struct mi {
int v;
explicit operator int() const { return v; }
mi() : v(0) {}
mi(ll _v) : v(int(_v % MOD)) { v += (v < 0) * MOD; }
};
mi &operator+=(mi &a, mi b) {
if ((a.v += b.v) >= MOD)
a.v -= MOD;
return a;
}
mi operator+(mi a, mi b) { return a += b; }

vi sliceVI(vi v, int l, int r) {
assert(l >= 0 && l <= r && r <= sz(v));
vi res;
for (int i = l; i < r; i++) {
res.pb(v[i]);
}
return res;
}

vector<vi> good_subs;

void genGoodSubs() { // put all good subsets into good_subs (good subsets are
// those which bessie types with one tap)
for (int i = 1; i <= 9; i++) {
good_subs.pb({i});
}
for (int i = 1; i + 3 <= 9; i++) {
good_subs.pb({i, i + 3});
}

for (int i = 1; i <= 2; i++) {
for (int j = 0; j <= 6; j += 3) {
int first_val = i + j;
good_subs.pb({first_val, first_val + 1});
}
}

for (int i = 1; i <= 2; i++) {
for (int j = 0; j <= 3; j += 3) {
vi v;
for (int k = 0; k <= 1; k++) {
for (int l = 0; l <= 3; l += 3) {
v.pb(i + j + k + l);
}
}
sor(v);
good_subs.pb(v);
}
}
}

bool isGoodSubset(vi v) {
sor(v);
for (auto u : good_subs) {
if (u == v)
return true;
}
return false;
}

void solve() {
string S_inp;
cin >> S_inp;
vi S;
S.pb(-100);
for (auto u : S_inp) {
S.pb(u - '0');
}

int N = sz(S) - 1;

auto isSubsetOf = [&](vi v, int r, int l) {
set<int> S_elements;
for (int i = r; i >= l; i--) {
if (i < sz(S) && i > 0) {
S_elements.ins(S[i]);
}
}

for (auto u : v) {
if (!S_elements.count(u))
return false;
}
return true;
};

map<pair<int, vi>, mi> dp;
dp[mp(1, vi{1, 1, 1})] = mi(1);

for (int i = 1; i <= N; i++) {
// before the ith element to after processing the ith
map<pair<int, vi>, mi> ndp;
for (auto u : dp) {
int bars = u.f.f;
vi nums = u.f.s;
mi ways = u.s;
for (int new_dig = 1; new_dig <= 9; new_dig++) {
// generate new nums
vi new_nums{new_dig, nums[0], nums[1]};
// generate new bars
int new_bars = 0;
for (int old_bar = 0; old_bar < 4; old_bar++) {
if (!((bars >> old_bar) & 1))
continue;

if (old_bar + 1 < 4) {
// transition from old bar going left (adding 1)
// check whether the stuff after the bar in constructed string is a
// subset of the 4 things of actual string after the bar
vi right_of_constructed_bar = sliceVI(new_nums, 0, old_bar + 1);
if (isSubsetOf(right_of_constructed_bar, i - (old_bar) + 3,
i - (old_bar))) {
new_bars |= (1 << (old_bar + 1));
}
}

// transition from old bar going to 0 bar
vi all_nums = new_nums;
all_nums.pb(nums.bk); // 4 numbers
vi right_of_constructed_bar = sliceVI(all_nums, 0, old_bar + 1);
if (isGoodSubset(right_of_constructed_bar) &&
isSubsetOf(right_of_constructed_bar, i,
i - sz(right_of_constructed_bar) + 1)) {
new_bars |= 1;
}
}

if (new_bars == 0)
continue;

ndp[mp(new_bars, new_nums)] += ways;
}
}

swap(dp, ndp);
}

mi ans = 0;
for (auto u : dp) {
if ((u.f.f >> 0) & 1) {
ans += u.s;
}
}

cout << ans.v << "\n";
}

int main() {
cin.tie(0)->sync_with_stdio(0);
genGoodSubs();
int T;
cin >> T;
for (int t = 1; t <= T; t++) {
solve();
}
}


Many optimizations can be done to speed up the naive solution and receive full points. These include:

• Representing a state as a single integer instead of a (bitmask, vector) pair.
• Storing the states in an array instead of a map for $O(1)$ update and retrieval.
• Precomputing information about the original string to quickly determine the new state given the old state and a new digit, such as a bitmask for every substring of length at most $4$.
• Moving as much computation outside of inner loops as possible during the transition computations.
• Precomputing whether a string of length at most $4$ could have been typed by Bessie in one tap.
• Using stronger conditions on when to terminate a state that will never be able to reach $dp[N] = True$, for example by considering digits $S[i+1], S[i+2], S[i+3]$.
• Not storing digits that don't affect the answer. For example, if $dp[i-3]=0$, then we don't need to store digit $S[i-2]$. In the code below we account for this by setting $S[i-2]=0$.

These optimizations are implemented in the code below:

#include <bits/stdc++.h>
using namespace std;

using ll = long long;

using pi = pair<int, int>;
#define mp make_pair
#define f first
#define s second

using vi = vector<int>;
#define sz(x) int((x).size())
#define all(x) begin(x), end(x)
#define sor(x) sort(all(x))
#define pb push_back
#define bk back()

const int MOD = 1e9 + 7;

template <class T> void remDup(vector<T> &v) { // sort and remove duplicates
sort(all(v));
v.erase(unique(all(v)), end(v));
}

/**
* Description: Modular arithmetic.
* Source: KACTL
* Verification: https://open.kattis.com/problems/modulararithmetic
*/

struct mi {
int v;
explicit operator int() const { return v; }
mi() : v(0) {}
mi(ll _v) : v(int(_v % MOD)) { v += (v < 0) * MOD; }
};
mi &operator+=(mi &a, mi b) {
if ((a.v += b.v) >= MOD)
a.v -= MOD;
return a;
}
mi operator+(mi a, mi b) { return a += b; }

const int HASHMAX = 16 * 1000;
struct Table {
mi vals[HASHMAX];
bitset<HASHMAX> visited;
vi keys;
void addTo(int key, mi v) {
if (visited[key] == 0) {
visited[key] = 1;
keys.pb(key);
}
vals[key] += v;
}
void reset() {
for (const auto &u : keys) {
vals[u] = 0;
visited[u] = 0;
}
keys.clear();
}
};

vector<vi> good_subs;
bool is_good_new_set[100005];

void genGoodSubs() {
for (int i = 1; i <= 9; i++) {
good_subs.pb({i});
}
for (int i = 1; i + 3 <= 9; i++) {
good_subs.pb({i, i + 3});
}

for (int i = 1; i <= 2; i++) {
for (int j = 0; j <= 6; j += 3) {
int first_val = i + j;
good_subs.pb({first_val, first_val + 1});
}
}

for (int i = 1; i <= 2; i++) {
for (int j = 0; j <= 3; j += 3) {
vi v;
for (int k = 0; k <= 1; k++) {
for (int l = 0; l <= 3; l += 3) {
v.pb(i + j + k + l);
}
}
sor(v);
good_subs.pb(v);
}
}

for (auto u : good_subs) {
for (auto x : u) {
}
}
}

void solve() {
string S_inp;
cin >> S_inp;
vi S{-100};
for (auto u : S_inp) {
S.pb(u - '0');
}

int N = sz(S) - 1;

vector<vi> S_masks = vector<vi>(N + 1, vi(5));
// (i, j) -> mask starting at i, ending at i+j
for (int i = 1; i <= N; i++) {
for (int j = 0; j <= 4; j++) {
for (int k = 0; k <= j; k++) {
S_masks[i][j] |= (1 << S[i + k]);
}
}
}

Table *dp = new Table();
Table *ndp = new Table();

dp->addTo(1 + 111 * 16, mi(1));

for (int i = 1; i <= N; i++) {
// before the ith element to after processing the ith
// assert(sz(dp->keys) <= 50);
vi cand_new_digs;
for (int j = -3; j <= 3; j++) {
if (i + j >= 1 && i + j <= N) {
cand_new_digs.pb(S[i + j]);
}
}
remDup(cand_new_digs);

ndp->reset();
for (auto u : dp->keys) {
int bars = u % 16;
int nums = u / 16;
int max_bars = 0;
for (int j = 0; j < 4; j++) {
if ((bars >> j) & 1) {
max_bars = j;
}
}
mi ways = dp->vals[u];
for (int new_dig : cand_new_digs) {
// generate new nums
array<int, 4> all_nums_arr{new_dig, nums % 10, (nums / 10) % 10,
(nums / 100) % 10};
// generate new bars
int new_bars = 0;
int bar_2_set = 0;
for (int old_bar = 0; old_bar <= max_bars; old_bar++) {
int bar_2_set_dig = all_nums_arr[old_bar];
if ((bar_2_set >> bar_2_set_dig) & 1) {
break;
} else {
bar_2_set |= 1 << bar_2_set_dig;
if ((bars >> old_bar) & 1) {
if ((bar_2_set & S_masks[i - old_bar][3]) == bar_2_set) {
if (is_good_new_set[bar_2_set] &&
bar_2_set == S_masks[i - old_bar][old_bar]) {
new_bars |= 1;
}
if (old_bar < 3) {
new_bars |= 1 << (old_bar + 1);
}
}
}
}
}
if (new_bars == 0)
continue;
// optional optimization: zero out digits that don't matter
for (int j = 3; j; --j) {
if (new_bars & (1 << j)) {
break;
}
all_nums_arr[j - 1] = 0;
}
int new_nums =
all_nums_arr[0] + 10 * all_nums_arr[1] + 100 * all_nums_arr[2];
ndp->addTo(new_bars + 16 * new_nums, ways);
}
}

swap(dp, ndp);
}

mi ans = 0;
for (int u : dp->keys) {
if (((u % 16) >> 0) & 1) {
ans += dp->vals[u];
}
}

cout << ans.v << "\n";
}

int main() {
cin.tie(0)->sync_with_stdio(0);
genGoodSubs();
int T;
cin >> T;
for (int t = 1; t <= T; t++) {
solve();
}
}


The number of distinct states stored in the hash table can be bounded above by $4\cdot 3\cdot 2\cdot 2^3+4\cdot 3\cdot 2^2+4\cdot 2+1=249$, corresponding to the cases where the first $j\ge i-3$ such that $dp[j]=1$ are $j=i-3,i-2,i-1,i$ respectively. Though in the test data, the number of states never exceeds $50$.

Bonus: Try to prove a better bound on the number of distinct states or generate a test case with more than $50$ distinct states.