(Analysis by Benjamin Qi)

Observe that the edges $i\to a_i$ induce a directed graph where every vertex has out-degree 1. This is known as a functional graph. We can solve the problem for each connected component of the graph independently, so for the remainder of the analysis, we will assume the graph consists of a single connected component.

Call cow $i$ inactive if it contributes $0$ to the collective pleasure value rather than $v_i$. From the sample case, among those cows on a simple cycle, it is easy to see that at least one of the cows must be inactive. Consider the cow $c$ in the cycle that occurs latest in $p$. Then either $a_c$ either has not departed her farm already (in which case $a_c$ is inactive) or she has (in which case $c$ is inactive).

As a connected component in a functional graph always contains exactly one simple cycle, the answer must be at most the sum of all $v_i$ minus the minimum $v_i$ among that cycle. Furthermore, we can always construct $p$ that achieves this bound. The construction is as follows:

1. Let cow $c$ be the cow corresponding to the minimum $v_i$ along the cycle.
2. Any permutation $p$ such that $i$ appears earlier than $a_i$ in $p$ for all $i\neq c$ suffices.
3. After removing edge $c\to a_c$ from the component, the component no longer contains a cycle. Therefore, the remaining edges form a directed tree rooted at $c$. So it is always possible to construct $p$ (ex. DFS backward through the tree and add each node to the front of $p$ when it's traversed for the first time).

In the code below, for each connected component I use Floyd's algorithm to detect a vertex along the cycle. After that, I mark every vertex in the connected component as visited. As each connected component is processed in time proportional to its size, the runtime is $O(N)$.

#include <bits/stdc++.h>
using namespace std;

template <class T> using V = vector<T>;
#define all(x) begin(x), end(x)

vector<int> a, v;
vector<vector<int>> child;
vector<bool> done;

void mark_as_done(int x) {
if (done[x]) return;
done[x] = true;
for (int c : child[x]) mark_as_done(c);
}

int solve(int start) {
int x = start, y = start;
do {
x = a[x], y = a[a[y]];
} while (x != y);
int min_along_cycle = INT_MAX;
do {
min_along_cycle = min(min_along_cycle, v[x]);
x = a[x];
} while (x != y);
mark_as_done(x);
return min_along_cycle;
}

int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int N;
cin >> N;
a.resize(N + 1);
v.resize(N + 1);
child.resize(N + 1);
int64_t ans = 0;
for (int i = 1; i <= N; ++i) {
cin >> a[i] >> v[i];
ans += v[i];
child[a[i]].push_back(i);
}
done.resize(N + 1);
for (int i = 1; i <= N; ++i)
if (!done[i]) ans -= solve(i);
cout << ans << "\n";
}


Alternatively, if you are familiar with Gold topics, the answer is just the weight of a maximum spanning forest of the graph (treating the edges as undirected), which can be computed with Kruskal's algorithm.

#include <bits/stdc++.h>
using namespace std;

struct DSU {
vector<int> e;
void init(int N) { e = vector<int>(N, -1); }
int get(int x) { return e[x] < 0 ? x : e[x] = get(e[x]); }
bool unite(int x, int y) {
x = get(x), y = get(y);
if (x == y) return 0;
if (e[x] > e[y]) swap(x, y);
e[x] += e[y];
e[y] = x;
return 1;
}
};

int main() {
ios::sync_with_stdio(false);
cin.tie(nullptr);
int N;
cin >> N;
vector<tuple<int, int, int>> edges;
for (int i = 1; i <= N; ++i) {
int a, v;
cin >> a >> v;
edges.push_back({v, i, a});
}
sort(edges.rbegin(), edges.rend());
DSU D;
D.init(N + 1);
int64_t ans = 0;
for (auto [v, x, y] : edges)
if (D.unite(x, y)) ans += v;
cout << ans << "\n";
}


Bonus: Solve the problem when the $v_i$ can be negative.