In retrospect the Maynard-Tao weights are quite natural, and they can be understood as the result of the following evolution.

**1.** Let us start from the naive weights
$$\nu(n):=1_\text{$n+h_1,\dots,n+h_k$ are primes}$$
If we believe in Dickson's conjecture, then these weights are perfectly fine, because they concentrate on the best $n$'s. The problem is that proving the desired inequality
$$ \sum_{x\leq n\leq 2x}\nu(n)\sum_{i=1}^k 1_\text{$n+h_i$ is prime}>\sum_{x\leq n\leq 2x}\nu(n)\tag{$\ast$}$$
is as hard as proving Dickson's conjecture. Indeed, this inequality implies that not all the $\nu(n)$'s are zero, i.e. that $n+h_1,\dots,n+h_k$ are simultaneously prime for some $x\leq n\leq 2x$. A slightly refined analytic variant of these naive weights is
$$\nu(n):=\sum_{d\mid P(n)}\mu(d)\log^{k+\ell}\left(\frac{P(n)}{d}\right),$$
where $P(n):=(n+h_1)\dots(n+h_k)$ is as usual. The right hand side is the convolution of $\mu$ and $\log^{k+\ell}$ evaluated at $P(n)$, hence (nontrivially) these weights are nonnegative, and they are nonzero if and only if $P(n)$ has at most $k+\ell$ distinct prime factors. So, if $\ell$ is not too large (e.g. $\ell<k/2$), these weights capture $n$'s such that several of $n+h_1,\dots,n+h_k$ are primes. An advantage of these weights is that they incorporate the usual inclusion-exclusion sieve technology via the Möbius values $\mu(d)$. However, they suffer from the same problem as the original naive weights: we cannot evaluate the two sides of ($\ast$) for them. At a technical level, the source of the problem is that the divisors $d$ of $P(n)$ get too large (compared to $x$), and we lose control of the error terms.
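The vanishing property can be checked numerically. The sketch below (the helper functions and the tested range are our own illustrative choices, not part of the argument) verifies that $\sum_{d\mid m}\mu(d)\log^{j}(m/d)$ is nonnegative and vanishes exactly when $m$ has more than $j$ distinct prime factors, where $j$ plays the role of $k+\ell$:

```python
# Sanity check (illustrative, not part of the argument): the convolution
# Lambda_j(m) = sum_{d | m} mu(d) * log(m/d)^j is nonnegative, and it
# vanishes exactly when m has more than j distinct prime factors.
from math import log

def factorize(m):
    """Prime factorization of m by trial division: {prime: exponent}."""
    f, p = {}, 2
    while p * p <= m:
        while m % p == 0:
            f[p] = f.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        f[m] = f.get(m, 0) + 1
    return f

def divisors(m):
    """All positive divisors of m, built from the factorization."""
    divs = [1]
    for p, e in factorize(m).items():
        divs = [q * p**i for q in divs for i in range(e + 1)]
    return divs

def mobius(m):
    """Moebius function: 0 unless m is squarefree, else (-1)^(#prime factors)."""
    f = factorize(m)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def Lambda_j(m, j):
    """Convolution of the Moebius function with log^j, evaluated at m."""
    return sum(mobius(d) * log(m / d) ** j for d in divisors(m))

j = 2  # plays the role of k + ell
for m in range(2, 300):
    omega = len(factorize(m))                # distinct prime factors of m
    val = Lambda_j(m, j)
    assert val > -1e-8                       # nonnegativity, up to rounding
    assert (abs(val) < 1e-8) == (omega > j)  # zero iff omega(m) > j
```

For $j=1$ this is just the classical von Mangoldt function $\Lambda$, supported on prime powers.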

**2.** A key idea of sieve theory is to remedy the above-mentioned difficulty by restricting to fewer divisors, e.g. by cutting them off at some artificial bound $R$. Of course, a smooth cut-off is usually better for analytic purposes than a sharp cut-off. For example, one can try to use the weights
$$\nu(n):=\left(\sum_{\substack{{d\mid P(n)}\\{d\leq R}}}\mu(d)\left(1-\frac{\log d}{\log R}\right)^{k+\ell}\right)^2,$$
which differs from the previous version (up to scaling) as follows: the restriction $d\leq R$ is in place, the fraction $P(n)/d$ became $R/d$, and the whole sum got squared. Only the last change needs an explanation: the cut-off $d\leq R$ destroys positivity of the weight, and squaring is a good analytic way to restore positivity.
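As a concrete toy illustration (the parameters $h=(0,2)$, $\ell=1$, $R=50$ below are our own choices, not from the discussion above), one can tabulate these truncated weights directly:

```python
# Toy computation of the truncated (squared) weight; the parameters
# h, R, ell below are illustrative choices only.
from math import log

def factorize(m):
    """Prime factorization of m by trial division: {prime: exponent}."""
    f, p = {}, 2
    while p * p <= m:
        while m % p == 0:
            f[p] = f.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        f[m] = f.get(m, 0) + 1
    return f

def divisors(m):
    divs = [1]
    for p, e in factorize(m).items():
        divs = [q * p**i for q in divs for i in range(e + 1)]
    return divs

def mobius(m):
    f = factorize(m)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def gpy_weight(n, h, R, ell):
    """nu(n) = ( sum_{d | P(n), d <= R} mu(d) (1 - log d/log R)^(k+ell) )^2."""
    k = len(h)
    P = 1
    for hi in h:
        P *= n + hi
    s = sum(mobius(d) * (1 - log(d) / log(R)) ** (k + ell)
            for d in divisors(P) if d <= R)
    return s * s  # squaring restores the positivity lost to the cut-off

h, R, ell = (0, 2), 50.0, 1
weights = {n: gpy_weight(n, h, R, ell) for n in range(100, 200)}
assert all(w >= 0 for w in weights.values())  # nonnegative by construction
# For the twin prime pair (107, 109), P(107) has no divisor in (1, R],
# so only d = 1 survives the cut-off and the weight is exactly 1:
assert weights[107] == 1.0
```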

**3.** The above squared weights are precisely the Goldston-Pintz-Yildirim weights. A key insight of Soundararajan was that it is more efficient to implement the smooth cut-off $d\leq R$ via a more general function of $\frac{\log d}{\log R}$. That is, the Goldston-Pintz-Yildirim weights evolved to
$$\nu(n):=\left(\sum_{d\mid P(n)}\mu(d)g\left(\frac{\log d}{\log R}\right)\right)^2,$$
where $g:\mathbb{R}\to\mathbb{R}$ is a smooth function supported on $[0,1]$; the restriction $d\leq R$ is now implicit, since $g\left(\frac{\log d}{\log R}\right)$ vanishes for $d>R$.
Morally, these weights try to imitate the event that $P(n)$ has few prime factors by sieving through the divisors $d\mid P(n)$. However, what we really want is that many of the individual factors $n+h_1,\dots,n+h_k$ are primes, not just that their product has few prime factors. So it is natural to further refine the above Soundararajan weights to
$$\nu(n):=\left(\sum_{\forall i: d_i\mid n+h_i}\mu(d_1)\dots\mu(d_k)f\left(\frac{\log d_1}{\log R},\dots,\frac{\log d_k}{\log R}\right)\right)^2,$$
where $f:\mathbb{R}^k\to\mathbb{R}$ is a smooth function supported on the simplex
$$\{(t_1,\dots,t_k)\in\mathbb{R}_{\geq 0}^k: t_1+\dots+t_k\leq 1\}.$$
These are precisely the Maynard-Tao weights, and they really try to imitate that many of the individual factors $n+h_1,\dots,n+h_k$ are primes.
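A minimal numerical sketch for $k=2$ follows; the particular $f$, the shifts $h$, and the level $R$ below are our own toy choices (in the actual method, optimizing $f$ over the simplex is the heart of the matter):

```python
# Toy Maynard-Tao-type weight for k = 2; f, h, R are illustrative choices.
from math import log

def factorize(m):
    """Prime factorization of m by trial division: {prime: exponent}."""
    f, p = {}, 2
    while p * p <= m:
        while m % p == 0:
            f[p] = f.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        f[m] = f.get(m, 0) + 1
    return f

def divisors(m):
    divs = [1]
    for p, e in factorize(m).items():
        divs = [q * p**i for q in divs for i in range(e + 1)]
    return divs

def mobius(m):
    f = factorize(m)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def f(t1, t2):
    """A simple test function supported on the simplex t1 + t2 <= 1, t_i >= 0."""
    if t1 < 0 or t2 < 0 or t1 + t2 > 1:
        return 0.0
    return (1.0 - t1 - t2) ** 2

def mt_weight(n, h, R):
    """nu(n) = ( sum_{d1 | n+h1, d2 | n+h2} mu(d1) mu(d2) f(t1, t2) )^2."""
    s = sum(mobius(d1) * mobius(d2)
            * f(log(d1) / log(R), log(d2) / log(R))
            for d1 in divisors(n + h[0])
            for d2 in divisors(n + h[1]))
    return s * s

h, R = (0, 2), 50.0
weights = {n: mt_weight(n, h, R) for n in range(100, 200)}
assert all(w >= 0 for w in weights.values())
# For the twin prime pair (107, 109), both factors exceed R, so only
# (d1, d2) = (1, 1) contributes and the weight equals f(0, 0)^2 = 1:
assert weights[107] == 1.0
```

Note that the support condition on $f$ silently enforces $d_1\dots d_k\leq R$, just as the support of $g$ did before.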

**4.** It is worthwhile to bear in mind that the Soundararajan weights (hence also the Goldston-Pintz-Yildirim weights) are a special case of the Maynard-Tao weights. Indeed, let us assume (without much loss of generality) that $n+h_1,\dots,n+h_k$ are pairwise coprime. Then, for
$$f(t_1,\dots,t_k):=g(t_1+\dots+t_k)$$
the Maynard-Tao weights reduce to the Soundararajan weights.
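This reduction is easy to verify numerically. In the sketch below (our own toy choices: $k=2$, shifts $h=(0,1)$ so that $n+h_1$ and $n+h_2$ are automatically coprime, and $g(t)=(1-t)^3$ on $[0,1]$) the two inner sums agree for every $n$ tested:

```python
# Numerical check of the reduction; g, h, R are illustrative choices.
# With h = (0, 1), the values n and n + 1 are automatically coprime.
from math import log

def factorize(m):
    """Prime factorization of m by trial division: {prime: exponent}."""
    f, p = {}, 2
    while p * p <= m:
        while m % p == 0:
            f[p] = f.get(p, 0) + 1
            m //= p
        p += 1
    if m > 1:
        f[m] = f.get(m, 0) + 1
    return f

def divisors(m):
    divs = [1]
    for p, e in factorize(m).items():
        divs = [q * p**i for q in divs for i in range(e + 1)]
    return divs

def mobius(m):
    f = factorize(m)
    return 0 if any(e > 1 for e in f.values()) else (-1) ** len(f)

def g(t):
    """A simple test function supported on [0, 1]."""
    return (1.0 - t) ** 3 if 0.0 <= t <= 1.0 else 0.0

def soundararajan_sum(n, h, R):
    """Inner Soundararajan sum: sum_{d | P(n)} mu(d) g(log d / log R)."""
    P = (n + h[0]) * (n + h[1])
    return sum(mobius(d) * g(log(d) / log(R)) for d in divisors(P))

def maynard_tao_sum(n, h, R):
    """Inner Maynard-Tao sum with f(t1, t2) = g(t1 + t2)."""
    return sum(mobius(d1) * mobius(d2)
               * g(log(d1) / log(R) + log(d2) / log(R))
               for d1 in divisors(n + h[0])
               for d2 in divisors(n + h[1]))

h, R = (0, 1), 100.0
for n in range(10, 60):
    assert abs(soundararajan_sum(n, h, R) - maynard_tao_sum(n, h, R)) < 1e-9
```

The agreement reflects exactly the argument above: by coprimality, every squarefree $d\mid P(n)$ factors uniquely as $d=d_1\dots d_k$ with $d_i\mid n+h_i$, and then $\mu(d)=\mu(d_1)\dots\mu(d_k)$ and $\log d=\log d_1+\dots+\log d_k$.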