
Special Elements

Giordano Colli


+TAGS: Prefix Sum, Two Pointers, Oracles, Binary Search, Sorting
+Difficulty: 1500
+Description:
+Problem Link: Codeforces Link

Problem Definition

Given an array $A = \langle a_1, a_2, \dots, a_n \rangle$ with $a_i \in \mathbb{N}$ and $1 \le a_i \le n$.
$a_i$ is special if there exists a subarray of at least two elements that sums to $a_i$.
More formally, $a_i$ is special if $\exists\, l, r \in \mathbb{N}$ with $1 \le l < r \le n$ such that $\sum_{k=l}^{r} a_k = a_i$.
Count the number of special elements.

Simple Idea

The main idea is to check, for each element, whether $a_i$ is special or not.
\textsc{Meta-Algorithm}($A$)
  Counter $\gets$ 0
  for $i \gets 1$ to $n$
    if \textsc{Checker}($A, a_i$) then Counter $\gets$ Counter $+1$
  return Counter

Using this meta-algorithm, the problem reduces to checking if a subarray delimited by $l,r$ that sums to $a_i$ exists.
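As a minimal sketch of the meta-algorithm (the function name `countSpecial` and the `isSpecial` callback are illustrative, not from the original), the structure could look like this in C++:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Meta-algorithm sketch: counts how many elements a_i an arbitrary
// checker declares special. The checker is passed in as a callback so
// that each refinement of the editorial only has to swap it out.
int countSpecial(const std::vector<long long>& a,
                 const std::function<bool(const std::vector<long long>&, long long)>& isSpecial) {
    int counter = 0;
    for (long long x : a)
        if (isSpecial(a, x)) ++counter;
    return counter;
}
```

Each later section of this editorial corresponds to a different (faster) implementation of the callback.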

Simple Checker

To check if a subarray delimited by $l,r$ that sums to $a_i$ exists, it is possible to calculate the sum of each subarray.
\textsc{Simple-Checker}($A, a_i$)
  for $l \gets 1$ to $n-1$
    for $r \gets l+1$ to $n$
      Sum $\gets \sum_{k=l}^{r} a_k$
      if Sum $= a_i$ then return True
  return False

  • This algorithm scans all the possible distinct $\{l,r\}$ pairs of $\{1,2,..,n\}$, or equivalently all the subarrays. These are $\binom{n}{2} \in O(n^2)$.
  • For each of these subarrays, computing Sum takes $O(n)$ operations.
The total time complexity is $O(n^2) \cdot O(n) \in O(n^3)$.
Note that this bound is tight: choosing $l \le \frac{n}{4}$ and $r \ge \frac{3n}{4}$ gives at least $\frac{n^2}{16}$ subarrays of length at least $\frac{n}{2}$, and computing their sums alone requires $\Omega(n^3)$ time.
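A direct C++ sketch of this checker (the name `simpleChecker` is illustrative; indices are 0-based, and the `r > l` bound enforces the at-least-two-elements constraint from the problem definition):

```cpp
#include <cassert>
#include <vector>

// Simple-Checker sketch: tries every subarray of at least two elements
// and recomputes its sum from scratch, for O(n^3) time per query.
bool simpleChecker(const std::vector<long long>& a, long long x) {
    int n = (int)a.size();
    for (int l = 0; l < n; ++l)
        for (int r = l + 1; r < n; ++r) {   // r > l: at least two elements
            long long sum = 0;
            for (int k = l; k <= r; ++k) sum += a[k];  // O(n) recomputation
            if (sum == x) return true;
        }
    return false;
}
```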
Combining the meta-algorithm with this checker yields a first algorithm that solves the problem in $O(n^4)$ time: the $O(n^3)$ checker runs once for each of the $n$ elements.
\textsc{Simple-Algorithm}($A$)
  Counter $\gets$ 0
  for $i \gets 1$ to $n$
    if \textsc{Simple-Checker}($A, a_i$) then Counter $\gets$ Counter $+1$
  return Counter

Time: $O(n^4)$
Is it possible to obtain a better time complexity?

Precalculation Checker

The bottleneck of the previous checker is that every subarray sum is recalculated for each $a_i$ scanned.
To avoid these repeated calculations, it is possible to precalculate all the sums once and store them in an array.

\textsc{Precalculation}($A$)
  $\mathcal{O} \gets \emptyset$
  for $l \gets 1$ to $n-1$
    for $r \gets l+1$ to $n$
      Sum $\gets \sum_{k=l}^{r} a_k$
      $\mathcal{O} \gets \mathcal{O} \cup \{\text{Sum}\}$
  return $\mathcal{O}$

$\mathcal{O}$ can be easily implemented as a dynamic vector.
The total time complexity is the same as that of the SIMPLE-CHECKER algorithm, with the same analysis: $O(n^3)$.
There are $O(n^2)$ subarrays, hence at most $O(n^2)$ distinct sums; this implies that $\mathcal{O}$ uses $O(n^2)$ memory.
It is possible to use the oracle $\mathcal{O}$ inside the META-ALGORITHM.

\textsc{Precalculation-Algorithm}($A$)
  $\mathcal{O} \gets$ \textsc{Precalculation}($A$)
  Counter $\gets$ 0
  for $i \gets 1$ to $n$
    if $a_i \in \mathcal{O}$ then Counter $\gets$ Counter $+1$
  return Counter

The PRECALCULATION() takes $O(n^3)$ time.
Checking if $a_i \in \mathcal{O}$ requires $O(n^2)$ time, and it must be done for each of the $n$ elements of $A$, for another $O(n) \cdot O(n^2) = O(n^3)$.
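The oracle version can be sketched in C++ as follows (the names `precalculation` and `countSpecialWithOracle` are illustrative):

```cpp
#include <cassert>
#include <vector>

// Precalculation sketch: collect every subarray sum (subarrays of at
// least two elements) into a plain vector: O(n^3) time, O(n^2) memory.
std::vector<long long> precalculation(const std::vector<long long>& a) {
    int n = (int)a.size();
    std::vector<long long> oracle;
    for (int l = 0; l < n; ++l)
        for (int r = l + 1; r < n; ++r) {
            long long sum = 0;
            for (int k = l; k <= r; ++k) sum += a[k];  // still recomputed
            oracle.push_back(sum);
        }
    return oracle;
}

// Meta-algorithm with a linear membership test on the oracle.
int countSpecialWithOracle(const std::vector<long long>& a) {
    std::vector<long long> oracle = precalculation(a);
    int counter = 0;
    for (long long x : a)
        for (long long s : oracle)
            if (s == x) { ++counter; break; }
    return counter;
}
```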
Time: $O(n^3)$
Memory: $O(n^2)$
Is it possible to obtain a better time complexity?

Binary Search Precalculation Checker

A bottleneck of the previous checker is that checking whether $a_i \in \mathcal{O}$ requires $O(n^2)$ time.
Since there can be $\Theta(n^2)$ distinct sums, a linear search cannot do better.
It is possible, however, to sort the elements of $\mathcal{O}$ once and then answer each membership query with a logarithmic binary search.

\textsc{Precalculation-Sorted}($A$)
  $\mathcal{O} \gets \emptyset$
  for $l \gets 1$ to $n-1$
    for $r \gets l+1$ to $n$
      Sum $\gets \sum_{k=l}^{r} a_k$
      $\mathcal{O} \gets \mathcal{O} \cup \{\text{Sum}\}$
  Sort $\mathcal{O}$
  return $\mathcal{O}$

Since the sort instruction takes $O(n^2 \log{n^2}) \in O(n^2 \log{n})$ time and the nested for loops take $O(n^3)$ time, the total time complexity is still $O(n^3)$.
\textsc{Precalculation-Algorithm-Sorted}($A$)
  $\mathcal{O} \gets$ \textsc{Precalculation-Sorted}($A$)
  Counter $\gets$ 0
  for $i \gets 1$ to $n$
    if \textsc{Binary-Search}($\mathcal{O}, a_i$) then Counter $\gets$ Counter $+1$
  return Counter

The PRECALCULATION-SORTED() still takes $O(n^3)$ time.
Checking if $a_i \in \mathcal{O}$ now takes only $O(\log{n^2}) \in O(\log{n})$ time.
Even so, the $O(n^3)$ precalculation dominates: to speed up the algorithm, the precalculation itself must be made faster.
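A C++ sketch of the sorted-oracle variant (names illustrative; `std::binary_search` requires the sorted range it is given here):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sorted-oracle sketch: the same O(n^3) precalculation, but the oracle
// is sorted once so each membership query costs O(log n).
std::vector<long long> precalculationSorted(const std::vector<long long>& a) {
    int n = (int)a.size();
    std::vector<long long> oracle;
    for (int l = 0; l < n; ++l)
        for (int r = l + 1; r < n; ++r) {
            long long sum = 0;
            for (int k = l; k <= r; ++k) sum += a[k];
            oracle.push_back(sum);
        }
    std::sort(oracle.begin(), oracle.end());   // O(n^2 log n)
    return oracle;
}

int countSpecialSorted(const std::vector<long long>& a) {
    std::vector<long long> oracle = precalculationSorted(a);
    int counter = 0;
    for (long long x : a)
        if (std::binary_search(oracle.begin(), oracle.end(), x)) ++counter;
    return counter;
}
```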
Time: $O(n^3)$
Memory: $O(n^2)$
Is it possible to obtain a better time complexity?

Prefix Sum Binary Search Precalculation Checker

A bottleneck of the previous checker is the precalculation time.
The only instruction that can be made faster is Sum $\gets \sum_{k=l}^r a_k$. Since Sum is recalculated from scratch each time, each evaluation costs $O(n)$, which is what keeps PRECALCULATION-SORTED() at $O(n^3)$ time.
It is possible to use prefix sums to speed up this process.
First, calculate the prefix sum array $P$ of length $n$.

$\displaystyle P[i] = \sum_{k=1}^i a_k$

For the sake of simplicity set $P[0] = 0$.
The sum of a subarray is simple to represent using $P$.

$\displaystyle \sum_{k=l}^{r} a_k = \sum_{k=1}^{r} a_k - \sum_{k=1}^{l-1} a_k = P[r] - P[l-1] $

Note that this formula is well defined because $P[0] = 0$.
It is easy to calculate the prefix array in $O(n)$ time, adding $a_i$ to the previously computed $P[i-1]$.

\textsc{Precalculation-Sorted-PrefixSums}($A$)
  $P \gets$ prefix sums array of $A$
  $\mathcal{O} \gets \emptyset$
  for $l \gets 1$ to $n-1$
    for $r \gets l+1$ to $n$
      $\mathcal{O} \gets \mathcal{O} \cup \{P[r] - P[l-1]\}$
  Sort $\mathcal{O}$
  return $\mathcal{O}$

Since all the operations inside the for loops are constant-time, the loops require $O(n^2)$ time.
Sorting still requires $O(n^2 \log{n})$ time.
\textsc{Precalculation-Algorithm-Sorted-PrefixSums}($A$)
  $\mathcal{O} \gets$ \textsc{Precalculation-Sorted-PrefixSums}($A$)
  Counter $\gets$ 0
  for $i \gets 1$ to $n$
    if \textsc{Binary-Search}($\mathcal{O}, a_i$) then Counter $\gets$ Counter $+1$
  return Counter
Since PRECALCULATION-SORTED-PREFIXSUMS($A$) now requires $O(n^2 \log{n})$ time and the checking loop requires $O(n \log{n})$ time, the entire algorithm takes $O(n^2 \log{n})$ time. The memory used is still $O(n^2)$ for $\mathcal{O}$; the array $P$ adds only $O(n)$.
Time: $O(n^2 \log{n})$
Memory: $O(n^2)$
Is it possible to obtain a better time complexity?

Binary Search Prefix Sums Checker

The main bottleneck of the previous algorithm is the precomputation.
Observe that it is possible to represent each subarray sum as a difference of two elements of the prefix array $P$.
Notice that since all the elements $a_i$ are greater than or equal to $0$ (here $a_i \ge 1$), the prefix array is monotone non-decreasing.

$\displaystyle P[k] \le P[k+1] \quad \forall k \in \{0,..,n-1\}$

The idea is to find two indices $h, k$ such that:

$\displaystyle P[h] - P[k] = a_i \quad n \ge h > k \ge 0$

For each element $a_i$, use the monotonicity of $P$ to find $\{h, k\}$; in other words:
  1. fix $a_i$
  2. for each $a_i$, fix all the possible $P[h]$
  3. binary-search for the element $P[k] = P[h] - a_i$
Then the precalculation only needs to build the array $P$, which is done in $O(n)$ time.
\textsc{Precalculation-BinarySearch-PrefixSums}($A$)
  $P \gets$ prefix sums array of $A$
  return $P$

\textsc{BinarySearch-PrefixSums-Algorithm}($A$)
  $P \gets$ \textsc{Precalculation-BinarySearch-PrefixSums}($A$)
  Counter $\gets$ 0
  for $i \gets 1$ to $n$
    for $h \gets 2$ to $n$
      if \textsc{Binary-Search}($P[0..h-2], P[h] - a_i$) then
        Counter $\gets$ Counter $+1$; break
  return Counter

This algorithm takes $O(n^2 \log{n})$ time: for each of the $n$ elements, the loop over $h$ performs $O(n)$ binary searches of $O(\log{n})$ each.
Nothing changed from the previous algorithm with respect to time complexity.
The memory used to store additional data structures is $O(n)$ due to the prefix array $P$.
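A C++ sketch of this checker (the name `binarySearchChecker` is illustrative; `P` is the monotone prefix array, so `std::binary_search` may run directly on it, and restricting $k \le h - 2$ keeps the subarray at two or more elements, matching $l < r$):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Binary-search checker sketch: for each right endpoint h, search for
// P[k] = P[h] - x among k <= h - 2 in the non-decreasing prefix array.
bool binarySearchChecker(const std::vector<long long>& P, long long x) {
    int n = (int)P.size() - 1;
    for (int h = 2; h <= n; ++h) {
        long long target = P[h] - x;
        // searched range is P[0..h-2], sorted by monotonicity of P
        if (std::binary_search(P.begin(), P.begin() + h - 1, target))
            return true;
    }
    return false;
}
```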
Time: $O(n^2 \log{n})$
Memory: $O(n)$
Is it possible to obtain a better time complexity?

Two Pointers Prefix Sums Checker

To reduce the time complexity of the previous algorithm it is needed to build a checker that better exploits the monotonicity of the prefix array $P$.
Where binary search is useful, the two pointers approach is often useful as well.
It is possible to use the Two Pointers approach if there are 2 criteria:
  • increase criterion: defines when the right pointer must advance, growing the considered subarray.
  • decrease criterion: defines when the left pointer must advance, shrinking the considered subarray.
Usually, pointers are placed either both equal to 0 or the first equal to 0 and the second equal to $n$.
A proof is needed to ensure that a feasible subarray is considered by the two pointers.
Place both $h$ and $k$ at the $0$-th position.
Consider the following criteria($\sigma$):
  • increase criterion: if the subarray sum is $< a_i$ increase the value of $k$.
  • decrease criterion: if the subarray sum is $\ge a_i$ increase the value of $h$.
Ensure in your code that $h \le k$.

Proposition 8.1   Given the criteria $\sigma$, if some subarray sums to $a_i$, then the two pointers consider a subarray that sums to $a_i$.

Proof. Let $a, b$ be a feasible pair, i.e. $P[b] - P[a] = a_i$, and suppose no subarray summing to $a_i$ has been detected earlier. Since both pointers only move forward, consider the first moment at which one of them reaches its target. Since $h \le k$, there are two possible cases.
  • $k = b$ and $h < a$
  • $h = a$ and $k < b$
Case 1: $k = b$ and $h < a$

[Figure: the pointers at $h < a$ and $k = b$.]

Due to the monotonicity of $P$, $P[h] \le P[a]$. This implies that the pointed subarray has a sum at least $a_i$:

$\displaystyle P[k] - P[h] = P[b] - P[h] \ge P[b] - P[a] = a_i$

If the sum equals $a_i$, a feasible subarray is detected. Otherwise, by the decrease criterion $h$ increases while $k$ stays at $b$, so the pointers eventually reach $(a, b)$.
Case 2: $h = a$ and $k < b$

[Figure: the pointers at $h = a$ and $k < b$.]

Due to the monotonicity of $P$, $P[k] \le P[b]$. This implies that the pointed subarray has a sum at most $a_i$:

$\displaystyle P[k] - P[h] = P[k] - P[a] \le P[b] - P[a] = a_i$

If the sum equals $a_i$, a feasible subarray is detected. Otherwise, by the increase criterion $k$ increases while $h$ stays at $a$, so the pointers eventually reach $(a, b)$.

$\qedsymbol$

From the previous proposition an efficient checker that runs in $O(n)$ time is derived: each pointer only moves forward, so together they advance at most $2n$ times.

\textsc{Precalculation-TwoPointers-PrefixSums}($A$)
  $P \gets$ prefix sums array of $A$
  return $P$


\textsc{TwoPointers-PrefixSums-Algorithm}($A$)
  $P \gets$ \textsc{Precalculation-TwoPointers-PrefixSums}($A$)
  Counter $\gets$ 0
  for $i \gets 1$ to $n$
    if \textsc{Two-Pointers-Checker}($P, a_i$) then Counter $\gets$ Counter $+1$
  return Counter

The presented algorithm runs in $O(n^2)$ time.
  • $O(n)$ time is used to precalculate the prefix sum array.
  • $O(n)$ time is used to check if there is a subarray that sums to $a_i$ for each $1 \le i \le n$
The algorithm uses $O(n)$ memory to store the prefix sum array $P$.

Improvements

The author conjectures that it is not possible to design a non-randomized algorithm that solves this problem in $o(n^2)$ time; in other words, that the problem has a lower bound of $\Omega(n^2)$ time.