Count how many elements in an array are equal to a given target value, using AVX2 intrinsics.
#include <immintrin.h>
#include <cstdint>
int count_matches(const int32_t* arr, int n, int32_t target);
Parameters:
- arr: pointer to an array of n signed 32-bit integers, guaranteed 32-byte aligned
- n: number of elements, guaranteed to be a multiple of 8 and at least 8
- target: the value to count

Returns: the number of elements equal to target
Input: arr = [1, 2, 3, 2, 5, 2, 7, 2], target = 2
Output: 4
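
As a baseline for checking a vectorized solution, the expected behavior can be sketched as a plain scalar loop. The name `count_matches_scalar` is a hypothetical helper for illustration and is not part of the required interface.

```cpp
#include <cstdint>

// Hypothetical scalar reference: counts elements equal to target one at a time.
// Useful for cross-checking the output of an AVX2 implementation.
int count_matches_scalar(const int32_t* arr, int n, int32_t target) {
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (arr[i] == target) {
            ++count;
        }
    }
    return count;
}
```

For the example input above, this returns 4.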
Constraints:
- 8 ≤ n ≤ 1,000,000
- n is always a multiple of 8
- arr is 32-byte aligned

There are two common approaches to this problem:
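1. Compare with cmpeq, extract an 8-bit mask with movemask, and count bits with __builtin_popcount.
2. cmpeq produces all-ones in each matching lane, which is -1 as a signed integer. Accumulate with add_epi32, then negate the final sum.

The second approach keeps the loop entirely in vector registers, avoiding a movemask and popcount on every iteration, and is a useful pattern to know. With n ≤ 1,000,000, each 32-bit lane accumulates at most 125,000 matches, so the accumulator cannot overflow.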
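
The compare-and-accumulate approach might look like the following minimal sketch. It is one possible implementation, not a reference solution. The function name `count_matches_accumulate` is hypothetical, the `target("avx2")` attribute is a GCC/Clang-specific assumption so the code builds without `-mavx2`, and the final horizontal sum stores the lanes to a small aligned buffer with `_mm256_store_si256` and `_mm256_setzero_si256` initializes the accumulator, neither of which appears in the problem text.

```cpp
#include <immintrin.h>
#include <cstdint>

// Assumption: GCC/Clang target attribute, used here instead of -mavx2.
__attribute__((target("avx2")))
int count_matches_accumulate(const int32_t* arr, int n, int32_t target) {
    const __m256i needle = _mm256_set1_epi32(target);  // target in all 8 lanes
    __m256i acc = _mm256_setzero_si256();              // per-lane running sums

    // n is a multiple of 8 and arr is 32-byte aligned, so aligned loads are safe.
    for (int i = 0; i < n; i += 8) {
        const __m256i v =
            _mm256_load_si256(reinterpret_cast<const __m256i*>(arr + i));
        // Matching lanes become 0xFFFFFFFF (-1), others become 0.
        const __m256i eq = _mm256_cmpeq_epi32(v, needle);
        // Adding -1 per match makes each lane hold minus its match count.
        acc = _mm256_add_epi32(acc, eq);
    }

    // Horizontal sum: spill the 8 lanes and add them in scalar code.
    alignas(32) int32_t lanes[8];
    _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), acc);
    int sum = 0;
    for (int k = 0; k < 8; ++k) {
        sum += lanes[k];
    }
    return -sum;  // negate to turn the negative total into a count
}
```

The horizontal sum runs once after the loop, so its cost does not depend on n.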
| Intrinsic | Description |
|---|---|
| _mm256_load_si256(ptr) | Load 256 bits from aligned memory |
| _mm256_set1_epi32(x) | Broadcast a 32-bit integer to all lanes |
| _mm256_cmpeq_epi32(a, b) | Compare packed 32-bit integers for equality |
| _mm256_movemask_ps(v) | Extract the sign bit of each 32-bit float (8-bit mask) |
| _mm256_add_epi32(a, b) | Add packed 32-bit integers |
| __builtin_popcount(x) | Count set bits in an integer |
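
The intrinsics in the table can be combined into the movemask-and-popcount approach, roughly as sketched below. The function name `count_matches_movemask` is hypothetical, and the `target("avx2")` attribute is a GCC/Clang-specific assumption so the code builds without `-mavx2`. Because `_mm256_movemask_ps` takes a float vector, the sketch also uses `_mm256_castsi256_ps`, which is not listed in the table; it reinterprets the bits without converting them.

```cpp
#include <immintrin.h>
#include <cstdint>

// Assumption: GCC/Clang target attribute, used here instead of -mavx2.
__attribute__((target("avx2")))
int count_matches_movemask(const int32_t* arr, int n, int32_t target) {
    const __m256i needle = _mm256_set1_epi32(target);  // target in all 8 lanes
    int count = 0;

    for (int i = 0; i < n; i += 8) {
        const __m256i v =
            _mm256_load_si256(reinterpret_cast<const __m256i*>(arr + i));
        // Matching lanes become all-ones, so their sign bit is set.
        const __m256i eq = _mm256_cmpeq_epi32(v, needle);
        // Collect the 8 sign bits into the low 8 bits of an int.
        const int mask = _mm256_movemask_ps(_mm256_castsi256_ps(eq));
        // Each set bit is one matching element in this group of 8.
        count += __builtin_popcount(static_cast<unsigned>(mask));
    }
    return count;
}
```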