Given an array of 32-bit signed integers, compute and return their sum using AVX2 intrinsics.
#include <immintrin.h>
#include <cstdint>
int32_t array_sum(const int32_t* arr, int n);
Parameters:
arr: pointer to an array of n signed 32-bit integers, guaranteed to be 32-byte aligned
n: number of elements, guaranteed to be a multiple of 8 and at least 8

Returns: the sum of all elements in the array (guaranteed to fit in int32_t)
Input: [1, 2, 3, 4, 5, 6, 7, 8]
Output: 36
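A scalar baseline is handy as a correctness check for the example above. The following is a minimal sketch; the name `array_sum_scalar` is hypothetical and not part of the problem statement.

```cpp
#include <cstdint>

// Hypothetical reference implementation: a plain loop with no intrinsics.
// It relies on the stated guarantee that the total fits in int32_t.
int32_t array_sum_scalar(const int32_t* arr, int n) {
    int32_t total = 0;
    for (int i = 0; i < n; ++i) {
        total += arr[i];
    }
    return total;
}
```

Comparing the vectorized result against this function on random inputs within the stated limits is one simple way to test a solution.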
Constraints:
8 ≤ n ≤ 1,000,000
n is always a multiple of 8
Each element is in the range [-1000, 1000]
arr is 32-byte aligned

Your solution should use AVX2 intrinsics to process 8 integers at a time. A scalar solution will produce correct results but will not achieve a meaningful speedup.
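One possible shape for the "8 integers at a time" loop is sketched below. It keeps eight running sums in a single 256-bit accumulator and, for simplicity, finishes by storing the lanes to a small array and adding them with scalar code. The `AVX2_TARGET` macro is an assumption added for convenience so that GCC or Clang can compile the function without `-mavx2`; `_mm256_store_si256` is a standard AVX intrinsic. This is a sketch under the stated constraints, not a reference solution.

```cpp
#include <immintrin.h>
#include <cassert>
#include <cstdint>

// Assumption: a convenience macro so GCC/Clang enable AVX2 for this function
// even when the build does not pass -mavx2. Other compilers need their own flag.
#if defined(__GNUC__) || defined(__clang__)
#define AVX2_TARGET __attribute__((target("avx2")))
#else
#define AVX2_TARGET
#endif

AVX2_TARGET
int32_t array_sum(const int32_t* arr, int n) {
    // Stated constraints: n is a multiple of 8 and at least 8.
    assert(n >= 8 && n % 8 == 0);

    // Eight independent 32-bit running sums, one per lane.
    __m256i acc = _mm256_setzero_si256();
    for (int i = 0; i < n; i += 8) {
        // Aligned load is valid because arr is 32-byte aligned and i is a multiple of 8.
        __m256i v = _mm256_load_si256(reinterpret_cast<const __m256i*>(arr + i));
        acc = _mm256_add_epi32(acc, v);
    }

    // Simple final step: spill the eight lanes and add them with scalar code.
    alignas(32) int32_t lanes[8];
    _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), acc);
    int32_t total = 0;
    for (int k = 0; k < 8; ++k) {
        total += lanes[k];
    }
    return total;
}
```

With at most 1,000,000 elements in [-1000, 1000], each lane accumulates at most 125,000 values, so no lane exceeds 125,000,000 in magnitude and the per-lane sums cannot overflow.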
| Intrinsic | Description |
|---|---|
| _mm256_setzero_si256() | Create a zero vector |
| _mm256_load_si256(ptr) | Load 256 bits from aligned memory |
| _mm256_add_epi32(a, b) | Add packed 32-bit integers |
| _mm256_extracti128_si256(v, 1) | Extract high 128-bit lane |
| _mm256_castsi256_si128(v) | Cast 256-bit to low 128-bit (free) |
| _mm_add_epi32(a, b) | Add packed 32-bit integers (128-bit) |
| _mm_hadd_epi32(a, b) | Horizontal add of packed 32-bit integers |
| _mm_extract_epi32(v, idx) | Extract a 32-bit integer |
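The 128-bit intrinsics in the table suggest a horizontal reduction of the 8-lane accumulator. The sketch below shows one way they might fit together. The helper name `horizontal_sum8` and the `AVX2_TARGET` macro are hypothetical; the helper takes a pointer to eight aligned integers rather than a vector, so it can be called from code compiled without AVX enabled.

```cpp
#include <immintrin.h>
#include <cstdint>

// Assumption: a convenience macro so GCC/Clang enable AVX2 for this function
// even when the build does not pass -mavx2.
#if defined(__GNUC__) || defined(__clang__)
#define AVX2_TARGET __attribute__((target("avx2")))
#else
#define AVX2_TARGET
#endif

// Hypothetical helper: sums the eight int32 values at `lanes` (32-byte aligned).
AVX2_TARGET
int32_t horizontal_sum8(const int32_t* lanes) {
    __m256i v = _mm256_load_si256(reinterpret_cast<const __m256i*>(lanes));

    __m128i lo = _mm256_castsi256_si128(v);       // lanes 0..3
    __m128i hi = _mm256_extracti128_si256(v, 1);  // lanes 4..7
    __m128i s = _mm_add_epi32(lo, hi);            // [v0+v4, v1+v5, v2+v6, v3+v7]

    s = _mm_hadd_epi32(s, s);                     // [s0+s1, s2+s3, s0+s1, s2+s3]
    s = _mm_hadd_epi32(s, s);                     // full total in every lane
    return _mm_extract_epi32(s, 0);
}
```

In a full solution the load would be unnecessary, since the accumulator is already in a register; the same five steps would apply directly to it after the main loop.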