[C++] std::string_view generates worse assembly than pointer + size on Windows
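After reading the Windows x64 calling convention documentation [1], I noticed that function parameters larger than 8 bytes are passed by reference instead of in registers (more precisely, any struct or union whose size is not 1, 2, 4, or 8 bytes is passed as a pointer to a copy made by the caller).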
Given the recent push in C++ towards views like std::span and std::string_view, I wondered if these 16-byte containers would add a performance overhead compared to a C-style pointer and size. With std::string_view this is indeed the case [2]. Let’s use a simple function which returns the final character in a string.
#include <cstddef>
#include <string_view>
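// Precondition: i > 0 and str points to at least i characters.
char get_last(char const * str, std::size_t i) {
    return str[i-1];
}
// Precondition: str is not empty (back() on an empty view is undefined behavior).
char get_last_sv(std::string_view str) {
    return str.back();
}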
Using std::string_view takes four instructions instead of two, because the two halves of the string_view (the pointer and the size) must each be loaded from memory before we can access the byte.
str$ = 8
i$ = 16
char get_last(char const *,unsigned __int64) PROC ; get_last, COMDAT
    movzx eax, BYTE PTR [rcx+rdx-1]
    ret 0
char get_last(char const *,unsigned __int64) ENDP ; get_last
str$ = 8
char get_last_sv(std::basic_string_view<char,std::char_traits<char> >) PROC ; get_last_sv, COMDAT
    mov rdx, QWORD PTR [rcx+8]
    mov rax, QWORD PTR [rcx]
    movzx eax, BYTE PTR [rdx+rax-1]
    ret 0
char get_last_sv(std::basic_string_view<char,std::char_traits<char> >) ENDP ; get_last_sv
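The two extra loads follow directly from the size rule in the calling convention. As a rough sketch of that rule (not an API of any compiler or library), the hypothetical helper `fits_in_register` below encodes "only sizes of 1, 2, 4, or 8 bytes are passed by value in a register". `PtrAndSize` is an illustrative stand-in for the pointer + size layout that string_view implementations typically use. The runtime check assumes a 64-bit target, where both types are 16 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper, not a library or compiler API: encodes the Windows x64
// rule that only arguments of exactly 1, 2, 4, or 8 bytes travel in a register.
constexpr bool fits_in_register(std::size_t size) {
    return size == 1 || size == 2 || size == 4 || size == 8;
}

// Illustrative pointer + size pair, the layout string_view typically uses.
struct PtrAndSize {
    char const* data;
    std::size_t size;
};

// The rule itself, independent of the target platform.
static_assert(fits_in_register(8));
static_assert(!fits_in_register(16));
static_assert(!fits_in_register(3));

// Assumes a 64-bit target: both types are then 16 bytes, so under the rule
// above they are passed as a pointer to a caller-made copy.
inline void check_views_are_passed_by_reference() {
    assert(!fits_in_register(sizeof(PtrAndSize)));
    assert(!fits_in_register(sizeof(std::string_view)));
}
```

Passing the pointer and the size as two separate parameters avoids the issue because each one is 8 bytes on its own and gets its own register (rcx and rdx in the listing above).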
The difference persists when both functions are inlined into a caller that itself takes a std::string_view by value. The same two memory reads must still be paid.
char sum(char const* str, std::size_t i, std::string_view str2) {
    return get_last(str, i) + get_last_sv(str2);
}
str$ = 8
i$ = 16
str2$ = 24
char sum(char const *,unsigned __int64,std::basic_string_view<char,std::char_traits<char> >) PROC ; sum, COMDAT
    mov r9, QWORD PTR [r8]
    mov rax, QWORD PTR [r8+8]
    movzx eax, BYTE PTR [rax+r9-1]
    add al, BYTE PTR [rcx+rdx-1]
    ret 0
char sum(char const *,unsigned __int64,std::basic_string_view<char,std::char_traits<char> >) ENDP ; sum
However, if the compiler has full knowledge of the strings and their sizes, it can reduce both calls to the same instructions.
int main() {
    constexpr auto str0{"000"};
    constexpr auto str1{std::string_view{"000"}};
    volatile auto gl{get_last(str0, 3)};
    volatile auto gl2{get_last_sv(str1)};
    return gl + gl2;
}
gl2$ = 8
gl$ = 16
main PROC ; COMDAT
    mov BYTE PTR gl$[rsp], 48 ; 00000030H
    mov BYTE PTR gl2$[rsp], 48 ; 00000030H
    movsx ecx, BYTE PTR gl2$[rsp]
    movsx eax, BYTE PTR gl$[rsp]
    add eax, ecx
    ret 0
main ENDP
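The folding above is done by the optimizer. A related sketch, assuming C++17, marks both functions constexpr so that the language itself evaluates them at compile time whenever the result is used in a constant expression. The `_ce` names and `check_runtime` are hypothetical and exist only for this illustration. The value 96 assumes an ASCII execution character set, matching the 48 stored in the listing above.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// constexpr variants of the two functions; the _ce suffix is illustrative only.
constexpr char get_last_ce(char const* str, std::size_t i) {
    return str[i - 1];  // precondition: i > 0
}

constexpr char get_last_sv_ce(std::string_view str) {
    return str.back();  // precondition: str is not empty
}

// Evaluated entirely at compile time: no loads, no calls.
static_assert(get_last_ce("000", 3) == '0');
static_assert(get_last_sv_ce("000") == '0');
// '0' is 48 in ASCII, so the promoted int sum is 96.
static_assert(get_last_ce("000", 3) + get_last_sv_ce("000") == 96);

// Both variants remain callable with runtime values and must agree.
inline void check_runtime(std::string_view s) {
    assert(!s.empty());
    assert(get_last_ce(s.data(), s.size()) == get_last_sv_ce(s));
}
```

This only helps when the arguments are known at compile time. For runtime strings that cross a non-inlined function boundary, the extra loads shown earlier still apply.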