A wide character (wchar_t) is a C++ data type designed to represent characters that need more than one byte, and it is typically used for Unicode and other extended character sets. It covers a larger range of characters than the standard char type. Wide characters are used with wide strings and wide character literals, both of which take the L prefix:
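    #include <clocale>   // std::setlocale
    #include <iostream>  // std::wcout
    #include <locale>    // std::locale

    int main() {
        // Adopt the user's environment locale in both the C and C++ libraries.
        std::setlocale(LC_ALL, "");
        std::locale::global(std::locale(""));

        wchar_t c = L'你';
        std::wcout << L"size of " << c << L" : " << sizeof(c) << std::endl;
        return 0;
    }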
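This code is compiled with
    g++ -std=c++11 test.cpp
and, when run under a UTF-8 locale, prints:
    size of 你 : 4
The reported size is platform-dependent: wchar_t is 4 bytes with GCC on Linux, but 2 bytes on Windows.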
std::setlocale is a C function, available in C++ through <clocale>, that sets or retrieves the current C locale for the specified category. The C locale affects how functions handle locale-specific tasks such as string collation, character classification, and numeric formatting. Passing LC_ALL and an empty string switches every category to the locale defined by the user's environment. std::locale::global, on the other hand, is a C++ function that replaces the global C++ locale, which is the locale picked up by default-constructed std::locale objects and by streams created afterwards. If the new locale has a name, std::locale::global also calls std::setlocale(LC_ALL, ...) with that name, so the C and C++ libraries stay in agreement. Streams that already exist, such as std::wcout, keep the locale they were imbued with until imbue is called on them. Setting both locales at startup, as the program above does, gives consistent handling of wide characters and internationalization across the program.
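To make the relationship between the two functions concrete, here is a minimal sketch. The helper names c_locale_name, cpp_locale_name, and use_locale are hypothetical and chosen only for illustration; the sketch relies on the standard behavior that std::locale::global also updates the C locale when the new locale has a name.

```cpp
#include <clocale>  // std::setlocale
#include <locale>   // std::locale
#include <string>

// Name of the current C library locale (a null pointer means "query only").
std::string c_locale_name() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Name of the current global C++ locale.
std::string cpp_locale_name() {
    return std::locale().name();
}

// Hypothetical helper: install `name` as the global C++ locale.
// Because the locale has a name, this also updates the C locale.
void use_locale(const char* name) {
    std::locale::global(std::locale(name));
}
```

Calling use_locale("") would adopt the user's environment locale, as the program above does. Note that the std::locale constructor throws std::runtime_error when the requested name is not available on the system.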
LC_ALL stands for "Locale Category All". It is both a macro used in C and C++ and an environment variable on Unix-like operating systems. As a macro, it is passed to std::setlocale to set the locale for all locale-sensitive operations in a program at once. This affects how text is formatted, sorted, and interpreted, including locale-aware string comparison (strcoll; plain strcmp compares byte values and ignores the locale), character conversion (toupper), and numeric formatting (printf). As an environment variable, it overrides all the individual locale categories (LC_COLLATE, LC_CTYPE, LC_MESSAGES, LC_MONETARY, LC_NUMERIC, LC_TIME) with a single setting.
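The effect of LC_ALL on the individual categories can be checked with a small sketch like the one below. The helper names category_setting and set_all_categories are hypothetical; only std::setlocale and the LC_* macros come from the standard library.

```cpp
#include <clocale>  // std::setlocale, LC_ALL, LC_NUMERIC, ...
#include <string>

// Current setting of one locale category, e.g. LC_NUMERIC or LC_CTYPE.
// Passing a null pointer queries the locale without changing it.
std::string category_setting(int category) {
    const char* name = std::setlocale(category, nullptr);
    return name ? name : "";
}

// Set every category at once; returns false if the locale is unavailable.
bool set_all_categories(const char* name) {
    return std::setlocale(LC_ALL, name) != nullptr;
}
```

After set_all_categories("C"), each of category_setting(LC_NUMERIC), category_setting(LC_CTYPE), and so on should report "C". Passing a single category macro to std::setlocale instead of LC_ALL changes only that category.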
wstring is a wide string type in C++ that stores wchar_t elements, typically 2 bytes each on Windows and 4 bytes each on Linux and macOS. It is suitable for handling a wide range of characters, including those from Unicode and other extended character sets. u8string, introduced in C++20, instead stores single-byte char8_t code units holding UTF-8 encoded text, in which each character takes 1 to 4 bytes.
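The difference between counting bytes and counting characters in UTF-8 can be sketched as follows. To stay compilable before C++20, this example stores the UTF-8 bytes in a plain std::string rather than a u8string, which is an assumption made for portability; the function name utf8_code_points and the two sample constants are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Count Unicode code points in a UTF-8 byte string by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumes the input is well-formed UTF-8.
std::size_t utf8_code_points(const std::string& bytes) {
    std::size_t count = 0;
    for (unsigned char b : bytes) {
        if ((b & 0xC0) != 0x80) ++count;
    }
    return count;
}

// Hypothetical sample data: the character U+4F60 in both representations.
const std::string  utf8_sample = "\xE4\xBD\xA0";  // three UTF-8 bytes
const std::wstring wide_sample = L"\u4F60";       // one wchar_t element
```

Here utf8_sample.size() is 3 while utf8_code_points(utf8_sample) is 1, and wide_sample.size() is 1 because U+4F60 fits in a single wchar_t on both 2-byte and 4-byte platforms.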
wcout is the standard output stream in C++ for wide characters (wchar_t). It is part of the C++ standard library's support for internationalization and Unicode. wcout is used like cout, allowing formatted printing of wide strings (wstring), wide character literals, and individual wchar_t values. Non-ASCII characters are displayed correctly only when an appropriate locale has been set, because the stream uses the locale to convert wide characters to the byte encoding of the output.
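Because wcout is a std::wostream, code written against that base type works with any wide stream. The sketch below uses two hypothetical helpers, print_length and length_report, to show the same formatted output going either to wcout or to an in-memory std::wostringstream.

```cpp
#include <ostream>  // std::wostream
#include <sstream>  // std::wostringstream
#include <string>

// Hypothetical helper: write a label and a wide string to any wide stream.
void print_length(std::wostream& out, const std::wstring& text) {
    out << L"length of " << text << L" : " << text.size();
}

// Same call, captured in memory instead of printed.
std::wstring length_report(const std::wstring& text) {
    std::wostringstream buffer;
    print_length(buffer, text);
    return buffer.str();
}
```

Calling print_length(std::wcout, L"你好") after the locale has been set, as in the program at the top, should send the text to the terminal. The in-memory variant needs no locale setup because no conversion to bytes takes place.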