\chapter{Language Basics\label{sec:basics}}

\section{Introductory example\label{sec:introductory-example}}

\begin{minted}[frame=lines,label={Introductory example}]{c++}
#include <iostream>
#include <boost/numeric/mtl/mtl.hpp>

using namespace mtl;

int main(int argc, char** argv)
{
  int const size = 40, N = size * size;
  using matrix_t = compressed2D<double>;

  // Set up a matrix 1,600 x 1,600 with
  // 5-point-stencil
  matrix_t A{N, N};
  mat::laplacian_setup(A, size, size);

  // Compute b = A*x with x == 1
  dense_vector<double> x{N, 1.0}, b;
  b = A * x;

  std::cout << two_norm(b) << std::endl;
}
\end{minted}

Some comments about this example:
\begin{itemize}
  \item Main program: The entry point of the program is the function \cpp{main}. It must be present in every executable and returns an integer error code of the program (\cpp{0} means no error).
  \item Typically, only two variants of the \cpp{main} function are allowed: one without arguments and one with two arguments carrying the command-line parameters of the program.
  Here, \cpp{int argc} is the number of command-line arguments and \cpp{char** argv} (or \cpp{char* argv[]}) is a sequence of null-terminated character sequences (strings) holding the actual command-line arguments. In particular, the zeroth argument \cpp{argv[0]} corresponds to the name of the executable.

  \textit{Remark:} Some compilers allow more than those two arguments; the additional ones may contain environment variables.
  \item Input and output are not part of the C++ core language, but are implemented in libraries such as the C++ standard library. Those libraries must be included explicitly.
  \item Include files are any regular files that can be found by the compiler. Typically, they have the file ending \texttt{.h} (for header file) and contain the specification of the interface of functions, or even the implementation of those functions. Include in C++ means: the text of the file is copied into the code at the position of the \cpp{#include} directive.

  \textit{Remark 1:} Header files of the C++ standard library do not have a file extension. This avoids conflicts with the headers of the C standard library (those files have the extension \texttt{.h}).

  \textit{Remark 2:} There are two variants of the include directive: \cpp{#include <filename>} and \cpp{#include "filename"}. In the first variant, the file is searched for in the compiler include paths and system paths only, while in the second variant it is also searched for in the current source directory. This is why the standard library include directives are typically written with angle brackets \cpp{<...>}.
  \item The functions of the standard library are grouped in the namespace \cpp{std}. Again, this is done in order to avoid conflicts with functions from other libraries and your own code. In order to call a function from the standard library, you have to prefix its name with the namespace and the scope resolution operator \cpp{::}, \eg \cpp{std::sqrt}.
  \item In addition to the main program, we have a class (structure) \cpp{compressed2D} and the (free) functions \cpp{two_norm} and \cpp{mat::laplacian_setup}. A free function is a function that is not part of a class. In contrast, a function can also be bound to a class.
  \item Finally, the result is written to the screen by the output-stream object \cpp{std::cout} (part of the standard library). It provides a way to pass new output data to the output device, using the ``shift'' operator \cpp{<<}. The expression \cpp{std::endl} inserts a line break (the end-of-line symbol) and additionally flushes the output. One could also write the character \cpp{'\n'} directly.
  \item Braces \cpp{{ }} in C++ enclose a local code block (scope). Variables declared inside a scope can only be accessed from within that scope or a sub-scope. Braces are also used for the initialization of an object.
  \item Single-line comments are introduced by \cpp{//} and multi-line comments are enclosed in \cpp{/* ... */}.
\end{itemize}

Some questions to think about:
\begin{itemize}
  \item What happens if you add another \cpp{main(...)} function to the code? Is it possible to add both main functions, \cpp{main(), main(int, char**)}, at the same time?
  \item Not only text is pushed to the output stream, but also values (numbers). How are numbers printed / converted to strings? How can this behavior be modified?
\end{itemize}

% =================================================================================================
\section{Compiling C++ code\label{sec:compiling}}

In contrast to scripting or interpreted languages, like Python, Matlab, or JavaScript, C++ code must be translated into machine-readable, executable instructions before it can be run. This process of translation is called \Index{Compiling}. More generally, one can understand compiling as a transformation of code from one (high-level) language to another (low-level) language.
\begin{rem}
  During the compilation of C++ code, you can even print intermediate states of the transformation process, like the preprocessor output or the assembler output. We will look at this intermediate code in the lecture or exercises to better understand what the compiler is doing with our code.
\end{rem}

\begin{itemize}
  \item The process of compiling is performed by a program called the \Index{compiler}. Typical examples of compilers are \emph{g++}, \emph{clang}, \emph{Intel ICC}, \emph{MSVC}, and others.
  \item The compiler gets as input a \Index{translation unit}, typically a text file containing the C++ code, \ie the definitions of functions and classes. A program typically consists of many translation units that are combined.
  \item The output of the compiler is a collection of \Index{object files}, one for each translation unit.
  \item To generate an executable (or a library) from these object files, the \Index{linker} combines all the objects into a single file.
\end{itemize}

The process of compiling can be split into several stages:
\begin{description}
  \item[pre-processing] (performed by the \Index{preprocessor}) The content of include files is copied to the position of the include directives; macros and preprocessor constants are evaluated.
  \item[linguistic analysis] Check of the syntax rules.
  \item[\Index{assembling}] Translation of the language constructs into CPU instructions, \eg in the form of assembler code.
  \item[code output] Transformation of the internal code (assembler code) into machine-readable binary code. Collection of symbols into a symbol table with jump references.
\end{description}

On many Linux distributions, the C++ compiler of the GNU Compiler Collection (GCC) or the clang compiler of LLVM is preinstalled. Assume that the code from the introductory example is stored in a text file \texttt{distance.cpp}.
This can be compiled into an executable by
%
\begin{verbatim}
c++ distance.cpp
\end{verbatim}
%
where \texttt{c++} is an alias (often a symbolic link) to the actual compiler.

\begin{rem}
  The version and name of the compiler can be obtained by \texttt{c++ --version}.
\end{rem}

The result of the compilation is a binary file named \texttt{a.out}. This is the default executable name, which can be changed by providing the argument \texttt{-o <name>}, \eg
%
\begin{verbatim}
g++ distance.cpp -o distance
\end{verbatim}
%
Later in the lecture we will get to know different C++ language features that are only available in a specific version of the C++ standard (see history of C++). The standard can be selected explicitly by the additional argument \texttt{-std=<standard>}, \eg for \cxx{11}:
%
\begin{verbatim}
g++ -std=c++11 distance.cpp -o distance
\end{verbatim}
%
where \texttt{<standard>} follows the naming given in the chapter \emph{History of C++}.

If you have multiple files to compile, \eg one file provides the implementation of the functions and classes and the other file just the \cpp{main()} function, we say that we have multiple translation units. Those can be compiled individually and then linked together:
%
\begin{verbatim}
g++ -c file1.cpp
g++ -c file2.cpp
g++ file1.o file2.o -o program
\end{verbatim}
%
The output name of a compiled translation unit follows the pattern \texttt{<filename>.o}. The compiler also allows combining the compiler and linker calls in one line, by listing all the files to compile one after the other:
%
\begin{verbatim}
g++ file1.cpp file2.cpp -o program
\end{verbatim}

If a source file depends on some include files (at the top of the file you find lines of the form \cpp{#include <...>} or \cpp{#include "..."}), the compiler has to search for these \textit{header} files. It automatically searches in the default system paths, but for everything else the compiler has to be pointed to the location of the include files.
This can be done by the additional argument \texttt{-I<path>}, \eg
%
\begin{verbatim}
g++ -I/usr/local/library/include/ file1.cpp file2.cpp -o program
\end{verbatim}
%
If the program depends not only on include files, but also on \Index{symbols} (compiled implementations) of library functions, a list of additional libraries to link the executable with has to be appended. For this purpose, the compiler accepts two arguments: \texttt{-L<path>} and \texttt{-l<library>}, where \texttt{<library>} is the part of the file name of the library between the prefix \texttt{lib} and the file extension \texttt{.so} or \texttt{.a}. (This might be different on other operating systems, like macOS or MS Windows.)
%
\begin{verbatim}
g++ -I/usr/local/library/include/ file1.cpp file2.cpp -o program \
    -L/usr/local/library/lib -llibrary
\end{verbatim}
%
If your project depends on multiple libraries that themselves depend on other libraries, it gets more and more complicated to put everything correctly into the compile command. To simplify this, several \Index{build systems} have been developed that collect and analyze dependencies and generate the compiler commands for you. A classical example is a \Index{Makefile}, which defines various targets that can depend on each other, together with a way to construct from these targets the sequence of commands to execute in order to compile (build) the executable. Another example is \Index{CMake} (more precisely, it is a build system generator).

\begin{rem}
  As you may have noticed, source files that are compiled by the compiler are typically named with a file extension \texttt{.cc}, \texttt{.cpp}, or \texttt{.cxx}. This differs from the include (header) files with file extension \texttt{.h}, \texttt{.hh}, \texttt{.hpp}, or \texttt{.hxx}. Here, the first file extension comes from C and is just an abbreviation for \textit{header}.
  Later in the lecture, we will see source (implementation) files that are not compiled, but are typically included at the end of the corresponding header file. This is related to template implementations. Sometimes these files are named \texttt{.tpp} or \texttt{.txx}, but more often just \texttt{.impl.hh} or \texttt{.inc.hh} (with any of the header file extensions from above).

  While file extensions and the naming of files in general are arbitrary, it is recommended to give a source file and its corresponding header file the same base name and matching file extensions, \eg \texttt{linear\_algebra.hh} and \texttt{linear\_algebra.cc}. Using the standard file extensions also gives you automatic syntax highlighting in your code editor of choice.
\end{rem}

% ==============================================================================
\section{Basic structure of a C++ program\label{sec:code-structure}}

Every C++ program resulting in an executable must contain exactly one \cpp{main(...)} function, where both variants
%
\begin{minted}{c++}
int main();
int main(int argc, char* argv[]); // or
int main(int argc, char** argv);
\end{minted}
%
are allowed. The arguments \cpp{argc, argv} are filled when running the executable with command-line arguments. Here, the argument \cpp{argc} represents the number of command-line arguments and \cpp{argv} represents an \textit{array} of \textit{strings} (character sequences), one for each individual command-line argument. The first entry in this array, \cpp{argv[0]}, contains the name of the executed program.

\paragraph{Splitting in multiple source files}
Code can (and should) be split into multiple translation units representing different components of the program. This splitting means multiple header and source files, where each source file can be translated into an object file without knowledge of the other source files.
Typically, functions and classes are just \Index{declared} in the header files, while those entities are \Index{defined} in the source files.

Example 1: A header file contains the \Index{prototype} (interface description) of a function and a class definition.
\begin{minted}[frame=lines,label={example.hh}]{c++}
#ifndef EXAMPLE_HH
#define EXAMPLE_HH

// declaration and definition of a class
struct Point
{
  double x, y;

  // declaration of a member function
  Point subtract(Point const& other) const;
};

// declaration of a function
double distance(Point const& a, Point const& b);

// declaration of a template function
template <class T>
void foo();

#include "example.impl.hh"
#endif // EXAMPLE_HH
\end{minted}

Example 2: The definition of a template function (included at the end of the header file)
\begin{minted}[frame=lines,label={example.impl.hh}]{c++}
#pragma once

// definition of the function foo()
template <class T>
void foo() { /*...*/ }
\end{minted}

Example 3: The source file includes the header file and defines the functions
\begin{minted}[frame=lines,label={example.cc}]{c++}
#include "example.hh" // include the declaration
#include <cmath>      // include additional function (declarations)

// definition of a member function
Point Point::subtract(Point const& other) const
{
  return {this->x - other.x, this->y - other.y};
}

// definition of the function distance()
double distance(Point const& a, Point const& b)
{
  Point ab = a.subtract(b);
  return std::sqrt(ab.x * ab.x + ab.y * ab.y);
}

int main(int argc, char** argv)
{
  Point a{ 1.0, 2.0 }, b{ 7.0,-1.5 };
  distance(a,b);
  return 0;
}
\end{minted}

Some remarks on the examples above:
\begin{itemize}
  \item The triplet \cpp{#ifndef NAME}, \cpp{#define NAME}, and \cpp{#endif} forms a so-called \textbf{include guard}. It prevents the header file from being included multiple times in the same translation unit.
  Multiple inclusion is not allowed, since the C++ standard imposes a \textbf{one definition rule}, meaning: No translation unit shall contain more than one definition of any variable, function, class type, enumeration type, or template.

  Another way of enforcing that a file is included only once is the (non-standard) preprocessor directive \cpp{#pragma once} at the top of the include file. This directive is supported by all major compilers and can be used without any problems.
  \item If you want to (or have to) provide an implementation of a function or class method in a header file, it must be included together with the corresponding declaration. Often this is done by an include statement at the end of the header file. Alternatively, the definition is provided directly together with the declaration.
\end{itemize}

% ==============================================================================
\section{Variables and Datatypes\label{sec:data-type}}
C++ is a statically typed language (in contrast to dynamically typed languages like \eg PHP), meaning: every identifier and expression in a C++ program has a type that is known to the compiler at compile time, and this type cannot be changed.
Examples:
\begin{minted}{c++}
float x;      // x is a single precision floating point number
int y = 3+4;  // y is an integer variable with initial value 7
float f(int); // f is a function with one integer argument and float return type
\end{minted}

Here, the variable \cpp{y} is initialized with the expression on the right-hand side of the \cpp{=} sign. This expression \cpp{3+4} also has a type. Since \cpp{3} and \cpp{4} are integer numbers and the result of the addition of two integers is defined to be an integer as well, the expression is of type \cpp{int}.

\begin{standard}{\S 7.1 (1)}
  An expression is a sequence of operators and operands that specifies a computation. An expression can result in a value and can cause side effects.
\end{standard}

\begin{rem}
  That the expression \cpp{3+4} has the type \cpp{int} is not as trivial as you might think. In some languages the result might have a type larger than \cpp{int} that is guaranteed to hold the value of the sum of these two integers.
\end{rem}

% -------------------------------------------------------------------------------------------------
\subsection{Automatic type deduction}
When the compiler parses an expression, it internally determines its type. In order to create a variable of exactly that type, the language provides the placeholder \cpp{auto}. It is not an actual data-type, but stands for the type determined by the expression of the initialization:
%
\begin{minted}{c++}
auto x = 3+4;       // deduce the type from the expression: x <-- int
auto y = long{3+4}; // explicitly committing to a type: y <-- long
auto z{3+4};        // same as variable x: z <-- int [C++17]
\end{minted}

% -------------------------------------------------------------------------------------------------
\subsection{Literals\label{sec:literal}}
A literal is a token directly representing a constant value of a concrete type.
Examples:
\begin{minted}{c++}
42u       // unsigned integer literal
108.87e-1 // floating point literal
true      // boolean literal
"Hello"   // string literal
\end{minted}
The type of a literal is often determined by a literal suffix (like the \texttt{u} in \cpp{42u}). An integer literal is a sequence of digits without a period or exponent part, while a floating point literal must contain a period and/or an exponent part. Character literals are enclosed in single quotes, e.g., \cpp{'c'}, and string literals in double quotes, e.g., \cpp{"Hello"}. There are some more kinds of literals, which we will see in the exercises.

\begin{rem}
  Since\marginpar{[\cxx{11}]} \cxx{11} one can define one's own literals of the form \texttt{built-in literal + \_ + suffix}. This allows, for example, creating numbers with units.

  Example:
\begin{minted}{c++}
101000101_b // binary representation
63_s        // seconds
123.45_km   // kilometer
33_cent     // cent
\end{minted}
  where the implementer is responsible for giving those literals a meaning.
\end{rem}

\begin{rem}
  A literal is a \textit{primary expression}. Its type depends on its form (see above). A string literal is an \Index{lvalue}; all other literals are \Index{prvalues} (see the chapter about value categories).
\end{rem}

% -------------------------------------------------------------------------------------------------
\subsection{Declaration -- Definition -- Initialization}

\begin{description}
  \item[Declaration] A declaration may introduce one or more names into a translation unit or redeclare names introduced by previous declarations.
  \item[Definition] A \textit{declaration} that provides the implementation details of that entity, or in case of variables, reserves memory for the entity. A \textit{declaration} of a class (\cpp{struct}, \cpp{class}, \cpp{enum}, \cpp{union}), function, or method is a definition if the declaration is followed by curly braces containing the implementation body.
  Variable declarations are always \textit{definitions} unless prefixed with the keyword \cpp{extern}.
  \item[Initialization] A \textit{definition} with explicit value assignment.
\end{description}

Examples:
\begin{minted}{c++}
class Test;              // declaration of a class
class Test {};           // definition of that class
int func();              // declaration of a function
int func() { return 7; } // definition of that function
extern int i = func();   // definition and initialization of a variable
extern int j;            // declaration of a variable
int k;                   // definition of a variable
int obj();               // !!! declaration of a function
\end{minted}

A fundamental rule is that you are not allowed to define an object twice, whereas it is allowed to declare exactly the same object multiple times, even after the definition.

\begin{standard}{\S 6.3 (1)}
  \textbf{One-definition rule:} No translation unit shall contain more than one definition of any variable, function, class type, enumeration type, or template.
\end{standard}

\begin{guideline}{Principle}
  Declare variables as late as possible, usually right before using them the first time and, whenever possible, not before you can initialize them.
\end{guideline}

% -------------------------------------------------------------------------------------------------
\subsection{Fundamental Types\label{sec:fundamental-type}}
We have already seen some types in the examples above, like integer types and floating-point types. There are more fundamental data-types available in C++. A summary can be found at \url{http://en.cppreference.com/w/cpp/language/types}.

Basic types in C++ are categorized into three groups: integral types, floating-point types, and \cpp{void}. Integral types represent integer numbers, while floating-point types can also represent fractions.

The type \cpp{void} represents the empty set of values. No variable can be declared of type \cpp{void}. Thus, \cpp{void} is an \emph{incomplete type}. It is used as the return type for functions that do not return a value.
Any expression can be explicitly converted to type \cpp{void}.

\subsubsection{Integral numbers}
The group of integral types contains
\begin{itemize}
  \item The boolean type \cpp{bool} with values \cpp{true} and \cpp{false} (both are boolean literals). The size of that type is implementation defined and typically 1 byte.
  \item Character types \cpp{char}, \cpp{signed char}, and \cpp{unsigned char} to represent a single character. These are distinct types, each either a signed or an unsigned integer type of size 1 byte. There are also larger character types like \cpp{wchar_t}, \cpp{char16_t}, or \cpp{char32_t} to represent larger character sets.
  \item Standard (signed/unsigned) integer types include \cpp{short int, int, long int, long long int}, possibly qualified with the type prefix \cpp{signed, unsigned}. No signed-ness qualification means a signed integer. The postfix \cpp{int} may be omitted (except for \cpp{int} itself).

  The range of representable values for a signed integer type is $-2^{N-1}$ to $2^{N-1} - 1$ (inclusive), where $N$ is called the width of the type.

  An unsigned integer type has the same width $N$ as the corresponding signed integer type. The range of representable values for the unsigned type is $0$ to $2^{N} - 1$ (inclusive). Arithmetic for the unsigned type is performed modulo $2^N$.

  \emph{Note:} Unsigned arithmetic does not overflow. Overflow in signed arithmetic yields \textbf{undefined behavior} (what this means is explained later).
\end{itemize}

The C++ standard does not specify the sizes (widths) of the integer types explicitly; it only prescribes a minimal size. Thus, one finds the relations
%
\cppline{ sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)}
%
where \cpp{sizeof} is a C++ operator returning the size of a data-type (or of an expression) in bytes. On 32-bit systems, the sizes are typically 2, 4, 4, and 8 bytes; on 64-bit systems \cpp{long} is often of size 8 bytes.
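The size relations and the minimal widths mentioned above can be checked by the compiler itself. The following is a minimal sketch, assuming a compiler with \cxx{11} support; the helper name \cpp{width_in_bits} is hypothetical and not part of the standard library.

\begin{minted}{c++}
#include <limits>

// Hypothetical helper: the width N of an integer type T in bits.
// numeric_limits<T>::digits counts the value bits only, so the sign bit
// is added for signed types.
template <class T>
constexpr int width_in_bits()
{
  return std::numeric_limits<T>::digits
       + (std::numeric_limits<T>::is_signed ? 1 : 0);
}

// Ordering of the sizes (in bytes), as stated in the text
static_assert(sizeof(short) <= sizeof(int),       "short larger than int");
static_assert(sizeof(int)   <= sizeof(long),      "int larger than long");
static_assert(sizeof(long)  <= sizeof(long long), "long larger than long long");

// Minimal widths prescribed by the standard
static_assert(width_in_bits<short>()     >= 16, "short too narrow");
static_assert(width_in_bits<int>()       >= 16, "int too narrow");
static_assert(width_in_bits<long>()      >= 32, "long too narrow");
static_assert(width_in_bits<long long>() >= 64, "long long too narrow");

// Signed and unsigned variants share the same width
static_assert(width_in_bits<int>() == width_in_bits<unsigned int>(),
              "signed and unsigned int differ in width");
\end{minted}

If one of the assertions did not hold on a platform, the compilation would stop with an error. The concrete widths of a given platform can be printed at run time, \eg with \cpp{std::cout << width_in_bits<long>()}.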
The\marginpar{[\cxx{11}]} type \cpp{long long} was introduced in \cxx{11} and was available as a compiler-specific extension before.

\begin{rem}
  As\marginpar{[\cxx{11}]} with the character types, there are integer types with prescribed width, defined in the header file \cpp{<cstdint>}. Those are named \cpp{std::int16_t, std::uint32_t, ...}.
\end{rem}

\begin{rem}
  In the standard library a \textbf{type-alias} named \cpp{std::size_t} is introduced for an integer type that is often used for vector indices and vector sizes. It is typically an \cpp{unsigned long int} type, but on some compilers it may be different. The type referenced by \cpp{std::size_t} can store the maximum size of a theoretically possible object of any type (including arrays).
\end{rem}

There are special literal \emph{suffixes} that are appended to numbers to indicate a type explicitly: \texttt{U, u, L, l, LL, ll}, where there is no difference between lower and upper case suffixes. \texttt{u} means \cpp{unsigned}, \texttt{l} means \cpp{long}, and \texttt{ll} means \cpp{long long}.

Additionally, a prefix can be put in front of the number to indicate the base of the number system used: \texttt{0} (zero), \texttt{0x}, or \texttt{0b}. Those represent octal, hexadecimal, or binary numbers, respectively.

\begin{rem}
  The type of an integer literal is the first type, starting from \cpp{int} or from the type indicated by the suffix, in which the value fits and which has the right signed-ness.
\end{rem}

Example:
\cppline{ 1234L, 9565ul, 012 == 10, 0x2a == 42}

\begin{standard}{\S 5.13.2 (1)}
  An integer literal is a sequence of digits that has no period or exponent part, with optional separating single quotes that are ignored when determining its value. An integer literal may have a prefix that specifies its base and a suffix that specifies its type.
\end{standard}

\subsubsection{Floating-point types}
Standard types for floating-point numbers are
%
\cppline{ float, double, long double}
%
The range of possible values is defined in \cpp{<limits>} and the sizes may be compiler dependent, typically 4, 8, and 10 bytes.
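The properties of the floating-point types on a given platform can be queried through \cpp{std::numeric_limits} from the standard header \cpp{<limits>}. The following is a minimal sketch, assuming \cxx{11}; the helper names \cpp{decimal_digits}, \cpp{largest_value}, and \cpp{machine_epsilon} are hypothetical wrappers chosen for illustration.

\begin{minted}{c++}
#include <limits>

// Number of decimal digits that can be represented without change
template <class T>
constexpr int decimal_digits() { return std::numeric_limits<T>::digits10; }

// Largest finite representable value
template <class T>
constexpr T largest_value() { return std::numeric_limits<T>::max(); }

// Difference between 1 and the next larger representable value
template <class T>
constexpr T machine_epsilon() { return std::numeric_limits<T>::epsilon(); }

// A larger floating-point type is at least as precise as a smaller one
static_assert(decimal_digits<float>()  <= decimal_digits<double>(),
              "double less precise than float");
static_assert(decimal_digits<double>() <= decimal_digits<long double>(),
              "long double less precise than double");
\end{minted}

The values themselves are platform dependent, so the sketch only asserts relations between the types and does not state concrete numbers.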
The relation
%
\cppline{ sizeof(float) <= sizeof(double) <= sizeof(long double)}
%
holds for the floating-point types. Literal suffixes that indicate how to interpret a number are \texttt{F, f} for \cpp{float} and \texttt{L, l} for \cpp{long double}; a floating-point literal without suffix is of type \cpp{double}.

\begin{rem}
  GCC implements an extension that allows quad-precision arithmetic with the data-type \cpp{__float128}. Its size is 16 bytes. Typically, this is implemented as a software library, \eg by concatenating two \cpp{double} types. Only rarely is there hardware support for quad precision numbers (\eg IBM POWER9 CPU). The arithmetic is defined in the standard document \href{https://doi.org/10.1109%2FIEEESTD.2008.4610935}{IEEE 754-2008}.
\end{rem}

\begin{rem}
  To do arithmetic with arbitrary precision, there are multiple libraries available. Examples include \href{https://gmplib.org/}{GNU GMP} (GNU Multiple Precision Arithmetic Library) and the \href{https://www.boost.org/doc/libs/1_71_0/libs/multiprecision/doc/html/index.html}{Boost.Multiprecision} library. An example of a high-precision calculation of the enclosed area of a circle with Boost.Multiprecision is given below. Note that it uses templates to implement the actual algorithm.
\end{rem}

\begin{minted}[frame=lines,label={multiprecision.cc}]{c++}
#include <iomanip>
#include <iostream>
#include <limits>

#include <boost/math/constants/constants.hpp>
#include <boost/multiprecision/cpp_dec_float.hpp>

// Type-alias for floating-point numbers with 50 decimal digits precision
// and int32_t to represent the exponent
using float_50 = boost::multiprecision::cpp_dec_float_50;

template <class T>
T area_of_a_circle(T r)
{
  using boost::math::constants::pi;
  return pi<T>() * r * r;
}

template <class T>
int digits()
{
  return std::numeric_limits<T>::digits10;
}

int main()
{
  float r_f = float(123) / 100;
  float a_f = area_of_a_circle(r_f);

  double r_d = double(123) / 100;
  double a_d = area_of_a_circle(r_d);

  float_50 r_mp = float_50(123) / 100;
  float_50 a_mp = area_of_a_circle(r_mp);

  // 4.75292
  std::cout << std::setprecision(digits<float>()) << a_f << std::endl;

  // 4.752915525616
  std::cout << std::setprecision(digits<double>()) << a_d << std::endl;

  // 4.7529155256159981904701331745635599135018975843146
  std::cout << std::setprecision(digits<float_50>()) << a_mp << std::endl;
}
\end{minted}

\begin{rem}
  Arithmetic with floating-point numbers is not the same as arithmetic with real numbers $\mathbb{R}$. There are effects of rounding, finite representation, cancellation, non-associativity, $\ldots$. Details can be found in the standard document \href{https://standards.ieee.org/content/ieee-standards/en/standard/754-2019.html}{IEEE 754} and are explained in the lecture \emph{Computer Arithmetics} by Prof. W. Walter.
\end{rem}

\subsection{Number conversion}
Whenever you initialize a variable with an expression, the value of that expression must be converted to the type of the variable.

Example:
\begin{minted}{c++}
int l = 1234567890123; // number will be narrowed to fit into integer int
\end{minted}

\begin{defn}
  We call the initialization of a variable of a smaller type with a value that this type cannot represent a \emph{narrowing initialization} or \emph{narrowing conversion}.
\end{defn}

In the example above, the compiler will not report an error and the code compiles fine, although the stored value might be wrong.
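Whether a value would be narrowed can also be checked explicitly before the conversion. The following is a minimal sketch, assuming \cxx{11}; the function name \cpp{fits_in_int} is hypothetical, and the second comment assumes a platform where \cpp{int} is 32 bits wide.

\begin{minted}{c++}
#include <limits>

// Hypothetical helper: true if value can be represented exactly as an int
constexpr bool fits_in_int(long long value)
{
  return value >= std::numeric_limits<int>::min()
      && value <= std::numeric_limits<int>::max();
}

// A small value fits on every platform
static_assert(fits_in_int(42), "42 must fit into an int");

// On platforms with a 32-bit int, fits_in_int(1234567890123LL) is false,
// i.e. the initialization from the example above is a narrowing conversion.
\end{minted}

A program could call such a check at run time and report an error instead of silently storing a changed value.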
The compiler may print a warning, but not on all warning levels, and this is not guaranteed.

\begin{guideline}{Principle}
  Enable all warnings and stick to the C++ standard, \ie use the compiler flags \texttt{-Wall -Wextra -pedantic}. Optionally, you may even set the flag \texttt{-Werror} to turn warnings into errors.
\end{guideline}

With\marginpar{[\cxx{11}]} \cxx{11} the standard introduced the \emph{uniform initialization} using curly braces, which raises an error in case of a narrowing conversion instead of silently accepting the code. This means: almost always use
\begin{minted}{c++}
long l1{1234567890123}; // or
long l2 = {1234567890123};
\end{minted}

Some examples of narrowing conversions:
\begin{minted}{c++}
int i1 = 3.14;      // initializes to 3, no error
int i2 = {3.14};    // Narrowing ERROR: fractional part lost

unsigned u1 = -3;   // no error: wraps around to 2^N - 3, a very large number
unsigned u2{-3};    // Narrowing ERROR: no negative values

float f1 = {3.14};  // ok. initializes to float number closest to 3.14
double d = 3.14;
float f2 = {d};     // Narrowing ERROR: possible loss of accuracy

unsigned u3 = {3};
int i3 = {2};
unsigned u4 = {i3}; // Narrowing ERROR: no negative values
int i4 = {u3};      // Narrowing ERROR: not all values representable
\end{minted}

\begin{rem}
  The\marginpar{[\cxx{17}]} curly braces, \ie uniform initialization, also work with the automatic type deduction \cpp{auto}. But be careful! The meaning of the curly braces has changed in \cxx{17}, and before that the deduced type is sometimes different from what you would expect.
  \begin{minted}{c++}
  auto x1 = {42}; // C++14: x1 is of type std::initializer_list<int>
  auto x2 = {42}; // C++17: x2 is of type std::initializer_list<int>
  auto x3{42};    // C++14: x3 is of type std::initializer_list<int>
  auto x4{42};    // C++17: x4 is of type int
  \end{minted}
\end{rem}

\begin{defn}
  For floating-point values we call a conversion to a smaller data type (\eg \cpp{double -> float}) a
  \emph{floating-point conversion} (with possible loss of precision) and otherwise a \emph{floating-point promotion}
  (the value is represented exactly in the larger type).
\end{defn}

\begin{rem}
  Note that in a floating-point conversion from a larger type \cpp{T1} to a smaller type \cpp{T2}, if the value lies between
  two adjacent floating-point numbers of \cpp{T2}, the choice of which of the two it is rounded to is
  \emph{implementation defined} and might be controlled with some intrinsic functions. If the value is out of the range of
  \cpp{T2}, the behavior is \emph{undefined}.
\end{rem}

\subsection{Constants\label{sec:const}}
An important aspect of programming languages is to control the access to data. An object whose type is qualified with
\cpp{const} is called a \emph{constant} and is immutable. The syntax to declare a constant is
\cppline{TYPE const VARNAME = VALUE;}
The \cpp{const} could also be on the left of the \texttt{TYPE}, but as a rule of thumb put the qualifier \cpp{const} on the
right of what should be constant.

The compiler will report an error if you try to modify a constant object. Example:
\begin{minted}{c++}
int n1 = 0;       // non-const object
int const n2 = 0; // const object
const int n3 = 0; // const object (same as n2)

n1 = 1; // OK: mutable object
n2 = 2; // ERROR: non-mutable object
\end{minted}

Constants can be defined using automatic type deduction.
For this, the keyword \cpp{const} simply qualifies the placeholder \cpp{auto}:
%
\begin{minted}{c++}
auto i1 = 7;         // mutable variable
auto const i2 = 8;   // const integer variable initialized with 8
const auto d1 = 2.0; // const double variable initialized with 2.0
\end{minted}

\subsubsection{constexpr specifier}
There is another qualifier that is stronger than \cpp{const}: the \cpp{constexpr} specifier declares that it is possible to
evaluate the value of the variable at compile time. Such variables can then be used where only compile-time constant
expressions are allowed. A \cpp{constexpr} specifier used in an object declaration implies \cpp{const}.

A \cpp{constexpr} variable must satisfy the following requirements:
\begin{itemize}
  \item its type must be a \emph{LiteralType},
  \item it must be immediately initialized,
  \item the full-expression of its initialization, including all implicit conversions, constructor calls, etc., must be a
        constant expression.
\end{itemize}

The category \emph{LiteralType} cannot yet be fully explained, but especially the fundamental types discussed above are
\emph{LiteralTypes}.

\begin{rem}
  \cpp{constexpr} variables, expressions and functions are a powerful tool within C++, available since \cxx{11} and extended
  in \cxx{14} and \cxx{17}. In the chapter \emph{Meta programming}, we will see how to use \cpp{constexpr} (functions) as a
  language within C++ to force the compiler to do computations for us.
\end{rem}

% -------------------------------------------------------------------------------------------------

\subsection{Scopes}
Each name that appears in a C++ program is only valid in some possibly discontinuous portion of the source code called its
scope. Thus, scopes determine the lifetime and visibility of (non-static) variables and constants.

There are different types of scopes: global scope, function scope, class scope, block scope, function parameter scope,
namespace scope, \dots.
Typically, scopes are blocks of code surrounded by curly braces, except for the global scope that lives outside of functions
and classes.

A local variable is declared within a block of a function. Its visibility and accessibility are limited to the
\texttt{\{ - \}}-enclosed block of its declaration. More precisely, the scope of the variable begins at the point of
declaration and ends at the end of the block.
\begin{minted}{c++}
int main()
{                   // begin of the function block
  double pi = 3.14; // begin of the variable scope
  std::cout << pi;
}                   // end of variable scope and function block
\end{minted}

There might be nested blocks within other blocks that can limit the visibility of names declared inside this block:
\begin{minted}{c++}
int main()
{                     // begin of the function block
  {                   // begin of an inner block scope
    double pi = 3.14; // begin of the variable scope
  }                   // end of variable and block scope
  std::cout << pi;    // ERROR: pi is out of scope
}                     // end of function block
\end{minted}

\begin{guideline}{Principle}
  Do not put variables in the global scope, i.e., do not use global variables!
\end{guideline}

\subsubsection{Hiding}
In each scope a name can be defined only once (one-definition rule), but in another scope (even a nested one) the same name
can be used to declare a new variable that hides the outer one and whose lifetime is limited to that nested scope.
\begin{minted}[frame=lines,label={scope.cc}]{c++}
int x = 11;  // (0) global variable x

void f()
{            // function scope
  int x;     // (1) local x hiding global x
  x = 1;     // assignment to local x (1)
  {
    int x;   // (2) hides local x (1)
    x = 2;   // assignment to local x (2)
  }
  x = 3;     // assignment to local x (1)
}

void f2()
{            // function scope
  int y = x; // use global x
  int x = 1; // (3) hides the global x
  ::x = 2;   // assignment to global x
  y = x;     // use local x (3)
  x = 2;     // assignment to local x (3)
}
\end{minted}

% =================================================================================================

\section{Library types}
With the fundamental types and some language constructs we will learn later, you can already build powerful C++ programs.
But it is possible to define your own (class-)types for more complex data-structures, like vectors, tuples, lists,
associative maps and sets, and so on. The standard library already defines some of these types and thus allows you to easily
write more advanced programs.

Here, I just give an overview of some useful data-structures; later we will look more deeply into the standard library.

% -------------------------------------------------------------------------------------------------

\subsection{Strings}
Character sequences were already mentioned at the beginning, when discussing the arguments of the function
\cpp{main(int, char**)}. This function accepts a low-level form of an array of strings as its second argument. But these
low-level strings are hard to use correctly.
Thus, the standard library defines the type \cpp{std::string} instead:
\begin{minted}{c++}
#include <iostream> // for std::cout, std::endl
#include <string>

int main()
{
  std::string text = "This is a long text";
  std::cout << "length = " << text.size() << std::endl;
}
\end{minted}

% -------------------------------------------------------------------------------------------------

\subsection{Container}

\subsubsection{Sequence Container}
In the header \cpp{<vector>} the standard library provides a contiguous resizable vector container with flexible element
types:
\begin{minted}{c++}
#include <cassert>
#include <vector>

int main()
{
  // vector of 3 doubles
  std::vector<double> x(3);
  x[0] = 0.0; x[1] = 1.0; x[2] = 2.0;

  // direct initialization with values
  std::vector<double> y = {1.0, 2.0, 3.0};
  std::vector y2 = {1.0, 2.0, 3.0}; // since c++17

  // add a new entry at the end of the vector
  x.push_back(3.0);

  // resize the vector to the specified size
  y.resize(4);

  assert(x.size() == y.size());
}
\end{minted}

The storage of the vector is handled automatically, being expanded and contracted as needed. Vectors usually occupy more
space than static arrays (see below), because more memory is allocated to handle future growth. This way a vector does not
need to reallocate each time an element is inserted, but only when the additional memory is exhausted. The total amount of
allocated memory can be queried using the \cpp{capacity()} function.

\begin{rem}
  The \cpp{main()} function needs two arguments to represent an array of strings: the size and the address. Instead, one
  could store this in a vector of strings:
  \begin{minted}{c++}
  #include <string>
  #include <vector>

  int main(int argc, char** argv)
  {
    std::vector<std::string> args(argv, argv + argc);
    // nr of arguments = args.size()
    // i-th argument = args[i]
  }
  \end{minted}
  There is even a proposal for a future C++ standard to add a signature of the main function providing (something like)
  strings.
\end{rem}

\subsubsection{Associative Container}
Apart from the \emph{sequence container} \cpp{std::vector}, there is also the associative container \cpp{std::map},
associating a value to a key:
\begin{minted}{c++}
#include <cassert>
#include <map>

int main()
{
  std::map<int, double> m; // mapping int -> double
  m[17] = 42.0;            // new association is created on access

  // test whether the key 17 is found
  assert(m.count(17) == 1);
}
\end{minted}

This can be combined with vector and string:
\begin{minted}{c++}
#include <map>
#include <string>
#include <vector>

int main()
{
  std::map<std::string, std::vector<int>> m; // mapping string -> vector
  m["Hello"] = {1,2,3,4,5};
}
\end{minted}

\begin{rem}
  Note that maps need more memory and the access is much slower than for a vector. So, if you have an integer key and know
  the min and max index, prefer a vector over a map, or just use the map as an intermediate container to build up the vector.
\end{rem}

\subsection{Iterating over container}
All standard containers can be traversed using \emph{range-based for loops}:
\begin{minted}{c++}
#include <iostream>
#include <map>
#include <vector>

int main()
{
  std::map<int, double> m; // fill up the map
  std::vector<double> v;   // fill up the vector

  for (auto i : m)
    std::cout << i.first << ", " << i.second << std::endl; // (key, value) pair, see below

  for (double d : v)
    std::cout << d << std::endl;
}
\end{minted}

If you don't know the type of the elements in the traversal or it is complicated to write (like for \cpp{std::map}), use
(qualified) \cpp{auto} instead.

% -------------------------------------------------------------------------------------------------

\subsection{Tuples}
A vector represents a tuple of values of the same type. If the type should be different for each element, one could use a
\cpp{std::tuple} instead.
There, the types of all elements must be given explicitly, unless they are deduced from the initializer (since \cxx{17}):
%
\begin{minted}{c++}
std::tuple<int, double, float> t = {1, 2.0, 3.0f};
std::tuple t2 = {1, 2.0, 3.0f}; // c++17
\end{minted}
%
To access an entry in a tuple, one cannot use the classical bracket operator \texttt{[]} as for vectors, but has to call a
function instead:
%
\begin{minted}{c++}
double t1 = std::get<1>(t);
\end{minted}

A special tuple is a pair. It consists of just two elements:
%
\begin{minted}{c++}
std::pair<int, double> p = {1, 2.0};
std::pair p2 = {1, 2.0}; // c++17
\end{minted}
%
Here, the elements can be accessed again using \cpp{std::get}, but also have explicit names:
%
\begin{minted}{c++}
int p0 = p.first;
double p1 = p.second;
\end{minted}

Tuples can be used to return multiple values from a function (see below) and to assign multiple values to a set of variables:
%
\begin{minted}{c++}
// create a tuple from values
auto t = std::tuple{0, 1.0, 2.0f};

// assign the tuple entries to variables
int t0;
double t1;
float t2; // not used
std::tie(t0,t1,std::ignore) = t;
\end{minted}
%
where \cpp{std::ignore} is an object of unspecified type such that any value can be assigned to it with no effect, and
\cpp{std::tie} is a function that creates a tuple of references to the given variables, so that assigning a tuple to it
assigns the tuple entries to these variables.

\subsubsection{Structured Binding}
We have seen the effect of \cpp{std::tie} in the last section. There, we had to declare the variables before we could assign
values to them, and we had to know their types explicitly. With\marginpar{[\cxx{17}]} \cxx{17} this can be combined with
\cpp{auto} to create new variables with types deduced from the tuple elements.
This is called \Index{structured binding}:
%
\begin{minted}{c++}
auto [t0,t1,t2] = t;
\end{minted}
%
Again, as for classical \cpp{auto} type deduction, the variable declaration can be extended by a \cpp{const} qualifier:
%
\begin{minted}{c++}
auto const [t0,t1,t2] = t;
\end{minted}

Structured binding does not only work for tuple-like structures, but also for structs:
%
\begin{minted}{c++}
std::pair p {1, "pair"};
auto [first,second] = p;
assert(first == p.first && second == p.second);

struct Point { double x, y; }; // simple struct, defined here for illustration
Point point {0.0, 4.0};
auto [x,y] = point;
assert(x == point.x && y == point.y);
\end{minted}

\subsection{Iterating over associative containers}
Recall the iteration example:
\begin{minted}{c++}
#include <iostream>
#include <map>

int main()
{
  std::map<int, double> m; // fill up the map

  for (auto i : m)
    std::cout << i.first << ", " << i.second << std::endl; // (key, value) pair
}
\end{minted}

In the iteration we get a pair \texttt{(key, value)}. This can be split up automatically using structured binding:
\begin{minted}{c++}
for (auto [key,value] : m)
  std::cout << key << ", " << value << std::endl;
\end{minted}

\subsection{Iterating over tuples}
Tuples (and pairs) cannot be traversed like other containers. The reason is that each element in a tuple can have a
different type, whereas in a loop the elements must have the same type in each iteration. Currently the standard committee
discusses an extended version of a loop, like \cpp{for...(auto t : tuple)} or \cpp{for constexpr(auto t : tuple)}. But this
is not yet decided.

We will see later in the chapter about meta-programming how to write a loop over tuples yourself. Then you get something like
\begin{minted}{c++}
forEach(tuple, [](auto t) { std::cout << t << std::endl; });
\end{minted}
that looks quite similar to a regular loop but works with tuples and pairs and many more.

In \cxx{17} some utility functions for tuples are introduced. An example is \cpp{std::apply}, which calls a function with
the entries of a tuple as its arguments.
This can be used to emulate a \texttt{forEach} loop:
\begin{minted}{c++}
std::apply([](auto... t) {
  ((std::cout << t << std::endl), ...);
}, tuple);
\end{minted}
This uses advanced features like lambda expressions, variadic templates and fold expressions.
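
As a minimal, self-contained sketch of this \cpp{std::apply} idiom, the following helper writes every entry of a tuple to an
output stream, one entry per line. The names \texttt{print\_each} and \texttt{to\_lines} are hypothetical and chosen for
illustration only; they are not part of the standard library. The sketch assumes \cxx{17} and that every entry can be
written with \cpp{operator<<}.
\begin{minted}{c++}
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>

// Hypothetical helper (not a standard function): write each entry of a
// tuple-like object to the stream `out`, one entry per line.
template <class Tuple>
void print_each(std::ostream& out, Tuple const& tuple)
{
  std::apply([&out](auto const&... entries) {
    // fold expression over the comma operator: one output statement per entry
    ((out << entries << '\n'), ...);
  }, tuple);
}

// Hypothetical convenience wrapper: collect the output of print_each in a string
template <class Tuple>
std::string to_lines(Tuple const& tuple)
{
  std::ostringstream out;
  print_each(out, tuple);
  return out.str();
}
\end{minted}
A call like \cpp{print_each(std::cout, t)} then behaves like the \texttt{forEach} loop above. Since \cpp{std::apply} accepts
any tuple-like type, the same helper can also be used with a \cpp{std::pair}.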