v0.5.8, Copyright © November 19, 2020
No point in wasting words here, folks, let’s jump straight into the C code:
'{'?0:3:
E((ck?main((z?(stat(M,&t)?P+=a+255,
execv(M,k),a=G,i=P,y=G&'@'-3?A(*L(V(%d+%d)+%d,0) sprintf(Q,y/
And they lived happily ever after. The End.
What’s this? You say something’s still not clear about this whole C programming language thing?
Well, to be quite honest, I’m not even sure what the above code does. It’s a snippet from one of the entries in the 2001 International Obfuscated C Code Contest, a wonderful competition wherein the entrants attempt to write the most unreadable C code possible, with often surprising results.
The bad news is that if you’re a beginner in this whole thing, all C code you see probably looks obfuscated! The good news is, it’s not going to be that way for long.
What we’ll try to do over the course of this guide is lead you from complete and utter sheer lost confusion on to the sort of enlightened bliss that can only be obtained through pure C programming. Right on.
This guide assumes that you’ve already got some programming knowledge under your belt from another language, such as Python, JavaScript, Java, Rust, Go, Swift, etc. (Objective-C devs will have a particularly easy time of it!)
We’re going to assume you know what variables are, what loops do, how functions work, and so on.
If that’s not you for whatever reason, the best I can hope to provide is some pasty entertainment for your reading pleasure. The only thing I can reasonably promise is that this guide won’t end on a cliffhanger…or will it?
I’ll try to stick to Plain Ol’-Fashioned ISO-standard C. Well, for the most part. Here and there I might go crazy and start talking about POSIX or something, but we’ll see.
Unix users (e.g. Linux, BSD, etc.) try running cc or gcc from the command line–you might already have a compiler installed. If you don’t, search your distribution for installing gcc or clang.
Windows users should check out Visual Studio Community. Or, if you’re looking for a more Unix-like experience (recommended!), install WSL and gcc.
Mac users will want to install Xcode, and in particular the command line tools.
There are a lot of compilers out there, and virtually all of them will work for this book. And for those not in the know, a C++ compiler will compile most C code, so it’ll work for the purposes of this guide.
The official location of this document is http://beej.us/guide/bgc/. Maybe this’ll change in the future, but it’s more likely that all the other guides are migrated off Chico State computers.
I’m generally available to help out with email questions so feel free to write in, but I can’t guarantee a response. I lead a pretty busy life and there are times when I just can’t answer a question you have. When that’s the case, I usually just delete the message. It’s nothing personal; I just won’t ever have the time to give the detailed answer you require.
As a rule, the more complex the question, the less likely I am to respond. If you can narrow down your question before mailing it and be sure to include any pertinent information (like platform, compiler, error messages you’re getting, and anything else you think might help me troubleshoot), you’re much more likely to get a response.
If you don’t get a response, hack on it some more, try to find the answer, and if it’s still elusive, then write me again with the information you’ve found and hopefully it will be enough for me to help out.
Now that I’ve badgered you about how to write and not write me, I’d just like to let you know that I fully appreciate all the praise the guide has received over the years. It’s a real morale boost, and it gladdens me to hear that it is being used for good! :-)
Thank you!
You are more than welcome to mirror this site, whether publicly or privately. If you publicly mirror the site and want me to link to it from the main page, drop me a line at beej@beej.us.
If you want to translate the guide into another language, write me at beej@beej.us and I’ll link to your translation from the main page. Feel free to add your name and contact info to the translation.
Please note the license restrictions in the Copyright and Distribution section, below.
Beej’s Guide to C Programming is Copyright © 2020 Brian “Beej Jorgensen” Hall.
With specific exceptions for source code and translations, below, this work is licensed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
One specific exception to the “No Derivative Works” portion of the license is as follows: this guide may be freely translated into any language, provided the translation is accurate, and the guide is reprinted in its entirety. The same license restrictions apply to the translation as to the original guide. The translation may also include the name and contact information for the translator.
The C source code presented in this document is hereby granted to the public domain, and is completely free of any license restriction.
Educators are freely encouraged to recommend or supply copies of this guide to their students.
Contact beej@beej.us for more information.
“Where do these stairs go?”
“They go up.”—Ray Stantz and Peter Venkman, Ghostbusters
C is a low-level language.
It didn’t used to be. Back in the day when people carved punch cards out of granite, C was an incredible way to be free of the drudgery of lower-level languages like assembly.
But now in these modern times, current-generation languages offer all kinds of features that didn’t exist in 1972 when C was invented. This means C is a pretty basic language with not a lot of features. It can do anything, but it can make you work for it.
So why would we even use it today?
As a learning tool: not only is C a venerable piece of computing history, but it is connected to the bare metal in a way that present-day languages are not. When you learn C, you learn about how software interfaces with computer memory at a low level. There are no seatbelts. You’ll write software that crashes, I assure you. And that’s all part of the fun!
As a useful tool: C still is used for certain applications, such as building operating systems or in embedded systems. (Though the Rust programming language is eyeing both these fields!)
If you’re familiar with another language, a lot of things about C are easy. C inspired many other languages, and you’ll see bits of it in Go, Rust, Swift, Python, JavaScript, Java, and all kinds of other languages. Those parts will be familiar.
The one thing about C that hangs people up is pointers. Virtually everything else is familiar, but pointers are the weird one. The concept behind pointers is likely one you already know, but C forces you to be explicit about it, using operators you’ve likely never seen before.
It’s especially insidious because once you grok pointers, they’re suddenly easy. But up until that moment, they’re slippery eels.
Everything else in C is just memorizing another way (or sometimes the same way!) of doing something you’ve done already. Pointers are the weird bit.
So get ready for a rollicking adventure as close to the core of the computer as you can get without assembly, in the most influential computer language of all time. Hang on!
This is the canonical example of a C program. Everyone uses it.
/* Hello world program */
#include <stdio.h>
int main(void)
{
printf("Hello, World!\n"); // Actually do the work here
return 0;
}
We’re going to don our long-sleeved heavy-duty rubber gloves, grab a scalpel, and rip into this thing to see what makes it tick. So, scrub up, because here we go. Cutting very gently…
Let’s get the easy thing out of the way: anything between the digraphs /* and */ is a comment and will be completely ignored by the compiler. Same goes for anything on a line after a //. This allows you to leave messages to yourself and others, so that when you come back and read your code in the distant future, you’ll know what the heck it was you were trying to do. Believe me, you will forget; it happens.
Now, what is this #include? GROSS! Well, it tells the C Preprocessor to pull the contents of another file and insert it into the code right there.
Wait, what’s a C Preprocessor? Good question. There are two stages (well, technically there are more than two, but hey, let’s pretend there are two and have a good laugh) to compilation: the preprocessor and the compiler. Anything that starts with a pound sign, or “octothorpe”, (#) is something the preprocessor operates on before the compiler even gets started. Common preprocessor directives, as they’re called, are #include and #define. More on that later.
Before we go on, why would I even begin to bother pointing out that a pound sign is called an octothorpe? The answer is simple: I think the word octothorpe is so excellently funny, I have to gratuitously spread its name around whenever I get the opportunity. Octothorpe. Octothorpe, octothorpe, octothorpe.
So anyway. After the C preprocessor has finished preprocessing everything, the results are ready for the compiler to take them and produce assembly code, machine code, or whatever it’s about to do. Don’t worry about the technical details of compilation for now; just know that your source runs through the preprocessor, then the output of that runs through the compiler, then that produces an executable for you to run. Octothorpe.
What about the rest of the line? What’s <stdio.h>? That is what is known as a header file. It’s the dot-h at the end that gives it away. In fact it’s the “Standard I/O” (stdio) header file that you will grow to know and love. It contains preprocessor directives and function prototypes (more on that later) for common input and output needs. For our demo program, we’re outputting the string “Hello, World!”, so we in particular need the function prototype for the printf() function from this header file. Basically, if we tried to use printf() without #include <stdio.h>, the compiler would have complained to us about it.
How did I know I needed to #include <stdio.h> for printf()? Answer: it’s in the documentation. If you’re on a Unix system, run man 3 printf (section 3 holds the C library functions; plain man printf usually shows the shell command of the same name) and it’ll tell you right at the top of the man page what header files are required. Or see the reference section in this book. :-)
Holy moly. That was all to cover the first line! But, let’s face it, it has been completely dissected. No mystery shall remain!
So take a breather…look back over the sample code. Only a couple easy lines to go.
Welcome back from your break! I know you didn’t really take a break; I was just humoring you.
The next line is main(). This is the definition of the function main(); everything between the squirrelly braces ({ and }) is part of the function definition.
How do you call a different function, anyway? The answer lies in the printf() line, but we’ll get to that in a minute.
Now, the main function is a special one in many ways, but one way stands above the rest: it is the function that will be called automatically when your program starts executing. Nothing of yours gets called before main(). In the case of our example, this works fine since all we want to do is print a line and exit.
Oh, that’s another thing: once the program executes past the end of main(), down there at the closing squirrelly brace, the program will exit, and you’ll be back at your command prompt.
So now we know that that program has brought in a header file, stdio.h, and declared a main() function that will execute when the program is started. What are the goodies in main()?
I am so happy you asked. Really! We only have the one goodie: a call to the function printf(). You can tell this is a function call and not a function definition in a number of ways, but one indicator is the lack of squirrelly braces after it. And you end the function call with a semicolon so the compiler knows it’s the end of the expression. You’ll be putting semicolons after most everything, as you’ll see.
You’re passing one parameter to the function printf(): a string to be printed when you call it. Oh, yeah: we’re calling a function! We rock! Wait, wait, don’t get cocky. What’s that crazy \n at the end of the string? Well, most characters in the string look just like they are stored. But there are certain characters that you can’t print on screen well that are embedded as two-character backslash codes. One of the most popular is \n (read “backslash-N”) that corresponds to the newline character. This is the character that causes further printing to continue on the next line instead of the current. It’s like hitting return at the end of the line.
So copy that code into a file called hello.c and build it. On a Unix-like platform (e.g. Linux, BSD, Mac, or WSL), you’ll build with a command like so:
gcc -o hello hello.c
(This means “compile hello.c, and output an executable called hello”.)
After that’s done, you should have a file called hello that you can run with this command:
./hello
(The leading ./ tells the shell to “run from the current directory”.)
And see what happens:
Hello, World!
It’s done and tested! Ship it!
Let’s talk a bit more about how to build C programs, and what happens behind the scenes there.
Like other languages, C has source code. But, depending on what language you’re coming from, you might never have had to compile your source code into an executable.
Compilation is the process of taking your C source code and turning it into a program that your operating system can execute.
JavaScript and Python devs aren’t used to a separate compilation step at all–though behind the scenes it’s happening! Python compiles your source code into something called bytecode that the Python virtual machine can execute. Java devs are used to compilation, but that produces bytecode for the Java Virtual Machine.
When compiling C, machine code is generated. This is the 1s and 0s that can be executed directly by the CPU.
Languages that typically aren’t compiled are called interpreted languages. But as we mentioned with Java and Python, they also have a compilation step. And there’s no rule saying that C can’t be interpreted. (There are C interpreters out there!) In short, it’s a bunch of gray areas. Compilation in general is just taking source code and turning it into another, more easily-executed form.
The C compiler is the program that does the compilation.
As we’ve already said, gcc is a compiler that’s installed on a lot of Unix-like operating systems. And it’s commonly run from the command line in a terminal, but not always. You can run it from your IDE, as well.
But we’ll do some command line examples here because there are too many IDEs to cover. Search the Internet for your IDE and “how to compile C” for more information.
So how do we do command line builds?
gcc
If you have a source file called hello.c in the current directory, you can build that into a program called hello with this command typed in a terminal:
gcc -o hello hello.c
The -o means “output to this file”. And there’s hello.c at the end, the name of the file we want to compile.
If your source is broken up into multiple files, you can compile them all together (almost as if they were one file, but the rules are actually more complex than that) by putting all the .c files on the command line:
gcc -o awesomegame ui.c characters.c npc.c items.c
and they’ll all get built together into a big executable.
That’s enough to get started—later we’ll talk details about multiple source files, object files, and all kinds of fun stuff.
“It takes all kinds to make a world, does it not, Padre?”
“So it does, my son, so it does.”
—Pirate Captain Thomas Bartholomew Red to the Padre, Pirates
There sure can be lotsa stuff in a C program.
Yup.
And for various reasons, it’ll be easier for all of us if we classify some of the types of things you can find in a program, so we can be clear what we’re talking about.
It’s said that “variables hold values”. But another way to think about it is that a variable is a human-readable name that refers to some data in memory.
We’re going to take a second here and take a peek down the rabbit hole that is pointers. Don’t worry about it.
You can think of memory as a big array of bytes. Data is stored in this “array”. If a number is larger than a single byte, it is stored in multiple bytes. Because memory is like an array, each byte of memory can be referred to by its index. This index into memory is also called an address, or a location, or a pointer.
When you have a variable in C, the value of that variable is in memory somewhere, at some address. Of course. After all, where else would it be? But it’s a pain to refer to a value by its numeric address, so we make a name for it instead, and that’s what the variable is.
The reason I’m bringing all this up is twofold:
So a variable is a name for some data that’s stored in memory at some address.
You can use any characters in the range 0-9, A-Z, a-z, and underscore for variable names, with the following rules:
For Unicode, things get a little different, but the basic idea is that you can start or continue the variable name with one of the characters listed in C99 §D.1, and you can continue but not start a variable name with any of the characters listed in C99 §D.2.
Since those are just number ranges, I’m not going to reproduce them here. If you’re in an environment that supports Unicode, just try it and see if it works.
Just don’t start a variable name with the “Combining Left Harpoon Above” character and you’ll be fine.
Depending on which languages you already have in your toolkit, you might or might not be familiar with the idea of types. But C’s kinda picky about them, so we should do a refresher.
Some example types:
Type | Example | C Type |
---|---|---|
Integer | 3490 | int |
Floating point | 3.14159 | float |
Character (single) | 'c' | char |
String | "Hello, world!" | char * |
C makes an effort to convert automatically between most numeric types when you ask it to. But other than that, all conversions are manual, notably between string and numeric.
Almost all of the types in C are variants on these types.
Before you can use a variable, you have to declare that variable and tell C what type the variable holds. Once declared, the type of variable cannot be changed later at runtime. What you set it to is what it is until it falls out of scope and is reabsorbed into the universe.
Let’s take our previous “Hello, world” code and add a couple variables to it:
#include <stdio.h>
int main(void)
{
int i; /* holds signed integers, e.g. -3, -2, 0, 1, 10 */
float f; /* holds signed floating point numbers, e.g. -3.1416 */
printf("Hello, World!\n"); /* ah, blessed familiarity */
return 0;
}
There! We’ve declared a couple of variables. We haven’t used them yet, and they’re both uninitialized. One holds an integer number, and the other holds a floating point number (a real number, basically, if you have a math background).
Uninitialized variables have indeterminate value. They have to be initialized or else you must assume they contain some nonsense number.
This is one of the places C can “get you”. Much of the time, in my experience, the indeterminate value is zero… but it can vary from run to run! Never assume the value will be zero, even if you see it is. Always explicitly initialize variables to some value before you use them!
What’s this? You want to store some numbers in those variables? Insanity!
Let’s go ahead and do that:
#include <stdio.h>
int main(void)
{
    int i;
    i = 2; // Assign the value 2 into the variable i
    printf("Hello, World!\n");
    return 0;
}
Killer. We’ve stored a value. Let’s print it.
We’re going to do that by passing two amazing parameters to the printf() function. The first argument is a string that describes what to print and how to print it (called the format string), and the second is the value to print, namely whatever is in the variable i.
printf() hunts through the format string for a variety of special sequences which start with a percent sign (%) that tell it what to print. For example, if it finds a %d, it looks to the next parameter that was passed, and prints it out as an integer. If it finds a %f, it prints the value out as a float. If it finds a %s, it prints a string.
As such, we can print out the value of various types like so:
#include <stdio.h>
int main(void)
{
    int i = 2;
    float f = 3.14;
    char *s = "Hello, world!"; // char * ("char pointer") is the string type
    printf("%s i = %d and f = %f!\n", s, i, f);
    return 0;
}
And the output will be:
Hello, world! i = 2 and f = 3.140000!
(By default, %f prints six digits after the decimal point.)
In this way, printf() might be similar to various types of format or parameterized strings in other languages you’re familiar with.
C has Boolean types, true or false?
1!
Historically, C didn’t have a Boolean type, and some might argue it still doesn’t.
In C, 0 means “false”, and non-zero means “true”.
So 1 is true. And 37 is true. And 0 is false.
You can just declare Boolean types as ints:
int x = 1;
if (x) {
    printf("x is true!\n");
}
If you #include <stdbool.h>, you also get access to some symbolic names that might make things look more familiar, namely a bool type and true and false values:
#include <stdio.h>
#include <stdbool.h>
int main(void) {
bool x = true;
if (x) {
printf("x is true!\n");
}
return 0;
}
But for most purposes these behave just like using integer values for true and false. They’re largely a facade to make things look nice. (One difference: converting any non-zero value to bool always yields 1.)
C operators should be familiar to you from other languages. Let’s blast through some of them here.
(There are a bunch more details than this, but we’re going to do enough in this section to get started.)
sizeof Operator
This operator tells you the size (in bytes) that a particular variable or data type uses in memory.
This can be different on different systems, except for char (which is always 1 byte).
And this might not seem very useful now, but we’ll be making reference to it here and there, so it’s worth covering.
You can take the sizeof a variable or expression:
int a = 999;
printf("%zu", sizeof a); // Prints 4 on most current systems
printf("%zu", sizeof 3.14); // Prints 8 on my system
or you can take the sizeof a type (note the parentheses are required around a type name, unlike an expression):
printf("%zu", sizeof(int)); // Prints 4 on most current systems
printf("%zu", sizeof(char)); // Prints 1 on all systems
We’ll make use of this later on.
Hopefully these are familiar:
i = i + 3; // addition (+) and assignment (=) operators, add 3 to i
i = i - 8; // subtraction, subtract 8 from i
i = i * 9; // multiplication
i = i / 2; // division
i = i % 5; // modulo (division remainder)
There are shorthand variants for all of the above. Each of those lines could more tersely be written as:
i += 3; // Same as "i = i + 3", add 3 to i
i -= 8; // Same as "i = i - 8"
i *= 9; // Same as "i = i * 9"
i /= 2; // Same as "i = i / 2"
i %= 5; // Same as "i = i % 5"
There is no exponentiation. You’ll have to use one of the pow() function variants from math.h.
Let’s get into some of the weirder stuff you might not have in your other languages!
C also includes the ternary operator. This is an expression whose value depends on the result of a conditional embedded in it.
// If x > 10, add 17 to y. Otherwise add 37 to y.
y += x > 10? 17: 37;
What a mess! You’ll get used to it the more you read it. To help out a bit, I’ll rewrite the above expression using if statements:
// This expression:
y += x > 10? 17: 37;
// is equivalent to this non-expression:
if (x > 10)
    y += 17;
else
    y += 37;
Or, another example that prints if a number stored in x is odd or even:
printf("The number %d is %s.\n", x, x % 2 == 0?"even": "odd");
The %s format specifier in printf() means print a string. If the expression x % 2 evaluates to 0, the value of the entire ternary expression evaluates to the string "even". Otherwise it evaluates to the string "odd". Pretty cool!
It’s important to note that the ternary operator isn’t flow control like the if statement is. It’s just an expression that evaluates to a value.
Now, let’s mess with another thing that you might not have seen.
These are the legendary post-increment and post-decrement operators:
i++; // Add one to i (post-increment)
i--; // Subtract one from i (post-decrement)
Very commonly, these are just used as shorter versions of:
i += 1; // Add one to i
i -= 1; // Subtract one from i
but they’re more subtly different than that, the clever scoundrels.
Let’s take a look at this variant, pre-increment and pre-decrement:
++i; // Add one to i (pre-increment)
--i; // Subtract one from i (pre-decrement)
With pre-increment and pre-decrement, the value of the variable is incremented or decremented before the expression is evaluated. Then the expression is evaluated with the new value.
With post-increment and post-decrement, the value of the expression is first computed with the value as-is, and then the value is incremented or decremented after the value of the expression has been determined.
You can actually embed them in expressions, like this:
i = 10;
j = 5 + i++; // Compute 5 + i, _then_ increment i
printf("%d, %d\n", i, j); // Prints 11, 15
Let’s compare this to the pre-increment operator:
i = 10;
j = 5 + ++i; // Increment i, _then_ compute 5 + i
printf("%d, %d\n", i, j); // Prints 11, 16
This technique is used frequently with array and pointer access and manipulation. It gives you a way to use the value in a variable, and also increment or decrement that value before or after it is used.
But by far the most common place you’ll see this is in a for loop:
for (i = 0; i < 10; i++)
    printf("i is %d\n", i);
But more on that later.
The comma operator is an uncommonly-used way to separate expressions that will run left to right:
x = 10, y = 20; // First assign 10 to x, then 20 to y
Seems a bit silly, since you could just replace the comma with a semicolon, right?
x = 10; y = 20; // First assign 10 to x, then 20 to y
But that’s a little different. The latter is two separate expressions, while the former is a single expression!
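With the comma operator, the value of the comma expression is the value of the rightmost expression. (The parentheses below are needed because assignment binds more tightly than the comma; without them, x would be assigned 1.)
x = (1, 2, 3);
printf("x is %d\n", x); // Prints 3, because 3 is rightmost in the comma list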
But even that’s pretty contrived. One common place the comma operator is used is in for loops to do multiple things in each section of the statement:
for (i = 0, j = 10; i < 100; i++, j++)
    printf("%d, %d\n", i, j);
We’ll revisit that later.
For Boolean values, we have a raft of standard operators:
a == b; // True if a is equivalent to b
a != b; // True if a is not equivalent to b
a < b; // True if a is less than b
a > b; // True if a is greater than b
a <= b; // True if a is less than or equal to b
a >= b; // True if a is greater than or equal to b
Don’t mix up assignment = with comparison ==! Use two equals to compare, one to assign.
We can use the comparison expressions with if statements:
if (a <= 10)
printf("Success!\n");
We can chain together or alter conditional expressions with Boolean operators for and, or, and not.
Operator | Boolean meaning |
---|---|
&& | and |
|| | or |
! | not |
An example of Boolean “and”:
// Do something if x less than 10 and y greater than 20:
if (x < 10 && y > 20)
    printf("Doing something!\n");
An example of Boolean “not”:
if (!(x < 12))
    printf("x is not less than 12\n");
! has higher precedence than the comparison operators and the other Boolean operators, so we have to use parentheses in that case.
Of course, that’s just the same as:
if (x >= 12)
    printf("x is not less than 12\n");
but I needed the example!
Booleans are all good, but of course we’re nowhere if we can’t control program flow. Let’s take a look at a number of constructs: if, for, while, and do-while.
First, a general forward-looking note about statements and blocks of statements brought to you by your local friendly C developer:
After something like an if or while statement, you can either put a single statement to be executed, or a block of statements to all be executed in sequence.
Let’s start with a single statement:
if (x == 10) printf("x is 10");
This is also sometimes written on a separate line. (Whitespace is largely irrelevant in C—it’s not like Python.)
if (x == 10)
    printf("x is 10\n");
But what if you want multiple things to happen due to the conditional? You can use squirrelly braces to mark a block or compound statement.
if (x == 10) {
    printf("x is 10\n");
    printf("And also this happens when x is 10\n");
}
It’s a really common style to always use squirrelly braces even if they aren’t necessary:
if (x == 10) {
    printf("x is 10\n");
}
Some devs feel the code is easier to read and avoids errors like this where things visually look like they’re in the if block, but actually they aren’t.
// BAD ERROR EXAMPLE
if (x == 10)
    printf("x is 10\n");
    printf("And also this happens ALWAYS\n"); // Surprise!! Unconditional!
while and for and the other looping constructs work the same way as the examples above. If you want to do multiple things in a loop or after an if, wrap them up in squirrelly braces.
In other words, the if is going to run the one thing after the if. And that one thing can be a single statement or a block of statements.
if statement
We’ve already been using if for multiple examples, since it’s likely you’ve seen it in a language before, but here’s another:
int i = 10;
if (i > 10) {
    printf("Yes, i is greater than 10.\n");
    printf("And this will also print if i is greater than 10.\n");
}
if (i <= 10) printf("i is less than or equal to 10.\n");
In the example code, the first two messages will print only if i is greater than 10; otherwise execution continues to the next line. Notice the squirrelly braces after the if statement: if the condition is true, either the single statement right after the if will be executed, or else the whole collection of code in the squirrelly braces after the if will be executed. This sort of code block behavior is common to all the flow control statements.
while statement
while is your average run-of-the-mill looping construct. Do a thing while a condition expression is true.
Let’s do one!
// print the following output:
//
// i is now 0!
// i is now 1!
// [ more of the same between 2 and 7 ]
// i is now 8!
// i is now 9!
i = 0;
while (i < 10) {
    printf("i is now %d!\n", i);
    i++;
}
printf("All done!\n");
That gets you a basic loop. C also has a for loop which would have been cleaner for that example.
A not-uncommon use of while is for infinite loops where you repeat while true:
while (1) {
    printf("1 is always true, so this repeats forever.\n");
}
do-while statement
So now that we’ve gotten the while statement under control, let’s take a look at its closely related cousin, do-while.
They are basically the same, except if the loop condition is false on the first pass, do-while will execute once, but while won’t execute at all. Let’s see by example:
/* using a while statement: */
i = 10;
// this is not executed because i is not less than 10:
while(i < 10) {
    printf("while: i is %d\n", i);
    i++;
}
/* using a do-while statement: */
i = 10;
// this is executed once, because the loop condition is not checked until
// after the body of the loop runs:
do {
    printf("do-while: i is %d\n", i);
    i++;
} while (i < 10);
printf("All done!\n");
Notice that in both cases, the loop condition is false right away. So in the while, the loop fails, and the following block of code is never executed. With the do-while, however, the condition is checked after the block of code executes, so it always executes at least once. In this case, it prints the message, increments i, then fails the condition, and continues to the “All done!” output.
The moral of the story is this: if you want the loop to execute at least once, no matter what the loop condition, use do-while.
All these examples might have been better done with a for loop. Let’s do something less deterministic: repeat until a certain random number comes up!
#include <stdio.h> // For printf
#include <stdlib.h> // For rand
int main(void)
{
int r;
do {
r = rand() % 100; // Get a random number between 0 and 99
printf("%d\n", r);
} while (r != 37); // Repeat until 37 comes up
return 0;
}
for statement
Welcome to one of the most popular loops in the world! The for loop!
This is a great loop if you know the number of times you want to loop in advance.
You could do the same thing using just a while loop, but the for loop can help keep the code cleaner.
Here are two pieces of equivalent code. Note how the for loop is just a more compact representation:
// Print numbers between 0 and 9, inclusive...
// Using a while statement:
i = 0;
while (i < 10) {
    printf("i is %d\n", i);
    i++;
}
// Do the exact same thing with a for-loop:
for (i = 0; i < 10; i++) {
    printf("i is %d\n", i);
}
That’s right, folks: they do exactly the same thing. But you can see how the for statement is a little more compact and easy on the eyes. (JavaScript users will fully appreciate its C origins at this point.)
It’s split into three parts, separated by semicolons. The first is the initialization, the second is the loop condition, and the third is what should happen at the end of each pass through the block, just before the loop condition is checked again. All three of these parts are optional.
for (initialize things; loop if this is true; do this after each loop)
Note that the loop will not execute even a single time if the loop condition starts off false.
for-loop fun fact!
You can use the comma operator to do multiple things in each clause of the for loop!
for (i = 0, j = 999; i < 10; i++, j--) {
    printf("%d, %d\n", i, j);
}
An empty for will run forever:
for(;;) {  // "forever"
    printf("I will print this again and again and again\n" );
    printf("for all eternity until the cold-death of the universe.\n");
}
Very much like other languages you’re used to, C has the concept of functions.
Functions can accept a variety of arguments and return a value. One important thing, though: the arguments and return value types are predeclared—because that’s how C likes it!
Let’s take a look at a function. This is a function that takes an int as an argument, and returns an int.
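A minimal sketch of such a function, matching the description around it. The name plus_one and the parameter n come from the surrounding text; the one-line body is an assumption based on the name and on the "i + 1 is %d" output in main():

```c
#include <stdio.h>

int plus_one(int n)  // The "definition"
{
    return n + 1;    // assumed body: hand back one more than we were given
}
```

On its own this is just the function; it becomes a complete program once a main() calls it.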
The int before the plus_one indicates the return type.
The int n indicates that this function takes one int argument, stored in parameter n.
Continuing the program down into main(), we can see the call to the function, where we assign the return value into local variable j:
int main(void)
{
int i = 10, j;
j = plus_one(i); // The "call"
printf("i + 1 is %d\n", j);
return 0;
}
Before I forget, notice that I defined the function before I used it. If I hadn’t done that, the compiler wouldn’t know about it yet when it compiles main() and it would have given an unknown function call error. There is a more proper way to do the above code with function prototypes, but we’ll talk about that later.
Also notice that main() is a function!
It returns an int.
But what’s this void thing? This is a keyword that’s used to indicate that the function accepts no arguments.
You can also return void to indicate that you don’t return a value:
// This function takes no parameters and returns no value:
void hello(void)
{
printf("Hello, world!\n");
}
int main(void)
{
hello(); // Prints "Hello, world!"
}
When you pass a value to a function, a copy of that value gets made in this magical mystery world known as the stack30. (The stack is just a hunk of memory somewhere that the program allocates memory on. Some of the stack is used to hold the copies of values that are passed to functions.)
For now, the important part is that a copy of the variable or value is being passed to the function. The practical upshot of this is that since the function is operating on a copy of the value, you can’t affect the value back in the calling function directly. Like if you wanted to increment a value by one, this would NOT work:
You might somewhat sensibly think that the value of i after the call would be 11, since that’s what the ++ does, right? This would be incorrect. What is really happening here?
Well, when you pass i to the increment() function, a copy gets made on the stack, right? It’s the copy that increment() works on, not the original; the original i is unaffected. We even gave the copy a name: a, right? It’s right there in the parameter list of the function definition. So we increment a, sure enough, but what good does that do us out in main()? None! Ha!
That’s why in the previous example with the plus_one() function, we returned the locally modified value so that we could see it again in main().
Seems a little bit restrictive, huh? Like you can only get one piece of data back from a function, is what you’re thinking. There is, however, another way to get data back; C folks call it passing by reference. But no fancy-schmancy name will distract you from the fact that EVERYTHING you pass to a function WITHOUT EXCEPTION is copied onto the stack and the function operates on that local copy, NO MATTER WHAT. Remember that, even when we’re talking about this so-called passing by reference.
But that’s a story for another time.
So if you recall back in the ice age a few sections ago, I mentioned that you had to define the function before you used it, otherwise the compiler wouldn’t know about it ahead of time, and would bomb out with an error.
This isn’t quite strictly true. You can notify the compiler in advance that you’ll be using a function of a certain type that has a certain parameter list and that way the function can be defined anywhere at all, as long as the function prototype has been declared first.
Fortunately, the function prototype is really quite easy. It’s merely a copy of the first line of the function definition with a semicolon tacked on the end for good measure. For example, this code calls a function that is defined later, because a prototype has been declared first:
int foo(void); // This is the prototype!
int main(void)
{
int i;
i = foo();
return 0;
}
int foo(void) // this is the definition, just like the prototype!
{
return 3490;
}
You might notice something about the sample code we’ve been using…that is, we’ve been using the good old printf() function without defining it or declaring a prototype! How do we get away with this lawlessness? We don’t, actually. There is a prototype; it’s in that header file stdio.h that we included with #include, remember? So we’re still legit, officer!
Pointers are one of the most feared things in the C language. In fact, they are the one thing that makes this language challenging at all. But why?
Because they, quite honestly, can cause electric shocks to come up through the keyboard and physically weld your arms permanently in place, cursing you to a life at the keyboard in this language from the 70s!
Well, not really. But they can cause huge headaches if you don’t know what you’re doing when you try to mess with them.
Computer memory holds data of all kinds, right? It’ll hold floats, ints, or whatever you have. To make memory easy to cope with, each byte of memory is identified by an integer. These integers increase sequentially as you move up through memory. You can think of it as a bunch of numbered boxes, where each box holds a byte31 of data. Or like a big array where each element holds a byte, if you come from a language with arrays. The number that represents each box is called its address.
Now, not all data types use just a byte. For instance, an int is often four bytes, as is a float, but it really depends on the system. You can use the sizeof operator to determine how many bytes of memory a certain type uses.
// %zu is the format specifier for type size_t ("t" is for "type", but
// it's pronounced "size tee"), which is what is returned by sizeof.
// More on size_t later.
printf("an int uses %zu bytes of memory\n", sizeof(int));
// That prints "4" for me, but can vary by system.
When you have a data type that uses more than a byte of memory, the bytes that make up the data are always adjacent to one another in memory. Sometimes they’re in order, and sometimes they’re not32, but that’s platform-dependent, and often taken care of for you without you needing to worry about pesky byte orderings.
So anyway, if we can get on with it and get a drum roll and some foreboding music playing for the definition of a pointer, a pointer is the address of some data in memory. Imagine the classical score from 2001: A Space Odyssey at this point. Ba bum ba bum ba bum BAAAAH!
Ok, so maybe a bit overwrought here, yes? There’s not a lot of mystery about pointers. They are the address of data. Just like an int can be 12, a pointer can be the address of data.
This means that all these things mean the same thing:
Index into memory
Address
Location
I’m going to use these interchangeably. And yes, I just threw location in there because you can never have enough words that mean the same thing.
Often, we like to make a pointer to some data that we have stored in a variable, as opposed to any old random data out in memory wherever. Having a pointer to a variable is often more useful.
So if we have an int, say, and we want a pointer to it, what we want is some way to get the address of that int, right? After all, the pointer is just the address of the data. What operator do you suppose we’d use to find the address of the int?
Well, by a shocking surprise that must come as something of a shock to you, gentle reader, we use the address-of operator (which happens to be an ampersand: “&”) to find the address of the data. Ampersand.
So for a quick example, we’ll introduce a new format specifier for printf() so you can print a pointer. You know already how %d prints a decimal integer, yes? Well, %p prints a pointer. Now, this pointer is going to look like a garbage number (and it might be printed in hexadecimal33 instead of decimal), but it is merely the index into memory the data is stored in. (Or the index into memory that the first byte of data is stored in, if the data is multi-byte.) In virtually all circumstances, including this one, the actual value of the number printed is unimportant to you, and I show it here only for demonstration of the address-of operator.
#include <stdio.h>
int main(void)
{
int i = 10;
printf("The value of i is %d, and its address is %p\n", i, (void *)&i); // %p expects a void*
return 0;
}
On my computer, this prints:
The value of i is 10, and its address is 0x7ffda2546fc4
If you’re curious, that hexadecimal number is 140,727,326,896,068 in base 10. That’s the index into memory where the variable i’s data is stored. It’s the address of i. It’s the location of i. It’s a pointer to i.
It’s a pointer because it lets you know where i is in memory. Like a literal sign with an arrow on it pointing at a thing, this number indicates to us where in memory we can find the value of i. It points to i.
Again, we don’t really care what the number is, generally. We just care that it’s a pointer to i.
Well, this is all well and good. You can now successfully take the address of a variable and print it on the screen. There’s a little something for the ol’ resume, right? Here’s where you grab me by the scruff of the neck and ask politely what the frick pointers are good for.
Excellent question, and we’ll get to that right after these messages from our sponsor.
ACME ROBOTIC HOUSING UNIT CLEANING SERVICES. YOUR HOMESTEAD WILL BE DRAMATICALLY IMPROVED OR YOU WILL BE TERMINATED. MESSAGE ENDS.
Welcome back to another installment of Beej’s Guide to Whatever. When we met last we were talking about how to make use of pointers. Well, what we’re going to do is store a pointer off in a variable so that we can use it later. You can identify the pointer type because there’s an asterisk (*) before the variable name and after its type:
int main(void)
{
int i; /* i's type is "int" */
int *p; /* p's type is "pointer to an int", or "int-pointer" */
return 0;
}
Hey, so we have here a variable that is a pointer itself, and it can point to other ints. We know it points to ints, since it’s of type int* (read “int-pointer”).
When you do an assignment into a pointer variable, the type of the right hand side of the assignment has to be the same type as the pointer variable. Fortunately for us, when you take the address-of a variable, the resultant type is a pointer to that variable type, so assignments like the following are perfect:
int i;
int *p;  /* p is a pointer, but is uninitialized and points to garbage */
p = &i;  /* p now "points to" i */
On the left of the assignment, we have a variable of type pointer-to-int (int*), and on the right side, we have an expression of type address-of-int (since i is an int). But remember that “address” and “pointer” both mean the same thing! The address of a thing is a pointer to that thing.
So effectively, both sides of the assignment are type pointer-to-int (which is the same as type “address-of-int”, but no one says it that way).
Get it? I know it still doesn’t quite make much sense since you haven’t seen an actual use for the pointer variable, but we’re taking small steps here so that no one gets lost. So now, let’s introduce you to the anti-address-of operator. It’s kind of like what address-of would be like in Bizarro World.
Like we’ve said, a pointer, also known as an address, is sometimes also called a reference. How in the name of all that is holy can there be so many terms for exactly the same thing? I don’t know the answer to that one, but these things are all equivalent, and can be used interchangeably.
The only reason I’m telling you this is so that the name of this operator will make any sense to you whatsoever. When you have a pointer to a variable (roughly “a reference to a variable”), you can use the original variable through the pointer by dereferencing the pointer. (You can think of this as “de-pointering” the pointer, but no one ever says “de-pointering”.)
What do I mean by “get access to the original variable”? Well, if you have a variable called i, and you have a pointer to i called p, you can use the dereferenced pointer p exactly as if it were the original variable i!
You almost have enough knowledge to handle an example. The last tidbit you need to know is actually this: what is the dereference operator? It is the asterisk, again: *. Now, don’t get this confused with the asterisk you used in the pointer declaration, earlier. They are the same character, but they have different meanings in different contexts34.
Here’s a full-blown example:
#include <stdio.h>
int main(void)
{
int i;
int *p; // this is NOT a dereference--this is a type "int*"
p = &i; // p now points to i, p holds address of i
i = 10; // i is now 10
*p = 20; // i (yes i!) is now 20!!
printf("i is %d\n", i); // prints "20"
printf("i is %d\n", *p); // "20"! dereference-p is the same as i!
return 0;
}
Remember that p holds the address of i, as you can see where we did the assignment to p. What the dereference operator does is tell the computer to use the variable the pointer points to instead of using the pointer itself. In this way, we have turned *p into an alias of sorts for i.
Great, but why? Why do any of this?
Right about now, you’re thinking that you have an awful lot of knowledge about pointers, but absolutely zero application, right? I mean, what use is *p if you could just simply say i instead?
Well, my feathered friend, the real power of pointers comes into play when you start passing them to functions. Why is this a big deal? You might recall from before that you could pass all kinds of parameters to functions and they’d be dutifully copied onto the stack, and then you could manipulate local copies of those variables from within the function, and then you could return a single value.
What if you wanted to bring back more than one single piece of data from the function? I mean, you can only return one thing, right? What if I answered that question with another question, like this:
What happens when you pass a pointer as a parameter to a function? Does a copy of the pointer get put on the stack? You bet your sweet peas it does. Remember how earlier I rambled on and on about how EVERY SINGLE PARAMETER gets copied onto the stack and the function uses a copy of the parameter? Well, the same is true here. The function will get a copy of the pointer.
But, and this is the clever part: we will have set up the pointer in advance to point at a variable…and then the function can dereference its copy of the pointer to get back to the original variable! The function can’t see the variable itself, but it can certainly dereference a pointer to that variable! Example!
#include <stdio.h>
void increment(int *p) // note that it accepts a pointer to an int
{
*p = *p + 1; // add one to the thing p points to
}
int main(void)
{
int i = 10;
int *j = &i; // note the address-of; turns it into a pointer
printf("i is %d\n", i); // prints "10"
printf("i is also %d\n", *j); // prints "10"
increment(j);
printf("i is %d\n", i); // prints "11"!
return 0;
}
Ok! There are a couple things to see here…not the least of which is that the increment() function takes an int* as a parameter. We pass it an int* in the call by changing the int variable i to an int* using the address-of operator. (Remember, a pointer is an address, so we make pointers out of variables by running them through the address-of operator.)
The increment() function gets a copy of the pointer on the stack. Both the original pointer j (in main()) and the copy of that pointer p (in increment()) point to the same address, namely the one holding the value i. So dereferencing either will allow you to modify the original variable i! The function can modify a variable in another scope! Rock on!
Pointer enthusiasts will recall from early on in the guide, we used a function to read from the keyboard, scanf()…and, although you might not have recognized it at the time, we used the address-of to pass a pointer to a value to scanf(). We had to pass a pointer, see, because scanf() reads from the keyboard and stores the result in a variable. The only way it can see that variable that is local to that calling function is if we pass a pointer to that variable:
int i = 0;
scanf("%d", &i);         /* pretend you typed "12" */
printf("i is %d\n", i);  /* prints "i is 12" */
See, scanf() dereferences the pointer we pass it in order to modify the variable it points to. And now you know why you have to put that pesky ampersand in there!
NULL Pointer
Any pointer type can be set to a special value called NULL. This indicates that this pointer doesn’t point to anything.
int *p;
p = NULL;
Since it doesn’t point to a value, dereferencing it is undefined behavior, and probably will result in a crash:
int *p = NULL;
*p = 12;  // CRASH or SOMETHING PROBABLY BAD
Despite being called the billion dollar mistake by its creator35, the NULL pointer is a good sentinel value36 and general indicator that a pointer hasn’t yet been initialized.
(Of course, the pointer points to garbage unless you explicitly assign it to point to an address or NULL.)
The syntax for declaring a pointer can get a little weird. Let’s look at this example:
int a;
int b;
We can condense that into a single line, right?
int a, b; // Same thing
So a and b are both ints. No problem.
But what about this?
int a;
int *p;
Can we make that into one line? We can. But where does the * go?
The rule is that the * goes in front of any variable that is a pointer type. That is, the * is not part of the int in this example; it’s a part of variable p.
With that in mind, we can write this:
int a, *p; // Same thing
It’s important to note that this line does not declare two pointers:
int *p, q; // p is a pointer to an int; q is just an int.
So take a look at this and determine which variables are pointers and which are not:
int *a, b, c, *d, e, *f, g, h, *i;
I’ll drop the answer in a footnote37.
Luckily, C has arrays. I mean, I know it’s considered a low-level language38 but it does at least have the concept of arrays built-in. And since a great many languages drew inspiration from C’s syntax, you’re probably already familiar with using [ and ] for declaring and using arrays in C.
But only barely! As we’ll find out later, arrays are just syntactic sugar in C—they’re actually all pointers and stuff deep down. Freak out! But for now, let’s just use them as arrays. Phew.
Let’s just crank out an example:
#include <stdio.h>
int main(void)
{
int i;
float f[4]; // Declare an array of 4 floats
f[0] = 3.14159; // Indexing starts at 0, of course.
f[1] = 1.41421;
f[2] = 1.61803;
f[3] = 2.71828;
// Print them all out:
for (i = 0; i < 4; i++) {
printf("%f\n", f[i]);
}
return 0;
}
When you declare an array, you have to give it a size. And the size has to be fixed39.
In the above example, we made an array of 4 floats. The value in the square brackets in the declaration lets us know that.
Later on in subsequent lines, we access the values in the array, setting them or getting them, again with square brackets.
Hopefully this looks familiar from languages you already know!
How do you get the length of an array? You can’t. C doesn’t record this information. You have to manage it separately in another variable.
There is a trick to get the number of elements in an array in the scope in which an array is declared. But, generally speaking, this won’t work the way you want if you pass the array into a function.
You can initialize an array with constants ahead of time:
#include <stdio.h>
int main(void)
{
int i;
int a[5] = {22, 37, 3490, 18, 95}; // Initialize with these values
for (i = 0; i < 5; i++) {
printf("%d\n", a[i]);
}
return 0;
}
Catch: initializer values must be constant terms. Can’t throw variables in there. Sorry, Illinois!
You should never have more items in your initializer than there is room for in the array, or the compiler will get cranky:
foo.c: In function ‘main’:
foo.c:6:39: warning: excess elements in array initializer
6 | int a[5] = {22, 37, 3490, 18, 95, 999};
| ^~~
foo.c:6:39: note: (near initialization for ‘a’)
But (fun fact!) you can have fewer items in your initializer than there is room for in the array. The remaining elements in the array will be automatically initialized with zero.
int a[5] = {22, 37, 3490};
// is the same as:
int a[5] = {22, 37, 3490, 0, 0};
It’s a common shortcut to see this in an initializer when you want to set an entire array to zero:
int a[100] = {0};
Which means, “Make the first element zero, and then automatically make the rest zero, as well.”
Lastly, you can also have C compute the size of the array from the initializer, just by leaving the size off:
int a[3] = {22, 37, 3490};
// is the same as:
int a[] = {22, 37, 3490}; // Left the size off!
C doesn’t stop you from accessing arrays out of bounds. It might not even warn you.
Let’s steal the example from above and keep printing off the end of the array. It only has 5 elements, but let’s try to print 10 and see what happens:
#include <stdio.h>
int main(void)
{
int i;
int a[5] = {22, 37, 3490, 18, 95};
for (i = 0; i < 10; i++) { // BAD NEWS: printing too many elements!
printf("%d\n", a[i]);
}
return 0;
}
Running it on my computer prints:
22
37
3490
18
95
32765
1847052032
1780534144
-56487472
21890
Yikes! What’s that? Well, turns out printing off the end of an array results in what C developers call undefined behavior. We’ll talk more about this beast later, but for now it means, “You’ve done something bad, and anything could happen during your program run.”
And by anything, I mean typically things like finding zeroes, finding garbage numbers, or crashing. But really the C spec says in this circumstance the compiler is allowed to emit code that does anything40.
Short version: don’t do anything that causes undefined behavior. Ever41.
You can add as many dimensions as you want to your arrays.
int a[10];
int b[2][7];
int c[4][5][6];
These are stored in memory in row-major order42.
You can also use initializers on multidimensional arrays by nesting them:
#include <stdio.h>
int main(void)
{
int row, col;
int a[2][5] = { // Initialize a 2D array
{0, 1, 2, 3, 4},
{5, 6, 7, 8, 9}
};
for (row = 0; row < 2; row++) {
for (col = 0; col < 5; col++) {
printf("(%d,%d) = %d\n", row, col, a[row][col]);
}
}
return 0;
}
For output of:
(0,0) = 0
(0,1) = 1
(0,2) = 2
(0,3) = 3
(0,4) = 4
(1,0) = 5
(1,1) = 6
(1,2) = 7
(1,3) = 8
(1,4) = 9
[Casually] So… I kinda might have mentioned up there that arrays were pointers, deep down? We should take a shallow dive into that now so that things aren’t completely confusing. Later on, we’ll look at what the real relationship between arrays and pointers is, but for now I just want to look at passing arrays to functions.
I want to tell you a secret. Generally speaking, when a C programmer talks about a pointer to an array, they’re talking about a pointer to the first element of the array43.
So let’s get a pointer to the first element of an array.
#include <stdio.h>
int main(void)
{
int a[5] = {11, 22, 33, 44, 55};
int *p;
p = &a[0]; // p points to the array
// Well, to the first element, actually
printf("%d\n", *p); // Prints "11"
return 0;
}
This is so common to do in C that the language allows us a shorthand:
p = &a[0]; // p points to the array
// is the same as:
p = a; // p points to the array, but much nicer-looking!
Just referring to the array name in isolation is the same as getting a pointer to the first element of the array! We’re going to use this extensively in the upcoming examples.
But hold on a second: isn’t p an int*? And *p gives us 11, same as a[0]? Yessss. You’re starting to get a glimpse of how arrays and pointers are related in C.
Let’s do an example with a single dimensional array. I’m going to write a couple functions that we can pass the array to that do different things.
Prepare for some mind-blowing function signatures!
#include <stdio.h>
// Passing as a pointer to the first element
void times2(int *a, int len)
{
for (int i = 0; i < len; i++)
printf("%d\n", a[i] * 2);
}
// Same thing, but using array notation
void times3(int a[], int len)
{
for (int i = 0; i < len; i++)
printf("%d\n", a[i] * 3);
}
// Same thing, but using array notation with size
void times4(int a[5], int len)
{
for (int i = 0; i < len; i++)
printf("%d\n", a[i] * 4);
}
int main(void)
{
int x[5] = {11, 22, 33, 44, 55};
times2(x, 5);
times3(x, 5);
times4(x, 5);
return 0;
}
All those methods of listing the array as a parameter in the function are identical.
void times2(int *a, int len)
void times3(int a[], int len)
void times4(int a[5], int len)
In C, the first is the most common, by far.
And, in fact, in the latter situation, the compiler doesn’t even care what number you pass in (other than it has to be greater than zero44). It doesn’t enforce anything at all.
Now that I’ve said that, the size of the array in the function declaration actually does matter when you’re passing multidimensional arrays into functions, but let’s come back to that.
We’ve said that arrays are just pointers in disguise. This means that if you pass an array to a function, you’re likely passing a pointer to the first element in the array.
But if the function has a pointer to the data, it is able to manipulate that data! So changes that a function makes to an array will be visible back out in the caller.
Here’s an example where we pass a pointer to an array into a function, the function manipulates the values in that array, and those changes are visible out in the caller.
#include <stdio.h>
void double_array(int *a, int len)
{
// Multiply each element by 2
//
// This doubles the values in x in main() since x and a both point
// to the same array in memory!
for (int i = 0; i < len; i++)
a[i] *= 2;
}
int main(void)
{
int x[5] = {1, 2, 3, 4, 5};
double_array(x, 5);
for (int i = 0; i < 5; i++)
printf("%d\n", x[i]); // 2, 4, 6, 8, 10!
return 0;
}
Later when we talk about the equivalence between arrays and pointers, we’ll see how this makes a lot more sense. For now, it’s enough to know that functions can make changes to arrays that are visible out in the caller.
The story changes a little when we’re talking about multidimensional arrays. C needs to know all the dimensions (except the first one) so it has enough information to know where in memory to look to find a value.
Here’s an example where we’re explicit with all the dimensions:
#include <stdio.h>
void print_2D_array(int a[2][3])
{
for (int row = 0; row < 2; row++) {
for (int col = 0; col < 3; col++)
printf("%d ", a[row][col]);
printf("\n");
}
}
int main(void)
{
int x[2][3] = {
{1, 2, 3},
{4, 5, 6}
};
print_2D_array(x);
return 0;
}
But in this case, these two45 are equivalent:
void print_2D_array(int a[2][3])
void print_2D_array(int a[][3])
The compiler really only needs the second dimension so it can figure out how far in memory to skip for each increment of the first dimension.
Also, the compiler does minimal compile-time bounds checking (if you’re lucky), and C does zero runtime checking of bounds. No seat belts! Don’t crash!
Finally! Strings! What could be simpler?
Well, turns out strings aren’t actually strings in C. That’s right! They’re pointers! Of course they are!
Much like arrays, strings in C barely exist.
But let’s check it out—it’s not really such a big deal.
Before we start, let’s talk about constant strings in C. These are sequences of characters in double quotes ("). (Single quotes enclose characters, and are a different animal entirely.)
Examples:
"Hello, world!\n"
"This is a test."
"When asked if this string had quotes in it, she replied, \"It does.\""
The first one has a newline at the end—quite a common thing to see.
The last one has quotes embedded within it, but you see each is preceded by (we say “escaped by”) a backslash (\) indicating that a literal quote belongs in the string at this point. This is how the C compiler can tell the difference between printing a double quote and the double quote at the end of the string.
Now that we know how to make a constant string, let’s assign it to a variable so we can do something with it.
char *s = "Hello, world!";
Check out that type: pointer to a char46. The string variable s is actually a pointer to the first character in that string, namely the H.
And we can print it with the %s (for “string”) format specifier:
char *s = "Hello, world!";
printf("%s\n", s);  // "Hello, world!"
Another option is this, nearly equivalent to the above char* usage:
char s[14] = "Hello, world!";
// or, if we were properly lazy:
char s[] = "Hello, world!";
This means you can use array notation to access characters in a string. Let’s do exactly that to print all the characters in a string, one per line:
#include <stdio.h>
int main(void)
{
char s[] = "Hello, world!";
for (int i = 0; i < 13; i++)
printf("%c\n", s[i]);
return 0;
}
Note that we’re using the format specifier %c to print a single character.
Also, check this out. The program will still work fine if we change the definition of s to be a char* type:
#include <stdio.h>
int main(void)
{
char *s = "Hello, world!"; // char* here
for (int i = 0; i < 13; i++)
printf("%c\n", s[i]); // But still use arrays here...?
return 0;
}
And we still can use array notation to get the job done when printing it out! This is surprising, but is still only because we haven’t talked about array/pointer equivalence yet. But this is yet another hint that arrays and pointers are the same thing, deep down.
We’ve already seen some examples with initializing string variables with constant strings:
char *s = "Hello, world!";
char t[] = "Hello, again!";
But these two are subtly different.
This one is a pointer to a constant string (i.e. a pointer to the first character in a constant string):
char *s = "Hello, world!";
If you try to mutate that string with this:
char *s = "Hello, world!";
s[0] = 'z'; // BAD NEWS: tried to mutate a constant string!
The behavior is undefined. Probably, depending on your system, a crash will result.
But declaring it as an array is different. This one is a non-constant, mutable copy of the constant string that we can change at will:
char t[] = "Hello, again!";  // t is an array copy of the string
t[0] = 'z';                  // No problem
printf("%s\n", t);           // "zello, again!"
So remember: if you have a pointer to a constant string, don’t try to change it!
You can’t, since C doesn’t track it for you. And when I say “can’t”, I actually mean “can”47. There’s a function in <string.h>
called strlen()
that can be used to compute the length of any string.
#include <stdio.h>
#include <string.h>
int main(void)
{
char *s = "Hello, world!";
printf("The string is %zu characters long.\n", strlen(s));
return 0;
}
The strlen()
function returns type size_t
, which is an integer type so you can use it for integer math. We print size_t
with %zu
.
The above program prints:
The string is 13 characters long.
Great! So it is possible to get the string length!
But… if C doesn’t track the length of the string anywhere, how does it know how long the string is?
C does strings a little differently than many programming languages, and in fact differently than almost every modern programming language.
When you’re making a new language, you have basically two options for storing a string in memory:
1. Store the bytes of the string along with a number indicating the length of the string.
2. Store the bytes of the string, and mark the end of the string with a special byte called the terminator.
If you want strings longer than 255 characters, option 1 requires at least two bytes to store the length. Whereas option 2 only requires one byte to terminate the string. So a bit of savings there.
Of course, these days it seems ridiculous to worry about saving a byte (or 3; lots of languages will happily let you have strings that are 4 gigabytes in length). But back in the day, it was a bigger deal.
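To make those two options concrete, here’s a rough sketch of what finding the length looks like under each one. The function names are made up for this illustration, and the length-prefixed layout shown (a single length byte up front) is just one assumed way to do option 1.

```c
#include <stddef.h>

// Option 1: the first byte holds the length, followed by the characters.
// (With a single length byte, the string can't be longer than 255.)
size_t length_prefixed_len(unsigned char *p)
{
    return p[0];    // The length is stored up front, so no scanning
}

// Option 2: the characters are followed by a zero-valued terminator.
size_t terminated_len(char *s)
{
    size_t count = 0;

    while (s[count] != 0)   // Scan until we hit the terminator
        count++;

    return count;
}
```

With these, the 6-byte array {5, 'H', 'e', 'l', 'l', 'o'} and the 6-byte array {'H', 'e', 'l', 'l', 'o', 0} both come out as length 5. Same storage cost here, but option 1 hits a wall at 255 characters unless you spend more bytes on the length.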
So C took approach #2. In C, a “string” is defined by two basic characteristics:
1. A pointer to the first character in the string.
2. A zero-valued byte (or NUL character48) somewhere in memory after the pointer that indicates the end of the string.
A NUL character can be written in C code as \0, though you don’t often have to do this.
When you include a constant string in your code, the NUL
character is automatically, implicitly included.
char *s = "Hello!"; // Actually "Hello!\0" behind the scenes
So with this in mind, let’s write our own strlen()
function that counts characters in a string until it finds a NUL
.
The procedure is to look down the string for a single NUL
character, counting as we go49:
int my_strlen(char *s)
{
    int count = 0;
    while (s[count] != '\0') // Single quotes for single char
        count++;
    return count;
}
And that’s basically how the built-in strlen()
gets the job done.
You can’t copy a string through the assignment operator (=
). All that does is make a copy of the pointer to the first character… so you end up with two pointers to the same string:
#include <stdio.h>
int main(void)
{
char s[] = "Hello, world!";
char *t;
// This makes a copy of the pointer, not a copy of the string!
t = s;
// We modify t
t[0] = 'z';
// But printing s shows the modification!
// Because t and s point to the same string!
printf("%s\n", s); // "zello, world!"
return 0;
}
If you want to make a copy of a string, you have to copy it a byte at a time—but this is made easier with the strcpy()
function50.
Before you copy the string, make sure you have room to copy it into, i.e. the destination array that’s going to hold the characters needs to be at least as long as the string you’re copying.
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[] = "Hello, world!";
char t[100]; // Each char is one byte, so plenty of room
// This makes a copy of the string!
strcpy(t, s);
// We modify t
t[0] = 'z';
// And s remains unaffected because it's a different string
printf("%s\n", s); // "Hello, world!"
// But t has been changed
printf("%s\n", t); // "zello, world!"
return 0;
}
Notice with strcpy()
, the destination pointer is the first argument, and the source pointer is the second. A mnemonic I use to remember this is that it’s the order you would have put t
and s
if an assignment =
worked for strings.
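Since strcpy() won’t check for room on its own, one way to play it safe is to check first. Here’s a minimal sketch of that idea. The function copy_if_fits() is a made-up helper for this example, not something from the standard library.

```c
#include <string.h>

// Copy src into dest only if dest has room for every character in src
// plus the NUL terminator. Returns 1 if the copy happened, 0 if not.
int copy_if_fits(char *dest, size_t dest_size, char *src)
{
    if (strlen(src) + 1 > dest_size)
        return 0;       // Not enough room--leave dest untouched

    strcpy(dest, src);  // OK: we just checked that it fits

    return 1;
}
```

If t is declared as char t[100], you’d call it as copy_if_fits(t, sizeof t, s). Note the + 1 in there: strlen() doesn’t count the NUL terminator, but strcpy() copies it, so the destination needs that extra byte.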
In C, we have something called a struct
, which is a user-definable type that holds multiple pieces of data, potentially of different types.
It’s a convenient way to bundle multiple variables into a single one. This can be beneficial for passing variables to functions (so you just have to pass one instead of many), and useful for organizing data and making code more readable.
If you’ve come from another language, you might be familiar with the idea of classes and objects. These don’t exist in C, natively51. You can think of a struct
as a class with only data members, and no methods.
You can declare a struct
in your code like so:
struct car {
char *name;
float price;
int speed;
};
This is often done at the global scope outside any functions so that the struct
is globally available.
When you do this, you’re making a new type. The full type name is struct car
. (Not just car
—that won’t work.)
There aren’t any variables of that type yet, but we can declare some:
struct car saturn;
And now we have an uninitialized variable saturn
52 of type struct car
.
We should initialize it! But how do we set the values of those individual fields?
Like in many other languages that stole it from C, we’re going to use the dot operator (.
) to access the individual fields.
saturn.name = "Saturn SL/2";
saturn.price = 15999.99;
saturn.speed = 175;
printf("Name: %s\n", saturn.name);
printf("Price (USD): %f\n", saturn.price);
printf("Top Speed (km): %d\n", saturn.speed);
That example in the previous section was a little unwieldy. There must be a better way to initialize that struct
variable!
You can do it with an initializer by putting values in for the fields in the order they appear in the struct
when you define the variable. (This won’t work after the variable has been defined—it has to happen in the definition).
struct car {
char *name;
float price;
int speed;
};
// Now with an initializer! Same field order as in the struct declaration:
struct car saturn = {"Saturn SL/2", 16000.99, 175};
printf("Name: %s\n", saturn.name);
printf("Price: %f\n", saturn.price);
printf("Top Speed: %d km\n", saturn.speed);
The fact that the fields in the initializer need to be in the same order is a little freaky. If someone changes the order in struct car
, it could break all the other code!
We can be more specific with our initializers:
struct car saturn = {.speed=172, .name="Saturn SL/2"};
Now it’s independent of the order in the struct
declaration. Which is safer code, for sure.
Similar to array initializers, any missing field designators are initialized to zero (in this case, that would be .price
, which I’ve omitted).
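Here’s a small sketch to show that zeroing in action. The function make_unpriced_car() is a made-up name for this example; it builds the same struct car with .price left out of the initializer and hands it back.

```c
struct car {
    char *name;
    float price;
    int speed;
};

// Build a car while leaving .price out of the initializer. Fields
// without a designator get zero, so .price starts out as 0.
struct car make_unpriced_car(void)
{
    struct car c = {.speed=172, .name="Saturn SL/2"};

    return c;
}
```

If you print the .price field of the result with %f, you should see 0.000000.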
You can do a couple things to pass a struct
to a function.
1. Pass the struct.
2. Pass a pointer to the struct.
Recall that when you pass something to a function, a copy of that thing gets made for the function to operate on, whether it’s a copy of a pointer, an int, a struct, or anything.
There are basically two cases when you’d want to pass a pointer to the struct
:
1. You need the function to be able to make changes to the struct that was passed in, and have those changes show in the caller.
2. The struct is somewhat large and it’s more expensive to copy that onto the stack than it is to just copy a pointer53.
For those two reasons, it’s far more common to pass a pointer to a struct to a function.
Let’s try that, making a function that will allow you to set the .price
field of the struct car
:
struct car {
char *name;
float price;
int speed;
};
int main(void)
{
struct car saturn = {.speed=175, .name="Saturn SL/2"};
// Pass a pointer to this struct car, along with a new,
// more realistic, price:
set_price(&saturn, 800.00);
// ... code continues ...
You should be able to come up with the function signature for set_price()
just by looking at the types of the arguments we have there.
saturn
is a struct car
, so &saturn
must be the address of the struct car
, AKA a pointer to a struct car
, namely a struct car*
.
And 800.00 is a floating point value, which the function can take as a float.
So the function declaration must look like this:
void set_price(struct car *c, float new_price)
We just need to write the body. One attempt might be:
void set_price(struct car *c, float new_price) {
    c.price = new_price;  // ERROR!!
}
That won’t work because the dot operator only works on struct
s… it doesn’t work on pointers to struct
s.
Ok, so we can dereference the struct
to de-pointer it to get to the struct
itself. Dereferencing a struct car*
results in the struct car
that the pointer points to, which we should be able to use the dot operator on:
void set_price(struct car *c, float new_price) {
    (*c).price = new_price;  // Works, but non-idiomatic :(
}
And that works! But it’s a little clunky to type all those parens and the asterisk. C has some syntactic sugar called the arrow operator that helps with that.
void set_price(struct car *c, float new_price) {
    // (*c).price = new_price;  // Works, but non-idiomatic :(
    //
    // The line above is 100% equivalent to the one below:
    c->price = new_price;  // That's the one!
}
The arrow operator helps refer to fields in pointers to struct
s.
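To see why the pointer matters, here’s a side-by-side sketch. set_price_on_copy() is a made-up name for the version that takes the struct itself instead of a pointer to it.

```c
struct car {
    char *name;
    float price;
    int speed;
};

// Takes a copy of the struct: the change is made to the copy and is
// lost when the function returns.
void set_price_on_copy(struct car c, float new_price)
{
    c.price = new_price;    // Dot, because c is a struct
}

// Takes a pointer to the struct: the change shows up in the caller.
void set_price(struct car *c, float new_price)
{
    c->price = new_price;   // Arrow, because c is a pointer to a struct
}
```

Calling set_price_on_copy(saturn, 800.00) leaves the caller’s saturn.price alone, while set_price(&saturn, 800.00) actually changes it.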
So when accessing fields, when do we use dot and when do we use arrow?
If you have a struct, use dot (.).
If you have a pointer to a struct, use arrow (->).
typedef: Making New Types
Well, not so much making new types as getting new names for existing types. Sounds kinda pointless on the surface, but we can really use this to make our code cleaner.
typedef in Theory
Basically, you take an existing type and you make an alias for it with typedef.
Like this:
typedef int antelope; // Make "antelope" an alias for "int"
antelope x = 10; // Type "antelope" is the same as type "int"
You can take any existing type and do it. You can even make a number of types with a comma list:
typedef int antelope, bagel, mushroom; // These are all "int"
That’s really useful, right? That you can type mushroom
instead of int
? You must be super excited about this feature!
OK, Professor Sarcasm—we’ll get to some more common applications of this in a moment.
typedef
follows regular scoping rules.
For this reason, it’s quite common to find typedef
at file scope (“global”) so that all functions can use the new types at will.
typedef in Practice
So renaming int to something else isn’t that exciting. Let’s see where typedef commonly makes an appearance.
typedef and structs
Sometimes a struct will be typedef’d to a new name so you don’t have to type the word struct over and over.
struct animal {
char *name;
int leg_count, speed;
};
// original name new name
// | |
// v v
// |-----------| |----|
typedef struct animal animal;
struct animal y; // This works
animal z; // This also works because "animal" is an alias
Personally, I don’t care for this practice. I like the clarity the code has when you add the word struct
to the type; programmers know what they’re getting. But it’s really common so I’m including it here.
Now I want to run the exact same example in a way that you might commonly see. We’re going to put the struct animal
in the typedef
. You can mash it all together like this:
// original name
// |
// v
// |-----------|
typedef struct animal {
    char *name;
    int leg_count, speed;
} animal; // <-- new name
struct animal y; // This works
animal z; // This also works because "animal" is an alias
That’s exactly the same as the previous example, just more concise.
But that’s not all! There’s another common shortcut that you might see in code using what are called anonymous structures54. It turns out you don’t actually need to name the structure in a variety of places, and with typedef
is one of them.
Let’s do the same example with an anonymous structure:
// anonymous struct!
// |
// v
// |----|
typedef struct {
    char *name;
    int leg_count, speed;
} animal; // <-- new name
//struct animal y; // ERROR: this no longer works
animal z; // This works because "animal" is an alias
As another example, we might find something like this:
typedef struct {
    int x, y;
} point;
point p = {.x=20, .y=40};
printf("%d, %d\n", p.x, p.y); // 20, 40
typedef and Other Types
It’s not that using typedef with a simple type like int is completely useless… it helps you abstract the types to make it easier to change them later.
For example, if you have float
all over your code in 100 zillion places, it’s going to be painful to change them all to double
if you find you have to do that later for some reason.
But if you prepared a little with:
typedef float app_float;
// and
app_float f1, f2, f3;
Then if later you want to change to another type, like long double, you just need to change the typedef:
// voila!
// |---------|
typedef long double app_float;
// and
app_float f1, f2, f3; // Now these are all long doubles
typedef and Pointers
You can make a type that is a pointer.
typedef int *intptr;
int a = 10;
intptr x = &a; // "intptr" is type "int*"
I really don’t like this practice. It hides the fact that x
is a pointer type because you don’t see a *
in the declaration.
IMHO, it’s better to explicitly show that you’re declaring a pointer type so that other devs can clearly see it and don’t mistake x
for having a non-pointer type.
typedef and Capitalization
I’ve seen all kinds of capitalization on typedef.
typedef struct {
    int x, y;
} my_point; // lower snake case
typedef struct {
    int x, y;
} MyPoint; // CamelCase
typedef struct {
    int x, y;
} Mypoint; // Leading uppercase
typedef struct {
    int x, y;
} MY_POINT; // UPPER SNAKE CASE
The C99 specification doesn’t dictate one way or another, and shows examples in all uppercase and all lowercase.
K&R2 uses leading uppercase predominantly, but shows some examples in uppercase and snake case (with _t).
If you have a style guide in use, stick with it. If you don’t, grab one and stick with it.
Time to get more into it with a number of new pointer topics! If you’re not up to speed with pointers, check out the first section in the guide on the matter.
Turns out you can do math on pointers, notably addition and subtraction.
But what does it mean when you do that?
In short, if you have a pointer to a type, adding one to the pointer moves to the next item of that type directly after it in memory.
It’s important to remember that as we move pointers around and look at different places in memory, we need to make sure that we’re always pointing to a valid place in memory before we dereference. If we’re off in the weeds and we try to see what’s there, the behavior is undefined and a crash is a common result.
This is a little chicken-and-eggy with Array/Pointer Equivalence, below, but we’re going to give it a shot, anyway.
First, let’s take an array of numbers.
int a[5] = {11, 22, 33, 44, 55};
Then let’s get a pointer to the first element in that array:
int a[5] = {11, 22, 33, 44, 55};
int *p = &a[0]; // Or "int *p = a;" works just as well
Then let’s print the value there by dereferencing the pointer:
printf("%d\n", *p); // Prints 11
Now let’s use pointer arithmetic to print the next element in the array, the one at index 1:
printf("%d\n", *(p + 1)); // Prints 22!!
What happened there? C knows that p
is a pointer to an int
. So it knows the sizeof
an int
55 and it knows to skip that many bytes to get to the next int
after the first one!
In fact, the prior example could be written these two equivalent ways:
printf("%d\n", *p); // Prints 11
printf("%d\n", *(p + 0)); // Prints 11
because adding 0
to a pointer results in the same pointer.
Let’s think of the upshot here. We can iterate over elements of an array this way instead of using array notation:
int a[5] = {11, 22, 33, 44, 55};
int *p = &a[0]; // Or "int *p = a;" works just as well
for (int i = 0; i < 5; i++) {
    printf("%d\n", *(p + i)); // Same as p[i]!
}
And that works the same as if we used array notation! Oooo! Getting closer to that array/pointer equivalence thing! More on this later in this chapter.
But what’s actually happening, here? How do it work?
Remember from early on that memory is like a big array, where a byte is stored at each array index.
And the array index into memory has a few names, like address and pointer.
So a pointer is an index into memory, somewhere.
For a random example, say that a number 3490 was stored at address (“index”) 23,237,489,202. If we have an int
pointer to that 3490, that value of that pointer is 23,237,489,202… because the pointer is the memory address. Different words for the same thing.
And now let’s say we have another number, 4096, stored right after the 3490 at address 23,237,489,210 (8 higher than the 3490 because each int
in this example is 8 bytes long).
If we add 1
to that pointer, it actually jumps ahead sizeof(int)
bytes to the next int
. It knows to jump that far ahead because it’s an int
pointer. If it were a float
pointer, it’d jump sizeof(float)
bytes ahead to get to the next float!
So you can look at the next int
, by adding 1
to the pointer, the one after that by adding 2
to the pointer, and so on.
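If you want to check that jump for yourself, here’s a small sketch. The function name is made up, and it uses a cast, the (char *) bit, to treat the same address as a pointer to bytes. Take the cast on faith for now; the idea is that a char is one byte, so a char* counts in bytes while an int* counts in ints.

```c
// Returns 1 if p + 1 is exactly sizeof(int) bytes past p, else 0.
//
// Adding n to a char* moves ahead n bytes, so we convert p to a char*
// to count in bytes instead of in ints.
int int_step_is_sizeof_int(int *p)
{
    char *one_int_ahead = (char *)(p + 1);
    char *sizeof_int_bytes_ahead = (char *)p + sizeof(int);

    return one_int_ahead == sizeof_int_bytes_ahead;
}
```

This should return 1 for a pointer to any int in an array, no matter how many bytes an int happens to be on your system.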
We saw how we could add an integer to a pointer in the previous section. This time, let’s modify the pointer, itself.
You can just add (or subtract) integer values directly to (or from) any pointer!
Let’s do that example again, except with a couple changes. First, I’m going to add a 999
to the end of our numbers to act as a sentinel value. This will let us know where the end of the data is.
int a[] = {11, 22, 33, 44, 55, 999}; // Add 999 here as a sentinel
int *p = &a[0]; // p points to the 11
And we also have p
pointing to the element at index 0
of a
, namely 11
, just like before.
Now, let’s start incrementing p
so that it points at subsequent elements of the array. We’ll do this until p
points to the 999
; that is, we’ll do it until *p == 999
:
while (*p != 999) { // While the thing p points to isn't 999
    printf("%d\n", *p); // Print it
    p++; // Move p to point to the next int!
}
Pretty crazy, right?
When we give it a run, first p
points to 11
. Then we increment p
, and it points to 22
, and then again, it points to 33
. And so on, until it points to 999
and we quit.
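The same walk-the-pointer idea works fine inside a function. Here’s a sketch that adds up the values instead of printing them. The name sum_until() and passing the sentinel in as a parameter are choices made just for this example.

```c
// Adds up the ints starting at p, stopping when it reaches the
// sentinel value. The sentinel itself isn't included in the sum.
int sum_until(int *p, int sentinel)
{
    int total = 0;

    while (*p != sentinel) {
        total += *p;    // Use the int that p points to
        p++;            // Move p to point to the next int
    }

    return total;
}
```

With the array from above, sum_until(a, 999) adds 11, 22, 33, 44, and 55 to get 165. And since the function gets its own copy of the pointer, all that incrementing doesn’t affect the caller’s pointer.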
You can subtract a value from a pointer to get to an earlier address, as well, just like we were adding to them before.
But we can also subtract two pointers to find the difference between them, e.g. we can calculate how many int
s there are between two int*
s. The catch is that this only works within a single array56—if the pointers point to anything else, you get undefined behavior.
Remember how strings are char*
s in C? Let’s see if we can use this to write another variant of strlen()
to compute the length of a string that utilizes pointer subtraction.
The idea is that if we have a pointer to the beginning of the string, we can find a pointer to the end of the string by scanning ahead for the NUL
character.
And if we have a pointer to the beginning of the string, and we computed the pointer to the end of the string, we can just subtract the two pointers to come up with the length!
#include <stdio.h>
int my_strlen(char *s)
{
// Start scanning from the beginning of the string
char *p = s;
// Scan until we find the NUL character
while (*p != '\0')
p++;
// Return the difference in pointers
return p - s;
}
int main(void)
{
printf("%d\n", my_strlen("Hello, world!")); // Prints "13"
return 0;
}
Remember that you can only use pointer subtraction between two pointers that point to the same array!
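Pointer subtraction is also a handy way to turn a pointer back into an array index. Here’s a sketch with a made-up function, find_index(). It returns a ptrdiff_t, which is the type C gives to the difference of two pointers (it’s declared in <stddef.h>).

```c
#include <stddef.h>

// Looks for target in the first n elements of a. Returns the index of
// the first match, or -1 if it isn't there.
ptrdiff_t find_index(int *a, int n, int target)
{
    for (int *p = a; p < a + n; p++) {
        if (*p == target)
            return p - a;   // How many ints p is past the start
    }

    return -1;
}
```

Both p and a point into the same array here, so the subtraction is legit. If you want to print the result, the format specifier for ptrdiff_t is %td.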
We’re finally ready to talk about this! We’ve seen plenty of examples of places where we’ve intermixed array notation, but let’s give out the fundamental formula of array/pointer equivalence:
a[b] == *(a + b)
Study that! Those are equivalent and can be used interchangeably!
I’ve oversimplified a bit, because in my above example a
and b
can both be expressions, and we might want a few more parentheses to force order of operations in case the expressions are complex.
The spec is specific, as always, declaring (in C99 §6.5.2.1¶2):
E1[E2] is identical to (*((E1)+(E2)))
but that’s a little harder to grok. Just make sure you include parentheses if the expressions are complicated so all your math happens in the right order.
This means we can decide if we’re going to use array or pointer notation for any array or pointer (assuming it points to an element of an array).
Let’s use an array and pointer with both array and pointer notation:
#include <stdio.h>
int main(void)
{
int a[] = {11, 22, 33, 44, 55};
int *p = a; // p points to the first element of a, 11
// Print all elements of the array a variety of ways:
for (int i = 0; i < 5; i++)
printf("%d\n", a[i]); // Array notation with a
for (int i = 0; i < 5; i++)
printf("%d\n", p[i]); // Array notation with p
for (int i = 0; i < 5; i++)
printf("%d\n", *(a + i)); // Pointer notation with a
for (int i = 0; i < 5; i++)
printf("%d\n", *(p + i)); // Pointer notation with p
for (int i = 0; i < 5; i++)
printf("%d\n", *(p++)); // Moving pointer p
//printf("%d\n", *(a++)); // Moving array variable a--ERROR!
return 0;
}
So you can see that in general, if you have an array variable, you can use pointer or array notation to access elements. Same with a pointer variable.
The one big difference is that you can modify a pointer to point to a different address, but you can’t do that with an array variable.
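One strange consequence of that formula, offered here as a curiosity and not something to put in real code: since addition works in either order, a[b] and b[a] mean the same thing. This sketch (with a made-up function name) checks that all the spellings agree.

```c
// Returns 1 if all the spellings of "element i of a" agree, else 0.
int all_notations_agree(int *a, int i)
{
    return a[i] == *(a + i)         // The fundamental formula
        && *(a + i) == *(i + a)     // Addition works in either order...
        && *(i + a) == i[a];        // ...so this odd-looking form works
}
```

It’s legal C, but your coworkers won’t thank you for writing 3[a] instead of a[3].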
This is where you’ll encounter this concept the most, for sure.
If you have a function that takes a pointer argument, e.g.:
int my_strlen(char *s)
this means you can pass either an array or a pointer to this function and have it work!
char s[] = "Antelopes";
char *t = "Wombats";
printf("%d\n", my_strlen(s)); // Works!
printf("%d\n", my_strlen(t)); // Works, too!
And it’s also why these two function signatures are equivalent:
int my_strlen(char *s) // Works!
int my_strlen(char s[]) // Works, too!
void Pointers
You’ve already seen the void keyword used with functions, but this is an entirely separate, unrelated animal.
Sometimes it’s useful to have a pointer to a thing that you don’t know the type of.
I know. Bear with me just a second.
Let’s look at an example, the built-in memcpy()
function:
void *memcpy(void *s1, void *s2, size_t n);
This function copies n bytes of memory starting from address s2 into the memory starting at address s1.
But look! s1
and s2
are void*
s! Why? What does it mean? Let’s run more examples to see.
For instance, we could copy a string with memcpy()
(though strcpy()
is more appropriate for strings):
#include <stdio.h>
#include <string.h>
int main(void)
{
char s[] = "Goats!";
char t[100];
memcpy(t, s, 7); // Copy 7 bytes--including the NUL terminator!
printf("%s\n", t); // "Goats!"
return 0;
}
Or we can copy some int
s:
#include <stdio.h>
#include <string.h>
int main(void)
{
int a[] = {11, 22, 33};
int b[3];
memcpy(b, a, 3 * sizeof(int)); // Copy 3 ints of data
printf("%d\n", b[1]); // 22
return 0;
}
That one’s a little wild—you see what we did there with memcpy()
? We copied the data from a
to b
, but we had to specify how many bytes to copy, and an int
is more than one byte.
OK, then—how many bytes does an int
take? Answer: depends on the system. But we can tell how many bytes any type takes with the sizeof
operator.
So there’s the answer: an int
takes sizeof(int)
bytes of memory to store.
And if we have 3 of them in our array, like we did in that example, the entire space used for the 3 int
s must be 3 * sizeof(int)
.
(In the string example, earlier, it would have been more technically accurate to copy 7 * sizeof(char)
bytes. But char
s are always one byte large, by definition, so that just devolves into 7 * 1
.)
We could even copy a float
or a struct
with memcpy()
! (Though this is abusive—we should just use =
for that):
struct antelope my_antelope;
struct antelope my_clone_antelope;
// ...
memcpy(&my_clone_antelope, &my_antelope, sizeof my_antelope);
Look at how versatile memcpy()
is! If you have a pointer to a source and a pointer to a destination, and you have the number of bytes you want to copy, you can copy any type of data.
That’s the power of void*
. You can write code that doesn’t care about the type and is able to do things with it.
But with great power comes great responsibility. Maybe not that great in this case, but there are some limits.
1. You cannot do pointer arithmetic on a void*.
2. You cannot dereference a void*.
3. You cannot use the arrow operator on a void*, since it’s also a dereference.
4. You cannot use array notation on a void*, since it’s also a dereference, as well57.
And if you think about it, these rules make sense. All those operations rely on knowing the sizeof the type of data pointed to, and with void*, we don’t know the size of the data being pointed to. It could be anything!
But wait—if you can’t dereference a void*
what good can it ever do you?
Like with memcpy()
, it helps you write generic functions that can handle multiple types of data. But the secret is that, deep down, you convert the void*
to another type before you use it!
And conversion is easy: you can just assign into a variable of the desired type58.
char a = 'X'; // A single char
void *p = &a; // p points to the 'X'
char *q = p; // q also points to the 'X'
printf("%c\n", *p); // ERROR--cannot dereference void*!
printf("%c\n", *q); // Prints "X"
Let’s write our own memcpy()
to try this out. We can copy bytes (char
s), and we know the number of bytes because it’s passed in.
void *my_memcpy(void *dest, void *src, int byte_count)
{
// Convert void*s to char*s
char *s = src, *d = dest;
// Now that we have char*s, we can dereference and copy them
while (byte_count--) {
*d++ = *s++;
}
// Most of these functions return the destination, just in case
// that's useful to the caller.
return dest;
}
Right there at the beginning, we copy the void*
s into char*
s so that we can use them as char*
s. It’s as easy as that.
Then some fun in a while loop, where we decrement byte_count
until it becomes false (0
). Remember that with post-decrement, the value of the expression is computed (for while
to use) and then the variable is decremented.
And some fun in the copy, where we assign *d = *s
to copy the byte, but we do it with post-increment so that both d
and s
move to the next byte after the assignment is made.
Lastly, most memory and string functions return a copy of a pointer to the destination string just in case the caller wants to use it.
Now that we’ve done that, I just want to quickly point out that we can use this technique to iterate over the bytes of any object in C, float
s, struct
s, or anything!
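As a sketch of that byte-by-byte idea, here’s a function that checks whether two objects are made of identical bytes. The name bytes_equal() is made up for this example; the standard library’s memcmp() does a similar job.

```c
#include <stddef.h>

// Returns 1 if the byte_count bytes at a and b are identical, else 0.
// Works on any type of data since it only looks at raw bytes.
int bytes_equal(void *a, void *b, size_t byte_count)
{
    // Convert void*s to unsigned char*s so we can dereference them
    unsigned char *p = a, *q = b;

    while (byte_count--) {
        if (*p++ != *q++)
            return 0;   // Found a byte that differs
    }

    return 1;
}
```

You could call it on a pair of ints with bytes_equal(&x, &y, sizeof x). Be a little careful using something like this on structs, though: the compiler is allowed to put padding bytes between fields, and those bytes might not match even when all the fields do.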
Let’s run one more real-world example with the built-in qsort()
routine that can sort anything thanks to the magic of void*
s.
(In the following example, you can ignore the word const
, which we haven’t covered yet.)
#include <stdio.h>
#include <stdlib.h>
// The type of structure we're going to sort
struct animal {
char *name;
int leg_count;
};
// This is a comparison function called by qsort() to help it determine
// what exactly to sort by. We'll use it to sort an array of struct
// animals by leg_count.
int compar(const void *elem1, const void *elem2)
{
// We know we're sorting struct animals, so let's make both
// arguments pointers to struct animals
const struct animal *animal1 = elem1;
const struct animal *animal2 = elem2;
// Return <0 =0 or >0 depending on whatever we want to sort by.
// Let's sort ascending by leg_count, so we'll return the difference
// in the leg_counts
return animal1->leg_count - animal2->leg_count;
}
int main(void)
{
// Let's build an array of 4 struct animals with different
// characteristics. This array is out of order by leg_count, but
// we'll sort it in a second.
struct animal a[4] = {
{.name="Dog", .leg_count=4},
{.name="Monkey", .leg_count=2},
{.name="Antelope", .leg_count=4},
{.name="Snake", .leg_count=0}
};
// Call qsort() to sort the array. qsort() needs to be told exactly
// what to sort this data by, and we'll do that inside the compar()
// function.
//
// This call is saying: qsort array a, which has 4 elements, and
// each element is sizeof(struct animal) bytes big, and this is the
// function that will compare any two elements.
qsort(a, 4, sizeof(struct animal), compar);
// Print them all out
for (int i = 0; i < 4; i++) {
printf("%d: %s\n", a[i].leg_count, a[i].name);
}
return 0;
}
As long as you give qsort()
a function that can compare two items that you have in your array to be sorted, it can sort anything. And it does this without needing to have the types of the items hardcoded in there anywhere. qsort()
just rearranges blocks of bytes based on the results of the compar()
function you passed in.
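For comparison, here’s a sketch of the same technique sorting plain ints. The names compare_ints() and sort_ints() are made up for this example. It also swaps the subtraction for a pair of comparisons; subtracting is fine for small numbers like leg counts, but with ints that are very far apart the subtraction could overflow.

```c
#include <stdlib.h>

// Comparison function for qsort() that sorts ints ascending.
int compare_ints(const void *elem1, const void *elem2)
{
    // We know we're sorting ints, so convert to pointers to int
    const int *x = elem1;
    const int *y = elem2;

    // Gives -1, 0, or 1. Unlike subtracting, comparing like this
    // can't overflow when the two values are very far apart.
    return (*x > *y) - (*x < *y);
}

// Sorts the n ints in array a, smallest first.
void sort_ints(int *a, size_t n)
{
    qsort(a, n, sizeof(int), compare_ints);
}
```

Same qsort(), completely different data type. All that changed was the element size and the comparison function.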
Scope is all about what variables are visible in what contexts.
This is the scope of almost all the variables devs define. It includes what other languages might call “function scope”, i.e. variables that are declared inside functions.
The basic rule is that if you’ve declared a variable in a block delimited by squirrelly braces, the scope of that variable is that block.
If there’s a block inside a block, then variables declared in the inner block are local to that block, and cannot be seen in the outer scope.
Once a variable’s scope ends, that variable can no longer be referenced, and you can consider its value to be gone into the great bit bucket59 in the sky.
An example with nested scope:
int main(void)
{
int a = 12; // Local to outer block, but visible in inner block
if (a == 12) {
int b = 99; // Local to inner block, not visible in outer block
printf("%d %d\n", a, b); // OK: "12 99"
}
printf("%d\n", a); // OK, we're still in a's scope
printf("%d\n", b); // ILLEGAL, out of b's scope
}
Another fun fact is that you can define variables anywhere in the block, within reason—they have the scope of that block, but cannot be used before they are defined.
#include <stdio.h>
int main(void)
{
int i = 0;
printf("%d\n", i); // OK: "0"
//printf("%d\n", j); // ILLEGAL--can't use j before it's defined
int j = 5;
printf("%d %d\n", i, j); // OK: "0 5"
return 0;
}
Historically, C required all the variables be defined before any code in the block, but this is no longer the case in the C99 standard.
If you have a variable named the same thing at an inner scope as one at an outer scope, the one at the inner scope takes precedence as long as you’re running in the inner scope. That is, it hides the one at outer scope for the duration of its lifetime.
#include <stdio.h>
int main(void)
{
int i = 10;
{
int i = 20;
printf("%d\n", i); // Inner scope i, 20 (outer i is hidden)
}
printf("%d\n", i); // Outer scope i, 10
return 0;
}
You might have noticed in that example that I just threw a block in there after the outer i, with not so much as a for
or if
statement to kick it off! This is perfectly legal. Sometimes a dev will want to group a bunch of local variables together for a quick computation and will do this, but it’s rare to see.
If you define a variable outside of a block, that variable has file scope. It’s visible in all functions in the file that come after it, and shared between them. (An exception is if a block defines a variable of the same name, it would hide the one at file scope.)
This is closest to what you would consider to be “global” scope in another language.
For example:
#include <stdio.h>
int shared = 10; // File scope! Visible to the whole file after this!
void func1(void)
{
shared += 100; // Now shared holds 110
}
void func2(void)
{
printf("%d\n", shared); // Prints "110"
}
int main(void)
{
func1();
func2();
return 0;
}
Note that if shared
were declared at the bottom of the file, it wouldn’t compile. It has to be declared before any functions use it.
for-loop Scope
I really don’t know what to call this, as C99 §6.8.5.3¶1 doesn’t give it a proper name. We’ve done it already a few times in this guide, as well. It’s when you declare a variable inside the first clause of a for-loop:
for (int i = 0; i < 10; i++)
    printf("%d\n", i);
printf("%d\n", i); // ILLEGAL--i is only in scope for the for-loop
In that example, i
’s lifetime begins the moment it is defined, and continues for the duration of the loop.
If the loop body is enclosed in a block, the variables defined in the for
-loop are visible from that inner scope.
Unless, of course, that inner scope hides them. This crazy example prints 999
five times:
#include <stdio.h>
int main(void)
{
for (int i = 0; i < 5; i++) {
int i = 999; // Hides the i in the for-loop scope
printf("%d\n", i);
}
return 0;
}
The C spec does refer to function scope, but it’s used exclusively with labels, something we haven’t discussed yet. More on that another day.
We’re used to char
, int
, and float
types, but it’s now time to take that stuff to the next level and see what else we have out there in the types department!
So far we’ve used int
as a signed type, that is, a value that can be either negative or positive. But C also has specific unsigned integer types that can only hold positive numbers.
These types are prefaced by the keyword unsigned
.
int a; // signed
signed int a; // signed
signed a; // signed, "shorthand" for "int" or "signed int", rare
unsigned int b; // unsigned
unsigned c; // unsigned, shorthand for "unsigned int"
Why? Why would you decide you only wanted to hold positive numbers?
Answer: you can get larger numbers in an unsigned variable than you can in a signed one.
But why is that?
You can think of integers being represented by a certain number of bits60. On my computer, an int is represented by 32 bits.
And each permutation of bits that are either 1 or 0 represents a number. We can decide how to divvy up these numbers.
With signed numbers, we use (roughly) half the permutations to represent negative numbers, and the other half to represent positive numbers.
With unsigned, we use all the permutations to represent non-negative numbers.
On my computer with 32-bit ints using two’s complement61 to represent signed numbers, I have the following limits on integer range:
Type | Minimum | Maximum |
---|---|---|
int | -2,147,483,648 | 2,147,483,647 |
unsigned int | 0 | 4,294,967,295 |
Notice that the largest positive unsigned int
is approximately twice as large as the largest positive int
. So you can get some flexibility there.
Remember char
? The type we can use to hold a single character?
char c = 'B';
printf("%c\n", c); // "B"
I have a shocker for you: it’s actually an integer.
char c = 'B';
// Change this from %c to %d:
printf("%d\n", c); // 66 (!!)
Deep down, char
is just a small int
, namely an integer that uses just a single byte of space, limiting its range to…
Here the C spec gets just a little funky. It assures us that a char
is a single byte, i.e. sizeof(char) == 1
. But then in C99 §3.6¶3 it goes out of its way to say:
A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined.
Wait—what? Some of you might be used to the notion that a byte is 8 bits, right? I mean, that’s what it is, right? And the answer is, “Almost certainly.”62 But C is an old language, and machines back in the day had, shall we say, a more relaxed opinion over how many bits were in a byte. And through the years, C has retained this flexibility.
But assuming your bytes in C are 8 bits, like they are for virtually all machines in the world that you’ll ever see, the range of a char
is…
—So before I can tell you, it turns out that char
s might be signed or unsigned depending on your compiler. Unless you explicitly specify.
In many cases, just having char
is fine because you don’t care about the sign of the data. But if you need signed or unsigned char
s, you must be specific:
char a; // Could be signed or unsigned
signed char b; // Definitely signed
unsigned char c; // Definitely unsigned
OK, now, finally, we can figure out the range of numbers if we assume that a char
is 8 bits and your system uses the virtually universal two’s complement representation for signed and unsigned63.
So, assuming those constraints, we can finally figure our ranges:
char type | Minimum | Maximum |
---|---|---|
signed char | -128 | 127 |
unsigned char | 0 | 255 |
And the ranges for char
are implementation-defined.
Let me get this straight. char
is actually a number, so can we do math on it?
Yup! Just remember to keep things in the range of a char
!
What about those constant characters in single quotes, like 'B'
? How does that have a numeric value?
The spec is also hand-wavey here, since C isn’t designed to run on a single type of underlying system.
But let’s just assume for the moment that your character set is based on ASCII64 for at least the first 128 characters. In that case, the character constant will be converted to a char
whose value is the same as the ASCII value of the character.
That was a mouthful. Let’s just have an example:
#include <stdio.h>
int main(void)
{
char a = 10;
char b = 'B'; // ASCII value 66
printf("%d\n", a + b); // 76!
return 0;
}
This depends on your execution environment and the character set used65. One of the most popular character sets today is Unicode66 (which is a superset of ASCII), so for your basic 0-9, A-Z, a-z and punctuation, you’ll almost certainly get the ASCII values out of them.
short, long, long long
So far we’ve just generally been using two integer types:
char
int
and we recently learned about the unsigned variants of the integer types. And we learned that char
was secretly a small int
in disguise. So we know the int
s can come in multiple bit sizes.
But there are a couple more integer types we should look at, and the minimum minimum and maximum values they can hold.
Yes, I said “minimum” twice. The spec says that these types will hold numbers of at least these sizes, so your implementation might be different. The header file <limits.h>
defines macros that hold the minimum and maximum integer values; rely on that to be sure, and never hardcode or assume these values.
These additional types are short int
, long int
, and long long int
. Commonly, when using these types, C developers leave the int
part off (e.g. long long
), and the compiler is perfectly happy.
// These two lines are equivalent:
long long int x;
long long x;
// And so are these:
short int x;
short x;
Let’s take a look at the integer data types and sizes in ascending order, grouped by signedness.
Type | Minimum Bytes | Minimum Value | Maximum Value |
---|---|---|---|
char | 1 | -127 or 0 | 127 or 255⁶⁷ |
signed char | 1 | -127 | 127 |
short | 2 | -32767 | 32767 |
int | 2 | -32767 | 32767 |
long | 4 | -2147483647 | 2147483647 |
long long | 8 | -9223372036854775807 | 9223372036854775807 |
unsigned char | 1 | 0 | 255 |
unsigned short | 2 | 0 | 65535 |
unsigned int | 2 | 0 | 65535 |
unsigned long | 4 | 0 | 4294967295 |
unsigned long long | 8 | 0 | 18446744073709551615 |
There is no long long long
type. You can’t just keep adding long
s like that. Don’t be silly.
Two’s complement fans might have noticed something funny about those numbers. Why does, for example, the
signed char
stop at -127 instead of -128? Remember: these are only the minimums required by the spec. Some number representations (like sign and magnitude68) top off at ±127.
Let’s run the same table on my 64-bit, two’s complement system and see what comes out:
Type | My Bytes | Minimum Value | Maximum Value |
---|---|---|---|
char | 1 | -128 | 127⁶⁹ |
signed char | 1 | -128 | 127 |
short | 2 | -32768 | 32767 |
int | 4 | -2147483648 | 2147483647 |
long | 8 | -9223372036854775808 | 9223372036854775807 |
long long | 8 | -9223372036854775808 | 9223372036854775807 |
unsigned char | 1 | 0 | 255 |
unsigned short | 2 | 0 | 65535 |
unsigned int | 4 | 0 | 4294967295 |
unsigned long | 8 | 0 | 18446744073709551615 |
unsigned long long | 8 | 0 | 18446744073709551615 |
That’s a little more sensible, but we can see how my system has larger limits than the minimums in the specification.
So what are the macros in <limits.h>
?
Type | Min Macro | Max Macro |
---|---|---|
char | CHAR_MIN | CHAR_MAX |
signed char | SCHAR_MIN | SCHAR_MAX |
short | SHRT_MIN | SHRT_MAX |
int | INT_MIN | INT_MAX |
long | LONG_MIN | LONG_MAX |
long long | LLONG_MIN | LLONG_MAX |
unsigned char | 0 | UCHAR_MAX |
unsigned short | 0 | USHRT_MAX |
unsigned int | 0 | UINT_MAX |
unsigned long | 0 | ULONG_MAX |
unsigned long long | 0 | ULLONG_MAX |
Notice there’s a way hidden in there to determine if a system uses signed or unsigned char
s. If CHAR_MAX == UCHAR_MAX
, it must be unsigned.
Also notice there’s no minimum macro for the unsigned
variants—they’re just 0
.
double and long double
Let’s see what the C99 spec has to say about floating point numbers in §5.2.4.2.2¶1-2:
The following parameters are used to define the model for each floating-point type:
Parameter | Definition |
---|---|
\(s\) | sign (\(\pm1\)) |
\(b\) | base or radix of exponent representation (an integer \(> 1\)) |
\(e\) | exponent (an integer between a minimum \(e_{min}\) and a maximum \(e_{max}\)) |
\(p\) | precision (the number of base-\(b\) digits in the significand) |
\(f_k\) | nonnegative integers less than \(b\) (the significand digits) |
A floating-point number (\(x\)) is defined by the following model:
\(x=sb^e\sum\limits_{k=1}^p f_kb^{-k},\) \(e_{min}\le e\le e_{max}\)
I hope that cleared it right up for you.
Okay, fine. Let’s step back a bit and see what’s practical.
Note: we refer to a bunch of macros in this section. They can be found in the header <float.h>
.
Floating point numbers are encoded in a specific sequence of bits (IEEE-754 format70 is tremendously popular) in bytes.
Diving in a bit more, the number is basically represented as the significand (which is the number part—the significant digits themselves, also sometimes referred to as the mantissa) and the exponent, which is what power to raise the digits to. Recall that a negative exponent can make a number smaller.
Imagine we’re using \(10\) as a number to raise by an exponent. We could represent the following numbers by using a significand of \(12345\), and exponents of \(-3\), \(4\), and \(0\) to encode the following floating point values:
\(12345\times10^{-3}=12.345\)
\(12345\times10^4=123450000\)
\(12345\times10^0=12345\)
For all those numbers, the significand stays the same. The only difference is the exponent.
On your machine, the base for the exponent is probably \(2\), not \(10\), since computers like binary. You can check it by printing the FLT_RADIX
macro.
So we have a number that’s represented by a number of bytes, encoded in some way. Because there are a limited number of bit patterns, a limited number of floating point numbers can be represented.
But more particularly, only a certain number of significant decimal digits can be represented accurately.
How can you get more? You can use larger data types!
And we have a couple of them. We know about float
already, but for more precision we have double
. And for even more precision, we have long double
(unrelated to long int
except by name).
The spec doesn’t go into how many bytes of storage each type should take, but on my system, we can see the relative size increases:
Type | sizeof |
---|---|
float |
4 |
double |
8 |
long double |
16 |
So each of the types (on my system) uses those additional bits for more precision.
But how much precision are we talking, here? How many decimal numbers can be represented by these values?
Well, C provides us with a bunch of macros in <float.h>
to help us figure that out.
It gets a little wonky if you are using a base-2 (binary) system for storing the numbers (which is virtually everyone on the planet, probably including you), but bear with me while we figure it out.
The million dollar question is, “How many significant decimal digits can I store in a given floating point type before the floating point precision runs out?”
But it’s not quite so easy to answer. So we’ll do it in two ways.
The number of decimal digits you can store in a floating point type and surely get the same number back out when you print it is given by these macros:
Type | Decimal Digits You Can Store | Minimum |
---|---|---|
float | FLT_DIG | 6 |
double | DBL_DIG | 10 |
long double | LDBL_DIG | 10 |
On my system, FLT_DIG
is 6, so I can be sure that if I print out a 6 digit float
, I’ll get the same thing back. (It could be more—some numbers will come back correctly with more digits. But 6 is definitely coming back.)
For example, printing out float
s following this pattern of increasing digits, we apparently make it to 8 digits before something goes wrong, but after that we’re back to 7 correct digits.
0.12345
0.123456
0.1234567
0.12345678
0.123456791 <-- Things start going wrong
0.1234567910
Let’s do another demo. In this code we’ll have two float
s that both hold numbers that have FLT_DIG
significant decimal digits71. Then we add those together, for what should be 12 significant decimal digits. But that’s more than we can store in a float
and correctly recover as a string—so we see when we print it out, things start going wrong after the 7th significant digit.
#include <stdio.h>
#include <float.h>
int main(void)
{
// Both these numbers have 6 significant digits, so they can be
// stored accurately in a float:
float f = 3.14159f;
float g = 0.00000265358f;
printf("%.5f\n", f); // 3.14159 -- correct!
printf("%.11f\n", g); // 0.00000265358 -- correct!
// Now add them up
f += g; // 3.14159265358 is what f _should_ be
printf("%.11f\n", f); // 3.14159274101 -- wrong!
return 0;
}
(The above code has an f
after the numeric constants—this indicates that the constant is type float
, as opposed to the default of double
. More on this later.)
Remember that FLT_DIG
is the safe number of digits you can store in a float
and retrieve correctly.
Sometimes you might get one or two more out of it. But sometimes you’ll only get FLT_DIG
digits back. The sure thing: if you store any number of digits up to and including FLT_DIG
in a float
, you’re sure to get them back correctly.
So that’s the story. FLT_DIG
. The End.
…Or is it?
But storing a base 10 number in a floating point number is only half the story.
What about when you print out a floating point number? How many digits can you print?
You might think it would be the same as the number you can store, but it’s not72!
But recall that you might have more decimal digits than FLT_DIG encoded correctly in the number. In order to make sure you’ve printed them all out, you may need to print more than FLT_DIG digits.
Of course, if you store the number 3.14f in a float, you can’t expect to print out more than 2 decimal places and get sensible results. But FLT_DIG (if 6) says that you can’t store more digits than 3.14159f and be sure of getting it stored successfully.
But what if you did some math on a floating point number? Can you get more precision?
When you write down a constant number, like 1234
, it has a type. But what type is it? Let’s look at how C decides what type the constant is, and how to force it to choose a specific type.
In addition to good ol’ decimal like Grandma used to bake, C also supports constants of different bases.
If you lead a number with 0x
, it is read as a hex number:
int a = 0x1A2B; // Hexadecimal
int b = 0x1a2b; // Case doesn't matter for hex digits
printf("%x", a); // Print a hex number, "1a2b"
If you lead a number with a 0
, it is read as an octal number:
int a = 012;
printf("%o\n", a); // Print an octal number, "12"
This is particularly problematic for beginner programmers who try to pad decimal numbers on the left with 0
to line things up nice and pretty, inadvertently changing the base of the number:
int x = 11111; // Decimal 11111
int y = 00111; // Decimal 73 (Octal 111)
int z = 01111; // Decimal 585 (Octal 1111)
An unofficial extension73 in many C compilers allows you to represent a binary number with a 0b
prefix:
int x = 0b101010; // Binary 101010
printf("%d\n", x); // Prints 42 decimal
There’s no printf()
format specifier for printing a binary number. You have to do it a character at a time with bitwise operators.
You can force a constant integer to be a certain type by appending a suffix to it that indicates the type.
We’ll do some assignments to demo, but most often devs leave off the suffixes unless needed to be precise. The compiler is pretty good at making sure the types are compatible.
int x = 1234;
long int x = 1234L;
long long int x = 1234LL;
unsigned int x = 1234U;
unsigned long int x = 1234UL;
unsigned long long int x = 1234ULL;
The suffix can be uppercase or lowercase. And the U
and L
or LL
can appear either one first.
Type | Suffix |
---|---|
int | None |
long int | L |
long long int | LL |
unsigned int | U |
unsigned long int | UL |
unsigned long long int | ULL |
I mentioned in the table that “no suffix” means int
… but it’s actually more complex than that.
So what happens when you have an unsuffixed number like:
int x = 1234;
What type is it?
What C will generally do is choose the smallest type from int
up that can hold the value.
But specifically, that depends on the number’s base (decimal, hex, or octal), as well.
The spec has a great table indicating which type gets used for what unsuffixed value. In fact, I’m just going to copy it wholesale right here.
C99 §6.4.4.1¶5 reads, “The type of an integer constant is the first of the corresponding list in which its value can be represented.”
And then goes on to show this table:
Suffix | Decimal Constant | Octal or Hexadecimal Constant |
---|---|---|
none | int, long int, long long int | int, unsigned int, long int, unsigned long int, long long int, unsigned long long int |
u or U | unsigned int, unsigned long int, unsigned long long int | unsigned int, unsigned long int, unsigned long long int |
l or L | long int, long long int | long int, unsigned long int, long long int, unsigned long long int |
Both u or U and l or L | unsigned long int, unsigned long long int | unsigned long int, unsigned long long int |
ll or LL | long long int | long long int, unsigned long long int |
Both u or U and ll or LL | unsigned long long int | unsigned long long int |
What that’s saying is that, for example, if you specify a number like 123456789U
, first C will see if it can be unsigned int
. If it doesn’t fit there, it’ll try unsigned long int
. And then unsigned long long int
. It’ll use the smallest type that can hold the number.
You’d think that a floating point constant like 1.23
would have a default type of float
, right?
Surprise! Turns out unsuffixed floating point numbers are type double
! Happy belated birthday!
You can force it to be of type float
by appending an f
(or F
—it’s case-insensitive). You can force it to be of type long double
by appending l
(or L
).
Type | Suffix |
---|---|
float | F |
double | None |
long double | L |
For example:
float x = 3.14f;
double x = 3.14;
long double x = 3.14L;
This whole time, though, we’ve just been doing this, right?
float x = 3.14;
Isn’t the left a float
and the right a double
? Yes! But C’s pretty good with automatic numeric conversions, so it’s more common to have an unsuffixed floating point constant than not. More on that later.
Remember earlier when we talked about how a floating point number can be represented by a significand, base, and exponent?
Well, there’s a common way of writing such a number, shown here followed by its more recognizable equivalent, which is what you get when you actually run the math:
\(1.2345\times10^3 = 1234.5\)
Writing numbers in the form \(s\times b^e\) is called scientific notation74. In C, these are written using “E notation”, so these are equivalent:
Scientific Notation | E notation |
---|---|
\(1.2345\times10^{-3}=0.0012345\) | 1.2345e-3 |
\(1.2345\times10^4=12345\) | 1.2345e+4 |
You can print a number in this notation with %e
:
printf("%e\n", 123456.0); // Prints 1.234560e+05
A couple little fun facts about scientific notation:
You don’t have to write them with a single leading digit before the decimal point. Any number of numbers can go in front.
double x = 123.456e+3; // 123456
However, when you print it, it will change the exponent so there is only one digit in front of the decimal point.
The plus can be left off the exponent, as it’s default, but this is uncommon in practice from what I’ve seen.
1.2345e10 == 1.2345e+10
You can apply the F
or L
suffixes to E-notation constants:
1.2345e10F
1.2345e10L
But wait, there’s more floating to be done!
Turns out there are hexadecimal floating point constants, as well!
These work similar to decimal floating point numbers, but they begin with a 0x
just like integer numbers.
The catch is that you must specify an exponent, and this exponent produces a power of 2. That is: \(2^x\).
And then you use a p
instead of an e
when writing the number:
So 0xa.1p3
is \(10.0625\times2^3 == 80.5\).
We can print floating point numbers in hex scientific notation with %a:
double x = 0xa.1p3;
printf("%a\n", x); // 0x1.42p+6
printf("%f\n", x); // 80.500000
In this chapter, we want to talk all about converting from one type to another. C has a variety of ways of doing this, and some might be a little different than you’re used to in other languages.
Before we talk about how to make conversions happen, let’s talk about how they work when they do happen.
Unlike many languages, C doesn’t do string-to-number (and vice-versa) conversions in quite as streamlined a manner as it does numeric conversions.
For these, we’ll have to call functions to do the dirty work.
When we want to convert a number to a string, we can use either sprintf()
(pronounced SPRINT-f) or snprintf()
(s-n-print-f)75.
These basically work like printf()
, except they output to a string instead, and you can print that string later, or whatever.
For example, turning part of the value π into a string:
#include <stdio.h>
int main(void)
{
char s[10];
float f = 3.14159;
// Convert "f" to string, storing in "s", writing at most 10 characters
// including the NUL terminator
snprintf(s, 10, "%f", f);
printf("String value: %s\n", s); // String value: 3.141590
return 0;
}
If we wanted to convert a double
, we’d use %lf
. Or a long double
, %Lf
.
There are a couple families of functions for converting a string to a number in C. We’ll call these the atoi
(pronounced a-to-i) family and the strtol
(stir-to-long) family.
For basic conversion from a string to a number, try the atoi
functions from <stdlib.h>
. These have bad error-handling characteristics (including undefined behavior if you pass in a bad string), so use them carefully.
Function | Description |
---|---|
atoi | String to int |
atof | String to double |
atol | String to long int |
atoll | String to long long int |
Though the spec doesn’t cop to it, the a
at the beginning of the function stands for ASCII76, so really atoi()
is “ASCII-to-integer”, but saying so today is a bit ASCII-centric.
Here’s an example converting a string to a float
:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *pi = "3.14159";
float f;
f = atof(pi);
printf("%f\n", f);
return 0;
}
But, like I said, we get undefined behavior from weird things like this:
int x = atoi("what"); // "What" ain't no number I ever heard of
(When I run that, I get 0
back, but you really shouldn’t count on that in any way. You could get something completely different.)
For better error handling characteristics, let’s check out all those strtol
functions, also in <stdlib.h>
. Not only that, but they convert to more types and more bases, too!
Function | Description |
---|---|
strtol | String to long int |
strtoll | String to long long int |
strtoul | String to unsigned long int |
strtoull | String to unsigned long long int |
strtof | String to float |
strtod | String to double |
strtold | String to long double |
These functions all follow a similar pattern of use, and are a lot of people’s first experience with pointers to pointers! But never fret—it’s easier than it looks.
Let’s do an example where we convert a string to an unsigned long
, discarding error information (i.e. information about bad characters in the input string):
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *s = "3490";
// Convert string s, a number in base 10, to an unsigned long int.
// NULL means we don't care to learn about any error information.
unsigned long int x = strtoul(s, NULL, 10);
printf("%lu\n", x); // 3490
return 0;
}
Notice a couple things there. Even though we didn’t deign to capture any information about error characters in the string, strtoul()
won’t give us undefined behavior; it will just return 0
.
Also, we specified that this was a decimal (base 10) number.
Does this mean we can convert numbers of different bases? Sure! Let’s do binary!
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *s = "101010"; // What's the meaning of this number?
// Convert string s, a number in base 2, to an unsigned long int.
unsigned long int x = strtoul(s, NULL, 2);
printf("%lu\n", x); // 42
return 0;
}
OK, that’s all fun and games, but what’s with that NULL
in there? What’s that for?
That helps us figure out if an error occurred in the processing of the string. It’s a pointer to a pointer to a char
, which sounds scary, but isn’t once you wrap your head around it.
Let’s do an example where we feed in a deliberately bad number, and we’ll see how strtoul()
lets us know where the first invalid digit is.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *s = "34x90"; // "x" is not a valid digit in base 10!
char *badchar;
// Convert string s, a number in base 10, to an unsigned long int.
unsigned long int x = strtoul(s, &badchar, 10);
// It tries to convert as much as possible, so gets this far:
printf("%lu\n", x); // 34
// But we can see the offending bad character because badchar
// points to it!
printf("Invalid character: %c\n", *badchar); // "x"
return 0;
}
So there we have strtoul()
modifying what badchar
points to in order to show us where things went wrong77.
But what if nothing goes wrong? In that case, badchar
will point to the NUL
terminator at the end of the string. So we can test for it:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *s = "3490"; // No invalid digits this time
char *badchar;
// Convert string s, a number in base 10, to an unsigned long int.
unsigned long int x = strtoul(s, &badchar, 10);
// Check if things went well
if (*badchar == '\0') {
printf("Success! %lu\n", x);
} else {
printf("Partial conversion: %lu\n", x);
printf("Invalid character: %c\n", *badchar);
}
return 0;
}
So there you have it. The atoi()
-style functions are good in a controlled pinch, but the strtol()
-style functions give you far more control over error handling and the base of the input.
If you convert a zero to bool
, the result is 0
. Otherwise it’s 1
.
If an integer type is converted to unsigned and doesn’t fit in it, the unsigned result wraps around odometer-style until it fits in the unsigned78.
If an integer type is converted to a signed number and doesn’t fit, the result is implementation-defined! Something documented will happen, but you’ll have to look it up79.
If a floating point type is converted to an integer type, the fractional part is discarded with prejudice80.
But—and here’s the catch—if the number is too large to fit in the integer, you get undefined behavior. So don’t do that.
Going from integer or floating point to floating point, C makes a best effort to find the closest floating point number to the original value that it can.
Again, though, if the original value can’t be represented, it’s undefined behavior.
These are conversions the compiler does automatically for you when you mix and match types.
In a number of places, if an int
can be used to represent a value from char
or short
(signed or unsigned), that value is promoted up to int
. If it doesn’t fit in an int
, it’s promoted to unsigned int
.
This is how we can do something like this:
char x = 10, y = 20;
int i = x + y;
In that case, x
and y
get promoted to int
by C before the math takes place.
The integer promotions take place during The Usual Arithmetic Conversions, with variadic functions81, unary +
and -
operators, or when passing values to functions without prototypes82.
These are automatic conversions that C does around numeric operations that you ask for. (That’s actually what they’re called, by the way, by C99 §6.3.1.8.) Note that for this section, we’re just talking about numeric types—strings will come later.
These conversions answer questions about what happens when you mix types, like this:
int x = 3 + 1.2; // Mixing int and double
float y = 12 * 2; // Mixing float and int
Do they become int
s? Do they become float
s? How does it work?
Here are the steps, paraphrased for easy consumption.
If one thing in the expression is a floating type, convert the other things to that floating type.
Otherwise, if both types are integer types, perform the integer promotions on each, then make the operand types as big as they need to be to hold the common largest value. Sometimes this involves changing signed to unsigned.
If you want to know the gritty details, check out C99 §6.3.1.8. But you probably don’t.
Just generally remember that int types become float types if there’s a floating point type anywhere in there, and the compiler makes an effort to make sure mixed integer types don’t overflow.
void*
The void*
type is interesting because it can be converted from or to any pointer type.
int x = 10;
void *p = &x; // &x is type int*, but we store it in a void*
int *q = p; // p is void*, but we store it in an int*
These are conversions from type to type that you have to ask for; the compiler won’t do it for you.
You can convert from one type to another by assigning one type to another with an =
.
You can also convert explicitly with a cast.
You can explicitly change the type of an expression by putting a new type in parentheses in front of it. Some C devs frown on the practice unless absolutely necessary, but it’s likely you’ll come across some C code with these in it.
Let’s do an example where we want to convert an int
into a long
so that we can store it in a long
.
Note: this example is contrived and the cast in this case is completely unnecessary because the x + 12
expression would automatically be changed to long int
to match the wider type of y
.
int x = 10;
long int y = (long int)x + 12;
In that example, even though x
was type int
before, the expression (long int)x
has type long int
. We say, “We cast x
to long int
.”
More commonly, you might see a cast being used to convert a void*
into a specific pointer type so it can be dereferenced.
A callback from the built-in qsort()
function might display this behavior since it has void*
s passed into it:
int compar(const void *elem1, const void *elem2)
{
    return *((const int*)elem2) - *((const int*)elem1);
}
But you could also clearly write it with an assignment:
int compar(const void *elem1, const void *elem2)
{
    const int *e1 = elem1;
    const int *e2 = elem2;

    return *e2 - *e1;
}
Again, casting is rarely needed in practice. If you find yourself casting, there might be another way to do the same thing, or maybe you’re casting unnecessarily.
Or maybe it is necessary. Personally, I try to avoid it, but am not afraid to use it if I have to.
Now that we have some more types under our belts, turns out we can give these types some additional attributes that control their behavior. These are the type qualifiers and storage class specifiers.
These are going to allow you to declare constant values, and also to give the compiler optimization hints that it can use.
const
This is the most common type qualifier you’ll see. It means the variable is constant, and any attempt to modify it will result in a very angry compiler.
const int x = 2;
x = 4; // COMPILER PUKING SOUNDS, can't assign to a constant
You can’t change a const
value.
Often you see const
in parameter lists for functions:
void foo(const int x)
{
    printf("%d\n", x + 30); // OK, doesn't modify "x"
}
const and Pointers
This one gets a little funky, because there are two usages that have two meanings when it comes to pointers.
const int *p; // We can't modify "p" with pointer arithmetic
p++; // Compiler error!
But we can modify what they point to:
int x = 10;
const int *p = &x;
20; // Set "x" to 20, no problem *p =
Great, so we can’t change the pointer, but we can change what it points to. What if we want the other way around? We want to be able to change the pointer, but not what it points to?
int x[] = {10, 20};
int *const p = x; // Move the const close to the variable name
// No problem
p++;
30; // Compiler error! Can't change what it points to *p =
Somewhat confusingly, these two things are equivalent:
const int *p; // Can't modify p
int const *p; // Can't modify p, just like the previous line
but different than:
int *const p; // Can't modify *p, the thing p points to
You can also do both!
const int *const p; // Can't modify p or *p!
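To see both placements in something you can compile, here's a minimal sketch. The function names sum() and fill() are made up for illustration. sum() takes a pointer to const data, so it may walk the pointer but not write through it; fill() holds a const pointer, so it may write through it but not move it.

```c
#include <stddef.h>

// Pointer to const int: we may move "p", but not modify what it points to.
int sum(const int *p, size_t count)
{
    int total = 0;

    for (size_t i = 0; i < count; i++) {
        total += *p;
        p++;          // OK: the pointer itself isn't const
        // *p = 0;    // Would be a compiler error: *p is const
    }

    return total;
}

// Const pointer to int: we may modify what it points to, but not move "p".
void fill(int *const p, size_t count, int value)
{
    for (size_t i = 0; i < count; i++)
        p[i] = value;  // OK: the ints aren't const

    // p++;            // Would be a compiler error: p is const
}
```

Uncommenting either of the commented-out lines should get you the angry compiler mentioned earlier.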
const Correctness
One more thing I have to mention is that the compiler will warn on something like this:
const int x = 20;
int *p = &x;
saying something to the effect of:
initialization discards 'const' qualifier from pointer type target
What’s happening there?
Well, we need to look at the types on either side of the assignment:
const int x = 20;
int *p = &x;
// ^ ^
// | |
// int* const int*
The compiler is warning us that the value on the right side of the assignment is const, but the one on the left is not. And the compiler is letting us know that it is discarding the “const-ness” of the expression on the right.
That is, we can still try to do the following, but it’s just wrong. The compiler will warn, and it’s undefined behavior:
const int x = 20;
int *p = &x;
*p = 40; // Undefined behavior--maybe it modifies "x", maybe not!
printf("%d\n", x); // 40, if you're lucky
restrict
TLDR: you never have to use this and you can ignore it every time you see it.
restrict
is a hint to the compiler that a particular piece of memory will only be accessed by one pointer and never another. If a developer declares a pointer to be restrict
and then accesses the object it points to in another way, the behavior is undefined.
Basically you’re telling C, “Hey—I guarantee that this one single pointer is the only way I access this memory, and if I’m lying, you can pull undefined behavior on me.”
And C uses that information to perform certain optimizations.
For example, let’s write a function to swap two variables, and we’ll use the restrict
keyword to assure C that we’ll never pass in pointers to the same thing. And then let’s blow it and try passing in pointers to the same thing.
void swap(int *restrict a, int *restrict b)
{
int t;
t = *a;
*a = *b;
*b = t;
}
int main(void)
{
int x = 10, y = 20;
swap(&x, &y); // OK! "a" and "b", above, point to different things
swap(&x, &x); // Undefined behavior! "a" and "b" point to the same thing
}
If we were to take out the restrict
keywords, above, that would allow both calls to work safely. But then the compiler might not be able to optimize.
restrict
has block scope, that is, the restriction only lasts for the scope it’s used in. If it’s in a parameter list for a function, it’s in the block scope of that function.
If the restricted pointer points to an array, the restriction covers the entire array.
If it’s outside any function in file scope, the restriction covers the entire program.
You’re likely to see this in library functions like printf()
:
int printf(const char * restrict format, ...);
Again, that’s just telling the compiler that inside the printf()
function, there will be only one pointer that refers to any part of that format
string.
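Here's a sketch of the kind of function where restrict tends to show up in practice: array code where the output never overlaps the inputs. The function add_arrays() is hypothetical, and whether the compiler actually generates faster code from the promise depends on the compiler and the flags.

```c
#include <stddef.h>

// The caller promises that "dst" doesn't overlap "a" or "b".
// With that promise, the compiler is free to load several elements
// of "a" and "b" before storing anything to "dst".
void add_arrays(size_t count, int *restrict dst,
                const int *restrict a, const int *restrict b)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = a[i] + b[i];
}
```

Calling it with three separate arrays is fine. Passing the same array as both dst and a breaks the promise, and that's undefined behavior, just like the swap(&x, &x) call above.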
volatile
You’re unlikely to see or need this unless you’re dealing with hardware directly.
volatile
tells the compiler that a value might change behind its back and should be looked up every time.
An example might be where the compiler is looking in memory at an address that continuously updates behind the scenes, e.g. some kind of hardware timer.
If the compiler decides to optimize that and store the value in a register for a protracted time, the value in memory will update and won’t be reflected in the register.
By declaring something volatile
, you’re telling the compiler, “Hey, the thing this points at might change at any time for reasons outside this program code.”
volatile int *p;
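There's no hardware timer we can poke at in portable C, so here's a sketch using a different place volatile turns up: a flag that gets set from a signal handler. This swaps in signals for the hardware scenario described above; the names got_signal, on_signal(), and signal_demo() are made up for the example.

```c
#include <signal.h>

// Set from the signal handler, read from normal code. "volatile" tells
// the compiler the value can change at any time, so it must be re-read.
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int sig)
{
    (void)sig;       // Unused
    got_signal = 1;
}

// Install the handler, raise the signal ourselves, and report the flag.
// Returns -1 if the handler couldn't be installed.
int signal_demo(void)
{
    got_signal = 0;

    if (signal(SIGINT, on_signal) == SIG_ERR)
        return -1;

    raise(SIGINT);   // The handler runs before raise() returns

    return got_signal;
}
```

From the point of view of the code in signal_demo(), nothing between the assignment and the return touches got_signal, which is exactly the situation where you want the compiler to look the value up again.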
_Atomic
This is an optional C feature that we’ll talk about another time.
Storage-class specifiers are similar to type qualifiers. They give the compiler more information about the type of a variable.
auto
You barely ever see this keyword, since auto
is the default for block scope variables. It’s implied.
These are the same:
{
    int a; // auto is the default...
    auto int a; // So this is redundant
}
The auto
keyword indicates that this object has automatic storage duration. That is, it exists in the scope in which it is defined, and is automatically deallocated when the scope is exited.
One gotcha about automatic variables is that their value is indeterminate until you explicitly initialize them. We say they’re full of “random” or “garbage” data, though neither of those really makes me happy. In any case, you won’t know what’s in it unless you initialize it.
Always initialize all automatic variables before use!
static
This keyword has two meanings, depending on if the variable is file scope or block scope.
Let’s start with block scope.
static in Block Scope
In this case, we’re basically saying, “I just want a single instance of this variable to exist, shared between calls.”
That is, its value will persist between calls.
static
in block scope with an initializer will only be initialized one time on program startup, not each time the function is called.
Let’s do an example:
#include <stdio.h>
void counter(void)
{
static int count = 1; // This is initialized one time
printf("This has been called %d time(s)\n", count);
count++;
}
int main(void)
{
counter(); // "This has been called 1 time(s)"
counter(); // "This has been called 2 time(s)"
counter(); // "This has been called 3 time(s)"
counter(); // "This has been called 4 time(s)"
return 0;
}
See how the value of count
persists between calls?
One thing of note is that static
block scope variables are initialized to 0
by default.
static int foo; // Default starting value is `0`...
static int foo = 0; // So the `0` assignment is redundant
Finally, be advised that if you’re writing multithreaded programs, you have to be sure you don’t let multiple threads trample the same variable.
static in File Scope
When you get out to file scope, outside any blocks, the meaning rather changes.
Variables at file scope already persist between function calls, so that behavior is already there.
Instead what static
means in this context is that this variable isn’t visible outside of this particular source file. Kinda like “global”, but only in this file.
More on that in the section about building with multiple source files.
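In the meantime, here's a small sketch of the usual pattern: a file-scope static variable that only this file can see, with regular functions that other files can call to get at it. The names hits, record_hit(), and hit_count() are hypothetical.

```c
// "hits" has file scope, but "static" keeps it private to this source file.
static int hits;   // Starts at 0, like all static-storage variables

// These functions are not static, so other source files may call them.
void record_hit(void)
{
    hits++;
}

int hit_count(void)
{
    return hits;
}
```

Another source file could call record_hit() and hit_count(), but an extern int hits; over there wouldn't find this variable.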
extern
The extern
storage-class specifier gives us a way to refer to objects in other source files.
Let’s say, for example, the file bar.c
had the following as its entirety:
// bar.c
int a = 37;
Just that. Declaring a new int a
in file scope.
But what if we had another source file, foo.c
, and we wanted to refer to the a
that’s in bar.c
?
It’s easy with the extern
keyword:
// foo.c
#include <stdio.h>
extern int a;
int main(void)
{
printf("%d\n", a); // 37, from bar.c!
a = 99;
printf("%d\n", a); // Same "a" from bar.c, but it's now 99
return 0;
}
We could have also made the extern int a
in block scope, and it still would have referred to the a
in bar.c
:
// foo.c
#include <stdio.h>
int main(void)
{
extern int a;
printf("%d\n", a); // 37, from bar.c!
a = 99;
printf("%d\n", a); // Same "a" from bar.c, but it's now 99
return 0;
}
Now, if a
in bar.c
had been marked static
, this wouldn’t have worked. static
variables at file scope are not visible outside that file.
A final note about extern
on functions. For functions, extern
is the default, so it’s redundant. You can declare a function static
if you only want it visible in a single source file.
register
Barely anyone uses this anymore.
This is a keyword to hint to the compiler that this variable is frequently-used, and should be made as fast as possible to access. The compiler is under no obligation to agree to it.
Now, modern C compiler optimizers are pretty effective at figuring this out themselves, so it’s rare to see these days.
But if you must:
#include <stdio.h>
int main(void)
{
register int a; // Make "a" as fast to use as possible.
for (a = 0; a < 10; a++)
printf("%d\n", a);
return 0;
}
It does come at a price, however. You can’t take the address of a register:
register int a;
int *p = &a; // COMPILER ERROR! Can't take address of a register
The same applies to any part of an array:
register int a[] = {11, 22, 33, 44, 55};
int *p = a; // COMPILER ERROR! Can't take address of a[0]
Or dereferencing part of an array:
register int a[] = {11, 22, 33, 44, 55};
int x = *(a + 2); // COMPILER ERROR! Address of a[0] taken
Interestingly, for the equivalent with array notation, gcc only warns:
register int a[] = {11, 22, 33, 44, 55};
int x = a[2]; // COMPILER WARNING!
with:
warning: ISO C forbids subscripting ‘register’ array
A bit of backstory, here: deep inside the CPU are little dedicated “variables” called registers83. They are super fast to access compared to RAM, so using them gets you a speed boost. But they’re not in RAM, so they don’t have an associated memory address (which is why you can’t take the address-of or get a pointer to them).
But, like I said, modern compilers are really good at producing optimal code, using registers whenever possible regardless of whether or not you specified the register
keyword. Not only that, but the spec allows them to just treat it as if you’d typed auto
, if they want.
In short, you probably don’t want to even bother with register
, and just let the compiler do what it thinks is best.
So far we’ve been looking at toy programs that for the most part fit in a single file. But complex C programs are made up of many files that are all compiled and linked together into a single executable.
In this chapter we’ll check out some of the common patterns and practices for putting together larger projects.
A really common situation is that some of your functions are defined in one file, and you want to call them from another.
This actually works out of the box with a warning… let’s first try it and then look at the right way to fix the warning.
For these examples, we’ll put the filename as the first comment in the source.
To compile them, you’ll need to specify all the sources on the command line:
# output file source files
# v v
# |----| |---------|
gcc -o foo foo.c bar.c
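In that example, foo.c
and bar.c
get built into the executable named foo
.
So let’s take a look at the source file bar.c
:
// File bar.c
int add(int x, int y)
{
    return x + y;
}
And the file foo.c
with main in it:
// File foo.c
#include <stdio.h>
int main(void)
{
    printf("%d\n", add(2, 3)); // 5!
    return 0;
}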
See how from main()
we call add()
—but add()
is in a completely different source file! It’s in bar.c
, while the call to it is in foo.c
!
If we build this with:
gcc -o foo foo.c bar.c
we get this warning:
warning: implicit declaration of function ‘add’
But if we ignore that (which really we should never do—always get your code to build with zero warnings!) and try to run it:
./foo
5
Indeed, we get the result of \(2+3\)! Yay!
So… about that warning. Let’s fix it.
What implicit declaration
means is that we’re using a function, namely add()
in this case, without letting C know anything about it ahead of time. C wants to know what it returns, what types it takes as arguments, and things such as that.
We saw how to fix that earlier with a function prototype. Indeed, if we add one of those to foo.c
before we make the call, everything works well:
// File foo.c
#include <stdio.h>
int add(int, int); // Add the prototype
int main(void)
{
printf("%d\n", add(2, 3)); // 5!
return 0;
}
No more warning!
But that’s a pain—needing to type in the prototype every time you want to use a function. I mean, we used printf()
right there and didn’t need to type in a prototype; what gives?
If you remember from way back with hello.c
at the beginning of the book, we actually did include the prototype for printf()
! It’s in the file stdio.h
! And we included that with #include
!
Can we do the same with our add()
function? Make a prototype for it and put it in a header file?
Sure!
Header files in C have a .h
extension by default. And they often, but not always, have the same name as their corresponding .c
file. So let’s make a bar.h
file for our bar.c
file, and we’ll stick the prototype in it:
// File bar.h
int add(int, int);
And now let’s modify foo.c
to include that file. Assuming it’s in the same directory, we include it inside double quotes (as opposed to angle brackets):
// File foo.c
#include <stdio.h>
#include "bar.h" // Include from current directory
int main(void)
{
printf("%d\n", add(2, 3)); // 5!
return 0;
}
Notice how we don’t have the prototype in foo.c
anymore—we included it from bar.h
. Now any file that wants that add()
functionality can just #include "bar.h"
to get it, and you don’t need to worry about typing in the function prototype.
As you might have guessed, #include
literally includes the named file right there in your source code, just as if you’d typed it in.
We’re almost there! There’s just one more piece of boilerplate we have to add.
It’s not uncommon that a header file will itself #include
other headers needed for the functionality of its corresponding C files. I mean, why not?
But we might get into a crazy situation where header a.h
includes header b.h
, and b.h
includes a.h
! It’s an #include
infinite cycle!
Trying to build such a thing gives an error:
error: #include nested depth 200 exceeds maximum of 200
What we need to do is make it so that if a file gets included once, subsequent #include
s for that file are ignored.
The stuff that we’re about to do is so common that you should just automatically do it every time you make a header file!
And the common way to do this is with a preprocessor variable that we set the first time we #include
the file. And then for subsequent #include
s, we first check to make sure that the variable isn’t defined.
For that variable name, it’s super common to take the name of the header file, like bar.h
, make it uppercase, and replace the period with an underscore: BAR_H
.
(Don’t put a leading underscore (because a leading underscore followed by a capital letter is reserved) or a double leading underscore (because that’s also reserved.))
#ifndef BAR_H // If BAR_H isn't defined...
#define BAR_H // Define it (with no particular value)
// File bar.h
int add(int, int);
#endif // End of the #ifndef BAR_H
This will effectively cause the header file to be included only a single time, breaking any #include
loops you’ll find.
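To watch the guard do its job without juggling several files, here's a sketch that pastes the same made-up header twice into one source file, which is effectively what two #includes of it would do. The names point.h, POINT_H, struct point, and manhattan() are all hypothetical.

```c
// ---- first copy of the (hypothetical) header point.h ----
#ifndef POINT_H
#define POINT_H

struct point {
    int x, y;
};

int manhattan(struct point p);

#endif

// ---- second copy, as if point.h were #included again ----
#ifndef POINT_H   // Already defined, so everything up to #endif is skipped
#define POINT_H

struct point {    // Without the guard, this would be a redefinition error
    int x, y;
};

int manhattan(struct point p);

#endif

// ---- the .c file that uses the header ----
int manhattan(struct point p)
{
    int ax = p.x < 0 ? -p.x : p.x;
    int ay = p.y < 0 ? -p.y : p.y;

    return ax + ay;
}
```

If you delete the #ifndef/#define/#endif lines and try to build, the compiler should complain that struct point is defined twice.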
static
and extern
When it comes to multifile projects, you can make sure file-scope variables and functions are not visible from other source files with the static
keyword.
And you can refer to objects in other files with extern
.
For more info, check out the sections in the book on the static
and extern
storage-class specifiers.
This isn’t part of the spec, but it’s 99.999% common in the C world.
You can compile C files into an intermediate representation called object files. These are compiled machine code that hasn’t been put into an executable yet.
Object files in Windows have a .OBJ
extension; in Unix-likes, they’re .o
.
In gcc, we can build some like this, with the -c
(compile only!) flag:
gcc -c foo.c # produces foo.o
gcc -c bar.c # produces bar.o
And then we can link those together into a single executable:
gcc -o foo foo.o bar.o
Voila, we’ve produced an executable foo
from the two object files.
But you’re thinking, why bother? Can’t we just:
gcc -o foo foo.c bar.c
and kill two boids84 with one stone?
For little programs, that’s fine. I do it all the time.
But for larger programs, we can take advantage of the fact that compiling from source to object files is relatively slow, and linking together a bunch of object files is relatively fast.
This really shows with the make
utility that only rebuilds sources that are newer than their outputs.
Let’s say you had a thousand C files. You could compile them all to object files to start (slowly) and then combine all those object files into an executable (fast).
Now say you modified just one of those C source files—here’s the magic: you only have to rebuild that one object file for that source file! And then you rebuild the executable (fast). All the other C files don’t have to be touched.
In other words, by only rebuilding the object files we need to, we cut down on compilation times radically. (Unless of course you’re doing a “clean” build, in which case all the object files have to be created.)
When you run a program, it’s actually you talking to the shell, saying, “Hey, please run this thing.” And the shell says, “Sure,” and then tells the operating system, “Hey, could you please make a new process and run this thing?” And if all goes well, the OS complies and your program runs.
But there’s a whole world outside your program in the shell that can be interacted with from within C. We’ll look at a few of those in this chapter.
Many command line utilities accept command line arguments. For example, if we want to see all files that end in .txt
, we can type something like this on a Unix-like system:
ls *.txt
(or dir
instead of ls
on a Windows system).
In this case, the command is ls
, but its arguments are all the files that end with .txt
85.
So how can we see what is passed into a program from the command line?
Say we have a program called add
that adds all numbers passed on the command line and prints the result:
./add 10 30 5
45
That’s gonna pay the bills for sure!
But seriously, this is a great tool for seeing how to get those arguments from the command line and break them down.
First, let’s see how to get them at all. For this, we’re going to need a new main()
!
Here’s a program that prints out all the command line arguments. For example, if we name the executable foo
, we can run it like this:
./foo i like turtles
and we’ll see this output:
arg 0: ./foo
arg 1: i
arg 2: like
arg 3: turtles
It’s a little weird, because the zeroth argument is the name of the executable, itself. But that’s just something to get used to. The arguments themselves follow directly.
Source:
#include <stdio.h>
int main(int argc, char *argv[])
{
for (int i = 0; i < argc; i++) {
printf("arg %d: %s\n", i, argv[i]);
}
return 0;
}
Whoa! What’s going on with the main()
function signature? What’s argc
and argv
86 (pronounced arg-c and arg-v)?
Let’s start with the easy one first: argc
. This is the argument count, including the program name, itself. If you think of all the arguments as an array of strings, which is exactly what they are, then you can think of argc
as the length of that array, which is exactly what it is.
And so what we’re doing in that loop is going through all the argv
s and printing them out one at a time, so for a given input:
./foo i like turtles
we get a corresponding output:
arg 0: ./foo
arg 1: i
arg 2: like
arg 3: turtles
With that in mind, we should be good to go with our adder program.
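Our plan:
1. Look at all the command line arguments (past argv[0], the program name)
2. Convert them to integers
3. Add them to a running total
4. Print the result
Let’s get to it!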
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
int total = 0;
for (int i = 1; i < argc; i++) { // Start at 1 to skip argv[0], the program name
int value = atoi(argv[i]); // Use strtol() for better error handling
total += value;
}
printf("%d\n", total);
return 0;
}
Sample runs:
$ ./add
0
$ ./add 1
1
$ ./add 1 2
3
$ ./add 1 2 3
6
$ ./add 1 2 3 4
10
Of course, it might puke if you pass in a non-integer, but hardening against that is left as an exercise to the reader.
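If you'd like a head start on that exercise, here's one possible sketch using strtol(). The helper name parse_int() and its return convention are made up for this example; it rejects empty strings, trailing junk, and values that won't fit in an int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// Convert "s" to an int. Returns 1 and stores the result in *out on
// success; returns 0 if "s" isn't entirely a base-10 integer or the
// value doesn't fit in an int.
int parse_int(const char *s, int *out)
{
    char *end;

    errno = 0;
    long value = strtol(s, &end, 10);

    if (end == s || *end != '\0')      // No digits, or trailing junk
        return 0;

    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                      // Out of range

    *out = (int)value;

    return 1;
}
```

In the adder, you could call parse_int(argv[i], &value) in place of atoi() and bail out with an error message when it returns 0.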
argv
is NULL
One bit of fun trivia about argv
is that after the last string is a pointer to NULL
.
That is:
argv[argc] == NULL
is always true!
This might seem pointless, but it turns out to be useful in a couple places; we’ll take a look at one of those right now.
char **argv
Remember that when you call a function, C doesn’t differentiate between array notation and pointer notation in the function signature.
That is, these are the same:
void foo(char a[])
void foo(char *a)
Now, it’s been convenient to think of argv
as an array of strings, i.e. an array of char*
s, so this made sense:
int main(int argc, char *argv[])
but because of the equivalence, you could also write:
int main(int argc, char **argv)
Yeah, that’s a pointer to a pointer, all right! If it makes it easier, think of it as a pointer to a string. But really, it’s a pointer to a value that points to a char
.
Also recall that these are equivalent:
argv[i]
*(argv + i)
which means you can do pointer arithmetic on argv
.
So an alternate way to consume the command line arguments might be to just walk along the argv
array by bumping up a pointer until we hit that NULL
at the end.
Let’s modify our adder to do that:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
int total = 0;
// Cute trick to get the compiler to stop warning about the
// unused variable argc:
(void)argc;
for (char **p = argv + 1; *p != NULL; p++) { // Start past argv[0], the program name
int value = atoi(*p); // Use strtol() for better error handling
total += value;
}
printf("%d\n", total);
return 0;
}
Personally, I use array notation to access argv
, but have seen this style floating around, as well.
Just a few more things about argc
and argv
.
Some environments might not set argv[0]
to the program name. If it’s not available, argv[0]
will be an empty string. I’ve never seen this happen.
The spec is actually pretty liberal with what an implementation can do with argv
and where those values come from. But every system I’ve been on works the same way, as we’ve discussed in this section.
You can modify argc
, argv
, or any of the strings that argv
points to. (Just don’t make those strings longer than they already are!)
On some Unix-like systems, modifying the string argv[0]
results in the output of ps
changing87.
Normally, if you have a program called foo
that you’ve run with ./foo
, you might see this in the output of ps
:
4078 tty1 S 0:00 ./foo
But if you modify argv[0]
like so, being careful that the new string "Hi!  "
is the same length as the old one "./foo"
:
strcpy(argv[0], "Hi!  "); // "Hi!" plus two spaces: five characters, like "./foo"
and then run ps
while the program ./foo
is still executing, we’ll see this instead:
4079 tty1 S 0:00 Hi!
This behavior is not in the spec and is highly system-dependent.
There are a number of ways a program can exit in C, including return
ing from main()
, or calling one of the exit()
variants.
All of these methods accept an int
as an argument. So far, we’ve done a lot of return 0
from main()
, but what does the 0
mean? What other numbers can we put there? And how are they used?
The spec is both clear and vague on the matter, as is common. Clear because it spells out what you can do, but vague in that it doesn’t particularly limit it, either.
Nothing for it but to forge ahead and figure it out!
Let’s get Inception88 for a second: turns out that when you run your program, you’re running it from another program.
Usually this other program is some kind of shell89 that doesn’t do much on its own except launch other programs.
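But this is a multi-phase process, especially visible in command-line shells:
1. The shell launches your program
2. The shell typically goes to sleep (for command-line shells)
3. Your program runs
4. Your program terminates
5. The shell wakes up and waits for another command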
Now, there’s a little piece of communication that takes place between steps 4 and 5: the program can return a status value that the shell can interrogate. Typically, this value is used to indicate the success or failure of your program, and, if a failure, what type of failure.
This value is what we’ve been return
ing from main()
. That’s the status.
Now, the C spec allows for two different status values, which have macro names defined in <stdlib.h>
:
Status | Description |
---|---|
EXIT_SUCCESS or 0 | Program terminated successfully. |
EXIT_FAILURE | Program terminated with an error. |
Let’s write a short program that multiplies two numbers from the command line. We’ll require that you specify exactly two values. If you don’t, we’ll print an error message, and exit with an error status.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv)
{
if (argc != 3) {
printf("usage: mult x y\n");
return EXIT_FAILURE; // Indicate to shell that it didn't work
}
printf("%d\n", atoi(argv[1]) * atoi(argv[2]));
return 0; // same as EXIT_SUCCESS, everything was good.
}
Now if we try to run this, we get the expected effect until we specify exactly the right number of command-line arguments:
$ ./mult
usage: mult x y
$ ./mult 3 4 5
usage: mult x y
$ ./mult 3 4
12
But that doesn’t really show the exit status that we returned, does it? We can get the shell to print it out, though. Assuming you’re running Bash or another POSIX shell, you can use echo $?
to see it90.
Let’s try:
$ ./mult
usage: mult x y
$ echo $?
1
$ ./mult 3 4 5
usage: mult x y
$ echo $?
1
$ ./mult 3 4
12
$ echo $?
0
Interesting! We see that on my system, EXIT_FAILURE
is 1
. The spec doesn’t spell this out, so it could be any number. But try it; it’s probably 1
on your system, too.
The status 0
most definitely means success, but what about all the other integers, even negative ones?
Here we’re going off the C spec and into Unix land. In general, while 0
means success, a positive non-zero number means failure. So you can only have one type of success, and multiple types of failure. Bash says the exit code should be between 0 and 255, though a number of codes are reserved.
In short, if you want to indicate different error exit statuses in a Unix environment, you can start with 1
and work your way up.
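As a sketch of what that might look like for the multiplier program, here are a few named status codes and a function that picks one. The names STATUS_OK, STATUS_USAGE, STATUS_BAD_NUMBER, and check_mult_args() are hypothetical, and the numbering is just the Unix-style convention described above, not anything from the C spec.

```c
#include <stdlib.h>

// Hypothetical status codes for a Unix-like system: 0 for success,
// then small positive numbers, one per kind of failure.
enum {
    STATUS_OK = EXIT_SUCCESS,
    STATUS_USAGE = 1,       // Wrong number of arguments
    STATUS_BAD_NUMBER = 2   // An argument wasn't a base-10 integer
};

// Decide which status a program like "mult x y" should exit with.
int check_mult_args(int argc, char **argv)
{
    if (argc != 3)
        return STATUS_USAGE;

    for (int i = 1; i < argc; i++) {
        char *end;

        (void)strtol(argv[i], &end, 10);   // Only care where parsing stopped

        if (end == argv[i] || *end != '\0')
            return STATUS_BAD_NUMBER;
    }

    return STATUS_OK;
}
```

A main() could return whatever check_mult_args(argc, argv) hands back when it isn't STATUS_OK, and a shell script could then tell a usage error apart from a bad number by looking at $?.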
On Linux, if you try any code outside the range 0-255, it will bitwise AND the code with 0xff
, keeping only the low 8 bits. The result wraps into that range rather than clamping to it: a status of 256 shows up as 0.
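Here's that AND as a tiny sketch, so you can try a few values. The function name status_seen_by_shell() is made up, and it only models the Linux behavior just described; other systems may differ.

```c
// What a Linux shell would report in $? for a given exit status,
// assuming only the low 8 bits survive.
int status_seen_by_shell(int status)
{
    // Convert to unsigned first so the AND is well-defined for
    // negative values, too.
    return (int)((unsigned)status & 0xffu);
}
```

So 255 stays 255, 256 comes out as 0, 257 as 1, and -1 as 255. That first wraparound is a good reason to stay well inside the range: an "error" status of 256 would look like success.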
You can script the shell to later use these status codes to make decisions about what to do next.
Before I get into this, I need to warn you that C doesn’t specify what an environment variable is. So I’m going to describe the environment variable system that works on every major platform I’m aware of.
Basically, the environment is the program that’s going to run your program, e.g. the bash shell. And it might have some bash variables defined. In case you didn’t know, the shell can make its own variables. Each shell is different, but in bash you can just type set
and it’ll show you all of them.
Here’s an excerpt from the 61 variables that are defined in my bash shell:
HISTFILE=/home/beej/.bash_history
HISTFILESIZE=500
HISTSIZE=500
HOME=/home/beej
HOSTNAME=FBILAPTOP
HOSTTYPE=x86_64
IFS=$' \t\n'
Notice they are in the form of key/value pairs. For example, one key is HOSTTYPE
and its value is x86_64
. From a C perspective, all values are strings, even if they’re numbers91.
So, anyway! Long story short, it’s possible to get these values from inside your C program.
Let’s write a program that uses the standard getenv()
function to look up a value that you set in the shell.
getenv()
will return a pointer to the value string, or else NULL
if the environment variable doesn’t exist.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *val = getenv("FROTZ"); // Try to get the value
// Check to make sure it exists
if (val == NULL) {
printf("Cannot find the FROTZ environment variable\n");
return EXIT_FAILURE;
}
printf("Value: %s\n", val);
return 0;
}
If I run this directly, I get this:
$ ./foo
Cannot find the FROTZ environment variable
which makes sense, since I haven’t set it yet.
In bash, I can set it to something with92:
$ export FROTZ="C is awesome!"
Then if I run it, I get:
$ ./foo
Value: C is awesome!
In this way, you can set up data in environment variables, and you can get it in your C code and modify your behavior accordingly.
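A common variation is to fall back to a default when the variable isn't set, instead of giving up. Here's a minimal sketch; the helper name env_or() is hypothetical.

```c
#include <stdlib.h>

// Return the value of environment variable "name", or "fallback"
// if it isn't set.
const char *env_or(const char *name, const char *fallback)
{
    const char *val = getenv(name);

    return val != NULL ? val : fallback;
}
```

For example, env_or("FROTZ", "C is OK, I guess") gives you the shell's value if you exported one, and the second argument otherwise.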
This isn’t standard, but a lot of systems provide ways to set environment variables.
If on a Unix-like, look up the documentation for putenv()
, setenv()
, and unsetenv()
. On Windows, see _putenv()
.
The most basic of all libraries in the whole of the standard C library is the standard I/O library. It’s used for reading from and writing to files. I can see you’re very excited about this.
So I’ll continue. It’s also used for reading and writing to the console, as we’ve already often seen with the printf()
function.
(A little secret here—many many things in various operating systems are secretly files deep down, and the console is no exception. “Everything in Unix is a file!” :-)
)
You’ll probably want some prototypes of the functions you can use, right? To get your grubby little mittens on those, you’ll want to include stdio.h
.
Anyway, so we can do all kinds of cool stuff in terms of file I/O. LIE DETECTED. Ok, ok. We can do all kinds of stuff in terms of file I/O. Basically, the strategy is this:
Use fopen()
to get a pointer to a file structure of type FILE*
. This pointer is what you’ll be passing to many of the other file I/O calls.
Use some of the other file calls, like fscanf()
, fgets()
, fprintf()
, and so on, using the FILE*
returned from fopen()
.
When done, call fclose()
with the FILE*
. This lets the operating system know that you’re truly done with the file, no take-backs.
What’s in the FILE*
? Well, as you might guess, it points to a struct
that contains all kinds of information about the current read and write position in the file, how the file was opened, and other stuff like that. But, honestly, who cares. No one, that’s who. The FILE
structure is opaque to you as a programmer; that is, you don’t need to know what’s in it, and you don’t even want to know what’s in it. You just pass it to the other standard I/O functions and they know what to do.
This is actually pretty important: try to not muck around in the FILE
structure. It’s not even the same from system to system, and you’ll end up writing some really non-portable code.
One more thing to mention about the standard I/O library: a lot of the functions that operate on files use an “f” prefix on the function name. The same function that is operating on the console will leave the “f” off. For instance, if you want to print to the console, you use printf()
, but if you want to print to a file, use fprintf()
, see?
Wait a moment! If writing to the console is, deep down, just like writing to a file, since everything in Unix is a file, why are there two functions? Answer: it’s more convenient. But, more importantly, is there a FILE*
associated with the console that you can use? Answer: YES!
There are, in fact, three (count ’em!) special FILE*
s you have at your disposal merely for just including stdio.h
. There is one for input, and two for output.
That hardly seems fair—why does output get two files, and input only get one?
That’s jumping the gun a bit—let’s just look at them:
stdin
Input from the console.
stdout
Output to the console.
stderr
Output to the console on the error file stream.
So standard input (stdin
) is by default just what you type at the keyboard. You can use that in fscanf()
if you want, just like this:
/* this line: */
scanf("%d", &x);
/* is just like this line: */
fscanf(stdin, "%d", &x);
And stdout
works the same way:
printf("Hello, world!\n");
fprintf(stdout, "Hello, world!\n"); /* same as previous line! */
So what is this stderr
thing? What happens when you output to that? Well, generally it goes to the console just like stdout
, but people use it for error messages, specifically. Why? On many systems you can redirect the output from the program into a file from the command line…and sometimes you’re interested in getting just the error output. So if the program is good and writes all its errors to stderr
, a user can redirect just stderr
into a file, and just see that. It’s just a nice thing you, as a programmer, can do.
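Here's a small sketch of a function that plays nice in that way: results go to stdout, complaints go to stderr. The function print_ratio() is made up for the example.

```c
#include <stdio.h>

// Print results to stdout and complaints to stderr. Returns 0 on
// success and -1 on failure so the caller can pick an exit status.
int print_ratio(int numerator, int denominator)
{
    if (denominator == 0) {
        fprintf(stderr, "error: can't divide by zero\n");
        return -1;
    }

    fprintf(stdout, "%d\n", numerator / denominator);

    return 0;
}
```

In a POSIX shell, running a program built around this with 2> errors.txt on the end of the command line would send only the stderr text to the file, while the results still show up on the console.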
fopen()
Opens a file for reading or writing
#include <stdio.h>
FILE *fopen(const char *path, const char *mode);
The fopen()
function opens a file for reading or writing.
Parameter path
can be a relative or fully-qualified path and file name to the file in question.
Parameter mode
tells fopen()
how to open the file (reading, writing, or both), and whether or not it’s a binary file. Possible modes are:
r
Open the file for reading (read-only).
w
Open the file for writing (write-only). The file is created if it doesn’t exist.
r+
Open the file for reading and writing. The file has to already exist.
w+
Open the file for writing and reading. The file is created if it doesn’t already exist.
a
Open the file for append. This is just like opening a file for writing, but it positions the file pointer at the end of the file, so the next write appends to the end. The file is created if it doesn’t exist.
a+
Open the file for reading and appending. The file is created if it doesn’t exist.
Any of the modes can have the letter “b
” appended to the end, as in “wb
” (“write binary”), to signify that the file in question is a binary file. (“Binary” in this case generally means that the file contains non-alphanumeric characters that look like garbage to human eyes.) Many systems (like Unix) don’t differentiate between binary and non-binary files, so the “b
” is extraneous. But if your data is binary, it doesn’t hurt to throw the “b
” in there, and it might help someone who is trying to port your code to another system.
fopen()
returns a FILE*
that can be used in subsequent file-related calls.
If something goes wrong (e.g. you tried to open a file for read that didn’t exist), fopen()
will return NULL
.
#include <stdio.h>
#include <stdlib.h> // for exit()
int main(void)
{
    FILE *fp;
    if ((fp = fopen("datafile.dat", "r")) == NULL) {
        printf("Couldn't open datafile.dat for reading\n");
        exit(1);
    }
    // fp is now initialized and can be read from
    fclose(fp);
    return 0;
}
freopen()
Reopen an existing FILE*
, associating it with a new path
#include <stdio.h>
FILE *freopen(const char *filename, const char *mode, FILE *stream);
Let’s say you have an existing FILE*
stream that’s already open, but you want it to suddenly use a different file than the one it’s using. You can use freopen()
to “re-open” the stream with a new file.
Why on Earth would you ever want to do that? Well, the most common reason would be if you had a program that normally would read from stdin
, but instead you wanted it to read from a file. Instead of changing all your scanf()
s to fscanf()
s, you could simply reopen stdin
on the file you wanted to read from.
Another usage that is allowed on some systems is that you can pass NULL
for filename
, and specify a new mode
for stream
. So you could change a file from “r+
” (read and write) to just “r
” (read), for instance. It’s implementation dependent which modes can be changed.
When you call freopen()
, the old stream
is closed. Otherwise, the function behaves just like the standard fopen()
.
freopen()
returns stream
if all goes well.
If something goes wrong (e.g. you tried to open a file for read that didn’t exist), freopen()
will return NULL
.
#include <stdio.h>
int main(void)
{
int i, i2;
scanf("%d", &i); // read i from stdin
// now change stdin to refer to a file instead of the keyboard
freopen("someints.txt", "r", stdin);
scanf("%d", &i2); // now this reads from the file "someints.txt"
printf("Hello, world!\n"); // print to the screen
// change stdout to go to a file instead of the terminal:
freopen("output.txt", "w", stdout);
printf("This goes to the file \"output.txt\"\n");
// this is allowed on some systems--you can change the mode of a file:
freopen(NULL, "wb", stdout); // change to "wb" instead of "w"
return 0;
}
fclose()
The opposite of fopen()
–closes a file when you’re done with it so that it frees system resources.
#include <stdio.h>
int fclose(FILE *stream);
When you open a file, the system sets aside some resources to maintain information about that open file. Usually it can only open so many files at once. In any case, the Right Thing to do is to close your files when you’re done using them so that the system resources are freed.
Also, you might find that not all the information that you’ve written to the file has actually been written to disk until the file is closed. (You can force this with a call to fflush()
.)
When your program exits normally, it closes all open files for you. Lots of times, though, you’ll have a long-running program, and it’d be better to close the files before then. In any case, not closing a file you’ve opened makes you look bad. So, remember to fclose()
your file when you’re done with it!
On success, 0
is returned. Typically no one checks for this. On error EOF
is returned. Typically no one checks for this, either.
FILE *fp;
fp = fopen("spoonDB.dat", "r"); // (you should error-check this)
sort_spoon_database(fp);
fclose(fp); // pretty simple, huh.
printf()
, fprintf()
Print a formatted string to the console or to a file.
#include <stdio.h>
int printf(const char *format, ...);
int fprintf(FILE *stream, const char *format, ...);
These functions print formatted strings to a file (that is, a FILE*
you likely got from fopen()
), or to the console (which is usually itself just a special file, right?)
The printf()
function is legendary as being one of the most flexible outputting systems ever devised. It can also get a bit freaky here or there, most notably in the format
string. We’ll take it a step at a time here.
The easiest way to look at the format string is that it will print everything in the string as-is, unless a character has a percent sign (%
) in front of it. That’s when the magic happens: the next argument in the printf()
argument list is printed in the way described by the percent code.
Here are the most common percent codes:
%d
Print the next argument as a signed decimal number, like 3490
. The argument printed this way should be an int
.
%f
Print the next argument as a signed floating point number, like 3.14159
. The argument printed this way should be a float or a double (a float is automatically promoted to double when passed to printf())
.
%c
Print the next argument as a character, like 'B'
. The argument printed this way should be a char
.
%s
Print the next argument as a string, like "Did you remember your mittens?"
. The argument printed this way should be a char*
or char[]
.
%%
No arguments are converted, and a plain old run-of-the-mill percent sign is printed. This is how you print a ‘%’ using printf()
.
So those are the basics. I’ll give you some more of the percent codes in a bit, but let’s get some more breadth before then. There’s actually a lot more that you can specify in there after the percent sign.
For one thing, you can put a field width in there—this is a number that tells printf()
how many spaces to put on one side or the other of the value you’re printing. That helps you line things up in nice columns. If the number is negative, the result becomes left-justified instead of right-justified. Example:
printf("%10d", x); /* prints X on the right side of the 10-space field */
printf("%-10d", x); /* prints X on the left side of the 10-space field */
If you don’t know the field width in advance, you can use a little kung-foo to get it from the argument list just before the argument itself. Do this by placing your seat and tray tables in the fully upright position. The seatbelt is fastened by placing the—cough. I seem to have been doing way too much flying lately. Ignoring that useless fact completely, you can specify a dynamic field width by putting a *
in for the width. If you are not willing or able to perform this task, please notify a flight attendant and we will reseat you.
int width = 12;
int value = 3490;
printf("%*d\n", width, value);
You can also put a “0” in front of the number if you want it to be padded with zeros:
int x = 17;
printf("%05d", x); /* "00017" */
When it comes to floating point, you can also specify how many decimal places to print by making a field width of the form “x.y
” where x
is the field width (you can leave this off if you want it to be just wide enough) and y
is the number of digits past the decimal point to print:
float f = 3.1415926535;
printf("%.2f", f); /* "3.14" */
printf("%7.3f", f); /* "  3.142" <-- 7 spaces across, rounded */
Ok, those above are definitely the most common uses of printf()
, but there are still more modifiers you can put in after the percent and before the field width:
0
This was already mentioned above. It pads the spaces before a number with zeros, e.g. "%05d"
.
-
This was also already mentioned above. It causes the value to be left-justified in the field, e.g. "%-5d"
.
' '
(space)
This prints a blank space before a positive number, so that it will line up in a column along with negative numbers (which have a negative sign in front of them). "% d"
.
+
Always puts a +
sign in front of a number that you print so that it will line up in a column along with negative numbers (which have a negative sign in front of them). "%+d"
.
#
This causes the output to be printed in a different form than normal. The results vary based on the specifier used, but generally, hexadecimal output ("%x"
) gets a "0x"
prepended to the output, and octal output ("%o"
) gets a "0"
prepended to it. These are, if you’ll notice, how such numbers are represented in C source. Additionally, floating point numbers, when printed with this #
modifier, will print a trailing decimal point even if the number has no fractional part. Example: "%#x"
.
Now, I know earlier I promised the rest of the format specifiers…so ok, here they are:
%i
Just like %d
, above.
%o
Prints the integer number out in octal format. Octal is a base-eight number representation scheme invented on the planet Krylon where all the inhabitants have only eight fingers.
%u
Just like %d
, but works on unsigned int
s, instead of int
s.
%x
or %X
Prints the unsigned int
argument in hexadecimal (base-16) format. This is for people with 16 fingers, or people who are simply addicted to hex, like you should be. Just try it! "%x"
prints the hex digits in lowercase, while "%X"
prints them in uppercase.
%F
Just like “%f”, except any string-based results (which can happen for numbers like infinity) are printed in uppercase.
%e
or %E
Prints the float
argument in exponential (scientific) notation. This is your classic form similar to “three times 10 to the 8th power”, except printed in text form: “3e8
”. (You see, the “e
” is read “times 10 to the”.) If you use the "%E"
specifier, the exponent “e” is written in uppercase, a la “3E8
”.
%g
or %G
Another way of printing double
s. In this case the precision you specify tells it how many significant figures to print.
%p
Prints a pointer type out in hex representation. In other words, the address that the pointer is pointing to is printed. (Not the value in the address, but the address number itself.)
%n
This specifier is cool and different, and rarely needed. It doesn’t actually print anything, but stores the number of characters printed so far in the next pointer argument in the list.
int numChars;
float a = 3.14159;
int b = 3490;
printf("%f %d%n\n", a, b, &numChars);
printf("The above line contains %d characters.\n", numChars);
The above example will print out the values of a
and b
, and then store the number of characters printed so far into the variable numChars
. The next call to printf()
prints out that result.
So let’s recap what we have here. We have a format string in the form:
"%[modifier][fieldwidth][.precision][lengthmodifier][formatspecifier]"
Modifier is like the "-"
for left justification, the field width is how wide a space to print the result in, the precision is, for float
s, how many decimal places to print, and the format specifier is like %d
.
That wraps it up, except what’s this “lengthmodifier” I put up there?! Yes, just when you thought things were under control, I had to add something else on there. Basically, it’s to tell printf()
in more detail what size the arguments are. For instance, char
, short
, int
, and long
are all integer types, but they all use a different number of bytes of memory, so you can’t use plain old “%d
” for all of them, right? How can printf()
tell the difference?
The answer is that you tell it explicitly using another optional letter (the length modifier, this) before the type specifier. If you omit it, then the basic types are assumed (like %d
is for int
, and %f
is for float
).
Here are the length modifiers:
h
Integer referred to is a short
integer, e.g. “%hd
” is a short
and “%hu
” is an unsigned short
.
l
(“ell”)
Integer referred to is a long
integer, e.g. “%ld
” is a long
and “%lu
” is an unsigned long
.
hh
Integer referred to is a char
integer, e.g. “%hhd
” is a char
and “%hhu
” is an unsigned char
.
ll
(“ell ell”)
Integer referred to is a long long
integer, e.g. “%lld
” is a long long
and “%llu
” is an unsigned long long
.
I know it’s hard to believe, but there might be still more format and length specifiers on your system. Check your manual for more information.
int a = 100;
float b = 2.717;
char *c = "beej!";
char d = 'X';
int e = 5;
printf("%d", a); /* "100" */
printf("%f", b); /* "2.717000" */
printf("%s", c); /* "beej!" */
printf("%c", d); /* "X" */
printf("110%%"); /* "110%" */
printf("%10d\n", a); /* "       100" */
printf("%-10d\n", a); /* "100       " */
printf("%*d\n", e, a); /* "  100" */
printf("%.2f\n", b); /* "2.72" */
printf("%hhd\n", d); /* "88" <-- ASCII code for 'X' */
printf("%5d %5.2f %c\n", a, b, d); /* "  100  2.72 X" */
sprintf()
, vprintf()
, vfprintf()
, vsprintf()
scanf()
, fscanf()
Read formatted string, character, or numeric data from the console or from a file.
#include <stdio.h>
int scanf(const char *format, ...);
int fscanf(FILE *stream, const char *format, ...);
The scanf()
family of functions reads data from the console or from a FILE
stream, parses it, and stores the results away in variables you provide in the argument list.
The format string is very similar to that in printf()
in that you can tell it to read a "%d"
, for instance for an int
. But it also has additional capabilities, most notably that it can eat up other characters in the input that you specify in the format string.
But let’s start simple, and look at the most basic usage first before plunging into the depths of the function. We’ll start by reading an int
from the keyboard:
int a;
scanf("%d", &a);
scanf()
obviously needs a pointer to the variable if it is going to change the variable itself, so we use the address-of operator to get the pointer.
In this case, scanf()
walks down the format string, finds a “%d
”, and then knows it needs to read an integer and store it in the next variable in the argument list, a
.
Here are some of the other percent-codes you can put in the format string:
%d
Reads an integer to be stored in an int
. This integer can be signed.
%f
(%e
, %E
, and %g
are equivalent)
Reads a floating point number, to be stored in a float
.
%s
Reads a string. This will stop on the first whitespace character reached, or at the specified field width (e.g. “%10s”), whichever comes first.
And here are some more codes, except these don’t tend to be used as often. You, of course, may use them as often as you wish!
%u
Reads an unsigned integer to be stored in an unsigned int
.
%x
(%X
is equivalent)
Reads an unsigned hexadecimal integer to be stored in an unsigned int
.
%o
Reads an unsigned octal integer to be stored in an unsigned int
.
%i
Like %d
, except you can preface the input with “0x” if it’s a hex number, or “0” if it’s an octal number.
%c
Reads in a character to be stored in a char
. If you specify a field width (e.g. “%12c
”), it will read that many characters, so make sure you have an array that large to hold them.
%p
Reads in a pointer to be stored in a void*
. The format of this pointer should be the same as that which is outputted with printf()
and the “%p
” format specifier.
%n
Reads nothing, but will store the number of characters processed so far into the next int
parameter in the argument list.
%%
Matches a literal percent sign. No conversion of parameters is done. This is simply how you get a standalone percent sign in your string without scanf()
trying to do something with it.
%[
This is about the weirdest format specifier there is. It allows you to specify a set of characters to be stored away (likely in an array of char
s). Conversion stops when a character that is not in the set is matched.
For example, %[0-9]
means “match all numbers zero through nine.” And %[AD-G34]
means “match A, D through G, 3, or 4”.
Now, to convolute matters, you can tell scanf()
to match characters that are not in the set by putting a caret (^
) directly after the %[
and following it with the set, like this: %[^A-C]
, which means “match all characters that are not A through C.”
To match a close square bracket, make it the first character in the set, like this: %[]A-C]
or %[^]A-C]
. (I added the “A-C
” just so it was clear that the “]
” was first in the set.)
To match a hyphen, make it the last character in the set: %[A-C-]
.
So if we wanted to match all characters except “%”, “^”, “]”, “B”, “C”, “D”, “E”, and “-”, we could use this format string: %[^]%^B-E-]
.
So those are the basics! Phew! There’s a lot of stuff to know, but, like I said, a few of these format specifiers are common, and the others are pretty rare.
Got it? Now we can go onto the next—no wait! There’s more! Yes, still more to know about scanf()
. Does it never end? Try to imagine how I feel writing about it!
So you know that “%d
” stores into an int
. But how do you store into a long
, short
, or double
?
Well, like in printf()
, you can add a modifier before the type specifier to tell scanf()
that you have a longer or shorter type. The following is a table of the possible modifiers:
h
The value to be parsed is a short int
or short unsigned
. Example: %hd
or %hu
.
l
The value to be parsed is a long int
or long unsigned
, or double
(for %f
conversions.) Example: %ld
, %lu
, or %lf
.
ll
(“ell ell”)
The value to be parsed is a long long int
or unsigned long long
. Example: %lld
or %llu
.
L
The value to be parsed is a long double
(for %f
conversions). Example: %Lf
.
*
Tells scanf()
to do the conversion specified, but not store it anywhere. It simply discards the data as it reads it. This is what you use if you want scanf()
to eat some data but you don’t want to store it anywhere; you don’t give scanf()
an argument for this conversion. Example: %*d
.
scanf()
returns the number of items assigned into variables. Since assignment into variables stops when given invalid input for a certain format specifier, this can tell you if you’ve input all your data correctly.
Also, scanf()
returns EOF
on end-of-file.
int a;
long int b;
unsigned int c;
float d;
double e;
long double f;
char s[100];
scanf("%d", &a); // store an int
scanf(" %d", &a); // eat any whitespace, then store an int
scanf("%s", s); // store a string
scanf("%Lf", &f); // store a long double
// store an unsigned, read all whitespace, then store a long int:
scanf("%u %ld", &c, &b);
// store an int, read whitespace, read "blendo", read whitespace,
// and store a float:
scanf("%d blendo %f", &a, &d);
// read all whitespace, then store all characters up to a newline
scanf(" %[^\n]", s);
// store a float, read (and ignore) an int, then store a double:
scanf("%f %*d %lf", &d, &e);
// store 10 characters:
scanf("%10c", s);
sscanf()
, vscanf()
, vsscanf()
, vfscanf()
gets()
, fgets()
Read a string from console or file
#include <stdio.h>
char *fgets(char *s, int size, FILE *stream);
char *gets(char *s);
These are functions that will retrieve a newline-terminated string from the console or a file. In other normal words, it reads a line of text. The behavior is slightly different, and, as such, so is the usage. For instance, here is the usage of gets()
:
Don’t use gets()
.
Admittedly, rationale would be useful, yes? For one thing, gets()
doesn’t allow you to specify the length of the buffer to store the string in. This would allow people to keep entering data past the end of your buffer, and believe me, this would be Bad News.
I was going to add another reason, but that’s basically the primary and only reason not to use gets()
. As you might suspect, fgets()
allows you to specify a maximum string length.
One difference here between the two functions: gets()
will devour and throw away the newline at the end of the line, while fgets()
will store it at the end of your string (space permitting).
Here’s an example of using fgets()
from the console, making it behave more like gets()
:
char s[100];
gets(s); // don't use this--read a line (from stdin)
fgets(s, sizeof(s), stdin); // read a line from stdin
In this case, the sizeof()
operator gives us the total size of the array in bytes, and since a char
is a byte, it conveniently gives us the number of characters the array can hold.
Of course, like I keep saying, the string returned from fgets()
probably has a newline at the end that you might not want. You can write a short function to chop the newline off, like so:
char *remove_newline(char *s)
{
    int len = strlen(s);
    if (len > 0 && s[len-1] == '\n') // if there's a newline
        s[len-1] = '\0'; // truncate the string
    return s;
}
So, in summary, use fgets()
to read a line of text from the keyboard or a file, and don’t use gets()
.
Both gets()
and fgets()
return a pointer to the string passed.
On error or end-of-file, the functions return NULL
.
char s[100];
FILE *fp;
gets(s); // read from standard input (don't use this--use fgets()!)
fgets(s, sizeof(s), stdin); // read up to 99 characters from standard input
fp = fopen("datafile.dat", "r"); // (you should error-check this)
fgets(s, 100, fp); // read up to 99 characters from the file datafile.dat
fclose(fp);
fgets(s, 20, stdin); // read a maximum of 19 characters from stdin
getc()
, fgetc()
, getchar()
, puts()
, fputs()
, ungetc()
getc()
, fgetc()
, getchar()
Get a single character from the console or from a file.
#include <stdio.h>
int getc(FILE *stream);
int fgetc(FILE *stream);
int getchar(void);
All of these functions in one way or another, read a single character from the console or from a FILE
. The differences are fairly minor, and here are the descriptions:
getc()
returns a character from the specified FILE
. From a usage standpoint, it’s equivalent to the same fgetc()
call, and fgetc()
is a little more common to see. Only the implementation of the two functions differs.
fgetc()
returns a character from the specified FILE
. From a usage standpoint, it’s equivalent to the same getc()
call, except that fgetc()
is a little more common to see. Only the implementation of the two functions differs.
Yes, I cheated and used cut-n-paste to do that last paragraph.
getchar()
returns a character from stdin
. In fact, it’s the same as calling getc(stdin)
.
All three functions return the unsigned char
that they read, except it’s cast to an int
.
If end-of-file or an error is encountered, all three functions return EOF
.
// read all characters from a file, outputting only the letter 'b's
// it finds in the file
#include <stdio.h>
int main(void)
{
FILE *fp;
int c;
fp = fopen("datafile.txt", "r"); // error check this!
// this while-statement assigns into c, and then checks against EOF:
while((c = fgetc(fp)) != EOF) {
if (c == 'b') {
putchar(c);
}
}
fclose(fp);
return 0;
}
puts()
, fputs()
Write a string to the console or to a file.
#include <stdio.h>
int puts(const char *s);
int fputs(const char *s, FILE *stream);
Both these functions output a NUL-terminated string. puts()
outputs to the console, while fputs()
allows you to specify the file for output.
Both functions return non-negative on success, or EOF
on error.
// read strings from the console and save them in a file
#include <stdio.h>
int main(void)
{
FILE *fp;
char s[100];
fp = fopen("datafile.txt", "w"); // error check this!
while(fgets(s, sizeof(s), stdin) != NULL) { // read a string
fputs(s, fp); // write it to the file we opened
}
fclose(fp);
return 0;
}
putc()
, fputc()
, putchar()
Write a single character to the console or to a file.
#include <stdio.h>
int putc(int c, FILE *stream);
int fputc(int c, FILE *stream);
int putchar(int c);
All three functions output a single character, either to the console or to a FILE
.
putc()
takes a character argument, and outputs it to the specified FILE
. fputc()
does exactly the same thing, and differs from putc()
in implementation only. Most people use fputc()
.
putchar()
writes the character to the console, and is the same as calling putc(c, stdout)
.
All three functions return the character written on success, or EOF
on error.
// print the alphabet
#include <stdio.h>
int main(void)
{
char i;
for(i = 'A'; i <= 'Z'; i++)
putchar(i);
putchar('\n'); // put a newline at the end to make it pretty
return 0;
}
fseek()
, rewind()
Position the file pointer in anticipation of the next read or write.
#include <stdio.h>
int fseek(FILE *stream, long offset, int whence);
void rewind(FILE *stream);
When doing reads and writes to a file, the OS keeps track of where you are in the file using a counter generically known as the file pointer. You can reposition the file pointer to a different point in the file using the fseek()
call. Think of it as a way to randomly access your file.
The first argument is the file in question, obviously. The offset
argument is the position that you want to seek to, and whence
is what that offset is relative to.
Of course, you probably like to think of the offset as being from the beginning of the file. I mean, “Seek to position 3490, that should be 3490 bytes from the beginning of the file.” Well, it can be, but it doesn’t have to be. Imagine the power you’re wielding here. Try to command your enthusiasm.
You can set the value of whence
to one of three things:
SEEK_SET
offset
is relative to the beginning of the file. This is probably what you had in mind anyway, and is the most commonly used value for whence
.
SEEK_CUR
offset
is relative to the current file pointer position. So, in effect, you can say, “Move to my current position plus 30 bytes,” or, “move to my current position minus 20 bytes.”
SEEK_END
offset
is relative to the end of the file. Just like SEEK_SET
except from the other end of the file. Be sure to use negative values for offset
if you want to back up from the end of the file, instead of going past the end into oblivion.
Speaking of seeking off the end of the file, can you do it? Sure thing. In fact, you can seek way off the end and then write a character; the file will be expanded to a size big enough to hold a bunch of zeros way out to that character.
Now that the complicated function is out of the way, what’s this rewind()
that I briefly mentioned? It repositions the file pointer at the beginning of the file:
fseek(fp, 0, SEEK_SET); // same as rewind()
rewind(fp); // same as fseek(fp, 0, SEEK_SET)
For fseek()
, on success zero is returned; -1
is returned on failure.
The call to rewind()
never fails.
fseek(fp, 100, SEEK_SET); // seek to the 100th byte of the file
fseek(fp, -30, SEEK_CUR); // seek backward 30 bytes from the current pos
fseek(fp, -10, SEEK_END); // seek to the 10th byte before the end of file
fseek(fp, 0, SEEK_SET); // seek to the beginning of the file
rewind(fp); // seek to the beginning of the file
ftell()
Tells you where a particular file is about to read from or write to.
#include <stdio.h>
long ftell(FILE *stream);
This function is the opposite of fseek()
. It tells you where in the file the next file operation will occur relative to the beginning of the file.
It’s useful if you want to remember where you are in the file, fseek()
somewhere else, and then come back later. You can take the return value from ftell()
and feed it back into fseek()
(with whence
parameter set to SEEK_SET
) when you want to return to your previous position.
Returns the current offset in the file, or -1
on error.
long pos;
// store the current position in variable "pos":
pos = ftell(fp);
// seek ahead 10 bytes:
fseek(fp, 10, SEEK_CUR);
// do some mysterious writes to the file
do_mysterious_writes_to_file(fp);
// and return to the starting position, stored in "pos":
fseek(fp, pos, SEEK_SET);
fseek()
, rewind()
, fgetpos()
, fsetpos()
fgetpos()
, fsetpos()
Get the current position in a file, or set the current position in a file. Just like ftell()
and fseek()
for most systems.
#include <stdio.h>
int fgetpos(FILE *stream, fpos_t *pos);
int fsetpos(FILE *stream, const fpos_t *pos);
These functions are just like ftell()
and fseek()
, except instead of counting in bytes, they use an opaque data structure to hold positional information about the file. (Opaque, in this case, means you’re not supposed to know what the data type is made up of.)
On virtually every system (and certainly every system that I know of), people don’t use these functions, using ftell()
and fseek()
instead. These functions exist just in case your system can’t remember file positions as a simple byte offset.
Since the pos
variable is opaque, you have to assign to it using the fgetpos()
call itself. Then you save the value for later and use it to reset the position using fsetpos()
.
Both functions return zero on success, and -1
on error.
char s[100];
fpos_t pos;
fgets(s, sizeof(s), fp); // read a line from the file
fgetpos(fp, &pos); // save the position
fgets(s, sizeof(s), fp); // read another line from the file
fsetpos(fp, &pos); // now restore the position to where we saved
ungetc()
Pushes a character back into the input stream.
#include <stdio.h>
int ungetc(int c, FILE *stream);
You know how getc()
reads the next character from a file stream? Well, this is the opposite of that—it pushes a character back into the file stream so that it will show up again on the very next read from the stream, as if you’d never gotten it from getc()
in the first place.
Why, in the name of all that is holy would you want to do that? Perhaps you have a stream of data that you’re reading a character at a time, and you won’t know to stop reading until you get a certain character, but you want to be able to read that character again later. You can read the character, see that it’s what you’re supposed to stop on, and then ungetc()
it so it’ll show up on the next read.
Yeah, that doesn’t happen very often, but there we are.
Here’s the catch: the standard only guarantees that you’ll be able to push back one character. Some implementations might allow you to push back more, but there’s really no way to tell and still be portable.
On success, ungetc()
returns the character you passed to it. On failure, it returns EOF
.
// read a piece of punctuation, then everything after it up to the next
// piece of punctuation. return the punctuation, and store the rest
// in a string
//
// sample input: !foo#bar*baz
// output: return value: '!', s is "foo"
// return value: '#', s is "bar"
// return value: '*', s is "baz"
//
#include <stdio.h>
#include <ctype.h> // for ispunct()
int read_punctstring(FILE *fp, char *s)
{
    int origpunct, c; // int, not char, so EOF can be told apart from data
    origpunct = fgetc(fp);
    if (origpunct == EOF) // return EOF on end-of-file
        return EOF;
    while(c = fgetc(fp), c != EOF && !ispunct(c)) {
        *s++ = c; // save it in the string
    }
    *s = '\0'; // nul-terminate the string!
    // if we read punctuation last, ungetc it so we can fgetc it next
    // time:
    if (c != EOF)
        ungetc(c, fp);
    return origpunct;
}
fread()
Read binary data from a file.
#include <stdio.h>
size_t fread(void *p, size_t size, size_t nmemb, FILE *stream);
You might remember that you can call fopen()
with the “b
” flag in the open mode string to open the file in “binary” mode. Files open in not-binary (ASCII or text mode) can be read using standard character-oriented calls like fgetc()
or fgets()
. Files open in binary mode are typically read using the fread()
function.
All this function does is says, “Hey, read this many things where each thing is a certain number of bytes, and store the whole mess of them in memory starting at this pointer.”
This can be very useful, believe me, when you want to do something like store 20 int
s in a file.
But wait—can’t you use fprintf()
with the “%d
” format specifier to save the int
s to a text file and store them that way? Yes, sure. That has the advantage that a human can open the file and read the numbers. It has the disadvantage that it’s slower to convert the numbers from int
s to text and that the numbers are likely to take more space in the file. (Remember, an int
is likely 4 bytes, but the string “12345678” is 8 bytes.)
So storing the binary data can certainly be more compact and faster to read.
(As for the prototype, what is this size_t
you see floating around? It’s short for “size type” which is a data type defined to hold the size of something. Great—would I stop beating around the bush already and give you the straight story?! Ok, size_t
is an unsigned integer type, often the same as unsigned long or unsigned int
.)
This function returns the number of items successfully read. If all requested items are read, the return value will be equal to that of the parameter nmemb
. If end-of-file is reached first, the return value will be smaller than that, possibly zero.
To make you confused, it will also come up short if there’s an error. You can use the functions feof()
or ferror()
to tell which one really happened.
// read 10 numbers from a file and store them in an array
#include <stdio.h>
int main(void)
{
    int i;
    int n[10];
    FILE *fp;
    fp = fopen("binarynumbers.dat", "rb"); // (you should error-check this)
    fread(n, sizeof(int), 10, fp); // read 10 ints
    fclose(fp);
    // print them out:
    for(i = 0; i < 10; i++)
        printf("n[%d] == %d\n", i, n[i]);
    return 0;
}
fopen()
, fwrite()
, feof()
, ferror()
fwrite()
Write binary data to a file.
#include <stdio.h>
size_t fwrite(const void *p, size_t size, size_t nmemb, FILE *stream);
This is the counterpart to the fread()
function. It writes blocks of binary data to disk. For a description of what this means, see the entry for fread()
.
fwrite()
returns the number of items successfully written, which should hopefully be nmemb
that you passed in. A smaller count (possibly zero) means there was an error.
#include <stdio.h>
#include <stdlib.h> // for rand()
// save 10 random numbers to a file
int main(void)
{
    int i;
    int r[10];
    FILE *fp;
    // populate the array with random numbers:
    for(i = 0; i < 10; i++) {
        r[i] = rand();
    }
    // save the random numbers (10 ints) to the file
    fp = fopen("binaryfile.dat", "wb");
    fwrite(r, sizeof(int), 10, fp); // write 10 ints
    fclose(fp);
    return 0;
}
feof()
, ferror()
,clearerr()
Determine if a file has reached end-of-file or if an error has occurred.
#include <stdio.h>
int feof(FILE *stream);
int ferror(FILE *stream);
void clearerr(FILE *stream);
Each FILE*
that you use to read and write data from and to a file contains flags that the system sets when certain events occur. If you get an error, it sets the error flag; if you reach the end of the file during a read, it sets the EOF flag. Pretty simple really.
The functions feof()
and ferror()
give you a simple way to test these flags: they’ll return non-zero (true) if they’re set.
Once the flags are set for a particular stream, they stay that way until you call clearerr()
to clear them.
feof()
and ferror()
return non-zero (true) if the file has reached EOF or there has been an error, respectively.
#include <stdio.h>
// read binary data, checking for eof or error
int main(void)
{
    int a;
    FILE *fp;
    fp = fopen("binaryints.dat", "rb");
    if (fp == NULL) { // nothing to read if the open failed
        perror("binaryints.dat");
        return 1;
    }
    // read single ints at a time, stopping on EOF or error:
    while(fread(&a, sizeof(int), 1, fp), !feof(fp) && !ferror(fp)) {
        printf("I read %d\n", a);
    }
    if (feof(fp))
        printf("End of file was reached.\n");
    if (ferror(fp))
        printf("An error occurred.\n");
    fclose(fp);
    return 0;
}
perror()
Print the last error message to stderr
#include <stdio.h>
#include <errno.h> // only if you want to directly use the "errno" var
void perror(const char *s);
Many functions, when they encounter an error condition for whatever reason, will set a global variable called errno
for you. errno
is just an integer representing a unique error.
But to you, the user, some number isn’t generally very useful. For this reason, you can call perror()
after an error occurs to print what error has actually happened in a nice human-readable string.
And to help you along, you can pass a parameter, s
, that will be prepended to the error string for you.
One more clever trick you can do is check the value of the errno
(you have to include errno.h
to see it) for specific errors and have your code do different things. Perhaps you want to ignore certain errors but not others, for instance.
The catch is that different systems define different values for errno
, so it’s not very portable. The standard only defines a few math-related values, and not others. You’ll have to check your local man-pages for what works on your system.
Returns nothing at all! Sorry!
fseek()
returns -1
on error, and sets errno
, so let’s use it. Seeking on stdin
makes no sense, so it should generate an error:
#include <stdio.h>
#include <errno.h> // must include this to see "errno" in this example
int main(void)
{
if (fseek(stdin, 10L, SEEK_SET) < 0)
perror("fseek");
fclose(stdin); // stop using this stream
if (fseek(stdin, 20L, SEEK_CUR) < 0) {
// specifically check errno to see what kind of
// error happened...this works on Linux, but your
// mileage may vary on other systems!
if (errno == EBADF) {
perror("fseek again, EBADF");
} else {
perror("fseek again");
}
}
return 0;
}
And the output is:
fseek: Illegal seek
fseek again, EBADF: Bad file descriptor
remove()
Delete a file
#include <stdio.h>
int remove(const char *filename);
Removes the specified file from the filesystem. It just deletes it. Nothing magical. Simply call this function and sacrifice a small chicken and the requested file will be deleted.
Returns zero on success, and -1
on error, setting errno
.
char *filename = "/home/beej/evidence.txt";
remove(filename);
remove("/disks/d/Windows/system.ini");
rename()
Renames a file and optionally moves it to a new location
#include <stdio.h>
int rename(const char *old, const char *new);
Renames the file old
to name new
. Use this function if you’re tired of the old name of the file, and you are ready for a change. Sometimes simply renaming your files makes them feel new again, and could save you money over just getting all new files!
One other cool thing you can do with this function is actually move a file from one directory to another by specifying a different path for the new name.
Returns zero on success, and -1
on error, setting errno
.
rename("foo", "bar"); // changes the name of the file "foo" to "bar"
// the following moves the file "evidence.txt" from "/tmp" to
// "/home/beej", and also renames it to "nothing.txt":
rename("/tmp/evidence.txt", "/home/beej/nothing.txt");
tmpfile()
Create a temporary file
#include <stdio.h>
FILE *tmpfile(void);
This is a nifty little function that will create and open a temporary file for you, and will return a FILE*
to it that you can use. The file is opened with mode “r+b
”, so it’s suitable for reading, writing, and binary data.
By using a little magic, the temp file is automatically deleted when it is fclose()
’d or when your program exits. (Specifically, tmpfile()
unlinks the file right after it opens it. If you don’t know what that means, it won’t affect your tmpfile()
skill, but hey, be curious! It’s for your own good!)
This function returns an open FILE*
on success, or NULL
on failure.
#include <stdio.h>
int main(void)
{
FILE *temp;
char s[128];
temp = tmpfile();
fprintf(temp, "What is the frequency, Alexander?\n");
rewind(temp); // back to the beginning
fscanf(temp, "%s", s); // read it back out
fclose(temp); // close (and magically delete)
return 0;
}
tmpnam()
Generate a unique name for a temporary file
#include <stdio.h>
char *tmpnam(char *s);
This function takes a good hard look at the existing files on your system, and comes up with a unique name for a new file that is suitable for temporary file usage.
Let’s say you have a program that needs to store off some data for a short time so you create a temporary file for the data, to be deleted when the program is done running. Now imagine that you called this file foo.txt
. This is all well and good, except what if a user already has a file called foo.txt
in the directory that you ran your program from? You’d overwrite their file, and they’d be unhappy and stalk you forever. And you wouldn’t want that, now would you?
Ok, so you get wise, and you decide to put the file in /tmp
so that it won’t overwrite any important content. But wait! What if some other user is running your program at the same time and they both want to use that filename? Or what if some other program has already created that file?
See, all of these scary problems can be completely avoided if you just use tmpnam()
to get a safe-ready-to-use filename.
So how do you use it? There are two amazing ways. One, you can declare an array (or malloc()
it—whatever) that is big enough to hold the temporary file name. How big is that? Fortunately there has been a macro defined for you, L_tmpnam
, which is how big the array must be.
And the second way: just pass NULL
for the filename. tmpnam()
will store the temporary name in a static array and return a pointer to that. Subsequent calls with a NULL
argument will overwrite the static array, so be sure you’re done using it before you call tmpnam()
again.
Again, this function just makes a file name for you. It’s up to you to later fopen()
the file and use it.
One more note: some compilers warn against using tmpnam()
since some systems have better functions (like the Unix function mkstemp()
.) You might want to check your local documentation to see if there’s a better option. Linux documentation goes so far as to say, “Never use this function. Use mkstemp()
instead.”
I, however, am going to be a jerk and not talk about mkstemp()
because it’s not in the standard I’m writing about. Nyaah.
Returns a pointer to the temporary file name. This is either a pointer to the string you passed in, or a pointer to internal static storage if you passed in NULL
. On error (like it can’t find any temporary name that is unique), tmpnam()
returns NULL
.
char filename[L_tmpnam];
char *another_filename;
if (tmpnam(filename) != NULL)
printf("We got a temp file named: \"%s\"\n", filename);
else
printf("Something went wrong, and we got nothing!\n");
another_filename = tmpnam(NULL);
printf("We got another temp file named: \"%s\"\n", another_filename);
printf("And we didn't error check it because we're too lazy!\n");
On my Linux system, this generates the following output:
We got a temp file named: "/tmp/filew9PMuZ"
We got another temp file named: "/tmp/fileOwrgPO"
And we didn't error check it because we're too lazy!
setbuf()
, setvbuf()
Configure buffering for standard I/O operations
#include <stdio.h>
void setbuf(FILE *stream, char *buf);
int setvbuf(FILE *stream, char *buf, int mode, size_t size);
Now brace yourself because this might come as a bit of a surprise to you: when you printf()
or fprintf()
or use any I/O functions like that, it does not normally work immediately. For the sake of efficiency, and to irritate you, the I/O on a FILE*
stream is buffered away safely until certain conditions are met, and only then is the actual I/O performed. The functions setbuf()
and setvbuf()
allow you to change those conditions and the buffering behavior.
So what are the different buffering behaviors? The biggest is called “full buffering”, wherein all I/O is stored in a big buffer until it is full, and then it is dumped out to disk (or whatever the file is). The next biggest is called “line buffering”; with line buffering, I/O is stored up a line at a time (until a newline ('\n'
) character is encountered) and then that line is processed. Finally, we have “unbuffered”, which means I/O is processed immediately with every standard I/O call.
You might have seen and wondered why you could call putchar()
time and time again and not see any output until you called putchar('\n')
; that’s right—stdout
is line-buffered!
Since setbuf()
is just a simplified version of setvbuf()
, we’ll talk about setvbuf()
first.
The stream
is the FILE*
you wish to modify. The standard says you must make your call to setvbuf()
before any I/O operation is performed on the stream, or else by then it might be too late.
The next argument, buf
allows you to make your own buffer space (using malloc()
or just a char
array) to use for buffering. If you don’t care to do this, just set buf
to NULL
.
Now we get to the real meat of the function: mode
allows you to choose what kind of buffering you want to use on this stream
. Set it to one of the following:
_IOFBF
stream
will be fully buffered.
_IOLBF
stream
will be line buffered.
_IONBF
stream
will be unbuffered.
Finally, the size
argument is the size of the array you passed in for buf
…unless you passed NULL
for buf
, in which case it will resize the existing buffer to the size you specify.
Now what about this lesser function setbuf()
? It’s just like calling setvbuf()
with some specific parameters, except setbuf()
doesn’t return a value. The following example shows the equivalency:
// these are the same:
setbuf(stream, buf); // fully buffered
setvbuf(stream, buf, _IOFBF, BUFSIZ);
// and these are the same:
setbuf(stream, NULL); // unbuffered
setvbuf(stream, NULL, _IONBF, BUFSIZ);
setvbuf()
returns zero on success, and nonzero on failure. setbuf()
has no return value.
FILE *fp;
char lineBuf[1024];
fp = fopen("somefile.txt", "r");
setvbuf(fp, lineBuf, _IOLBF, 1024); // set to line buffering
// ...
fclose(fp);
fp = fopen("another.dat", "rb");
setbuf(fp, NULL); // set to unbuffered
// ...
fclose(fp);
fflush()
Process all buffered I/O for a stream right now
#include <stdio.h>
int fflush(FILE *stream);
When you do standard I/O, as mentioned in the section on the setvbuf()
function, it is usually stored in a buffer until a line has been entered or the buffer is full or the file is closed. Sometimes, though, you really want the output to happen right this second, and not wait around in the buffer. You can force this to happen by calling fflush()
.
The advantage to buffering is that the OS doesn’t need to hit the disk every time you call fprintf()
. The disadvantage is that if you look at the file on the disk after the fprintf()
call, it might not have actually been written to yet. (“I called fputs()
, but the file is still zero bytes long! Why?!”) In virtually all circumstances, the advantages of buffering outweigh the disadvantages; for those other circumstances, however, use fflush()
.
Note that fflush()
is only designed to work on output streams according to the spec. What will happen if you try it on an input stream? Use your spooky voice: who knooooows!
On success, fflush()
returns zero. If there’s an error, it returns EOF
and sets the error condition for the stream (see ferror()
.)
In this example, we’re going to use the carriage return, which is '\r'
. This is like newline ('\n'
), except that it doesn’t move to the next line. It just returns to the front of the current line.
What we’re going to do is a little text-based status bar like so many command line programs implement. It’ll do a countdown from 10 to 0 printing over itself on the same line.
What is the catch and what does this have to do with fflush()
? The catch is that the terminal is most likely “line buffered” (see the section on setvbuf()
for more info), meaning that it won’t actually display anything until it prints a newline. But we’re not printing newlines; we’re just printing carriage returns, so we need a way to force the output to occur even though we’re on the same line. Yes, it’s fflush()!
#include <stdio.h>
#include <unistd.h> // for prototype for sleep()
int main(void)
{
int count;
for(count = 10; count >= 0; count--) {
printf("\rSeconds until launch: "); // lead with a CR
if (count > 0)
printf("%2d", count);
else
printf("blastoff!\n");
// force output now!!
fflush(stdout);
// the sleep() function is non-standard, but virtually every
// system implements it--it simply delays for the specified
// number of seconds:
sleep(1);
}
return 0;
}
As has been mentioned earlier in the guide, a string in C is a sequence of bytes in memory, terminated by a NUL character (‘\0
’). The NUL at the end is important, since it lets all these string functions (and printf()
and puts()
and everything else that deals with a string) know where the end of the string actually is.
Fortunately, when you operate on a string using one of these many functions available to you, they add the NUL terminator on for you, so you actually rarely have to keep track of it yourself. (Sometimes you do, especially if you’re building a string from scratch a character at a time or something.)
In this section you’ll find functions for pulling substrings out of strings, concatenating strings together, getting the length of a string, and so forth and so on.
strlen()
Returns the length of a string.
#include <string.h>
size_t strlen(const char *s);
This function returns the length of the passed null-terminated string (not counting the NUL character at the end). It does this by walking down the string and counting the bytes until the NUL character, so it’s a little time consuming. If you have to get the length of the same string repeatedly, save it off in a variable somewhere.
Returns the number of characters in the string.
char *s = "Hello, world!"; // 13 characters
// prints "The string is 13 characters long.":
printf("The string is %zu characters long.\n", strlen(s)); // %zu is for size_t
strcmp()
, strncmp()
Compare two strings and return a difference.
#include <string.h>
int strcmp(const char *s1, const char *s2);
int strncmp(const char *s1, const char *s2, size_t n);
Both these functions compare two strings. strcmp()
compares the entire string down to the end, while strncmp()
only compares the first n
characters of the strings.
It’s a little funky what they return. Basically it’s a difference of the strings, so if the strings are the same, it’ll return zero (since the difference is zero). It’ll return non-zero if the strings differ; basically it will find the first mismatched character and return less-than zero if that character in s1
is less than the corresponding character in s2
. It’ll return greater-than zero if that character in s1
is greater than that in s2
.
For the most part, people just check to see if the return value is zero or not, because, more often than not, people are only curious if strings are the same.
With a small wrapper that matches the comparison function signature, these functions can be used to sort with qsort()
if you have an array of char*
s you want to sort.
Returns zero if the strings are the same, less-than zero if the first different character in s1
is less than that in s2
, or greater-than zero if the first different character in s1
is greater than that in s2
.
char *s1 = "Muffin";
char *s2 = "Muffin Sandwich";
char *s3 = "Muffin";
strcmp("Biscuits", "Kittens"); // returns < 0 since 'B' < 'K'
strcmp("Kittens", "Biscuits"); // returns > 0 since 'K' > 'B'
if (strcmp(s1, s2) == 0)
printf("This won't get printed because the strings differ");
if (strcmp(s1, s3) == 0)
printf("This will print because s1 and s3 are the same");
// this is a little weird...but if the strings are the same, it'll
// return zero, which can also be thought of as "false". Not-false
// is "true", so (!strcmp()) will be true if the strings are the
// same. yes, it's odd, but you see this all the time in the wild
// so you might as well get used to it:
if (!strcmp(s1, s3))
printf("The strings are the same!");
if (!strncmp(s1, s2, 6))
printf("The first 6 characters of s1 and s2 are the same");
strcat()
, strncat()
Concatenate two strings into a single string.
#include <string.h>
char *strcat(char *dest, const char *src);
char *strncat(char *dest, const char *src, size_t n);
“Concatenate”, for those not in the know, means to “stick together”. These functions take two strings, and stick them together, storing the result in the first string.
These functions don’t take the size of the first string into account when they do the concatenation. What this means in practical terms is that you can try to stick a 2 megabyte string into a 10 byte space. This will lead to unintended consequences, unless you intended to lead to unintended consequences, in which case it will lead to intended unintended consequences.
Technical banter aside, your boss and/or professor will be irate.
If you want to make sure you don’t overrun the first string, be sure to check the lengths of the strings first and use some highly technical subtraction to make sure things fit.
You can actually only concatenate the first n
characters of the second string by using strncat()
and specifying the maximum number of characters to copy.
Both functions return a pointer to the destination string, like most of the string-oriented functions.
char dest[20] = "Hello";
char *src = ", World!";
char numbers[] = "12345678";
printf("dest before strcat: \"%s\"\n", dest); // "Hello"
strcat(dest, src);
printf("dest after strcat: \"%s\"\n", dest); // "Hello, World!"
strncat(dest, numbers, 3); // strcat first 3 chars of numbers
printf("dest after strncat: \"%s\"\n", dest); // "Hello, World!123"
Notice I mixed and matched pointer and array notation there with src
and numbers
; this is just fine with string functions.
strchr()
, strrchr()
Find a character in a string.
#include <string.h>
char *strchr(const char *str, int c);
char *strrchr(const char *str, int c);
The functions strchr()
and strrchr()
find the first or last occurrence of a letter in a string, respectively. (The extra “r” in strrchr()
stands for “reverse”–it looks starting at the end of the string and working backward.) Each function returns a pointer to the char in question, or NULL
if the letter isn’t found in the string.
Quite straightforward.
One thing you can do if you want to find the next occurrence of the letter after finding the first is call strchr() again with the previous return value plus one. (Remember pointer arithmetic?) This trick doesn’t work in reverse, though: strrchr() always searches all the way to the end of the string you hand it, so backing up the pointer just finds the same spot again. Don’t accidentally go off the end of the string!
Returns a pointer to the occurrence of the letter in the string, or NULL
if the letter is not found.
// "Hello, world!"
//       ^  ^
//       A  B
char *str = "Hello, world!";
char *p;
p = strchr(str, ','); // p now points at position A
p = strrchr(str, 'o'); // p now points at position B
// repeatedly find all occurrences of the letter 'B'
char *str = "A BIG BROWN BAT BIT BEEJ";
char *p;
for(p = strchr(str, 'B'); p != NULL; p = strchr(p + 1, 'B')) {
    printf("Found a 'B' here: %s\n", p);
}
// output is:
//
// Found a 'B' here: BIG BROWN BAT BIT BEEJ
// Found a 'B' here: BROWN BAT BIT BEEJ
// Found a 'B' here: BAT BIT BEEJ
// Found a 'B' here: BIT BEEJ
// Found a 'B' here: BEEJ
strcpy()
, strncpy()
Copy a string
#include <string.h>
char *strcpy(char *dest, const char *src);
char *strncpy(char *dest, const char *src, size_t n);
These functions copy a string from one address to another, stopping at the NUL terminator on the src
string.
strncpy()
is just like strcpy()
, except only the first n
characters are actually copied. Beware that if you hit the limit, n
before you get a NUL terminator on the src
string, your dest
string won’t be NUL-terminated. Beware! BEWARE!
(If the src
string has fewer than n
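characters, it works just like strcpy()
, except that strncpy() also pads dest with NUL characters until n characters in total have been written.)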
You can terminate the string yourself by sticking the '\0'
in there yourself:
char s[10];
char foo[] = "My hovercraft is full of eels."; // more than 10 chars
strncpy(s, foo, 9); // only copy 9 chars into positions 0-8
s[9] = '\0'; // position 9 gets the terminator
Both functions return dest
for your convenience, at no extra charge.
char *src = "hockey hockey hockey hockey hockey hockey hockey hockey";
char dest[20];
int len;
strcpy(dest, "I like "); // dest is now "I like "
len = strlen(dest);
// tricky, but let's use some pointer arithmetic and math to append
// as much of src as possible onto the end of dest, -1 on the length to
// leave room for the terminator:
strncpy(dest+len, src, sizeof(dest)-len-1);
// remember that sizeof() returns the size of the array in bytes
// and a char is a byte:
dest[sizeof(dest)-1] = '\0'; // terminate
// dest is now:         v null terminator
//   I like hockey hocke
//   01234567890123456789012345
strspn()
, strcspn()
Return the length of a string consisting entirely of a set of characters, or of not a set of characters.
#include <string.h>
size_t strspn(const char *str, const char *accept);
size_t strcspn(const char *str, const char *reject);
strspn()
will tell you the length of a string consisting entirely of the set of characters in accept
. That is, it starts walking down str
until it finds a character that is not in the set (that is, a character that is not to be accepted), and returns the length of the string so far.
strcspn()
works much the same way, except that it walks down str
until it finds a character in the reject
set (that is, a character that is to be rejected.) It then returns the length of the string so far.
The length of the string consisting of all characters in accept
(for strspn()
), or the length of the string consisting of all characters except reject
(for strcspn()
).
char str1[] = "a banana";
char str2[] = "the bolivian navy on manuvers in the south pacific";
size_t n;
// how many letters in str1 until we reach something that's not a vowel?
n = strspn(str1, "aeiou"); // n == 1, just "a"
// how many letters in str1 until we reach something that's not a, b,
// or space?
n = strspn(str1, "ab "); // n == 4, "a ba"
// how many letters in str2 before we get a "y"?
n = strcspn(str2, "y"); // n == 16, "the bolivian nav"
strstr()
Find a string in another string.
#include <string.h>
char *strstr(const char *str, const char *substr);
Let’s say you have a big long string, and you want to find a word, or whatever substring strikes your fancy, inside the first string. Then strstr()
is for you! It’ll return a pointer to the substr
within the str
!
You get back a pointer to the occurrence of the substr
inside the str
, or NULL
if the substring can’t be found.
char *str = "The quick brown fox jumped over the lazy dogs.";
char *p;
p = strstr(str, "lazy");
printf("%s\n", p); // "lazy dogs."
// p is NULL after this, since the string "wombat" isn't in str:
p = strstr(str, "wombat");
strchr()
, strrchr()
, strspn()
, strcspn()
strtok()
Tokenize a string.
#include <string.h>
char *strtok(char *str, const char *delim);
If you have a string that has a bunch of separators in it, and you want to break that string up into individual pieces, this function can do it for you.
The usage is a little bit weird, but at least whenever you see the function in the wild, it’s consistently weird.
Basically, the first time you call it, you pass the string, str
that you want to break up in as the first argument. For each subsequent call to get more tokens out of the string, you pass NULL
. This is a little weird, but strtok()
remembers the string you originally passed in, and continues to strip tokens off for you.
Note that it does this by actually putting a NUL terminator after the token, and then returning a pointer to the start of the token. So the original string you pass in is destroyed, as it were. If you need to preserve the string, be sure to pass a copy of it to strtok()
so the original isn’t destroyed.
A pointer to the next token. If you’re out of tokens, NULL
is returned.
// break up the string into a series of space or
// punctuation-separated words
char str[] = "Where is my bacon, dude?"; // an array, since strtok() modifies it
char *token;
// Note that the following if-do-while construct is very very
// very very very common to see when using strtok().
// grab the first token (making sure there is a first token!)
if ((token = strtok(str, ".,?! ")) != NULL) {
do {
printf("Word: \"%s\"\n", token);
// now, the while continuation condition grabs the
// next token (by passing NULL as the first param)
// and continues if the token's not NULL:
} while ((token = strtok(NULL, ".,?! ")) != NULL);
}
// output is:
//
// Word: "Where"
// Word: "is"
// Word: "my"
// Word: "bacon"
// Word: "dude"
//
strchr()
, strrchr()
, strspn()
, strcspn()
It’s your favorite subject: Mathematics! Hello, I’m Doctor Math, and I’ll be making math FUN and EASY!
[vomiting sounds]
Ok, I know math isn’t the grandest thing for some of you out there, but these are merely functions that quickly and easily do math you either know, want, or just don’t care about. That pretty much covers it.
For you trig fans out there, we’ve got all manner of things, including sine, cosine, tangent, and, conversely, arc sine, arc cosine, and arc tangent. That’s very exciting.
And for normal people, there is a slurry of your run-of-the-mill functions that will serve your general purpose mathematical needs, including absolute value, hypotenuse length, square root, cube root, and power.
In short, you’re a fricking MATHEMATICAL DEITY!
Oh wait, before then, I should tell you that the trig functions have three variants with different suffixes. The “f” suffix (e.g. sinf()
) returns a float
, while the “l” suffix (e.g. sinl()
) returns a massive and nicely accurate long double
. Normal sin()
just returns a double
. These variants were standardized in C99, so they should be supported by modern compilers.
Also, there are several values that are defined in the math.h
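header file on most systems. (They come from POSIX rather than the C standard, so they aren’t guaranteed to exist everywhere.)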
Constant | C Macro Equivalent |
---|---|
\(e\) | M_E |
\(\log_2 e\) | M_LOG2E |
\(\log_{10} e\) | M_LOG10E |
\(\log_e 2\) | M_LN2 |
\(\log_e 10\) | M_LN10 |
\(\pi\) | M_PI |
\(\pi/2\) | M_PI_2 |
\(\pi/4\) | M_PI_4 |
\(1/\pi\) | M_1_PI |
\(2/\pi\) | M_2_PI |
\(2/\sqrt\pi\) | M_2_SQRTPI |
\(\sqrt2\) | M_SQRT2 |
\(1/\sqrt2\) | M_SQRT1_2 |
sin()
, sinf()
, sinl()
Calculate the sine of a number.
#include <math.h>
double sin(double x);
float sinf(float x);
long double sinl(long double x);
Calculates the sine of the value x
, where x
is in radians.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;
Returns the sine of x
. The variants return different types.
double sinx;
long double ldsinx;
sinx = sin(3490.0); // round and round we go!
ldsinx = sinl((long double)3.490);
cos()
, cosf()
, cosl()
Calculate the cosine of a number.
#include <math.h>
double cos(double x);
float cosf(float x);
long double cosl(long double x);
Calculates the cosine of the value x
, where x
is in radians.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;
Returns the cosine of x
. The variants return different types.
double cosx;
long double ldcosx;
cosx = cos(3490.0); // round and round we go!
ldcosx = cosl((long double)3.490);
tan()
, tanf()
, tanl()
Calculate the tangent of a number.
#include <math.h>
double tan(double x);
float tanf(float x);
long double tanl(long double x);
Calculates the tangent of the value x
, where x
is in radians.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;
Returns the tangent of x
. The variants return different types.
double tanx;
long double ldtanx;
tanx = tan(3490.0); // round and round we go!
ldtanx = tanl((long double)3.490);
asin()
, asinf()
, asinl()
Calculate the arc sine of a number.
#include <math.h>
double asin(double x);
float asinf(float x);
long double asinl(long double x);
Calculates the arc sine of a number in radians. (That is, the value whose sine is x
.) The number must be in the range -1.0 to 1.0.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;
Returns the arc sine of x
, unless x
is out of range. In that case, errno
will be set to EDOM and the return value will be NaN. The variants return different types.
acos()
, atan()
, atan2()
, sin()
acos()
, acosf()
, acosl()
Calculate the arc cosine of a number.
#include <math.h>
double acos(double x);
float acosf(float x);
long double acosl(long double x);
Calculates the arc cosine of a number in radians. (That is, the value whose cosine is x
.) The number must be in the range -1.0 to 1.0.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;
Returns the arc cosine of x
, unless x
is out of range. In that case, errno
will be set to EDOM and the return value will be NaN. The variants return different types.
asin()
, atan()
, atan2()
, cos()
atan()
, atanf()
, atanl()
,atan2()
, atan2f()
, atan2l()
Calculate the arc tangent of a number.
#include <math.h>
double atan(double x);
float atanf(float x);
long double atanl(long double x);
double atan2(double y, double x);
float atan2f(float y, float x);
long double atan2l(long double y, long double x);
Calculates the arc tangent of a number in radians. (That is, the value whose tangent is x
.)
The atan2()
variants are pretty much the same as using atan()
with y
/x
as the argument…except that atan2()
will use those values to determine the correct quadrant of the result.
For those of you who don’t remember, radians are another way of measuring an angle, just like degrees. To convert from degrees to radians or the other way around, use the following code:
degrees = radians * 180.0f / M_PI;
radians = degrees * M_PI / 180;
The atan()
functions return the arc tangent of x
, which will be between PI/2 and -PI/2. The atan2()
functions return an angle between PI and -PI.
double atanx;
long double ldatanx;
atanx = atan(0.2);
ldatanx = atanl((long double)0.3);
atanx = atan2(0.2, 0.1);  // atan2() takes two arguments: y, then x
ldatanx = atan2l((long double)0.3, (long double)0.4);
sqrt()
Calculate the square root of a number.
#include <math.h>
double sqrt(double x);
float sqrtf(float x);
long double sqrtl(long double x);
Computes the square root of a number. To those of you who don’t know what a square root is, I’m not going to explain. Suffice it to say, the square root of a number delivers a value that when squared (multiplied by itself) results in the original number.
Ok, fine—I did explain it after all, but only because I wanted to show off. It’s not like I’m giving you examples or anything, such as the square root of nine is three, because when you multiply three by three you get nine, or anything like that. No examples. I hate examples!
And I suppose you wanted some actual practical information here as well. You can see the usual trio of functions here—they all compute square root, but they take different types as arguments. Pretty straightforward, really.
Returns (and I know this must be something of a surprise to you) the square root of x
. If you try to be smart and pass a negative number in for x
, the global variable errno
will be set to EDOM
(which stands for DOMain Error, not some kind of cheese.)
// example usage of sqrt()
float something = 10;
double x1 = 8.2, y1 = -5.4;
double x2 = 3.8, y2 = 34.9;
double dx, dy;
printf("square root of 10 is %.2f\n", sqrtf(something));
dx = x2 - x1;
dy = y2 - y1;
printf("distance between points (x1, y1) and (x2, y2): %.2f\n",
sqrt(dx*dx + dy*dy));
And the output is:
square root of 10 is 3.16
distance between points (x1, y1) and (x2, y2): 40.54
http://www.ioccc.org/↩︎
https://en.wikipedia.org/wiki/Python_(programming_language)↩︎
https://en.wikipedia.org/wiki/JavaScript↩︎
https://en.wikipedia.org/wiki/Java_(programming_language)↩︎
https://en.wikipedia.org/wiki/Rust_(programming_language)↩︎
https://en.wikipedia.org/wiki/Go_(programming_language)↩︎
https://en.wikipedia.org/wiki/Swift_(programming_language)↩︎
https://en.wikipedia.org/wiki/Objective-C↩︎
https://en.wikipedia.org/wiki/ANSI_C↩︎
https://en.wikipedia.org/wiki/POSIX↩︎
https://visualstudio.microsoft.com/vs/community/↩︎
https://docs.microsoft.com/en-us/windows/wsl/install-win10↩︎
https://developer.apple.com/xcode/↩︎
http://beej.us/guide/bgc/↩︎
https://en.wikipedia.org/wiki/Assembly_language↩︎
https://en.wikipedia.org/wiki/Bare_machine↩︎
https://en.wikipedia.org/wiki/Operating_system↩︎
https://en.wikipedia.org/wiki/Embedded_system↩︎
https://en.wikipedia.org/wiki/Rust_(programming_language)↩︎
https://en.wikipedia.org/wiki/Grok↩︎
I know someone will fight me on that, but it’s gotta be at least in the top three, right?↩︎
https://en.wikipedia.org/wiki/Assembly_language↩︎
https://en.wikipedia.org/wiki/Machine_code↩︎
https://en.wikipedia.org/wiki/Unix↩︎
If you don’t give it an output filename, it will export to a file called a.out
by default—this filename has its roots deep in Unix history.↩︎
A “byte” is an 8-bit binary number. Think of it as an integer that can only hold the values from 0 to 255, inclusive.↩︎
I’m seriously oversimplifying how modern memory works, here. But the mental model works, so please forgive me.↩︎
Read this as “pointer to a char” or “char pointer”. “Char” for character. Though I can’t find a study, it seems anecdotally most people pronounce this as “char”, a minority say “car”, and a handful say “care”. We’ll talk more about pointers later.↩︎
Colloquially, we say they have “random” values, but they aren’t truly—or even pseudo-truly—random numbers.↩︎
Now, technically speaking, the C specification doesn’t say anything about a stack. It’s true. Your system might not use a stack deep-down for function calls. But it either does or looks like it does, and every single C programmer on the planet will know what you’re talking about when you talk about “the stack”. It would be just mean for me to keep you in the dark. Plus, the stack analogy is excellent for describing how recursion works.↩︎
A byte is a number made up of no more than 8 binary digits, or bits for short. This means in decimal digits just like grandma used to use, it can hold an unsigned number between 0 and 255, inclusive.↩︎
The order that bytes come in is referred to as the endianness of the number. Common ones are big endian and little endian. This usually isn’t something you need to worry about.↩︎
That is, base 16 with digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F.↩︎
That’s not all! It’s used in /*comments*/
and multiplication!↩︎
https://en.wikipedia.org/wiki/Null_pointer#History↩︎
https://en.wikipedia.org/wiki/Sentinel_value↩︎
The pointer type variables are a
, d
, f
, and i
, because those are the ones with *
in front of them.↩︎
These days, anyway.↩︎
Again, not really, but variable-length arrays—of which I’m not really a fan—are a story for another time.↩︎
In the good old MS-DOS days before memory protection was a thing, I was writing some particularly abusive C code that deliberately engaged in all kinds of undefined behavior. But I knew what I was doing, and things were working pretty well. Until I made a misstep that caused a lockup and, as I found upon reboot, nuked all my BIOS settings. That was fun. (Shout-out to @man for those fun times.)↩︎
There are a lot of things that cause undefined behavior, not just out-of-bounds array accesses. This is what makes the C language so exciting.↩︎
https://en.wikipedia.org/wiki/Row-_and_column-major_order↩︎
This is technically incorrect, as a pointer to an array and a pointer to the first element of an array have different types. But we can burn that bridge when we get to it.↩︎
C99 §6.7.5.2¶1 requires it be greater than zero. But you might see code out there with arrays declared of zero length at the end of struct
s and GCC is particularly lenient about it unless you compile with -pedantic
. This zero-length array was a hackish mechanism for making variable-length structures. Unfortunately, it’s technically undefined behavior to access such an array even though it basically worked everywhere. C99 codified a well-defined replacement for it called flexible array members, which we’ll chat about later.↩︎
This is also equivalent: void print_2D_array(int (*a)[3])
, but that’s more than I want to get into right now.↩︎
It’s actually type const char*
, but we haven’t talked about const
yet.↩︎
Though it is true that C doesn’t track the length of strings.↩︎
This is different than the NULL
pointer, and I’ll abbreviate it NUL
when talking about the character versus NULL
for the pointer.↩︎
Later we’ll learn a neater way to do this with pointer arithmetic.↩︎
There’s a safer function called strncpy()
that you should probably use instead, but we’ll get to that later.↩︎
Although in C individual items in memory like int
s are referred to as “objects”, they’re not objects in an object-oriented programming sense.↩︎
The Saturn was a popular brand of economy car in the United States until it was put out of business by the 2008 crash, sadly so to us fans.↩︎
A pointer is likely 8 bytes on a 64-bit system.↩︎
We’ll talk more about these later.↩︎
Recall that the sizeof
operator tells you the size in bytes of an object in memory.↩︎
Or string, which is really an array of char
s. Somewhat peculiarly, you can also have a pointer that references one past the end of the array without a problem and still do math on it. You just can’t dereference it when it’s out there.↩︎
Because remember that array notation is just a dereference and some pointer math, and you can’t dereference a void*
!↩︎
You can also cast the void*
to another type, but we haven’t gotten to casts yet.↩︎
https://en.wikipedia.org/wiki/Bit_bucket↩︎
“Bit” is short for binary digit. Binary is just another way of representing numbers. Instead of digits 0-9 like we’re used to, it’s digits 0-1.↩︎
https://en.wikipedia.org/wiki/Two%27s_complement↩︎
The industry term for a sequence of exactly, indisputably 8 bits is an octet.↩︎
In general, if you have an \(n\) bit two’s complement number, the signed range is \(-2^{n-1}\) to \(2^{n-1}-1\). And the unsigned range is \(0\) to \(2^n-1\).↩︎
https://en.wikipedia.org/wiki/ASCII↩︎
https://en.wikipedia.org/wiki/List_of_information_system_character_sets↩︎
https://en.wikipedia.org/wiki/Unicode↩︎
Depends on if a char
defaults to signed char
or unsigned char
↩︎
https://en.wikipedia.org/wiki/Signed_number_representations#Signed_magnitude_representation↩︎
My char
is signed.↩︎
https://en.wikipedia.org/wiki/IEEE_754↩︎
This program runs as its comments indicate on a system with FLT_DIG
of 6
that uses IEEE-754 base-2 floating point numbers. Otherwise, you might get different output.↩︎
Or at least, it’s probably not—if you store floating point numbers in base 2.↩︎
It’s really surprising to me that C doesn’t have this in the spec yet. In the C99 Rationale document, they write, “A proposal to add binary constants was rejected due to lack of precedent and insufficient utility.” Which seems kind of silly in light of some of the other features they kitchen-sinked in there! I’ll bet one of the next releases has it.↩︎
https://en.wikipedia.org/wiki/Scientific_notation↩︎
They’re the same except snprintf()
allows you to specify a maximum number of bytes to output, preventing the overrunning of the end of your string. So it’s safer.↩︎
https://en.wikipedia.org/wiki/ASCII↩︎
We have to pass a pointer to badchar
into strtoul()
or it won’t be able to modify it in any way we can see, analogous to why you have to pass a pointer to an int
to a function if you want that function to be able to change that value of that int
.↩︎
In practice, what’s probably happening on your implementation is that the high-order bits are just being dropped from the result, so a 16-bit number 0x1234
being converted to an 8-bit number ends up as 0x0034
, or just 0x34
.↩︎
Again, in practice, what will likely happen on your system is that the bit pattern for the original will be truncated and then just used to represent the signed number, two’s complement. For example, my system takes an unsigned char
of 192
and converts it to signed char
-64
. In two’s complement, the bit pattern for both these numbers is binary 11000000
.↩︎
Not really—it’s just discarded regularly.↩︎
Functions with a variable number of arguments.↩︎
This is rarely done because the compiler will complain and having a prototype is the Right Thing to do. I think this still works for historic reasons, before prototypes were a thing.↩︎
https://en.wikipedia.org/wiki/Processor_register↩︎
https://en.wikipedia.org/wiki/Boids↩︎
Historically, MS-DOS and Windows programs would do this differently than Unix. In Unix, the shell would expand the wildcard into all matching files before your program saw it, whereas the Microsoft variants would pass the wildcard expression into the program to deal with. In any case, there are arguments that get passed into the program.↩︎
Since they’re just regular parameter names, you don’t actually have to call them argc
and argv
. But it’s so very idiomatic to use those names that, if you get creative, other C programmers will look at you with a suspicious eye, indeed!↩︎
ps
, Process Status, is a Unix command to see what processes are running at the moment.↩︎
https://en.wikipedia.org/wiki/Inception↩︎
https://en.wikipedia.org/wiki/Shell_(computing)↩︎
In Windows cmd.exe
, type echo %errorlevel%
. In PowerShell, type $LastExitCode
.↩︎
If you need a numeric value, convert the string with something like atoi()
or strtol()
.↩︎
In Windows CMD.EXE, use set FROTZ=value
. In PowerShell, use $Env:FROTZ=value
.↩︎