Wednesday, July 12, 2017

Referencing a char* that went out of scope


I recently started programming in C again after having programmed in C++ for a while, and my understanding of pointers is a bit rusty.

I would like to ask why this code is not causing any errors:

char* a = NULL;
{
    char* b = "stackoverflow";
    a = b;
}

printf(a);

I thought that because b went out of scope, a should reference a non-existent memory location, and thus there would be a runtime error when calling printf.

I ran this code in MSVC about 20 times, and no errors were shown.

8 Answers

Answers 1

Inside the scope where b is defined, it is assigned the address of a string literal. These literals typically live in a read-only section of memory as opposed to the stack.

When you do a=b you assign the value of b to a, i.e. a now contains the address of a string literal. This address is still valid after b goes out of scope.

If you had taken the address of b and then attempted to dereference that address, then you would invoke undefined behavior.

So your code is valid and does not invoke undefined behavior, but the following does:

int *a = NULL;
{
    int b = 6;
    a = &b;
}

printf("b=%d\n", *a);

Another, more subtle example:

char *a = NULL;
{
    char b[] = "stackoverflow";
    a = b;
}

printf(a);

The difference between this example and yours is that b, which is an array, decays to a pointer to the first element when assigned to a. So in this case a contains the address of a local variable which then goes out of scope.

EDIT:

As a side note, it's bad practice to pass a variable as the first argument of printf, as that can lead to a format string vulnerability. Better to use a string constant as follows:

printf("%s", a); 

Or more simply:

puts(a); 

Answers 2

String literals are statically allocated, so the pointer is valid indefinitely. If you had said char b[] = "stackoverflow", then you would be allocating a char array on the stack that would become invalid when the scope ended. This difference also shows up when modifying strings: char s[] = "foo" allocates an array on the stack that you can modify, whereas char *s = "foo" only gives you a pointer to a string literal that may be placed in read-only memory, so modifying it is undefined behaviour.

Answers 3

Line by line, this is what your code does:

char* a = NULL; 

a is a pointer not referencing anything (set to NULL).

{     char* b = "stackoverflow"; 

b is a pointer referencing the static, constant string literal "stackoverflow".

    a = b; 

a is set to also reference the static, constant string literal "stackoverflow".

} 

b is out of scope. But since a is not referencing b, then that does not matter (it's just referencing the same static, constant string literal as b was referencing).

printf(a); 

Prints the static, constant string literal "stackoverflow" referenced by a.

Answers 4

Other people have explained that this code is perfectly valid. This answer is about your expectation that, if the code had been invalid, there would have been a runtime error when calling printf. It isn't necessarily so.

Let's look at this variation on your code, which is invalid:

#include <stdio.h>

int main(void)
{
    int *a;
    {
        int b = 42;
        a = &b;
    }
    printf("%d\n", *a); // undefined behavior
    return 0;
}

This program has undefined behavior, but it happens to be fairly likely that it will, in fact, print 42, for several different reasons — many compilers will leave the stack slot for b allocated for the entire body of main, because nothing else needs the space and minimizing the number of stack adjustments simplifies code generation; even if the compiler did formally deallocate the stack slot, the number 42 probably remains in memory until something else overwrites it, and there's nothing in between a = &b and *a to do that; standard optimizations ("constant and copy propagation") could eliminate both variables and write the last-known value for *a directly into the printf statement (as if you had written printf("%d\n", 42)).

It's absolutely vital to understand that "undefined behavior" does not mean "the program will crash predictably". It means "anything can happen", and anything includes appearing to work as the programmer probably intended (on this computer, with this compiler, today).


As a final note, none of the aggressive debugging tools I have convenient access to (Valgrind, ASan, UBSan) track "auto" variable lifetimes in sufficient detail to trap this error, but GCC 6 does produce this amusing warning:

$ gcc -std=c11 -O2 -W -Wall -pedantic test.c
test.c: In function ‘main’:
test.c:9:5: warning: ‘b’ is used uninitialized in this function
     printf("%d\n", *a); // undefined behavior
     ^~~~~~~~~~~~~~~~~~

I believe what happened here was, it did the optimization I described above — copying the last known value of b into *a and then into the printf — but its "last known value" for b was a "this variable is uninitialized" sentinel rather than 42. (It then generates code equivalent to printf("%d\n", 0).)

Answers 5

The code doesn't generate any error because you are simply assigning the character pointer b to another character pointer a, which is perfectly fine.

In C, you can assign the value of one pointer to another pointer. Here the string "stackoverflow" is a literal, and the base address of that string is what gets assigned to the pointer variable.

Although the variable b is out of scope, the assignment to the pointer a has already been done, so it prints the result without any error.

Answers 6

Step-by-step execution of the given code. Please understand that the memory locations 1000, 2000 and 3000 are used just for illustration.

After the end of the scope, the storage for 'b' is simply released. The memory location is not overwritten or set to NULL.

Note that this is not a memory leak: nothing was dynamically allocated, and a still holds the valid address of the string literal.

Answers 7

String literals are always allocated statically, and the program can access them at any time:

char* a = NULL;
{
    char* b = "stackoverflow";
    a = b;
}

printf(a);

Here the compiler allocates memory for the string literal "stackoverflow", just as it allocates memory for int/char variables or pointers.

The difference is that string literals are typically placed in a read-only section/segment. The variable b is allocated on the stack, but it holds an address inside that read-only section/segment.

In the code, the variable b holds the address of the string literal. Even when b goes out of scope, the memory for the string literal remains allocated.

Note: the memory allocated to string literals is part of the binary and is released only when the program is unloaded.

Refer to the ELF binary specification for more details.

Answers 8

I think that, as proof of the previous answers, it is good to take a look at what really ends up in your compiled code. People have already mentioned that string literals live in a read-only part of the binary (in the output below, the __cstring section of the __TEXT segment). So the literals are simply always there. You can easily see this for the code

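#include <stdio.h>  /* printf is declared here, not in <string.h> */

int main() {
  char* a = 0;
  {
    char* b = "stackoverflow";
    a = b;  /* copy the literal's address out of the inner block */
  }
  printf("%s\n", a);
}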

using the following command

> cc -S main.c 

Inside main.s you will find, at the very bottom:

... ... ...
        .section        __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
        .asciz  "stackoverflow"

L_.str.1:                               ## @.str.1
        .asciz  "%s\n"

You can read more about assembler sections (for example) here: https://docs.oracle.com/cd/E19455-01/806-3773/elf-3/index.html

And here you can find very well prepared coverage of Mach-O executables: https://www.objc.io/issues/6-build-tools/mach-o-executables/
