C Programming Skills - #3

[Note: This was originally written for "FYI", the internal newsletter for the worldwide software engineering departments of Dendrite International]

Here we have an example of a programmer making a few unfounded assumptions and overlooking the obvious. Can you spot them? (Except for removing the programmer's name--to spare him an further embarrassment-- the function below is exactly as it appears in the Dendrite source file -- demonstrating not only the author's grand programming style, but also his rich command of the English language.)

/*
* strb0trim
* Remove leading and trailing spaces + leading '0's.
*
* NOTES:
* o Return a pointer into the provided (modified) buffer.
* o Special case is "0", which must be returned untouched.
*
*/
char *strb0trim(register char *s)
{
	register char *p;

	/* locate the end */
	p = s + strlen(s) - 1;

	/* trim trailing shit unconditionally */
	while(*p==' ' && *p)
		*p--='\0';

	/* trim leading shit - but not to death */
	while(*s==' ' || (*s=='0' && *(s+1)))
		s++;

	return s;
}

The section with the most trouble is :

while(*p==' ' && *p)
	*p--='\0';

Let's stop and think about this of a second. (Something the programmer obviously didn't do). It says loop while the character pointed to is equal to a space, and while not equal to a zero. Now, if it's a blank, it's guaranteed to not be a zero now, isn't it? So the second comparison is completely superfluous, and that will function the same as if written:

while(*p==' ')
	*p--='\0';

But, that's not good enough. The programmer knew what he wanted to do, he just went about it all wrong. In the loop, we are starting at the end of the string and work back towards the beginning, erasing spaces as we go. But what if the string is entirely spaces? We have to have some way to stop at the beginning of the string. The original function thought it would look for the zero which ended the string immediately before this string in memory.

So, What's wrong with that? This is bad for a number of reasons. First of all, there is no guarantee that the string before it in memory is null terminated. Consider this scenario: A fixed field length record is read into memory. One field is copied into a temporary buffer, and processed using this function. Based on the results of that, the record is modified and written out again. Sounds like a very basic and straightforward procedure. Then why is the file being corrupted? The temporary buffer will probably be in memory immediately after the file record. If the value in the temporary buffer is all blanks, the function will continue going backwards in memory, into the record buffer, and continue erasing spaces in the blank-padded file record.

And, that uses one of the assumptions the original programmer did -- that the object immediately before the string in memory is another string. But who's to say it won't be something else? An integer perhaps? If so, you have a 1 in 256 chance that the byte immediate before the string is 0x20 (i.e., a space). You call a function and suddenly an unrelated variable changes it's value.

Wait -- It gets worse. Consider what would happen if the string were a malloc()ed block? Then, most likely, the byte immediately before the string is a pointer used by the memory manager to point to the next memory block. And again, there's a 1 in 256 chance that it'll be a 0x20. If so, this function could replace it with a 0x00, and your program dies a horrible death-- but not immediately where you might have a chance to spot the cause quickly, but instead several lines downstream, when you next try to malloc a block.

So what can we do about it? Well, we know that the pointer "s" is the start of the string, so why not test against that? This would make the offending lines:

while(*p==' ' && p > s)
	*p--='\0';

Some more to think about: This removes spaces (0x20), but what about tabs, line feeds, carriage returns and so forth? Should we be using isspace(*p) instead of *p==' ' to test for all manner of whitespace? Also, this function returns a pointer inside the original buffer, which means if this block were malloced, a separate pointer will have to be maintained by the calling routine, so it can later be freed. Would it be better if we shifted the significant data up to the first position of the buffer, and returned the pointer we were given?

Copyright © 1994, James M. Curran