@WhozCraig has provided a suitable answer to the main thrust of the question, but I'd really like to help a bit more with some other issues you have.
First Steps
OK, first off, a couple of really important things about your code :
Braces. For the love of God, use braces on your if..else
syntax. See your BstInsert
below.
void BstInsert( Bst *root, int key )
{
if( !root->key )
root->key = key;
else if ( key <= root->key)
if( root-> left )
BstInsert( root->left, key );
else
root->left = NewNode( key );
else
if ( root -> right )
BstInsert( root->right, key);
else
root->right = NewNode( key );
}
When you are writing functions to walk your BST based on whether one key is less than or greater than another, above all else you must be consistent. Using A < B
in one place and A <= B
can be disastrous. It also helps readability if you choose a side to stick the target node (the one you are looking for) and always do the comparisons the same way around.
Technical Problems
Memory allocation can fail. When it does, the various allocation methods (malloc
, calloc
, etc) will return NULL
. You should check for this. Note that calloc
initialises memory to zero (clears it), while malloc
does not. For this sort of situation (writing a basic data structure as a practice exercise), I like to wrap my allocation methods like this:
void *ecalloc(size_t n, size_t s) {
void *o = calloc(n, s);
if (NULL == o) {
fprintf(stderr, "Memory allocation failed!\n");
exit(EXIT_FAILURE);
}
return o;
}
This means a) I don't have to type out annoying if (NULL == thing)
allocation checks all the time and b) that the program will quit after printing a message if allocation fails. The latter may not always be desirable (well, the quitting part, at least, although you don't have a ton of options if you have run out of memory), but will be more than sufficient in this case.
Design Problems
Warning: The term Design
is used loosely here.
Let's say we want a BST to store some integers. You decide that a node in your BST will store an integer and two pointers (to nodes). That is fine. However, it means you can't sensibly use the key as a sentinel value to determine whether or not a node is used.
Luckily, we don't need to. Instead of using a node as the root when the tree is empty, just use a pointer to a node. When there is no node, the pointer is NULL
. This is related to the problem you hit with your remove method.
A BST is a tree, made up of linked nodes, right? Or is it? You can also think of it as a tree where each node is really a subtree. That makes it well suited to recursion, so let's try to express things elegantly using recursion when we can.
Now, we have a few options.
- We could make an immutable bst (all your modifying calls would look like
b = bst_insert(b, 10)
because bst_insert
and so on would all return the new modified copy of the tree without altering the old one).
- We could go for more along the lines of
void bst_insert(bst **b, int key)
, called as bst_insert(&b, 10)
, where we use an extra level of indirection to modify our tree by passing in a pointer to a pointer to a node.
- Or we could could go for something in between the previous two options, where we have
bst *b(bst *b, int key)
, which modifies what it can of *b
(the key and the child pointers), and assigns back what it can't. This avoids the extra indirection (which is a bit ugly) but is a bit inconsistent in that if you are using a combination of assigning function return values and function side-effects to achieve your aim.
I have picked option two to work with.
Debugging
Suppose you insert a 1
into your BST. Perhaps you remove a 2
. How do you know that this worked? Wouldn't it be easier to debug if you could see what your BST was doing?
I suggest (especially when starting out writing basic data structures, when a complex debugging environment like gdb might be a) overkill and b) information overload) that you code ways to print out the state of your data structure very early on.
Also, if you are running *nix, valgrind
(pronounced like "Val grinned") is your best friend. It is a memory checker, and you can use it to make sure you always free the memory you allocate (once you're done with it, of course), and to look for other memory errors like running out of bounds. Learn to use it (it's actually pretty straightforward). If you're on Windows, there's Purify although I don't think it's free... anyway I'm sure someone more familiar with that environment can recommend something.
Compiler warnings are also a wonderful thing. Turn them on and treat them like errors. When using gcc I tend to use -W -Wall -ansi -pedantic
. On a related note, there is -g
to generate information for GDB to use.
Writing a BST
I was going to go over your code and dissect it, but I ended up writing a similarly styled BST myself and then went over my code explaining each part. I've gone with a two file approach. We have bst.c
and bst.h
.
bst.h
This bit is so that if the header is included multiple times in a larger system, we don't try to define or declare the same things more than once accidentally, and also so that we don't accidentally cause an infinite preprocessor loop if we have circular header references.
#ifndef BST_H_
#define BST_H_
Here's a typedef that both lets us avoid typing struct bstnode
all the time, and hides the contents of the struct bstnode
type somewhat from anyone who is just using your BST.
typedef struct bstnode bst;
extern
says these functions are implemented somewhere else, basically.
extern bst *bst_new(int k);
extern void bst_insert(bst **b, int k);
extern bst *bst_search(bst *b, int k);
extern void bst_remove(bst **b, int k);
extern void bst_delete(bst **b);
extern void bst_newick(const bst *b);
#endif /* BST_H_ */
bst.c
#include <stdlib.h>
#include <stdio.h>
#include "bst.h"
Here's the complete struct bstnode
. We have access to the bst
typedef because we have included bst.h
.
struct bstnode {
int key;
bst *left, *right;
};
In this context, static means these functions have file scope.
static void bst_swap_keys(bst *a, bst *b);
static void bst_newick_rec(const bst *b);
static void *ecalloc(size_t n, size_t s); /* Here for compactness - normally I would put it in a utility file somewhere else. */
Now, we could just as easily have included bst.h
in another file, main.c
, and put our main method there, but for compactness, I didn't.
int main(void)
{
bst* b = bst_new(5);
bst_newick(b);
bst_insert(&b, 7);
bst_newick(b);
bst_insert(&b, 3);
bst_insert(&b, 8);
bst_insert(&b, 2);
bst_insert(&b, 1);
bst_newick(b);
bst_remove(&b, 7);
bst_newick(b);
bst_delete(&b);
printf("%p\n", (void*) b);
return EXIT_SUCCESS;
}
The downside of doing bst_new
this way is that you need a key before you can make your first valid node. We could have just scrapped bst_new
and done the allocation in bst_insert
instead, but I wanted to keep the new
/delete
paradigm here.
bst *bst_new(int k) {
bst *b = ecalloc(1, sizeof *b);
b->key = k;
return b;
}
Here is our insertion method. Bear in mind that I have tried to avoid shortcuts as much as possible, and there are plenty of ways to make this code more compact. Note the obsessive use of braces - it might be extra work but I recommend it to avoid unintended behaviour, especially when modifying your code at a later date.
void bst_insert(bst **b, int k) {
if (NULL == b) { /* I wanted to avoid additional levels of nesting so I did this instead of NULL != b */
return;
}
if (NULL == *b) {
*b = bst_new(k);
} else if ((*b)->key > k) {
bst_insert(&(*b)->left, k);
} else if ((*b)->key < k) {
bst_insert(&(*b)->right, k);
}
}
Find a node if possible. I could have made b const
to show we don't modify it, but then I would have had to change the return type too, and then cast it away to modify anything I searched for, which is a bit naughty.
bst *bst_search(bst *b, int k) {
if (NULL == b) {
return NULL;
} else if (b->key == k) {
return b;
} else if (b->key > k) {
return bst_search(b->left, k);
} else {
return bst_search(b->right, k);
}
}
This only exists for the bst_remove
method, but it could be useful outside this file too, so it is available through the header too.
bst *bst_min(bst *b) {
if (NULL != b && NULL != b->left) {
return bst_min(b->left);
} else {
return b;
}
}
Note that we swap the keys of the target node (the one being deleted) and the one that should replace it, rather than swapping the nodes themselves, and then recursively delete the target value again. If the keys were strings or something else allocated on the heap, you would also need to free the key just before freeing the node.
void bst_remove(bst **b, int k) {
bst *temp;
if (NULL == b || NULL == *b) { /* Doing it like this avoids extra indentation, which is harder to read*/
return;
}
temp = *b;
if ((*b)->key > k) {
bst_remove(&(*b)->left, k);
} else if ((*b)->key < k) {
bst_remove(&(*b)->right, k);
} else {
if (NULL != (*b)->left && NULL != (*b)->right)
{
temp = bst_min((*b)->right);
bst_swap_keys((*b), temp);
bst_remove(&(*b)->right, k);
}
else if (NULL != (*b)->left)
{
*b = (*b)->left;
}
else if (NULL != (*b)->right)
{
*b = (*b)->right;
}
else
{
(*b) = NULL;
}
free(temp);
}
}
bst_delete
is rather important. It frees all the memory you have allocated to the bst you pass to it. Remember, for every allocation call there should also be a free call. If the keys were strings or something else allocated on the heap, you would also need to free the key just before freeing the node.
void bst_delete(bst **b) {
if (NULL == b) {
return;
}
if (NULL != *b) {
bst_delete(&(*b)->left);
bst_delete(&(*b)->right);
free(*b);
*b = NULL;
}
}
Printing a BST in Newick format and reading off the values always feels like a little bit of a hack to me (because there's no distinction between L->R and R->L in Newick...), but I have a soft spot for it, am also used to reading it, and have found it handy for debugging in the past. And your method doing the printing ought to be consistent in its ordering anyway, unless you're crazy. This also demonstrates wrapping a recursive task by splitting off the recursive work into a separate method, which is then called from publicly available one. The latter handles other tasks that are not so well suited to recursion (e.g. printing a semi-colon and newline only once and at the top level.)
void bst_newick(const bst *b)
{
if (NULL != b)
{
bst_newick_rec(b);
printf(";\n");
}
else
{
printf("NULL!\n");
}
}
static void bst_newick_rec(const bst *b)
{
if (NULL == b) {
return;
}
if (NULL != b->left || NULL != b->right) {
printf("(");
if (NULL != b->left && NULL != b->right) {
bst_newick_rec(b->left);
printf(",");
bst_newick_rec(b->right);
} else if (NULL != b->left) {
bst_newick_rec(b->left);
} else {
bst_newick_rec(b->right);
}
printf(")");
}
printf("%d", b->key);
}
Making a key swapping method is really just a minor convenience.
static void bst_swap_keys(bst *a, bst *b)
{
int temp;
if (NULL != a && NULL != b && a != b)
{
temp = a->key;
a->key = b->key;
b->key = temp;
}
}
static void *ecalloc(size_t n, size_t s) {
void *o = calloc(n, s);
if (NULL == o) {
fprintf(stderr, "Memory allocation failed!\n");
exit(EXIT_FAILURE);
}
return o;
}
Please keep in mind this has basically been assembled in my coffee break, and has not been rigorously tested. I hope this helps.