c - Why is my BST root pointer changing for some unknown reason?

Question

I am trying to implement a Binary Search Tree data structure in C, and I have run into a bug. My pointer value changes for a reason I do not understand. (Please see bottom of post for strange output [Delete function and main functions clarify where output comes from] ) My Test function below:

int main(void)
{
    Bst *bst = ( Bst* ) calloc( 1, sizeof( Bst ) );
    BstInsert( bst, 7 );
    BstInsert( bst, 8 );
    BstInsert( bst, 2 );
    BstInsert( bst, 1 );
    BstTraverse( bst );
    BstRemove( bst, 7); 
    printf("=========================\n");
    printf("Root Key: %d\n", bst->key );
    printf("Left Key: %d\n", bst->left->key );
    printf("Right Key: %d\n", bst->right->key );
    printf("Location: %p\n", &bst);
    BstTraverse( bst );

    return 0;
}

My Delete Node function below:

void BstRemove( Bst *root, int key ){
    //Seems like recursive algorithm would need doubly linked bst implementation
    Bst *temp_node = BstFind( root, key );
    Bst *parent_node = BstGetParent( root, key );
    Bst *replacement_node = ( Bst* ) calloc( 1, sizeof( Bst ) );
    if ( temp_node->key == root->key )
    {   
        if (root->left) replacement_node = BstMax( root->left );
        else if ( root->right ) replacement_node = BstMin( root->right );
        else replacement_node = NULL;
    }
    else if ( temp_node->left )
    {
        replacement_node = BstMax( temp_node );
        Bst *parent_replacement_node = BstGetParent( root, replacement_node->key );
        parent_replacement_node->right = NULL;
    }
    else if ( temp_node->right )
    {
        replacement_node = BstMin( temp_node );
        Bst *parent_replacement_node = BstGetParent( root, replacement_node->key );
        parent_replacement_node->left = NULL;
    }
    else
        replacement_node = NULL;

    if ( parent_node && key < parent_node->key )
        parent_node->left = replacement_node;
    else if ( parent_node )
        parent_node->right = replacement_node;

    if ( replacement_node )
    {
        if ( root->left->key != replacement_node->key ) replacement_node->left = temp_node->left;
        if ( root->right->key != replacement_node->key ) replacement_node->right = temp_node->right;
    }
    root = replacement_node;
    printf("Root Key: %d\n", root->key );
    printf("Left Key: %d\n", root->left->key );
    printf("Right Key: %d\n", root->right->key );
    printf("Location: %p\n", &root);
    free(temp_node);
}

Output Below:

1
2
7
8
Root Key: 2
Left Key: 1
Right Key: 8
Location: 0x7fffc5cf52e8
=========================
Root Key: 0
Left Key: 2
Right Key: 8
Location: 0x7fffc5cf5338
1
2
8
0
8

The reason this confuses me so much is because I am using a pointer. I see no reason for the root->key value to change when it is 2 within the delete function, and once it is processed
root->key becomes 0. I am grateful for anybody who can point out my problem or help me in the right direction. You can see my current BST implementation at https://github.com/PuffNotes/C/blob/master/data_structures/binary_tree.c if necessary. I recently started trying to program everyday to gain some skills, and consider myself to be a beginner in C ( for reference ). Thank you.

score 4 · Accepted Answer

You're not changing your root node pointer. It is passed by value to the remove function, and since it is certainly a viable target of the delete, it should be passed by address since it might change to a different node. Note: if I missed a root in there somewhere I apologize, but your compile should catch it).

Note: I made no validation pass on whether this code is correct or even works; but the real hint something was wrong was the root = at the bottom, followed by the print-out, then the caller (main()) doing the same print-out and showing a different root pointer value.

void BstRemove( Bst **root, int key )
{
    //Seems like recursive algorithm would need doubly linked bst implementation
    Bst *temp_node = BstFind( *root, key );
    Bst *parent_node = BstGetParent( *root, key );
    Bst *replacement_node = ( Bst* ) calloc( 1, sizeof( Bst ) );
    if ( temp_node->key == (*root)->key )
    {   
        if ((*root)->left) replacement_node = BstMax( (*root)->left );
        else if ( (*root)->right ) replacement_node = BstMin( (*root)->right );
        else replacement_node = NULL;
    }
    else if ( temp_node->left )
    {
        replacement_node = BstMax( temp_node );
        Bst *parent_replacement_node = BstGetParent( (*root), replacement_node->key );
        parent_replacement_node->right = NULL;
    }
    else if ( temp_node->right )
    {
        replacement_node = BstMin( temp_node );
        Bst *parent_replacement_node = BstGetParent( (*root), replacement_node->key );
        parent_replacement_node->left = NULL;
    }
    else
        replacement_node = NULL;

    if ( parent_node && key < parent_node->key )
        parent_node->left = replacement_node;
    else if ( parent_node )
        parent_node->right = replacement_node;

    if ( replacement_node )
    {
        if ( (*root)->left->key != replacement_node->key ) replacement_node->left = temp_node->left;
        if ( (*root)->right->key != replacement_node->key ) replacement_node->right = temp_node->right;
    }
    *root = replacement_node;

    printf("Root Key: %d\n", (*root)->key );
    printf("Left Key: %d\n", (*root)->left->key );
    printf("Right Key: %d\n", (*root)->right->key );
    printf("Location: %p\n", root);
    free(temp_node);
}

Invoke it like this:

BstRemove( &bst, 7);

And get used to passing the root in by address, as you will do plenty of it when you start writing balancing algorithms.

score 3 · Accepted Answer

@WhozCraig has provided a suitable answer to the main thrust of the question, but I'd really like to help a bit more with some other issues you have.

First Steps

OK, first off, a couple of really important things about your code :

Braces. For the love of God, use braces on your if..else syntax. See your BstInsert below.

void BstInsert( Bst *root, int key )
{
    if( !root->key )
        root->key = key;
    else if ( key <= root->key)
        if( root-> left )
            BstInsert( root->left, key );
        else
            root->left = NewNode( key );
    else
        if ( root -> right )
            BstInsert( root->right, key);
        else
            root->right = NewNode( key );
}

When you are writing functions to walk your BST based on whether one key is less than or greater than another, above all else you must be consistent. Using A < B in one place and A <= B can be disastrous. It also helps readability if you choose a side to stick the target node (the one you are looking for) and always do the comparisons the same way around.

Technical Problems

Memory allocation can fail. When it does, the various allocation methods (malloc, calloc, etc) will return NULL. You should check for this. Note that calloc initialises memory to zero (clears it), while malloc does not. For this sort of situation (writing a basic data structure as a practice exercise), I like to wrap my allocation methods like this:

void *ecalloc(size_t n, size_t s) {
    void *o = calloc(n, s);
    if (NULL == o) {
    fprintf(stderr, "Memory allocation failed!\n");
    exit(EXIT_FAILURE);
    }
    return o;
}

This means a) I don't have to type out annoying if (NULL == thing) allocation checks all the time and b) that the program will quit after printing a message if allocation fails. The latter may not always be desirable (well, the quitting part, at least, although you don't have a ton of options if you have run out of memory), but will be more than sufficient in this case.

Design Problems

Warning: The term Design is used loosely here.

Let's say we want a BST to store some integers. You decide that a node in your BST will store an integer and two pointers (to nodes). That is fine. However, it means you can't sensibly use the key as a sentinel value to determine whether or not a node is used.

Luckily, we don't need to. Instead of using a node as the root when the tree is empty, just use a pointer to a node. When there is no node, the pointer is NULL. This is related to the problem you hit with your remove method.

A BST is a tree, made up of linked nodes, right? Or is it? You can also think of it as a tree where each node is really a subtree. That makes it well suited to recursion, so let's try to express things elegantly using recursion when we can.

Now, we have a few options.

We could make an immutable bst (all your modifying calls would look like b = bst_insert(b, 10) because bst_insert and so on would all return the new modified copy of the tree without altering the old one).
We could go for more along the lines of void bst_insert(bst **b, int key), called as bst_insert(&b, 10), where we use an extra level of indirection to modify our tree by passing in a pointer to a pointer to a node.
Or we could could go for something in between the previous two options, where we have bst *b(bst *b, int key), which modifies what it can of *b (the key and the child pointers), and assigns back what it can't. This avoids the extra indirection (which is a bit ugly) but is a bit inconsistent in that if you are using a combination of assigning function return values and function side-effects to achieve your aim.

I have picked option two to work with.

Debugging

Suppose you insert a 1 into your BST. Perhaps you remove a 2. How do you know that this worked? Wouldn't it be easier to debug if you could see what your BST was doing?

I suggest (especially when starting out writing basic data structures, when a complex debugging environment like gdb might be a) overkill and b) information overload) that you code ways to print out the state of your data structure very early on.

Also, if you are running *nix, valgrind (pronounced like "Val grinned") is your best friend. It is a memory checker, and you can use it to make sure you always free the memory you allocate (once you're done with it, of course), and to look for other memory errors like running out of bounds. Learn to use it (it's actually pretty straightforward). If you're on Windows, there's Purify although I don't think it's free... anyway I'm sure someone more familiar with that environment can recommend something.

Compiler warnings are also a wonderful thing. Turn them on and treat them like errors. When using gcc I tend to use -W -Wall -ansi -pedantic. On a related note, there is -g to generate information for GDB to use.

Writing a BST

I was going to go over your code and dissect it, but I ended up writing a similarly styled BST myself and then went over my code explaining each part. I've gone with a two file approach. We have bst.c and bst.h.

bst.h

This bit is so that if the header is included multiple times in a larger system, we don't try to define or declare the same things more than once accidentally, and also so that we don't accidentally cause an infinite preprocessor loop if we have circular header references.

#ifndef BST_H_
#define BST_H_

Here's a typedef that both lets us avoid typing struct bstnode all the time, and hides the contents of the struct bstnode type somewhat from anyone who is just using your BST.

typedef struct bstnode bst;

extern says these functions are implemented somewhere else, basically.

extern bst *bst_new(int k);
extern void bst_insert(bst **b, int k);
extern bst *bst_search(bst  *b, int k);
extern void bst_remove(bst **b, int k);
extern void bst_delete(bst **b);
extern void bst_newick(const bst  *b);

#endif /* BST_H_ */

bst.c

#include <stdlib.h>
#include <stdio.h>
#include "bst.h"

Here's the complete struct bstnode. We have access to the bst typedef because we have included bst.h.

struct bstnode {
    int key;
    bst *left, *right;
};

In this context, static means these functions have file scope.

static void bst_swap_keys(bst *a, bst *b);
static void bst_newick_rec(const bst *b);
static void *ecalloc(size_t n, size_t s); /* Here for compactness - normally I would put it in a utility file somewhere else. */

Now, we could just as easily have included bst.h in another file, main.c, and put our main method there, but for compactness, I didn't.

int main(void)
{
    bst* b = bst_new(5);

    bst_newick(b);
    bst_insert(&b, 7);

    bst_newick(b);
    bst_insert(&b, 3);

    bst_insert(&b, 8);
    bst_insert(&b, 2);
    bst_insert(&b, 1);
    bst_newick(b);

    bst_remove(&b, 7);
    bst_newick(b);

    bst_delete(&b);
    printf("%p\n", (void*) b);

    return EXIT_SUCCESS;
}

The downside of doing bst_new this way is that you need a key before you can make your first valid node. We could have just scrapped bst_new and done the allocation in bst_insert instead, but I wanted to keep the new/delete paradigm here.

bst *bst_new(int k) {
bst *b = ecalloc(1, sizeof *b);
b->key = k;
return b;

}

Here is our insertion method. Bear in mind that I have tried to avoid shortcuts as much as possible, and there are plenty of ways to make this code more compact. Note the obsessive use of braces - it might be extra work but I recommend it to avoid unintended behaviour, especially when modifying your code at a later date.

void bst_insert(bst **b, int k) {
    if (NULL == b) { /* I wanted to avoid additional levels of nesting so I did this instead of NULL != b */
    return;
    }

    if (NULL == *b) {
    *b = bst_new(k);
    } else if ((*b)->key > k) {
        bst_insert(&(*b)->left, k);
    } else if ((*b)->key < k) {
    bst_insert(&(*b)->right, k);
    }
}

Find a node if possible. I could have made b const to show we don't modify it, but then I would have had to change the return type too, and then cast it away to modify anything I searched for, which is a bit naughty.

bst *bst_search(bst *b, int k) {
    if (NULL == b) {
    return NULL;
    } else if (b->key == k) {
    return b;
    } else if (b->key > k) {
    return bst_search(b->left, k);
    } else {
    return bst_search(b->right, k);
    }
}

This only exists for the bst_remove method, but it could be useful outside this file too, so it is available through the header too.

bst *bst_min(bst *b) {
    if (NULL != b && NULL != b->left) {
    return bst_min(b->left);
    } else {
    return b;
    }
}

Note that we swap the keys of the target node (the one being deleted) and the one that should replace it, rather than swapping the nodes themselves, and then recursively delete the target value again. If the keys were strings or something else allocated on the heap, you would also need to free the key just before freeing the node.

void bst_remove(bst **b, int k) {
    bst *temp;
    if (NULL == b || NULL == *b) { /* Doing it like this avoids extra indentation, which is harder to read*/
    return;
    }
    temp = *b;

    if ((*b)->key > k) {
    bst_remove(&(*b)->left, k);
    } else if ((*b)->key < k) {
    bst_remove(&(*b)->right, k);
    } else {
    if (NULL != (*b)->left && NULL != (*b)->right)
    {
        temp = bst_min((*b)->right);
        bst_swap_keys((*b), temp);
        bst_remove(&(*b)->right, k);
    }
    else if (NULL != (*b)->left)
    {
        *b = (*b)->left;
    }
    else if (NULL != (*b)->right)
    {
        *b = (*b)->right;
    }
    else
    {
        (*b) = NULL;
    }
    free(temp);
    }
}

bst_delete is rather important. It frees all the memory you have allocated to the bst you pass to it. Remember, for every allocation call there should also be a free call. If the keys were strings or something else allocated on the heap, you would also need to free the key just before freeing the node.

void bst_delete(bst **b) {
    if (NULL == b) {
    return;
    }
    if (NULL != *b) {
    bst_delete(&(*b)->left);
    bst_delete(&(*b)->right);
    free(*b);
        *b = NULL;
    }
}

Printing a BST in Newick format and reading off the values always feels like a little bit of a hack to me (because there's no distinction between L->R and R->L in Newick...), but I have a soft spot for it, am also used to reading it, and have found it handy for debugging in the past. And your method doing the printing ought to be consistent in its ordering anyway, unless you're crazy. This also demonstrates wrapping a recursive task by splitting off the recursive work into a separate method, which is then called from publicly available one. The latter handles other tasks that are not so well suited to recursion (e.g. printing a semi-colon and newline only once and at the top level.)

void bst_newick(const bst *b)
{
    if (NULL != b)
    {
    bst_newick_rec(b);
    printf(";\n");
    }
    else
    {
    printf("NULL!\n");
    }
}

static void bst_newick_rec(const bst *b)
{
    if (NULL == b) {
    return;
    }

    if (NULL != b->left || NULL != b->right) {
    printf("(");
    if (NULL != b->left && NULL != b->right) {
        bst_newick_rec(b->left);
        printf(",");
        bst_newick_rec(b->right);
    } else if (NULL != b->left) {
        bst_newick_rec(b->left);
    } else {
        bst_newick_rec(b->right);
    }
    printf(")");
    }
    printf("%d", b->key);
}

Making a key swapping method is really just a minor convenience.

static void bst_swap_keys(bst *a, bst *b)
{
    int temp;
    if (NULL != a && NULL != b && a != b)
    {
    temp = a->key;
    a->key = b->key;
    b->key = temp;
    }
}

static void *ecalloc(size_t n, size_t s) {
    void *o = calloc(n, s);
    if (NULL == o) {
    fprintf(stderr, "Memory allocation failed!\n");
    exit(EXIT_FAILURE);
    }
    return o;
}

Please keep in mind this has basically been assembled in my coffee break, and has not been rigorously tested. I hope this helps.