
I'm reading a file in and adding each character to an array. I then break these characters down into words by removing spaces and other non-essential characters. Now, to work with each word individually, I'd like to add each word into its own array. Is there any way to do this? I've attempted to store the memory location of the start of each word, but I keep getting the address of the very start of the array. The problem is that, in the code below, the variable named 'buffer' is overwritten with a new word on each iteration of the while loop. I need to be able to reference each word in order to push it into a linked list. Here's what I have so far:

#include <stdio.h>
#include <ctype.h>

int main(int argc, char **argv) {
    char buffer[1024];
    int c;
    size_t n = 0;

    FILE *pFile = stdin;  /* fall back to stdin when no file is given */

    if (argc > 1) {
        pFile = fopen(argv[1], "r");
        if (pFile == NULL) {
            perror("Error opening file");
            return 1;
        }
    }

    while ((c = fgetc(pFile)) != EOF) {
        if (isspace(c) || ispunct(c)) {
            if (n > 0) {
                buffer[n] = 0;  /* terminate the current word */
                printf("read word %s\n", buffer);
                n = 0;          /* the next word overwrites this one */
            }
        } else if (n < sizeof buffer - 1) {  /* leave room for the terminator */
            buffer[n++] = c;
        }
    }
    if (n > 0) {
        buffer[n] = 0;
        printf("read word %s\n", buffer);
    }
    if (pFile != stdin) fclose(pFile);
    return 0;
}

If I give it a file containing the text "This is a test document that holds words for this exercise", the following is produced:

read word This
read word is
read word a
read word test
read word document
read word that
read word holds
read word words
read word for
read word this
read word exercise

3 Answers


If all you need is to store the string you've read, you could use strdup (see man strdup for more info) to make a copy of the buffer and then store the returned pointer in an array or linked list, as you mentioned.

Keep in mind that strdup uses malloc to allocate storage for each string and you must free this memory yourself when the strings are no longer needed. Also, repeatedly using malloc to allocate many small blocks of memory can be expensive, so use with caution!
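As a rough sketch of that idea, the helpers below copy the current buffer with strdup and keep the pointers in a plain array. The names save_word, free_words and MAX_WORDS are made up for this example and are not from the question. It also assumes a POSIX system, since strdup only became part of standard C in C23.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strdup() on POSIX systems */
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 256  /* hypothetical cap chosen for this sketch */

/* Copy the current contents of buffer and remember the copy in words[].
 * Returns 0 on success, -1 if the array is full or allocation fails. */
static int save_word(char *words[], size_t *count, const char *buffer)
{
    if (*count >= MAX_WORDS)
        return -1;

    char *copy = strdup(buffer);  /* allocates with malloc and copies */
    if (copy == NULL)
        return -1;

    words[(*count)++] = copy;
    return 0;
}

/* Release every copy made by save_word(). */
static void free_words(char *words[], size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(words[i]);
}
```

In the question's loop, the printf call would become something like save_word(words, &count, buffer). Each copy lives on after buffer is reused for the next word, and free_words releases them at the end.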

answered 2013-03-30T11:30:10.530

An array name is not a modifiable pointer, so you can't advance buffer itself, but pointer arithmetic applies to a pointer into it, e.g. char *p = buffer;, which you then index instead of buffer. You write 0 at the end of the word whenever you encounter a separator, which is good. Now all you need to do to have the next word in a separate region is fast-forward the pointer past that terminator to the next free position:

p += n + 1;

To make it look neater, you could discard n altogether: store each character with *p++ = c and remember where the current word started.

Then every word sits in its own region of the buffer and the words do not overlap. You can store a pointer to the beginning of each word in a linked list. Conventional string functions (e.g. strlen) work on each word despite the back-to-back packing of strings in memory, because you appended 0 at the end of every stored word. Keep in mind that the buffer now has to hold the whole text rather than a single word, so check that you don't run past its 1024 bytes.
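Here is a minimal sketch of that packing scheme, written against an in-memory string rather than a file to keep it short. The function name pack_words and its parameters are invented for this illustration, and it uses the same isspace/ispunct test as the question to decide where words end.

```c
#include <ctype.h>
#include <stddef.h>

/* Split text into words stored back to back in storage, each followed by
 * its own '\0'. starts[i] receives a pointer to the first character of
 * word i. Returns the number of words recorded (at most max_words).
 * A word that does not fit in the remaining storage is dropped. */
static size_t pack_words(const char *text, char *storage, size_t size,
                         char *starts[], size_t max_words)
{
    char *p = storage;     /* next free byte in storage */
    char *word = storage;  /* start of the word being built */
    size_t count = 0;

    for (; count < max_words; text++) {
        unsigned char c = (unsigned char)*text;
        int separator = c == '\0' || isspace(c) || ispunct(c);

        if (!separator) {
            /* always leave one byte for this word's terminator */
            if ((size_t)(p - storage) + 1 >= size)
                break;
            *p++ = (char)c;
        } else if (p > word) {
            *p++ = '\0';             /* terminate the finished word */
            starts[count++] = word;  /* remember where it began */
            word = p;                /* next word starts right after */
        }
        if (c == '\0')
            break;
    }
    return count;
}
```

The pointers in starts stay valid for as long as storage does, so they can go straight into a list without any per-word allocation. The trade-off is that storage has to be big enough for all the words at once.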

answered 2013-03-30T11:36:54.230

Looks like you've got a good start. What you are doing is successfully reading all the words into a single array one at a time and overwriting them each time.

The problem is, in the code below, the variable named 'buffer' overwrites itself with a new word with each iteration of the while loop.

Sure it does:

     if (n > 0) {
           buffer[n] = 0; // this line terminates each string
           printf("read word %s\n", buffer);
           n = 0;         // this line resets the array so you overwrite with the next
                          // word
     }

So at this point you just need to place those words into your linked list instead of overwriting them. You could store them all in the array (if it's long enough), but why bother when you'll just have to take them back out? What you really need to do at this point is replace this line:

printf("read word %s\n", buffer);

with the code to add the word into your linked list. Basically you need some sort of "node" structure. In the most basic sense, you need something like:

struct node{
   char * word;       // place to add the word
   struct node *next; // pointer to the next node
};

You'll need to allocate memory for each node, and for the string in each node, as you go. The following code assumes you have a head node pointing to the first node in your linked list and a pointer to a current node (cur) that starts at head. It needs <stdlib.h> and <string.h>, and real code should check each malloc result for NULL:

cur->next = malloc(sizeof(struct node));   // assign memory for a new node
cur = cur->next;                           // move current to the next node
cur->word = malloc(strlen(buffer) + 1);    // assign memory for the word and its '\0'
cur->next = NULL;                          // set the next pointer to NULL
strcpy(cur->word, buffer);                 // copy the word from the buffer
                                           //   to your list
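If it helps, the same steps can be wrapped in a small helper that also handles an empty list and failed allocations. This is only a sketch: list_append and list_free are names invented here, and it tracks a tail pointer instead of the cur pointer described above.

```c
#include <stdlib.h>
#include <string.h>

struct node {
    char *word;         /* heap copy of the word */
    struct node *next;  /* pointer to the next node */
};

/* Append a copy of word to the list. *head and *tail must both be NULL
 * for an empty list. Returns 0 on success, -1 on allocation failure. */
static int list_append(struct node **head, struct node **tail,
                       const char *word)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;

    n->word = malloc(strlen(word) + 1);  /* +1 for the terminating '\0' */
    if (n->word == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->word, word);
    n->next = NULL;

    if (*tail != NULL)
        (*tail)->next = n;  /* link after the current last node */
    else
        *head = n;          /* first node in the list */
    *tail = n;
    return 0;
}

/* Free every node and the word it owns. */
static void list_free(struct node *head)
{
    while (head != NULL) {
        struct node *next = head->next;
        free(head->word);
        free(head);
        head = next;
    }
}
```

With struct node *head = NULL, *tail = NULL; declared before the loop, the printf line in the question would become list_append(&head, &tail, buffer), and list_free(head) cleans up once the words are no longer needed.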
answered 2013-03-30T15:03:10.497