The plan is to read in the name of each sequence, use that name as the variable name for a dynamic array, and use malloc/realloc to store the actual sequence so that all the different sequences can be compared later. I can handle everything except the variable variable names.
C has no way to create variable names at run time, so instead of naming a variable after the sequence header/name, create a struct that holds both the sequence header/name and the sequence, e.g.:
typedef struct {
    char *header;
    char *sequence;
} fasta_t;
Then create a list of fasta_t pointers (a "pointer to pointers"):
fasta_t **fasta_elements = NULL;
Use malloc() to allocate space for N elements of type fasta_t *, e.g.:
fasta_elements = malloc(N * sizeof(fasta_t *));
It's a good idea to check if you actually got the memory you asked for:
if (!fasta_elements) {
    /* i.e., if fasta_elements is still NULL */
    fprintf(stderr, "ERROR: Could not allocate space for FASTA element list!\n");
    return EXIT_FAILURE;
}
(You should get into the habit of doing this with every pointer you malloc(), in my opinion.)
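If checking every allocation by hand gets repetitive, one common pattern is a small wrapper that performs the check once. The sketch below assumes a hypothetical helper named xmalloc() (a widespread naming convention, not a standard library function) that prints an error and exits the program when malloc() fails:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a standard function): malloc() that reports the
   failure and exits instead of handing a NULL pointer back to the caller. */
void *xmalloc(size_t size)
{
    void *ptr = malloc(size);
    /* malloc(0) may legitimately return NULL, so only treat NULL as an
       error when a non-zero size was requested */
    if (!ptr && size > 0U) {
        fprintf(stderr, "ERROR: Could not allocate %zu bytes!\n", size);
        exit(EXIT_FAILURE);
    }
    return ptr;
}
```

With it, the allocation above could be written as fasta_elements = xmalloc(N * sizeof(fasta_t *));. Exiting on failure is reasonable for a small command-line tool, but library code should usually return an error to its caller instead.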
Now that space has been allocated, read in N elements (use realloc() if the list needs to grow, but let's assume N elements for now). Within a loop, allocate space for each individual fasta_t struct, as well as space for the header and sequence strings (the char *s) within it:
#include <stdio.h>   /* fprintf, stderr */
#include <stdlib.h>  /* malloc, free, EXIT_FAILURE */
#include <string.h>  /* strlen, strcpy */

#define MAX_HEADER_LENGTH 256
#define MAX_SEQUENCE_LENGTH 4096
/* ... */
size_t idx;
char current_header[MAX_HEADER_LENGTH] = {0};
char current_sequence[MAX_SEQUENCE_LENGTH] = {0};
for (idx = 0U; idx < N; idx++)
{
    /* allocate the fasta_t struct itself (it holds the header and sequence pointers) */
    fasta_elements[idx] = malloc(sizeof(fasta_t));
    if (!fasta_elements[idx]) {
        fprintf(stderr, "ERROR: Could not allocate space for FASTA element!\n");
        return EXIT_FAILURE;
    }
    /* parse current_header and current_sequence out of FASTA input */
    /* ... */
    /* validate input -- does current_header start with a '>' character, for instance? */
    /* data in bioinformatics is messy -- validate input where you can */
    /* allocate space for the header and sequence strings, plus their terminating '\0' */
    /* sizeof(char) is redundant in C, because sizeof(char) is always 1, but I'm putting it here for completeness */
    fasta_elements[idx]->header = malloc((strlen(current_header) + 1) * sizeof(char));
    fasta_elements[idx]->sequence = malloc((strlen(current_sequence) + 1) * sizeof(char));
    if (!fasta_elements[idx]->header || !fasta_elements[idx]->sequence) {
        fprintf(stderr, "ERROR: Could not allocate space for FASTA header/sequence!\n");
        return EXIT_FAILURE;
    }
    /* copy each string (including its '\0') into the space we just allocated;
       strcpy() is safe here because each destination was sized from strlen() + 1 */
    strcpy(fasta_elements[idx]->header, current_header);
    strcpy(fasta_elements[idx]->sequence, current_sequence);
}
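If the input turns out to hold more than N records, the list can be grown with realloc(). A minimal sketch, assuming a hypothetical helper grow_fasta_list() that doubles the capacity of a fasta_t ** list each time it is called:

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    char *header;
    char *sequence;
} fasta_t;

/* Hypothetical helper: doubles *capacity and resizes *elements to match.
   Returns 0 on success, -1 on failure (the old list is left untouched). */
int grow_fasta_list(fasta_t ***elements, size_t *capacity)
{
    size_t new_capacity;
    fasta_t **tmp;
    size_t idx;

    /* refuse to grow if doubling would overflow the byte count */
    if (*capacity > SIZE_MAX / 2U / sizeof(fasta_t *)) {
        return -1;
    }
    new_capacity = (*capacity > 0U) ? *capacity * 2U : 1U;

    /* realloc() into a temporary, so the original list is not lost if it fails;
       realloc(NULL, n) behaves like malloc(n), so an empty list works too */
    tmp = realloc(*elements, new_capacity * sizeof(fasta_t *));
    if (!tmp) {
        return -1;
    }
    /* the new slots are uninitialized; set them to NULL so cleanup code can skip them */
    for (idx = *capacity; idx < new_capacity; idx++) {
        tmp[idx] = NULL;
    }
    *elements = tmp;
    *capacity = new_capacity;
    return 0;
}
```

Inside the reading loop, you would call it whenever idx reaches the current capacity, e.g. if (idx == capacity && grow_fasta_list(&fasta_elements, &capacity) != 0) { /* handle the error */ }. Assigning realloc()'s result to a temporary matters: on failure realloc() returns NULL but leaves the original block allocated, so writing the result straight back into fasta_elements would leak the existing list. Doubling keeps the number of realloc() calls logarithmic in the number of records.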
To print the header of the (i+1)th element, for example:
fprintf(stdout, "%s\n", fasta_elements[i]->header);
(Remember that indexing is 0-based in C: the 10th element has index 9, for instance.)
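This is also what replaces the "variable variable names" from the question: instead of referring to a sequence through a variable named after it, search the list for the element whose header matches. A minimal sketch, assuming a hypothetical helper find_fasta_by_header() that does a linear scan:

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char *header;
    char *sequence;
} fasta_t;

/* Hypothetical helper: returns the first element whose header equals `name`
   (which must not be NULL), or NULL if no element matches. */
fasta_t *find_fasta_by_header(fasta_t **elements, size_t count, const char *name)
{
    size_t idx;
    for (idx = 0U; idx < count; idx++) {
        /* skip empty slots and elements whose header was never set */
        if (elements[idx] && elements[idx]->header
            && strcmp(elements[idx]->header, name) == 0) {
            return elements[idx];
        }
    }
    return NULL;
}
```

The name you pass must match the stored header exactly, including the leading '>' if your parser keeps it. For a large number of sequences, a hash table, or sorting the list by header with qsort() and searching it with bsearch(), would scale better than a linear scan.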
When finished, be sure to free() the individual pointers within each fasta_t struct, then each fasta_t * pointer itself, and finally the fasta_t ** pointer to pointers:
for (idx = 0U; idx < N; idx++)
{
    /* free the strings first, then the struct that points to them */
    free(fasta_elements[idx]->header), fasta_elements[idx]->header = NULL;
    free(fasta_elements[idx]->sequence), fasta_elements[idx]->sequence = NULL;
    free(fasta_elements[idx]), fasta_elements[idx] = NULL;
}
free(fasta_elements), fasta_elements = NULL;
For convenience, once you get the hang of dealing with structs and memory management, you'll want to write wrapper functions that set up, access, edit, and break down a fasta_t * element, as well as wrapper functions that do the same for a list of fasta_t * elements.
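As a starting point, here is a minimal sketch of the set-up and break-down wrappers. The names fasta_new(), fasta_free(), fasta_list_free(), and copy_string() are hypothetical, chosen for illustration:

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *header;
    char *sequence;
} fasta_t;

/* Hypothetical helper: copies a C string into freshly malloc()ed memory
   (strdup() is POSIX and C23, so it is spelled out here for portability). */
static char *copy_string(const char *src)
{
    size_t len = strlen(src) + 1;  /* include the terminating '\0' */
    char *dst = malloc(len);
    if (dst) {
        memcpy(dst, src, len);
    }
    return dst;
}

/* Hypothetical "set up" wrapper: returns a new element, or NULL on failure. */
fasta_t *fasta_new(const char *header, const char *sequence)
{
    fasta_t *element = malloc(sizeof *element);
    if (!element) {
        return NULL;
    }
    element->header = copy_string(header);
    element->sequence = copy_string(sequence);
    if (!element->header || !element->sequence) {
        /* free(NULL) is a no-op, so a partial allocation is cleaned up safely */
        free(element->header);
        free(element->sequence);
        free(element);
        return NULL;
    }
    return element;
}

/* Hypothetical "break down" wrapper for one element; accepts NULL. */
void fasta_free(fasta_t *element)
{
    if (element) {
        free(element->header);
        free(element->sequence);
        free(element);
    }
}

/* Hypothetical "break down" wrapper for a list of `count` elements. */
void fasta_list_free(fasta_t **elements, size_t count)
{
    size_t idx;
    if (!elements) {
        return;
    }
    for (idx = 0U; idx < count; idx++) {
        fasta_free(elements[idx]);
    }
    free(elements);
}
```

fasta_new() copies its arguments, so the caller can reuse buffers such as current_header and current_sequence for the next record. Because free(NULL) is a no-op, the cleanup code does not need to know which allocations succeeded.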