Trying to detect errors in the text-file loaded...

Pinzu · Oct 12, 2014

C:

int loadFile(char* file, Match match[], int* indexMax)
{
    int d;
    system("cls");
    printf("Enter File (FileName.txt) : ");
    getStringNoSpace(file);
    d = strlen(file);
    if (d > 4 && file[d-4] == '.' && file[d-3] == 't' && file[d-2] == 'x' && file[d-1] == 't') {
        char s1[STRLEN], s2[STRLEN], s3[STRLEN], s4[STRLEN];
        FILE *fp;
        fp = fopen(file,"r");
        if (fp != NULL) {
            while (fscanf(fp, "%s %s %s %s", s1, s2, s3, s4) == 4) {
                if (createMatch(&match[*indexMax], s1, s2, s3, s4)) {
                    (*indexMax)++;
                    printf("Loaded game %d from %s...\n", *indexMax, file);
                }
                else {
                    printf("Failed to load line %d from file %s!!!\n", *indexMax + 1, file);
                }
            }
        }

    }
    pressAnyKey();
}

Basically, I want to discard every row that contains invalid entries. For each such row the following is printed:

C:

printf("Failed to load line %d from file %s!!!\n", *indexMax + 1, file);

The only issue I'm having is that every row after the invalid one also becomes invalid. I believe it's because everything in the scan gets "dislodged" and is read in the wrong order. I would like to avoid re-scanning, or whatever is going on...

Any ideas?

The text file looks like this:
2014-07-30 GFF NOR 1-2
2013-02-29 GFF MAIF 1-3 <<< this causes a friendly error. It isn't loaded because, I reckon, that date doesn't exist.
2014-03-06 GFF HBK 1-2
2014-03-06 GFF HELLO HBK 1-2 <<<< this breaks the logic of the fscanf == 4 check, and thus every line afterwards is broken.

Another example of an error would be adding spaces inside any of the entries...

GhostWolf · Oct 12, 2014

This is something you should generally do with regular expressions, but seeing as you are forcing yourself to use C, you don't have them.

Instead of directly scanning the file, you can first read the whole line (see fgets), and then scan the resulting string. That way if you get an error, it only affects that line, and you can also print the line number for more obvious errors.
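A minimal sketch of that idea, in case it helps. The names parseLine and countValidLines are made up for illustration, STRLEN is assumed to be 16 (the thread never gives its value), and the call to createMatch is left out because its definition isn't shown:

```c
#include <stdio.h>

#define STRLEN 16  /* assumed value; the thread never states it */

/* Hypothetical helper: split one already-read line into four fields.
 * Returns 1 if the line holds exactly four whitespace-separated fields,
 * 0 otherwise. Each buffer must hold at least STRLEN bytes. */
int parseLine(const char *line, char *s1, char *s2, char *s3, char *s4)
{
    char extra[2];
    /* %15s reads at most STRLEN - 1 characters. The trailing %1s only
     * succeeds when a fifth field is present, which makes n == 5. */
    int n = sscanf(line, "%15s %15s %15s %15s %1s", s1, s2, s3, s4, extra);
    return n == 4;
}

/* Reads fp one line at a time, so a bad line cannot affect the next one.
 * Prints the number of each malformed line and returns how many lines
 * were well formed. */
int countValidLines(FILE *fp)
{
    char line[4 * STRLEN + 5];
    char s1[STRLEN], s2[STRLEN], s3[STRLEN], s4[STRLEN];
    int lineNo = 0, valid = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        lineNo++;
        if (parseLine(line, s1, s2, s3, s4))
            valid++;
        else
            printf("Failed to load line %d\n", lineNo);
    }
    return valid;
}
```

With the sample file from the first post, the "GFF HELLO HBK" line would be reported on its own and the lines after it would still parse. Lines longer than the buffer are not handled here; fgets would return them in several pieces.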

On a side note, here's a nicer way to check for the extension: http://stackoverflow.com/questions/10347689/how-can-i-check-whether-a-string-ends-with-csv-in-c
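One common way to write such a check is to compare the tail of the string against the suffix. This is only a sketch; endsWith is a made-up name, not a standard function:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if str ends with suffix, 0 otherwise. */
int endsWith(const char *str, const char *suffix)
{
    size_t n = strlen(str);
    size_t m = strlen(suffix);

    if (m > n)
        return 0;                       /* suffix is longer than the string */
    return strcmp(str + (n - m), suffix) == 0;
}
```

The test in loadFile could then read `strlen(file) > 4 && endsWith(file, ".txt")`, which keeps the original requirement of at least one character before the extension.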

Dr Super Good · Oct 13, 2014

There are two problems with your code.

Firstly there is a critical buffer overflow vulnerability. If any field from the source file is larger than STRLEN - 1 characters then it will overflow your buffer. Since these buffers are stack allocated it could cause all kinds of nasty things to happen in the hands of a hacker, possibly even resulting in him taking control of the process.

To fix this you need to specify a maximum field width of STRLEN - 1. For example, "%15s" restricts each read to at most 15 characters, so it needs a buffer of 16 characters in order to hold the null terminator at the end. To use the constant STRLEN as the source of the width, you may need to build the format string at compile time with preprocessor stringification.
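Building the format string at compile time could look roughly like this. Everything here is an assumption for illustration: FIELD_MAX, STR, FIELD_FMT and scanFields are made-up names, and STRLEN is derived from FIELD_MAX because the preprocessor can only stringify a plain number, not an expression like STRLEN - 1:

```c
#include <stdio.h>

#define FIELD_MAX 15            /* assumed maximum field length */
#define STRLEN (FIELD_MAX + 1)  /* buffer size, including the null terminator */

/* Two-level stringification so FIELD_MAX is expanded before it is quoted. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Expands to "%15s" while FIELD_MAX is 15. */
#define FIELD_FMT "%" STR(FIELD_MAX) "s"

/* Hypothetical helper: reads four bounded fields from line and returns
 * sscanf's count of fields converted. */
int scanFields(const char *line, char *s1, char *s2, char *s3, char *s4)
{
    return sscanf(line, FIELD_FMT " " FIELD_FMT " " FIELD_FMT " " FIELD_FMT,
                  s1, s2, s3, s4);
}
```

Note that a field longer than FIELD_MAX is not rejected by the width alone; the remainder simply becomes the start of the next field, so the result still needs validating.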

The problem with the read breaking is caused by data misalignment. In the format string, the character ' ' matches any amount of white space, including new lines, so fscanf does not care where one line ends and the next begins. Take this example...

2014-03-06 GFF HBK 1-2
2014-03-06 GFF HELLO HBK 1-2
2013-02-29 GFF MAIF 1-3

It will be matched as:
2014-03-06 GFF HBK 1-2
2014-03-06 GFF HELLO HBK
1-2 2013-02-29 GFF MAIF
1-3 ... ... ...
As you can see, the fields have become misaligned, so every read will fail validation from then on.
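The drift can be reproduced without a file by scanning a string the same way. This is just a demonstration sketch; fourthFields is a made-up name and the 16-byte buffers are an assumed size:

```c
#include <stdio.h>

/* Scans text the way the original fscanf loop does, ignoring line breaks,
 * and copies the fourth field of each group of four into out.
 * Returns the number of complete groups read (at most maxGroups). */
int fourthFields(const char *text, char out[][16], int maxGroups)
{
    char s1[16], s2[16], s3[16];
    int used = 0, count = 0;

    /* %n stores how many characters were consumed so far; it does not
     * count towards sscanf's return value. */
    while (count < maxGroups &&
           sscanf(text, "%15s %15s %15s %15s%n",
                  s1, s2, s3, out[count], &used) == 4) {
        text += used;   /* continue right after the fourth field */
        count++;
    }
    return count;
}
```

Fed the three example lines above, the fourth fields come out as "1-2", "HBK" and "MAIF", which is the shifted grouping shown in the listing.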

The solution I would recommend is the one GhostWolf mentioned, using fgets. Allocate a read buffer of size (4 * STRLEN + 5) (the +5 is for 3 spaces, a new line and a null) and then match from that buffer. If the final character of the string is not a new line, then you know that the line was malformed (too long), and you can keep running fgets in a loop until the end of the line is reached before resuming line matching.
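A rough sketch of the too-long-line handling. readLine is a made-up helper name, and for brevity it throws away the rest of an over-long line with fgetc instead of repeated fgets calls; the effect is the same:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf (size bytes, size >= 2).
 * Returns 1 for a complete line (trailing newline stripped),
 *         0 for a line that did not fit (the rest of it is discarded),
 *        -1 at end of file or on a read error. */
int readLine(FILE *fp, char *buf, size_t size)
{
    size_t len;
    int c;

    if (fgets(buf, (int)size, fp) == NULL)
        return -1;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';        /* complete line: strip the newline */
        return 1;
    }
    if (len < size - 1)
        return 1;                   /* last line of the file, no newline */

    /* The buffer filled up before a newline was seen: skip the remainder
     * so the next call starts at the beginning of the next line. */
    while ((c = fgetc(fp)) != '\n' && c != EOF)
        ;
    return 0;
}
```

A caller would loop while the result is not -1, report the line number when it is 0, and only pass the buffer on to sscanf when it is 1.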

Kanadaj · Oct 22, 2014

Or you can use regex:
([0-9\-]+) *([a-zA-Z\ ]+) *([0-9\-]+)[\n\r]*
Afterwards you can rebuild/reuse the string as an array of elements, [0] being the whole match, [1] being the date, [2] being the middle part, and [3] being the score at the end.

Note that regex is not part of the ANSI C libraries, so you'd have to download and use an implementation. But a well-written regex is usually the least painful method for text processing, unless the job is extremely performance-sensitive.
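As one possible implementation, POSIX systems ship <regex.h> (regcomp, regexec, regfree). The sketch below assumes such a system, so it will not build with a plain Windows toolchain. matchLine is a made-up name, the 16-byte field size is assumed, and the pattern is a stricter four-group variant of the one above (two separate team names, so "GFF HELLO HBK" is rejected) rather than the exact expression:

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper built on POSIX <regex.h>.
 * Returns 1 and fills fields[0..3] (date, team, team, score) when line
 * matches the pattern, 0 otherwise. */
int matchLine(const char *line, char fields[4][16])
{
    static const char *pattern =
        "^([0-9]{4}-[0-9]{2}-[0-9]{2}) +([A-Za-z]+) +([A-Za-z]+)"
        " +([0-9]+-[0-9]+)[\r\n]*$";
    regex_t re;
    regmatch_t m[5];    /* m[0] is the whole match, m[1..4] the groups */
    int i, ok;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    ok = regexec(&re, line, 5, m, 0) == 0;
    for (i = 1; ok && i <= 4; i++) {
        size_t len = (size_t)(m[i].rm_eo - m[i].rm_so);
        if (len > 15) {             /* would not fit in a 16-byte field */
            ok = 0;
            break;
        }
        memcpy(fields[i - 1], line + m[i].rm_so, len);
        fields[i - 1][len] = '\0';
    }
    regfree(&re);
    return ok;
}
```

Compiling the pattern on every call keeps the sketch short; in a real loader it would be compiled once before the read loop. The pattern only checks the shape of the date, so something like 2013-02-29 still has to be rejected separately.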
