In many sciences one often has to process large volumes of data spread over many files which (hopefully) all follow the same format. A typical data file might have a line of environment variables followed by the raw data, which repeats in columns and/or rows, e.g.:
4,23,45,56,78
12,23,45,56,67,78,89,87,…,72
32,34,23,21,23,56,43,23,…,34
34,54,32,45,89,76,54,98,…,58
67,67,88,32,34,21,22,97,…,51
A typical section of C code which would use this might look like:
float config[8];
/* Header line: five comma-separated values; config[1] is the record count */
scanf("%f,%f,%f,%f,%f", &config[1], &config[2], &config[3],
      &config[4], &config[5]);
printf("%f,%f,%f,%f,%f\n", config[1], config[2], config[3],
       config[4], config[5]);
int RECORDS = (int) config[1];
int i;
if (RECORDS < 100) {
    int SIZE = sizeof(float) * RECORDS;
    float *deltaCSMean = malloc(SIZE);
    float *deltaCSSigma = malloc(SIZE);
    float *CQMean = malloc(SIZE);
    float *CQSigma = malloc(SIZE);
    float *etaMean = malloc(SIZE);
    float *etaSigma = malloc(SIZE);
    float *brdF2 = malloc(SIZE);
    float *brdF1 = malloc(SIZE);
    float *amplitude = malloc(SIZE);
    for (i = 0; i < RECORDS; i++) {
        /* One record per line: nine comma-separated floats */
        scanf("%f,%f,%f,%f,%f,%f,%f,%f,%f", &deltaCSMean[i],
              &deltaCSSigma[i], &CQMean[i], &CQSigma[i],
              &etaMean[i], &etaSigma[i],
              &brdF2[i], &brdF1[i], &amplitude[i]);
        printf("%f,%f,%f,%f,%f,%f,%f,%f,%f\n", deltaCSMean[i],
               deltaCSSigma[i], CQMean[i], CQSigma[i],
               etaMean[i], etaSigma[i], brdF2[i], brdF1[i], amplitude[i]);
    }
}
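Since the files only hopefully follow the same format, it may be worth checking how many fields each read actually converted; the scanf calls above ignore that count. The following is a minimal sketch of one way to do this, not part of the original program: the helper name read_record and the constant FIELDS_PER_RECORD are made up for illustration, and it assumes nine comma-separated floats per record as in the loop above.

```c
#include <assert.h>
#include <stdio.h>

#define FIELDS_PER_RECORD 9  /* hypothetical name; matches the nine %f fields */

/* Read one comma-separated record of nine floats from `in` into `out`.
 * Returns 1 if all nine fields were converted, 0 otherwise (short line,
 * non-numeric field, or end of file). Hypothetical helper for illustration. */
int read_record(FILE *in, float out[FIELDS_PER_RECORD])
{
    assert(in != NULL && out != NULL);
    /* fscanf returns the number of successful conversions, or EOF */
    int converted = fscanf(in, "%f,%f,%f,%f,%f,%f,%f,%f,%f",
                           &out[0], &out[1], &out[2], &out[3], &out[4],
                           &out[5], &out[6], &out[7], &out[8]);
    return converted == FIELDS_PER_RECORD;
}
```

Inside the loop one could call read_record(stdin, rec) and stop with a message on stderr when it returns 0, rather than carrying on with partly filled arrays.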
Using redirection, a simple invocation of the executable foo with input foo1.txt and output foo2.txt would of course be:
./foo < foo1.txt > foo2.txt
which is fine until one needs different data from the file. Supposing that in this example the data repeats along columns, one can use awk and a pipe to reshape the input, removing the need to edit and recompile the C source for different data subsets:
gawk 'BEGIN{FS=","; print "4,12,23,44,56"; getline;} {for(i=lowBnd;i<upBnd;i++){printf("%f,",$i) }; printf("%f\n",$upBnd) }' foo1.txt | ./foo > foo2.txt
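where lowBnd and upBnd correspond to the data column limits in the input file (replace them with literal column numbers, or set them with awk's -v option). Note that the environment variables are written during the BEGIN block, where the getline also skips the file's original first line, allowing the specification of (for instance) a different job for the new data set.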
Posted by bbrouwer