Glorious Gawk part II

Here’s an excerpt from a shell script that extracts the line-drawing and text segments from a *.ps file after conversion from *.pdf. It uses several gawk/awk tricks, including range patterns to bracket sections and field counts of records to discriminate lines from polylines. Your mileage may vary a little, but if you check the preamble of your *.ps file, you should be able to translate this to your specific tasks.
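To see the two main tricks in isolation, here is a minimal sketch run against a tiny made-up sample. The sample file contents are my assumption, a hypothetical stand-in for pdf2ps output, so real files will look different. The threshold of 12 fields is the one the script uses.

```shell
#!/bin/sh
# Hypothetical stand-in for a fragment of pdf2ps output; real files differ.
sample=$(mktemp)
cat > "$sample" <<'EOF'
q
1 0 0 scale
10 20 m
30 40 l
S
Q
q
1 0 0 scale
10 20 m
30 40 l
50 60 l
70 80 l
S
Q
EOF

# Range pattern: select every line from one whose 4th field is "scale"
# through the next one whose 1st field is "Q", inclusive.
ranges=$(awk '$4=="scale",$1=="Q" {print NR}' "$sample")

# Record splitting: with RS="q" each drawing block becomes one record,
# and NF counts its whitespace-separated tokens across all its lines.
kinds=$(awk 'BEGIN { RS = "q" }
             NF == 0 { next }   # skip the empty record before the first "q"
             { print NR, NF, (NF > 12 ? "polyline" : "line") }' "$sample")

echo "$ranges"
echo "$kinds"
rm -f "$sample"
```

One thing to watch with RS = "q": any lowercase q anywhere in the input splits a record, so this only behaves if q appears solely as the block opener.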


#!/bin/sh
# line/text extract from pdf wjb 12/08, 02/08

NOFILE=64
[ -z "$1" ] && echo "foo.sh <filename>" && exit $NOFILE

myfile=${1}

#convert to ps

echo "converting pdf -> ps…"

pdf2ps "$myfile" gHjLz.ps

echo "…done"

#take out line drawing sections w/ line numbers

echo "extracting lines & text…"

awk '$4=="scale",$1=="Q" {print NR > "gHjLq.txt"}' gHjLz.ps
awk '$4=="scale",$1=="Q" {print $0 > "gHjLp.txt"}' gHjLz.ps
awk '$1=="q" {print NR > "test.txt"}' gHjLz.ps

# _p == polylines,  _l == lines

awk 'BEGIN { RS = "q" } ; {if (NF > 12) print NR,$0 > "gHjLp_p.txt"; else print NR,$0 > "gHjLp_l.txt"}' gHjLp.txt

awk '$4=="scale" {print $1 > "gHjLp_foo.txt"}' gHjLp_p.txt

awk '$4=="scale",$1=="Q" {print NR,$0 > "gHjLp_mol.txt"}' gHjLz.ps

#take out text w/ line numbers

awk '$4 == "," {print NR,$0 > "gHjLp_text.txt"}; $1 == "$C" {print NR,x > "gHjLp_text.txt"}; {x=$0}; $1 == "$C", $1 == "," {print NR,$0 > "gHjLp_text.txt"}' gHjLz.ps
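The {x=$0} rule in that last command is the "remember the previous line" trick: because it comes after the $1 == "$C" rule, x still holds the preceding line when the marker is matched. A minimal sketch, assuming a made-up three-line sample (the contents are hypothetical, only the $C marker comes from the script):

```shell
#!/bin/sh
# Hypothetical sample; only the "$C" marker is taken from the script above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
first text line
$C marker
second text line
EOF

# Rules run in order for each line, so x is updated only after the test.
prev=$(awk '$1 == "$C" { print NR, x }   # x still holds the previous line here
            { x = $0 }                    # runs on every line, after the test
           ' "$sample")

echo "$prev"
rm -f "$sample"
```

Note that NR is the line number of the marker itself, not of the remembered line, which matches what the script prints.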
