I'm using Tabula (specifically the command-line version, tabula-java) to extract data from PDFs. I have a bash script that calls tabula-java four times per PDF. It's a slow process (about 10 seconds per PDF), and I have almost 200,000 PDFs to process, so I was hoping to speed things up by using drip.
Unfortunately, my script hangs when I use drip. When I pipe tabula's output to tr (translate), the pipeline never finishes. Here's one of the tabula calls that hangs when piped to tr:
export id_value=$(drip -cp tabula-0.8.0-jar-with-dependencies.jar technology.tabula.CommandLineApp -a 240.593,124.695,264.308,227.97 -p 1 $filename | tr -d '\r\n')
By "hangs" I mean that tr starts but never exits. Control-C gets me back to the prompt.
The script works just fine when I avoid drip and call tabula through java:
export id_value=$(java -cp tabula-0.8.0-jar-with-dependencies.jar technology.tabula.CommandLineApp -a 240.593,124.695,264.308,227.97 -p 1 $filename | tr -d '\r\n')
Details: OS X 10.8.5, tabula-java 0.8.0
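One way to narrow this down is to take the pipe out of the picture: write the command's output to a temporary file, wait for the command to return, and only then run tr on the file. The sketch below does that. `capture_stripped` is a hypothetical helper name, not part of drip or tabula-java, and it assumes `mktemp` is available.

```shell
# Hypothetical helper: run a command with its stdout sent to a temp file
# instead of a pipe, then strip CR/LF from the captured output.
capture_stripped() {
  local tmp status
  tmp=$(mktemp) || return 1
  "$@" > "$tmp"          # run the command given as arguments; no pipe involved
  status=$?
  tr -d '\r\n' < "$tmp"  # tr reads a regular file, so it sees end-of-file
  rm -f "$tmp"
  return "$status"       # pass the command's exit status back to the caller
}
```

It would be called as `export id_value=$(capture_stripped drip -cp tabula-0.8.0-jar-with-dependencies.jar technology.tabula.CommandLineApp -a 240.593,124.695,264.308,227.97 -p 1 "$filename")`. If the hang disappears, that would suggest tr was waiting for end-of-input on a pipe that something still holds open. If the drip call itself never returns, the problem lies elsewhere.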