Does anyone have any awk, sed, or grep fu for parsing DO billing info (the Billing history PDF) to more easily invoice clients?
If you’re using Ubuntu, you can install poppler-utils and use pdftotext which will convert a PDF to a text file. You could then process the text file as you see fit.
To keep the layout the same as it’s shown on the invoice, you’d use something such as:
pdftotext -layout my.pdf
The above will create a file called my.txt in the same directory, which is the text-based version of the PDF.
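If you convert invoices regularly, the install check and the conversion can be wrapped in a small shell function. This is only a sketch: the function name pdf_to_text is made up for illustration, and it assumes an Ubuntu-style system where pdftotext comes from the poppler-utils package.

```shell
# Hypothetical helper (not part of poppler-utils): convert a PDF to text
# with the layout preserved, and print the name of the file it wrote.
pdf_to_text() {
    pdf_file="$1"
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found; on Ubuntu try: sudo apt-get install poppler-utils" >&2
        return 1
    fi
    # my.pdf becomes my.txt, next to the original file.
    txt_file="${pdf_file%.pdf}.txt"
    pdftotext -layout "$pdf_file" "$txt_file" && printf '%s\n' "$txt_file"
}
```

Calling pdf_to_text my.pdf should write my.txt alongside the PDF and print that file name, so the result can be passed straight to grep or awk.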
Once my.txt is created, you could use something such as:
grep 'droplet-name' my.txt
… and get output that looks something like this:
droplet-name (4GB) 125 01-01 00:00 01-06 05:21 $7.44
droplet-name (4GB Backup Services) 01-01 00:00 01-06 05:21 $0.37
If you need the hours a Droplet was in use, something such as the following would work (a very basic example, without going into regex or similar):
grep 'droplet-name (4GB)' my.txt | awk '{print $3}'
You could then grab the total cost (i.e. $7.44) by running:
grep 'droplet-name (4GB)' my.txt | awk '{print $8}'
Essentially, we’re just counting the whitespace-separated columns, which is where $3 and $8 come from. Note that awk counts (4GB) as column $2, so the hours (125) land in $3.
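Counting columns from the left breaks as soon as the text in parentheses contains spaces (as in the Backup Services line), so it can be safer to strip the parentheses first and read the cost from the last column. The sketch below does that with awk alone. The function names droplet_hours and droplet_total are made up for illustration, and the sample rows are simply the two lines of grep output shown earlier; a real my.txt may be laid out differently.

```shell
# Sample rows copied from the grep output shown earlier in this answer.
sample=$(mktemp)
cat > "$sample" <<'EOF'
droplet-name (4GB) 125 01-01 00:00 01-06 05:21 $7.44
droplet-name (4GB Backup Services) 01-01 00:00 01-06 05:21 $0.37
EOF

# Print the hours column for rows that mention the Droplet.
# Assumes a usage row has 7 columns once the "(...)" part is removed.
droplet_hours() {
    awk -v name="$1" 'index($0, name) {
        gsub(/\([^)]*\)/, "")      # drop "(4GB)" and similar; awk re-splits the line
        if (NF == 7) print $2
    }' "$2"
}

# Sum the last column (the cost) over every row that mentions the Droplet.
droplet_total() {
    awk -v name="$1" 'index($0, name) {
        cost = $NF
        sub(/^\$/, "", cost)       # "$7.44" becomes "7.44"
        total += cost
    }
    END { printf "%.2f\n", total }' "$2"
}

droplet_hours droplet-name "$sample"   # prints 125
droplet_total droplet-name "$sample"   # prints 7.81
```

Using index() rather than a regex match means the Droplet name is treated as a plain string, so dots or dashes in the name are not interpreted as pattern characters.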
That being said, once the PDF is converted to text, you may be better off using your programming language of choice to process the data. Processing it with bash can get complex and finicky, as you need to handle various scenarios depending on the tools you use.
For example, with PHP (chosen as that’s what I’m working with right now), we could use something like the mini-script I have below. It’s designed to run from the CLI using:
php name-of-script.php
… and accepts three arguments. The first is the path to the text file you created with the pdftotext command above, the second is the Droplet name (or hostname), and the third is optional and defaults to false; if true is passed as the third argument, you get a JSON-encoded string instead of an array of data.
Usage:
php name-of-script.php /path/to/my.txt droplet_name
or
php name-of-script.php /path/to/my.txt droplet_name true
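<?php
// Collect every line of $file that mentions $droplet and split it into columns.
// Returns an array of rows, or a JSON-encoded string when $json is true.
function getDropletBilling( $file, $droplet, $json = false )
{
    $client = [];
    $data   = file_exists( $file ) ? fopen( $file, 'r' ) : false;
    if ( $data )
    {
        while ( ( $line = fgets( $data ) ) !== false )
        {
            if ( strpos( $line, $droplet ) !== false )
            {
                // Strip anything inside (), e.g. "(4GB)", so the columns line up.
                $dropletData = preg_replace( "/\([^)]+\)/", '', $line );
                $dropletData = explode( ' ', $dropletData );
                // Drop the empty strings left by runs of spaces, then reindex from 0.
                $dropletData = array_filter( $dropletData, 'strlen' );
                $dropletData = array_values( $dropletData );
                $client[]    = $dropletData;
            }
        }
        // Only close the handle if it was actually opened.
        fclose( $data );
    }
    return $json ? json_encode( $client ) : $client;
}
// CLI entry point: php name-of-script.php /path/to/my.txt droplet_name [true]
if ( ! empty( $argv ) )
{
    array_shift( $argv );
    $count = count( $argv );
    if ( $count < 2 || $count > 3 )
    {
        throw new InvalidArgumentException(
            'Function: getDropletBilling() expects two or three arguments, ' . $count . ' given.'
        );
    }
    // Treat "true", "1", "yes" and "on" as true; anything else as false.
    $json   = $count === 3 && filter_var( $argv[2], FILTER_VALIDATE_BOOLEAN );
    $result = getDropletBilling( $argv[0], $argv[1], $json );
    // A top-level return prints nothing, so output the result explicitly.
    if ( $json )
    {
        echo $result, PHP_EOL;
    }
    else
    {
        var_dump( $result );
    }
}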
As an example, if you don’t pass true, it’ll return an array containing one array per line of the text file that mentions droplet_name. Because of array_values(), each row is indexed from 0, and the last element keeps the line’s trailing newline (which is why $7.44 is reported with a length of 6). For example:
array(2) {
  [0]=>
  array(7) {
    [0]=>
    string(12) "droplet_name"
    [1]=>
    string(3) "125"
    [2]=>
    string(5) "01-01"
    [3]=>
    string(5) "00:00"
    [4]=>
    string(5) "01-06"
    [5]=>
    string(5) "05:21"
    [6]=>
    string(6) "$7.44
"
  }
  [1]=>
  array(6) {
    [0]=>
    string(12) "droplet_name"
    [1]=>
    string(5) "01-01"
    [2]=>
    string(5) "00:00"
    [3]=>
    string(5) "01-06"
    [4]=>
    string(5) "05:21"
    [5]=>
    string(6) "$0.37
"
  }
}
If you pass true, which returns a JSON-encoded string, the above will look like:
[["droplet_name","125","01-01","00:00","01-06","05:21","$7.44\n"],["droplet_name","01-01","00:00","01-06","05:21","$0.37\n"]]
The results detail:
…
Of course, the above is just one way of doing it without grep and awk. You may not want to use PHP, and that’s, of course, perfectly okay :-). This is just showing you how it could be done. The results could be refined further; for the purpose of this example, I chose to strip out anything inside (), so the size of the Droplet isn’t included in the result set.
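Since the end goal is invoicing, one possible refinement is to emit CSV that a spreadsheet can open, keeping the text inside () as its own column instead of discarding it. The following is a rough awk sketch of that idea. The function name billing_to_csv and the column headings are made up for illustration, and it assumes rows shaped like the two sample lines from earlier in this answer.

```shell
# Sample rows copied from the grep output shown earlier in this answer.
csv_sample=$(mktemp)
cat > "$csv_sample" <<'EOF'
droplet-name (4GB) 125 01-01 00:00 01-06 05:21 $7.44
droplet-name (4GB Backup Services) 01-01 00:00 01-06 05:21 $0.37
EOF

# Hypothetical helper: print one CSV row per line that mentions the Droplet.
billing_to_csv() {
    awk -v name="$1" '
    BEGIN { OFS = ","; print "item", "description", "hours", "start", "end", "cost" }
    index($0, name) {
        desc = ""
        # Move the "(...)" text into its own column and cut it out of the line.
        if (match($0, /\([^)]*\)/)) {
            desc = substr($0, RSTART + 1, RLENGTH - 2)
            $0 = substr($0, 1, RSTART - 1) substr($0, RSTART + RLENGTH)
        }
        if (NF < 6) next                  # not a billing row we recognise
        hours = (NF == 7) ? $2 : ""       # backup rows have no hours column
        print $1, desc, hours, $(NF-4) " " $(NF-3), $(NF-2) " " $(NF-1), $NF
    }' "$2"
}

billing_to_csv droplet-name "$csv_sample"
```

With the sample rows this prints a header line followed by one row per match, for example droplet-name,4GB,125,01-01 00:00,01-06 05:21,$7.44. The dates and cost are read from the end of the line, so the optional hours column doesn't shift them.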