Scripting in Bash is a pain. Bash can do almost anything, and is unbeatable for small scripts, but it struggles when scaled up to anything close to a real-world scripting problem. Python is a natural choice, especially for a scientist who is already using it for analysis. But basic shell tasks are much harder in Python. So scripts start out in Bash, become a mess, then get (usually poorly) ported to Python, or even worse, get run by a Python script. I've seen countless Python scripts that run Bash scripts that run real programs. I've even written one or two. It's not pretty.
I recently came (back) across a really powerful library for writing efficient command line scripts in Python: Plumbum. It contains a set of tools that makes the four main tasks of command line scripts (five with color): local commands, remote commands, command line applications, and path manipulation, simple and powerful. I will also go over the one main drawback of the library (and a possible enhancement!).
Note: The colors module is new to Plumbum in 1.6.0.
Local commands
The first and foremost part of the library is a replacement for Python's subprocess machinery (Popen, call, check_output, and friends). I'll compare the "correct, current" Python standard library method with Plumbum's method.
Basic commands
Our first task will simply be to get our feet wet with a simple command. Let's run echo to print a string. This is easy with subprocess.call:
import subprocess
subprocess.call(["echo", "I am a string"])
What just happened? The result, zero, was the return code of the call. The output of the call went to stdout, so if we were in a terminal, we would have seen it printed (and in the IPython notebook, it shows up in the terminal that started the notebook). This might be what we want, but more likely we wanted the value of the output. That would be subprocess.check_output:
subprocess.check_output(["echo", "I am a string"])
As you can already see, this not only requires different calls for different situations, but it also gives back a bytes string (which is technically correct, but almost never what you want in a shell script). The reason for the different calls is that they are shortcuts to the actual subprocess Popen object. So we really need:
p = subprocess.Popen(["echo", "I am a string"],
                     shell=False, bufsize=512,
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
outs, errs = p.communicate()
outs
As you can guess, this is only a small smattering of the options you can pass (and not all of them were needed for this call), but it gives you an idea of what is needed to work with subprocess.
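Note also that outs here is still bytes, so a typical subprocess-based script ends up sprinkling decode calls everywhere. A minimal sketch of the usual cleanup (assuming UTF-8 output):

text = outs.decode('utf-8').strip()  # bytes from communicate() need decoding
print(text)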
Let's look at Plumbum. First, let's see the fastest method to get a command:
from plumbum import local, FG, BG, TF, RETCODE
echo = local['echo']
echo("I am a string")
Here, we have a local object, which represents the local machine. It acts like a dictionary: look up a key, and you get back the command that would run if you typed that name in a terminal. Let's look at the object we get:
echo
Now this is a working Python object and can be called like any Python function! In fact, it can access almost all of the details and power of the Popen object we saw earlier. If you don't like to repeat yourself, there is a magic shortcut for getting commands:
from plumbum.cmd import echo
There is no echo command in a cmd.py file somewhere; this dynamically does exactly what we did, calling ['echo'] on the local object. This is quicker and simpler, but it is good to know what happens behind the scenes!
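If you want those Popen-level details without dropping down to Popen itself, commands also have a run() method that returns the return code, stdout, and stderr together. A minimal sketch (hedged; check your Plumbum version for the exact signature):

# run() executes the command and returns (returncode, stdout, stderr)
retcode, stdout, stderr = echo["I am a string"].run()
print(retcode, stdout)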
Plumbum also allows you to add arguments to a command without running it; as you will soon see, this lets you build complex commands just like in Bash. If you use square brackets instead of parentheses, the command doesn't run yet (Haskell users: this is currying; Pythonistas will know it as partial):
echo["I am a string"]
When you are ready, you can call it:
echo["I am a string"]()
Or, you can run it in the foreground, so that the output is sent to the current terminal as it runs (this is the subprocess.call equivalent from the beginning, although non-zero return values are not handled in the same way):
from plumbum import FG
echo["I am a string"] & FG
Complex commands (piping)
Stdin
Now, how about sending a Python string to a command's stdin? As an example, let's use the Unix dc command. It is a desktop calculator with reverse Polish notation syntax.
from plumbum.cmd import dc
We can call it using the -e flag followed by the calculation we want to perform, like 1 + 2. We already know how to do that:
dc('-e', '1 2 + p')
But it can also be run without this flag. If we do that, we can then type (or pipe) text in from the Bash shell.
In subprocess, we don't have a shortcut, so we have to use Popen, manually setting the stdin and stdout to a subprocess PIPE, and then communicate in bytes.
proc = subprocess.Popen(['dc'],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE)
outs, errs = proc.communicate('1 2 + p'.encode('ascii'))
outs
Compare that to Plumbum:
(dc << '1 2 + p')()
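The shell-style operators extend beyond stdin, too. As a sketch (the output file name is hypothetical), > sends stdout to a file just like in Bash:

from plumbum.cmd import echo, dc

# Pipe into dc and redirect the result to a file, shell-style
((echo["1 2 + p"] | dc) > 'result.txt')()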
Piping
Of course, in Bash we can pipe text from one command to another. Let's compare that (I'm not even going to try the subprocess version here).
Since I'm using IPython, prepending a line with ! causes it to run in Bash. So, in Bash:
!echo "1 2 + p" | dc
In Plumbum:
(echo["1 2 + p"] | dc)()
If we wanted to see what that command would look like in bash, then we can call print on the unevaluated object:
print(echo["1 2 + p"] | dc)
Background execution
One of the great things about Bash is the ease of "simple" multithreading; you can start a command in the background using the & character. To test this, we need a long-running command that returns a value. In Bash, we can make one with the following function:
$ fun () { sleep 3; echo 'finished'; }
$ fun
finished
$ fun &
[1] 6210
$ finished
[1]+ Done fun
Here, when we ran it in the foreground, it held up our terminal until it finished. The second time we ran it, it gave us back our terminal, but we were interrupted three seconds later with text from the process. If we wanted to interact with the process, or wait for it to finish, we could use $! to get the pid of the last spawned process, and then wait to wait on that pid (see git-all.bash for an example).
This simplicity is not usually easy to emulate in a programming language. Let's see it in Plumbum. Here, I'm piping sleep (which doesn't print anything) to echo, just to get a slow-running command, and I'm using IPython's time magic to measure the time taken:
%%time
sleep = local['sleep']
sleep_and_print = sleep['3'] | echo['hi']
print(sleep_and_print())
%%time
bg = sleep_and_print & BG
Now, bg is a Future object that is attached to the background process. We can call .poll() on it to see if it's done, or .wait() to wait until it returns. Then, we can access the stdout and stderr of the command. (stdout, etc. will automatically wait() for you, so you can use them directly.)
%%time
print(bg.stdout)
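To make the Future workflow concrete, here is a minimal sketch of polling and waiting (relying on poll() reporting completion, as described above):

bg = sleep_and_print & BG
if not bg.poll():  # non-blocking check: has the process finished?
    bg.wait()      # block until it does
print(bg.stdout)   # safe to read once the process is done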
Remote commands
Besides local commands, Plumbum provides a remote machine class for working with remote machines over SSH in a platform-independent manner. It works much like the local object, and will use the best available backend, including Paramiko, to run the processes. I haven't moved my scripts from pure Paramiko to Plumbum yet, but only having to learn one procedure for both local and remote machines is a huge plus (and Paramiko is fairly ugly to program in, like subprocess).
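As a minimal sketch (the host and user are hypothetical, and key-based SSH authentication is assumed), the remote machine object is indexed just like local:

from plumbum import SshMachine

remote = SshMachine("example.com", user="me")  # hypothetical host
r_ls = remote["ls"]   # look up a command on the remote machine
print(r_ls("-l"))     # runs over SSH and returns its stdout
remote.close()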
Command Line Applications
Command line applications in Python already have one of the best toolkits available, argparse (C++'s Boost Program Options library is a close second). However, after seeing the highly Pythonic Plumbum cli module, argparse feels repetitive and antiquated.
Let's look at a command line application that takes a couple of options. In argparse, we would need to do the following:
%%writefile color_argparse.py

import argparse

def main():
    parser = argparse.ArgumentParser(description='Echo a command in color.')
    parser.add_argument('-c', '--color', type=str,
                        help='Color to print')
    parser.add_argument('echo',
                        help='The item to print in color')
    args = parser.parse_args()
    print('I should print', args.echo, 'in', args.color, "- but I'm lazy.")

if __name__ == '__main__':
    main()
%run color_argparse.py -c red item
As you can tell from the documentation, the programs quickly grow as you try to do more advanced commands, grouping, or subcommands. Now compare to Plumbum:
%%writefile color_plumbum.py

from plumbum import cli

class ColorApp(cli.Application):
    color = cli.SwitchAttr(['-c', '--color'], help='Color to print')

    def main(self, echo):
        print('I should print', echo, 'in', self.color, "- but I'm lazy.")

if __name__ == '__main__':
    ColorApp.run()
%run color_plumbum.py -c red item
Here we see a more natural mapping of class -> program, and we get a lot more control over the switches this way as well. For example, if we want to add a validator, say to check for existing files, or to ensure a number is in a range or a word is in a set, we can do that on each switch (see the sketch below). Switches can also be full-fledged functions that run when the switch is set. And we can easily extend this process to subcommands (see git-all.py); it remains readable and avoids duplication.
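A minimal sketch of those two features (the choice set is hypothetical; cli.Set validates a value against a fixed set of words, and @cli.switch turns a method into a switch):

from plumbum import cli

class ColorApp(cli.Application):
    # The value of -c/--color must be one of the listed words
    color = cli.SwitchAttr(['-c', '--color'],
                           argtype=cli.Set("red", "green", "blue"),
                           help='Color to print')

    # A switch can also be a function, run when the flag is given
    @cli.switch(['-v', '--verbose'])
    def set_verbose(self):
        """Print more output"""
        self.verbose = True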
Path manipulations
Path manipulations using os.path functions are messy and can become involved quickly. Things that should be simple require several chained functions to get anywhere. The situation was bad enough to warrant adding an extra module to Python 3.4+, the provisional pathlib module. Now, pathlib is not a bad module, but you have to install a separate library on Python 2.7 or 3.3 to get it, and it has a few missing features. Plumbum provides a similar construct, it is automatically available if you are already using Plumbum, and it corrects two of the three missing features. The features I'm referring to are:
- No support for manipulation of multiple extensions, like .tar.gz
  - Plumbum supports an additional argument to .with_suffix(); the default matches pathlib
- No support for home directories
  - Plumbum provides the local.env.home path
- No support for using open(path) without wrapping it in a str() call
  - Can't be fixed unless path subclasses str (not likely for either library, see unipath), or pathlib support is added to the system open function (any Python devs reading? Please?)
I would love to see the pathlib module adopt the .with_suffix() addition that Plumbum has, and add some sort of home directory expansion or path as well.
Plumbum also has the unique little trick that // automatically calls glob, making path composition even simpler. I doubt we'll get this added to pathlib, but I can always hope (at least until someone removes the provisional status).
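A minimal sketch of both conveniences (the file names are hypothetical, and the exact with_suffix signature may differ between Plumbum versions):

from plumbum import local

p = local.path('archive.tar.gz')
# The extra depth argument controls how many extensions get replaced;
# passing None is meant to replace all of them
print(p.with_suffix('.zip', None))   # archive.zip

# // on a path performs a glob
for script in local.cwd // '*.py':
    print(script)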
Color support (NEW)
I've been working on a new color library for Plumbum. git-all.py has been converted to use it.
Colors are used through the Styles generated by the colors object. You can get colors and attributes like this:
from plumbum import colors
red = colors.fg.red         # Red foreground color
other_color = colors.bg(2)  # The second background color
bold = colors.bold
reset = colors.reset
You can directly access colors as if it were the fg object. Standard terminal colors can be accessed with (), and the 256 extended colors can be accessed with [] by number, name (camel case or underscore), or HTML code. All objects support with statements, which restore the normal font on exit (for a single Style, only the necessary component is reset if possible, like bold or the fg color). You can manually take the inverse (~) to get the undoing action. Calling a Style without any parameters will send it to stdout. Using | will wrap a string with a style (as will [] notation). Styles otherwise act just like normal text, so they can be added to text, etc. (they are str subclasses, after all).
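Before the HTML demo, here is a minimal sketch of those forms on a plain terminal (the strings are arbitrary; the behavior is as described above):

from plumbum import colors

with colors.red:                   # everything printed in this block is red
    print('Warning!')
print(colors.green | 'OK')         # | wraps a single string in a style
print(colors['LightBlue'] | 'hi')  # [] looks up an extended color by name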
For the following demo, I'll be using htmlcolors and a with statement to capture output in IPython and display it as HTML. (See my upcoming post for a more elegant IPython display technique.) Also note that redirect_stdout is new in Python 3.4, but it is easy to implement in other versions if needed.
from plumbum.colorlib import htmlcolors as colors
from IPython.display import display_html
from contextlib import contextmanager, redirect_stdout
from io import StringIO # Python3 name
@contextmanager
def show_html():
    out = StringIO()
    with redirect_stdout(out):
        yield
    display_html(out.getvalue(), raw=True)
Now, inside the show_html context manager, we can use colors just like on a terminal (save for needing <br/> to break lines if we don't take advantage of the built-in htmlstyle print command, and having to be careful not to leave un-reset Styles behind).
with show_html():
    colors.green.print("This is in green!")
    (colors.bold & colors.blue).print("This is in bold blue!")
    colors.bg['LightYellow'].print("This is on the background!")
    colors['LightBlue'].print("This is also from the extended color set")
    print("This is {colors.em}emphasized{colors.em.reset}! (reset was needed)".format(colors=colors), end='<br/>')
    print("This is normal")
Putting it together in an example: git-all
Now, let's look at the real-world example mentioned earlier: git-all.bash. This is a script I wrote some time ago for managing a large number of repositories in a common folder. Due to the clever way git subcommands work, simply naming this git-all and putting it in your path gives you a git all command. It is written in very reasonable Bash, IMO, and works well.
Directory manipulation
Let's look at this piece by piece and see what would be required to convert it to Python. First, this script lives in one of the repos, so we need the current directory, up one level.
In Bash that's:
unset CDPATH
SOURCE="${BASH_SOURCE[0]}"
while [ -h "$SOURCE" ]; do
DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
SOURCE="$(readlink "$SOURCE")"
[[ $SOURCE != /* ]] && SOURCE="$DIR/$SOURCE"
done
DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"
REPOLOC=$DIR/..
(Sorry for the awful highlighting by IPython, it hates the $ in strings for Bash.)
Converted to Python:
REPOLOC = local.path(__file__) / '..'
We can find the directories that are valid repos:
for file in $(ls); do
    if [[ -d $REPOLOC/$file/.git ]]; then
        ...
    fi
done
The per-repo code goes where the ... is.
In Python, lists are easy to use:
valid_repos = [d / '../..' for d in local.cwd // '*/.git/config']
The multiple ugly loops over all repos can easily translate into a generator:
def git_on_all(bold=False):
    for n, repo in enumerate(valid_repos):
        with local.cwd(repo):
            with color_change(n):
                yield repo.basename
To use it, simply loop over git_on_all():
for repo_name in git_on_all():
    print('The current working directory is in the', repo_name, 'repo!')
Command line arguments
We don't have a nice cli tool in Bash, so we have to build long if statements there. In Python we can separate each command, and the help text is built for us:
@GitAll.subcommand("pull")
class Pull(cli.Application):
'Pulls all repos in the folder, not threaded.'
def main(self):
for repo in git_on_all():
git['pull'] & FG
This is git all pull, clean and separated from the ugly loops in Bash.
Multithreading
The fetch loop, one of the strong points of the Bash script, looks like this:
if [[ $1 == qfetch ]] ||
   [[ $1 == fetch ]] ||
   [[ $1 == status ]]; then
    for file in $(ls); do
        if [[ -d $REPOLOC/$file/.git ]]; then
            cd $REPOLOC/$file
            git fetch -q &
            spawned+=($!)
        fi
    done
    echo -n "Waiting for all repos to report: "
    for pid in ${spawned[@]}; do
        wait $pid
    done
    echo "done"
fi
This does a normally advanced multithreading task in a few simple lines. In Python, we have:
def fetch():
    bg = [git['fetch', '-q'] & BG
          for repo in git_on_all()]
    print('Waiting for the repos to report: ', end='')
    for fut in bg:
        fut.wait()
    print('done')
This is just as readable, if not more so, and doesn't need the if statement to check the input, since that's now part of the cli interface. The actual version in the script can also report errors in the fetch, which the Bash version cannot.
Colors (classic tput method)
We would like to cycle colors, so each repo is printed in a different color. My final Bash solution was elegant.
Bash (you will need to run echo -n on these):
txtreset=$(tput sgr0)
txtbold=$(tput bold)
Python would be able to do the same thing (we only need to run these in the foreground, with & FG):
tput = local['tput']  # grab the tput command, just like echo earlier
txtreset = tput['sgr0']
txtbold = tput['bold']
Though with the Plumbum colors library, we don't have to.
Color changing is easy to implement with a Python context manager:
@contextmanager
def color_change(color):
    txtreset & FG
    txtbold & FG
    tput['setaf', color % 6 + 1] & FG
    try:
        yield
    finally:
        txtreset & FG
The try/finally block allows this to restore our color, even if it throws an exception! This is tremendously better than the Bash version, which leaves the color on the terminal if you make a mistake. A nice example of context managers can be found on Jeff Preshing's blog.
You can use it to wrap parts of the code that print in a color:
with color_change(1):
    print('This will be in color number 2')
Colors (new method)
Plumbum has a new colors tool, and this is how you would use it in this script:
from plumbum import colors
Colors can be generated cyclically by number, and combinations of color and attributes can be put in a with statement, too:
with colors.fg[1:7][n % 6] & colors.bold:
    ...
And, we can simply unbold:
colors.bold.reset.print(git('diff-files', '--name-status', '-r', '--ignore-submodules', '--'))
And that's it! All the benefits we had from before are here.
Final Comparison
I'll be using functions in the Python version to make it clear what each git call does, and I'm making the Python version cleaner in a few ways that I could also apply to the Bash script, so this is not meant to be a 1:1 comparison. In my defense, Bash users tend to avoid functions and other clean programming practices.
Most of the extra lines are from the Python functions. Also, I've improved a couple of the git commands to follow current best practices. I've also avoided using FG for the print commands, so that I can control the color and the paging of long output (if you swap print() for & FG, the output matches the Bash script). Here is the script: git-all.py.
Note: You might want to look at the history of that script, as I'll probably update it occasionally as I start using it.
Notice that it is very clear what each part of the cli portion of the script does, and it's easy to add a feature or extend it. The long for loops are nicely abstracted into iterators.
Also, there may be bugs for a few days while I start using this instead of my Bash script. And note that it must be renamed to git-all, with no extension, for git all status etc. to work.
Bonus: Possible improvement: argcomplete support
One last thing: the one drawback of Plumbum compared to argparse is due to an enhancement package for argparse. I think a great addition to the Plumbum library would be argcomplete support. If you've never seen argcomplete, it's a Bash completion extension for argparse. Argcomplete allows you to add validators, and will suggest completions when pressing tab on the command line.
Adding support to Plumbum would not be that hard, and probably wouldn't even require argcomplete to be installed. The argcomplete API requires three things:
- A # PYTHON_ARGCOMPLETE_OK comment near the top of the script
- Special output piped to several channels of the terminal (file descriptors 8 and 9, I believe) when the _ARGCOMPLETE environment variable is set, exiting before .main() is called
- The ability to predict the next completion
The first one is easy, and wouldn't require anything from Plumbum. The second would be a simple addition: a new method, cli.Application.argcomplete(self), that could be overridden to remove or customize argcomplete support. The final one is the hard part: predicting the possible completions. If that can be done, support could be added.
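As a rough, purely hypothetical sketch of that second piece (the method name follows the proposal above; the _ARGCOMPLETE check and early exit are how argcomplete signals and ends a completion request):

import os
import sys

class Application:
    # ... the rest of plumbum.cli.Application ...

    def argcomplete(self):
        """Hypothetical hook: emit completions and exit before main()
        when argcomplete signals a completion request."""
        if '_ARGCOMPLETE' not in os.environ:
            return            # normal run; proceed to main()
        # ... compute and print candidate completions here ...
        sys.exit(0)           # never reach main() during completion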
Because support would be built into Plumbum itself, you wouldn't need the monkey patching that argcomplete has to use to inject itself into argparse. You would still use the same Bash hooks that argcomplete uses, so it would work alongside it, being called in the same way.