Benjamin Han: UNIX Tips for Mac OS X

UNIX Tips for Mac OS X


Here is a list of short tips on using various UNIX tools under Mac OS X - some of them might just be reminders for myself. Some of them are also applicable to other flavors of UNIX. Comments are welcome (email address at the bottom).

20060914: (10.3.x+ only) How do I split a PDF file into several from command-line? (updated 20060921)
20060828: (10.4.x+ only) How do I join multiple PDFs into one from command-line? (updated 20060914)
20060601: How do I get my IP address? (updated 20060603)
20060531: Apple: Shell Scripting Primer (external)
20060525: How do I mass-rename file extension? (variable substitution in bash)
20060525: What are all those processes? (external)
20060414: How do I backup my stuff to an external drive using rsync? (updated 20060531)
20060414: How do I find the pid of a process by its name?
20060303: What if the "Open With" option in Finder gives you duplicate apps or misses some app?
20050718: Reloading Cisco VPN kernel extension
20050713: (10.4.x only) Apple: Prevent .DS_Store file creation over network connections (external)
20040427: Setting environment variables for GUI apps
20040215: Apple: "Well Known" TCP and UDP Ports Used By Apple Software Products (external)
20031218: Create a disk image file for a folder, using hdiutil
20031208: Tell if a process is still alive (updated 20060414)
20031128: Send an email to a bunch of people from command-line (updated 20060828)
20031128: Clean up .DS_Store files
20031121: Open a file with an app from the command-line
20031117: Make focus follow mouse in Terminal.app
20031117: Apple: do shell script in AppleScript (external)
20031116: How do I avoid automatic launching of Xterm whenever I start Apple's X11?
20031116: How do I add an alias to an IP address? (updated 20031123)
20031116: How do I change my default shell? (updated 20060519)
20031116: How do I suspend/resume a process (even the GUI ones)? (updated 20060519)
20031116: Building/installing the latest Emacs for OS X (native) from CVS (updated 20060319)
20031116: How do I enable the root account? (updated 20060519)
20031116: Apple: OS X boot process (external, updated 20060414)
20031116: Firewall and iTunes sharing - what ports to open?
20031115: Who are listening to my tunes?

(10.3.x+ only) How do I split a PDF file into several from command-line?

Looking at the source code of join.py mentioned in this tip, I realized it would be easy to adapt it into a script that does the opposite: split a PDF file into several files at a given sequence of splitting points (page numbers). That is exactly what I did: you can download the script splitPDF.py and use it like this (make sure you do the "chmod a+x splitPDF.py" dance first):

splitPDF.py input.pdf splitPageNum_1 ... splitPageNum_n

This will split the file input.pdf into (n + 1) files. This is best illustrated by an example:

splitPDF.py input.pdf 3 5

Assuming input.pdf has 10 pages, you will get three files back: input.part1.1_3.pdf contains pages 1-3, input.part2.4_5.pdf contains pages 4-5, and input.part3.6_10.pdf contains pages 6-10 (note how the page ranges are part of the output filenames).

I should mention that each splitPageNum_i should be an integer between 1 and the number of pages of input.pdf (inclusive), and the entire sequence must be strictly increasing. Lastly this script should work on both Panther and Tiger (Mac OS X 10.3.x/10.4.x).
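To make the splitting rule concrete, here is a minimal Python sketch of how the page ranges and output names in the example could be derived. It is not the code of splitPDF.py: the function names split_ranges and part_names are made up for illustration, and skipping an empty last part (when the final split point equals the page count) is an assumption.

```python
def split_ranges(num_pages, split_points):
    """Return (first, last) page ranges, 1-based and inclusive."""
    if any(not 1 <= p <= num_pages for p in split_points):
        raise ValueError("split points must lie between 1 and num_pages")
    if any(a >= b for a, b in zip(split_points, split_points[1:])):
        raise ValueError("split points must be strictly increasing")
    ranges = []
    first = 1
    for point in split_points:
        ranges.append((first, point))
        first = point + 1
    # Assumption: an empty trailing part is simply skipped.
    if first <= num_pages:
        ranges.append((first, num_pages))
    return ranges


def part_names(stem, ranges):
    """Build names in the input.partK.FIRST_LAST.pdf style shown above."""
    return ["%s.part%d.%d_%d.pdf" % (stem, k, first, last)
            for k, (first, last) in enumerate(ranges, start=1)]


print(part_names("input", split_ranges(10, [3, 5])))
```

For the 10-page example this yields the same three names as listed above.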

For completeness' sake, if you have ghostscript installed (possibly via Fink), this is how you can extract pages from a PDF file (all on one line):

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dFirstPage=3 -dLastPage=5 -sOutputFile=input.3_5.pdf input.pdf

This will extract pages 3-5 (inclusive) from input.pdf into a new file input.3_5.pdf.

(10.4.x+ only) How do I join multiple PDFs into one from command-line?

(Thanks to Stan Jou for pointing this script to me)

Sometimes we need to join (combine, concatenate) multiple PDF files into one. There are several ways to do this without buying an extra piece of software. If you're a Tiger (OS X 10.4.x) user, things are even easier: it turns out a Python script has already been written for us by those kind Apple engineers. The script is located at

/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py

(Actually the same script should work on Panther (10.3.x) - but I cannot distribute it here)

You can make the script a bit easier to use by creating a symlink to it in a convenient place (all on one line - note the backslashes that escape the spaces):

ln -s /System/Library/Automator/Combine\ PDF\ Pages.action/Contents/Resources/join.py joinPDF.py

Then you can use it to join, say input1.pdf and input2.pdf into a file final.pdf like this:

./joinPDF.py -o final.pdf input1.pdf input2.pdf

final.pdf then is a concatenation of input1.pdf and input2.pdf, in that order.

If you look at the source code, there is another option available: --shuffle. (The options --preview and --append are not really implemented, though I guess the latter is just equivalent to the script's default behavior.) I'll just quote the explanation from the code:

Take a page from each PDF input file in turn before taking another from each file. If this option is not specified then all of the pages from a PDF file are appended to the output PDF file before the next input PDF file is processed.
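The difference between the two orderings can be sketched in a few lines of Python. This is only an illustration of the quoted description, not the code in join.py: pages are stood in for by strings, and carrying on with the remaining files once a shorter file runs out of pages is an assumption.

```python
from itertools import chain, zip_longest

_MISSING = object()  # placeholder for files that have run out of pages


def append_order(documents):
    """Default behavior: all pages of one file, then the next file."""
    return list(chain.from_iterable(documents))


def shuffle_order(documents):
    """--shuffle behavior: one page from each file in turn."""
    rounds = zip_longest(*documents, fillvalue=_MISSING)
    return [page for rnd in rounds for page in rnd if page is not _MISSING]


a = ["a1", "a2", "a3"]
b = ["b1", "b2"]
print(append_order([a, b]))   # ['a1', 'a2', 'a3', 'b1', 'b2']
print(shuffle_order([a, b]))  # ['a1', 'b1', 'a2', 'b2', 'a3']
```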

Just for completeness sake, if you have ghostscript installed (possibly via Fink), this is how you can join multiple PDF files (all in one line):

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=final.pdf input1.pdf input2.pdf

How do I get my IP address?

Many people are asking so I'm posting this Python script for getting both the internal IP and the external IP of your Mac: these two IPs might be different if you are behind a router or a NAT device. Most people will be interested in the external IP since this is the IP the "outside world" sees you from. Here is the script:

#!/usr/bin/env python

import urllib, re, sys, os

# if this changes we need to revise the code to get the external IP
ip_telling_url = 'http://www.dyndns.org/cgi-bin/check_ip.cgi'

if len(sys.argv) == 1:
  # get the external IP
  mo = re.search(r'\d+\.\d+\.\d+\.\d+', urllib.urlopen(ip_telling_url).read())
  if mo:
    print mo.group()
  else:
    print 'Cannot get the external IP!'
else:
  # get the internal IP of an interface
  targetInt = sys.argv[1]
  output = os.popen('ipconfig getifaddr %s 2>&1' % targetInt).read().strip()
  if re.match(r'\d+\.\d+\.\d+\.\d+', output):
    print output
  else:
    print 'Cannot get the internal IP for interface \'%s\'' % targetInt

As usual save this script to a file say getip.py and do a "chmod a+x getip.py" to make it executable. To get the external IP, do this:

./getip.py

To get the internal IP of a specific network interface, say en0, do this:

./getip.py en0

Don't know what network interfaces are? They are the ethernet cards inside your Mac. For example on my Powerbook I have en0 for the ethernet gigabit interface (wired), and en1 for the AirPort interface (wireless).

How do I mass-rename file extension? (variable substitution in `bash`)

Have you ever wanted to change file extensions over a bunch of files with the same extension, e.g., change *.doc to *.txt? You certainly don't want to do it one file at a time...

It turns out that bash (the default shell in OS X) has some nifty tricks to save us - it's called variable/parameter substitution. Some of the most useful ones are:

"${var#pattern}" and "${var##pattern}": Removes from the beginning of $var the part that matches pattern; '#' removes the shortest possible match while '##' removes the longest possible match.
"${var%pattern}" and "${var%%pattern}": Removes from the end of $var the part that matches pattern; '%' removes the shortest possible match while '%%' removes the longest possible match.
"${var/pattern/replacement}" and "${var//pattern/replacement}": Replaces the first match ('/' version) or all matches ('//' version) with replacement.

Don't know what the heck that means? We'll conjure the second one ('%') to do the work for us. Create the following script chgext:

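#!/bin/sh

# usage: chgext <old extension> <new extension>
for f in *.$1
do
  # quote the names so that files with spaces in them survive
  mv "$f" "${f%$1}$2"
done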

As usual do a "chmod a+x chgext" to make it executable. For our example (change *.doc into *.txt), use this command:

./chgext doc txt

To keep the tip short I'll only mention one more thing: you can use wildcard '*' in pattern - that will match all possible strings. But pattern is not a regular expression - '.' won't be interpreted as it would in a regex (so it'll only match a dot character).
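If the shortest/longest distinction is still fuzzy, the following Python sketch mimics the four removal forms for the common ".*" and "*." patterns, plus the substitution used in chgext. The Python names are made up for illustration, and the partition-based lines only match the shell's behavior when the name actually contains a dot.

```python
name = "archive.tar.gz"

shortest_suffix_removed = name.rpartition(".")[0]  # like ${name%.*}
longest_suffix_removed = name.partition(".")[0]    # like ${name%%.*}
shortest_prefix_removed = name.partition(".")[2]   # like ${name#*.}
longest_prefix_removed = name.rpartition(".")[2]   # like ${name##*.}

print(shortest_suffix_removed)  # archive.tar
print(longest_suffix_removed)   # archive
print(shortest_prefix_removed)  # tar.gz
print(longest_prefix_removed)   # gz


def change_extension(filename, old_ext, new_ext):
    """Mimic "${f%$1}$2" for a literal (wildcard-free) old extension."""
    if old_ext and filename.endswith(old_ext):
        return filename[:-len(old_ext)] + new_ext
    # As in the shell, an unmatched pattern leaves the name untouched
    # before the new extension is appended.
    return filename + new_ext


print(change_extension("report.doc", "doc", "txt"))  # report.txt
```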

How do I backup my stuff to an external drive using `rsync`?

Not everyone knows that in Mac OS X you don't need to buy expensive software to do incremental backup. Using the UNIX command rsync, you can perform intelligent incremental backup, meaning you only update your backup files with their latest versions - no wasteful copying of the same files.

rsync is a very flexible and powerful tool - you can even do backup with a remote server. But I'll only show how you can backup your files to an external drive (or to a different directory) - do a 'man rsync' to learn the other goodies it offers. Here is the little script for this purpose:

#!/bin/sh

SOURCE_DIRS="Documents:Music:Pictures:Library/Mail:Downloaded stuff"
TARGET_DIR="/Volumes/External Drive"

# if the external drive is not there, complain and stop
if [ ! -e "$TARGET_DIR" ]
then
  echo Target directory does not exist!
  exit 1
fi

# split $SOURCE_DIRS on colons only, so spaces in folder names survive
IFS=:

pushd .
cd ~/
/usr/bin/rsync -E --delete --progress -av $SOURCE_DIRS "$TARGET_DIR"
popd

SOURCE_DIRS is a list of folders you want to back up. They are specified relative to your home folder and are separated by colons (':'), so in the script the directory "~/Downloaded stuff" will be backed up (note that a space in a directory's name is okay). TARGET_DIR is the place where you want to store the backup files: in this case an external drive named "External Drive" is used (again, a space in the path is okay), and the backup files will be deposited directly under the root directory of that drive. Feel free to customize both variables to suit your needs. (Thanks to Paul Henrich for pointing me to the space-related problems.)

Note a crucial option is added to the rsync line (thanks to Patrick Cunningham and Brian Ashe): the '-E' switch is a special addition to the Mac's built-in rsync, which copies extended attributes and resource forks that are used in the HFS/HFS+ filesystem. To make sure the right version of rsync is used, I hard coded the path of rsync in the script.

To run it, save the file into say 'backup', and make it executable (chmod a+x ./backup). After running it you should expect to have an exact replica of the specified folders on your external drive.
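To see why splitting on colons keeps folder names with spaces intact, here is a small Python sketch that builds the same argument list the script hands to rsync. The function name rsync_command is made up for illustration; the options are copied from the script above, including '-E' as provided by the Mac's built-in rsync.

```python
def rsync_command(source_dirs, target_dir):
    """Build the argument list the backup script passes to rsync."""
    # Splitting on ':' only (the IFS=: trick) keeps "Downloaded stuff" in one piece.
    sources = source_dirs.split(":")
    return (["/usr/bin/rsync", "-E", "--delete", "--progress", "-av"]
            + sources + [target_dir])


cmd = rsync_command("Documents:Music:Pictures:Library/Mail:Downloaded stuff",
                    "/Volumes/External Drive")
print(cmd)

# To actually run it from the home folder, as the script does with "cd ~/":
# import os, subprocess
# subprocess.call(cmd, cwd=os.path.expanduser("~"))
```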

How do I find the pid of a process by its name?

In UNIX, many chores involving processes require that you know the pid (process ID) of the target process before you can do anything with it. Here is a simple script you can use to find the pid of a process by its name:

#!/bin/sh

ps axc|awk "{if (\$5==\"$1\") print \$1}"|tr '\n' ' '

Save it into a file, say 'pidof', and make it executable (chmod a+x pidof). Then use it like this (assuming you're in the same directory as that of pidof):

./pidof Finder

That will give you the pid of the Finder process.
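The same filtering can be sketched in Python, which also makes it easier to handle process names that contain spaces (awk's $5 only sees the first word of the command column). The sample text below is made up to resemble "ps axc" output, assuming the five columns PID, TT, STAT, TIME and COMMAND that the awk script relies on; pids_by_name is a hypothetical helper.

```python
def pids_by_name(ps_output, name):
    """Return the pids whose command column equals name."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)         # PID TT STAT TIME COMMAND
        if len(fields) == 5 and fields[4].strip() == name:
            pids.append(int(fields[0]))
    return pids


# Made-up sample in the layout assumed above.
SAMPLE = """\
  PID  TT  STAT      TIME COMMAND
    1  ??  Ss     0:01.23 launchd
  215  ??  S      0:10.50 Finder
  230  ??  S      1:02.00 Activity Monitor
"""

print(pids_by_name(SAMPLE, "Finder"))            # [215]
print(pids_by_name(SAMPLE, "Activity Monitor"))  # [230]
```

To use it on live data, the output of "ps axc" could be read with os.popen('ps axc').read() and passed in as ps_output.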

What if the "Open With" option in Finder gives you duplicate apps or misses some app?

For some reason sometimes the database maintained by LaunchServices is out of synch with "reality" (what apps are or are not on your hard drive). Fortunately you can force it to rebuild the database by running the following command:

/System/Library/Frameworks/ApplicationServices.framework/\
Frameworks/LaunchServices.framework/Support/lsregister \
-kill -r -domain local -domain system -domain user

Running lsregister with no argument will tell you what those options are for.

Reloading Cisco VPN kernel extension

From time to time my Cisco VPN client just gives me crap like "cannot load kernel extension" or "cannot find a valid IP address" etc although my connection is perfectly fine. In this case try the following in Terminal.app:

sudo kextunload /System/Library/Extensions/CiscoVPN.kext
sudo kextload /System/Library/Extensions/CiscoVPN.kext

Setting environment variables for GUI apps

As UNIX users we all know that the way to set up the PATH variable (or other environment variables) is to do it in either .bash_profile (if the default shell is bash) or .tcshrc (if the default shell is tcsh). Unfortunately graphical apps do not get their environment from those settings. For them you need to create a file ~/.MacOSX/environment.plist and add your settings there. This document at Apple will tell you the details.
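As a rough illustration of what such a file contains, the Python sketch below generates a property list holding a single PATH entry. The PATH value is only an example, and plistlib.dumps requires a reasonably recent Python 3 rather than the Python that shipped with Panther or Tiger.

```python
import plistlib

# Example settings only - put your own variables here.
settings = {"PATH": "/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"}

plist_bytes = plistlib.dumps(settings)  # XML property list by default
print(plist_bytes.decode("utf-8"))

# To write the file for real:
# import os
# with open(os.path.expanduser("~/.MacOSX/environment.plist"), "wb") as f:
#     f.write(plist_bytes)
```

The output is a plist dictionary with one key per variable and a string value for each.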

Create a disk image file for a folder, using `hdiutil`

Do this (all on one line):

hdiutil create -srcfolder <src dir> -volname <volume name> <.dmg name>

This creates a compressed .dmg file with the name you specified; the image holds the contents of the source folder and carries the volume name you gave.

Tell if a process is still alive

This assumes you know the ID of the process you want to watch (if you don't, see this tip). A simple line below will give you the answer:

ps -p <process ID> -o pid | tail -n 1 | grep -v PID

If the process is still alive, you'll get the process ID back. Otherwise nothing will be printed. This is useful in building a larger script where monitoring a process is necessary.
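If the larger script happens to be written in Python, a different technique does the same job without parsing ps output: sending signal 0 to the pid, which checks for existence without disturbing the process. This is a sketch of that alternative, not a translation of the one-liner above; is_alive is a made-up name.

```python
import os


def is_alive(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)          # signal 0: existence check only
    except ProcessLookupError:
        return False             # no such process
    except PermissionError:
        return True              # exists, but belongs to another user
    return True


print(is_alive(os.getpid()))  # True: this very process is running
```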

Send an email to a bunch of people from command-line

This tip is actually a simple script written in Python, a powerful scripting language shipped with Mac OS X 10.3 Panther. The script lets you send an email to multiple people from the command-line. This is useful, for example, for sending a periodic reminder to a bunch of people with the help of the system scheduler, cron.

Note this tip is useful not only for Panther users, but also for the users of the other platforms that Python supports (e.g., Linux) as well.

First, you need to create the following script: copy and paste the content below into a file, say, smtp.py (`py' is the default extension for Python scripts). Make sure you change the line "smtpHost=..." to point to an SMTP server you are allowed to use. Now make it executable by doing "chmod a+x smtp.py".

#!/usr/bin/env python

import smtplib, sys, time

# change this to a new SMTP server if desired
smtpHost = 'some.smtp.mail.server'

# sys.argv[1] is the sender
# sys.argv[2] is the filename pointing to the list of recipients
# sys.argv[3] is the subject
# sys.argv[4] is the filename pointing to the message content

if len(sys.argv) != 5:
  print 'Usage: ./smtp.py <sender> <recipient FN> <subj> <msg FN>'
  sys.exit(1)

# each recipient takes one line; '#' signals comments
rList = []
for line in open(sys.argv[2]).readlines():
  r = line.split('#')[0].strip()
  if r: rList.append(r)

sender = sys.argv[1]
subj = sys.argv[3]
date = time.ctime(time.time())

# a blank line must separate the headers from the message body
msg = 'From: %s\nTo: %s\nDate: %s\nSubject: %s\n\n%s' \
       % (sender, ', '.join(rList), date, subj, open(sys.argv[4]).read())

server = smtplib.SMTP(smtpHost) # connect, no login step
failed = server.sendmail(sender, rList, msg)
server.quit()

if failed:
  print 'smtp.py: Failed recipients:', failed
else:
  print 'smtp.py: No errors.'

To use the script, do this:

./smtp.py <sender> <recipient FN> <subj> <msg FN>

Among the arguments, 'sender' is your email address, "recipient FN" is a file containing a list of email addresses, one per line, 'subj' is the subject line you want to use, and finally "msg FN" is a file containing the message body. For example, here is the command I used to send out the weekly reminder for playing basketball:

./smtp.py spambot@die.die.die bbPlayers.txt "Don't forget to play BB!" bbMsg.txt

A note about the sender argument: it has to be an email address from a valid domain (so the line above won't work - it's deliberately garbled), but other than that, there's no safety net to prevent you from impersonating others - BUT DON'T. That's what spammers do; besides, in the email header it'll clearly mark the originating IP address of the message, so if someone WANTS to track you down, she/he WILL.
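For readers on a current Python 3, the two core steps of the script (reading the recipient list and assembling the message) might look like the sketch below. It is an assumption-laden rewrite rather than the script above: parse_recipients and build_message are made-up names, and it leans on the standard email package to format the headers.

```python
from email.message import EmailMessage
from email.utils import formatdate


def parse_recipients(text):
    """One address per line; anything after '#' is a comment."""
    recipients = []
    for line in text.splitlines():
        address = line.split("#")[0].strip()
        if address:
            recipients.append(address)
    return recipients


def build_message(sender, recipients, subject, body):
    """Assemble headers and body; the library inserts the blank line."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Date"] = formatdate(localtime=True)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Sending would then be (the server name is a placeholder, as above):
# import smtplib
# with smtplib.SMTP("some.smtp.mail.server") as server:
#     server.send_message(msg)
```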

Clean up `.DS_Store` files

Ever want to remove all those hidden .DS_Store files under some directory? You can do this:

find <directory> -name .DS_Store -exec rm -f '{}' ';'

Or a faster version (thanks to Sean Kelly - the version above does show you how to do arbitrary things to the files though):

find <directory> -name .DS_Store -delete

Just replace <directory> with the directory you want to clean up - every sub-directory will be visited and cleaned up as well.
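If you would rather preview what is going to be removed, the same walk can be sketched in Python. The function name remove_ds_store and its dry_run switch are made up for this illustration; with dry_run left at True nothing is deleted.

```python
import os


def remove_ds_store(top, dry_run=True):
    """Find (and optionally delete) every .DS_Store file under top."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        if ".DS_Store" in filenames:
            path = os.path.join(dirpath, ".DS_Store")
            found.append(path)
            if not dry_run:
                os.remove(path)
    return found


# Preview first, then delete for real:
# print(remove_ds_store("/some/directory"))
# remove_ds_store("/some/directory", dry_run=False)
```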

Open a file with an app from the command-line

This is useful in two ways: (1) it saves you a trip to the mouse in order to open some file; (2) it can force an app to open a file that is not usually associated with that app. There are 3 possible forms:

open <some file>
open -a <some app> <some file>
open -e <some file>

The first form opens a file with the default (associated) app; the second form opens a file with the specified app; and the last one opens a file with TextEdit.app.

Make focus follow mouse in Terminal.app

This tip is from here. Type this in the terminal:

defaults write com.apple.terminal FocusFollowsMouse -string YES

The next time you start Terminal.app, when the mouse is over any Terminal.app window, that window will receive the input focus (type away and you'll know). Do the above with YES replaced by NO to turn it off.

By the way, the defaults command actually writes to an app's preference file (.plist); in this case, the file modified is ~/Library/Preferences/com.apple.Terminal.plist.

How do I avoid automatic launching of Xterm whenever I start Apple's X11?

If you look at /etc/X11/xinit/xinitrc, you'll find that by default xterm is launched whenever you start Apple's X11. This could be quite annoying if you don't want that. Fortunately the fix is very simple. Just create your own .xinitrc file under your home directory, like this:

#!/bin/sh

exec quartz-wm

How do I add an alias to an IP address?

(updated 20031123: as of Panther (Mac OS X 10.3), BSD flat files such as /etc/hosts are enabled again, so you no longer need to use Netinfo.app for this)

This is another Netinfo-related tip. Sometimes we want to type a shortened name of a machine to do various business with it; for example, instead of typing `ssh 1.2.3.4' we want to type `ssh mymachine'. For Linux/UNIX guys we know how to do this - just open up /etc/hosts and add an entry to it. In Mac OS X you need to fire up Netinfo.app (in /Applications/Utilities) instead: navigate yourself to /machines, and add a new entry per machine. For each entry you also need to provide ip_address and name properties.

(Of course for the ssh example given above, you could just modify your ~/.ssh/config file - but that's another story)

How do I change my default shell?

If you come from the Linux/UNIX world like me, you know where to change a user's default shell (/etc/passwd). But OS X does things a little differently: you need to fire up Netinfo.app (in /Applications/Utilities), navigate to /users/<user name> in Netinfo's window, and find a property named "shell". The rest should be obvious.

(Thanks to Robin Breathe) It turns out you can also achieve this by simply using the command-line utility chsh. Say you want to change your default shell from /bin/bash (the default for OS X) to /bin/tcsh, just type this into the terminal:

chsh -s /bin/tcsh

Other nifty things are possible using chsh - again, "man chsh" for more.

How do I suspend/resume a process (even the GUI ones)?

This works for both command-line and GUI processes. The latter case is particularly handy when, for example, your video transcoding process takes up almost 100% of the CPU and you want to do something else that requires at least some attention from the CPU...

Here you go: the first thing you need to do is figure out the pid of the process you want to suspend (iTunes playing music, in this example): see this tip. After determining the pid (say it's 2209), use the following command to suspend that process:

kill -SIGSTOP 2209

Of course, do replace the pid with yours when you do this. Now the music stops! The iTunes window is still there, but when you move the mouse cursor over the window, all you get is the familiar beachball! Ok enough for fun, how do we resume it? Use this:

kill -SIGCONT 2209

Now you have total control - cool, isn't it?

You can also combine this with the pidof script from that tip, like this (at least in the bash shell, which is the default in OS X, and assuming pidof is in your PATH):

kill -SIGSTOP `pidof iTunes`

The command in the backquotes is executed first to get the pid of the process "iTunes"; that pid then replaces the backquoted portion, and the resulting command is executed. In effect this stops iTunes.

(Thanks to Martin Dittus) Yet another way, which bypasses the need to get the pid first, is to use the command killall, like this:

killall -SIGSTOP iTunes

In my own experience, though, this is sometimes less robust than the methods given above. To its credit, you can even use regular expressions with the option '-m' to specify the processes you want to send the signal to (but be careful - you don't want to end up with a completely frozen system!).
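The same two signals can be sent from a Python script, which is handy if you want to pause a hungry process while some other job runs. This is a minimal sketch using the standard os and signal modules; suspend and resume are made-up helper names, and 2209 is the example pid from above.

```python
import os
import signal


def suspend(pid):
    """Equivalent of: kill -SIGSTOP <pid>"""
    os.kill(pid, signal.SIGSTOP)


def resume(pid):
    """Equivalent of: kill -SIGCONT <pid>"""
    os.kill(pid, signal.SIGCONT)


# Example with the pid used above:
# suspend(2209)
# ... do the CPU-hungry work here ...
# resume(2209)
```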

Building/installing the latest Emacs for OS X (native) from CVS

(this assumes you already enabled the root account, and you'll do this entirely under root)

Are you an Emacs user? If yes, don't you want to use a native OS X version of Emacs instead of being trapped inside the terminal? And, how about using the latest version of Emacs, directly shipped to you from the CVS? Here is how.

(Thanks to the people who have devoted in porting Emacs to OS X)

Here is a screenshot of Emacs OS X to motivate you...

So here is how you check out a copy of Emacs source code (you might want to think about where you want to put this - it's about 90MB in size):

cvs -z3 -d:pserver:anonymous@cvs.savannah.gnu.org:/sources/emacs co emacs

This will start the downloading process, so wait for a while. After that, create a shell script, say, emacs_build with the following content:

#!/bin/sh

cp -Rp emacs emacs.build
cd emacs.build
CFLAGS='-O3 -faltivec' CXXFLAGS='-O3 -faltivec' ./configure \
--enable-carbon-app=/Applications/Development --without-x
make bootstrap

Note that the line starting with 'CFLAGS' is too long, so I had to break it into two segments connected with '\' (if you want to type it on one line, just remove the '\'; this applies to the rest of this tip). Just note that the last line should be 'make bootstrap'. Also, if you want to change the folder your Emacs is installed into, change the '--enable-carbon-app=' setting to the correct folder (here I chose /Applications/Development). Lastly, note that the build starts by copying the source tree to another directory called emacs.build, which prevents polluting the source tree.

Ok now make emacs_build executable by issuing chmod a+x ./emacs_build in the terminal. Then execute it.

After about 30-40 minutes (depending on the speed of your machine), the building process should finish. You can then do cd emacs.build; make install to install the whole thing. But better yet, create a shell script emacs_install with the following content:

#!/bin/sh

rm -rf /Applications/Development/Emacs.app/ /usr/local/bin/emacs* \
/usr/local/share/emacs/21.3.50/
cd emacs.build
make install
cd ..
rm -rf emacs.build

Again the line "rm -rf" is broken down into two segments using '\'. Make this executable, and execute it. This script will remove the old stuff first, install the new build, and wipe the build directory clean.

But what if, from time to time, you want to update the Emacs source tree with the CVS, so you'll have the latest bugfixes? Create a shell script called emacs_update like the following:

#!/bin/sh

rm -f emacs/lisp/loaddefs.el*
cd emacs
cvs -z3 -d:pserver:anonymous@cvs.savannah.gnu.org:/sources/emacs update
cd ..

Again make it executable and then run it to update your emacs source tree. Of course this assumes you still keep the source tree around (in the emacs directory) - the CVS update will only download the files that changed.

How do I enable the root account?

By default the root account is disabled in Mac OS X - you have to do everything that requires root privilege using sudo instead, and this could become quite annoying after a while. To enable the root account, fire up the Netinfo.app (in /Applications/Utilities), and click on the Security menu - you'll see an item Enable Root Account.

Philip Bruce sent in an alternative: just do a "sudo su" and you'll be dropped in a shell as root.

Firewall and iTunes sharing - what ports to open?

If you have a firewall running on Mac, either configured via the built-in Sharing Preferences panel, or from a third-party tool such as Flying Buttress, make sure you open these ports to let iTunes traffic through:

Multicast-DNS (mDNS): this is UDP port 5353. This port is needed to see all the shared playlists (and to automagically discover all the Rendezvous-enabled services).
DAAP: this is TCP port 3689. This port is for the actual iTunes data traffic.

Enabling only one of the above and not the other, you might only be able to see the shared playlists but not able to play them, or vice versa.

If you're using BrickHouse, which I highly recommend over Apple's built-in firewall configuration interface, what you might end up with is something like this:

And these two rules are at the beginning of my firewall rules. You might also notice I only allow connections to/from 128.2.0.0 (CMU).

Who are listening to my tunes?

Ever wondered who those "n users connected" are when you look at the Sharing part of the Preferences of your iTunes? Well, wonder no more - it turns out to be fairly easy to figure out in Terminal.app; just type this line:

lsof -r 2 -n -P -F n -c iTunes -a -i TCP@`hostname`:3689

and it will tell you the IP address of each connection, together with the music file the connection is listening to. lsof is a UNIX tool which is capable of listing the files a particular process opens - including even the files being accessed from remote connections, such as NFS, and iTunes in our case. Some short explanations of the parameters used:

-r 2: list the opened files repeatedly, and refresh the list every 2 seconds.
-n: show numeric IP address instead of domain names.
-P: show port info in numbers instead of names (e.g., 3689 vs. `daap').
-F n: display field `name' (n).
-c iTunes: only list files opened by process with name 'iTunes'.
-a: logical 'and' to combine more than one condition.
-i TCP@`hostname`:3689: only list files involving the specified address; in this case the protocol must be TCP, connecting to my machine, and connecting to port 3689.
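As a starting point for such statistics, the Python sketch below pulls the remote addresses out of lsof's field output. The sample text is made up: it assumes each connection shows up as a line starting with 'n' in the form local:port->remote:port, and the helper name remote_peers is hypothetical, so check the output on your own machine before relying on it.

```python
def remote_peers(lsof_output, port=3689):
    """Return the remote IP of every connection to the given local port."""
    peers = []
    for line in lsof_output.splitlines():
        # Only name fields ('n...') describing a connection are of interest.
        if not line.startswith("n") or "->" not in line:
            continue
        local, remote = line[1:].split("->", 1)
        if local.rsplit(":", 1)[-1] == str(port):
            peers.append(remote.rsplit(":", 1)[0])
    return peers


# Made-up sample in the assumed format.
SAMPLE = """\
p312
n192.168.1.10:3689
n192.168.1.10:3689->192.168.1.23:49170
n192.168.1.10:3689->192.168.1.42:50211
"""

print(remote_peers(SAMPLE))  # ['192.168.1.23', '192.168.1.42']
```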

Of course based on the result you can do all kinds of fancy things, like accumulating statistics about which tunes get listened to most often, and so on. I'm too lazy to write an app like that - let me know if you write one though!