12.9. Links and Files
To understand more about what's going on with files and directories, it helps to understand the Unix model of files and directories even if your non-Unix system doesn't work in this way. As usual, there's more to the story than we'll explain here, so check any good book on Unix internal details if you need the full story.
A mounted volume is a hard disk drive (or something else that works, more or less, like that, such as a disk partition, a floppy disk, a CD-ROM, or a DVD-ROM). It may contain any number of files and directories. Each file is stored in a numbered inode, which we can think of as a particular piece of disk real estate. One file might be stored in inode 613, and another in inode 7033.
To locate a particular file, we'll have to look it up in a directory. A directory is a special kind of file, maintained by the system. Essentially, it is a table of filenames and their inode numbers.[*] Along with the other things in the directory, there are two special directory entries. One is . (called "dot"), which is the name of that directory; and the other is .. ("dot-dot"), which is the directory one step higher in the hierarchy (i.e., the directory's parent directory). Figure 12-1 provides an illustration of two inodes. One is for a file called chicken, and the other is Barney's directory of poems, /home/barney/poems, which contains that file. The file is stored in inode 613 and the directory is stored in inode 919. (The directory's own name, poems, doesn't appear in the illustration because that's stored in another directory.) The directory contains entries for three files (including chicken) and two directories (one of which is the reference back to the directory itself, in inode 919), along with each item's inode number.
Figure 12-1. The chicken before the egg
When it's time to make a new file in a given directory, the system adds an entry with the file's name and the number of a new inode. How can the system tell that a particular inode is available? Each inode holds a number called its link count. The link count is always zero if the inode isn't listed in any directory, so any inode with a link count of zero is available for new file storage. When the inode is added to a directory, the link count is incremented; when the listing is removed, the link count is decremented. For the file chicken as illustrated above, the link count of 1 is shown in the box above the inode's data.
But some inodes have more than one listing. For example, we've seen that each directory includes an entry for ., which points back to that directory's own inode. The link count for a directory should always be at least two: its listing in its parent directory and its listing in itself. In addition, if it has subdirectories, each of those will add a link since each will contain a .. entry.[*] In Figure 12-1, the directory's link count of 2 is shown in the box above its data.
A link count is the number of true names for the inode. Could an ordinary file inode have more than one listing in the directory? It certainly could. Suppose that, working in the directory shown above, Barney uses the Perl link function to create a new link:
link "chicken", "egg" or warn "can't link chicken to egg: $!";
This is similar to typing ln chicken egg at the Unix shell prompt. If link succeeds, it returns true. If it fails, it returns false and sets $!, which Barney is checking in the error message. After this runs, the name egg is another name for the file chicken, and vice versa; neither name is more real than the other, and (as you may have guessed) it would take some detective work to find out which came first. Figure 12-2 shows a picture of the new situation, where there are two links to inode 613.
Figure 12-2. The egg is linked to the chicken
These two filenames are talking about the same place on the disk. If the file chicken holds 200 bytes of data, egg holds the same 200 bytes, for a total of 200 bytes (since it's the same file with two names). If Barney appends a new line of text to file egg, that line will also appear at the end of chicken. If Barney were to delete chicken accidentally (or intentionally), that data would not be lost because it's still available under the name egg. If he were to delete egg, he'd still have chicken. Of course, if he deletes both of them, the data will be lost.[*]
There's another rule about the links in directory listings: the inode numbers in a given directory listing refer to inodes on that same mounted volume. This rule ensures that if the physical medium (the diskette, perhaps) is moved to another machine, all of the directories stick together with their files. That's why you can use rename to move a file from one directory to another, but only if both directories are on the same filesystem (mounted volume). If they were on different disks, the system would have to relocate the inode's data, which is too complex an operation for a simple system call.
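The shared-inode behavior described above can be checked from Perl with stat, whose returned list holds the inode number at index 1 and the link count at index 3. The following is only a minimal sketch: the scratch directory and the file contents are assumptions made for illustration, and it needs a filesystem that supports hard links.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory (an assumption for this sketch) so that
# nothing real is touched; it is removed when the program exits.
my $dir     = tempdir(CLEANUP => 1);
my $chicken = "$dir/chicken";
my $egg     = "$dir/egg";

# Create chicken with a little data.
open my $out, '>', $chicken or die "can't create chicken: $!";
print {$out} "Which came first?\n";
close $out or die "can't close chicken: $!";

link $chicken, $egg or die "can't link chicken to egg: $!";

# Both names should report the same inode and a link count of 2.
my ($chicken_inode, $chicken_links) = (stat $chicken)[1, 3];
my ($egg_inode,     $egg_links)     = (stat $egg)[1, 3];
print "chicken: inode $chicken_inode, link count $chicken_links\n";
print "egg:     inode $egg_inode, link count $egg_links\n";

# Appending through one name changes the file seen through the other.
open my $append, '>>', $egg or die "can't append to egg: $!";
print {$append} "The egg.\n";
close $append or die "can't close egg: $!";
my $chicken_size = -s $chicken;
my $egg_size     = -s $egg;
print "sizes: chicken $chicken_size bytes, egg $egg_size bytes\n";

# Removing one name leaves the data reachable through the other.
unlink $chicken or die "can't unlink chicken: $!";
my $links_after = (stat $egg)[3];
print "after unlinking chicken, egg has link count $links_after\n";
```

Both names report the same inode number because there is only one file; the link count simply records how many directory entries refer to it.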
Another restriction on links is that they can't make new names for directories because the directories are arranged in a hierarchy. If you were able to change that, utility programs like find and pwd could become lost trying to find their way around the filesystem.
So, links can't be added to directories, and they can't cross from one mounted volume to another. Fortunately, there's a way to get around these restrictions on links by using a new and different kind of link: a symbolic link. A symbolic link (also called a soft link to distinguish it from the true or hard links that we've been talking about up to now) is a special entry in a directory that tells the system to look elsewhere. Let's say that Barney (working in the same directory of poems as before) creates a symbolic link with Perl's symlink function, like this:
symlink "dodgson", "carroll" or warn "can't symlink dodgson to carroll: $!";
This is similar to what would happen if Barney used the command ln -s dodgson carroll from the shell. Figure 12-3 shows a picture of the result, including the poem in inode 7033.
Now if Barney chooses to read /home/barney/poems/carroll, he gets the same data as if he had opened /home/barney/poems/dodgson because the system follows the symbolic link automatically. That new name isn't the "real" name of the file because (as you can see in the diagram) the link count on inode 7033 is still just one. That's because the symbolic link tells the system, "If you got here looking for carroll, now you want to go off to find something called dodgson instead."
Figure 12-3. A symlink to inode 7033
A symbolic link can freely cross mounted filesystems or provide a new name for a directory unlike a hard link. A symbolic link can point to any filename, one in this directory or in another one, even to a file that doesn't exist. But that means a soft link can't keep data from being lost as a hard link can since the symlink doesn't contribute to the link count. If Barney were to delete dodgson, the system could no longer follow the soft link.[*] Though there would be an entry called carroll, trying to read from it would give an error such as file not found. The file test -l 'carroll' would report true, but -e 'carroll' would be false. It's a symlink, but it doesn't exist.
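The dangling-symlink case can be reproduced in a few lines. This is a hedged sketch, not a definitive test: the scratch directory and the file contents are assumptions, and it requires a system that supports symbolic links.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory (an assumption for this sketch), removed at exit.
my $dir     = tempdir(CLEANUP => 1);
my $dodgson = "$dir/dodgson";
my $carroll = "$dir/carroll";

open my $out, '>', $dodgson or die "can't create dodgson: $!";
print {$out} "A poem goes here.\n";
close $out or die "can't close dodgson: $!";

# The target is stored as the relative name "dodgson", which the system
# resolves relative to the directory that holds the symlink.
symlink 'dodgson', $carroll or die "can't symlink dodgson to carroll: $!";

# While the target exists, both tests are true.
my $is_link_before = -l $carroll ? 1 : 0;
my $exists_before  = -e $carroll ? 1 : 0;

# Delete the real file; the symlink itself stays behind.
unlink $dodgson or die "can't unlink dodgson: $!";

# -l examines the link itself; -e follows it and finds nothing.
my $is_link_after = -l $carroll ? 1 : 0;
my $exists_after  = -e $carroll ? 1 : 0;

print "before: -l is $is_link_before, -e is $exists_before\n";
print "after:  -l is $is_link_after, -e is $exists_after\n";
```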
Since a soft link could point to a file that doesn't exist, it could be used when creating a file as well. Barney has most of his files in his home directory, /home/barney, but he also needs frequent access to a directory with a long name that is difficult to type: /usr/local/opt/system/httpd/root-dev/users/staging/barney/cgi-bin. So, he sets up a symlink named /home/barney/my_stuff, which points to that long name, and now it's easy for him to get to it. If he creates a file (from his home directory) called my_stuff/bowling, that file's real name is /usr/local/opt/system/httpd/root-dev/users/staging/barney/cgi-bin/bowling. Next week, when the system administrator moves these files of Barney's to /usr/local/opt/internal/httpd/www-dev/users/staging/barney/cgi-bin, Barney repoints the one symlink, and he and all of his programs can still find his files with ease.
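Repointing a symlink, as Barney does above, amounts to removing the old link and creating a new one under the same name. The sketch below uses short hypothetical directory names (old-root and new-root inside a scratch directory) in place of the long paths from the text; it is one simple way to do it, and there is a brief moment between the unlink and the symlink when the name doesn't exist.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory and directory names are assumptions for this sketch.
my $dir      = tempdir(CLEANUP => 1);
my $my_stuff = "$dir/my_stuff";

mkdir "$dir/old-root" or die "can't mkdir old-root: $!";

# A symlink may name a directory, which a hard link cannot do.
symlink 'old-root', $my_stuff or die "can't symlink old-root to my_stuff: $!";

# A file created through the symlink really lives in the target directory.
open my $out, '>', "$my_stuff/bowling" or die "can't create bowling: $!";
print {$out} "scores\n";
close $out or die "can't close bowling: $!";
my $in_old_root = -e "$dir/old-root/bowling" ? 1 : 0;

# The "administrator" moves the directory; the symlink now dangles.
rename "$dir/old-root", "$dir/new-root" or die "can't rename old-root: $!";
my $dangling = (-l $my_stuff && !-e $my_stuff) ? 1 : 0;

# Repoint: remove the old link, then create a new one with the same name.
unlink $my_stuff or die "can't unlink my_stuff: $!";
symlink 'new-root', $my_stuff or die "can't symlink new-root to my_stuff: $!";
my $found_again = -e "$my_stuff/bowling" ? 1 : 0;

print "file landed in old-root: $in_old_root\n";
print "link dangled after the move: $dangling\n";
print "file reachable after repointing: $found_again\n";
```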
It's normal for /usr/bin/perl, /usr/local/bin/perl, or both to be symbolic links to the true Perl binary on your system. This makes it easy to switch to a new version of Perl. Say you're the system administrator, and you've built the new Perl. Your older version is running, and you don't want to disrupt anything. When you're ready for the switch, you move a symlink or two, and every program that begins with #!/usr/bin/perl will use the new version. In the unlikely case of a problem, you can restore the old symlinks and have the older Perl running the show again. (Like any good admin, notify your users to test their code with the new /usr/bin/perl-7.2 well in advance of the switch, and tell them they can keep using the older one during the next month's grace period by changing their programs' first lines to #!/usr/bin/perl-6.1, if they need to.)
Perhaps surprisingly, both hard and soft links are useful. Many non-Unix operating systems have neither, and the lack is sorely felt. On some non-Unix systems, symbolic links may be implemented as a "shortcut" or an "alias." Check the perlport manpage for the latest details.
To find out where a symbolic link is pointing, use the readlink function. This will tell you where the symlink leads, or it will return undef if its argument wasn't a symlink:
my $where = readlink "carroll";             # Gives "dodgson"
my $perl  = readlink "/usr/local/bin/perl"; # Maybe tells where perl is
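Because readlink returns undef for anything that isn't a symlink, it's worth checking the result with defined before using it. Here is a small sketch; the helper name describe and the scratch files are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory (an assumption for this sketch), removed at exit.
my $dir = tempdir(CLEANUP => 1);

open my $out, '>', "$dir/dodgson" or die "can't create dodgson: $!";
close $out or die "can't close dodgson: $!";
symlink 'dodgson', "$dir/carroll" or die "can't symlink dodgson to carroll: $!";

# Hypothetical helper: report where a name leads, or say it isn't a symlink.
sub describe {
    my ($directory, $name) = @_;
    my $where = readlink "$directory/$name";
    return defined $where
        ? "$name -> $where"
        : "$name is not a symlink";
}

my $link_report = describe($dir, 'carroll');
my $file_report = describe($dir, 'dodgson');
print "$link_report\n";
print "$file_report\n";
```

Note that readlink gives back the target exactly as it was stored in the link (here the relative name dodgson), not a full path.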
You can remove either kind of link with unlink, and now you see where that operation gets its name. unlink removes the directory entry associated with the given filename, decrementing the link count and possibly freeing the inode.