HFSDebug 4.0 and New HFS+ Features

November 9th, 2008

I wrote HFSDebug in early 2004. I initially made it available as a software tool to help understand fragmentation in HFS+ volumes, although it could also be used to analyze several implementation details of HFS+. Eventually, I extended HFSDebug to be able to analyze all on-disk aspects of HFS+, along with the ability to compute more types of volume statistics and to even retrieve some in-memory details of mounted HFS+ volumes.

HFSDebug has been an extremely useful tool for me. I’ve used it to help explain the workings of HFS+ in the Mac OS X Internals book, to understand occasional mysterious behavior in HFS+ volumes, to search for file system objects, to generate interesting file system statistics (top N largest files, top N fragmented files, resource forks vs data forks, contiguous free space, and so on), and to create interesting demos. (For example, to show HFS+ mechanisms such as Hot File Clustering and On-the-Fly Defragmentation at work.)

HFS+ is the preferred and default volume format on Mac OS X. Even with exciting new developments such as ZFS support in Mac OS X, I don’t expect HFS+ to become obsolete any time soon. Today’s Macintosh computers, iPods, iPhones, and Apple TVs all use HFS+.

With most major releases of Mac OS X, HFS+ has gained new capabilities. Features such as metadata journaling, on-the-fly file defragmentation, hot file clustering, extended attributes, access control lists, hard link chains (tracking hard links), and directory hard links have come to HFS+ in recent years. There have been “news reports” of compression being an upcoming feature in HFS+.

The most interesting technical things about HFS+ are not its features, but how several of the newer features are implemented. With the goal of retaining backward compatibility, new features have often been retrofitted, or shoehorned, if you will, into HFS+. Knowing such implementation details evokes different reactions in different people, ranging from “That’s a nifty way to implement this!” to “Gross!” This is something you can decide for yourself with the help of HFSDebug, which can show you exactly how the file system works.

New Features in HFSDebug

Now, every time a new feature is added to HFS+, HFSDebug likely (but not always) needs to be updated, say, to recognize and parse a new type of on-disk object such as a directory hard link. I’m releasing a new version of HFSDebug that has the following improvements.

  • Ability to show details of directory hard links.
  • Ability to show details of hard link chains.
  • New built-in filters: atime (find files by access time), dirhardlink (list directory hard links), hardlink (list file hard links), and sxid (list setuid/setgid files).
  • Ability to do component-wise path lookup from scratch, allowing you to analyze individual file system objects by path even on an unmounted HFS+ volume.
  • Support for Snow Leopard.
  • Numerous subtle improvements and some bugfixes.

Still a PowerPC binary!?

It may be surprising (or troubling) to some of you that there is still no x86 version of HFSDebug: it’s available only as a PowerPC executable. Well, there is some logic to this madness. You see, I wrote HFSDebug in the “Panther” (10.3) days. Mac OS X was PowerPC-only then. It was also big endian. That matters because HFS+ uses big endian for its on-disk structures.

HFSDebug is a complex program. It essentially reads raw data from an HFS+ disk (say, a partition on a real disk or a disk image) and recreates a read-only HFS+ file system in memory. To simplify matters, I decided to skip structure-by-structure, field-by-field endianness conversion—after all, I was only targeting the big-endian-only Mac OS X. By contrast, the xnu kernel’s HFS+ implementation does do byte swapping on x86. So does the fsck_hfs program.
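To make the byte-swapping "grunt work" concrete, here is a minimal sketch of endian-neutral field decoding. It is not HFSDebug's code: the helper names (read_be16, read_be32, looks_like_hfsplus_header) are made up for illustration. The only HFS+ fact it relies on is that the volume header starts with the big-endian signature 'H+' (0x482B).

```c
#include <stdint.h>

/* Decode big-endian integers one byte at a time, so the result is the
 * same on big-endian (PowerPC) and little-endian (x86) hosts. */
uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | (uint16_t)p[1]);
}

uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* The HFS+ volume header begins with a 16-bit signature, 'H+' (0x482B),
 * stored big endian. Returns 1 if the buffer starts with it, else 0. */
int looks_like_hfsplus_header(const uint8_t *header)
{
    return read_be16(header) == 0x482B;
}
```

A reader that decodes every field this way never depends on the host's byte order, at the cost of touching each field of each structure individually.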

As long as Rosetta exists, HFSDebug can get away with being a PowerPC executable, allowing me to defer the grunt work of swapping bytes to a later date.

Let us take the new HFSDebug features for a spin.

Hard Link Chains

Although file hard links existed in HFS+ before Leopard, the new chaining feature in Leopard keeps track of hard link chains, which are doubly linked lists of file IDs connecting hard links together. Hard links to a file on HFS+ are conceptually similar to those on Unix systems: They represent multiple directory entries referring to common file content. Implementation-wise, HFS+ hard links use a special hard-link file for each directory entry. The common file content is stored in a special file: the indirect-node file. All indirect-node files are stored in the private metadata folder, a special directory (/␀␀␀␀HFS+ Private Data) that’s both normally invisible to the user and has a name that’s “hard” to type. It’s much easier to understand this through HFSDebug.

We begin by creating a file called file1. Before we create a hard link to this file, we examine its details using HFSDebug. That way, we can tell if anything about the file changes after link creation.

$ mkdir /tmp/test/
$ cd /tmp/test
$ echo "This is file1" > file1
$ sudo hfsdebug file1
...
  path                 = Leopard HD:/private/tmp/test/file1
# Catalog File Record
  type                 = file
  file ID              = 1927091
  flags                = 0000000000000010
...
  # BSD Info
  ownerID              = 501 (singh)
  groupID              = 0 (wheel)
  adminFlags           = 00000000
  ownerFlags           = 00000000
  fileMode             = -rw-r--r--
  linkCount            = 1
  textEncoding         = 0
  attrBlocks           = 0
  # Finder Info
  fdType               = 0
  fdCreator            = 0
  fdFlags              = 0000000000000000
...

Let us now make a hard link to this file.

$ ln file1 file2

Let us see if anything has changed about file1 now that we made a hard link to it.

$ sudo hfsdebug file1
...
  path                 = Leopard HD:/private/tmp/test/file1
# Catalog File Record
  type                 = file (hard link)
  indirect node file   = Leopard HD:/%0000%0000%0000%0000HFS+ Private Data/iNode1927091
  file ID              = 1927094
  flags                = 0000000000100010
                       . File has a thread record in the catalog.
                       . File has hardlink chain.
...
  # BSD Info
  ownerID              = 1927095 (previous link ID)
  groupID              = 0 (next link ID)
  adminFlags           = 00000000
  ownerFlags           = 00000010
                       . UF_IMMUTABLE (file may not be changed)
  fileMode             = -r--r--r--
  iNodeNum             = 1927091 (link reference number)
  textEncoding         = 0
  attrBlocks           = 0
  # Finder Info
  fdType               = 0x686c6e6b (hlnk)
  fdCreator            = 0x6866732b (hfs+)
  fdFlags              = 0000000100000000
                       . kHasBeenInited
...
  # Data Fork
  logicalSize          = 0 bytes
  # Resource Fork
  logicalSize          = 0 bytes

We see that a lot has changed! The on-disk nature of file1 has completely transformed. The original content has actually “moved” to an indirect-node file. What was file1 before has been replaced with a new directory entry altogether: one that has a new file ID within the file system. The new directory entry is also a file, but with several special properties. Its “type” and “creator” (as stored in the Finder Info) are hlnk and hfs+, respectively. It has been marked immutable. It has no content in either its data fork or its resource fork. Moreover, the owner and group ID on-disk fields have been repurposed to act as the previous and next links, respectively, in the hard link chain. We see that the previous link ID is 1927095. Let us use HFSDebug to show us information for that ID.
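The repurposing of fields described above can be sketched in C. This is only an illustration based on the output shown for file1: the struct is a simplified, hypothetical stand-in for the real HFSPlusCatalogFile layout, and the function name is invented.

```c
#include <stdint.h>

/* Simplified, hypothetical view of the catalog file record fields
 * discussed above; the real layout is HFSPlusCatalogFile in xnu. */
struct catalog_file_view {
    uint32_t file_id;
    uint32_t owner_id;   /* previous link ID if the record is a hard link */
    uint32_t group_id;   /* next link ID if the record is a hard link */
    uint32_t special;    /* iNodeNum (link reference number) for hard links */
    uint32_t fd_type;    /* Finder Info type */
    uint32_t fd_creator; /* Finder Info creator */
};

#define FD_TYPE_HLNK    0x686c6e6bU /* 'hlnk' */
#define FD_CREATOR_HFSP 0x6866732bU /* 'hfs+' */

struct hard_link_info {
    uint32_t prev_link_id;
    uint32_t next_link_id;
    uint32_t inode_num;
};

/* Returns 1 and fills *out if the record carries the hlnk/hfs+ type and
 * creator codes; returns 0 (leaving *out untouched) otherwise. */
int decode_file_hard_link(const struct catalog_file_view *f,
                          struct hard_link_info *out)
{
    if (f->fd_type != FD_TYPE_HLNK || f->fd_creator != FD_CREATOR_HFSP)
        return 0;
    out->prev_link_id = f->owner_id;
    out->next_link_id = f->group_id;
    out->inode_num = f->special;
    return 1;
}
```

Fed the values HFSDebug printed for file1, this yields a previous link ID of 1927095, a next link ID of 0, and a link reference number of 1927091.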

$ sudo hfsdebug -c 1927095
...
  path                 = Leopard HD:/private/tmp/test/file2
# Catalog File Record
  type                 = file (hard link)
  indirect node file   = Leopard HD:/%0000%0000%0000%0000HFS+ Private Data/iNode1927091
  file ID              = 1927095
  flags                = 0000000000100010
                       . File has a thread record in the catalog.
                       . File has hardlink chain.
...
  # BSD Info
  ownerID              = 0 (previous link ID)
  groupID              = 1927094 (next link ID)
  adminFlags           = 00000000
  ownerFlags           = 00000010
                       . UF_IMMUTABLE (file may not be changed)
  fileMode             = -r--r--r--
  iNodeNum             = 1927091 (link reference number)
  textEncoding         = 0
  attrBlocks           = 0
  # Finder Info
  fdType               = 0x686c6e6b (hlnk)
  fdCreator            = 0x6866732b (hfs+)
  fdFlags              = 0000000100000000
                       . kHasBeenInited
...
  # Data Fork
  logicalSize          = 0 bytes
  # Resource Fork
  logicalSize          = 0 bytes

We see that ID 1927095 corresponds to the other reference we just created: file2. The properties of this reference are similar to those of the other reference file1. They do differ in their file IDs. (They are indeed two separate on-disk file system objects.) They also differ in their previous and next links in the hard link chain. (We can confirm that file1 and file2 are connected together.)

The file’s content is in the indirect node file, which also now is the on-disk object with the original file ID (1927091). Let us use HFSDebug to look at that file.

$ sudo hfsdebug -c 1927091
...
  path                 = Leopard HD:/%0000%0000%0000%0000HFS+ Private Data/iNode1927091
# Catalog File Record
  type                 = file
  file ID              = 1927091
  flags                = 0000000000100010
                       . File has a thread record in the catalog.
                       . File has hardlink chain.
  reserved1            = 1927095 (first link ID)
...
  # BSD Info
  ownerID              = 501 (singh)
  groupID              = 0 (wheel)
  adminFlags           = 00000000
  ownerFlags           = 00000000
...
  # Finder Info
  fdType               = 0x686c6e6b (hlnk)
  fdCreator            = 0x6866732b (hfs+)
  fdFlags              = 0000000100000000
                       . kHasBeenInited
...
  # Data Fork
  logicalSize          = 14 bytes
  totalBlocks          = 1
  fork temperature     = no HFC record in B-Tree
  clumpSize            = 0
  extents              =   startBlock   blockCount      % of file
                             0xbb04b7          0x1       100.00 %
                         1 allocation blocks in 1 extents total.
                         1.00 allocation blocks per extent on an average.
  # Resource Fork
  logicalSize          = 0 bytes

As we see, the indirect-node file acts as the container for several of the original file’s properties. In particular, it has the original file’s content, the owner ID, and the group ID. A reserved field (reserved1) even contains the ID of the head of the hard link chain.
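Starting from the first link ID stored in the indirect-node file, the whole chain can be enumerated by following next link IDs until one is 0. The sketch below assumes the link records have already been collected into a small in-memory table; the struct and function names are hypothetical, not part of HFSDebug.

```c
#include <stddef.h>
#include <stdint.h>

/* One hard-link file: its own ID plus its neighbors in the chain. */
struct link_record {
    uint32_t link_id;
    uint32_t prev_id; /* 0 at the head of the chain */
    uint32_t next_id; /* 0 at the tail of the chain */
};

/* Linear search of the table; returns NULL if the ID is unknown. */
const struct link_record *
find_link(const struct link_record *table, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].link_id == id)
            return &table[i];
    }
    return NULL;
}

/* Follows next links starting at first_id and stores the IDs visited in
 * out[]. Stops at the tail, at an unknown ID, or after max entries (which
 * also guards against a corrupt, cyclic chain). Returns the count stored. */
size_t
walk_link_chain(const struct link_record *table, size_t n,
                uint32_t first_id, uint32_t *out, size_t max)
{
    size_t count = 0;
    uint32_t id = first_id;

    while (id != 0 && count < max) {
        const struct link_record *r = find_link(table, n, id);
        if (r == NULL)
            break;
        out[count++] = id;
        id = r->next_id;
    }
    return count;
}
```

With the two records from this example and a first link ID of 1927095, the walk visits 1927095 (file2) and then 1927094 (file1).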

Of course, these are implementation details. HFS+ will show you the expected hard link semantics when you look at these files through the usual file system interfaces.

$ ls -lasi file1 file2
1927091 8 -rw-r--r--  2 singh  wheel  14 Nov  3 21:55 file1
1927091 8 -rw-r--r--  2 singh  wheel  14 Nov  3 21:55 file2
$ cat file1 file2
This is file1
This is file1

We see that both file1 and file2 show up with identical metadata, including the same “inode” number as you would expect. They also “have” the same content.
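The same check can be made programmatically through the standard stat() interface. The helper below is a small sketch (the name same_file is made up): two paths refer to the same file system object when their device and inode numbers match.

```c
#include <sys/stat.h>

/* Returns 1 if both paths name the same file system object, 0 if they
 * name different objects, and -1 if either path cannot be examined. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return (sa.st_dev == sb.st_dev) && (sa.st_ino == sb.st_ino);
}
```

For the files above, same_file("file1", "file2") would be expected to return 1, and st_nlink in either stat structure would be 2.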

Note that if you now delete one of the hard links, say, file2, things will not revert to how they were to begin with. You will have file1 as the only hard-link file along with the indirect-node file.

Let us look at directory hard links next.

Directory Hard Links

It’s not straightforward to create a directory hard link on Mac OS X. Well, that shouldn’t be surprising: directory hard links aren’t meant for third party developers, let alone users. They are essentially an implementation detail needed to make the Time Machine feature of Leopard work. Since we are talking about implementation details here, we will have to create a directory hard link or two—for experimentation, of course.

Leopard at the time of this writing requires the following conditions to be met for a directory hard link’s creation to be allowed. In the following list, “source” refers to the existing directory that will be pointed at by the new directory hard link “destination” that’s being created.

  • The file system must be journaled HFS+.
  • The parent directories of the source and destination must be different.
  • The source’s parent must not be the root directory.
  • The destination must not be in the root directory.
  • The destination must not be a descendant of the source.
  • The destination must not have any ancestor that’s a directory hard link.

If you meet all these conditions, you could create a directory hard link on an HFS+ volume under Mac OS X 10.5 and above. It’s then a matter of writing a program that uses the link() system call.

/* dirlink.c */

#include <stdio.h>
#include <unistd.h>

int
main(int argc, char** argv)
{
    int ret = -1;
    if (argc == 3) {
        ret = link(argv[1], argv[2]);
        if (ret) {
            perror("link");
        }
    }
    return ret;
}

In our /tmp/test/ testing directory, we’ll create a directory dir1 and a subdirectory subdir. It’s in subdir that we’ll create a hard link dir2 to dir1. This is because dir1 and dir2 can’t have the same parent.

$ gcc -Wall -o dirlink dirlink.c
$ mkdir dir1
$ mkdir subdir

Before we create the directory hard link, let us use HFSDebug to peek at the current on-disk details of dir1.

$ sudo hfsdebug dir1
...
  path                 = Leopard HD:/private/tmp/test/dir1
# Catalog Folder Record
  type                 = folder
  folder ID            = 1927398
  flags                = 0000000000000000
  valence              = 0
...
  # BSD Info
  ownerID              = 501 (singh)
  groupID              = 0 (wheel)
  adminFlags           = 00000000
  ownerFlags           = 00000000
  fileMode             = drwxr-xr-x
  linkCount            = 1
  textEncoding         = 0
  attrBlocks           = 0
  # Finder Info
  frRect               = (top = 0, left = 0), (bottom = 0, right = 0)
  frFlags              = 0000000000000000
  frLocation           = (v = 0, h = 0)
  opaque               = 0
  # Opaque Finder Info
  scrollPosition       = (v = 0, h = 0)
  reserved1            = 0
  Opaque Finder Flags  = 0000000000000000
  reserved2            = 0
  putAwayFolderID      = 0

Let us create the link and confirm that our expectations of directory hard link semantics are met.

$ ./dirlink dir1 subdir/dir2
$ ls -lasdi dir1 subdir/dir2
1927398 0 drwxr-xr-x  2 singh  wheel  68 Nov  3 22:59 dir1
1927398 0 drwxr-xr-x  2 singh  wheel  68 Nov  3 22:59 subdir/dir2
$ echo Hello > dir1/file
$ cat subdir/dir2/file
Hello

Everything looks in order. Let us now use HFSDebug to see what actually happened inside the file system. We looked at dir1’s on-disk details earlier. We can now see what changed after we created a directory hard link to dir1.

$ sudo hfsdebug dir1
...
  path                 = Leopard HD:/private/tmp/test/dir1
# Catalog File Record
  type                 = file (alias, directory hard link)
  indirect folder      = Leopard HD:/.HFS+ Private Directory Data%000d/dir_1927398
  file ID              = 1927407
  flags                = 0000000000100010
                       . File has a thread record in the catalog.
                       . File has hardlink chain.
...
  # BSD Info
  ownerID              = 1927408 (previous link ID)
  groupID              = 0 (next link ID)
  adminFlags           = 00000000
  ownerFlags           = 00000010
                       . UF_IMMUTABLE (file may not be changed)
  fileMode             = -r--r--r--
  iNodeNum             = 1927398 (link reference number)
  textEncoding         = 0
  attrBlocks           = 0
  # Finder Info
  fdType               = 0x66647270 (fdrp)
  fdCreator            = 0x4d414353 (MACS)
  fdFlags              = 1000000000000000
                       . kIsAlias
  fdLocation           = (v = 0, h = 0)
  opaque               = 0
  # Data Fork
  logicalSize          = 0 bytes
  # Resource Fork
  logicalSize          = 464 bytes
  totalBlocks          = 1
  fork temperature     = no HFC record in B-Tree
  clumpSize            = 0
  extents              =   startBlock   blockCount      % of file
                             0xbae746          0x1       100.00 %
                         1 allocation blocks in 1 extents total.
                         1.00 allocation blocks per extent on an average.

  rsrc contents        = (up to 464 bytes)
       00 00 01 00 00 00 01 9e 00 00 00 9e 00 00 00 32 00 00 00 00 00 00 00 00
                                                     2
...
       00 00 00 00 00 00 00 1c 00 32 00 00 61 6c 69 73 00 00 00 0a 00 00 ff ff
                                   2        a  l  i  s
       00 00 00 00 00 00 00 00

We see that dir1’s transformation is more drastic than what we had observed in the case of file hard links. After we created a directory hard link to dir1, it’s no longer a directory inside the file system. In fact, the “real” directory (that is, the link target) has moved to a special folder (/.HFS+ Private Directory Data\xd), just as the link target of a file hard link had moved to a (different) special folder. Its name within the special folder is dir_1927398, where the number represents the original “inode” number of dir1. However, dir1 hasn’t been replaced by another directory that points to the link target: it has been replaced by a file, or specifically, an alias. (Backward compatibility!) The immutable alias file has fdrp and MACS as its type and creator codes, respectively. It also has a resource fork. Moreover, we see that as in the case of file hard links, there exists a hard link chain.

Let us also examine the link target using HFSDebug. The path would be “hard” to type because of the characters in the special folder’s name. We can use the folder ID instead, which would be the original ID of dir1.

$ sudo hfsdebug -c 1927398
...
  path                 = Leopard HD:/.HFS+ Private Directory Data%000d/dir_1927398
# Catalog Folder Record
  type                 = folder
  folder ID            = 1927398
  flags                = 0000000000100100
                       . Folder has extended attributes.
                       . Folder has hardlink chain.
  valence              = 0
...
  # BSD Info
  ownerID              = 501 (singh)
  groupID              = 0 (wheel)
  adminFlags           = 00000000
  ownerFlags           = 00000000
  fileMode             = drwxr-xr-x
  linkCount            = 2
  textEncoding         = 0
  attrBlocks           = 0
  # Finder Info
  frRect               = (top = 0, left = 0), (bottom = 0, right = 0)
  frFlags              = 0000000000000000
  frLocation           = (v = 0, h = 0)
  opaque               = 0
...
# Attributes
...
  # Attribute Key
  keyLength            = 72
  pad                  = 0
  fileID               = 1927398
  startBlock           = 0
  attrNameLen          = 30
  attrName             = com.apple.system.hfs.firstlink
  # Inline Data
  recordType           = 0x10
  reserved[0]          = 0
  reserved[1]          = 0
  attrSize             = 8 bytes
  attrData             = 31 39 32 37 34 30 38 00
                          1  9  2  7  4  0  8     

We see mostly what we would expect given our previous observation of the implementation details of file hard links. There is one more thing in this case though: the folder has an extended attribute whose name is com.apple.system.hfs.firstlink and whose value is an encoding of the “inode” number of the head of the directory hard link chain.
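Judging from the attrData bytes shown above, the value is the link ID written as ASCII decimal digits followed by a NUL byte. Under that assumption, a decoder could look like the following sketch (parse_firstlink is a hypothetical name, not an HFS+ or HFSDebug function).

```c
#include <stddef.h>
#include <stdint.h>

/* Parses a firstlink-style attribute value: ASCII decimal digits,
 * optionally terminated by a NUL byte. Returns 1 and stores the ID on
 * success; returns 0 if the data is empty, contains a non-digit, or
 * does not fit in 32 bits. */
int parse_firstlink(const uint8_t *data, size_t size, uint32_t *link_id)
{
    uint64_t value = 0;
    size_t i;

    for (i = 0; i < size && data[i] != 0; i++) {
        if (data[i] < '0' || data[i] > '9')
            return 0;
        value = value * 10 + (uint64_t)(data[i] - '0');
        if (value > UINT32_MAX)
            return 0;
    }
    if (i == 0)
        return 0; /* no digits at all */
    *link_id = (uint32_t)value;
    return 1;
}
```

Applied to the eight bytes in the dump (31 39 32 37 34 30 38 00), this produces 1927408, which matches the previous link ID shown for dir1.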

HFSDebug Filters

At this point, you could use the built-in dirhardlink filter in HFSDebug to enumerate all directory hard links on the volume.

$ sudo hfsdebug --filter=builtin:dirhardlink
2 links -> dir_1927398
Leopard HD:/private/tmp/test/dir1 -> dir_1927398
Leopard HD:/private/tmp/test/subdir/dir2 -> dir_1927398

The filter prints both link targets and link references. For a link target, the number of references to it is printed before it. For a link reference, the target that it points to is printed after it.

By the way, filters are a very useful recent addition to HFSDebug. A fundamental capability of HFSDebug is to go over all the entries in the HFS+ catalog file. It uses this capability to generate many types of statistics. The recently added filter support makes it possible for you to write a program that plugs into HFSDebug and receives a callback for each catalog file entry. That way, you can examine each entry, apply arbitrary criteria, and show (or not show) details about that entry. Say, you wish to list all setuid/setgid files on an HFS+ volume. Sure, you could run a find command to do that. On one of my HFS+ volumes with about a million files and 200K folders, find takes a while to do this.

$ time sudo find / -xdev -type f \( -perm -4000 -o -perm -2000 \)
...
6.41s user 94.12s system 34% cpu 4:53.35 total
$

You could do this much faster with the sxid built-in HFSDebug filter, whose implementation is a mere ten lines of C code. (Of course, the absolute time taken will also depend on the underlying hardware, but we are only interested in the relative time difference.)

$ time sudo hfsdebug --filter=builtin:sxid
...
2.86s user 9.33s system 17% cpu 1:08.04 total
$
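The core of such a check is tiny. The following is a guess at the kind of test involved, not the actual sxid implementation: it takes the 16-bit fileMode value from a record's BSD info and mirrors the find expression above (regular file with the setuid or setgid bit set). The function name is made up.

```c
#include <stdint.h>
#include <sys/stat.h>

/* Returns 1 if the mode describes a regular file with the setuid or
 * setgid bit set, and 0 otherwise. */
int is_setuid_or_setgid_file(uint16_t file_mode)
{
    return S_ISREG(file_mode) && (file_mode & (S_ISUID | S_ISGID)) != 0;
}
```

Because the mode is already part of each catalog record, a filter built around a test like this never has to call stat() on anything, which is presumably where much of the time saving comes from.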

Note that many types of searches on HFS+ can also be done through the searchfs() system call, although it can be quite cumbersome to use. Of course, searchfs() cannot be used on an unmounted volume.

Specifying File System Objects by Path on Unmounted Volumes

As we have seen, a common use for HFSDebug is to have it display implementation details of individual file system objects. You could specify the object of interest in several ways: by providing its catalog node ID (CNID), by providing an “fsspec” style pair consisting of the parent folder’s CNID and the object’s name, or by providing a POSIX-style path to the object. The last is often the easiest and most convenient to specify. However, until now, HFSDebug did not do component-wise path lookups itself: it used the operating system to convert the path to an inode number. This results in a few caveats. To begin with, it’s against the HFSDebug philosophy of not relying on the operating system for any HFS+-related operations. It also means that if the volume in question is not mounted (say, it’s corrupt and can’t be mounted or you are investigating something and don’t want to mount it), you can’t use paths to look at individual objects. You will have to dump all objects on the file system and then find the node ID of the object of interest. Moreover, even on a mounted volume, the operating system disallows path-based access to several files. (See Chapter 12 of Mac OS X Internals.) In such cases, again you will need to know the node ID of the object of interest, even on a mounted volume.

I’ve “fixed this issue” (or “added the feature”, depending on how you look at it) in the new version of HFSDebug. Say you have an unmounted volume on /dev/disk5s1 and you want to examine /tmp/foo/bar on it. Now you can simply do:

$ sudo hfsdebug -d /dev/disk5s1 /tmp/foo/bar
...

The semantics of symbolic link resolution are as follows. If the object (bar in this example) is a symbolic link itself, then HFSDebug will show you properties of bar and not what it points to. This is in line with HFSDebug philosophy and also how things work today on mounted volumes. If, however, a nonterminal component of the path is a symbolic link, HFSDebug will resolve it. Again, this is desirable.
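Conceptually, a component-wise lookup starts at the root folder (CNID 2) and repeatedly looks up the pair (parent CNID, component name), which is how the HFS+ catalog is keyed. The sketch below illustrates the idea against a hypothetical in-memory table. It is not HFSDebug's implementation: it compares names byte for byte (real HFS+ name comparison is Unicode-aware and usually case-insensitive) and it ignores symbolic links entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROOT_FOLDER_ID 2U /* CNID of the volume's root folder */

/* Hypothetical stand-in for catalog records keyed by (parent CNID, name). */
struct catalog_entry {
    uint32_t parent_id;
    const char *name;
    uint32_t cnid;
};

/* Returns the CNID of the named child of parent, or 0 if there is none. */
uint32_t lookup_child(const struct catalog_entry *cat, size_t n,
                      uint32_t parent, const char *name, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (cat[i].parent_id == parent &&
            strlen(cat[i].name) == len &&
            memcmp(cat[i].name, name, len) == 0)
            return cat[i].cnid;
    }
    return 0;
}

/* Resolves an absolute POSIX-style path one component at a time.
 * Returns the CNID of the final component, or 0 if the path is not
 * absolute or some component does not exist. */
uint32_t lookup_path(const struct catalog_entry *cat, size_t n,
                     const char *path)
{
    uint32_t cnid = ROOT_FOLDER_ID;
    const char *p = path;

    if (*p != '/')
        return 0;
    while (*p != '\0') {
        while (*p == '/')           /* skip one or more separators */
            p++;
        if (*p == '\0')
            break;
        size_t len = strcspn(p, "/"); /* length of this component */
        cnid = lookup_child(cat, n, cnid, p, len);
        if (cnid == 0)
            return 0;
        p += len;
    }
    return cnid;
}
```

Supporting the symbolic link semantics described above would mean checking each nonterminal component after it is looked up and, if it is a symbolic link, restarting the walk on the link's contents.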

That’s about it.

One More Thing

I can’t talk about HFSDebug’s Snow Leopard-specific features since Snow Leopard is under NDA. If you do have access to the latest Snow Leopard seed, try HFSDebug on it. For example, examine some standard Mac OS X files using HFSDebug.

$ sudo hfsdebug /bin/ls
...
$ sudo hfsdebug /etc/asl.conf
...
$ sudo hfsdebug /Applications/Mail.app/Contents/PkgInfo
...

GrabFS Source Code

August 19th, 2008

Earlier this year, I released GrabFS, a MacFUSE file system that shows “live” screenshots of Mac OS X applications. If you wish to understand how GrabFS works, you can now browse its source.

Enjoy.

New Install/Update Capabilities in MacFUSE

July 25th, 2008

MacFUSE has a new install/update mechanism that greatly simplifies and improves things both for end users and developers who use MacFUSE in their software.

The relevant wiki page has all the details.

Note that instead of Tiger- and Leopard-specific downloads, now there’s a single downloadable disk image containing a single installable package. The package, which third parties can choose to include within their metapackages, knows how to install the latest version of MacFUSE for your platform.

Extending HFSDebug

July 23rd, 2008

Recently, I had a need to know if any files or folders had been modified or created on an HFS+ volume in the past N seconds. There are many ways you could generate this type of information on Mac OS X.

To begin with, you could try asking Spotlight.

Besides Spotlight, Mac OS X has a rich variety of mechanisms and APIs for learning about file system changes.

On Leopard, you could write a program that uses the FSEvents API to learn of directory-level changes that occur on a volume. The FSEvents API is part of CoreServices.framework. "Directory-level" means that this API is best suited for monitoring large directory trees—it will not tell you when a particular file changes.

To monitor specific files, you could use the kqueues mechanism. (See kqueue(2).) Being file-level, kqueues don’t scale like the FSEvents API as you will need to monitor each file system object separately. Therefore, they are better suited for situations where you need to monitor only a few specific objects.

You could also directly use the low-level fsevents mechanism (/dev/fsevents) that underlies the FSEvents API and Spotlight—but only if your need is experimental in nature. The fslogger program is an example of directly using the fsevents mechanism. fslogger will tell you—in pretty much real time—when file system objects change. (Make sure to see caveats.)

Then there is the kauth mechanism that was introduced in Tiger, primarily to help creators of virus scanning software. Kauth allows for extremely fine-grained file system activity monitoring—you can see vnode-level operations. In fact, monitoring is sort of a side effect of using the kauth mechanism. You can actually allow and deny individual operations, as virus scanning software might need to. However, kauth is not easy to use. (Not that the other APIs mentioned necessarily are!) To use kauth, you need to write a kernel extension. You also need to be extremely careful in what you do so as not to deadlock the operating system.

There also exist tools like fs_usage and dtrace on Mac OS X. fs_usage uses the kernel’s kdebug facility to perform fine-grained tracing of kernel events. In particular, it allows you to trace file system activity. Beginning with Leopard, the DTrace facility, to which dtrace is a front-end, lets you trace all kinds of activity at both the kernel and user levels. You could do some very imaginative things with dtrace.

So, we see that there is no dearth of ways to monitor file system activity on Mac OS X. However, there are caveats associated with each way we looked at so far. Consider Spotlight. To use it, we would be assuming that Spotlight indexing was enabled on the volume in question. Spotlight also doesn’t look everywhere: your areas of interest on the file system might be outside of Spotlight’s default or configured search scope. Moreover, to use Spotlight or the other APIs we talked about, you will need to have the volume mounted. That is usually a reasonable requirement, except it may not be an option if, for example, you are trying to recover valuable data from a volume that has been through an accident. Or you could be performing file system forensics. Or the volume could be damaged enough to not be in a mountable state, at least not without repair. In these situations, you can’t or wouldn’t want to mount the volume. That aside, in my case, I didn’t know until after the volume had been modified that I wanted to know what had changed. That is, I didn’t happen to be conveniently running any monitoring programs and such.

You can always use plain old Unix-style find and walk the entire file system, examining each file and folder. This still needs the volume to be mounted, but it is exhaustive. Of course, if you have a large volume, exhaustively examining each file and folder through a brute-force find or other programs could take "forever." (In my case, I had over 4 million files on the volume. I also had little patience.)

Fortunately, Mac OS X lets you exploit the fact that the HFS+ volume format uses a central catalog B-Tree for storing hierarchy: the searchfs() system call can be used to "quickly" search HFS+ volumes. (It is much, much quicker than a typical portable user-space file-tree-walk.) In my case, I could use searchfs() to search for files and folders with creation or modification dates that match my criteria. Well, almost. I actually did require the volume to be unmounted. I also felt more inclined to do something general purpose.

hfsdebug is a tool that can walk the catalog tree even on unmounted volumes. I decided to add filtering capability to hfsdebug. "Filtering" means that hfsdebug can walk the HFS+ catalog B-Tree, examining each file and folder, and produce output based on some matching criteria. The new version of hfsdebug contains two built-in filters: mtime and crtime. You can use these filters to look for files and folders that have been modified or created, respectively, in the past N seconds. The number of seconds is passed as an argument to these filters. For example, to look for file system objects modified within the past 60 seconds, you would run hfsdebug as follows:

$ sudo hfsdebug --filter=builtin:mtime --filter_args=60
1216795688 [Tue Jul 22 23:48:08 2008]: Macintosh HD:/private/var/log/asl.db
1216795688 [Tue Jul 22 23:48:08 2008]: Macintosh HD:/private/var/log/system.log
...

Better still, you can write your own filters that hfsdebug can use. A filter is implemented as a dynamic library that implements up to 3 functions: one of them mandatory (hfsdebug_filter_callback()) and two of them optional (hfsdebug_filter_init() and hfsdebug_filter_fini()). To use your own filter, you would run hfsdebug the same way as in the case of built-in filters:

$ sudo hfsdebug --filter=/path/to/myfilter.dylib --filter_args=string
...

If your filter implements the hfsdebug_filter_init() function, hfsdebug would call it with the filter argument string, if any, as the argument. You could parse the argument string in the init function and initialize your filter’s state, if necessary.

int
hfsdebug_filter_init(const char *filter_args);

If you return a non-zero value from the init function, hfsdebug will terminate. If your filter doesn’t have any arguments, you could choose not to implement the init function.

After you return 0 from the init function, hfsdebug will invoke your filter’s callback function once for each file and folder record in the HFS+ catalog.


typedef char*(*hfsdebug_filter_path_retriever_t)(void);

int
hfsdebug_filter_callback(
    void *info, hfsdebug_filter_path_retriever_t pathRetriever);

The info argument is a pointer to either an HFSPlusCatalogFile structure or an HFSPlusCatalogFolder structure. (See the xnu kernel source for details of these structures.) You can determine which structure it is based on the first int16_t within the structure: it’s either kHFSPlusFileRecord or kHFSPlusFolderRecord. Given these structures, your filter can examine various attributes of the file system object.

Note that hfsdebug does not pass you the path to the file system object in question, because path computation is expensive. Instead, hfsdebug passes you a pointer to a path retriever function. You can invoke this function to make hfsdebug compute the path on demand and return a C string pointer. This pointer is valid for the given file system object only until your callback returns. You should call the path retriever function only if you truly need the path; doing so for every file system object would be quite time-consuming. Note also that hfsdebug filters are not multithreaded.

Again, you must return 0 from the callback for hfsdebug to keep calling you as long as there are more file system objects. If you return a non-zero value, hfsdebug will terminate.
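
The two rules above (ask for the path only on a match, and return non-zero to stop) can be sketched as follows. This is only an illustration, not one of the built-in filters: the folder record value is defined locally with the value I believe kHFSPlusFolderRecord has in hfs_format.h, and SKETCH_MAX_MATCHES and the matches counter are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed value of kHFSPlusFolderRecord; defined locally so that the
 * sketch builds without the xnu headers. */
#define SKETCH_FOLDER_RECORD 0x0001
/* Hypothetical limit: stop after this many folders have been printed. */
#define SKETCH_MAX_MATCHES   3

typedef char *(*hfsdebug_filter_path_retriever_t)(void);

static int matches = 0;

int
hfsdebug_filter_callback(void *info,
                         hfsdebug_filter_path_retriever_t pathRetriever)
{
    int16_t recordType = *(int16_t *)info;

    if (recordType != SKETCH_FOLDER_RECORD) {
        /* Not a match: return without paying for path computation. */
        return 0;
    }

    /* A match: only now ask hfsdebug to compute the path. */
    printf("%s\n", pathRetriever());

    if (++matches >= SKETCH_MAX_MATCHES) {
        /* A non-zero return value makes hfsdebug terminate. */
        return 1;
    }

    return 0;
}
```

With this shape, the expensive path computation happens at most SKETCH_MAX_MATCHES times, no matter how many records the catalog contains.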

Finally, once hfsdebug is done with all file system objects, it will call your filter’s fini function if one is implemented.


void
hfsdebug_filter_fini(void);
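
To make the overall calling sequence concrete, here is a minimal sketch of a filter that implements all three functions: it counts file and folder records and prints the totals from its fini function. It never calls the path retriever. The record type values are defined locally with the values I believe hfs_format.h uses (folder = 1, file = 2), so treat them, along with the file_count and folder_count names, as assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed values of kHFSPlusFolderRecord and kHFSPlusFileRecord;
 * defined locally so that the sketch builds without the xnu headers. */
#define SKETCH_FOLDER_RECORD 0x0001
#define SKETCH_FILE_RECORD   0x0002

typedef char *(*hfsdebug_filter_path_retriever_t)(void);

static unsigned long file_count = 0;
static unsigned long folder_count = 0;

/* Optional: called once, before any callback, with the argument string. */
int
hfsdebug_filter_init(const char *filter_args)
{
    (void)filter_args; /* this filter takes no arguments */
    file_count = 0;
    folder_count = 0;
    return 0;
}

/* Mandatory: called once for each file and folder record. */
int
hfsdebug_filter_callback(void *info,
                         hfsdebug_filter_path_retriever_t pathRetriever)
{
    int16_t recordType = *(int16_t *)info;

    (void)pathRetriever; /* counting does not need the path */

    if (recordType == SKETCH_FILE_RECORD) {
        file_count++;
    } else if (recordType == SKETCH_FOLDER_RECORD) {
        folder_count++;
    }

    return 0; /* keep going */
}

/* Optional: called once, after the last record. */
void
hfsdebug_filter_fini(void)
{
    printf("%lu files, %lu folders\n", file_count, folder_count);
}
```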

The following is a complete example of an hfsdebug filter. It does the same thing as the built-in mtime filter, that is, it looks for files and folders that were modified within the last N seconds.

/*
 * myfilter.c
 *
 * HFSDebug Filter for "mtime"
 *
 * Look for file system objects that have been modified
 * within the past N seconds.
 *
 * gcc -arch ppc -dynamiclib -I/path/to/xnu/bsd/ -Wall -o myfilter.dylib myfilter.c
 */

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <errno.h>
#include <time.h>

#include <hfs/hfs_format.h>

#define MAC_GMT_FACTOR 2082844800UL

typedef char*(*hfsdebug_filter_path_retriever_t)(void);

static uint32_t mtime_seconds = 0;

int
hfsdebug_filter_init(const char *filter_args)
{
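    if (filter_args == NULL) {
        fprintf(stderr, "mtime filter requires an argument (seconds)\n");
        return EINVAL;
    }

    /* strtoul() reports failure only through errno, so clear it first. */
    errno = 0;
    mtime_seconds = strtoul(filter_args, NULL, 10);
    if (errno != 0) {
        fprintf(stderr,
                "invalid argument (%s) to mtime filter\n", filter_args);
        return EINVAL;
    }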

    time_t now = time(NULL);

    mtime_seconds = now - mtime_seconds + MAC_GMT_FACTOR;

    return 0;
}

int
hfsdebug_filter_callback(void *info,
                         hfsdebug_filter_path_retriever_t pathRetriever)
{
    int16_t recordType = *(int16_t*)info;
    uint32_t modDate;

    if (recordType == kHFSPlusFileRecord) {
        HFSPlusCatalogFile *file = (HFSPlusCatalogFile*)info;
        modDate = file->contentModDate;
    } else if (recordType == kHFSPlusFolderRecord) {
        HFSPlusCatalogFolder *folder = (HFSPlusCatalogFolder*)info;
        modDate = folder->contentModDate;
    } else {
        /* ignore */
        return 0;
    }

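    if (modDate > mtime_seconds) {
        /* Convert to the Unix epoch. Use a real time_t instead of casting
         * a uint32_t pointer, since time_t may be wider than 32 bits. */
        time_t unixTime = (time_t)(modDate - MAC_GMT_FACTOR);
        char *tmpTime = asctime(localtime(&unixTime));
        *(tmpTime + 24) = 0; /* drop asctime()'s trailing newline */
        fprintf(stdout,
                "%lu [%s]: %s\n", (unsigned long)unixTime, tmpTime,
                pathRetriever());
    }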

    return 0;
}

void
hfsdebug_filter_fini(void)
{
    return;
}
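
The MAC_GMT_FACTOR arithmetic in the filter deserves a word. HFS+ dates count seconds since January 1, 1904 (GMT), whereas Unix time counts from January 1, 1970; the difference is 2,082,844,800 seconds. The following small sketch isolates the conversion in both directions. The helper names hfs_to_unix() and unix_to_hfs() are mine, not part of hfsdebug.

```c
#include <stdint.h>
#include <time.h>

/* Seconds between the HFS+ epoch (1904) and the Unix epoch (1970):
 * 66 years including 17 leap days = 24107 days * 86400 seconds. */
#define MAC_GMT_FACTOR 2082844800UL

/* Convert an HFS+ date to Unix time. The subtraction is done in 64 bits
 * so that the result does not depend on the width of unsigned long. */
time_t
hfs_to_unix(uint32_t hfsDate)
{
    return (time_t)((int64_t)hfsDate - (int64_t)MAC_GMT_FACTOR);
}

/* Convert Unix time to an HFS+ date (a 32-bit unsigned value on disk). */
uint32_t
unix_to_hfs(time_t unixTime)
{
    return (uint32_t)((int64_t)unixTime + (int64_t)MAC_GMT_FACTOR);
}
```

For instance, the Unix timestamp 1216795688 seen in the earlier sample output corresponds to the HFS+ date 3299640488.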


Download HFSDebug 3.20

/bin/aural: The Solution

July 14th, 2008

1200 Baud Archaeology

/bin/aural

June 30th, 2008

Here is a unique computer puzzle: the audio file (MP3 encoding) contains something that could well be music to many a hacker’s ears. What is it? Can you "prove" that it is what you say it is?

New Version of MacFUSE

April 28th, 2008

Version 1.5 of MacFUSE is out.

The CHANGELOG has details of what’s new.

HFSDebug Bugfix Release 3.10

February 26th, 2008

I discovered a bug in hfsdebug. It causes hfsdebug to crash while printing Access Control Entry (ACE) details for certain files or folders. For example, consider the standard ~/Library/Preferences/ folder on Leopard.

$ ls -lde ~/Library/Preferences
drwx------@ 167 singh  staff ... /Users/singh/Library/Preferences
 0: group:everyone deny delete

This folder has an ACE for the group everyone. In particular, the ACE applies to no specific user (or you could say it applies to the wildcard user). HFSDebug was not dealing with this situation well. See what happens.

$ sudo hfsdebug ~/Library/Preferences/
  <Catalog B-Tree node = 15028 (sector 0x49080)>
  path                 = Macintosh HD:/Users/singh/Library/Preferences
# Catalog Folder Record
...
        # ACL Entry
        ace_applicable     = ab cd ef ab cd ef ab cd ef ab cd ef 0 0 0 c
zsh: bus error  sudo ./hfsdebug ~/Library/Preferences

I’ve released a bugfix version of HFSDebug to take care of this. The correct behavior should be as follows.

$ sudo hfsdebug ~/Library/Preferences/
  <Catalog B-Tree node = 15028 (sector 0x49080)>
  path                 = Macintosh HD:/Users/singh/Library/Preferences
# Catalog Folder Record
...
        # ACL Entry
        ace_applicable     = ab cd ef ab cd ef ab cd ef ab cd ef 0 0 0 c
          user             = *
          group            = everyone
          gid              = 12
        ace_flags          = 00000000000000000000000000000010 (0x000002)
                             . KAUTH_ACE_DENY
        ace_rights         = 00000000000000000000000000010000 (0x000010)
                             . KAUTH_VNODE_DELETE
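
If you want to interpret the ace_flags and ace_rights words yourself, the decoding is plain bit masking. The sketch below is not hfsdebug’s code: the constants are local copies of the values I believe sys/kauth.h defines (the low four bits of the flags hold the ACE kind, with 1 meaning permit and 2 meaning deny, and bit 4 of the rights is delete), and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Local copies of values assumed to match <sys/kauth.h>. */
#define SKETCH_ACE_KINDMASK 0xfu
#define SKETCH_ACE_PERMIT   1u
#define SKETCH_ACE_DENY     2u
#define SKETCH_VNODE_DELETE (1u << 4)

/* Return the ACE kind stored in the low bits of ace_flags. */
uint32_t
ace_kind(uint32_t ace_flags)
{
    return ace_flags & SKETCH_ACE_KINDMASK;
}

/* Return non-zero if the rights word includes the delete right. */
int
ace_covers_delete(uint32_t ace_rights)
{
    return (ace_rights & SKETCH_VNODE_DELETE) != 0;
}

/* Print a one-line summary of the kind and the delete right only. */
void
print_ace_summary(uint32_t ace_flags, uint32_t ace_rights)
{
    const char *kind = "other";

    if (ace_kind(ace_flags) == SKETCH_ACE_PERMIT) {
        kind = "permit";
    } else if (ace_kind(ace_flags) == SKETCH_ACE_DENY) {
        kind = "deny";
    }

    printf("%s%s\n", kind, ace_covers_delete(ace_rights) ? " delete" : "");
}
```

Calling print_ace_summary(0x2, 0x10) with the values from the output above prints "deny delete", which matches what ls reported for this folder.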

Download HFSDebug 3.10

“TPM DRM” In Mac OS X

January 31st, 2008

A Myth That Won’t Die

MacFUSE Now Friendlier with Objective-C

January 9th, 2008

Quoting my Google Mac Blog post in its entirety:

A new version of MacFUSE is now available. As always, you can download a ready-to-install prebuilt package, or browse the ready-to-build source. Besides bug fixes and other minor improvements, there is a major new developer feature in this release: an Objective-C framework is now part of the core MacFUSE distribution! MacFUSE.framework will make developing user-space file systems in Objective-C easier than ever before. We look forward to seeing lots of interesting new applications based on MacFUSE.

Ted Bonkenburg, one of the engineers behind MacFUSE.framework, will give a talk this Thursday, January 10, during the next Silicon Valley Cocoaheads meeting at the Apple campus in Cupertino. His talk will focus on using the MacFUSE Objective-C API, but much of it will carry over to using other programming languages with MacFUSE. We’ll also show some very cool file system demos. So, if you’re interested in MacFUSE and are in the area, be there! It will be a hands-on talk, so please bring your laptops if you want to follow along. (Xcode 2.5 or newer required.)

GrabFS: The Screenshot File System

January 2nd, 2008


A while ago, I wrote about procfs for Mac OS X, a MacFUSE-based file system. Subsequently, I added more cool features to my procfs implementation. Recently, I had reason to demonstrate procfs again and realized that I needed still more cool features. That need led to GrabFS.

In a nutshell, GrabFS is a file system that shows you a live view of the window contents of currently running applications. In a GrabFS volume, folders represent running applications and image files represent instant screenshots (“grabs”) of the applications’ windows. You simply copy a file or just open it in place, and you have a screenshot. Open it again, and you have a new screenshot!

Go here to read more about GrabFS and to download it. GrabFS requires Mac OS X "Leopard" and MacFUSE.

New Version of HFSDebug

December 30th, 2007

I found some time today to make a certain feature of HFSDebug work on Leopard. The new version is available for download here. The new version should run on both Leopard and Tiger, but there are no visible changes whatsoever for Tiger users.

If you use HFSDebug, you might have realized that the -m option doesn’t work on Leopard any more. This option is used to retrieve and display the in-kernel mount data for a currently mounted HFS+ volume. This is what you would see if you ran the now deprecated version 2.56 of HFSDebug on Leopard:


$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.5.1
BuildVersion: 9B18
$ sudo hfsdebug -m
populateHFSPlusMount(222): failed to retrieve symbol information.
hfsdebug: failed to locate mount data (perhaps the volume is not mounted)
$

The updated version should work correctly, as shown below. If you care about this information, the -m option is a rather useful feature that needed fixing for Leopard.

$ sudo ./hfsdebug -m
  Volume name                             = Macintosh HD (volfs_id=234881026)
  block device number                     = { major=14, minor=2 }
  HFS+ flags                              = 000...0000000000000010001100
                                            + HFS_WRITEABLE_MEDIA
                                            + HFS_CLEANED_ORPHANS
                                            + HFS_METADATA_ZONE
  default owner                           = { uid=99, gid=99 }
  directory protection bits mask          = 755
  file protection bits mask               = 755
# Key Data Structures
  struct mount *                          = 0x41ebb90
  block device vnode                      = 0x4333f40
  Extents file vnode                      = 0x4333eb0
  Catalog file vnode                      = 0x4333e20
  Allocation file vnode                   = 0x4333d90
  Attributes file vnode                   = 0x4333d00
# Statistics
  physical block size                     = 512
  physical block count                    = 0x12975e60
  alternate volume header location        = 0x12975e5e
  size of a buffer cache buffer           = 4096
  number of files in file system          = 1047391
  number of directories in file system    = 156694
  free allocation blocks                  = 0x1967df5
  start block for next allocation search  = 0xdfd404
  next unused catalog node ID             = 2011304
  file system write count                 = 84130726
  free block reserve                      = 64000
  blocks on loan for delayed allocations  = 0
  encodings in use                        = 00...010000000000000000001001011
# Notification Variables
  notification conditions bits            = 0
  freespace warning limit                 = 64000
  freespace desired level                 = 96000
# Times
  last mounted time                       = Sun Dec 30 21:36:21 2007
  last mounted modification time          = Sun Dec 30 21:35:51 2007
  last modification time                  = Sun Dec 30 22:08:31 2007
  cache of largest known free extents     =
# Journal
  journal for this volume                 = 0x4338f00
  vnode for journal device                = 0x4333f40
  start block of journal                  = 0x4a8
  journal size                            = 16777216
  journal file ID                         = 16
  journal info block file ID              = 17
# Hot File Clustering
  clustering stage                        = HFC_RECORDING
  recording period start time             = Thu Dec 20 07:41:40 2007
  recording period stop time              = Wed Jan  2 13:17:52 2008
  opaque recording data                   = 0x24189004
  maximum files to track                  = 1000
  vnode of Hot Files B-Tree               = 0x0
# Metadata Zone
  metadata zone start block               = 0x1
  metadata zone end block                 = 0x67fff
  hotfile start block                     = 0x45be2
  hotfile end block                       = 0x67fff
  hotfile free blocks                     = 0x20491
  hotfile maximum blocks                  = 0x2241e
  overflow maximum blocks                 = 0x800
  catalog maximum blocks                  = 0x43f3b
# Other
  maximum inline attribute size           = 3802
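
One relationship in the statistics above can be checked by hand. The HFS+ format places the alternate volume header 1024 bytes before the end of the volume, so with 512-byte physical blocks it sits two blocks before the end. The sketch below assumes that the reported location is expressed in physical blocks; the function name is hypothetical.

```c
#include <stdint.h>

/* The alternate volume header starts this many bytes before the end
 * of the volume. */
#define ALT_VOLUME_HEADER_OFFSET 1024u

/* Compute the physical block at which the alternate volume header
 * starts. Assumes the physical block size evenly divides 1024
 * (for example, 512). */
uint64_t
alternate_volume_header_block(uint64_t physical_block_count,
                              uint32_t physical_block_size)
{
    return physical_block_count -
           (ALT_VOLUME_HEADER_OFFSET / physical_block_size);
}
```

With the values shown, alternate_volume_header_block(0x12975e60, 512) gives 0x12975e5e, which agrees with the output.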

MacFUSE: New Release, Leopard Support

October 26th, 2007

A new release of MacFUSE is here. There is a new version for Leopard, a new version for Tiger, and a new version of sshfs.app that runs on both Tiger and Leopard.

Downloads: http://code.google.com/p/macfuse/downloads/list

Documentation: http://code.google.com/p/macfuse/w/list

iPhone Restore Image

July 1st, 2007

I don’t have an iPhone (and don’t really intend to get one), but for the iPhone-equipped curious operating system investigators, the iPhone Restore image downloadable from Apple’s web site has plenty of interesting details about the hardware and software composition of the iPhone, how some of the things work, and so on. Enjoy.

(Hint: Looks like several components of the image might have inadvertently escaped encryption before it was put up for download.)

IBM Assured Execution Environment

June 23rd, 2007

Several years ago, while I was working at the IBM Almaden Research Center, we came up with a security mechanism called the Assured Execution Environment (AxE). We had implementations for Windows XP and Mac OS X. (Although AxE supports code signing as a feature, it’s not the same—in any case, this was long before code signing was known as a forthcoming feature in Mac OS X "Leopard".)

An evolved version of the Windows implementation is now available for download from the IBM alphaWorks web site.

Making procfs Cooler

June 5th, 2007

A few weeks ago, I released as open source a MacFUSE-based process file system for Mac OS X.

I recently added several new features to this procfs implementation. Some of these features are "cool" in that they put a new twist on certain types of visual information.

For example, there’s a folder /proc/system/hardware/displays/ that contains one subfolder for each connected display. Subfolder 0 represents the first display, 1 is the second display (if any), and so on. Within each such subfolder, there’s a file called info that contains information about that particular display: its resolution, bits-per-pixel, bytes-per-row, whether the display is built-in, whether it supports OpenGL acceleration, and so on. There’s another file called screenshot.tiff that contains a TIFF rendition of what’s on that display at that moment: an always-live screenshot, if you will. You copy this file and you get a screenshot. Copy it again, and you get a new screenshot. You can just open it in place too.

Along similar lines, there’s another folder /proc/system/hardware/camera/ and a file screenshot.tiff within it. When you open this file, procfs activates the camera momentarily, takes a picture, deactivates the camera, and makes the picture available as a TIFF file. You can copy the file and you get an image of what the camera’s seeing at that moment. Copy it again and you get another "live" image.

Besides these, the updated procfs has other (not-so-visual) interesting features.

More details, source code, and a precompiled binary available here:

Making procfs Cooler


All contents of this site, unless otherwise noted, are ©1994-2008 Amit Singh. All Rights Reserved.