Pathlib Basics

Python 3 provides a module called pathlib which lets you work with paths as objects, with the operations you would expect from the command line or PowerShell, like recursively listing directories or accessing items. The official documentation is here. Eventually I want to understand it well because I think it’ll reduce the friction between what I want to do on a filesystem (point a function at a directory, access an image) and writing the code to do it.

To bootstrap my mental model I’m going to imagine a filesystem as a filing cabinet whose drawers contain folders.

Pure Paths

Pure Paths are the parent class of Path objects and “provide purely computational operations without I/O”. In other words, you can join two Pure Paths or extract the components of a Pure Path, but they have no functionality for actually accessing the filesystem. I think of a Pure Path as the labels of the filing cabinet, e.g. ‘the bengal folder in the cat drawer’. What’s in there? Pure Paths have no idea.

In short, Pure Paths know everything about paths (roots, suffixes, parents) except what’s actually in them.
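A minimal sketch of those label-only operations. The cabinet path is made up for illustration, and PurePosixPath is used explicitly so the results are the same on any operating system. Nothing here touches the disk.

```python
from pathlib import PurePosixPath

# Hypothetical path; it does not need to exist because Pure Paths never do I/O.
folder = PurePosixPath('/cabinet/cats')
label = folder / 'bengal' / 'photo.tar.gz'   # the / operator joins segments

print(label)            # /cabinet/cats/bengal/photo.tar.gz
print(label.parts)      # ('/', 'cabinet', 'cats', 'bengal', 'photo.tar.gz')
print(label.root)       # /
print(label.parent)     # /cabinet/cats/bengal
print(label.name)       # photo.tar.gz
print(label.suffix)     # .gz
print(label.suffixes)   # ['.tar', '.gz']
```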

The Pure Path abstraction comes with several quality-of-life improvements over working with paths as unadorned strings. For example, it differentiates between operating systems, since Windows and POSIX filesystems use different syntax despite sharing the same abstract structure. Instantiating PurePath gives you a different class depending on the operating system: a PurePosixPath object on POSIX systems and a PureWindowsPath on Windows systems. Paths of the same flavor can be compared, and they work with any operation that takes an os.PathLike object (PurePath implements that interface).
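Here is a small sketch of the flavor differences, with made-up folder names. Both flavors can be instantiated on any operating system because no I/O is involved; only the generic PurePath picks its class based on where the code runs.

```python
import os
from pathlib import PurePath, PurePosixPath, PureWindowsPath

generic = PurePath('cats', 'bengal')
print(type(generic).__name__)   # PurePosixPath or PureWindowsPath, depending on the OS

posix = PurePosixPath('cats/Bengal')
windows = PureWindowsPath('cats\\Bengal')

# Windows paths compare case-insensitively; POSIX paths do not.
print(PureWindowsPath('CATS/bengal') == PureWindowsPath('cats/Bengal'))  # True
print(PurePosixPath('CATS/bengal') == PurePosixPath('cats/Bengal'))      # False

# os.fspath() accepts any os.PathLike object and returns its string form.
print(os.fspath(posix))                    # cats/Bengal
print(os.fspath(windows))                  # cats\Bengal
print(isinstance(generic, os.PathLike))    # True
```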

They’re instantiated, intuitively, from one or more path segments, each of which is a string or another path object, e.g.

from pathlib import Path, PurePath

PurePath('hello', 'world')
PurePath(Path('foo'), Path('bar'))

Paths

In practice you’ll use Concrete Paths, which are Pure Paths that can also do I/O operations. You can ask one how many folders there are in the Cats drawer, for example. Being filesystem-aware also means that you can create Path objects which depend on the specifics of your filesystem, e.g. with the Path.home() method, which returns a Path object representing your home directory (the equivalent of ~), or Path.cwd(), which returns the current working directory.
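A quick sketch of those two filesystem-aware constructors. The printed values depend on the machine and user running the code, so no output is shown.

```python
from pathlib import Path

home = Path.home()   # the current user's home directory
here = Path.cwd()    # the current working directory

# Both are absolute paths built from the state of the running system.
print(home, home.is_absolute())
print(here, here.is_absolute())
```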

Some useful methods:

  • Path.exists(): Self → Bool. Whether the path points to an existing file or directory. Similar are Path.is_dir() and Path.is_file().
  • Path.expanduser(): Self → Path. Expands ~ and ~user constructions inside a Path object.
  • Path.iterdir(): Self → Iter[Path]. Returns an iterator over Path objects for the directory contents.
  • Path.mkdir(mode=0o777, parents=False, exist_ok=False): Self → None. Creates a directory at this path.
  • Path.resolve(): Self → Path. Makes the path absolute, resolving any symlinks and eliminating . and .. components.
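As a rough sketch of how a few of these methods fit together, the snippet below builds a tiny cabinet inside a temporary directory so nothing real is touched. The folder names are invented, and using tempfile is just a convenience for the demonstration.

```python
import tempfile
from pathlib import Path

# Work inside a throwaway directory that is deleted when the block exits.
with tempfile.TemporaryDirectory() as tmp:
    cabinet = Path(tmp)
    bengal = cabinet / 'cats' / 'bengal'        # hypothetical folder names

    existed_before = bengal.exists()            # False: nothing created yet
    bengal.mkdir(parents=True)                  # also creates the missing 'cats' parent
    bengal.mkdir(parents=True, exist_ok=True)   # no error when it already exists
    (cabinet / 'cats' / 'siamese').mkdir()

    # Count the folders in the cats drawer.
    folders = sorted(p.name for p in (cabinet / 'cats').iterdir() if p.is_dir())
    print(folders)                              # ['bengal', 'siamese']

    # resolve() eliminates the '..' component and follows symlinks.
    roundabout = bengal / '..' / 'siamese'
    same_place = roundabout.resolve() == (cabinet / 'cats' / 'siamese').resolve()
    print(same_place)                           # True
```

Without parents=True the first mkdir() call would raise FileNotFoundError, because the intermediate cats directory does not exist yet.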

Examples: List the directories in your current directory:

from pathlib import Path

p = Path.cwd()
[x for x in p.iterdir() if x.is_dir()]

Create a hello directory inside your home directory:

from pathlib import Path

p = Path('~', 'hello')
p.expanduser().mkdir()  # raises FileExistsError if ~/hello already exists

Any basic operation you’re used to doing on the command line you can likely do with Path objects.