RFC: Non-Code Data Directories and Standards for Catalyst
Introduction
This blog post is a proposal and request for comments on adopting the XDG Filesystem Hierarchy as an option for managing all the non-code data that makes up a Catalyst application.
The Problem
Right now, when you create a new Catalyst application, the non-code data goes by default either to {home}/root (for templates and static files) or to {home} (for configuration files), where {home} is the root directory of the application. So you get a directory structure like:
MyApp
    myapp.conf
    /t
    /root
        /static
    /lib
        MyApp.pm
        /MyApp
            /Model
            /View
            /Controller
Now, this {home} directory is something of a hack, since we use Catalyst::Utils::home() to try to figure it out based on certain expectations; Perl itself has no built-in notion of {home}. If your application is 'installed' (via cpan or make install), we guess the location based on the physical location of the application modules (whatever class you have that inherits from Catalyst). If it's not installed (which is the common case when you are developing and just running the development server or tests), it walks up the directory structure looking for a Makefile.PL or a Build.PL and decides that directory is good enough to call {home}.
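To make the uninstalled case concrete, here is a minimal sketch of that directory walk, written in Python purely to illustrate the logic. The name guess_home is hypothetical, and the marker list is limited to the two files mentioned above; this is not a port of Catalyst::Utils::home(), which also handles the installed case.

```python
from pathlib import Path
from typing import Optional

# Build files whose presence marks a distribution root in this sketch.
DIST_MARKERS = ("Makefile.PL", "Build.PL")


def guess_home(module_file: str) -> Optional[Path]:
    """Hypothetical helper: walk upward from a module's directory until a
    directory containing Makefile.PL or Build.PL is found.

    Returns None if no such directory exists anywhere above the module.
    """
    start = Path(module_file).resolve().parent
    for candidate in (start, *start.parents):
        if any((candidate / marker).is_file() for marker in DIST_MARKERS):
            return candidate
    return None
```

Given a checkout at ~/src/MyApp containing a Makefile.PL, calling guess_home on ~/src/MyApp/lib/MyApp.pm walks up from lib/ and returns the checkout directory. The fragility is easy to see: the answer depends entirely on which files happen to sit above the module.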
Since this method can be a bit flaky, a lot of people recommend using File::ShareDir (see here for a good overview). This module integrates well with Module::Install and leverages the fact that a Perl module can have a share directory associated with it. Using this, you might create a directory structure like:
MyApp-Web
    /etc
        myapp-web.conf
    /t
    /share
        /static
    /lib
        MyApp.pm
        /MyApp
            Web.pm
            /Web
                /Model
                /View
                /Controller
I also modified the directory hierarchy a bit to reflect the growing consensus that your Catalyst application should ideally live one level further down from your application root. In this case I chose 'MyApp/Web.pm', which seems to be the most popular choice and one that is semantically meaningful. This reflects the idea that your MVC layer should be as thin a layer as possible over your true domain and interface logic, which lives in the MyApp directory. I also moved the configuration files to {home}/etc, since that makes sense to people used to finding configuration in /etc.
Although this is an improvement, it still suffers from several issues. First, File::ShareDir can only find the share directory for installed applications. For the common case where you are actively developing or running tests, you still need something like Catalyst::Utils::home() to guess the directory for you, so in that respect it's not much better than what Catalyst::Utils::home() provides out of the box.
Also, when your share data is installed into the Perl library path, your application server (or the user running Apache with mod_perl or FastCGI) needs the correct level of access to that path. This complicates configuration, and it is not what most Unix administrators expect: there are reasonably well-defined norms for where configuration should go (/etc or ~/.config), where logs go, and so on.
Although you can override the {home} directory with environment variables, this is not ideal if our goal is to minimize installation hassle and make everything work well out of the box. It complicates installation for users as well as the configuration of the web servers that will run the code.
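For reference, my understanding is that Catalyst checks an application-specific variable (MYAPP_HOME for an application class named MyApp) before falling back to the generic CATALYST_HOME. The sketch below models that precedence in Python; the helper names are hypothetical, and the exact lookup in Catalyst::Utils should be checked against its documentation.

```python
import os
from typing import Mapping, Optional


def class_to_env_prefix(app_class: str) -> str:
    # "MyApp::Web" becomes "MYAPP_WEB".
    return app_class.replace("::", "_").upper()


def home_override(app_class: str,
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: return the first home override found,
    checking the app-specific variable before CATALYST_HOME."""
    for prefix in (class_to_env_prefix(app_class), "CATALYST"):
        value = env.get(f"{prefix}_HOME")
        if value is not None:
            return value
    return None
```

This is exactly the 'roll my own' mechanism: running three MojoMojo instances means launching each with a different MOJOMOJO_HOME, and every administrator picks their own directory layout for those homes.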
It also complicates customization. For example, let's say I am using the MojoMojo wiki and want to run three instances of it. Each instance will have its own configuration, and I want to slightly modify the theme files for each. Right now, the only way I can do this is by overriding the home environment variable for each running instance. Although this works, it is a 'roll my own' approach that is likely to vary from administrator to administrator, making it more difficult to onboard new admins because each deployment is unique. I strongly feel that we should have clear standards for the most common deployment cases, since this reduces errors, speeds deployment, and counters the argument I often hear that Perl is hard to maintain. A standard will also help grow a set of best practices around deployment, which we can document and promote.
Proposal
This is a case where Perl is not leveraging existing norms well, which really goes against the grain for us, considering that CPAN, with its "reuse, recycle" mantra, is one of our primary claims to fame. My recommendation is that we adopt an existing standard and make it available as a plugin or set of roles for Catalyst. The most relevant standard is the XDG Filesystem Hierarchy (formally, the freedesktop.org XDG Base Directory Specification), which exists specifically to define where installed applications put configuration and data files, covering both per-user files that users can override and global files that only admins should touch.
Although this standard is aimed at Linux, it's fairly straightforward, and similar methods are employed by Windows Server and Mac OS X Server, so it should be possible to create a pluggable support mechanism that is broadly applicable.
The XDG Filesystem Hierarchy defines some environment variables and defaults for the most common types of non-code data, and offers a system for separating user configuration from global configuration.
I recommend you review the standard, since it's very short, but here's a summary. The standard defines four environment variables useful to us:
XDG_DATA_HOME
This is the location of data files that a user running the application should be able to customize (or that will be customized during installation or use of the application). By default it is "~/.local/share".
XDG_CONFIG_HOME
Similar to XDG_DATA_HOME but specifically for configuration files. Defaults to "~/.config".
XDG_DATA_DIRS
Takes a string of paths (delimited by ":") in which to look for system-wide data. These could be things like templates or static assets that shouldn't be changed by users and that would be shared by all instances of the application. The default is "/usr/local/share/:/usr/share/".
XDG_CONFIG_DIRS
Like XDG_DATA_DIRS but for configuration. Defaults to "/etc/xdg".
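To make the defaults concrete, here is a minimal sketch, in Python for illustration, that resolves the four variables the way the spec describes: an unset or empty variable falls back to its default, and (per current versions of the spec) relative paths are ignored. The function names are my own, not part of any existing module.

```python
import os
from pathlib import Path
from typing import Dict, List, Mapping, Union

DirSpec = Union[Path, List[Path]]


def _user_home(env: Mapping[str, str]) -> Path:
    # Fall back to the OS notion of home if HOME is missing from env.
    return Path(env["HOME"]) if env.get("HOME") else Path.home()


def _single_dir(env: Mapping[str, str], name: str, default_rel: str) -> Path:
    # Unset, empty, or relative values fall back to $HOME/<default_rel>.
    value = env.get(name, "")
    if value and os.path.isabs(value):
        return Path(value)
    return _user_home(env) / default_rel


def _dir_list(env: Mapping[str, str], name: str, default: str) -> List[Path]:
    # A ":"-separated list, most important first; relative entries are
    # dropped, and an empty result falls back to the default list.
    entries = [p for p in env.get(name, "").split(":") if os.path.isabs(p)]
    return [Path(p) for p in (entries or default.split(":"))]


def xdg_base_dirs(env: Mapping[str, str] = os.environ) -> Dict[str, DirSpec]:
    return {
        "XDG_DATA_HOME": _single_dir(env, "XDG_DATA_HOME", ".local/share"),
        "XDG_CONFIG_HOME": _single_dir(env, "XDG_CONFIG_HOME", ".config"),
        "XDG_DATA_DIRS": _dir_list(env, "XDG_DATA_DIRS",
                                   "/usr/local/share/:/usr/share/"),
        "XDG_CONFIG_DIRS": _dir_list(env, "XDG_CONFIG_DIRS", "/etc/xdg"),
    }
```

The two *_HOME variables resolve to a single directory each, while the two *_DIRS variables resolve to ordered lists; that asymmetry is what makes the per-user override model work.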
The way I see this working is that if the application is being run in development mode, we'd first look for files local to the application's file path, and then fall back to the XDG-defined directories. Additionally, we'd probably need some boilerplate install scripts that authors can use to prompt for the desired path information (with sensible defaults). So our application distribution might look like:
MyApp-Web
    /t
    /etc
        myapp-web.conf
    /share
        /local
    /lib
        /MyApp
            Web.pm
            /Web
                /Model
                /View
                /Controller
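The lookup order described above (checkout first in development, then per-user data, then system-wide data) can be sketched as a simple search path. This is a Python illustration under stated assumptions: data_search_path and find_data_file are hypothetical helpers, the XDG resolution is simplified, and the per-application subdirectory name (e.g. "myapp-web") is whatever the install scripts choose.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional, Sequence


def data_search_path(app: str,
                     dev_dirs: Sequence[Path] = (),
                     env: Mapping[str, str] = os.environ) -> List[Path]:
    """Hypothetical helper: directories to search, most specific first."""
    home = Path(env["HOME"]) if env.get("HOME") else Path.home()
    data_home = env.get("XDG_DATA_HOME") or str(home / ".local" / "share")
    data_dirs = env.get("XDG_DATA_DIRS") or "/usr/local/share/:/usr/share/"

    # 1. Development mode: directories inside the checkout come first.
    search: List[Path] = list(dev_dirs)
    # 2. Per-user, customizable data ($XDG_DATA_HOME/<app>).
    search.append(Path(data_home) / app)
    # 3. System-wide data, in the order XDG_DATA_DIRS lists it.
    search.extend(Path(d) / app for d in data_dirs.split(":") if d)
    return search


def find_data_file(relpath: str, search: Sequence[Path]) -> Optional[Path]:
    """Return the first existing file named relpath, or None."""
    for base in search:
        candidate = base / relpath
        if candidate.is_file():
            return candidate
    return None
```

In development you would pass dev_dirs=[checkout / "share" / "local", checkout / "share"] to mirror the layout above; an installed application passes nothing and relies only on the XDG directories. A user who copies a single template into their data home overrides just that file, while everything else still comes from the system-wide copy.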
During installation we'd copy "MyApp-Web/share/local" to "$XDG_DATA_HOME/myapp-web" and "MyApp-Web/share/" to "$XDG_DATA_DIRS/myapp-web" (since XDG_DATA_DIRS is a list, we'd either copy to the first entry or prompt at install time). Handling configuration would be a bit trickier. My thought here is that we'd copy "MyApp-Web/etc/*" to "$XDG_CONFIG_DIRS/myapp-web", but when running, the application would look in both XDG_CONFIG_DIRS and XDG_CONFIG_HOME, merging the two to allow local overrides of the configuration.
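The merge step could follow the precedence the XDG spec already defines: the first entry of XDG_CONFIG_DIRS is the most important system directory, and XDG_CONFIG_HOME beats all of them. Here is a minimal Python sketch, assuming each config file has already been parsed into a nested dict by whatever loader the application uses; merge_config_layers and deep_merge are hypothetical names, and the recursive merge is my own choice rather than anything Catalyst does today.

```python
import copy
from typing import Any, Dict, Iterable


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which override's keys win; nested dicts are
    merged recursively and neither input is modified."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def merge_config_layers(system_layers: Iterable[Dict[str, Any]],
                        user_layer: Dict[str, Any]) -> Dict[str, Any]:
    """system_layers: configs found under XDG_CONFIG_DIRS, most important
    first. user_layer: config from XDG_CONFIG_HOME, which wins overall."""
    result: Dict[str, Any] = {}
    # Apply the least important layer first so later layers overwrite it.
    for layer in reversed(list(system_layers)):
        result = deep_merge(result, layer)
    return deep_merge(result, user_layer)
```

Merging recursively rather than replacing whole sections means a user can override a single key, such as one database setting, without copying the entire system configuration into their home directory.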
Overall I believe this will give us a smoother and more professional installation experience, make it easier to administer Catalyst applications and help start a best practices dialog.
Thoughts, criticism, abuse welcome :)
Comments
MyApp::Model is, IMO, just a dumb convention. I prefer to just have MyApp::User, MyApp::Group, and so on. I figure if I'm going to use this convenient second-level namespace, I might as well use it for my core classes.
I find MyApp::Web::Model really confusing. Do you have a non-web model as well? The model is the model, and is not web-specific (at least if you're doing it right). Yes, I'm familiar with the Reaction concepts of Interface vs Domain model, and that makes sense too, but that's not what you're proposing, AFAICT.
I can sort of see MyApp::Web::Controller, although it's somewhat hard for me to imagine a non-web controller (do I need a full-blown set of controllers for a CLI?).
That leaves MyApp::Web::View, which seems reasonable.
I do love the use of share. I already do that myself, and also do the etc thing. I hate the Catalyst convention of root, which is completely arbitrary and bears no relation to any standard I know of.
Making this work properly on Windows and Mac OS is also important to me, so I'd need a sort of base API with plugins that mimic the XDG locations.