Apr 17 2019
 

Traditionally, GObject implementations in C are mutable: you
instantiate a GObject and then change its state via method calls.
Sometimes this is expected and desired; a GtkCheckButton widget
certainly can change its internal state from pressed to not pressed,
for example.

Other times, objects are mutable while they are being “assembled” or
“configured”, and only yield a final immutable result later.
This is the case for RsvgHandle from librsvg.

Please bear with me while I write about the history of the
RsvgHandle API and why it ended up with different ways of doing the
same thing.

The traditional RsvgHandle API

The final purpose of an RsvgHandle is to represent an SVG document
loaded in memory. Once it is loaded, the SVG document does not
change, as librsvg does not support animation or creating/removing SVG
elements; it is a static renderer.

However, before an RsvgHandle achieves its immutable state, it has
to be loaded first. Loading can be done in two ways:

  • The historical/deprecated way, using the rsvg_handle_write() and
    rsvg_handle_close() APIs. Plenty of code in GNOME used this
    write/close idiom before GLib got a good abstraction for
    streams; you can see another example in GdkPixbufLoader.
    The idea is that applications do this:
file = open a file...;
handle = rsvg_handle_new ();

while (file has more data) {
   rsvg_handle_write(handle, a bit of data);
}

rsvg_handle_close (handle);

// now the handle is fully loaded and immutable

rsvg_handle_render (handle, ...);

  • The streaming way, using GIO streams and
    rsvg_handle_read_stream_sync():
file = g_file_new_for_path ("/foo/bar.svg");
stream = g_file_read (file, ...);
handle = rsvg_handle_new ();

rsvg_handle_read_stream_sync (handle, stream, ...);

// now the handle is fully loaded and immutable

rsvg_handle_render (handle, ...);

A bit of history

Let’s consider a few of RsvgHandle’s functions.

Constructors:

  • rsvg_handle_new()
  • rsvg_handle_new_with_flags()

Configure the handle for loading:

  • rsvg_handle_set_base_uri()
  • rsvg_handle_set_base_gfile()

Deprecated loading API:

  • rsvg_handle_write()
  • rsvg_handle_close()

Streaming API:

  • rsvg_handle_read_stream_sync()

When librsvg first acquired the concept of an RsvgHandle, it just
had rsvg_handle_new() with no arguments. About 9 years later, it
got rsvg_handle_new_with_flags() to allow more options, but it took
another 2 years to actually add some usable flags — the first one was
to configure the parsing limits in the underlying calls to libxml2.

About 3 years after RsvgHandle appeared, it got
rsvg_handle_set_base_uri() to configure the “base URI” against which
relative references in the SVG document get resolved. For example, if
you are reading /foo/bar.svg and it contains an element like <image
xlink:href="smiley.png"/>, then librsvg needs to be able to produce
the path /foo/smiley.png and that is done relative to the base URI.
(The base URI is implicit when reading from a specific SVG file, but
it needs to be provided when reading from an arbitrary stream that may
not even come from a file.)
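
To make the resolution step concrete, here is a minimal sketch that
uses plain filesystem paths as a stand-in for URIs. The function name
resolve_relative is made up for this example, and real URL resolution
(schemes, query strings, ".." segments) is more involved than this.

```rust
use std::path::{Path, PathBuf};

// Hypothetical helper: resolve `reference` against the directory that
// contains `base`, the way a relative href is resolved against a base URI.
// This is a filesystem-path stand-in, not real URL resolution.
fn resolve_relative(base: &Path, reference: &str) -> PathBuf {
    // The base names the document itself; relative references are
    // interpreted against the directory that holds it.
    let dir = base.parent().unwrap_or_else(|| Path::new(""));
    dir.join(reference)
}
```

With a base of /foo/bar.svg and a reference of smiley.png, this
yields /foo/smiley.png. When the data comes from an arbitrary stream
there is no base to start from, which is why the caller has to supply
one.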

Initially RsvgHandle had the write/close APIs, and 8 years later
it got the streaming functions once GIO appeared. Eventually the
streaming API would be the preferred one, instead of just being a
convenience for those brave new apps that started using GIO.

A summary of the history of librsvg’s API may be something like:

  • librsvg gets written initially; it doesn’t even have an
    RsvgHandle, and just provides a single function which takes a
    FILE * and renders it to a GdkPixbuf.

  • That gets replaced with RsvgHandle, its single rsvg_handle_new()
    constructor, and the write/close API to feed it data
    progressively.

  • GIO appears, we get the first widespread streaming APIs in GNOME,
    and RsvgHandle gets the ability to read from streams.

  • RsvgHandle gets rsvg_handle_new_with_flags() because now apps
    may want to configure extra stuff for libxml2.

  • When Cairo appears and librsvg is ported to it, RsvgHandle gets an
    extra flag so that SVGs rendered to PDF can embed image data
    efficiently.

It’s a convoluted history, but git log -- rsvg.h makes it accessible.

Where is the mutability?

An RsvgHandle gets created, with flags or without. It’s empty, and
doesn’t know if it will be given data with the write/close API or
with the streaming API. Also, someone may call set_base_uri() on
it. So, the handle must remain mutable while it is being populated
with data. After that, it can say, “no more changes, I’m done”.

In C, this doesn’t even have a name. Everything is mutable by default
all the time. This monster was the private data of RsvgHandle
before it got ported to Rust:

struct RsvgHandlePrivate {
    // set during construction
    RsvgHandleFlags flags;

    // GObject-ism
    gboolean is_disposed;

    // Extra crap for a deprecated API
    RsvgSizeFunc size_func;
    gpointer user_data;
    GDestroyNotify user_data_destroy;

    // Data only used while parsing an SVG
    RsvgHandleState state;
    RsvgDefs *defs;
    guint nest_level;
    RsvgNode *currentnode;
    RsvgNode *treebase;
    GHashTable *css_props;
    RsvgSaxHandler *handler;
    int handler_nest;
    GHashTable *entities;
    xmlParserCtxtPtr ctxt;
    GError **error;
    GCancellable *cancellable;
    GInputStream *compressed_input_stream;

    // Data only used while rendering
    double dpi_x;
    double dpi_y;

    // The famous base URI, set before loading
    gchar *base_uri;
    GFile *base_gfile;

    // Some internal stuff
    gboolean in_loop;
    gboolean is_testing;
};

“Single responsibility principle”? This is a horror show. That
RsvgHandlePrivate struct has all of these:

  • Data only settable during construction (flags)
  • Data set after construction, but which may only be set before
    loading (base URI)
  • Highly mutable data used only during the loading stage: state
    machines, XML parsers, a stack of XML elements, CSS properties…
  • The DPI (dots per inch) values only used during rendering.
  • Assorted fields used at various stages of the handle’s life.

It took a lot of refactoring to get the code to a point where it was
clear that an RsvgHandle in fact has distinct stages during its
lifetime, and that some of that data should only live during a
particular stage. Before, everything seemed a jumble of fields, used
at various unclear points in the code (for the struct listing above,
I’ve grouped related fields together — they were somewhat shuffled in
the original code!).

What would a better separation look like?

In the master branch, librsvg now has this:

/// Contains all the interior mutability for a RsvgHandle to be called
/// from the C API.
pub struct CHandle {
    dpi: Cell<Dpi>,
    load_flags: Cell<LoadFlags>,

    base_url: RefCell<Option<Url>>,
    // needed because the C api returns *const char
    base_url_cstring: RefCell<Option<CString>>,

    size_callback: RefCell<SizeCallback>,
    is_testing: Cell<bool>,
    load_state: RefCell<LoadState>,
}

Internally, that CHandle struct is now the private data of the
public RsvgHandle object. Note that all of CHandle’s fields are a
Cell<> or RefCell<>: in Rust terms, this means that those fields
allow for “interior mutability” in the CHandle struct: they can be
modified after initialization, even through a shared reference.
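
For readers who have not used these types, here is a minimal sketch of
what interior mutability looks like. The Settings struct, its methods,
and the default of 90.0 are invented for this example; they are not
librsvg code.

```rust
use std::cell::{Cell, RefCell};

// Hypothetical stand-in for a handle whose fields use interior mutability.
struct Settings {
    // f64 is Copy, so a Cell (get/set by value) is enough.
    dpi: Cell<f64>,
    // Owned data that gets replaced wholesale goes behind a RefCell.
    base_url: RefCell<Option<String>>,
}

impl Settings {
    fn new() -> Self {
        Settings {
            dpi: Cell::new(90.0), // arbitrary default for the example
            base_url: RefCell::new(None),
        }
    }

    // Note that the setters take `&self`, not `&mut self`.
    fn set_dpi(&self, dpi: f64) {
        self.dpi.set(dpi);
    }

    fn set_base_url(&self, url: &str) {
        *self.base_url.borrow_mut() = Some(url.to_string());
    }

    fn base_url(&self) -> Option<String> {
        self.base_url.borrow().clone()
    }
}
```

This is the shape a C API pushes you towards: the C side only ever
holds one pointer to the object, and every function, setter or getter,
receives that same pointer.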

The last field’s cell, load_state, contains this type:

enum LoadState {
    Start,

    // Being loaded using the legacy write()/close() API
    Loading { buffer: Vec<u8> },

    // Fully loaded, with a Handle to an SVG document
    ClosedOk { handle: Handle },

    ClosedError,
}

A CHandle starts in the Start state, where it doesn’t know if it
will be loaded with a stream, or with the legacy write/close API.

If the caller uses the write/close API, the handle moves to the
Loading state, which has a buffer where it accumulates the data
being fed to it.

But if the caller uses the stream API, the handle tries to parse an
SVG document from the stream, and it moves either to the ClosedOk
state, or to the ClosedError state if there is a parse error.

Correspondingly, with the write/close API, when the caller finally
calls rsvg_handle_close(), the handle creates a stream for the
buffer, parses it, and also ends up in either the ClosedOk or the
ClosedError state.
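
The transitions described above can be sketched as a small state
machine. This is not librsvg’s code: MiniHandle, Document, and the toy
parse() function (which accepts anything containing "<svg") are
assumptions made for the example, and so is the choice of which calls
are errors in which state.

```rust
use std::cell::RefCell;
use std::io::Read;

// Stand-in for a parsed document; the real Handle wraps an SVG tree.
#[derive(Debug, PartialEq)]
struct Document {
    byte_len: usize,
}

#[derive(Debug, PartialEq)]
enum LoadState {
    Start,
    Loading { buffer: Vec<u8> },
    ClosedOk { doc: Document },
    ClosedError,
}

// Hypothetical "parser": accepts any input that contains "<svg".
fn parse(data: &[u8]) -> Result<Document, ()> {
    const MARKER: &[u8] = b"<svg";
    if data.windows(MARKER.len()).any(|w| w == MARKER) {
        Ok(Document { byte_len: data.len() })
    } else {
        Err(())
    }
}

struct MiniHandle {
    load_state: RefCell<LoadState>,
}

impl MiniHandle {
    fn new() -> Self {
        MiniHandle { load_state: RefCell::new(LoadState::Start) }
    }

    // Legacy path: Start -> Loading, or append while already Loading.
    fn write(&self, data: &[u8]) -> Result<(), &'static str> {
        let mut state = self.load_state.borrow_mut();
        // Take the old state out by value so its buffer can be moved.
        let old = std::mem::replace(&mut *state, LoadState::ClosedError);
        let (new, result) = match old {
            LoadState::Start => (LoadState::Loading { buffer: data.to_vec() }, Ok(())),
            LoadState::Loading { mut buffer } => {
                buffer.extend_from_slice(data);
                (LoadState::Loading { buffer }, Ok(()))
            }
            closed => (closed, Err("write() called on a closed handle")),
        };
        *state = new;
        result
    }

    // Legacy path: Loading -> ClosedOk or ClosedError.
    fn close(&self) -> Result<(), &'static str> {
        let mut state = self.load_state.borrow_mut();
        let old = std::mem::replace(&mut *state, LoadState::ClosedError);
        let (new, result) = match old {
            LoadState::Loading { buffer } => match parse(&buffer) {
                Ok(doc) => (LoadState::ClosedOk { doc }, Ok(())),
                Err(()) => (LoadState::ClosedError, Err("parse error")),
            },
            // In this sketch, any other state is left as it was.
            other => (other, Err("close() called without a write() in progress")),
        };
        *state = new;
        result
    }

    // Stream path: Start -> ClosedOk or ClosedError in a single call.
    fn read_stream(&self, mut stream: impl Read) -> Result<(), &'static str> {
        let mut state = self.load_state.borrow_mut();
        if !matches!(*state, LoadState::Start) {
            return Err("read_stream() is only valid on a fresh handle");
        }
        let mut data = Vec::new();
        let parsed = stream
            .read_to_end(&mut data)
            .map_err(|_| ())
            .and_then(|_| parse(&data));
        match parsed {
            Ok(doc) => {
                *state = LoadState::ClosedOk { doc };
                Ok(())
            }
            Err(()) => {
                *state = LoadState::ClosedError;
                Err("could not load document")
            }
        }
    }
}
```

The useful property is that a call made in the wrong state has to be
handled explicitly in a match arm, instead of silently reading a field
that happens to be NULL or stale.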

If you look at the variant ClosedOk { handle: Handle }, it contains
a fully loaded Handle inside, which right now is just a wrapper
around a reference-counted Svg object:

pub struct Handle {
    svg: Rc<Svg>,
}

The reason why LoadState::ClosedOk does not contain an Rc<Svg>
directly, and instead wraps it with a Handle, is that this is just
the first pass at refactoring. Also, Handle contains some
API-level logic which I’m not completely sure makes sense as a
lower-level Svg object. We’ll see.

Couldn’t you move more of CHandle’s fields into LoadState?

Sort of, kind of, but the public API still lets one do things like
call rsvg_handle_get_base_uri() after the handle is fully loaded,
even though its result will be of little value. So, the fields that
hold the base_uri information are kept in the longer-lived
CHandle, not in the individual variants of LoadState.

How does this look from the Rust API?

CHandle implements the public C API of librsvg. Internally,
Handle implements the basic “load from stream”, “get the geometry of
an SVG element”, and “render to a Cairo context” functionality.

This basic functionality gets exported in a cleaner way through the
Rust API, discussed previously. There is no
interior mutability in there at all; that API uses a builder pattern
to gradually configure an SVG loader, which returns a fully loaded
SvgHandle, out of which you can create a CairoRenderer.
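
As a rough illustration of that shape, here is a self-contained
sketch with stand-in types. Loader, SvgHandle, and Renderer below are
simplified inventions for this example: their methods and fields are
assumptions, not the signatures of the real Rust API, and no actual
parsing or rendering happens.

```rust
// Hypothetical loader configuration: built up by value, then consumed.
#[derive(Default)]
struct Loader {
    base_url: Option<String>,
}

// Immutable result of loading: no setters, no Cell or RefCell.
struct SvgHandle {
    base_url: Option<String>,
    data: Vec<u8>,
}

// Borrows a loaded handle for rendering; it cannot modify the handle.
struct Renderer<'a> {
    handle: &'a SvgHandle,
    dpi: f64,
}

impl Loader {
    fn new() -> Self {
        Loader::default()
    }

    // Each builder method takes `self` by value and hands it back.
    fn with_base_url(mut self, url: &str) -> Self {
        self.base_url = Some(url.to_string());
        self
    }

    // Consumes the loader, so it cannot be reconfigured afterwards.
    // A real implementation would parse here; this sketch only rejects
    // empty input.
    fn read(self, data: &[u8]) -> Result<SvgHandle, String> {
        if data.is_empty() {
            return Err("empty document".to_string());
        }
        Ok(SvgHandle { base_url: self.base_url, data: data.to_vec() })
    }
}

impl<'a> Renderer<'a> {
    fn new(handle: &'a SvgHandle) -> Self {
        Renderer { handle, dpi: 96.0 } // arbitrary default for the example
    }

    fn with_dpi(mut self, dpi: f64) -> Self {
        self.dpi = dpi;
        self
    }

    fn document_len(&self) -> usize {
        self.handle.data.len()
    }
}
```

Usage then reads as one chain per stage, for example
Loader::new().with_base_url(...).read(...) followed by
Renderer::new(&handle).with_dpi(...). All the mutation happens on
values that are being moved around, so nothing needs a Cell.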

In fact, it may be possible to refactor all of this a bit and
implement CHandle directly in terms of the new Rust API: in effect,
use CHandle as the “holding space” while the SVG loader gets
configured, and later turned into a fully loaded SvgHandle
internally.

Conclusion

The C version of RsvgHandle’s private structure used to have a bunch
of fields. Without knowing the code, it was hard to know that they
belonged in groups, and that each group corresponded roughly to a
stage in the handle’s lifetime.

It took plenty of refactoring to get the fields split up cleanly in
librsvg’s internals. The process of refactoring RsvgHandle’s fields,
and ensuring that the various states of a handle are consistent, in
fact exposed a few bugs where the state was not being checked
appropriately. The public C API remains the same as always, but has
better internal checks now.

GObject APIs tend to allow for a lot of mutability via methods that
change the internal state of objects. For RsvgHandle, it was possible
to change this into a single CHandle that maintains the mutable data
in a contained fashion, and later translates it internally into an
immutable Handle that represents a fully-loaded SVG document. This
scheme ties in well with the new Rust API for librsvg, which keeps
everything immutable after creation.

Federico Mena-Quintero: Containing mutability in GObjects
Source: Planet Gnome