Transform matrices: using matrix multiplication for transforming coordinates in 3d space

Sep 18, 2008 in geektalk, visual effects, visual effects pipeline

As I'd promised earlier, I'll be doing a quick run-through of how matrix multiplication can be used to rotate, scale and translate coordinates through 3d space. First, though, I'm going to write a little bit about a tool I've been working on and just launched. Sorry: the actual code for it is proprietary, so I can't share it, but a brief discussion of the tool and what it needed to be able to do will help illustrate why matrix transformation is so important, even when you're not creating a 3d package from scratch or developing a game engine.

I needed a way to quickly create a usable "right eye" camera to correspond to a "left eye" camera. I needed to be able to specify interocular distance (the distance between the left and right eye), as well as the convergence distance (where the eyes were focusing). Our eyes naturally "converge" on objects that are in front of us - this is part of how we judge relative depth, as this convergence affects the apparent relationship of the dual images our eyes perceive of *other* objects that are not at the point of convergence. But that's a subject for another entry.

Anyway, the tool needed to be able to take a tracked "left eye" camera and apply that tracking data to generate the "right eye" camera using convergence and interocular information that was either derived from a track, noted on set, or dynamically assigned in Maya. I wanted a right eye camera that could be created instantly and that would automatically follow any changes I made to the left eye camera. If I smoothed that camera's path, I needed the right eye to follow suit. If I performed some dramatic transformation to that camera's path, perhaps extending its animation or re-animating an object that was part of the track while conforming the camera to that object's new path, I needed the right eye camera to keep up. If a cg element was being added, moving toward the camera, I needed to be able to selectively lock the convergence to "look at" that object's depth without diverting the cameras to actually look directly at it - similar to changing the focus depth.

It's all about control and instantaneous response, with a camera whose path is known with some degree of certainty and a second camera that will follow that one in a way that will produce realistic cg and will enable artists to quickly and accurately duplicate the second camera position when the original interocular distance and convergence information is not known.

Matrix multiplication, as a mathematical operation, isn't complicated - it's simple multiplication and addition, just repeated a number of times to generate the required result matrix. Since there are already more-than-ample tutorials online about how to actually *perform* matrix math (if you were cursed to do it by hand or develop procedures to support it in programming environments that don't already have matrix math support) I won't be covering that in detail. If my simple explanation doesn't quite do it for you, put "matrix multiplication" into Google if you're stumped.

I'm also going to avoid covering application in a specific language. Most recently, I was doing this in MEL for the tool I described above, but there are Python libraries for doing matrix math, TCL/tk support for it, PHP and Perl support - you'll rarely find yourself with no built-in or easily-added matrix support, though you may want to write (as I did) a number of routines to make it a little more accessible.

[Figure: 3D Translation Matrix]

The figure shows the standard format of a translation matrix. This matrix (when multiplied by a coordinate matrix as represented on the right-hand side of the equation) will translate that coordinate into a new space, offsetting it by (tx, ty, tz). In the typical way of matrix multiplication, the element in each row of the coordinate matrix [x y z 1] is multiplied by a column of the translation matrix, as shown here:

[Figure: Translation Matrix Explained]

The rows of this matrix are then added together (in typical matrix multiplication fashion) to give the resulting (x',y',z') location.
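Since the original figures aren't reproduced here, this is a minimal Python sketch of the translation step described above. It assumes the column-vector layout the text implies (the point is the column [x y z 1] and the offsets sit in the last column of the matrix); the helper names `mat_vec` and `translation` are made up for illustration and are not from any particular library.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (a list of rows) by a 4-element column vector.

    Each output value is the sum of one row of products, which is the
    "add the rows together" step described in the text.
    """
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    """Build a translation matrix (assumed layout: offsets in the last column)."""
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


point = [2.0, 3.0, 4.0, 1.0]  # (x, y, z) plus the homogeneous 1
moved = mat_vec(translation(10.0, 0.0, -5.0), point)
print(moved[:3])  # [12.0, 3.0, -1.0]
```

The trailing 1 in the coordinate is what lets the offsets in the last column get added in: each offset is multiplied by 1 and summed with the rest of its row.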

But there's more to matrix transformation than simple translation. If you wanted to find the new (x', y', z') for a translation like this one, simple addition would be enough. But manipulating points in 3d space is rarely that simple!

Fortunately, it's just as easy to scale a point using matrix math! Scaling is always away from the origin - we'll talk about how to scale a point away from somewhere other than the origin shortly. All matrix operations work with a similar setup, so you'll get used to seeing a similar notation here.

[Figure: Scaling matrix]

This matrix is multiplied in the same fashion as the one we see above, rows against columns, with row sums producing (x',y', z') for the new location.
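As a rough Python sketch of the same idea (again assuming the column-vector layout, with hypothetical helper names `mat_vec` and `scaling`), the scale factors sit on the diagonal of the matrix:

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (a list of rows) by a 4-element column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def scaling(sx, sy, sz):
    """Build a scale matrix: one factor per axis on the diagonal."""
    return [
        [sx, 0.0, 0.0, 0.0],
        [0.0, sy, 0.0, 0.0],
        [0.0, 0.0, sz, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


point = [2.0, 3.0, 4.0, 1.0]
scaled = mat_vec(scaling(2.0, 1.0, 0.5), point)
print(scaled[:3])  # [4.0, 3.0, 2.0]
```

Because every coordinate is simply multiplied by its factor, a point sitting at the origin never moves, which is why the scale is always "away from the origin".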

I'll admit, though, I don't use matrix math for scaling things very often. You know what I do use it for?

Rotating a point in space with respect to the origin! Isn't that exciting? I LOVE rotating things! Well, ok, so it's not that big of a deal - but when we start combining some of these things, it can turn a complicated object tree in your scene into a relatively simple expression.

I have to warn you, though, rotation is a bitch. Ever notice how your favorite 3d software has this whole "rotation order" thing? That's because if you rotate something 10 degrees in X, then 15 degrees in Y, it's not the same as rotating it in Y first and then in X. There's also a different matrix for each axis of rotation, so let's take a look. Same matrix math process as translation and scaling, but from this one we get rotation around the Z axis.

[Figure: Matrix for z-rotation]

And now here:

[Figure: Matrix for x-rotation]

That one rotates in X! And lastly, as you'd expect, there's a matrix for rotating around the Y axis:

[Figure: Matrix for y-rotation]
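The three rotation matrices can be sketched in Python as follows. The sign placement here is an assumption: it's the common textbook form for column vectors in a right-handed system with counter-clockwise positive angles, and the original figures may have used a different convention. The function names are hypothetical.

```python
import math


def mat_vec(m, v):
    """Multiply a 4x4 matrix (a list of rows) by a 4-element column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def rot_x(deg):
    """Rotation about the X axis (assumed right-handed, column vectors)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rot_y(deg):
    """Rotation about the Y axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def rot_z(deg):
    """Rotation about the Z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


# A point on the X axis, turned 90 degrees about Z, lands on the Y axis
# (up to floating-point rounding).
turned = mat_vec(rot_z(90.0), [1.0, 0.0, 0.0, 1.0])
print([round(value, 6) for value in turned[:3]])
```

Notice that each matrix leaves its own axis alone (a row and column of the identity) and mixes the other two with sines and cosines.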

Now, combining these matrix transformations together is as simple as multiplying the matrices with each other. You build the series of multiplications in order from right to left (the first transformation to apply goes on the far right), even though the multiplications themselves are carried out from left to right. For instance, for "zxy" rotation order, you would create an expression similar to the following (I'll use MEL for this example: $xRot, $yRot, and $zRot are each 4x4 matrices that already contain transformation data for x, y and z rotation):

matrix $r[4][4] = $yRot * $xRot * $zRot;

Provided proper matrix variables are supplied, MEL supports basic matrix operations with some limitations that you'll find as you stretch your legs.
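To see the ordering in action outside of MEL, here's a small Python sketch. It assumes the column-vector convention (rightmost matrix applied first) and hypothetical helpers `mat_mul`, `mat_vec` and `rot_x`/`rot_y`/`rot_z`; the angles are arbitrary. It checks that the combined matrix matches applying z, then x, then y one at a time, and that swapping X and Y gives a different answer.

```python
import math


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-element column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1.0, 0.0, 0.0, 0.0], [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s, 0.0], [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


x_rot, y_rot, z_rot = rot_x(10.0), rot_y(15.0), rot_z(30.0)

# Same shape as the MEL line: z is applied first, then x, then y.
r = mat_mul(mat_mul(y_rot, x_rot), z_rot)

point = [1.0, 2.0, 3.0, 1.0]
combined = mat_vec(r, point)
step_by_step = mat_vec(y_rot, mat_vec(x_rot, mat_vec(z_rot, point)))

# Rotation order matters: X-then-Y is not the same as Y-then-X.
x_then_y = mat_vec(mat_mul(y_rot, x_rot), point)
y_then_x = mat_vec(mat_mul(x_rot, y_rot), point)
print(x_then_y[:3])
print(y_then_x[:3])
```

The two printed points differ, which is the "rotation order" problem in miniature, while `combined` and `step_by_step` agree to within rounding.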

These can be strung together into much longer expressions generating much more complex matrices. To rotate an object about a point other than the origin, for instance, subtract the values of that coordinate from the coordinates of the object (translate it by {-cx, -cy, -cz}, where {cx,cy,cz} is the center to rotate it around), perform the rotation, then translate it back by {+cx,+cy,+cz}.
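The translate, rotate, translate-back recipe can be sketched in Python like this. As before, the column-vector convention and the helper names are assumptions for illustration; the pivot and angle are arbitrary.

```python
import math


def mat_mul(a, b):
    """Multiply two 4x4 matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-element column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx], [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0, 0.0], [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


cx, cy, cz = 1.0, 1.0, 0.0  # the center to rotate around

# Read right to left: move the pivot to the origin, rotate, move it back.
about_pivot = mat_mul(mat_mul(translation(cx, cy, cz), rot_z(90.0)),
                      translation(-cx, -cy, -cz))

moved = mat_vec(about_pivot, [2.0, 1.0, 0.0, 1.0])
print([round(value, 6) for value in moved[:3]])  # approximately (1, 2, 0)
```

The point starts one unit to the right of the pivot and ends up one unit above it, and the pivot itself is left where it was. All three steps collapse into the single matrix `about_pivot`, which can then be reused for every point on the object.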

It may take some time learning to visualize and plan out a complex matrix transform - but what makes it powerful is the ability to combine any number of transformations into a single operation. Some basic trigonometry and a well-applied matrix transform can accomplish all kinds of things!

XML, Python and the Visual Effects Pipeline

May 16, 2008 in geektalk, python, visual effects, visual effects pipeline, xml

I was talking to a friend today about what I'm doing with regard to managing data through an animation pipeline using XML. The more I work with it and the farther I get into the project, the more flexible and powerful the whole thing seems. Of course, the point of doing the implementation in Python is that virtually every software package in the vfx industry is Python-friendly - so once the core routines are written, everything from Nuke and pyShake (the Python plugin for Shake - if you haven't seen it yet, check it out here) to Maya, Houdini and RealFlow will be able to make use of them. I think most places are doing that these days, with a few nods to TCL/tk here and there - but broadly supported scripting languages are King and open description formats like XML are Queen.

My friend marveled at how nice it would be if one day, a couple years from now, everything was able to talk that smoothly: that a character animated in Maya could be pulled into Houdini, for instance, as something other than an OBJ sequence or a separately rigged character that you had to tediously (or with a lot of specific coding) link to exported channel data.

I wonder if that interoperability thing will ever extend beyond each individual studio's implementation. Everybody has a way of getting their software packages to talk to one another, some solutions being more elegant than others, but when you invest in creating something as elaborate as this it becomes your own proprietary tool. Say you develop a tool that lets an animator take an animated character with a complex rig on it, arbitrarily select additional elements that were never *really* meant to be animated and animate them anyway. The modeling team can modify the model and issue a new version of it, and the animation gets seamlessly transferred over to the new model. It can even be read into RealFlow, substituting a different set of low poly independent objects that are driven by the data in that XML file. You don't put that pipeline tool on the internet for everyone to download for free.

That tool becomes your secret weapon. As a studio with an investment in a powerful and unique proprietary tool, even charging for it may not mean as much to you as the edge you gain during the heat of production.

Being XML based and implemented in Python does put my current project a wee bit closer to being an open standard, though. Even Shake will take Python scripts now - and they're really powerful in it and getting more so as development continues. The readability thing for XML is a gigantic plus, and the way it represents data is great. I can build a module that will write out the translation of a locator in both world and local space, as a baked set (every frame has a value) and as a set of keyframes (values only for those frames where the value was explicitly set by the artist), as well as screenspace UV values - so the same XML file could reconstruct a scene for a lighter to light and render from, or for another animator to tweak the animation curves, or for RealFlow to drive low-poly proxy objects that disturb a drifting mist, or for a compositor in Toxic to link an effect to. And it's all one XML file - not a half dozen formats (often multiple versions of each) and a hundred-unit sequence of geometry exports.
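To give a flavor of what such a file might look like, here's a minimal sketch using Python's standard `xml.etree.ElementTree` module. This is not the actual pipeline format: the element and attribute names (`locator`, `translate`, `key`, `space`, `mode`) and both function names are invented for illustration, and only world-space translation is shown.

```python
import xml.etree.ElementTree as ET


def locator_to_xml(name, channels):
    """Serialize locator translation data to an XML string.

    channels maps (space, mode) -> {frame: (x, y, z)}, where mode is
    "baked" (a value on every frame) or "keyed" (artist-set frames only).
    The schema is hypothetical.
    """
    root = ET.Element("locator", name=name)
    for (space, mode), samples in channels.items():
        node = ET.SubElement(root, "translate", space=space, mode=mode)
        for frame in sorted(samples):
            x, y, z = samples[frame]
            ET.SubElement(node, "key", frame=str(frame),
                          x=repr(x), y=repr(y), z=repr(z))
    return ET.tostring(root, encoding="unicode")


def read_translate(xml_text, space, mode):
    """Read one translate block back as {frame: (x, y, z)}."""
    root = ET.fromstring(xml_text)
    for node in root.findall("translate"):
        if node.get("space") == space and node.get("mode") == mode:
            return {int(key.get("frame")): (float(key.get("x")),
                                            float(key.get("y")),
                                            float(key.get("z")))
                    for key in node.findall("key")}
    return {}


channels = {
    ("world", "baked"): {1: (0.0, 0.0, 0.0), 2: (0.5, 0.0, 0.0), 3: (1.0, 0.0, 0.0)},
    ("world", "keyed"): {1: (0.0, 0.0, 0.0), 3: (1.0, 0.0, 0.0)},
}
xml_text = locator_to_xml("locator1", channels)
print(xml_text)
print(read_translate(xml_text, "world", "keyed"))
```

Each consumer can pull out only the block it cares about: a lighter's scene rebuild might read the baked values, while an animator's tool would read the keyed ones, all from the same file.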