3D animators are a famously tech-savvy and notoriously obsessive bunch, with prodigious knowledge of both graphics hardware and software. And though their party conversation suffers as a result, their lightning-quick PCs always benefit. A hardcore graphics-head recently told me that after throwing a $700 graphics card into his 3-GHz PC, and optimizing its OS for 3D computations, he rendered an ultra-realistic, 30-second-long ocean scene in a mere, er, 18 hours.
Behold the grand irony of graphics: As faster processors become available, so do applications and algorithms that take full advantage of their capabilities. So while the animation enthusiast is treated to steadily more nuanced and mature imagery, the animator finds rendering performance locked in a zero-sum game. Local rendering moves at the speed of a glacier, to the chagrin of digital artists worldwide.
But what about render farming? The idea of coaxing multiple machines to gang up on one graphics-processing job is old; but the feasibility of configuring such a system on a shoestring budget is rather new. With open-source solutions proliferating on the Web and businesses kicking decent PCs to the curb like so many red-headed stepchildren, money's no longer an obstacle. Only one question remains, really: Are you a big enough geek to put it all together?
The Big Picture
If you've seen a 3D feature film, you've seen the final output of the rendering process. But each of those graphics began as a scene file made up of raw-text instructions waiting for a computer to draw them out. If the feature is a finished meal, then the scene file is a recipe, rendering is the cooking process, and rendering software is the chef.
A render farm employs many PCs simultaneously by running a queue manager on each box. The queueing software divides a job into multiple parts and decides which machine executes which part and when. Each machine refers to the job's scene file, which needs to reside in a location accessible to all the machines, and renders its share of frames. Once finished, each system stores the rendered frames back to that central location, ready for review.
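To make the divide-and-assign idea concrete, here is a minimal Python sketch of splitting a frame range across workers. It is an illustration only, not how qube! or any other queue manager actually schedules work: the function name split_frames and the worker names are hypothetical, the split is static, and the worker list is assumed to be non-empty. Real queue managers typically hand out work dynamically as machines free up.

```python
def split_frames(first_frame, last_frame, workers):
    """Divide an inclusive frame range into contiguous chunks, one per worker.

    Hypothetical helper for illustration; assumes `workers` is non-empty.
    """
    total = last_frame - first_frame + 1
    base, extra = divmod(total, len(workers))
    assignments = {}
    start = first_frame
    for index, worker in enumerate(workers):
        # The first `extra` workers each take one leftover frame.
        count = base + (1 if index < extra else 0)
        # range() is end-exclusive, so this covers `count` frames from `start`.
        assignments[worker] = range(start, start + count)
        start += count
    return assignments


if __name__ == "__main__":
    # Worker names are made up for the example.
    plan = split_frames(1, 800, ["worker-1", "worker-2", "worker-3"])
    for worker, frames in plan.items():
        if len(frames) > 0:
            print(worker, "renders frames", frames[0], "to", frames[-1])
```

Each worker would then read the shared scene file, render only its own frames, and write them back to the shared location.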
At its core, a render farm is pretty simple: Seven or so machines on a network, a network-accessible storage location, a rendering app, and a queue manager. Putting it all together should be equally simple, right?
Rawhide
The first step is fun: you get to live out your latent cowboy fantasies, scouting the tech prairie for PCs gone astray. Every few weeks, you'll probably stumble into a few dogies that need roping: aging Athlons in a buddy's basement or once mind-bending G3s now in the school-yard dumpster. Whether they're beaten, battered, or dilapidated, your job is to pick them up without discrimination. And lest this sound too carefree for your inner gearhead, dig the rationale:
First, the number of CPUs on a render farm impacts performance more than the combined clock speed of the CPUs. Nine 1-GHz machines chained together will render much faster than three 3-GHz systems, so don't rule out a find on the basis of its wimpy-sounding processor speed.
Next, it doesn't matter much if a machine's been gutted on its way to the gutter. Consider any leftover peripherals you encounter a bonus, but they're not a necessity for your purposes. CD, DVD, and Zip drives come in handy for local installations, but local tweaks are a repetitive, needless bore on a multi-machine system. Rolling out applications and modifications simultaneously to multiple machines via your network is much more efficient. For Windows, Symantec's Ghost handles client management nicely. Ghost For Linux promises the same core functionality for open-source operating systems. With this in mind, all a farm machine strictly needs is a CPU, network adapter, hard drive, motherboard, and about 512MB of memory.
Once you've managed to rustle up five or so boxes, you're ready to shift into Doctor Frankenstein mode, stitching cannibalized hardware into your motley machines until they lurch to life. Each machine's rehabilitation will be different, but the same commonsense strategies will help in all cases. Don't hesitate to mimic wirings you don't understand; if several wires are unplugged from a motherboard, just ape the cable scheme from a similar machine in better condition. Green wire to pin 25? Sounds plausible. You're not going for enlightenment; you're just restoring the machines to basic operating condition. We won't go too in-depth here with the rebuilds.
Choosing The Right Queue Manager
Now that you're ready to boot up, you need to decide what to boot up to. Your hardware will dictate which OS you'll be able to run, of course, but if you have a mix of Apple and Intel boxes, bear in mind that it's possible to run Linux on either. Yellow Dog Linux is a Linux distro for the PowerPC architecture, and Red Hat and Fedora run on PCs, of course. Running the same basic OS across different hardware could simplify your life.
Your OS, in turn, will govern your choice of queue management software, the stuff that parcels out the rendering task across the farm. DrQueue won't run on Windows; Smedge will only run on Windows, and so forth. There are several criteria to consider as you sort through the available apps. Do you want a free or open-source application, or are you willing to pay for tech support? Which kinds of projects do you want to render out: Alias Maya, 3ds max, Adobe After Effects, or any of a number of others? Be sure to pick a queue manager that supports them. Will your QM require a dedicated supervisor machine? Do you want your QM to recruit your own workstation if it's idle? If so, would you like it to consider a few criteria before doing so, such as length of time idle or number of running programs? And finally, would you like your QM to integrate with your favorite 3D app, or can you live with a standalone interface?
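The idle-workstation question is easier to weigh with a concrete rule in hand. Below is a rough Python sketch of the kind of check a QM might apply before borrowing your workstation. Everything in it is an assumption made for illustration: the WorkstationStatus class, the should_recruit function, and the 15-minute and two-program thresholds are invented here and do not come from any particular queue manager.

```python
from dataclasses import dataclass


@dataclass
class WorkstationStatus:
    """Hypothetical snapshot of a workstation, as a QM might see it."""
    idle_minutes: float    # time since the last keyboard or mouse input
    running_programs: int  # number of user applications currently open


def should_recruit(status, min_idle_minutes=15.0, max_running_programs=2):
    """Return True only if the machine has been idle long enough AND is lightly loaded.

    Both thresholds are made-up defaults; tune them to taste.
    """
    return (status.idle_minutes >= min_idle_minutes
            and status.running_programs <= max_running_programs)


if __name__ == "__main__":
    lunch_break = WorkstationStatus(idle_minutes=40, running_programs=1)
    mid_project = WorkstationStatus(idle_minutes=40, running_programs=6)
    print("recruit during lunch break:", should_recruit(lunch_break))
    print("recruit mid-project:", should_recruit(mid_project))
```

Requiring both conditions keeps the farm from grabbing a machine whose owner has merely stepped away with a half-dozen applications still open.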
Consider that there will be an unruly bunch of users trying to share this farm, some human, others virtual. The animators using your farm will be eager to render their projects as quickly as possible, but no one artist should be able to jump ahead in the queue. The "virtual user" will be an account that owns render jobs launched by your QM. And as with any account, changing credentials and permissions individually on each local machine can be a real bore. Lucky you: There's an easy solution: user management consolidated on one central server.
Depending on the OS, this server is called a Primary Domain Controller (PDC) or Open Directory Master (ODM), but the general idea remains the same. With such a server on your network, you can easily modify passwords on the server, and you can tell another computer to refer to the server for login.
Creating a PDC is definitely the way to go, but remember that it will cost you another machine, and possibly your sanity. Windows 2000 Server is notoriously clunky, and under some circumstances, bug-ridden. For a fix, feel free to type a string of binary numbers into the registry! Binary numbers? Good software hasn't made us think about binary since 1981. But don't hate. My homies at Microsoft dropped their Swatch Watches at the Flock of Seagulls concert; they just don't know what time it is.
Windows 2003 Server and XP Server have made marginal gains, but there's a counterintuitive life saver: OS X Server can act as a Windows PDC just as effectively as a Windows box, and you can set it up with infinitely less stress. Just install OS 10.3 Server on a blue-and-white G3 or newer, tell it to act as a PDC at startup, and connect it to a 100BaseT-or-better network hub. Five minutes later you'll have your first W2K box joining the Apple machine's Windows domain. You'll find it easy to make Apple and Linux machines refer to the OS X open directory for credentials as well. By the way, Samba, on which facets of OS X Server are based, offers the same simple network management for Linux servers.
Network Storage
Before whipping your worker-machine army into shape, there's one last network consideration: storage. Each worker will need access to the same location, from which it reads its 3D instructions and, after executing them, dutifully stores the fruit of its labor.
This location can be any storage device, or even just a partition on one. We came across a 300GB SCSI RAID collecting dust, which was cool but not at all necessary. You can use a FireWire drive, a secondary internal drive, or a logical partition. If you're using the quick-and-dirty approach to user management, without the PDC, simply enable file sharing on the device via the OS of whatever machine it's connected to.
But again, this issue is handled more elegantly with a PDC or ODM. For the sake of centralizing server resources, connect your storage device directly to the PDC and then enable sharing for it. PDC-managed sharing lets you control access to the device quickly and gracefully. Also, your client machines will act less flaky when mounting a device that their PDC knows about. And there's another benefit: Some client OSes, like Windows 2000, have limits on how many other clients can access their shared resources. A PDC, however, handles multi-access for a living.
So you're the master of your domain, but it's empty. Now you need some machines to sign on and draw some pretty pictures.
Some QMs distribute both supervision and work across the whole farm. But more robust ones, like qube!, will ask you to provide a dedicated supervisor. So begin by singling out your slowest machine for the super role. Since this box tells other machines what to render when, but does no actual rendering itself, it can be prehistoric. In fact, Pipeline FX claims that its qube! app does fine with a 500MHz PII at the helm.
First, build a baseline image on your super. To do this, load up the software and tweaks that'll be common to each machine on the network, including an OS, helpful network shortcuts and aliases, an unarchiver like WinZip or StuffIt Expander—whatever. You'll definitely want to throw on a VNC application—one of many free, open-source packages that hook your keyboard, mouse, and monitor to any other machine, anywhere. After all, you'll need to tweak these farmers now and again. This approach supplants pricey KVM switches and prevents a multi-monitor pileup.
Now that the image is together, use your client management tool (Ghost, Ghost For Linux, or OS X's Carbon Copy Cloner) to copy it to another hard drive. Next install your supervisor application. For qube!, this comes as a standard installer: MSI, tarball, or .dmg. Finally, join this machine to the PDC's domain. Put your super on the same hub as the PDC and you're ready to image the first worker.
Select a machine to serve as your worker model. Choose one with hardware that is most similar to the others. The baseline image you stored a moment ago has most of the apps your worker will need, so connect the model worker to the hub and send the baseline disk image using your client management tool.
There might be a few cloning complications to worry about. A Ghost/Windows image can be finicky if it's sent to different hardware, getting confused when its software brain wakes up in a new body. In this case you'll need to download the appropriate hardware drivers until it's happy. Also, the baseline will join the network insisting that it's the same machine as the one that generated it. The quick fix is to take it off the domain, change its name, and then rejoin.
Once your farmer is behaving, install the qube! worker software and tweak some config file to get the super and worker communicating. Since the worker software only passes commands from supervisor to renderer and does no rendering itself, you'll also need to install a renderer. Some 3D apps, such as Maya, ship with one. Otherwise you'll want to install a standalone renderer; mental ray and RenderMan come to mind.
Finally, you'll want to configure your workstation to submit jobs. This should be a simple matter of running an installer. Exactly what gets installed is QM-specific. When you run the qube! MayaJobType installer, your machine ends up fully loaded. The qube! render submission window now launches from within the Maya interface. You'll also find the qubic! application, which gives you serious monitoring and control of jobs and farmers alike.
Attack of the Clones
Now we're ready to give the farm a test drive. From your workstation, choose a test file and drag it to your shared drive. Launch the submission window and tell your QM submitter to render out this scene file. For its output folder, point to the same drive, and submit.
The supervisor should know about the job immediately, and start bossing around the workers. In the end, you'll see a render that, compared with your workstation, happens twice as slowly?
No sweat, it's all part of your plan. You just want to get one lone worker jibing with the system. Now the baseline image you started with is out of the picture. Copy your worker's disk, and you'll have a full-fledged worker image you can clone onto any number of machines, and a functional worker materializes each time. The more clones you add, the faster the farm.
How much faster? Dramatically. The exact gains will vary with the number and speed of your farmers, naturally. Pratt Institute in Brooklyn seemed a good place for a test. The school needed a better way to render images, the department needed to reduce network traffic, and we needed to show off our tech chops. A test run on Pratt's new render farm should give you an idea of what to expect.
Pratt's setup is professional but affordable: eight salvaged Athlon machines at 1-GHz, each running qube! and joined to a G3's Windows domain. Furthermore, the whole farm is on its own hub, so rendering activity doesn't drag down overall network performance. The setup is optimized for Maya, so let's render out a processing-intensive scene and see how long it takes.
Graduate student Chris Waner's recent animation certainly qualifies. It has realistic radiosity, complex camera shapes, and heaps of ocean footage, each a notorious timesucker.
A local render on one of the school's 2-GHz Dell workstations, which are equipped with nVidia Quadro4 XGL cards and 512MB of RAM, draws out the 800-frame piece in 52 hours. Using a one-worker farm, our time shoots up to 104 hours, which is unsurprising considering the age of the machine. But after adding six more workers to the farm, the same render clocks in at just over 13 hours, more than three times as fast as in pre-farm days.
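For anyone who wants to check the arithmetic, here is a small Python sketch that turns those timings into per-frame costs and a speedup figure. The numbers come from the test above, with one assumption: "just over 13 hours" is rounded down to 13, so the real speedup is slightly lower than what this prints. The function names are just illustrative.

```python
FRAMES = 800  # length of the test animation, in frames

# Wall-clock render times in hours; 13 is a rounded stand-in for "just over 13".
timings_hours = {
    "local workstation": 52,
    "one-worker farm": 104,
    "seven-worker farm": 13,
}


def minutes_per_frame(hours, frames=FRAMES):
    """Average render cost of a single frame, in minutes."""
    return hours * 60 / frames


def speedup(baseline_hours, new_hours):
    """How many times faster the new setup is than the baseline."""
    return baseline_hours / new_hours


if __name__ == "__main__":
    for label, hours in timings_hours.items():
        print(f"{label}: {minutes_per_frame(hours):.2f} minutes per frame")
    farm_gain = speedup(timings_hours["local workstation"],
                        timings_hours["seven-worker farm"])
    print(f"seven-worker farm vs. local workstation: about {farm_gain:.1f}x")
```

Plug in your own frame count and timings to see where your farm lands.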