
For high-compute jobs, HTCondor matches hardware with researchers

University pros share how the tool saves them time and takes advantage of unused resources.

Credit: Brittany Holloway-Brown, Photos: Adobe Stock

Annika Pratt studies Macrophomina phaseolina—fun to say, bad for plants.

As a fungal researcher, the University of Wisconsin–Madison PhD candidate analyzes the genomes of the microorganism’s 500 different strains, preferably all at once. A program outputs a file that tells Pratt every gene and its location.

A fungal genome has thousands of genes, and the compute-heavy task would take 12 hours on her machine…per genome.

A management tool called HTCondor helps researchers like Pratt connect to powerful computers on campus to do hardware-heavy jobs. The open-source software, developed at UW–Madison’s Center for High Throughput Computing (CHTC), has also received funding from orgs like the National Science Foundation.

University pros—and one software engineer who has worked with the computing scheduler for almost 20 years—shared with us why HTCondor helps with research needs.

“I can run analyses in parallel, and I don’t have to sit with my computer open for hours on end. I can just submit something, close my computer, go do whatever else I need to do, and then come back later when it’s done,” Pratt told us.

Flight of the HTCondor. HTCondor, first installed as a production system in the UW–Madison Computer Sciences department in the 1990s, has since been put to work by almost 200 academic, government, and private-sector groups, according to its latest map of voluntarily reported users. (There are even users in Antarctica: researchers on the “IceCube” experiment use HTCondor as they search for subatomic particles at the South Pole.)

The map also cites 22 private-sector partners. According to Greg Thain, senior software engineer at the CHTC, the private sector “absolutely” uses high-throughput computing in production and not just for research. DreamWorks Animation, he wrote, uses the system to render the frames of its movies. (DreamWorks did not immediately respond to a request for comment about whether and how the company uses the tool.)

Institutions can limit computing capacity to their own campus hardware, or they can pull from the “OSPool,” a collection of thousands of cores contributed by sites all over the country.

It disappoints Thain when researchers prematurely limit the scope of their inquiry to what can be done on a laptop.

“Computers are cheap. They’re not free; they’re cheap. They’re widely available. We don’t ever want lack of access to computation when there are so many computers, in your office, in your building, in the world,” he said.

On schedule! To run the fungal analysis, Pratt logs in to the CHTC.

A “condor_q” command lets Pratt see which jobs have completed and which are still running. A “condor_submit” command points to the relevant submit file.
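A minimal sketch of that loop, with an illustrative file name (analysis.sub is an assumption, not Pratt’s actual file):

    condor_submit analysis.sub   # hand the job description to the scheduler
    condor_q                     # check which jobs are idle, running, or done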

A submit file describes, essentially, the program that Pratt wants to run: components like GPU requirements, the executable (a batch script), and the text files that the executable requires. (Some command-line expertise is required, Pratt notes; she attended trainings and learned from the HTCondor team.)
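For illustration, a submit file along those lines might look like the sketch below; the script name, file names, and resource numbers are assumptions rather than Pratt’s actual setup:

    # analysis.sub -- hypothetical HTCondor submit description file
    # The batch script to run, plus the input file each job needs
    executable            = run_analysis.sh
    arguments             = genome_$(Process).fasta
    transfer_input_files  = genome_$(Process).fasta
    # Resource requests (the GPU line only if the job needs one)
    request_cpus          = 1
    request_gpus          = 1
    request_memory        = 4GB
    # Where stdout, stderr, and HTCondor's event log end up
    output                = out.$(Process).txt
    error                 = err.$(Process).txt
    log                   = analysis.log
    # One job per strain
    queue 500

Pointing condor_submit at a file like this queues all the jobs at once; condor_q then shows their progress.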

Hardware times. Hardware is a precious commodity when dealing with the latest technologies, like large language models. OpenAI CEO Sam Altman this year shared on X that his company’s latest GenAI tool was “melting” available GPUs, and Big Tech companies like Google, Amazon, Meta, and Microsoft have made major data-center investments to train and run their LLMs.

Institutions like the Grainger College of Engineering at the University of Illinois Urbana–Champaign want to use LLMs, too, and that means plenty of computers and a tool like HTCondor to schedule the jobs.

Users submit their tasks to HTCondor, and the management system queues them up based on scheduling policy and priority scheme, runs them, and informs the user of the result.

Fair play. The tool tries to enforce a “fair share” of resources, Thain said; if you’ve used 5,000 CPU hours, you’ll move to the back of the line after your job is done. The resource providers, however, can influence the setup substantially, he added—maybe a physics department that’s paid for the machines gets primary access, or an astronomy team gets the computing at night.
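Resource providers express that kind of policy in HTCondor’s machine configuration. A heavily simplified sketch, with made-up group names (the numbers are minutes past midnight):

    # Hypothetical execute-node policy in condor_config
    # Jobs from the physics group that bought the machines can start any time;
    # everyone else's jobs start only before 8am or after 6pm.
    START = (TARGET.AcctGroup =?= "physics") || (ClockMin < 480) || (ClockMin > 1080)
    # Among jobs allowed to start, prefer the astronomy group's.
    RANK = (TARGET.AcctGroup =?= "astro")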

The setup came in handy during Covid-19 lockdowns, according to Gianni Pezzarossi, computational systems analyst for the Grainger College, because HTCondor could spread workloads across unused computers, like the Linux lab machines left idle after students left campus.

“We were able to utilize all these idle computers that were sitting in the computer labs,” he said.

A primary benefit of HTCondor is that it helps manage resource contention within research groups, Pezzarossi told us. Before HTCondor, there wouldn’t be enough CPU to go around, he said, and jobs would be halted by the Linux Out of Memory Killer.

Following the campus-wide usage during Covid, Grainger began offering individual research groups their own smaller clusters, made up solely of each group’s nodes. The school currently runs six of these smaller clusters.

Pratt, on June 20, had one project completed in the HTCondor queue and one currently running.

Five hundred strains is still a big job, no matter who or what is handling it.

