I got pretty excited recently whilst reading the events guide when I saw that the second half of the last session of the Modern Solo Piano Festival was going to be free. I dutifully turned up a little earlier and found out that although it was free, it was not actually free. It turned out the listing had mistaken the style of piano (free) for the price (not free and not so cheap either). Still, I am a fan of solo piano (my bro is one) so I went anyway.
The festival was designed to bring together pianists of all stripes and this session brought together one act from the “experimental” style, and one from the “free” style. Thankfully the festival tried hard to minimize the pretentiousness normally associated with “serious” music. The show was compered by a hand puppet, who was, at least, funny to the Germans. There were psychedelic patterns projected onto screens arranged vaguely in the shape of oversized piano keys, which vaguely matched the music.
Still, it was quite odd to see music that was “free” distinguished from music that was “experimental”. What possible difference could there be? It turns out that the difference was a generational one as much as anything. The two guys playing experimental piano, the American Dustin O’Halloran and a German known simply as Hauschka, were in their 20s and 30s, whilst the free piano was played by a German dude, A. von Schlippenbach, and a Japanese woman, Aki Takase, who were on the northern side of 50.
First the similarities. Both acts felt sufficiently liberated to interpret solo piano as meaning two pianos on stage. Also, since there aren’t too many crazy things you can do to a piano without electronics and without damaging the piano, the only option was to put things onto the strings, anything from rags to tin pots. When the first act put ping-pong balls into the piano, it seemed really cool. But when the second act did it too, it seemed kind of derivative. Still, it’s a great visual effect as the balls would bounce out of the piano depending on the vigor of the playing.
The “free” piano played by the older pianists was all about technical virtuosity devoid of harmony or movement. It seemed like these pianists were paranoid about hitting any classically pleasing intervals, producing anti-melodies in the playing. Although there was some astonishing skill, evident in the finger work across the keyboard, it was all a cluster of cacophonic notes that brought to mind a stuttering washing machine rather than a piece of music. Was it composed or improvised? Does it really matter?
The younger pianists in the “experimental” style played pieces of haunting beauty and tonal simplicity. The music flowed in repetitive arpeggio figures and plaintive drones. Rhythmically, the music moved in steady crescendos. There was an inexorable emotional logic to these pieces as they worked through classic harmonic modalities, and washed over like a lost soundtrack of a European coming-of-age film. Alex Ross of The New Yorker once wrote that the biggest contribution that Philip Glass made to contemporary music was to bring back melody and rhythm, without sinking into sentimentality, after decades of avoiding it. These younger pianists are clearly on the after-side of that divide. There is an emotional logic to harmonies that have been known ever since Pythagoras, and although for a few brief decades, musicians disdained this logic in the classic boomer style, the current generation have seen fit to bring it back.
Tricky thing, heating up a protein. I’ve been playing around with GROMACS and have gotten to the point that I want to set up a simulation of a protein in a periodic box of waters somewhat carefully. That means that I have to slowly heat up the system. Here, I’ve followed the standard procedure of applying thermostats in increasing 50 K steps. The following video shows a 30 ps simulation in which each 5 ps stage is run 50 K hotter than the last. I really couldn’t be fucked putting position restraints on the protein:
The result is rather hypnotic as the water goes from a wobbly jelly to a swarm of angry molecules. However, these waters are rather unphysical since below ~250 K, the water really should be frozen in any number of crystal states. I’ll leave this for the force-field mavens to explain.
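The staged heating above can be written down as a small schedule. This is only a minimal sketch: the starting temperature of 50 K and the function name `heating_schedule` are assumptions (the post only gives the 30 ps total, the 5 ps stages and the 50 K steps).

```python
def heating_schedule(total_ps=30.0, stage_ps=5.0, step_k=50.0, start_k=50.0):
    """Return (start_ps, end_ps, temperature_K) for each heating stage.

    start_k is an assumption: the text does not say where the heating begins.
    """
    n_stages = int(round(total_ps / stage_ps))
    schedule = []
    for i in range(n_stages):
        # Each stage lasts stage_ps and is step_k hotter than the previous one.
        schedule.append((i * stage_ps, (i + 1) * stage_ps, start_k + i * step_k))
    return schedule


if __name__ == "__main__":
    for start, end, temp in heating_schedule():
        print(f"{start:5.1f}-{end:5.1f} ps: thermostat at {temp:.0f} K")
```

Under those assumptions there are six stages and the last one sits at 300 K. Each stage would then be run as its own short simulation with the thermostat reference temperature (`ref_t` in a GROMACS `.mdp` file) set to the listed value.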
My laptop died last week. At first I had a panic attack as if someone had ripped out a vital organ. But slowly my senses came back to me. And it didn’t turn out to be so bad after all. My roommate had a spare laptop (in German), I had a separate computer at work and of course I had my iPhone. In the time that my laptop has been at the Apple service center, I have had to, out of necessity, change my working habits. I have started to write on my iPhone, which I had never thought of doing before. I have even started to read code on my iPhone. And you know what, I like it better reading and writing on a puny little iPhone. There are fewer distractions. It’s actually more effective to write code by giving yourself a solid block of time just to read existing code. I can sit in a more comfortable location, practically anywhere. And it’s always available. It’s so convenient I can just pop off a short note in a cafe. Like this one.
The other night I treated myself to some of the art on offer in Berlin at a venue down the street called Dock 11. The piece was called Andropolaroid and came recommended as a Tagestip by Zitty. An intriguing enough name, it was a solo dancer piece performed by a sprightly Japanese dancer Yui Kawaguchi, who I gather from the program is also of German extraction. If ever an ethnicity demanded an exploration of man and machine, this was it.

Of course the theater was inside a huge warehouse space with exposed brick walls and high ceilings. But the show was such a beguiling piece of motion and light that at times I was transported out of the warehouse and into some other more intimate space. The piece was as memorable for the startling light installation that inhabited the stage as for the show itself. About a hundred programmable fluorescent tubes dangled from the ceiling carving out a geometric pattern that criss-crossed the room. The show proceeded through a complex set of lighting designs, varying in luminal intensity, from fast and kinetic, to subtle, diffuse and slow. We were sometimes dropped from darkness onto a lonely spot on the stage. Other times the entire array of lights would crackle into a warm glow. It felt like space and time mutating in front of our very eyes.
Although strictly speaking there was only one woman on the stage, the lights effectively served as a second dancer as Kawaguchi rippled through bodies of light and shadow. The style was a disciplined flow of modern dance, some serviceable pop-and-locking and some welcome dollops of circus-like theatrics, involving some imaginative manipulation of the swinging lights. A particularly effective sequence involved an affecting red hoodie, the burst of color popping out of an otherwise monochromatic stage.
Holding it together was the gorgeous sound design delivered through 10 huge speakers hung around the ceiling in the room. Although most of the music revolved around rhythmic industrial minimalist beats (we are in Berlin after all), there were some theatrical touches, including an eerie effect where Kawaguchi would lip-sync to an amplified whispering. The way the sound was projected around the room was precisely matched to the action on stage. And it is fitting that the final scene faded into the cello strains of a plaintive track by Colleen.
Over the last few years, I’ve averaged about a book a week in reading. It feels like a lot yet I see blogs by people like Aaron Swartz who claim to have read 100 books in a year. That’s like double what I normally read. Is this possible?
Well, last year Aaron put up a list of helpful advice to increase your reading. I decided to take up the offer and try for 100 books this year. This was a good year to do it since I was unemployed for several months and I have been travelling a lot. Well, now that I’m past the halfway mark, I see that I am more or less on track.
It really does help to use a public library. I read a lot on my iPhone, which makes reading convenient anywhere, even outside on a windy day in the dark. The iPhone also makes it convenient to read any public domain work via great apps such as Eucalyptus and Stanza. I’m rereading some favorites from my youth and thrillers whilst travelling.
p.s. this post was mostly written on my iPhone cause my motherfucking harddrive died on my laptop.
If I read a few comics and poetry books I should be able to make it by year’s end.
1. Alexandre Dumas, “Comte de Monte Cristo”
2. Emily Bronte, “Wuthering Heights”
3. Colum McCann, “Let the Great World Spin”
4. Peter R. Cook, “Principles of Nuclear Structure and Function”
5. David Harvey, “The Limits to Capital”
6. Matthew Zapruder, “American Linden”
7. Stanislas Dehaene, “Reading in the Brain”
8. Marguerite Yourcenar, “Memoires d’Hadrien”
9. Susan Sontag, “On Photography”
10. Arthur Conan Doyle, “The Adventures of Sherlock Holmes”
11. Mungo MacCallum, “Australian Story: Kevin Rudd and the Lucky Country”
12. James Jones, “The Thin Red Line”
13. Daniel Tammet, “Embracing the Wide Sky”
14. Andre Agassi, “Open”
15. John Updike, “Rabbit, Run”
16. Bryan Ward-Perkins, “The Fall of Rome: And the End of Civilization”
17. Atul Gawande, “The Checklist Manifesto”
18. Tom Vanderbilt, “Traffic”
19. Christopher McDougall, “Born to Run”
20. Allison Hoover Bartlett, “The Man Who Loved Books Too Much”
21. Ben Mirov, “I is to Vorticism”
22. Amélie Nothomb, “Ni d’Eve ni d’Adam”
23. Neale Donald Walsch, “Happier than God”
24. Neale Donald Walsch, “Tomorrow’s God”
25. David Owen, “Green Metropolis”
26. Dasgupta, Papadimitriou, Vazirani, “Algorithms”
27. Nicholas Wade, “The Faith Instinct”
28. William Lidwell, Kritina Holden, Jill Butler, “Universal Principles of Design”
29. Virgil, “Aeneid”
30. Ian Fleming, “On Her Majesty’s Secret Service”
31. Stefan Zweig, “Chess”
32. Isaac Asimov, “Foundation”
33. Robert Harris, “Imperium”
34. Isaac Asimov, “Foundation and Empire”
35. Isaac Asimov, “Second Foundation”
36. Neil Shubin, “Your Inner Fish”
37. Y. A. Rozanov, “Probability Theory: A Concise Course”
38. Isaac Singer, “Shadows on the Hudson”
39. Michael Lewis, “The Big Short”
40. Arthur Plotnik, “Spunk and Bite”
41. Ian McNeely and Lisa Wolverton, “Reinventing Knowledge”
42. Robert Harris, “Conspirata”
43. Robert Neuwirth, “Shadow Cities: A Billion Squatters, a New Urban World”
44. Anya Kamenetz, “DIY U: Edupunks, Edupreneurs, and the Coming Transformation of Higher Education”
45. Beth Lisick, “Everybody into the Pool: True Tales”
46. Merrill Goozner, “The $800 Million Pill: The Truth behind the Cost of New Drugs”
47. Rebecca Skloot, “The Immortal Life of Henrietta Lacks”
48. Stieg Larsson, “The Girl with the Dragon Tattoo”
49. Jean-Pierre Changeux, “L’homme neuronal”
50. Tom DeMarco and Timothy Lister, “Peopleware”
51. Alice Munro, “Hateship, Friendship, Courtship, Loveship, Marriage: Stories”
52. Rodd Wagner & James Harter, “12: The Elements of Great Managing”
53. Aristotle, “Poetics”
54. Alain Mabanckou, “Black Bazar”
55. Henry Miller, “Tropic of Cancer”
56. Henry James, “Turn of the Screw”
57. Allen Ginsberg, “Kaddish and Other Poems”
58. Vincent Eaton, “Self-portrait of Someone Else”
59. Marcel Proust, “Sodome et Gomorrhe, Vol. I”
60. Marcel Proust, “Sodome et Gomorrhe, Vol. II”
61. Gustave Flaubert, “Un coeur simple”
I got a comment on this blog that’s been eating away at me for the last few months. It was from a post about developments in compstructbio over the last decade, where I had placed the program Rosetta as the biggest development in the last decade. The offending comment was from a student who said,
“Interestingly, when my bioinformatics lecturer talked about rosetta, I was expecting to say that it was amazing after reading your post. He put its success down to the ridiculous amount of computing power it has and said that we have no idea if the algorithm is any good.”
This is almost as wrong as a pork bun vendor hawking steaming hot pork buns in a synagogue on the Sabbath. Indeed, it’s several pork buns worth of wrong. Now normally I’d let these kinds of buns slide into the non-kosher basket, but I’ve talked to enough bioinformaticians out there to realize that it’s a pretty widespread attitude. So here, I hope to set the record straight.
So what does our Bioinformaticsexpert claim? Essentially this:
- We have no idea if Rosetta is any good
- Rosetta relies on a ridiculous amount of computing power
- Rosetta’s success is not surprising
The short answer is that
- we have an extremely clear idea of how good Rosetta is,
- Rosetta owes its success to a rather elegant package of ideas, and in an absolute sense, it has actually quite modest computing requirements and
- Rosetta’s success was a big shot in the arm for the structural biology community.
Now the long answer, Mr. Bioinformaticsexpert. It’s clear Mr. Bioinformaticsexpert knows little about the history of the protein folding problem, because the biggest thing to happen to protein folding is the CASP competition, which was first organized a decade ago. The goal of CASP is to define an accuracy test for any protein-folding algorithm. In CASP, predictors are given a sequence and asked to predict its structure, and only after the predictions are submitted is the actual structure deposited in the PDB. Indeed, when CASP was first announced, it was a kick in the face to a whole bunch of protein-folding theorists who had claimed to have “solved” the “protein” “folding” problem “in principle”.
The measurement of the entries is a very serious business. Indeed, designing a measure of accuracy is considered a very important job, and some of the biggest names in structbio (Jane Richardson, Manfred Sippl) have served as judges in CASP. Still, the entries in the first year were so bad that the judges awarded mercy points to entries that were even vaguely globular. So yes, Mr. Bioinformaticsexpert, we have very clear criteria about the accuracy (and difficulty) of protein-folding.
Since CASP3, David Baker’s predictions, based on Rosetta (and I say based on because there’s a certain amount of manual tweaking), have consistently placed first in CASP. Now there’s a distinction between placing first and solving the problem. They haven’t solved it yet, but they have gotten closer than anyone else. We know this because we can put numbers on the accuracy of their predictions. They were good but you wouldn’t bet your first born on it.
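To give a flavour of what putting numbers on accuracy can look like, here is a minimal sketch of two measures: plain RMSD, and a simplified score in the spirit of the GDT_TS measure used in CASP assessments (the percentage of residues within 1, 2, 4 and 8 Å of their true position, averaged over the four cutoffs). It assumes the predicted and experimental Cα coordinates are already superimposed and listed in the same residue order; real assessments search over superpositions and are considerably more involved. The function names and toy coordinates are illustrative assumptions.

```python
import math


def distances(pred, ref):
    """Per-residue distances between two already-superimposed coordinate lists."""
    if len(pred) != len(ref):
        raise ValueError("structures must have the same number of residues")
    return [math.dist(p, r) for p, r in zip(pred, ref)]


def rmsd(pred, ref):
    """Root-mean-square deviation in the same units as the coordinates."""
    d = distances(pred, ref)
    return math.sqrt(sum(x * x for x in d) / len(d))


def gdt_like(pred, ref, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of residues within each cutoff (no superposition search)."""
    d = distances(pred, ref)
    fractions = [sum(1 for x in d if x <= c) / len(d) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Toy coordinates in Angstroms, made up for illustration.
    ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    pred = [(0.0, 0.0, 0.5), (3.8, 1.5, 0.0), (7.6, 0.0, 3.0), (11.4, 10.0, 0.0)]
    print(rmsd(pred, ref), gdt_like(pred, ref))
```

One badly placed residue dominates the RMSD, whereas the cutoff-based score still gives credit for the parts that are roughly right, which is why a single number is never the whole story.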
Now strictly speaking, Rosetta is not an algorithm, but a mix of algorithms, statistical potentials, databases and heuristics. Serious protein folding researchers have long given up the idea that there is a single killer algorithm that can solve the problem. To even talk about a single algorithm is so 1993. Might as well turn up to the party with fingerless gloves, do a moonwalk, and sing Purple Rain. At its coarsest level, the protein folding problem can be broken into two parts: developing a good scoring function, and finding ways to search through conformational space. Fundamentally there is no more to it than these two problems. But practically speaking, there are many different ways to break these problems down into tractable programs in action.
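The two-part split (a scoring function plus a search) can be sketched with a toy Metropolis Monte Carlo search. Nothing here is Rosetta's scoring or search: the "conformation" is just a list of angles, the score is a made-up sum of squares, and the step size, temperature and names are all assumptions chosen for illustration.

```python
import math
import random


def score(conformation):
    """Toy scoring function: lower is better (sum of squared angles)."""
    return sum(angle * angle for angle in conformation)


def metropolis_search(n_angles=10, n_steps=5000, temperature=1.0, seed=0):
    """Search conformational space with single-angle moves and Metropolis acceptance."""
    rng = random.Random(seed)
    current = [rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
    current_score = score(current)
    best, best_score = list(current), current_score
    for _ in range(n_steps):
        # Propose a move: perturb one randomly chosen angle.
        trial = list(current)
        i = rng.randrange(n_angles)
        trial[i] += rng.gauss(0.0, 10.0)
        trial_score = score(trial)
        delta = trial_score - current_score
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


if __name__ == "__main__":
    best, best_score = metropolis_search()
    print(f"best score found: {best_score:.3f}")
```

Swapping in a better `score` or a smarter move set changes the outcome without touching the other half, which is the sense in which the two sub-problems can be attacked separately.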
Most people have long since given up on using chemically realistic atomic force-fields to score conformations. Once you get away from the shackles of atomic interactions, you can get quite creative in thinking up new force-field terms such as statistical potentials, and heuristics based on phenomenological criteria.
Searching through the space is an even more creative act. The trick is to collapse your search space as much as possible by making educated guesses. One thing many people do is to identify secondary structure using bioinformatics means and then fix those parts to helix and strand conformation.
Many of the innovations in Rosetta were in finding clever ways to cut down the search space, which actually made Rosetta more efficient, and less computationally intensive than its competitors. An example: one early innovation of Rosetta was to use database-derived local fragments. Rosetta would look up the protein sequence against its database of short fragments, and if there was a hit, then that part of the protein would be fixed to that piece of local structure. Honestly, no one before this had taken local fragment sequence/structure relationships seriously. This brilliant insight alone made Rosetta faster than all its competitors, as these local fragments massively cut down on the search space.
Was the fact that Rosetta was successful surprising? Here, I have to be somewhat anecdotal. Having first gotten into the protein folding problem in the late 90’s, I can tell you that I knew several people (Prof Y) working in the area who were seriously thinking of moving on to other areas, simply because no serious progress had been made for a very long time. I was looking around for a postdoc at the time and I had just interviewed in the Baker lab. After I came back from the (ultimately unsuccessful) interview, at the next joint group meeting, Prof Y, who himself had just come back from CASP3, looked straight at me with a strange look of shock and disbelief, and said that the guy I had just interviewed with had managed to predict an unbelievable structure of an entirely new fold. I swear he was visibly shaking. It is hard to overestimate how much ju-ju this achievement had at the time but yes, the success of Rosetta was incredibly surprising, and exciting.
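The fragment idea can be sketched in a few lines. This is a toy illustration only, not Rosetta's code or data: the `FRAGMENT_LIBRARY` dictionary, its sequences and its conformation labels are made up, and exact string matching stands in for whatever sequence-similarity scoring a real fragment picker would use.

```python
# Hypothetical fragment library: short sequence -> local conformation label.
FRAGMENT_LIBRARY = {
    "AEELLKK": "helix",
    "VKVTVEV": "strand",
    "GDGSG": "loop",
}


def assign_fragments(sequence, library=FRAGMENT_LIBRARY):
    """Return a per-residue list of fixed conformations (None = still free to search)."""
    assignment = [None] * len(sequence)
    for fragment, conformation in library.items():
        start = sequence.find(fragment)
        while start != -1:
            for i in range(start, start + len(fragment)):
                # Keep the first assignment if two fragments overlap.
                if assignment[i] is None:
                    assignment[i] = conformation
            start = sequence.find(fragment, start + 1)
    return assignment


def free_residues(assignment):
    """Count residues whose conformation still has to be searched."""
    return sum(1 for a in assignment if a is None)


if __name__ == "__main__":
    seq = "MAEELLKKGDGSGVKVTVEVQ"  # made-up sequence
    fixed = assign_fragments(seq)
    print(f"{free_residues(fixed)} of {len(seq)} residues left to search")
```

Every residue pinned by a fragment hit is one fewer set of torsion angles to explore, which is where the saving in search space comes from.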
I realize that I probably got a little too worked up over this topic. But it touches on something that matters to me: I think that there is a great divide between bioinformatics and compstructbiology, and it’s something that people should be aware of. I’d be the first to admit that my bioinformatics is poor; my knowledge of statistics is laughable; and I can barely construct a phylogenetic tree. Yet I’d challenge a bioinformatician to explain the difference between the free-energy minimum of a canonical ensemble and the energy-minimum conformation of a micro-canonical ensemble; or what a covalent bond is; and why that matters to protein folding. In protein folding, there are many subtle biophysical considerations over and above pure computational grunt.
Still in some ways, this lack of perspective is encouraging, because it means that protein folding is nothing like the trash-talking big-hair mess that it used to be. And people have moved on. Kind of like anything from the 1980’s. And that’s probably a good thing.
Have you ever made a sequencing error? Remember the moment you finally stare at the control experiment, and it slowly dawns on you that you have really fucked up. Your stomach drops as you realize the consequences: maybe you kick yourself for your stupidity, maybe you realize that you have just poured 12,000 dollars down the drain, maybe you have just seen a chapter in your thesis dissolve into thin air. But anything you might imagine will pale in comparison to the sequencing error made by a company in the wild-west days of biotech.
You’ve probably never heard of the company that made the error (Genetics Institute). The company that did not was Amgen. And the difference is a patent now worth $2 billion a year. The protein involved is known as erythropoietin, but you probably know it as the drug that Lance Armstrong did not take, otherwise known as EPO.
It’s a sordid but rather entertaining story (would make a decent screen-play), and I stumbled onto it whilst reading Merrill Goozner’s history of the modern pharmaceutical industry “The $800 Million Pill: The Truth behind the Cost of New Drugs”. Not only did the sequencing error set back Genetics Institute in the race to patent the synthetic manufacture of EPO, but Genetics Institute had actually stolen the sequence from Amgen in the first place.
But first, some background.
The golden age of biotech in the 1980’s centered around a cluster of technologies now known as recombinant DNA engineering. In recombinant DNA engineering, if you could extract the complete DNA sequence that coded for a protein in the human genome, you could recombine that piece of DNA into a virus, inject that virus into a colony of bacteria, and force the bacteria into making industrial quantities of your human protein. This was much easier than the previous method, which was to kill a lot of animals, extract minute quantities of the protein, and make medicine out of it.
This promised a dramatically new line of therapy for diseases caused by the lack of a specific protein; the replacement of which restored normal function in a sick person. In hindsight, it turned out that there were only a handful of human diseases – the “low-hanging fruit” of biotech – that could fit this description. But almost every one of these diseases would lead to a blockbuster drug that would make the fortune of several biochemistry professors of the US university system.
Impaired erythropoiesis, or red blood cell formation, was one such disease. Red blood cell formation was known to be triggered by the presence of one, and only one, protein: erythropoietin, or Epo. When Winston Salser left UCLA to create Amgen, he had no clear idea of what he wanted to work on. He chased several drug targets, one of which was erythropoiesis. However, for him to apply his DNA recombinant techniques to Epo, he would need the DNA sequence. And that, he did not have.
Cue Eugene Goldwasser, who over several decades had worked out methods to identify Epo and purify it; first from animals, and then from humans. However, he did not know the protein sequence, let alone the DNA sequence that codes for the protein. Indeed Goldwasser didn’t even have that much Epo. It wasn’t until 1973 that Goldwasser found a way to produce workable quantities of Epo. Goldwasser was contacted by Takaji Miyake of Kumamoto University, who had access to some patients who suffered from a condition called aplastic anemia. The bone marrow of these patients did not work properly, which led to an excess of Epo in the patients’ urine. With this urine sent to the US, Goldwasser was able to purify 8 mg of pure human Epo from 2,250 liters of piss over a period of 18 months. This became the world’s only supply of Epo.
After a lot of delicate negotiations, Goldwasser did a deal with Amgen. And Amgen undertook the process of figuring out the DNA sequence for Epo in the human genome, using Goldwasser’s store of Epo. You might be aware from first year biochemistry that there is a DNA code for the amino acids that make up a protein. But a working gene, or fully functioning DNA sequence, includes so much more: start codes, end codes, promoters, repressors, splicing points, and other such regulatory pieces of DNA. Consequently, the process involved, first, working out the protein sequence of Epo, then working out the DNA sequences that could code for those amino acids, and finally using fragments of these sequences to search for the actual piece of DNA in an actual human genome.
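To see why a protein sequence alone doesn’t hand you the gene, here is a minimal sketch of back-translation: listing every DNA sequence that could encode a short peptide. The codon table is a subset of the standard genetic code covering only the amino acids in the example, and the peptide `MKHFW` is hypothetical, not taken from Epo.

```python
from itertools import product

# Subset of the standard genetic code (DNA codons), enough for the example peptide.
CODONS = {
    "M": ["ATG"],
    "W": ["TGG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "H": ["CAT", "CAC"],
}


def back_translate(peptide):
    """Return every DNA sequence that encodes the peptide under the table above."""
    choices = [CODONS[aa] for aa in peptide]
    return ["".join(codons) for codons in product(*choices)]


if __name__ == "__main__":
    candidates = back_translate("MKHFW")  # hypothetical peptide
    print(len(candidates), "candidate DNA sequences, e.g.", candidates[0])
```

Because most amino acids have more than one codon, the number of candidates multiplies with every residue, so a search has to allow for many possible DNA sequences per stretch of protein.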
But first, Amgen needed the protein sequence of Epo. And they were in luck. In this golden age of biotech, many new machines would spring into existence like Athena from the head of Zeus. Leroy Hood of Caltech had just built a machine that could sequence the amino acids of a protein, any protein, using only a small quantity of protein. So Amgen sent a small sample of Epo to Leroy Hood to figure out the protein sequence.
And here is where it gets interesting. Rodney Hewick, one of the co-inventors of the protein sequencing machine, was one of the people who carried out the sequence analysis. After sequencing Epo, Hewick abruptly quit on Sep 1, 1981, only to immediately join a Boston biotech, Genetics Institute, as a senior protein chemist. He arrived at the new company bearing gifts, and the gift was the sequence of Epo.
Suddenly the race was on for the identification of the human gene of Epo, and the patenting of the process to synthetically manufacture EPO using DNA recombinant engineering: Amgen on the West coast, Genetics Institute on the East coast.
The problem was that Rodney Hewick made exactly 3 errors out of 166 amino acids in the protein sequence of Epo. And he didn’t even know it. And in Boston, he had no more access to Goldwasser’s precious store of Epo to double-check his sequencing. After 3 years of failure, Genetics Institute finally realized that their protein sequence was wrong. To mitigate their error, they had to hammer out a deal with Miyake in Japan to get their hands on the precious urine of the patients with aplastic anemia. They purified their own Epo, sequenced it again, and finally found the full gene for Epo. They submitted the article to Nature on Dec 7, 1984.
But by then it was too late. Fu-Kuen Lin, who had joined Amgen as their 7th scientist in 1981, had single-handedly identified the human gene for Epo in the human genome using Goldwasser’s protein. More important than getting a Nature paper, he had filed a patent for Amgen on Dec 13, 1983, a good year before Genetics Institute’s Nature article. In the world of big pharma, it is the patent that matters. Amgen got FDA approval on June 1, 1989. EPO was Amgen’s blockbuster drug, which attracted 460 million dollars from the government in the first year. It is now worth 2 billion dollars a year, which was almost half of Amgen’s income in 2002.
Amgen became a biotech behemoth, whereas Genetics Institute eventually got bought out by Wyeth. And all because of 3 errors in the sequencing of the protein.
Over the last few years, when I describe my job at parties to people who do not live inside the ivory tower, I say that I am doing “short-term contract research”. People intuitively understand what that means. The term “postdoc” on the other hand often draws blank stares.
At a recent party, I actually met a fellow postdoc, and when I said that I was doing short-term contract research, that drew a blank stare from his face. After some clarifications, we realized that we were both postdocs.
I don’t really like the term postdoc because it hides the ephemeral nature of the position. Perhaps in years gone by, when a postdoc was a reasonably good guarantee of a step up the academic ladder, the term had some cachet. But now, it is almost an empty term.
That is why I prefer the description “short-term contract research”. This emphasizes the arbitrary nature of employment. Unlike a real position, it is capricious and can be terminated at the drop of a hat. Furthermore, by emphasizing contract, it makes no promise about any further career path. And as for the explicit word “research”, well everybody understands what research is, and they naturally assume that I am qualified to do it.
And really that is all that they ought to conclude: that I can do research, and that I am being paid for a little while to do it.
There are times when you need to flog your own work. I recently published a paper [1] that provides a concrete model of single-domain allostery. It also provides a clean computational model of interior sidechain-sidechain interactions. I think it’s my best work yet, and I will now flog the shit out of it for you.
What is allostery and why should you care about it?
From the Greek word for other (allo), allostery refers to how events occurring at one site on a protein induce changes at another site, far, far away. Allostery was originally conceived to explain cooperative binding in oligomeric proteins made up of identical subunits. Binding sites are typically found at domain interfaces, where the binding of ligand at one site forces changes in the domain interface that lead to quaternary structure rearrangements. The rearrangements propagate to the empty binding sites on the other domain interfaces, resulting in better (cooperative) binding and hence, allostery. The concept of allostery now means any long-distance effect in a protein system.
As such, allostery is the foundation stone of information processing in the cell. Let’s say we have two pathways, pathway A and pathway B, which are at first separate. We then select protein A from pathway A, and protein B from pathway B. Then we design an allosteric adaptor protein C where binding protein A to protein C will induce better binding for protein B at another site on protein C. The allostery in the adaptor protein C now couples pathway A to pathway B.
Allostery is thus the key to identifying pathway interactions in protein structures and finding a theory of allostery in protein systems is one of the major challenges in computational structural biology. The problem has split into all sorts of sub-problems such as oligomeric proteins and nucleotide-protein systems. But for me, the most challenging sub-problem is the allostery of single domains.
Single domain allostery: allostery through dynamics
The surprising thing is that even a single domain can undergo allostery. What makes this difficult to understand? Well, if binding a ligand transmits changes to another site, then, in the absence of quaternary structure, these changes must tunnel through the body of the protein. Here is the rub: how do you model dynamics through the body of a protein?
One hypothesis for how single domain allostery might work is the Cooper-Dryden model [2], which argues that allostery occurs through flexibility in a protein. The idea is that large pieces of a single domain, including the binding site, are intrinsically flexible. Subsequent binding of a ligand will rigidify, not just the binding site, but other parts of the protein. These newly rigidified sites can serve as allosteric binding sites.
However, the Cooper-Dryden model is only a generic thermodynamic argument. What is lacking is a way of calculating the intrinsic flexibility of a protein from the crystal structure, and showing that this explains allostery in a real protein system.
The same PDZ fold gives rise to different dynamics
The poster child of single-domain allostery is the PDZ domain. A shockingly small domain of ~60 amino acids, PDZ domains are ubiquitous scaffolding proteins. The PDZ domains have a well-defined binding groove that mainly binds C-termini of other proteins or short peptides. However, there are at least two or three other binding sites in the PDZ fold. A quick survey of these interactions shows that indeed, these little buggers act as allosteric adaptors.
Nevertheless, no one had actually catalogued the differences in allosteric changes across the PDZ domains. I found 5 PDZ domains with crystal structures that show distinctly different allosteric responses. The differences can be seen in the mobility of the α-helices in the different PDZ domains upon ligand-binding, apo structures in magenta, ligand in blue, and ligand-bound structure in light blue:

As per the Cooper-Dryden model, we can attribute allostery in the PDZ domain to the intrinsic dynamics of the α-helices. Upon binding, the α-helices rigidify, creating new rigid surfaces, which are available for binding to other ligands, leading to allostery.
Here’s the challenge: we have five PDZ domains (all with the same fold) with different dynamics in the α-helices. Is there a way to calculate the conformational flexibility from the apo crystal structures of the PDZ domains?
A computational model of sidechain-sidechain interactions
Well, of course there is a solution to the computational problem of single domain flexibility, otherwise I would be leading you down the garden path. Clearly, all 5 PDZ domains have the same fold, so major differences must lie in the sidechain-sidechain interactions. Now one might argue that long-time molecular-dynamics (MD) simulations of the 5 structures will tell you about the flexibility, but there are problems with this. First, it’s expensive to run these simulations, and second, even if you could identify the flexible regions, there is currently no trajectory analysis method that allows you to clearly deduce sidechain-sidechain interactions.

How sidechain couplings differ
The results from the heat maps of different PDZ domains are striking: at the same tertiary contact in the PDZ fold but in different PDZ domains, the sidechain-sidechain pairs have different interaction strengths. Here are two examples of the same contact position in two different PDZ domains. On the left they interact strongly. On the right they do not.
The usual definitions of good interaction strength (being in contact and being buried) are insufficient. The interaction between sidechains depends on how they interlock, which depends on the sidechain degrees of freedom and their orientation with respect to each other. The RIP method provides a straightforward computational way to explore all these factors and identify the couplings in a structure.
Tertiary couplings explain α-helix flexibility
However, not all couplings are interesting: couplings between residues in the same piece of secondary structure (β-sheet or α-helix) don’t tell us very much. After all, these residues are already coupled by the backbone hydrogen bonds. What turns out to be extremely informative are the sidechain-sidechain couplings of tertiary contacts. On the left are the apo structures overlaid on the ligand-bound structures (red indicates a large RMSD difference). On the right are the residues involved in major tertiary couplings (red):

By comparing the tertiary coupling maps of the apo structures to the conformational freedom in the ligand-bound structures, I found a rather simple relationship: PDZ domains with tertiary RIP couplings between the α-helices and the body of the protein have rigid α-helices; the PDZ structures without such couplings have dynamic α-helices.
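The rule above is simple enough to sketch in a few lines of Python. This is only a toy illustration, not the actual analysis: the residue numbers, element labels, coupling strengths and the `threshold` cutoff are all made up, and a "coupling" here is just a `(residue_i, residue_j, strength)` tuple that is assumed to come out of something like a RIP heat map.

```python
from collections import defaultdict

def tertiary_couplings(couplings, element_of):
    """Keep only couplings whose two residues sit in different
    secondary-structure elements (i.e. tertiary contacts)."""
    return [(i, j, s) for i, j, s in couplings
            if element_of[i] != element_of[j]]

def helix_rigidity(couplings, element_of, helices, threshold):
    """Label a helix 'rigid' if its strongest tertiary coupling to a
    non-helix element (the 'body') reaches the threshold, else 'dynamic'."""
    strongest = defaultdict(float)  # helices with no coupling stay at 0.0
    for i, j, s in tertiary_couplings(couplings, element_of):
        ei, ej = element_of[i], element_of[j]
        for helix, other in ((ei, ej), (ej, ei)):
            if helix in helices and other not in helices:
                strongest[helix] = max(strongest[helix], s)
    return {h: ("rigid" if strongest[h] >= threshold else "dynamic")
            for h in helices}

# Hypothetical data: residue number -> secondary-structure element.
element_of = {10: "beta1", 11: "beta1", 30: "beta2",
              45: "alphaA", 46: "alphaA", 72: "alphaB", 75: "alphaB"}
# Hypothetical couplings: (residue, residue, strength between 0 and 1).
couplings = [(10, 11, 0.9), (45, 46, 0.8), (46, 30, 0.7),
             (72, 75, 0.6), (72, 10, 0.1)]

labels = helix_rigidity(couplings, element_of, ["alphaA", "alphaB"], 0.5)
print(labels)
```

In this made-up example, `alphaA` has a strong coupling to a β-strand and comes out rigid, while `alphaB` only has a weak one and comes out dynamic. The couplings inside a single helix or strand are thrown away before anything is decided, which is the whole point of looking at tertiary contacts only.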
The theory of single domain allostery
Although one shouldn’t generally generalize from a single case study, I’m going to anyway since there are few examples known to me of single-domain families with diverse allostery and crystal structures.
Single-domain allostery arises from the intrinsic dynamics of a protein fold. A piece of secondary structure (α-helix or loop) is flexible or rigid depending on the strength of the tertiary contacts to the body of the protein. The strength of these contacts depends on how well the sidechains interlock, which (surprise! surprise!) can be adequately explored with the RIP method.
So, depending on the specific sidechain interactions, the same fold will have different surface flexibilities. Binding a ligand at one of these flexible regions will rigidify all the flexible regions connected to it. The newly rigidified portions of the surface provide new allosteric binding sites, thereby transmitting information through the body of the protein.
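If you squint, that last step is just a walk over a graph. Here is a minimal sketch, assuming we describe the protein as named regions with a hypothetical `neighbours` map saying which regions touch which. The region names and contacts are invented for illustration, and the all-or-nothing propagation is a cartoon of the idea, not a physical model.

```python
from collections import deque

def rigidified_regions(flexible, neighbours, bound_region):
    """Return the flexible regions that go rigid when a ligand binds at
    bound_region: the bound region plus every flexible region reachable
    from it through other flexible regions (breadth-first search)."""
    if bound_region not in flexible:
        return set()
    seen = {bound_region}
    queue = deque([bound_region])
    while queue:
        region = queue.popleft()
        for nxt in neighbours.get(region, ()):
            # Already-rigid regions do not pass the change along.
            if nxt in flexible and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical fold: beta2 is already rigid, the rest are flexible.
flexible = {"alphaA", "loop1", "alphaB"}
neighbours = {
    "alphaA": ["loop1", "beta2"],
    "loop1": ["alphaA"],
    "beta2": ["alphaA", "alphaB"],
    "alphaB": ["beta2"],
}

print(rigidified_regions(flexible, neighbours, "alphaA"))
```

Binding at `alphaA` rigidifies `alphaA` and `loop1`, but not `alphaB`, because the only path to it runs through a region that was rigid to begin with. Change which regions count as flexible (that is, change the sidechain couplings) and the same fold gives a different allosteric response.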
Arguably the second greatest scientist of the Twentieth Century was Linus Pauling. He single-handedly created modern chemistry by figuring out how to mathematically model the chemical bond.
After studying at Oregon State College and getting his PhD at Caltech, Pauling travelled to Europe in the 20’s to learn the shit-hot theory of quantum mechanics. When he arrived, quantum mechanics had been successfully used to calculate, with astonishing precision, the atomic spectra of hydrogen and other such simple atoms.
It seems obvious with 90 years of hindsight, but quantum mechanics is the quantitative theory of molecules and, by extension, all of modern chemistry. But the early physicists who invented quantum mechanics couldn’t see that (although Dirac famously quipped after discovering the relativistic version of the Schrödinger equation, ‘the rest is chemistry’) because, well, the early physicists just didn’t know enough chemistry.
This is where Pauling came in. He arrived in Europe with the latest ideas about chemical bonds from the US such as Lewis electron pairs and the hydrogen bond. Combined with his unparalleled breadth of knowledge of chemistry and superb mathematical ability, Pauling realized that the electron orbitals of atomic structure could be bent into the first ever mathematical description of the chemical bond.
In a series of papers in JACS in 1931 and 1932, titled under the rubric “The Nature of the Chemical Bond”, Linus Pauling made the first ever attempt to mathematically model the chemical bond in terms of linear combinations of atomic electron orbitals. Compared to today’s macho quantum chemists and their chthonic CPU-hungry calculations, these approximations might seem like child’s play, but you have to remember just how much insight Pauling squeezed out of his remarkably simple calculations of hybrid orbitals.
I believe that Pauling’s 1931 paper, “The Nature of the Chemical Bond. Application of Results obtained from the Quantum Mechanics and from a Theory of Paramagnetic Susceptibility to the Structure of Molecules”, the first of the lot, is one of the great papers of science. It ranks up there with Einstein’s paper on special relativity and Watson & Crick’s DNA paper.
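To get a feel for how little machinery is needed, here is the standard textbook version of the sp3 hybrid orbitals, checked in a few lines of Python. This is a sketch in modern notation, not a transcription of Pauling’s own working: each hybrid is written as a linear combination of the s, px, py and pz orbitals, which are assumed to be orthonormal, and the coefficient table `HYBRIDS` is the usual textbook choice.

```python
import math

# Coefficients of the four sp3 hybrids in the basis (s, px, py, pz).
# Textbook values; the atomic orbitals are assumed orthonormal.
HYBRIDS = [
    (0.5,  0.5,  0.5,  0.5),
    (0.5,  0.5, -0.5, -0.5),
    (0.5, -0.5,  0.5, -0.5),
    (0.5, -0.5, -0.5,  0.5),
]

def dot(a, b):
    """Plain dot product of two coefficient vectors."""
    return sum(x * y for x, y in zip(a, b))

def angle_between(h1, h2):
    """Angle in degrees between the directions of two hybrids,
    taken from their (px, py, pz) components."""
    p1, p2 = h1[1:], h2[1:]
    cos_theta = dot(p1, p2) / math.sqrt(dot(p1, p1) * dot(p2, p2))
    return math.degrees(math.acos(cos_theta))

# Each hybrid is normalized and orthogonal to the other three.
for a in range(4):
    for b in range(4):
        expected = 1.0 if a == b else 0.0
        assert abs(dot(HYBRIDS[a], HYBRIDS[b]) - expected) < 1e-12

print(round(angle_between(HYBRIDS[0], HYBRIDS[1]), 2))
```

Four equivalent orbitals, mutually orthogonal, pointing at the corners of a tetrahedron: the angle between any two of them is arccos(-1/3), which is the tetrahedral bond angle of carbon, recovered from nothing more than a table of plus and minus signs.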
I first came across this beautiful calculation in Pauling’s 1960 book, but it is much more exciting to read the argument in the 1931 paper. That paper is easy to find at the Linus Pauling archive at the Oregon State archive but is unfortunately stored in a stupid TIFF format. Here, I’ve taken the liberty of stitching it together as a PDF file for your reading pleasure. Enjoy.
