Docking Proteins: finding ways to fit in


I got interested in biotechnology, so I met with some experts at Berkeley and started a project to understand how certain types of drugs work by binding to proteins. The project is still ongoing, and maybe one day it will lead to new drug discoveries!

Protein Physics

Proteins are essentially long chains of amino acids linked by peptide bonds, which together form the protein's backbone. There are 20 standard amino acids, and different sequences of them lead to different proteins.

The interactions between the amino acids determine the protein's shape, causing the chain to bend and fold in particular ways. These shapes can be measured experimentally by crystallizing the protein and shooting X-rays at the crystal. The X-rays are diffracted, and the structure can be inferred from the pattern of the outgoing rays. Proteins typically have crevices and pockets that allow other molecules to fit in, which is how the protein performs its functions. If a compound can plug up these pockets or alter the protein's shape, it can block the protein's function. Many drugs work this way, by disabling unwanted proteins.
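A compact way to see how the diffraction pattern encodes structure is Bragg's law, n λ = 2 d sin θ, which links the angle θ of a diffracted beam to the spacing d between planes of atoms in the crystal. This is standard crystallography rather than anything specific to this project. The sketch below simply inverts the law; the wavelength in the example (copper K-alpha X-rays, about 1.5418 Å) is an assumed, commonly used laboratory value.

```python
import math


def bragg_spacing(wavelength_angstrom: float, theta_degrees: float, order: int = 1) -> float:
    """Return the lattice-plane spacing d (in Å) from Bragg's law, n*lambda = 2*d*sin(theta)."""
    theta = math.radians(theta_degrees)
    if not 0.0 < theta <= math.pi / 2:
        raise ValueError("theta must be in (0, 90] degrees")
    return order * wavelength_angstrom / (2.0 * math.sin(theta))


# Assumed example: Cu K-alpha X-rays (~1.5418 Å) diffracted at a few angles.
# Small angles correspond to large spacings; resolving finer detail (small d) needs larger angles.
for theta in (5.0, 15.0, 30.0):
    print(f"theta = {theta:4.1f} deg -> d = {bragg_spacing(1.5418, theta):.2f} Å")
```

This is why high-resolution protein structures require recording reflections out to large diffraction angles.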

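Docking is a way to numerically predict how well a given compound fits inside the crevices of a protein. This is done by estimating the binding energy, which is more negative for tighter binding. The binding energy contains two contributions: the Van der Waals interaction, which rewards a snug fit between the compound's shape and the protein's crevice, and the electrostatic interaction between the charges on the compound and those on the protein's surface.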
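The two contributions can be illustrated with a minimal pairwise model: a Lennard-Jones 6-12 term for the Van der Waals part and a Coulomb term for the electrostatic part, summed over every ligand-protein atom pair. This is a simplified sketch, not dock6's actual implementation (dock6 precomputes the protein's contribution on a grid). The combining rules, the distance-dependent dielectric ε(r) = 4r, and all atom parameters below are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass

COULOMB_KCAL = 332.0636  # converts e^2/Å into kcal/mol


@dataclass
class Atom:
    xyz: tuple[float, float, float]  # position in Å
    charge: float                    # partial charge in units of e
    sigma: float                     # Lennard-Jones size in Å (illustrative)
    epsilon: float                   # Lennard-Jones well depth in kcal/mol (illustrative)


def pair_energy(a: Atom, b: Atom, dielectric_factor: float = 4.0) -> tuple[float, float]:
    """Return (Van der Waals, electrostatic) energy in kcal/mol for one atom pair."""
    r = math.dist(a.xyz, b.xyz)
    # Lorentz-Berthelot combining rules (an assumption for this sketch)
    sigma = 0.5 * (a.sigma + b.sigma)
    eps = math.sqrt(a.epsilon * b.epsilon)
    sr6 = (sigma / r) ** 6
    vdw = 4.0 * eps * (sr6 * sr6 - sr6)
    # Coulomb's law with an assumed distance-dependent dielectric eps(r) = dielectric_factor * r
    es = COULOMB_KCAL * a.charge * b.charge / (dielectric_factor * r * r)
    return vdw, es


def interaction_energy(ligand: list[Atom], protein: list[Atom]) -> dict[str, float]:
    """Sum both terms over all ligand-protein pairs; more negative means tighter binding."""
    vdw_total = es_total = 0.0
    for a in ligand:
        for b in protein:
            vdw, es = pair_energy(a, b)
            vdw_total += vdw
            es_total += es
    return {"vdw": vdw_total, "es": es_total, "total": vdw_total + es_total}


# Toy usage: a positive ligand atom at the Lennard-Jones minimum of a negative protein atom.
r_min = 2 ** (1 / 6) * 3.4
ligand = [Atom((0.0, 0.0, 0.0), +0.5, 3.4, 0.1)]
protein = [Atom((r_min, 0.0, 0.0), -0.5, 3.4, 0.1)]
print(interaction_energy(ligand, protein))
```

At the Lennard-Jones minimum the Van der Waals term equals minus the well depth, and opposite charges make the electrostatic term negative, so both contributions favor binding here.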

Now, let's take a look at the target protein for this toy study: HSP90b, which is found in humans and may be related to cancer progression. The protein's model file can easily be obtained from the RCSB Protein Data Bank. To view the model, one can use the freely available program UCSF Chimera. Docking will be done using dock6. We'll refer to the compounds being docked onto the protein as ligands.
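The model file from the Protein Data Bank is plain text. Protein atoms appear on ATOM records, while co-crystallized small molecules (such as KU3) and waters appear on HETATM records, with the three-letter residue name in columns 18-20. Below is a minimal sketch that separates the receptor from a bound ligand. The residue code "LIG" is a hypothetical placeholder, since KU3's actual PDB code is not given here, and real preparation for dock6 (adding hydrogens and charges, for example in Chimera) involves more steps.

```python
def split_pdb(pdb_text: str, ligand_resname: str) -> tuple[list[str], list[str]]:
    """Split PDB lines into (receptor ATOM records, ligand HETATM records)."""
    receptor, ligand = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        resname = line[17:20].strip()  # residue name: columns 18-20 of the PDB format
        if record == "ATOM":
            receptor.append(line)
        elif record == "HETATM" and resname == ligand_resname:
            ligand.append(line)
        # everything else (waters, ions, header lines) is dropped in this sketch
    return receptor, ligand


# Usage (file name and residue code are hypothetical):
# with open("hsp90.pdb") as f:
#     receptor, ligand = split_pdb(f.read(), "LIG")
```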

Image of HSP90

3D surface of HSP90. Red (blue) shows regions of negative (positive) effective charge. The ligand is shown as a stick figure; it has a very complicated name, so we'll simply call it KU3. The total binding energy is calculated to be roughly -60 kcal/mol. The energy needed to remove a single hydrogen from CH4 is roughly 100 kcal/mol, so the entire ligand-protein interaction is weaker than one covalent bond and rather small compared to typical chemical binding energies.
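To put the two numbers on a per-molecule footing, the sketch below converts them from kcal/mol to electronvolts per molecule. The conversion factors (4.184 kJ per kcal and 96.485 kJ/mol per eV) are standard values; the -60 and 100 kcal/mol figures are the ones quoted above.

```python
KJ_PER_KCAL = 4.184          # thermochemical calorie
KJ_PER_MOL_PER_EV = 96.485   # 1 eV per molecule expressed in kJ/mol


def kcal_per_mol_to_ev(energy_kcal_per_mol: float) -> float:
    """Convert a molar energy in kcal/mol to eV per molecule."""
    return energy_kcal_per_mol * KJ_PER_KCAL / KJ_PER_MOL_PER_EV


binding = -60.0    # KU3-HSP90 docking energy quoted above, kcal/mol
c_h_bond = 100.0   # approximate energy to remove one H from CH4, kcal/mol

print(f"binding:  {kcal_per_mol_to_ev(binding):.2f} eV per molecule")
print(f"C-H bond: {kcal_per_mol_to_ev(c_h_bond):.2f} eV per molecule")
print(f"ratio:    {abs(binding) / c_h_bond:.1f}")
```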

Docking Study

Now that we have seen how KU3 binds to HSP90, let's look at how well other compounds can do the same. The simplest way to proceed is to remove KU3 from the model file and see whether other compounds can take its place. A sensible list of candidates can be obtained from the free ZINC15 database, where the list of approved drugs (roughly 1500 compounds) is a good starting point. After preparing the protein file, I ran dock6 (which takes a few hours). Here are some sample results for compounds that bind particularly strongly:

Binding of Sibenadet to HSP90

The most strongly binding compound turns out to be Sibenadet, a drug developed to treat chronic obstructive pulmonary disease. Its binding energy is roughly -69 kcal/mol and is dominated by the Van der Waals interaction, with only a small electrostatic contribution of -2.3 kcal/mol. Binding is therefore largely due to how nicely Sibenadet fits into the protein site. In reality, given the enormous number of possible orientations, it may be quite hard for the compound to settle into exactly the position needed to reach such a low energy, so binding may actually be quite unlikely.

Binding of Spermine to HSP90

Another strongly binding compound is Spermine, a polyamine involved in cellular metabolism. Here the total binding energy is -60 kcal/mol, and the electrostatic contribution accounts for more than half of it, at -34 kcal/mol! Spermine is positively charged at physiological pH, so this binding seems to come largely from its interaction with the negatively charged (red) regions of HSP90. This looks like an interesting candidate!
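The contrast between Sibenadet and Spermine is easiest to see as the fraction of the total energy that is electrostatic. Here is a quick check using only the numbers quoted above. The Van der Waals part is taken as total minus electrostatic, which assumes the score consists of just these two terms.

```python
def electrostatic_fraction(total: float, electrostatic: float) -> float:
    """Share of the (negative) binding energy contributed by electrostatics."""
    return electrostatic / total


compounds = {
    # name: (total binding energy, electrostatic part) in kcal/mol, from the text above
    "Sibenadet": (-69.0, -2.3),
    "Spermine": (-60.0, -34.0),
}
for name, (total, es) in compounds.items():
    vdw = total - es  # assumes total = Van der Waals + electrostatic
    print(f"{name}: vdW {vdw:.1f}, es {es:.1f}, es fraction {electrostatic_fraction(total, es):.0%}")
```

Sibenadet's binding is only a few percent electrostatic, while Spermine's is more than half.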

Of course, drug discovery isn't as simple as running some numerical calculations. The next step could be to test some promising candidates in the lab to see whether the ligands really bind, and whether they cause side effects. The question of side effects can be approached by asking: how specific is a given ligand's binding to this particular protein site? In other words, is the ligand simply a very sticky compound, or does it really prefer to bind to HSP90? To find out, I looked for correlations between the binding energies for HSP90 and for other common proteins. I chose four proteins for this test: the HSP90 homolog found in yeast (HSY), ornithine decarboxylase (ODC), albumin (ALB), and hemoglobin (HEM).

Binding Energy Correlation
Scatter plots of the absolute values of the binding energies (in kcal/mol)

Scatter plots of the absolute values of the total binding energies for different proteins (the diagonal shows each distribution). As expected, there is a strong correlation between HSP and HSY, since they are homologs of one another (HSP from humans and HSY from yeast). However, there are still large correlations between the binding energies for unrelated proteins. This can be explained by the fact that the Van der Waals interaction depends strongly on shape, and the crevices of different proteins are quite often similar.
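The strength of each correlation in these scatter plots can be summarized by a Pearson correlation coefficient for every pair of proteins. The sketch below uses NumPy and assumes the docking energies are stored as one array per protein, with the compounds in the same order. The plots above may have been produced differently. Taking absolute values does not change Pearson's coefficient when all energies are negative, so the raw energies are used.

```python
import numpy as np


def correlation_table(energies: dict[str, np.ndarray]) -> tuple[list[str], np.ndarray]:
    """Pearson correlation between every pair of proteins' binding energies.

    energies maps a protein label (e.g. "HSP", "HSY") to an array of binding
    energies, one entry per compound, in the same compound order for all proteins.
    """
    names = list(energies)
    # np.corrcoef treats each row as one variable (here: one protein)
    data = np.vstack([np.asarray(energies[name], dtype=float) for name in names])
    return names, np.corrcoef(data)


def print_table(names: list[str], corr: np.ndarray) -> None:
    """Print the correlation matrix with protein labels on both axes."""
    print("     " + " ".join(f"{n:>6}" for n in names))
    for name, row in zip(names, corr):
        print(f"{name:>4} " + " ".join(f"{v:6.2f}" for v in row))


# Usage (hsp, hsy, odc, alb, hem are hypothetical arrays of docking energies):
# names, corr = correlation_table({"HSP": hsp, "HSY": hsy, "ODC": odc, "ALB": alb, "HEM": hem})
# print_table(names, corr)
```

The same function applies unchanged to the electrostatic energies alone.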

Electrostatic Energy Correlation
Scatter plots of the absolute values of the electrostatic binding energies (in kcal/mol)

Scatter plots of the electrostatic binding energies (in kcal/mol, multiplied by -1). There is still a correlation between HSP and HSY, as expected, and a small correlation between HSP and ODC, since the effective surface charges at both ligand sites are rather negative. For ALB and HEM, the correlation is mostly gone! The electrostatic binding energy could therefore be a good indicator of docking selectivity.

After applying some basic criteria (a large electrostatic contribution, and stronger binding to HSP90 than to any of the other proteins), below are a few good sample candidates for further drug development (unfortunately, our friend Spermine binds more strongly to ODC, with a binding energy of -76 kcal/mol compared to -60 kcal/mol for HSP90):
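The two selection criteria can be written as a simple filter. In this sketch the cutoff for a "large" electrostatic contribution (30% of the HSP90 total) is a hypothetical value chosen for illustration, since no exact threshold is specified. Energies are assumed to be stored per compound as (total, electrostatic) pairs for each protein.

```python
def select_candidates(results, target="HSP", min_es_fraction=0.3):
    """Keep compounds whose binding to the target is strongly electrostatic and
    more negative than their binding to every other protein.

    results: {compound: {protein: (total, electrostatic)}} in kcal/mol.
    min_es_fraction: hypothetical cutoff for a 'large' electrostatic share.
    """
    selected = []
    for compound, by_protein in results.items():
        total, es = by_protein[target]
        if total >= 0 or es / total < min_es_fraction:
            continue  # not bound, or binding is mostly Van der Waals (likely less selective)
        others = [other_total for p, (other_total, _) in by_protein.items() if p != target]
        if all(total < other for other in others):
            selected.append(compound)
    return selected


# Spermine, using the numbers from the text (its ODC electrostatic part is not quoted):
spermine = {"Spermine": {"HSP": (-60.0, -34.0), "ODC": (-76.0, None)}}
print(select_candidates(spermine))  # -> [] because Spermine binds ODC more strongly
```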


I learned that proteins have interesting surface features that allow binding to happen. This binding can be modelled with fairly simple physics, and binding energies can be obtained quite easily with dock6. Many ligands bind well, but a large part of that binding comes from the Van der Waals interaction, which tends to be correlated across different proteins. A good drug candidate should have a more balanced binding energy with a substantial electrostatic contribution, which leads to more selectivity. Ideally, one would like ligands that bind to one and only one protein, but reality is much more complicated, and it may be necessary to screen against a much larger database of proteins to identify potential side effects. Of course, all of this neglects the additional effects these drug candidates have inside the body. I've identified a few potential drug candidates, but further study is absolutely needed!