Cerius2 Forcefield Based Simulations

Forcefield Based Simulations

6 Free Energy

Who should read this chapter

Forcefield-based calculation of relative or absolute free energy should be considered an "expert" application. We recommend using the FDiscover program for relative and absolute free energy calculations. Although it is possible to run relative free energy calculations with CHARMm, that program is slower and less flexible.

This chapter explains

: Relative free energy--theory and implementation
: Absolute free energy

Relative free energy--theory and implementation

This section includes

: Finite difference thermodynamic integration (FDTI)
: Relative free energy--methodology

To compare chemically distinct systems, the FDiscover program can be used to calculate the relative free energy difference between chemically unique species. This approach uses the finite difference thermodynamic integration (FDTI) algorithm of Mezei et al. (1987). Since FDTI does not require analytical derivatives of the Hamiltonian with respect to the coupling parameter, it is more suited for the complex coupling formalisms required in chemical perturbations. It also has been shown to possess better convergence properties than other methods.

The value of _i in Eq. 121 for a given i depends on the numerical integration scheme used. For example, a simple trapezoidal rule approach would make each _i equal. More sophisticated integration methods may allow each _i to be different. The FDiscover program uses a Gaussian-Legendre quadrature method (Press 1986) that chooses the values needed, given the total number of intervals specified. Hence, you never explicitly specify the intervals.

Advantages

The advantages of FDTI are:

Unlike the perturbation method (PM), large changes in free energy can be calculated in fewer steps.
Unlike thermodynamic integration (TI), analytical derivatives of the Hamiltonian with respect to the coupling parameter are not needed.
As has been suggested by Mezei, FDTI may converge faster than either TI or PM.

Incorporation of the coupling parameter

The first step in any free energy calculation is parameterizing the energy function to provide a continuous change between the thermodynamic states that are being compared. The relative free energy function as implemented in the FDiscover program is parameterized at the level of the forcefield parameters themselves. For example, the energy E of a harmonic bond as a function of the bond length b is:

Eq. 122

where K is the force constant and b₀ is the reference bond length. In a free energy calculation E(b) may be different for the initial A and final B states (i.e., each state has different force constants and reference bond lengths). The function is reformulated as a function of the coupling parameter:

Eq. 123

changes from 0 to 1, the bond description gradually changes from a State A bond to a State B bond.

Relative free energy--methodology

This describes the steps involved in performing an in vacuo calculation to solve the free energy difference between methanol and ethane.

Background of example problem

A seminal free energy calculation reported by Jorgensen and Ravimohan (1985) showed that free energy perturbation techniques seem to accurately predict the difference in solvation energy between methanol and ethane. This calculation has since been repeated in several laboratories, using independent programs and forcefields, and has become a de facto benchmark for free energy calculations (Singh et al. 1987; Fleischman and Brooks 1987).

Construction of the thermodynamic cycle

Consider the following thermodynamic cycle for the solvation of ethane and methanol:

This cycle requires that:

Eq. 124

The right side of this equation is the difference in solvation free energy that can be measured experimentally. However, the left side is more convenient to compute (making water appear around a solute is a much larger perturbation than changing a methyl group into a hydroxyl group). Either side can be chosen, because both sides are equal.

If there is no difference in intramolecular energy between ethane and methanol, then A₃ is zero and the gas phase calculation can be ignored. Although this is intuitively incorrect, a suitable forcefield could treat the intramolecular energy of ethane and methanol identically. For example, if the nonpolar hydrogens were not represented explicitly, no atom pairs would have more than two intervening bonds. Most forcefields calculate intramolecular nonbond interactions only for atom pairs separated by more than two intervening bonds. Furthermore, if the solutes are treated as rigid models, the energy would not be affected by deformations in bond lengths or angles.

In this exercise all hydrogens are explicitly included. In addition, variations in degrees of freedom are allowed for both the solvent and solute molecules.

Including all degrees of freedom requires both a vacuum and a solvent calculation. The computational penalty for an additional gas phase calculation is minor; the solvent calculation is two orders of magnitude more intensive. Moreover, because setting up the free energy calculation is virtually identical for both calculations, the gas phase calculation provides a tractable example that yields experience and confidence in performing free energy calculations without wasting an inordinate amount of computer time.

Defining the = 0 and = 1 states

To set up the calculation, you must first design the chemical perturbation for methanol going to ethane. You must first choose which of these models corresponds to the = 0 state. The = 0 state must always have enough atoms to accommodate both molecular extremes, even if invisible "dummy" atoms must be added as placeholders for atoms that appear as the perturbation proceeds.

Ethane is the logical choice for the = 0 state, because methanol can be completely subsumed within ethane. However, to illustrate that the choice is arbitrary, the calculation in the opposite direction, for methanol going to ethane, is described.

The = 0 state is provided to FDiscover as would any model for simulation, in ordinary coordinate (.car) and molecular data (.mdf) files. In the FDiscover command input file (.inp), you can specify which atoms are being perturbed and their limiting states with the Warp command. The limiting states are defined completely by the potential atom type, the partial atomic charge, and the mass of each atom.

Analyzing the output of a methanol-to-ethane free energy calculation

The free energy summary table printed out at the end of the completed free energy loop (after the last resume) contains the complete history. Throughout the run, a partial summary table is output after each resume and contains all the available data.

Assessing convergence

Graphing the Boltzmann factor [exp -(E \xda kT)] vs. time aids in assessing convergence during a given calculation. Systematic drift in exp -(E \xda kT) over time is symptomatic of nonconvergence.

To simplify analysis, various quantities are output during free energy calculations to a file with the extension .tot. Figure 32

Figure 32 . The ensemble-averaged quantity from Eq. 120 plotted vs. time for three values of
Each point plotted is actually an average of 10 points sampled over 100 fs.

plots Column 5 of the .tot file [exp -(E \xda kT)] for the first, third, and last points for the methanol-to-ethane gas phase calculation. Note that there is a slight upward drift in the curve for the first point, although the others seem to have converged.

Absolute free energy

This section includes

: Theory and implementation
: Example: Fentanyl
: Analysis of results

Theory and implementation

This section describes a technique for convenient, precise, and numerically efficient determination of absolute free energies of stable or unstable constrained model conformations. The technique of absolute free energy is general and can be applied in a transparent manner to systems in a vacuum or in solution, under any conditions of volume and/or temperature.

This approach is a special case of the thermodynamic integration approach to free energy calculations, which is itself a general method for computing the change in free energy upon going from one thermodynamic state to another. Absolute free energy simply constrains one of these states to be a model system for which the absolute free energy is known analytically. By integrating from a known, albeit model, state to the final real state, the absolute free energy becomes the sum of the numerically computed thermodynamic integration step and the analytical absolute free energy of the model state.

Uses of absolute free energy calculations

Evaluating absolute free energies for particular conformations is an important goal for several reasons:

Absolute free energy values of different thermodynamic states can be compared directly without having to devise a transition pathway between them, as is necessary in relative free energy calculations.
The poor convergence properties associated with reversible structural changes in relative free energy calculations can be avoided.

This section includes a description of the essential characteristics of the absolute free energy method, including a derivation of the ideal solid model and the thermodynamic integration method.

Derivation for ideal systems

The absolute free energy algorithm depends on defining a model for which the partition function in Eq. 114 can be derived analytically. The model implemented in the FDiscover program is an ideal solid. That is, the atoms in the system are constrained harmonically to a lattice (analogous to a solid) and do not interact with each other (analogous to the familiar ideal gas). The Hamiltonian for such a system is:

Eq. 125

where the first summation is simply the kinetic energy (including a reciprocal of Planck's constant for each degree of freedom, a quantum effect), and the second is a harmonic function constraining each atom to a corresponding lattice point (x_i⁰, y_i⁰, z_i⁰) with a force constant of K_i. Note that there are no terms for interactions between particles. This simplification is what makes an analytical solution possible. Substituting this Hamiltonian into the partition function gives:

Eq. 126

Returning to Eq. 114, the free energy of the model ideal solid can be written as:

Eq. 127

A remarkable result of this formula is that it does not depend on coordinates--neither the model coordinates nor the lattice site coordinates. It depends only on the mass of the atoms and the force constant used for the harmonic constraint. This property has important practical implications for free energy calculations, making it possible to choose whatever set of lattice site coordinates gives the best convergence properties for the subsequent thermodynamic integration step.

Thermodynamic integration--derivation for real systems

As shown above, integrating the partition function analytically may be possible for simple Hamiltonians. However, for more realistic systems that include many nonbond and bond interactions between atoms, an analytic solution is impossible. As in the relative free energy calculation (see FDTI), we use thermodynamic integration to determine the change in free energy:

Eq. 128

Substituting the equation for the free energy (Eq. 114) into Eq. 128, we can write the difference in free energy as a function of the partition function Q(

Eq. 129

Without defining explicitly how the Hamiltonian depends on

, we can write the difference in the energies as:

Eq. 130

The problem of computing a free energy difference is thus simplified to that of computing the expectation value of a derivative of the Hamiltonian. Since an expectation value is, according to Gibbs' postulate (an axiom of statistical mechanics), the ensemble average of the quantity, it is easily computed as the average of that quantity over a suitably equilibrated set of snapshots of the system.

These snapshots are typically generated from either a molecular dynamics or a Monte Carlo simulation. Thus, appropriately averaging the results of a molecular dynamics trajectory enables Eq. 130 to be evaluated. This is far easier than calculating the general partition function of Eq. 129.

Eq. 130 is perfectly general for any classical system and is the fundamental equation of all thermodynamic integration methods. However, we have not yet described what the Hamiltonian is or how it can be parameterized in terms of . The Hamiltonian could be parameterized in an infinite number of ways. Because a function of state (the free energy) is being integrated, the path between the end points is arbitrary. For absolute free energy calculations, it turns out that the simplest form is perfectly adequate.

What is the Hamiltonian?

The parameterization scheme adopted by the FDiscover program is straightforward. The Hamiltonian is defined as a linear combination of the two potential energy functions that describe the extreme states:

Eq. 131

Here, K(p) is the kinetic energy (because the two states are completely defined by their respective potential energy functions, the kinetic energy does not have to be coupled to

), V₀ is the normal potential energy function (including bonds, angles, torsions, etc.), and V_H is a harmonic oscillator site potential given by:

Eq. 132

Parameterization of the Hamiltonian

where K_i is the i^th atom's spring constant, r_i is the i^th atom's instantaneous coordinate, and r_i⁰ the reference lattice of the noninteracting atoms (often referred to as an Einstein solid). This choice of parameterization is motivated by the template-forcing concept in the FDiscover program and by previous free energy evaluations (Hoover and Ree 1967). In Eq. 131, is greater than 0 and less than 1, and describes an energy-space path between a system described by an unadulterated molecular forcefield ( = 0) and one represented by independent harmonic oscillators ( = 1).

Eq. 132 is thus the reference system for which an absolute free energy can be directly calculated. The potential V_H restricts the exploration of phase space to a region defined by atomic mean-squared displacements relative to the Einstein solid. Consequently, the choice of both the reference state and spring constants, in general, affects the calculated free energies. For most cases, the energy-minimized coordinates of a mechanically stable structure are used for the reference Einstein solid. Using Eq. 127 for the free energy of the Einstein solid A₁ and Eqs. 130 through 132, the absolute free energy of the real state is:

Absolute free energy of the real state

Eq. 133

Computational considerations and precautions

The simple form of Eq. 133 makes evaluating its ensemble average straightforward. A comprehensive dynamics trajectory for a given value of represents an ensemble of structures for that . By computing and averaging the values of (V₀ - V_H) for each structure in this ensemble, the integrand can be numerically estimated for any given value of . By performing several such calculations for many values of between 0 and 1, eventually the function can be numerically integrated.

Several practical simulation considerations are apparent from Eqs. 131 and 133.

For example, when is close to zero, the calculated ensemble of structures includes configurations far from the reference state (the ensemble is generated according to Eq. 131). Therefore, fluctuations of V_H are large when V_H is evaluated for this ensemble.

Conversely, when is close to one, the structures generated are minimally influenced by the real forcefield. These ensembles can include structures with distorted bonds, angles, and even overlapping atoms, leading to large fluctuations in V₀ in Eq. 133. Large fluctuations of (V₀ - V_H) decrease the precision with which its integral can be calculated. The overall effect is to destabilize the integral and, in extreme cases, cause it to diverge.

These undesirable effects can be minimized by reasonable choices of reference structures, spring constants, and integration algorithm. For example, the FDiscover program uses a Gaussian-Legendre quadrature algorithm to integrate Eq. 133. This has the useful property that one need not evaluate the function at the boundaries (where the divergence is worst).

Choice of the reference state

The choice of the reference state is the most critical step in an absolute free energy calculation. The reference state determines where the sampling of configuration space is centered. Choosing reference states that are only slightly different (for example, 0.1 Å rms in coordinates) can change the free energy significantly if the reference state has any residual strain.

Best results are obtained when the reference state represents a minimum-energy conformation on the normal energy surface. If the reference structure is a minimum for both V₀ and V_H, there is no impetus for the conformation to wander away, which, as described above, is the primary cause of divergence.

For non-minimum configurations such as the saddle point c in Figure 34, excess strain can be removed by minimizing with torsional restraints (see Example: Fentanyl).

Limitations

The absolute free energy technique is primarily used to evaluate the free energy of different conformations of the same model. This is particularly difficult to do by perturbation methods alone, since a path from one conformation to another may be difficult to construct and model (for example, a path converting an -helix into a -sheet). Although free energy methods are path independent (because what is being evaluated is the difference between thermodynamic state functions), the integration through the path must be performed reversibly, which is problematic for large structural rearrangements.

Example: Fentanyl

This example presents a calculation of the free energies of various conformational states of fentanyl (Figure 33

Figure 33 . Definitions of ₁ and ₂ for fentanyl

). Previous studies of fentanyl have examined the conformational behavior around the anilido nitrogen and identified two main classes of energy minima in the 2D energy map (Tolleaneare et al. 1986). Figure 34

Figure 34 . Two-Dimensional energy map of fentanyl as a function of the two dihedral angles defined in Figure 33
Locations a and b correspond to the two unique minima, and c is at a transition state.

shows such a map, generated by contouring the energies obtained from a flexible-geometry minimization study performed with the Discover program. These minima are separated by approximately 10 kcal mol^-1 energy barriers and differ from each other by about 1 kcal mol^-1. The free energy of the states represented by the minima and the barriers between them will be calculated by the absolute free energy method.

The reference state has two roles in the free energy calculation. A trivial contribution is the free energy it contributes as an ideal solid. This contribution is trivial because it does not depend on the conformation of the reference structure. It depends only on the masses of its atoms and the spring constants constraining each atom to the lattice (see Eq. 127).

The nontrivial role is that it determines which part of configurational space is sampled during dynamics. In this example, the free energies of states a, b, and c (Figure 34) of fentanyl are compared. The reference states for a and b correspond to completely minimized structures at these points. Point c in Figure 34 represents the free energy at a barrier. Obviously, it would be incorrect to minimize this reference structure--it would simply roll down into one of the adjacent wells. The lowest-energy structure at the saddle point can be constructed by forcing the dihedral angles to adopt the values of the saddle point (using the force torsion command in the standalone version of the FDiscover program, or the Constraint/TorsionForce command in the Insight version) while minimizing the rest of the model (this, of course, is how the energy map in Figure 34 was generated in the first place).

It is important to choose the lowest-energy structure consistent with a given conformational state, so that the free energy calculated does not contain excess enthalpy. Constraining the sampling of configurational space to a rather small region localized around the reference structure limits the distribution of energies that is sampled. Moving the reference structure to a slightly higher-energy conformation would sample different energies and change the total free energy.

Consider Figure 35, where the history of ₁ and ₂ is superimposed on the ₁ \xda ₂ energy map for one of the ensembles. Note how only the local region around the reference structure is sampled. While this may be unsettling (what is the meaning of a free energy that does not represent all of configurational space), remember that you are interested in the difference in free energy between closely related states (a, b, and c differ mainly by rotation of two dihedral angles). To achieve this resolution, the sampling must be restricted accordingly.

Figure 35 . Time history of the dihedral angles ₁ and ₂ superimposed on the 2D energy map of Figure 34
Two trajectories are plotted, one for a minimum corresponding to state a in Figure 34, the other for the transition state (c). Note that only a small fraction of the conformational space of the model needs to be sampled to estimate the free energy of a conformer. Each trajectory corresponds to the first value, i.e., when the potential function is most like the standard potential.

Hints for setting up the absolute free energy calculation

In the Insight version of the FDiscover program, an absolute free energy calculation is set up using the Parameters/Absolute command. The system is first minimized, then the atoms are tethered to the minimized locations. The Parameters/Minimize and Parameters/Dynamics commands can be used to control the initial minimization, as well as the dynamics used to sample the conformational space.

In the standalone version, the reference state is defined in the FDiscover input file (.inp), with all atoms being tethered. Using the tether command allows you to use different reference structures easily. Once the tethered atom list is generated, the model could undergo dynamics to randomize the starting point just prior to starting the free energy calculation.

There is an important exception to this rule. Once a free energy calculation is initiated, the reference structure is saved into a coordinate file with the extension .cli. Thus, a permanent record is maintained of the reference state for each free energy calculation. This file is used by the FDiscover program to restart free energy calculations that may have been interrupted. The file format is identical to the .car and .cor files so that they can be used interchangeably.

One important property of the .cli file is that, if the file exists when the free energy calculation begins (for example, from an earlier or different run), the reference state is read from the file regardless of any prior tethered atom list generated. This does not mean that if a .cli file is provided, a tethered atom list command is not needed. Rather, if the .cli file is present, no matter what conformation is used for the tethered atom list command, the actual reference structure used is that in the .cli file. Be aware then, when doing new free energy calculations requiring different reference states that any existing .cli files that would override the internally generated reference state must be removed.

The magnitude of the spring constants in Eq. 132 indirectly affects the free energy calculated by controlling the volume of phase space being sampled during dynamics. Large spring constants bias the sampling to the immediate neighborhood of the reference structure. Smaller values allow the model to sample more space.

The spring constants also affect the error associated with a calculation. Ensembles that contain structures far from the reference state make greater contributions to the free energy, by virtue of the greater fluctuations in (V₀ - V_H) for this ensemble.

The FDiscover program provides two methods for assigning spring constants: