BR's crystallographic computing tutorials


Implementation of CD spectra fit

The problem is quite straight forward, namely to read data, baseline, standard curves and some experimental details and list the results of the fit in a format which can be plotted easily using any spreadsheet program. The program checks for proper range of data, applies a baseline correction if needed, and attempts to solve the fit by a fast general LSQ method. If that procedure returns a negative coefficient, the program imposes positivity on the problem by iteratively hacking through all possible combinations and flags the best one. Final output is a listing of statistical parameters, data lists and a spline to the data, just for exercise.

Code

First, the data, baseline, standard and header files are read and the overlapping wavelength range determined. After reading the files, finding the proper range of data, and calculating the conversion factors from sample ellipticity to mean residual ellipticity in [deg.cm2/(dmole.residue)] we put out the header:

c --- knock out header                                                  CDFI0200
      rres=float(ires)                                                  CDFI0201
      write (8,'(a)') ' -----------------------------------------------'CDFI0202
      write (8,'(a)') ' CD DATA fitted by CD-FIT program (B.Rupp, 1996)'CDFI0203
      write (8,'(a)') ' -----------------------------------------------'CDFI0204
      write(8,'(a,a)')  ' CD raw data file        : ',datfil            CDFI0205
      write(8,'(a,a)')  ' CD baseline file        : ',basfil            CDFI0206
      write(8,'(a,a)')  ' Sample header file      : ',hedfil            CDFI0207
      write(8,'(a,a)')  ' Ellipticity output file : ',outfil            CDFI0208
      write(8,'(a,a)')  ' Results file            : ',resfil            CDFI0209
      write(8,'(a,a/)') ' Standard data file      : ',lysfil            CDFI0210
      write(8,'(i3,a,i3,a,i3,a)') npts,' data points from ',            CDFI0211
     & ilo,' to ',ihi,' nm'                                             CDFI0212
      write(8,*)         '--------------------------------------------' CDFI0213
      write(8,'(a,f10.2)') ' Molecular weight       g/mol   :  ',mw     CDFI0214
      write(8,'(a,f10.6)') ' Sample concentration  mg/cm3   :  ',cconc  CDFI0215
      write(8,'(a,f10.8)') ' Sample concentration   g/cm3   :  ',conc   CDFI0216
      write(8,'(a,f10.8)') ' Sample concentration mol/cm3   :  ',cmol   CDFI0217
      write(8,'(a,f10.6)') ' Cell path             mm       :  ',ppath  CDFI0218
      write(8,'(a,f10.6)') ' Cell path             cm       :  ',path   CDFI0219
      write(8,'(a,f10.6)') ' Number of residues             :  ',rres   CDFI0220
                                                                        CDFI0240
c --- in deg*cm2/dmol                                                   CDFI0241
      dfact=mw/(10.0*conc*path)                                         CDFI0242
      write(8,'(a,f12.2)') ' Conversion factor [cm2/dmole]  :',         CDFI0244
     &   dfact                                                          CDFI0245
      write(8,*)        '---------------------------------------------' CDFI0246


We perform baseline subtraction and a final baseline correction assuming that, based on the model we are using, the signal between 245 and 250 nm should be zero.

c --- create the CD data from bas and knp and 18/pi                     CDFI0252
      itest=0                                                           CDFI0253
      do i=ilo,ihi                                                      CDFI0254
         cd(i)=(knp(i)-bas(i))/raddml                                   CDFI0255
c ---    mean residual ellipticty                                       CDFI0256
         elli(i)=cd(i)*dfact/ires                                       CDFI0257
         if(itest.eq.1) write(*,*) i,cd(i),elli(i)                      CDFI0258
      end do                                                            CDFI0259
                                                                        CDFI0260
c --- create zero shift only if sufficient data available               CDFI0261
      if (ihi.ge.250) then                                              CDFI0262
         iavg=ihi-245                                                   CDFI0263
         do j=250-iavg,250                                              CDFI0264
            sumbas=sumbas+elli(j)                                       CDFI0265
         end do                                                         CDFI0266
         sumbas=sumbas/(iavg+1)                                         CDFI0267
         do j=ilo,ihi                                                   CDFI0268
            elli(j)=elli(j)-sumbas                                      CDFI0269
         end do                                                         CDFI0270
         write (*,'(a,f5.0)') ' Zero shift applied : ',sumbas           CDFI0271
         write (8,'(1x,a,f5.0)') ' Zero shift applied : ',sumbas        CDFI0272
      else                                                              CDFI0273
         sumbas=0.0                                                     CDFI0274
         write(*,*)'Cannot determine zero shift for given data range'   CDFI0275
         write(8,'(a)')'Cannot determine zero shift for data range'     CDFI0276
      end if                                                            CDFI0277


Now we are ready to set up the normal equations by filling the coefficient matrix A with the corresponding sums and call the Gauss-Jordan elimination to solve the equations:

c --- calculate fit by LSQ procedure -----------------------------------CDFI0279
c                                                                       CDFI0280
c      A.x = b where A LSQ matrix                                       CDFI0281
c      j : data points (1=helix,2=sheet,3=coil) at lambda, sum is over  CDFI0282
c      all used lambda's, m=measured value at lamba                     CDFI0283
c      A(i,j)=sum(i,j) 3x3                                              CDFI0284
c      b(i)=sum(m*j)   1x3                                              CDFI0285
c                                                                       CDFI0286
c ----------------------------------------------------------------------CDFI0287
      a=0                                                               CDFI0288
      b=0                                                               CDFI0289
      x=0                                                               CDFI0290
      datapt(1,ilo:ihi)=helix(ilo:ihi)                                  CDFI0291
      datapt(2,ilo:ihi)=sheet(ilo:ihi)                                  CDFI0292
      datapt(3,ilo:ihi)=coil(ilo:ihi)                                   CDFI0293
c --- fill normal equations                                             CDFI0294
      do i=1,3                                                          CDFI0295
         do k=ilo,ihi                                                   CDFI0296
            b(i)=b(i)+elli(k)*datapt(i,k)                               CDFI0297
         end do                                                         CDFI0298
         do j=1,3                                                       CDFI0299
              do k=ilo,ihi                                              CDFI0300
               a(i,j)=a(i,j)+datapt(i,k)*datapt(j,k)                    CDFI0301
            end do                                                      CDFI0302
         end do                                                         CDFI0303
      end do                                                            CDFI0304
      call gaussj(a,3,3,b,1,1)                                          CDFI0305


Now we have to make sure that the sum of the components are 100% of the measured signal. As experimental errors as well as simplified model assumptions usually give less than 100% for the sum of the three components. If the result is really off, there might be a problem with the data, or the fit by sum of ideal helix, sheet and coil is questionable or inappropriate.

c --- check results and normalize                                       CDFI0306
      scfac=0                                                           CDFI0307
      ifit=0                                                            CDFI0308
      do i=1,3                                                          CDFI0309
         if (b(i).lt.0) then                                            CDFI0310
            ifit=1                                                      CDFI0311
         else                                                           CDFI0312
            scfac=scfac+b(i)                                            CDFI0313
         end if                                                         CDFI0314
      end do                                                            CDFI0315
      b=b*100                                                           CDFI0316
      write(*,'(/a)')' Least squares percentage(s) and scale factor'    CDFI0317
      write(*,'(4f10.2)') b(1)/scfac,b(2)/scfac,b(3)/scfac,scfac        CDFI0318
      write(8,'(/a)')' Least squares percentage(s) and scale factor'    CDFI0319
      write(8,'(4f10.2)') b(1)/scfac,b(2)/scfac,b(3)/scfac,scfac        CDFI0320


We need to check for another problem resulting from the oversimplified model : One of the fit parameters can be negative and the solution is physically not meaningful (see LSQ tutorial for other means to constrain the solutions). We then resume a brute force calculation of every possible combination of helix coil and sheet and check which one is best.

Let's think about this a little : if we consider 2% accurate enough for a CD fit (rest assured that this is a rather optimistic assumption) we need to loop over 50*50*50 summations of all datapoints, about 60 here. Compare the LSQ solution : all that was needed was a 3 times loop over the datapoints to sum up the LSQ matrix:. The iteration therefore needs about 42000 times more operations than the LSQ refinement. Not an option for any problem of complexity, but still feasable in this case.

      if (ifit) then                                                    CDFI0322
         write (*,'(a)')' Negative LSQ parameters - cannot solve by LSQ'CDFI0323
         write (*,'(a/)')' Will resume with iterative procedure '       CDFI0324
         write (8,'(a)')' Negative LSQ parameters - cannot solve by LSQ'CDFI0325
         write (8,'(a/)')' Will resume with iterative procedure '       CDFI0326
c ---    calculate fit by iteration ---                                 CDFI0327
         srs=1.0E10                                                     CDFI0328
         aa=0                                                           CDFI0329
         bb=0                                                           CDFI0330
         cc=0                                                           CDFI0331
         iprec=2                                                        CDFI0332
         write(*,'(a$)')' Fitting'                                      CDFI0333
         do i=0,100,iprec                                               CDFI0334
            write(*,'(a$)')'.'                                          CDFI0335
            do j=0,100,iprec                                            CDFI0336
               do k=0,100,iprec                                         CDFI0337
                  srs2=0                                                CDFI0338
                  do m=ilo,ihi                                          CDFI0339
                     difa=elli(m)-0.01*(i*helix(m)+j*sheet(m)+k*coil(m))CDFI0340
                     srs2=srs2+difa**2                                  CDFI0341
                  end do                                                CDFI0342
                  srs2=sqrt(srs2)                                       CDFI0343
                  if (srs2.lt.srs) then                                 CDFI0344
                     srs=srs2                                           CDFI0345
                     aa=i                                               CDFI0346
                     bb=j                                               CDFI0347
                     cc=k                                               CDFI0348
                  end if                                                CDFI0349
               end do                                                   CDFI0350
            end do                                                      CDFI0351
         end do                                                         CDFI0352
         scfac=(aa+bb+cc)/100                                           CDFI0353
      else                                                              CDFI0354
         aa=b(1)                                                        CDFI0355
         bb=b(2)                                                        CDFI0356
         cc=b(3)                                                        CDFI0357
      end if                                                            CDFI0358
      write (*,'(/a)') ' '                                              CDFI0359


Finally, we compute the r.m.s.d. of the fit and a R-value and print out the results, statistics, and fitted datat.

c --- compute the data values and differences for best fit              CDFI0361
      sdif=0                                                            CDFI0362
      sobs=0                                                            CDFI0363
      srs=0                                                             CDFI0364
      do j=ilo,ihi                                                      CDFI0365
         cdft(j)=0.01*(aa*helix(j)+bb*sheet(j)+cc*coil(j))              CDFI0366
         dif(j)=elli(j)-cdft(j)                                         CDFI0367
         sdif=sdif+abs(dif(j))                                          CDFI0368
         srs=srs+dif(j)**2                                              CDFI0369
         sobs=sobs+abs(elli(j))                                         CDFI0370
      end do                                                            CDFI0371
      rfac=100*sdif/sobs                                                CDFI0372
      rmsd=sqrt(srs)/(ihi-ilo)                                          CDFI0373
                                                                        CDFI0374
c --- write out statistics                                              CDFI0375
      write(*,*)'     rmsd     helix     sheet      coil     scale'     CDFI0376
      write(*,'(5f10.2)') rmsd,aa,bb,cc,scfac                           CDFI0377
      write(8,*)'     rmsd     helix     sheet      coil     scale'     CDFI0378
      write(8,'(5f10.2)') rmsd,aa,bb,cc,scfac                           CDFI0379
                                                                        CDFI0380
      write(*,*)'     Rfac  %  helix     sheet      coil'               CDFI0381
      write(*,'(4f10.2)') rfac,aa/scfac,bb/scfac,cc/scfac               CDFI0382
      write(8,*)'     Rfac  %  helix     sheet      coil'               CDFI0383
      write(8,'(4f10.2)') rfac,aa/scfac,bb/scfac,cc/scfac               CDFI0384
                                                                        CDFI0385
      write(8,*)   '--------------------------------------------------' CDFI0469
      write(8,*)   'lambda       data        fit      delta     spline' CDFI0470
c --- write the data file --                                            CDFI0471
      do j=ilo,ihi                                                      CDFI0472
         write(6,'(f6.1,4(1x,f10.1))') wav(j),elli(j),cdft(j),dif(j),   CDFI0473
     &   spln(j)                                                        CDFI0474
         write(8,'(f6.1,4(1x,f10.1))') wav(j),elli(j),cdft(j),dif(j),   CDFI0475
     &   spln(j)                                                        CDFI0476
      end do                                                            CDFI0477

Note that the actual program also calculates a cubic spline for the experimental data. This is not needed and you can throw the spline out if you do not have the IMSL library available. The *.res file contains the whole output for viewing, the *.cdf file the data only in a format suitable for plotting with a spreadsheet or similar application.


Get the program (CDFIT)


Back to Introduction
This World Wide Web site conceived and maintained by Bernhard Rupp
Last revised Dezember 27, 2009 01:40