Implementation of CD spectra fit
The problem is quite straight forward, namely to read data, baseline, standard curves and some experimental details and list the results of the fit in a format which can be plotted easily using any spreadsheet program. The program checks for proper range of data, applies a baseline correction if needed, and attempts to solve the fit by a fast general LSQ method. If that procedure returns a negative coefficient, the program imposes positivity on the problem by iteratively hacking through all possible combinations and flags the best one. Final output is a listing of statistical parameters, data lists and a spline to the data, just for exercise.
First, the data, baseline, standard and header files are read and the overlapping wavelength range determined. After reading the files, finding the proper range of data, and calculating the conversion factors from sample ellipticity to mean residual ellipticity in [deg.cm2/(dmole.residue)] we put out the header:
c --- knock out header CDFI0200
rres=float(ires) CDFI0201
write (8,'(a)') ' -----------------------------------------------'CDFI0202
write (8,'(a)') ' CD DATA fitted by CD-FIT program (B.Rupp, 1996)'CDFI0203
write (8,'(a)') ' -----------------------------------------------'CDFI0204
write(8,'(a,a)') ' CD raw data file : ',datfil CDFI0205
write(8,'(a,a)') ' CD baseline file : ',basfil CDFI0206
write(8,'(a,a)') ' Sample header file : ',hedfil CDFI0207
write(8,'(a,a)') ' Ellipticity output file : ',outfil CDFI0208
write(8,'(a,a)') ' Results file : ',resfil CDFI0209
write(8,'(a,a/)') ' Standard data file : ',lysfil CDFI0210
write(8,'(i3,a,i3,a,i3,a)') npts,' data points from ', CDFI0211
& ilo,' to ',ihi,' nm' CDFI0212
write(8,*) '--------------------------------------------' CDFI0213
write(8,'(a,f10.2)') ' Molecular weight g/mol : ',mw CDFI0214
write(8,'(a,f10.6)') ' Sample concentration mg/cm3 : ',cconc CDFI0215
write(8,'(a,f10.8)') ' Sample concentration g/cm3 : ',conc CDFI0216
write(8,'(a,f10.8)') ' Sample concentration mol/cm3 : ',cmol CDFI0217
write(8,'(a,f10.6)') ' Cell path mm : ',ppath CDFI0218
write(8,'(a,f10.6)') ' Cell path cm : ',path CDFI0219
write(8,'(a,f10.6)') ' Number of residues : ',rres CDFI0220
CDFI0240
c --- in deg*cm2/dmol CDFI0241
dfact=mw/(10.0*conc*path) CDFI0242
write(8,'(a,f12.2)') ' Conversion factor [cm2/dmole] :', CDFI0244
& dfact CDFI0245
write(8,*) '---------------------------------------------' CDFI0246
We perform baseline subtraction and a final baseline correction
assuming that, based on the model we are using, the signal
between 245 and 250 nm should be zero.
c --- create the CD data from bas and knp and 18/pi CDFI0252
itest=0 CDFI0253
do i=ilo,ihi CDFI0254
cd(i)=(knp(i)-bas(i))/raddml CDFI0255
c --- mean residual ellipticty CDFI0256
elli(i)=cd(i)*dfact/ires CDFI0257
if(itest.eq.1) write(*,*) i,cd(i),elli(i) CDFI0258
end do CDFI0259
CDFI0260
c --- create zero shift only if sufficient data available CDFI0261
if (ihi.ge.250) then CDFI0262
iavg=ihi-245 CDFI0263
do j=250-iavg,250 CDFI0264
sumbas=sumbas+elli(j) CDFI0265
end do CDFI0266
sumbas=sumbas/(iavg+1) CDFI0267
do j=ilo,ihi CDFI0268
elli(j)=elli(j)-sumbas CDFI0269
end do CDFI0270
write (*,'(a,f5.0)') ' Zero shift applied : ',sumbas CDFI0271
write (8,'(1x,a,f5.0)') ' Zero shift applied : ',sumbas CDFI0272
else CDFI0273
sumbas=0.0 CDFI0274
write(*,*)'Cannot determine zero shift for given data range' CDFI0275
write(8,'(a)')'Cannot determine zero shift for data range' CDFI0276
end if CDFI0277
Now we are ready to set up the normal equations
by filling the coefficient matrix A with the
corresponding sums and call the Gauss-Jordan elimination to solve
the equations:
c --- calculate fit by LSQ procedure -----------------------------------CDFI0279
c CDFI0280
c A.x = b where A LSQ matrix CDFI0281
c j : data points (1=helix,2=sheet,3=coil) at lambda, sum is over CDFI0282
c all used lambda's, m=measured value at lamba CDFI0283
c A(i,j)=sum(i,j) 3x3 CDFI0284
c b(i)=sum(m*j) 1x3 CDFI0285
c CDFI0286
c ----------------------------------------------------------------------CDFI0287
a=0 CDFI0288
b=0 CDFI0289
x=0 CDFI0290
datapt(1,ilo:ihi)=helix(ilo:ihi) CDFI0291
datapt(2,ilo:ihi)=sheet(ilo:ihi) CDFI0292
datapt(3,ilo:ihi)=coil(ilo:ihi) CDFI0293
c --- fill normal equations CDFI0294
do i=1,3 CDFI0295
do k=ilo,ihi CDFI0296
b(i)=b(i)+elli(k)*datapt(i,k) CDFI0297
end do CDFI0298
do j=1,3 CDFI0299
do k=ilo,ihi CDFI0300
a(i,j)=a(i,j)+datapt(i,k)*datapt(j,k) CDFI0301
end do CDFI0302
end do CDFI0303
end do CDFI0304
call gaussj(a,3,3,b,1,1) CDFI0305
Now we have to make sure that the sum of the components are 100%
of the measured signal. As experimental errors as well as
simplified model assumptions usually give less than 100% for the
sum of the three components. If the result is really off, there
might be a problem with the data, or the fit by sum of ideal
helix, sheet and coil is questionable or inappropriate.
c --- check results and normalize CDFI0306
scfac=0 CDFI0307
ifit=0 CDFI0308
do i=1,3 CDFI0309
if (b(i).lt.0) then CDFI0310
ifit=1 CDFI0311
else CDFI0312
scfac=scfac+b(i) CDFI0313
end if CDFI0314
end do CDFI0315
b=b*100 CDFI0316
write(*,'(/a)')' Least squares percentage(s) and scale factor' CDFI0317
write(*,'(4f10.2)') b(1)/scfac,b(2)/scfac,b(3)/scfac,scfac CDFI0318
write(8,'(/a)')' Least squares percentage(s) and scale factor' CDFI0319
write(8,'(4f10.2)') b(1)/scfac,b(2)/scfac,b(3)/scfac,scfac CDFI0320
We need to check for another problem resulting from the
oversimplified model : One of the fit parameters can be negative
and the solution is physically not meaningful (see LSQ tutorial for other means to constrain the
solutions). We then resume a brute force calculation of every
possible combination of helix coil and sheet and check which one
is best.
Let's think about this a little : if we consider 2% accurate
enough for a CD fit (rest assured that this is a rather
optimistic assumption) we need to loop over 50*50*50 summations
of all datapoints, about 60 here. Compare the LSQ solution : all
that was needed was a 3 times loop over the datapoints to sum up
the LSQ matrix:. The iteration therefore needs about 42000 times
more operations than the LSQ refinement. Not an option for any
problem of complexity, but still feasable in this case.
if (ifit) then CDFI0322
write (*,'(a)')' Negative LSQ parameters - cannot solve by LSQ'CDFI0323
write (*,'(a/)')' Will resume with iterative procedure ' CDFI0324
write (8,'(a)')' Negative LSQ parameters - cannot solve by LSQ'CDFI0325
write (8,'(a/)')' Will resume with iterative procedure ' CDFI0326
c --- calculate fit by iteration --- CDFI0327
srs=1.0E10 CDFI0328
aa=0 CDFI0329
bb=0 CDFI0330
cc=0 CDFI0331
iprec=2 CDFI0332
write(*,'(a$)')' Fitting' CDFI0333
do i=0,100,iprec CDFI0334
write(*,'(a$)')'.' CDFI0335
do j=0,100,iprec CDFI0336
do k=0,100,iprec CDFI0337
srs2=0 CDFI0338
do m=ilo,ihi CDFI0339
difa=elli(m)-0.01*(i*helix(m)+j*sheet(m)+k*coil(m))CDFI0340
srs2=srs2+difa**2 CDFI0341
end do CDFI0342
srs2=sqrt(srs2) CDFI0343
if (srs2.lt.srs) then CDFI0344
srs=srs2 CDFI0345
aa=i CDFI0346
bb=j CDFI0347
cc=k CDFI0348
end if CDFI0349
end do CDFI0350
end do CDFI0351
end do CDFI0352
scfac=(aa+bb+cc)/100 CDFI0353
else CDFI0354
aa=b(1) CDFI0355
bb=b(2) CDFI0356
cc=b(3) CDFI0357
end if CDFI0358
write (*,'(/a)') ' ' CDFI0359
Finally, we compute the r.m.s.d. of the fit
and a R-value and print out the results,
statistics, and fitted datat.
c --- compute the data values and differences for best fit CDFI0361
sdif=0 CDFI0362
sobs=0 CDFI0363
srs=0 CDFI0364
do j=ilo,ihi CDFI0365
cdft(j)=0.01*(aa*helix(j)+bb*sheet(j)+cc*coil(j)) CDFI0366
dif(j)=elli(j)-cdft(j) CDFI0367
sdif=sdif+abs(dif(j)) CDFI0368
srs=srs+dif(j)**2 CDFI0369
sobs=sobs+abs(elli(j)) CDFI0370
end do CDFI0371
rfac=100*sdif/sobs CDFI0372
rmsd=sqrt(srs)/(ihi-ilo) CDFI0373
CDFI0374
c --- write out statistics CDFI0375
write(*,*)' rmsd helix sheet coil scale' CDFI0376
write(*,'(5f10.2)') rmsd,aa,bb,cc,scfac CDFI0377
write(8,*)' rmsd helix sheet coil scale' CDFI0378
write(8,'(5f10.2)') rmsd,aa,bb,cc,scfac CDFI0379
CDFI0380
write(*,*)' Rfac % helix sheet coil' CDFI0381
write(*,'(4f10.2)') rfac,aa/scfac,bb/scfac,cc/scfac CDFI0382
write(8,*)' Rfac % helix sheet coil' CDFI0383
write(8,'(4f10.2)') rfac,aa/scfac,bb/scfac,cc/scfac CDFI0384
CDFI0385
write(8,*) '--------------------------------------------------' CDFI0469
write(8,*) 'lambda data fit delta spline' CDFI0470
c --- write the data file -- CDFI0471
do j=ilo,ihi CDFI0472
write(6,'(f6.1,4(1x,f10.1))') wav(j),elli(j),cdft(j),dif(j), CDFI0473
& spln(j) CDFI0474
write(8,'(f6.1,4(1x,f10.1))') wav(j),elli(j),cdft(j),dif(j), CDFI0475
& spln(j) CDFI0476
end do CDFI0477
Note that the actual program also calculates a cubic spline for the experimental data. This is not needed and you can throw the spline out if you do not have the IMSL library available. The *.res file contains the whole output for viewing, the *.cdf file the data only in a format suitable for plotting with a spreadsheet or similar application.
Back to Introduction
This World Wide Web site conceived and maintained by
Bernhard Rupp
Last revised
Dezember 27, 2009 01:40