Lab Work and Sequencing

I basically followed exactly along with the methods from the Diagenode kit with some very minor adjustments so I’m not going to describe it here. I have a file with all the lab notes and sample details. Sequencing was at BRC and I have the details of PhiX and cluster density, etc recorded.

Initial BioInformatics Processing

The first steps of processing were all done using the BioHPC cloud computers. I’m reproducing the list of commands that I entered here with some descriptions, though actually running these commands would require putting all the files up to the cloud and navigating to run each command from the right directory.

Trimming Sequences

  1. First I trimmed all of the raw sequence files with Trim Galore! using this command:
trim_galore *_R1.fastq.gz --rrbs --fastqc -o output/

I looked at a subset of the fastqc files produced by Trim Galore! and they all looked very good in terms of quality. I’m not sure if there is something from that output we should report quantiatively, or if it’s just enough to say ‘they looked good’.

Aligning and Calling Each Sample

  1. Next I moved to Bismark v0.16.1 for bisulfite alignment. There are some details on setting up the BioHPC environment to have bismark in the path that I won’t list here. The first real command was to index the tree swallow genome that Leo had assembled.
bismark_genome_preparation /directory/with/genome
  1. Then align each sample to the genome. To run all samples I made a shell script to loop through called bis.align.sh. The command for a single sample was:
bismark --multicore 4 /directory/bismark_genome file_trimmed.fq.gz
  1. Next I extract the methylation information from each sample. Again, I looped through with a shell script called bis_extract.sh and the command for a single file was:
bismark_methylation_extractor --single-end --multicore 4 --gzip --bedGraph file_bt2.bam
  1. Last, I created summary reports with bismark2summary. The output from #4 above was used as the input for MethylKit described below.

Checking Methylation Conversion

  1. Separately from actual sample sequences above, I followed a similar procedure to determine the methylation conversion efficiency for methylated and unmethylated spike-in controls included with each sample. This is also described in the Diagenode manual. The commands are exactly as described above except that they use the ‘genome’ of the spikes for alignment and they use an additional file that includes the position of each methylated or unmethylated ‘C’ to allow for calculation of conversion in the end. The exact commands are saved in bis_meth_convert.sh and are identical to the kit directions.

Quality Control Plots

Methylation Conversion


Figure x. Estimates of methylation conversion from bisulfite treatment for unmethylated (A) and methylated (B) spike-in controls. Histograms show values for each sample, blue lines indicate the average across all samples, and red dashed lines show the kit suggested average targets to indicate that conversion was effective.

NOTE: I don’t think this figure needs to be included anywhere. We can just report the average value (blue line) for the methylated and unmethylated controls and say that they are within the zone recommended by the kit. I also am not sure it is really anything to worry about that a few samples are outside those zones. I read a thread by the author of Bismark suggesting that real samples typically convert better than these spike in controls and that as long as it is reasonably close to expectation there isn’t any major reason to worry about it.

Raw Reads and Alignment