Proposed Quality Control.
- Duplicate input checking.
- Quality score histogram provided by the BGI.
- Possibly other graphs or data provided by the BGI.
- Quality scores with the FastQC toolkit.
- GC content.
- Quality scores per base.
- Quality scores per read.
- Length distribution.
- Over-represented reads.
- A summary of this data provided by a script.
- Percentage aligned.
- Insert size distribution.
- Visualisation as a wiggle track.
- Intra-sample distance calculation of the wiggle tracks.
- Mapping quality distribution.
- Look into Picard Tools.
- Any available statistics from GATK?
- Transition / transversion rate.
- X, Y coverage (check encoding of sample tags).
- Mutation rate.
- Distribution of SNPs found in dbSNP.
- Indel / substitution rate.
- Cross-check with the Immunochip.
In all steps, cross-check with the data provided by the BGI.
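
A few of the listed metrics (GC content, quality score per read, transition / transversion rate) are simple enough to sketch directly, which may help when writing the summary script or when sanity-checking the numbers reported by other tools. The following is only a minimal illustration in plain Python, not a replacement for FastQC, Picard Tools or GATK. All function names are hypothetical, and the Phred offset of 33 is an assumption: the encoding of the delivered FASTQ files should be verified first, since older Illumina pipelines used an offset of 64.

```python
from collections import Counter

BASES = set("ACGT")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def gc_content(seq):
    """Fraction of G/C among the called bases (A, C, G, T) of a read."""
    called = [base for base in seq.upper() if base in BASES]
    if not called:
        return 0.0  # read consists of N's only (or is empty)
    return sum(base in "GC" for base in called) / len(called)


def mean_phred(quality_string, offset=33):
    """Mean Phred score of one read; offset=33 is an assumption."""
    if not quality_string:
        return 0.0
    return sum(ord(char) - offset for char in quality_string) / len(quality_string)


def classify_snp(ref, alt):
    """Label a single-base substitution as transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES or ref == alt:
        raise ValueError("not a single-base substitution: %r > %r" % (ref, alt))
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ti_tv_ratio(snps):
    """Transition / transversion ratio for an iterable of (ref, alt) pairs."""
    counts = Counter(classify_snp(ref, alt) for ref, alt in snps)
    if counts["transversion"] == 0:
        return float("inf")  # ratio undefined without transversions
    return counts["transition"] / counts["transversion"]
```

For example, `ti_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")])` counts two transitions and one transversion. Values computed this way can then be compared against the corresponding figures provided by the BGI.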