The raw data for the transcription factors Ascl1, Max, NFI, Olig2, Sox2,
Sox9, Sox21 and Tcf3 is available under the ArrayExpress accession
number E-MTAB-2270.
The data for CTCF and Smca1 was retrieved from the Gene Expression
Omnibus (GEO) with accession number GSE36203.
The data for FoxO3 is in the same repository with accession number GSE48336.
As the input chromatin sample for the analysis of all factors we used
the sample from ArrayExpress accession number E-MTAB-1423, except for
FoxO3, for which we used the input sample provided in the corresponding accession (GSE48336).
Methods
The raw reads were mapped to the mouse genome (mm9 including random chromosomes) with
Bowtie version 0.12.5.
We used MACS
version 2.0.9 to define the bound regions from the ChIP-seq data of
each factor. As this tool is very sensitive to an unbalanced number of reads
between the treatment and the input sets, we reduced the larger dataset
to match the number of mapped reads in the smaller dataset by randomly
sampling reads.
Instead of using the tool included in the MACS software for this task,
we designed a custom Python script (balanceBAMFiles.py) that performs the sampling for pairs of treatment and input samples and determines the appropriate number of reads automatically.
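The balancing idea can be illustrated with a minimal sketch. This is not the code of balanceBAMFiles.py; the function name balance_read_sets, the seed parameter and the toy read identifiers are hypothetical, and reads are represented as plain list items rather than BAM records.

```python
import random


def balance_read_sets(treatment, control, seed=0):
    """Downsample the larger of two read lists to the size of the smaller one.

    Hypothetical helper for illustration only; reads are arbitrary list items.
    """
    rng = random.Random(seed)
    # The target is determined automatically from the smaller set.
    target = min(len(treatment), len(control))

    def downsample(reads):
        if len(reads) <= target:
            return list(reads)
        # Sample positions without replacement and keep the original order.
        keep = sorted(rng.sample(range(len(reads)), target))
        return [reads[i] for i in keep]

    return downsample(treatment), downsample(control)


# Toy example: 1000 treatment reads against 250 input reads.
treatment_reads = [f"t{i}" for i in range(1000)]
input_reads = [f"c{i}" for i in range(250)]
balanced_t, balanced_c = balance_read_sets(treatment_reads, input_reads, seed=1)
```

Changing the seed gives a different random subset of the larger set while leaving the smaller set untouched.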
For this process we only considered a maximum of two fully overlapping
reads, discarding the rest. To correct for sampling bias we generated
10 different random samples, on which we ran MACS with the shift
size set to 90 and the q-value cutoff set to 1e-2, leaving the remaining parameters at their defaults.
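One possible reading of the limit on fully overlapping reads is sketched below. It assumes that "fully overlapping" means reads with identical chromosome, start, end and strand; that interpretation, the function name cap_identical_reads and the toy coordinates are assumptions, not details taken from the original scripts.

```python
from collections import Counter


def cap_identical_reads(reads, max_copies=2):
    """Keep at most max_copies reads that share chromosome, start, end and strand.

    Hypothetical helper; each read is a (chrom, start, end, strand) tuple.
    """
    seen = Counter()
    kept = []
    for read in reads:
        seen[read] += 1
        if seen[read] <= max_copies:
            kept.append(read)
    return kept


reads = [
    ("chr1", 100, 136, "+"),
    ("chr1", 100, 136, "+"),
    ("chr1", 100, 136, "+"),  # third identical copy, discarded
    ("chr1", 100, 136, "-"),  # opposite strand, counted separately
    ("chr2", 500, 536, "+"),
]
capped = cap_identical_reads(reads)
```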
We subsequently collapsed the 10 peak-calling results for
each set using another custom script (aggregatePeaksFromSubsampling.py), which reports only the peaks that overlap in at least 9 of the 10 lists.
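The aggregation step could look roughly like the following sketch. It is not the code of aggregatePeaksFromSubsampling.py: the function name aggregate_peaks is hypothetical, peaks are simplified to (chrom, start, end) tuples with half-open BED coordinates, and reporting the union of the overlapping peaks is an assumption about how a consensus region might be defined.

```python
def aggregate_peaks(peak_lists, min_support=9):
    """Merge overlapping peaks across runs and keep well-supported regions.

    peak_lists holds one list of (chrom, start, end) tuples per run.
    A merged region is kept if peaks from at least min_support runs fall in it.
    """
    tagged = sorted(
        (chrom, start, end, run)
        for run, peaks in enumerate(peak_lists)
        for chrom, start, end in peaks
    )
    consensus = []
    current = None  # [chrom, start, end, set of supporting runs]
    for chrom, start, end, run in tagged:
        if current and chrom == current[0] and start < current[2]:
            # Overlaps the growing region: extend it and record the run.
            current[2] = max(current[2], end)
            current[3].add(run)
        else:
            if current and len(current[3]) >= min_support:
                consensus.append(tuple(current[:3]))
            current = [chrom, start, end, {run}]
    if current and len(current[3]) >= min_support:
        consensus.append(tuple(current[:3]))
    return consensus


# Toy example: one peak found in all 10 runs, another in only 3 runs.
runs = []
for i in range(10):
    peaks = [("chr1", 1000 + i, 1200 + i)]
    if i < 3:
        peaks.append(("chr1", 5000, 5200))
    runs.append(peaks)
consensus = aggregate_peaks(runs)
```

Note that this simple merge chains neighbouring peaks together, so two peaks that do not overlap each other directly can still end up in the same region when a third peak bridges them.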
The resulting peak BED files were first filtered to discard peaks with a q-value above 1e-5 (that is, less significant than this threshold) and then converted into bigBed files with the bedToBigBed tool. The q-value bedGraph files were converted into bigWig files with the bedGraphToBigWig tool. Both tools are part of the UCSC Genome Browser utilities.
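A q-value filter of this kind might be sketched as follows. Two assumptions are made here: that the threshold acts as a significance cutoff, so only peaks with a q-value of 1e-5 or smaller are kept, and that the peak file stores -log10(q-value) in its ninth tab-separated column, as in the narrowPeak layout. The function name filter_peaks_by_qvalue and the example records are hypothetical.

```python
def filter_peaks_by_qvalue(bed_lines, min_neg_log10_q=5.0, q_column=8):
    """Keep peak lines whose -log10(q-value) is at least min_neg_log10_q.

    q_column is the zero-based index of the -log10(q-value) field
    (index 8 is an assumption based on the narrowPeak column order).
    """
    kept = []
    for line in bed_lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments and track definition lines.
        if not line.strip() or line.startswith(("#", "track")):
            continue
        fields = line.split("\t")
        if float(fields[q_column]) >= min_neg_log10_q:
            kept.append(line)
    return kept


# Made-up records for illustration; column 9 holds -log10(q-value).
example_lines = [
    "chr1\t100\t300\tpeak1\t120\t.\t8.5\t14.2\t12.0\t95",
    "chr1\t900\t1100\tpeak2\t30\t.\t2.1\t4.0\t2.6\t80",
    "chr2\t50\t260\tpeak3\t60\t.\t4.4\t8.1\t5.3\t101",
]
kept = filter_peaks_by_qvalue(example_lines)
```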
Credits
Data were generated and processed for the CISSTEM project. For inquiries, please contact Juan L. Mateo at the following address: mateojuan (at) uniovi.es