Description
Fiber-seq Inferred Regulatory Elements
These tracks represent FIRE peak calls, i.e., peaks of Fiber-seq Inferred
Regulatory Elements (FIREs) identified in Fiber-seq data. If you are unfamiliar
with Fiber-seq, please see the references below for a detailed description. In
short, it is useful to think of it as a long-read version of DNase I-seq/ATAC-seq
that can be used to identify regions of chromatin accessibility.
FIREs are MTase-sensitive patches (MSPs) on Fiber-seq reads that are inferred
to be regulatory elements on single chromatin fibers. To identify them, we used
semi-supervised machine learning, built on the Mokapot framework and XGBoost,
to find MSPs that are likely to be regulatory elements. Every individual FIRE
element is associated with a precision value, which indicates the probability
that the FIRE element is a true regulatory element. Significantly more detail
is available in our manuscript and on the Fiber-seq website, both of which are
linked below.
Track Descriptions
FIRE peaks: Peaks are called by identifying local maxima of the FIRE score
that have FDR values below a 5% threshold. Once a local maximum is
identified, the start and end positions of the peak are determined by the
median start and end positions of the underlying FIRE elements.
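The peak-calling rule above can be sketched in Python as follows. This is a minimal illustration, not the FIRE pipeline's own code: the function name `call_fire_peaks`, its inputs (per-base FIRE score and FDR lists on a single coordinate axis, with FIRE elements given as half-open `(start, end)` intervals), and the way flat plateaus are handled are all hypothetical choices made for the example.

```python
from statistics import median


def call_fire_peaks(scores, fdrs, fire_elements, fdr_threshold=0.05):
    """Hypothetical sketch of FIRE peak calling on one coordinate axis.

    scores: per-base FIRE scores, indexed by position
    fdrs: per-base FDR values aligned with `scores`
    fire_elements: FIRE elements as half-open (start, end) intervals
    Returns a list of (start, end) peaks.
    """
    n = len(scores)
    peaks = []
    for i in range(n):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i < n - 1 else float("-inf")
        # Local maximum; on a flat plateau only its last base qualifies.
        if not (scores[i] >= left and scores[i] > right):
            continue
        # Keep only maxima whose FDR is below the threshold (5% by default).
        if fdrs[i] >= fdr_threshold:
            continue
        # FIRE elements that overlap the local maximum.
        overlapping = [(s, e) for s, e in fire_elements if s <= i < e]
        if not overlapping:
            continue
        # Peak bounds: median start and median end of those elements
        # (truncated to integers when the median falls between two values).
        start = int(median(s for s, _ in overlapping))
        end = int(median(e for _, e in overlapping))
        peaks.append((start, end))
    return peaks
```

For example, with scores `[0, 1, 5, 1, 0, 2, 0]`, FDRs `[1, 0.5, 0.01, 0.5, 1, 0.2, 1]`, and FIRE elements `(1, 4)`, `(2, 5)`, `(0, 3)`, only the maximum at position 2 passes the 5% cutoff (the one at position 5 has an FDR of 0.2), and the resulting peak is `(1, 4)`.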
Wide FIRE peaks: Wide peaks are the union of the FIRE peaks and all
regions below the FDR threshold. We then merge the resulting regions that
are within one nucleosome (147 bp) of one another.
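A minimal sketch of the wide-peak construction, assuming half-open `(start, end)` intervals on a single coordinate axis and treating "within 147 bp" as a gap of at most 147 bases between consecutive regions. The function name `wide_fire_peaks` and its parameters are hypothetical.

```python
def wide_fire_peaks(peaks, fdrs, fdr_threshold=0.05, merge_distance=147):
    """Union FIRE peaks with all below-threshold FDR runs, then merge
    regions separated by at most `merge_distance` bases.

    peaks: FIRE peaks as half-open (start, end) intervals
    fdrs: per-base FDR values, indexed by position
    """
    regions = list(peaks)
    # Collect maximal runs of bases whose FDR is below the threshold.
    start = None
    for i, fdr in enumerate(fdrs):
        if fdr < fdr_threshold and start is None:
            start = i
        elif fdr >= fdr_threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(fdrs)))
    # Sort and merge: overlapping regions (negative gap) also merge.
    merged = []
    for s, e in sorted(regions):
        if merged and s - merged[-1][1] <= merge_distance:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

Merging after sorting by start keeps this a single linear pass over the regions, the usual approach for interval unions.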
log FIRE FDR: The FDR calculation begins by shuffling the locations of
all the fibers across the genome and recalculating the FIRE score at each
position in the genome. The FDR at a given threshold is then defined as the
number of bases with shuffled FIRE scores above that threshold divided by the
number of bases in the un-shuffled data with FIRE scores above the same
threshold. The track displays the -10 log10 transformation of this FDR value,
so more significant FIRE scores appear as higher values.
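The shuffle-based FDR can be sketched as follows, under several assumptions. How the FIRE score itself is computed from fibers is not described here, so `log_fdr_track` takes the observed and shuffled per-base scores as inputs. The sketch also assumes a single coordinate axis (chromosome boundaries are ignored), uniform random fiber placement, an "at or above" comparison using each base's own observed score as the threshold, and a hypothetical `min_fdr` floor so that an FDR of zero does not become infinite. All function and parameter names are illustrative.

```python
import math
import random
from bisect import bisect_left


def shuffle_fibers(fibers, genome_length, seed=0):
    """Move each fiber to a uniformly random start, keeping its length.

    fibers: list of (start, end, elements), where elements are (start, end)
    intervals in genome coordinates that move together with their fiber.
    """
    rng = random.Random(seed)
    shuffled = []
    for start, end, elements in fibers:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        offset = new_start - start
        moved = [(s + offset, e + offset) for s, e in elements]
        shuffled.append((new_start, new_start + length, moved))
    return shuffled


def log_fdr_track(observed, shuffled, min_fdr=1e-6):
    """Per-base -10*log10(FDR), using each base's observed score as threshold.

    observed / shuffled: per-base FIRE scores from the real and shuffled data.
    """
    obs_sorted = sorted(observed)
    shuf_sorted = sorted(shuffled)
    track = []
    for t in observed:
        # Bases scoring at or above t; n_obs >= 1 because t itself counts.
        n_obs = len(obs_sorted) - bisect_left(obs_sorted, t)
        n_shuf = len(shuf_sorted) - bisect_left(shuf_sorted, t)
        fdr = min(1.0, n_shuf / n_obs)
        # Clamp so that FDR = 0 does not produce an infinite value.
        track.append(-10 * math.log10(max(fdr, min_fdr)))
    return track
```

Sorting both score lists once and counting with `bisect_left` avoids rescanning the genome for every threshold.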
Unreliable FIRE coverage regions: The unreliable FIRE coverage track
shows regions that were excluded from the FDR calculations because of
unusually low or high sequencing depth, defined as a depth that deviates from
the median sequencing depth by 5 or more standard deviations.
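A minimal sketch of the depth filter, assuming per-base depths on a single coordinate axis and a standard deviation computed over all of those bases (the exact scope of the statistics in the FIRE pipeline is not stated here). The function name `unreliable_regions` and the `n_sd` parameter are hypothetical.

```python
from statistics import median, pstdev


def unreliable_regions(depth, n_sd=5.0):
    """Return half-open (start, end) runs whose depth deviates from the
    median depth by n_sd or more standard deviations.

    depth: per-base sequencing depth along one coordinate axis.
    """
    med = median(depth)
    sd = pstdev(depth)
    if sd == 0:
        # Uniform depth: nothing deviates, so nothing is flagged.
        return []
    regions = []
    start = None
    for i, d in enumerate(depth):
        # Both unusually low and unusually high depth are flagged.
        flagged = abs(d - med) >= n_sd * sd
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions
```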
FIRE coverage: The FIRE coverage track shows, at each position in the
genome, the number of fibers that have an MSP (purple), a FIRE (red), or a
nucleosome (gray) overlapping that position.
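Per-base fiber counts like these can be computed with a difference array, as in the sketch below. It assumes each fiber is a dictionary mapping the hypothetical keys `"msp"`, `"fire"`, and `"nucleosome"` to half-open `(start, end)` intervals on one coordinate axis, and that intervals of the same category within one fiber do not overlap, so each fiber adds at most 1 per base per category. Whether MSP counts include FIREs is left to how the input is labeled.

```python
from itertools import accumulate

CATEGORIES = ("msp", "fire", "nucleosome")


def coverage_tracks(fibers, genome_length):
    """Count, per base, the fibers with an MSP, FIRE, or nucleosome there.

    fibers: list of dicts mapping a category name to half-open (start, end)
    intervals; intervals of one category within one fiber are assumed not to
    overlap.
    """
    diffs = {c: [0] * (genome_length + 1) for c in CATEGORIES}
    for fiber in fibers:
        for c in CATEGORIES:
            for s, e in fiber.get(c, []):
                diffs[c][s] += 1  # interval opens
                diffs[c][e] -= 1  # interval closes (end is exclusive)
    # A running sum over each difference array gives the per-base count.
    return {c: list(accumulate(diffs[c]))[:genome_length] for c in CATEGORIES}
```

The difference array makes the cost proportional to the number of intervals plus the genome length, rather than to the total number of covered bases.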
Percent accessible: The percent of fibers that contain a FIRE element
overlapping a given position in the genome. The percent of accessible fibers
is displayed as a red line for haplotype 1, a blue line for haplotype 2, and a
black line for both haplotypes combined.
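A minimal sketch of the percent-accessible calculation, assuming the denominator is the number of fibers spanning each position. Each fiber is a hypothetical `(start, end, haplotype, fire_intervals)` tuple on one coordinate axis, with `haplotype` set to `1`, `2`, or `None` (unphased fibers count only toward the combined line), and FIRE intervals within a fiber are assumed not to overlap. Positions without coverage are reported as `None`.

```python
from itertools import accumulate


def percent_accessible(fibers, genome_length):
    """Per-base percent of fibers with a FIRE element, per haplotype.

    fibers: list of (start, end, haplotype, fire_intervals) with half-open
    intervals; haplotype is 1, 2, or None (unphased).
    Returns {"hap1": [...], "hap2": [...], "all": [...]}.
    """
    keys = ("hap1", "hap2", "all")
    cov = {k: [0] * (genome_length + 1) for k in keys}
    fire = {k: [0] * (genome_length + 1) for k in keys}
    for start, end, hap, fire_intervals in fibers:
        targets = ["all"]
        if hap == 1:
            targets.append("hap1")
        elif hap == 2:
            targets.append("hap2")
        for k in targets:
            # Difference arrays: +1 where an interval opens, -1 where it ends.
            cov[k][start] += 1
            cov[k][end] -= 1
            for s, e in fire_intervals:
                fire[k][s] += 1
                fire[k][e] -= 1
    result = {}
    for k in keys:
        c = list(accumulate(cov[k]))[:genome_length]
        f = list(accumulate(fire[k]))[:genome_length]
        result[k] = [100.0 * fi / ci if ci else None for fi, ci in zip(f, c)]
    return result
```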
FIRE fibers: A decorator track that shows individual Fiber-seq
reads (fibers) along the genome. Each fiber is colored by the MSPs (purple),
FIREs (red), and nucleosomes (gray) that it contains. Optionally, you can
also display the raw 5mC or m6A information using the track configuration.
Methods
Please refer to https://fiberseq.github.io/ for more details.
Credits
Tracks were generated by Mitchell Vollger (mvollger_at_uw.edu) and Andrew
Stergachis (absterga_at_uw.edu).
References
Vollger, M. R., Swanson, et al. (2024). A haplotype-resolved view of human
gene regulation. bioRxiv 2024.06.14.599122.
DOI: https://doi.org/10.1101/2024.06.14.599122