Description

Fiber-seq Inferred Regulatory Elements

These tracks show FIRE peak calls, which are derived from regulatory elements inferred in Fiber-seq data. If you are unfamiliar with Fiber-seq, please see the references below for a detailed description. In short, it is useful to think of it as a long-read version of DNaseI/ATAC-seq that can be used to identify regions of chromatin accessibility.

FIREs are MTase-sensitive patches (MSPs) on Fiber-seq reads that are inferred to be regulatory elements on single chromatin fibers. To identify them, we used semi-supervised machine learning with the Mokapot framework and XGBoost to find MSPs that are likely to be regulatory elements. Every individual FIRE element is associated with a precision value, which indicates the probability that the FIRE element is a true regulatory element. Significantly more detail is available in our manuscript and on the fiberseq website, both of which are linked below.

Track Descriptions

  • FIRE peaks: Peaks are called by identifying local maxima of the FIRE score that have FDR values below a 5% threshold. Once a local maximum is identified, the start and end positions of the peak are set to the median start and end positions of the underlying FIRE elements.
  • Wide FIRE peaks: Wide peaks are the union of the FIRE peaks and all regions below the FDR threshold. We then merge the resulting regions that are within one nucleosome (147 bp) of one another.
  • log FIRE FDR: The FDR calculation begins by shuffling the locations of all the fibers across the genome and recalculating the FIRE score for each position in the genome. The FDR at a given FIRE score threshold is then defined as the number of bases with shuffled FIRE scores above that threshold divided by the number of bases in the un-shuffled data above the same threshold. The track displays the -10log10 transformation of this FDR value, so more significant FIRE scores appear as higher values.
  • Unreliable FIRE coverage regions: This track shows regions that were excluded from the FDR calculations due to low or high sequencing depth, defined as deviating from the median sequencing depth by 5 or more standard deviations.
  • FIRE coverage: The FIRE coverage track shows, at each position in the genome, the number of fibers that have an MSP (purple), a FIRE (red), or a nucleosome (gray) at that position.
  • Percent accessible: The percent of fibers that contain a FIRE element overlapping a given position in the genome. The percent of accessible fibers for haplotype 1 is displayed as a red line, haplotype 2 as a blue line, and both haplotypes combined as a black line.
  • FIRE fibers: A decorator track that shows individual Fiber-seq reads (fibers) along the genome. Each fiber is colored by the MSPs (purple), FIREs (red), and nucleosomes (gray) that it contains. Optionally, you can also display the raw 5mC or m6A information using the track configuration.
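
As a rough illustration of three of the calculations described in the list above (peak boundaries from median FIRE element positions, merging regions within 147 bp, and the -10log10 FDR transform), here is a minimal Python sketch. It is not the FIRE pipeline's code. The function names, the use of (start, end) tuples on a single chromosome, the truncation of medians to integers, and the choice to treat a gap of exactly 147 bp as mergeable are all assumptions made for illustration.

```python
import math
from statistics import median

NUCLEOSOME_BP = 147  # merge distance stated in the track description


def peak_bounds(fire_elements):
    """Return the median start and median end of the FIRE elements
    underlying one local maximum. Input: list of (start, end) tuples.
    Truncating to int is an assumption of this sketch."""
    starts = [start for start, _ in fire_elements]
    ends = [end for _, end in fire_elements]
    return int(median(starts)), int(median(ends))


def merge_within(regions, max_gap=NUCLEOSOME_BP):
    """Merge (start, end) regions on one chromosome whose gap is at most
    max_gap bp. Treating a gap equal to max_gap as mergeable is an
    assumption of this sketch."""
    merged = []
    for start, end in sorted(regions):
        if merged and start - merged[-1][1] <= max_gap:
            # Extend the previous region instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fdr_track_value(fdr):
    """Apply the -10 * log10 transform so that smaller FDR values
    appear as higher values in the track."""
    return -10 * math.log10(fdr)
```

For example, under these assumptions an FDR of 0.05 maps to a track value of about 13, and two regions separated by 100 bp would be merged into one wide peak while regions separated by 200 bp would stay separate.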

Methods

Please refer to https://fiberseq.github.io/ for more details.

Credits

Tracks were generated by Mitchell Vollger (mvollger_at_uw.edu) and Andrew Stergachis (absterga_at_uw.edu).

References

Vollger, M. R., Swanson, et al. (2024). A haplotype-resolved view of human gene regulation. bioRxiv (p. 2024.06.14.599122). DOI: https://doi.org/10.1101/2024.06.14.599122