Choosing the right reference genome

Selecting the correct reference genome is one of the most important decisions in ChIP-seq analysis.

Sequencing reads must be aligned to a reference genome before most downstream ChIP-seq analysis can happen. If the wrong genome build or wrong species is selected, reads may map poorly or produce misleading coordinates. This can affect quality control, peak calling, annotation, motif analysis, and visualization.

Species matters

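Human datasets should be aligned to a human genome build, such as hg19 (GRCh37) or hg38 (GRCh38). Mouse datasets should be aligned to a mouse genome build, such as mm10 (GRCm38) or mm39 (GRCm39). Aligning reads to the wrong species typically causes a sharp drop in mapping rate, and the reads that do align cannot be interpreted biologically.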

Genome build matters

Even within the same species, genome builds differ. Human hg19 and hg38 use different coordinate systems, so a peak coordinate from hg19 does not refer to the same sequence at the same coordinate in hg38. Coordinates must be converted with a tool such as UCSC liftOver before results from different builds are compared. Downstream annotation databases and genome browser tracks must match the selected build.
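One visible difference between builds is chromosome length. The sketch below uses that to guess which human build an existing alignment used, by reading the chr1 length from the `@SQ` lines of a SAM/BAM header. The function names are hypothetical, and the chr1 lengths are the values listed in the UCSC chromosome size files for hg19 and hg38. Treat it as a quick sanity check under those assumptions, not a definitive identification method.

```python
# chr1 lengths in base pairs for the two human builds named above.
# Values as listed in the UCSC chromosome size files; verify before relying on them.
CHR1_LENGTH_BY_BUILD = {
    "hg19": 249_250_621,
    "hg38": 248_956_422,
}


def parse_sq_line(line):
    """Parse one '@SQ' SAM header line into (sequence_name, length)."""
    fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
    return fields["SN"], int(fields["LN"])


def guess_human_build(sq_lines):
    """Return 'hg19' or 'hg38' if a chr1 entry matches a known length, else None."""
    for line in sq_lines:
        name, length = parse_sq_line(line)
        if name in ("chr1", "1"):  # UCSC-style and Ensembl-style names
            for build, known_length in CHR1_LENGTH_BY_BUILD.items():
                if length == known_length:
                    return build
    return None


# Example header line (hypothetical input, tab-separated as in a SAM header).
header = ["@SQ\tSN:chr1\tLN:248956422"]
print(guess_human_build(header))  # prints "hg38"
```

A result of `None` means only that chr1 did not match either length. The file may come from another species or another build, or may use a different sequence naming scheme.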

How to identify the correct genome

Check the original publication methods.
Review GEO or SRA metadata.
Confirm the organism: human, mouse, or another species.
Look for the genome build named in the metadata or data processing notes.
Use the same build for alignment, annotation, and visualization.
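The last check in the list above, using one build throughout, can be sketched as a small comparison of the build labels recorded for each step. The step names, the `builds_by_step` dictionary, and the function names are hypothetical. The alias table covers only the four builds mentioned in this guide and maps the GRC assembly names to their UCSC equivalents.

```python
# Map GRC assembly names to the UCSC-style names used in this guide.
BUILD_ALIASES = {
    "grch37": "hg19",
    "grch38": "hg38",
    "grcm38": "mm10",
    "grcm39": "mm39",
}


def normalize_build(label):
    """Lower-case a build label and translate known GRC names to UCSC names."""
    key = label.strip().lower()
    return BUILD_ALIASES.get(key, key)


def find_build_mismatches(builds_by_step):
    """Return {} if every step uses the same build.

    Otherwise return {step: normalized_build} for all steps so the
    disagreement can be inspected.
    """
    normalized = {step: normalize_build(b) for step, b in builds_by_step.items()}
    if len(set(normalized.values())) <= 1:
        return {}
    return normalized


# Hypothetical record of which build each step used.
steps = {"alignment": "GRCh38", "annotation": "hg38", "visualization": "hg19"}
print(find_build_mismatches(steps))
# prints {'alignment': 'hg38', 'annotation': 'hg38', 'visualization': 'hg19'}
```

Matching labels are necessary but not sufficient. Files for the same build can still differ in chromosome naming (for example `chr1` versus `1`), which this sketch does not check.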

Consequences of wrong genome selection

Wrong genome selection can reduce mapping rate, shift genomic coordinates, produce incorrect gene annotations, and create misleading peak results. In some cases, analysis may technically finish but produce biologically invalid output.
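A reduced mapping rate is the easiest of these symptoms to check automatically. The sketch below computes the rate from read counts and flags runs that fall under a cutoff. The function names and the default cutoff of 0.7 are assumptions chosen for illustration, not a recommended standard; an appropriate cutoff depends on the aligner, read length, and library quality.

```python
def mapping_rate(mapped_reads, total_reads):
    """Fraction of reads that aligned to the reference."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


def is_mapping_rate_low(mapped_reads, total_reads, min_rate=0.7):
    """Return True when the mapping rate is below min_rate.

    min_rate is an illustrative default, not an established cutoff.
    """
    return mapping_rate(mapped_reads, total_reads) < min_rate


# Hypothetical counts, for example taken from an aligner summary.
print(is_mapping_rate_low(mapped_reads=4_200_000, total_reads=20_000_000))   # prints True
print(is_mapping_rate_low(mapped_reads=18_500_000, total_reads=20_000_000))  # prints False
```

A low rate has other possible causes, such as contamination or adapter sequence, so a flag is a prompt to recheck the genome choice rather than proof of a mistake. A normal rate does not rule out a wrong build either: reads from the right species usually map well to an older or newer build of the same genome, which is how an analysis can finish and still produce invalid coordinates.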

H³NGST currently offers a selected set of common reference genomes. Before submitting an analysis, confirm that the selected genome matches both the species and the genome build of the dataset.

Practical recommendation

Before starting analysis, check the accession record and associated publication. If the study used hg38, choose hg38. If it used hg19, choose hg19. For mouse datasets, choose the appropriate mouse build. When uncertain, review metadata and publication details before running large analyses.

This guide is provided for research and educational purposes. Always validate important biological conclusions with appropriate experimental design, quality control, and independent interpretation.
