geofetch is a command-line tool that downloads and organizes data and metadata from GEO and SRA. When given one or more GEO/SRA accessions, geofetch will:

  • Download either raw or processed data from either SRA or GEO
  • Produce a standardized PEP sample table. This makes it easy to run looper-compatible pipelines on public datasets, because geofetch handles data acquisition, metadata formatting, and standardization for you.
  • Prepare a project to run with sraconvert to convert SRA files into FASTQ files.

Key geofetch advantages:

  • Works with GEO and SRA metadata
  • Combines samples from different projects
  • Standardizes output metadata
  • Filters type and size of processed files (from GEO) before downloading them
  • Easy to use
  • Fast execution time
  • Can search GEO to find relevant data
  • Can be used either as a command-line tool or from within Python using an API

Quick example

geofetch runs on the command line. This command will download the raw data and metadata for the given GSE number.

geofetch -i GSE95654

You can add --processed if you want to download processed files from the given experiment.

geofetch -i GSE95654 --processed

You can add --just-metadata if you want to download metadata without the raw SRA files or processed GEO files.

geofetch -i GSE95654 --just-metadata
geofetch -i GSE95654 --processed --just-metadata
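
The flag combinations above can also be assembled from a script. The following is a minimal sketch, not part of geofetch itself: `build_geofetch_cmd` is a hypothetical helper that only builds the argument list from the `-i`, `--processed`, and `--just-metadata` options shown above.

```python
import shlex
from typing import List


def build_geofetch_cmd(
    accession: str, processed: bool = False, just_metadata: bool = False
) -> List[str]:
    """Assemble a geofetch argument list.

    Hypothetical helper for illustration; it is not provided by geofetch.
    """
    cmd = ["geofetch", "-i", accession]
    if processed:
        # Ask for processed GEO files instead of raw SRA data.
        cmd.append("--processed")
    if just_metadata:
        # Skip the data files and fetch only the metadata.
        cmd.append("--just-metadata")
    return cmd


if __name__ == "__main__":
    # Print the four combinations shown in the quick example.
    for processed, just_metadata in [
        (False, False),
        (True, False),
        (False, True),
        (True, True),
    ]:
        print(shlex.join(build_geofetch_cmd("GSE95654", processed, just_metadata)))
```

To actually run a command, the returned list could be passed to `subprocess.run(cmd, check=True)`, assuming geofetch is installed and on your PATH.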

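Which arguments to use depends on what you want to download: with no extra flags geofetch fetches raw data and metadata, --processed switches to processed files, and --just-metadata skips the data files in either mode.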

New features available in geofetch 0.11.0:

1) geofetch is now available as a Python API package. It can initialize peppy projects without downloading any SOFT files. Example:

from geofetch import Geofetcher

# initialize Geofetcher with all necessary arguments:
geof = Geofetcher(processed=True, acc_anno=True, discard_soft=True)

# get projects by providing a GSE accession or a file of GSEs as input

2) To find GSEs and save them to a file, you can now use Finder, the GSE finder tool:

from geofetch import Finder

# initialize Finder (use filters if necessary)
find_gse = Finder(filters='bed')

# get all projects that were found:
gse_list = find_gse.get_gse_all()

Find more information here: GSE Finder
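
Once you have a list of accessions, saving it to a file is a small step. The sketch below is an illustration only: `write_gse_file` is a hypothetical helper, not a geofetch function, and it assumes a plain text file with one accession per line. The accessions in the demo are placeholders standing in for the result of `get_gse_all()`.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, List


def write_gse_file(gse_ids: Iterable[str], path: str) -> List[str]:
    """Write unique, non-empty accessions to `path`, one per line.

    Hypothetical helper for illustration; the one-per-line layout is an assumption.
    Returns the accessions in the order they were written.
    """
    seen = set()
    unique: List[str] = []
    for gse in gse_ids:
        gse = gse.strip()
        # Drop blanks and duplicates while keeping first-seen order.
        if gse and gse not in seen:
            seen.add(gse)
            unique.append(gse)
    Path(path).write_text("".join(f"{g}\n" for g in unique), encoding="utf-8")
    return unique


if __name__ == "__main__":
    # Placeholder accessions; in practice this would be the list from Finder.
    gse_list = ["GSE95654", "GSE0000001", "GSE95654"]
    with tempfile.TemporaryDirectory() as tmp_dir:
        out_path = os.path.join(tmp_dir, "gse_list.txt")
        written = write_gse_file(gse_list, out_path)
        print(f"wrote {len(written)} accessions to {out_path}")
```

Deduplicating before writing keeps the file stable if the same accession turns up more than once in the search results.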

For more details, check out the usage reference and installation instructions, or head over to the tutorial for raw data and the tutorial for processed data for a detailed walkthrough.