Tutorial of usage geofetch as python package

♪♫•♪♪♫•♪♪♫•♪♪♫•♪♪♫*

Geofetch provides python fuctions to fetch metadata and metadata from GEO and SRA by using python language. get_project function returns dictionary of peppy projects that were found using filters and input you specified. peppy is a Python package that provides an API for handling standardized project and sample metadata.

More information you can get here:

http://peppy.databio.org/en/latest/

http://pep.databio.org/en/2.0.0/

First let's import geofetch

from geofetch import Geofetcher

Initiate Geofetch object by specifing parameters that you want to use for downloading metadata/data

1) If you won't specify any parameters, defaul parameters will be used

geof = Geofetcher()
Metadata folder: /home/bnt4me/Virginia/repos/geof2/geofetch/docs_jupyter/project_name

2) To download processed data with samples and series specify this two arguments:

geof = Geofetcher(processed=True, data_source="all")
Metadata folder: /home/bnt4me/Virginia/repos/geof2/geofetch/docs_jupyter/project_name

3) To tune project parameter, where metadata should be stored use next parameters:

geof = Geofetcher(processed=True, data_source="all", const_limit_project = 20, const_limit_discard = 500, attr_limit_truncate = 10000 )
Metadata folder: /home/bnt4me/Virginia/repos/geof2/geofetch/docs_jupyter/project_name

4) To add more filter of other options see documentation

Run Geofetch

By default:

1) No actual data will be downloaded (just_metadata=True)

2) No soft files will be saved on the disc (discard_soft=True)

projects = geof.get_projects("GSE95654")
Trying GSE95654 (not a file) as accession...
Trying GSE95654 (not a file) as accession...

Output()
Skipped 0 accessions. Starting now.
Processing accession 1 of 1: 'GSE95654'

Total number of processed SAMPLES files found is: 40
Total number of processed SERIES files found is: 0
Expanding metadata list...
Expanding metadata list...





Finished processing 1 accession(s)
Cleaning soft files ...
Unifying and saving of metadata... 

Output()






Output()






No files found. No data to save. File /home/bnt4me/Virginia/repos/geof2/geofetch/docs_jupyter/project_name/GSE95654_series/GSE95654_series.csv won't be created

Check if projects were created by checking dict keys:

projects.keys()
dict_keys(['GSE95654_samples'])

project for smaples was created! Now let's look into it.

* the values of the dictionary are peppy projects. More information about peppy Project you can find in the documentation: http://peppy.databio.org/en/latest/

len(projects['GSE95654_samples'].samples)
40

We got 40 samples from GSE95654 project. If you want to check if it's correct information go into: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE95654

Now let's see actuall data. first 15 project and 5 clolumns:

projects['GSE95654_samples'].sample_table.iloc[:15 , :5]
sample_name sample_library_strategy genome_build tissue sample_organism_ch1
sample_name
RRBS_on_CRC_patient_8 RRBS_on_CRC_patient_8 Bisulfite-Seq hg19 primary tumor Homo sapiens
RRBS_on_adjacent_normal_colon_patient_8 RRBS_on_adjacent_normal_colon_patient_8 Bisulfite-Seq hg19 adjacent normal colon Homo sapiens
RRBS_on_CRC_patient_32 RRBS_on_CRC_patient_32 Bisulfite-Seq hg19 primary tumor Homo sapiens
RRBS_on_adjacent_normal_colon_patient_32 RRBS_on_adjacent_normal_colon_patient_32 Bisulfite-Seq hg19 adjacent normal colon Homo sapiens
RRBS_on_CRC_patient_41 RRBS_on_CRC_patient_41 Bisulfite-Seq hg19 primary tumor Homo sapiens
RRBS_on_adjacent_normal_colon_patient_41 RRBS_on_adjacent_normal_colon_patient_41 Bisulfite-Seq hg19 adjacent normal colon Homo sapiens
RRBS_on_CRC_patient_42 RRBS_on_CRC_patient_42 Bisulfite-Seq hg19 primary tumor Homo sapiens
RRBS_on_adjacent_normal_colon_patient_42 RRBS_on_adjacent_normal_colon_patient_42 Bisulfite-Seq hg19 adjacent normal colon Homo sapiens
RRBS_on_ACF_patient_173 RRBS_on_ACF_patient_173 Bisulfite-Seq hg19 aberrant crypt foci Homo sapiens
RRBS_on_ACF_patient_515 RRBS_on_ACF_patient_515 Bisulfite-Seq hg19 aberrant crypt foci Homo sapiens
RRBS_on_normal_crypts_patient_139 RRBS_on_normal_crypts_patient_139 Bisulfite-Seq hg19 normal colonic crypt Homo sapiens
RRBS_on_ACF_patient_143 RRBS_on_ACF_patient_143 Bisulfite-Seq hg19 aberrant crypt foci Homo sapiens
RRBS_on_normal_crypts_patient_143 RRBS_on_normal_crypts_patient_143 Bisulfite-Seq hg19 normal colonic crypt Homo sapiens
RRBS_on_normal_crypts_patient_165 RRBS_on_normal_crypts_patient_165 Bisulfite-Seq hg19 normal colonic crypt Homo sapiens
RRBS_on_ACF_patient_165 RRBS_on_ACF_patient_165 Bisulfite-Seq hg19 aberrant crypt foci Homo sapiens