# GFidx
GFidx is a GFF3 file indexer. It reads a GFF3 file and creates an index file that can be used to quickly retrieve features by ID, by attribute, or by genomic range.
```shell
$ gfidx index -f example.gff3
$ gfidx query -f example.gff3
# CDCA8 - Cell Division Cycle Associated 8
relation ENSG00000134690.11
<...>
chr1 HAVANA CDS 37692905 37693033 . + 2 ID=CDS%3AENST00000327331.2;Parent=ENST00000327331.2;gene_id=ENSG00000134690.11;transcript_id=ENST00000327331.2;gene_type=protein_coding;gene_name=CDCA8;transcript_type=protein_coding;transcript_name=CDCA8-201;exon_number=3;exon_id=ENSE00000916824.1;level=2;protein_id=ENSP00000316121.2;transcript_support_level=1;hgnc_id=HGNC%3A14629;tag=alternative_5_UTR%2Cbasic%2CGENCODE_Primary%2Cappris_principal_1%2CCCDS;ccdsid=CCDS424.1;havana_gene=OTTHUMG00000004320.2;havana_transcript=OTTHUMT00000012474.1
Query took: 2.961754ms
34 lines found
Query cost 272.00 KB bytes
# Nucleotide Sugar Transporter Family
trie gene_name SLC35
<...>
chr13 ENSEMBL gene 20612161 20612338 . + . ID=ENSG00000222726.1;gene_id=ENSG00000222726.1;gene_type=snRNA;gene_name=RNU2-7P;level=3;hgnc_id=HGNC%3A42505
Query took: 138.7453ms
2926 lines found
Query cost 22.00 MB bytes
range chr3 650000 1500000
<...>
chr3 HAVANA gene 1595777 1596245 . - . ID=ENSG00000184423.5;gene_id=ENSG00000184423.5;gene_type=processed_pseudogene;gene_name=RPL23AP38;level=1;hgnc_id=HGNC%3A36351;tag=pseudo_consens;havana_gene=OTTHUMG00000154860.1
Query took: 2.999234ms
243 lines found
Query cost 120.00 KB bytes
```
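The session above shows three kinds of lookup: by ID (`relation`), by attribute value (`trie`), and by coordinates (`range`). This README does not describe how GFidx stores its index, so the Python sketch below only illustrates the general idea of a byte-offset index and is not GFidx's implementation. All function names are hypothetical. It assumes tab-separated GFF3 input, treats `trie` as a prefix search (approximated here with a sorted list and `bisect`), and answers range queries with a plain linear scan where a real index would more likely use binning or an interval tree. Following `Parent` links, as `relation` appears to do, is left out.

```python
import bisect
import io
from urllib.parse import unquote

# Tiny made-up GFF3 sample (not taken from a real annotation).
SAMPLE = (
    b"##gff-version 3\n"
    b"chr1\tHAVANA\tgene\t100\t500\t.\t+\t.\tID=gene1;gene_name=SLC35A1\n"
    b"chr1\tHAVANA\tCDS\t120\t180\t.\t+\t0\tID=CDS%3Atx1;Parent=tx1;gene_name=SLC35A1\n"
    b"chr3\tHAVANA\tgene\t700\t900\t.\t-\t.\tID=gene2;gene_name=RPL23AP38\n"
)


def parse_attributes(column9):
    """Split GFF3 column 9 into a dict, decoding percent-escapes such as %3A."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def build_index(handle):
    """Scan a binary GFF3 stream once and record the byte offset of each feature line."""
    by_id = {}     # feature ID -> byte offset
    by_name = []   # (gene_name, byte offset), sorted after the scan
    by_range = {}  # seqid -> list of (start, end, byte offset)
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        if line.startswith(b"#") or not line.strip():
            continue  # skip directives, comments, and blank lines
        cols = line.decode("utf-8").rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            by_id[attrs["ID"]] = offset
        if "gene_name" in attrs:
            by_name.append((attrs["gene_name"], offset))
        by_range.setdefault(cols[0], []).append((int(cols[3]), int(cols[4]), offset))
    by_name.sort()
    return by_id, by_name, by_range


def read_at(handle, offset):
    """Seek to a stored offset and return that single feature line."""
    handle.seek(offset)
    return handle.readline().decode("utf-8").rstrip("\n")


def prefix_offsets(by_name, prefix):
    """Offsets of all features whose gene_name starts with prefix."""
    start = bisect.bisect_left(by_name, (prefix,))
    offsets = []
    for name, offset in by_name[start:]:
        if not name.startswith(prefix):
            break
        offsets.append(offset)
    return offsets


def overlapping_offsets(by_range, seqid, start, end):
    """Offsets of features on seqid that overlap [start, end] (linear scan)."""
    return [off for s, e, off in by_range.get(seqid, []) if s <= end and e >= start]


if __name__ == "__main__":
    gff = io.BytesIO(SAMPLE)
    by_id, by_name, by_range = build_index(gff)
    print(read_at(gff, by_id["CDS:tx1"]))
    for off in prefix_offsets(by_name, "SLC35"):
        print(read_at(gff, off))
    for off in overlapping_offsets(by_range, "chr3", 650, 1500):
        print(read_at(gff, off))
```

Because only offsets are kept, a query reads just the matching lines from the GFF3 file instead of rescanning it. For a file on disk, open it in binary mode (`open(path, "rb")`) so that `tell()` and `seek()` operate on byte offsets.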
## TODO
- [ ] Reduce index size
- [ ] Support HTTP Range requests
- [ ] GUI for exploring GFF3 files