Grit Documentation

 

Description

Predict transcription factor binding sites for orthologous genes using mixed Student's t-test statistics.

 

Release

Source code 1.0.2      04/18/2023

Source code 1.0.1      04/18/2023 Document

Source code 1.0.0      08/14/2022 Document

Binary   Windows, Mac OS, Linux

 

Fork me on GitHub

https://github.com/thua45/grit

 

Install

Go to the source folder and run the command g++ main.cpp cdflib.cpp grit.cpp -std=c++11 -lpthread -o grit. A binary named grit will be produced in that folder.

On Windows, use g++ main.cpp cdflib.cpp grit.cpp -std=c++11 -static -lpthread -o grit.exe instead.
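The two build commands above differ only in the -static flag and the output name. As a minimal sketch, assuming Python 3 is available, the choice can be scripted as follows. The helper name compile_command and the SOURCES list are illustrative and not part of Grit.

```python
import platform

# Source files named in the build command above.
SOURCES = ["main.cpp", "cdflib.cpp", "grit.cpp"]


def compile_command(system=None):
    """Return the g++ argument list for the given OS (hypothetical helper).

    `system` follows platform.system() naming: "Windows", "Linux", "Darwin".
    """
    system = system or platform.system()
    cmd = ["g++", *SOURCES, "-std=c++11"]
    if system == "Windows":
        # The Windows build links statically and produces grit.exe.
        cmd += ["-static", "-lpthread", "-o", "grit.exe"]
    else:
        cmd += ["-lpthread", "-o", "grit"]
    return cmd


# Show the command selected for the current platform.
print(" ".join(compile_command()))
```

To actually build, the returned list could be passed to subprocess.run(..., check=True) from inside the source folder.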

 

Requirement

At least 32 GB of RAM. If you install from the source code, the g++ compiler is also required.

 

Usage

grit -m motif -i homoseq -b bgseq -x species [-k min_sn] [-t pvalue] [-p pscore] [-c cpus] [-u seed] -o output

 

Options

-m PWMs for transcription factors

-i putative promoter sequences for orthologous genes

-b background sequences

-x species code: hsapiens for human, mmusculus for mouse, sscrofa for pig, ggallus for chicken, etc.

-k minimum number of ortholog sequences

-t p-value threshold; TFBSs with a p-value less than the threshold will be reported, default = 0.05

-p p-score threshold; TFBSs with a p-score less than the threshold will be reported, default = 0

-c number of CPUs, for multithreading

-u seed number; a value < 0 selects a random seed, default = -1

-o output, output file name

 

Data

motif file: Motiff_Jaspar-2022+HOCO-v11.txt

promoter seq file: promoter_seq_orhtorlog_-500_+50.zip, promoter_seq_orhtorlog_-1k_+100.zip

background seq file: rdm20000-550.txt, rdm20000-1100.txt

 

Data for Arabidopsis_thaliana.TAIR10

motif file: Jaspar-Plant-2022.txt

promoter seq file: homoseq-v56.txt

background seq file: bgseq-rdm20000.txt

results (Jaspar-2022): Arabidopsis_thaliana.TAIR10_v56_j22.zip

 

External Links

Grit Online: Search Grit results online

Flaver: Mining transcription factor using weighted rank correlation statistics

  

Example

An example run should look like: grit -m Motiff_Jaspar-2022+HOCO-v11.txt -i promoter_seq_orhtorlog_-1k_+100.txt -b rdm20000-1100.txt -x hsapiens -t 0.05 -p 0 -c 8 -u 12345 -o human-result-v200.txt

This command takes three input files: Motiff_Jaspar-2022+HOCO-v11.txt, promoter_seq_orhtorlog_-1k_+100.txt, and rdm20000-1100.txt. When the run finishes, it produces an output file named human-result-v200.txt.

 

Results:

Chicken bGalGal1.mat.broiler.GRCg7b: k30-s2-u12345-logrev3-ggallus-1100.zip, k30-s2-u12345-logrev3-ggallus-1100_RS0.3_E-3.bed; k30-s2-u12345-logrev3-ggallus-550.zip, k30-s2-u12345-logrev3-ggallus-550_RS0.3_E-3.bed

 

Tools

Convert Jaspar Motif to Grit Motif: jaspar2motif.py

  

References

Tinghua Huang, Hong Xiao, Qi Tian, Zhen He, Min Yao. Identification of upstream transcription factor binding sites in orthologous genes using mixed Student's t-test statistics. PLoS Computational Biology, 2022

Tinghua Huang, Xinmiao Huang, Binyu Wang, Hao He, Zhiqiang Du, Min Yao, and Xuejun Gao. Flaver: mining transcription factors in genome-wide transcriptome profiling data using weighted rank correlation statistics

 

Contact

Dr. Tinghua Huang, thua45@126.com

Dr. Min Yao, minyao@yangtzeu.edu.cn

Dr. Jianwu Wang, wjw19802013@163.com