Cell-type-specific regulatory effects¶
This script enables fine-tuning ChromBERT for analyzing cell-type-specific regulatory effects. Users can selectively perturb or omit specific genomic features, making it valuable for simulating regulatory changes and testing hypotheses about the role of individual regulatory elements in cell-type-specific gene regulation.
python ft_general.py [OPTIONS] --train TRAIN_PATH --valid VALID_PATH --test TEST_PATH
Options
- --lr¶
Learning rate. Default is 1e-4.
- --warmup-ratio¶
Warmup ratio. Default is 0.1.
- --grad-samples¶
Number of gradient samples. Automatically scaled according to the batch size and GPU number. Default is 512.
- --max-epochs¶
Number of epochs to train. Default is 10.
- --pretrain-trainable¶
Number of pretrained layers to be trainable. Default is 2.
- --tag¶
Tag of the trainer, used for grouping logged results. Default is default.
- --limit-val-batches¶
Number of batches to use for each validation. Default is 64.
- --val-check-interval¶
Validation check interval. Default is 64.
- --name¶
Name of the trainer. Default is chrombert-ft-general.
- --save-top-k¶
Save top k checkpoints. Default is 3.
- --checkpoint-metric¶
Checkpoint metric. Default is the same as the loss function if not specified.
- --checkpoint-mode¶
Checkpoint mode. Default is min.
- --log-every-n-steps¶
Log every n steps. Default is 50.
- --kind¶
Kind of the task. Choose from classification, regression, or zero_inflation. Default is classification.
- --loss¶
Loss function. Default is focal.
- --train¶
Path to the training data. This option is required.
- --valid¶
Path to the validation data. This option is required.
- --test¶
Path to the test data. This option is required.
- --batch-size¶
Batch size. Default is 8.
- --num-workers¶
Number of workers. Default is 4.
- --basedir¶
Path to the base directory. Default is set to the value of
os.path.expanduser("~/.cache/chrombert/data").
- -g, --genome¶
Genome version. For example, hg38 or mm10. Only hg38 is supported now. Default is hg38.
- -k, --ckpt¶
Path to the pretrain checkpoint. Optional if it could be inferred from other arguments.
- --mask¶
Path to the mtx mask file. Optional if it could be inferred from other arguments.
- -d, --hdf5-file¶
Path to the HDF5 file that contains the dataset. Optional if it could be inferred from other arguments.
- --dropout¶
Dropout rate. Default is 0.1.
- -hr, --high-resolution¶
Use 200-bp resolution instead of 1-kb resolution. Caution: 200-bp resolution is preparing for the future release of ChromBERT, which is not available yet.
- --ignore¶
Ignore given targets.
- --ignore-object¶
Ignore object. Regulator, or dataset IDs separated by ;.
- --perturbation¶
Use perturbation model.
- --perturbation-object¶
Perturbation object. Regulator, or dataset IDs separated by ;.
- --perturbation-value¶
Perturbation target level. 0 means knock-out perturbation, and 4 means over-expression perturbation. Default is 0.