chrombert_imputation_cistrome_sc
Generate ChromBERT prediction results in HDF5 format for a given single cell, region, and regulator.
chrombert_imputation_cistrome_sc [OPTIONS] SUPERVISED_FILE --o-h5 H5_PATH --finetune-ckpt CKPT --prompt-kind KIND
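For scripted runs, it can help to assemble the argument list in Python before handing it to `subprocess.run`. The sketch below uses only flags documented on this page; the helper name `build_imputation_command` and all file names are hypothetical placeholders, and the supervised file format is not assumed. `--finetune-ckpt` is treated as required here because the usage line lists it.

```python
import shlex
from typing import List, Optional


def build_imputation_command(
    supervised_file: str,
    o_h5: str,
    finetune_ckpt: str,
    prompt_kind: str,
    prompt_celltype: Optional[str] = None,
    prompt_regulator: Optional[str] = None,
    batch_size: int = 8,
    num_workers: int = 8,
) -> List[str]:
    """Assemble an argument list for chrombert_imputation_cistrome_sc (hypothetical helper)."""
    if prompt_kind not in ("cistrome", "expression"):
        raise ValueError("prompt_kind must be 'cistrome' or 'expression'")
    cmd = [
        "chrombert_imputation_cistrome_sc",
        supervised_file,
        "--o-h5", o_h5,
        "--finetune-ckpt", finetune_ckpt,
        "--prompt-kind", prompt_kind,
        "--batch-size", str(batch_size),
        "--num-workers", str(num_workers),
    ]
    # Cell-type and regulator prompts are optional on the command line,
    # since the supervised file may supply them if its format supports it.
    if prompt_celltype is not None:
        cmd += ["--prompt-celltype", prompt_celltype]
    if prompt_regulator is not None:
        cmd += ["--prompt-regulator", prompt_regulator]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; a cistrome prompt pairs with a celltype such as dnase:k562.
    cmd = build_imputation_command(
        "regions.tsv", "predictions.h5", "finetuned.ckpt", "cistrome",
        prompt_celltype="dnase:k562", prompt_regulator="ctcf",
    )
    print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed directly to `subprocess.run(cmd, check=True)` once the CLI is installed.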
Options
- supervised_file
Path to the supervised file.
- --o-h5
Path of the output HDF5 file. This option is required.
- --prompt-kind
Prompt data class. Choose from cistrome or expression. This option is required.
- --basedir
Base directory for the required files. Default is set to the value of DEFAULT_BASEDIR.
- -g, --genome
Genome version, for example hg38 or mm10. Only hg38 is currently supported. Default is hg38.
- --pretrain-ckpt
Path to the pretrain checkpoint. Optional if it can be inferred from other arguments.
- -d, --hdf5-file
Path to the HDF5 file that contains the dataset. Optional if it can be inferred from other arguments.
- -hr, --high-resolution
Use 200-bp resolution instead of the default 1-kb resolution. Caution: 200-bp resolution is reserved for a future release of ChromBERT and is not available yet.
- --finetune-ckpt
Path to the finetune checkpoint. Optional.
- --prompt-dim-external
Dimension of the external embedding. Use 512 for scGPT embeddings and 768 for ChromBERT embeddings. Default is 512.
- --prompt-celltype-cache-file
Path to the cell-type-specific prompt cache file. Optional.
- --prompt-regulator-cache-file
Path to the regulator prompt cache file. Optional.
- --prompt-regulator-cache-pin-memory
Pin memory for the regulator prompt cache to further accelerate processing. Default is False.
- --prompt-regulator-cache-limit
Maximum number of regulator prompts cached in memory. Be mindful of your memory usage.
- --prompt-celltype
The cell-type-specific prompt, for example dnase:k562 for a cistrome prompt or k562 for an expression prompt. It can also be provided in the supervised file if the file format supports it. Optional.
- --prompt-regulator
The regulator prompt, which determines the kind of output, for example ctcf or h3k27ac. It can also be provided in the supervised file if the file format supports it. Optional.
- --batch-size
Batch size. Default is 8.
- --num-workers
Number of workers for the dataloader. Default is 8.
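The -hr, --high-resolution flag switches between 1-kb and 200-bp bins. To make the difference concrete, the sketch below counts bins for a genomic interval. It assumes non-overlapping bins aligned to position 0 and half-open [start, end) coordinates; this is an illustrative assumption, not ChromBERT's documented binning, and the helper names are hypothetical.

```python
def bin_size(high_resolution: bool = False) -> int:
    """Bin width in bp: 200 with --high-resolution, otherwise 1000."""
    return 200 if high_resolution else 1000


def bin_index(position: int, high_resolution: bool = False) -> int:
    """Map a 0-based position to a bin index, assuming bins start at 0."""
    if position < 0:
        raise ValueError("position must be non-negative")
    return position // bin_size(high_resolution)


def bins_covering(start: int, end: int, high_resolution: bool = False) -> int:
    """Count bins overlapped by the half-open interval [start, end)."""
    if start < 0 or end <= start:
        raise ValueError("require 0 <= start < end")
    size = bin_size(high_resolution)
    return (end - 1) // size - start // size + 1


if __name__ == "__main__":
    # A hypothetical 1-Mb window: 1000 bins at 1 kb versus 5000 bins at 200 bp.
    print(bins_covering(0, 1_000_000), bins_covering(0, 1_000_000, True))
```

Under these assumptions, the same region yields five times as many bins at 200-bp resolution, which is why the finer setting implies substantially more data per region.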
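--prompt-dim-external has to match the width of the external embeddings you supply. A quick pre-flight check can catch a mismatch before a long run. The sketch below mirrors the two values listed for that option (512 for scGPT, 768 for ChromBERT); it assumes each embedding is available as a plain sequence of floats, and `prompt_dim_for` and `check_embedding` are hypothetical helpers.

```python
from typing import Sequence

# Values taken from the --prompt-dim-external description.
PROMPT_DIM_EXTERNAL = {"scgpt": 512, "chrombert": 768}


def prompt_dim_for(source: str) -> int:
    """Return the --prompt-dim-external value for an embedding source."""
    try:
        return PROMPT_DIM_EXTERNAL[source.lower()]
    except KeyError:
        raise ValueError(f"unknown embedding source: {source!r}") from None


def check_embedding(vector: Sequence[float], source: str) -> int:
    """Raise ValueError if the vector width does not match the expected dimension."""
    expected = prompt_dim_for(source)
    if len(vector) != expected:
        raise ValueError(
            f"{source} embedding has length {len(vector)}, expected {expected}"
        )
    return expected


if __name__ == "__main__":
    # A dummy 768-wide vector stands in for a real ChromBERT embedding.
    print(check_embedding([0.0] * 768, "ChromBERT"))
```

The returned value is the number to pass as --prompt-dim-external.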
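--prompt-regulator-cache-limit bounds how many regulator prompts stay in memory. This page does not specify the eviction policy, so the sketch below uses least-recently-used (LRU) eviction purely to illustrate the trade-off: a larger limit means fewer reloads but more memory. `BoundedPromptCache` and its `loader` callback are hypothetical and are not ChromBERT's implementation.

```python
from collections import OrderedDict
from typing import Any, Callable


class BoundedPromptCache:
    """Keep at most `limit` regulator prompts in memory (LRU eviction, illustrative)."""

    def __init__(self, limit: int, loader: Callable[[str], Any]) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.loader = loader  # hypothetical function that loads one prompt
        self._store: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, regulator: str) -> Any:
        if regulator in self._store:
            # Cache hit: mark this entry as most recently used.
            self._store.move_to_end(regulator)
            return self._store[regulator]
        # Cache miss: load and store, then evict the least recently used entry if over the limit.
        value = self.loader(regulator)
        self._store[regulator] = value
        if len(self._store) > self.limit:
            self._store.popitem(last=False)
        return value

    def __contains__(self, regulator: object) -> bool:
        return regulator in self._store

    def __len__(self) -> int:
        return len(self._store)
```

With a limit of 2, requesting ctcf, h3k27ac, ctcf, then a third regulator evicts h3k27ac, because ctcf was used more recently.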