chrombert_get_cistrome_emb

Extract cistrome embeddings from ChromBERT.

chrombert_get_cistrome_emb [OPTIONS] SUPERVISED_FILE IDS... -o ONAME

Options

SUPERVISED_FILE: Path to the supervised file.

IDS: One or more IDs to extract, given either in GSMID format or in regulator:cellline format. To generate a cache file for prompts, use the regulator:cellline format.
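As a rough illustration of the two ID forms, the sketch below classifies an ID string before it is passed to the command. It is not ChromBERT's own parser: the helper names `classify_id` and `usable_for_prompt_cache` are hypothetical, the pattern assumes a GSM ID is the prefix `GSM` followed by digits, and `CTCF:K562` is only an illustrative value.

```python
import re

# Assumption: a GSM ID is the literal prefix "GSM" followed by digits.
# ChromBERT's actual validation may be stricter or looser than this.
GSM_PATTERN = re.compile(r"^GSM\d+$")


def classify_id(raw: str) -> str:
    """Return 'gsm', 'regulator:cellline', or 'unknown' for one ID string."""
    token = raw.strip()
    if GSM_PATTERN.match(token):
        return "gsm"
    # Accept exactly one colon with non-empty text on both sides.
    parts = token.split(":")
    if len(parts) == 2 and all(p.strip() for p in parts):
        return "regulator:cellline"
    return "unknown"


def usable_for_prompt_cache(ids) -> bool:
    """True only if the list is non-empty and every ID is regulator:cellline."""
    return bool(ids) and all(classify_id(i) == "regulator:cellline" for i in ids)
```

A check like this can catch a mixed list early, for example `["CTCF:K562", "GSM1234567"]`, when the goal is a prompt cache file.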

-o, --oname: Path to the output HDF5 file. This option is required.

--basedir: Base directory for the required files. Defaults to the value of DEFAULT_BASEDIR.

-g, --genome: Genome version, for example hg38 or mm10. Only hg38 is currently supported. Default is hg38.

-k, --ckpt: Path to the pretrained or fine-tuned checkpoint. Optional if it can be inferred from other arguments.

--meta: Path to the meta file. Optional if it can be inferred from other arguments.

--mask: Path to the matrix mask file. Optional if it can be inferred from other arguments.

-d, --hdf5-file: Path to the HDF5 file that contains the dataset. Optional if it can be inferred from other arguments.

-hr, --high-resolution: Use 200-bp resolution instead of 1-kb resolution. Caution: 200-bp resolution is intended for a future release of ChromBERT and is not yet available.

--batch-size: Batch size. Default is 8.

--num-workers: Number of workers for the dataloader. Default is 8.
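To show how the arguments and options fit together, here is a minimal sketch that assembles an invocation from Python. The helper `build_command` is hypothetical, and the file names and IDs (`supervised.csv`, `cistrome_emb.hdf5`, `CTCF:K562`, `GSM1234567`) are placeholders rather than files shipped with ChromBERT. Only flags documented above are used, with their documented defaults.

```python
import shlex
import subprocess


def build_command(supervised_file, ids, oname, genome="hg38",
                  batch_size=8, num_workers=8, basedir=None):
    """Assemble the argument list for chrombert_get_cistrome_emb."""
    cmd = [
        "chrombert_get_cistrome_emb",
        supervised_file,
        *ids,                      # one or more IDs, as positional arguments
        "-o", oname,               # required output HDF5 path
        "-g", genome,
        "--batch-size", str(batch_size),
        "--num-workers", str(num_workers),
    ]
    if basedir is not None:
        cmd += ["--basedir", basedir]
    return cmd


def run(cmd):
    """Run the command; requires ChromBERT to be installed and on PATH."""
    return subprocess.run(cmd, check=True)


# Placeholder inputs for illustration only.
cmd = build_command("supervised.csv", ["CTCF:K562", "GSM1234567"],
                    "cistrome_emb.hdf5")
print(shlex.join(cmd))  # shell-safe string form of the command
```

Passing the command as a list avoids shell quoting problems if a path contains spaces; `run(cmd)` is left uncalled here because it needs a working ChromBERT installation.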