Parsing the ClinVar XML file with pandas

Posted on Sat 04 February 2023 in Python • Tagged with pandas, ClinVar, genomics, variants

Introduction

ClinVar is one of the databases maintained by the USA’s National Center for Biotechnology Information (NCBI). It archives reports of relationships among human genetic variants and phenotypes (usually genetic disorders). Any organization, such as a laboratory, hospital, or clinic, can submit data to ClinVar. The core idea of ClinVar is to aggregate …



Opening files of size larger than RAM with pandas

Posted on Mon 27 June 2022 in Python • Tagged with pandas, Genomics, Bioinformatics

Introduction

Dealing with big files is routine for everyone working in genomics. FASTQ, VCF, BAM, and GTF/GFF3 files, to name a few, can range from a few hundred megabytes to several gigabytes in size. Usually, we can use cloud services to configure computing instances with a lot of …

