Genomic plots with circlize

Posted on Sat 29 April 2023 in R • Tagged with circlize, genomics, data visualization

Introduction

Genomics is undoubtedly a complex science. The human genome is huge, with more than 3 billion base pairs, about 20,000 protein-coding genes, several million variants, and many more interesting characteristics. Visualizing genomic/omics data is challenging due to the sheer volume of information. Circular plots …


Continue reading

Parsing the ClinVar XML file with pandas

Posted on Sat 04 February 2023 in Python • Tagged with pandas, ClinVar, genomics, variants

Introduction

ClinVar is one of the databases of the USA’s National Center for Biotechnology Information (NCBI). ClinVar archives reports of relationships among human genetic variants and phenotypes (usually genetic disorders). Any organization, such as a laboratory, hospital, or clinic, can submit data to ClinVar. The core idea of ClinVar is to aggregate …


Continue reading

Opening files of size larger than RAM with pandas

Posted on Mon 27 June 2022 in Python • Tagged with pandas, Genomics, Bioinformatics

Introduction

Dealing with big files is routine for everyone working in genomics. FASTQ, VCF, BAM, and GTF/GFF3 files, to name a few, can range from a few hundred megabytes to several gigabytes in size. Usually, we can use cloud services to configure computing instances with a lot of …


Continue reading

Genomic Analysis With Hail

Posted on Fri 09 July 2021 in Python • Tagged with Bioinformatics, Genomics, Hail

Introduction

Hello, long time no see! Since I last posted, many things have happened. Since March I have been working as a Post-Doc Researcher, hired by the Hospital Israelita Albert Einstein (HIAE, São Paulo, Brazil) to work for the Projeto Genomas Raros (“Rare Genomes Project”, GRAR from here on), a public-private partnership …


Continue reading