Training and Evaluating a Neural Network Model

Posted on Mon 22 April 2024 in Python • Tagged with PyTorch, machine learning, transcriptomics

Introduction

In my previous post, I trained an XGBoost machine-learning model with single-cell RNA-Seq (scRNA-Seq) data to differentiate cell identity (parental cells versus paclitaxel-resistant cells) based on transcriptomic patterns.

As an exercise, I decided to use the same input data to experiment with other machine-learning models. In this post …


Continue reading

Parsing the ClinVar XML file with pandas

Posted on Sat 04 February 2023 in Python • Tagged with pandas, ClinVar, genomics, variants

Introduction

ClinVar is one of the databases maintained by the USA’s National Center for Biotechnology Information (NCBI). ClinVar archives reports of relationships among human genetic variants and phenotypes (usually genetic disorders). Any organization, such as a laboratory, hospital, or clinic, can submit data to ClinVar. The core idea of ClinVar is to aggregate …


Continue reading

Opening files of size larger than RAM with pandas

Posted on Mon 27 June 2022 in Python • Tagged with pandas, Genomics, Bioinformatics

Introduction

Dealing with big files is routine for everyone working in genomics. FASTQ, VCF, BAM, and GTF/GFF3 files, to name a few, can range from a few hundred megabytes to several gigabytes in size. Usually, we can use cloud services to configure computing instances with a lot of …


Continue reading

Genomic Analysis With Hail

Posted on Fri 09 July 2021 in Python • Tagged with Bioinformatics, Genomics, Hail

Introduction

Hello, long time no see! Since I last posted, many things have happened. Since March, I have been working as a Post-Doc Researcher, hired by the Hospital Israelita Albert Einstein (HIAE, São Paulo, Brazil) to work for the Projeto Genomas Raros (“Rare Genomes Project”, GRAR from here on), a public-private partnership …


Continue reading

How to Query Ensembl BioMart with Python

Posted on Tue 19 January 2021 in Python • Tagged with Bioinformatics, Ensembl, BioMart, omics, data mining

Introduction

Recently, my colleagues and I wrote a manuscript involving a meta-analysis of RNA-Seq studies. One of my tasks in this project was to perform a Gene Ontology (GO) enrichment analysis: “[G]iven a set of genes that are up-regulated under certain conditions, an enrichment analysis will find which GO …


Continue reading

Machine Learning with Python: Supervised Classification of TCGA Prostate Cancer Data (Part 1 - Making Features Datasets)

Posted on Thu 05 November 2020 in Python • Tagged with Bioinformatics, gene expression, machine learning, supervised classification

Introduction

In a previous post, I showed how to retrieve The Cancer Genome Atlas (TCGA) data from the Cancer Genomics Cloud (CGC) platform. I downloaded gene expression quantification data, created a relational database with PostgreSQL, and created a dataset uniting the raw quantification data for 675 differentially expressed genes identified …


Continue reading

Machine Learning with Python: Supervised Classification of TCGA Prostate Cancer Data (Part 2 - Making a Model)

Posted on Thu 05 November 2020 in Python • Tagged with Bioinformatics, gene expression, machine learning, supervised classification

Introduction

In a previous post, I showed how to retrieve The Cancer Genome Atlas (TCGA) data from the Cancer Genomics Cloud (CGC) platform. I downloaded gene expression quantification data, created a relational database with PostgreSQL, and created a dataset uniting the raw quantification data for 675 differentially expressed genes identified …


Continue reading