
Postdoc Research Fellow at Northwestern University (Chicago)
…just a list of useful things I learned while working with R and Python…
Some Bioconductor-based projects can be computationally demanding and require substantial resources. If a powerful workstation is not available, it may be a good idea to run R and Bioconductor at scale on Amazon Web Services (AWS). Setting…
Building an R package is an easy and very convenient way to keep your work well organized. Moreover, it makes it easier to share your code with the R community. Here we will discuss publishing R packages on CRAN and GitHub. The post…
Manipulating DNA/RNA sequences is a basic and fundamental operation in molecular biology. Writing the reverse complement of a DNA sequence is easy, but it is also error-prone if done manually. Sequence manipulation tools are available online free of charge…
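Computing the reverse complement in code removes the risk of manual mistakes. Below is a minimal Python sketch, assuming plain A/C/G/T sequences (or A/C/G/U for RNA) with the optional ambiguity code N. The function name `reverse_complement` is illustrative and is not taken from the original post.

```python
# Complement tables for DNA and RNA (upper and lower case).
# Assumption: only A/C/G/T (or U for RNA) and the ambiguity code N
# are expected; any other character raises a ValueError.
_DNA_ALPHABET = "ACGTNacgtn"
_RNA_ALPHABET = "ACGUNacgun"
_DNA_TABLE = str.maketrans(_DNA_ALPHABET, "TGCANtgcan")
_RNA_TABLE = str.maketrans(_RNA_ALPHABET, "UGCANugcan")


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the reverse complement of a DNA (default) or RNA sequence."""
    if rna:
        alphabet, table = _RNA_ALPHABET, _RNA_TABLE
    else:
        alphabet, table = _DNA_ALPHABET, _DNA_TABLE
    invalid = set(seq) - set(alphabet)
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.translate(table)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGCGTTA"))          # TAACGCAT
    print(reverse_complement("AUGGCU", rna=True))  # AGCCAU
```

Validating the alphabet first means a typo such as an `X` in the input fails loudly instead of silently passing through `str.translate` unchanged.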
In this post, I will cover how to use easyPubMed (an R package) to retrieve data from PubMed. This example focuses on extracting data from PubMed records for a targeting campaign. The post is aimed at suggesting a business-oriented way…
PubMed (NCBI Entrez) is an online database of citations for biomedical literature, available at the following URL: http://www.ncbi.nlm.nih.gov/pubmed. Data can also be retrieved from PubMed programmatically via the NCBI Entrez E-utilities. A description of how…
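As a rough illustration of the E-utilities approach, the sketch below only builds an `esearch` request URL for PubMed using Python's standard library. It is not the method described in the post. The helper `build_esearch_url` and the example query are hypothetical, and actually sending the request requires network access and should follow NCBI's usage guidelines.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query: str, retmax: int = 20) -> str:
    """Build an esearch request returning up to `retmax` PubMed IDs for `query`."""
    params = {"db": "pubmed", "term": query, "retmax": retmax}
    # urlencode percent-encodes brackets and turns spaces into '+'.
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    url = build_esearch_url("p53[Title] AND 2015[PDAT]")
    print(url)
    # To run the query (network access required):
    #   from urllib.request import urlopen
    #   xml_text = urlopen(url).read().decode("utf-8")
```

The IDs returned by `esearch` are typically passed to a second E-utility (such as `efetch`) to download the full records.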
A couple of days ago, I found a website listing Impact Factor data for many scientific journals, organized in HTML tables (http://www.citefactor.org). Unfortunately, this website didn’t let users download the Impact Factor tables with a single click. Moreover, data were scattered over…
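Collecting data spread over many HTML tables usually comes down to parsing table rows and cells. Below is a minimal sketch using Python's built-in `html.parser`, assuming simple, non-nested tables. The class `TableParser`, the function `parse_tables`, and the sample HTML (journal name and value included) are made up for illustration and do not reflect the actual layout or data of the website mentioned above.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # tolerate a missing </td>
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) is kept as part of the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def close(self):
        super().close()
        self._close_row()  # flush a row left open at end of input


def parse_tables(html: str) -> list:
    """Return all table rows found in `html` as lists of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    # Made-up HTML, for illustration only.
    sample = """
    <table>
      <tr><th>Journal</th><th>Impact Factor</th></tr>
      <tr><td><a href="#">Example Journal</a></td><td>1.0</td></tr>
    </table>
    """
    print(parse_tables(sample))
```

The resulting rows can then be written to a CSV file with the standard `csv` module; downloading the pages themselves (and respecting the site's terms of use) is left out of this sketch.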
This page describes a quick way to extract gene expression information for a specific group of genes (as defined by one or more GO terms) from an Oncomine DataSet. This is useful, for example, if we want to study a…
Gene Expression Omnibus (GEO) is an online public repository of functional genomics data. Information about GEO can be found at the following URL: http://www.ncbi.nlm.nih.gov/geo/info/faq.html. Briefly, GEO includes different types of datasets: GEO Profiles are curated datasets obtained from GEO DataSets…
Hierarchical clustering is an effective method for exploratory data analysis: it builds a hierarchy of clusters based on the similarity between the samples in a dataset. The idea behind hierarchical clustering is very intuitive. Let’s assume…
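To make the idea concrete, here is a naive agglomerative (bottom-up) sketch in pure Python: every sample starts as its own cluster, and the two closest clusters are merged repeatedly until only one remains. Euclidean distance and the `single`/`complete`/`average` linkage options are assumptions chosen for illustration; the post itself may use a different distance, linkage, or tool (for example R's `hclust`). Distances are recomputed at every step, so this version only suits small toy datasets.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(samples, linkage="average"):
    """Naive agglomerative hierarchical clustering.

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    where each cluster is a tuple of sample indices.
    """
    if linkage not in ("single", "complete", "average"):
        raise ValueError(f"unknown linkage: {linkage}")

    def cluster_distance(c1, c2):
        dists = [euclidean(samples[i], samples[j]) for i in c1 for j in c2]
        if linkage == "single":
            return min(dists)   # closest pair of members
        if linkage == "complete":
            return max(dists)   # farthest pair of members
        return sum(dists) / len(dists)  # average linkage (UPGMA)

    clusters = [(i,) for i in range(len(samples))]  # one cluster per sample
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them into one.
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((a, b, cluster_distance(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
    return merges


if __name__ == "__main__":
    # Toy 2-D data: two tight pairs of points far apart from each other.
    data = [[0, 0], [0, 1], [5, 5], [5, 6]]
    for a, b, d in agglomerative(data):
        print(a, b, round(d, 3))
```

The merge history, together with the distance ("height") of each merge, is exactly the information a dendrogram plots.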
RNA-seq data can provide an estimate of the relative expression level of different genes in a sample or in a cell type. It is sufficient to compare the RPKM (reads per kilobase of transcript per million mapped reads) values of the genes of…
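RPKM normalizes a gene's read count by both transcript length (in kilobases) and sequencing depth (in millions of mapped reads): RPKM = reads / (length_bp / 10^3) / (total mapped reads / 10^6). Below is a minimal Python sketch; the gene names, lengths, and counts are hypothetical and only serve to show the arithmetic.

```python
def rpkm(read_count: float, length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # Divide by length in kb, then by sequencing depth in millions of reads.
    return read_count / (length_bp / 1e3) / (total_mapped_reads / 1e6)


def rpkm_table(counts: dict, lengths: dict, total_mapped_reads: float) -> dict:
    """Compute RPKM for every gene in `counts` (gene -> read count)."""
    return {g: rpkm(c, lengths[g], total_mapped_reads) for g, c in counts.items()}


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    counts = {"GENE_A": 500, "GENE_B": 500}
    lengths = {"GENE_A": 2000, "GENE_B": 500}
    print(rpkm_table(counts, lengths, total_mapped_reads=10_000_000))
    # {'GENE_A': 25.0, 'GENE_B': 100.0}
```

Both genes receive the same number of reads, but the shorter one ends up with the higher RPKM, which is why raw counts alone are not directly comparable across genes of different length.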