Yearly Archive: 2016
Seatmap charts with ggplot2
This is a short tutorial for creating “geom tile” charts with ggplot2. This type of plot is similar to a heat map, but it pictures a variable with a discrete number of possible levels (categories) instead of continuous numeric values. I…
Read more
Boxplots with ggplot2
This is a short tutorial for creating boxplots with ggplot2. The tutorial will focus on: data preparation for plotting with ggplot2; differences between the standard R plotting system and ggplot2; using geom_boxplot to create a simple boxplot with ggplot2 and…
Read more
Web Radios boosting coding productivity
How do you stay focused while coding? How do you cope with the noise of an open workspace or with the chitchat of the other customers of your favorite coffee shop? That’s easy: just listen to some good chill-out music…
Read more
Analyzing TCGA Data in R via the TCGAretriever package
Brief introduction to TCGA, cBioPortal and the Cancer Genomic Data Server (CGDS). The Cancer Genome Atlas (TCGA) is a program aimed at improving the understanding of the molecular basis of cancer. TCGA stores multidimensional genomic data sets generated by analysis…
Read more
Predicting HIV progression and treatment response
Project Goal and Data Understanding. The goal of this project is to predict patients’ response to anti-HIV treatment based on a very limited number of measured parameters. The dataset was retrieved from Kaggle (https://www.kaggle.com/c/hivprogression). The dataset includes information about a…
Read more
Fluorescence
ImageJ is one of my favourite image-analysis programs. ImageJ is free, it is compatible with all major operating systems (it is written in Java) and it is very powerful. Here I am showing how to use ImageJ for the analysis…
Read more
Basic configuration of an IGV Data Server
Setting up an IGV Data Server allows IGV users to access and explore genomic datasets located elsewhere online or on a local area network. To configure a Data Server, we need to inform IGV about where the genomic data files…
Read more
Working with R and Bioconductor on the cloud (Amazon EC2)
Some Bioconductor-based projects may be computationally challenging and require a lot of resources. If a powerful workstation is not available, it may be a good idea to work with R and Bioconductor at scale using Amazon Web Services (AWS). Setting…
Read more
Building and Publishing R packages on CRAN or GitHub
Building an R package is an easy and very convenient way to keep your work well organized. Moreover, it facilitates sharing your code with the R community. Here we will discuss publishing R packages on CRAN and GitHub. The post…
Read more
DNA sequence manipulation in R: getting the Reverse Complement of a DNA string
Manipulating DNA/RNA sequences is a basic and fundamental operation in Molecular Biology. Writing the reverse complement of a DNA sequence is very easy, but it is also an error-prone operation if performed manually. Sequence manipulation tools are available online and free-of-charge…
Read more
easyPubMed for business: scraping PubMed data in R for a targeting campaign
In this post, I will cover how to use easyPubMed (R Package) to retrieve data from PubMed. This example is focused on data extraction from PubMed records for a targeting campaign. The post is aimed at suggesting a business-oriented way…
Read more
Querying PubMed via the easyPubMed package in R
PubMed (NCBI Entrez) is an online database of citations for biomedical literature that is available at the following URL: http://www.ncbi.nlm.nih.gov/pubmed. Retrieving data from PubMed is also possible in an automated way via the NCBI Entrez E-utilities. A description of how…
Read more
Scraping Impact Factor data from the Web using httr and regex in R
A couple of days ago, I found a website listing Impact Factor data of many scientific journals organized in HTML tables (http://www.citefactor.org). Unfortunately, this website didn’t allow users to download Impact Factor tables in one click. Moreover, data were scattered over…
Read more