2010-10-02

head and tail for strings

The functions head and tail are very useful for working with lists, tables, data frames and even functions.
But they do not work on the characters of a string. It is easy to define such functions:


> strtail <- function(s,n=1) {
+   if(n<0)
+     substring(s,1-n)
+   else
+     substring(s,nchar(s)-n+1)
+ }
> strhead <- function(s,n) {
+   if(n<0)
+     substr(s,1,nchar(s)+n)
+   else
+     substr(s,1,n)
+ }

and start using them:


> strhead("abc", 1)
[1] "a"
> strhead("abc", -1)
[1] "ab"
> strtail("abc", 1)
[1] "c"
> strtail("abc", -1)
[1] "bc"

It is not a good idea to name these functions head.character and tail.character, because head and tail would
then dispatch to them for every character vector and return substrings of each element instead of its first or last elements.
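
To illustrate the difference, here is a small sketch. The default n = 1 for strhead and the vector words are my own assumptions for this example; the method definition at the end is only there to show the side effect and is removed again.

```r
strhead <- function(s, n = 1) {
  if (n < 0) substr(s, 1, nchar(s) + n) else substr(s, 1, n)
}

words <- c("alpha", "beta", "gamma")
head(words, 2)     # "alpha" "beta": the first two elements
strhead(words, 2)  # "al" "be" "ga": the first two characters of each element

# Defining the helper as an S3 method changes what head() does for
# every character vector, which is rarely what callers expect.
head.character <- function(x, n = 1, ...) strhead(x, n)
head(words, 2)     # now "al" "be" "ga"
rm(head.character) # restore the usual behaviour
```

Keeping separate names such as strhead and strtail avoids this clash and still works element-wise on vectors of strings.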

Other useful string functions are defined in the
stringr package.

2010-08-12

Tuning Notepad++

Here are some tricks I collected for making Notepad++ a more comfortable text editor for me in general and for the R programming language
in particular.


Switch between tabs in Notepad++ with Ctrl-PageUp/Down


Notepad++'s default behaviour is to use Ctrl+(Shift)+Tab for tabbing between different text files. This was very annoying to me,
because other programs I use, such as Chrome and Excel, bind these functions to Ctrl+PageUp/PageDown.

Today I found a blog post by Yawar Amin
which shows how to change this behaviour.


Add autocompletion for R functions


Yihui Xie shows on his blog how to add autocompletion
for R functions and symbols to Notepad++.
Npp2R does something similar but is less customizable. It also contains
additional functionality such as code passing, but I did not look into it in detail.
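
A rough sketch of the idea, not necessarily the method from the blog post: collect the names of the functions in some attached packages and write them as keywords into an XML file. The element names (NotepadPlus, AutoComplete, KeyWord), the file name R.xml and the helper npp_autocomplete are assumptions on my part and should be checked against the Notepad++ documentation.

```r
# Build the lines of an auto-completion file for the given attached packages.
# The XML layout is assumed, not taken from the Notepad++ documentation.
npp_autocomplete <- function(pkgs = c("base", "stats", "utils")) {
  nms <- unlist(lapply(pkgs, function(p) ls(paste0("package:", p))))
  # keep only plain names (drops operators such as "+" or "[<-")
  nms <- nms[grepl("^[A-Za-z.][A-Za-z0-9._]*$", nms)]
  # byte-wise sort, independent of the current locale
  nms <- sort(unique(nms), method = "radix")
  c("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>",
    "<NotepadPlus>",
    "  <AutoComplete language=\"R\">",
    sprintf("    <KeyWord name=\"%s\" />", nms),
    "  </AutoComplete>",
    "</NotepadPlus>")
}

xml_lines <- npp_autocomplete()
# writeLines(xml_lines, "R.xml")  # hypothetical file name
head(xml_lines, 4)
```

Packages other than the three defaults have to be attached with library() before they can be listed this way.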

2010-08-04

useR! 2010 conference -- reflections

From July 20-23, this year's useR! conference took place in Gaithersburg near Washington. I attended the conference as part of my holidays in the
U.S. and had a good time there. I met some people, even though that is not the easiest thing for me to do, and I got some inspiration and ideas,
which I outline below:


Stat apps


One speaker mentioned „stat apps” as the new buzz. This caught my attention. One „stat app” example that comes to my mind is
Gapminder, which is both a web app and a desktop app. „Stat apps” seen at useR! 2010 include:


  • Tibco Spotfire, a commercial product that offers specialized non-R GUIs that may communicate with an R server. The GUI follows
    a tabbed notebook concept with tabs like „raw data”, „filter”, „regression/scatterplot”, „cartographic map”.

  • Zubin Dowlaty from Mu Sigma showed how to integrate R in an Event Driven Architecture.
    The examples presented by Dowlaty were on analysing Twitter streams and TWS financial data. The GUI was developed with Adobe Flex, the
    event routing was delegated to Apache Camel.
    Processing real-time video data was an example presented by John Emerson, using the R packages openCV, bigmemory and synchronicity.

  • RCmdr, a Tcl/Tk application with a plugin system, for instance the EViews-inspired rcommanderplugin.Econometrics by Dedi Rosadi

  • red-r.org, a Qt/rpy-based application with a focus on visualizing the analysis path



Topics in data analysis I want to learn more about



  • spatial data analysis, e.g. nearest neighbor ideas (Mark Hancock draws a connection to Actor Markov statistics) or
    creating cartograms for dyadic data (rubber sheet algorithm, circles algorithm (R), Newman-Gastner diffusion method, as mentioned
    by Benjamin Mazzotta)

  • index decomposition analysis: a quick question on stats.stackexchange.com revealed that there seem to be
    only partial implementations out there (R packages micEcon and micEconAids, by way of demand analysis).



Misc. Thoughts



  • Create a movie of plots using quicktime or swfDevice, swf()!

  • How to transfer Excel's pivot table concept to R? What's a grammar of tables?

  • There are user groups in D.C., New York, Basel, London. Where is the Berlin R user group?

  • r2lh: a nice package for creating uni- and bivariate analyses. Could this be used in a Tk setting?
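
On the pivot table question above, one possible starting point in base R is cross-tabulation. This is only a sketch using the built-in warpbreaks data set (my choice for illustration), not a full grammar of tables.

```r
# Cross-tabulate with one factor as rows and one as columns, as in a pivot table
pivot <- xtabs(breaks ~ wool + tension, data = warpbreaks)
pivot              # sum of breaks per wool/tension cell
addmargins(pivot)  # adds row and column totals

# Other summary functions (here the mean) via tapply
pivot_mean <- with(warpbreaks,
                   tapply(breaks, list(wool = wool, tension = tension), mean))
pivot_mean
```

xtabs always sums the left-hand side of the formula, while tapply accepts any summary function, which is closer to the choice of "Sum", "Average" and so on in a spreadsheet.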



Some web sites that were new to me:



2010-03-13

Rosetta language popularity

Rosetta Code is a community wiki which presents how to solve various programming tasks in different programming languages. Thus, it serves as a dictionary between programming languages, but also as a cookbook of programming recipes for a specific language.

One unsolved (until today) programming task for R was to rank languages by popularity. I worked on it using the RJSONIO package from Omegahat and the MediaWiki API. Here I explain the code step by step:

First, let us look up the languages which are defined at Rosetta Code. The wiki has a category for solutions by programming languages, which we will use.


> library(RJSONIO)
> langUrl <- "http://rosettacode.org/mw/api.php?action=query&format=json&cmtitle=Category:Solutions_by_Programming_Language&list=categorymembers&cmlimit=500"
> languages <- fromJSON(langUrl)$query$categorymembers
> languages <- sapply(languages, function(x) sub("Category:", "", x$title))

Now for each programming language, there is a category of the users of the language. We iterate over all languages and count the category members.


> user <- function (lang) {
+   userBaseUrl <- "http://rosettacode.org/mw/api.php?action=query&format=json&list=categorymembers&cmlimit=500&cmtitle=Category:"
+   userUrl <- paste(userBaseUrl, URLencode(paste(lang, " User", sep="")),sep="")
+   length(fromJSON(userUrl)$query$categorymembers)
+ }
> users <- sapply(languages, user)

Now we can print out the top 15 languages:


> head(sort(users, decreasing=TRUE),15)
         C        C++       Java     Python JavaScript       Perl UNIX Shell
        55         55         37         32         27         27         22
    Pascal      BASIC        PHP        SQL    Haskell        AWK    C sharp
        20         19         19         18         17         16         16
      Ruby
        14

It is very straightforward to work with the MediaWiki API, and it offers many other features. It would be nice to have an S3 class that does all the URL encoding. There is already a project wikirobot on R-Forge, but I have not looked into it yet.
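
A minimal sketch of what such an S3 class could look like. The names mw_api and query_url are hypothetical and not taken from wikirobot or any existing package; the class only builds the request URL and leaves the download to fromJSON.

```r
# Hypothetical constructor: an object that remembers the API endpoint
mw_api <- function(endpoint) {
  structure(list(endpoint = endpoint), class = "mw_api")
}

# Hypothetical generic: build a query URL from named parameters
query_url <- function(api, ...) UseMethod("query_url")

query_url.mw_api <- function(api, ...) {
  params <- c(list(action = "query", format = "json"), list(...))
  # percent-encode every value, including reserved characters such as ":"
  enc <- vapply(params,
                function(v) utils::URLencode(as.character(v), reserved = TRUE),
                character(1))
  paste0(api$endpoint, "?", paste(names(params), enc, sep = "=", collapse = "&"))
}

rc <- mw_api("http://rosettacode.org/mw/api.php")
url <- query_url(rc, list = "categorymembers",
                 cmtitle = "Category:R User", cmlimit = 500)
url
```

The resulting string can be passed to fromJSON as in the code above, so the user function would no longer need to paste and encode URLs by hand.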

2010-03-10

German wikipedia entry on regression analysis reworked

After a bit of struggling, I finally finished my contribution to the German Wikipedia article on regression analysis. It is not very easy nowadays to contribute to Wikipedia. I learned a lot about copyright issues when I translated parts from the English Wikipedia entry. It also took some effort to convince earlier contributors to accept my changes (well, I hope I convinced them).

I also learned a lot about regression. It is definitely a large area of research now.

If you are fluent in German, take a look and improve the article. There is still a lot to do.

2010-02-25

inkblot: an alternative to stacked bar graphs

Sometimes it is not easy to get useful information from a stacked bar chart, see for instance
this blogpost at Support Analytics.

So-called inkblot charts, as discussed at Kaiser Fung's
Junk Charts, allow the reader to focus on the evolution
of a time series.

How can one make this kind of chart with R? I asked on
StackOverflow. The answers given there led to an implementation
of an inkblot function. It is delivered with the wzd package on R-Forge. Here is an example which visualizes the income per capita of
various countries, as reported by Gapminder:


> #install.packages("wzd")
> library(wzd)
> data(gmIncomePerCapita)
> selection <- window(gmIncomePerCapita,
+     start=as.Date("1900-01-01"),
+     end=as.Date("2008-01-01")
+   )[,c("Germany", "India", "Japan", "Norway", "United States", "Venezuela")]
> inkblot(selection, min.height=1300)
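
For readers without the wzd package, the basic idea can be sketched in base graphics: each series becomes a polygon that is symmetric around its own horizontal centre line, with a thickness proportional to the value. This is a simplified illustration and not the wzd implementation; the function name inkblot_sketch, its arguments and the use of the built-in EuStockMarkets data are my own assumptions.

```r
# x: numeric positions (e.g. years); m: numeric matrix, one column per series
inkblot_sketch <- function(x, m, gap = 0.1, col = "grey30") {
  # strip time-series attributes but keep the column names
  m <- matrix(as.numeric(m), nrow = nrow(m),
              dimnames = list(NULL, colnames(m)))
  k <- ncol(m)
  half <- m / (2 * max(m))                  # half-thickness, at most 0.5
  centers <- rev(seq_len(k)) * (1 + gap)    # first series is drawn on top
  plot(range(x), range(centers) + c(-0.5, 0.5), type = "n",
       yaxt = "n", xlab = "", ylab = "")
  for (j in seq_len(k)) {
    # upper outline left to right, then lower outline right to left
    polygon(c(x, rev(x)),
            c(centers[j] + half[, j], rev(centers[j] - half[, j])),
            col = col, border = NA)
  }
  axis(2, at = centers, labels = colnames(m), las = 1)
  invisible(centers)
}

inkblot_sketch(as.numeric(time(EuStockMarkets)), EuStockMarkets)
```

All series share one scale here, so thickness is comparable across rows; the sketch has no counterpart to the min.height argument used above.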