
# How to divide a vector into multiple groups using regex?


I tried to adapt an existing solution for grouping a vector by regular expressions so that it handles multiple groups, but I can't figure out what I'm doing wrong. Another solution didn't help me either.

``````
x1 <- gsub(paste0("(^a?A?pr)|(^a?A?ug)|(d?D?ec)"),
           "\\1 \\2 \\3", x)
> unique(x1)
[1] "  dec" "Apr  " " aug " "apr  " "  Dec" " Aug "
``````

I expected three unique groups as I have defined them in the `gsub`, i.e. just something like `"dec Dec", "aug Aug", "apr Apr"`.
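To make the mismatch concrete, here is a minimal sketch on a shortened stand-in vector (`x_demo` is made up for illustration, not part of the original data). It suggests that each backreference simply re-inserts the text its group matched, so `"Apr"` and `"apr"` stay distinct even though the same group matched both.

```r
# Small stand-in for the `x` vector from the question (hypothetical data).
x_demo <- c("dec", "Apr", "apr", "Aug", "aug", "Dec")

# Each backreference inserts the text its group matched (or "" if the
# group did not take part in the match), so letter case is carried over.
x1_demo <- gsub("(^a?A?pr)|(^a?A?ug)|(d?D?ec)", "\\1 \\2 \\3", x_demo)
unique(x1_demo)

# Trimming the padding shows the values are just the original strings.
identical(trimws(x1_demo), x_demo)
```

In other words, the replacement only pads the input with spaces that mark which group matched; it never maps the different spellings to a common label.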

With more than 9 groups it’s even worse.

``````
y1 <- gsub(paste0("(^a?A?pr)|(^a?A?ug)|(d?D?ec)|(^f?F?eb)|(^j?J?an)|(^j?J?ul)|",
                  "(^j?J?un)|(^m?M?ar)|(^m?M?ay)|(^n?|N?ov)|(^o?O?ct)|(^s?S?ep)"),
           "\\1 \\2 \\3 \\4 \\5 \\6 \\7 \\8 \\9 \\10 \\11 \\12", y)
> unique(y1)
[1] "         0 1 2"             "      jun   0 1 2"
[3] "     jul    0 1 2"          " Aug        0 1 2"
[5] "     Jul    0 1 2"          "   feb      0 1 2"
[7] "      Jun   0 1 2"          "       Mar  0 1 2"
[9] "    jan     0 1 2"          "Apr         Apr0 Apr1 Apr2"
[11] "  dec       0 1 2"          "   Feb      0 1 2"
[13] "  Dec       0 1 2"          "apr         apr0 apr1 apr2"
[15] " aug        0 1 2"
``````
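The trailing `0 1 2` and the `Apr0 Apr1 Apr2` entries are consistent with R's replacement syntax only supporting the backreferences `\\1` to `\\9`: `\\10` appears to be read as `\\1` followed by a literal `0`. A small sketch with a made-up pattern (`pat` and the input string are illustrative only) isolates that effect:

```r
# Twelve single-letter groups; the 10th group matches "j".
pat <- "(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)"

# If "\\10" referred to group 10 this would give "j"; instead the
# replacement is read as group 1 ("a") followed by a literal "0".
res <- sub(pat, "\\10", "abcdefghijkl")
res
# [1] "a0"
```

So groups 10 to 12 can never be referenced this way, whatever their patterns match.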

The final result I'm aiming for is a factor with one level per group, so that different spellings of the same item share a level (in this example, one level per month name, ignoring case).
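For reference, the target could be sketched without backreferences at all, by testing each pattern with `grepl()` and assigning a label. This is only one possible approach under stated assumptions, not a known answer to the question: `group_by_regex()` and `month_patterns` are hypothetical names introduced here, and `x_demo` is a shortened stand-in for the data.

```r
# Hypothetical helper, not from the original post: maps each string to the
# name of the first pattern that matches it, ignoring case.
group_by_regex <- function(v, patterns) {
  out <- rep(NA_character_, length(v))
  for (label in names(patterns)) {
    # Only label elements that no earlier pattern has claimed.
    hit <- is.na(out) & grepl(patterns[[label]], v, ignore.case = TRUE)
    out[hit] <- label
  }
  factor(out, levels = names(patterns))
}

# One pattern per month; month.abb is a constant built into base R.
month_patterns <- paste0("^", tolower(month.abb))
names(month_patterns) <- tolower(month.abb)

x_demo <- c("dec", "Apr", "apr", "Aug", "aug", "Dec")
f <- group_by_regex(x_demo, month_patterns)
levels(droplevels(f))
```

When the groups differ only by letter case, as in this example, `factor(tolower(x))` would likely be enough; the pattern table is only worth it if the groups need real regular expressions.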

Data

``````
x <- c("dec", "Apr", "dec", "aug", "dec", "dec", "Apr", "apr", "apr",
"dec", "Dec", "Aug", "Aug", "Apr", "Aug", "Apr", "aug", "Apr",
"apr", "Apr", "dec", "aug", "aug", "aug", "aug", "apr", "dec",
"Aug", "dec", "dec", "Dec", "Dec", "Apr", "Apr", "dec", "dec",
"Dec", "dec", "apr", "Apr", "Apr", "dec", "apr", "apr", "apr",
"apr", "Aug", "apr", "dec", "dec")

y <- c("Oct", "jun", "oct", "jul", "Aug", "jul", "Sep", "Jul", "feb",
"feb", "Jun", "Mar", "jan", "Apr", "jul", "oct", "Jun", "jan",
"Jun", "Oct", "Jul", "dec", "Jun", "Sep", "Feb", "Nov", "Feb",
"dec", "Apr", "Dec", "jan", "Aug", "Feb", "apr", "Sep", "Nov",
"aug", "oct", "Jun", "jul", "Apr", "Jun", "Apr", "Dec", "Jun",
"Jul", "Aug", "Aug", "Jul", "sep")
``````