I have a data frame with a varying number of columns (depending on the year, there are fewer or more data points). It is a cross-sectional time series, so I have a long dataset rather than a wide dataset, and I need to pull out a vector for each year (and create country tables).

At the moment R puts `NA`s at the end of the rows that have fewer data points (which means that some of the end columns contain `NA`s).

However, I would like to use each row as an input vector in my Python code, and there I do not want `NA`s. I would like to replace the `NA`s with empty cells; ideally I would have vectors of different lengths. Replacing the `NA`s with zeros does not work either, since I need to keep track of the different row sizes for different years. I have found answers for characters, but I have numbers, so any help would be appreciated. My goal is to write a table or CSV file without `NA`s and then pass each row to my Python code.

Thank you!
    mat1 <- matrix(c(3, 0, 1, 13, NA, NA, NA,
                     3, 0, 1, 13, NA, NA, NA,
                     3, 0, 1, 16, NA, NA, NA,
                     3, 0, 1, 16, NA, NA, NA,
                     0, 0, 134, 33, 39, 1, 14,
                     0, 0, 134, 33, 39, 1, 14), 7, 6)
    print(t(mat1))
         [,1] [,2] [,3] [,4] [,5] [,6] [,7]
    [1,]    3    0    1   13   NA   NA   NA
    [2,]    3    0    1   13   NA   NA   NA
    [3,]    3    0    1   16   NA   NA   NA
    [4,]    3    0    1   16   NA   NA   NA
    [5,]    0    0  134   33   39    1   14
    [6,]    0    0  134   33   39    1   14
As a data.frame:

    print(as.data.frame(t(mat1)))
      V1 V2  V3 V4 V5 V6 V7
    1  3  0   1 13 NA NA NA
    2  3  0   1 13 NA NA NA
    3  3  0   1 16 NA NA NA
    4  3  0   1 16 NA NA NA
    5  0  0 134 33 39  1 14
    6  0  0 134 33 39  1 14
Depending on how you're passing the rows to your Python code, there are a variety of ways of handling this, but none of them correspond to "emptying the cells": an `NA` value is (arguably) the best/most sensible way to code an empty cell in a rectangular array in R.
    mat1 <- matrix(c(3, 0, 1, 13, NA, NA, NA,
                     3, 0, 1, 13, NA, NA, NA,
                     3, 0, 1, 16, NA, NA, NA,
                     3, 0, 1, 16, NA, NA, NA,
                     0, 0, 134, 33, 39, 1, 14,
                     0, 0, 134, 33, 39, 1, 14), nrow = 7, ncol = 6)
    mat2 <- t(mat1)  ## see below
    ## The text description says the `NA` values come at the end
    ## of *rows*, but the matrix has `NA` values at the end of
    ## *columns*, so I've transposed the matrix.
Since your stated goal is to

> write a table or csv file without NA-s

the correct answer (as hinted at in a now-deleted comment) is to use `write.csv(..., na = "")`. From `?write.csv`:

> na: the string to use for missing values in the data.
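As a minimal sketch of what that looks like on the example data (the temporary file and the `row.names = FALSE` choice are assumptions for illustration, not part of the question), you can write the matrix out and look at the raw lines that Python would read:

```r
# Rebuild the example matrix from the question (short rows padded with NA).
mat2 <- t(matrix(c(3, 0, 1, 13, NA, NA, NA,
                   3, 0, 1, 13, NA, NA, NA,
                   3, 0, 1, 16, NA, NA, NA,
                   3, 0, 1, 16, NA, NA, NA,
                   0, 0, 134, 33, 39, 1, 14,
                   0, 0, 134, 33, 39, 1, 14),
                 nrow = 7, ncol = 6))

# Write to a temporary file; na = "" turns every NA into an empty field.
out_file <- tempfile(fileext = ".csv")
write.csv(mat2, out_file, na = "", row.names = FALSE)

# Inspect the raw text of the file: one header line plus six data lines.
csv_lines <- readLines(out_file)
print(csv_lines)
```

Note that the file is still rectangular: the shorter rows end in empty fields (trailing commas), so the Python side would need to drop the empty strings to recover the true row length.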
More generally, if you wanted to pass the rows to Python one at a time, you could use one of the following strategies:

- use `na.omit()` to strip out the `NA` values:

      for (i in 1:nrow(mat2)) call_my_python_code(na.omit(mat2[i, ]))

  or

      apply(mat2, 1, function(x) call_my_python_code(na.omit(x)))
- store your data as a list, either from the beginning or by splitting it into a list (you still have to get rid of the `NA` values):

      my_list <- split(mat2, row(mat2))
      my_list <- lapply(my_list, na.omit)
      lapply(my_list, call_my_python_code)
- store the data in long format, and use `plyr` or `dplyr` tools to operate on the chunks ...

      library(reshape2)
      mat3 <- na.omit(melt(mat2))
      mat3[mat3$Var1 == 1, ]  ## row 1
      library(plyr)
      dlply(mat3, "Var1", function(x) call_my_python_code(x$value))
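To try the list strategy end to end, here is a rough sketch in base R. The `call_my_python_code` below is only a hypothetical stand-in that reports how many values it received, and writing a ragged text file with `writeLines()` is my own assumption about what might be convenient on the Python side, not something the question asked for. It uses `x[!is.na(x)]` rather than `na.omit()` so that each row comes back as a plain vector without the `na.action` attribute that `na.omit()` attaches.

```r
# Hypothetical stand-in for the real Python bridge: it only reports
# how many values it was given.
call_my_python_code <- function(x) length(x)

# Rebuild the example matrix from the question (short rows padded with NA).
mat2 <- t(matrix(c(3, 0, 1, 13, NA, NA, NA,
                   3, 0, 1, 13, NA, NA, NA,
                   3, 0, 1, 16, NA, NA, NA,
                   3, 0, 1, 16, NA, NA, NA,
                   0, 0, 134, 33, 39, 1, 14,
                   0, 0, 134, 33, 39, 1, 14),
                 nrow = 7, ncol = 6))

# Split into a list of rows and drop the NA values from each one.
rows <- lapply(split(mat2, row(mat2)), function(x) x[!is.na(x)])

# Each row now keeps its own length.
row_lengths <- vapply(rows, call_my_python_code, integer(1))
print(row_lengths)

# Write one comma-separated line per row, with no padding at all.
ragged_file <- tempfile(fileext = ".txt")
writeLines(vapply(rows, paste, character(1), collapse = ","), ragged_file)
print(readLines(ragged_file))
```

Unlike the `write.csv(..., na = "")` output, the lines of this file have genuinely different numbers of fields, so it is no longer a rectangular CSV and would have to be read line by line.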