Title: | Import and Export Data |
---|---|
Description: | Import and export data from the most common statistical formats by using R functions that guarantee the least loss of the data information, giving special attention to the date variables and the labelled ones. |
Authors: | Roger Pros, Isaac Subirana, Joan Vila. |
Maintainer: | Isaac Subirana <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.3 |
Built: | 2025-03-12 04:17:17 UTC |
Source: | https://github.com/cran/ImportExport |
Import and export data from the most common statistical formats by using R functions that guarantee the least loss of the data information, giving special attention to the date variables and the labelled ones.
The package also includes an usefull shiny app called by ImportExportApp
which uses all the content of the package to import and export databases in
a rather easy way.
The DESCRIPTION file:
Package: | ImportExport |
Type: | Package |
Title: | Import and Export Data |
Version: | 1.3 |
Date: | 2020-09-18 |
Author: | Roger Pros, Isaac Subirana, Joan Vila. |
Maintainer: | Isaac Subirana <[email protected]> |
Description: | Import and export data from the most common statistical formats by using R functions that guarantee the least loss of the data information, giving special attention to the date variables and the labelled ones. |
Depends: | gdata, Hmisc, chron, RODBC |
Imports: | readxl, writexl, haven, utils |
Suggests: | shiny, shinyBS, shinythemes, compareGroups, foreign |
License: | GPL (>= 2) |
NeedsCompilation: | no |
Packaged: | 2020-09-21 12:04:13 UTC; isubi |
Date/Publication: | 2020-09-21 13:00:03 UTC |
Config/pak/sysreqs: | make libicu-dev unixodbc-dev libx11-dev zlib1g-dev |
Repository: | https://isubirana.r-universe.dev |
RemoteUrl: | https://github.com/cran/ImportExport |
RemoteRef: | HEAD |
RemoteSha: | 5b9fb9e7fcc32efadc6867a92c1e0c4e653e86e0 |
Index of help topics:
ImportExport-package Import and Export Data ImportExportApp Runs the shiny app access_export Export multiple R data sets to Microsoft Office Access access_import Import tables and queries from Microssoft Office Access(.mdb) excel_export Export multiple R data sets to Excel format_corrector Identify and corrects variable formats spss_export Export data to SPSS (.sav) by using runsyntx.exe or pspp.exe spss_import Import data set from SPSS (.sav) table_import Automatic separator data input var_view Summarize variable information
Roger Pros, Isaac Subirana, Joan Vila.
Maintainer: Isaac Subirana <[email protected]>
## Not run: x<-spss_import("mydata.sav") ## End(Not run)
## Not run: x<-spss_import("mydata.sav") ## End(Not run)
Directly connect (and disconnect at the end) with the Microssoft Office Access database using the RODBC package and write one or multiple data sets.
access_export(file,x,tablename=as.character(1:length(x)),uid="",pwd="",...)
access_export(file,x,tablename=as.character(1:length(x)),uid="",pwd="",...)
file |
The path to the file with .mdb extension. |
x |
Either a data frame or a list containing multiple data frame to be exported. |
tablename |
A character or a vector character containing the names that will receive the tables where the data frame is stored. If it is a vector, it must follow the same order as the data frames in |
uid |
see |
pwd |
see |
... |
see |
Date variables are exported as an integer, they might be converted to character if a character representation in the access database is wanted.
No value is returned.
This function connects and writes on an existing Microsoft Office Access database, but it can't create a new one.
## Not run: # x is a data.frame file<-("mydata.xlsx") a<- 1:10 b<-rep("b",times=10) c<-rep(1:2,each=5) x<-data.frame(a,b,c) excel_export(x,file,table_names="mydata") # x is a list y<-list(x,x[2:3]) excel_export(y,file,table_names=c("mydata1","mydata2")) ## End(Not run)
## Not run: # x is a data.frame file<-("mydata.xlsx") a<- 1:10 b<-rep("b",times=10) c<-rep(1:2,each=5) x<-data.frame(a,b,c) excel_export(x,file,table_names="mydata") # x is a list y<-list(x,x[2:3]) excel_export(y,file,table_names=c("mydata1","mydata2")) ## End(Not run)
Directly connect (and disconnect at the end) with the Microssoft Office Access database using the RODBC package and read one or multiple data sets. It can read both tables and SQL queries depending on the input instructions. It automatically detects date variables that are stored also with date format in the original data set.
access_import(file,table_names,ntab=length(table_names), SQL_query=rep(F,times=ntab),where_sql=c(),out.format="d-m-yy",uid="",pwd="",...)
access_import(file,table_names,ntab=length(table_names), SQL_query=rep(F,times=ntab),where_sql=c(),out.format="d-m-yy",uid="",pwd="",...)
file |
The path to the file with .mdb extension. |
table_names |
A single character or a character vector containing either the names of the tables to read or the SQL queries to perform. Each position must contain only one table name or SQL querie.The format of the SQL queries must follow the one described in |
where_sql |
If |
out.format |
a character specifying the format for printing the date variables. |
ntab |
The number of tables to import, equal to the number of table names. |
SQL_query |
Auxiliar vector to perform the function. |
uid |
see |
pwd |
see |
... |
By default, the function gives to each data set the name specified in table_names, so the sql queries data set have probably an inappropriate name. It can be easily renamed using names
.
A data frame or a data frame list containing the data requested from the Microsoft Office Access file.
The function don't contribute in the date variables detection, it just process with the Chron package the ones who has been automatically detected.
access_export
,var_view
sqlFetch
, sqlQuery
## Not run: x<-access_import(file="mydata.mdb", table_names=c("table1","table2", "Select * From table1 inner join table2 on table.var1=table2.var2","table3") ,where_sql=c(3)) ## End(Not run)
## Not run: x<-access_import(file="mydata.mdb", table_names=c("table1","table2", "Select * From table1 inner join table2 on table.var1=table2.var2","table3") ,where_sql=c(3)) ## End(Not run)
Exports a single data frame or a list of data frames to one or multiple excel sheets using the function write_xlsx
frome the writexl package. This function can write multiple data frames (passed as a list) with a single command .It can write both .xls and .xlsx files.
excel_export(x,file,table_names=as.character(1:length(x)),...)
excel_export(x,file,table_names=as.character(1:length(x)),...)
x |
Either a data frame or a list containing multiple data frame to be exported. |
file |
The name of the file we want to create. |
table_names |
A character or a vector character containing the names that will receive the sheet where the data frame is stored. If it is a vector, it must follow the same order as the data frames in |
... |
see |
No value is returned.
## Not run: # x is a data.frame file<-("mydata.xlsx") a<- 1:10 b<-rep("b",times=10) c<-rep(1:2,each=5) x<-data.frame(a,b,c) excel_export(x,file,table_names="mydata") # x is a list y<-list(x,x[2:3]) excel_export(y,file,table_names=c("mydata1","mydata2")) ## End(Not run)
## Not run: # x is a data.frame file<-("mydata.xlsx") a<- 1:10 b<-rep("b",times=10) c<-rep(1:2,each=5) x<-data.frame(a,b,c) excel_export(x,file,table_names="mydata") # x is a list y<-list(x,x[2:3]) excel_export(y,file,table_names=c("mydata1","mydata2")) ## End(Not run)
The function creates a loop to compare for each variable the values it have with the usual ones that typical R formats have in order to correct, for example, missing value or dates stored as a character. It also specify for each variable the most appropriate SPSS format that it should have.
format_corrector(table,identif=NULL,force=FALSE,rate.miss.date=0.5)
format_corrector(table,identif=NULL,force=FALSE,rate.miss.date=0.5)
table |
The data set we want to correct. |
identif |
The name of the identification variable included in the data frame. It will be used to list the individuals who had any problems during the execution of the function. |
force |
If TRUE, run format_corrector even if "fixed.formats" attribute is TRUE |
rate.miss.date |
The maximum rate of missing date fields we want the function to accept.The function details which fields have been lost anyways. |
If the date variable don't have chron format it must be in one of the following formats, else the function leaves it as a character:
—-dates separator must be one of the following:("-","/",".").
—-hour separator must be ":".
A single data frame which results from the function.
This function may not be completely optimal so it might have problems when correcting huge data frames.
require(ImportExport) a<-c(1,NA,3,5,".") b<-c("19/11/2006","05/10/2011","09/02/1906","22/01/1956","10/10/2010") c<-101:105 x<-data.frame(a,b,c) sapply(x,class) x_corr<-format_corrector(x) sapply(x_corr,class)
require(ImportExport) a<-c(1,NA,3,5,".") b<-c("19/11/2006","05/10/2011","09/02/1906","22/01/1956","10/10/2010") c<-101:105 x<-data.frame(a,b,c) sapply(x,class) x_corr<-format_corrector(x) sapply(x_corr,class)
Runs a shiny app which uses all the content of the package to import and export databases in a rather easy way.
ImportExportApp(...)
ImportExportApp(...)
... |
See |
It requires a few packages to run the app: shiny, shinyBS, shinythemes, compareGroups.
## Not run: ImportExportApp() ## End(Not run)
## Not run: ImportExportApp() ## End(Not run)
Export data to txt and syntax to an spss syntax file and then runs runsyntx.exe (located in the SPSS folder) in order to create the final file with .sav extension containing the data frame we wanted to export. Date variables in the original data frame are also identified when reading the .sav file with SPSS.
spss_export(table,file.dict=NULL,file.save=NULL,var.keep="ALL", file.runsyntax="C:/Archivos de programa/SPSS/runsyntx.exe", file.data=NULL,run.spss=TRUE,dec=".")
spss_export(table,file.dict=NULL,file.save=NULL,var.keep="ALL", file.runsyntax="C:/Archivos de programa/SPSS/runsyntx.exe", file.data=NULL,run.spss=TRUE,dec=".")
table |
A data frame to be exported. If it's a matrix, it will be converted into data frame. |
file.dict |
Spss syntax file containing the variable and value labels. |
file.save |
The name of the .sav file we want to create. |
var.keep |
Name of the variables to save. All variables will be saved by default. |
file.runsyntax |
The path to the file runsyntx.exe or pspp.exe . |
file.data |
The name of the .txt file containing the data. It will be created as a temp file by default. |
run.spss |
If true, it runs SPSS and creates the .sav file, else it shows the syntax on the screen. |
dec |
The string to use for decimal points, it must be a single character. |
Both runsyntax.exe (from SPSS) and pspp.exe works the same way.
No value is returned.
If neither SPSS nor PSPP is installed the function can just return the data in a .txt file and the syntax in an SPSS syntax file (.sps).
## Not run: table=mydata file.dict=NULL file.save="C:\xxx.sav" var.keep="ALL" export.SPSS(table=table,file.dict=file.dict,var.keep=var.keep,file.save=file.save) ## End(Not run)
## Not run: table=mydata file.dict=NULL file.save="C:\xxx.sav" var.keep="ALL" export.SPSS(table=table,file.dict=file.dict,var.keep=var.keep,file.save=file.save) ## End(Not run)
Read a labelled data set from SPSS, finding automatically the date variables and keeping the variable and value labels information, by using the information obtained with spss_varlist()
and the function spss.get
from the Hmisc Package.
spss_import(file, allow="_",out.format="d-m-yy",use.value.labels=F,...)
spss_import(file, allow="_",out.format="d-m-yy",use.value.labels=F,...)
file |
The path to the file with .sav extension. |
allow |
A vector which contains the characters that must be allowed in the variable names. |
out.format |
A character specifying the format for printing the date variables. |
use.value.labels |
If TRUE, replace the labelled variables with their value labels. |
... |
See |
In order to provide the maximum functionallity, if the main code generates an error, the function tries to read the file with the read_sav
function from the haven package, but a warning message appears.
The var_view
function can be used to summarize the contents of the data frame labels.
A data frame or a list containing the data stored in the SPSS file.
If the warning message appears and the file has been read using read_sav
the resulting data frame will be diferent from the expected one (see the haven package to learn more about read_sav
).
Dave MacFarlane, Roger Pros, Isaac Subirana
## Not run: x <- spss_import("mydata.sav") ## End(Not run)
## Not run: x <- spss_import("mydata.sav") ## End(Not run)
A small variation of the original read.table
that most of the time detect automatically the field separator character. It also includes the option to run the format_corrector
function in order to detect, for example, the date variables included in the original data set. If the function don't recognize any separator, it asks to specify the real one.
table_import(file,sep=F,format_corrector=F,...)
table_import(file,sep=F,format_corrector=F,...)
file |
The patch to he file which the data are to be read from. |
sep |
The field separator character, see |
format_corrector |
If True, it runs the |
... |
More arguments from |
The format_corrector
function is a complicated function so it's not recommended to run it unless the data set contains awkward variables like dates.
A data frame containing the data stored in the file.
This function might have problems if any of the fields contain typical separators, so it's always recommended to check the resulting data frame in order to avoid possible errors.
## Not run: x <- table_import('mydata.csv',format_corrector=T) ## End(Not run)
## Not run: x <- table_import('mydata.csv',format_corrector=T) ## End(Not run)
Creates a table with the name, the description, the value labels and the format for each variable in the data frame. It is similar to the variable view shown in the SPSS.
var_view(x)
var_view(x)
x |
The data frame whose variables we want to summarize. |
A data frame containing the specified summary.
This function was built in order to summarize imported SPSS labelled data sets using spss_import
, but it can also work with other labelled data sets.
require(ImportExport) a<- 1:10 b<-rep("b",times=10) c<-rep(1:2,each=5) x<-data.frame(a,b,c) attr(x$a,"label")<- "descr1" attr(x$b,"label")<- NULL attr(x$c,"label")<- "descr3" attr(x$c,"value.labels")<-list("1"="Yes","2"="No") var_view(x)
require(ImportExport) a<- 1:10 b<-rep("b",times=10) c<-rep(1:2,each=5) x<-data.frame(a,b,c) attr(x$a,"label")<- "descr1" attr(x$b,"label")<- NULL attr(x$c,"label")<- "descr3" attr(x$c,"value.labels")<-list("1"="Yes","2"="No") var_view(x)