Stata in space: Econometric analysis of spatially explicit raster data
D. Müller

Spatial econometrics deals with the analysis of economic data that is explicitly linked to location. The techniques of spatial econometrics account for the peculiarities introduced by the spatial perspective and are justified on two grounds. First, spatial heterogeneity might arise due to a lack of structural stability across space, such as varying parameters or functional forms, and due to nonhomogeneity of the units of observation across space (Anselin 1988). Second, spatial autocorrelation, which is methodologically similar to autocorrelation in time-series models, refers to a lack of independence among observations. Spatial autocorrelation pertains to a coincidence of value similarity with locational similarity (Anselin 1988). This dependence among observations and the importance of relative locations is expressed by Tobler (1979) in his first law of geography, which states that "everything is related to everything else, but near things are more related than distant things". Interactions among neighboring agents could, for example, induce a correlation of the variables across space, which must be accounted for in model estimation.

The various spatial relationships among observations can result in unreliable estimates and incorrect statistical inference of the parameters (Anselin 1988). For many social and economic processes, a better appreciation of the spatial context can potentially avoid misleading inferences and improve the strength of results and their interpretation. Knowledge about the location of a process and its interaction with processes at neighboring locations can help infer the underlying reasons and logic of the process under investigation. However, spatial analysis adds mathematical complexity due to the necessary incorporation of two dimensions (in the X- and Y-directions).

The next section briefly introduces geographic information systems (GIS) and the structure of raster data.
Section 3 outlines the program to import raster grids into Stata. Section 4 presents the program for systematic spatial sampling from a raster surface, and section 5 the program to export the Stata files into a format usable by standard GIS software. The last section presents a numerical example in which I estimate the determinants of observing forest cover using a spatially explicit binary logit model.

2 Geographic information systems and the raster data model

A geographic information system serves to compile, store, manipulate, analyze, and visualize spatial data. The two main data models in a GIS are the vector model and the raster model. Vector data contain X- and Y-coordinates, which describe points (single X- and Y-coordinates), lines (series of ordered points), and polygons (closed lines with equal start and end coordinates). The raster data model is represented as an arrangement of regularly shaped, contiguous cells in a two-dimensional matrix, which together form a continuous data layer. A layer typically consists of square cells, which fit together edge-to-edge. Each cell represents one location in a raster surface and contains integer or floating-point numbers indicating the characteristic of that location. A dataset usually contains various layers (or bands), which are stacked on top of each other and cover the geographical area of interest. A common application of multiple-layer data are multispectral satellite images, where various bands cover a certain spectral range of reflected electromagnetic energy. Raster data has the advantage of conceptual simplicity, compact data storage, and well-established algorithms for processing and analyzing. A main disadvantage is that it artificially imposes grid cell borders on continuous phenomena, which are often better represented in the vector model.

The structure of a raster data model is sketched in figure 1. Figure 1(a) shows data in a 5 x 5 matrix of square cells with discrete observations ranging from one to five.
The corresponding map in figure 1(b) on the right side can be exported from common GIS software packages in ASCII data format.

Figure 1: Raster data structure (a) and corresponding raster map (b)

The exported ASCII file contains the spatial information in a header that occupies the first six rows of the data (see table 1). The number of columns (ncols = 5) is indicated in the first line and the number of rows (nrows = 5) in the second line. Lines three and four locate the map in space with geographic coordinates for the lower-left X-coordinate, xllcorner, and for the lower-left Y-coordinate, yllcorner. Line five states the size of the cells in the specified map units, and line six assigns numerical missing values (ArcInfo and ArcView assign -9999 by default). The map values start in line seven, directly after the six header lines, with the upper-left cell from figure 1. Values are separated by spaces and move from left to right, and then top to bottom. No carriage returns are necessary, as the number of columns in the header information determines when a new row starts (Environmental Systems Research Institute (ESRI) 2000).

Table 1: ASCII raster with header information

    ncols         5
    nrows         5
    xllcorner     ...
    yllcorner     ...
    cellsize      ...
    NODATA_value  -9999
    1 1 2 2 2
    3 1 3 2 1
    2 3 3 4 ...
    ...

3 Importing data

The structure of the ASCII file from figure 1 and table 1 makes it straightforward to import the text file into Stata. The program ras2dta imports the data starting from observation 13, after the header that ends with the default missing value of -9999. Each raster cell of a map is turned into one observation, and an entire map of raster cells yields one variable in Stata. Figure 1 yields 25 observations, read from the top left to the right and then to the bottom (first observation = 1, second = 1, ..., sixth = 3, seventh = 1, ..., twenty-fifth = 5).

Information from the header is used to identify the missing values and the number of X- and Y-coordinates.
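The import logic described above can be sketched in a few lines. The block below is a minimal Python illustration, not the ras2dta program itself: the function name read_ascii_grid, the record keys (idcell, x, y, value), and the small 3 x 2 example grid are all hypothetical. It assumes a complete six-line header, so that the cell values begin at the thirteenth whitespace-separated token, and it numbers cells from the top-left corner, left to right and then top to bottom.

```python
def read_ascii_grid(text, drop_missing=False):
    """Turn an ESRI-style ASCII grid into one record per raster cell.

    Hypothetical helper for illustration; assumes a six-line header.
    """
    tokens = text.split()
    # Six header lines with a keyword and a value each give 12 tokens.
    header = {tokens[i].lower(): float(tokens[i + 1]) for i in range(0, 12, 2)}
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    nodata = header["nodata_value"]

    # Cell values start at the 13th token and run row by row.
    values = [float(t) for t in tokens[12:]]
    if len(values) != ncols * nrows:
        raise ValueError("cell count does not match ncols * nrows")

    cells = []
    for i, value in enumerate(values):
        cell = {
            "idcell": i + 1,        # unique ID, 1 at the top-left corner
            "x": i % ncols + 1,     # column indicator, 1 = leftmost column
            "y": i // ncols + 1,    # row indicator, 1 = top row
            "value": None if value == nodata else value,
        }
        if drop_missing and cell["value"] is None:
            continue                # drop the cell together with its ID
        cells.append(cell)
    return header, cells


# Hypothetical 3 x 2 grid with one no-data cell.
EXAMPLE = """ncols 3
nrows 2
xllcorner 0
yllcorner 0
cellsize 10
NODATA_value -9999
1 2 -9999
3 1 2
"""

header, cells = read_ascii_grid(EXAMPLE)
print(len(cells), cells[3])
```

Keeping the identifier and the column and row indicators tied to the position in the full grid, rather than to the position among the retained cells, means that dropping no-data cells does not renumber anything.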
ras2dta optionally generates a variable carrying an identity code for the cells, idcell(), to later facilitate the use of, e.g., merge or joinby and the export of data back to the GIS software. Missing values, like values located outside the area of interest, can be dropped when the data is read by infile. Optionally, two variables are generated representing the X- and Y-indicators of the raster map. This is potentially convenient for drawing a spatial sample (see section 4) and for spatial statistical calculations. ras2dta allows maps to be imported with different spatial structures. The corresponding header information of each map is then saved as a separate file for each of the imported raster maps.

3.1 Syntax

    ras2dta, files(filelist) [idcell(varname) xcoord(#) ycoord(#) missing(#) dropmiss extension(string) genxcoord(varname) genycoord(varname) header saving(filelist) replace clear]

3.2 Options

files(filelist) is required. It specifies the names of the ASCII files in filelist to be converted into Stata format. ASCII files must be located in the same directory and be listed with a separating space, without the file extension.

idcell(varname) is optional. It generates a spatial identifier (unique ID code) for the grid cells imported. The ID code starts at one in the top-left corner and increments in steps of one until the last cell at the bottom-right corner, from left to right and then top to bottom. idcell() saves the variable under the specified name in varname.

xcoord(#) is required if no header is present in the ASCII file. The number of X-values (number of columns or ncols) must be entered as an integer value.

ycoord(#) is required if no header is present in the ASCII file. The number of Y-values (number of rows or nrows) must be entered as an integer value.

missing(#) is optional. It must be specified if missing values are not the default ESRI no-data value of -9999.

dropmiss is optional. If specified, all default (-9999) or user-defined (via missing()) missing values are dropped, including the corresponding codes from idcell().

extension(string) specifies the file extension of the ASCII file. extension(asc) is the default. For files without an extension, extension("") must be entered.

genxcoord(varname) is optional. It creates the variable varname carrying identifiers for the columns of the entire imported grid. X-coordinates will start with 1 at the top-left corner and increment to the right in steps of 1 (this is not affected by dropmiss).

genycoord(varname) is optional. It creates the variable varname carrying identifiers for the rows of the entire imported grid. Y-coordinates will start with 1 at the top-left corner and increment to the bottom in steps of 1 (this is not affected by dropmiss).

header(filename) optionally saves one ASCII header as a Stata data file for each imported grid. The header files are named h_filename, where filename is the name of the imported grid, with one variable called hdr. Existing files with the same name ...

saving(filelist) saves the Stata files under different names, as specified in filelist inside the parentheses of saving() (without separating comma and file extension). saving() will always save the files as Stata datasets. If saving() is specified, the number of imported grids specified in files() must equal the number of files specified in saving(). The default is to save the file in the same directory and under the same name as the original ASCII grid.

replace replaces existing files with the same name in the current working directory.

clear clears the data currently in memory.

4 Spatial sampling

The existence of spatial relationships among observations can result in unreliable estimates and misguided statistical inference of the parameters. Econometric problems with spatial data can be due to interactions among neighboring agents.
Spatial effects can also emerge when data from different sources, different sample designs, or varying aggregation rules are used (Anselin 1988).

One ad hoc technique to correct for spatial effects is to draw a systematic spatial sample from a grid. With systematic spatial sampling, a number of cells are selected in a regular fashion. This is done by keeping only cells that are a specified distance away from the nearest selected neighbor, resulting in a noncontiguous subsample of the data. Systematic spatial sampling permits the application of standard estimation methods (Anselin 2001).

The program spatsam draws such systematic spatial samples with a user-specified gap in the X- and Y-direction and optionally saves the sample as a new dataset. spatsam depends on the presence of X- and Y-coordinates that can be generated when importing the grids using ras2dta.

4.1 Syntax

    spatsam, gap(#) xcoord(varname) ycoord(varname) [insample(varname) norestore saving(filename) replace]

4.2 Options

gap(#) is required and specifies the spatial lag between selected observations. For example, gap(4) specifies the selection of every fourth cell in the X- and Y-direction. Then the first observation in the sample is in the fourth row and fourth column, the second observation in the eighth row and fourth column, etc.

xcoord(varname) is required and specifies the variable that carries the X-coordinates.

ycoord(varname) is required and specifies the variable that carries the Y-coordinates.

insample(varname) is optional and saves the selected observations as a binary variable named varname. Selected observations get the value 1, and nonselected observations get the value 0.
insample() is not affected by the use of norestore.

norestore prevents the restoration of the data previously in memory.

saving(filename) is optional and saves the data file under the name specified in filename.

replace replaces existing files with the same name in the current working directory.

For large raster maps, drawing a spatial sample with spatsam has the additional advantage that it significantly decreases the number of observations, thereby reducing the computational time.

5 Exporting Stata variables as raster grids

After importing data, managing data, making econometric estimations, and performing postestimation commands within Stata, the results may often be exported back to the GIS software package for a visual assessment and further spatial calculations. The program dta2ras takes Stata variables and saves them as ASCII raster grids in a format readable by most standard GIS software packages. If no varlist is specified, all the variables in the dataset are exported. The ASCII grid files include a standard header and can be readily imported into, e.g., ArcView (assuming Spatial Analyst or 3D-Analyst is loaded) with File - Import Data Source - ASCII Raster.

dta2ras asserts that the number of rows times the number of columns is equal to the number of observations. With one call of dta2ras, only variables of the same spatial structure (same number of rows and columns and the same cellsize) can be exported. If a spatial sample was previously drawn, dta2ras optionally expands the spatially sampled number of observations to the full grid size, i.e., to the number of rows times columns. The expansion requires the presence of a spatial identifier (idcell()) as generated by ras2dta. Missing identifiers in idcell() are filled with missing values to arrive at the desired total number of observations. After sorting by idcell(), each variable has the same structure and number of observations as in the original grid.
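The sampling step from section 4 and the expansion and export steps described above can be sketched together. The block below is a hedged Python illustration, not the code of spatsam or dta2ras. The function names, the record keys, and the 4 x 4 example grid are hypothetical, and the selection rule (keep cells whose column and row indicators are both multiples of the gap) is an assumption chosen to match the gap(4) example, in which the first sampled cell lies in the fourth row and fourth column. The default cell size, corner coordinates, and no-data value mirror the defaults documented for dta2ras.

```python
def spatial_sample(cells, gap):
    """Keep cells whose column and row indicators are both multiples of gap.

    Assumed selection rule for illustration only.
    """
    return [c for c in cells if c["x"] % gap == 0 and c["y"] % gap == 0]


def expand_to_grid(sample, ncols, nrows, nodata=-9999):
    """Return ncols * nrows values ordered by idcell.

    Cells that are absent from the sample are filled with the no-data value.
    """
    by_id = {c["idcell"]: c["value"] for c in sample}
    return [by_id.get(i, nodata) for i in range(1, ncols * nrows + 1)]


def write_ascii_grid(values, ncols, nrows, cellsize=1, xll=1, yll=1, nodata=-9999):
    """Format values as an ASCII raster with a six-line header."""
    if len(values) != ncols * nrows:
        raise ValueError("number of values must equal ncols * nrows")
    lines = [
        f"ncols {ncols}",
        f"nrows {nrows}",
        f"xllcorner {xll}",
        f"yllcorner {yll}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    for r in range(nrows):
        row = values[r * ncols:(r + 1) * ncols]   # one grid row per text line
        lines.append(" ".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical 4 x 4 grid whose cell value equals its identifier.
ncols, nrows = 4, 4
cells = [
    {"idcell": i + 1, "x": i % ncols + 1, "y": i // ncols + 1, "value": i + 1}
    for i in range(ncols * nrows)
]

sample = spatial_sample(cells, gap=2)          # noncontiguous subsample
full = expand_to_grid(sample, ncols, nrows)    # back to 16 observations
grid_text = write_ascii_grid(full, ncols, nrows)
print(grid_text)
```

Because the identifier encodes the position in the full grid, the expansion needs nothing beyond the identifier and the grid dimensions, which is why the sketch requires no other information about the unsampled cells.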
Expanding the data in this way allows the variable to be imported back into the GIS software by inserting a previously saved header file or by manually providing the spatial structure of the raster grid, with at least the information on columns and rows in xcoord() and ycoord() (optionally also with the remaining information on cellsize(), xllcorner(), yllcorner(), and missing()).

5.1 Syntax

    dta2ras [varlist], {header(filename) | xcoord(#) ycoord(#)} [cellsize(#) xllcorner(#) yllcorner(#) missing(#) idcell(varname) idfile(filename) expand norestore saving(filelist) replace]

5.2 Options

header(filename) specifies the header file, which must be a Stata data file with one variable called hdr. If this option is not specified, xcoord() and ycoord() are required to create this file.

xcoord(#) is required if header() is not specified and defines the number of X-coordinates (columns or ncols) as integer values.

ycoord(#) is required if header() is not specified and defines the number of Y-coordinates (rows or nrows) as integer values.

cellsize(#) is optional. It specifies the cell size of the resulting grids. The default is cellsize(1).

xllcorner(#) is optional. It specifies the X-coordinate of the lower-left cell. The default is xllcorner(1).

yllcorner(#) is optional. It specifies the Y-coordinate of the lower-left cell. The default is yllcorner(1).

missing(#) is optional. It must be specified if missing values are not the default ArcInfo/ArcView no-data value of -9999. The default is missing(-9999).

idcell(varname) is a variable carrying the spatial identifier (ID code) of the grid cells and is required if expand is specified without idfile(). The upper-left cell in idcell() starts at 1 and must increment in steps of 1 moving from left to right, and then top to bottom.

idfile(filename) is the Stata data file that carries the spatial identifier (ID code) of the grid cells and is required if expand is specified without idcell(). The upper-left cell in idfile() must carry the identifier 1 and must increment in steps of 1 moving from left to right, and then top to bottom.
If idfile() is specified, the identifying variable in the master and using data must have the same name.

expand expands the dataset to the full number of observations, e.g., if a spatial sample was previously drawn using spatsam. expand depends on the presence of idcell() or idfile().

norestore prevents the restoration of the data previously in memory.

saving(filelist) saves the ASCII files under different names, as specified in filelist (names separated by spaces, without comma and file extension). saving() saves the files in ASCII format. If saving() is specified, the number of exported variables in varlist must equal the number of files specified in filelist. The default is to save the raster grids under the exported variable names with the ending .asc.

replace replaces already existing files of the same name in the current working directory.

6 A numerical example

A subset of real-world data from the Central Highlands of Vietnam is used as a numerical example for econometric estimation with spatially explicit raster data. The data is described in Müller (2003) and stems from a research project on the determinants of land-use change. Forest cover (forest) is chosen for the purposes of this paper as the binary dependent variable (forest = 1 and nonforest = 0). As covariates, several indicators describing the agricultural potential of each raster cell are used, comprising slope (slp), elevation (elev), and soil suitability (soil), as well as the Euclidean distance to major roads (disroad) as a proxy indicating access to markets and transportation costs. For simplicity, I omitted a range of other variables that potentially influence forest cover, such as population density, the introduction of technologies, or the occurrence of protected areas.
The dependent variable and the four covariates are calculated and stored within a GIS as raster layers with the same geographic projection, grid cell size, and spatial extent.

6.1 Data preparation

From within the GIS software, each raster layer is exported as an ASCII data file. The layer forest cover, as the dependent variable forest, is imported to Stata with

    ras2dta, f(forest) header idcell(idc) genx(x) geny(y) replace clear
    No. of columns = 389
    number of cells = 134,983
    file forest.dta saved

    Variable |     Obs       Mean    Std. Dev.   Min      Max
    ---------+------------------------------------------------
         idc |  134983      67492    38966.38      1   134983
      forest |  134983   .8371795    .3692032      0        1
           x |  134983        195    112.2947      1      389
           y |  134983        174    100.1702      1      347

The variable idc carries a unique identifier for each observation. The binary variable forest indicates a forest cover of 84%. Column values are contained in the variable x, advancing from column one to column 389 and restarting at one in the second row. This procedure is repeated 347 times, once for every row. Row numbers are named y, advancing from row one to row 347 in blocks of 389 (columns). The spatial information of the raster grid for forest is saved in h_forest.dta:

    use h_forest, clear
    list, sep(2)

              hdr
      1.   ncols
      2.   389
      3.   nrows
      4.   347
      5.   xllcorner
      6.   223331.636...
      7.   yllcorner
      8.   1372962.484...
      ...

Observation two specifies the number of columns (389), and observation four specifies the number of rows (347). The total number of observations is the product of columns and rows (134,983). The lower-left coordinate, referenced to a projected coordinate system (in this case, to Universal Transverse Mercator, UTM), is represented by observation six