Infusing GIS with Data Accuracy

A 578Kb PDF of this article as it appeared in the magazine—complete with images—is available by clicking HERE

Est quodam prodire tenus si non datur ultra. – Horace
"Though of exact perfection you despair, yet every step to virtue’s worth your care."

When I first began presenting this subject a few years ago, the title was "Infecting GIS with Accuracy," to poke gentle fun at the mutual mudslinging between GIS and Surveying/Engineering folks. This can be a sensitive subject, and even seemingly harmless ribbing does great disservice to the oft-underappreciated skilled professionals who have developed GIS. These pioneers have developed a landscape of digital information that has reshaped, and will continue to reshape, our industries and society as a whole.

Often a good way to ease tensions is to conduct the discussions in a lighthearted tone. One truism (or perhaps rationalization) of sarcasm is that it is usually based on some modicum of truth (albeit uncomfortable truths). It is my hope that none of what is presented in this article fosters ill will for either group.

To a surveyor, GIS may often seem "spatially challenged"; to other professionals, GIS may seem "data challenged." To the GIS professional, both of these views may seem like a lot of armchair quarterbacking. Whatever the real or perceived shortcomings of a particular data set, the critic may not fully comprehend the ROYAL PAIN it might be to try to fix one, even if better data were readily available. What I am trying to get at with the quote from Horace (apart from trying to appear sophisticated) is that even if absolute perfection is not achievable, we should all do our best to strive for it.

In the last decade there has been an explosion of options for maintaining, updating, augmenting and, most important, improving existing GIS data. GIS is moving from a model of autonomous (sometimes by design), self-contained systems toward being a fully integrated component of an enterprise knowledge base.

Surrounding the enterprise knowledge base elements (as illustrated on opposite page) are multi-threaded "shells" of business processes, mining and feeding the core. The challenges are finding the key points along these shells to best feed the core without significantly impacting the cost of the respective processes.

An example that would be most familiar to surveyors is the data cycle for civil design-construction projects. Many of the manufacturers of civil/survey/GIS software have been touting full data cycle solutions. Autodesk offers a suite of civil software that can handle engineering data throughout the entire civil cycle, "cradle to the grave," so to speak. The new ESRI Survey Analyst offers a solution that manages survey measurements and resolutions directly in a GIS environment. There are many other examples.

The challenges of updating legacy GIS run deeper than purely technological and budgetary concerns; they border more on the motivational, with the former concerns typically having the greatest influence on the latter. Tell your folks that yes, it can be done; the highest hurdle is simply getting started.

Enterprise GIS or Geo-spatially Enabled Enterprise?
You’ve undoubtedly heard this type of analogy before: once upon a time the typewriter was an expensive, persnickety, hulking chunk of machinery. Folks were specially trained to use them and put in "typing pools." Office architectures, furniture (and politics) were specially built around these cumbersome beasts. Voila, a keyboard hooked to a surprisingly forgiving chunk of hardware/software that even (gasp) management could handle and draft their memos has made typists (oops, keyboardists) out of every office "cube-lidyte." It is not unheard of for CEOs to draft their own documents (not that it is such a good idea…but I digress)…. What happens when GIS is just another toolbar on any office application? It may not be as far away as we think, but first, let’s look back at that typewriter analogy.

Legacy GIS—How and Why It Got the Way It Is
GIS was pioneered both in concept and practice in academia, where the latest in processing technologies (and those that could fathom the concepts and software) was to be found.

The Department of Defense (DoD), huge utilities, and other early GIS pioneers (those that could fund them) were able to greatly streamline only certain operations by creating rudimentary (almost schematic) maps of infrastructure or geographic concerns. Accurate locations were prohibitively expensive.

Data talked…geography walked… (albeit with limited data).

GIS themes focused mainly on the types of data requested by those who could fund their development. Often these noble projects had to borrow base data from older, looser data sets, further compounding woes.

This is not to completely pooh-pooh Legacy GIS. Daily, mega-buck cost savings are being realized worldwide even with these inherent shortcomings. Enterprises spent decades and countless millions developing some pretty outstanding GIS systems (albeit slightly flawed).

Expectations often exceeded reality. Later, when clients expected these systems to integrate seamlessly with Enterprise Knowledge/Data Management Systems (the latest grand venture), a further dose of reality had to be swallowed.

Needless to say there was a lot of buyers’ remorse, which fueled the next phase.

Future GIS—Not Somewhere, but Everywhere
Google this: "knowledge management systems," or grab a copy of publications like "KM" or "Transform" and you can take a ride on the tsunami created by the tremors of the processing power explosion of the past decade. Most middling-to-large enterprises are already headed this way. Given this capitulation to the oncoming juggernaut, the CAD and GIS industries have been quick to follow suit.

With only the slightest fraction of exceptions, any enterprise data can be linked in some way to a physical location, even by threads of relationships that boggle the mind. A geo-spatial query can link data elements that have absolutely no other commonality.
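To make the idea of a geo-spatial query concrete, here is a minimal sketch in Python. The record sources, labels, and coordinates are all invented for illustration, and it assumes every record already carries a location in one shared projected coordinate system. The three records share no key or field other than location, yet a simple distance test ties two of them together.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Record:
    source: str  # which business system the record came from (hypothetical names)
    label: str   # free-text description
    x: float     # easting, in an assumed shared projected coordinate system
    y: float     # northing, same assumed system


def within_distance(records, x, y, radius):
    """Return every record located within `radius` of the point (x, y)."""
    return [r for r in records if hypot(r.x - x, r.y - y) <= radius]


# Made-up sample data: nothing links these records except where they are.
records = [
    Record("work_orders", "WO-1: replace drainage main", 100.0, 200.0),
    Record("correspondence", "Complaint letter about flooding", 104.0, 203.0),
    Record("assets", "Hydrant H-7", 900.0, 900.0),
]

nearby = within_distance(records, 100.0, 200.0, 10.0)
for r in nearby:
    print(r.source, "-", r.label)
```

A real enterprise system would presumably use a spatial index or a spatially enabled database rather than a linear scan, but the principle is the same: location acts as the common key.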

The model is a huge database or data warehouse for enterprise data, documents, correspondence, assets, and sundry indexes from all of the shells of business practices and processes that revolve around it. This is a model that doesn’t so much "bring the data to the GIS" as "bring the GIS to the data."

Walk towards the light…all are welcome…(or is it more like "You will be assimilated…resistance is futile"?)

Data Accuracy—Spatially and Elemental
If you talk to different current and potential users of GIS you will hear varying views on how "accurate" the theme is, or just what "accuracy" means:
"How was the location derived?"
"What datum?"
"Who determined what type of attribute?"
"Were they qualified to make that call?"
"How often is this updated?"
"Who updates it?"
"What the heck is that code there?"

Good metadata can answer most of these questions (assuming good metadata is at hand). GIS data is typically created and maintained by a combination of folks versed in one or more areas of expertise, as listed in Figure 1.

As with many other good intentions, and for the reasons already stated concerning the creation of legacy GIS themes, it was often only the middle group that had a direct hand in the creation of early themes. But even when there were truly collaborative efforts, resultant data often got mixed in with themes of questionable origin.

SPATIAL ACCURACY—one can state a location to umpteen significant figures, but there may be no measurable confidence in its actual location, or even if the right element has been identified.

DATA ACCURACY–even if we are able to gauge the confidence we have in the attribute values assigned to a feature, are there enough appropriate attributes to facilitate other downstream uses? No blame game here, just accuracy in reporting. More important now is how well can the theme accommodate upgrades and improvements?

Maintaining, Augmenting, Updating, and Improving
It was costly to create the original theme. But per unit of feature (think about effort per unit, like per parcel for an assessor’s theme) it is dramatically costlier to update, augment, manipulate, and correct an existing theme for the basic reasons of interdependence. Common theme classes and hierarchical themes within a class are illustrated in Figure 2, with a closer look at hierarchical topology in Figure 3.

Themes are interdependent in spatial relationships, whether they are linked topologies or not, and by the labyrinth of their respective relational databases (Figure 4).

Challenges—No Data Funnel, No Funds, No Fun

Like a Rubik’s cube, if you try to adjust one element of a theme it will inevitably affect another. Even if we had standardized CAD data, fully attributed for export to GIS, the opportunity is often missed for one or more of the following reasons:

"The original funding client did not budget for maintenance of the theme."

"If I alter theme A, then that would mess up the location of items in theme B."

"The complex custom application we built to query B and C would be compromised unless we update the parent/child relationships with the adjacent infrastructure represented by themes D through K."

"When I get enough data to justify updating the whole tile, then I’ll do it."

"Why don’t you just do all of your engineering in [enter GIS software name here]?"

It may be tough to justify a costly update involving a singular or small number of features. But what if a wide, swift, and constant flow of data were to be offered? Then a good cost/benefit analysis could be floated for an ongoing update program.

Other challenges facing integration of engineering data lie in its own legacy product: the plan set. The primary form of conveyance for engineering geometry is still the "signature set" of drawings that rely on conventions of layer, lineweight, linetype, symbol, labels, and dimensions (which seem to vary wildly from jurisdiction to jurisdiction, shop to shop, and drafter to drafter… but I digress). The drawing itself is (somewhat correctly) viewed by many in the GIS field as graphically rich, but data poor.

Google this: "national CAD standards." Surprise! There are many flavors of these multiple "national standards," and few in officially sanctioned publications. For more fun, Google "Digital Submission Standards CAD," and look at some promising examples (though far too few). Some of the new software suites do provide tools for implementing standards and CAD-GIS integration plans, once they are developed.

Opportunities—Grabbing the Data Before it Evaporates

In any business process, there are instances where decisions are made that will effect change either immediately, or at some later time, in the state of the enterprise knowledge base.

Example: During the drainage design of a new facility, the design engineer decides that the current main along the adjacent street needs to be replaced. If the layer, symbol, and/or attribute of the drawing element carried a record of the decision specifics, through the life cycle of plan-set and as-built, it could be mined for the GIS.

This is not a revolutionary concept; this sort of thing has been in place in some industries, like major utilities, for decades. It has been achieved by a work order, checkout, as-built, posting, and reconciliation track. Unfortunately this model is far too simple to accommodate complex multi-theme data from the civil engineering process.

Solutions–Spatial and Data Alchemy

I’ll not exactly be going out on a limb by making the statement: "Formats are fast becoming irrelevant."

Options for format conversions, which years ago we viewed with awe, are growing geometrically (or rather algebraically as the conditions of the industry change in a self-perpetuating von Neumann machine-like spiral…but I digress…. anyhow, Google that one, too).

One big glaring solution is to create and maintain the entire GIS wholly and entirely in a CAD environment. Dealing with legacy GIS environments (and mentality) is the main obstacle there.

Oracle and other big-honking-database solutions can now be accessed with tools offered natively in many civil/GIS products. This delivers on the promise of the future model for a Geo-spatially Enabled Enterprise Knowledge Base.

There are also a number of third party developers that provide CAD-GIS middle-ware (C-Plan, Ford Data, Spatial-Info, Haestad, Hitachi, to name only a few) that allow the same enterprise data to be accessed, manipulated, analyzed, and updated from a number of different CAD and GIS platforms, even simultaneously. This delivers on the promise that someday "it won’t matter what format it is stored in, we can all finally work with the same data."

Autodesk-MAP, FME, LandXML, and Open GIS formats are examples of tools tracking this promising trend. Don’t sweat the details on the technical side; it is still mainly the process management that proves the biggest challenge.

Key Events in the Civil Engineering Data Cycle
There are IT/IS systems, which already support many of the individual business processes within each phase. Byproducts of many of these processes are ripe for mining.

Example: When creating a terrain model, a TIN boundary makes a handy addition to a project status theme: it provides a closed polygon that locates a current project and to which other project data can be linked.

CAD layering standards are most often utilized when mapping to GIS themes. But rather than trying to map layer-to-theme on a one-to-one basis, expand those options to layer-linetype-color-other combinations, or parse the layer name to populate respective attribute fields within a theme.
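As a rough sketch of what such a mapping could look like, the Python below keys the target theme on a layer-and-linetype combination and parses the rest of the layer name into attribute fields. The layer pattern (DISCIPLINE-FEATURE-SIZE), the rule table, and the theme names are all hypothetical; a real shop would substitute its own layering standard.

```python
# Hypothetical rule table: (layer discipline prefix, linetype) -> GIS theme name.
THEME_RULES = {
    ("SD", "CONTINUOUS"): "storm_drain_existing",
    ("SD", "DASHED"): "storm_drain_proposed",
    ("WA", "CONTINUOUS"): "water_main_existing",
}


def cad_to_gis(layer, linetype):
    """Map one CAD entity's layer and linetype to a theme plus attributes.

    Assumes a made-up layer naming pattern, DISCIPLINE-FEATURE-SIZE,
    for example "SD-PIPE-12". Unmatched combinations fall into an
    "unclassified" theme rather than being dropped.
    """
    parts = layer.upper().split("-")
    discipline = parts[0]
    theme = THEME_RULES.get((discipline, linetype.upper()), "unclassified")

    # Populate attribute fields from whatever the layer name carries.
    attributes = {"discipline": discipline}
    if len(parts) > 1:
        attributes["feature"] = parts[1]
    if len(parts) > 2 and parts[2].isdigit():
        attributes["size"] = int(parts[2])
    return theme, attributes


theme, attributes = cad_to_gis("SD-PIPE-12", "Dashed")
print(theme, attributes)
```

Color, lineweight, or block name could be added to the rule key in the same way; the point is only that one drawing convention can feed several attribute fields instead of a single theme name.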

By creating more elaborate custom (perhaps Visual Basic) tools to manage more complex combinations of CAD conventions, you start populating even more attribute fields within the resultant themes, and start to provide added value to the linework you are handing over. The survey industry has a lot of competition in the area of GIS data acquisition. Right or wrong, it may take more effort in providing a "value added" component to our products than in trying to legislate a level playing field.

Spatial Component Improvements—Not Just for the Surveyors

Many heated "discussions" between GIS and surveying folks might be avoided if the GIS metadata (assuming that metadata has been provided and not "cooked") included a bit more useable detail about the circumstances that established the location component. Along with methods, an accuracy reporting convention like the FGDC standards (Google this: "FGDC accuracy standards") should be mandatory, or endanger one’s license (oops, what license? I digress).
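For readers who have not worked with the FGDC convention, the sketch below shows the arithmetic behind an NSSDA-style accuracy statement (FGDC Geospatial Positioning Accuracy Standards, Part 3). It assumes you already have the coordinate differences between the data set and an independent, more accurate source at a set of check points, and it uses the simple case where the x and y errors are of similar size. The function names and sample numbers are mine, for illustration only.

```python
from math import sqrt


def rmse(errors):
    """Root mean square of a list of coordinate differences."""
    return sqrt(sum(e * e for e in errors) / len(errors))


def nssda_horizontal(dx, dy):
    """Horizontal accuracy at the 95% confidence level.

    Uses the NSSDA factor 1.7308 applied to the radial RMSE, which
    applies when RMSE in x and RMSE in y are approximately equal.
    """
    rmse_r = sqrt(rmse(dx) ** 2 + rmse(dy) ** 2)
    return 1.7308 * rmse_r


def nssda_vertical(dz):
    """Vertical accuracy at the 95% confidence level (factor 1.9600)."""
    return 1.9600 * rmse(dz)


# Made-up check-point differences, in meters (data set minus check survey).
dx = [0.10, -0.10, 0.10, -0.10]
dy = [0.10, -0.10, -0.10, 0.10]
dz = [0.50, -0.50]

print(f"Tested {nssda_horizontal(dx, dy):.2f} m horizontal accuracy at 95% confidence level")
print(f"Tested {nssda_vertical(dz):.2f} m vertical accuracy at 95% confidence level")
```

A statement of that form in the metadata tells the next user far more than a coordinate carried to umpteen decimal places.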

Perhaps the survey/engineering industries could set the precedent and incorporate a simple numeric addition to all CAD layering standards to include the accuracy band. We are not limited to eight characters anymore.
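One possible shape for such an addition, sketched in Python: a short suffix on the layer name that records which accuracy band the linework falls in. The band limits, the "-A" suffix, and the code 9 for "unknown or worse" are purely hypothetical choices made for this example, not part of any published standard.

```python
# Hypothetical accuracy bands: (band code, upper limit of horizontal accuracy in meters).
BANDS = [(1, 0.05), (2, 0.5), (3, 5.0)]
UNKNOWN_BAND = 9  # anything looser than the last band


def band_for(accuracy_m):
    """Return the first band whose limit covers the stated accuracy."""
    for code, limit in BANDS:
        if accuracy_m <= limit:
            return code
    return UNKNOWN_BAND


def tag_layer(layer, accuracy_m):
    """Append an accuracy-band suffix to a layer name, e.g. SD-PIPE -> SD-PIPE-A1."""
    return f"{layer}-A{band_for(accuracy_m)}"


def read_band(layer):
    """Return the band code from a tagged layer name, or None if untagged."""
    suffix = layer.rsplit("-", 1)[-1]
    if len(suffix) == 2 and suffix[0] == "A" and suffix[1].isdigit():
        return int(suffix[1])
    return None


print(tag_layer("SD-PIPE", 0.03))
print(read_band("SD-PIPE-A2"))
```

Because the band travels with the layer name, it survives any export that preserves layers, and a GIS import routine can turn it straight into an attribute field.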

Location is now a less expensive commodity. Within a few years there will be real-time differential GPS networks in major metropolitan areas of the U.S. (as there are for entire countries elsewhere). These can provide real-time corrections to brief GPS observations to yield centimeter horizontal accuracy, and about twice that vertically…via cellular to a GPS rover. Go to the Trimble and Leica websites to see examples.

Add to that more photogrammetric and remote sensing solutions and ground-based laser scans; the options have grown tremendously even in the last year. One can now collect improved location information concurrently with almost any field operation (kind of scary actually…but I digress).

Attribute Improvements
GIS is viewed as data rich, CAD is viewed as data poor. That may have been the case in the 80s when CAD was mainly used as a drafting tool, but with the integration of Computer Aided Engineering (CAE) many folks can’t imagine working in an environment without these process specific tools. These provide a wealth of data that can be mined without additional user tasks.

Project Data, process event benchmarks, engineering specifics (pipeworks, alignments, terrain models, grade, slope analysis, etc.) are generated along the way, while working with those neat-o civil packages. One could proactively pass along data from these and append that old-school list of standard GIS themes. Much of this data is stored in ODBC compliant databases, ripe for mining.

Many engineering projects develop their cost estimates and construction bid item lists from an in-house or public domain database. These provide a rich list of data about individual project elements: material type, manufacturer, specific dimensions, along with many other details that may not traditionally have been included in legacy GIS themes.

By simply linking a single code to each project element during drafting, this could add all of these attributes to the export (and speed up cost estimates).

Example: The designer creates a closed polygon to show the extents of a region that will receive some geotextile fabric. By adding the appropriate standard Bid Item Code from the company database, he gets a total cost by area value from the closed polygon and has preloaded a lot of attribute data.
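The geotextile example can be sketched in a few lines of Python. The bid item table, its code "GEO-100", and the unit cost are invented for illustration; the area comes from the standard shoelace formula for a simple closed polygon, with coordinates assumed to be in meters.

```python
def polygon_area(vertices):
    """Area of a simple closed polygon (shoelace formula).

    `vertices` is a list of (x, y) pairs in order around the boundary;
    the closing segment back to the first vertex is implied.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical slice of a company bid item database.
BID_ITEMS = {
    "GEO-100": {"description": "Geotextile fabric", "unit": "m2", "unit_cost": 2.50},
}


def cost_feature(vertices, bid_code):
    """Attach bid item attributes and a total cost to a polygon feature."""
    item = BID_ITEMS[bid_code]
    area = polygon_area(vertices)
    return {
        "bid_code": bid_code,
        **item,
        "area": area,
        "total_cost": area * item["unit_cost"],
    }


# A 20 m by 10 m rectangle of fabric.
feature = cost_feature([(0, 0), (20, 0), (20, 10), (0, 10)], "GEO-100")
print(feature["area"], feature["total_cost"])  # prints "200.0 500.0"
```

The returned dictionary is both the cost estimate line and a ready-made attribute record for the exported feature, which is the double payoff the single code buys.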

Proactivity—The Stewards Revolt
With the advances in user-friendliness of the new-look GIS and its new open source/format model, would it not be optimal for an expert in a particular field to be the creator and data steward of a theme?

GIS professionals in general have operated responsibly and in the interests of their clients to produce a reasonably homogeneous product over the last few decades, and this despite having no formal certification or licensing, and just about no rules, regulations, or statutes. This is a laudable achievement, but it does behoove the industry to respect that many of the occasional "nattering nabobs of negativism" about GIS speak from the perspective of being in professions that are strictly regulated and often put their licenses on the line with their products.

Shadow Themes
The business processes may start to provide a constant flow of updated data, but perhaps, as stated in previous sections, it is cost prohibitive to quickly yet surgically update the existing themes.

One alternative is to create a set of Shadow Themes. When the update files start piling up in a "to-be-processed-later" silo, they could be periodically merged together as companion themes to their real-GIS counterpart. Kind of a "Spatial Conscience" for the existing GIS.
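A minimal sketch of the Shadow Theme idea, in Python: queued updates are merged into a companion data set keyed by feature, the newest update per feature wins, and the production theme is left untouched. The dictionary keys (feature_id, received, attributes) and the sample pipe data are hypothetical.

```python
def build_shadow_theme(pending_updates):
    """Merge queued updates into a companion ("shadow") theme.

    Each update is a dict with hypothetical keys: feature_id, received
    (an ISO date string, so plain string sorting is chronological), and
    attributes. Later updates overwrite earlier ones for the same feature.
    """
    shadow = {}
    for update in sorted(pending_updates, key=lambda u: u["received"]):
        shadow[update["feature_id"]] = update
    return shadow


def disagreements(production, shadow):
    """List feature ids where the shadow theme differs from production."""
    return sorted(
        fid
        for fid, update in shadow.items()
        if production.get(fid) != update["attributes"]
    )


# Made-up contents of the "to-be-processed-later" silo.
pending = [
    {"feature_id": "P1", "received": "2004-03-01", "attributes": {"dia": 12}},
    {"feature_id": "P1", "received": "2004-01-15", "attributes": {"dia": 10}},
    {"feature_id": "P2", "received": "2004-02-01", "attributes": {"dia": 8}},
]
production = {"P1": {"dia": 10}, "P2": {"dia": 8}}

shadow = build_shadow_theme(pending)
print(disagreements(production, shadow))  # features the real theme has wrong or out of date
```

The list of disagreements is the "conscience" part: it shows where the existing theme is known to be stale, without forcing anyone to perform the costly surgical update right away.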

Format Irrelevancy–No Data Gets Left Behind
Again, it should not matter what platform or business process some data came from; there are so many options now to grab just about anything digital and turn it into a useful theme.

I am not advocating tossing any and all into the mix, but it is disheartening to see decades of what could have been wonderful resources lost along the way due to format and platform restrictions.

Footnotes: No More Sour Notes and Scapegoats
If y’all are kind in your feedback to the editor, perhaps I will be given an opportunity to outline some specific technical solutions in future issues of this publication, now that I got the preliminary ranting and raving out of the way.

Gavin Schrock is a surveyor and GIS Analyst for Seattle Public Utilities, where he focuses on using digital data to improve the cost ratios for engineering projects. He has worked in surveying, mapping, and GIS for 23 years in the civil, utility, and mapping disciplines. He has published in these fields and has taught surveying, GIS, and data management at local, state, national, and international conferences.
