Today we officially sent comments to the FCC on “Data Portability and its relationship to broadband”. The team laboured hard over the weekend, as we only found out about this late last week, but we managed to put together something that I hope will be of value to the FCC. (You can check the filing status here.) Below is a copy of the PDF we submitted.
TITLE: Comments – NBP Public Notice #21
Docket: GN Docket Nos. 09-47, 09-51, and 09-137
This has been submitted on behalf of the DataPortability Project: www.dataportability.org
- Elias Bizannes, Acting Chair of the Board of Directors, DataPortability Project
- Alisa Leonard, Head of Communications, DataPortability Project
Additional content contributions from the following people:
- Steve Repetti, Board Member (Secretary), DataPortability Project
- Brady Brim-DeForest, Board Member (Treasurer), DataPortability Project
- Anthony Broad-Crawford, Board Member, DataPortability Project
- Phil Wolff, Board Member, DataPortability Project
1. Government data transparency. Data transparency refers to making data public and easily accessible over the Internet. There are many pieces of legislation requiring the publication of Federal government information. This legislation typically requires the publication of data on an agency’s website. One recent initiative seeks to establish a central repository of government data. We seek comment on the potential benefits and pitfalls of increased data transparency.
a. What efficiencies can be gained through easing accessibility to public government information?
- Reduced administrative hurdles. Having data readily available will reduce the perceived effort to leverage that data, and allow innovators to react more quickly.
- Decreased administration. Encouraging a more direct relationship between the data source and the end user reduces the government resources needed to administer the data.
- Faster turnaround. A more direct relationship between a developer and the data means that necessary changes can occur much faster. Rather than relying on a third party (in the form of an agency official), the developer can work directly with the data to enact changes.
- Increased accuracy. The direct relationship with data sources means applications that depend on the data can react in real time. For example, if emergency data is made available that has some inaccuracy, the correction can be propagated more quickly across the constituents that leverage that data.
- Reduced redundancy and increased normalization of data. Multiple agencies may have their own copies of data that often fail to consistently reflect changes and newer information as it becomes available. The principal concepts of data portability can be used to minimize and mitigate the issue by providing a common format and exchange mechanism for the integration, dissemination, and normalization of data, often in real time, such that the cumulative information resources are accurate and timely.
- Increased utility of data. The more data exposed for public consumption the more insights and analysis that can be drawn from it. The ability to easily ingest and manipulate data from government sources increases the inherent value of the information that it contains.
- Increased assimilation and extension of data. The more accessible the data is to third parties the easier it is to extend and remix with proprietary data. This allows third parties to improve their offerings as well as increase the potential for the insights and data to return to the public sector.
b. Are there examples of innovative products or services provided by the private sector that rely upon the use of easily accessible government information?
- Phone applications that can inform people of public transport information. In San Francisco and in many other cities, buses can be tracked along a map in real time, with estimated times of arrival on Google Maps for the iPhone. The scheduling information as well as the GPS of the buses allows for better planning and decision making by residents.
- The New York Times last year announced a set of APIs (the first one being campaign finance data: http://open.blogs.nytimes.com/2008/10/14/announcing-the-new-york-times-campaign-finance-api/) that allow people to access data about a variety of issues. Developers can then query these APIs and generate unique information. The increased availability of open data reduces the reliance on the mass media, who have traditionally held the position of public “watch dog” that keeps governments and elected officials accountable. Now, web applications can leverage public data to provide the same public usefulness, allowing for more transparency and engagement.
- Mashup Platforms. An entire support infrastructure has emerged that facilitates the combination of multiple data sources in innovative ways to produce value beyond any single data source. Aggregator sites, such as programmableweb.com (and even “app stores” and “object repositories”), provide access to resources that can be combined in numerous useful ways. Beyond that, independent advocacy groups, such as the OpenAjax Alliance, provide specifications, protocols, and core software components whose sole purpose is to provide application and data integration in quantifiable and secure environments. In this fashion, the diversity and volume of government data becomes a valuable resource for the creation of useful mashups and meta-applications. It also empowers individuals, companies, educational and governmental organizations to utilize the information in advanced, timely, and innovative ways.
- Non-profit information. The IRS makes an Exempt Organizations IRS Master File Data service (http://www.irs.gov/taxstats/charitablestats/) available to the public. This data set, available in simple ASCII and proprietary Excel formats, powers a number of private sector database services, such as GuideStar and Charity Navigator, that track the activities and status of non-profit corporations.
- The very successful EveryBlock: http://everyblock.com/ (previously chicagocrime.org) tracks events that occur in people’s neighborhoods. To quote the service: “In many cases, this information is already on the Web but is buried in hard-to-find government databases.”
- Health and Life Science information. The National Library of Medicine makes available several data sets in multiple formats such as CSV, XML, and JSON for consuming applications to include, extend, and enhance. This includes but is not limited to national Clinical Trial information, publication databases, semantic ontologies, and genomic information.
c. Federal government data are available in many formats. In what formats should this data be made available over the Internet? How should open data standards inform policy for data transparency?
- Standards are constantly evolving and the government should be aware that supporting one particular technological solution is a mistake. In the two years the DataPortability Project has been formally monitoring and advocating Open Standards (and has popularised the phrase ‘data portability’ in order to simplify market perception about existing solutions) we have witnessed dozens of changes in this landscape. Fortunately for the purposes of government data, there are relatively simple solutions such as XML and now increasingly JSON. We highly encourage the government to support structured data formats such as the technically superior RDF, as well as the more popular microformats.
- Government data could just as effectively be made available via APIs, which reduce the need for storing the data in a specific format and allow developers to programmatically access the data remotely (or even export the data in a desired format based on the API). However, APIs should never be the only solution: if a service goes down, that data becomes inaccessible. It is therefore important that standards for data export are also available.
- Open Standards provide a common format for the interchange and interoperability of information. Market evolution in open formats constantly filters out the extraneous and focuses and enhances best practices. Numerous existing open formats provide efficient distribution of data, such as XML, RSS, and initiatives involving the semantic web – even the upcoming HTML version 5 has embedded functionality for data discovery, distribution, and utilization. Moreover, the prevalence of APIs (via Ajax, RESTful interfaces, etc.) provides abstraction layers between data providers and data consumers, all of which facilitate the efficient integration and consumption of data.
- It is imperative that federal government data be made available via a variety of open standards and open source formats. Non-proprietary standards allow for the interoperability of information and prevent data from being unnecessarily siloed — increasing efficiency of data consumption and manipulation.
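As a rough illustration of the point above about offering data in more than one open format, the following Python sketch serializes the same records as JSON, CSV, and XML using only the standard library. The records, field names, and function names are hypothetical and chosen purely for illustration; they do not describe any actual government data set or API.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical records; agency and data set names are illustrative only.
RECORDS = [
    {"agency": "Example Agency A", "dataset": "transit-schedules", "year": 2009},
    {"agency": "Example Agency B", "dataset": "clinical-trials", "year": 2009},
]


def to_json(records):
    """Serialize the records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialize the records as CSV with a header row."""
    if not records:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records):
    """Serialize the records as a simple <records><record>...</record></records> tree."""
    root = ET.Element("records")
    for rec in records:
        item = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(RECORDS))
    print(to_csv(RECORDS))
    print(to_xml(RECORDS))
```

Keeping a single in-memory representation and generating each format from it is one way to avoid the formats drifting apart; a real publishing pipeline would also need schemas, versioning, and bulk download options.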
d. How does data transparency relate to application development? Are there potential efficiencies to be gained through an increase in government data transparency?
- Data ultimately is at the core of every application, and the Federal Government is arguably the largest provider and consumer of information. Timely access of this data is inherently useful to government, business, academia, consumers, and even our world partners. Understanding the structure, organization, and accessibility of information radically increases the ability to build robust, and often real-time, applications in efficient, timely, and cost-effective ways. Data Portability makes it easy to access and utilize information without direct knowledge of the underlying mechanisms and methodologies required to create and maintain the information.
- The more data that is made publicly available by the Federal government, the more applications utilizing those data sets will be developed. This not only increases efficiencies across the marketplace, but will also result in unique and potentially very valuable discovery of trends and assumptions based on the combination of multiple data sets that were previously segregated.
e. To what extent would increased data transparency affect intra-agency processes, intergovernmental coordination, and civic participation?
- Increased data transparency has the ability to empower both the private and public sectors to more accurately engage elements of the population in civic participation.
- Two issues that constantly affect process, coordination, and participation in data transparency and data portability are data discovery and data normalization. Discovery addresses the idea that there can be no sharing of data if the interested parties are unaware of data availability or its underlying structure and access methodology. Normalization is a larger issue and it relates to data replication and maintenance. For example, simple contact information for an entity could exist in numerous locations, making it easy to access and utilize the information. However, the very fact that it is replicated in multiple locations increases the likelihood of incorrect information being stored. The more replication sites, the more difficult it is to make sure the data always stays in sync whenever a change is recorded.
- Data transparency would enable greater intra-agency collaboration and the dissemination of insights and information across multiple, and sometimes seemingly unrelated agency constituents. This reduces latency in knowledge gathering and increases the collective usefulness of agencies, enhances their perceived value to the public, and invites greater civic participation.
f. To what extent do existing regulations inhibit or promote government data transparency?
- The scope of data created, consumed, and distributed by the US Government is broad in nature and ranges from highly secure to freely accessible. Many different regulations govern the access and interaction of such data, and, while broad-scale regulations such as FIPS 199 and those of NIST and the OMB apply globally, often individual agencies provide their own unique requirements and regulations. The combination of all of this provides a layer of complexity that injects confusion into current data transparency policies, clouding the ability to actually use valuable data. Data transparency within government would greatly benefit from clarity in defining the requirements for data use.
g. What impact do developments in data transparency have with respect to broadband deployment, adoption, and use?
- Increased data transparency, portability, and availability impacts broadband from both the demand and utilization perspectives. On the one hand, it provides a compelling reason for the availability and use of broadband. Rich data exists, including text, images, imaging, video, audio, animation, modeling, live content, teleconferencing, remote access, and more, that becomes accessible through data transparency policy. On the other hand, it is ubiquitous access to these very rich data types that quickly fills the existing data pipeline.
- Data transparency and accessibility will significantly benefit from an aggressive broadband policy. Currently, only 50.8% of US households are served by broadband, and in terms of Internet speed the US lags significantly behind such countries as Latvia, Romania, and South Korea (see Scientific American, Nov 09, pgs 76,77,79). Even the definition of broadband is somewhat nebulous. The current US definition of broadband is a download capability of at least 0.77 Mb per second. This pales when compared to the average advertised broadband download speed of 92 Mb per second for Japan. The recent allocation of $7.2 billion to the infrastructure issue is a step in the right direction; however, as with data transparency, more needs to be done to maintain global competitiveness.
h. What are the potential benefits to making data more accessible?
- Innovation. Data are objects that lack meaning, whereas information is simply the relationships between data objects. Contextualizing data together generates new value (i.e., “1248” is data, as is the English word “year”; together, however, they give meaning to each other). Similarly, knowledge is derived through the application of information, and the more information that can be applied, the more knowledge it generates. It is logical to assume, then, that the more data is accessible, the more opportunity there is to generate value in the form of information.
- Responsiveness. The world exists in real-time. Huge quantities of data are captured every second on a global basis, and complex decisions increasingly rely on such information. This is above and beyond the vast existing stores of information currently residing within government databases. Sound data transparency policies and methodologies radically enhance the ability and timeliness in interacting with this data.
- Discovery and Openness. The government maintains a huge store of information that is readily available intra-agency as well as to business and individuals. However, you cannot use information that you do not know about – or do not know how to interact with (i.e. data structure and access methodology). A concerted effort to make data more accessible benefits all and provides access to useful resources that may otherwise be lost or go unnoticed.
i. What potential pitfalls exist when increasing data transparency?
- Increased initial cost to transform systems and serving costs to allow other entities to use data either through data downloads or API access.
- Ongoing cost to support existing and new data and services in formats that are acceptable with current, emerging, and deprecated industry standards.
- Not adopting current spectrum of standards and resolving to only support a limited sub-set. For example, electing to only support a RESTful JSON API for access of data could prohibit consumption from both private and public sectors.
j. What privacy and confidentiality concerns might arise due to an increase in data transparency and what, if any, privacy safeguards are needed to protect against the misuse of personal information?
k. What types of personal information should be protected from disclosure?
- Public data that is identifiable to a specific person. Key to the vision of data portability is that it is privacy-respecting interoperability. If the data does not make a claim about a specific person, then such data should remain transparent and public. (Care should be taken, however, when combining data such as gender, zip code and birthday, a combination which is unique for 87% of the US population: http://www.eff.org/deeplinks/2009/09/what-information-personally-identifiable)
- Protection of personal information that can unintentionally disclose a user’s identity is paramount. Even if social security numbers are not unique, they should always be protected, as simply narrowing down a subset by location can end up identifying a unique person.
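The re-identification risk described above can be checked before a data set is published. The Python sketch below is a minimal illustration under stated assumptions: it measures what fraction of rows are unique on a chosen combination of fields (often called quasi-identifiers). The rows, field names, and the `unique_fraction` helper are hypothetical and are not drawn from any real data set or from the cited study.

```python
from collections import Counter


def unique_fraction(rows, quasi_identifiers):
    """Return the fraction of rows whose quasi-identifier combination appears exactly once."""
    if not rows:
        return 0.0
    keys = [tuple(row[q] for q in quasi_identifiers) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows)


# Hypothetical rows for illustration only.
ROWS = [
    {"gender": "F", "zip": "94103", "birthday": "1980-01-15"},
    {"gender": "F", "zip": "94103", "birthday": "1980-01-15"},
    {"gender": "M", "zip": "94103", "birthday": "1975-06-02"},
    {"gender": "F", "zip": "10001", "birthday": "1980-01-15"},
]

if __name__ == "__main__":
    # Each field alone may look harmless; the combination is what identifies people.
    print(unique_fraction(ROWS, ["gender"]))
    print(unique_fraction(ROWS, ["gender", "zip", "birthday"]))
```

A high unique fraction suggests the published fields should be generalized (for example, year of birth instead of full birthday) or withheld before release.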
2. Cloud computing. When considering the portability of data, we also consider the processes through which data are moved. In this context, we seek comment on how to identify and understand cloud computing as a model for technology provisioning.
a. The National Institute of Standards and Technology defines cloud computing as “a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” Does this definition accurately capture the concept of cloud computing?
- That is an appropriate definition, although, like democracy, it can mean many things to different people. The key point of cloud computing is “ubiquity”. It is the ubiquity of three key trends: connectivity, computing and data. It means data can be accessed from anywhere through any device, with computing resources available at will.
b. What types of cloud computing exist (e.g., public, hybrid, and internal) and what are the legal and regulatory implications of their use?
- The Cloud as a trend has slowly evolved in the technology industry and it is only recently that the private cloud has been discussed as a parallel (or sub) trend.
- From the hardware point of view, the key issue is the environment and energy use. Data centers require a huge amount of energy, and may be the developed world’s next largest driver of carbon emissions.
- From a data point of view, the key issue is privacy. Possession is considered nine-tenths of the law, and so there is a real risk for individuals and enterprises that do not have control of their data in the physical sense. Entities should not feel held hostage just because they choose to store their data remotely.
- Cloud computing provides for an immense amount of resources to be brought to bear on a specific problem set with a minimal capital investment on behalf of the problem solver. This increased convenience carries with it the risk of data lock-in. Portability of data specifically in cloud computing environments is critical.
c. Can present broadband network configurations handle a large-scale shift in bandwidth usage that a rapid adoption of cloud computing might cause?
- The impact the iPhone has had on 3G networks is a clear example that there is still a lot of investment to be made, even in dense residential areas which are thought to be the best wired. The reality is cloud computing is a long-term investment, and it has coped well enough since the explosion of online media consumption (primarily video), which has placed heavy demand on networks. The issue with cloud computing is less about the technology and more about culture. An entire paradigm shift has occurred in computing, and it is taking the industry, let alone the consumer market, some time to adapt to this new world. So whilst network configurations still need more investment, we believe that improvements can be made over time as the larger cultural adoption of cloud computing evolves.
d. How does cloud computing affect the reliability, scalability, security, and sustainability of information and data?
- Cloud computing exposes data to a specific set of risks, but these risks can be mitigated with proper resource provisioning and the establishment of adequate security and interoperability standards.
e. To what extent can the federal government leverage cloud solutions to improve intra-agency processes, intergovernmental coordination, and civic participation?
- Cloud computing allows for a single fact, single place and single service environments. These cloud environments accelerate speed to market within organizations as well as across government organizations. Additionally, exposing these clouds externally will allow these same benefits to organizations within both the public and private sectors.
f. What impact do developments in cloud computing have with respect to broadband deployment, adoption, and use?
- No comment provided
g. How can various parties leverage cloud computing to obtain economic or social efficiencies? Is it possible to quantify the efficiencies gained?
- No comment provided
h. To what extent are consumers protected by industry self-regulation (e.g., the Cloud Computing Manifesto), and to what extent might additional protections be needed?
- Traditionally, technology companies have believed that hoarding consumer data was a competitive advantage. We believe this is neither correct nor appropriate, and while our advocacy efforts have helped shift the market’s perception, we still believe there is considerable risk. In particular, there is opportunity for a monopolistic environment that makes it difficult for new entrants to join once the market has matured.
- While markets naturally self-regulate, the broadband environment has several critical weaknesses that could easily be exploited by the companies that control consumer access to the internet and that have the ability to impose network management policies on their network infrastructures that could adversely affect the free-flow of information. The protection of the neutrality of the ‘mobile internet’ is of specific importance.
i. What specific privacy concerns are there with user data and cloud computing?
- Who has access to the data is the key issue, both in terms of what consumers can reuse elsewhere and in terms of what permissions exist over that data and who else can access it. We believe there needs to be a stronger model that allows consumers to dictate not only the access they have to their data, but also who else has access to it.
j. What precautions should government agencies take to prevent disclosure of personal information when providing data?
- Be open-minded about which technologies are used and do not get carried away with buzzwords. OpenID is a great identity solution and we are encouraged by the government’s adoption of it; however, we also believe the support for OpenID should not come at the expense of other more mature identity solutions such as Information Cards and SAML.
- Government agencies should put measures in place that give consumers access to the data held about them. When people are aware of what data a government agency stores about them, there is more transparency, and decisions can be made on how that data is used and what exactly continues to be stored.
- Government agencies should take both a centralized and a decentralized view of data. They should try to consolidate the personal information records they require of people independently of any one agency, and apply a fine-grained permissions model that allows a person to dictate how other agencies interact with their data store. Further, government agencies should try to store as little data as possible, and encourage remote access of data.
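To make the idea of a person-controlled, fine-grained permissions model more concrete, here is a minimal Python sketch. It assumes a single consolidated record per person and per-field grants to named agencies; the `PersonalDataStore` class, its methods, and the agency and field names are all hypothetical, and a real system would also need authentication, auditing, and legal safeguards.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalDataStore:
    """A hypothetical consolidated record whose owner controls per-field access."""

    owner: str
    records: dict
    # Maps an agency name to the set of field names the owner has approved.
    grants: dict = field(default_factory=dict)

    def grant(self, agency, fields):
        """Allow an agency to read the given fields."""
        self.grants.setdefault(agency, set()).update(fields)

    def revoke(self, agency, fields):
        """Withdraw an agency's access to the given fields."""
        self.grants.get(agency, set()).difference_update(fields)

    def read(self, agency):
        """Return only the fields the owner has approved for this agency."""
        allowed = self.grants.get(agency, set())
        return {k: v for k, v in self.records.items() if k in allowed}


if __name__ == "__main__":
    store = PersonalDataStore(
        owner="example-person",
        records={"address": "1 Example St", "income": 50000},
    )
    store.grant("tax-agency", ["income"])
    print(store.read("tax-agency"))     # only the approved field
    print(store.read("health-agency"))  # nothing granted, so nothing returned
```

The default here is to deny: an agency sees nothing until the person grants access, which matches the suggestion that agencies store and expose as little data as possible.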
k. Is the use of cloud computing a net positive to the environment? Are there specific studies that quantify the environmental impact of cloud computing?
- We have come across some studies but believe more need to occur. We believe, however, that a fully functioning emissions trading scheme, like the one being proposed in Australia, will offset the risk of increased emissions, as the cost of carbon will be factored into the cost structure of data centers.
3. Identity Management and Government Service Delivery. Data held by the government may be personally sensitive or confidential. In this context, we seek comment on identity management as it relates to the provision of services where individuals either provide data to the government or access data that are personally sensitive or confidential.
a. What is the current state of identity management in the federal, state, local and Tribal government?
- No comment provided
b. What is the spectrum of online identity credentialing required for access to online services from the government and non-governmental entities?
- No comment provided
c. What identity management technologies currently exist and what are their applications?
- There is an entire slate of technologies, but three dominate in our view and have differing strengths and weaknesses. OpenID is by far the most popular, and it’s a light-weight solution that is good for low-level identity. On the opposite side of the spectrum is SAML, which is an enterprise-grade solution that is highly complex and secure. Information Cards have emerged as an interesting solution as they bridge the desktop with the web.
- OpenID provides a compelling solution for identity management online. It is a registration and single sign-on protocol that lets users register and login to OpenID-enabled websites using their own choice of OpenID identifier. One key advantage of OpenID is that it requires no client-side software—it works with any standard Internet browser.
d. How have HSPD-12 implementation efforts affected the efficiency of the federal
- No comment provided
e. What identity management technologies are available in the private sector? What are
- No comment provided
f. What impact do developments in identity management, such as Open ID, have with respect to broadband deployment, adoption, and use?
- We do not believe identity management has an impact on broadband deployment. Where it does have an impact is in integrating people into this important infrastructure of our society. Identity management is a complicated issue, where no one solution or vendor should dominate.
- Identity management should remain separate and distinct from network management.
g. What are the potential benefits of a coordinated nationwide identity management
- Little. Identity is a personal thing, and trying to centralize it too much may cause more harm than good. Instead, the focus for coordination should be on encouraging interoperability. Various efforts, like those the Internet Society is currently funding, work to make OpenID more compatible with SAML. By encouraging interoperability, the government does not favor one approach but instead sets guidelines for a constantly evolving space. Setting these guidelines also gives more control to people to choose their own solution, and the flexibility to move to other solutions if they so choose.
- The benefits would be out-weighed by the risks. A coordinated nationwide identity management schema would make for one point of failure (the same way that the Social Security Number system has been exploited to engage in fraud) and has the potential to create far-reaching negative implications for privacy and freedom of speech.
h. What are the potential pitfalls of a coordinated nationwide identity management strategy?
- Technological obsolescence is the biggest issue, as nothing stays fixed and this is a rapidly changing marketplace. There is a considerable risk of infringing on the privacy of individuals, so it is key that a strategy avoids a centralized solution and favors one that mimics the core architecture of the Internet and follows its decentralized model.
i. What specific privacy concerns are there with identity management strategies?
- Not allowing people to control their own identity management means they cannot control how the rest of the world perceives them. Identity should be decentralized, not owned by anyone, and recognized as an innately personal thing. Some people on the social network Facebook group their friends into buckets like “work” and “close friends”, primarily to stop their non-work persona from ruining their controlled work persona; we should also recognize that other people don’t care and don’t bother. Identity, and in particular privacy, mean different things to different people. So any identity-management solution must be a user-driven one, and not one dictated from above.
j. What types of personal information should be protected from disclosure?
- Let people decide that for themselves. And if in doubt, protect it. There is no answer that can be reflective of all: what some regard as abuse to have disclosed (like the previous criminal history of someone trying to lead a new life), others may believe is crucial to make publicly available (for example, the community of people around that person, who may deem them a threat). Delegate the decision to individuals to manage.