Providing comments, context and analysis about data portability - a service of the DataPortability Project

Our comment to the FCC on “Data Portability and its relationship to broadband”

Posted: December 9th, 2009 | Author: Elias Bizannes | Filed under: Official comment | 0 Comments

Today we officially sent comment to the FCC on “Data Portability and its relationship to broadband”. The team laboured hard over the weekend, as we only found out about this late last week, but we managed to get something together that I hope will be of value to the FCC. (You can check the filing status here.) Below is a copy of the PDF we submitted.

———————–

TITLE: Comments – NBP Public Notice #21

Docket: GN Docket Nos. 09-47, 09-51, and 09-137

This has been submitted on behalf of the DataPortability Project: www.dataportability.org

Submitted by:

  • Elias Bizannes, Acting Chair of the Board of Directors, DataPortability Project
  • Alisa Leonard, Head of Communications, DataPortability Project

Additional content contributions from the following people:

  • Steve Repetti, Board Member (Secretary), DataPortability Project
  • Brady Brim-DeForest, Board Member (Treasurer), DataPortability Project
  • Anthony Broad-Crawford, Board Member, DataPortability Project
  • Phil Wolff, Board Member, DataPortability Project

1.    Government data transparency. Data transparency refers to making data public and easily accessible over the Internet. There are many pieces of legislation requiring the publication of Federal government information. This legislation typically requires the publication of data on an agency’s website. One recent initiative seeks to establish a central repository of government data. We seek comment on the potential benefits and pitfalls of increased data transparency.

a.    What efficiencies can be gained through easing accessibility to public government information?

  • Reduced administrative hurdles. Having data readily available will reduce the perceived effort to leverage that data, and allow innovators to react more quickly.
  • Decreased administration. Encouraging a more direct relationship between the data source and the end user reduces the government resources needed to administer the data.
  • Faster turnaround. Making the relationship between a developer and the data more direct means things that need to be changed can be changed much faster. Rather than relying on a third party (in the form of an agency official), the developer can work directly with the data to enact changes.
  • Increased accuracy. The direct relationship with data sources means dependent applications of the data will react in real time. For example, if emergency data is made available that has some inaccuracy, the update can be propagated more quickly across the constituents that leverage that data.
  • Reduced redundancy and increased normalization of data. Multiple agencies may have their own copies of data that often fail to consistently reflect changes and newer information as it becomes available.  The principal concepts of data portability can be used to minimize and mitigate the issue by providing a common format and exchange mechanism for the integration, dissemination, and normalization of data, often in real time, such that the cumulative information resources are accurate and timely.
  • Increased utility of data. The more data exposed for public consumption the more insights and analysis that can be drawn from it. The ability to easily ingest and manipulate data from government sources increases the inherent value of the information that it contains.
  • Increased assimilation and extension of data.  The more accessible the data is to third parties the easier it is to extend and remix with proprietary data.  This allows third parties to improve their offerings as well as increase the potential for the insights and data to return to the public sector.

b.    Are there examples of innovative products or services provided by the private sector that rely upon the use of easily accessible government information?

  • Phone applications that can inform people of public transport information. In San Francisco and in many other cities, buses can be tracked along a map in real time, with estimated times of arrival on Google Maps for the iPhone. The scheduling information as well as the GPS of the buses allows for better planning and decision making by residents.
  • The New York Times last year announced a set of APIs (the first being for campaign finance data: http://open.blogs.nytimes.com/2008/10/14/announcing-the-new-york-times-campaign-finance-api/) that allow people to access data about a variety of issues. Developers can then query these APIs and generate unique information. The increased availability of open data reduces the reliance on the mass media, which has traditionally held the position of public “watch dog” that keeps governments and elected officials accountable. Now, web applications can leverage public data to provide the same public usefulness, allowing for more transparency and engagement.
  • Mashup Platforms. An entire support infrastructure has emerged that facilitates the combination of multiple data sources in innovative ways to produce value beyond any single data source. Aggregator sites, such as programmableweb.com (and even “app stores” and “object repositories”), provide access to resources that can be combined in numerous useful ways.  Beyond that, independent advocacy groups, such as the OpenAjax Alliance, provide specifications, protocols, and core software components whose sole purpose is to provide application and data integration in quantifiable and secure environments.  In this fashion, the diversity and volume of government data becomes a valuable resource for the creation of useful mashups and meta-applications. It also empowers individuals, companies, educational and governmental organizations to utilize the information in advanced, timely, and innovative ways.
  • Non-profit information. The IRS makes an Exempt Organizations IRS Master File Data service (http://www.irs.gov/taxstats/charitablestats/) available to the public. This data set, available in simple ASCII and proprietary Excel formats, powers a number of private sector database services, such as GuideStar and Charity Navigator, that track the activities and status of non-profit corporations.
  • The very successful EveryBlock: http://everyblock.com/ (previously chicagocrime.org) tracks events that occur in people’s neighborhoods. To quote the service: “In many cases, this information is already on the Web but is buried in hard-to-find government databases.”
  • Health and Life Science information.  The National Library of Medicine makes available several data sets in multiple formats such as CSV, XML, and JSON for consuming applications to include, extend, and enhance.  This includes but is not limited to national Clinical Trial information, publication databases, semantic ontologies, and genomic information.

c. Federal government data are available in many formats. In what formats should this data be made available over the Internet? How should open data standards inform policy for data transparency?

  • Standards are constantly evolving and the government should be aware that supporting one particular technological solution is a mistake. In the two years the DataPortability Project has been formally monitoring and advocating Open Standards (and popularised the phrase ‘data portability’ in order to simplify market perception about existing solutions) we have witnessed dozens of changes in this landscape. Fortunately for the purposes of government data, there are relatively simple solutions such as XML and now increasingly JSON. We highly encourage the government to support structured data formats such as the technically superior RDF, as well as the more popular microformats.
  • Government data just as effectively could be made available via APIs, which reduce the need for storing the data in a specific format and allow developers to programmatically access the data remotely (or even export the data in a desired format based on the API). However, APIs should never be the only solution: if a service goes down, that data becomes inaccessible. It is therefore important that standards for data export are also available.
  • Open Standards provide a common format for the interchange and interoperability of information. Market evolution in open formats constantly filters out the extraneous and focuses and enhances best practices.  Numerous existing open formats provide efficient distribution of data, such as XML, RSS, and initiatives involving the semantic web – even the upcoming HTML version 5 has embedded functionality for data discovery, distribution, and utilization. Moreover, the prevalence of APIs (via Ajax, RESTful interfaces, etc.) provides abstraction layers between data providers and data consumers, all of which facilitate the efficient integration and consumption of data.
  • It is imperative that federal government data be made available via a variety of open standards and open source formats. Non-proprietary standards allow for the interoperability of information and prevent data from being unnecessarily siloed — increasing efficiency of data consumption and manipulation.
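To illustrate the point about offering several open formats alongside any API, here is a minimal Python sketch that exports one small record set as JSON, CSV and XML using only the standard library. The field names and sample values are invented for illustration and do not come from any real government data set; the function names are likewise hypothetical.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"agency": "Example Agency A", "year": "2009", "grants": "12"},
    {"agency": "Example Agency B", "year": "2009", "grants": "7"},
]


def to_json(records):
    """Serialise the records as a JSON array of objects."""
    return json.dumps(records, indent=2)


def to_csv(records):
    """Serialise the records as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_xml(records):
    """Serialise the records as a simple XML document, one element per field."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for key, value in record.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(RECORDS))
    print(to_csv(RECORDS))
    print(to_xml(RECORDS))
```

Because every export is generated from the same source records, a downloadable file in any of these formats remains usable even when a live API is unavailable.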

d.    How does data transparency relate to application development? Are there potential efficiencies to be gained through an increase in government data transparency?

  • Data ultimately is at the core of every application, and the Federal Government is arguably the largest provider and consumer of information. Timely access of this data is inherently useful to government, business, academia, consumers, and even our world partners. Understanding the structure, organization, and accessibility of information radically increases the ability to build robust, and often real-time, applications in efficient, timely, and cost-effective ways. Data Portability makes it easy to access and utilize information without direct knowledge of the underlying mechanisms and methodologies required to create and maintain the information.
  • The more data that is made publicly available by the Federal government, the more applications utilizing those data sets will be developed. This not only increases efficiencies across the marketplace, but will also result in unique and potentially very valuable discovery of trends and assumptions based on the combination of multiple data sets that were previously segregated.

e.    To what extent would increased data transparency affect intra-agency processes, intergovernmental coordination, and civic participation?

  • Increased data transparency has the ability to empower both the private and public sectors to more accurately engage elements of the population in civic participation.
  • Two issues that constantly affect process, coordination, and participation in data transparency and data portability are data discovery and data normalization. Discovery addresses the idea that there can be no sharing of data if the interested parties are unaware of data availability or its underlying structure and access methodology. Normalization is a larger issue and it relates to data replication and maintenance. For example, simple contact information for an entity could exist in numerous locations, making it easy to access and utilize the information. However, the very fact that it is replicated in multiple locations increases the likelihood of incorrect information being stored. The more replication sites, the more difficult it is to make sure the data always stays in sync whenever a change is recorded.
  • Data transparency would enable greater intra-agency collaboration and the dissemination of insights and information across multiple, and sometimes seemingly unrelated agency constituents. This reduces latency in knowledge gathering and increases the collective usefulness of agencies, enhances their perceived value to the public, and invites greater civic participation.
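The normalization problem described above can be made concrete with a small Python sketch. It assumes three hypothetical agencies each hold their own copy of one entity's contact record, and it reports which fields have drifted out of sync. The agency names, fields and values are invented for illustration.

```python
# Hypothetical copies of one entity's contact record held by three agencies.
COPIES = {
    "agency_a": {"name": "Example Org", "phone": "555-0100"},
    "agency_b": {"name": "Example Org", "phone": "555-0100"},
    "agency_c": {"name": "Example Org", "phone": "555-0199"},  # stale copy
}


def out_of_sync_fields(copies):
    """Return the field names whose values differ between at least two copies."""
    fields = set()
    for record in copies.values():
        fields.update(record)
    return sorted(
        field
        for field in fields
        if len({record.get(field) for record in copies.values()}) > 1
    )


if __name__ == "__main__":
    print(out_of_sync_fields(COPIES))
```

The more copies there are, the more comparisons of this kind are needed, which is the cost that a single shared source with a common exchange format avoids.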

f.    To what extent do existing regulations inhibit or promote government data transparency?

  • The scope of data created, consumed, and distributed by the US Government is broad in nature and ranges from highly secure to freely accessible. Many different regulations govern the access and interaction of such data, and, while broad-scale regulations such as FIPS 199 and those of NIST and the OMB apply globally, often individual agencies provide their own unique requirements and regulations. The combination of all of this provides a layer of complexity that injects confusion into current data transparency policies, clouding the ability to actually use valuable data. Data transparency within government would greatly benefit from clarity in defining the requirements for data use.

g.    What impact do developments in data transparency have with respect to broadband deployment, adoption, and use?

  • Increased data transparency, portability, and availability impacts broadband from both the demand and utilization perspectives. On the one hand, it provides a compelling reason for the availability and use of broadband. Rich data exists, including: text, images, imaging, video, audio, animation, modeling, live content, teleconferencing, remote access, and more, that becomes accessible through data transparency policy. On the other hand, it is ubiquitous access to these very rich data types that quickly fills the existing data pipeline.
  • Data transparency and accessibility will significantly benefit from an aggressive broadband policy. Currently, only 50.8% of US households are served by broadband, and in terms of Internet speed the US lags significantly behind such countries as Latvia, Romania, and South Korea (see Scientific American, Nov 09, pgs 76,77,79). Even the definition of broadband is somewhat nebulous. The current US definition of broadband is a download capability of at least 0.77 Mb per second. This pales when compared to the average advertised broadband download speed of 92 Mb per second for Japan.  The recent allocation of $7.2 billion to the infrastructure issue is a step in the right direction; however, as with data transparency, more needs to be done to maintain global competitiveness.

h.    What are the potential benefits to making data more accessible?

  • Innovation. Data are objects that lack meaning, whereas information is simply the relationships between data objects. Contextualizing data together generates new value (i.e., “1248” is data, as is the English word “year”; together, however, they give meaning to each other). Similarly, knowledge is derived through the application of information – and the more information that can be applied, the more knowledge it generates. It’s logical to assume, then, that the more data is accessible, the more opportunity there is for value in the form of information to be generated.
  • Responsiveness.  The world exists in real-time. Huge quantities of data are captured every second on a global basis, and complex decisions increasingly rely on such information. This is above and beyond the vast existing stores of information currently residing within government databases. Sound data transparency policies and methodologies radically enhance the ability and timeliness in interacting with this data.
  • Discovery and Openness. The government maintains a huge store of information that is readily available intra-agency as well as to business and individuals. However, you cannot use information that you do not know about – or do not know how to interact with (i.e. data structure and access methodology).  A concerted effort to make data more accessible benefits all and provides access to useful resources that may otherwise be lost or go unnoticed.

i.    What potential pitfalls exist when increasing data transparency?

  • Increased initial cost to transform systems and serving costs to allow other entities to use data either through data downloads or API access.
  • Ongoing cost to support existing and new data and services in formats that are acceptable with current, emerging, and deprecated industry standards.
  • Not adopting the current spectrum of standards and resolving to support only a limited subset.  For example, electing to support only a RESTful JSON API for access to data could prohibit consumption by both the private and public sectors.

j.    What privacy and confidentiality concerns might arise due to an increase in data transparency and what, if any, privacy safeguards are needed to protect against the misuse of personal information?

k.    What types of personal information should be protected from disclosure?

  • Public data that is identifiable to a specific person. Key to the vision of data portability is that it is privacy-respecting interoperability. If the data does not make a claim about a specific person, then such data should remain transparent and public. (Care should be taken, however, when combining data such as gender, zip code and birthday, a combination that is unique for 87% of the US population: http://www.eff.org/deeplinks/2009/09/what-information-personally-identifiable)
  • Protection of personal information that can unintentionally disclose a user’s identity is paramount. Even if social security numbers are not unique on their own, they should always be protected, because simply narrowing down a subset by location can end up identifying a unique person.
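The risk of combining otherwise harmless fields can be checked mechanically before a data set is published. The Python sketch below counts how many records share each combination of gender, zip code and birth date; any combination that occurs only once points to a single person. The records are made up for illustration, and the function names are hypothetical. The size of the smallest group is the “k” used in the k-anonymity literature.

```python
from collections import Counter

# Hypothetical records: (gender, zip code, birth date). Values are invented.
PEOPLE = [
    ("F", "94110", "1975-03-02"),
    ("F", "94110", "1975-03-02"),
    ("M", "94110", "1980-11-17"),
    ("F", "10001", "1962-07-30"),
]


def unique_fraction(rows):
    """Return the fraction of rows whose field combination appears exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


def smallest_group(rows):
    """Return the size of the smallest group of rows sharing one combination."""
    return min(Counter(rows).values())


if __name__ == "__main__":
    print(unique_fraction(PEOPLE))
    print(smallest_group(PEOPLE))
```

A smallest group of 1 means at least one person is uniquely identifiable from these three fields alone, even though no name or social security number appears in the data.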

2.   Cloud computing. When considering the portability of data, we also consider the processes through which data are moved. In this context, we seek comment on how to identify and understand cloud computing as a model for technology provisioning.

a.    The National Institute of Standards and Technology defines cloud computing as “a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”    Does this definition accurately capture the concept of cloud computing?

  • That is an appropriate definition, although, like democracy, it can mean many things to different people. The key point of cloud computing is “ubiquity”: the ubiquity of three key trends, namely connectivity, computing and data. It means data can be accessed from anywhere through any device, with computing resources at will.

b.    What types of cloud computing exist (e.g., public, hybrid, and internal) and what are the legal and regulatory implications of their use?

  • The Cloud as a trend has slowly evolved in the technology industry and it is only recently that the private cloud has been discussed as a parallel (or sub) trend.
  • From the hardware point of view, the key issue is the environment and energy use. Data centers require a huge amount of energy, and may be the developed world’s next largest driver of carbon emissions.
  • From a data point of view, the key issue is privacy. Possession is considered nine-tenths of the law, and so there is a real risk for individuals and enterprises that do not have control of their data in the physical sense. Entities should not feel held hostage just because they choose to store their data remotely.
  • Cloud computing provides for an immense amount of resources to be brought to bear on a specific problem set with a minimal capital investment on behalf of the problem solver. This increased convenience carries with it the risk of data lock-in. Portability of data specifically in cloud computing environments is critical.

c.    Can present broadband network configurations handle a large-scale shift in bandwidth usage that a rapid adoption of cloud computing might cause?

  • The impact the iPhone has had on 3G networks is a clear example that there is still a lot of investment to be made, even in dense residential areas which are thought to be the best wired. The reality is cloud computing is a long-term investment, and it has coped well enough since the explosion of online media consumption (primarily video) which has been a heavy demand on networks. The issue with cloud computing is less about the technology and more about culture. An entire paradigm shift has occurred in computing, and it is taking the industry, let alone the consumer market, some time to adapt to this new world. So whilst network configurations still need more investment, we believe that improvements can be made over time as the larger cultural adoption of cloud computing evolves.

d.    How does cloud computing affect the reliability, scalability, security, and sustainability of information and data?

  • Cloud computing exposes data to a specific set of risks, but these risks can be mitigated with proper resource provisioning and the establishment of adequate security and interoperability standards.

e.    To what extent can the federal government leverage cloud solutions to improve intra- agency processes, intergovernmental coordination, and civic participation?

  • Cloud computing allows for a single fact, single place and single service environments.  These cloud environments accelerate speed to market within organizations as well as across government organizations.  Additionally, exposing these clouds externally will allow these same benefits to organizations within both the public and private sectors.

f.    What impact do developments in cloud computing have with respect to broadband deployment, adoption, and use?

  • No comment provided

g.    How can various parties leverage cloud computing to obtain economic or social efficiencies? Is it possible to quantify the efficiencies gained?

  • No comment provided

h.    To what extent are consumers protected by industry self-regulation (e.g., the Cloud Computing Manifesto), and to what extent might additional protections be needed?

  • Traditionally, technology companies have believed that hoarding consumer data was a competitive advantage. We believe this is neither correct nor appropriate, and while our advocacy efforts have helped shift the market’s perception, we still believe there is considerable risk. In particular, there is opportunity for a monopolistic environment that makes it difficult for new market entrants to join in once the market has matured.
  • While markets naturally self-regulate, the broadband environment has several critical weaknesses that could easily be exploited by the companies that control consumer access to the internet and that have the ability to impose network management policies on their network infrastructures that could adversely affect the free-flow of information. The protection of the neutrality of the ‘mobile internet’ is of specific importance.

i.    What specific privacy concerns are there with user data and cloud computing?

  • Who has access to the data is the key, both from a consumer point of view on what they can reuse elsewhere and on what permissions exist over that data and who else can access it. We believe there needs to be a stronger model that allows consumers to dictate not only the access they have over their data, but also who else has access to it.

j.    What precautions should government agencies take to prevent disclosure of personal information when providing data?

  • To be open-minded with what technologies are used and not get carried away with buzzwords. OpenID is a great identity solution and we are encouraged by the government’s adoption; however, we also believe the support for OpenID should not come at the expense of other more mature identity solutions such as Information Cards and SAML.
  • Government agencies should put measures in place that give consumers access to what data they have. By being aware of what data a government agency stores for a person, it creates more transparency and decisions can be made on how that data is used and what exactly is further stored.
  • Government agencies should take the approach of both a centralized and decentralized view on data. It should try to consolidate the personal information records it requires of people independent of any one agency, and apply a fine-grain permissions model that allows a person to dictate how other agencies interact with their data store. Further, government agencies should try to store as little data as possible, and encourage remote access of data.

k.    Is the use of cloud computing a net positive to the environment? Are there specific studies that quantify the environmental impact of cloud computing?

  • We have come across some studies but believe more need to occur. We believe, however, that a fully functioning emissions trading scheme, like the one being proposed in Australia, will offset the risk of increased emissions, as the carbon will be factored into the cost structure of data centers.

3.   Identity Management and Government Service Delivery. Data held by the government may be personally sensitive or confidential. In this context, we seek comment on identity management as it relates to the provision of services where individuals either provide data to the government or access data that are personally sensitive or confidential.

a.    What is the current state of identity management in the federal, state, local and Tribal government?

  • No comment provided

b.    What is the spectrum of online identity credentialing required for access to online services from the government and non-governmental entities?

  • No comment provided

c.    What identity management technologies currently exist and what are their applications?

  • There is an entire slate of technologies, but three dominate in our view and have differing strengths and weaknesses. OpenID is by far the most popular, and it’s a light-weight solution that is good for low-level identity. On the opposite side of the spectrum is SAML, which is an enterprise-grade solution that is highly complex and secure. Information Cards have really emerged as an interesting solution as they bridge the desktop with the web.
  • OpenID provides a compelling solution for identity management online.  It is a registration and single sign-on protocol that lets users register and login to OpenID-enabled websites using their own choice of OpenID identifier. One key advantage of OpenID is that it requires no client-side software—it works with any standard Internet browser.

d.    How have HSPD-12 implementation efforts affected the efficiency of the federal government?

  • No comment provided

e.    What identity management technologies are available in the private sector? What are their applications?

  • No comment provided

f.    What impact do developments in identity management, such as Open ID, have with respect to broadband deployment, adoption, and use?

  • We do not believe identity management has an impact on broadband deployment. Where it does have an impact is in integrating people into this important infrastructure of our society. Identity management is a complicated issue, where no one solution or vendor should dominate.
  • Identity management should remain separate and distinct from network management.

g.    What are the potential benefits of a coordinated nationwide identity management schema?

  • Little. Identity is a personal thing, and trying to centralize it too much may cause more harm than good. Instead, coordination should focus on encouraging interoperability. Various efforts, like the one the Internet Society is currently funding, work to make OpenID more compatible with SAML. By encouraging interoperability, the government does not favor one approach but instead sets guidelines for a constantly evolving space. Setting these guidelines also gives more control to people to choose their own solution, and the flexibility to move to other solutions if they so choose.
  • The benefits would be outweighed by the risks. A coordinated nationwide identity management schema would make for one point of failure (the same way that the Social Security Number system has been exploited to engage in fraud) and has the potential to create far-reaching negative implications for privacy and freedom of speech.

h.    What are the potential pitfalls of a coordinated nationwide identity management strategy?

  • Technological obsolescence is the biggest issue, as nothing stays fixed and this is a rapidly changing marketplace. There is a considerable risk of infringing on the privacy of individuals, so it is key that a strategy avoids a centralized solution and favors one that mimics the core architecture of the Internet and follows its decentralized model.

i.    What specific privacy concerns are there with identity management strategies?

  • Not allowing people to control their own identity management means they cannot control how the rest of the world perceives them. Identity should be decentralized, not owned by anyone, and recognized as an innately personal thing. Some people on the social network Facebook group their friends into buckets like “work” and “close friends”, primarily to keep their non-work persona from ruining their controlled work persona, but we should also recognize that other people don’t care and don’t bother. Identity, and in particular privacy, means different things to different people. So an identity-management solution must be a user-driven one, and not one dictated from above.

j.    What types of personal information should be protected from disclosure?

  • Let people decide that for themselves. And if in doubt, protect it. There is no answer that can be reflective of all: what some regard as abusive to disclose (like the previous criminal history of someone trying to lead a new life), others may believe is crucial to make publicly available (like the community of people around that person who may deem them a threat). Delegate the decision to individuals to manage.

Lobby against the password anti-pattern

Posted: July 16th, 2009 | Author: Elias Bizannes | Filed under: Open Standards | 0 Comments

Back in January, I wrote how it’s time to criminalise the password anti-pattern. The password anti-pattern is where service A requires you to enter your service B username and password so service A can act for you with your B service. It teaches you how to be phished, and the only way to resolve it is to change your password. It’s also no longer necessary as lots of sites now have OAuth support, including Twitter.

For example, popular service TwitPic requires you to enter your Twitter username and password in order to access the service. This is an example of the anti-pattern that needs to be lobbied against.

A service that does it right is 140 Mafia, which uses the Twitter implementation of OAuth – it allows you to link the two services together with your permission without having to give over your service B password to service A.

Tom Morris now maintains a list of Twitter services that continue with this anti-pattern. Encourage them to switch to the open standard OAuth or just avoid ‘em. For Data Portability to exist, service providers have a responsibility to be mindful of your privacy – and they should not insist on you handing over your password to other services.
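The difference between the two approaches can be sketched in a few lines of Python. This is a toy model of delegated access, not the OAuth protocol itself; the class and method names (`ServiceB`, `issue_token`, `revoke`, `act`) are hypothetical and chosen only for illustration. It shows the property that matters here: service A holds a revocable token limited to specific actions, and never sees your service B password.

```python
import secrets


class ServiceB:
    """Toy stand-in for the service that holds the user's account (hypothetical)."""

    def __init__(self, username, password):
        self._username = username
        self._password = password
        self._tokens = {}  # token -> set of allowed actions

    def issue_token(self, username, password, scope):
        """The user logs in directly with service B and approves a limited scope."""
        if (username, password) != (self._username, self._password):
            raise PermissionError("bad credentials")
        token = secrets.token_hex(16)
        self._tokens[token] = set(scope)
        return token

    def revoke(self, token):
        """The user can withdraw access without changing the password."""
        self._tokens.pop(token, None)

    def act(self, token, action):
        """Service A presents its token; return True only if the action is in scope."""
        return action in self._tokens.get(token, set())


if __name__ == "__main__":
    service_b = ServiceB("alice", "s3cret")
    token = service_b.issue_token("alice", "s3cret", ["post_photo"])
    print(service_b.act(token, "post_photo"))
    service_b.revoke(token)
    print(service_b.act(token, "post_photo"))
```

With the password anti-pattern, the only way to cut off service A is to change your password everywhere; with a token, you revoke one grant and everything else keeps working.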


So what has the DataPortability Project been doing?

Posted: March 30th, 2009 | Author: Elias Bizannes | Filed under: Community | 0 Comments

Tomorrow, we will be holding our first quarterly plenary meeting – where the community at large can question the DataPortability Project’s leadership. As a member of the plenary (membership is free – contact the Steering Group’s Secretary Steve Repetti for more information), you can make binding decisions as well.

So since this is our first meeting, here is a summary of what we have been working on; what we are working on; and what we will be working on. Feel free to look also at our weekly Steering meeting minutes if you wish to dig into the detail.

Governance and workflow taskforce
- Time frame: May 2008 – July 2008
- Purpose: to reboot the DataPortability Project, and convert it from a mailing list that generated discussion into a structured organisation focused on deliverables. While the output looks small, it took over 50 person-hours to produce.
- Status: Completed
- Task force homepage: http://wiki.dataportability.org/x/EgsR
- Final output: http://wiki.dataportability.org/x/ooUj

Governance task force
- Time frame: August 2008 – September 2008
- Purpose: to revise the governance framework in light of issues identified after its adoption, and to rewrite it with a fresh perspective, looking at it as a whole
- Status: Completed
- Task force homepage: http://wiki.dataportability.org/x/OIEt
- Final output: http://wiki.dataportability.org/x/OIAt

Vision and mission
- Time frame: August 2008 – October 2008
- Purpose: to provide a policy DNA for the organisation that would allow subsequent work to be built on
- Status: Completed
- Task force homepage: http://wiki.dataportability.org/x/XIAt
- Final output: http://wiki.dataportability.org/x/SoA0

Logo and branding task force
- Time frame: August 2008 – March 2009 (scope of work was extended in November 2008)
- Purpose: to create a logo and associated branding for the DataPortability Project, including an upgrade of the website. On two previous occasions, we have received lawsuits due to our logo, so this third time, we had a much more robust process
- Status: Completed.
- Task force homepage: http://wiki.dataportability.org/x/XYUj
- Task force output: Everything (literally) that can be seen on http://dataportability.org as of today

Legal entity taskforce
- Time frame: Started November 2008, still open
- Purpose: Investigate and prepare for the incorporation of the DataPortability Project as a non-profit entity
- Status: Work completed, documents signed, waiting for clearance
- Task force homepage: http://wiki.dataportability.org/x/JoBE

EULA and ToS task force
- Time frame: Started November 2008, still open
- Purpose: to create a set of outputs that can be incorporated into legal documents, reflecting the vision of the DataPortability Project
- Status: draft report will be available in April 2009
- Task force homepage: http://wiki.dataportability.org/x/mIRE

Standards landscape task force
- Time frame: Started March 2009, still open
- Purpose: to create an analyst report that can contextualise all the standards that advance data portability, with recommendations to improve identified weaknesses as well as an explanation of how they all fit together
- Status: recruiting for contributors, work has not begun yet
- Task force homepage: http://wiki.dataportability.org/x/BgJg

Healthcare taskforce
- Time frame: started November 2008, still open
- Purpose: To provide analysis on how data portability will impact the health sector
- Status: Use cases have been identified, currently assigning work for more focused research
- Task force homepage: http://wiki.dataportability.org/x/C4A8

Service provider grid tool
- Time frame: August 2008, still open
- Purpose: An API enabled web service that allows the community to monitor what companies/web services use what standards
- Status: Prototype done. Delayed because developer staff are needed to complete version one.
- Task force homepage: http://wiki.dataportability.org/x/YIAt

Volunteer positions task force
- Time frame: November 2008, still open
- Purpose: to recruit, train, and fill open positions for a community manager and a set of analysts
- Status: have recruited a community manager, currently training a half dozen “analysts” around the world on a variety of topics in different jurisdictions
- Task force homepage: http://wiki.dataportability.org/x/P4I0

Other activities not reflected in the task force work above.
Believe it or not – talking and meetings take up a lot of time!
- discussions over IDTBD membership
- discussions over creating the W3C Social Web incubator group
- discussions over Identity Commons membership
- first actual elections (for vacancies) since the enactment of the governance framework
- podcasts, regular reports, blog (you’re looking at it!), RSS feed of the best data portability posts on the web (note: due to the Magnolia crash, we lost all our bookmarked items. However we have now started a new friendfeed room as a temporary way to share links), and conferences (upcoming ones available here)
- as part of the new website, we have also recently (this month) done a massive reorganisation of the wiki which took 20-30 person-hours to do
- we also have three “unofficial task forces” which will emerge when/if they are ready: one on publishing, another on business models, a third on market research.
- We are about to create a dedicated group that can focus on governance, with its first task to determine a more efficient voting system

As you can see, we’ve made a heavy investment in the administrative side of things to prepare us, which is why we are now emerging and reaching out again to the community. We are now embarking on work we have been wanting to do for months, and hope to get a lot achieved this year in output, to be ready for 2010.

We are focused this year on a set of strategic goals, and our core activities going forward are on policy formulation and communications (which includes education).


Special Election results

Posted: March 24th, 2009 | Author: Elias Bizannes | Filed under: Election | 0 Comments

FROM: Elias Bizannes, DataPortability Project vice-chair, election returning officer
TO: DataPortability Project Members and Supporters
RE: Special Election results

Voting for the Special election of the Steering Group has closed.

A total of 30 votes were cast, of which 28 were recognised. Two votes for Givotovsky were disqualified as the voting member had not fulfilled the requirement of joining the plenary one week before voting, per section 4.16 of the by-laws.

The results were as follows:

Dan Brickley: 8
Anthony Broad-Crawford: 7
Nicholas R. Givotovsky: 5
Willem Kossen: 3
Jeremy LeBard: 2
Chris Lunt: 2
Mark Lizar: 1

Congratulations to Dan Brickley and Anthony Broad-Crawford who have been elected by the DataPortability Project’s plenary as the two newest Steering Group members.

These positions fill the vacancies left by J. Trent Adams and Brett McDowell, who resigned due to roles taken in other organisations that presented a conflict of interest.

A full election of the 12-person Steering Group will occur at the end of this year.


Time To Criminalize The Password Anti-pattern

Posted: January 4th, 2009 | Author: Elias Bizannes | Filed under: Open Standards | 0 Comments

Update: Twitter made another commitment today to adopting OAuth, which is great! However, they acknowledge that it won’t solve all problems (as we argue below) – nevertheless, these are positive steps towards eradicating the password anti-pattern.

In case you’ve never heard of it, Twitter is a micro-blogging service that is doing to communications what search did to information. It has exploded in popularity, and whether they find a revenue model or not – their impact is permanent and is leading the way for a new era of communications. I am one of their biggest fans and want to help them succeed. But I feel with their growth, propelled by loyal users like myself, we ought to let them know there are things that concern us.

The biggest issue is that whilst they enable data portability, they are doing it in an insecure way. As Chris Messina said, let’s make 2009 the year we see the end of the password anti-pattern. In this post, I will explain what that anti-pattern is and a way we can fix it. The biggest reason why Twitter is continuing with this anti-pattern (from my eyes) is that fixing it is seen as a usability issue. But as the screenshots below show, it isn’t. Just think of having a PIN code on your bank card: that’s a usability issue as well, but y’know, one of those good usability issues.

Twitter and Security: all we’ve heard in 2009 so far
Twitter is used to constant free PR, but this year two separate events occurred that could have been non-events (if they do what we ask).

The first was a third party that provided a feature people wanted. As Twitter has an Application Programming Interface (API), third parties can create mashups and therefore provide this functionality to Twitter users. However, because Twitter does not support delegated authentication, you need to enter your username and password. There are hundreds of third-party applications like this, and most are safe (we hope), but this particular site had put itself up for sale within 24 hours! And people couldn’t turn off the service – they had to change their password to do so.

The second incident, which occurred this last week, was an attempted phishing attack. Apparently, some users were being sent private messages telling them to visit a certain site which compromised their security. It’s ironic that Twitter tells you not to “share your private info”, but for you to get value out of their API for mash-ups and third-party tools, that’s exactly what you need to do – and it makes situations like this slightly more risky.

Fortunately, there are things that can be done to minimize the risk of your accounts getting hacked, and for you to never have to give up information about you that will compromise your security.

Delegated authorization
There is a solution to this situation. It’s free to support, simple to use, and in fact – Twitter’s team inspired its creation the other year. It’s an Open Standard called OAuth. There is plenty of material you can read on the web about this, and a good start is Eran Hammer-Lahav’s explanation of OAuth, followed by his three-part series for beginners if you want to dig a little deeper.

The basic concept is that it allows you to delegate authorization for use of an API. Huh?

I’ll illustrate this with an example. Let’s say you come across a Cool Product that allows you to do something unique with your Twitter account (say, being able to stream your Tweets through your e-mail client rather than you having to visit the Twitter website). As this Cool Product has no formal links to Twitter, for you to use it, it needs to pretend to be you. Therefore, it asks for your username and password. It knocks on Twitter’s API door, pretending to be you, and the Cool Product then gets access to your account to do the stuff you want to do with this third-party application. The problem with this approach, however, is that they can knock on Twitter’s door anytime pretending to be you – even when you don’t want them to.

With OAuth, it would be very different. Instead of you needing to provide your username and password, this Cool Product will say “Hey dude, I need to get some permissions – click this link to give it to me”. Then a request will be sent to Twitter’s API and Twitter will send you to a screen saying “hey dude, these third party dudes want access to your account – you cool with that?”. Then, with a simple click of the button, you can approve or deny access. Once approved, the Cool Product can then function – and you didn’t have to give up any private information like your password.
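
To make the contrast concrete, here is a minimal Python sketch of the idea of delegated authorization. It is a toy model, not the OAuth protocol itself and not Twitter's API: the `ToyService` class, its method names and the sample user are all invented for illustration. The point it tries to show is that the third-party app only ever holds a token, which the user can withdraw without changing their password.

```python
import secrets


class ToyService:
    """Stands in for a service such as Twitter. Hypothetical, not a real API."""

    def __init__(self):
        self._passwords = {}  # username -> password
        self._grants = {}     # token -> (username, app_name)

    def register(self, username, password):
        self._passwords[username] = password

    def approve(self, username, password, app_name):
        """The user logs in on the service's own screen and approves an app.

        The password is checked here, by the service, and is never shown
        to the third-party app. The app only receives the returned token.
        """
        if self._passwords.get(username) != password:
            raise PermissionError("bad credentials")
        token = secrets.token_hex(16)
        self._grants[token] = (username, app_name)
        return token

    def revoke(self, token):
        """The user withdraws one app's access; the password is unchanged."""
        self._grants.pop(token, None)

    def read_timeline(self, token):
        """An API call made by the app, using only its token."""
        if token not in self._grants:
            raise PermissionError("token not authorized")
        username, _app_name = self._grants[token]
        return f"timeline of {username}"


service = ToyService()
service.register("alice", "s3cret")

# Alice approves Cool Product on the service's own screen.
cool_product_token = service.approve("alice", "s3cret", "Cool Product")
print(service.read_timeline(cool_product_token))

# Later she changes her mind. Only this one app loses access.
service.revoke(cool_product_token)
```

With the password anti-pattern there is nothing like `revoke`: the only way to lock an app out is to change the password, which also locks out every other app that was given it.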

Here are some screen shots between another innovative start-up called FriendFeed and Google (who supports OAuth).

In this scenario, I want to add some more friends on my FriendFeed account. So I click on the option to invite them

friendfeed-import-address-book

When I click on “import from Gmail”, instead of having to type in my username and password to access my contacts, I simply get redirected to a screen. And because I’m permanently logged into my Gmail account, I don’t need to do anything else other than read and click “grant access” (otherwise, I would need to enter my Google credentials).

google-authentication.

Easy! Compare this to Facebook, another company that needs to think more proactively about its users’ security. If I want to add friends to my Facebook account, instead of redirecting me to the Google servers where I can grant access, it asks for my password.

facebook-find-your-friends-on-facebook

Next steps
As people on the web using web services, we’ve been forced to give up confidential information to get the value out of a service. We’ve forced ourselves to be okay with it with the sites we trust, but there are plenty of brands out there we don’t know whether to trust. But the thing is, this isn’t something we need to trust anyone with. With our health records and financial records accessible online, this isn’t just a matter of reputation risk but one of genuine identity risk.

There is a solution to this problem, and now that you recognize it, demand that web services give you data portability in a secure way. Let’s make 2009 the year that we kill the password anti-pattern. While easier said than done, it’s a fix that will curb some of the security issues: we hope Twitter hurries up in changing their API to require OAuth.

Twitter – we know you’ve been meaning to do it, but hopefully you really mean it this time. Because quite frankly, we as users are fueling your growth, and promoting your API without some sort of safeguard like this is irresponsible (especially as these attacks prove you are going all the more mainstream). I don’t want to tell you how to run your business – it doesn’t have to be OAuth – but for crying out loud, give us some security for our digital identity.

One final Big But
Twitter has strong arguments to not jump onto OAuth, some of which they’ve said publicly and some that I think might be issues. They certainly have a competent team, and whilst they know the benefits, they also understand the fact that jumping onto OAuth or any type of delegated authorization will not fix all problems. However it’s a start. Here are some issues:

  1. OAuth is only good for services over web browsers. It is a real pain (or virtually impossible without some hacks) to use it on the client side (i.e., on the desktop) and on mobile sites – and Twitter has a lot of users on both. The response to that is that some security is better than none – it’s not a big deal that users will have to authorize applications via the browser (and Twitter can just point a hairy finger at the standards community so they can fix it). At least give users the option to determine how secure they want to be.
  2. Twitter will need to support multiple authentication systems due to the limitations of OAuth. This is a real issue, but not an impossible one to manage, and the community is certainly willing to help out. My main point is that this is actually a security issue that matters, and because the cost is borne by the users and not the company, it’s not given equal recognition.
  3. The user experience will suffer for users. Well, the reason users will “suffer” is that instead of just entering their password, they will now have to click a few buttons on different screens. As the screenshots above show, the user experience is not affected that much, and while it is a valid point, I think it’s more a “different” user experience.
  4. The user experience will suffer for developers. Yes it will, because instead of the lazy option of just asking users to hand over their password, they actually have to write some code to get the appropriate permissions happening. But this is a core reason why the DataPortability Project supports widely-supported Open Standards, as it minimizes the costs to business: once a developer learns it once, they know it for all future application development. And like I said above: a bank putting a code on your bank card is more painful for your bank, but better that pain than going without one, which poses risks for users.
  5. It will not prevent phishing. Lachlan Hardy gives a useful explanation of why (notice all Australians give the best explanations ;) ): theoretically, people will be more prone to phishing attacks because of the ease. This is a valid point, as people potentially will just blindly click away to their doom, but let’s also remember there will be a lot more control. A site can monitor suspect services to alert users, there is a full digital paper trail, and a user can revoke their authorization at any time. Certainly a bit of control is better than none, and by reducing the weak spots in the chain, more targeted efforts can be made to ensure users’ security is not compromised.



The “why” of Open Standards

Posted: December 29th, 2008 | Author: Elias Bizannes | Filed under: Analysis, Open Standards | 0 Comments

There’s a great book that you need to read if this whole data portability world perplexes you, called Wikinomics: How Mass Collaboration Changes Everything by Don Tapscott and Anthony D. Williams. Suffice to say, it’s one of those Must Read books, but what I want to share is a story the boys wrote that clearly illustrates one of its central theses.

Hurricane Katrina ripped into the coastlines of Louisiana, Mississippi, and Alabama on Monday, August 29 2005 causing more human misery and economic damage than any storm on record…

…Yet, out of the chaos, and in the face of official ineptitude, came a powerful story of how an ad hoc team of volunteers from across the country came together to concoct an information management solution that far surpassed anything the local, state, and federal response teams had mustered. At the heart of the volunteer effort was a central repository of survivor information called Katrinalist. This impromptu Web site compiled survivor data from all over the web into a searchable format that made it easy to identify and locate friends and family members…

The story goes on to say that the relevant information for each person (name, location, age) was collated into a central database, and that the team behind this PeopleFinder project even created an open data spec called the PeopleFinder Interchange Format. The big challenge, however, was scraping information from bulletin board posts, which typically read “My father Joe was working in New Orleans and hadn’t evacuated. He was living in Jefferson Parish. We don’t know if he’s okay. Please call me or Mom in Houston. Lisa Brown, Houston, TX.”

What followed was a volunteer effort to enter the data into the database manually, which thousands of people later did. But there could have been a dramatic difference if there had been an agreed-upon standard for collecting and sharing data. Imagine if Facebook decided to participate, to allow certain details to be linked to a central identity, which could then be linked to all the data collected by relief agencies like the Red Cross. We would have interoperability of data, minimizing effort and saving time when the information is potentially time-critical.

Having organisations store their data in a certain format to export and access is not killing their competitive advantage (I would argue it helps it). And if people understood the value of Open Standards, then should another disaster of this scale occur (heaven forbid), the power of the Internet could be unleashed to potentially save some lives.
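
As a rough illustration of why an agreed format matters, here is a small Python sketch. The `PersonRecord` fields are hypothetical, loosely based on the name, location and age mentioned above; this is not the actual PeopleFinder Interchange Format, and the sample values (including the age) are made up. It assumes two sources export records in the same shape, so they can be combined and searched without anyone retyping a free-text post.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PersonRecord:
    # Hypothetical fields for illustration only, not a real spec.
    name: str
    location: str
    age: int
    source: str


def export_records(records):
    """Serialize records into the shared format (JSON, in this sketch)."""
    return json.dumps([asdict(r) for r in records])


def import_records(payload):
    """Read records that any participating site exported."""
    return [PersonRecord(**item) for item in json.loads(payload)]


def find_by_name(records, name):
    """Case-insensitive exact match on the name field."""
    wanted = name.strip().lower()
    return [r for r in records if r.name.lower() == wanted]


# Made-up sample data from two imaginary sources.
board = [PersonRecord("Joe Brown", "Jefferson Parish, LA", 61, "bulletin board")]
agency = [PersonRecord("Joe Brown", "Houston, TX shelter", 61, "relief agency")]

combined = import_records(export_records(board)) + import_records(export_records(agency))
for record in find_by_name(combined, "joe brown"):
    print(record.source, "->", record.location)
```

The free-text bulletin board post needs a human to pick out the name and place; a shared record shape lets software do the matching.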


OpenID Announces Results To Its First Election

Posted: December 27th, 2008 | Author: Elias Bizannes | Filed under: Election | 0 Comments

At the DataPortability Project, we’ve closely monitored the first OpenID Foundation election and are proud of the maturity, leadership and appropriate procedure shown by the OpenID Foundation in this area.

It is reported that 175 of the 217 eligible members voted. Congratulations to all candidates for their efforts.

Elected to serve 2-year terms:

Snorri Giorgetti 106
Nat Sakimura 89
Chris Messina 76
David Recordon 76

Elected to serve 1-year terms:

Eric Sachs 62
Scott Kveton 57
Brian Kissel 55

Not elected:

Eran Hammer-Lahav 54
Joseph Smarr 52
Allen Tom 42
Luke Shepard 37
Johannes Ernst 37
Dick Hardt 36
John Bradley 22
Martin Atkins 21
Mike Kirkwood 8
Peter Nixey 8

We thank all the candidates as the process revealed a lot of interesting discussion about what the future of OpenID should be. On behalf of the DataPortability Steering Group, I look forward to working with our colleagues at the new Foundation’s board.

Elias Bizannes
Vice-chair, DataPortability.Org Steering Group