UPDATE: For a further enlightening discussion of the gaps in the USPTO full text database, please see the comment section of this post (click the word “Comments” where it appears at the very end of this post).
Recently, a message came over Carl Oppedahl’s PAIR discussion list highlighting a mysterious gap in the USPTO’s online patent database: data seemed to be missing for patent numbers between 6,363,527 and 6,412,112.
Rick Neifeld, of Neifeld IP Law, responded that his 1999 survey of the USPTO’s data had revealed many errors, as many of us have probably suspected for some time. Rick’s description of these errors is very interesting:
The dirt consisted of things as minor as numerous misspellings of assignee names, or HTML pages non compliant with HTML standards, to HTML text that could not be deconstructed into component sections due to HTML formatting errors, assignment records that were combined, corrupt, unreadable.
I absolutely expect our current crop of electronic patent databases to contain massive numbers of errors. We have to expect this, if only because of the sheer amount of information involved. Another reason might be that the economic model of patent data production does not really encourage the national patent offices to maintain high-quality electronic patent data. Millions and millions of patent documents pour out of government-run institutions (which have no profit motive for perfection), and errors are bound to be rampant.
Filed under: Items of Interest | Tagged: Carl Oppedahl, CAS, Commissioner Stoll, data errors, DGENE, dwpi, GenomeQuest, OCR, PAIR, PIUG, STN, Thomson Reuters, Tony Trippe, USPTO | 11 Comments »