Chapter 2. Design Decisions in Organizing Systems
Robert J. Glushko
Table of Contents
2.3. Why Is It Being Organized?
2.4. How Much Is It Being Organized?
2.5. When Is It Being Organized?
2.6. How (or by Whom) Is It Organized?
2.7. Where is it being Organized?
2.8. Key Points in Chapter Two
Introduction
A set of resources is transformed by an organizing system when the resources are described or arranged to enable interactions with them. Explicitly or by default, this requires many interdependent decisions about the identities of resources; their names, descriptions and other properties; the classes, relations, structures and collections in which they participate; and the people or technologies interacting with them.
One important contribution of the idea of the organizing system is that it moves beyond the debate about the definitions of “things,” “documents,” and “information,” with the unifying concept of “resource” while acknowledging that “what is being organized” is just one of the questions or dimensions that need to be considered. These decisions are deeply intertwined, but it is easier to introduce them as if they were independent.
We introduce six groups of design questions, itemizing the most important dimensions in each group:
What is being organized? What is the scope and scale of the domain? What is the mixture of physical things, digital things, and information about things in the organizing system? Is the organizing system being designed to create a new resource collection, catalog an existing and closed resource collection, or manage a collection in which resources are continually added or deleted? Are the resources unique, or are they interchangeable members of a category? Do they follow a predictable “life cycle” with a “useful life”? Does the organizing system use the interaction resources created through its use, or are these interaction resources extracted and aggregated for use by another organizing system? (the section called “What Is Being Organized?”)
Why is it being organized? What interactions or services will be supported, and for whom? Are the uses and users known or unknown? Are the users primarily people or computational processes? Does the organizing system need to satisfy personal, social, or institutional goals? (the section called “Why Is It Being Organized?”)
How much is it being organized? What is the extent, granularity, or explicitness of description, classification, or relational structure being imposed? What organizing principles guide the organization? Are all resources organized to the same degree, or is the organization sparse and non-uniform? (the section called “How Much Is It Being Organized?”)
When is it being organized? Is the organization imposed on resources when they are created, when they become part of the collection, when interactions occur with them, just in case, just in time, all the time? Is any of this organizing mandated by law or shaped by industry practices or cultural tradition? (the section called “When Is It Being Organized?”)
How or by whom, or by what computational processes, is it being organized? Is the organization being performed by individuals, by informal groups, by formal groups, by professionals, by automated methods? Are the organizers also the users? Are there rules or roles that govern the organizing activities of different individuals or groups? (the section called “How (or by Whom) Is It Organized?”)
Where is it being organized? Is the resource location constrained by design or by regulation? Are the resources positioned in a static location? Are the resources in transit or in motion? Does their location depend on other parameters, such as time? (the section called “Where is it being Organized?”)
Classifying organizing systems according to the kind of resources they contain is the most obvious and traditional approach. We can also classify organizing systems by their dominant purposes, by their intended user community, or other ways. No single fixed set of categories is sufficient by itself to capture the commonalities and contrasts between organizing systems.
This framework for describing and comparing organizing systems overcomes some of the biases and conservatism built into familiar categories like libraries, museums, and archives, while enabling us to describe them as design patterns that embody characteristic configurations of design choices. We can then use these patterns to support inter-disciplinary work that cuts across categories and applies knowledge about familiar domains to unfamiliar ones. A dimensional perspective makes it easier to translate between category- and discipline-specific vocabularies so that people from different disciplines can have mutually intelligible discussions about their organizing activities. They might realize that they have much in common, and they might be working on similar or even the same problems.
A faceted or dimensional perspective acknowledges the diversity of instances of collection types and provides a generative, forward-looking framework for describing hybrid types that do not cleanly fit into the familiar categories. Even though it might differ from the conventional categories on some dimensions, an organizing system can be designed and understood by its family resemblance on the basis of its similarities on other dimensions to a familiar type of resource collection.
Thinking of organizing systems as points or regions in a design space makes it easier to invent new or more specialized types of collections and their associated interactions. If we think metaphorically of this design space as a map of organizing systems, the empty regions or “white space” between the densely-populated centers of the traditional categories represent organizing systems that do not yet exist. We can consider the properties of an organizing system that could occupy that white space and analyze the technology, process, or policy innovations that might be required to let us build it there. We can reason by analogy to identify and apply the principles used in one organizing system to understand or design others. [23]
What Is Being Organized?
“What is difficult to identify is difficult to describe and therefore difficult to organize.”
Before we can begin to organize any resource we often need to identify it. It might seem straightforward to devise an organizing system around tangible resources, but we must be careful not to assume what a resource is. In different situations, the same “thing” can be treated as a unique item, one of many equivalent members of a broad category, or a component of an item rather than as an item on its own. For example, in a museum collection, a handmade, carved chess piece might be a separately identified item, identified as part of a set of carved chess pieces, or treated as one of the 33 unidentified components of an item identified as a chess set (including the board). When merchants assign a stock-keeping unit (SKU) to identify the things they sell, that SKU can be associated with a unique item, sets of items treated as equivalent for inventory or billing purposes, or intangible things like warranties.
Organization mechanisms like aisle signs, store directories and library card catalogs are embedded in the same physical environment as the resources being organized. But when these mechanisms or surrogates are digitized, the new capabilities that they enable create design challenges. This is because a digital organizing system can be designed and operated according to more abstract and less constraining principles than an organizing system that only contains physical resources. A single physical resource can only be in one place at a time, and interactions with it are constrained by its size, location, and other properties. In contrast, digital copies and surrogates can exist in many places at once and enable searching, sorting, and other interactions with an efficiency and scale impossible for tangible things.
When the resources being organized consist of information content, deciding on the unit of organization is challenging because it might be necessary to look beyond physical properties and consider conceptual or intellectual equivalence. A high school student told to study Shakespeare’s play Macbeth might treat any printed copy or web version as equivalent, and might even try to outwit the teacher by watching a film adaptation of the play. To the student, all versions of Macbeth seem to be the same resource, but librarians and scholars make much finer distinctions.[24]
Archival organizing systems implement a distinctive answer to the question of what is being organized. Archives are a type of collection that focuses on resources created by a particular person, organization, or institution, often during a particular time period. This means that archives have themselves been previously organized as a result of the processes that created and used them. The “original order” of the resources in an archive embodies the implicit or explicit organizing system of the person or entity that created the documents; it is treated as an essential part of the meaning of the collection. As a result, the unit of organization for archival collections is the fonds—the original arrangement or grouping, preserving any hierarchy of boxes, folders, envelopes, and individual documents—and thus they are not re-organized according to other (perhaps more systematic) classifications.[25]
Some organizing systems contain legal, business or scientific documents or data that are the digital descendants of paper reports or records of transactions or observations. These organizing systems might need to deal with legacy information that still exists in paper form or in electronic formats like image scans that are different from the structural digital format in which more recent information is likely to be preserved. When legacy conversions from printed information artifacts are complete or unnecessary, an organizing system no longer deals with any of the traditional tangible artifacts. Digital libraries dispense with these artifacts, replacing them with the capability to print copies if needed. This enables libraries of digital documents or data collections to be vastly larger and more accessible across space and time than any library that stores tangible, physical items could ever be.
Computational Descriptions of People
We often think and talk about time as a resource, and time fits the definition of “anything of value that supports goal-oriented activity” from the section called “The Concept of “Resource””. Furthermore, we could think of the calendar and clock as organizing systems that define time at different levels of granularity to support different kinds of interactions. However, it is probably more useful to think of time as a constraint that influences how and how much to organize.
Why Is It Being Organized?
Almost by definition, the essential purpose of any organizing system is to describe or arrange resources so they can be located and accessed later. The organizing principles needed to achieve this goal depend on the types of resources or domains being organized, and on the personal, social, or institutional setting in which organization takes place.
Organizing systems can be distinguished by their dominant purposes or the priority of their common purposes. Libraries, museums, and archives are often classified as memory institutions to highlight their primary emphasis on resource preservation. In contrast, “management information systems” or “business systems” are categories that include the great variety of software applications that implement the organizing systems needed to carry out day-to-day business operations.
“Bringing like things together” is an informal organizing principle for many organizing systems. Almost as soon as libraries were invented over two thousand years ago, the earliest librarians saw the need to develop systematic methods for arranging and inventorying their collections.[26] The invention of mechanized printing in the fifteenth century, which radically increased the number of books and periodicals, forced libraries to begin progressively more refined efforts to state the functional requirements for their organizing systems and to be explicit about how they met those requirements.
Today, any information-driven enterprise must have systematic processes and technologies in place that govern information creation or capture and then manage its entire life cycle. Commercial firms need processes for transacting with customers or other firms to carry out business operations, to support research, innovation, and marketing, and to develop business strategy and tactics in compliance with laws and regulations for accounting, taxes, human resources, data retention, and so on. In large firms these functions are so highly specialized and complex that the different types of organizing systems have distinct names: Enterprise Resource Planning (ERP), Enterprise Content Management (ECM), Enterprise Data Management (EDM), Supply Chain Management (SCM), Records Management, Customer Relationship Management (CRM), Business Intelligence (BI), Knowledge Management (KM), and so on. And even though the most important functions in the organizing systems of large enterprises are those that manage the information resources needed for their business operations, these firms might also need to maintain corporate libraries and archives.
Preserving documents in their physical or original form is the primary purpose of archives and similar organizing systems that contain culturally, historically, or economically significant documents that have value as long-term evidence. Preservation is also an important motivation for the organizing systems of information- and knowledge-intensive firms, where information is primarily in digital formats. Businesses and governmental agencies are usually required by law to keep records of financial transactions, decision-making, personnel matters, and other information essential to business continuity, compliance with regulations and legal procedures, and transparency. As with archives, it is sometimes critical that these business knowledge or records management systems can retrieve the original documents, although digital copies that can be authenticated are increasingly being accepted as legally equivalent.
When individuals manage their papers, books, documents, record albums, compact discs, DVDs, and other information resources, their organizing systems can vary greatly. This is in part because the content of the resources being organized becomes a consideration. Furthermore, many of the organizing systems used by individuals are implemented by web applications, and this makes them more accessible than physical resources.[27]
A second likely outcome of increased scale or use is that not everyone is likely to share the same goals and design preferences for the organizing system. If you share a kitchen with housemates, you might have to negotiate and compromise on some of the decisions about how the kitchen is organized so you can all get along. In more formal or institutional organizing systems conflicts between stakeholders can be much more severe, and the organizing principles and policies or permissions for the kinds of interactions available to different users might even be specified in commercial contracts or governed by laws or standards. For example, Bowker and Star note that physicians view the creation of patient records as central to diagnosis and treatment, insurance companies think of them as evidence needed for payment and reimbursement, and researchers think of them as primary data. These groups do not agree on the priority and quality requirements they assign to different information in the patient record, and physicians understandably resist doing work that has no direct benefit for them. Not surprisingly, policy making and regulations about patient records are highly contentious.[28]
The emerging field of applied behavioral economics, popularized in books like Freakonomics and Nudge, explains how subtle differences in resource arrangement, the number and framing of choices, and default values can have substantial effects on the decisions people make. Consider the arrangement of salads, pasta dishes, bread, fish, meat, desserts and other types of food in a self-serve cafeteria buffet. In a school setting, the food might be organized and presented to encourage healthier eating, perhaps by making the fatty french fries and high-calorie desserts hard to reach or by providing smaller trays and plates. The same foods would likely be organized differently in an all-you-can-eat restaurant, where the goal is to minimize food costs, with less expensive items like salads at the front of the line to ensure that trays and plates will already be full when the customer gets to the more expensive items at the end of the line.[29]
Looking to a much more insidious organizing system, when the South African government adopted Apartheid policies to classify and segregate people by race, it systematized economic and political discrimination and great suffering for the nonwhite population. (See the sidebar, Power and Politics in Organizing.)
Power and Politics in Organizing
Organizing systems and technology are not developed in a vacuum, unencumbered by politics or social context. As Langdon Winner underscores in Do Artifacts Have Politics?, systems and technologies can be conscious manifestations of the personal (and often political) biases of their creators. All people have different experiences and biases, and even when people are not conscious of them, these biases influence the design and implementation of organizing systems in ways that can create or perpetuate inequalities.[30]
(See also the section called “Classification Is Biased” and Chapter 11, The Organizing System Roadmap.)
Chapter 8, Classification: Assigning Resources to Categories more fully explains the different purposes for organizing systems, the organizing principles they embody, and the methods for assigning resources to categories.
How Much Is It Being Organized?
Not all resources should be accorded the same degree of organization (Svenonius 2000, p. 24). In this section we will briefly unpack this notion of degree of organization into three important and related dimensions: the amount of description detail or organization applied to each resource, the amount of organization of resources into classes or categories, and the overall extent to which interactions in and between organizing systems are shaped by resource description and arrangement.
(Chapter 5, Resource Description and Metadata and Chapter 7, Categorization: Describing Resource Classes and Types, more thoroughly address these questions about the nature and extent of description in organizing systems.)
Not all resources in a collection require the same degree of description for the simple reason we discussed in the section called “Why Is It Being Organized?”: Organizing systems exist for different purposes and to support different kinds of interactions or functions. Let us contrast two ends of the “degree of description” continuum. Many people use “current events awareness” or “news feed” applications that select news stories whose titles or abstracts contain one or more keywords (Google Alert is a good example). This exact match algorithm is easy to implement, but its all-or-none and one-item-at-a-time comparison misses any stories that use synonyms of the keyword, that are written in languages different from that of the keyword, or that are otherwise relevant but do not contain the exact keyword in the limited part of the document that is scanned. However, users with current events awareness goals do not need to see every news story about some event, and this limited amount of description for each story and the simple method of comparing descriptions are sufficient.
On the other hand, this simple organizing system is inadequate for the purpose of comprehensive retrieval of all documents that relate to some concept, event, or problem. This is a critical task for scholars, scientists, inventors, physicians, attorneys and similar professionals who might need to discover every relevant document in some domain. Instead, this type of organizing system needs rich bibliographic and semantic description of each document, most likely assigned by professional catalogers, and probably using terms from a controlled vocabulary to enforce consistency in what descriptions mean.
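The contrast between these two ends of the continuum can be sketched in a few lines of Python. This is only an illustrative sketch: the story titles, the `subjects` field, and the function names `keyword_alert` and `subject_search` are hypothetical, and the subject terms stand in for descriptions that a cataloger would assign from a controlled vocabulary.

```python
# Hypothetical news stories. "subjects" stands in for terms a cataloger
# would assign from a controlled vocabulary; all values are invented.
STORIES = [
    {"title": "Earthquake strikes coastal city", "subjects": {"earthquakes"}},
    {"title": "Strong tremor rattles the capital", "subjects": {"earthquakes"}},
    {"title": "Séisme au large de la côte", "subjects": {"earthquakes"}},
    {"title": "City council approves new budget", "subjects": {"municipal finance"}},
]


def keyword_alert(stories, keyword):
    """Exact match: keep stories whose title contains the keyword as a word."""
    keyword = keyword.lower()
    return [s["title"] for s in stories if keyword in s["title"].lower().split()]


def subject_search(stories, term):
    """Richer description: match on the assigned controlled-vocabulary term."""
    return [s["title"] for s in stories if term in s["subjects"]]


# The exact match finds only the title that literally contains the keyword,
# missing the synonym ("tremor") and the French-language story.
alert_hits = keyword_alert(STORIES, "earthquake")

# The controlled term retrieves all three stories about the event.
subject_hits = subject_search(STORIES, "earthquakes")
```

The first function is cheap to build and good enough for current awareness; the second retrieves comprehensively, but only because someone did the work of assigning consistent subject terms to every story.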
Even when faced with the same collection of resources, people differ in how much organization they prefer or how much disorganization they can tolerate. A classic study by Tom Malone of how people organize their office workspaces and desks contrasted the strategies and methods of “filers” and “pilers.” Filers maintain clean desktops and systematically organize their papers into categories, while pilers have messy work areas and make few attempts at organization. This contrast has analogues in other organizing systems and we can easily imagine what happens if a “neat freak” and “slob” become roommates.[31]
An equally wide range, from a little organization to a lot, can be seen in the organizing systems for businesses, armies, governments, or any other institutional organizing systems for people. Organizations with broad scope and many people usually have deep hierarchies and explicit reporting relationships with the CEO, general, or president at the top with numerous layers of vice presidents, directors, department heads, and managers (or colonels, majors, captains, lieutenants, and sergeants). Smaller organizations are more varied, with some embodying multi-layered management, and some embracing a flatter arrangement with fewer management levels, wider spans of authority, and more autonomy for individual workers. Many start-up firms try to grow without any management structure at all in the belief that it makes them more innovative and nimble, but evidence suggests that when no one is responsible for making decisions, the lack of accountability results in poor decisions, or in no decisions at all even when some were sorely needed.[32]
In any case, when people have to do it, describing and organizing resources is work. Stakeholders in an organizing system often disagree about how much organization is necessary because of the implications for who performs the work and who derives the benefits, especially the economic ones. Physicians prefer narrative descriptions and broad classification systems because they make it easier to create patient notes. In contrast, insurance companies and researchers want fine-grained “form-filling” descriptions and detailed classifications that would make the physician’s work more onerous.[33]
The cost-effectiveness of creating systematic and comprehensive descriptions of the resources in an information collection has been debated for nearly two centuries, beginning in 1841 when Sir Anthony Panizzi proposed rules for cataloguing the British Library. In the last half century, the scope of the debate grew to consider the role of computer-generated resource descriptions.[34]
The amount of resource description is always shaped by the currently available technology for capturing, storing, and making use of it. Nineteenth century geologists and paleontologists typically recorded only general information about the depth and surrounding geological features when they found fossils because they had no technology for making more precise measurements and everything they noted they had to record by hand. Today, vastly more detailed information is recorded by instruments and exploited by sophisticated techniques for carbon dating and 3D reconstruction.[35]
Automatically generated descriptions are increasingly an alternative or complement to those created by people. “Smart” resources use sensors to capture information about themselves and their environments (see the section called “Identity and Active Resources”). Our own computers and phones record information about our keystrokes, clicks, communications, and locations. Business and government computers analyze and index most of the text and speech content that flows through and between our personal phones and computers. These indexes typically assign weights to the terms according to calculations that consider the frequency and distribution of the terms in both individual documents and in the collection as a whole to create a description of what the documents are about. These descriptions of the documents in the collection are more consistent than those created by human organizers. They allow for more complex query processing and comparison operations by the retrieval functions in the organizing system. For example, query expansion mechanisms can automatically add synonyms and related terms to the search. Additionally, retrieved documents can be arranged by relevance, while “citing” and “cited-by” links can be analyzed to find related relevant documents.
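To make the idea of weighting terms by their frequency and distribution concrete, here is a minimal sketch. It assumes the common TF-IDF scheme (term frequency within a document multiplied by the log of inverse document frequency across the collection), which is one of many such calculations and not necessarily what any particular system uses. The tiny document collection, the `SYNONYMS` table, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical three-document collection.
DOCS = {
    "d1": "wine list red wine white wine",
    "d2": "fossil depth measured by hand",
    "d3": "red fossil bed",
}

# Hypothetical synonym table used for query expansion.
SYNONYMS = {"vino": ["wine"]}


def tf_idf(docs):
    """Weight each term by its frequency in a document and rarity in the collection."""
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    weights = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        weights[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


def expand_query(terms, synonyms):
    """Add synonyms of each query term, keeping the original terms first."""
    expanded = list(terms)
    for term in terms:
        for synonym in synonyms.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def rank(query_terms, weights):
    """Return the names of matching documents, most relevant first."""
    scores = {
        name: sum(w.get(term, 0.0) for term in query_terms)
        for name, w in weights.items()
    }
    return sorted((n for n, s in scores.items() if s > 0), key=lambda n: -scores[n])


WEIGHTS = tf_idf(DOCS)
without_expansion = rank(["vino"], WEIGHTS)                      # no match
with_expansion = rank(expand_query(["vino"], SYNONYMS), WEIGHTS)  # finds d1
```

In this sketch “wine” gets a high weight in d1 because it is frequent there and absent elsewhere, while “red” gets a lower weight because it appears in two of the three documents. The query “vino” retrieves nothing until expansion adds “wine.”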
A second constraint on the degree of organization comes from the size of the collection within the scope of the organizing system. Organizing more resources requires more descriptions to distinguish any particular resource from the rest, and more constraining organizing principles. Similar resources need to be grouped or classified to emphasize the most important distinctions among the complete set of resources in the collection. A small neighborhood restaurant might have a short wine list with just ten wines, arranged in two categories for “red” and “white” and described only by the wine’s name and price. In contrast, a gourmet restaurant might have hundreds of wines in its wine list, which would subdivide its “red” and “white” high-level categories into subcategories for country, region of origin, and grape varietal. The description for each wine might in addition include a specific vineyard from which the grapes were sourced, the vintage year, ratings of the wine, and tasting notes.
Using “Information Theory” to Quantify Organization
We often hear news stories hyping “how much information” there is in the information society with breathless exuberance about the creation of peta-, exa-, whatever-bytes of content. A much more important and intellectually deeper question than absolute size in bytes is measuring how much information is encoded in the structure or organization of a system. For this we can turn to “Information Theory,” a formal approach to understanding the theoretical maximum amount of information that can be carried by a communications system by using efficient coding, data compression, and error correction. It was developed by Claude Shannon, a researcher at Bell Laboratories, and first published as “A Mathematical Theory of Communication” in 1948. We can apply it in the discipline of organizing to compare the amount of structure in different ways of organizing the same resources.[36]
The “entropy” measure is often used to create predictive models of the “decision tree” variety, which is an algorithm that classifies or predicts by making a sequence of logical tests. Each test divides a collection of data into sets with less entropy (more predictability). (See the section called “Implementing Categories”)
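A small sketch shows how a test can reduce entropy. It assumes the standard Shannon entropy, H = -Σ p·log2(p), computed over the proportions of each label in a set; the wine labels and the function names `entropy` and `split_entropy` are hypothetical illustrations, not taken from any particular decision tree implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the label proportions in a collection."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def split_entropy(groups):
    """Average entropy of the groups a test produces, weighted by group size."""
    total = sum(len(group) for group in groups)
    return sum(len(group) / total * entropy(group) for group in groups)


# Hypothetical collection: two red wines and two white wines, unorganized.
wines = ["red", "red", "white", "white"]
before = entropy(wines)  # maximally unpredictable for two equal classes

# A test that separates the colors leaves two perfectly predictable sets.
useful_test = split_entropy([["red", "red"], ["white", "white"]])

# A test unrelated to color leaves each set as mixed as the original.
useless_test = split_entropy([["red", "white"], ["red", "white"]])

# Reduction in entropy achieved by the useful test.
information_gain = before - useful_test
```

A decision tree builder would typically try many candidate tests at each step and keep the one with the largest reduction in entropy.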
At some point a collection grows so large that it is not economically feasible for people to create bibliographic descriptions or to classify each separate resource, unless there are so many users of the collection that their aggregated effort is comparably large; this is organizing by “crowdsourcing.” This leaves two approaches that can be done separately or in tandem.
The simpler approach is to describe sets of resources or documents as a set or group, which is especially sensible for archives, with their emphasis on the fonds (see the section called “What Is Being Organized?”).
The second approach is to rely on automated and more general-purpose organizing technologies that organize resources through computational means. Search engines are familiar examples of computational organizing technology, and the section called “Computational Classification” describes other common techniques in machine learning, clustering, and discriminant analysis that can be used to create a system of categories and to assign resources to them.
Finally, we must acknowledge the ways in which information processing and telecommunications technologies have transformed and will continue to transform organizing systems in every sphere of economic and intellectual activity. A century ago, when the telegraph and telephone enabled rapid communication and business coordination across large distances, these new technologies enabled the creation of massive vertically integrated industrial firms. In the 1920s, the Ford Motor Company owned coal and iron mines, rubber plantations, railroads, and steel mills so it could manage every resource needed in automobile production and reduce the costs and uncertainties of finding suppliers, negotiating with them, and ensuring their contractual compliance. Adam Smith’s invisible hand of the market as an organizing mechanism had been replaced by the visible hand of hierarchical management to control what Ronald Coase in 1937 termed “transaction costs” in The Nature of the Firm.
In recent decades, a new set of information and computing technologies enabled by Moore’s law—unlimited computing power, effectively free bandwidth, and the Internet—have turned Coase upside down, leading to entirely new forms of industrial organization made possible as transaction costs plummet. When computation and coordination costs drop dramatically, it becomes possible for small firms and networks of services (provided by people or by computational processes) to out-compete large corporations through more efficient use of information resources and services, and through more effective information exchange with suppliers and customers, much of it automated. Herbert Simon, a pioneer in artificial intelligence, decision making, and human-computer interaction, recognized the similarities between the design of computing systems and human organizations and developed principles and mechanisms applicable to both.[37]
Chapter 9, The Forms of Resource Descriptions, focuses on the representation of resource descriptions, taking a more technological or implementation perspective.
Chapter 10, Interactions with Resources, discusses how the nature and extent of descriptions determines the capabilities of the interactions that locate, compare, combine, or otherwise use resources in information-intensive domains.
When Is It Being Organized?
The organizing system framework recasts the traditional tradeoff between information organization and information retrieval as the decision about when the organization is imposed. We can contrast organization imposed on resources “on the way in” when they are created or made part of a collection with “on the way out” organization imposed when an interaction with resources takes place.
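The tradeoff can be sketched with two toy collections that answer the same query. This is an illustrative sketch under simplifying assumptions (whitespace tokenization, single-term queries); the class names `IndexedOnTheWayIn` and `ScannedOnTheWayOut` and the sample documents are hypothetical.

```python
from collections import defaultdict


class IndexedOnTheWayIn:
    """Does the organizing work when a resource is added to the collection."""

    def __init__(self):
        self._docs = {}
        self._index = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc_id, text):
        self._docs[doc_id] = text
        for term in set(text.lower().split()):
            self._index[term].add(doc_id)

    def find(self, term):
        # Retrieval is a single lookup because the work was done earlier.
        return sorted(self._index.get(term.lower(), set()))


class ScannedOnTheWayOut:
    """Stores resources as-is and does the work when an interaction occurs."""

    def __init__(self):
        self._docs = {}

    def add(self, doc_id, text):
        self._docs[doc_id] = text  # no organization imposed yet

    def find(self, term):
        # Every stored document is examined at query time.
        term = term.lower()
        return sorted(
            doc_id
            for doc_id, text in self._docs.items()
            if term in text.lower().split()
        )


early, late = IndexedOnTheWayIn(), ScannedOnTheWayOut()
for collection in (early, late):
    collection.add("a", "Invoice for red wine")
    collection.add("b", "Payment received")
    collection.add("c", "Wine list updated")

same_answer = early.find("wine") == late.find("wine")
```

Both collections return the same documents; what differs is when the cost is paid. The first spends effort on every resource whether or not it is ever retrieved, while the second defers all effort until a query arrives.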
Digital photos, videos, and documents are generally organized to some minimal degree when they are created because some descriptions, notably time and location, are assigned automatically to these types of resources by the technology used to create them. At a minimum, these descriptions include the resource’s creation time, storage format, and chronologically ordered, auto-assigned filename (IMG00001.JPG, IMG00002.JPG, etc.), but often are much more detailed.[38]
Digital resources created by automated processes generally exhibit a high degree of organization and structure because they are generated automatically in conformance with data or document schemas. These schemas implement the business rules and information models for the orders, invoices, payments, and the numerous other document types created and managed in business organizing systems.
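As a rough illustration of how a schema can carry business rules, the sketch below defines a tiny invoice model with Python dataclasses. The field names, the rules, and the `Invoice` and `LineItem` types are hypothetical; real business document schemas are far more detailed and are usually expressed in dedicated schema languages rather than application code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class LineItem:
    sku: str
    quantity: int
    unit_price_cents: int

    def __post_init__(self):
        # Hypothetical business rules enforced whenever an item is created.
        if self.quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.unit_price_cents < 0:
            raise ValueError("unit price cannot be negative")


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    issued: date
    items: tuple

    def __post_init__(self):
        if not self.items:
            raise ValueError("an invoice needs at least one line item")

    @property
    def total_cents(self):
        """Total derived from the items, so it cannot disagree with them."""
        return sum(item.quantity * item.unit_price_cents for item in self.items)


# Every invoice the process generates has the same structure by construction.
invoice = Invoice("INV-1", date(2024, 1, 2), (LineItem("SKU-1", 2, 500),))
```

Because nonconforming instances are rejected at creation time, every resource that enters the collection is already organized to the same degree.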
Before a resource becomes part of a library collection, its author-created organization is often supplemented by additional information supplied by the publisher or other human intermediaries, such as an International Standard Book Number (ISBN), Library of Congress Call Number (LOC-CN), or Library of Congress Subject Headings (LOC-SH).
In contrast, Google and other search engines apply massive computational power to analyze the contents and associated structures (like links between web pages) to impose organization on resources that have already been published or made available so that they can be retrieved in response to a user’s query “on the way out.” Google makes use of existing organization within and between information resources when it can, but its unparalleled technological capabilities and scale yield competitive advantage in imposing organization on information that was not previously organized digitally.[39] One reaction to the poor quality of some computational description has been the call for libraries to put their authoritative bibliographic resources on the open web, which would enable reuse of reliable information about books, authors, publishers, places, and subject classifications. This “linked data” movement is slowly gathering momentum.[40]
In many organizing systems the nature and extent of organization changes over time as the resources are used. The arrangement of resources in a kitchen or office changes incrementally as frequently used things end up in the front of the pantry, drawer, shelf or filing cabinet or on the top of a pile of papers. Printed books or documents acquire margin notes, underlining, turned down pages or coffee cup stains that differentiate the most important or most frequently used parts. Digital documents do not take on coffee cup stains, but when they are edited, their new revision dates put them at the top of directory listings.
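The effect of revision dates on directory listings can be sketched in a few lines of Python. The file names and timestamps below are made up for the example, and `most_recent_first` is a hypothetical helper, not part of any file manager.

```python
import os
import tempfile
from pathlib import Path

def most_recent_first(directory):
    """Return the file names in a directory, most recently modified first."""
    files = [p for p in Path(directory).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return [p.name for p in files]

with tempfile.TemporaryDirectory() as tmp:
    # Create three files with explicit, increasing modification times.
    for i, name in enumerate(["old_report.txt", "notes.txt", "draft.txt"]):
        path = Path(tmp) / name
        path.write_text("content")
        os.utime(path, (1_000 + i, 1_000 + i))  # (access time, modification time)

    # Simulate editing the oldest file: its new revision date moves it to the top.
    os.utime(Path(tmp) / "old_report.txt", (2_000, 2_000))
    listing = most_recent_first(tmp)

print(listing)  # ['old_report.txt', 'draft.txt', 'notes.txt']
```

Nobody reorganized the directory on purpose. The ordering emerged from use, which is the kind of incremental change described above.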
The sort of organic or emergent change in organizing systems that takes place over time contrasts with the planned and systematic maintenance of organizing systems described as curation or governance, two related but distinct activities. Curation usually refers to the methods or systems that add value to and preserve resources, while the concept of governance more often emphasizes the institutions or organizations that carry out those activities. The former is most often used for libraries, museums, or archives and the latter for enterprise or inter-enterprise contexts. (For more discussion, see the section called “Governance”)
We should always consider the extent to which people or technology in an organizing system are able to adapt when new resources, data, or people enter the picture. When and how much an organizing system can be changed depends on the extent of architectural thinking that went into its design (see The Three Tiers of Organizing Systems), because it should be possible to make a change to a component without having to rethink the system entirely.
How (or by Whom) Is It Organized?
In the preceding quote, Svenonius identifies three different ways for the “work of organizing information” to be performed: by professional indexers and catalogers, by the populace at large, and by automated (computerized) processes. Our notion of the organizing system is broader than her “bibliographic universe,” making it necessary to extend her taxonomy. Authors are increasingly organizing the content they create, and it is important to distinguish users in informal and formal or institutional contexts. We have also introduced the concept of an organizing agent (the section called “The Concept of “Organizing Principle””) to unify organizing done by people and by computer algorithms.
Professional indexers and catalogers undergo extensive training to learn the concepts, controlled descriptive vocabularies, and standard classifications in the particular domains in which they work. Their goal is not only to describe individual resources, but to position them in the larger collection in which they reside.[41] They can create and maintain organizing systems with consistent high quality, but their work often requires additional research, which is costly.
The class of professional organizers also includes the employees of commercial information services like Westlaw and LexisNexis, who add controlled and, often, proprietary metadata to legal and government documents and other news sources. Scientists and scholars with deep expertise in a domain often function as the professional organizers for data collections, scholarly publications and proceedings, and other specialized information resources in their respective disciplines. The National Association of Professional Organizers (NAPO) claims several thousand members who will organize your media collection, kitchen, closet, garage or entire house or help you downsize to a smaller living space.[42]
Non-author users in the “populace at large” are most often creating organization for their own benefit. These ordinary users are unlikely to use standard descriptors and classifications, and the organization they impose sometimes so closely reflects their own perspective and goals that it is not useful for others. Fortunately most users of “Web 2.0” or “community content” applications at least partly recognize that the organization of resources emerges from the aggregated contributions of all users, which provides incentive to use less egocentric descriptors and classifications. The staggering number of users and resources on the most popular applications inevitably leads to “tag convergence” simply because of the statistics of large sample sizes.
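The way shared organization emerges from individual contributions can be illustrated with a short Python sketch. The tag data are invented, and `suggested_tags` is a hypothetical function that imitates how a community application might suggest the most frequent tags for a resource.

```python
from collections import Counter

def suggested_tags(taggings, top_n=2):
    """Aggregate every user's tags for one resource and return the most common."""
    counts = Counter(
        tag.strip().lower()          # normalize case and surrounding spaces
        for user_tags in taggings
        for tag in user_tags
    )
    return [tag for tag, _ in counts.most_common(top_n)]

# Each inner list holds the tags one (hypothetical) user applied to the same photo.
taggings = [
    ["sunset", "beach", "my vacation"],
    ["Sunset", "ocean"],
    ["beach", "sunset", "to print"],
    ["sunset", "beach"],
]

print(suggested_tags(taggings))  # ['sunset', 'beach']
```

Idiosyncratic tags like “my vacation” and “to print” remain useful to the people who applied them, but they fall out of the aggregate, while the descriptors that many users share rise to the top.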
Finally, the vast size of the web and the even greater size of the “deep” or invisible web, composed of the information stores of business and proprietary information services, makes it impossible to imagine today that it could be organized by anything other than the massive computational power of search engine providers like Google and Microsoft. Likewise, data mining, predictive analytics, recommendation systems, and many other application areas that involve computational modeling and classification simply could not be done any other way.[43]
Where Is It Being Organized?
In the architectural design of an organizing system, its physical location is usually not a primary concern. In most organizing systems, the matter of where the organizing system and the resources are located can be abstracted away. So, in practice, resource location often is not as important as the other questions here. Physical constraints of the storage location should generally be relegated to an implementation concern rather than an architectural one. The construction of a special display structure for a valuable resource is not an independent design dimension; it is just the implementation of the user interface. (See the section called “The Implementation Perspective”)
Physical resources are often stored where it is convenient and efficient to do so, whether in ordinary warehouses, offices, storerooms, shelves, cabinets, or closets. It can be necessary to adapt an organizing system to characteristics of its physical environment, but this could undermine architectural thinking and make it harder to maintain the organization over time, as the collection evolves in scope and scale. (See the section called “Organizing Physical Resources”)
In the section called “Organizing Places” we consider the organization of the land, built environments, and wayfinding systems. The section called “The Structural Perspective” discusses the structural perspective on resource relationships, and in some systems, it may be very significant where resources are located in relation to one another. In The Barnes Collection, for example, works of art are physically grouped to highlight common characteristics. Conversely, zoos do not mix the kangaroos with the wild dogs, and the military does not mix the ingredients for chemical weapons (at least, not until it plans to use them). There are also circumstances where resources can only exist in (or are particularly suited to) particular environments, such as the conditions required to grow wine grapes or mushrooms, or store spent nuclear fuel. UPS advises companies on where to put their warehouses and shipment centers. These questions are more substantial than questions of presentation, but it is debatable whether they fall under the storage or logic tier (you could adopt the principle of “keep the mushrooms somewhere moist” without dictating a particular place).
Indeed, in designing an organizing system you will often find that questions about location tumble naturally out of the other five design dimensions. For instance, questions about “when,” “what,” and “where” are often inseparable, particularly when an organizing system is subject to outside regulations, which tend to have geographical jurisdictions. “Where” is also commonly bound up with “who” and “why,” when locational challenges or opportunities faced by a system’s creators or users necessitate special design consideration. (See the section called “Effectivity”)
Key Points in Chapter Two
2.8.1. How does a “design questions” or “dimensional” perspective on the design of organizing systems complement the familiar use of categories like library and museum?
2.8.2. Why is the question “What is a thing?” so fundamental and challenging?
2.8.3. How are organizing systems for physical resources and those for digital resources fundamentally different?
2.8.4. Why is it challenging to decide on the unit of organization for information content?
2.8.5. What is the essential purpose of any organizing system?
2.8.6. What are the primary purposes for the organizing systems in libraries, museums, and archives?
Libraries, museums, and archives are often classified as memory institutions to emphasize their primary emphasis on resource preservation.
2.8.7. What kinds of documents are businesses and governmental agencies required to keep?
2.8.8. What is the value created if interaction traces can be turned into interaction resources?
If a system can turn its interaction traces into interaction resources, additional value can be created by analyzing these resources to enhance the interactions, to suggest new ones, or make predictions about how individual users or groups of them will behave.
2.8.9. Why is efficiency too narrow a measure for evaluating organizing systems?
Resources are always organized in ways that are designed to allocate value for some people (e.g., the owners of the resources, or the most frequent users of them) and not for others.
2.8.10. What lessons from applied behavioral economics about how people make decisions have implications for the design of organizing systems?
Subtle differences in resource arrangement, the number and framing of choices, and default values can have substantial effects on the decisions people make.
2.8.11. Why might merchants or firms differ in the extent or granularity of their product descriptions?
Different merchants or firms might make different decisions about the extent or granularity of description when they assign SKUs because of differences in suppliers, targeted customers, or other business strategies.
2.8.12. What are some of the potential downsides to automated resource description?
A detailed description produced by sensors or computers can seem more accurate or authoritative than a simpler one created by a human observer, even if the latter would be more useful for the intended purposes. Detailed transaction data can be used to violate privacy and civil rights.
2.8.13. How does the number of resources in a collection affect the amount of resource description and organization required?
Organizing more resources requires more descriptions to distinguish any particular resource from the rest, and more constraining organizing principles. Similar resources need to be grouped or classified to emphasize the most important distinctions among the complete set of resources in the collection.
2.8.14. How is organizing “on the way in” different from organizing “on the way out”?
We can contrast organization imposed on resources “on the way in” when they are created or made part of a collection with “on the way out” organization imposed when an interaction with resources takes place.
2.8.15. Why do digital resources created by automated processes exhibit a high degree of organization and structure?
Digital resources created by automated processes generally exhibit a high degree of organization and structure because they are generated automatically in conformance with data or document schemas.
2.8.16. What kinds of organizing systems would be impossible to create without the use of massive computational power?
The vast size of the web and the even greater size of the “deep” or invisible web makes it impossible to imagine today that it could be organized by anything other than the massive computational power of search engine providers like Google and Microsoft. Likewise, data mining, predictive analytics, recommendation systems, and many other application areas that involve computational modeling and classification simply could not be done any other way. (See the section called “How (or by Whom) Is It Organized?”)
[23] Depending on which characteristics of Google Books and libraries you think about, you might complete this analogy with an animal theme park like Sea World (http://www.seaworld.com/) or a private hunting reserve that creates personalized “big game” hunts. Or maybe you can invent something completely new.
[24] Organizing systems that follow the rules set forth in the Functional Requirements for Bibliographic Records (FRBR) [(Tillett 2005)] treat all instances of Macbeth as the same “work.” However, they also enforce a hierarchical set of distinctions for finer-grained organization. FRBR views books and movies as different “expressions,” different print editions as “manifestations,” and each distinct physical thing in a collection as an “item.” This organizing system thus encodes the degree of intellectual equivalence while enabling separate identities where the physical form is important, which is often the case for scholars.
[25] Typical examples of archives might be national or government document collections or the specialized Julia Morgan archive at the University of California, Berkeley (http://www.oac.cdlib.org/findaid/ark:/13030/tf7b69n9k9/), which houses documents by the famous architect who designed many of the university’s most notable buildings as well as the famous Hearst Castle along the central California coast. The “original order” organizing principle of archival organizing systems was first defined by 19th-century French archivists and is often described as “respect pour les fonds.”
The William Ashburner collection of historical photos from an 1867-1869 surveying expedition in the Western United States is kept in the University of California, Berkeley’s Bancroft Library in the order in which Ashburner, a member of the survey party, had arranged it when he donated it to the library decades later. The arrangement roughly follows a chronological and geographical progression, with some photos obviously out of order and some whose locations cannot be determined.
[27] For example, many people manage their digital photos with Flickr, their home libraries with Library Thing, and their preferences for dining and shopping with Yelp. It is possible to use these “tagging” sites solely in support of individual goals, as tags like “my family,” “to read,” or “buy this” clearly demonstrate. But maintaining a personal organizing system with these web applications potentially augments the individual’s purpose with social goals like conveying information to others, developing a community, or promoting a reputation. Furthermore, because these community or collaborative applications aggregate and share the tags applied by individuals, they shape the individual organizing systems embedded within them when they suggest the most frequent tags for a particular resource.
[28] [(Bowker and Star 2000)].
[29] [(Levitt 2005)] and [(Thaler 2008)]
[30] [(Winner 1980 p 121-136)]
[31] [(Malone 1983)] is the seminal research study, but individual differences in organizing preferences were the basis of Neil Simon’s Broadway play The Odd Couple in 1965, which then spawned numerous films and TV series.
[33] See Grudin’s classic work on non-technological barriers to the successful adoption of collaboration technology [(Grudin 1994)].
[34] Panizzi is most often associated with the origins of modern library cataloging. He [(Panizzi 1841)] published 91 cataloging rules for the British Library that defined authoritative forms for titles and author names, but the complexity of the rules and the resulting resource descriptions were widely criticized. For example, the famous author and historian Thomas Carlyle argued that a library catalog should be nothing more than a list of the names of the books in it. Standards for bibliographic description are essential if resources are to be shared between libraries. See [(Denton 2007)], [(Anderson and Perez-Carballo 2001a], [2001b]).
[35] [(Bowker and Star 2000 p. 69.)]
[36] Information theory was developed to attack the technical problem of packing the maximum amount of data into the signal carrying telephone calls, but it quickly provided an essential statistical foundation in language analysis and computational linguistics. [(Shannon 1948)]. Company organization and other examples applying information theory to the analysis of organizing systems can be found in [(Levitin 2014, Chapter 7)].
[37] Coase won the 1991 Nobel Prize in economics for his work on transaction costs, which he first published as a graduate student [(Coase 1937)]. Berkeley business professor Oliver Williamson received the prize in 2009 for work that extended Coase’s framework to explain the shift from the hierarchical firm to the network firm [(Williamson 1975], [1998)]. The notion of the “visible hand” comes from [(Chandler 1977)]. Simon won the Nobel Prize in economics in 1978, but if there were Nobel Prizes in computer science or management theory he surely would have won them as well. Simon was the author or co-author of four books that have each been cited over 10,000 times, including [(Simon 1997], [1996)] and [(Newell and Simon 1972)].
[38] Most digital cameras annotate each photo with detailed information about the camera and its settings in the Exchangeable Image File Format (EXIF), and many mobile phones can associate their location with any digital object they create.
[39] Indeed, Geoff Nunberg criticized Google for ignoring or undervaluing the descriptive metadata and classifications previously assigned by people and replacing them with algorithmically assigned descriptors, many of which are incorrect or inappropriate. Calling Google’s Book Search a “disaster for scholars” and a “metadata train wreck,” he lists scores of errors in titles, publication dates, and classifications. For example, he reports that a search on “Internet” in books published before 1950 yields 527 results. The first 10 hits for Whitman’s Leaves of Grass are variously classified as Poetry, Juvenile Nonfiction, Fiction, Literary Criticism, Biography & Autobiography, and Counterfeits and Counterfeiting. [(Nunberg 2009)]
[40] [(Byrne and Goddard 2010)].
[41] This is an important distinction in library science education and library practice. Individual resources are described (“formal” cataloging) using “bibliographic languages” and their classification in the larger collection is done using “subject languages” [(Svenonius 2000, Ch. 4 and Ch. 8, respectively)]. These two practices are generally taught in different library school courses because they use different languages, methods and rules and are generally carried out by different people in the library. In other organizations, the resource description (both formal and subject) is created in the same step and by the same person.
[42] See the NAPO website (http://www.napo.net). The name and scope of this organization seem a bit odd given how much professional organizing takes place in business, science, government, medicine, education, and other domains where closets and garages are not the most important focus.
[43] [(He et al. 2007)] estimate that there are hundreds of thousands of websites and databases whose content is accessible only through query forms and web services, and there are over a million of those. The amount of content in this hidden web is many hundreds of times larger than that accessible in the surface or visible web.
See http://www.worldwidewebsize.com/ for estimates of the size of the visible web calculated from comparisons of results from search engines.