SM0TVI Namespace

fileinfo — File Information Schema

CC-BY-SA
This work is licensed under the Creative Commons Attribution-ShareAlike License 4.0.

This Version: A.
20250507.shtml
Latest Version:
index.shtml
index.rss (RSS Feed of the schema with updates)

Introduction

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.[1]

The Internet Media Type

Figure #: Internet media type railroad diagram.
mime-type := type "/" [tree "."] subtype *("+" suffix) *(";" parameter)
parameter := attribute "=" value ;
attribute := token ;
value     := token / quoted-string ;
token     := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>
tspecials := "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" /
             "\" / <"> /  "/" / "[" / "]" / "?" / "="
          ; Must be in quoted-string, to use within parameter values.
Figure #: Media type syntax in Extended Backus–Naur form (EBNF) notation.
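
The grammar above can be exercised with a small parser. The following is a minimal sketch, not a reference implementation: the names `parse_media_type`, `MediaType`, and `TREES` are hypothetical, and the set of tree prefixes (`vnd`, `prs`, `x`) is an assumption taken from the registration trees of RFC 6838, since the grammar alone cannot tell a tree prefix apart from a subtype that happens to contain a period.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# A token is one or more US-ASCII characters other than SPACE, CTLs, and tspecials.
TOKEN = r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+"
HEAD = re.compile(rf"\s*({TOKEN})/({TOKEN})\s*")
# A parameter value is either a token or a quoted-string.
PARAM = re.compile(rf';\s*({TOKEN})=(?:({TOKEN})|"((?:[^"\\]|\\.)*)")\s*')

# Assumption: registration tree prefixes as listed in RFC 6838.
TREES = ("vnd", "prs", "x")


class MediaType(NamedTuple):
    top_level: str
    tree: Optional[str]
    subtype: str
    suffixes: List[str]
    parameters: Dict[str, str]


def parse_media_type(text: str) -> MediaType:
    head = HEAD.match(text)
    if head is None:
        raise ValueError(f"not a media type: {text!r}")
    # Types and subtypes are case-insensitive, so normalise them to lower case.
    top_level, rest = head.group(1).lower(), head.group(2).lower()

    tree = None
    first, dot, remainder = rest.partition(".")
    if dot and remainder and first in TREES:
        tree, rest = first, remainder

    # Everything after a "+" is treated as a structured syntax suffix.
    subtype, *suffixes = rest.split("+")

    parameters: Dict[str, str] = {}
    pos = head.end()
    while pos < len(text):
        param = PARAM.match(text, pos)
        if param is None:
            raise ValueError(f"malformed parameter at offset {pos}: {text!r}")
        name, token, quoted = param.groups()
        # Undo backslash escapes inside a quoted-string; tokens are used as-is.
        value = token if token is not None else re.sub(r"\\(.)", r"\1", quoted)
        parameters[name.lower()] = value  # parameter names are case-insensitive
        pos = param.end()
    return MediaType(top_level, tree, subtype, suffixes, parameters)
```

Parameter values are left in their original case, because their case sensitivity depends on the parameter in question.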

{ The Internet Assigned Numbers Authority (IANA) maintains the registry of official Internet Media Types.[2] Types, subtypes, and parameter names are case-insensitive. Parameter values are usually case-sensitive, but may be interpreted in a case-insensitive fashion depending on the intended use. }

The IANA currently has the following major media types defined:

{ Additionally, the following pseudo-types are defined by freedesktop.org and are used by this schema. }

  • inode/* — File system non-regular files, such as character and block device files, directories, symbolic links, sockets, FIFOs, doors and mount points. [3]
  • x-scheme-handler/* — This scheme allows URI scheme handling to enjoy the same benefits as media type handlers, such as the ability to change the default handler, the cross-desktop support, and easier application launching.
    Note that this virtual media type is not for listing URI schemes that an application can load files from. For example, a movie player would not list x-scheme-handler/http in its supported mime-types, but it would list x-scheme-handler/rtsp if it supported playing back from RTSP locations. [3]

{ The following unofficial top-level types are in common use: }

Media Type Suffixes

The media type suffix is an augmentation to the media type definition to additionally specify the underlying structure of that media type, allowing for generic processing based on that structure and independent of the exact type's particular semantics. Media types that make use of a named structured syntax should use the appropriate IANA registered "+"suffix for that structured syntax when they are registered. Unregistered suffixes should not be used (since January 2013). Structured syntax suffix registration procedures are defined in RFC 6838.

The +xml suffix has been defined since January 2001 (RFC 3023), and was formally included in the initial contents of the Structured Syntax Suffix Registry along with +json, +ber, +der, +fastinfoset, +wbxml, and +zip in January 2013 (RFC 6839). Subsequent additions include +gzip, +cbor, +json-seq, and +cbor-seq. [4]
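
To illustrate the generic processing that a suffix enables, the sketch below picks a handler from the suffix alone, without knowing the exact type. The function names are hypothetical, and the suffix set contains only the suffixes named in the text above; the IANA registry may list others.

```python
from typing import Optional

# Only the suffixes named in the text above; the IANA registry may list more.
REGISTERED_SUFFIXES = frozenset({
    "xml", "json", "ber", "der", "fastinfoset", "wbxml", "zip",
    "gzip", "cbor", "json-seq", "cbor-seq",
})


def structured_suffix(media_type: str) -> Optional[str]:
    """Return the lower-cased text after the last "+" of the subtype, if any."""
    essence = media_type.split(";", 1)[0].strip().lower()  # drop parameters
    _, _, subtype = essence.partition("/")
    _, plus, suffix = subtype.rpartition("+")
    return suffix if plus and suffix else None


def generic_handler(media_type: str) -> str:
    """Name the structured syntax to process the content with, or "opaque"."""
    suffix = structured_suffix(media_type)
    return suffix if suffix in REGISTERED_SUFFIXES else "opaque"
```

With this, `generic_handler("image/svg+xml")` selects XML processing even though the caller knows nothing about SVG, while a type with an unregistered suffix is treated as opaque.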

Media Type Parameters

{ The charset media type parameter is probably the most common one, and indicates the character set used in the file. }
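
As a brief illustration, the charset parameter can be read from a Content-Type value with Python's standard `email.message` module. This is a sketch only; the helper name `content_type_and_charset` is hypothetical.

```python
from email.message import EmailMessage
from typing import Optional, Tuple


def content_type_and_charset(header_value: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into its media type and charset parameter."""
    msg = EmailMessage()
    msg["Content-Type"] = header_value
    # Both accessors return lower-cased values; the charset is None if absent.
    return msg.get_content_type(), msg.get_content_charset()
```

A caller would typically use the returned charset to decode the file contents, falling back to a default of its own choosing when the parameter is absent.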

Rationale for this Format

Various Formats and Locations for Media Type Information

Microsoft Windows

Stored under the following registry keys:

  • HKEY_CLASSES_ROOT\.extension\Content Type
  • HKEY_CLASSES_ROOT\MIME\Database\Content Type\media-type, which is merged from the Content Type/media-type subkeys under the following keys:
    • HKEY_LOCAL_MACHINE\SOFTWARE\Classes\MIME\Database — the machine level database of media types;
    • HKEY_CURRENT_USER\Software\Classes\MIME\Database — the user-specific overrides of media types.
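
The registry locations above could be queried roughly as follows. This is a sketch under stated assumptions: it uses Python's standard `winreg` module, which exists only on Windows, so the lookup returns `None` elsewhere. The helper names are hypothetical.

```python
from typing import Optional

try:
    import winreg  # standard library, available on Windows only
except ImportError:
    winreg = None


def extension_key(extension: str) -> str:
    """Subkey of HKEY_CLASSES_ROOT that holds the "Content Type" value."""
    ext = extension if extension.startswith(".") else "." + extension
    return ext.lower()


def media_type_key(media_type: str) -> str:
    """Subkey of HKEY_CLASSES_ROOT in the merged media type database."""
    return "MIME\\Database\\Content Type\\" + media_type.lower()


def content_type_for_extension(extension: str) -> Optional[str]:
    """Read the media type registered for an extension, or None if unknown."""
    if winreg is None:
        return None  # not running on Windows
    try:
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, extension_key(extension)) as key:
            value, _kind = winreg.QueryValueEx(key, "Content Type")
            return value
    except OSError:
        return None  # key or value not present
```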

POSIX Compliant Systems

{ Under the Portable Operating System Interface (POSIX), information regarding media types is primarily stored in two different files, /etc/mime.types and /etc/mailcap. [5] }
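
For illustration, the sketch below parses text in the layout commonly used by /etc/mime.types. It assumes that layout is one media type per line followed by whitespace-separated extensions, with "#" starting a comment; the function name and the sample data are hypothetical.

```python
from typing import Dict, List


def parse_mime_types(text: str) -> Dict[str, List[str]]:
    """Map each media type to the list of extensions given for it."""
    table: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        media_type, *extensions = line.split()
        table.setdefault(media_type.lower(), []).extend(
            ext.lower() for ext in extensions
        )
    return table


# Hypothetical sample in the assumed layout.
SAMPLE = """\
# media type    extensions
text/plain      txt text
image/svg+xml   svg svgz
application/octet-stream
"""
```

A type listed without extensions, such as the last line of the sample, yields an empty list rather than being dropped.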

Why RDF?

{ Resource Description Framework (RDF) }

Schema Walk-Through

contentType — Media Type

Type: fileinfo:MediaType.

{ This property is common to both the File and MediaType classes. }

File — Basic File Properties Class

access — Access Control Information

Type: acl:Authorization [6], xsd:string

Access Control List (ACL)
Security Risk

Some access control information MUST NOT be made public by default. The access information relevant to the agent requesting access to a resource MAY be provided to the requesting agent.

contentLength — File Size

Type: xsd:nonNegativeInteger

lastModified — Time of Last Modification

Type: xsd:dateTime
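
As a sketch of how contentLength and lastModified might be populated for a local file, the example below reads them from `os.stat`. Mapping `st_size` and `st_mtime` onto these two properties is an assumption made for illustration, and the function name is hypothetical.

```python
import os
from datetime import datetime, timezone
from typing import Dict, Union


def basic_file_properties(path: str) -> Dict[str, Union[int, str]]:
    """Return contentLength and lastModified values for a local file."""
    info = os.stat(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        # File size in bytes; never negative, so it fits xsd:nonNegativeInteger.
        "contentLength": info.st_size,
        # UTC timestamp in the xsd:dateTime lexical form, truncated to seconds.
        "lastModified": modified.isoformat(timespec="seconds").replace("+00:00", "Z"),
    }
```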

MediaType — Media Type Information Class

description — File Format Description

Type: xsd:string

extensions — File Extension(s)

Type: xsd:string

ianaTemplate — IANA Template

parameters — Media Type Parameters

Type: xsd:string

specification — File Format Specification(s)

title — File Format Name

Type: xsd:string

uti — Uniform Type Identifier

Type: xsd:string

A Uniform Type Identifier (UTI) is a text string used by software from Apple Inc. to uniquely identify a given class or type of item.

UTIs use a reverse-DNS naming structure, for example “net.daringfireball.markdown” for the Markdown format found at https://daringfireball.net/projects/markdown/ . Names may include the ASCII characters A-Z, a-z, 0-9, hyphen ("-"), and period ("."), and all Unicode characters above U+007F. Colons (":") and slashes ("/", "\") are prohibited for compatibility with Macintosh and POSIX file path conventions. UTIs support multiple inheritance, allowing files to be identified with any number of relevant types, as appropriate to the contained data. UTIs are case-insensitive.
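
The character rules above can be checked with a short validator. This is a minimal sketch that tests only the permitted character set and case-insensitive equality as described in the paragraph; it does not consult any Apple API, and the function names are hypothetical.

```python
import re

# ASCII letters, digits, hyphen, and period, plus any character above U+007F.
UTI_PATTERN = re.compile(r"[A-Za-z0-9.\-\u0080-\U0010FFFF]+")


def is_valid_uti(identifier: str) -> bool:
    """True if the identifier uses only characters permitted in a UTI."""
    return UTI_PATTERN.fullmatch(identifier) is not None


def same_uti(left: str, right: str) -> bool:
    """Compare two UTIs without regard to case."""
    return left.casefold() == right.casefold()
```

Colons and slashes fall outside the permitted set, so identifiers containing them are rejected without a separate check.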

Security Considerations

References and Further Reading

  1. Scott Bradner: Key words for use in RFCs to Indicate Requirement Levels. IETF (Internet Engineering Task Force) RFC 2119, BCP 14.
    In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents.
    Updated by IETF RFC 8174.
  2. Media Types. Internet Assigned Numbers Authority (IANA).
    This is IANA's official list of registered Internet media types.
  3. Thomas Leonard, David Faure, Alex Larsson, Seth Nickell, Keith Packard, Filip Van Raemdonck, Christos Zoulas, Matthias Clasen, Bastien Nocera: Shared MIME-info Database. FreeDesktop.Org.
  4. Structured Syntax Suffixes. Internet Assigned Numbers Authority (IANA).
    This is IANA's official list of registered Internet media type suffixes.
  5. Nathaniel S. Borenstein: A User Agent Configuration Mechanism For Multimedia Mail Format Information. IETF (Internet Engineering Task Force) RFC 1524.
    This RFC defines the /etc/mailcap file format.
  6. Sarven Capadisli (ed.), Tim Berners-Lee, Henry Story: Web Access Control. Solid.
  7. acl(5). Linux Manual.
    This Linux manual page can be accessed either on your local Linux system, or online at:

Appendix A. Example

Below is the definition for the text/plain Media Type.

{
  "@context": {
    "@vocab" : "https://ns.sm0tvi.net/fileinfo#"
  },
  "@type" : "MediaType",
  "@id" : "media-type:text/plain",
  "contentType" : { "@id" : "media-type:text/plain" },
  "description" : [  
    {
      "@language" : "en",
      "@value" : "Plain text with no markup and an unspecified encoding."
    }
  ],
  "extensions" : [
    "txt",
    "text",
    "pot",
    "brf",
    "srt"
  ],
  "parameters" : [
    "charset"
  ],
  "title" : [
    {
      "@language" : "en",
      "@value" : "Text file"
    } 
  ],
  "uti" : "public.plain-text"
}
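
A consumer of such definitions might build a lookup from file extension to media type. The sketch below does this for an abbreviated copy of the record above. Treating the part of "@id" after the "media-type:" prefix as the media type string is an assumption based on this one example, and the function name is hypothetical.

```python
import json
from typing import Dict, Iterable, List

# Abbreviated copy of the text/plain definition shown above.
RECORD = json.loads("""
{
  "@type": "MediaType",
  "@id": "media-type:text/plain",
  "extensions": ["txt", "text", "pot", "brf", "srt"],
  "parameters": ["charset"],
  "uti": "public.plain-text"
}
""")


def extension_index(records: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each file extension to the media types that list it."""
    index: Dict[str, List[str]] = {}
    for record in records:
        # Assumption: the media type is the part of "@id" after the first colon.
        media_type = record["@id"].partition(":")[2]
        for extension in record.get("extensions", []):
            index.setdefault(extension.lower(), []).append(media_type)
    return index
```

Each extension maps to a list because more than one media type may claim the same extension.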

Version History

  1. [] Initial version.