JSON Schema Documentation • boilerplate

Overview

The boilerplate package supports the JSON format for all database operations. This document describes the JSON schema used by each database type.

Unified Database Schema

The unified database combines all categories into a single JSON file:

{
  "methods": { ... },
  "measures": { ... },
  "results": { ... },
  "discussion": { ... },
  "appendix": { ... },
  "template": { ... }
}

Methods Database Schema

Methods entries contain standardised text with template variables, written as {{variable}} placeholders and filled in from the template database.

Basic Structure

{
  "category": {
    "subcategory": {
      "entry_name": {
        "text": "Method description with {{variables}}",
        "reference": "@citation2023",
        "keywords": ["keyword1", "keyword2"]
      }
    }
  }
}
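Categories and subcategories can nest, so finding entries by keyword means walking the tree. The sketch below is a minimal illustration that assumes the JSON has already been read into nested R lists (for example with jsonlite::fromJSON(path, simplifyVector = FALSE)). find_by_keyword() is a hypothetical helper, not part of the boilerplate API, and it treats any list with a keywords field as an entry.

```r
# Hypothetical helper (not part of boilerplate): return the paths of all
# entries whose "keywords" array contains the given term.
find_by_keyword <- function(node, term, path = character(0)) {
  hits <- character(0)
  if (!is.list(node)) return(hits)
  # An entry is any list carrying a "keywords" field; [[ ]] avoids partial matching
  if (!is.null(node[["keywords"]]) && term %in% unlist(node[["keywords"]])) {
    hits <- c(hits, paste(path, collapse = "/"))
  }
  # Recurse into named children (categories, subcategories, entries)
  for (key in names(node)) {
    hits <- c(hits, find_by_keyword(node[[key]], term, c(path, key)))
  }
  hits
}

# The basic structure above, as nested R lists
methods_db <- list(
  category = list(
    subcategory = list(
      entry_name = list(
        text = "Method description with {{variables}}",
        reference = "@citation2023",
        keywords = list("keyword1", "keyword2")
      )
    )
  )
)

find_by_keyword(methods_db, "keyword1")  # "category/subcategory/entry_name"
```

Using [[ ]] rather than $ sidesteps R's partial name matching on lists, so a key such as keywords_extra cannot be mistaken for keywords.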

Entry Variants

Instead of a single text field, a methods entry can provide several text variants:

{
  "statistical": {
    "regression": {
      "linear": {
        "default": "We used linear regression to analyse {{outcome}}.",
        "large": "We employed ordinary least squares linear regression to examine the relationship between {{predictors}} and {{outcome}}. Model assumptions were checked including...",
        "brief": "Linear regression was used."
      }
    }
  }
}

Fields

text or default: Main content (required)
large: Extended version (optional)
brief: Condensed version (optional)
reference: Citation in @key format (optional)
keywords: Array of searchable terms (optional)
_meta: Metadata object (optional)
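When an entry offers several variants, something has to decide which text to use. The fallback order below (requested variant, then default, then text) is an assumption made for illustration, not documented boilerplate behaviour, and pick_variant() is a hypothetical name.

```r
# Hypothetical helper: choose a text variant from a methods entry, falling
# back to "default" and then "text" when the requested variant is absent.
pick_variant <- function(entry, variant = "default") {
  for (field in c(variant, "default", "text")) {
    value <- entry[[field]]
    if (is.character(value) && length(value) == 1) return(value)
  }
  stop("entry has no usable text, default or requested variant")
}

linear <- list(
  default = "We used linear regression to analyse {{outcome}}.",
  brief = "Linear regression was used."
)

pick_variant(linear, "brief")  # "Linear regression was used."
pick_variant(linear, "large")  # no "large" variant, so the default text is returned
```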

Measures Database Schema

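Measures entries describe variables and instruments used in research. Each entry sits under a category key, and categories may be nested (as in psychological → anxiety → gad7 in the complete example).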

Basic Structure

{
  "category": {
    "measure_name": {
      "name": "unique_identifier",
      "description": "Detailed description of the measure",
      "type": "continuous|categorical|ordinal|binary",
      "additional_fields": "..."
    }
  }
}

Complete Example

{
  "psychological": {
    "anxiety": {
      "gad7": {
        "name": "gad7",
        "description": "Generalised Anxiety Disorder 7-item scale",
        "type": "ordinal",
        "items": 7,
        "range": [0, 21],
        "values": [0, 1, 2, 3],
        "value_labels": ["Not at all", "Several days", "More than half the days", "Nearly every day"],
        "cutoffs": {
          "mild": 5,
          "moderate": 10,
          "severe": 15
        },
        "reference": "@spitzer2006brief",
        "keywords": ["anxiety", "screening", "GAD-7"],
        "scoring": {
          "type": "sum",
          "interpretation": {
            "0-4": "Minimal anxiety",
            "5-9": "Mild anxiety",
            "10-14": "Moderate anxiety",
            "15-21": "Severe anxiety"
          }
        }
      }
    }
  }
}
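The cutoffs object gives the lower bound of each severity band, matching the ranges under scoring.interpretation. A minimal sketch of applying them, assuming seven item responses coded 0 to 3 as listed in values (so the summed total falls within range); score_gad7() is hypothetical and not a boilerplate function.

```r
# Hypothetical helper: sum GAD-7 item responses and label the total using
# the lower-bound cutoffs from the entry (mild = 5, moderate = 10, severe = 15).
score_gad7 <- function(responses,
                       cutoffs = c(mild = 5, moderate = 10, severe = 15)) {
  stopifnot(length(responses) == 7, all(responses %in% 0:3))
  total <- sum(responses)
  labels <- c("Minimal anxiety", "Mild anxiety",
              "Moderate anxiety", "Severe anxiety")
  # findInterval() returns 0 below 5, 1 for 5-9, 2 for 10-14, 3 for 15 and above
  band <- findInterval(total, cutoffs) + 1
  list(total = total, interpretation = labels[band])
}

score_gad7(c(1, 2, 1, 1, 2, 0, 1))  # total 8, "Mild anxiety"
```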

Required Fields

name: Unique identifier (string of letters, digits and underscores)
description: Full description (string, at least 10 characters)
type: One of "continuous", "categorical", "ordinal" or "binary"
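The three required-field rules translate directly into checks. This is a hand-written sketch that operates on an entry already parsed into an R list; it is not the formal JSON Schema or boilerplate's own validator, and check_required() is a hypothetical name.

```r
# Hypothetical helper (not boilerplate's validator): return a character
# vector describing problems with a measure entry's required fields.
check_required <- function(entry) {
  problems <- character(0)
  for (field in c("name", "description", "type")) {
    if (is.null(entry[[field]])) {
      problems <- c(problems, sprintf("missing required field '%s'", field))
    }
  }
  # name: letters, digits and underscores only
  if (!is.null(entry[["name"]]) && !grepl("^[A-Za-z0-9_]+$", entry[["name"]])) {
    problems <- c(problems, "name must contain only letters, digits and underscores")
  }
  # description: at least 10 characters
  if (!is.null(entry[["description"]]) && nchar(entry[["description"]]) < 10) {
    problems <- c(problems, "description must be at least 10 characters")
  }
  # type: one of the four allowed values
  valid_types <- c("continuous", "categorical", "ordinal", "binary")
  if (!is.null(entry[["type"]]) && !(entry[["type"]] %in% valid_types)) {
    problems <- c(problems, sprintf("invalid type '%s'", entry[["type"]]))
  }
  problems
}

check_required(list(name = "m1", description = "Invalid type", type = "numeric"))
# "invalid type 'numeric'"
```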

Optional Fields

For All Types

reference: Citation (string)
keywords: Search terms (array of strings)
waves: Data collection waves (array of integers)
unit: Unit of measurement (string)

For Categorical/Ordinal

values: Possible values (array)
value_labels: Labels for values (array of strings)

For Continuous

range: [min, max] values (array of 2 numbers)

For Scales

items: Number of items (integer)
scoring: Scoring method object
subscales: Subscale definitions object
cutoffs: Clinical cutoffs object
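Some optional fields constrain each other: value_labels should pair one-to-one with values, range must hold exactly two numbers, and items should be a whole number. A sketch of those consistency checks on an entry parsed into an R list; check_optional() is hypothetical, and the min <= max test on range is an extra rule assumed here.

```r
# Hypothetical helper: consistency checks on optional measure fields.
check_optional <- function(entry) {
  problems <- character(0)
  values <- unlist(entry[["values"]])
  labels <- unlist(entry[["value_labels"]])
  # values and value_labels must have the same length
  if (!is.null(values) && !is.null(labels) && length(values) != length(labels)) {
    problems <- c(problems, sprintf("%d values but %d value_labels",
                                    length(values), length(labels)))
  }
  # range must be [min, max]; the min <= max rule is an assumption
  rng <- unlist(entry[["range"]])
  if (!is.null(rng) && (length(rng) != 2 || !is.numeric(rng) || rng[1] > rng[2])) {
    problems <- c(problems, "range must be two numbers [min, max] with min <= max")
  }
  # items must be a single whole number
  items <- entry[["items"]]
  if (!is.null(items) &&
      (!is.numeric(items) || length(items) != 1 || items != round(items))) {
    problems <- c(problems, "items must be a whole number")
  }
  problems
}

check_optional(list(values = c(1, 2, 3), value_labels = c("Low", "High")))
# "3 values but 2 value_labels"
```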

Results Database Schema

Results entries use the same structure as methods entries:

{
  "descriptive": {
    "demographics": {
      "age": {
        "text": "The mean age was {{mean_age}} years (SD = {{sd_age}}).",
        "reference": "@reporting2023"
      }
    }
  }
}
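Filling a results template amounts to replacing each {{name}} placeholder with its value. A minimal base R sketch; fill_template() is hypothetical, placeholders without a supplied value are left untouched, and the numbers in the call are made-up inputs rather than reported results.

```r
# Hypothetical helper: replace each {{name}} placeholder with its value.
fill_template <- function(text, vars) {
  for (name in names(vars)) {
    pattern <- paste0("{{", name, "}}")
    # fixed = TRUE matches the braces literally rather than as a regex
    text <- gsub(pattern, as.character(vars[[name]]), text, fixed = TRUE)
  }
  text
}

fill_template(
  "The mean age was {{mean_age}} years (SD = {{sd_age}}).",
  list(mean_age = 34.2, sd_age = 11.5)
)
# "The mean age was 34.2 years (SD = 11.5)."
```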

Template Database Schema

The template database holds the values substituted into {{variable}} placeholders, grouped by scope:

{
  "global": {
    "n": 100,
    "study_name": "Example Study",
    "year": 2024
  },
  "methods": {
    "software": "R version 4.3.0",
    "alpha": 0.05
  },
  "measures": {
    "wave1_date": "January 2024",
    "wave2_date": "June 2024"
  }
}

Variable Scoping

global: Variables available to every section
[section]: Variables for one section (for example, methods); these override global variables of the same name within that section

Schema Validation

JSON Schema Files

The schema files are located in inst/examples/json-poc/schema/:

measures_schema.json: Formal schema for measures
methods_schema.json: Formal schema for methods

Validation in R

# Validate a JSON database
boilerplate::validate_json_database(
  json_file = "my_database.json",
  schema_file = "measures_schema.json"
)

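Common Validation Errors

The examples below flag each problem with a // comment. These comments are annotations only: JSON does not allow comments, so they must not appear in real database files.

Missing required fields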

{
  "measure1": {
    "description": "Missing 'name' and 'type' fields"
  }
}

Invalid type values

{
  "measure1": {
    "name": "m1",
    "description": "Invalid type",
    "type": "numeric"  // Should be "continuous"
  }
}

Mismatched arrays

{
  "measure1": {
    "values": [1, 2, 3],
    "value_labels": ["Low", "High"]  // Should have 3 labels
  }
}

Migration from RDS

Converting RDS to JSON

library(boilerplate)

# Single category
boilerplate_rds_to_json(
  rds_file = "measures_db.rds",
  json_file = "measures_db.json"
)

# Unified database
boilerplate_migrate_to_json(
  rds_file = "boilerplate_unified.rds",
  output_dir = "data/json/"
)

Format Differences

RDS Format

Binary R object
Preserves all R data types
Not human-readable
R-specific (cannot be read outside R)

JSON Format

Text-based
Limited data types (strings, numbers, booleans, null, arrays and objects)
Human-readable
Cross-platform

Handling Special Cases

NULL values: Removed in JSON
Factors: Converted to character
Dates: Stored as ISO 8601 strings
Attributes: Stored in _meta fields
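The conversions listed above can be sketched in base R as a pass over a named list before it is serialised. This illustrates the listed rules only; it is not boilerplate's migration code, prepare_for_json() is hypothetical, and moving attributes into _meta is left out of the sketch.

```r
# Hypothetical helper: apply the NULL, factor and Date rules to a named list
# before writing it as JSON. Attribute handling (_meta) is omitted.
prepare_for_json <- function(x) {
  x <- Filter(Negate(is.null), x)          # NULL values are dropped
  lapply(x, function(value) {
    if (is.factor(value)) {
      as.character(value)                  # factors become character
    } else if (inherits(value, "Date")) {
      format(value, "%Y-%m-%d")            # dates become ISO 8601 strings
    } else if (is.list(value)) {
      prepare_for_json(value)              # recurse into nested lists
    } else {
      value
    }
  })
}

entry <- list(
  name = "wave1",
  type = factor("categorical"),
  collected = as.Date("2024-01-15"),
  notes = NULL
)
str(prepare_for_json(entry))
```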

Best Practices

File Organisation

project/
├── data/
│   ├── boilerplate_unified.json    # Single unified file
│   └── categories/                  # Or separate files
│       ├── methods.json
│       ├── measures.json
│       └── results.json

Naming Conventions

Keys: Use lowercase with underscores (for example, wave1_date)
Categories: Use descriptive, hierarchical names (for example, psychological → anxiety)
Measures: Include the instrument abbreviation (for example, gad7)
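One way to enforce the key convention is to scan every name in the parsed database. The pattern used here (a lowercase letter followed by lowercase letters, digits or underscores, with an optional leading underscore so _meta passes) is an assumed reading of the convention, and bad_keys() is hypothetical. Data-like keys, such as the "0-4" ranges under scoring, would also be reported, so a real check might skip them.

```r
# Hypothetical helper: list keys anywhere in a nested list that are not
# lowercase with underscores (an optional leading "_" allows _meta).
bad_keys <- function(node) {
  if (!is.list(node)) return(character(0))
  keys <- names(node)
  bad <- keys[!is.na(keys) & keys != "" & !grepl("^_?[a-z][a-z0-9_]*$", keys)]
  # Check this level, then every child level
  as.character(c(bad, unlist(lapply(node, bad_keys), use.names = FALSE)))
}

db <- list(Demographic = list(age = list(name = "age", unitOfMeasure = "years")))
bad_keys(db)  # "Demographic" "unitOfMeasure"
```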

Version Control

JSON files work well with git:

Human-readable diffs
Easy conflict resolution
Changes tracked over time

Performance Considerations

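File Size: JSON files are larger than the equivalent RDS files, which are compressed by default
Parse Time: Reading JSON is slightly slower than reading RDS
Recommendation: Use the unified single-file format for databases with fewer than 1,000 entries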

Examples

Creating a New Measures Entry

{
  "demographic": {
    "age": {
      "name": "age",
      "description": "Participant age at time of assessment",
      "type": "continuous",
      "unit": "years",
      "range": [18, 100]
    }
  }
}

Adding a Methods Entry with Variants

{
  "sampling": {
    "random": {
      "default": "Participants were randomly selected from {{population}}.",
      "large": "We employed a stratified random sampling approach. The {{population}} was first divided into {{strata}} strata based on {{stratification_var}}. Within each stratum, participants were randomly selected using a random number generator with seed {{seed}} for reproducibility.",
      "brief": "Random sampling was used.",
      "reference": "@cochran1977sampling"
    }
  }
}

Template Variables with Overrides

{
  "global": {
    "software": "R",
    "version": "4.3.0"
  },
  "methods": {
    "software": "R version 4.3.0 with lme4 package"
  }
}

In this example, {{software}} resolves to "R version 4.3.0 with lme4 package" in the methods section and to "R" in every other section. {{version}} is defined only globally, so it resolves to "4.3.0" everywhere.
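The scoping rule can be sketched with utils::modifyList(), which overwrites entries of its first list with same-named entries of its second. resolve_vars() is a hypothetical helper for illustration, not boilerplate's internal implementation.

```r
# Hypothetical helper: merge global variables with a section's overrides;
# section values win when the same name appears in both.
resolve_vars <- function(template_db, section) {
  globals <- template_db[["global"]]
  if (is.null(globals)) globals <- list()
  section_vars <- template_db[[section]]
  if (is.null(section_vars)) section_vars <- list()
  utils::modifyList(globals, section_vars)
}

# The override example above, as R lists
template_db <- list(
  global = list(software = "R", version = "4.3.0"),
  methods = list(software = "R version 4.3.0 with lme4 package")
)

resolve_vars(template_db, "methods")$software  # "R version 4.3.0 with lme4 package"
resolve_vars(template_db, "results")$software  # "R"
```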