Overview

The boilerplate package supports the JSON format for all database operations. This document describes the JSON schema for each database type.

Unified Database Schema

The unified database combines all categories into a single JSON file:

{
  "methods": { ... },
  "measures": { ... },
  "results": { ... },
  "discussion": { ... },
  "appendix": { ... },
  "template": { ... }
}
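As a minimal sketch of how such a file could be inspected outside the package, the following Python snippet parses a unified database and reports which of the six top-level categories are absent. The inline JSON string and the `missing_categories` helper are illustrative assumptions, not part of boilerplate, and Python is used only because its standard library ships a JSON parser. Whether the package requires all six keys is not stated here, so the check only reports and does not reject.

```python
import json

# The six top-level keys listed in the unified schema above.
EXPECTED_CATEGORIES = ("methods", "measures", "results",
                       "discussion", "appendix", "template")

def missing_categories(db: dict) -> list[str]:
    """Return the expected top-level categories absent from db, in schema order."""
    return [name for name in EXPECTED_CATEGORIES if name not in db]

# Hypothetical, deliberately incomplete database used for illustration.
raw = '{"methods": {}, "measures": {}, "template": {"global": {"n": 100}}}'
unified = json.loads(raw)

print(missing_categories(unified))
```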

Methods Database Schema

Methods entries contain standardised text with template variables, written as {{variable}} placeholders.

Basic Structure

{
  "category": {
    "subcategory": {
      "entry_name": {
        "text": "Method description with {{variables}}",
        "reference": "@citation2023",
        "keywords": ["keyword1", "keyword2"]
      }
    }
  }
}
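To make the placeholder mechanism concrete, here is a rough Python sketch of {{variable}} substitution. It is not the package's implementation: the `render` function is hypothetical, placeholder names are assumed to consist of word characters only, and leaving unknown placeholders untouched is a design assumption, since this document does not say how unresolved variables are handled.

```python
import re

# Matches {{name}} placeholders; names are assumed to be word characters only.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def render(text: str, variables: dict) -> str:
    """Replace each {{name}} with its value; leave unknown names untouched."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)
    return PLACEHOLDER.sub(substitute, text)

# Hypothetical entry in the shape shown above.
entry = {"text": "We used linear regression to analyse {{outcome}} (n = {{n}})."}
print(render(entry["text"], {"outcome": "anxiety scores"}))
```

Keeping unresolved placeholders visible in the output makes a missing variable easy to spot in the rendered text.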

Entry Variants

Methods can have multiple text variants:

{
  "statistical": {
    "regression": {
      "linear": {
        "default": "We used linear regression to analyse {{outcome}}.",
        "large": "We employed ordinary least squares linear regression to examine the relationship between {{predictors}} and {{outcome}}. Model assumptions were checked including...",
        "brief": "Linear regression was used."
      }
    }
  }
}

Fields

  • text or default: Main content (required)
  • large: Extended version (optional)
  • brief: Condensed version (optional)
  • reference: Citation in @key format (optional)
  • keywords: Array of searchable terms (optional)
  • _meta: Metadata object (optional)
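One way to picture how the variant fields relate is a lookup with a fallback. The sketch below is an assumption about reasonable behaviour, not the package's documented logic: the `select_variant` helper is hypothetical, and the fallback order (requested variant, then default, then text) is chosen only because text or default is the required field.

```python
def select_variant(entry: dict, variant: str = "default") -> str:
    """Return the requested variant, falling back to "default", then "text"."""
    for key in (variant, "default", "text"):
        if isinstance(entry.get(key), str):
            return entry[key]
    raise KeyError("entry has neither 'text' nor 'default'")

# Entry shaped like the linear regression example, without a "large" variant.
linear = {
    "default": "We used linear regression to analyse {{outcome}}.",
    "brief": "Linear regression was used.",
}
print(select_variant(linear, "brief"))
print(select_variant(linear, "large"))  # no "large" variant, so "default" is used
```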

Measures Database Schema

Measures entries describe variables and instruments used in research.

Basic Structure

{
  "category": {
    "measure_name": {
      "name": "unique_identifier",
      "description": "Detailed description of the measure",
      "type": "continuous|categorical|ordinal|binary",
      "additional_fields": "..."
    }
  }
}

Complete Example

{
  "psychological": {
    "anxiety": {
      "gad7": {
        "name": "gad7",
        "description": "Generalised Anxiety Disorder 7-item scale",
        "type": "ordinal",
        "items": 7,
        "range": [0, 21],
        "values": [0, 1, 2, 3],
        "value_labels": ["Not at all", "Several days", "More than half the days", "Nearly every day"],
        "cutoffs": {
          "mild": 5,
          "moderate": 10,
          "severe": 15
        },
        "reference": "@spitzer2006brief",
        "keywords": ["anxiety", "screening", "GAD-7"],
        "scoring": {
          "type": "sum",
          "interpretation": {
            "0-4": "Minimal anxiety",
            "5-9": "Mild anxiety",
            "10-14": "Moderate anxiety",
            "15-21": "Severe anxiety"
          }
        }
      }
    }
  }
}
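The scoring block in this entry can be read as data that a program applies to raw responses. As an illustration only, the Python sketch below sums seven hypothetical item responses and looks the total up in the interpretation bands copied from the entry. The `interpret` helper and the assumption that band keys are inclusive "low-high" ranges are mine, not something the package is documented to provide.

```python
def interpret(score: int, bands: dict) -> str:
    """Return the label of the inclusive "low-high" band containing score."""
    for band, label in bands.items():
        low, high = (int(part) for part in band.split("-"))
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside every band")

# Interpretation bands copied from the gad7 entry above.
gad7_bands = {
    "0-4": "Minimal anxiety",
    "5-9": "Mild anxiety",
    "10-14": "Moderate anxiety",
    "15-21": "Severe anxiety",
}

responses = [1, 2, 0, 3, 1, 2, 1]  # hypothetical answers to the 7 items
total = sum(responses)             # the entry's scoring type is "sum"
print(total, interpret(total, gad7_bands))
```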

Required Fields

  • name: Unique identifier (string, alphanumeric + underscore)
  • description: Full description (string, min 10 characters)
  • type: One of: “continuous”, “categorical”, “ordinal”, “binary”
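The three rules above translate directly into checks. The following Python sketch shows one possible reading of them; `check_required` is a hypothetical helper, and the formal schema files shipped with the package remain the authoritative definition.

```python
import re

ALLOWED_TYPES = ("continuous", "categorical", "ordinal", "binary")
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+")  # alphanumeric + underscore

def check_required(measure: dict) -> list[str]:
    """Return a list of problems with the three required fields (empty if none)."""
    problems = []
    name = measure.get("name")
    if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
        problems.append("name: must be an alphanumeric/underscore string")
    description = measure.get("description")
    if not isinstance(description, str) or len(description) < 10:
        problems.append("description: must be a string of at least 10 characters")
    if measure.get("type") not in ALLOWED_TYPES:
        problems.append("type: must be one of " + ", ".join(ALLOWED_TYPES))
    return problems

print(check_required({"name": "m1", "description": "Invalid type", "type": "numeric"}))
```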

Optional Fields

For All Types

  • reference: Citation (string)
  • keywords: Search terms (array of strings)
  • waves: Data collection waves (array of integers)
  • unit: Unit of measurement (string)

For Categorical/Ordinal

  • values: Possible values (array)
  • value_labels: Labels for values (array of strings)

For Continuous

  • range: [min, max] values (array of 2 numbers)

For Scales

  • items: Number of items (integer)
  • scoring: Scoring method object
  • subscales: Subscale definitions object
  • cutoffs: Clinical cutoffs object

Results Database Schema

Results entries follow the same pattern as methods:

{
  "descriptive": {
    "demographics": {
      "age": {
        "text": "The mean age was {{mean_age}} years (SD = {{sd_age}}).",
        "reference": "@reporting2023"
      }
    }
  }
}

Template Database Schema

The template database stores the values substituted for {{variable}} placeholders in other sections:

{
  "global": {
    "n": 100,
    "study_name": "Example Study",
    "year": 2024
  },
  "methods": {
    "software": "R version 4.3.0",
    "alpha": 0.05
  },
  "measures": {
    "wave1_date": "January 2024",
    "wave2_date": "June 2024"
  }
}

Variable Scoping

  • global: Available to all sections
  • [section]: Overrides global values within that section (for example, methods or measures)
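The scoping rule amounts to a two-level merge in which section values take precedence. A minimal Python sketch, assuming a hypothetical `resolve_variables` helper and the template database shown above:

```python
def resolve_variables(template: dict, section: str) -> dict:
    """Merge global variables with one section's variables; the section wins."""
    merged = dict(template.get("global", {}))
    merged.update(template.get(section, {}))
    return merged

# The template database from this section.
template = {
    "global": {"n": 100, "study_name": "Example Study", "year": 2024},
    "methods": {"software": "R version 4.3.0", "alpha": 0.05},
    "measures": {"wave1_date": "January 2024", "wave2_date": "June 2024"},
}
print(resolve_variables(template, "methods"))
```

Copying the global dictionary before updating it keeps the stored database unchanged, so resolving one section cannot leak its values into another.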

Schema Validation

JSON Schema Files

Located in inst/examples/json-poc/schema/:

  • measures_schema.json: Formal schema for measures
  • methods_schema.json: Formal schema for methods

Validation in R

# Validate a JSON database
boilerplate::validate_json_database(
  json_file = "my_database.json",
  schema_file = "measures_schema.json"
)

Common Validation Errors

  1. Missing required fields (here, name and type)

    {
      "measure1": {
        "description": "Missing 'name' and 'type' fields"
      }
    }

  2. Invalid type values ("numeric" is not allowed; use "continuous")

    {
      "measure1": {
        "name": "m1",
        "description": "Invalid type",
        "type": "numeric"
      }
    }

  3. Mismatched arrays (three values need three labels)

    {
      "measure1": {
        "values": [1, 2, 3],
        "value_labels": ["Low", "High"]
      }
    }
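The third error above is easy to catch before running full schema validation. The following Python sketch is a hypothetical pre-check, not a function the package provides; it compares the lengths of values and value_labels whenever both are present.

```python
def check_value_labels(measure: dict) -> list[str]:
    """Report a length mismatch between values and value_labels, if both exist."""
    values = measure.get("values")
    labels = measure.get("value_labels")
    if values is None or labels is None:
        return []  # nothing to compare
    if len(values) != len(labels):
        return [f"values has {len(values)} entries but value_labels has {len(labels)}"]
    return []

print(check_value_labels({"values": [1, 2, 3], "value_labels": ["Low", "High"]}))
```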

Migration from RDS

Converting RDS to JSON

library(boilerplate)

# Single category
boilerplate_rds_to_json(
  rds_file = "measures_db.rds",
  json_file = "measures_db.json"
)

# Unified database
boilerplate_migrate_to_json(
  rds_file = "boilerplate_unified.rds",
  output_dir = "data/json/"
)

Format Differences

RDS Format

  • Binary R object
  • Preserves all R data types
  • Not human-readable
  • R-specific: reading it requires R or an RDS-aware library

JSON Format

  • Text-based
  • Limited data types
  • Human-readable
  • Cross-platform

Handling Special Cases

  1. NULL values: Removed in JSON
  2. Factors: Converted to character
  3. Dates: Stored as ISO 8601 strings
  4. Attributes: Stored in _meta fields

Best Practices

File Organisation

project/
├── data/
│   ├── boilerplate_unified.json    # Single unified file
│   └── categories/                  # Or separate files
│       ├── methods.json
│       ├── measures.json
│       └── results.json

Naming Conventions

  1. Keys: Use lowercase with underscores
  2. Categories: Descriptive, hierarchical
  3. Measures: Include instrument abbreviation

Version Control

JSON files work well with git:

  • Human-readable diffs
  • Easy conflict resolution
  • Track changes over time

Performance Considerations

  1. File Size: JSON files are typically larger than the equivalent RDS files, which R compresses by default
  2. Parse Time: Slightly slower than RDS
  3. Recommendation: Use the unified format for databases with fewer than 1,000 entries

Examples

Creating a New Measures Entry

{
  "demographic": {
    "age": {
      "name": "age",
      "description": "Participant age at time of assessment",
      "type": "continuous",
      "unit": "years",
      "range": [18, 100]
    }
  }
}

Adding a Methods Entry with Variants

{
  "sampling": {
    "random": {
      "default": "Participants were randomly selected from {{population}}.",
      "large": "We employed a stratified random sampling approach. The {{population}} was first divided into {{strata}} strata based on {{stratification_var}}. Within each stratum, participants were randomly selected using a random number generator with seed {{seed}} for reproducibility.",
      "brief": "Random sampling was used.",
      "reference": "@cochran1977sampling"
    }
  }
}

Template Variables with Overrides

{
  "global": {
    "software": "R",
    "version": "4.3.0"
  },
  "methods": {
    "software": "R version 4.3.0 with lme4 package"
  }
}

In this example, methods sections use the more specific software description ("R version 4.3.0 with lme4 package"), while all other sections fall back to the global value ("R").