---
title: "Getting started with psgc"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with psgc}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>"
)
library(psgc)
```

## What is the PSGC?

The **Philippine Standard Geographic Code (PSGC)** is the official list of
every geographic area in the Philippines — from the broadest (regions) down to
the most granular (barangays). It is published and maintained by the
**Philippine Statistics Authority (PSA)**.

Each area is identified by a unique **10-digit code** and a geographic level:

| Level | Description | Example |
|---|---|---|
| `Reg` | Region | Region I – Ilocos Region |
| `Prov` | Province | Ilocos Norte |
| `City` | City | Laoag City |
| `Mun` | Municipality | Bacarra |
| `SubMun` | Sub-municipality | Tondo I/II, City of Manila |
| `Bgy` | Barangay | Brgy. 1, Laoag City |
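
Because many codes begin with `0`, they are best stored as character strings; a numeric column silently drops the leading zero. The base R sketch below illustrates the pitfall and one way to recover a 10-character string. It does not call any psgc function, and the `sprintf()` repair is only a suggestion for cleaning your own data.

```{r sketch-leading-zeros}
code_chr <- "0100000000"           # Region I, stored as text
code_num <- as.numeric(code_chr)   # the leading zero is lost here

nchar(code_chr)
format(code_num, scientific = FALSE)

# Restore the zero-padded, 10-character form
sprintf("%010.0f", code_num)
```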

The PSA releases updated PSGC files several times a year as new cities are
chartered, barangays are created, or codes are renumbered. This package bundles
**12 releases** from Q1 2023 through Q1 2026.

---

## Checking available releases

```{r releases}
list_releases()
latest_release()
```

By default, every function in this package uses the latest release. You can
always pass a specific release name to work with older data.
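
Release names such as `"Q1_2023"` do not sort chronologically as plain strings, because the quarter comes before the year. If you ever need to order a set of release names yourself, a base R sketch like the following works, assuming every name follows the `Q<quarter>_<year>` pattern seen in this vignette. The helper `order_releases()` is hypothetical and not part of psgc.

```{r sketch-release-order}
order_releases <- function(x) {
  parts   <- strsplit(x, "_", fixed = TRUE)
  # First piece is "Q<n>", second piece is the four-digit year
  quarter <- as.integer(sub("^Q", "", vapply(parts, `[`, character(1), 1)))
  year    <- as.integer(vapply(parts, `[`, character(1), 2))
  x[order(year, quarter)]
}

order_releases(c("Q1_2024", "Q4_2023", "Q1_2023"))
```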

---

## Getting the full PSGC list

`get_psgc()` returns the complete list of geographic areas for a given release.

```{r get-psgc}
ph <- get_psgc()
nrow(ph)
head(ph)
```

### Filter by geographic level

You do not need to remember the exact level codes (`Reg`, `Prov`, and so on); plain English names work too:

```{r filter-region}
regions <- get_psgc(geographic_level = "Region")
regions[, c("psgc_code", "area_name")]
```

```{r filter-province}
provinces <- get_psgc(geographic_level = "Province")
nrow(provinces)
head(provinces[, c("psgc_code", "area_name")])
```

You can filter for multiple levels at once by passing a vector:

```{r filter-multiple}
city_mun <- get_psgc(geographic_level = c("City", "Municipality"))
nrow(city_mun)
```

There is also a convenient shorthand, `"city_mun"`, that does the same thing:

```{r filter-city-mun}
nrow(get_psgc(geographic_level = "city_mun"))
```
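
To make the relationship between the plain-English names and the level codes concrete, here is a small base R sketch of how such an alias lookup could work. The `level_aliases` table and the `resolve_levels()` helper are hypothetical and cover only the names used in this vignette; they are not the package's internal implementation.

```{r sketch-level-aliases}
# Hypothetical lookup table: lower-case alias -> PSGC level code(s)
level_aliases <- list(
  region       = "Reg",
  province     = "Prov",
  city         = "City",
  municipality = "Mun",
  city_mun     = c("City", "Mun")
)

resolve_levels <- function(x) {
  key     <- tolower(x)
  unknown <- setdiff(key, names(level_aliases))
  if (length(unknown) > 0) {
    stop("Unknown geographic level: ", paste(unknown, collapse = ", "))
  }
  # Expand every alias and drop duplicates
  unique(unlist(level_aliases[key], use.names = FALSE))
}

resolve_levels(c("City", "Municipality"))
resolve_levels("city_mun")
```

Under this sketch both calls resolve to the same pair of level codes, which is why the vector form and the shorthand select the same rows.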

### Using a specific release

```{r older-release}
ph_2023 <- get_psgc("Q1_2023")
nrow(ph_2023)
```

---

## Looking up a specific code

If you already have a PSGC code and want its details, use `psgc_info()`.

```{r psgc-info}
psgc_info("0100000000") # Region I
```

You can look up multiple codes at once:

```{r psgc-info-multi}
psgc_info(c("0100000000", "0102800000"))
```

**Short codes are accepted** — the package pads the rest with trailing zeros,
so you only need to provide enough digits to identify the area:

```{r psgc-info-short}
psgc_info("01")     # same as "0100000000": Region I
psgc_info("01028")  # same as "0102800000": Ilocos Norte
```
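
The padding rule is easy to reproduce when you need to clean codes outside the package. The helper below, `pad_psgc()`, is a hypothetical base R sketch of that rule rather than the package's own code: it right-pads each string with zeros to 10 characters and rejects anything longer.

```{r sketch-pad-code}
pad_psgc <- function(x) {
  x <- as.character(x)
  n <- nchar(x)
  if (any(n > 10)) stop("PSGC codes have at most 10 digits")
  # strrep() is vectorised, so each code gets its own number of zeros
  paste0(x, strrep("0", 10 - n))
}

pad_psgc(c("01", "01028", "0100000000"))
```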

---

## Population data

`get_population()` returns PSA census figures (2015, 2020, 2024) for all
geographic areas in a release.

```{r population-basic}
pop <- get_population()
head(pop)
```

### Add area names and geographic levels

Set `details = TRUE` to include the area name and level alongside the numbers:

```{r population-details}
pop_detailed <- get_population(details = TRUE)
head(pop_detailed)
```

### Filter by geographic level

The same aliases accepted by `get_psgc()` work here too:

```{r population-filter}
region_pop <- get_population(geographic_level = "Region", details = TRUE)
region_pop
```

### Wide format — one row per area

Set `wide = TRUE` to get each census year as its own column, making it easy to
compare figures side by side or feed into a table or chart:

```{r population-wide}
region_pop_wide <- get_population(
  geographic_level = "Region",
  details          = TRUE,
  wide             = TRUE
)
region_pop_wide
```
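
If you are curious what the long-to-wide step looks like, base R's `reshape()` performs the same kind of pivot. The sketch below uses a toy data frame with placeholder numbers (not real census figures). The `psgc_code`, `year`, and `population` column names are assumed for illustration, and the column names produced by `wide = TRUE` may differ.

```{r sketch-reshape-wide}
# Toy long-format data: one row per area and census year
pop_long <- data.frame(
  psgc_code  = rep(c("0100000000", "0200000000"), each = 3),
  year       = rep(c(2015, 2020, 2024), times = 2),
  population = c(100, 110, 120, 200, 210, 220),  # placeholder values
  stringsAsFactors = FALSE
)

# Pivot so that each census year becomes its own column
pop_wide <- reshape(
  pop_long,
  idvar     = "psgc_code",
  timevar   = "year",
  direction = "wide"
)
rownames(pop_wide) <- NULL
pop_wide
```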

### Attach population data to the PSGC list

If you want population figures alongside the main PSGC table (rather than as a
separate data frame), use `include_population_data = TRUE` in `get_psgc()`.
This adds a `population_data` list-column — each cell is a small data frame
with `population` and `year`:

```{r psgc-pop-nested}
regions_with_pop <- get_psgc(
  geographic_level        = "Region",
  include_population_data = TRUE
)

# Inspect the population data for the first region
regions_with_pop$population_data[[1]]
```
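
A list-column is convenient for storage, but some analyses need one flat table again. The base R sketch below shows one way to expand such a column. It builds a toy stand-in for the `get_psgc()` result with placeholder numbers, so the exact columns in your own data may differ.

```{r sketch-unnest-population}
# Toy stand-in: two areas, each with a nested data frame
regions_toy <- data.frame(
  psgc_code = c("0100000000", "0200000000"),
  stringsAsFactors = FALSE
)
regions_toy$population_data <- list(
  data.frame(population = c(100, 110), year = c(2015, 2020)),
  data.frame(population = c(200, 210), year = c(2015, 2020))
)

# Prefix each nested data frame with its code, then stack them
flat <- do.call(
  rbind,
  Map(
    function(code, pop) data.frame(psgc_code = code, pop, stringsAsFactors = FALSE),
    regions_toy$psgc_code,
    regions_toy$population_data
  )
)
rownames(flat) <- NULL
flat
```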

---

## Tracking codes across releases

The PSA occasionally renumbers codes or splits, merges, and abolishes areas
between releases. `map_psgc()` traces a code forward to a later release so you
can keep longitudinal datasets consistent.

```{r map-psgc}
map_psgc("0100000000")  # forward to the latest release
```

```{r map-psgc-target}
map_psgc("0100000000", to = "Q4_2023")
```

The `mapping_type` column tells you what happened to the code:

| Type | Meaning |
|---|---|
| `direct` | Code is unchanged |
| `renumbered` | Code was assigned a new number |
| `split` | One area was divided into multiple areas |
| `merged` | Multiple areas were merged into one |
| `abolished` | Area no longer exists (`new_code` will be `NA`) |

This is especially useful when joining PSGC-coded survey data from different
years — use `map_psgc()` first to normalise all codes to a single release before
merging.
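
The normalisation step can be sketched in base R as a join against a crosswalk table. Everything below is illustrative: the crosswalk is typed by hand rather than returned by `map_psgc()`, the `old_code` column name is an assumption (only `new_code` and `mapping_type` are described above), and the codes starting with `999` are made-up placeholders.

```{r sketch-recode-crosswalk}
# Hand-written stand-in for a crosswalk between two releases
crosswalk <- data.frame(
  old_code     = c("0100000000", "9990000001", "9990000002"),
  new_code     = c("0100000000", "9990000009", NA),
  mapping_type = c("direct", "renumbered", "abolished"),
  stringsAsFactors = FALSE
)

# Toy survey data coded against the older release
survey_old <- data.frame(
  psgc_code = c("0100000000", "9990000001", "9990000002"),
  value     = c(10, 20, 30),
  stringsAsFactors = FALSE
)

recoded <- merge(
  survey_old, crosswalk,
  by.x = "psgc_code", by.y = "old_code", all.x = TRUE
)

# One-to-one mappings can be recoded automatically
clean <- recoded[recoded$mapping_type %in% c("direct", "renumbered"), ]

# Splits, merges, and abolished areas need a manual decision
needs_review <- recoded[recoded$mapping_type %in% c("split", "merged", "abolished"), ]

clean[, c("new_code", "value")]
needs_review[, c("psgc_code", "mapping_type")]
```

Keeping the `needs_review` rows separate avoids silently dropping or double-counting areas whose boundaries changed.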
