Data Handling

This tutorial describes how to start working with the MPX output data format. The primary data output file of pixelator is the PXL file. Here, you will learn how to read and interact with it. We will go through the different components of a single sample PXL file and also how to aggregate results from multiple samples. You can read more about the PXL file format here.

After completing this tutorial, you will be able to:

Load PXL files in R and access the multi-modal data contents including protein counts, metadata and spatial scores.
Understand and work with spatial polarity (protein clustering patterns).
Interpret and work with colocalization scores (protein co-clustering).
Directly access the spatial graph structure through the edge list.
Aggregate multiple PXL files into an integrated data object with sample identities.
Save aggregated data objects for later reuse in analyses.

Setup

To start with, we need to load some packages and functions that we will need.

library(pixelatorR)
library(SeuratObject)
library(dplyr)
library(stringr)
library(here)
library(tibble)

Loading data

We begin by locating the output from the nf-core-pixelator pipeline. The input for downstream analysis is the PXL file contained in the layout directory.

DATA_DIR <- "path/to/local/folder"
Sys.setenv("DATA_DIR" = DATA_DIR)

Run the following code in the terminal to download the dataset.

BASEURL="https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/mpx-datasets/pixelator/next/technote-v1-vs-v2-immunology-II"

curl -L -O -C -  --create-dirs --output-dir $DATA_DIR "${BASEURL}/Sample05_V2_PBMC_r1.layout.dataset.pxl"

And than proceed in R.

# The folders the data is located in:
filename <- file.path(DATA_DIR, "Sample05_V2_PBMC_r1.layout.dataset.pxl")
pg_data <- ReadMPX_Seurat(filename)

Component meta data

In addition to the counts data, the object contains meta data of each MPX graph component, each corresponding to a cell. This table contains information that can be useful in quality control, with information such as how many protein molecules were detected (molecules), the sequencing depth (reads), and the graph connectivity of each component (mean_b_pixels_per_a_pixel).

as_tibble(pg_data[[]], rownames = "component")

# A tibble: 1,125 × 19
   component     a_pixel_b_pixel_ratio a_pixels antibodies b_pixels leiden
   <chr>                         <dbl>    <int>      <int>    <int> <fct> 
 1 RCVCMP0000000                  1.80     3749         79     2084 0     
 2 RCVCMP0000001                  2.33     3338         79     1431 1     
 3 RCVCMP0000002                  1.81     2724         78     1503 0     
 4 RCVCMP0000004                  1.92     2159         79     1126 2     
 5 RCVCMP0000005                  1.81     6413         79     3545 3     
 6 RCVCMP0000007                  2.34     2953         78     1264 4     
 7 RCVCMP0000009                  3.42     1382         76      404 9     
 8 RCVCMP0000011                  2.65     6791         79     2564 3     
 9 RCVCMP0000014                  2.11     5128         79     2436 6     
10 RCVCMP0000016                  2.09     2432         77     1161 13    
# ℹ 1,115 more rows
# ℹ 13 more variables: mean_a_pixels_per_b_pixel <dbl>,
#   mean_b_pixels_per_a_pixel <dbl>, mean_molecules_per_a_pixel <dbl>,
#   mean_reads_per_molecule <dbl>, median_a_pixels_per_b_pixel <dbl>,
#   median_b_pixels_per_a_pixel <dbl>, median_molecules_per_a_pixel <dbl>,
#   median_reads_per_molecule <dbl>, molecules <int>, pixels <int>,
#   reads <int>, tau <dbl>, tau_type <chr>

The antibody count matrix can be accessed from the Seurat object using the LayerData method.

# Display the 10 first rows and 10 first columns of the antibody counts
LayerData(pg_data, layer = "counts")[1:10, 1:10]

10 x 10 sparse Matrix of class "dgCMatrix"

                                                      
B2M   4690 3379 3070 2014 6731 5090 106 3926 5993 3225
CD102  349   94   39   26  227  155  32  110  154   58
CD11a  509  262  202   63  240  366  65  262  495  140
CD11b   81  156   58   92  185  102  88  416  314   20
CD11c  679   50   35   18  112   28  51  168  757   30
CD123    2   15    5    1    5    4   .   15   19    4
CD14    50   78   42   19   79   26  40  178  613   16
CD150   23  105   30   27  134   14  53  155   22   48
CD152   13   21   17   14   66   12  35  108   25   11
CD154   31   34   36   16   60   22  59  137   34   23

Spatial data

In addition to count data corresponding to protein abundance, MPX also generates spatial data which be used to visualize individual cells, and to find spatial structures of proteins across cells. The PXL file comes with MPX polarity scores, describing the degree of spatial clustering of each protein on each cell, and MPX colocalization scores, which describe the degree of spatial colocalization of pairs of proteins on each cell. If you want to read more about spatial metrics in MPX data, click here.

pixelatorR Assay classes

SeuratObject provides the Assay class and more recently the Assay5 class to store single-cell abundance data in. PixelatorR provides additional classes CellGraphAssay and CellGraphAssay5 to store both abundance data and spatial metrics in. These classes inherit the corresponding SeuratObject classes but also contain additional slots for the spatial metrics. Here for instance, pg_data contains a CellGraphAssay5 object called “mpxCells” which can be accessed as follows:

pg_data[["mpxCells"]]

CellGraphAssay (v5) data with 84 features for 1125 cells
Top 10 variable features:
 B2M, CD102, CD11a, CD11b, CD11c, CD123, CD14, CD150, CD152, CD154 
Layers:
 counts 
Loaded CellGraph objects:
 0

In the output above, we can confirm that the object is of class CellGraphAssay5 and that it currently contains 0 loaded CellGraph objects. More about that later.

MPX Polarity scores

The spatial metrics can be accessed from the CellGraphAssay5 object with:

PolarizationScores(pg_data)

# A tibble: 84,022 × 6
   marker  morans_i morans_z morans_p_value morans_p_adjusted component    
   <chr>      <dbl>    <dbl>          <dbl>             <dbl> <chr>        
ACTB   -0.00827   -1.42          0.0774             0.300  RCVCMP0000000
B2M    -0.00589   -1.77          0.0380             0.206  RCVCMP0000000
CD102  -0.00151   -0.0780        0.469              0.494  RCVCMP0000000
CD11a   0.00856    1.46          0.0721             0.290  RCVCMP0000000
CD11b   0.00410    1.01          0.156              0.393  RCVCMP0000000
CD11c   0.00319    0.851         0.197              0.417  RCVCMP0000000
CD14    0.0131     2.89          0.00190            0.0221 RCVCMP0000000
CD150  -0.00459   -0.712         0.238              0.435  RCVCMP0000000
CD152   0.0100     0.998         0.159              0.395  RCVCMP0000000
CD154   0.000512   0.328         0.372              0.473  RCVCMP0000000
# ℹ 84,012 more rows

MPX colocalization scores

Similarly, pixelatorR provides a method to access the colocalization scores:

ColocalizationScores(pg_data)

# A tibble: 3,111,941 × 15
   marker_1 marker_2  pearson pearson_mean pearson_stdev pearson_z
   <chr>    <chr>       <dbl>        <dbl>         <dbl>     <dbl>
 1 ACTB     B2M      -0.00456    -0.0127          0.0259    0.316 
 2 ACTB     CD102    -0.139      -0.00133         0.0222   -6.21  
 3 B2M      CD102    -0.207      -0.0403          0.0273   -6.11  
 4 ACTB     CD11a    -0.0803     -0.00523         0.0247   -3.04  
 5 B2M      CD11a     0.0335     -0.0453          0.0194    4.06  
 6 CD102    CD11a    -0.147      -0.0179          0.0230   -5.64  
 7 ACTB     CD11b    -0.00205    -0.000778        0.0240   -0.0528
 8 B2M      CD11b    -0.0842     -0.0187          0.0217   -3.03  
 9 CD102    CD11b    -0.00990    -0.00598         0.0266   -0.147 
10 CD11a    CD11b     0.0690     -0.00677         0.0209    3.63  
# ℹ 3,111,931 more rows
# ℹ 9 more variables: pearson_p_value <dbl>, pearson_p_value_adjusted <dbl>,
#   jaccard <dbl>, jaccard_mean <dbl>, jaccard_stdev <dbl>, jaccard_z <dbl>,
#   jaccard_p_value <dbl>, jaccard_p_value_adjusted <dbl>, component <chr>

Edgelist

The edgelist is a list of uniquely identified antibodies per each cell, and simultaneously constitutes the bipartite graph which makes up an MPX experiment. Each row contains an edge (a uniquely identified antibody) which is identified by a UMI, UPIA, and a UPIB. The UPIA is the unique identifier for the specific A pixel, while the UPIB is the unique identifier for the B pixel. Both A pixels and B pixels may contain multiple edges, resulting in a spatial graph which can be used to infer spatial relationships between antibody molecules, to calculate spatial statistics, and to visualize the cell.

edge_list <- 
  ReadMPX_edgelist(filename)
edge_list

# A tibble: 26,003,773 × 8
   upia       upib  umi   marker sequence count unique_molecules_count component
   <fct>      <fct> <fct> <fct>  <fct>    <int>                  <int> <fct>    
CGGGTTAGT… ATTG… CGTC… CD19   ACCAACTT    15                      1 RCVCMP00…
CGTGATCAG… GCAG… GAAG… CD19   ACCAACTT    15                      1 RCVCMP00…
CTTTTAGTG… CGTT… TGCG… CD19   ACCAACTT    15                      1 RCVCMP00…
ACCCAGTTC… GATG… GAGT… CD19   ACCAACTT    14                      1 RCVCMP00…
TTGGTGTAC… AGCG… TAGT… CD19   ACCAACTT    14                      1 RCVCMP00…
GTGAAAAAT… ATTA… GTGC… CD19   ACCAACTT    13                      1 RCVCMP00…
TATCCCAGC… AGTC… TGTC… CD19   ACCAACTT    13                      1 RCVCMP00…
TTGATTGTT… CTAG… CCCG… CD19   ACCAACTT    13                      1 RCVCMP00…
AATCGATAG… TGGC… ATGG… CD19   ACCAACTT    12                      1 RCVCMP00…
TTACTTGTT… CATA… CATC… CD19   ACCAACTT    12                      1 RCVCMP00…
# ℹ 26,003,763 more rows

Seurat objects and PXL files

For some analytical tasks, we will need access to the edgelist. For instance, if we want to load component graphs for visualization, we need access to the edgelist to build these graphs. However, the edgelist is not stored in the Seurat object, it’s only available in the PXL file. The CellGraphAssay5 that was mentioned earlier in this tutorial keeps the path to the PXL file so that we can access it when needed. This has the advantage that we can keep the size of the Seurat object relatively small until we decide to load the graphs. On the down side, it also means that the path to the PXL file need to remain unchanged. If the PXL file is moved or renamed, we can no longer access it.

This has an important implication when saving and sharing Seurat objects containing MPX data. If you want to export a Seurat object as an RDS file and share it with someone else, you should also share associated the PXL file. In addition, the path to the PXL file has to be updated in the Seurat object.

If this is unclear at the moment, don’t worry. We will come back to this topic in the tutorials about 2D and 3D cell visualization.

Aggregating data

For cases when we want to process and analyze samples in parallel, it is convenient to merge them into a single object. Here, we read multiple files and aggregate them into a single object. When doing so, the cell ID and sample ID are merged to form the new cell ID in the merged object (e.g. “S1_RCVCMP0000830”). When merging Seurat objects, you can provide add.cell.ids to control the sample prefixes.

In the code chunk below, we will create a merged object from four PXL files. These PXL files represent a resting and PHA stimulated PBMC sample, both in duplicate. We will create a merged object and update the object’s metadata to indicate the condition and replicate number. The resulting object will be explored further in the next tutorial on Quality Control.

BASEURL="https://pixelgen-technologies-datasets.s3.eu-north-1.amazonaws.com/mpx-datasets/pixelator/next/technote-v1-vs-v2-immunology-II"

curl -L -O -C -  --create-dirs --output-dir $DATA_DIR "${BASEURL}/Sample05_V2_PBMC_r1.layout.dataset.pxl"
curl -L -O -C -  --create-dirs --output-dir $DATA_DIR "${BASEURL}/Sample06_V2_PBMC_r2.layout.dataset.pxl"
curl -L -O -C -  --create-dirs --output-dir $DATA_DIR "${BASEURL}/Sample07_V2_PHA_PBMC_r1.layout.dataset.pxl"
curl -L -O -C -  --create-dirs --output-dir $DATA_DIR "${BASEURL}/Sample08_V2_PHA_PBMC_r2.layout.dataset.pxl"

# The folders the data is located in:
data_files <- 
  c(
    # Resting
    resting_r1 = file.path(DATA_DIR, "Sample05_V2_PBMC_r1.layout.dataset.pxl"),
    resting_r2 = file.path(DATA_DIR, "Sample06_V2_PBMC_r2.layout.dataset.pxl"),

    # PHA
    stimulated_r1 = file.path(DATA_DIR, "Sample07_V2_PHA_PBMC_r1.layout.dataset.pxl"),
    stimulated_r2 = file.path(DATA_DIR, "Sample08_V2_PHA_PBMC_r2.layout.dataset.pxl")
    )

# Read PXL files as a list of Seurat objects
obj_list <- lapply(data_files, ReadMPX_Seurat)

# Combine files into a single Seurat object
pg_data_combined <- 
  merge(obj_list[[1]],
        y = obj_list[-1],
        add.cell.ids = names(obj_list)) 

# Add meta data describing the sample and donor origin
pg_data_combined <- 
  AddMetaData(pg_data_combined, 
              metadata = data.frame(id = colnames(pg_data_combined)) %>% 
                mutate(
                        condition = str_remove(id, "_.*"),
                        sample = str_remove(id, ".{17}$"), 
                        replicate = str_extract(id, "_r[0-9]_") %>% str_remove_all("_")
                      ) %>% 
                column_to_rownames("id"))
pg_data_combined

An object of class Seurat 
84 features across 4375 samples within 1 assay 
Active assay: mpxCells (84 features, 84 variable features)
 4 layers present: counts.1, counts.2, counts.3, counts.4

As you can see in the object description above, the count layers of a merged object will be kept separate as individual layers (“counts.1”, “counts.2”, …). For many downstream applications, one needs a singled merged counts matrix. This layer can be created with the following command.

pg_data_combined <- JoinLayers(pg_data_combined)
pg_data_combined

An object of class Seurat 
84 features across 4375 samples within 1 assay 
Active assay: mpxCells (84 features, 84 variable features)
 1 layer present: counts

Save data

After combining the samples, we can save the merged data to an RDS file to use later. Remember that a Seurat object containing MPX data is “connected” to a PXL file, so you need to share the PXL file along with the RDS file to ensure that component graphs can be loaded. If component graphs are not needed for subsequent analysis, the PXL file is no longer needed.

# This is an optional pause step; if you prefer to continue to the next tutorial without saving the object, that will also work.
saveRDS(pg_data_combined, file.path(DATA_DIR, "combined_data.rds"))

We have now seen how to load MPX data, inspect its key properties and prepare the data for integrated analysis across experimental samples. In the next tutorial we will apply these skills and demonstrate how to quality control MPX data.

Data Handling

Setup​

Loading data​

Component meta data​

Spatial data​

pixelatorR Assay classes​

MPX Polarity scores​

MPX colocalization scores​

Edgelist​

Seurat objects and PXL files​

Aggregating data​

Save data​