Technical system design refers to the software, hardware, and services used to gather, match, store, share, and visualize information. Historically, there have been two primary models for integrating information in state longitudinal data systems that generated analytical data sets:
- Centralized: Data from participating agencies is consolidated into one database or data warehouse
- Federated: Data from participating agencies is linked only for approved purposes
However, with cloud technology and new tools that better match individuals across data sets, states are developing systems that blend centralized and federated components.
These modern data systems have several advantages for addressing new needs from new user groups. Instead of just documenting progress from K-12 to college, states can now create more public-facing tools like dashboards and add new data sets that help to contextualize postsecondary and workforce outcomes.
The specific technology solutions vary by state, but a modern analytical data system should be able to fulfill most or all of these needs:
- Collect: Securely receive data in a well-defined structure and review submissions
- Store: Securely retain information using a flexible structure
- Manage: Document data formats and characteristics, validate data, and tag information for allowable use
- Master: Match records for individuals and link data sets
- Enhance: Summarize information and create calculated metrics
- Analyze: Automate predefined analyses and use machine learning to predict outcomes
- Deliver: Develop public dashboards and tables using aggregated data and allow for sensitive information to be securely shared and monitored with authorized entities
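As a rough illustration of the "Manage" need above, the sketch below documents a data format, validates a submitted row against it, and tags each field with its allowable uses. The schema, field names, and use labels are all hypothetical and are not drawn from any state's actual system.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class FieldSpec:
    """Documents one field: its name, whether it is required, and its allowable uses."""
    name: str
    required: bool
    allowed_uses: FrozenSet[str]


# Hypothetical documentation for an enrollment submission (assumed for illustration).
ENROLLMENT_SCHEMA = [
    FieldSpec("student_name", True, frozenset({"matching"})),
    FieldSpec("birth_date", True, frozenset({"matching"})),
    FieldSpec("institution", True, frozenset({"research", "public_dashboard"})),
    FieldSpec("term", True, frozenset({"research", "public_dashboard"})),
    FieldSpec("financial_aid_amount", False, frozenset({"research"})),
]


def validate_row(row: Dict[str, object], schema: List[FieldSpec]) -> List[str]:
    """Return a list of problems found in one submitted row (empty if none)."""
    errors = []
    for spec in schema:
        value = row.get(spec.name)
        if spec.required and (value is None or str(value).strip() == ""):
            errors.append(f"missing required field: {spec.name}")
    known = {spec.name for spec in schema}
    for name in row:
        if name not in known:
            errors.append(f"undocumented field: {name}")
    return errors


def fields_allowed_for(use: str, schema: List[FieldSpec]) -> List[str]:
    """List the fields tagged as allowable for a given use."""
    return [spec.name for spec in schema if use in spec.allowed_uses]
```

Under these assumptions, a public dashboard would draw only on the fields returned by `fields_allowed_for("public_dashboard", ENROLLMENT_SCHEMA)`, while identifying fields stay reserved for matching.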
Core components of modern data infrastructure
To choose the right technology solutions, you’ll want to understand both the core components of modern data systems and the key design decisions you will need to make as you review your options.
This resource can help you better understand the types of tools that are available to provide more flexible, secure, and open access to information and identify key design questions to explore.
Lay out business requirements in laypersons’ terms
Because California had made multiple attempts to create linked data sets over the last few decades, some stakeholders were approaching the design of a statewide data system with an older technical framework in mind.
For example, many perceived the project as an effort to create a single database that would live on a server in a physical office, or assumed that the existing statewide K-12 student identifier should be adopted by all state agencies to match people across data sets. When the planning process began generating recommendations that did not align with these expectations, various people involved in the planning were concerned that the data system would not address state priorities or was becoming needlessly complicated.
The planning team used several strategies to bring nontechnical audiences up to speed and to clarify why the recommendations aligned with state priorities. In some cases, this meant describing technical processes in plain language. For example, the technical term for matching individuals across data sets (to make sure that records for the same person from different agencies are linked) is “master data management,” which is often shortened to “MDM.”
Rather than using acronyms, the planning team explained to stakeholders that records could be matched by comparing names, birth dates, gender, race, Social Security numbers, and other data elements that might be collected by multiple agencies (such as high school attended). The planning team also emphasized that using multiple variables as part of master data management creates more accurate matches and better protects individual privacy, because information is not being shared using a single factor, like an educational identification number, that could be easily associated with a specific person. Finally, using multiple variables allows the system to include important data that is collected without educational identification numbers attached, such as data on people receiving food assistance or participating in a job retraining program offered by a workforce investment board.
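To make multi-variable matching concrete, here is a minimal sketch that compares two records field by field and scores how strongly they agree. It is only an illustration: the field names, weights, and threshold are assumptions chosen for the example, and it does not represent California's matching method or any specific MDM product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonRecord:
    """One agency's record for a person; optional fields may not be collected."""
    source: str
    first_name: str
    last_name: str
    birth_date: str  # ISO format, e.g. "2001-04-12"
    high_school: Optional[str] = None
    ssn_last4: Optional[str] = None


# Hypothetical weights: how much agreement on each field counts toward a match.
FIELD_WEIGHTS = {
    "first_name": 2.0,
    "last_name": 3.0,
    "birth_date": 4.0,
    "high_school": 1.0,
    "ssn_last4": 4.0,
}
MATCH_THRESHOLD = 0.8  # assumed cutoff, not a recommended value


def normalize(value: Optional[str]) -> Optional[str]:
    """Trim and lowercase a value; treat blank or absent values as missing."""
    cleaned = value.strip().lower() if value else ""
    return cleaned or None


def match_score(a: PersonRecord, b: PersonRecord) -> float:
    """Weighted share of fields present in both records that agree (0.0 to 1.0)."""
    agreed = 0.0
    compared = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        left = normalize(getattr(a, field_name))
        right = normalize(getattr(b, field_name))
        if left is None or right is None:
            continue  # skip fields that either agency did not collect
        compared += weight
        if left == right:
            agreed += weight
    return agreed / compared if compared else 0.0


def is_match(a: PersonRecord, b: PersonRecord) -> bool:
    return match_score(a, b) >= MATCH_THRESHOLD


k12 = PersonRecord("k12", "Maria", "Lopez", "2001-04-12", high_school="Central High")
workforce = PersonRecord("workforce", " maria ", "LOPEZ", "2001-04-12", ssn_last4="1234")
linked = is_match(k12, workforce)
```

Because the score relies only on fields that both records contain, the workforce record can be linked without any educational identification number. Production matching tools typically add fuzzy comparison and manual review of borderline scores, which this sketch leaves out.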
When explaining why some social service and workforce training data should be included from the outset, rather than just focusing on linking the easiest data sets, the planning team turned to a home improvement analogy. Imagine a family that wants to make their backyard more usable during the first year of the COVID-19 pandemic. If finances are tight, it might be tempting to lay a concrete patio before putting in a gas line to a spot farther out in the yard for a future BBQ pit. However, if the piping isn’t laid at the outset, even if it won’t be used right away, it will be more difficult to create the BBQ pit at a later date without digging up and redoing the patio. The same would be true for integrating educational, social service, and workforce data. If the legal frameworks and aligned data definitions were not established at the outset, it would be more challenging to add them in future years.
Process for modernizing an existing data infrastructure
Access resources that your state can use to better understand the technological changes needed to shift away from legacy systems, and to gather information to support the planning process.
- State Data Modernization Playbook
- State Data Modernization Playbook Interview Guide
- State Data Modernization Playbook Data Request Tracker
You can use these templates to get more granular about the technical components of your planning process.
Identify data structures that are more useful to students and educators
Because state data systems have traditionally focused on answering questions posed by researchers and policymakers, they tend to be retrospective. They allow experts to look at trends over time for groups of people, but often with a lag of a year or more between when data is originally collected and when it becomes available in the data system.
However, conversations with students, parents, teachers, and counselors reveal the need for a very different type of information. They are more likely to want current information at an individual level. While it is helpful to know that there were 43 foster youth admitted to your college last year, it is more useful to know if those foster youth are in your classroom in the current term.
In California, many of the user stories (see the Purpose and Vision section) focused on the need to improve access to tools that support:
- Planning: for college, career, and financial aid
- Applications: streamlined college and financial aid applications
- Monitoring: whether specific students are eligible for four-year colleges and have completed their applications
- Transcripts: electronic transcripts that include both academic and workforce training information
- Supports: opportunities to apply for social services when applying to college
Because the authorizing legislation emphasized using existing infrastructure rather than starting from scratch, California elected to scale several existing state-funded projects that addressed these needs. By paying for a single suite of resources for all students, California can both shift the burden of funding these tools away from local schools and offer all students consistent tools that can be supported more efficiently.
Data tools for students, families, and educators
See examples of college, career, and financial aid planning, eligibility tools, and electronic transcripts that are being integrated into California’s data system.
You can review the free tools to assess whether something similar would be useful for your state.