Syllabus Point
- Investigate the effect of big data on web architecture
Including:
- data mining
- metadata
- streaming service management
Big data is characterised by volume, velocity, variety, veracity and value. Its impact on web architecture includes the need for scalable solutions, data mining capabilities, metadata management, and streaming services that can process data efficiently in real time.
What is big data?
Big data refers to extremely large and complex datasets that can't be easily managed, processed or analysed using traditional data processing tools or methods.
'Big' refers not just to the volume of data, but also to its velocity, variety, veracity and value.
- Volume: Refers to the large amounts of data generated every second
- Variety: Refers to the different types and formats of data collected - a mix of structured, semi-structured and unstructured data
- Velocity: The speed at which data is generated and processed - data is generated continuously and needs to be analysed quickly to provide insights and make changes
- Veracity: The reliability, quality, and trustworthiness of data being collected - with so much data from many sources, not all is accurate, consistent or complete
- Value: The useful insight that can be derived from the data through analytics
Big data represents the convergence of large volumes of data, high velocity of data generation, diverse data types, varying data quality, and the potential to derive value from data analytics. It poses both challenges and opportunities for organisations seeking to harness the power of data to gain competitive advantage and drive business success.
What is web architecture?
Web architecture is the structure under which the information and components of a website are organised, ordered and classified, and which defines how those components interact.
Key components
- Client-side components
  - Web browsers
  - Client-side scripting languages (JS)
  - User interfaces
- Server-side components
  - Web servers
  - Application servers - middleware that provides runtime environments
  - Databases
- Networking infrastructure
  - Internet protocols
  - DNS
  - CDNs
- Data formats and protocols
  - HTML, CSS, JS
  - Representational state transfer (REST), an architectural style, and simple object access protocol (SOAP), a messaging protocol - both used for exchanging data between web services and clients
  - XML, JSON - data formats for transmitting structured data
- Security mechanisms
  - SSL/TLS - used for encryption
  - Authentication and authorisation mechanisms
  - Firewalls, intrusion detection systems - security measures that protect web applications from unauthorised access, attacks and data breaches
Nature of data
Data handling is the process of obtaining, processing, analysing and managing data.
- Collecting, manipulating, transforming
- Dynamic and procedural - focuses on 'how'
Data storage is how data is saved and retained over time.
- Focuses on 'where' - where it resides, how it's organised, how it can be retrieved
Impact on web architecture
Web applications need to scale and adapt to handle big data efficiently. This leads to:
- Cloud-based architectures (AWS, Google Cloud)
- Distributed databases (e.g. Apache Cassandra)
- AI-driven analytics for processing data efficiently
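One way a distributed database spreads a large dataset across machines is hash partitioning (sharding). The sketch below is only an illustration of that idea, not how any particular product works; the node names and the `shard_for` / `node_for` helpers are made up for the example.

```python
import hashlib

# Hypothetical database nodes; a real cluster would discover these dynamically.
NODES = ["db-node-0", "db-node-1", "db-node-2"]


def shard_for(key, num_shards):
    """Map a record key to a shard number using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def node_for(key):
    """Return the node that would store the record with this key."""
    return NODES[shard_for(key, len(NODES))]


for user_id in ["user-1", "user-2", "user-3", "user-4"]:
    print(user_id, "->", node_for(user_id))
```

The same key always maps to the same node, so reads know where to look. A drawback of this simple modulo approach is that changing the number of nodes moves most keys, which is why real systems often use more elaborate schemes.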
Data mining
Data mining is the process of analysing large datasets to find patterns, trends, and relationships. It helps businesses make decisions, improve user experiences and optimise web services.
- Used in advertising, fraud detection and recommendation systems
- Examples: e-commerce sites recommend products based on browsing and purchase data, social media platforms suggest targeted content and ads, and search engines rank web pages using search data
Security implications
Hackers can use data mining to identify security vulnerabilities or predict user behaviour for phishing attacks.
How it works
- Data collection and cleaning
- Algorithms analyse data to find trends and improve business strategies
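The two steps above can be sketched in a few lines. This is a minimal example under assumed data: the `raw_events` records and the `clean` / `most_visited` helpers are invented for illustration, and counting page visits stands in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw click data, including messy and incomplete records.
raw_events = [
    {"user": "a", "page": "/home"},
    {"user": "b", "page": "/Shop "},   # inconsistent case and whitespace
    {"user": "a", "page": None},       # incomplete record
    {"user": "c", "page": "/shop"},
    {"user": "b", "page": "/shop"},    # duplicate once normalised
]


def clean(events):
    """Drop incomplete records, normalise page names, remove duplicates."""
    seen = set()
    cleaned = []
    for event in events:
        user, page = event.get("user"), event.get("page")
        if not user or not page:
            continue
        record = (user, page.strip().lower())
        if record not in seen:
            seen.add(record)
            cleaned.append(record)
    return cleaned


def most_visited(records, n=1):
    """Find the n pages visited by the most distinct users."""
    return Counter(page for _, page in records).most_common(n)


records = clean(raw_events)
print(most_visited(records))  # [('/shop', 2)]
```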
Effect on web architecture
- Web applications need scalable databases for increased storage needs
- High performance servers and distributed computing
- Businesses use streaming data pipelines for real-time processing
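Example: data mining techniques used in streaming services.
- Association algorithms (find titles that are often watched together)
- Classification (assign users or content to predefined categories)
- Clustering (group users based on shared habits)
- Predictive analysis (use past behaviour to forecast what a user will do next)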
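As a rough sketch of the association idea, the code below counts which titles appear together in viewing histories and calculates a simple confidence value ("of the users who watched X, what fraction also watched Y?"). The titles, the `histories` data and both function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical viewing histories, one set of titles per user.
histories = [
    {"Drama A", "Drama B", "Comedy C"},
    {"Drama A", "Drama B"},
    {"Drama A", "Comedy C"},
    {"Drama B", "Comedy C"},
]


def pair_counts(all_histories):
    """Count how many users watched each pair of titles."""
    counts = Counter()
    for history in all_histories:
        for pair in combinations(sorted(history), 2):
            counts[pair] += 1
    return counts


def confidence(all_histories, if_watched, then_watched):
    """Fraction of users who watched `if_watched` that also watched `then_watched`."""
    with_first = [h for h in all_histories if if_watched in h]
    if not with_first:
        return 0.0
    both = sum(1 for h in with_first if then_watched in h)
    return both / len(with_first)


print(pair_counts(histories)[("Drama A", "Drama B")])   # 2
print(round(confidence(histories, "Drama A", "Drama B"), 2))  # 0.67
```

A recommendation system could suggest "Drama B" to someone who has just watched "Drama A" when this confidence is high enough.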
Metadata
Metadata is data that describes other data - it provides context, meaning and structure, making it easier for web applications to store, search and retrieve information.
- Timestamps, geolocation, file properties
Privacy concern
Websites and companies collect metadata to track users, often without their explicit consent.
Types
- Descriptive: Title, author, tags
- Structural: Defines how data is organised, e.g. HTML tags
- Administrative: Information about access rights, file creation date, etc
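To make the three types concrete, the sketch below pulls descriptive and structural metadata out of a small HTML page using Python's built-in `html.parser` module. The page content and the `administrative` dictionary are made-up examples; administrative metadata usually comes from the file system or server rather than the page itself.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collect descriptive metadata and a heading outline from HTML."""

    def __init__(self):
        super().__init__()
        self.descriptive = {}   # title, author, keywords
        self.structural = []    # order of heading tags in the page
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name") in ("author", "keywords"):
            self.descriptive[attributes["name"]] = attributes.get("content", "")
        elif tag in ("h1", "h2", "h3"):
            self.structural.append(tag)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.descriptive["title"] = self.descriptive.get("title", "") + data


# Hypothetical page used only for this example.
page = """<html><head><title>Big data notes</title>
<meta name="author" content="A. Student">
<meta name="keywords" content="big data, metadata">
</head><body><h1>Intro</h1><h2>Types</h2></body></html>"""

extractor = MetadataExtractor()
extractor.feed(page)

# Hypothetical administrative metadata, supplied separately.
administrative = {"created": "2024-01-01", "access": "public"}

print(extractor.descriptive)
print(extractor.structural)
print(administrative)
```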
Effect on web architecture
- Improved search and retrieval, as search engines use metadata to index, organise and rank content
- Cloud-based architectures store metadata for large-scale databases (AWS, Google Cloud)
- Metadata storage needs to be designed with scalability in mind
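The search and retrieval point can be illustrated with a small tag index: instead of scanning every document, the application looks up a tag and gets the matching document IDs directly. The `documents` data and the function names here are assumptions for the example, not a real search engine's design.

```python
from collections import defaultdict

# Hypothetical documents with descriptive metadata.
documents = {
    1: {"title": "Intro to big data", "tags": ["big data", "volume"]},
    2: {"title": "Streaming basics", "tags": ["streaming", "cdn"]},
    3: {"title": "Big data pipelines", "tags": ["big data", "streaming"]},
}


def build_tag_index(docs):
    """Map each tag to the set of document IDs that carry it."""
    index = defaultdict(set)
    for doc_id, metadata in docs.items():
        for tag in metadata["tags"]:
            index[tag.lower()].add(doc_id)
    return index


def search(index, *tags):
    """Return IDs of documents that have every requested tag."""
    matches = [index.get(tag.lower(), set()) for tag in tags]
    return sorted(set.intersection(*matches)) if matches else []


index = build_tag_index(documents)
print(search(index, "big data"))               # [1, 3]
print(search(index, "big data", "streaming"))  # [3]
```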
Streaming service management
Streaming services deliver large amounts of continuous data over the internet in real time.
- Live streaming, on-demand, data streaming
- Uses CDNs to distribute content efficiently
- Requires load balancing to handle millions of concurrent users
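A simple way to picture load balancing is a "least connections" strategy, where each new viewer is sent to the server that currently has the fewest active streams. This is a minimal sketch; the class and the server names are hypothetical, and real load balancers also account for server health, location and capacity.

```python
class LeastConnectionsBalancer:
    """Send each new connection to the server with the fewest active ones."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Pick a server for a new viewer and record the connection."""
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Record that a viewer on this server has disconnected."""
        self.active[server] -= 1


balancer = LeastConnectionsBalancer(["edge-1", "edge-2", "edge-3"])
first = balancer.acquire()
second = balancer.acquire()
third = balancer.acquire()
balancer.release(second)      # a viewer leaves the second server
fourth = balancer.acquire()   # so the next viewer is sent there
print(first, second, third, fourth)
```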
Security concerns
Cyberattacks such as denial of service (DoS) can target streaming services, overwhelming servers with traffic and causing service outages.
Challenges
- High bandwidth usage and latency must be managed to provide smooth playback without buffering
- Scalability to handle millions of users simultaneously
Effect on web architecture
- Adaptive Bitrate Streaming (ABR) adjusts the video quality based on internet speed
- Data compression reduces file sizes (e.g. the H.264 video compression standard)
- Traffic needs to be distributed efficiently to prevent overloading servers
- Rely on caching, buffering and CDNs
  - Caching: storing frequently accessed data temporarily
  - Buffering: a portion of data is preloaded before playback
  - CDNs: store copies of content closer to users to reduce latency
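Adaptive bitrate streaming from the list above can be sketched as a simple rule: choose the highest quality whose bitrate fits within the viewer's measured connection speed, leaving some headroom. The bitrate ladder, the 0.8 safety factor and the `choose_rendition` function are assumptions for illustration; real players use more sophisticated logic.

```python
# Hypothetical bitrate ladder in kilobits per second.
RENDITIONS_KBPS = {"240p": 400, "480p": 1000, "720p": 2500, "1080p": 5000}


def choose_rendition(throughput_kbps, safety=0.8):
    """Pick the best quality that fits within a fraction of measured throughput."""
    budget = throughput_kbps * safety
    affordable = [
        (bitrate, name)
        for name, bitrate in RENDITIONS_KBPS.items()
        if bitrate <= budget
    ]
    if not affordable:
        # Connection is too slow for any option, so fall back to the lowest quality.
        return min(RENDITIONS_KBPS, key=RENDITIONS_KBPS.get)
    return max(affordable)[1]


for speed in [300, 3500, 10000]:
    print(speed, "kbps ->", choose_rendition(speed))
```

With this ladder, a 3500 kbps connection is given 720p and a 300 kbps connection falls back to 240p.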
Data compression and encoding
- Reduce the size of files without sacrificing too much quality
- Compression reduces amount of data needed to represent a file
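The effect of compression on size can be seen with Python's built-in `zlib` module. Note that `zlib` is lossless (the original is restored exactly), whereas video codecs such as H.264 are lossy and trade a little quality for much smaller files. The sample data below is invented and highly repetitive, so it compresses far better than typical real content.

```python
import zlib

# Hypothetical, highly repetitive data standing in for a file.
original = b"frame-data " * 1000

compressed = zlib.compress(original, level=9)
restored = zlib.decompress(compressed)

print("original bytes:  ", len(original))
print("compressed bytes:", len(compressed))
print("restored exactly:", restored == original)
```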
Real-time data streaming involves processing and analysing data as it is generated, without storing it first.
- Often associated with high velocity data
- Example: financial trading systems, where stock prices and trading information need to be processed instantly
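The idea of processing data as it arrives can be sketched with a generator that updates a moving average every time a new price comes in, keeping only the last few values in memory. The price values and the `moving_average` function are hypothetical; a real trading system would read from a live feed.

```python
from collections import deque


def moving_average(ticks, window=3):
    """Yield an updated average as each new price arrives."""
    recent = deque(maxlen=window)   # only the latest `window` prices are kept
    for price in ticks:
        recent.append(price)
        yield sum(recent) / len(recent)


# Hypothetical price feed; any iterable or generator would work here.
prices = [10, 12, 14, 16]
for average in moving_average(prices):
    print(average)
```

Because results are produced one at a time, the program never needs the full dataset to be stored before analysis begins.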