There exists a peculiar amnesia in software engineering regarding XML. Mention it in most circles and you will receive knowing smiles, dismissive waves, the sort of patronizing acknowledgment reserved for technologies deemed passé. “Oh, XML,” they say, as if the very syllables carry the weight of obsolescence. “We use JSON now. Much cleaner.”

  • AnitaAmandaHuginskis@lemmy.world
    3 upvotes · 45 minutes ago (edited)

    I love XML, when it is properly utilized. Which, in most cases, it is not, unfortunately.

    JSON > CSV though, I fucking hate CSV. I do not get the appeal. “It’s easy to handle” – NO, it is not. It’s the “fuck whoever needs to handle this” of file “formats”.

    JSON is a reasonable middle ground, I’ll give you that

  • thingsiplay@lemmy.ml
    4 upvotes · 2 hours ago

    JSON is easier to parse, smaller and lighter on resources, and that is important on the web. Once you take into account all the features XML has, plus the entities, it gets big, slow and complicated. Most data does not need to be a self-descriptive document when it is transferred over the web. Fundamentally, these are two different kinds of language: XML is a general markup language for writing documents, while JSON is a generalized data structure with support for various data types that programming languages support.

  • calliope@retrolemmy.com
    13 upvotes · 4 hours ago (edited)

    “There exists a peculiar amnesia in software engineering regarding XML”

    That’s for sure. But not in the way the author means.

    There exists a pattern in software development where people who weren’t around when the debate was actually happening write another theory-based article rehashing old debates like they’re saying something new. Every ten years or so!

    The amnesia is coming from inside the article.

    “[XML] was abandoned because JavaScript won. The browser won.”

    This comes across as remarkably naive to me. JavaScript and the browser didn’t “win” in this case.

    JSON is just vastly simpler to read and reason about for every purpose other than configuration files that are being parsed by someone else. YAML is even more human-readable and easier to parse for most configuration uses… which is why people writing the configuration parser would rather use it than XML.

    Libraries to parse XML were/are extremely complex, by definition. Schemas work great as long as you’re not constantly changing them! Which, unfortunately, happens a lot in projects that are earlier in development.

    Switching to JSON for data reduced frustration during development by a massive amount. Since most development isn’t building on defined schemas, the supposed massive benefits of XML were nonexistent in practice.

    Even for configuration, the amount of “boilerplate” in XML is atrocious and there are (slightly) better things to use. Twenty years ago everyone used XML for configuration in Java, which was one of the popular backend languages (this author foolishly complains about Java too). I still dread the massive XML configuration files of past Java. YAML is confusing in other ways, but XML is awful to work on and parse with any regularity.

    I used XML extensively back when everyone writing asynchronous web requests was debating between using the two (in “AJAX”, the X stands for XML).

    Once people started using JSON for data, they never went back to XML.

    Syntax highlighting only works in your editor, and even then it doesn’t help that much if you have a lot of data (like configuration files for large applications). Browsers could even display JSON with syntax highlighting, for obvious reasons: JSON is vastly simpler and easier to parse.

    • Kissaki@programming.dev (OP)
      6 upvotes · 3 hours ago (edited)

      Making XML schemas work was often a hassle. You have a schema ID that looks like a URL, and sometimes you can actually open or load the schema through it. Other times, it serves only as an identifier and your tooling/IDE must support mappings from the ID to a local XSD file that you configure.

      Every time it didn’t immediately work, you’d think: Man, why don’t they publish the schema under that public URL?

      • calliope@retrolemmy.com
        2 upvotes · 3 hours ago (edited)

        This seriously sounds like a nightmare.

        It’s giving me Eclipse IDE flashbacks where it seemed so complicated to configure I just hoped it didn’t break. There were a lot of those, actually.

  • TunaLobster@lemmy.world
    2 upvotes · 3 hours ago

    IMO, the best thing about YAML is the referencing. It’s super easy to reuse an object multiple times. It gives you the same kind of parent-child struct ability that programming languages have. Sure, XML can do it, but it’s not in every parser (cough, Python’s built-in parser, cough). But then YAML doesn’t come with a built-in parser either, and doing DOM in things other than XML feels odd.

    • Feyd@programming.dev
      1 upvote · 25 minutes ago

      That capability is what enables billion laughs attacks, unfortunately, so it is wise not to have it enabled wherever external input is possible.
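To make the expansion risk concrete, here is a tiny and deliberately harmless sketch in Python (picked only for illustration; nobody in the thread names a language for this). It assumes the standard-library `xml.etree.ElementTree` parser, which expands entities declared in a document's internal DTD. The entity names `a`, `b` and `c` are made up for the example. Each level references the previous one three times, so the text grows by a factor of three per level; a real attack simply adds more levels.

```python
import xml.etree.ElementTree as ET

# Three nested entity levels: a -> "ha", b -> 3 x a, c -> 3 x b.
# Entity names are hypothetical; the document is intentionally tiny.
DOC = """<!DOCTYPE lol [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;">
  <!ENTITY c "&b;&b;&b;">
]>
<lol>&c;</lol>"""

root = ET.fromstring(DOC)
expanded = root.text  # the parser has already substituted every entity

print(f"{len(DOC)} characters of input, {len(expanded)} characters of text")
```

With only three levels the output is still smaller than the input, but every extra level multiplies it again, which is why parsers that read untrusted input usually disable or cap this feature.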

  • Ephera@lemmy.ml
    13 upvotes · 5 hours ago

    IMHO one of the fundamental problems with XML for data serialization is illustrated in the article:

    (person (name "Alice") (age 30))
    [is serialized as]

    <person>
      <name>Alice</name>
      <age>30</age>
    </person>
    

    Or with attributes:
    <person name="Alice" age="30" />

    The same data can be portrayed in two different ways. Whenever you serialize or deserialize data, you need to decide whether to read/write values from/to child nodes or attributes.

    That’s because XML is a markup language. It’s great for typing up documents, e.g. to describe a user interface. It was not designed for taking programmatic data and serializing that out.
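A small sketch of what that decision looks like in code, using Python's standard-library `xml.etree.ElementTree` (the language choice and the helper name `read_person` are assumptions for illustration, not something from the article). A reader that wants to accept both layouts has to check both places for every field:

```python
import xml.etree.ElementTree as ET

as_elements = "<person><name>Alice</name><age>30</age></person>"
as_attributes = '<person name="Alice" age="30" />'


def read_person(xml_text):
    """Hypothetical helper: accept either layout, checking attributes first."""
    node = ET.fromstring(xml_text)
    # node.get() reads an attribute; node.findtext() reads a child element's text.
    name = node.get("name") or node.findtext("name")
    age = node.get("age") or node.findtext("age")
    return {"name": name, "age": int(age)}


print(read_person(as_elements))
print(read_person(as_attributes))
```

Both calls produce the same dictionary, but only because the reader was written to tolerate both shapes. A stricter reader would pick one and reject the other.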

    • atzanteol@sh.itjust.works
      3 upvotes · 4 hours ago

      This is your confusion, not an issue with XML.

      Attributes tend to be “metadata”. You ever write HTML? It’s not confusing.

      • Feyd@programming.dev
        6 upvotes · 3 hours ago (edited)

        In HTML, which things are attributes and which things are tags are part of the spec. With XML that is being used for something arbitrary, someone is making the choice every time. They might have a different opinion than you do, or even the same opinion, but make different judgments on occasion. In JSON, there are fewer choices, so fewer chances for people to be surprised by other people’s choices.

        • atzanteol@sh.itjust.works
          1 upvote · 52 minutes ago (edited)

          I mean, yeah. But people don’t just do things randomly. Most people put data in the body and metadata in attributes, just like in HTML.

    • Kissaki@programming.dev (OP)
      2 upvotes · 4 hours ago

      They can be used as alternatives. In MSBuild you can use attributes and sub-elements interchangeably, which gives you a choice of preference when you are writing it. I typically prefer attributes for conciseness (vertical density), but switch to sub-elements once the length/number becomes a (significant) downside.

      Of course that’s more of a human writing view. Your point about ambiguity in de-/serialization still stands at least until the interface defines expectation or behavior as a general mechanism one way or the other, or with specific schema.

    • aivoton@sopuli.xyz
      4 upvotes · 5 hours ago (edited)

      “The same data can be portrayed in two different ways.”

      And why is that an issue? The specification decides which one you use and what you need. Some things you treat as attributes, and some things as child elements.

      JSON doesn’t even have attributes.

      • Ephera@lemmy.ml
        7 upvotes · 4 hours ago

        Alright, I haven’t really looked into XML specifications so far. But I also have to say that needing a specification to consistently serialize and deserialize data isn’t great either.

        And yes, JSON not having attributes is what I’m saying is a good thing, at least for most data serialization use-cases, since programming languages do not typically have such attributes on their data type fields either.

        • aivoton@sopuli.xyz
          1 upvote · 2 hours ago

          I worded my answer a bit wrongly.

          In XML <person><name>Alice</name><age>30</age></person> is different from <person name="Alice" age="30" /> and they will never (de)serialize to each other. The original example by the article’s author with the person is somewhat misguided.

          They do contain the same bits of data, but represent different things, and when designing your DTD / XSD you have to decide when to use attributes and when to use child elements.

    • Feyd@programming.dev
      3 upvotes · 5 hours ago

      JSON also has arrays. In XML the practice to approximate arrays is to put the index as an attribute. It’s incredibly gross.

      • Kissaki@programming.dev (OP)
        2 upvotes · 4 hours ago

        “In XML the practice to approximate arrays is to put the index as an attribute. It’s incredibly gross.”

        I don’t think I’ve seen that much if ever.

        Typically, XML repeats tag names. Repeating keys are not possible in JSON, but are possible in XML.

        <items>
          <item></item>
          <item></item>
          <item></item>
        </items>
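As a rough illustration of the repeated-tag pattern, here is a sketch in Python with the standard-library `xml.etree.ElementTree` (the language and the sample values `a`, `b`, `c` are assumptions for the example). With this particular parser, `findall` returns the matching children in the order they appear in the document, so the repeated elements map directly onto a list:

```python
import xml.etree.ElementTree as ET

# Sample values are made up; the structure mirrors the <items>/<item> snippet.
doc = "<items><item>a</item><item>b</item><item>c</item></items>"

root = ET.fromstring(doc)

# findall() yields matching child elements in document order.
values = [item.text for item in root.findall("item")]

print(values)
```

Other parsers and data-binding layers may expose the children differently, so this only shows how one common library behaves.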
        
        • Feyd@programming.dev
            4 upvotes · 3 hours ago (edited)

          That’s correct, but the order of tags in XML is not meaningful, and if you parse then write that, it can change order according to the spec. Hence, what you put would be something like the following if it was intended to represent an array.

          <items>
            <item index="1"></item>
            <item index="2"></item>
            <item index="3"></item>
          </items>
          
  • Feyd@programming.dev
    8 upvotes · 5 hours ago (edited)

    Honestly, anyone pining for all the features of XML probably didn’t live through the time when XML was used for everything. It was actually a fucking nightmare to account for the existence of all those features because the fact they existed meant someone could use them and feed them into your system. They were also the source of a lot of security flaws.

    This article looks like it was written by someone who wasn’t there, and they’re calling the people telling them the truth liars because they think features they found on W3Schools look cool.

  • epyon22@sh.itjust.works
    16 upvotes · 7 hours ago

    The fact that JSON serializes easily to basic data structures simplifies code so much. Most use cases don’t need fully semantic data storage, and for much of it you have to write the same amount of documentation about the data structures anyway. I’ll give XML one thing, though: schemas are nice and easy there, while they have a high barrier to entry in JSON.
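A minimal sketch of that mapping, using Python's standard-library `json` module (the language and the sample document are assumptions for illustration). Parsing yields plain dictionaries, lists, strings and numbers, with no intermediate node tree to walk:

```python
import json

# Sample document made up for the example.
text = '{"name": "Alice", "age": 30, "tags": ["admin", "dev"]}'

person = json.loads(text)        # JSON object -> dict, JSON array -> list
print(type(person).__name__, type(person["tags"]).__name__)

# Serializing goes straight back from the same basic structures.
round_trip = json.loads(json.dumps(person))
print(round_trip == person)
```

The flip side, raised elsewhere in the thread, is that nothing here checks that `age` is a number or that `tags` exists; that is the job a schema would do.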

    • Kissaki@programming.dev (OP)
      5 upvotes · 7 hours ago

      “Most use cases don’t need fully semantic data storage”

      If both sides have a shared data model it’s a good base model without further needs. Anything else quickly becomes complicated because of the dynamic nature of JSON - at least if you want a robust or well-documented solution.

      • lad@programming.dev
        1 upvote · 1 hour ago

        Yeah, when the same API endpoint sometimes returns a string for an error, sometimes an object, and sometimes an array, JSON doesn’t help much in parsing the mess.
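A sketch of the defensive code that situation tends to force, in Python with the standard-library `json` module. Everything specific here is hypothetical: the `error` and `message` field names, the helper names, and the three response shapes are invented to mirror the complaint, not taken from any real API.

```python
import json


def _message(item):
    """Turn one error item (string, object or anything else) into a string."""
    if isinstance(item, str):
        return item
    if isinstance(item, dict):
        # 'message' is an assumed field name; fall back to the whole object.
        return str(item.get("message", item))
    return str(item)


def normalize_errors(body):
    """Hypothetical helper: coerce a variably shaped 'error' field into a list of strings."""
    err = json.loads(body).get("error")  # assumes the top level is a JSON object
    if err is None:
        return []
    if isinstance(err, list):
        return [_message(e) for e in err]
    return [_message(err)]


print(normalize_errors('{"error": "boom"}'))
print(normalize_errors('{"error": {"message": "boom"}}'))
print(normalize_errors('{"error": ["a", {"message": "b"}]}'))
```

The parser accepts all three shapes without complaint, so the burden of sorting them out lands on every consumer unless both sides agree on a single shape.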

  • A_norny_mousse@feddit.org
    5 upvotes · 5 hours ago (edited)

    I never understood why people would say JSON is superior, and why XML seemed to be getting rarer, but the author explains it:

    “XML was not abandoned because it was inadequate; it was abandoned because JavaScript won.”

    I’ve been using it ever since I started using Linux because my favorite window manager uses it, and because of a long-running pet project that is almost just as old: first I used XML tools to parse web pages, later I switched to dedicated data providers that offered both XML and JSON formats, and stuck to what I knew.

    I’m guessing that another reason devs - especially web devs - prefer JSON over XML is that the latter uses more bytes to transport the same amount of raw data. One XML file will be somewhat larger than one JSON file with the same content. That advantage is of course dwarfed by all the other media and helper scripts (nay, frameworks) that devs use to develop websites.

    BTW, XML is very readable with syntax highlighting and easily editable if your code editor has some very basic completion for it. And it has comments!

    • Kissaki@programming.dev (OP)
      3 upvotes · 6 hours ago

      The readability and obviousness of XML cannot be overstated. JSON is simple and dense (within the limit of text). But look at JSON alone, and all you can do is hope for named fields. Outside of that, you depend on context knowledge and specific structure and naming context.

      Whenever I start editing JSON config files I have to be careful about trailing commas, structure with opening and closing braces and brackets, placement and field naming. The best you can do is offer a default-filled config file that already has the full structure.

      While XML does not solve all of it, it certainly is more descriptive and more structured, easing many of those pain points.
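The trailing-comma pitfall is easy to reproduce. A small sketch with Python's standard-library `json` module (the language, the sample config keys and the helper name `try_parse` are assumptions for illustration):

```python
import json

good = '{"name": "demo", "retries": 3}'
bad = '{"name": "demo", "retries": 3,}'  # same config with one trailing comma


def try_parse(text):
    """Hypothetical helper: return (data, None) on success or (None, error message)."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc.msg


print(try_parse(good))
print(try_parse(bad))
```

The strict parser rejects the second document outright. The exact wording of the error message differs between Python versions, so the example does not rely on it.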


      It’s interesting that web tech had XML in the early stages of AJAX, the dynamic web. But in the end, we sent JSON through XMLHttpRequest. JSON won.

  • arjen@piefed.social
    1 upvote · 4 hours ago

    Preaching to the choir I like to sing in.

    I didn’t know the link to S-Expressions, ty.

  • lehenry@lemmy.world
    7 upvotes · 7 hours ago

    While I understand the criticism of XPath and XSL, the fact that we have proper tools to query and transform XML, instead of the messy way of getting specific information out of JSON, is also one of the strong points of XML.
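For a taste of what such a query looks like, here is a sketch in Python using the standard-library `xml.etree.ElementTree` (language choice and the sample `library`/`book` document are assumptions for illustration). Note that ElementTree supports only a limited subset of XPath; full XPath and XSLT need a separate library.

```python
import xml.etree.ElementTree as ET

# Sample data made up for the example.
doc = """<library>
  <book year="1999"><title>A</title></book>
  <book year="2005"><title>B</title></book>
  <book year="2005"><title>C</title></book>
</library>"""

root = ET.fromstring(doc)

# One path expression: titles of all books whose year attribute is 2005.
titles = [t.text for t in root.findall("./book[@year='2005']/title")]

print(titles)
```

The filter and the traversal live in a single declarative expression, where the JSON equivalent would typically be a hand-written loop or comprehension over nested dictionaries and lists.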

  • Auster@thebrainbin.org
    2 upvotes · 7 hours ago

    Skimming through the post, the code snippet about halfway through caught my attention. It’s been a while since I studied site development, but that snippet looks awfully like HTML. Are HTML and XML related?

    • atzanteol@sh.itjust.works
      4 upvotes · 3 hours ago (edited)

      They’re siblings. They both derive from SGML. There is a version of HTML that is also XML conformant called XHTML but it never caught on…

    • A_norny_mousse@feddit.org
      7 upvotes · 7 hours ago

      Yes. Arguably, HTML is a form of XML. Also, the ML means the same in both. XML tools can often also be used to query HTML documents.

    • Kissaki@programming.dev (OP)
      5 upvotes · 7 hours ago (edited)

      There was a time when HTML moved towards a more formalized, XML-valid definition named XHTML. Ultimately, the web’s need for backwards compatibility and its messy, forgiving nature led to us giving up on that. Now we have the HTML living standard with rules, but browsers are very forgiving in their interpretation (I’m not sure to what degree that is standardized).

      While HTML, prior to HTML5, was defined as an application of Standard Generalized Markup Language (SGML), a flexible markup language framework, XHTML is an application of XML, a more restrictive subset of SGML. XHTML documents are well-formed and may therefore be parsed using standard XML parsers, unlike HTML, which requires a lenient, HTML-specific parser.[1]

      XHTML 1.0 became a World Wide Web Consortium (W3C) recommendation on 26 January 2000. XHTML 1.1 became a W3C recommendation on 31 May 2001. XHTML is now referred to as “the XML syntax for HTML”[2][3] and being developed as an XML adaptation of the HTML living standard.[4][5]
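The difference between the two syntaxes shows up as soon as a strict XML parser is involved. A small sketch in Python with the standard-library `xml.etree.ElementTree` (the language, the sample markup and the helper name `is_well_formed` are assumptions for illustration):

```python
import xml.etree.ElementTree as ET

xhtml = "<p>line one<br />line two</p>"  # every element is closed
html = "<p>line one<br>line two</p>"     # fine for browsers, not well-formed XML


def is_well_formed(markup):
    """Hypothetical helper: True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(xhtml))
print(is_well_formed(html))
```

The unclosed `<br>` is enough to make the XML parser give up, while a browser's HTML parser renders both snippets the same way.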

      • MonkderVierte@lemmy.zip
        2 upvotes · 6 hours ago (edited)

        But nobody uses it anymore; people use a JS framework on a <div> page instead. Which only three billion-dollar engines in the world can render.