Productive Rage

Dan's techie ramblings

Writing React components in TypeScript

Whoops.. I started writing this post a while ago and have only just got round to finishing it off. Now I realise that it applies to React 0.10 but the code here doesn't work in the now-current 0.11. On top of this, hopefully this will become unnecessary when 0.12 is released. I talk about this in the last part of the post. But until 0.12 is out (and confirmed to address the problem), I'm going to stick to 0.10 and use the solution that I talk about here.

Update (29th January 2015): React 0.13 beta has been released and none of this workaround is required any more - I've written about it here: TypeScript / ES6 classes for React components - without the hacks!

I've been playing around with React recently, putting together some prototypes to try to identify any pitfalls in what I think is an excellent idea and framework, with a view to convincing everyone else at work that we should consider it for new products. I'm no JavaScript hater but I do strongly believe in strongly typed code being easier to maintain in the long run for projects of significant size. Let's not get into an argument about whether strong or "weak" typing is best - before we know it we could end up worrying about what strongly typed even means! If you don't agree with me then you probably don't see any merit to TypeScript and you probably already guessed that this post will not be of interest to you! :)

So I wanted to try bringing together the benefits of React with the benefits of TypeScript.. I'm clearly not the only one since there is already a type definition available in NuGet: React.TypeScript.DefinitelyTyped. This seems to be the recommended definition and appears to be in active development. I'd love it even more if there was an official definition from Facebook themselves (they have one for their immutable-js library) but having one here is a great start. This allows us to call methods in the React library and know what types the arguments should be and what they will return (and the compiler will tell us if we break these contracts by passing the wrong types or trying to mistreat the return values).

However, there are a few problems. Allow me to venture briefly back to square one..

Back to basics: A React component

This is a very simple component in React -

var MyButton = React.createClass({
  _clickHandler: function() {
    alert('Clicked MyButton with message "' + this.props.message + '"');
  },
  render: function() {
    return <button onClick={this._clickHandler}>{this.props.message}</button>;
  }
});

It's pretty boring but it illustrates a few principles. Firstly, it's written in "jsx" - a format like JavaScript but that needs some processing to actually become JavaScript. The <button> declaration looks like html, for example, and needs altering to become real JavaScript. If we're going to write components in TypeScript then we can't use this format since Visual Studio doesn't understand it (granted I'm making a bit of a leap assuming that you're using Visual Studio for this - it's not necessary, but I suspect most people writing TypeScript will use it since the TypeScript support is so good).

The good news is that the translation from "jsx" to JavaScript is not a complex one*. It results in slightly longer code but it's still easily readable (and writable). So the above would be, written in native JavaScript -

var MyButton = React.createClass({
  _clickHandler: function() {
    alert('Clicked MyButton with message "' + this.props.message + '"');
  },
  render: function() {
    return React.DOM.button({ onClick: this._clickHandler }, this.props.message);
  }
});

* (It can do other clever stuff like translate "fat arrow" functions into JavaScript that is compatible with older browsers, but let's not get bogged down with that here - since I want to use TypeScript rather than jsx, it's not that relevant right now).

This simple example is illustrating something useful that can be taken for granted since React 0.4; autobinding. When "_clickHandler" is called, the "this" reference is bound to the component instance, so "this.props.message" is accessible. Before 0.4, you had to use the "React.autoBind" method - eg.

var MyButton = React.createClass({
  _clickHandler: React.autoBind(function() {
    alert('Clicked MyButton with message "' + this.props.message + '"');
  }),
  render: function() {
    return React.DOM.button({ onClick: this._clickHandler }, this.props.message);
  }
});

but these days it just works as you would expect (or as you would hope, perhaps). This happened back in July 2013 - see New in React v0.4: Autobind by Default.

A TypeScript React component: Take 1

If we naively try to write TypeScript code that starts off with the JavaScript above then we find we have no intellisense. The editor has no idea about "this.props" - no idea that it is defined, certainly no idea that it has a property "message" that should be a string. This shouldn't really be a surprise since the "this" in this case is just an anonymous object that we're passing to "React.createClass"; no information about the type has been specified, so it is considered to be of type "any".

TypeScript React Component 'this' issue

If we continue like this then we're going to miss out on the prime driver for using TypeScript in the first place - we might as well just write the components in JavaScript or "jsx"! (In fairness, this is something that I considered.. with React, and particularly the recommended Flux architecture, the "view components" are a relatively thin layer over components that could easily be written in TypeScript and so benefit from being strongly typed.. the view components could remain "more dynamic" and be covered by the class of unit tests that are often used to cover cases that are impossible with the guarantees of strong typing).

The obvious thing to try was to have a TypeScript class along the lines of

class MyButton {
  props: { message: string };
  private _clickHandler() {
    alert('Clicked MyButton with message "' + this.props.message + '"');
  }
  public render() {
    return React.DOM.button({ onClick: this._clickHandler }, this.props.message);
  }
}

var MyButtonReactComponent = React.createClass(new MyButton());

This would solve the internal type specification issue (where "this" is "any"). However, when the "React.createClass" function is called at runtime, an error is thrown..

Error: Invariant Violation: createClass(...): Class specification must implement a render method.

I'm not completely sure, but I suspect that the React framework code is expecting an object with a property that is a function named "render" while the class instance passed to it has a function "render" on its prototype rather than a property on the reference itself.

Looking for help elsewhere

When I got to this point, I figured that someone else must have had encountered the same problem - particularly since there exists this TypeScript definition for React in the first place! I came across a GitHub project React TypeScript which describes itself as a

React wrapper to make it play nicely with typescript.

An example in the README shows

import React = require('react');
import ReactTypescript = require('react-typescript');

class HelloMessage extends ReactTypescript.ReactComponentBase<{ name: string; }, {}> {
  render() {
    return React.DOM.div(null, 'Hello ' + this.props.name);
  }
}

React.renderComponent(new HelloMessage({ name: 'Jhon' }), mountNode);

which looks like exactly what I want!

The problems are that it clearly states..

warning: ReactTypescript can actually only be used with commonjs modules and browserify, if someone does want AMD I'll gladly accept any PR that would packages it for another format.

.. and I'm very interesting in using AMD and require.js to load modules "on demand" (so that if I develop a large app then I have a way to prevent the "megabyte-plus initial JavaScript download").

Also, I'm concerned that the maintained TypeScript definition that I referenced earlier claims to be

Based on TodoMVC sample by @fdecampredon, improved by @wizzard0, MIT licensed.

fdecampredon is the author of this "React TypeScript" repo.. which hasn't been updated in seven months. So I'm concerned that the definitions might not be getting updated here - there are already a lot of differences between the react.d.ts in this project and that in the maintained NuGet package's react.d.ts.

In addition to this, the README states that

In react, methods are automatically bound to a component, this is not the case when using ReactTypeScript, to activate this behaviour you can use the autoBindMethods function of ReactTypeScript

This refers to what I talked about earlier; the "auto-binding" convenience to make writing components more natural. There are two examples of ways around this. You can use the ReactTypeScript library's "autoBindMethods" function -

class MyButton extends ReactTypeScript.ReactComponentBase<{ message: string}, any> {
  clickHandler(event: React.MouseEvent) {
    alert(this.props.message);
  }
  render() {
    return React.DOM.button({ onClick: this.clickHandler }, 'Click Me');
  }
}

// If this isn't called then "this.props.message" will error in clickHandler as "this" is not
// bound to the instance of the class
ReactTypeScript.autoBindMethods(MyButton);

or you can use the TypeScript "fat arrow" to bind the function to the "this" reference that you would expect:

class MyButton extends  ReactTypeScript.ReactComponentBase<{ message: string}, any> {
  // If the fat arrow isn't used for the clickHandler definition then "this.props.message" will
  // error in clickHandler as "this" is not bound to the instance of the class
  clickHandler = (event: React.MouseEvent) => {
    alert(this.props.message);
  }
  render() {
    return React.DOM.button({ onClick: this.clickHandler }, 'Click Me');
  }
}

The first approach feels a bit clumsy, that you must always remember to call this method for all component classes. The second approach doesn't feel too bad, it's just a case of being vigilant and always using fat arrows - but if you forget, you won't find out until runtime. Considering that I want to use to TypeScript to catch more errors at compile time, this still doesn't feel ideal.

The final concern I have is that the library includes a large-ish react-internal.js file. What I'm going to suggest further down does unfortunately dip its toe into React's (undocumented) internals but I've tried to keep it to the bare minimum. This "react-internal.js" worries me as it might be relying on a range of implementation details, any of which (as far as I know) could potentially change and break my code.

In case I'm sounding down on this library, I don't mean to be - I've tried it out and it does actually work, and there are not a lot of successful alternatives out there. So I've got plenty of respect for this guy, getting his hands dirty and inspiring me to follow in his footsteps!

Stealing Taking inspiration - A TypeScript React component: Take 2

So I want a way to

  1. Write a TypeScript class that can be used as a React component
  2. Use the seemingly-maintained NuGet-delivered type definition and limit access to the "internals" as much as possible
  3. Have the component's methods always be auto-bound

I'd better say this up-front, though: I'm willing to sacrifice the support for mixins here.

fdecampredon's "React TypeScript" library does support mixins so it's technically possible but I'm not convinced at this time that they're worth the complexity required by the implementation since I don't think they fit well with the model of a TypeScript component.

The basic premise is that you can name mixin objects which are "merged into" the component code, adding properties such as functions that may be called by the component's code. Since TypeScript wouldn't be aware of the properties added by mixins, it would think that there were missing methods / properties and flag them as errors if they were used within the component.

On top of this, I've not been convinced by the use cases for mixins that I've seen so far. In the official React docs section about mixins, it uses the example of a timer that is automatically cleared when the component is unmounted. There's a question on Stack Overflow "Using mixins vs components for code reuse in Facebook React" where the top answer talks about using mixins to perform common form validation work to display errors or enable or disable inputs by directly altering internal state on the component. As I understand the Flux architecture, the one-way message passing should result in validation being done in the store rather than the view / component. This allows the validation to exist in a central (and easily-testable) location and to not exist in the components. This also goes for the timer example, the logic-handling around whatever events are being raised on a timer should not exist within the components.

What I have ended up with is this:

import React = require('react');

// The props and state references may be passed in through the constructor
export class ReactComponentBase<P, S> {
  constructor(props?: P, state?: S) {

    // Auto-bind methods on the derived type so that the "this" reference is as expected
    // - Only do this the first time an instance of the derived class is created
    var autoBoundTypeScriptMethodsFlagName = '__autoBoundTypeScriptMethods';
    var autoBindMapPropertyName = '__reactAutoBindMap'; // This is an internal React value
    var cp = this['constructor'].prototype;
    var alreadyBoundTypeScriptMethods = (cp[autoBoundTypeScriptMethodsFlagName] === true)
      && cp.hasOwnProperty(autoBoundTypeScriptMethodsFlagName);
    if (!alreadyBoundTypeScriptMethods) {
      var autoBindMap = {};
      var parentAutoBindMap = cp[autoBindMapPropertyName];
      var functionName;
      if (parentAutoBindMap) {
        // Maintain any binding from an inherited class (if the current class being dealt
        // with doesn't directly inherit from ReactComponentBase)
        for (functionName in parentAutoBindMap) {
          autoBindMap[functionName] = parentAutoBindMap[functionName];
        }
      }
      for (functionName in cp) {
        if (!cp.hasOwnProperty(functionName) || (functionName === "constructor")) {
          continue;
        }
        var fnc = cp[functionName];
        if (typeof (fnc) !== 'function') {
          continue;
        }
        autoBindMap[functionName] = fnc;
      }
      cp[autoBindMapPropertyName] = autoBindMap;
      cp[autoBoundTypeScriptMethodsFlagName] = true;
    }

    this['construct'].apply(this, arguments); // This is an internal React method
  }

  props: P;
  state: S;
}
ReactComponentBase.prototype = React.createClass({
  // The component must share the "componentConstructor" that is present on the prototype of
  // the return values from React.createClass
  render: function () {
    return null;
  }
})['componentConstructor'].prototype; // Also an internal React method

// This must be used to mount component instances to avoid errors due to the type definition
// expecting a React.ReactComponent rather than a ReactComponentBase (though the latter is
// able to masquerade as the former and when the TypeScript compiles down to JavaScript,
// no-one will be any the wiser)
export function renderComponent<P, S>(
    component: ReactComponentBase<P, S>,
    container: Element,
    callback?: () => void) {
  var mountableComponent = <React.ReactComponent<any, any>><any>component;
  React.renderComponent(
    mountableComponent,
    container,
    callback
    );
}

This allows the following component to be written:

import React = require('react');
import ReactComponentBridge = require('components/ReactComponentBridge');

class MyButton extends ReactComponentBridge.ReactComponentBase<{ message: string }, any> {
  myButtonClickHandler(event: React.MouseEvent) {
    alert('Clicked MyButton with message "' + this.props.message + '"');
  }
  render() {
    return React.DOM.button({ onClick: this.myButtonClickHandler }, 'Click Me');
  }
}

export = MyButton;

which may be rendered with:

import ReactComponentBridge = require('components/ReactComponentBridge');
import MyButton = require('components/MyButton');

ReactComponentBridge.renderComponent(
  new MyButton({ message: 'Click Me' }),
  document.getElementById('container')
);

Hurrah! Success! All is well with the world! I've got the benefits of TypeScript and the benefits of React and the Flux architecture (ok, the last one doesn't need any of this or even require React - it could really be used with whatever framework you chose). There's just one thing..

I'm out of date

Like I said at the start of this post, as I got to rounding it out to publish, I realised that I wasn't on the latest version of React (current 0.11.2, while I was still using 0.10) and that this code didn't actually work on that version. Sigh.

However, the good news is that it sounds like 0.12 (still in alpha at the moment) is going to make things a lot easier. The changes in 0.11 appear to be paving the way for 0.12 to shakes things up a bit. Changes are documented at New React Descriptor Factories and JSX which talks about how the problem they're trying to solve with the new code is a

Simpler API for ES6 classes

.. and there is a note in the react-future GitHub repo ("Specs & docs for potential future and experimental React APIs and JavaScript syntax") that

A React component module will no longer export a helper function to create virtual elements. Instead it's the responsibility of the consumer to efficiently create the virtual element.

Languages that compile to JS can choose to implement the element signature in whatever way is idiomatic for that language:

TypeScript implements some ES6 features (such as classes, which are how I want to represent React components) so (hopefully) this means that soon-to-hit versions of React are going to make ES6-classes-for-components much easier (and negate the need for a workaround such as is documented here).

The articles that I've linked to (I'm not quite sure how official that all is, btw!) are talking about a future version since they refer to the method "React.createFactory", which isn't available in 0.11.2. I have cloned the in-progress master repo from github.com/facebook/react and built the 0.12-alpha code* and that does have that method. However, I haven't yet managed to get it working as I was hoping. I only built it a couple of hours ago, though, and I want to get this post rounded out rather than let it drag on any longer! And, I'm sure, when this mechanism for creating React components is made available, I'm sure a lot of information will be released about it!

* (npm is a great tool but it still can't make everything easy.. first I didn't realise that the version of node.js I was using was out of date and it prevented some dependencies from being installed. Then I had to install Python - but 2.7 was required, I found out, after I'd installed 3.4. Then I didn't have Git installed on the computer I was trying to build React from. Then I had to mess about with setting environment variables for the Python and Git locations. But it did work, and when I think about how difficult it would have been without a decent package manager I stop feeling the need to complain about it too much :)

Posted at 23:12

Comments

Implementing F#-inspired "with" updates for immutable classes in C#

I've been prototyping a data service for a product at work that communicates with immutable types and one of the feedback comments was a question as to whether the classes supported a flexible F#-esque "with" method that would allow multiple properties to be changed without the garbage collection churn of creating intermediate references for each individual property (since, of course, the property values aren't actually changed on an instance, a new instance is generated that reflects the requested changes).

To pull an example straight from the excellent F# for fun and profit site:

let p1 = {first="Alice"; last="Jones"}
let p2 = {p1 with last="Smith"}

This creates a new record p2 that takes p1 and changes one of the fields. Multiple fields may be altered in one use "with" statement

let p2 = {p1 with first="John";last="Smith"}

To start with a very simple example in C#, take the following class:

public class RoleDetails
{
  public RoleDetails(string title, DateTime startDate, DateTime? endDateIfAny)
  {
    Title = title;
    StartDate = startDate;
    EndDateIfAny = endDateIfAny;
  }

  public string Title { get; private set; }
  public DateTime StartDate { get; private set; }
  public DateTime? EndDateIfAny { get; private set; }
}

This is a very close parallel to the F# record type since it just assigns read-only properties (they're not strictly read-only since they don't use the "readonly" keyword but they're not externally alterable and are only set once within the class so it's close enough).

If I was writing something like this for real use, I would probably try to make more guarantees.. or at least, document behaviour. Something like:

public class RoleDetails
{
  public RoleDetails(string title, DateTime startDate, DateTime? endDateIfAny)
  {
    if (string.IsNullOrWhiteSpace(title))
      throw new ArgumentException("title");
    if ((endDateIfAny != null) && (endDateIfAny <= startDate))
      throw new ArgumentException("title");

    Title = title.Trim();
    StartDate = startDate;
    EndDateIfAny = endDateIfAny;
  }

  /// <summary>
  /// This will never be null or blank, it will not have any leading or trailing whitespace
  /// </summary>
  public string Title { get; private set; }

  public DateTime StartDate { get; private set; }

  /// <summary>
  /// If non-null, this will greater than the StartDate
  /// </summary>
  public DateTime? EndDateIfAny { get; private set; }
}

As I've said before, this validation and commenting is really a poor substitute for code contracts which would allow for compile time detection of invalid data rather than relying on runtime exceptions (speaking of which, I need to give the .net code contracts solution another go - last time I got stuck in I hit some problems which hopefully they've ironed out by now).

Another variation on the "aggressive validation" illustrated above would be a type that represents a non-blank string to prevent duplicating calls to IsNullOrWhiteSpace and trim. This concept could be taken even further to "strongly type" string values so that a "Title" can not be passed into a function that expects a "Notes" string value, for example. This is far from an original idea but it was something I was experimenting again with recently.

Incidentally, there is talk of a future version of C# getting a record type which would reduce boilerplate code when defining simple immutable types. For example (from the InfoQ article Easier Immutable Objects in C# 6 and VB 12) -

public record class Cartesian(double x: X, double y: Y);

This will define an immutable class with two read-only properties that are set through a constructor call. This future C# specification is also apparently going to allow read-only auto properties - so in my RoleDetails class above instead of "get; private set;" properties, which are externally unalterable but could actually be changed within the instance, the properties could be truly readonly. This is possible currently but it requires a private readonly field and a property with a getter that returns that field's value, which is even more boring boilerplate.

The obvious, verbose and potentially more GC-churny way

To prevent callers from having to call the constructor every time a property needs to be altered, update methods for each "mutable" property may be added (they don't really mutate the values since a new instance is returned rather than the value changed on the current instance). This prevents the caller from having to repeat all of the constructor arguments that are not to be changed whenever one property needs altering. Forcing callers to call constructors in this way is particularly annoying if a constructor argument is added, removed or re-ordered at a later date; this can result in a lot of calling code that needs correcting.

public class RoleDetails
{
  public RoleDetails(string title, DateTime startDate, DateTime? endDateIfAny)
  {
    Title = title;
    StartDate = startDate;
    EndDateIfAny = endDateIfAny;
  }

  public string Title { get; private set; }
  public DateTime StartDate { get; private set; }
  public DateTime? EndDateIfAny { get; private set; }

  public RoleDetails Update(string title)
  {
    return (title == Title)
      ? this
      : new RoleDetails(title, StartDate, EndDateIfAny);
  }
  public RoleDetails UpdateStartDate(DateTime startDate)
  {
    return (startDate == StartDate)
      ? this
      : new RoleDetails(Title, startDate, EndDateIfAny);
  }
  public RoleDetails UpdateEndDateIfAny(DateTime? endDateIfAny)
  {
    return (endDateIfAny == EndDateIfAny)
      ? this
      : new RoleDetails(Title, StartDate, endDateIfAny);
  }
}

To update two properties on a given instance, you would need to call

var updatedRoleDetails = existingRoleDetails
  .UpdateStartDate(new DateTime(2014, 9, 21))
  .UpdateEndDateIfAny(new DateTime(2014, 11, 21));

If either of the new values is the same as the property value that it should be replacing, then no new instance is required for that property update - since the Update*{Whatever}* method will return back the same instance. But if both properties are changed then two new instances are required even though the first, intermediate value is immediately discarded and so is "wasted".

There could be an Update method that takes multiple parameters for the different properties but then you're basically just mirroring the constructor. Or there could be various Update methods that took combinations of properties to try to cover either the most common cases or all combinations of cases, but neither of these are particularly elegant and they would all result in quite a lot of code duplication.

A better way

It struck me that it should be possible to do something with named and optional method arguments (support for which was added to C# when .net 4 came out, if I remember correctly). Something like

public RoleDetails UpdateWith(
  string title = Title,
  DateTime startDate = StartDate,
  DateTime? endDateIfAny = EndDateIfAny)
{
  if ((title == Title) && (startDate == StartDate) && (endDateIfAny == EndDateIfAny))
    return this;
  return new RoleDetails(title, startDate, endDateIfAny);
}

would allow for only a subset of the arguments to be specified and for those that are left unspecified to default to the current property value of the instance. So the earlier update code becomes

var updatedRoleDetails = existingRoleDetails
  .UpdateWith(startDate: new DateTime(2014, 9, 21), endDateIfAny: new DateTime(2014, 11, 21));

However, this won't fly. The compiler gives the errors

Default parameter value for 'title' must be a compile-time constant

Default parameter value for 'startDate' must be a compile-time constant

Default parameter value for 'endDateIfAny' must be a compile-time constant

That's a bummer.

Another thought that briefly crossed my mind was for the default argument values to all be null. This would work if the arguments were all reference types and would result in the method body looking something like

if ((title == null) && (startDate == null) && (endDateIfAny == null))
  return this;
return new RoleDetails(title ?? Title, startDate ?? StartDate, endDateIfAny ?? EndDateIfAny);

But that is too restrictive a constraint since in this case we have a non-reference type argument (startDate) and we also have a reference type for which null is a valid value (endDateIfAny).

So what we really need is a wrapper type around the arguments that indicates when no value has been specified. Since we're being conscious of avoiding GC churn, this should be a struct since structs essentially avoid adding GC pressure since they are always copied when passed around - this means that no struct is referenced by multiple scopes and so they don't have to be traced in the same way as reference types; when the scope that has access to the struct is terminated, the struct can safely be forgotten as well. This is not a particularly precise description of what happens and more details can be found in the MSDN article Choosing Between Class and Struct. Particularly see the paragraph

The first difference between reference types and value types we will consider is that reference types are allocated on the heap and garbage-collected, whereas value types are allocated either on the stack or inline in containing types and deallocated when the stack unwinds or when their containing type gets deallocated. Therefore, allocations and deallocations of value types are in general cheaper than allocations and deallocations of reference types.

The other guidelines in that article around cases where structs may be appropriate (if the type "logically represents a single value", "has an instance size under 16 bytes", "is immutable" and "will not have to be boxed frequently") are followed by this type:

public struct Optional<T>
{
  private T _valueIfSet;
  private bool _valueHasBeenSet;

  public T GetValue(T valueIfNoneSet)
  {
    return _valueHasBeenSet ? _valueIfSet : valueIfNoneSet;
  }

  public bool IndicatesChangeFromValue(T value)
  {
    if (!_valueHasBeenSet)
      return false;

    if ((value == null) && (_valueIfSet == null))
      return false;
    else if ((value == null) || (_valueIfSet == null))
      return true;

    return !value.Equals(_valueIfSet);
  }

  public static implicit operator Optional<T>(T value)
  {
    return new Optional<T>
    {
      _valueIfSet = value,
      _valueHasBeenSet = true
    };
  }
}

This type allows us to write an UpdateWith method

public RoleDetails UpdateWith(
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  if (!title.IndicatesChangeFromValue(Title)
  && !startDate.IndicatesChangeFromValue(StartDate)
  && !endDateIfAny.IndicatesChangeFromValue(EndDateIfAny))
    return this;

  return new RoleDetails(
    title.GetValue(Title),
    startDate.GetValue(StartDate),
    endDateIfAny.GetValue(EndDateIfAny)
  );
}

The Optional type could have exposed properties for has-a-value-been-set and get-value-if-any but since each property comparison (to determine whether a new instance is actually required) would have to follow the pattern if-value-has-been-set-and-if-value-that-has-been-set-does-not-equal-current-value, it made sense to me to hide the properties and to instead expose only the access methods "IndicatesChangeFromValue" and "GetValue". The "IndicatesChangeFromValue" method returns true if the Optional describes a value that is different to that passed in and "GetValue" returns the wrapped value if there is one, and returns the input argument if not. This enables the relatively succinct "UpdateWith" method format shown above.

The other method on the struct is an implicit operator for the wrapped type which makes the "UpdateWith" calling code simpler. Instead of having to do something like

var updatedRoleDetails = existingRoleDetails
  .UpdateWith(startDate = Optional<DateTime>(new DateTime(2014, 9, 21)));

the implicit conversion allows you to write

var updatedRoleDetails = existingRoleDetails
  .UpdateWith(startDate = new DateTime(2014, 9, 21));

because the DateTime will be implicitly converted into an Optional<DateTime>. In fact, I went one step further and made it such that this is the only way to create an Optional that wraps a value. There is no constructor that may be used to initialise an Optional with a value, you must rely upon the implicit conversion. This means that it's very clear that there's only one way to use this type. It also happens to be very similar to the most common way that the Nullable type is used in C# - although that does have a public constructor that accepts the value to wrap, in practice I've only ever seen values cast to Nullable (as opposed to the Nullable constructor being passed the value).

Turning it up to eleven

Now this is all well and good and I think it would be a solid leap forward simply to leave things as they are shown above. Unnecessary GC pressure is avoided since there are no "intermediary" instances when changing properties, while the use of structs means that we're not generating a load of property-update-value references that need to be collected either.

But I just couldn't resist trying to push it a bit further since there's still quite a lot of boring code that needs to be written for every immutable type - the UpdateWith method needs to check all of the properties to ensure that they haven't changed and then it needs to pass values into a constructor. If a class has quite a lot of properties (which is not especially unusual if the types are representing complex data) then this UpdateWith method could grow quite large. Wouldn't it be nice if we could just write something like:

public RoleDetails UpdateWith(
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  return magicUpdater(title, startDate, endDateIfAny);
}

Wouldn't it?? Yes it would.

And we can.. if we dip into some of the .net framework's crazier parts - reflection and stack tracing. With some LINQ expressions thrown in to make it work efficiently when called more than once or twice.

What this "magicUpdater" needs to do is take the names and values of the arguments passed to it and then analyse the target type (RoleDetails in this example) to find the constructor to call that will allow all of these named values to be passed into a new instance, using existing property values on the source instance for any constructor arguments that are not provided by the update arguments. It also needs to do the same work to determine whether the update arguments actually require a new instance to be generated - if only the StartDate is being provided to change but the new value is the same as the current value then no new instance is required, the source instance can be returned directly by the "magicUpdater".

This is handled by two steps. The first based around this line:

var callingMethod = new StackFrame(1).GetMethod();

It returns a MethodBase with metadata about the method that called the "magicUpdater" (the "1" in the call above is how many steps back to go in the call stack). From this the names of the arguments can be extracted and a delegate returned which will take the argument values themselves. So the call would actually look more like (if this "magicUpdater" method return a delegate which then must itself be called):

return magicUpdater()(title, startDate, endDateIfAny);

Before we move on to the second step, there are some important considerations in relation to the use of StackFrame. Firstly, there is some expense to performing analysis like this, as with using reflection - but we'll not worry about that here, some optimisations will be covered later which hopefully mean we can ignore it. What's more important is that analysing the call stack can seem somewhat.. unreliable, in a sense. In the real world, the code that gets executed is not always the code as it appears in the C# source. A release build will apply optimisations that won't be applied to debug builds and when code is manipulated by the JIT compiler more optimisations again may occur - one of the more well-known of which is "method inlining". Method inlining is when the compiler sees a chain of Method1 -> Method2 -> Method3 -> Method4 and observes that Method2 is so small that instead of being a distinct method call (which has a cost, as every method call does - the arguments have to be passed into a new scope and this must be considered by the garbage collector; as a very basic example of one of these costs) the code inside Method2 can be copied inside Method1. This would mean that if Method3 tried to access Method2's metadata through the StackFrame class, it would be unable to - it would be told it was called by Method1!

There's a short but informative article about this by Eric Gunnerson: More on inlining. In a nutshell it says that -

  • Methods that are greater than 32 bytes of IL will not be inlined.
  • Virtual functions are not inlined.
  • Methods that have complex flow control will not be in-lined. Complex flow control is any flow control other than if/then/else; in this case, switch or while.
  • Methods that contain exception-handling blocks are not inlined, though methods that throw exceptions are still candidates for inlining.
  • If any of the method's formal arguments are structs, the method will not be inlined.

This means that we shouldn't have to worry about the UpdateWith method being inlined (since its arguments are all Optional which are structs), but the "magicUpdater" method may be a concern. The way that my library gets around that is that the method "GetGenerator" on the UpdateWithHelper class (it's not really called "magicUpdater" :) has the attribute

[MethodImpl(MethodImplOptions.NoInlining)]
public UpdateWithSignature<T> GetGenerator<T>(int numberOfFramesFromCallSite = 1)

which tells the JIT compiler not to inline it and so, since the caller isn't inlined (because of the structs), we don't have to worry about stack "compressing".

This "GetGenerator" method, then, has access to the argument names and argument types of the method that called it. The generic type param T is the immutable type that is being targeted by the "UpdateWith" method. UpdateWithSignature<T> is a delegate with the signature

public delegate T UpdateWithSignature<T>(T source, params object[] updateValues);

This delegate is what takes the property update values and creates a new instance (or returns the source instance if no changes are required). It does this by considering every public constructor that T has and determining what constructor arguments it can satisfy with update arguments. It does this by matching the update argument names to the constructor argument names and ensuring that the types are compatible. If a constructor is encountered with arguments that don't match any update arguments but T has a property whose name and type matches the constructor argument, then that will be used. If a constructor argument is encountered that can't be matched to an update argument or a property on T but the constructor argument has a default value, then the default value may be used if the constructor.

If a constructor does not have at least one argument that can be matched to each update argument name, then that constructor is ignored (otherwise an update argument would be ignored, which would make the UpdateWith somewhat impotent!). If there are multiple constructors that meet all of these conditions, they are sorted by the number of arguments they have that are fulfilled by update arguments and then sorted by the number of arguments that are satisfied by other properties on T - the best match from this sorted set it used.

The return UpdateWithSignature<T> delegate itself is a compiled LINQ expression so, once the cost of generating it has been paid the first time that it's required, the calls to this delegate are very fast. The "GetGenerator" method caches these compiled expressions, so the method

public RoleDetails UpdateWith(
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  return DefaultUpdateWithHelper.GetGenerator<RoleDetails>()(this, title, startDate);
}

can be called repeatedly and cheaply.

Note that in the above example, the DefaultUpdateWithHelper is used. This is a static wrapper around the UpdateWithHelper which specifies a default configuration. The UpdateWithHelper takes arguments that describe how to match update argument names to constructor argument names, for example (amongst other configuration options). The implementation in the DefaultUpdateWithHelper matches by name in a case-insensitive manner, which should cover the most common cases. But the relevant UpdateWithHelper constructor argument is of type

public delegate bool UpdateArgumentToConstructorArgumentComparison(
  ParameterInfo updateArgument,
  ConstructorInfo constructor,
  ParameterInfo constructorArgument);

so a custom implementation could implement any complex scheme based upon target type or constructor or update argument type.

The UpdateWithHelper also requires a cache implementation for maintaining the compiled expressions, as well as matchers for other comparisons (such as property name to constructor argument name, for constructor arguments that can't be matched by an update argument). If a custom UpdateWithHelper is desired that only needs to override some behaviour, the DefaultUpdateWithHelper class has a static nested class DefaultValues with properties that are the references that it uses for the UpdateWithHelper constructor arguments - some of these may be reused by the custom configuration, if appropriate.

I considered going into some detail about how the LINQ expressions are generated since I think it's hard to find a good "how-to" walkthrough on these. It's either information that seems too simple or fine-grained that it's hard to put it together into something useful or it's the other extreme; dense code that's hard to get to grips with if you don't know much about them. But I feel that it would balloon this post too much - so maybe another day!

Incidentally, the DefaultUpdateWithHelper's static "GetGenerator" method inserts another layer into the call stack, which is why the UpdateWithHelper's method requires an (optional) "numberOfFramesFromCallSite" argument - so that it can be set to 2 in this case, rather than the default 1 (since it will need to step back through the DefaultUpdateWithHelper method before getting to the real "UpdateWith" method). This also means that DefaultUpdateWithHelper has the "MethodImplOptions.NoInlining" attribute on its "GetGenerator" method.

It's also worthy of note that the "GetGenerator" methods support extension methods for "UpdateWith" implementations, as opposed to requiring that they be instance methods. So the following is also acceptable

public static RoleDetails UpdateWith(
  this RoleDetails source,
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  return DefaultUpdateWithHelper.GetGenerator<RoleDetails>()(source, title, startDate);
}

The analysis detects that the first argument is not an OptionalType<T> and asserts that its type is assignable to the type param T and then ignores it when generating the translation expression. The extension method will pass through the "source" reference where "this" was used in the instance method implementation shown earlier.

Further performance optimisations

Although the compiled "generator" expressions are cached, the cache key is based upon the "UpdateWith" method's metadata. This means that the cost of accessing the StackFrame is paid for every "UpdateWith" call, along with the reflection access to get the UpdateWith argument's metadata. If you feel that this might be an unbearable toll, a simple alternative is something like

private static UpdateWithSignature<RoleDetails> updater
  = DefaultUpdateWithHelper.GetGenerator<RoleDetails>(typeof(RoleDetails).GetMethod("UpdateWith"));
public RoleDetails UpdateWith(
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  return updater(this, title, startDate);
}

The "GetGenerator" methods have alternate signatures that accept a MethodBase reference relating to the "UpdateWith" method, rather than relying upon StackFrame to retrieve it. And using a static "updater" reference means that "GetGenerator" is only ever called once, so subsequent calls that would require reflection in order to check for a cached expression are avoided entirely. The trade-off is that the method must be named in a string, which would break if the method was renamed. Not quite as convenient as relying upon stack-tracing magic.

If you really want to get crazy, you can go one step further. If part of the reason for this experiment was to reduce GC pressure, then surely the params array required by the UpdateWithSignature<T> is a step backwards from the less-automated method, where the number of update arguments is known at compile time? (Since that didn't require a params array for a variable number of arguments, there were no method calls where the precise number of update arguments was unknown). Well that params array can be avoided if we make some more trade-offs. Firstly, we may only use an approach like above, which doesn't rely on expression caching (ie. use a static property that requests a generator only once). Secondly, there may only be up to nine update arguments. The first reason is because the cache that the UpdateWithHelper uses records UpdateWithSignature<T> references, which are no good since they use the params array that we're trying to avoid. The second reason is because a distinct delegate is required for each number of arguments, as is a distinct method to construct the generator - so there had to be a limit somewhere and I chose nine. The methods are

public UpdateWithSignature1<T> GetUncachedGenerator1<T>(MethodBase updateMethod)
public UpdateWithSignature2<T> GetUncachedGenerator2<T>(MethodBase updateMethod)
public UpdateWithSignature3<T> GetUncachedGenerator3<T>(MethodBase updateMethod)
// .. etc, up to 9

and the delegates are of the form

public delegate T UpdateWithSignature1<T>(T source, object arg0);
public delegate T UpdateWithSignature2<T>(T source, object arg0, object arg1);
public delegate T UpdateWithSignature3<T>(T source, object arg0, object arg1, object arg2);
// .. etc, up to 9

They may be used in a similar manner to that already shown, but you must be careful to match the number of arguments required by the "UpdateWith" method. In a way, there is actually a compile-time advantage here - if you choose the wrong one, then the compiler will warn you that you have specified three update arguments when the delegate requires four (for example). With the generic form (the non-numbered "GetGenerator" method), the params array means that you can specify any number of update arguments and you won't find out until runtime that you specified the wrong amount.

So, to illustrate -

private static UpdateWithSignature3<RoleDetails> updater
  = DefaultUpdateWithHelper.GetUncachedGenerator3<RoleDetails>(
    typeof(RoleDetails).GetMethod("UpdateWith"));

public RoleDetails UpdateWith(
  Optional<string> title = new Optional<string>(),
  Optional<DateTime> startDate = new Optional<DateTime>(),
  Optional<DateTime?> endDateIfAny = new Optional<DateTime?>())
{
  return updater(this, title, startDate, endDateIfAny);
}

If I'm being honest, however, if you really think that this optimisation is beneficial (by which, I mean you've done performance analysis and found it to be a bottleneck worth addressing), you're probably better replacing this automated approach with the hand-written code that I showed earlier. It's not all that long and it removes all of this "magic" and also gives the compiler more opportunity to pick up on mistakes. But most importantly (in terms of performance) may be the fact that all update arguments are passed as "object" in these delegates. This means that any value types (ints, structs, etc..) will be boxed when they are passed around and then unboxed when used as constructor arguments. This is explained very clearly in the article 5 Basic Ways to Improve Performance in C# and more information about the use of the heap and stack can be found at Six important .NET concepts: Stack, heap, value types, reference types, boxing, and unboxing - I'd not seen this article before today but I thought it explained things really clearly.

Chances are that you won't have to worry about such low level details as whether values are being boxed-unboxed 99% of the time and I think there's a lot of mileage to be had from how convenient this automated approach is. But it's worth bearing in mind the "what ifs" of performance for the times when they do make a difference.

Any other downsides to the automagical way?

I can't claim to have this code in production anywhere yet. But I'm comfortable enough with it at this stage that I intend to start introducing it into prototype projects that it will be applicable to - and then look to using it in real-world, scary, production projects before too long! My only concern, really, is about making silly mistakes with typos in update argument names. If I mistyped "tittle" in the RoleDetails "UpdateWith" example I've been using, I wouldn't find out until runtime that I'd made the mistaken - at which point, the "GetGenerator" call would throw an exception as it wouldn't be able to match "tittle" to any argument on any accessible constructor. I think the trade-off here would be that every "UpdateWith" method that used this library would need a unit test so that discovering the problem at "runtime" doesn't mean "when I hit code in manual testing that triggers the exception" but rather equates to "whenever the test suite is run - whether locally or when pushed to the build server". I doubt that Update methods of this type would normally get a unit test since they're so basic (maybe you disagree!) but in this case the convenience of using the automated "GetGenerator" method still wins even with the (simple) unit test recommended for each one.

Now that I think about it, this is not a dissimilar situation to using a Dependency Injection framework or using AutoMapper in your code - there is a lot of convenience to be had, but at the risk that configuration errors are not exposed until the code is executed.

In summary, until I find a good reason not to use this library going forward, I intend to do so! To revisit my (F#) inspiration, how can it not be enticing to be able to write

// F#
let p2 = {p1 with first="Jim";last="Smith"}

// C#
var p2 = p1.UpdateWith(first:"Jim",last:"Smith");

with so little code having to be written to enable it?!

Go get the code at bitbucket.org/DanRoberts/updatewith!

Or alternatively, pull the NuGet package straight down from nuget.org/packages/CSharpImmutableUpdateWith.

Update (19th September 2014): There's been quite a lot of interest in this post and some good comments made here and at the discussion on Reddit/implementing-f-sharp-inspired-with-updates-for-immutable-classes-in-c-sharp. I intend to write a follow-up post that talks about some of the observations and includes some performance stats. In summary, though, I may have to admit to considering a slight about-turn in the crazy magical approach and strike that up as a convenience for rattling out code quickly but probably something that won't make it into production code that I write. The idea of using an "UpdateWith" method with named, optional arguments (using the Optional struct) will make it into my "real world" code, though! It's also strikingly similar to some of the code in Roslyn, it was pointed out (I'll touch on this in the follow-up too). I still had a lot of fun with the "Turning it up to eleven" investigation and I think there's useful information in here and in the library code I wrote - even more so when I get round to documenting how I approach writing the LINQ expression-generating code. But maybe it didn't result in something that should always be everyone's immediate go-to method for writing this sort of code. Such is life! :)

Update (2nd October 2014): See A follow-up to "Implementing F#-inspired 'with' updates in C#".

Posted at 23:12

Comments